1. 21 Jan, 2020 1 commit
  2. 25 Nov, 2019 1 commit
    • drm/i915/gt: Schedule request retirement when timeline idles · 31177017
      Chris Wilson authored
      The major drawback of commit 7e34f4e4 ("drm/i915/gen8+: Add RC6 CTX
      corruption WA") is that it disables RC6 while Skylake (and friends) is
      active, and we do not consider the GPU idle until all outstanding
      requests have been retired and the engine switched over to the kernel
      context. If userspace is idle, this task falls onto our background idle
      worker, which only runs roughly once a second, meaning that userspace has
      to have been idle for a couple of seconds before we enable RC6 again.
      Naturally, this causes us to consume considerably more energy than
      before as powersaving is effectively disabled while a display server
      (here's looking at you Xorg) is running.
      As execlists will get a completion event as each context is completed,
      we can use this interrupt to queue a retire worker bound to this engine
      to cleanup idle timelines. We will then immediately notice the idle
      engine (without userspace intervention or the aid of the background
      retire worker) and start parking the GPU. Thus during light workloads,
      we will do much more work to idle the GPU faster...  Hopefully with
      commensurate power saving!
      v2: Watch context completions and only look at those local to the engine
      when retiring to reduce the amount of excess work we perform.
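      The flow described above can be modelled in a few lines. This is only a sketch under stated assumptions: the struct, its fields and both function names are hypothetical and are not taken from the driver. It shows a completion event queueing a worker local to one engine, and that worker parking the engine once nothing is left to retire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-engine state; field names are illustrative only. */
struct engine_retire_model {
	int inflight;       /* requests submitted and not yet retired */
	int completed;      /* requests finished by the HW, awaiting retirement */
	bool retire_queued; /* retire worker scheduled for this engine */
	bool parked;        /* engine considered idle */
};

/* Completion event: note the finished request and queue the local worker. */
static void on_context_complete(struct engine_retire_model *e)
{
	assert(e != NULL && e->completed < e->inflight);
	e->completed++;
	e->retire_queued = true;
}

/* Retire worker: only touches this engine, and parks it once it is empty. */
static void retire_worker(struct engine_retire_model *e)
{
	if (!e->retire_queued)
		return;
	e->retire_queued = false;
	e->inflight -= e->completed;
	e->completed = 0;
	if (e->inflight == 0)
		e->parked = true;
}
```

      In this model the engine is parked as soon as its last request is retired, without waiting for a periodic background pass.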
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315
      References: 7e34f4e4 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
      References: 2248a283 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk
      (cherry picked from commit 4f88f8747fa43c97c3b3712d8d87295ea757cc51)
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
  3. 05 Nov, 2019 1 commit
    • drm/i915: Add support for mandatory cmdparsing · 311a50e7
      Jon Bloomfield authored
      The existing cmdparser for gen7 can be bypassed by specifying
      batch_len=0 in the execbuf call. This is safe because bypassing
      simply reduces the cmd-set available.
      In a later patch we will introduce cmdparsing for gen9, as a
      security measure, which must be strictly enforced since without
      it we are vulnerable to DoS attacks.
      Introduce the concept of 'required' cmd parsing that cannot be
      bypassed by submitting zero-length bb's.
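      A minimal sketch of the bypass rule, assuming a hypothetical per-engine pair of flags; `cmdparse_engine_model`, its fields and `must_parse()` are illustrative names, not the driver's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: names are illustrative, not the driver's. */
struct cmdparse_engine_model {
	bool using_cmd_parser;    /* a parser exists for this engine (gen7 style) */
	bool requires_cmd_parser; /* parsing is a security requirement */
};

/* Decide whether a batch must go through the command parser. */
static bool must_parse(const struct cmdparse_engine_model *e, uint32_t batch_len)
{
	assert(e != NULL);
	if (e->requires_cmd_parser)
		return true;       /* zero-length batches cannot opt out */
	if (!e->using_cmd_parser)
		return false;
	return batch_len != 0;     /* optional parsing: batch_len=0 bypasses it */
}
```

      With optional parsing a zero-length batch skips the parser and merely loses access to the extended command set; with required parsing every batch is parsed.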
      v2: rebase (Mika)
      v3: fix conflict on engine flags (Mika)
      Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
  4. 29 Oct, 2019 1 commit
  5. 24 Oct, 2019 1 commit
  6. 23 Oct, 2019 3 commits
  7. 17 Oct, 2019 1 commit
  8. 04 Oct, 2019 1 commit
  9. 06 Sep, 2019 1 commit
  10. 23 Aug, 2019 1 commit
  11. 16 Aug, 2019 2 commits
  12. 15 Aug, 2019 1 commit
  13. 09 Aug, 2019 1 commit
  14. 08 Aug, 2019 1 commit
  15. 06 Aug, 2019 1 commit
  16. 04 Aug, 2019 1 commit
  17. 30 Jul, 2019 1 commit
  18. 29 Jul, 2019 1 commit
  19. 11 Jul, 2019 1 commit
  20. 21 Jun, 2019 2 commits
  21. 20 Jun, 2019 2 commits
    • drm/i915/execlists: Minimalistic timeslicing · 8ee36e04
      Chris Wilson authored
      If we have multiple contexts of equal priority pending execution,
      activate a timer to demote the currently executing context in favour of
      the next in the queue when that timeslice expires. This enforces
      fairness between contexts (so long as they allow preemption -- forced
      preemption, in the future, will kick those who do not obey) and allows
      us to avoid userspace blocking forward progress with e.g. unbounded
      For the starting point here, we use the jiffie as our timeslice so that
      we should be reasonably efficient wrt frequent CPU wakeups.
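      The rule above can be sketched as a rotation within one priority level. This is an assumption-laden model: `timeslice_ctx_model`, `need_timeslice()` and `timeslice_expired()` are hypothetical here, and the queue is assumed to be an array sorted by descending priority with the running context in slot 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct timeslice_ctx_model {
	int id;
	int prio;
};

/* Arm the timer only when an equal-priority context is waiting behind us. */
static bool need_timeslice(const struct timeslice_ctx_model *running,
			   const struct timeslice_ctx_model *next)
{
	assert(running != NULL);
	return next != NULL && next->prio == running->prio;
}

/*
 * Timer expiry (one jiffy in the commit): demote the running context to the
 * back of its own priority level so that the next one in the queue runs.
 */
static void timeslice_expired(struct timeslice_ctx_model *queue, size_t count)
{
	if (count < 2 || !need_timeslice(&queue[0], &queue[1]))
		return;

	/* Find the end of the run of contexts sharing the running priority. */
	size_t end = 1;
	while (end < count && queue[end].prio == queue[0].prio)
		end++;

	struct timeslice_ctx_model demoted = queue[0];
	for (size_t i = 1; i < end; i++)
		queue[i - 1] = queue[i];
	queue[end - 1] = demoted;
}
```

      Lower-priority contexts are never moved ahead of the demoted one; only peers of equal priority take turns.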
      Testcase: igt/gem_exec_scheduler/semaphore-resolve
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-2-chris@chris-wilson.co.uk
    • drm/i915/execlists: Preempt-to-busy · 22b7a426
      Chris Wilson authored
      When using a global seqno, we required a precise stop-the-world event to
      handle preemption and unwind the global seqno counter. To accomplish
      this, we would preempt to a special out-of-band context and wait for the
      machine to report that it was idle. Given an idle machine, we could very
      precisely see which requests had completed and which we needed to feed
      back into the run queue.
      However, now that we have scrapped the global seqno, we no longer need
      to precisely unwind the global counter and only track requests by their
      per-context seqno. This allows us to loosely unwind inflight requests
      while scheduling a preemption, with the enormous caveat that the
      requests we put back on the run queue are still _inflight_ (until the
      preemption request is complete). This makes request tracking much more
      messy, as at any point then we can see a completed request that we
      believe is not currently scheduled for execution. We also have to be
      careful not to rewind RING_TAIL past RING_HEAD on preempting to the
      running context, and for this we use a semaphore to prevent completion
      of the request before continuing.
      To accomplish this feat, we change how we track requests scheduled to
      the HW. Instead of appending our requests onto a single list as we
      submit, we track each submission to ELSP as its own block. Then upon
      receiving the CS preemption event, we promote the pending block to the
      inflight block (discarding what was previously being tracked). As normal
      CS completion events arrive, we then remove stale entries from the
      inflight tracker.
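      The pending/inflight bookkeeping can be sketched as follows. The two-port width, the struct and the three function names are assumptions made for this illustration; request ids stand in for real request pointers, with 0 meaning an empty slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_PORTS 2 /* assumed ELSP width for this sketch */

struct execlists_model {
	int inflight[NUM_PORTS]; /* block the HW is believed to be executing */
	int pending[NUM_PORTS];  /* last block written to ELSP, not yet acked */
};

/* Each submission to ELSP is tracked as its own block. */
static void submit(struct execlists_model *el, int first, int second)
{
	assert(el != NULL);
	el->pending[0] = first;
	el->pending[1] = second;
}

/* CS preemption event: the pending block replaces whatever was inflight. */
static void on_preempt_event(struct execlists_model *el)
{
	memcpy(el->inflight, el->pending, sizeof(el->inflight));
	memset(el->pending, 0, sizeof(el->pending));
}

/* CS completion event: drop the stale head entry from the inflight block. */
static void on_complete_event(struct execlists_model *el)
{
	memmove(&el->inflight[0], &el->inflight[1],
		(NUM_PORTS - 1) * sizeof(el->inflight[0]));
	el->inflight[NUM_PORTS - 1] = 0;
}
```

      Until the preemption event arrives, the previous block stays in `inflight`, which mirrors the caveat that unwound requests are still in flight on the hardware.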
      v2: Be a tinge paranoid and ensure we flush the write into the HWS page
      for the GPU semaphore to pick up in a timely fashion.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
  22. 19 Jun, 2019 2 commits
    • drm/i915: Keep rings pinned while the context is active · 09c5ab38
      Chris Wilson authored
      Remember to keep the rings pinned as well as the context image until the
      GPU is no longer active.
      v2: Introduce a ring->pin_count primarily to hide the
      mock_ring that doesn't fit into the normal GGTT vma picture.
      v3: Order is important in teardown, ringbuffer submission needs to drop
      the pin count on the engine->kernel_context before it can gleefully free
      its ring.
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110946
      Fixes: ce476c80 ("drm/i915: Keep contexts pinned until after the next kernel context switch")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190619170135.15281-1-chris@chris-wilson.co.uk
      Link: https://patchwork.freedesktop.org/patch/msgid/20190619170135.15281-1-chris@chris-wilson.co.uk
    • drm/i915: Make the semaphore saturation mask global · 44d89409
      Chris Wilson authored
      The idea behind keeping the saturation mask local to a context backfired
      spectacularly. The premise with the local mask was that we would be more
      proactive in attempting to use semaphores after each time the context
      idled, and that all new contexts would attempt to use semaphores
      ignoring the current state of the system. This turns out to be horribly
      optimistic. If the system state is still oversaturated and the existing
      workloads have all stopped using semaphores, the new workloads would
      attempt to use semaphores and be deprioritised behind real work. The
      new contexts would not switch off using semaphores until their initial
      batch of low priority work had completed. Given sufficient background load
      of equal user priority, this would completely starve the new work of any
      GPU time.
      To compensate, remove the local tracking in favour of keeping it as
      global state on the engine -- once the system is saturated and
      semaphores are disabled, everyone stops attempting to use semaphores
      until the system is idle again. One of the reasons for preferring local
      context tracking was that it worked with virtual engines, so for
      switching to global state we could either do a complete check of all the
      virtual siblings or simply disable semaphores for those requests. This
      takes the simpler approach of disabling semaphores on virtual engines.
      The downside is that the decision that the engine is saturated is a
      local measure -- we are only checking whether or not this context was
      scheduled in a timely fashion; it may be legitimately delayed due to user
      priorities. We still have the same dilemma though, that we do not want
      to employ the semaphore poll unless it will be used.
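      A rough model of engine-wide saturation tracking, under the assumption that saturation is kept as a bitmask of signalling engines. `saturation_engine_model` and the three helpers are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct saturation_engine_model {
	uint32_t saturated; /* engine-wide mask shared by every context */
	bool is_virtual;    /* virtual engines never use semaphores here */
};

/* May a new request on this engine busywait on the given signalers? */
static bool may_use_semaphore(const struct saturation_engine_model *e,
			      uint32_t signaler_mask)
{
	assert(e != NULL);
	if (e->is_virtual)
		return false; /* the simpler approach taken for virtual engines */
	return (e->saturated & signaler_mask) == 0;
}

/* A semaphore user was not scheduled in time: stop everyone from polling. */
static void mark_saturated(struct saturation_engine_model *e,
			   uint32_t signaler_mask)
{
	e->saturated |= signaler_mask;
}

/* The engine went idle: give semaphores another chance. */
static void engine_parked(struct saturation_engine_model *e)
{
	e->saturated = 0;
}
```

      Because the mask lives on the engine rather than on a context, a freshly created context inherits the saturated state instead of optimistically starting with semaphores enabled.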
      v2: Explain why we need to assume the worst wrt virtual engines.
      Fixes: ca6e56f6 ("drm/i915: Disable semaphore busywaits on saturated systems")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-8-chris@chris-wilson.co.uk
  23. 14 Jun, 2019 2 commits
    • drm/i915: Replace engine->timeline with a plain list · 422d7df4
      Chris Wilson authored
      To continue the onslaught of removing the assumption of a global
      execution ordering, another casualty is the engine->timeline. Without an
      actual timeline to track, it is overkill and we can replace it with a
      much less grand plain list. We still need a list of requests inflight,
      for the simple purpose of finding inflight requests (for retiring,
      resetting, preemption etc).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk
    • drm/i915: Keep contexts pinned until after the next kernel context switch · ce476c80
      Chris Wilson authored
      We need to keep the context image pinned in memory until after the GPU
      has finished writing into it. Since it continues to write as we signal
      the final breadcrumb, we need to keep it pinned until the request after
      it is complete. Currently we know the order in which requests execute on
      each engine, and so to remove that presumption we need to identify a
      request/context-switch we know must occur after our completion. Any
      request queued after the signal must imply a context switch, for
      simplicity we use a fresh request from the kernel context.
      The sequence of operations for keeping the context pinned until saved is:
       - On context activation, we preallocate a node for each physical engine
         the context may operate on. This is to avoid allocations during
         unpinning, which may be from inside FS_RECLAIM context (aka the
       - On context deactivation on retirement of the last active request (which
         is before we know the context has been saved), we add the
         preallocated node onto a barrier list on each engine
       - On engine idling, we emit a switch to kernel context. When this
         switch completes, we know that all previous contexts must have been
         saved, and so on retiring this request we can finally unpin all the
         contexts that were marked as deactivated prior to the switch.
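      The three steps listed above can be sketched as a small state machine. Everything here is a hypothetical model (struct names, fields, the fixed-size barrier array); it only illustrates why the node is reserved at activation and consumed at deactivation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BARRIERS 8 /* arbitrary bound for this sketch */

struct pinned_ctx_model {
	int pinned;        /* context image still pinned in memory */
	int node_reserved; /* barrier node preallocated at activation */
};

struct barrier_engine_model {
	struct pinned_ctx_model *barriers[MAX_BARRIERS];
	size_t nbarriers;
};

/* Step 1: activation pins the image and reserves the node up front. */
static void ctx_activate(struct pinned_ctx_model *ctx)
{
	ctx->pinned = 1;
	ctx->node_reserved = 1;
}

/* Step 2: last request retired; queue the reserved node, allocating nothing. */
static void ctx_deactivate(struct pinned_ctx_model *ctx,
			   struct barrier_engine_model *engine)
{
	assert(ctx->node_reserved && engine->nbarriers < MAX_BARRIERS);
	ctx->node_reserved = 0;
	engine->barriers[engine->nbarriers++] = ctx;
}

/* Step 3: the kernel-context switch retired, so every queued image is saved. */
static void kernel_context_retired(struct barrier_engine_model *engine)
{
	for (size_t i = 0; i < engine->nbarriers; i++)
		engine->barriers[i]->pinned = 0;
	engine->nbarriers = 0;
}
```

      Note that the context stays pinned between steps 2 and 3, which is the window in which the GPU may still be writing to the context image.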
      We can enhance this in future by flushing all the idle contexts on a
      regular heartbeat pulse of a switch to kernel context, which will also
      be used to check for hung engines.
      v2: intel_context_active_acquire/_release
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk
  24. 29 May, 2019 1 commit
    • Revert "drm/i915: Expand subslice mask" · a10f361d
      Jani Nikula authored
      This reverts commit 1ac159e2 ("drm/i915: Expand subslice mask"),
      which kills ICL due to GEM_BUG_ON() sanity checks before CI even gets a
      chance to do anything.
      The commit exposes an issue in commit 1e40d4ae ("drm/i915/cnl:
      Implement WaProgramMgsrForCorrectSliceSpecificMmioReads"), which will
      also need to be addressed.
      There's a proposed fix [1], but considering the seeming uncertainty with
      the fix as well as the size of the regressing commit (in this context,
      the one that actually brings down ICL), this warrants a revert to get
      ICL working, and gives us time to get all of this right without
      rushing. Even if this means shooting the messenger.
      <3>[    9.426327] intel_sseu_get_subslices:46 GEM_BUG_ON(slice >= sseu->max_slices)
      <4>[    9.426355] ------------[ cut here ]------------
      <2>[    9.426357] kernel BUG at drivers/gpu/drm/i915/gt/intel_sseu.c:46!
      <4>[    9.426371] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      <4>[    9.426377] CPU: 1 PID: 364 Comm: systemd-udevd Not tainted 5.2.0-rc2-CI-CI_DRM_6159+ #1
      <4>[    9.426385] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
      <4>[    9.426444] RIP: 0010:intel_sseu_get_subslices+0x8a/0xe0 [i915]
      <4>[    9.426452] Code: d5 76 b7 e0 48 8b 35 9d 24 21 00 49 c7 c0 07 f0 72 a0 b9 2e 00 00 00 48 c7 c2 00 8e 6d a0 48 c7 c7 a5 14 5b a0 e8 36 3c be e0 <0f> 0b 48 c7 c1 80 d5 6f a0 ba 30 00 00 00 48 c7 c6 00 8e 6d a0 48
      <4>[    9.426468] RSP: 0018:ffffc9000037b9c8 EFLAGS: 00010282
      <4>[    9.426475] RAX: 000000000000000f RBX: 0000000000000000 RCX: 0000000000000000
      <4>[    9.426482] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88849e346f98
      <4>[    9.426490] RBP: ffff88848a200000 R08: 0000000000000004 R09: ffff88849d50b000
      <4>[    9.426497] R10: 0000000000000000 R11: ffff88849e346f98 R12: ffff88848a209e78
      <4>[    9.426505] R13: 0000000003000000 R14: ffff88848a20b1a8 R15: 0000000000000000
      <4>[    9.426513] FS:  00007f73d5ae8680(0000) GS:ffff88849fc80000(0000) knlGS:0000000000000000
      <4>[    9.426521] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[    9.426527] CR2: 0000561417b01260 CR3: 0000000494764003 CR4: 0000000000760ee0
      <4>[    9.426535] PKRU: 55555554
      <4>[    9.426538] Call Trace:
      <4>[    9.426585]  wa_init_mcr+0xd5/0x110 [i915]
      <4>[    9.426597]  ? lock_acquire+0xa6/0x1c0
      <4>[    9.426645]  icl_gt_workarounds_init+0x21/0x1a0 [i915]
      <4>[    9.426694]  ? i915_driver_load+0xfcf/0x18a0 [i915]
      <4>[    9.426739]  gt_init_workarounds+0x14c/0x230 [i915]
      <4>[    9.426748]  ? _raw_spin_unlock_irq+0x24/0x50
      <4>[    9.426789]  intel_gt_init_workarounds+0x1b/0x30 [i915]
      <4>[    9.426835]  i915_driver_load+0xfd7/0x18a0 [i915]
      <4>[    9.426843]  ? lock_acquire+0xa6/0x1c0
      <4>[    9.426850]  ? __pm_runtime_resume+0x4f/0x80
      <4>[    9.426857]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
      <4>[    9.426863]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
      <4>[    9.426870]  ? lockdep_hardirqs_on+0xe3/0x1b0
      <4>[    9.426915]  i915_pci_probe+0x29/0xa0 [i915]
      <4>[    9.426923]  pci_device_probe+0x9e/0x120
      <4>[    9.426930]  really_probe+0xea/0x3c0
      <4>[    9.426936]  driver_probe_device+0x10b/0x120
      <4>[    9.426942]  device_driver_attach+0x4a/0x50
      <4>[    9.426948]  __driver_attach+0x97/0x130
      <4>[    9.426954]  ? device_driver_attach+0x50/0x50
      <4>[    9.426960]  bus_for_each_dev+0x74/0xc0
      <4>[    9.426966]  bus_add_driver+0x13f/0x210
      <4>[    9.426971]  ? 0xffffffffa083b000
      <4>[    9.426976]  driver_register+0x56/0xe0
      <4>[    9.426982]  ? 0xffffffffa083b000
      <4>[    9.426987]  do_one_initcall+0x58/0x300
      <4>[    9.426994]  ? do_init_module+0x1d/0x1f6
      <4>[    9.427001]  ? rcu_read_lock_sched_held+0x6f/0x80
      <4>[    9.427007]  ? kmem_cache_alloc_trace+0x261/0x290
      <4>[    9.427014]  do_init_module+0x56/0x1f6
      <4>[    9.427020]  load_module+0x24d1/0x2990
      <4>[    9.427032]  ? __se_sys_finit_module+0xd3/0xf0
      <4>[    9.427037]  __se_sys_finit_module+0xd3/0xf0
      <4>[    9.427047]  do_syscall_64+0x55/0x1c0
      <4>[    9.427053]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      <4>[    9.427059] RIP: 0033:0x7f73d5609839
      <4>[    9.427064] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
      <4>[    9.427082] RSP: 002b:00007ffdf34477b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      <4>[    9.427091] RAX: ffffffffffffffda RBX: 00005559fd5d7b40 RCX: 00007f73d5609839
      <4>[    9.427099] RDX: 0000000000000000 RSI: 00007f73d52e8145 RDI: 000000000000000f
      <4>[    9.427106] RBP: 00007f73d52e8145 R08: 0000000000000000 R09: 00007ffdf34478d0
      <4>[    9.427114] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000
      <4>[    9.427121] R13: 00005559fd5c90f0 R14: 0000000000020000 R15: 00005559fd5d7b40
      <4>[    9.427131] Modules linked in: i915(+) mei_hdcp x86_pkg_temp_thermal coretemp snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_codec snd_hwdep e1000e snd_hda_core ghash_clmulni_intel ptp snd_pcm cdc_ether usbnet mii pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc
      <4>[    9.427254] ---[ end trace af3eeb543bd66e66 ]---
      [1] http://patchwork.freedesktop.org/patch/msgid/20190528200655.11605-1-chris@chris-wilson.co.uk
      References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6159/fi-icl-u2/pstore0-1517155098_Oops_1.log
      References: 1e40d4ae ("drm/i915/cnl: Implement WaProgramMgsrForCorrectSliceSpecificMmioReads")
      Fixes: 1ac159e2 ("drm/i915: Expand subslice mask")
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Manasi Navare <manasi.d.navare@intel.com>
      Cc: Michel Thierry <michel.thierry@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Oscar Mateo <oscar.mateo@intel.com>
      Cc: Stuart Summers <stuart.summers@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Yunwei Zhang <yunwei.zhang@intel.com>
      Acked-by: Daniel Vetter <daniel@ffwll.ch>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190529082150.31526-1-jani.nikula@intel.com
  25. 28 May, 2019 2 commits
  26. 22 May, 2019 3 commits
    • drm/i915: Engine discovery query · c5d3e39c
      Tvrtko Ursulin authored
      Engine discovery query allows userspace to enumerate engines, probe their
      configuration features, all without needing to maintain the internal PCI
      ID based database.
      A new query for the generic i915 query ioctl is added named
      DRM_I915_QUERY_ENGINE_INFO, together with accompanying structure
      drm_i915_query_engine_info. The address of the latter should be passed to the
      kernel in the query.data_ptr field, and should be large enough for the
      kernel to fill out all known engines as struct drm_i915_engine_info
      elements trailing the query.
      As with other queries, setting the item query length to zero allows
      userspace to query minimum required buffer size.
      Enumerated engines have common type mask which can be used to query all
      hardware engines, versus engines userspace can submit to using the execbuf
      Engines also have capabilities which are per engine class namespace of
      bits describing features not present on all engine instances.
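      The zero-length probe followed by a sized query can be sketched without touching the real ioctl. Everything below is a self-contained stand-in: `fake_query()` plays the kernel, and the `*_model` structs are simplified, hypothetical layouts rather than the uAPI definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the uAPI structures; layouts are illustrative. */
struct engine_info_model {
	uint16_t engine_class;
	uint16_t engine_instance;
	uint64_t capabilities;
};

struct engine_query_model {
	uint32_t num_engines;
	struct engine_info_model engines[]; /* trailing array filled by the "kernel" */
};

/* Fake kernel side with two engines; *length == 0 reports the size needed. */
static int fake_query(void *data, int32_t *length)
{
	static const struct engine_info_model hw[] = { {0, 0, 0}, {1, 0, 0x1} };
	const int32_t need = (int32_t)(sizeof(struct engine_query_model) + sizeof(hw));

	assert(length != NULL);
	if (*length == 0) {
		*length = need;
		return 0;
	}
	if (*length < need || data == NULL)
		return -1;

	struct engine_query_model *q = data;
	q->num_engines = sizeof(hw) / sizeof(hw[0]);
	memcpy(q->engines, hw, sizeof(hw));
	return 0;
}

/* Userspace side: probe the size, allocate, then query again. */
static struct engine_query_model *enumerate_engines(void)
{
	int32_t length = 0;

	if (fake_query(NULL, &length) != 0)
		return NULL;

	struct engine_query_model *q = calloc(1, (size_t)length);
	if (q != NULL && fake_query(q, &length) != 0) {
		free(q);
		q = NULL;
	}
	return q;
}
```

      A real client would issue the generic i915 query ioctl twice in the same way, first with a zero item length and then with a buffer of the reported size.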
       * Fixed HEVC assignment.
       * Reorder some fields, rename type to flags, increase width. (Lionel)
       * No need to allocate temporary storage if we do it engine by engine.
       * Describe engine flags and mark mbz fields. (Lionel)
       * HEVC only applies to VCS.
       * Squash SFC flag into main patch.
       * Tidy some comments.
       * Add uabi_ prefix to engine capabilities. (Chris Wilson)
       * Report exact size of engine info array. (Chris Wilson)
       * Drop the engine flags. (Joonas Lahtinen)
       * Added some more reserved fields.
       * Move flags after class/instance.
       * Do not check engine info array was zeroed by userspace but zero the
         unused fields for them instead.
       * Simplify length calculation loop. (Lionel Landwerlin)
       * Remove MBZ comments where not applicable.
       * Rename ABI flags to match engine class define naming.
       * Rename SFC ABI flag to reflect it applies to VCS and VECS.
       * SFC is wired to even _logical_ engine instances.
       * SFC applies to VCS and VECS.
       * HEVC is present on all instances on Gen11. (Tony)
       * Simplify length calculation even more. (Chris Wilson)
       * Move info_ptr assignment closer to loop for clarity. (Chris Wilson)
       * Use vdbox_sfc_access from runtime info.
       * Rebase for RUNTIME_INFO.
       * Refactor for lower indentation.
       * Rename uAPI class/instance to engine_class/instance to avoid C++
       * Rebase for s/num_rings/num_engines/ in RUNTIME_INFO.
       * Use new copy_query_item.
       * Consolidate with struct i915_engine_class_instance.
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tony Ye <tony.ye@intel.com>
      Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> # v7
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190522090054.6007-1-tvrtko.ursulin@linux.intel.com
    • drm/i915/execlists: Virtual engine bonding · ee113690
      Chris Wilson authored
      Some users require that when a master batch is executed on one particular
      engine, a companion batch is run simultaneously on a specific slave
      engine. For this purpose, we introduce virtual engine bonding, allowing
      maps of master:slaves to be constructed to constrain which physical
      engines a virtual engine may select given a fence on a master engine.
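      A bond map of this kind can be sketched as a lookup from master engine to a mask of permitted siblings. The struct, the function name and the choice to leave selection unconstrained when no bond matches are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bond_model {
	int master;            /* physical engine id the master batch ran on */
	uint32_t sibling_mask; /* siblings allowed when that master is used */
};

/* Narrow the set of physical engines a virtual engine may select. */
static uint32_t bonded_siblings(const struct bond_model *bonds, size_t num_bonds,
				uint32_t all_siblings, int master)
{
	assert(bonds != NULL || num_bonds == 0);
	for (size_t i = 0; i < num_bonds; i++) {
		if (bonds[i].master == master)
			return all_siblings & bonds[i].sibling_mask;
	}
	return all_siblings; /* assumed: no bond recorded means no constraint */
}
```

      For example, with bonds `{0 -> 0x2, 1 -> 0x4}` and siblings `0x6`, a fence from master 0 restricts the companion batch to the engine in bit 1.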
      For the moment, we continue to ignore the issue of preemption deferring
      the master request for later. Ideally, we would like to then also remove
      the slave and run something else rather than have it stall the pipeline.
      With load balancing, we should be able to move workload around it, but
      there is a similar stall on the master pipeline while it may wait for
      the slave to be executed. At the cost of more latency for the bonded
      request, it may be interesting to launch both on their engines in
      lockstep. (Bubbles abound.)
      Opens: Also what about bonding an engine as its own master? It doesn't
      break anything internally, so allow the silliness.
      v2: Emancipate the bonds
      v3: Couple in delayed scheduling for the selftests
      v4: Handle invalid mutually exclusive bonding
      v5: Mention what the uapi does
      v6: s/nbond/num_bonds/
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk
    • drm/i915: Load balancing across a virtual engine · 6d06779e
      Chris Wilson authored
      Having allowed the user to define a set of engines that they will want
      to only use, we go one step further and allow them to bind those engines
      into a single virtual instance. Submitting a batch to the virtual engine
      will then forward it to any one of the set in a manner as best to
      distribute load.  The virtual engine has a single timeline across all
      engines (it operates as a single queue), so it is not able to concurrently
      run batches across multiple engines by itself; that is left up to the user
      to submit multiple concurrent batches to multiple queues. Multiple users
      will be load balanced across the system.
      The mechanism used for load balancing in this patch is a late greedy
      balancer. When a request is ready for execution, it is added to each
      engine's queue, and when an engine is ready for its next request it
      claims it from the virtual engine. The first engine to do so, wins, i.e.
      the request is executed at the earliest opportunity (idle moment) in the
      As not all HW is created equal, the user is still able to skip the
      virtual engine and execute the batch on a specific engine, all within the
      same queue. It will then be executed in order on the correct engine,
      with execution on other virtual engines being moved away due to the load
      A couple of areas for potential improvement left!
      - The virtual engine always takes priority over equal-priority tasks.
      Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
      and hopefully the virtual and real engines are not then congested (i.e.
      all work is via virtual engines, or all work is to the real engine).
      - We require the breadcrumb irq around every virtual engine request. For
      normal engines, we eliminate the need for the slow round trip via
      interrupt by using the submit fence and queueing in order. For virtual
      engines, we have to allow any job to transfer to a new ring, and cannot
      coalesce the submissions, so require the completion fence instead,
      forcing the persistent use of interrupts.
      - We only drip feed single requests through each virtual engine and onto
      the physical engines, even if there was enough work to fill all ELSP,
      leaving small stalls with an idle CS event at the end of every request.
      Could we be greedy and fill both slots? Being lazy is virtuous for load
      distribution on less-than-full workloads though.
      Other areas of improvement are more general, such as reducing lock
      contention, reducing dispatch overhead, looking at direct submission
      rather than bouncing around tasklets etc.
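      The late greedy balancer described in this commit message boils down to "first idle engine wins". A minimal sketch, assuming a hypothetical `virtual_request_model` that every sibling engine can see in its queue:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct virtual_request_model {
	int id;
	int claimed_by; /* -1 until a physical engine takes the request */
};

/* Called by a physical engine when it is ready for its next request. */
static bool engine_claim(struct virtual_request_model *rq, int engine_id)
{
	assert(rq != NULL);
	if (rq->claimed_by != -1)
		return false; /* another engine got there first */
	rq->claimed_by = engine_id;
	return true;
}
```

      The decision is deferred until an engine is actually idle, so no load estimate is needed up front. A real implementation would also need locking around the claim, which this sketch omits.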
      sseu: Lift the restriction to allow sseu to be reconfigured on virtual
      engines composed of RENDER_CLASS (rcs).
      v2: macroize check_user_mbz()
      v3: Cancel virtual engines on wedging
      v4: Commence commenting
      v5: Replace 64b sibling_mask with a list of class:instance
      v6: Drop the one-element array in the uabi
      v7: Assert it is a virtual engine in to_virtual_engine()
      v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)
      Link: https://github.com/intel/media-driver/pull/283
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
  27. 08 May, 2019 1 commit
  28. 03 May, 2019 1 commit
  29. 01 May, 2019 1 commit
  30. 24 Apr, 2019 1 commit
    • drm/i915: Invert the GEM wakeref hierarchy · 79ffac85
      Chris Wilson authored
      In the current scheme, on submitting a request we take a single global
      GEM wakeref, which trickles down to wake up all GT power domains. This
      is undesirable as we would like to be able to localise our power
      management to the available power domains and to remove the global GEM
      operations from the heart of the driver. (The intent there is to push
      global GEM decisions to the boundary as used by the GEM user interface.)
      Now during request construction, each request is responsible via its
      logical context to acquire a wakeref on each power domain it intends to
      utilize. Currently, each request takes a wakeref on the engine(s) and
      the engines themselves take a chipset wakeref. This gives us a
      transition on each engine which we can extend if we want to insert more
      power management control (such as soft rc6). The global GEM operations
      that currently require a struct_mutex are reduced to listening to pm
      events from the chipset GT wakeref. As we reduce the struct_mutex
      requirement, these listeners should evaporate.
      Perhaps the biggest immediate change is that this removes the
      struct_mutex requirement around GT power management, allowing us greater
      flexibility in request construction. Another important knock-on effect
      is that by tracking engine usage, we can insert a switch back to the
      kernel context on that engine immediately, avoiding any extra delay or
      inserting global synchronisation barriers. This makes tracking when an
      engine and its associated contexts are idle much easier -- important for
      when we forgo our assumed execution ordering and need idle barriers to
      unpin used contexts. In the process, it means we remove a large chunk of
      code whose only purpose was to switch back to the kernel context.
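      The inverted hierarchy amounts to nested reference counts: requests hold an engine awake, and awake engines hold the chipset awake. The sketch below uses hypothetical `gt_model` and `engine_pm_model` types and omits all locking.

```c
#include <assert.h>
#include <stddef.h>

struct gt_model {
	int wakeref; /* number of engines currently awake */
};

struct engine_pm_model {
	struct gt_model *gt;
	int wakeref; /* number of requests holding this engine awake */
};

/* Taken during request construction, once per engine the request uses. */
static void engine_pm_get(struct engine_pm_model *e)
{
	assert(e != NULL && e->gt != NULL);
	if (e->wakeref++ == 0)
		e->gt->wakeref++; /* first user of the engine wakes the chipset */
}

/* Dropped when the request is retired. */
static void engine_pm_put(struct engine_pm_model *e)
{
	assert(e->wakeref > 0);
	if (--e->wakeref == 0)
		e->gt->wakeref--; /* last user lets the chipset sleep */
}
```

      The 0 to 1 and 1 to 0 transitions on each engine are the natural hook points for per-engine actions such as switching back to the kernel context.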
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk