From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Nirmoy Das <nirmoy.das@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [PATCH v2 04/15] drm/i915: Bypass LMEMBAR/GTTMMADR for MTL stolen memory access
Date: Fri, 12 Jan 2024 18:55:01 +0200
Message-ID: <ZaFu5b5LId3bYw1e@intel.com>
In-Reply-To: <d512219a-5c89-46ad-8335-91c43d54c24f@linux.intel.com>
On Fri, Jan 12, 2024 at 05:31:10PM +0100, Nirmoy Das wrote:
>
> On 1/12/2024 4:12 PM, Ville Syrjälä wrote:
> > On Wed, Jan 10, 2024 at 11:49:47AM +0100, Nirmoy Das wrote:
> >> Hi Ville,
> >>
> >> Apologies, but I lost track of this series after I returned from sick leave.
> >>
> >>
> >> On 12/15/2023 11:59 AM, Ville Syrjala wrote:
> >>> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >>>
> >>> On MTL accessing stolen memory via the BARs is somehow borked,
> >>> and it can hang the machine. As a workaround let's bypass the
> >>> BARs and just go straight to DSMBASE/GSMBASE instead.
> >>>
> >>> Note that on every other platform this itself would hang the
> >>> machine, but on MTL the system firmware is expected to relax
> >>> the access permission guarding stolen memory to enable this
> >>> workaround, and thus direct CPU accesses should be fine.
> >>>
> >>> TODO: add w/a numbers and whatnot
> >>>
> >>> Cc: Paz Zcharya <pazz@chromium.org>
> >>> Cc: Nirmoy Das <nirmoy.das@intel.com>
> >>> Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> >>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >>> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >>> ---
> >>> drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 11 ++++++++++-
> >>> drivers/gpu/drm/i915/gt/intel_ggtt.c | 13 ++++++++++++-
> >>> 2 files changed, 22 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> >>> index ee237043c302..252fe5cd6ede 100644
> >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> >>> @@ -941,7 +941,16 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
> >>>  		dsm_size = ALIGN_DOWN(lmem_size - dsm_base, SZ_1M);
> >>>  	}
> >>>
> >>> -	if (pci_resource_len(pdev, GEN12_LMEM_BAR) < lmem_size) {
> >>> +	if (IS_METEORLAKE(i915)) {
> >>> +		/*
> >>> +		 * Workaround: access via BAR can hang MTL, go directly to DSM.
> >>> +		 *
> >>> +		 * Normally this would not work but on MTL the system firmware
> >>> +		 * should have relaxed the access permissions sufficiently.
> >>> +		 */
> >>> +		io_start = intel_uncore_read64(uncore, GEN12_DSMBASE) & GEN12_BDSM_MASK;
> >>> +		io_size = dsm_size;
> >> This will work well on host driver but I am afraid this will not work on
> >> VM when someone tries to do direct device assignment of the igfx.
> >>
> >> GSMBASE/DSMBASE is a reserved region, so it won't show up in a VM, last I checked.
> > Hmm. So BARs get passed over but other regions won't be? I wonder if
> > there's a way to pass them explicitly...
>
> Yes, when a user asks QEMU to pass through a PCI device, QEMU will
> ensure those BARs are mapped.
>
> >
> >> This is an obscure usage, but are we supposed to support it? If so, then
> >> we need to detect that and fall back to the binder approach.
> > I suppose some people may attempt it. But I'm not sure how well that
> > will work in practice even on other platforms. I don't think we've
> > ever really considered that use case any kind of priority so bug
> > reports tend to go unanswered.
> >
> > My main worry with the MI_UPDATE_GTT stuff is:
> > - only used on this one platform so very limited testing coverage
> > - async so more opportunities to screw things up
> > - what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
> > to finish?
> > - requires working command submission, so even getting a working
> > display now depends on a lot more extra components working correctly
> >
> > hence the patch to disable it. During testing my MTL was very unstable
> > so I wanted to eliminate all potential sources of new bugs.
>
> Valid concerns, but unfortunately MI_UPDATE_GTT is the only generic
> solution that came up in the discussions which supports the host, VM
> and SR-IOV cases.
>
> >
> > Hmm. But we can't even use MI_UPDATE_GTT until command submission is
> > up and running, so we still need the direct CPU path for early ggtt
> > setup, no?
>
> It is very unlikely for the bug to appear when there is only a single
> user of the GPU. So the HW team is fine with having a small window
> where we modify the GTT via stolen memory.
>
>
> How about a modparam which defaults to your approach, plus documentation
> saying to use the binder in a VM?
>
> It would be nice if i915 could detect whether it is running in a
> virtualized environment, but I don't have any ideas for that.
We have i915_run_as_guest() but dunno if that covers everything
we need.
So in order to accommodate both approaches we'd need:
1. select DSM/GSMBASE vs. BAR based on host vs. guest
2. perhaps disable binder on host for now to keep things
   more uniform between the platforms by default
3. maybe extend binder to more platforms and enable it
   across the board (in case we decide it has other real
   benefits besides not hanging mtl).
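A minimal user-space sketch of how items 1 and 2 could fit together, for illustration only. Everything in it is hypothetical: `struct stolen_probe`, `struct stolen_io`, `pick_stolen_io()` and `BDSM_MASK` are made-up names, the fields stand in for values the driver would read at probe time (e.g. `run_as_guest` for whatever i915_run_as_guest() reports), and `BDSM_MASK` assumes the DSM base sits in bits 63:20 of DSMBASE (assumed to match GEN12_BDSM_MASK). The BAR branch is a simplified model of the existing "BAR start + dsm_base" path, not the actual i915 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M (1ULL << 20)
/* Hypothetical: clears bits 19:0, keeping bits 63:20 of DSMBASE. */
#define BDSM_MASK (~(SZ_1M - 1))

/* Hypothetical probe-time inputs (not real i915 structures). */
struct stolen_probe {
	bool is_mtl;          /* platform where BAR access can hang */
	bool run_as_guest;    /* e.g. the result of i915_run_as_guest() */
	uint64_t dsmbase_reg; /* raw DSMBASE register value */
	uint64_t dsm_size;    /* stolen memory size, 1 MiB aligned */
	uint64_t bar_start;   /* BAR start as seen by the (guest) CPU */
	uint64_t bar_len;     /* BAR length */
	uint64_t lmem_size;   /* size the BAR must cover */
	uint64_t dsm_base;    /* offset of stolen memory within the BAR */
};

enum stolen_path { STOLEN_NONE, STOLEN_BAR, STOLEN_DIRECT };

struct stolen_io {
	enum stolen_path path;
	uint64_t io_start;
	uint64_t io_size;
	bool use_binder;      /* update the GGTT via MI_UPDATE_GTT */
};

static struct stolen_io pick_stolen_io(const struct stolen_probe *p)
{
	struct stolen_io io = { STOLEN_NONE, 0, 0, false };

	assert(p);

	/* Item 2: binder off on the host, on only for an MTL guest. */
	io.use_binder = p->is_mtl && p->run_as_guest;

	if (p->is_mtl && !p->run_as_guest) {
		/* Item 1, host: bypass the BAR and use DSMBASE directly. */
		io.path = STOLEN_DIRECT;
		io.io_start = p->dsmbase_reg & BDSM_MASK;
		io.io_size = p->dsm_size;
	} else if (p->bar_len >= p->lmem_size) {
		/* Guest or other platforms: go through the BAR QEMU maps. */
		io.path = STOLEN_BAR;
		io.io_start = p->bar_start + p->dsm_base;
		io.io_size = p->dsm_size;
	}
	/* Otherwise the BAR is too small: leave CPU access disabled. */

	return io;
}
```

In this sketch an MTL host never touches the BAR for stolen memory and keeps the binder off, while a guest keeps using the BAR that gets passed through and relies on the binder for GGTT updates.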
--
Ville Syrjälä
Intel