From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Matthew Auld <matthew.auld@intel.com>
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH v12 00/13] xe_device_mem_access fixes and related bits
Date: Fri, 30 Jun 2023 09:59:26 -0700
Message-ID: <87cz1c6g41.wl-ashutosh.dixit@intel.com>
In-Reply-To: <c722bfa2-ffbb-394a-36e7-896d0447d5af@intel.com>

On Fri, 30 Jun 2023 04:07:44 -0700, Matthew Auld wrote:
>

Hi Matt,

> On 30/06/2023 07:21, Dixit, Ashutosh wrote:
> > On Mon, 26 Jun 2023 03:50:38 -0700, Matthew Auld wrote:
> >>
> >> Main goal is to fix the races in xe_device_mem_access_get(). With that fixed we
> >> can clean up some hacks and also start rolling it out to more places that need
> >> it, including now asserting it around every mmio access. We also add lockdep
> >> annotations for xe_device_mem_access_get() and fix the remaining lockdep
> >> fallout.
> >>
> >> v11 -> v12:
> >>    - freq_rpe_show also needs the device to be awake
> >>    - Improvements to the lockdep annotation patch
> >
> > Just FYI, fwiw this is from my local branch, but even with this series I am
> > seeing this on DG2. Probe is fine but as soon as my IGT (perf) runs the
> > trace below spews out. The unit test runs fine on RPLP. If there's a
> > temporary workaround for this I'd like to know. Thanks.
> >
> > -Ashutosh
> >
> > [  486.110571] xe: loading out-of-tree module taints kernel.
> > [  486.131776] xe 0000:03:00.0: vgaarb: deactivate vga console
> > [  486.133136] GT topology dss mask (geometry): 00000000,0000ff00
> > [  486.133139] GT topology dss mask (compute):  00000000,0000ff00
> > [  486.133141] GT topology EU mask per DSS:     0000ffff
> > [  486.133636] xe 0000:03:00.0: [drm] VISIBLE VRAM: 0x0000004000000000, 0x0000000200000000
> > [  486.133686] xe 0000:03:00.0: [drm] VRAM[0, 0]: 0x0000004000000000, 0x000000017c800000
> > [  486.133688] xe 0000:03:00.0: [drm] Total VRAM: 0x0000004000000000, 0x0000000180000000
> > [  486.133690] xe 0000:03:00.0: [drm] Available VRAM: 0x0000004000000000, 0x000000017c800000
> > [  486.195204] xe 0000:03:00.0: [drm] Using GuC firmware (70.5) from i915/dg2_guc_70.bin
> > [  486.198192] xe 0000:03:00.0: [drm] HuC disabled
> > [  486.241959] xe 0000:03:00.0: [drm] ccs0 fused off
> > [  486.241964] xe 0000:03:00.0: [drm] ccs2 fused off
> > [  486.241965] xe 0000:03:00.0: [drm] ccs3 fused off
> > [  486.242509] xe REG[0x223a8-0x223af]: allow read access
> > [  486.242606] xe REG[0x1c03a8-0x1c03af]: allow read access
> > [  486.242708] xe REG[0x1d03a8-0x1d03af]: allow read access
> > [  486.242826] xe REG[0x1c83a8-0x1c83af]: allow read access
> > [  486.242945] xe REG[0x1d83a8-0x1d83af]: allow read access
> > [  486.243033] xe REG[0x1c3a8-0x1c3af]: allow read access
> > [  486.306291] [drm] Initialized xe 1.1.0 20201103 for 0000:03:00.0 on minor 0
> > [  486.309344] insmod (3290) used greatest stack depth: 10936 bytes left
> > [  487.559809] xe 0000:03:00.0: [drm] GT0: suspended
>
> Device hits runtime suspend after probing the device. Looks normal so far...
>
> > [  500.096224] [IGT] perf: executing
> > [  502.830435] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> > [  502.832901] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
> > [  502.835939] pcieport 0000:02:01.0: Unable to change power state from D3cold to D0, device inaccessible
> > [  502.836769] pcieport 0000:02:04.0: Unable to change power state from D3cold to D0, device inaccessible
> > [  505.070434] xe 0000:03:00.0: not ready 1023ms after resume; giving up
> > [  505.071074] xe 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
>
> And here we are trying to resume the device. This is still deep in PCI
> stuff, and it's already looking bad since the device is unable to exit from
> D3cold. Also we shouldn't even be in D3cold here (it implies PCI device is
> powered off), but only D3hot. For reference on my DG2 it only goes from D0
> -> D3hot on runtime suspend and D3hot -> D0 on runtime resume. D3cold is
> explicitly disabled for now with rpm as per xe->d3cold_allowed. But even
> so, it's unclear why it can't restore power and get back to D0 (device is
> maybe unresponsive/dead?). Although even if it did get as far as the driver
> part of the resume it would still be all kinds of broken since VRAM has
> been nuked.
>
> I would assume there is something broken/faulty with that system.

Yes, now that I think about it, I believe the IGT did work on a different
system.
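
For anyone comparing systems, a minimal sketch of how the failure signature in the trace above could be pulled out of a saved dmesg log. The helper name find_d3cold_failures and the regex are illustrative assumptions, not part of any existing tool; only the message text "Unable to change power state from D3cold to D0" is taken from the log quoted above.

```python
import re

# Message text copied from the trace above; the surrounding pattern
# (timestamp, driver name, PCI address) is an assumption about dmesg layout.
D3COLD_FAIL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<drv>\S+)\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"Unable to change power state from D3cold to D0"
)


def find_d3cold_failures(lines):
    """Return (timestamp, driver, PCI address) for each failed D3cold -> D0 resume.

    Hypothetical helper: uses search() rather than match() so that lines
    carrying an email quote prefix ("> > ") are still recognised.
    """
    hits = []
    for line in lines:
        m = D3COLD_FAIL.search(line)
        if m:
            hits.append((float(m.group("ts")), m.group("drv"), m.group("bdf")))
    return hits
```

Feeding it the output of dmesg split into lines should list every device that failed to leave D3cold. On a system behaving like Matt's DG2, which only goes D0 -> D3hot -> D0, the list would be expected to stay empty.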

> You could maybe try disabling rpm, and avoiding forced suspend/resume on
> that system:
>
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -124,7 +124,6 @@ void xe_pm_runtime_init(struct xe_device *xe)
>         pm_runtime_use_autosuspend(dev);
>         pm_runtime_set_autosuspend_delay(dev, 1000);
>         pm_runtime_set_active(dev);
> -       pm_runtime_allow(dev);
>         pm_runtime_mark_last_busy(dev);
>         pm_runtime_put_autosuspend(dev);

Thanks, yes this did unblock me on this system.

Ashutosh
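
A small sketch for checking whether runtime PM is actually held off for the device after applying the change above. It reads the standard power/control and power/runtime_status attributes under /sys/bus/pci/devices; the helper name runtime_pm_state and the sysfs_root parameter are assumptions added for illustration. With pm_runtime_allow() dropped, power/control would be expected to read "on" (the PCI core default) rather than "auto", though this has not been verified on the system discussed here.

```python
from pathlib import Path


def runtime_pm_state(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read the runtime PM sysfs attributes for one PCI device.

    Hypothetical helper. `bdf` is the PCI address, e.g. "0000:03:00.0".
    `sysfs_root` is only a parameter so the function can be pointed at a
    test directory; on a real system the default applies.
    Returns a dict; an attribute that cannot be read maps to None.
    """
    power = Path(sysfs_root) / bdf / "power"
    state = {}
    for attr in ("control", "runtime_status"):
        try:
            state[attr] = (power / attr).read_text().strip()
        except OSError:
            # Missing device or attribute, or insufficient permissions.
            state[attr] = None
    return state
```

Calling runtime_pm_state("0000:03:00.0") on the DG2 from the log above would report both values for the xe device. A control value of "auto" together with a runtime_status of "suspended" would indicate that the device is still being runtime suspended.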
