From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Matthew Auld <matthew.auld@intel.com>
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH v12 00/13] xe_device_mem_access fixes and related bits
Date: Fri, 30 Jun 2023 09:59:26 -0700
Message-ID: <87cz1c6g41.wl-ashutosh.dixit@intel.com>
In-Reply-To: <c722bfa2-ffbb-394a-36e7-896d0447d5af@intel.com>
On Fri, 30 Jun 2023 04:07:44 -0700, Matthew Auld wrote:
>
Hi Matt,
> On 30/06/2023 07:21, Dixit, Ashutosh wrote:
> > On Mon, 26 Jun 2023 03:50:38 -0700, Matthew Auld wrote:
> >>
> >> Main goal is to fix the races in xe_device_mem_access_get(). With that fixed we
> >> can clean up some hacks and also start rolling it out to more places that need
> >> it, including now asserting it around every mmio access. We also add lockdep
> >> annotations for xe_device_mem_access_get() and fix the remaining lockdep
> >> fallout.
> >>
> >> v11 -> v12:
> >> - freq_rpe_show also needs the device to be awake
> >> - Improvements to the lockdep annotation patch
> >
> > Just FYI, fwiw this is from my local branch, but even with this series I am
> > seeing this on DG2. Probe is fine, but as soon as my IGT (perf) runs, the
> > trace below spews out. The unit test runs fine on RPLP. If there's a
> > temporary workaround for this I'd like to know. Thanks.
> >
> > -Ashutosh
> >
> > [ 486.110571] xe: loading out-of-tree module taints kernel.
> > [ 486.131776] xe 0000:03:00.0: vgaarb: deactivate vga console
> > [ 486.133136] GT topology dss mask (geometry): 00000000,0000ff00
> > [ 486.133139] GT topology dss mask (compute): 00000000,0000ff00
> > [ 486.133141] GT topology EU mask per DSS: 0000ffff
> > [ 486.133636] xe 0000:03:00.0: [drm] VISIBLE VRAM: 0x0000004000000000, 0x0000000200000000
> > [ 486.133686] xe 0000:03:00.0: [drm] VRAM[0, 0]: 0x0000004000000000, 0x000000017c800000
> > [ 486.133688] xe 0000:03:00.0: [drm] Total VRAM: 0x0000004000000000, 0x0000000180000000
> > [ 486.133690] xe 0000:03:00.0: [drm] Available VRAM: 0x0000004000000000, 0x000000017c800000
> > [ 486.195204] xe 0000:03:00.0: [drm] Using GuC firmware (70.5) from i915/dg2_guc_70.bin
> > [ 486.198192] xe 0000:03:00.0: [drm] HuC disabled
> > [ 486.241959] xe 0000:03:00.0: [drm] ccs0 fused off
> > [ 486.241964] xe 0000:03:00.0: [drm] ccs2 fused off
> > [ 486.241965] xe 0000:03:00.0: [drm] ccs3 fused off
> > [ 486.242509] xe REG[0x223a8-0x223af]: allow read access
> > [ 486.242606] xe REG[0x1c03a8-0x1c03af]: allow read access
> > [ 486.242708] xe REG[0x1d03a8-0x1d03af]: allow read access
> > [ 486.242826] xe REG[0x1c83a8-0x1c83af]: allow read access
> > [ 486.242945] xe REG[0x1d83a8-0x1d83af]: allow read access
> > [ 486.243033] xe REG[0x1c3a8-0x1c3af]: allow read access
> > [ 486.306291] [drm] Initialized xe 1.1.0 20201103 for 0000:03:00.0 on minor 0
> > [ 486.309344] insmod (3290) used greatest stack depth: 10936 bytes left
> > [ 487.559809] xe 0000:03:00.0: [drm] GT0: suspended
>
> Device hits runtime suspend after probing the device. Looks normal so far...
>
> > [ 500.096224] [IGT] perf: executing
> > [ 502.830435] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> > [ 502.832901] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
> > [ 502.835939] pcieport 0000:02:01.0: Unable to change power state from D3cold to D0, device inaccessible
> > [ 502.836769] pcieport 0000:02:04.0: Unable to change power state from D3cold to D0, device inaccessible
> > [ 505.070434] xe 0000:03:00.0: not ready 1023ms after resume; giving up
> > [ 505.071074] xe 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
>
> And here we are trying to resume the device. This is still deep in PCI
> stuff, and it's already looking bad since the device is unable to exit from
> D3cold. Also we shouldn't even be in D3cold here (it implies the PCI device is
> powered off), but only in D3hot. For reference, on my DG2 it only goes from D0
> -> D3hot on runtime suspend and D3hot -> D0 on runtime resume. D3cold is
> explicitly disabled for now with rpm as per xe->d3cold_allowed. But even
> so, it's unclear why it can't restore power and get back to D0 (device is
> maybe unresponsive/dead?). Although even if it did get as far as the driver
> part of the resume it would still be all kinds of broken since VRAM has
> been nuked.
>
> I would assume there is something broken/faulty with that system.
Yes, now that I think about it, I believe the IGT did work on a different
system.
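To confirm which D-state the GPU and its upstream bridges actually reach on a given system, the PCI sysfs attributes can be read directly. A minimal sketch, assuming a kernel that exposes `power_state` (and `d3cold_allowed`) under /sys/bus/pci/devices/<BDF>/; older kernels may lack `power_state`. The helper names parse_power_state() and read_pci_attr() are hypothetical, and the BDFs are the ones from the log above.

```c
#include <stdio.h>
#include <string.h>

/* Coarse classification of the PCI "power_state" sysfs value. */
enum pci_pstate {
	PSTATE_UNKNOWN,
	PSTATE_D0,
	PSTATE_D3HOT,
	PSTATE_D3COLD,
	PSTATE_OTHER,	/* D1, D2, "error", ... */
};

/* Map e.g. "D3hot\n" to PSTATE_D3HOT; a trailing newline is ignored. */
static enum pci_pstate parse_power_state(const char *s)
{
	size_t n = strcspn(s, "\n");

	if (n == 2 && strncmp(s, "D0", 2) == 0)
		return PSTATE_D0;
	if (n == 5 && strncmp(s, "D3hot", 5) == 0)
		return PSTATE_D3HOT;
	if (n == 6 && strncmp(s, "D3cold", 6) == 0)
		return PSTATE_D3COLD;
	if (n == 7 && strncmp(s, "unknown", 7) == 0)
		return PSTATE_UNKNOWN;
	return PSTATE_OTHER;
}

/* Read the first line of /sys/bus/pci/devices/<bdf>/<attr>; 0 on success. */
static int read_pci_attr(const char *bdf, const char *attr, char *out, size_t len)
{
	char path[256];
	FILE *f;
	int n = snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/%s",
			 bdf, attr);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(out, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}
```

For example, reading "power_state" for 0000:03:00.0 while the GT is runtime suspended should, per the description above, give D3hot rather than D3cold; running the same check on the bridges (0000:01:00.0, 0000:02:01.0, ...) shows whether the whole hierarchy was allowed to drop into D3cold.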
> You could maybe try disabling rpm, and avoiding forced suspend/resume on
> that system:
>
> --- a/drivers/gpu/drm/xe/xe_pm.c
> +++ b/drivers/gpu/drm/xe/xe_pm.c
> @@ -124,7 +124,6 @@ void xe_pm_runtime_init(struct xe_device *xe)
> pm_runtime_use_autosuspend(dev);
> pm_runtime_set_autosuspend_delay(dev, 1000);
> pm_runtime_set_active(dev);
> - pm_runtime_allow(dev);
> pm_runtime_mark_last_busy(dev);
> pm_runtime_put_autosuspend(dev);
Thanks, yes this did unblock me on this system.
Ashutosh
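The diff above stops the driver from calling pm_runtime_allow(), so the PCI core's default of runtime PM being forbidden stays in effect. As far as I know, the same runtime-PM part of the workaround can be applied without rebuilding by writing "on" to the device's power/control attribute (see the sysfs power ABI documentation); writing "auto" re-enables runtime PM. This does not cover any forced system suspend/resume the test itself performs. A minimal sketch, assuming root access; rpm_control_path() and set_rpm_control() are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Build /sys/bus/pci/devices/<bdf>/power/control; 0 on success. */
static int rpm_control_path(char *buf, size_t len, const char *bdf)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/power/control", bdf);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "on" (runtime PM forbidden, device kept active) or "auto"
 * (runtime PM allowed). Needs root; returns 0 on success.
 */
static int set_rpm_control(const char *bdf, const char *mode)
{
	char path[256];
	FILE *f;

	if (strcmp(mode, "on") != 0 && strcmp(mode, "auto") != 0)
		return -1;
	if (rpm_control_path(path, sizeof(path), bdf))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(mode, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* sysfs write errors may only surface when the buffer is flushed. */
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, set_rpm_control("0000:03:00.0", "on") before starting the IGT run. Unlike the driver diff, this is per boot and easy to undo; reloading the unmodified driver will likely re-enable runtime PM, since it calls pm_runtime_allow() in xe_pm_runtime_init().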
Thread overview: 36+ messages
2023-06-26 10:50 [Intel-xe] [PATCH v12 00/13] xe_device_mem_access fixes and related bits Matthew Auld
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 01/13] drm/xe: fix xe_device_mem_access_get() races Matthew Auld
2023-06-30 15:22 ` Gupta, Anshuman
2023-07-04 11:25 ` Matthew Auld
2023-07-04 15:29 ` Gupta, Anshuman
2023-07-04 16:00 ` Matthew Auld
2023-07-11 9:00 ` Gupta, Anshuman
2023-07-11 11:06 ` Matthew Auld
2023-07-11 17:56 ` Gupta, Anshuman
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 02/13] drm/xe/vm: tidy up xe_runtime_pm usage Matthew Auld
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 03/13] drm/xe/debugfs: grab mem_access around forcewake Matthew Auld
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 04/13] drm/xe/guc_pc: add missing mem_access for freq_rpe_show Matthew Auld
2023-06-27 6:53 ` Gupta, Anshuman
2023-06-27 8:20 ` Matthew Auld
2023-06-27 10:14 ` Gupta, Anshuman
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 05/13] drm/xe/mmio: grab mem_access in xe_mmio_ioctl Matthew Auld
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 06/13] drm/xe: ensure correct access_put ordering Matthew Auld
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 07/13] drm/xe/pci: wrap probe with mem_access Matthew Auld
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 08/13] drm/xe/display: use mem_access underneath Matthew Auld
2023-06-28 9:51 ` Gupta, Anshuman
2023-06-29 9:19 ` Matthew Auld
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 09/13] drm/xe/mmio: enforce xe_device_assert_mem_access Matthew Auld
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 10/13] drm/xe: drop xe_device_mem_access_get() from guc_ct_send Matthew Auld
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 11/13] drm/xe/ggtt: prime ggtt->lock against FS_RECLAIM Matthew Auld
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 12/13] drm/xe: drop xe_device_mem_access_get() from invalidation_vma Matthew Auld
2023-06-26 10:50 ` [Intel-xe] [PATCH v12 13/13] drm/xe: add lockdep annotation for xe_device_mem_access_get() Matthew Auld
2023-06-26 12:55 ` [Intel-xe] ✓ CI.Patch_applied: success for xe_device_mem_access fixes and related bits (rev2) Patchwork
2023-06-26 12:56 ` [Intel-xe] ✗ CI.checkpatch: warning " Patchwork
2023-06-26 12:57 ` [Intel-xe] ✓ CI.KUnit: success " Patchwork
2023-06-26 13:01 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-06-26 13:01 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-06-26 13:02 ` [Intel-xe] ✓ CI.checksparse: " Patchwork
2023-06-26 13:46 ` [Intel-xe] ○ CI.BAT: info " Patchwork
2023-06-30 6:21 ` [Intel-xe] [PATCH v12 00/13] xe_device_mem_access fixes and related bits Dixit, Ashutosh
2023-06-30 11:07 ` Matthew Auld
2023-06-30 16:59 ` Dixit, Ashutosh [this message]