From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
maarten.lankhorst@linux.intel.com, matthew.auld@intel.com
Subject: Re: [Intel-gfx] [PATCH v3 2/3] drm/i915: Update error capture code to avoid using the current vma state
Date: Fri, 29 Oct 2021 08:31:26 +0200 [thread overview]
Message-ID: <6371ae79-6dd8-7375-e6ab-16cd98690f2f@linux.intel.com> (raw)
In-Reply-To: <20211028225535.GA32750@jons-linux-dev-box>
On 10/29/21 00:55, Matthew Brost wrote:
> On Thu, Oct 28, 2021 at 02:01:27PM +0200, Thomas Hellström wrote:
>> With asynchronous migrations, the vma state may be several migrations
>> ahead of the state that matches the request we're capturing.
>> Address that by introducing an i915_vma_snapshot structure that
>> can be used to snapshot relevant state at request submission.
>> In order to make sure we access the correct memory, the snapshots take
>> references on relevant sg-tables and memory regions.
>>
>> Also move the capture list allocation out of the fence signaling
>> critical path and use the CONFIG_DRM_I915_CAPTURE_ERROR define to
>> avoid compiling in members and functions used for error capture
>> when they're not used.
>>
>> Finally, correct lockdep annotation would reveal that error capture is
>> typically done in the fence signalling critical path. Alter the
>> error capture memory allocation mode accordingly.
>>
> I've seen this as well:
> https://patchwork.freedesktop.org/patch/451415/?series=93704&rev=5
>
> John Harrison and Daniele feeling was if a NOWAIT memory allocation
> context was used if the system was under any amount of memory pressure
> the error capture is likely to fail due to the size of the objects being
> allocated. Daniel's Vetter has purposed another solution - basically
> allocate a page at the NOWAIT context which is a larger rework.
>
> We have Jira for this. I'll dig this up and send it over off the list if
> you want to join that discussion.
>
> Matt
>
Please do, I basically agree with John and Daniele error capture may
fail under memory pressure, but I couldn't see how we could avoid that
short of exposing us to dma-fence deadlocks.
I figure basically we'd have to pin all vmas, reset, retire the request
and *then* do the allocating parts of the capture.
I'll ping Daniel about the best course of action meanwhile for the above
series.
/Thomas
WARNING: multiple messages have this Message-ID (diff)
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
maarten.lankhorst@linux.intel.com, matthew.auld@intel.com
Subject: Re: [PATCH v3 2/3] drm/i915: Update error capture code to avoid using the current vma state
Date: Fri, 29 Oct 2021 08:31:26 +0200 [thread overview]
Message-ID: <6371ae79-6dd8-7375-e6ab-16cd98690f2f@linux.intel.com> (raw)
In-Reply-To: <20211028225535.GA32750@jons-linux-dev-box>
On 10/29/21 00:55, Matthew Brost wrote:
> On Thu, Oct 28, 2021 at 02:01:27PM +0200, Thomas Hellström wrote:
>> With asynchronous migrations, the vma state may be several migrations
>> ahead of the state that matches the request we're capturing.
>> Address that by introducing an i915_vma_snapshot structure that
>> can be used to snapshot relevant state at request submission.
>> In order to make sure we access the correct memory, the snapshots take
>> references on relevant sg-tables and memory regions.
>>
>> Also move the capture list allocation out of the fence signaling
>> critical path and use the CONFIG_DRM_I915_CAPTURE_ERROR define to
>> avoid compiling in members and functions used for error capture
>> when they're not used.
>>
>> Finally, correct lockdep annotation would reveal that error capture is
>> typically done in the fence signalling critical path. Alter the
>> error capture memory allocation mode accordingly.
>>
> I've seen this as well:
> https://patchwork.freedesktop.org/patch/451415/?series=93704&rev=5
>
> John Harrison and Daniele feeling was if a NOWAIT memory allocation
> context was used if the system was under any amount of memory pressure
> the error capture is likely to fail due to the size of the objects being
> allocated. Daniel's Vetter has purposed another solution - basically
> allocate a page at the NOWAIT context which is a larger rework.
>
> We have Jira for this. I'll dig this up and send it over off the list if
> you want to join that discussion.
>
> Matt
>
Please do, I basically agree with John and Daniele error capture may
fail under memory pressure, but I couldn't see how we could avoid that
short of exposing us to dma-fence deadlocks.
I figure basically we'd have to pin all vmas, reset, retire the request
and *then* do the allocating parts of the capture.
I'll ping Daniel about the best course of action meanwhile for the above
series.
/Thomas
next prev parent reply other threads:[~2021-10-29 6:31 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-28 12:01 [Intel-gfx] [PATCH v3 0/3] Prepare error capture for asynchronous migration Thomas Hellström
2021-10-28 12:01 ` Thomas Hellström
2021-10-28 12:01 ` [Intel-gfx] [PATCH v3 1/3] drm/i915: Introduce refcounted sg-tables Thomas Hellström
2021-10-28 12:01 ` Thomas Hellström
2021-10-28 12:01 ` [Intel-gfx] [PATCH v3 2/3] drm/i915: Update error capture code to avoid using the current vma state Thomas Hellström
2021-10-28 12:01 ` Thomas Hellström
2021-10-28 22:55 ` [Intel-gfx] " Matthew Brost
2021-10-28 22:55 ` Matthew Brost
2021-10-29 6:31 ` Thomas Hellström [this message]
2021-10-29 6:31 ` Thomas Hellström
2021-10-28 12:01 ` [Intel-gfx] [PATCH v3 3/3] drm/i915: Initial introduction of vma resources Thomas Hellström
2021-10-28 12:01 ` Thomas Hellström
2021-10-28 14:49 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Prepare error capture for asynchronous migration (rev3) Patchwork
2021-10-28 15:21 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-10-28 19:55 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2021-10-28 22:23 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Prepare error capture for asynchronous migration (rev4) Patchwork
2021-10-28 22:55 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-10-29 7:26 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6371ae79-6dd8-7375-e6ab-16cd98690f2f@linux.intel.com \
--to=thomas.hellstrom@linux.intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=maarten.lankhorst@linux.intel.com \
--cc=matthew.auld@intel.com \
--cc=matthew.brost@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.