From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: "Yang, Dong" <dong.yang@intel.com>,
"intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH] drm/i915/gt: Do not add same i915_request to intel_context twice
Date: Mon, 13 Dec 2021 09:36:55 +0000
Message-ID: <c15f9066-4119-2f97-ed93-1fc5a8d3d0fe@linux.intel.com>
In-Reply-To: <DM6PR11MB3051330D8484CC5EA0290DB0F2749@DM6PR11MB3051.namprd11.prod.outlook.com>
On 13/12/2021 01:53, Yang, Dong wrote:
> I am working on a customized kernel based on 5.4.39. The issue can only be reproduced when the system is under memory pressure and tries to reclaim memory; the erroneous double insert of the i915_request then comes from the i915_gem_shrink() path.
5.4 is quite old and there have been fixes to this code since. Any chance that you can repro on drm-tip? What project are you working on?
Is your bug perhaps similar to what c744d50363b7 ("drm/i915/gt: Split the breadcrumb spinlock between global and contexts") fixed? As the commit says:
"""
Furthermore, this closes the race between enabling the signaling context
while it is in the process of being signaled and removed:
"""
>
> i915_request_enable_breadcrumb+0x136/0x14a
> dma_fence_enable_sw_signaling+0x47/0xb0
> enable_signaling+0x66/0x80
> i915_active_wait+0xc1/0x150
> __i915_vma_unbind+0x17/0x1a0
> i915_vma_unbind+0x47/0xc0
> i915_gem_object_unbind+0x189/0x290
> i915_gem_shrink+0x139/0x460
> ? __pm_runtime_resume+0x53/0x70
> i915_gem_shrinker_scan+0x9c/0xb0
> do_shrink_slab+0x14f/0x2b0
> shrink_slab+0xa7/0x2a0
> shrink_node+0xd1/0x410
> balance_pgdat+0x2b7/0x500
> kswapd+0x1e2/0x3b0
>
> I believe it's not related to ce->signal_lock; the lock should work normally.
>
> i915_request_enable_breadcrumb() can be invoked from several contexts: from ioctl(), from interrupt context, and from the memory swap thread (kswapd in the trace above). I suggest adding a check before inserting the i915_request into the list. It is hard to guarantee a valid call from every path, but a check-and-protect step avoids the critical effect: adding the same i915_request twice triggers a dead loop in signal_irq_work(), and the loop never breaks. The i915_request hwsp_seqno then gets changed, and an invalid address access error is reported, followed by a system panic.
Maybe, but my point was that a double insert_breadcrumb is already prevented when it is called from i915_request_enable_breadcrumb, by virtue of the spinlock and I915_FENCE_FLAG_SIGNAL. So it may be a race with removal or something similar, but a simple double add due to parallel enablement looks unlikely.
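The guard described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the driver code: struct demo_request, demo_signal_lock and every demo_* function are hypothetical names, a plain bool stands in for the I915_FENCE_FLAG_SIGNAL bit, and the separate I915_FENCE_FLAG_ACTIVE check on the enable path is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct i915_request; not a real i915 type. */
struct demo_request {
	bool signal_flag;	/* plays the role of I915_FENCE_FLAG_SIGNAL */
	int times_inserted;	/* counts real insertions into the list */
};

/* Tiny userspace spinlock standing in for ce->signal_lock. */
static atomic_flag demo_signal_lock = ATOMIC_FLAG_INIT;

static void demo_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_signal_lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void demo_unlock(void)
{
	atomic_flag_clear_explicit(&demo_signal_lock, memory_order_release);
}

/* Caller holds the lock; a second call for the same request is a no-op. */
static void demo_insert(struct demo_request *rq)
{
	if (rq->signal_flag)
		return;
	rq->times_inserted++;	/* stands in for list_add_rcu() */
	rq->signal_flag = true;
}

static void demo_enable(struct demo_request *rq)
{
	demo_lock();
	demo_insert(rq);
	demo_unlock();
}

/* Returns true only if the request was listed and has now been taken off. */
static bool demo_cancel(struct demo_request *rq)
{
	bool was_set;

	demo_lock();
	was_set = rq->signal_flag;
	rq->signal_flag = false;	/* test-and-clear under the lock */
	demo_unlock();
	return was_set;
}

/* Enable twice, cancel, enable again: only two calls really insert. */
static int demo_sequence(void)
{
	struct demo_request rq = { false, 0 };

	demo_enable(&rq);
	demo_enable(&rq);	/* rejected by the flag check */
	assert(rq.signal_flag);	/* flag is set while the request is listed */
	demo_cancel(&rq);
	demo_enable(&rq);	/* allowed again once the flag was cleared */
	return rq.times_inserted;
}
```

Because the flag is only tested, set and cleared while the lock is held, two parallel enables cannot both reach the insertion step. That is why a double add would have to come from somewhere other than plain parallel enablement.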
Regards,
Tvrtko
>
> Thanks,
> Dong
>
> -----Original Message-----
> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Sent: Friday, December 10, 2021 4:51 PM
> To: Yang, Dong <dong.yang@intel.com>; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/gt: Do not add same i915_request to intel_context twice
>
>
> On 10/12/2021 01:31, dong.yang@intel.com wrote:
>> From: "Yang, Dong" <dong.yang@intel.com>
>>
>> With an unknown race condition, the i915_request will be added
>
> What do you mean with unknown here?
>
>> to the intel_context list twice, resulting in a system panic.
>>
>> If the node already exists, do not add it again.
>
> Note that the call chains run under ce->signal_lock and are protected from a double add, AFAICT:
>
> static void insert_breadcrumb(struct i915_request *rq)
> {
> 	...
> 	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
> 		return;
> 	...
> 	set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
>
>
> bool i915_request_enable_breadcrumb(struct i915_request *rq)
> {
> 	...
> 	spin_lock(&ce->signal_lock);
> 	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
> 		insert_breadcrumb(rq);
> 	spin_unlock(&ce->signal_lock);
>
>
> void i915_request_cancel_breadcrumb(struct i915_request *rq)
> {
> 	...
> 	spin_lock(&ce->signal_lock);
> 	if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
> 		spin_unlock(&ce->signal_lock);
> 		return;
> 	}
>
> void intel_context_remove_breadcrumbs(struct intel_context *ce,
> 				      struct intel_breadcrumbs *b)
> {
> 	...
> 	spin_lock_irqsave(&ce->signal_lock, flags);
>
> 	if (list_empty(&ce->signals))
> 		goto unlock;
>
> 	list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
> 		GEM_BUG_ON(!__i915_request_is_complete(rq));
> 		if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL,
> 					&rq->fence.flags))
> 			continue;
>
> The last one in signal_irq_work is guarded by the __i915_request_is_complete check.
>
> So I think more context is needed on how you found this may be an issue.
>
> Regards,
>
> Tvrtko
>
>>
>> Signed-off-by: Yang, Dong <dong.yang@intel.com>
>> ---
>> drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
>> b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
>> index 209cf265bf74..9c7bc060d2ae 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
>> @@ -387,6 +387,9 @@ static void insert_breadcrumb(struct i915_request *rq)
>> }
>> }
>>
>> + if (&rq->signal_link == pos)
>> + return;
>> +
>> i915_request_get(rq);
>> list_add_rcu(&rq->signal_link, pos);
>> GEM_BUG_ON(!check_signal_order(ce, rq));
>>
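For readers unfamiliar with why a double add would cause the dead loop mentioned in this thread, here is a minimal userspace sketch. It assumes the insertion point pos ends up pointing at the request's own signal_link, which is the case the three added lines test for. The list helpers are simplified copies modelled on the kernel's list_add(), and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace model of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void demo_init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert new right after head, with the same pointer updates as list_add(). */
static void demo_list_add(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head, *next = head->next;

	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/*
 * Walk forward from head and count the entries. Returns -1 if head is not
 * reached again within limit steps, i.e. the walk would never terminate.
 */
static int demo_list_len(const struct list_head *head, int limit)
{
	const struct list_head *pos;
	int n = 0;

	assert(limit > 0);
	for (pos = head->next; pos != head; pos = pos->next)
		if (++n > limit)
			return -1;
	return n;
}

/* Add one node, then try to add it again using itself as insertion point. */
static int demo_double_add(bool with_check)
{
	struct list_head signals, node;
	struct list_head *pos;

	demo_init_list_head(&signals);
	demo_list_add(&node, &signals);	/* first, legitimate insert */

	pos = signals.prev;		/* backwards search lands on the node itself */
	if (!(with_check && pos == &node))
		demo_list_add(&node, pos);	/* node->next now points at node */

	return demo_list_len(&signals, 1000);
}
```

In this model demo_double_add(false) returns -1 because the node's next pointer refers to the node itself, so a forward walk never gets back to the list head. demo_double_add(true) returns 1. The sketch only shows the symptom the check would mask; it says nothing about how the request came to be added a second time in the first place.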
Thread overview: 9+ messages
2021-12-10 1:31 [Intel-gfx] [PATCH] drm/i915/gt: Do not add same i915_request to intel_context twice dong.yang
2021-12-10 8:51 ` Tvrtko Ursulin
2021-12-13 1:53 ` Yang, Dong
2021-12-13 9:36 ` Tvrtko Ursulin [this message]
2021-12-14 5:58 ` Yang, Dong
2021-12-14 15:40 ` Tvrtko Ursulin
2021-12-10 11:00 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2021-12-10 11:31 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-12-11 4:59 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork