* WARNING in amdgpu_sync_keep_later / dma_fence_is_later should be rate limited
From: Rafał Miłecki @ 2023-09-21 19:41 UTC
To: Alex Deucher, Christian König, Pan Xinhui, amd-gfx,
dri-devel, Lang Yu
Hi,
Backporting commit 187916e6ed9d ("drm/amdgpu: install stub fence into
potential unused fence pointers") to stable kernels resulted in lots
of WARNINGs on some devices. In my case I was getting 3 WARNINGs per
second (~150 lines logged every second). The commit ended up being
reverted for stable, but it exposed a potential problem: my messages
log grew to gigabytes in size and ran my /tmp/ out of space.
Could someone take a look at amdgpu_sync_keep_later /
dma_fence_is_later and make sure its logging is rate limited to avoid
such situations in the future, please?
Revert in linux-5.15.x:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=fae2d591f3cb31f722c7f065acf586830eab8c2a
openSUSE bug report:
https://bugzilla.opensuse.org/show_bug.cgi?id=1215523
--
Rafał
* RE: WARNING in amdgpu_sync_keep_later / dma_fence_is_later should be rate limited
From: Deucher, Alexander @ 2023-09-21 19:52 UTC
To: Rafał Miłecki, Koenig, Christian, Pan, Xinhui,
amd-gfx@lists.freedesktop.org, dri-devel, Yu, Lang
> -----Original Message-----
> From: Rafał Miłecki <zajec5@gmail.com>
> Sent: Thursday, September 21, 2023 3:41 PM
> To: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>;
> amd-gfx@lists.freedesktop.org; dri-devel <dri-devel@lists.freedesktop.org>;
> Yu, Lang <Lang.Yu@amd.com>
> Subject: WARNING in amdgpu_sync_keep_later / dma_fence_is_later should
> be rate limited
>
> Hi,
>
> backporting commit 187916e6ed9d ("drm/amdgpu: install stub fence into
> potential unused fence pointers") to stable kernels resulted in lots of
> WARNINGs on some devices. In my case I was getting 3 WARNINGs per
> second (~150 lines logged every second). Commit ended up being reverted for
> stable but it exposed a potential problem. My messages log size was reaching
> gigabytes and was running my /tmp/ out of space.
>
> Could someone take a look at amdgpu_sync_keep_later / dma_fence_is_later
> and make sure its logging is rate limited to avoid such situations in the future,
> please?
>
> Revert in linux-5.15.x:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=fae2d591f3cb31f722c7f065acf586830eab8c2a
>
> openSUSE bug report:
> https://bugzilla.opensuse.org/show_bug.cgi?id=1215523
These patches were never intended for stable. They were picked up by Sasha's stable autoselect tools and automatically applied to stable kernels.
Alex
* Re: WARNING in amdgpu_sync_keep_later / dma_fence_is_later should be rate limited
From: Rafał Miłecki @ 2023-09-21 20:11 UTC
To: Deucher, Alexander, Koenig, Christian, Pan, Xinhui,
amd-gfx@lists.freedesktop.org, dri-devel, Yu, Lang
On 21.09.2023 21:52, Deucher, Alexander wrote:
>> backporting commit 187916e6ed9d ("drm/amdgpu: install stub fence into
>> potential unused fence pointers") to stable kernels resulted in lots of
>> WARNINGs on some devices. In my case I was getting 3 WARNINGs per
>> second (~150 lines logged every second). Commit ended up being reverted for
>> stable but it exposed a potential problem. My messages log size was reaching
>> gigabytes and was running my /tmp/ out of space.
>>
>> Could someone take a look at amdgpu_sync_keep_later / dma_fence_is_later
>> and make sure its logging is rate limited to avoid such situations in the future,
>> please?
>>
>> Revert in linux-5.15.x:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=fae2d591f3cb31f722c7f065acf586830eab8c2a
>>
>> openSUSE bug report:
>> https://bugzilla.opensuse.org/show_bug.cgi?id=1215523
>
> These patches were never intended for stable. They were picked up by Sasha's stable autoselect tools and automatically applied to stable kernels.
Are you saying massive WARNINGs in dma_fence_is_later() can't happen
in any other case? I understand it was an incorrect backport, but I
thought we could learn from it and still add some rate limiting.
* Re: WARNING in amdgpu_sync_keep_later / dma_fence_is_later should be rate limited
From: Alex Deucher @ 2023-09-21 21:30 UTC
To: Rafał Miłecki
Cc: Pan, Xinhui, dri-devel, amd-gfx@lists.freedesktop.org,
Deucher, Alexander, Yu, Lang, Koenig, Christian
On Thu, Sep 21, 2023 at 4:21 PM Rafał Miłecki <zajec5@gmail.com> wrote:
>
> On 21.09.2023 21:52, Deucher, Alexander wrote:
> >> backporting commit 187916e6ed9d ("drm/amdgpu: install stub fence into
> >> potential unused fence pointers") to stable kernels resulted in lots of
> >> WARNINGs on some devices. In my case I was getting 3 WARNINGs per
> >> second (~150 lines logged every second). Commit ended up being reverted for
> >> stable but it exposed a potential problem. My messages log size was reaching
> >> gigabytes and was running my /tmp/ out of space.
> >>
> >> Could someone take a look at amdgpu_sync_keep_later / dma_fence_is_later
> >> and make sure its logging is rate limited to avoid such situations in the future,
> >> please?
> >>
> >> Revert in linux-5.15.x:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=fae2d591f3cb31f722c7f065acf586830eab8c2a
> >>
> >> openSUSE bug report:
> >> https://bugzilla.opensuse.org/show_bug.cgi?id=1215523
> >
> > These patches were never intended for stable. They were picked up by Sasha's stable autoselect tools and automatically applied to stable kernels.
>
> Are you saying massive WARNINGs in dma_fence_is_later() can't happen
> in any other case? I understand it was an incorrect backport action but
> I thought we may learn from it and still add some rate limit.
All of the current places where that function is used check the
contexts before calling it, so it should be safe as-is in the tree.
That said, something like this could potentially happen again. I
don't think using WARN_ON_RATELIMIT() would be a problem.
Alex
* Re: WARNING in amdgpu_sync_keep_later / dma_fence_is_later should be rate limited
From: Christian König @ 2023-09-22 5:35 UTC
To: Alex Deucher, Rafał Miłecki
Cc: Pan, Xinhui, amd-gfx@lists.freedesktop.org, dri-devel,
Deucher, Alexander, Yu, Lang, Koenig, Christian
On 21.09.23 at 23:30, Alex Deucher wrote:
> On Thu, Sep 21, 2023 at 4:21 PM Rafał Miłecki <zajec5@gmail.com> wrote:
>> On 21.09.2023 21:52, Deucher, Alexander wrote:
>>>> backporting commit 187916e6ed9d ("drm/amdgpu: install stub fence into
>>>> potential unused fence pointers") to stable kernels resulted in lots of
>>>> WARNINGs on some devices. In my case I was getting 3 WARNINGs per
>>>> second (~150 lines logged every second). Commit ended up being reverted for
>>>> stable but it exposed a potential problem. My messages log size was reaching
>>>> gigabytes and was running my /tmp/ out of space.
>>>>
>>>> Could someone take a look at amdgpu_sync_keep_later / dma_fence_is_later
>>>> and make sure its logging is rate limited to avoid such situations in the future,
>>>> please?
>>>>
>>>> Revert in linux-5.15.x:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=fae2d591f3cb31f722c7f065acf586830eab8c2a
>>>>
>>>> openSUSE bug report:
>>>> https://bugzilla.opensuse.org/show_bug.cgi?id=1215523
>>> These patches were never intended for stable. They were picked up by Sasha's stable autoselect tools and automatically applied to stable kernels.
>> Are you saying massive WARNINGs in dma_fence_is_later() can't happen
>> in any other case? I understand it was an incorrect backport action but
>> I thought we may learn from it and still add some rate limit.
> All of the current places where that function is used check the
> contexts before calling it so it should be safe as is in the tree.
> That said, something like this could potentially happen again. I
> don't think using WARN_ON_RATELIMIT() would be a problem.
Yeah, but it also shouldn't be necessary.
When this triggers, you have a major driver bug at hand, and spamming
the logs is then the least of your problems.
Christian.
>
> Alex