All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alyssa Ross <hi@alyssa.is>
To: Thomas Zimmermann <tzimmermann@suse.de>
Cc: dri-devel@lists.freedesktop.org, javierm@redhat.com,
	mripard@kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v2] drm/atomic-helpers: Invoke end_fb_access while owning plane state
Date: Wed, 29 Nov 2023 14:49:36 +0100	[thread overview]
Message-ID: <87r0k8vfdb.fsf@alyssa.is> (raw)
In-Reply-To: <1093023d-1e9a-422f-bb5d-e716c0789f70@suse.de>

[-- Attachment #1: Type: text/plain, Size: 3385 bytes --]

Thomas Zimmermann <tzimmermann@suse.de> writes:

> Hi
>
> Am 27.11.23 um 17:25 schrieb Alyssa Ross:
>> Thomas Zimmermann <tzimmermann@suse.de> writes:
>> 
>>> Invoke drm_plane_helper_funcs.end_fb_access before
>>> drm_atomic_helper_commit_hw_done(). The latter function hands over
>>> ownership of the plane state to the following commit, which might
>>> free it. Releasing resources in end_fb_access then operates on undefined
>>> state. This bug has been observed with non-blocking commits when they
>>> are being queued up quickly.
>>>
>>> Here is an example stack trace from the bug report. The plane state has
>>> been free'd already, so the pages for drm_gem_fb_vunmap() are gone.
>>>
>>> Unable to handle kernel paging request at virtual address 0000000100000049
>>> [...]
>>>   drm_gem_fb_vunmap+0x18/0x74
>>>   drm_gem_end_shadow_fb_access+0x1c/0x2c
>>>   drm_atomic_helper_cleanup_planes+0x58/0xd8
>>>   drm_atomic_helper_commit_tail+0x90/0xa0
>>>   commit_tail+0x15c/0x188
>>>   commit_work+0x14/0x20
>>>
>>> For aborted commits, it is still ok to run end_fb_access as part of the
>>> plane's cleanup. Add a test to drm_atomic_helper_cleanup_planes().
>>>
>>> v2:
>>> 	* fix test in drm_atomic_helper_cleanup_planes()
>>>
>>> Reported-by: Alyssa Ross <hi@alyssa.is>
>>> Closes: https://lore.kernel.org/dri-devel/87leazm0ya.fsf@alyssa.is/
>>> Suggested-by: Daniel Vetter <daniel@ffwll.ch>
>>> Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane helpers")
>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>>> Cc: <stable@vger.kernel.org> # v6.2+
>>> ---
>>>   drivers/gpu/drm/drm_atomic_helper.c | 17 +++++++++++++++++
>>>   1 file changed, 17 insertions(+)
>> 
>> Got this basically immediately. :(
>
> I've never seen such problems on other systems. Is there anything 
> different about the Mac systems? How do you trigger these errors?

My understanding is that all sorts of things are different, but I don't
know too much about the details.  There's of course a chance that there
could be some other change in the Asahi Linux kernel that causes this
problem to surface — as I said, I reviewed the diff with mainline and
didn't see anything that looked relevant, but I could well have missed
something.  I don't think I can test mainline directly, as it doesn't
yet support enough of the hardware — for slightly older Apple Silicon
Mac models, I think enough is upstream that this would be possible, but
I don't have access to any.

I started off encountering these errors every few days.  I noticed them
because they would sometimes result in my system either starting to
freeze for 10 seconds at a time, or until I switched VT.  They seem to
correlate with the system being under high CPU load.  I was also able to
substantially increase the frequency with which they occurred by adding
logging to the kernel — even just drm.debug=0x10 makes a big difference,
and when I also added a few dump_backtrace() calls when I was trying to
understand the code and diagnose the problem, I would relatively
consistently encounter an Oops within a few minutes of load.

BTW: v3 is looking good so far.  I've only been testing it since this
morning, though, so I'll keep trying it out for a bit longer before I
declare the problem to have been solved and send a Tested-by.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

WARNING: multiple messages have this Message-ID (diff)
From: Alyssa Ross <hi@alyssa.is>
To: Thomas Zimmermann <tzimmermann@suse.de>
Cc: mripard@kernel.org, stable@vger.kernel.org, javierm@redhat.com,
	dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2] drm/atomic-helpers: Invoke end_fb_access while owning plane state
Date: Wed, 29 Nov 2023 14:49:36 +0100	[thread overview]
Message-ID: <87r0k8vfdb.fsf@alyssa.is> (raw)
In-Reply-To: <1093023d-1e9a-422f-bb5d-e716c0789f70@suse.de>

[-- Attachment #1: Type: text/plain, Size: 3385 bytes --]

Thomas Zimmermann <tzimmermann@suse.de> writes:

> Hi
>
> Am 27.11.23 um 17:25 schrieb Alyssa Ross:
>> Thomas Zimmermann <tzimmermann@suse.de> writes:
>> 
>>> Invoke drm_plane_helper_funcs.end_fb_access before
>>> drm_atomic_helper_commit_hw_done(). The latter function hands over
>>> ownership of the plane state to the following commit, which might
>>> free it. Releasing resources in end_fb_access then operates on undefined
>>> state. This bug has been observed with non-blocking commits when they
>>> are being queued up quickly.
>>>
>>> Here is an example stack trace from the bug report. The plane state has
>>> been free'd already, so the pages for drm_gem_fb_vunmap() are gone.
>>>
>>> Unable to handle kernel paging request at virtual address 0000000100000049
>>> [...]
>>>   drm_gem_fb_vunmap+0x18/0x74
>>>   drm_gem_end_shadow_fb_access+0x1c/0x2c
>>>   drm_atomic_helper_cleanup_planes+0x58/0xd8
>>>   drm_atomic_helper_commit_tail+0x90/0xa0
>>>   commit_tail+0x15c/0x188
>>>   commit_work+0x14/0x20
>>>
>>> For aborted commits, it is still ok to run end_fb_access as part of the
>>> plane's cleanup. Add a test to drm_atomic_helper_cleanup_planes().
>>>
>>> v2:
>>> 	* fix test in drm_atomic_helper_cleanup_planes()
>>>
>>> Reported-by: Alyssa Ross <hi@alyssa.is>
>>> Closes: https://lore.kernel.org/dri-devel/87leazm0ya.fsf@alyssa.is/
>>> Suggested-by: Daniel Vetter <daniel@ffwll.ch>
>>> Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane helpers")
>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>>> Cc: <stable@vger.kernel.org> # v6.2+
>>> ---
>>>   drivers/gpu/drm/drm_atomic_helper.c | 17 +++++++++++++++++
>>>   1 file changed, 17 insertions(+)
>> 
>> Got this basically immediately. :(
>
> I've never seen such problems on other systems. Is there anything 
> different about the Mac systems? How do you trigger these errors?

My understanding is that all sorts of things are different, but I don't
know too much about the details.  There's of course a chance that there
could be some other change in the Asahi Linux kernel that causes this
problem to surface — as I said, I reviewed the diff with mainline and
didn't see anything that looked relevant, but I could well have missed
something.  I don't think I can test mainline directly, as it doesn't
yet support enough of the hardware — for slightly older Apple Silicon
Mac models, I think enough is upstream that this would be possible, but
I don't have access to any.

I started off encountering these errors every few days.  I noticed them
because they would sometimes result in my system either starting to
freeze for 10 seconds at a time, or until I switched VT.  They seem to
correlate with the system being under high CPU load.  I was also able to
substantially increase the frequency with which they occurred by adding
logging to the kernel — even just drm.debug=0x10 makes a big difference,
and when I also added a few dump_backtrace() calls when I was trying to
understand the code and diagnose the problem, I would relatively
consistently encounter an Oops within a few minutes of load.

BTW: v3 is looking good so far.  I've only been testing it since this
morning, though, so I'll keep trying it out for a bit longer before I
declare the problem to have been solved and send a Tested-by.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  reply	other threads:[~2023-11-29 13:49 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-27 14:20 [PATCH v2] drm/atomic-helpers: Invoke end_fb_access while owning plane state Thomas Zimmermann
2023-11-27 14:20 ` Thomas Zimmermann
2023-11-27 16:25 ` Alyssa Ross
2023-11-27 16:25   ` Alyssa Ross
2023-11-28  7:56   ` Thomas Zimmermann
2023-11-28  7:56     ` Thomas Zimmermann
2023-11-29 13:49     ` Alyssa Ross [this message]
2023-11-29 13:49       ` Alyssa Ross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r0k8vfdb.fsf@alyssa.is \
    --to=hi@alyssa.is \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=javierm@redhat.com \
    --cc=mripard@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.