From: Thomas Gleixner <tglx@linutronix.de>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Haneen Mohammed <hamohammed.sa@gmail.com>,
Hillf Danton <hdanton@sina.com>,
Sanan Hasanov <Sanan.Hasanov@ucf.edu>,
Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
DRI <dri-devel@lists.freedesktop.org>,
Melissa Wen <melissa.srw@gmail.com>,
Maira Canal <mairacanal@riseup.net>,
syzkaller@googlegroups.com
Subject: Re: drm/vkms: deadlock between dev->event_lock and timer
Date: Wed, 13 Sep 2023 23:08:21 +0200 [thread overview]
Message-ID: <87pm2lzsqi.ffs@tglx> (raw)
In-Reply-To: <CAHk-=wgb9ccWN3Nks5STYUDqQUeHZdCLsK4kA37SdDJuGZfukg@mail.gmail.com>
On Wed, Sep 13 2023 at 09:47, Linus Torvalds wrote:
> On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> Hello. A deadlock was reported in drivers/gpu/drm/vkms/ .
>> It looks like this locking pattern remains as of 6.6-rc1. Please fix.
>>
>> void drm_crtc_vblank_off(struct drm_crtc *crtc) {
>> spin_lock_irq(&dev->event_lock);
>> drm_vblank_disable_and_save(dev, pipe) {
>> __disable_vblank(dev, pipe) {
>> crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank {
>> hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds.
>> ret = hrtimer_try_to_cancel(timer) {
>> base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress.
>
> Heh. Ok. This is clearly a bug, but it does seem to be limited to just
> the vkms driver, and literally only to the "simulate vblank" case.
>
> The worst part about it is that it's so subtle and not obvious.
>
> Some light grepping seems to show that amdgpu has almost the exact
> same pattern in its own vkms thing, except it uses
>
> hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer);
>
> directly, which presumably fixes the livelock, but means that the
> cancel will fail if it's currently running.
>
> So just doing the same thing in the vkms driver probably fixes things.
>
> Maybe the vkms people need to add a flag to say "it's canceled" so
> that it doesn't then get re-enabled? Or maybe it doesn't matter
> and/or already happens for some reason I didn't look into.
Maybe the VKMS people need to understand locking in the first place. The
first thing I saw in this code is:
static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
{
...
mutex_unlock(&output->enabled_lock);
What?
Unlocking a mutex in the context of a hrtimer callback is simply
violating all mutex locking rules.
How has this code ever survived lock debugging without triggering a big
fat warning?
Thanks,
tglx
WARNING: multiple messages have this Message-ID (diff)
From: Thomas Gleixner <tglx@linutronix.de>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>,
Melissa Wen <melissa.srw@gmail.com>,
Maira Canal <mairacanal@riseup.net>,
Haneen Mohammed <hamohammed.sa@gmail.com>,
Daniel Vetter <daniel@ffwll.ch>, David Airlie <airlied@gmail.com>,
DRI <dri-devel@lists.freedesktop.org>,
syzkaller@googlegroups.com, LKML <linux-kernel@vger.kernel.org>,
Hillf Danton <hdanton@sina.com>,
Sanan Hasanov <Sanan.Hasanov@ucf.edu>
Subject: Re: drm/vkms: deadlock between dev->event_lock and timer
Date: Wed, 13 Sep 2023 23:08:21 +0200 [thread overview]
Message-ID: <87pm2lzsqi.ffs@tglx> (raw)
In-Reply-To: <CAHk-=wgb9ccWN3Nks5STYUDqQUeHZdCLsK4kA37SdDJuGZfukg@mail.gmail.com>
On Wed, Sep 13 2023 at 09:47, Linus Torvalds wrote:
> On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> Hello. A deadlock was reported in drivers/gpu/drm/vkms/ .
>> It looks like this locking pattern remains as of 6.6-rc1. Please fix.
>>
>> void drm_crtc_vblank_off(struct drm_crtc *crtc) {
>> spin_lock_irq(&dev->event_lock);
>> drm_vblank_disable_and_save(dev, pipe) {
>> __disable_vblank(dev, pipe) {
>> crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank {
>> hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds.
>> ret = hrtimer_try_to_cancel(timer) {
>> base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress.
>
> Heh. Ok. This is clearly a bug, but it does seem to be limited to just
> the vkms driver, and literally only to the "simulate vblank" case.
>
> The worst part about it is that it's so subtle and not obvious.
>
> Some light grepping seems to show that amdgpu has almost the exact
> same pattern in its own vkms thing, except it uses
>
> hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer);
>
> directly, which presumably fixes the livelock, but means that the
> cancel will fail if it's currently running.
>
> So just doing the same thing in the vkms driver probably fixes things.
>
> Maybe the vkms people need to add a flag to say "it's canceled" so
> that it doesn't then get re-enabled? Or maybe it doesn't matter
> and/or already happens for some reason I didn't look into.
Maybe the VKMS people need to understand locking in the first place. The
first thing I saw in this code is:
static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
{
...
mutex_unlock(&output->enabled_lock);
What?
Unlocking a mutex in the context of a hrtimer callback is simply
violating all mutex locking rules.
How has this code ever survived lock debugging without triggering a big
fat warning?
Thanks,
tglx
next prev parent reply other threads:[~2023-09-13 21:14 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-12 23:02 BUG: soft lockup in smp_call_function Sanan Hasanov
2023-09-13 10:05 ` Peter Zijlstra
2023-09-13 11:07 ` Hillf Danton
2023-09-13 14:21 ` drm/vkms: deadlock between dev->event_lock and timer Tetsuo Handa
2023-09-13 14:21 ` Tetsuo Handa
2023-09-13 16:47 ` Linus Torvalds
2023-09-13 16:47 ` Linus Torvalds
2023-09-13 21:08 ` Thomas Gleixner [this message]
2023-09-13 21:08 ` Thomas Gleixner
2023-09-14 6:33 ` Tetsuo Handa
2023-09-14 6:33 ` Tetsuo Handa
2023-09-14 8:12 ` Daniel Vetter
2023-09-14 8:12 ` Daniel Vetter
2023-09-18 22:02 ` Helen Koike
2023-09-19 6:38 ` Daniel Stone
2023-09-19 6:38 ` Daniel Stone
2023-09-13 14:30 ` BUG: soft lockup in smp_call_function Tetsuo Handa
2023-09-14 12:21 ` Hillf Danton
2023-09-14 13:13 ` Tetsuo Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87pm2lzsqi.ffs@tglx \
--to=tglx@linutronix.de \
--cc=Sanan.Hasanov@ucf.edu \
--cc=dri-devel@lists.freedesktop.org \
--cc=hamohammed.sa@gmail.com \
--cc=hdanton@sina.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mairacanal@riseup.net \
--cc=melissa.srw@gmail.com \
--cc=penguin-kernel@i-love.sakura.ne.jp \
--cc=rodrigosiqueiramelo@gmail.com \
--cc=syzkaller@googlegroups.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.