* Re: [Intel-gfx] [PATCH] console: implement lockdep support for console_lock
@ 2012-09-22 20:06 ` Greg KH
0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2012-09-22 20:06 UTC
To: Daniel Vetter
Cc: LKML, Peter Zijlstra, Intel Graphics Development, DRI Development,
Thomas Gleixner, Alan Cox
On Sat, Sep 22, 2012 at 07:52:11PM +0200, Daniel Vetter wrote:
> Dave Airlie recently discovered a locking bug in the fbcon layer,
> where a del_timer_sync (for the blinking cursor) deadlocks with the
> timer itself, since both (want to) hold the console_lock:
>
> https://lkml.org/lkml/2012/8/21/36
>
> Unfortunately the console_lock isn't a plain mutex and hence has no
> lockdep support. That resulted in a few days wasted tracking down
> this bug (complicated by the fact that printk doesn't show anything
> when the console is locked) instead of noticing the bug much earlier
> with the lockdep splat.
>
> Hence I've figured I need to fix that for the next deadlock involving
> console_lock - and with kms/drm growing ever more complex locking
> that'll eventually happen.
>
> Now the console_lock has rather funky semantics, so after a quick irc
> discussion with Thomas Gleixner and Dave Airlie I've quickly ditched
> the original idea of switching to a real mutex (since it won't work)
> and instead opted to annotate the console_lock with lockdep
> information manually.
>
> There are a few special cases:
> - The console_lock state is protected by the console_sem, and usually
> grabbed/dropped at _lock/_unlock time. But the suspend/resume code
> drops the semaphore without dropping the console_lock (see
> suspend_console/resume_console). But since the same thread that did
> the suspend will do the resume, we don't need to fix up anything.
>
> - In the printk code there's a special trylock, only used to kick off
> the logbuffer printk'ing in console_unlock. But all that happens
> while lockdep is disabled (since printk does a few other evil
> tricks). So no issue there, either.
>
> - The console_lock can also be acquired from irq context (but only
> with a trylock). lockdep already handles that.
>
> This all leaves us with annotating the normal console_lock, _unlock
> and _trylock functions.
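To make the "annotate _lock, _unlock and _trylock" part concrete, here is a
minimal single-threaded sketch in plain userspace C. It is not the patch:
model_sem, annotated_holds and the model_* functions are hypothetical
stand-ins for console_sem and the lockdep bookkeeping. The ordering shown
is an assumption of the model: record the acquire only once the semaphore
is really held (so a failed trylock records nothing), and record the
release before the semaphore is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for console_sem: 1 = free, 0 = taken. */
static int model_sem = 1;

/* Hypothetical stand-in for the lockdep bookkeeping: annotated hold count. */
static int annotated_holds;

static bool model_down_trylock(void)
{
	if (model_sem == 0)
		return false;
	model_sem = 0;
	return true;
}

static void model_up(void)
{
	model_sem = 1;
}

/* Takes the semaphore if it is free and annotates only on success. */
static bool model_console_trylock(void)
{
	if (!model_down_trylock())
		return false;	/* failed attempt: nothing held, nothing recorded */
	annotated_holds++;	/* acquire annotation, after the semaphore is ours */
	return true;
}

static void model_console_unlock(void)
{
	assert(annotated_holds > 0);	/* unlock without a matching lock */
	annotated_holds--;	/* release annotation, before dropping the semaphore */
	model_up();
}

/* Returns 1 if the annotation count tracked the semaphore state throughout. */
int run_annotation_scenario(void)
{
	bool ok = true;

	ok &= model_console_trylock();		/* first attempt succeeds */
	ok &= (annotated_holds == 1);
	ok &= !model_console_trylock();		/* second attempt must fail */
	ok &= (annotated_holds == 1);		/* and must not be recorded */
	model_console_unlock();
	ok &= (annotated_holds == 0 && model_sem == 1);

	return ok ? 1 : 0;
}
```

The model leaves out the blocking console_lock path, which cannot be
exercised in a single thread; it would be annotated the same way once the
semaphore has been taken.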
>
> And yes, it works - simply unloading a drm kms driver resulted in
> lockdep complaining about the deadlock in fbcon_deinit:
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.6.0-rc2+ #552 Not tainted
> -------------------------------------------------------
> kms-reload/3577 is trying to acquire lock:
> ((&info->queue)){+.+...}, at: [<ffffffff81058c70>] wait_on_work+0x0/0xa7
>
> but task is already holding lock:
> (console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (console_lock){+.+.+.}:
> [<ffffffff81087440>] lock_acquire+0x95/0x105
> [<ffffffff81040190>] console_lock+0x59/0x5b
> [<ffffffff81209cb6>] fb_flashcursor+0x2e/0x12c
> [<ffffffff81057c3e>] process_one_work+0x1d9/0x3b4
> [<ffffffff810584a2>] worker_thread+0x1a7/0x24b
> [<ffffffff8105ca29>] kthread+0x7f/0x87
> [<ffffffff813b1204>] kernel_thread_helper+0x4/0x10
>
> -> #0 ((&info->queue)){+.+...}:
> [<ffffffff81086cb3>] __lock_acquire+0x999/0xcf6
> [<ffffffff81087440>] lock_acquire+0x95/0x105
> [<ffffffff81058cab>] wait_on_work+0x3b/0xa7
> [<ffffffff81058dd6>] __cancel_work_timer+0xbf/0x102
> [<ffffffff81058e33>] cancel_work_sync+0xb/0xd
> [<ffffffff8120a3b3>] fbcon_deinit+0x11c/0x1dc
> [<ffffffff81264793>] bind_con_driver+0x145/0x263
> [<ffffffff81264a45>] unbind_con_driver+0x14f/0x195
> [<ffffffff8126540c>] store_bind+0x1ad/0x1c1
> [<ffffffff8127cbb7>] dev_attr_store+0x13/0x1f
> [<ffffffff8116d884>] sysfs_write_file+0xe9/0x121
> [<ffffffff811145b2>] vfs_write+0x9b/0xfd
> [<ffffffff811147b7>] sys_write+0x3e/0x6b
> [<ffffffff813b0039>] system_call_fastpath+0x16/0x1b
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(console_lock);
> lock((&info->queue));
> lock(console_lock);
> lock((&info->queue));
>
> *** DEADLOCK ***
>
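For anyone who wants to see why the two chains above add up to a cycle,
here is a small userspace model in plain C. It is only a sketch of the
idea of lock-order tracking, not lockdep itself: the lock classes, the
dep[][] matrix and the model_* / replay_fbcon_splat names are all
hypothetical. It assumes, as the call chains in the splat suggest, that the
work item counts as "held" while fb_flashcursor runs, and that
cancel_work_sync counts as acquiring the work item.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for this model; the names mirror the splat. */
enum lock_class { CONSOLE_LOCK, FB_CURSOR_WORK, NR_CLASSES };

#define MAX_HELD 4

/* dep[a][b] is set once class b has been acquired while class a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* One context (task or worker) and the classes it currently holds. */
struct context {
	enum lock_class held[MAX_HELD];
	int depth;
};

/* Depth-first search: is there a recorded dependency path from -> to? */
static bool reaches(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reaches((enum lock_class)next, to, seen))
			return true;
	return false;
}

/*
 * Record that ctx acquires cls. Returns false if cls already leads back to
 * a class that ctx holds, i.e. the new edge would close a cycle.
 */
static bool model_acquire(struct context *ctx, enum lock_class cls)
{
	bool ok = true;

	assert(ctx->depth < MAX_HELD);
	for (int i = 0; i < ctx->depth; i++) {
		bool seen[NR_CLASSES] = { false };

		if (reaches(cls, ctx->held[i], seen))
			ok = false;
		dep[ctx->held[i]][cls] = true;
	}
	ctx->held[ctx->depth++] = cls;
	return ok;
}

static void model_release(struct context *ctx)
{
	assert(ctx->depth > 0);
	ctx->depth--;
}

/* Replays the two call chains from the splat; returns 1 if a cycle is seen. */
int replay_fbcon_splat(void)
{
	struct context worker = { .depth = 0 };
	struct context reload = { .depth = 0 };
	bool ok = true;

	/* Chain #1: the work item runs, fb_flashcursor takes console_lock. */
	ok &= model_acquire(&worker, FB_CURSOR_WORK);
	ok &= model_acquire(&worker, CONSOLE_LOCK);
	model_release(&worker);
	model_release(&worker);

	/* Chain #0: console_lock is held, cancel_work_sync waits on the work. */
	ok &= model_acquire(&reload, CONSOLE_LOCK);
	ok &= model_acquire(&reload, FB_CURSOR_WORK);

	return ok ? 0 : 1;
}
```

The point of the model is that neither chain has to block for the problem
to be reported: the second ordering is flagged as soon as it is observed,
because the opposite ordering is already on record.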
> v2: Mark the lockdep_map static, noticed by Jani Nikula.
>
> Cc: Dave Airlie <airlied@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> kernel/printk.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
So I'm guessing I should take this through the tty tree, right? Any
objections to that for 3.7?
thanks,
greg k-h
* Re: [Intel-gfx] [PATCH] console: implement lockdep support for console_lock
2012-09-22 20:06 ` [Intel-gfx] " Greg KH
(?)
@ 2012-09-24 11:36 ` Daniel Vetter
-1 siblings, 0 replies; 17+ messages in thread
From: Daniel Vetter @ 2012-09-24 11:36 UTC
To: Greg KH
Cc: Daniel Vetter, LKML, Peter Zijlstra, Intel Graphics Development,
DRI Development, Thomas Gleixner, Alan Cox
On Sat, Sep 22, 2012 at 01:06:29PM -0700, Greg KH wrote:
> On Sat, Sep 22, 2012 at 07:52:11PM +0200, Daniel Vetter wrote:
> > Dave Airlie recently discovered a locking bug in the fbcon layer,
> > where a del_timer_sync (for the blinking cursor) deadlocks with the
> > timer itself, since both (want to) hold the console_lock:
> >
> > https://lkml.org/lkml/2012/8/21/36
> >
> > Unfortunately the console_lock isn't a plain mutex and hence has no
> > lockdep support. That resulted in a few days wasted tracking down
> > this bug (complicated by the fact that printk doesn't show anything
> > when the console is locked) instead of noticing the bug much earlier
> > with the lockdep splat.
> >
> > Hence I've figured I need to fix that for the next deadlock involving
> > console_lock - and with kms/drm growing ever more complex locking
> > that'll eventually happen.
> >
> > Now the console_lock has rather funky semantics, so after a quick irc
> > discussion with Thomas Gleixner and Dave Airlie I've quickly ditched
> > the original idea of switching to a real mutex (since it won't work)
> > and instead opted to annotate the console_lock with lockdep
> > information manually.
> >
> > There are a few special cases:
> > - The console_lock state is protected by the console_sem, and usually
> > grabbed/dropped at _lock/_unlock time. But the suspend/resume code
> > drops the semaphore without dropping the console_lock (see
> > suspend_console/resume_console). But since the same thread that did
> > the suspend will do the resume, we don't need to fix up anything.
> >
> > - In the printk code there's a special trylock, only used to kick off
> > the logbuffer printk'ing in console_unlock. But all that happens
> > while lockdep is disabled (since printk does a few other evil
> > tricks). So no issue there, either.
> >
> > - The console_lock can also be acquired from irq context (but only
> > with a trylock). lockdep already handles that.
> >
> > This all leaves us with annotating the normal console_lock, _unlock
> > and _trylock functions.
> >
> > And yes, it works - simply unloading a drm kms driver resulted in
> > lockdep complaining about the deadlock in fbcon_deinit:
> >
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 3.6.0-rc2+ #552 Not tainted
> > -------------------------------------------------------
> > kms-reload/3577 is trying to acquire lock:
> > ((&info->queue)){+.+...}, at: [<ffffffff81058c70>] wait_on_work+0x0/0xa7
> >
> > but task is already holding lock:
> > (console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #1 (console_lock){+.+.+.}:
> > [<ffffffff81087440>] lock_acquire+0x95/0x105
> > [<ffffffff81040190>] console_lock+0x59/0x5b
> > [<ffffffff81209cb6>] fb_flashcursor+0x2e/0x12c
> > [<ffffffff81057c3e>] process_one_work+0x1d9/0x3b4
> > [<ffffffff810584a2>] worker_thread+0x1a7/0x24b
> > [<ffffffff8105ca29>] kthread+0x7f/0x87
> > [<ffffffff813b1204>] kernel_thread_helper+0x4/0x10
> >
> > -> #0 ((&info->queue)){+.+...}:
> > [<ffffffff81086cb3>] __lock_acquire+0x999/0xcf6
> > [<ffffffff81087440>] lock_acquire+0x95/0x105
> > [<ffffffff81058cab>] wait_on_work+0x3b/0xa7
> > [<ffffffff81058dd6>] __cancel_work_timer+0xbf/0x102
> > [<ffffffff81058e33>] cancel_work_sync+0xb/0xd
> > [<ffffffff8120a3b3>] fbcon_deinit+0x11c/0x1dc
> > [<ffffffff81264793>] bind_con_driver+0x145/0x263
> > [<ffffffff81264a45>] unbind_con_driver+0x14f/0x195
> > [<ffffffff8126540c>] store_bind+0x1ad/0x1c1
> > [<ffffffff8127cbb7>] dev_attr_store+0x13/0x1f
> > [<ffffffff8116d884>] sysfs_write_file+0xe9/0x121
> > [<ffffffff811145b2>] vfs_write+0x9b/0xfd
> > [<ffffffff811147b7>] sys_write+0x3e/0x6b
> > [<ffffffff813b0039>] system_call_fastpath+0x16/0x1b
> >
> > other info that might help us debug this:
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(console_lock);
> > lock((&info->queue));
> > lock(console_lock);
> > lock((&info->queue));
> >
> > *** DEADLOCK ***
> >
> > v2: Mark the lockdep_map static, noticed by Jani Nikula.
> >
> > Cc: Dave Airlie <airlied@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> > kernel/printk.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
>
> So I'm guessing I should take this through the tty tree, right? Any
> objections to that for 3.7?
I didn't know who would be the relevant maintainer, so just spammed a few
people. Would be awesome if you could merge these patches for 3.7, and at
least Alan Cox seems to like them:
http://marc.info/?l=linux-fbdev&m=134564125601147&w=1
Thanks, Daniel
>
> thanks,
>
> greg k-h
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [Intel-gfx] [PATCH] console: implement lockdep support for console_lock
2012-09-22 20:06 ` [Intel-gfx] " Greg KH
(?)
(?)
@ 2012-10-02 12:56 ` Daniel Vetter
2012-10-02 13:28 ` [Intel-gfx] " Greg KH
-1 siblings, 1 reply; 17+ messages in thread
From: Daniel Vetter @ 2012-10-02 12:56 UTC
To: Greg KH
Cc: LKML, Peter Zijlstra, Intel Graphics Development, DRI Development,
Thomas Gleixner, Alan Cox
On Sat, Sep 22, 2012 at 10:06 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Sat, Sep 22, 2012 at 07:52:11PM +0200, Daniel Vetter wrote:
>> Dave Airlie recently discovered a locking bug in the fbcon layer,
>> where a del_timer_sync (for the blinking cursor) deadlocks with the
>> timer itself, since both (want to) hold the console_lock:
>>
>> https://lkml.org/lkml/2012/8/21/36
>>
>> Unfortunately the console_lock isn't a plain mutex and hence has no
>> lockdep support. That resulted in a few days wasted tracking down
>> this bug (complicated by the fact that printk doesn't show anything
>> when the console is locked) instead of noticing the bug much earlier
>> with the lockdep splat.
>>
>> Hence I've figured I need to fix that for the next deadlock involving
>> console_lock - and with kms/drm growing ever more complex locking
>> that'll eventually happen.
>>
>> Now the console_lock has rather funky semantics, so after a quick irc
>> discussion with Thomas Gleixner and Dave Airlie I've quickly ditched
>> the original idea of switching to a real mutex (since it won't work)
>> and instead opted to annotate the console_lock with lockdep
>> information manually.
>>
>> There are a few special cases:
>> - The console_lock state is protected by the console_sem, and usually
>> grabbed/dropped at _lock/_unlock time. But the suspend/resume code
>> drops the semaphore without dropping the console_lock (see
>> suspend_console/resume_console). But since the same thread that did
>> the suspend will do the resume, we don't need to fix up anything.
>>
>> - In the printk code there's a special trylock, only used to kick off
>> the logbuffer printk'ing in console_unlock. But all that happens
>> while lockdep is disabled (since printk does a few other evil
>> tricks). So no issue there, either.
>>
>> - The console_lock can also be acquired from irq context (but only
>> with a trylock). lockdep already handles that.
>>
>> This all leaves us with annotating the normal console_lock, _unlock
>> and _trylock functions.
>>
>> And yes, it works - simply unloading a drm kms driver resulted in
>> lockdep complaining about the deadlock in fbcon_deinit:
>>
>> ======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 3.6.0-rc2+ #552 Not tainted
>> -------------------------------------------------------
>> kms-reload/3577 is trying to acquire lock:
>> ((&info->queue)){+.+...}, at: [<ffffffff81058c70>] wait_on_work+0x0/0xa7
>>
>> but task is already holding lock:
>> (console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263
>>
>> which lock already depends on the new lock.
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #1 (console_lock){+.+.+.}:
>> [<ffffffff81087440>] lock_acquire+0x95/0x105
>> [<ffffffff81040190>] console_lock+0x59/0x5b
>> [<ffffffff81209cb6>] fb_flashcursor+0x2e/0x12c
>> [<ffffffff81057c3e>] process_one_work+0x1d9/0x3b4
>> [<ffffffff810584a2>] worker_thread+0x1a7/0x24b
>> [<ffffffff8105ca29>] kthread+0x7f/0x87
>> [<ffffffff813b1204>] kernel_thread_helper+0x4/0x10
>>
>> -> #0 ((&info->queue)){+.+...}:
>> [<ffffffff81086cb3>] __lock_acquire+0x999/0xcf6
>> [<ffffffff81087440>] lock_acquire+0x95/0x105
>> [<ffffffff81058cab>] wait_on_work+0x3b/0xa7
>> [<ffffffff81058dd6>] __cancel_work_timer+0xbf/0x102
>> [<ffffffff81058e33>] cancel_work_sync+0xb/0xd
>> [<ffffffff8120a3b3>] fbcon_deinit+0x11c/0x1dc
>> [<ffffffff81264793>] bind_con_driver+0x145/0x263
>> [<ffffffff81264a45>] unbind_con_driver+0x14f/0x195
>> [<ffffffff8126540c>] store_bind+0x1ad/0x1c1
>> [<ffffffff8127cbb7>] dev_attr_store+0x13/0x1f
>> [<ffffffff8116d884>] sysfs_write_file+0xe9/0x121
>> [<ffffffff811145b2>] vfs_write+0x9b/0xfd
>> [<ffffffff811147b7>] sys_write+0x3e/0x6b
>> [<ffffffff813b0039>] system_call_fastpath+0x16/0x1b
>>
>> other info that might help us debug this:
>>
>> Possible unsafe locking scenario:
>>
>> CPU0 CPU1
>> ---- ----
>> lock(console_lock);
>> lock((&info->queue));
>> lock(console_lock);
>> lock((&info->queue));
>>
>> *** DEADLOCK ***
>>
>> v2: Mark the lockdep_map static, noticed by Jani Nikula.
>>
>> Cc: Dave Airlie <airlied@gmail.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
>> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>> kernel/printk.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>
> So I'm guessing I should take this through the tty tree, right? Any
> objections to that for 3.7?
I've noticed that the tty tree went in already :( Any chance you could
still slip this in for 3.7? I'd _really_ like to have this stuff in
for debugging console_lock madness in drm drivers - we've already had
our fair share of those ...
Thanks, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [PATCH] console: implement lockdep support for console_lock
2012-10-02 12:56 ` Daniel Vetter
@ 2012-10-02 13:28 ` Greg KH
0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2012-10-02 13:28 UTC
To: Daniel Vetter
Cc: Peter Zijlstra, Intel Graphics Development, LKML, DRI Development,
Thomas Gleixner, Alan Cox
On Tue, Oct 02, 2012 at 02:56:48PM +0200, Daniel Vetter wrote:
> On Sat, Sep 22, 2012 at 10:06 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Sat, Sep 22, 2012 at 07:52:11PM +0200, Daniel Vetter wrote:
> >> Dave Airlie recently discovered a locking bug in the fbcon layer,
> >> where a del_timer_sync (for the blinking cursor) deadlocks with the
> >> timer itself, since both (want to) hold the console_lock:
> >>
> >> https://lkml.org/lkml/2012/8/21/36
> >>
> >> Unfortunately the console_lock isn't a plain mutex and hence has no
> >> lockdep support. That resulted in a few days wasted tracking down
> >> this bug (complicated by the fact that printk doesn't show anything
> >> when the console is locked) instead of noticing the bug much earlier
> >> with the lockdep splat.
> >>
> >> Hence I've figured I need to fix that for the next deadlock involving
> >> console_lock - and with kms/drm growing ever more complex locking
> >> that'll eventually happen.
> >>
> >> Now the console_lock has rather funky semantics, so after a quick irc
> >> discussion with Thomas Gleixner and Dave Airlie I've quickly ditched
> >> the original idea of switching to a real mutex (since it won't work)
> >> and instead opted to annotate the console_lock with lockdep
> >> information manually.
> >>
> >> There are a few special cases:
> >> - The console_lock state is protected by the console_sem, and usually
> >> grabbed/dropped at _lock/_unlock time. But the suspend/resume code
> >> drops the semaphore without dropping the console_lock (see
> >> suspend_console/resume_console). But since the same thread that did
> >> the suspend will do the resume, we don't need to fix up anything.
> >>
> >> - In the printk code there's a special trylock, only used to kick off
> >> the logbuffer printk'ing in console_unlock. But all that happens
> >> while lockdep is disabled (since printk does a few other evil
> >> tricks). So no issue there, either.
> >>
> >> - The console_lock can also be acquired from irq context (but only
> >> with a trylock). lockdep already handles that.
> >>
> >> This all leaves us with annotating the normal console_lock, _unlock
> >> and _trylock functions.
> >>
> >> And yes, it works - simply unloading a drm kms driver resulted in
> >> lockdep complaining about the deadlock in fbcon_deinit:
> >>
> >> ======================================================
> >> [ INFO: possible circular locking dependency detected ]
> >> 3.6.0-rc2+ #552 Not tainted
> >> -------------------------------------------------------
> >> kms-reload/3577 is trying to acquire lock:
> >> ((&info->queue)){+.+...}, at: [<ffffffff81058c70>] wait_on_work+0x0/0xa7
> >>
> >> but task is already holding lock:
> >> (console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263
> >>
> >> which lock already depends on the new lock.
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #1 (console_lock){+.+.+.}:
> >> [<ffffffff81087440>] lock_acquire+0x95/0x105
> >> [<ffffffff81040190>] console_lock+0x59/0x5b
> >> [<ffffffff81209cb6>] fb_flashcursor+0x2e/0x12c
> >> [<ffffffff81057c3e>] process_one_work+0x1d9/0x3b4
> >> [<ffffffff810584a2>] worker_thread+0x1a7/0x24b
> >> [<ffffffff8105ca29>] kthread+0x7f/0x87
> >> [<ffffffff813b1204>] kernel_thread_helper+0x4/0x10
> >>
> >> -> #0 ((&info->queue)){+.+...}:
> >> [<ffffffff81086cb3>] __lock_acquire+0x999/0xcf6
> >> [<ffffffff81087440>] lock_acquire+0x95/0x105
> >> [<ffffffff81058cab>] wait_on_work+0x3b/0xa7
> >> [<ffffffff81058dd6>] __cancel_work_timer+0xbf/0x102
> >> [<ffffffff81058e33>] cancel_work_sync+0xb/0xd
> >> [<ffffffff8120a3b3>] fbcon_deinit+0x11c/0x1dc
> >> [<ffffffff81264793>] bind_con_driver+0x145/0x263
> >> [<ffffffff81264a45>] unbind_con_driver+0x14f/0x195
> >> [<ffffffff8126540c>] store_bind+0x1ad/0x1c1
> >> [<ffffffff8127cbb7>] dev_attr_store+0x13/0x1f
> >> [<ffffffff8116d884>] sysfs_write_file+0xe9/0x121
> >> [<ffffffff811145b2>] vfs_write+0x9b/0xfd
> >> [<ffffffff811147b7>] sys_write+0x3e/0x6b
> >> [<ffffffff813b0039>] system_call_fastpath+0x16/0x1b
> >>
> >> other info that might help us debug this:
> >>
> >> Possible unsafe locking scenario:
> >>
> >> CPU0 CPU1
> >> ---- ----
> >> lock(console_lock);
> >> lock((&info->queue));
> >> lock(console_lock);
> >> lock((&info->queue));
> >>
> >> *** DEADLOCK ***
> >>
> >> v2: Mark the lockdep_map static, noticed by Jani Nikula.
> >>
> >> Cc: Dave Airlie <airlied@gmail.com>
> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >> ---
> >> kernel/printk.c | 9 +++++++++
> >> 1 file changed, 9 insertions(+)
> >
> > So I'm guessing I should take this through the tty tree, right? Any
> > objections to that for 3.7?
>
> I've noticed that the tty tree went in already :( Any chance you could
> still slip this in for 3.7? I'd _really_ like to have this stuff in
> for debugging console_lock madness in drm drivers - we've already had
> our fair share of those ...
No, as it hasn't been in linux-next already, I can't send it in for 3.7,
sorry, you know that. I'll be glad to queue it up for 3.8 if you want me to.
thanks,
greg k-h
* Re: [Intel-gfx] [PATCH] console: implement lockdep support for console_lock
2012-10-02 13:28 ` [Intel-gfx] " Greg KH
(?)
@ 2012-10-02 13:31 ` Daniel Vetter
-1 siblings, 0 replies; 17+ messages in thread
From: Daniel Vetter @ 2012-10-02 13:31 UTC
To: Greg KH
Cc: LKML, Peter Zijlstra, Intel Graphics Development, DRI Development,
Thomas Gleixner, Alan Cox
On Tue, Oct 2, 2012 at 3:28 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> No, as it hasn't been in linux-next already, I can't send it in for 3.7,
> sorry, you know that. I'll be glad to queue it up for 3.8 if you want me to.
Hey, was worth a shot ;-) Yeah, if you can pick it up for 3.8, that
would be nice, since the patches have been floating around for a while by now
...
Thanks, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch