* xen_console_resume() may sleep with irqs disabled
@ 2013-06-19 12:08 David Vrabel
2013-06-19 13:05 ` Ian Campbell
0 siblings, 1 reply; 3+ messages in thread
From: David Vrabel @ 2013-06-19 12:08 UTC (permalink / raw)
To: Xen-devel; +Cc: Ian Campbell, Konrad Rzeszutek Wilk
During a suspend/resume cycle (e.g., a migration), xen_console_resume()
is called from xen_suspend() which is called with local irqs disabled by
stop_machine().
xen_console_resume() calls rebind_evtchn_irq() which attempts to lock
the irq_mapping_update_lock mutex. This produces the BUG listed below.
The lock was changed to a mutex by 773659 (xen/irq: Alter the locking to
use a mutex instead of a spinlock.). Clearly we can't just revert this
change but it's not clear to me what the correct fix here is.
Can xen_console_resume() be deferred to later in the resume process?
David
[ 39.877210] BUG: sleeping function called from invalid context at
/anfs/drall/scratch/davidvr/x86/linux/kernel/mutex.c:413
[ 39.877210] in_atomic(): 1, irqs_disabled(): 1, pid: 7, name: migration/0
[ 39.877210] no locks held by migration/0/7.
[ 39.877210] irq event stamp: 38
[ 39.877210] hardirqs last enabled at (37): [<ffffffff812d07b0>]
_raw_spin_unlock_irq+0x30/0x50
[ 39.877210] hardirqs last disabled at (38): [<ffffffff81091ab5>]
stop_machine_cpu_stop+0x95/0x110
[ 39.877210] softirqs last enabled at (0): [<ffffffff8103cd98>]
copy_process.part.63+0x348/0x13e0
[ 39.877210] softirqs last disabled at (0): [< (null)>]
(null)
[ 39.877210] CPU: 0 PID: 7 Comm: migration/0 Not tainted
3.10.0-rc6.davidvr #82
[ 39.877210] 0000000000000000 ffff88000ec8fbc8 ffffffff812cc2cf
ffff88000ec8fbe8
[ 39.877210] ffffffff810723b5 ffff88000ed16800 ffff88000ec8ffd8
ffff88000ec8fc68
[ 39.877210] ffffffff812cd181 ffff88000ec8fc18 ffffffff81092b17
ffff88000ed16800
[ 39.877210] Call Trace:
[ 39.877210] [<ffffffff812cc2cf>] dump_stack+0x19/0x1b
[ 39.877210] [<ffffffff810723b5>] __might_sleep+0xf5/0x130
[ 39.877210] [<ffffffff812cd181>] mutex_lock_nested+0x41/0x3b0
[ 39.877210] [<ffffffff81092b17>] ? __irq_put_desc_unlock+0x27/0x50
[ 39.877210] [<ffffffff8109410c>] ? __disable_irq_nosync+0x4c/0x70
[ 39.877210] [<ffffffff81091a01>] ? queue_stop_cpus_work+0xc1/0xe0
[ 39.877210] [<ffffffff811bc912>] rebind_evtchn_irq+0x42/0xc0
[ 39.877210] [<ffffffff811e3c80>] xen_console_resume+0x60/0x70
[ 39.877210] [<ffffffff811b87c2>] xen_suspend+0x92/0xb0
[ 39.877210] [<ffffffff81091acb>] stop_machine_cpu_stop+0xab/0x110
[ 39.877210] [<ffffffff81091a20>] ? queue_stop_cpus_work+0xe0/0xe0
[ 39.877210] [<ffffffff8109170b>] cpu_stopper_thread+0x8b/0x140
[ 39.877210] [<ffffffff812d073f>] ? _raw_spin_unlock_irqrestore+0x3f/0x80
[ 39.877210] [<ffffffff81086f25>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 39.877210] [<ffffffff81086ffd>] ? trace_hardirqs_on+0xd/0x10
[ 39.877210] [<ffffffff812d0749>] ? _raw_spin_unlock_irqrestore+0x49/0x80
[ 39.877210] [<ffffffff8106a6c6>] smpboot_thread_fn+0x166/0x1f0
[ 39.877210] [<ffffffff8106a560>] ? lg_global_unlock+0x80/0x80
[ 39.877210] [<ffffffff8106271b>] kthread+0xdb/0xe0
[ 39.877210] [<ffffffff81062640>] ? __kthread_parkme+0x90/0x90
[ 39.877210] [<ffffffff812d1eac>] ret_from_fork+0x7c/0xb0
[ 39.877210] [<ffffffff81062640>] ? __kthread_parkme+0x90/0x90
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: xen_console_resume() may sleep with irqs disabled
2013-06-19 12:08 xen_console_resume() may sleep with irqs disabled David Vrabel
@ 2013-06-19 13:05 ` Ian Campbell
2013-06-19 15:45 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 3+ messages in thread
From: Ian Campbell @ 2013-06-19 13:05 UTC (permalink / raw)
To: David Vrabel; +Cc: Konrad Rzeszutek Wilk, Xen-devel
On Wed, 2013-06-19 at 13:08 +0100, David Vrabel wrote:
> During a suspend/resume cycle (e.g., a migration), xen_console_resume()
> is called from xen_suspend() which is called with local irqs disabled by
> stop_machine().
>
> xen_console_resume() calls rebind_evtchn_irq() which attempts to lock
> the irq_mapping_update_lock mutex. This produces the BUG listed below.
>
> The lock was changed to a mutex by 773659 (xen/irq: Alter the locking to
> use a mutex instead of a spinlock.). Clearly we can't just revert this
> change but it's not clear to me what the correct fix here is.
>
> Can xen_console_resume() be deferred to later in the resume process?
Does that risk losing debug output from the early part of the resume?
That would be worth avoiding.
So I think it's just a case of making sure that the rest of the driver
is functional if we move the call later.
It might even be possible to move that restore into the standard device
model restore hook instead of this slightly special case.
Does a similar issue not exist with console drivers for a native
hibernate? I wonder how they get around that. I have a vague
recollection of a debug option for leaving the console unsuspended over
hibernate.
Ian.
>
> David
>
> [ 39.877210] BUG: sleeping function called from invalid context at
> /anfs/drall/scratch/davidvr/x86/linux/kernel/mutex.c:413
> [ 39.877210] in_atomic(): 1, irqs_disabled(): 1, pid: 7, name: migration/0
> [ 39.877210] no locks held by migration/0/7.
> [ 39.877210] irq event stamp: 38
> [ 39.877210] hardirqs last enabled at (37): [<ffffffff812d07b0>]
> _raw_spin_unlock_irq+0x30/0x50
> [ 39.877210] hardirqs last disabled at (38): [<ffffffff81091ab5>]
> stop_machine_cpu_stop+0x95/0x110
> [ 39.877210] softirqs last enabled at (0): [<ffffffff8103cd98>]
> copy_process.part.63+0x348/0x13e0
> [ 39.877210] softirqs last disabled at (0): [< (null)>]
> (null)
> [ 39.877210] CPU: 0 PID: 7 Comm: migration/0 Not tainted
> 3.10.0-rc6.davidvr #82
> [ 39.877210] 0000000000000000 ffff88000ec8fbc8 ffffffff812cc2cf
> ffff88000ec8fbe8
> [ 39.877210] ffffffff810723b5 ffff88000ed16800 ffff88000ec8ffd8
> ffff88000ec8fc68
> [ 39.877210] ffffffff812cd181 ffff88000ec8fc18 ffffffff81092b17
> ffff88000ed16800
> [ 39.877210] Call Trace:
> [ 39.877210] [<ffffffff812cc2cf>] dump_stack+0x19/0x1b
> [ 39.877210] [<ffffffff810723b5>] __might_sleep+0xf5/0x130
> [ 39.877210] [<ffffffff812cd181>] mutex_lock_nested+0x41/0x3b0
> [ 39.877210] [<ffffffff81092b17>] ? __irq_put_desc_unlock+0x27/0x50
> [ 39.877210] [<ffffffff8109410c>] ? __disable_irq_nosync+0x4c/0x70
> [ 39.877210] [<ffffffff81091a01>] ? queue_stop_cpus_work+0xc1/0xe0
> [ 39.877210] [<ffffffff811bc912>] rebind_evtchn_irq+0x42/0xc0
> [ 39.877210] [<ffffffff811e3c80>] xen_console_resume+0x60/0x70
> [ 39.877210] [<ffffffff811b87c2>] xen_suspend+0x92/0xb0
> [ 39.877210] [<ffffffff81091acb>] stop_machine_cpu_stop+0xab/0x110
> [ 39.877210] [<ffffffff81091a20>] ? queue_stop_cpus_work+0xe0/0xe0
> [ 39.877210] [<ffffffff8109170b>] cpu_stopper_thread+0x8b/0x140
> [ 39.877210] [<ffffffff812d073f>] ? _raw_spin_unlock_irqrestore+0x3f/0x80
> [ 39.877210] [<ffffffff81086f25>] ? trace_hardirqs_on_caller+0x105/0x1d0
> [ 39.877210] [<ffffffff81086ffd>] ? trace_hardirqs_on+0xd/0x10
> [ 39.877210] [<ffffffff812d0749>] ? _raw_spin_unlock_irqrestore+0x49/0x80
> [ 39.877210] [<ffffffff8106a6c6>] smpboot_thread_fn+0x166/0x1f0
> [ 39.877210] [<ffffffff8106a560>] ? lg_global_unlock+0x80/0x80
> [ 39.877210] [<ffffffff8106271b>] kthread+0xdb/0xe0
> [ 39.877210] [<ffffffff81062640>] ? __kthread_parkme+0x90/0x90
> [ 39.877210] [<ffffffff812d1eac>] ret_from_fork+0x7c/0xb0
> [ 39.877210] [<ffffffff81062640>] ? __kthread_parkme+0x90/0x90
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: xen_console_resume() may sleep with irqs disabled
2013-06-19 13:05 ` Ian Campbell
@ 2013-06-19 15:45 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 3+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-19 15:45 UTC (permalink / raw)
To: Ian Campbell; +Cc: David Vrabel, Xen-devel
On Wed, Jun 19, 2013 at 02:05:43PM +0100, Ian Campbell wrote:
> On Wed, 2013-06-19 at 13:08 +0100, David Vrabel wrote:
> > During a suspend/resume cycle (e.g., a migration), xen_console_resume()
> > is called from xen_suspend() which is called with local irqs disabled by
> > stop_machine().
> >
> > xen_console_resume() calls rebind_evtchn_irq() which attempts to lock
> > the irq_mapping_update_lock mutex. This produces the BUG listed below.
> >
> > The lock was changed to a mutex by 773659 (xen/irq: Alter the locking to
> > use a mutex instead of a spinlock.). Clearly we can't just revert this
> > change but it's not clear to me what the correct fix here is.
> >
> > Can xen_console_resume() be deferred to later in the resume process?
>
> Does that risk losing debug output from the early part of the resume?
> That would be worth avoiding.
>
> So I think it's just a case of making sure that the rest of the driver
> is functional if we move the call later.
>
> It might even be possible to move that restore into the standard device
> model restore hook instead of this slightly special case.
>
> Does a similar issue not exist with console drivers for a native
> hibernate? I wonder how they get around that. I have a vague
> recollection of a debug option for leaving the console unsuspended over
> hibernate.
<nods> I think it was called 'no_console_suspend'
Which looks like you just need to sprinkle 'suspend_console()'
and 'resume_console()' around in the manage.c code?
>
> Ian.
>
> >
> > David
> >
> > [ 39.877210] BUG: sleeping function called from invalid context at
> > /anfs/drall/scratch/davidvr/x86/linux/kernel/mutex.c:413
> > [ 39.877210] in_atomic(): 1, irqs_disabled(): 1, pid: 7, name: migration/0
> > [ 39.877210] no locks held by migration/0/7.
> > [ 39.877210] irq event stamp: 38
> > [ 39.877210] hardirqs last enabled at (37): [<ffffffff812d07b0>]
> > _raw_spin_unlock_irq+0x30/0x50
> > [ 39.877210] hardirqs last disabled at (38): [<ffffffff81091ab5>]
> > stop_machine_cpu_stop+0x95/0x110
> > [ 39.877210] softirqs last enabled at (0): [<ffffffff8103cd98>]
> > copy_process.part.63+0x348/0x13e0
> > [ 39.877210] softirqs last disabled at (0): [< (null)>]
> > (null)
> > [ 39.877210] CPU: 0 PID: 7 Comm: migration/0 Not tainted
> > 3.10.0-rc6.davidvr #82
> > [ 39.877210] 0000000000000000 ffff88000ec8fbc8 ffffffff812cc2cf
> > ffff88000ec8fbe8
> > [ 39.877210] ffffffff810723b5 ffff88000ed16800 ffff88000ec8ffd8
> > ffff88000ec8fc68
> > [ 39.877210] ffffffff812cd181 ffff88000ec8fc18 ffffffff81092b17
> > ffff88000ed16800
> > [ 39.877210] Call Trace:
> > [ 39.877210] [<ffffffff812cc2cf>] dump_stack+0x19/0x1b
> > [ 39.877210] [<ffffffff810723b5>] __might_sleep+0xf5/0x130
> > [ 39.877210] [<ffffffff812cd181>] mutex_lock_nested+0x41/0x3b0
> > [ 39.877210] [<ffffffff81092b17>] ? __irq_put_desc_unlock+0x27/0x50
> > [ 39.877210] [<ffffffff8109410c>] ? __disable_irq_nosync+0x4c/0x70
> > [ 39.877210] [<ffffffff81091a01>] ? queue_stop_cpus_work+0xc1/0xe0
> > [ 39.877210] [<ffffffff811bc912>] rebind_evtchn_irq+0x42/0xc0
> > [ 39.877210] [<ffffffff811e3c80>] xen_console_resume+0x60/0x70
> > [ 39.877210] [<ffffffff811b87c2>] xen_suspend+0x92/0xb0
> > [ 39.877210] [<ffffffff81091acb>] stop_machine_cpu_stop+0xab/0x110
> > [ 39.877210] [<ffffffff81091a20>] ? queue_stop_cpus_work+0xe0/0xe0
> > [ 39.877210] [<ffffffff8109170b>] cpu_stopper_thread+0x8b/0x140
> > [ 39.877210] [<ffffffff812d073f>] ? _raw_spin_unlock_irqrestore+0x3f/0x80
> > [ 39.877210] [<ffffffff81086f25>] ? trace_hardirqs_on_caller+0x105/0x1d0
> > [ 39.877210] [<ffffffff81086ffd>] ? trace_hardirqs_on+0xd/0x10
> > [ 39.877210] [<ffffffff812d0749>] ? _raw_spin_unlock_irqrestore+0x49/0x80
> > [ 39.877210] [<ffffffff8106a6c6>] smpboot_thread_fn+0x166/0x1f0
> > [ 39.877210] [<ffffffff8106a560>] ? lg_global_unlock+0x80/0x80
> > [ 39.877210] [<ffffffff8106271b>] kthread+0xdb/0xe0
> > [ 39.877210] [<ffffffff81062640>] ? __kthread_parkme+0x90/0x90
> > [ 39.877210] [<ffffffff812d1eac>] ret_from_fork+0x7c/0xb0
> > [ 39.877210] [<ffffffff81062640>] ? __kthread_parkme+0x90/0x90
> >
>
>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2013-06-19 15:45 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-19 12:08 xen_console_resume() may sleep with irqs disabled David Vrabel
2013-06-19 13:05 ` Ian Campbell
2013-06-19 15:45 ` Konrad Rzeszutek Wilk
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.