PV multiconsole bug during resume.

All of lore.kernel.org
 help / color / mirror / Atom feed

* PV multiconsole bug during resume.
@ 2012-05-24 19:37 Konrad Rzeszutek Wilk
  2012-05-25  8:56 ` Ian Campbell
  0 siblings, 1 reply; 3+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-05-24 19:37 UTC (permalink / raw)
  To: xen-devel, Stefano Stabellini, Ian Campbell

So .. we used to have in the event.c a spin_lock to protect the
irq_mapping_update_lock, but with git commit  773659483685d652970583384a0294948e57f8b3
"xen/irq: Alter the locking to use a mutex instead of a spinlock."
I changed it to a mutex b/c we keept on getting WARNs.

But now I get this when I resume a PVHVM guest:

Grant tables using version 2 layout.
BUG: sleeping function called from invalid context at /home/konrad/ssd/linux/kernel/mutex.c:85
in_atomic(): 1, irqs_disabled(): 1, pid: 6, name: migration/0
Pid: 6, comm: migration/0 Tainted: G           O 3.4.0upstream-00113-g598ff45-dirty #1
Call Trace:
 [<ffffffff8109830a>] __might_sleep+0xda/0x100
 [<ffffffff815a47f7>] mutex_lock+0x27/0x50
 [<ffffffff81311ea6>] rebind_evtchn_irq+0x36/0x90
 [<ffffffff81341bfc>] xen_console_resume+0x5c/0x60
 [<ffffffff81313fea>] xen_suspend+0x8a/0xb0
 [<ffffffff810d42f3>] stop_machine_cpu_stop+0xa3/0xf0
 [<ffffffff810d4250>] ? stop_one_cpu_nowait+0x50/0x50
 [<ffffffff810d3f81>] cpu_stopper_thread+0xf1/0x1c0
 [<ffffffff815a5be6>] ? __schedule+0x3c6/0x760
 [<ffffffff815a6bb9>] ? _raw_spin_unlock_irqrestore+0x19/0x30
 [<ffffffff810d3e90>] ? res_counter_charge+0x150/0x150
 [<ffffffff8108e636>] kthread+0x96/0xa0
 [<ffffffff815aeb24>] kernel_thread_helper+0x4/0x10
 [<ffffffff815a7138>] ? retint_restore_args+0x5/0x6
 [<ffffffff815aeb20>] ? gs_change+0x13/0x13
PM: noirq restore of devices complete after 0.163 msecs


Any ideas?

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: PV multiconsole bug during resume.
  2012-05-24 19:37 PV multiconsole bug during resume Konrad Rzeszutek Wilk
@ 2012-05-25  8:56 ` Ian Campbell
  2012-05-25  9:31   ` Stefano Stabellini
  0 siblings, 1 reply; 3+ messages in thread
From: Ian Campbell @ 2012-05-25  8:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel@lists.xensource.com, Stefano Stabellini

On Thu, 2012-05-24 at 20:37 +0100, Konrad Rzeszutek Wilk wrote:
> So .. we used to have in the event.c a spin_lock to protect the
> irq_mapping_update_lock, but with git commit  773659483685d652970583384a0294948e57f8b3
> "xen/irq: Alter the locking to use a mutex instead of a spinlock."
> I changed it to a mutex b/c we keept on getting WARNs.
> 
> But now I get this when I resume a PVHVM guest:
> 
> Grant tables using version 2 layout.
> BUG: sleeping function called from invalid context at /home/konrad/ssd/linux/kernel/mutex.c:85
> in_atomic(): 1, irqs_disabled(): 1, pid: 6, name: migration/0
> Pid: 6, comm: migration/0 Tainted: G           O 3.4.0upstream-00113-g598ff45-dirty #1
> Call Trace:
>  [<ffffffff8109830a>] __might_sleep+0xda/0x100
>  [<ffffffff815a47f7>] mutex_lock+0x27/0x50
>  [<ffffffff81311ea6>] rebind_evtchn_irq+0x36/0x90
>  [<ffffffff81341bfc>] xen_console_resume+0x5c/0x60
>  [<ffffffff81313fea>] xen_suspend+0x8a/0xb0
>  [<ffffffff810d42f3>] stop_machine_cpu_stop+0xa3/0xf0
>  [<ffffffff810d4250>] ? stop_one_cpu_nowait+0x50/0x50
>  [<ffffffff810d3f81>] cpu_stopper_thread+0xf1/0x1c0
>  [<ffffffff815a5be6>] ? __schedule+0x3c6/0x760
>  [<ffffffff815a6bb9>] ? _raw_spin_unlock_irqrestore+0x19/0x30
>  [<ffffffff810d3e90>] ? res_counter_charge+0x150/0x150
>  [<ffffffff8108e636>] kthread+0x96/0xa0
>  [<ffffffff815aeb24>] kernel_thread_helper+0x4/0x10
>  [<ffffffff815a7138>] ? retint_restore_args+0x5/0x6
>  [<ffffffff815aeb20>] ? gs_change+0x13/0x13
> PM: noirq restore of devices complete after 0.163 msecs
> 
> 
> Any ideas?

xen_console_resume is called from stop_machine context, which has irqs
disabled etc and so cannot sleep.

One option might be to hoist the call of xen_console_resume out of the
stop machine section, e.g. to the same place as xs_resume (which is the
only over caller of rebind_evtchn_irq).

I'm a bit worried about not getting debug output during resume though,
or worse poking the evtchn before it is actually setup again.

The alternative is to consider whether irq_mapping_update_lock is even
needed in rebind_evtchn_irq. I have a feeling that the lock is a bit
pessimistic in the general case (i.e. it covers more than it needs to).
Lots of stuff which it might once has covered is actually locked
elsewhere these days -- i.e. picking an irq number is handled in the
core -- and the old array no longer exists (we use desc->handler_data
instead). Once you have picked an irq and an evtchn I'm not sure that
the lock when updating the info is even useful any more, so long as the
irq/evtchn in question is masked. Anyhow, might be worth having a good
look at the use of that lock -- if we can't get rid of it entirely
perhaps its scope can be greatly reduced?

Ian.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: PV multiconsole bug during resume.
  2012-05-25  8:56 ` Ian Campbell
@ 2012-05-25  9:31   ` Stefano Stabellini
  0 siblings, 0 replies; 3+ messages in thread
From: Stefano Stabellini @ 2012-05-25  9:31 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel@lists.xensource.com, Stefano Stabellini,
	Konrad Rzeszutek Wilk

On Fri, 25 May 2012, Ian Campbell wrote:
> On Thu, 2012-05-24 at 20:37 +0100, Konrad Rzeszutek Wilk wrote:
> > So .. we used to have in the event.c a spin_lock to protect the
> > irq_mapping_update_lock, but with git commit  773659483685d652970583384a0294948e57f8b3
> > "xen/irq: Alter the locking to use a mutex instead of a spinlock."
> > I changed it to a mutex b/c we keept on getting WARNs.
> > 
> > But now I get this when I resume a PVHVM guest:
> > 
> > Grant tables using version 2 layout.
> > BUG: sleeping function called from invalid context at /home/konrad/ssd/linux/kernel/mutex.c:85
> > in_atomic(): 1, irqs_disabled(): 1, pid: 6, name: migration/0
> > Pid: 6, comm: migration/0 Tainted: G           O 3.4.0upstream-00113-g598ff45-dirty #1
> > Call Trace:
> >  [<ffffffff8109830a>] __might_sleep+0xda/0x100
> >  [<ffffffff815a47f7>] mutex_lock+0x27/0x50
> >  [<ffffffff81311ea6>] rebind_evtchn_irq+0x36/0x90
> >  [<ffffffff81341bfc>] xen_console_resume+0x5c/0x60
> >  [<ffffffff81313fea>] xen_suspend+0x8a/0xb0
> >  [<ffffffff810d42f3>] stop_machine_cpu_stop+0xa3/0xf0
> >  [<ffffffff810d4250>] ? stop_one_cpu_nowait+0x50/0x50
> >  [<ffffffff810d3f81>] cpu_stopper_thread+0xf1/0x1c0
> >  [<ffffffff815a5be6>] ? __schedule+0x3c6/0x760
> >  [<ffffffff815a6bb9>] ? _raw_spin_unlock_irqrestore+0x19/0x30
> >  [<ffffffff810d3e90>] ? res_counter_charge+0x150/0x150
> >  [<ffffffff8108e636>] kthread+0x96/0xa0
> >  [<ffffffff815aeb24>] kernel_thread_helper+0x4/0x10
> >  [<ffffffff815a7138>] ? retint_restore_args+0x5/0x6
> >  [<ffffffff815aeb20>] ? gs_change+0x13/0x13
> > PM: noirq restore of devices complete after 0.163 msecs
> > 
> > 
> > Any ideas?
> 
> xen_console_resume is called from stop_machine context, which has irqs
> disabled etc and so cannot sleep.
> 
> One option might be to hoist the call of xen_console_resume out of the
> stop machine section, e.g. to the same place as xs_resume (which is the
> only over caller of rebind_evtchn_irq).
> 
> I'm a bit worried about not getting debug output during resume though,
> or worse poking the evtchn before it is actually setup again.
> 
> The alternative is to consider whether irq_mapping_update_lock is even
> needed in rebind_evtchn_irq. I have a feeling that the lock is a bit
> pessimistic in the general case (i.e. it covers more than it needs to).
> Lots of stuff which it might once has covered is actually locked
> elsewhere these days -- i.e. picking an irq number is handled in the
> core -- and the old array no longer exists (we use desc->handler_data
> instead). Once you have picked an irq and an evtchn I'm not sure that
> the lock when updating the info is even useful any more, so long as the
> irq/evtchn in question is masked. Anyhow, might be worth having a good
> look at the use of that lock -- if we can't get rid of it entirely
> perhaps its scope can be greatly reduced?

Considering that xen_console_resume is called by stop_machine, there is
certainly no need to protect rebind_evtchn_irq with a mutex at that
point.
However rebind_evtchn_irq is also called by xs_resume (xb_init_comms),
outside of stop_machine, so we might actually need a mutex there.

Maybe we need a rebind_evtchn_irqsafe function that doesn't take the
mutex and can be called from irq context.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-05-25  9:31 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-24 19:37 PV multiconsole bug during resume Konrad Rzeszutek Wilk
2012-05-25  8:56 ` Ian Campbell
2012-05-25  9:31   ` Stefano Stabellini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.