public inbox for linux-kernel@vger.kernel.org
* Re: + fix-soft-lockup-when-removing-netconsole-module.patch added to -mm tree
From: Oleg Nesterov @ 2007-05-30 21:01 UTC (permalink / raw)
  To: Jason Wessel
  Cc: Stephen Hemminger, David S. Miller, Andrew Morton, linux-kernel

Jason Wessel wrote:
>
> The netpoll_cleanup handler can hang the kernel if there is no work in the
> work queue because a call to cancel_rearming_delayed_work() with no work
> goes into an infinite loop.

This should not be true any longer, cancel_rearming_delayed_work() should work
correctly in any case.

Could you please clarify?

> @@ -771,30 +771,32 @@ void netpoll_cleanup(struct netpoll *np)
>
> [...snip...]
>
> +		if (atomic_dec_and_test(&npinfo->refcnt)) {
> +			skb_queue_purge(&npinfo->arp_tx);
> +			skb_queue_purge(&npinfo->txq);
> +			if (delayed_work_pending(&npinfo->tx_work)) {
>  				cancel_rearming_delayed_work(&npinfo->tx_work);
>  				flush_scheduled_work();

But this "if (delayed_work_pending())" is racy anyway?

Oleg.
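
The message does not spell the race out. As an illustration only, the sketch below is a userspace toy model (not kernel code) of one interleaving that a check-then-cancel pattern can permit. It assumes that a handler which is currently running no longer counts as "pending" and that it may re-arm itself when it finishes. All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: one self-rearming work item. */
enum toy_state { TOY_IDLE, TOY_TIMER_ARMED, TOY_RUNNING };

struct toy_work {
	enum toy_state state;
	bool owner_freed;        /* set once cleanup has "freed" the owner */
	bool rearmed_after_free; /* set if the handler re-arms after that */
};

/* Assumption of this model: only an armed timer counts as pending. */
static bool toy_pending(const struct toy_work *w)
{
	return w->state == TOY_TIMER_ARMED;
}

/* Cleanup using the guarded pattern: cancel only if pending. */
static void toy_guarded_cleanup(struct toy_work *w)
{
	if (toy_pending(w))
		w->state = TOY_IDLE; /* stands in for cancel + flush */
	w->owner_freed = true;       /* stands in for kfree(npinfo) */
}

/* The running handler finishes and re-arms its own timer. */
static void toy_handler_finish(struct toy_work *w)
{
	assert(w->state == TOY_RUNNING);
	w->state = TOY_TIMER_ARMED;
	if (w->owner_freed)
		w->rearmed_after_free = true;
}

/* Returns true if the interleaving leaves a timer armed on freed data. */
bool toy_race_demo(void)
{
	struct toy_work w = { TOY_RUNNING, false, false };

	toy_guarded_cleanup(&w); /* check sees "not pending", skips cancel */
	toy_handler_finish(&w);  /* handler re-arms afterwards */
	return w.rearmed_after_free;
}
```

In this model the guard skips the cancel exactly when it is most needed, which is why an unconditional cancel is the safer shape.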



* Re: + fix-soft-lockup-when-removing-netconsole-module.patch added to -mm tree
From: Andrew Morton @ 2007-05-30 21:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jason Wessel, Stephen Hemminger, David S. Miller, linux-kernel

On Thu, 31 May 2007 01:01:37 +0400
Oleg Nesterov <oleg@tv-sign.ru> wrote:

> Jason Wessel wrote:
> >
> > The netpoll_cleanup handler can hang the kernel if there is no work in the
> > work queue because a call to cancel_rearming_delayed_work() with no work
> > goes into an infinite loop.
> 
> This should not be true any longer, cancel_rearming_delayed_work() should work
> correctly in any case.
> 
> Could you please clarify?

We need a 2.6.21.x fix.

> > @@ -771,30 +771,32 @@ void netpoll_cleanup(struct netpoll *np)
> >
> > [...snip...]
> >
> > +		if (atomic_dec_and_test(&npinfo->refcnt)) {
> > +			skb_queue_purge(&npinfo->arp_tx);
> > +			skb_queue_purge(&npinfo->txq);
> > +			if (delayed_work_pending(&npinfo->tx_work)) {
> >  				cancel_rearming_delayed_work(&npinfo->tx_work);
> >  				flush_scheduled_work();
> 
> But this "if (delayed_work_pending())" is racy anyway?
> 

I guess so, a bit.


* Re: + fix-soft-lockup-when-removing-netconsole-module.patch added to -mm tree
From: Oleg Nesterov @ 2007-05-30 22:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jason Wessel, Stephen Hemminger, David S. Miller, linux-kernel,
	Tejun Heo

On 05/30, Andrew Morton wrote:
>
> On Thu, 31 May 2007 01:01:37 +0400
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
> 
> > Jason Wessel wrote:
> > >
> > > The netpoll_cleanup handler can hang the kernel if there is no work in the
> > > work queue because a call to cancel_rearming_delayed_work() with no work
> > > goes into an infinite loop.
> > 
> > This should not be true any longer, cancel_rearming_delayed_work() should work
> > correctly in any case.
> > 
> > Could you please clarify?
> 
> We need a 2.6.21.x fix.

Ah, OK, sorry for the noise.

> > > @@ -771,30 +771,32 @@ void netpoll_cleanup(struct netpoll *np)
> > >
> > > [...snip...]
> > >
> > > +		if (atomic_dec_and_test(&npinfo->refcnt)) {
> > > +			skb_queue_purge(&npinfo->arp_tx);
> > > +			skb_queue_purge(&npinfo->txq);
> > > +			if (delayed_work_pending(&npinfo->tx_work)) {
> > >  				cancel_rearming_delayed_work(&npinfo->tx_work);
> > >  				flush_scheduled_work();
> > 
> > But this "if (delayed_work_pending())" is racy anyway?
> > 
> 
> I guess so, a bit.

How about this COMPLETELY UNTESTED patch? (it borrows Tejun's double flush
trick).

--- n/net/core/netpoll.c~	2007-05-31 02:12:37.000000000 +0400
+++ n/net/core/netpoll.c	2007-05-31 02:13:39.000000000 +0400
@@ -773,8 +773,16 @@ void netpoll_cleanup(struct netpoll *np)
 			if (atomic_dec_and_test(&npinfo->refcnt)) {
 				skb_queue_purge(&npinfo->arp_tx);
 				skb_queue_purge(&npinfo->txq);
-				cancel_rearming_delayed_work(&npinfo->tx_work);
+
 				flush_scheduled_work();
+				/*
+				 * the next invocation of queue_process() can't
+				 * re-schedule ->tx_work because ->txq is empty
+				 */
+				if (!cancel_delayed_work(&npinfo->tx_work)) {
+					/* may be queued, wait for completion */
+					flush_scheduled_work();
+				}
 
 				kfree(npinfo);
 			}
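
To make the ordering in this untested patch easier to follow, here is a minimal userspace sketch of a purge, flush, cancel, flush-again sequence. It is a single-threaded toy state machine, not kernel code. All `dw_*` names are hypothetical, and the `timer_fires` flag is an assumed way to model the timer expiring between the first flush and the cancel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one self-rearming delayed work item. */
enum dw_state { DW_IDLE, DW_TIMER_ARMED, DW_QUEUED };

struct dw_model {
	enum dw_state state;
	int txq_len; /* packets the handler would still retry */
};

/* Handler: re-arms the timer only while there is something to send. */
static void dw_run_handler(struct dw_model *m)
{
	assert(m->state == DW_QUEUED);
	m->state = (m->txq_len > 0) ? DW_TIMER_ARMED : DW_IDLE;
}

/* Stand-in for a flush: run the work to completion if it is queued. */
static void dw_flush(struct dw_model *m)
{
	if (m->state == DW_QUEUED)
		dw_run_handler(m);
}

/* Stand-in for a cancel: true only if an armed timer was removed. */
static bool dw_cancel(struct dw_model *m)
{
	if (m->state != DW_TIMER_ARMED)
		return false;
	m->state = DW_IDLE;
	return true;
}

/* Purge, flush, cancel, then optionally flush again; true if idle. */
bool dw_cleanup_ends_idle(enum dw_state start, bool timer_fires,
			  bool second_flush)
{
	struct dw_model m = { start, 1 };

	m.txq_len = 0; /* purge: a later handler run cannot re-arm */
	dw_flush(&m);
	if (timer_fires && m.state == DW_TIMER_ARMED)
		m.state = DW_QUEUED; /* timer expired, work now queued */
	if (!dw_cancel(&m) && second_flush)
		dw_flush(&m); /* may be queued, wait for completion */
	return m.state == DW_IDLE;
}
```

Within this model, every starting state ends idle when the second flush is kept, while dropping it leaves the work queued in the case where the timer fires just before the cancel.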


