* Re: [NETPOLL] netconsole: fix soft lockup when removing module [not found] <20070701173558.GA207@tv-sign.ru> @ 2007-07-02 6:34 ` Jarek Poplawski 2007-07-02 9:24 ` Oleg Nesterov 2007-07-02 7:52 ` [PATCH] " Jarek Poplawski 1 sibling, 1 reply; 10+ messages in thread From: Jarek Poplawski @ 2007-07-02 6:34 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, Andrew Morton, David S. Miller, linux-kernel, netdev On Sun, Jul 01, 2007 at 09:35:58PM +0400, Oleg Nesterov wrote: > Jarek Poplawski wrote: > > > > #1 > > Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work() > > required a work function should always (unconditionally) rearm with > > delay > 0 - otherwise it would endlessly loop. This patch replaces > > this function with cancel_delayed_work(). Later kernel versions don't > > require this, so here it's only for uniformity. > > But 2.6.22 doesn't need this change, why it was merged? One bad reason is given above. Should I look for another one? > > In fact, I suspect this change adds a race, You are right! > > > --- a/net/core/netpoll.c > > +++ b/net/core/netpoll.c > > @@ -72,7 +72,8 @@ static void queue_process(struct work_struct *work) > > netif_tx_unlock(dev); > > local_irq_restore(flags); > > > > - schedule_delayed_work(&npinfo->tx_work, HZ/10); > > + if (atomic_read(&npinfo->refcnt)) > > + schedule_delayed_work(&npinfo->tx_work, HZ/10); > > return; > > } > > netif_tx_unlock(dev); > > @@ -785,9 +786,15 @@ void netpoll_cleanup(struct netpoll *np) > > if (atomic_dec_and_test(&npinfo->refcnt)) { > > skb_queue_purge(&npinfo->arp_tx); > > skb_queue_purge(&npinfo->txq); > > - cancel_rearming_delayed_work(&npinfo->tx_work); > > + cancel_delayed_work(&npinfo->tx_work); > > flush_scheduled_work(); > > Suppose that ->refcnt == 1, and queue_process() was preempted just after > atomic_read(&npinfo->refcnt). > > netpoll_cleanup() comes, cancel_delayed_work() does nothing, flush_scheduled_work() > sleeps. > > queue_process() gets CPU, re-schedules ->tx_work, and returns. > > flush_scheduled_work() completes, netpoll_cleanup() frees npinfo and returns > while ->tx_work is pending. > > No? No no. (Yes!) I had some doubts about this, and you found very good reason for this. I'll soon send a patch to restore cancel_rearming_delayed_work in 2.6.22. So, 2.6.21 needs something better (maybe you've found it btw.?), but they weren't too interested, anyway. Thanks very much & regards, Jarek P. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [NETPOLL] netconsole: fix soft lockup when removing module 2007-07-02 6:34 ` [NETPOLL] netconsole: fix soft lockup when removing module Jarek Poplawski @ 2007-07-02 9:24 ` Oleg Nesterov 2007-07-02 11:08 ` Jarek Poplawski 0 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2007-07-02 9:24 UTC (permalink / raw) To: Jarek Poplawski Cc: Linus Torvalds, Andrew Morton, David S. Miller, linux-kernel, netdev On 07/02, Jarek Poplawski wrote: > > > > --- a/net/core/netpoll.c > > > +++ b/net/core/netpoll.c > > > @@ -72,7 +72,8 @@ static void queue_process(struct work_struct *work) > > > netif_tx_unlock(dev); > > > local_irq_restore(flags); > > > > > > - schedule_delayed_work(&npinfo->tx_work, HZ/10); > > > + if (atomic_read(&npinfo->refcnt)) > > > + schedule_delayed_work(&npinfo->tx_work, HZ/10); > > > return; > > > } > > [...snip...] > > So, 2.6.21 needs something better (maybe you've found it btw.?), > but they weren't too interested, anyway. We can do a double flush trick. If queue_process() checks ->refcnt before schedule_delayed_work() like above, netpoll_cleanup() can do flush_scheduled_work(); // the next invocation of queue_process() // must see ->refcnt == 0 if (!cancel_delayed_work(&npinfo->tx_work)) { /* may be queued, wait for completion */ flush_scheduled_work(); } Jarek, I don't understand net/, a silly question. Why do we need the #2 chunk? Isn't it better to move skb_queue_purge(&npinfo->txq) after cancel_..._work() instead? Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [NETPOLL] netconsole: fix soft lockup when removing module 2007-07-02 9:24 ` Oleg Nesterov @ 2007-07-02 11:08 ` Jarek Poplawski 0 siblings, 0 replies; 10+ messages in thread From: Jarek Poplawski @ 2007-07-02 11:08 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, Andrew Morton, David S. Miller, linux-kernel, netdev On Mon, Jul 02, 2007 at 01:24:08PM +0400, Oleg Nesterov wrote: > On 07/02, Jarek Poplawski wrote: > > > > > > --- a/net/core/netpoll.c > > > > +++ b/net/core/netpoll.c > > > > @@ -72,7 +72,8 @@ static void queue_process(struct work_struct *work) > > > > netif_tx_unlock(dev); > > > > local_irq_restore(flags); > > > > > > > > - schedule_delayed_work(&npinfo->tx_work, HZ/10); > > > > + if (atomic_read(&npinfo->refcnt)) > > > > + schedule_delayed_work(&npinfo->tx_work, HZ/10); > > > > return; > > > > } > > > > [...snip...] > > > > So, 2.6.21 needs something better (maybe you've found it btw.?), > > but they weren't too interested, anyway. > > We can do a double flush trick. If queue_process() checks ->refcnt before > schedule_delayed_work() like above, netpoll_cleanup() can do > > flush_scheduled_work(); > > // the next invocation of queue_process() > // must see ->refcnt == 0 > if (!cancel_delayed_work(&npinfo->tx_work)) { > /* may be queued, wait for completion */ > flush_scheduled_work(); > } I'll try to think about it later, but I don't plan to do next patch, so feel free to send this. I didn't plan to fix netpoll at all (I never didn't use nor studied this before...). But couldn't stand this stupid lockup stays in 2.6.21. Now, I see it probably doesn't annoy more than 2 or 3 people around... > > Jarek, I don't understand net/, a silly question. Why do we need the #2 chunk? > Isn't it better to move skb_queue_purge(&npinfo->txq) after cancel_..._work() > instead? I've thought about this too, but because I don't know netpoll/netconsole enough I didn't want to change more than minimum needed. skb_queue_purge() uses heavy locking (irqsave) and I don't remember now if I've found the reason or simply believed somebody had to have a reason to do this there, anyway, if moved after cancel_ it could be done without this locking, and something like while () instead of my if () should be enough. But there was not much interest about this patch, and I'm not currently interested to be the main netconsole expert too, so maybe you would like to try... Cheers, Jarek P. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module [not found] <20070701173558.GA207@tv-sign.ru> 2007-07-02 6:34 ` [NETPOLL] netconsole: fix soft lockup when removing module Jarek Poplawski @ 2007-07-02 7:52 ` Jarek Poplawski 2007-07-02 8:59 ` Oleg Nesterov 2007-07-04 6:41 ` [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module Jarek Poplawski 1 sibling, 2 replies; 10+ messages in thread From: Jarek Poplawski @ 2007-07-02 7:52 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, Andrew Morton, David S. Miller, linux-kernel, netdev >From my recent patch: > > #1 > > Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work() > > required a work function should always (unconditionally) rearm with > > delay > 0 - otherwise it would endlessly loop. This patch replaces > > this function with cancel_delayed_work(). Later kernel versions don't > > require this, so here it's only for uniformity. But Oleg Nesterov <oleg@tv-sign.ru> found: > But 2.6.22 doesn't need this change, why it was merged? > > In fact, I suspect this change adds a race, ... His description was right (thanks), so this patch reverts #1. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> --- diff -Nurp 2.6.22-rc7-/net/core/netpoll.c 2.6.22-rc7/net/core/netpoll.c --- 2.6.22-rc7-/net/core/netpoll.c 2007-07-02 09:03:27.000000000 +0200 +++ 2.6.22-rc7/net/core/netpoll.c 2007-07-02 09:32:34.000000000 +0200 @@ -72,8 +72,7 @@ static void queue_process(struct work_st netif_tx_unlock(dev); local_irq_restore(flags); - if (atomic_read(&npinfo->refcnt)) - schedule_delayed_work(&npinfo->tx_work, HZ/10); + schedule_delayed_work(&npinfo->tx_work, HZ/10); return; } netif_tx_unlock(dev); @@ -786,7 +785,7 @@ void netpoll_cleanup(struct netpoll *np) if (atomic_dec_and_test(&npinfo->refcnt)) { skb_queue_purge(&npinfo->arp_tx); skb_queue_purge(&npinfo->txq); - cancel_delayed_work(&npinfo->tx_work); + cancel_rearming_delayed_work(&npinfo->tx_work); flush_scheduled_work(); /* clean after last, unfinished work */ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module 2007-07-02 7:52 ` [PATCH] " Jarek Poplawski @ 2007-07-02 8:59 ` Oleg Nesterov 2007-07-02 10:12 ` [PATCH 2/2][NETPOLL] netconsole: delete flush_scheduled_work Jarek Poplawski 2007-07-04 6:41 ` [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module Jarek Poplawski 1 sibling, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2007-07-02 8:59 UTC (permalink / raw) To: Jarek Poplawski Cc: Linus Torvalds, Andrew Morton, David S. Miller, linux-kernel, netdev On 07/02, Jarek Poplawski wrote: > > diff -Nurp 2.6.22-rc7-/net/core/netpoll.c 2.6.22-rc7/net/core/netpoll.c > --- 2.6.22-rc7-/net/core/netpoll.c 2007-07-02 09:03:27.000000000 +0200 > +++ 2.6.22-rc7/net/core/netpoll.c 2007-07-02 09:32:34.000000000 +0200 > @@ -72,8 +72,7 @@ static void queue_process(struct work_st > netif_tx_unlock(dev); > local_irq_restore(flags); > > - if (atomic_read(&npinfo->refcnt)) > - schedule_delayed_work(&npinfo->tx_work, HZ/10); > + schedule_delayed_work(&npinfo->tx_work, HZ/10); > return; > } > netif_tx_unlock(dev); > @@ -786,7 +785,7 @@ void netpoll_cleanup(struct netpoll *np) > if (atomic_dec_and_test(&npinfo->refcnt)) { > skb_queue_purge(&npinfo->arp_tx); > skb_queue_purge(&npinfo->txq); > - cancel_delayed_work(&npinfo->tx_work); > + cancel_rearming_delayed_work(&npinfo->tx_work); > flush_scheduled_work(); While you are here, could you also delete this flush_scheduled_work() ? It is not needed any longer. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2][NETPOLL] netconsole: delete flush_scheduled_work 2007-07-02 8:59 ` Oleg Nesterov @ 2007-07-02 10:12 ` Jarek Poplawski 0 siblings, 0 replies; 10+ messages in thread From: Jarek Poplawski @ 2007-07-02 10:12 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, Andrew Morton, David S. Miller, linux-kernel, netdev On Mon, Jul 02, 2007 at 12:59:49PM +0400, Oleg Nesterov wrote: ... > While you are here, could you also delete this flush_scheduled_work() ? > It is not needed any longer. Yes. I've thought about this, and even planned to mention, but then forgotten... Of course, you are right, but since it stayed so long and doesn't seem to be dangerous, and there is -rc7 I wasn't so brave. But now I have an explanation... Jarek P. ----------> Subject: [PATCH][NETPOLL] netconsole: delete flush_scheduled_work flush_scheduled_work() isn't needed after cancel_rearming_delayed_work(), so here it's removed from netpoll_cleanup(). PS: This patch was prepared on 2.6.22-rc7 with my other today's patch: netconsole: fix soft lockup ... Noticed-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> --- diff -Nurp 2.6.22-rc7-plus-revert1-/net/core/netpoll.c 2.6.22-rc7-plus-revert1/net/core/netpoll.c --- 2.6.22-rc7-plus-revert1-/net/core/netpoll.c 2007-07-02 09:32:34.000000000 +0200 +++ 2.6.22-rc7-plus-revert1/net/core/netpoll.c 2007-07-02 11:43:29.000000000 +0200 @@ -786,7 +786,6 @@ void netpoll_cleanup(struct netpoll *np) skb_queue_purge(&npinfo->arp_tx); skb_queue_purge(&npinfo->txq); cancel_rearming_delayed_work(&npinfo->tx_work); - flush_scheduled_work(); /* clean after last, unfinished work */ if (!skb_queue_empty(&npinfo->txq)) { ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module 2007-07-02 7:52 ` [PATCH] " Jarek Poplawski 2007-07-02 8:59 ` Oleg Nesterov @ 2007-07-04 6:41 ` Jarek Poplawski 2007-07-04 6:47 ` David Miller 2007-07-04 7:21 ` Jarek Poplawski 1 sibling, 2 replies; 10+ messages in thread From: Jarek Poplawski @ 2007-07-04 6:41 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, Andrew Morton, David S. Miller, linux-kernel, netdev On Mon, Jul 02, 2007 at 09:52:26AM +0200, Jarek Poplawski wrote: > > From my recent patch: > > > > #1 > > > Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work() > > > required a work function should always (unconditionally) rearm with > > > delay > 0 - otherwise it would endlessly loop. This patch replaces > > > this function with cancel_delayed_work(). Later kernel versions don't > > > require this, so here it's only for uniformity. > > But Oleg Nesterov <oleg@tv-sign.ru> found: > > > But 2.6.22 doesn't need this change, why it was merged? > > > > In fact, I suspect this change adds a race, > ... > > His description was right (thanks), so this patch reverts #1. > > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Oleg, I think maybe you could ack these 2 netconsole patches... They were done on your request but it looks like Andrew is waiting on something... Thanks, Jarek P. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module 2007-07-04 6:41 ` [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module Jarek Poplawski @ 2007-07-04 6:47 ` David Miller 2007-07-04 7:08 ` Jarek Poplawski 2007-07-04 7:21 ` Jarek Poplawski 1 sibling, 1 reply; 10+ messages in thread From: David Miller @ 2007-07-04 6:47 UTC (permalink / raw) To: jarkao2; +Cc: oleg, torvalds, akpm, linux-kernel, netdev From: Jarek Poplawski <jarkao2@o2.pl> Date: Wed, 4 Jul 2007 08:41:59 +0200 > On Mon, Jul 02, 2007 at 09:52:26AM +0200, Jarek Poplawski wrote: > > > > From my recent patch: > > > > > > #1 > > > > Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work() > > > > required a work function should always (unconditionally) rearm with > > > > delay > 0 - otherwise it would endlessly loop. This patch replaces > > > > this function with cancel_delayed_work(). Later kernel versions don't > > > > require this, so here it's only for uniformity. > > > > But Oleg Nesterov <oleg@tv-sign.ru> found: > > > > > But 2.6.22 doesn't need this change, why it was merged? > > > > > > In fact, I suspect this change adds a race, > > ... > > > > His description was right (thanks), so this patch reverts #1. > > > > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> > > Oleg, > > I think maybe you could ack these 2 netconsole patches... > They were done on your request but it looks like Andrew > is waiting on something... I plan to apply this patch, don't worry about it :) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module 2007-07-04 6:47 ` David Miller @ 2007-07-04 7:08 ` Jarek Poplawski 0 siblings, 0 replies; 10+ messages in thread From: Jarek Poplawski @ 2007-07-04 7:08 UTC (permalink / raw) To: David Miller; +Cc: oleg, torvalds, akpm, linux-kernel, netdev On Tue, Jul 03, 2007 at 11:47:18PM -0700, David Miller wrote: ... > I plan to apply this patch, don't worry about it :) Now I'm really worried! Don't you evere sleep? Good night, Jarek P. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module 2007-07-04 6:41 ` [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module Jarek Poplawski 2007-07-04 6:47 ` David Miller @ 2007-07-04 7:21 ` Jarek Poplawski 1 sibling, 0 replies; 10+ messages in thread From: Jarek Poplawski @ 2007-07-04 7:21 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, Andrew Morton, David S. Miller, linux-kernel, netdev On Wed, Jul 04, 2007 at 08:41:59AM +0200, Jarek Poplawski wrote: ... > They were done on your request but it looks like Andrew > is waiting on something... Andrew, This time I'm not sorry for my English because I've just found I could speak "Chiefly Midland and Southern U.S.". Jarek P. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-07-04 7:13 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20070701173558.GA207@tv-sign.ru>
2007-07-02 6:34 ` [NETPOLL] netconsole: fix soft lockup when removing module Jarek Poplawski
2007-07-02 9:24 ` Oleg Nesterov
2007-07-02 11:08 ` Jarek Poplawski
2007-07-02 7:52 ` [PATCH] " Jarek Poplawski
2007-07-02 8:59 ` Oleg Nesterov
2007-07-02 10:12 ` [PATCH 2/2][NETPOLL] netconsole: delete flush_scheduled_work Jarek Poplawski
2007-07-04 6:41 ` [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module Jarek Poplawski
2007-07-04 6:47 ` David Miller
2007-07-04 7:08 ` Jarek Poplawski
2007-07-04 7:21 ` Jarek Poplawski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).