* Re: NOHZ: local_softirq_pending 08 [not found] <20091011095217.GA2200@darkstar> @ 2009-10-11 10:08 ` Johannes Berg 2009-10-11 10:17 ` Michael Buesch ` (2 more replies) 0 siblings, 3 replies; 35+ messages in thread From: Johannes Berg @ 2009-10-11 10:08 UTC (permalink / raw) To: Dave Young; +Cc: linux-kernel, tglx, linux-wireless, David S. Miller [-- Attachment #1: Type: text/plain, Size: 881 bytes --] On Sun, 2009-10-11 at 17:52 +0800, Dave Young wrote: > With kernel 2.6.32-rc3-00052-g0eca52a I got following KERN_ERR > messages just while using firefox: > > [ 130.527399] NOHZ: local_softirq_pending 08 > Any idea? or known issue? Are you using b43 (or wl12x1)? If so, it's a known issue, but the driver was recently left without an active maintainer in a brouhaha about a bug fix. Cf. http://thread.gmane.org/gmane.linux.kernel.wireless.general/39440 Absent proof that mac80211 is safe to run with BHs enabled, the correct solution is disabling tasklets around the RX function, unlike all the proposed patches. However, Michael thinks it's such a bad solution that he has refused to implement it. So far, nobody has bothered to fix the drivers. FWIW, I believe the bug to be in b43 and wl12x1, and not as Michael thinks in the stack. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-11 10:08 ` NOHZ: local_softirq_pending 08 Johannes Berg @ 2009-10-11 10:17 ` Michael Buesch 2009-10-11 10:37 ` Johannes Berg 2009-10-11 10:38 ` David Miller 2009-10-11 10:55 ` Dave Young 2009-10-11 11:18 ` Tilman Schmidt 2 siblings, 2 replies; 35+ messages in thread From: Michael Buesch @ 2009-10-11 10:17 UTC (permalink / raw) To: Johannes Berg Cc: Dave Young, linux-kernel, tglx, linux-wireless, David S. Miller On Sunday 11 October 2009 12:08:55 Johannes Berg wrote: > On Sun, 2009-10-11 at 17:52 +0800, Dave Young wrote: > > > With kernel 2.6.32-rc3-00052-g0eca52a I got following KERN_ERR > > messages just while using firefox: > > > > [ 130.527399] NOHZ: local_softirq_pending 08 > > > Any idea? or known issue? > > Are you using b43 (or wl12x1)? If so, it's a known issue, but the driver > was recently left without an active maintainer in a brouhaha about a bug > fix. > > Cf. http://thread.gmane.org/gmane.linux.kernel.wireless.general/39440 > > Absent proof that mac80211 is safe to run with BHs enabled, the correct > solution is disabling tasklets around the RX function, unlike all the > proposed patches. However, Michael thinks it's such a bad solution that > he has refused to implement it. Ehm, no. That's not exactly true. We call the non-_irqsafe functions, which by definition are designed to run in non-irq (soft or hard) context. At least that's how I understand the documentation, last time I read it. Why don't you simply do local_bh_disable() in those functions, if they require bh disabled, instead of depending on the driver doing it? > FWIW, I believe the bug to be in b43 and wl12x1, and not as Michael > thinks in the stack. If mac80211 requires BHs disabled, it should do this. -- Greetings, Michael. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-11 10:17 ` Michael Buesch @ 2009-10-11 10:37 ` Johannes Berg 2009-10-11 10:38 ` David Miller 1 sibling, 0 replies; 35+ messages in thread From: Johannes Berg @ 2009-10-11 10:37 UTC (permalink / raw) To: Michael Buesch Cc: Dave Young, linux-kernel, tglx, linux-wireless, David S. Miller [-- Attachment #1: Type: text/plain, Size: 1366 bytes --] On Sun, 2009-10-11 at 12:17 +0200, Michael Buesch wrote: > Ehm, no. That's not exactly true. > We call the non-_irqsafe functions, which by definition are designed to > run in non-irq (soft or hard) context. At least that's how I understand the > documentation, last time I read it. So maybe the documentation is not entirely accurate. Such happens. From this and previous threads tt's pretty obvious that these functions cannot be called with softirqs enabled. And I've also stated before that I do not believe that we should call them with softirqs enabled without auditing the code for locking, which historically has been a weak point of mac80211. > Why don't you simply do local_bh_disable() in those functions, if they > require bh disabled, instead of depending on the driver doing it? > > > FWIW, I believe the bug to be in b43 and wl12x1, and not as Michael > > thinks in the stack. > > If mac80211 requires BHs disabled, it should do this. I don't believe adding that into mac80211, even though it nests, is a good idea for the case of many drivers where mac80211 and/or the driver knows. It's pretty damn trivial to add two lines of code to the driver, instead of penalising every other driver. The typical kernel style is making things provide the required context, not a function take any possible context. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-11 10:17 ` Michael Buesch 2009-10-11 10:37 ` Johannes Berg @ 2009-10-11 10:38 ` David Miller 1 sibling, 0 replies; 35+ messages in thread From: David Miller @ 2009-10-11 10:38 UTC (permalink / raw) To: mb; +Cc: johannes, hidave.darkstar, linux-kernel, tglx, linux-wireless From: Michael Buesch <mb@bu3sch.de> Date: Sun, 11 Oct 2009 12:17:30 +0200 > On Sunday 11 October 2009 12:08:55 Johannes Berg wrote: >> On Sun, 2009-10-11 at 17:52 +0800, Dave Young wrote: >> >> FWIW, I believe the bug to be in b43 and wl12x1, and not as Michael >> thinks in the stack. > > If mac80211 requires BHs disabled, it should do this. That's overhead, and %99 of drivers do not require it, and therefore for %99 of drivers it's unnecessary overhead. In general we avoid doing things like that. Instead, we put the cost only where it's actually needed. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-11 10:08 ` NOHZ: local_softirq_pending 08 Johannes Berg 2009-10-11 10:17 ` Michael Buesch @ 2009-10-11 10:55 ` Dave Young 2009-10-11 11:18 ` Tilman Schmidt 2 siblings, 0 replies; 35+ messages in thread From: Dave Young @ 2009-10-11 10:55 UTC (permalink / raw) To: Johannes Berg; +Cc: linux-kernel, tglx, linux-wireless, David S. Miller On Sun, Oct 11, 2009 at 6:08 PM, Johannes Berg <johannes@sipsolutions.net> wrote: > On Sun, 2009-10-11 at 17:52 +0800, Dave Young wrote: > >> With kernel 2.6.32-rc3-00052-g0eca52a I got following KERN_ERR >> messages just while using firefox: >> >> [ 130.527399] NOHZ: local_softirq_pending 08 > >> Any idea? or known issue? > > Are you using b43 (or wl12x1)? If so, it's a known issue, but the driver > was recently left without an active maintainer in a brouhaha about a bug > fix. Yes, I'm using b43. I will test the patch you posted in another thread. > > Cf. http://thread.gmane.org/gmane.linux.kernel.wireless.general/39440 > > Absent proof that mac80211 is safe to run with BHs enabled, the correct > solution is disabling tasklets around the RX function, unlike all the > proposed patches. However, Michael thinks it's such a bad solution that > he has refused to implement it. So far, nobody has bothered to fix the > drivers. > > FWIW, I believe the bug to be in b43 and wl12x1, and not as Michael > thinks in the stack. > > johannes > > -- Regards dave ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-11 10:08 ` NOHZ: local_softirq_pending 08 Johannes Berg 2009-10-11 10:17 ` Michael Buesch 2009-10-11 10:55 ` Dave Young @ 2009-10-11 11:18 ` Tilman Schmidt 2009-10-11 11:40 ` Johannes Berg 2 siblings, 1 reply; 35+ messages in thread From: Tilman Schmidt @ 2009-10-11 11:18 UTC (permalink / raw) To: Johannes Berg Cc: Dave Young, linux-kernel, tglx, linux-wireless, David S. Miller On Sun, 11 Oct 2009 12:08:55 +0200, Johannes Berg wrote: > On Sun, 2009-10-11 at 17:52 +0800, Dave Young wrote: > >> With kernel 2.6.32-rc3-00052-g0eca52a I got following KERN_ERR >> messages just while using firefox: >> >> [ 130.527399] NOHZ: local_softirq_pending 08 > >> Any idea? or known issue? > > Are you using b43 (or wl12x1)? If so, it's a known issue, but the driver > was recently left without an active maintainer in a brouhaha about a bug > fix. > > Cf. http://thread.gmane.org/gmane.linux.kernel.wireless.general/39440 Can you explain a bit more what that message is about? I am encountering it in a completely different context (PPP over ISDN) and I would like to know where to start looking for the cause and developing a fix. The thread on linux.kernel.wireless.general only seems to address the specific situation in the wireless stack. Thanks, Tilman ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-11 11:18 ` Tilman Schmidt @ 2009-10-11 11:40 ` Johannes Berg 2009-10-12 8:28 ` Tilman Schmidt 0 siblings, 1 reply; 35+ messages in thread From: Johannes Berg @ 2009-10-11 11:40 UTC (permalink / raw) To: Tilman Schmidt Cc: Dave Young, linux-kernel, tglx, linux-wireless, David S. Miller [-- Attachment #1: Type: text/plain, Size: 468 bytes --] On Sun, 2009-10-11 at 13:18 +0200, Tilman Schmidt wrote: > Can you explain a bit more what that message is about? > I am encountering it in a completely different context > (PPP over ISDN) and I would like to know where to start > looking for the cause and developing a fix. The thread > on linux.kernel.wireless.general only seems to address > the specific situation in the wireless stack. Basically, calling netif_rx() with softirqs enabled. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-11 11:40 ` Johannes Berg @ 2009-10-12 8:28 ` Tilman Schmidt 2009-10-12 10:32 ` David Miller 0 siblings, 1 reply; 35+ messages in thread From: Tilman Schmidt @ 2009-10-12 8:28 UTC (permalink / raw) To: Johannes Berg Cc: Dave Young, linux-kernel, tglx, linux-wireless, David S. Miller, linux-ppp, netdev, Paul Mackerras -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 [CCing PPP people] Am 11.10.2009 13:40 schrieb Johannes Berg: > On Sun, 2009-10-11 at 13:18 +0200, Tilman Schmidt wrote: > >> Can you explain a bit more what that message is about? >> I am encountering it in a completely different context >> (PPP over ISDN) [...] > > Basically, calling netif_rx() with softirqs enabled. AFAICS that would have to be the netif_rx() call in ppp_receive_nonmp_frame() [drivers/net/ppp_generic.c#L1791], called (via others) from the tasklet work function ppp_sync_process() [drivers/net/ppp_synctty.c#L545]. Should that be changed to the "if (in_interrupt()) netif_rx(skb) else netif_rx_ni(skb)" stanza from the linux.kernel.wireless.general thread then? Thanks, Tilman - -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.4 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFK0ujIQ3+did9BuFsRAtGBAJ9rj2pyQZ75ZKTipLhpICqA3ZvTdQCdHQs/ RmdeOT3TuPZykXJxcHLJCzU= =87DI -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-12 8:28 ` Tilman Schmidt @ 2009-10-12 10:32 ` David Miller 2009-10-12 11:25 ` Tilman Schmidt 0 siblings, 1 reply; 35+ messages in thread From: David Miller @ 2009-10-12 10:32 UTC (permalink / raw) To: tilman Cc: johannes, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus From: Tilman Schmidt <tilman@imap.cc> Date: Mon, 12 Oct 2009 10:28:56 +0200 > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > [CCing PPP people] > > Am 11.10.2009 13:40 schrieb Johannes Berg: >> On Sun, 2009-10-11 at 13:18 +0200, Tilman Schmidt wrote: >> >>> Can you explain a bit more what that message is about? >>> I am encountering it in a completely different context >>> (PPP over ISDN) [...] >> >> Basically, calling netif_rx() with softirqs enabled. > > AFAICS that would have to be the netif_rx() call in > ppp_receive_nonmp_frame() [drivers/net/ppp_generic.c#L1791], > called (via others) from the tasklet work function > ppp_sync_process() [drivers/net/ppp_synctty.c#L545]. > Should that be changed to the > "if (in_interrupt()) netif_rx(skb) else netif_rx_ni(skb)" > stanza from the linux.kernel.wireless.general thread then? The PPP receive paths in ppp_generic.c do a local_bh_disable()/ local_bh_enable() around packet receiving (via ppp_recv_lock()/ ppp_recv_unlock() in ppp_do_recv). So at least that part is perfectly fine. ppp_input(), as called from ppp_sync_process(), also disables BH's around ppp_do_recv() calls (via read_lock_bh()/read_unlock_bh()). So that's fine too. Do you have a bug report or are you just scanning around looking for trouble? :-) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-12 10:32 ` David Miller @ 2009-10-12 11:25 ` Tilman Schmidt 2009-10-15 11:40 ` Jarek Poplawski 0 siblings, 1 reply; 35+ messages in thread From: Tilman Schmidt @ 2009-10-12 11:25 UTC (permalink / raw) To: David Miller Cc: tilman, johannes, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, 12 Oct 2009 03:32:46 -0700 (PDT), David Miller wrote: > The PPP receive paths in ppp_generic.c do a local_bh_disable()/ > local_bh_enable() around packet receiving (via ppp_recv_lock()/ > ppp_recv_unlock() in ppp_do_recv). > > So at least that part is perfectly fine. > > ppp_input(), as called from ppp_sync_process(), also disables BH's > around ppp_do_recv() calls (via read_lock_bh()/read_unlock_bh()). > > So that's fine too. > > Do you have a bug report or are you just scanning around looking > for trouble? :-) I have encountered the message in the subject during a test of the Gigaset CAPI driver, and would like to determine whether it's a bug in the driver, a bug somewhere else, or no bug at all. The test scenario was PPP over ISDN with pppd+capiplugin. In an alternative scenario, also PPP over ISDN but with smpppd+capidrv, the message did not occur. Johannes' answer pointed me to the netif_rx() function. The Gigaset driver itself doesn't call that function at all. In the scenario where I saw the message, it was the SYNC_PPP line discipline that did. But from your explanation I gather that the cause cannot lie there. So now I'm looking for other possible causes of that message. - -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.4 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFK0xITQ3+did9BuFsRAmXGAKCIiqJffUnuKw9rPjxHwbj9AnXOigCdGgS9 MpxRNGs0A4GTydYJaylbjyo= =LFxi -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-12 11:25 ` Tilman Schmidt @ 2009-10-15 11:40 ` Jarek Poplawski 2009-10-15 17:53 ` Jarek Poplawski 0 siblings, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2009-10-15 11:40 UTC (permalink / raw) To: Tilman Schmidt Cc: David Miller, johannes, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus On 12-10-2009 13:25, Tilman Schmidt wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Mon, 12 Oct 2009 03:32:46 -0700 (PDT), David Miller wrote: >> The PPP receive paths in ppp_generic.c do a local_bh_disable()/ >> local_bh_enable() around packet receiving (via ppp_recv_lock()/ >> ppp_recv_unlock() in ppp_do_recv). >> >> So at least that part is perfectly fine. >> >> ppp_input(), as called from ppp_sync_process(), also disables BH's >> around ppp_do_recv() calls (via read_lock_bh()/read_unlock_bh()). >> >> So that's fine too. >> >> Do you have a bug report or are you just scanning around looking >> for trouble? :-) > > I have encountered the message in the subject during a test of > the Gigaset CAPI driver, and would like to determine whether > it's a bug in the driver, a bug somewhere else, or no bug at > all. The test scenario was PPP over ISDN with pppd+capiplugin. > In an alternative scenario, also PPP over ISDN but with > smpppd+capidrv, the message did not occur. > > Johannes' answer pointed me to the netif_rx() function. > The Gigaset driver itself doesn't call that function at all. > In the scenario where I saw the message, it was the SYNC_PPP > line discipline that did. But from your explanation I gather > that the cause cannot lie there. > > So now I'm looking for other possible causes of that message. Anyway, I agree with Michael Buesch there is no reason to waste time for tracking all netif_rx vs netif_rx_ni uses, and it seems we could avoid it by using the "proper" version of raise_softirq_irqoff() in __napi_schedule(). Could anybody try if I'm not wrong? Thanks, Jarek P. --- net/core/dev.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 28b0b9e..7fc4009 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n) local_irq_save(flags); list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list); - __raise_softirq_irqoff(NET_RX_SOFTIRQ); + raise_softirq_irqoff(NET_RX_SOFTIRQ); local_irq_restore(flags); } EXPORT_SYMBOL(__napi_schedule); ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-15 11:40 ` Jarek Poplawski @ 2009-10-15 17:53 ` Jarek Poplawski 2009-10-21 18:46 ` Tilman Schmidt 0 siblings, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2009-10-15 17:53 UTC (permalink / raw) To: Tilman Schmidt Cc: David Miller, johannes, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus Jarek Poplawski wrote, On 10/15/2009 01:40 PM: > On 12-10-2009 13:25, Tilman Schmidt wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On Mon, 12 Oct 2009 03:32:46 -0700 (PDT), David Miller wrote: >>> The PPP receive paths in ppp_generic.c do a local_bh_disable()/ >>> local_bh_enable() around packet receiving (via ppp_recv_lock()/ >>> ppp_recv_unlock() in ppp_do_recv). >>> >>> So at least that part is perfectly fine. >>> >>> ppp_input(), as called from ppp_sync_process(), also disables BH's >>> around ppp_do_recv() calls (via read_lock_bh()/read_unlock_bh()). >>> >>> So that's fine too. >>> >>> Do you have a bug report or are you just scanning around looking >>> for trouble? :-) >> I have encountered the message in the subject during a test of >> the Gigaset CAPI driver, and would like to determine whether >> it's a bug in the driver, a bug somewhere else, or no bug at >> all. The test scenario was PPP over ISDN with pppd+capiplugin. >> In an alternative scenario, also PPP over ISDN but with >> smpppd+capidrv, the message did not occur. >> >> Johannes' answer pointed me to the netif_rx() function. >> The Gigaset driver itself doesn't call that function at all. >> In the scenario where I saw the message, it was the SYNC_PPP >> line discipline that did. But from your explanation I gather >> that the cause cannot lie there. >> >> So now I'm looking for other possible causes of that message. BTW, it seems calling napi_schedule() from process context should trigger such a warning too. Jarek P. > > Anyway, I agree with Michael Buesch there is no reason to waste time > for tracking all netif_rx vs netif_rx_ni uses, and it seems we could > avoid it by using the "proper" version of raise_softirq_irqoff() in > __napi_schedule(). Could anybody try if I'm not wrong? > > Thanks, > Jarek P. > --- > > net/core/dev.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 28b0b9e..7fc4009 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n) > > local_irq_save(flags); > list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list); > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > + raise_softirq_irqoff(NET_RX_SOFTIRQ); > local_irq_restore(flags); > } > EXPORT_SYMBOL(__napi_schedule); > -- > To unsubscribe from this list: send the line "unsubscribe linux-ppp" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-15 17:53 ` Jarek Poplawski @ 2009-10-21 18:46 ` Tilman Schmidt 2009-10-21 21:19 ` [PATCH] net: Adjust softirq raising in __napi_schedule Jarek Poplawski 2009-10-22 23:37 ` NOHZ: local_softirq_pending 08 Tilman Schmidt 0 siblings, 2 replies; 35+ messages in thread From: Tilman Schmidt @ 2009-10-21 18:46 UTC (permalink / raw) To: Jarek Poplawski Cc: David Miller, johannes, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus [-- Attachment #1: Type: text/plain, Size: 2202 bytes --] On 15.10.2009 19:53 Jarek Poplawski wrote: > Jarek Poplawski wrote, On 10/15/2009 01:40 PM: > >> On 12-10-2009 13:25, Tilman Schmidt wrote: >>> I have encountered the message in the subject during a test of >>> the Gigaset CAPI driver, and would like to determine whether >>> it's a bug in the driver, a bug somewhere else, or no bug at >>> all. The test scenario was PPP over ISDN with pppd+capiplugin. >>> In an alternative scenario, also PPP over ISDN but with >>> smpppd+capidrv, the message did not occur. I'm sorry, I had confused the two cases. The message occurs in the smpppd+capidrv scenario, not with pppd+capiplugin. >>> Johannes' answer pointed me to the netif_rx() function. >>> The Gigaset driver itself doesn't call that function at all. >>> In the scenario where I saw the message, it was the SYNC_PPP >>> line discipline that did. This analysis was therefore wrong. It would be the netif_rx() call towards the end of isdn_ppp_push_higher() in drivers/isdn/i4l/isdn_ppp.c L1177. >> Anyway, I agree with Michael Buesch there is no reason to waste time >> for tracking all netif_rx vs netif_rx_ni uses, and it seems we could >> avoid it by using the "proper" version of raise_softirq_irqoff() in >> __napi_schedule(). Could anybody try if I'm not wrong? >> >> net/core/dev.c | 2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >> diff --git a/net/core/dev.c b/net/core/dev.c >> index 28b0b9e..7fc4009 100644 >> --- a/net/core/dev.c >> +++ b/net/core/dev.c >> @@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n) >> >> local_irq_save(flags); >> list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list); >> - __raise_softirq_irqoff(NET_RX_SOFTIRQ); >> + raise_softirq_irqoff(NET_RX_SOFTIRQ); >> local_irq_restore(flags); >> } >> EXPORT_SYMBOL(__napi_schedule); I have tested your patch and I can confirm that it fixes the messages. I have not noticed any ill effects. Thanks, Tilman -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-21 18:46 ` Tilman Schmidt @ 2009-10-21 21:19 ` Jarek Poplawski 2009-10-21 21:25 ` Johannes Berg 2009-10-22 23:37 ` NOHZ: local_softirq_pending 08 Tilman Schmidt 1 sibling, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2009-10-21 21:19 UTC (permalink / raw) To: Tilman Schmidt Cc: David Miller, johannes, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp On Wed, Oct 21, 2009 at 08:46:40PM +0200, Tilman Schmidt wrote: ... > I have tested your patch and I can confirm that it fixes the messages. > I have not noticed any ill effects. OK. So, in any case, here is this next variant/proposal. Thanks, Jarek P. ------------------------> net: Adjust softirq raising in __napi_schedule This patch changes __raise_softirq_irqoff() to raise_softirq_irqoff() in __napi_schedule() to enable proper softirq scheduling from process context. The main intent is to let use netif_rx() universally, and make netif_rx_ni() redundant. Currently using netif_rx() instead of netif_rx_ni() triggers: "NOHZ: local_softirq_pending 08" warnings, but additional cost of one "if" on the fast path doesn't seem to justify maintaining it separately. This patch is based on the analysis, suggestions and the original patch for mac80211 by: Michael Buesch <mb@bu3sch.de> Another patch calling netif_rx variants conditionally was done by: Oliver Hartkopp <oliver@hartkopp.net> Reported-by: Michael Buesch <mb@bu3sch.de> Reported-by: Oliver Hartkopp <oliver@hartkopp.net> Reported-by: Tilman Schmidt <tilman@imap.cc> Diagnosed-by: Michael Buesch <mb@bu3sch.de> Tested-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> --- net/core/dev.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 28b0b9e..7fc4009 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n) local_irq_save(flags); list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list); - __raise_softirq_irqoff(NET_RX_SOFTIRQ); + raise_softirq_irqoff(NET_RX_SOFTIRQ); local_irq_restore(flags); } EXPORT_SYMBOL(__napi_schedule); ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-21 21:19 ` [PATCH] net: Adjust softirq raising in __napi_schedule Jarek Poplawski @ 2009-10-21 21:25 ` Johannes Berg 2009-10-21 21:37 ` Tilman Schmidt 2009-10-21 21:39 ` Jarek Poplawski 0 siblings, 2 replies; 35+ messages in thread From: Johannes Berg @ 2009-10-21 21:25 UTC (permalink / raw) To: Jarek Poplawski Cc: Tilman Schmidt, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp [-- Attachment #1: Type: text/plain, Size: 2238 bytes --] On Wed, 2009-10-21 at 23:19 +0200, Jarek Poplawski wrote: > On Wed, Oct 21, 2009 at 08:46:40PM +0200, Tilman Schmidt wrote: > ... > > I have tested your patch and I can confirm that it fixes the messages. > > I have not noticed any ill effects. > > OK. So, in any case, here is this next variant/proposal. > > Thanks, > Jarek P. > ------------------------> > net: Adjust softirq raising in __napi_schedule > > This patch changes __raise_softirq_irqoff() to raise_softirq_irqoff() > in __napi_schedule() to enable proper softirq scheduling from process > context. The main intent is to let use netif_rx() universally, and > make netif_rx_ni() redundant. > > Currently using netif_rx() instead of netif_rx_ni() triggers: > "NOHZ: local_softirq_pending 08" warnings, but additional cost of one > "if" on the fast path doesn't seem to justify maintaining it > separately. > > This patch is based on the analysis, suggestions and the original > patch for mac80211 by: Michael Buesch <mb@bu3sch.de> > > Another patch calling netif_rx variants conditionally was done by: > Oliver Hartkopp <oliver@hartkopp.net> > > Reported-by: Michael Buesch <mb@bu3sch.de> > Reported-by: Oliver Hartkopp <oliver@hartkopp.net> > Reported-by: Tilman Schmidt <tilman@imap.cc> > Diagnosed-by: Michael Buesch <mb@bu3sch.de> > Tested-by: Tilman Schmidt <tilman@imap.cc> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > --- > > net/core/dev.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 28b0b9e..7fc4009 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n) > > local_irq_save(flags); > list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list); > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > + raise_softirq_irqoff(NET_RX_SOFTIRQ); This still doesn't make any sense. There may or may not be a lot of code that assumes that everything else is run with other tasklets disabled, and that it cannot be interrupted by a tasklet and thus create a race. Can you prove that is not the case, across the entire networking layer? johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-21 21:25 ` Johannes Berg @ 2009-10-21 21:37 ` Tilman Schmidt 2009-10-21 21:39 ` Jarek Poplawski 1 sibling, 0 replies; 35+ messages in thread From: Tilman Schmidt @ 2009-10-21 21:37 UTC (permalink / raw) To: Johannes Berg Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp [-- Attachment #1: Type: text/plain, Size: 574 bytes --] Johannes Berg schrieb: > This still doesn't make any sense. > > There may or may not be a lot of code that assumes that everything else > is run with other tasklets disabled, and that it cannot be interrupted > by a tasklet and thus create a race. > > Can you prove that is not the case, across the entire networking layer? So what's the solution you propose? -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-21 21:25 ` Johannes Berg 2009-10-21 21:37 ` Tilman Schmidt @ 2009-10-21 21:39 ` Jarek Poplawski 2009-10-22 8:27 ` Johannes Berg 2009-10-22 11:29 ` David Miller 1 sibling, 2 replies; 35+ messages in thread From: Jarek Poplawski @ 2009-10-21 21:39 UTC (permalink / raw) To: Johannes Berg Cc: Tilman Schmidt, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp On Thu, Oct 22, 2009 at 06:25:30AM +0900, Johannes Berg wrote: > On Wed, 2009-10-21 at 23:19 +0200, Jarek Poplawski wrote: ... > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n) > > > > local_irq_save(flags); > > list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list); > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > + raise_softirq_irqoff(NET_RX_SOFTIRQ); > > This still doesn't make any sense. > > There may or may not be a lot of code that assumes that everything else > is run with other tasklets disabled, and that it cannot be interrupted > by a tasklet and thus create a race. > > Can you prove that is not the case, across the entire networking layer? I'm not sure I can understand your question. This patch is mainly to avoid using netif_rx()/netif_rx_ni() pair as a test of proper process context handling; IMHO there're better tools for this (lockdep, WARN_ON's). Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-21 21:39 ` Jarek Poplawski @ 2009-10-22 8:27 ` Johannes Berg 2009-10-23 14:39 ` Tilman Schmidt 2009-10-22 11:29 ` David Miller 1 sibling, 1 reply; 35+ messages in thread From: Johannes Berg @ 2009-10-22 8:27 UTC (permalink / raw) To: Jarek Poplawski Cc: Tilman Schmidt, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp [-- Attachment #1: Type: text/plain, Size: 1228 bytes --] On Wed, 2009-10-21 at 23:39 +0200, Jarek Poplawski wrote: > > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > > + raise_softirq_irqoff(NET_RX_SOFTIRQ); > > > > This still doesn't make any sense. > > > > There may or may not be a lot of code that assumes that everything else > > is run with other tasklets disabled, and that it cannot be interrupted > > by a tasklet and thus create a race. > > > > Can you prove that is not the case, across the entire networking layer? > > I'm not sure I can understand your question. This patch is mainly to > avoid using netif_rx()/netif_rx_ni() pair as a test of proper process > context handling; IMHO there're better tools for this (lockdep, > WARN_ON's). And how exactly does that matter to the patch at hand?! I'm saying that it seems to me, as indicated by the API (and without proof otherwise that's how it is) the networking layer needs to have packets handed to it with softirqs disabled. Therefore, this patch is not needed. While it may not be _wrong_, it'll definitely introduce a performance regression. This really should be obvious. You're fixing the warning at the source of the warning, rather than the source of the problem. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-22 8:27 ` Johannes Berg @ 2009-10-23 14:39 ` Tilman Schmidt 2009-10-23 14:46 ` Johannes Berg 0 siblings, 1 reply; 35+ messages in thread From: Tilman Schmidt @ 2009-10-23 14:39 UTC (permalink / raw) To: Johannes Berg Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Johannes Berg schrieb: > On Wed, 2009-10-21 at 23:39 +0200, Jarek Poplawski wrote: > >> I'm not sure I can understand your question. This patch is mainly to >> avoid using netif_rx()/netif_rx_ni() pair as a test of proper process >> context handling; IMHO there're better tools for this (lockdep, >> WARN_ON's). > > I'm saying that it seems to me, as indicated by the API (and without > proof otherwise that's how it is) the networking layer needs to have > packets handed to it with softirqs disabled. Strange. Then what are the two separate functions netif_rx() and netif_rx_ni() for? > This really should be obvious. You're fixing the warning at the source > of the warning, rather than the source of the problem. Good idea. So please do tell us where the source of the problem is. Thanks, Tilman - -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.4 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFK4cALQ3+did9BuFsRAnW8AKCP4ey+gT2RZBYpzx91PaXd11A/PwCgh35g fhEbJs++1BRIQ3encV8fIm4= =SSaA -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-23 14:39 ` Tilman Schmidt @ 2009-10-23 14:46 ` Johannes Berg 2009-10-26 7:41 ` Jarek Poplawski 0 siblings, 1 reply; 35+ messages in thread From: Johannes Berg @ 2009-10-23 14:46 UTC (permalink / raw) To: Tilman Schmidt Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp [-- Attachment #1: Type: text/plain, Size: 505 bytes --] On Fri, 2009-10-23 at 16:39 +0200, Tilman Schmidt wrote: > Strange. Then what are the two separate functions netif_rx() and > netif_rx_ni() for? netif_rx_ni() disables preemption. > > This really should be obvious. You're fixing the warning at the source > > of the warning, rather than the source of the problem. > > Good idea. So please do tell us where the source of the problem is. You use netif_rx_ni() instead of netif_rx() at whatever place that causes this problem. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-23 14:46 ` Johannes Berg @ 2009-10-26 7:41 ` Jarek Poplawski 2009-10-26 7:44 ` Johannes Berg 0 siblings, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2009-10-26 7:41 UTC (permalink / raw) To: Johannes Berg Cc: Tilman Schmidt, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp On Fri, Oct 23, 2009 at 04:46:31PM +0200, Johannes Berg wrote: > On Fri, 2009-10-23 at 16:39 +0200, Tilman Schmidt wrote: > > > Strange. Then what are the two separate functions netif_rx() and > > netif_rx_ni() for? > > netif_rx_ni() disables preemption. You wrote earlier: > [...] the networking layer needs to have > packets handed to it with softirqs disabled. How disabling preemption can fix something which needs softirqs disabled? Could you be more precise? > > > This really should be obvious. You're fixing the warning at the source > > > of the warning, rather than the source of the problem. > > > > Good idea. So please do tell us where the source of the problem is. > > You use netif_rx_ni() instead of netif_rx() at whatever place that > causes this problem. This isn't a very precise description either. Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-26 7:41 ` Jarek Poplawski @ 2009-10-26 7:44 ` Johannes Berg 2009-10-26 7:54 ` Jarek Poplawski 0 siblings, 1 reply; 35+ messages in thread From: Johannes Berg @ 2009-10-26 7:44 UTC (permalink / raw) To: Jarek Poplawski Cc: Tilman Schmidt, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp [-- Attachment #1: Type: text/plain, Size: 474 bytes --] On Mon, 2009-10-26 at 07:41 +0000, Jarek Poplawski wrote: > > netif_rx_ni() disables preemption. > > You wrote earlier: > > > [...] the networking layer needs to have > > packets handed to it with softirqs disabled. > > How disabling preemption can fix something which needs softirqs > disabled? Could you be more precise? No, I wrote that I didn't know. I suppose now that I looked at it I do know, and only disabling preemption is required. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-26 7:44 ` Johannes Berg @ 2009-10-26 7:54 ` Jarek Poplawski 2009-10-26 7:58 ` Johannes Berg 0 siblings, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2009-10-26 7:54 UTC (permalink / raw) To: Johannes Berg Cc: Tilman Schmidt, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp On Mon, Oct 26, 2009 at 08:44:14AM +0100, Johannes Berg wrote: > On Mon, 2009-10-26 at 07:41 +0000, Jarek Poplawski wrote: > > > > netif_rx_ni() disables preemption. > > > > You wrote earlier: > > > > > [...] the networking layer needs to have > > > packets handed to it with softirqs disabled. > > > > How disabling preemption can fix something which needs softirqs > > disabled? Could you be more precise? > > No, I wrote that I didn't know. I suppose now that I looked at it I do > know, and only disabling preemption is required. But netif_rx has preemption disabled most of the time (by hardirqs disabling). So maybe disabling preemption isn't the main reason here either? Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-26 7:54 ` Jarek Poplawski @ 2009-10-26 7:58 ` Johannes Berg 2009-10-26 8:47 ` Tilman Schmidt 0 siblings, 1 reply; 35+ messages in thread From: Johannes Berg @ 2009-10-26 7:58 UTC (permalink / raw) To: Jarek Poplawski Cc: Tilman Schmidt, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp [-- Attachment #1: Type: text/plain, Size: 765 bytes --] On Mon, 2009-10-26 at 07:54 +0000, Jarek Poplawski wrote: > > No, I wrote that I didn't know. I suppose now that I looked at it I do > > know, and only disabling preemption is required. > > But netif_rx has preemption disabled most of the time (by hardirqs > disabling). So maybe disabling preemption isn't the main reason here > either? Not for netpoll though, which may or may not be relevant (if I were to venture a guess I'd say it isn't and it disables preemption to be able to do the softirq thing) However, I lost track now of why we're discussing this. Basically it boils down to using netif_rx() when in (soft)irq, and netif_rx_ni() when in process context. That could just be an optimisation, but it's a very valid one. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-26 7:58 ` Johannes Berg @ 2009-10-26 8:47 ` Tilman Schmidt 2009-10-26 8:56 ` Johannes Berg 0 siblings, 1 reply; 35+ messages in thread From: Tilman Schmidt @ 2009-10-26 8:47 UTC (permalink / raw) To: Johannes Berg Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Am 26.10.2009 09:58 schrieb Johannes Berg: > On Mon, 2009-10-26 at 07:54 +0000, Jarek Poplawski wrote: > >>> No, I wrote that I didn't know. I suppose now that I looked at it I do >>> know, and only disabling preemption is required. >> But netif_rx has preemption disabled most of the time (by hardirqs >> disabling). So maybe disabling preemption isn't the main reason here >> either? > > Not for netpoll though, which may or may not be relevant (if I were to > venture a guess I'd say it isn't and it disables preemption to be able > to do the softirq thing) > > However, I lost track now of why we're discussing this. The starting point were several reports of the kernel message: NOHZ: local_softirq_pending 08 Originally most if not all of them came from wireless networking, but I muddied the waters by adding to the mix a case involving ISDN. You stated that all the solutions proposed so far were wrong, so we're naturally turning to you for guidance on what the right solution might be. > Basically it boils down to using netif_rx() when in (soft)irq, and > netif_rx_ni() when in process context. That could just be an > optimisation, but it's a very valid one. Hmmm. That seems to contradict your earlier statement to me that simply replacing a call to netif_rx() by one to netif_rx_ni() when not in interrupt context isn't a valid solution either. What am I missing? Thanks, Tilman - -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.4 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFK5WIXQ3+did9BuFsRAsj1AJ0e4VJ7Nsp69ROXCiT4oM/Q6lIpSwCfZvXS 4nV4tWTIzgk4IRlCt0CCF3Y= =r15I -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-26 8:47 ` Tilman Schmidt @ 2009-10-26 8:56 ` Johannes Berg 2009-10-27 0:52 ` Tilman Schmidt 0 siblings, 1 reply; 35+ messages in thread From: Johannes Berg @ 2009-10-26 8:56 UTC (permalink / raw) To: Tilman Schmidt Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp [-- Attachment #1: Type: text/plain, Size: 1582 bytes --] On Mon, 2009-10-26 at 10:47 +0200, Tilman Schmidt wrote: > > However, I lost track now of why we're discussing this. > > The starting point were several reports of the kernel message: > > NOHZ: local_softirq_pending 08 > > Originally most if not all of them came from wireless networking, > but I muddied the waters by adding to the mix a case involving ISDN. > You stated that all the solutions proposed so far were wrong, so > we're naturally turning to you for guidance on what the right > solution might be. Right. Sorry about getting lost here. > > Basically it boils down to using netif_rx() when in (soft)irq, and > > netif_rx_ni() when in process context. That could just be an > > optimisation, but it's a very valid one. > > Hmmm. That seems to contradict your earlier statement to me that > simply replacing a call to netif_rx() by one to netif_rx_ni() > when not in interrupt context isn't a valid solution either. > What am I missing? Well, I think you misunderstood me. It would be correct to do this, if and only if the code that calls it doesn't need the extra guarantee. Any code (say ISDN code) that calls netif_rx() is clearly assuming to always be running in (soft)irq context, otherwise it couldn't call netif_rx() unconditionally. Agree so far? So now if you change the ISDN code to call netif_rx_ni(), you've changed the assumption that the ISDN code makes -- that it is running in (soft)irq context. Therefore, you need to verify that this is actually a correct change, which is what I tried to say. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-26 8:56 ` Johannes Berg @ 2009-10-27 0:52 ` Tilman Schmidt 2009-10-27 7:01 ` Johannes Berg 0 siblings, 1 reply; 35+ messages in thread From: Tilman Schmidt @ 2009-10-27 0:52 UTC (permalink / raw) To: Johannes Berg Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp [-- Attachment #1: Type: text/plain, Size: 2335 bytes --] Am 26.10.2009 09:56 schrieb Johannes Berg: > On Mon, 2009-10-26 at 10:47 +0200, Tilman Schmidt wrote: > >>> Basically it boils down to using netif_rx() when in (soft)irq, and >>> netif_rx_ni() when in process context. That could just be an >>> optimisation, but it's a very valid one. >> Hmmm. That seems to contradict your earlier statement to me that >> simply replacing a call to netif_rx() by one to netif_rx_ni() >> when not in interrupt context isn't a valid solution either. >> What am I missing? > > Well, I think you misunderstood me. It would be correct to do this, if > and only if the code that calls it doesn't need the extra guarantee. I see. Thanks for the clarification. > Any code (say ISDN code) that calls netif_rx() is clearly assuming to > always be running in (soft)irq context, otherwise it couldn't call > netif_rx() unconditionally. Agree so far? Well, in fact I'm not sure. :-) All I know is that in the ISDN case, no such assumption is explicitly stated anywhere. (The code in question is called from the rcvcallb_skb() callback method which the hardware driver calls when data has been received, and the description of that method in Documentation/isdn/INTERFACE does not say anything about the context in which it may be called.) The relevant code in drivers/isdn/i4l/isdn_ppp.c is rather old, perhaps even older than softirqs and the netif_rx() / netif_rx_ni() split. (Bear in mind that we are talking about the old ISDN4Linux subsystem which initially didn't even make it into the 2.6 series because it was considered obsolete.) It seems quite possible to me that just no one ever thought about that question. > So now if you change the ISDN code to call netif_rx_ni(), you've changed > the assumption that the ISDN code makes -- that it is running in > (soft)irq context. Therefore, you need to verify that this is actually a > correct change, which is what I tried to say. Understood. However, the fact that the local_softirq_pending message is appearing would seem to indicate that this assumption was wrong to begin with, wouldn't it? Thanks, Tilman -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-27 0:52 ` Tilman Schmidt @ 2009-10-27 7:01 ` Johannes Berg 0 siblings, 0 replies; 35+ messages in thread From: Johannes Berg @ 2009-10-27 7:01 UTC (permalink / raw) To: Tilman Schmidt Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch, Oliver Hartkopp [-- Attachment #1: Type: text/plain, Size: 1706 bytes --] On Tue, 2009-10-27 at 01:52 +0100, Tilman Schmidt wrote: > > Any code (say ISDN code) that calls netif_rx() is clearly assuming to > > always be running in (soft)irq context, otherwise it couldn't call > > netif_rx() unconditionally. Agree so far? > > Well, in fact I'm not sure. :-) All I know is that in the ISDN case, no > such assumption is explicitly stated anywhere. (The code in question is > called from the rcvcallb_skb() callback method which the hardware driver > calls when data has been received, and the description of that method in > Documentation/isdn/INTERFACE does not say anything about the context in > which it may be called.) The relevant code in drivers/isdn/i4l/isdn_ppp.c > is rather old, perhaps even older than softirqs and the netif_rx() / > netif_rx_ni() split. (Bear in mind that we are talking about the old > ISDN4Linux subsystem which initially didn't even make it into the 2.6 > series because it was considered obsolete.) It seems quite possible to me > that just no one ever thought about that question. Heh :) > > So now if you change the ISDN code to call netif_rx_ni(), you've changed > > the assumption that the ISDN code makes -- that it is running in > > (soft)irq context. Therefore, you need to verify that this is actually a > > correct change, which is what I tried to say. > > Understood. However, the fact that the local_softirq_pending message is > appearing would seem to indicate that this assumption was wrong to > begin with, wouldn't it? I thought it only recently started appearing with a new driver or something, but I may have misunderstood you. Anyway, I think that sums up the issue from my POV. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-21 21:39 ` Jarek Poplawski 2009-10-22 8:27 ` Johannes Berg @ 2009-10-22 11:29 ` David Miller 2009-10-22 12:54 ` Jarek Poplawski 1 sibling, 1 reply; 35+ messages in thread From: David Miller @ 2009-10-22 11:29 UTC (permalink / raw) To: jarkao2 Cc: johannes, tilman, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, mb, oliver From: Jarek Poplawski <jarkao2@gmail.com> Date: Wed, 21 Oct 2009 23:39:47 +0200 > I'm not sure I can understand your question. This patch is mainly to > avoid using netif_rx()/netif_rx_ni() pair as a test of proper process > context handling; IMHO there're better tools for this (lockdep, > WARN_ON's). Semantically I think your patch is correct, but I wonder about cost. Something that is a simply per-cpu inline "or" operation is now a function call and potentially mispredicted branch inside of raise_softirq_irqoff(). And netif_rx() is indeed a fast path for tunnels and other users so this does matter. I like having people call things in the correct context the function was built for, and thus we can avoiryd completely useless operations and tests as we can now in netif_rx(). Makaing things general purpose costs something, and it costs too much here for this critical routine, sorry. I was just having a talk with Nick Piggin about these kinds of issues today, too few people care about these ever encrouching tiny pieces of bloat that slow the kernel down gradually over time, and I simply won't stand for it when I notice it :-) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] net: Adjust softirq raising in __napi_schedule 2009-10-22 11:29 ` David Miller @ 2009-10-22 12:54 ` Jarek Poplawski 0 siblings, 0 replies; 35+ messages in thread From: Jarek Poplawski @ 2009-10-22 12:54 UTC (permalink / raw) To: David Miller Cc: johannes, tilman, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, mb, oliver On Thu, Oct 22, 2009 at 04:29:39AM -0700, David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Wed, 21 Oct 2009 23:39:47 +0200 > > > I'm not sure I can understand your question. This patch is mainly to > > avoid using netif_rx()/netif_rx_ni() pair as a test of proper process > > context handling; IMHO there're better tools for this (lockdep, > > WARN_ON's). > > Semantically I think your patch is correct, but I wonder about cost. > > Something that is a simply per-cpu inline "or" operation is now a > function call and potentially mispredicted branch inside of > raise_softirq_irqoff(). > > And netif_rx() is indeed a fast path for tunnels and other users so > this does matter. > > I like having people call things in the correct context the function > was built for, and thus we can avoiryd completely useless operations and > tests as we can now in netif_rx(). I like it too, but in this particular case I'm not sure netif_rx() functionality requires this kind of separation; it looks to me quite similarly to e.g. tasklet_schedule(), the same for process or softirq contexts. > > Makaing things general purpose costs something, and it costs too much > here for this critical routine, sorry. > > I was just having a talk with Nick Piggin about these kinds of issues > today, too few people care about these ever encrouching tiny pieces > of bloat that slow the kernel down gradually over time, and I simply > won't stand for it when I notice it :-) I'm not sure we're saving in the right place. As a matter of fact, whenever I look into kernel/ code I can't see this kind of optimization. There is quite a lot of WARN_ON's and if's. These NOHZ warnings simply show somebody's else debugging triggers far from places where it all started and is quite accidental, while this particular "bug" should've been printed immediately long time ago, if we really cared. Since I understand it's a question of taste, and it's not anything critical, I'm quite OK with staying with the old way (except old bugs, I hope ;-). Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-21 18:46 ` Tilman Schmidt 2009-10-21 21:19 ` [PATCH] net: Adjust softirq raising in __napi_schedule Jarek Poplawski @ 2009-10-22 23:37 ` Tilman Schmidt 2009-10-23 13:34 ` Johannes Berg 1 sibling, 1 reply; 35+ messages in thread From: Tilman Schmidt @ 2009-10-22 23:37 UTC (permalink / raw) To: Jarek Poplawski Cc: David Miller, johannes, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux, i4ldeveloper, Karsten Keil [-- Attachment #1: Type: text/plain, Size: 1719 bytes --] On 21.10.2009 20:46, /me wrote: >>>> I have encountered the message in the subject during a test of >>>> the Gigaset CAPI driver, and would like to determine whether >>>> it's a bug in the driver, a bug somewhere else, or no bug at >>>> all. The test scenario was PPP over ISDN with pppd+capiplugin. >>>> In an alternative scenario, also PPP over ISDN but with >>>> smpppd+capidrv, the message did not occur. > > I'm sorry, I had confused the two cases. The message occurs in > the smpppd+capidrv scenario, not with pppd+capiplugin. > >>>> Johannes' answer pointed me to the netif_rx() function. >>>> The Gigaset driver itself doesn't call that function at all. >>>> In the scenario where I saw the message, it was the SYNC_PPP >>>> line discipline that did. > > This analysis was therefore wrong. It would be the netif_rx() > call towards the end of isdn_ppp_push_higher() in > drivers/isdn/i4l/isdn_ppp.c L1177. Having noticed that, I cooked up the following patch which fixed the messages for me. Comments? (Adding i4l people to the already impressive CC list.) --- a/drivers/isdn/i4l/isdn_ppp.c +++ b/drivers/isdn/i4l/isdn_ppp.c @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff #endif /* CONFIG_IPPP_FILTER */ skb->dev = dev; skb_reset_mac_header(skb); - netif_rx(skb); + if (in_interrupt()) + netif_rx(skb); + else + netif_rx_ni(skb); /* net_dev->local->stats.rx_packets++; done in isdn_net.c */ return; -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-22 23:37 ` NOHZ: local_softirq_pending 08 Tilman Schmidt @ 2009-10-23 13:34 ` Johannes Berg 2009-10-23 14:27 ` Tilman Schmidt 0 siblings, 1 reply; 35+ messages in thread From: Johannes Berg @ 2009-10-23 13:34 UTC (permalink / raw) To: Tilman Schmidt Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux, i4ldeveloper, Karsten Keil [-- Attachment #1: Type: text/plain, Size: 566 bytes --] On Fri, 2009-10-23 at 01:37 +0200, Tilman Schmidt wrote: > --- a/drivers/isdn/i4l/isdn_ppp.c > +++ b/drivers/isdn/i4l/isdn_ppp.c > @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff > #endif /* CONFIG_IPPP_FILTER */ > skb->dev = dev; > skb_reset_mac_header(skb); > - netif_rx(skb); > + if (in_interrupt()) > + netif_rx(skb); > + else > + netif_rx_ni(skb); So you've verified that the entire i4l stack can cope with being called twice on the same CPU, from different contexts? johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-23 13:34 ` Johannes Berg @ 2009-10-23 14:27 ` Tilman Schmidt 2009-10-23 14:31 ` Johannes Berg 0 siblings, 1 reply; 35+ messages in thread From: Tilman Schmidt @ 2009-10-23 14:27 UTC (permalink / raw) To: Johannes Berg Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux, i4ldeveloper, Karsten Keil -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Johannes Berg schrieb: > On Fri, 2009-10-23 at 01:37 +0200, Tilman Schmidt wrote: > >> --- a/drivers/isdn/i4l/isdn_ppp.c >> +++ b/drivers/isdn/i4l/isdn_ppp.c >> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff >> #endif /* CONFIG_IPPP_FILTER */ >> skb->dev = dev; >> skb_reset_mac_header(skb); >> - netif_rx(skb); >> + if (in_interrupt()) >> + netif_rx(skb); >> + else >> + netif_rx_ni(skb); > > So you've verified that the entire i4l stack can cope with being called > twice on the same CPU, from different contexts? What makes you think so? Better yet, what do you propose? Thanks, Tilman - -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.4 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFK4b09Q3+did9BuFsRAqBvAKCbRI0iXQEyK3ztxkGHcqpbcceqbACgkagX JF7nYd152ihp2uemIs/cB54= =YOin -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-23 14:27 ` Tilman Schmidt @ 2009-10-23 14:31 ` Johannes Berg 2009-10-23 16:33 ` Tilman Schmidt 0 siblings, 1 reply; 35+ messages in thread From: Johannes Berg @ 2009-10-23 14:31 UTC (permalink / raw) To: Tilman Schmidt Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux, i4ldeveloper, Karsten Keil [-- Attachment #1: Type: text/plain, Size: 1369 bytes --] On Fri, 2009-10-23 at 16:27 +0200, Tilman Schmidt wrote: > >> --- a/drivers/isdn/i4l/isdn_ppp.c > >> +++ b/drivers/isdn/i4l/isdn_ppp.c > >> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff > >> #endif /* CONFIG_IPPP_FILTER */ > >> skb->dev = dev; > >> skb_reset_mac_header(skb); > >> - netif_rx(skb); > >> + if (in_interrupt()) > >> + netif_rx(skb); > >> + else > >> + netif_rx_ni(skb); > > > > So you've verified that the entire i4l stack can cope with being called > > twice on the same CPU, from different contexts? > > What makes you think so? I thought I'd explained this in my other email. *sigh* You're squelching a warning, which comes from the fact that you're calling something that calls into netif_rx() with softirqs enabled. That would seem to imply that potentially a softirq could at the same time call into that code too. Basically, what happens now is this: disable softirqs call into i4l/ppp i4l/ppp code call netif_rx() more code enable softirqs You're changing it to call into i4l/ppp i4l/ppp code call netif_rx_ni() more code so the entire chunks "i4l/ppp code" and "more code" are now no longer protected against being interrupted by a softirq that runs the same code, maybe for a different device, on the same CPU. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: NOHZ: local_softirq_pending 08 2009-10-23 14:31 ` Johannes Berg @ 2009-10-23 16:33 ` Tilman Schmidt 0 siblings, 0 replies; 35+ messages in thread From: Tilman Schmidt @ 2009-10-23 16:33 UTC (permalink / raw) To: Johannes Berg Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel, tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux, i4ldeveloper, Karsten Keil -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Johannes Berg schrieb: > On Fri, 2009-10-23 at 16:27 +0200, Tilman Schmidt wrote: >> Johannes Berg schrieb: >>> So you've verified that the entire i4l stack can cope with being called >>> twice on the same CPU, from different contexts? >> What makes you think so? > > I thought I'd explained this in my other email. *sigh* [snip] Ah, I see. You misunderstood my posting. I did not propose that patch as a definitive and verified solution, but rather as a request for comments from the people who know and maintain the code in question. I thought that was clear from the facts that - - I didn't include "[PATCH]" in the subject line - - I didn't add a "Signed-off-by" line - - I wrote "fixed the messages", not "solved the problem" - - I explicitly wrote "Comments?" and "Adding i4l people to CC" Apparently all that was still not clear enough. Sorry about that. So let me try to make my concern as explicit as possible: - - The patch I posted had the effect that the test which reliably triggered the local_softirq_pending message before did not do so anymore. - - To me, this seems to indicate that the netif_rx(skb) call in line 1177 of source file drivers/isdn/i4l/isdn_ppp.c is indeed involved in the problem. - - Now I'm asking people who know more than myself about the ramifications of that message (ie., you) and/or the code I narrowed it down to (ie., the ISDN4Linux maintainers - which is why I added them to the CC list) to have a look and determine how to fix the problem properly. - - This would of course include, in finis, the verification you mistakenly assumed I might have done already. I hope that's clear enough. If you have any questions, feel free to ask. Thanks, Tilman - -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.4 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFK4drJQ3+did9BuFsRAmstAJ94UF/LupINlYpjbxzz9xoiN5w34wCfflRz YfR/fXt3HasrxUSP29REOnE= =VQ/C -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2009-10-27 7:01 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20091011095217.GA2200@darkstar>
2009-10-11 10:08 ` NOHZ: local_softirq_pending 08 Johannes Berg
2009-10-11 10:17 ` Michael Buesch
2009-10-11 10:37 ` Johannes Berg
2009-10-11 10:38 ` David Miller
2009-10-11 10:55 ` Dave Young
2009-10-11 11:18 ` Tilman Schmidt
2009-10-11 11:40 ` Johannes Berg
2009-10-12 8:28 ` Tilman Schmidt
2009-10-12 10:32 ` David Miller
2009-10-12 11:25 ` Tilman Schmidt
2009-10-15 11:40 ` Jarek Poplawski
2009-10-15 17:53 ` Jarek Poplawski
2009-10-21 18:46 ` Tilman Schmidt
2009-10-21 21:19 ` [PATCH] net: Adjust softirq raising in __napi_schedule Jarek Poplawski
2009-10-21 21:25 ` Johannes Berg
2009-10-21 21:37 ` Tilman Schmidt
2009-10-21 21:39 ` Jarek Poplawski
2009-10-22 8:27 ` Johannes Berg
2009-10-23 14:39 ` Tilman Schmidt
2009-10-23 14:46 ` Johannes Berg
2009-10-26 7:41 ` Jarek Poplawski
2009-10-26 7:44 ` Johannes Berg
2009-10-26 7:54 ` Jarek Poplawski
2009-10-26 7:58 ` Johannes Berg
2009-10-26 8:47 ` Tilman Schmidt
2009-10-26 8:56 ` Johannes Berg
2009-10-27 0:52 ` Tilman Schmidt
2009-10-27 7:01 ` Johannes Berg
2009-10-22 11:29 ` David Miller
2009-10-22 12:54 ` Jarek Poplawski
2009-10-22 23:37 ` NOHZ: local_softirq_pending 08 Tilman Schmidt
2009-10-23 13:34 ` Johannes Berg
2009-10-23 14:27 ` Tilman Schmidt
2009-10-23 14:31 ` Johannes Berg
2009-10-23 16:33 ` Tilman Schmidt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).