* Re: Lockup with 2.6.9-ac15 related to netconsole [not found] ` <20041217233524.GA11202@electric-eye.fr.zoreil.com> @ 2004-12-20 9:42 ` Mark Broadbent 2004-12-20 21:14 ` Matt Mackall 0 siblings, 1 reply; 20+ messages in thread From: Mark Broadbent @ 2004-12-20 9:42 UTC (permalink / raw) To: romieu; +Cc: mpm, linux-kernel, netdev Francois Romieu said: > Matt Mackall <mpm@selenic.com> : > [...] >> Please try the attached untested, uncompiled patch to add polling to >> r8169: > [...] >> @@ -1839,6 +1842,15 @@ >> } >> #endif >> >> +#ifdef CONFIG_NET_POLL_CONTROLLER >> +static void rtl8169_netpoll(struct net_device *dev) >> +{ >> + disable_irq(dev->irq); >> + rtl8169_interrupt(dev->irq, netdev, NULL); > ^^^^^^ -> should be "dev" > > The r8169 driver in -mm offers netpoll. A patch which syncs the r8169 > driver from 2.6.10-rc3 with current -mm is available at: > http://www.fr.zoreil.com/people/francois/misc/20041218-2.6.10-rc3-r8169.c-test.patch> > Please report success/failure. Cc: netdev@oss.sgi.com is welcome. Exactly the same happens, I still get a 'NMI Watchdog detected LOCKUP' with the r8169 device using the above patch on top of 2.6.10-rc3-bk10. Thanks Mark -- Mark Broadbent <markb@wetlettuce.com> Web: http://www.wetlettuce.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-20 9:42 ` Lockup with 2.6.9-ac15 related to netconsole Mark Broadbent @ 2004-12-20 21:14 ` Matt Mackall 2004-12-21 0:22 ` Francois Romieu 0 siblings, 1 reply; 20+ messages in thread From: Matt Mackall @ 2004-12-20 21:14 UTC (permalink / raw) To: Mark Broadbent; +Cc: romieu, linux-kernel, netdev On Mon, Dec 20, 2004 at 09:42:08AM -0000, Mark Broadbent wrote: > > Exactly the same happens, I still get a 'NMI Watchdog detected LOCKUP' > with the r8169 device using the above patch on top of 2.6.10-rc3-bk10. Ok, that suggests a problem localized to netpoll itself. Do you have spinlock debugging turned on by any chance? -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-20 21:14 ` Matt Mackall @ 2004-12-21 0:22 ` Francois Romieu 2004-12-21 0:55 ` Matt Mackall 0 siblings, 1 reply; 20+ messages in thread From: Francois Romieu @ 2004-12-21 0:22 UTC (permalink / raw) To: Matt Mackall; +Cc: Mark Broadbent, linux-kernel, netdev Matt Mackall <mpm@selenic.com> : > On Mon, Dec 20, 2004 at 09:42:08AM -0000, Mark Broadbent wrote: > > > > Exactly the same happens, I still get a 'NMI Watchdog detected LOCKUP' > > with the r8169 device using the above patch on top of 2.6.10-rc3-bk10. > > Ok, that suggests a problem localized to netpoll itself. Do you have > spinlock debugging turned on by any chance? Any chance of: 1 dev_queue_xmit 2 dev->xmit_lock taken 3 interruption 4 printk 5 netconsole write 6 dev->xmit_lock again 7 lockup ? This is probably the silly question of the day. -- Ueimor ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-21 0:22 ` Francois Romieu @ 2004-12-21 0:55 ` Matt Mackall 2004-12-21 10:23 ` Mark Broadbent 0 siblings, 1 reply; 20+ messages in thread From: Matt Mackall @ 2004-12-21 0:55 UTC (permalink / raw) To: Francois Romieu; +Cc: Mark Broadbent, linux-kernel, netdev On Tue, Dec 21, 2004 at 01:22:18AM +0100, Francois Romieu wrote: > Matt Mackall <mpm@selenic.com> : > > On Mon, Dec 20, 2004 at 09:42:08AM -0000, Mark Broadbent wrote: > > > > > > Exactly the same happens, I still get a 'NMI Watchdog detected LOCKUP' > > > with the r8169 device using the above patch on top of 2.6.10-rc3-bk10. > > > > Ok, that suggests a problem localized to netpoll itself. Do you have > > spinlock debugging turned on by any chance? > > Any chance of: > 1 dev_queue_xmit > 2 dev->xmit_lock taken > 3 interruption > 4 printk > 5 netconsole write > 6 dev->xmit_lock again > 7 lockup > > ? > > This is probably the silly question of the day. Maybe, but the answer isn't obvious to me at the moment as I haven't been thinking about such stuff enough lately. Silly response of the day: Mark, can you try this (again completely untested, but at least compiles) patch? I'm afraid I don't have a proper test rig to reproduce this at the moment. This will attempt to grab the lock, and if it fails, will check for recursion. Then it will try to print a message on the local console, temporarily disabling netconsole to allow the printk to get through.. Index: l/net/core/netpoll.c =================================================================== --- l.orig/net/core/netpoll.c 2004-11-04 10:53:23.388610000 -0800 +++ l/net/core/netpoll.c 2004-12-20 16:45:40.212709000 -0800 @@ -31,6 +31,8 @@ #define MAX_SKBS 32 #define MAX_UDP_CHUNK 1460 +static int netpoll_kill; + static spinlock_t skb_list_lock = SPIN_LOCK_UNLOCKED; static int nr_skbs; static struct sk_buff *skbs; @@ -183,13 +185,24 @@ int status; repeat: - if(!np || !np->dev || !netif_running(np->dev)) { + if(!np || !np->dev || !netif_running(np->dev) || netpoll_kill) { __kfree_skb(skb); return; } - spin_lock(&np->dev->xmit_lock); - np->dev->xmit_lock_owner = smp_processor_id(); + if(spin_trylock(&np->dev->xmit_lock)) + np->dev->xmit_lock_owner = smp_processor_id(); + else { + if(np->dev->xmit_lock_owner == smp_processor_id()) { + netpoll_kill = 1; + __kfree_skb(skb); + printk("Tried to recursively get dev->xmit_lock"); + netpoll_kill = 0; + return; + } + spin_lock(&np->dev->xmit_lock); + + } /* * network drivers do not expect to be called if the queue is -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-21 0:55 ` Matt Mackall @ 2004-12-21 10:23 ` Mark Broadbent 2004-12-21 12:37 ` Francois Romieu 0 siblings, 1 reply; 20+ messages in thread From: Mark Broadbent @ 2004-12-21 10:23 UTC (permalink / raw) To: mpm; +Cc: romieu, linux-kernel, netdev Matt Mackall said: > On Tue, Dec 21, 2004 at 01:22:18AM +0100, Francois Romieu wrote: >> Matt Mackall <mpm@selenic.com> : >> > On Mon, Dec 20, 2004 at 09:42:08AM -0000, Mark Broadbent wrote: >> > > >> > > Exactly the same happens, I still get a 'NMI Watchdog detected >> > > LOCKUP' with the r8169 device using the above patch on top of >> > > 2.6.10-rc3-bk10. >> > >> > Ok, that suggests a problem localized to netpoll itself. Do you have >> > spinlock debugging turned on by any chance? >> >> Any chance of: >> 1 dev_queue_xmit >> 2 dev->xmit_lock taken >> 3 interruption >> 4 printk >> 5 netconsole write >> 6 dev->xmit_lock again >> 7 lockup >> >> ? >> >> This is probably the silly question of the day. > > Maybe, but the answer isn't obvious to me at the moment as I haven't > been thinking about such stuff enough lately. Silly response of the > day: > > Mark, can you try this (again completely untested, but at least > compiles) patch? I'm afraid I don't have a proper test rig to > reproduce this at the moment. This will attempt to grab the lock, and > if it fails, will check for recursion. Then it will try to print a > message on the local console, temporarily disabling netconsole to > allow the printk to get through.. OK, patch applied and spinlock debugging enabled. Testing with eth1 (r1869) doesn'tyield any additional messages, just the standard 'NMI Watchdog detected lockup'. Thanks Mark -- Mark Broadbent <markb@wetlettuce.com> Web: http://www.wetlettuce.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-21 10:23 ` Mark Broadbent @ 2004-12-21 12:37 ` Francois Romieu 2004-12-21 13:29 ` Mark Broadbent 0 siblings, 1 reply; 20+ messages in thread From: Francois Romieu @ 2004-12-21 12:37 UTC (permalink / raw) To: Mark Broadbent; +Cc: mpm, romieu, linux-kernel, netdev Mark Broadbent <markb@wetlettuce.com> : [...] > OK, patch applied and spinlock debugging enabled. Testing with eth1 > (r1869) doesn'tyield any additional messages, just the standard > 'NMI Watchdog detected lockup'. Does the modified version below trigger _exactly_ the same hang ? --- net/core/netpoll.c 2004-12-21 13:09:51.000000000 +0100 +++ net/core/netpoll.c 2004-12-21 13:27:01.000000000 +0100 @@ -31,6 +31,8 @@ #define MAX_SKBS 32 #define MAX_UDP_CHUNK 1460 +static int netpoll_kill; + static spinlock_t skb_list_lock = SPIN_LOCK_UNLOCKED; static int nr_skbs; static struct sk_buff *skbs; @@ -184,11 +186,21 @@ void netpoll_send_skb(struct netpoll *np repeat: if(!np || !np->dev || !netif_running(np->dev)) { +too_bad: __kfree_skb(skb); return; } - spin_lock(&np->dev->xmit_lock); + if (!spin_trylock(&np->dev->xmit_lock)) { + netpoll_kill = 1; + goto too_bad; + } + + if (netpoll_kill) { + if (net_ratelimit()) + printk(KERN_ERR "netconsole raced"); + netpoll_kill = 0; + } np->dev->xmit_lock_owner = smp_processor_id(); /* ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-21 12:37 ` Francois Romieu @ 2004-12-21 13:29 ` Mark Broadbent 2004-12-21 20:48 ` Francois Romieu 0 siblings, 1 reply; 20+ messages in thread From: Mark Broadbent @ 2004-12-21 13:29 UTC (permalink / raw) To: romieu; +Cc: mpm, linux-kernel, netdev Francois Romieu said: > Mark Broadbent <markb@wetlettuce.com> : > [...] >> OK, patch applied and spinlock debugging enabled. Testing with eth1 >> (r1869) doesn'tyield any additional messages, just the standard >> 'NMI Watchdog detected lockup'. > > Does the modified version below trigger _exactly_ the same hang ? Using the patch supplied I get no hang, just the message 'netconsole raced' output to the console and the packet capture proceeds as normal. Thanks Mark -- Mark Broadbent <markb@wetlettuce.com> Web: http://www.wetlettuce.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-21 13:29 ` Mark Broadbent @ 2004-12-21 20:48 ` Francois Romieu 2004-12-21 21:27 ` Matt Mackall 0 siblings, 1 reply; 20+ messages in thread From: Francois Romieu @ 2004-12-21 20:48 UTC (permalink / raw) To: Mark Broadbent; +Cc: mpm, linux-kernel, netdev Mark Broadbent <markb@wetlettuce.com> : [...] > Using the patch supplied I get no hang, just the message 'netconsole > raced' output to the console and the packet capture proceeds as normal. > Thanks The patch is more a bandaid for debugging than a real fix. The netconsole will drop some messages until its locking is fixed If you can issue one more test, I'd like to know if some messages appear on the VGA console around the time at which tcpdump is started (the test assumes that netconsole is not used/insmoded at all). Please check that the console log_level is set high enough as it will be really disappointing if nothing appears :o) -- Ueimor ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-21 20:48 ` Francois Romieu @ 2004-12-21 21:27 ` Matt Mackall 2004-12-21 22:58 ` Francois Romieu 0 siblings, 1 reply; 20+ messages in thread From: Matt Mackall @ 2004-12-21 21:27 UTC (permalink / raw) To: Francois Romieu; +Cc: Mark Broadbent, linux-kernel, netdev On Tue, Dec 21, 2004 at 09:48:53PM +0100, Francois Romieu wrote: > Mark Broadbent <markb@wetlettuce.com> : > [...] > > Using the patch supplied I get no hang, just the message 'netconsole > > raced' output to the console and the packet capture proceeds as normal. > > Thanks > > The patch is more a bandaid for debugging than a real fix. The netconsole > will drop some messages until its locking is fixed Unfortunately there's no good way to fix its locking in this circumstance (or the harder case of driver-private locks). I think I'll just have to come up with some scheme for queueing messages that arrive when the queue lock is held. > If you can issue one more test, I'd like to know if some messages appear > on the VGA console around the time at which tcpdump is started (the test > assumes that netconsole is not used/insmoded at all). Please check that > the console log_level is set high enough as it will be really disappointing > if nothing appears :o) I think it's the promiscuous mode message itself that's the problem but I've not had time to reproduce it. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-21 21:27 ` Matt Mackall @ 2004-12-21 22:58 ` Francois Romieu 2004-12-22 9:34 ` Patrick McHardy 0 siblings, 1 reply; 20+ messages in thread From: Francois Romieu @ 2004-12-21 22:58 UTC (permalink / raw) To: Matt Mackall; +Cc: Mark Broadbent, linux-kernel, netdev Matt Mackall <mpm@selenic.com> : [...] > I think it's the promiscuous mode message itself that's the problem Yes. dev_mc_add takes dev->xmit_lock and the game is over. Marc, if the patch below happens to work, it should not drop messages like the previous one (it is an ugly short-term suggestion). --- net/core/netpoll.c 2004-12-21 13:09:51.000000000 +0100 +++ net/core/netpoll.c 2004-12-21 23:35:25.000000000 +0100 @@ -22,6 +22,7 @@ #include <net/tcp.h> #include <net/udp.h> #include <asm/unaligned.h> +#include <net/pkt_sched.h> /* * We maintain a small pool of fully-sized skbs, to make sure the @@ -184,11 +187,19 @@ void netpoll_send_skb(struct netpoll *np repeat: if(!np || !np->dev || !netif_running(np->dev)) { __kfree_skb(skb); return; } - spin_lock(&np->dev->xmit_lock); + while (!spin_trylock(&np->dev->xmit_lock)) { + if (np->dev->xmit_lock_owner == smp_processor_id()) { + struct Qdisc *q = dev->qdisc; + + q->ops->enqueue(skb, q); + return; + } + } + np->dev->xmit_lock_owner = smp_processor_id(); /* ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-21 22:58 ` Francois Romieu @ 2004-12-22 9:34 ` Patrick McHardy 2004-12-22 10:54 ` Patrick McHardy 0 siblings, 1 reply; 20+ messages in thread From: Patrick McHardy @ 2004-12-22 9:34 UTC (permalink / raw) To: Francois Romieu; +Cc: Matt Mackall, Mark Broadbent, linux-kernel, netdev Francois Romieu wrote: > Marc, if the patch below happens to work, it should not drop messages > like the previous one (it is an ugly short-term suggestion). > > - spin_lock(&np->dev->xmit_lock); > + while (!spin_trylock(&np->dev->xmit_lock)) { > + if (np->dev->xmit_lock_owner == smp_processor_id()) { > + struct Qdisc *q = dev->qdisc; > + > + q->ops->enqueue(skb, q); Shouldn't this be requeue ? Regards Patrick ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-22 9:34 ` Patrick McHardy @ 2004-12-22 10:54 ` Patrick McHardy 2004-12-22 12:39 ` Francois Romieu 2004-12-22 14:37 ` Mark Broadbent 0 siblings, 2 replies; 20+ messages in thread From: Patrick McHardy @ 2004-12-22 10:54 UTC (permalink / raw) To: Francois Romieu; +Cc: Matt Mackall, Mark Broadbent, linux-kernel, netdev Patrick McHardy wrote: > Francois Romieu wrote: > >> Marc, if the patch below happens to work, it should not drop messages >> like the previous one (it is an ugly short-term suggestion). >> > >> - spin_lock(&np->dev->xmit_lock); >> + while (!spin_trylock(&np->dev->xmit_lock)) { >> + if (np->dev->xmit_lock_owner == smp_processor_id()) { >> + struct Qdisc *q = dev->qdisc; >> + >> + q->ops->enqueue(skb, q); > > > Shouldn't this be requeue ? Since the code doesn't dequeue itself, enqueue seems fine to keep at least the queued messages ordered. But you need to grab dev->queue_lock, otherwise you risk corrupting qdisc internal data. You should probably also deal with the noqueue-qdisc, which doesn't have an enqueue function. So it should look something like this: while (!spin_trylock(&np->dev->xmit_lock)) { if (np->dev->xmit_lock_owner == smp_processor_id()) { struct Qdisc *q; rcu_read_lock(); q = rcu_dereference(dev->qdisc); if (q->enqueue) { spin_lock(&dev->queue_lock); q->ops->enqueue(skb, q); spin_unlock(&dev->queue_lock); netif_schedule(np->dev); } else kfree_skb(skb); rcu_read_unlock(); return; } } Regards Patrick ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-22 10:54 ` Patrick McHardy @ 2004-12-22 12:39 ` Francois Romieu 2004-12-22 13:33 ` jamal 2004-12-22 14:57 ` Patrick McHardy 2004-12-22 14:37 ` Mark Broadbent 1 sibling, 2 replies; 20+ messages in thread From: Francois Romieu @ 2004-12-22 12:39 UTC (permalink / raw) To: Patrick McHardy; +Cc: Matt Mackall, Mark Broadbent, linux-kernel, netdev Patrick McHardy <kaber@trash.net> : [...] > at least the queued messages ordered. But you need to grab > dev->queue_lock, otherwise you risk corrupting qdisc internal data. > You should probably also deal with the noqueue-qdisc, which doesn't > have an enqueue function. So it should look something like this: If I am not mistaken, a failure on spin_trylock + the test on xmit_lock_owner imply that it is safe to directly handle the queue. It means that qdisc_run() has been interrupted on the current cpu and the other paths seem fine as well. Counter-example is welcome (no joke). Of course the patch is completely ugly and violates any layering principle one could think of. It was not submitted for inclusion :o) > while (!spin_trylock(&np->dev->xmit_lock)) { > if (np->dev->xmit_lock_owner == smp_processor_id()) { > struct Qdisc *q; > > rcu_read_lock(); > q = rcu_dereference(dev->qdisc); > if (q->enqueue) { > spin_lock(&dev->queue_lock); I'd expect it to deadlock if dev_queue_xmit -> qdisc_run is interrupted on the current cpu and a printk is issued as dev->queue_lock will have been taken elsewhere. -- Ueimor ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-22 12:39 ` Francois Romieu @ 2004-12-22 13:33 ` jamal 2004-12-22 14:57 ` Patrick McHardy 1 sibling, 0 replies; 20+ messages in thread From: jamal @ 2004-12-22 13:33 UTC (permalink / raw) To: Francois Romieu Cc: Patrick McHardy, Matt Mackall, Mark Broadbent, linux-kernel, netdev On Wed, 2004-12-22 at 07:39, Francois Romieu wrote: > If I am not mistaken, a failure on spin_trylock + the test on > xmit_lock_owner imply that it is safe to directly handle the > queue. It means that qdisc_run() has been interrupted on the > current cpu and the other paths seem fine as well. Counter-example > is welcome (no joke). Think more than 2 processors ;-> cheers, jamal ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-22 12:39 ` Francois Romieu 2004-12-22 13:33 ` jamal @ 2004-12-22 14:57 ` Patrick McHardy 2004-12-22 17:18 ` Matt Mackall 1 sibling, 1 reply; 20+ messages in thread From: Patrick McHardy @ 2004-12-22 14:57 UTC (permalink / raw) To: Francois Romieu; +Cc: Matt Mackall, Mark Broadbent, linux-kernel, netdev Francois Romieu wrote: > Patrick McHardy <kaber@trash.net> : > [...] > >>at least the queued messages ordered. But you need to grab >>dev->queue_lock, otherwise you risk corrupting qdisc internal data. >>You should probably also deal with the noqueue-qdisc, which doesn't >>have an enqueue function. So it should look something like this: > > > If I am not mistaken, a failure on spin_trylock + the test on > xmit_lock_owner imply that it is safe to directly handle the > queue. It means that qdisc_run() has been interrupted on the > current cpu and the other paths seem fine as well. Counter-example > is welcome (no joke). enqueue is only protected by dev->queue_lock, and dev->queue_lock is dropped as soon as dev->xmit_lock is grabbed, so any other CPU might call enqueue at the same time. Example: CPU1 CPU2 dev_queue_xmit dev_queue_xmit lock(dev->queue_lock) lock(dev->queue_lock) q->enqueue qdisc_run qdisc_restart trylock(dev->xmit_lock), ok unlock(dev->queue_lock) ... printk("something") ... netpoll_send_skb trylock(dev->xmit_lock), fails q->enqueue q->enqueue > Of course the patch is completely ugly and violates any layering > principle one could think of. It was not submitted for inclusion :o) Sure, but I think we should have a short-term workaround until a better solution has been invented. Maybe dropping the packets would be best for now, it only affects printks issued in paths starting at qdisc_restart (-> hard_start_xmit -> ...). Queueing the packets might also cause reordering since not all packets are queued. >>while (!spin_trylock(&np->dev->xmit_lock)) { >> if (np->dev->xmit_lock_owner == smp_processor_id()) { >> struct Qdisc *q; >> >> rcu_read_lock(); >> q = rcu_dereference(dev->qdisc); >> if (q->enqueue) { >> spin_lock(&dev->queue_lock); > > > I'd expect it to deadlock if dev_queue_xmit -> qdisc_run is interrupted > on the current cpu and a printk is issued as dev->queue_lock will have > been taken elsewhere. Hmm this is complicated, let me think some more about it. Regards Patrick ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-22 14:57 ` Patrick McHardy @ 2004-12-22 17:18 ` Matt Mackall 2004-12-25 11:26 ` Wish you all a Merry Christmas Pranav 2004-12-28 13:45 ` Lockup with 2.6.9-ac15 related to netconsole jamal 0 siblings, 2 replies; 20+ messages in thread From: Matt Mackall @ 2004-12-22 17:18 UTC (permalink / raw) To: Patrick McHardy; +Cc: Francois Romieu, Mark Broadbent, linux-kernel, netdev On Wed, Dec 22, 2004 at 03:57:57PM +0100, Patrick McHardy wrote: > >Of course the patch is completely ugly and violates any layering > >principle one could think of. It was not submitted for inclusion :o) > > Sure, but I think we should have a short-term workaround until > a better solution has been invented. Maybe dropping the packets > would be best for now, it only affects printks issued in paths > starting at qdisc_restart (-> hard_start_xmit -> ...). Queueing > the packets might also cause reordering since not all packets > are queued. When I mentioned queueing, I was thinking of a netpoll-private queue that would be hooked to a softirq or some such so that it would be pushed out as soon as possible. Dropping may be the better approach as queueing throws away netpoll's immediacy and ordering properties. And getting netpoll _more_ tangled in the net stack mechanics is definitely a step in the wrong direction. More generally, I'm tempted to add some warn_on style functionality so that printks in such troublesome paths can be lifted out. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Wish you all a Merry Christmas 2004-12-22 17:18 ` Matt Mackall @ 2004-12-25 11:26 ` Pranav 2004-12-25 11:30 ` Jan Engelhardt 2004-12-28 13:45 ` Lockup with 2.6.9-ac15 related to netconsole jamal 1 sibling, 1 reply; 20+ messages in thread From: Pranav @ 2004-12-25 11:26 UTC (permalink / raw) To: netdev; +Cc: linux-kernel Hi everyone, Wishing you all A Prosperous Merry ChristMas. Hope Coming years brings Peace,Happiness,blessings of CHRISTmas to you all ,your family and this World. With Regards, Pranav. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Wish you all a Merry Christmas 2004-12-25 11:26 ` Wish you all a Merry Christmas Pranav @ 2004-12-25 11:30 ` Jan Engelhardt 0 siblings, 0 replies; 20+ messages in thread From: Jan Engelhardt @ 2004-12-25 11:30 UTC (permalink / raw) To: Pranav; +Cc: netdev, linux-kernel >Wishing you all A Prosperous Merry ChristMas. >Hope Coming years brings Peace,Happiness,blessings of CHRISTmas to you all >,your family and this World. > >With Regards, >Pranav. I don't see how this is related to linux-kernel. Jan Engelhardt -- ENOSPC ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-22 17:18 ` Matt Mackall 2004-12-25 11:26 ` Wish you all a Merry Christmas Pranav @ 2004-12-28 13:45 ` jamal 1 sibling, 0 replies; 20+ messages in thread From: jamal @ 2004-12-28 13:45 UTC (permalink / raw) To: Matt Mackall Cc: Patrick McHardy, Francois Romieu, Mark Broadbent, linux-kernel, netdev On Wed, 2004-12-22 at 12:18, Matt Mackall wrote: > On Wed, Dec 22, 2004 at 03:57:57PM +0100, Patrick McHardy wrote: > > >Of course the patch is completely ugly and violates any layering > > >principle one could think of. It was not submitted for inclusion :o) > > > > Sure, but I think we should have a short-term workaround until > > a better solution has been invented. Maybe dropping the packets > > would be best for now, it only affects printks issued in paths > > starting at qdisc_restart (-> hard_start_xmit -> ...). Queueing > > the packets might also cause reordering since not all packets > > are queued. > > When I mentioned queueing, I was thinking of a netpoll-private queue > that would be hooked to a softirq or some such so that it would be > pushed out as soon as possible. Dropping may be the better approach I think so - just junk those packets. cheers, jamal ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Lockup with 2.6.9-ac15 related to netconsole 2004-12-22 10:54 ` Patrick McHardy 2004-12-22 12:39 ` Francois Romieu @ 2004-12-22 14:37 ` Mark Broadbent 1 sibling, 0 replies; 20+ messages in thread From: Mark Broadbent @ 2004-12-22 14:37 UTC (permalink / raw) To: kaber; +Cc: romieu, mpm, linux-kernel, netdev Patrick McHardy said: > Patrick McHardy wrote: >> Francois Romieu wrote: >> >>> Marc, if the patch below happens to work, it should not drop messages >>> like the previous one (it is an ugly short-term suggestion). >>> >> >>> - spin_lock(&np->dev->xmit_lock); >>> + while (!spin_trylock(&np->dev->xmit_lock)) { >>> + if (np->dev->xmit_lock_owner == smp_processor_id()) { >>> + struct Qdisc *q = dev->qdisc; >>> + >>> + q->ops->enqueue(skb, q); >> >> >> Shouldn't this be requeue ? > > Since the code doesn't dequeue itself, enqueue seems fine to keep > at least the queued messages ordered. But you need to grab > dev->queue_lock, otherwise you risk corrupting qdisc internal data. You > should probably also deal with the noqueue-qdisc, which doesn't have an > enqueue function. So it should look something like this: > > while (!spin_trylock(&np->dev->xmit_lock)) { > if (np->dev->xmit_lock_owner == smp_processor_id()) { > struct Qdisc *q; > > rcu_read_lock(); > q = rcu_dereference(dev->qdisc); > if (q->enqueue) { > spin_lock(&dev->queue_lock); > q->ops->enqueue(skb, q); > spin_unlock(&dev->queue_lock); > netif_schedule(np->dev); > } else > kfree_skb(skb); > rcu_read_unlock(); > return; > } > } I've tried both patches (modified slightly to get them to compile) but they both produce hard NMI detected lockups (as before). Thanks Mark Patches after modification to allow compilation: Francois' patch (against 2.6.10-rc3-bk10): diff -X dontdiff -urN linux-2.6.9-rc3-bk10.orig/net/core/netpoll.c linux-2.6.9-rc3-bk10/net/core/netpoll.c--- linux-2.6.9-rc3-bk10.orig/net/core/netpoll.c 2004-12-22 12:09:32.000000000 +0000+++ linux-2.6.9-rc3-bk10/net/core/netpoll.c 2004-12-22 14:13:54.000000000 +0000@@ -22,6 +22,7 @@ #include <net/tcp.h> #include <net/udp.h> #include <asm/unaligned.h> +#include <net/pkt_sched.h> /* * We maintain a small pool of fully-sized skbs, to make sure the @@ -188,7 +189,15 @@ return; } - spin_lock(&np->dev->xmit_lock); + while (!spin_trylock(&np->dev->xmit_lock)) { + if (np->dev->xmit_lock_owner == smp_processor_id()) { + struct Qdisc *q = np->dev->qdisc; + + q->ops->enqueue(skb, q); + return; + } + } + np->dev->xmit_lock_owner = smp_processor_id(); /* Patrick's patch (against 2.6.10-rc3-bk10): diff -X dontdiff -urN linux-2.6.9-rc3-bk10.orig/net/core/netpoll.c linux-2.6.9-rc3-bk10/net/core/netpoll.c--- linux-2.6.9-rc3-bk10.orig/net/core/netpoll.c 2004-12-22 12:09:32.000000000 +0000+++ linux-2.6.9-rc3-bk10/net/core/netpoll.c 2004-12-22 11:08:06.000000000 +0000@@ -22,6 +22,7 @@ #include <net/tcp.h> #include <net/udp.h> #include <asm/unaligned.h> +#include <net/pkt_sched.h> /* * We maintain a small pool of fully-sized skbs, to make sure the @@ -188,7 +189,24 @@ return; } - spin_lock(&np->dev->xmit_lock); + while (!spin_trylock(&np->dev->xmit_lock)) { + if (np->dev->xmit_lock_owner == smp_processor_id()) { + struct Qdisc *q; + + rcu_read_lock(); + q = rcu_dereference(np->dev->qdisc); + if (q->enqueue) { + spin_lock(&np->dev->queue_lock); + q->ops->enqueue(skb, q); + spin_unlock(&np->dev->queue_lock); + netif_schedule(np->dev); + } else + __kfree_skb(skb); + rcu_read_unlock(); + return; + } + } + np->dev->xmit_lock_owner = smp_processor_id(); /* -- Mark Broadbent <markb@wetlettuce.com> Web: http://www.wetlettuce.com ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2004-12-28 13:45 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <59719.192.102.214.6.1103214002.squirrel@webmail.wetlettuce.com>
[not found] ` <20041216211024.GK2767@waste.org>
[not found] ` <34721.192.102.214.6.1103274614.squirrel@webmail.wetlettuce.com>
[not found] ` <20041217215752.GP2767@waste.org>
[not found] ` <20041217233524.GA11202@electric-eye.fr.zoreil.com>
2004-12-20 9:42 ` Lockup with 2.6.9-ac15 related to netconsole Mark Broadbent
2004-12-20 21:14 ` Matt Mackall
2004-12-21 0:22 ` Francois Romieu
2004-12-21 0:55 ` Matt Mackall
2004-12-21 10:23 ` Mark Broadbent
2004-12-21 12:37 ` Francois Romieu
2004-12-21 13:29 ` Mark Broadbent
2004-12-21 20:48 ` Francois Romieu
2004-12-21 21:27 ` Matt Mackall
2004-12-21 22:58 ` Francois Romieu
2004-12-22 9:34 ` Patrick McHardy
2004-12-22 10:54 ` Patrick McHardy
2004-12-22 12:39 ` Francois Romieu
2004-12-22 13:33 ` jamal
2004-12-22 14:57 ` Patrick McHardy
2004-12-22 17:18 ` Matt Mackall
2004-12-25 11:26 ` Wish you all a Merry Christmas Pranav
2004-12-25 11:30 ` Jan Engelhardt
2004-12-28 13:45 ` Lockup with 2.6.9-ac15 related to netconsole jamal
2004-12-22 14:37 ` Mark Broadbent
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).