* netpoll trapped question
@ 2004-08-31 17:10 Jeff Moyer
2004-09-06 21:35 ` Matt Mackall
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Moyer @ 2004-08-31 17:10 UTC (permalink / raw)
To: mpm; +Cc: linux-kernel
Hi, Matt,
This part of the netpoll trapped logic seems suspect to me, from
include/linux/netdevice.h:
static inline void netif_wake_queue(struct net_device *dev)
{
#ifdef CONFIG_NETPOLL_TRAP
if (netpoll_trap())
return;
#endif
if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
__netif_schedule(dev);
}
static inline void netif_stop_queue(struct net_device *dev)
{
#ifdef CONFIG_NETPOLL_TRAP
if (netpoll_trap())
return;
#endif
set_bit(__LINK_STATE_XOFF, &dev->state);
}
This looks buggy. Network drivers are now not able to stop the queue when
they run out of Tx descriptors. I think the __netif_schedule is okay to do
in the context of netpoll, and certainly a set_bit is okay. Why are these
hooks in place? I've tested alt-sysrq-t over netconsole and also netdump
with these #ifdef's removed, and things work correctly. Compare this with
alt-sysrq-t hanging the system with these tests in place. If I run netdump
with this logic still in place, I get the following messages from the tg3
driver:
eth0: BUG! Tx Ring full when queue awake!
Shall I send a patch, or have I missed something?
-Jeff
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: netpoll trapped question 2004-08-31 17:10 netpoll trapped question Jeff Moyer @ 2004-09-06 21:35 ` Matt Mackall 2004-09-07 16:33 ` Jeff Moyer 0 siblings, 1 reply; 11+ messages in thread From: Matt Mackall @ 2004-09-06 21:35 UTC (permalink / raw) To: Jeff Moyer; +Cc: linux-kernel On Tue, Aug 31, 2004 at 01:10:43PM -0400, Jeff Moyer wrote: > Hi, Matt, > > This part of the netpoll trapped logic seems suspect to me, from > include/linux/netdevice.h: > > static inline void netif_wake_queue(struct net_device *dev) > { > #ifdef CONFIG_NETPOLL_TRAP > if (netpoll_trap()) > return; > #endif > if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state)) > __netif_schedule(dev); > } > > static inline void netif_stop_queue(struct net_device *dev) > { > #ifdef CONFIG_NETPOLL_TRAP > if (netpoll_trap()) > return; > #endif > set_bit(__LINK_STATE_XOFF, &dev->state); > } > > This looks buggy. Network drivers are now not able to stop the queue when > they run out of Tx descriptors. I think the __netif_schedule is okay to do > in the context of netpoll, and certainly a set_bit is okay. Why are these > hooks in place? I've tested alt-sysrq-t over netconsole and also netdump > with these #ifdef's removed, and things work correctly. Compare this with > alt-sysrq-t hanging the system with these tests in place. If I run netdump > with this logic still in place, I get the following messages from the tg3 > driver: > > eth0: BUG! Tx Ring full when queue awake! > > Shall I send a patch, or have I missed something? I don't remember the origin or motivation of this bit, so I'm not sure at the moment. Shoot me a patch and I'll poke at it. Btw, did I send you my thoughts on your recursion patch? -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: netpoll trapped question 2004-09-06 21:35 ` Matt Mackall @ 2004-09-07 16:33 ` Jeff Moyer 2004-09-07 16:50 ` Matt Mackall 0 siblings, 1 reply; 11+ messages in thread From: Jeff Moyer @ 2004-09-07 16:33 UTC (permalink / raw) To: Matt Mackall; +Cc: linux-kernel ==> Regarding Re: netpoll trapped question; Matt Mackall <mpm@selenic.com> adds: mpm> On Tue, Aug 31, 2004 at 01:10:43PM -0400, Jeff Moyer wrote: >> Hi, Matt, >> >> This part of the netpoll trapped logic seems suspect to me, from >> include/linux/netdevice.h: >> >> static inline void netif_wake_queue(struct net_device *dev) { #ifdef >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif if >> (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state)) >> __netif_schedule(dev); } >> >> static inline void netif_stop_queue(struct net_device *dev) { #ifdef >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif >> set_bit(__LINK_STATE_XOFF, &dev->state); } >> >> This looks buggy. Network drivers are now not able to stop the queue >> when they run out of Tx descriptors. I think the __netif_schedule is >> okay to do in the context of netpoll, and certainly a set_bit is okay. >> Why are these hooks in place? I've tested alt-sysrq-t over netconsole >> and also netdump with these #ifdef's removed, and things work correctly. >> Compare this with alt-sysrq-t hanging the system with these tests in >> place. If I run netdump with this logic still in place, I get the >> following messages from the tg3 driver> >> eth0> BUG! Tx Ring full when queue awake! >> Shall I send a patch, or have I missed something? mpm> I don't remember the origin or motivation of this bit, so I'm not sure mpm> at the moment. Shoot me a patch and I'll poke at it. What tree would you like me to base the patch on? In absence of response, it will be the latest -mm. mpm> Btw, did I send you my thoughts on your recursion patch? Yes, from your mail: mpm> But I still don't like this. dev->poll() is liable to attempt to mpm> recursively take its own driver lock again internally and we still mpm> deadlock. Have we already seen recursion here? If we do, I think we mpm> need to fix that in drivers. Meanwhile we should just bail here and mpm> maybe set a "something bad happened" flag. I'm not sure I follow. Which lock are you concerned about deadlocking on? >From an earlier message, you say this: mpm> It can [recurse] if the poll function does a printk or the like and mpm> wants to recurse via netconsole. So I think it's worth addressing in netpoll, as opposed to every driver. -Jeff ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: netpoll trapped question 2004-09-07 16:33 ` Jeff Moyer @ 2004-09-07 16:50 ` Matt Mackall 2004-09-07 16:53 ` Jeff Moyer 0 siblings, 1 reply; 11+ messages in thread From: Matt Mackall @ 2004-09-07 16:50 UTC (permalink / raw) To: Jeff Moyer; +Cc: linux-kernel On Tue, Sep 07, 2004 at 12:33:49PM -0400, Jeff Moyer wrote: > ==> Regarding Re: netpoll trapped question; Matt Mackall <mpm@selenic.com> adds: > > mpm> On Tue, Aug 31, 2004 at 01:10:43PM -0400, Jeff Moyer wrote: > >> Hi, Matt, > >> > >> This part of the netpoll trapped logic seems suspect to me, from > >> include/linux/netdevice.h: > >> > >> static inline void netif_wake_queue(struct net_device *dev) { #ifdef > >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif if > >> (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state)) > >> __netif_schedule(dev); } > >> > >> static inline void netif_stop_queue(struct net_device *dev) { #ifdef > >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif > >> set_bit(__LINK_STATE_XOFF, &dev->state); } > >> > >> This looks buggy. Network drivers are now not able to stop the queue > >> when they run out of Tx descriptors. I think the __netif_schedule is > >> okay to do in the context of netpoll, and certainly a set_bit is okay. > >> Why are these hooks in place? I've tested alt-sysrq-t over netconsole > >> and also netdump with these #ifdef's removed, and things work correctly. > >> Compare this with alt-sysrq-t hanging the system with these tests in > >> place. If I run netdump with this logic still in place, I get the > >> following messages from the tg3 > driver> > >> > eth0> BUG! Tx Ring full when queue awake! > >> Shall I send a patch, or have I missed something? > > mpm> I don't remember the origin or motivation of this bit, so I'm not sure > mpm> at the moment. Shoot me a patch and I'll poke at it. > > What tree would you like me to base the patch on? In absence of response, > it will be the latest -mm. -mm is fine. > mpm> Btw, did I send you my thoughts on your recursion patch? > > Yes, from your mail: > > mpm> But I still don't like this. dev->poll() is liable to attempt to > mpm> recursively take its own driver lock again internally and we still > mpm> deadlock. Have we already seen recursion here? If we do, I think we > mpm> need to fix that in drivers. Meanwhile we should just bail here and > mpm> maybe set a "something bad happened" flag. > > I'm not sure I follow. Which lock are you concerned about deadlocking on? A random lock private to a given driver, for instance one taken on entry to the IRQ handler. If said driver tries to do a printk inside that lock and we recursively call the handler in netconsole, we're in trouble. These are the issues that will have to be cleaned up in individual drivers. So far, I haven't seen any reports, but I'm pretty sure such cases exist. I suppose it's also possible for us to disable recursion in netconsole instead of at the netpoll level. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: netpoll trapped question 2004-09-07 16:50 ` Matt Mackall @ 2004-09-07 16:53 ` Jeff Moyer 2004-09-07 16:59 ` Matt Mackall 0 siblings, 1 reply; 11+ messages in thread From: Jeff Moyer @ 2004-09-07 16:53 UTC (permalink / raw) To: Matt Mackall; +Cc: linux-kernel ==> Regarding Re: netpoll trapped question; Matt Mackall <mpm@selenic.com> adds: mpm> On Tue, Sep 07, 2004 at 12:33:49PM -0400, Jeff Moyer wrote: >> ==> Regarding Re: netpoll trapped question; Matt Mackall >> <mpm@selenic.com> adds: >> mpm> On Tue, Aug 31, 2004 at 01:10:43PM -0400, Jeff Moyer wrote: >> >> Hi, Matt, >> >> >> >> This part of the netpoll trapped logic seems suspect to me, from >> >> include/linux/netdevice.h: >> >> >> >> static inline void netif_wake_queue(struct net_device *dev) { #ifdef >> >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif if >> >> (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state)) >> >> __netif_schedule(dev); } >> >> >> >> static inline void netif_stop_queue(struct net_device *dev) { #ifdef >> >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif >> >> set_bit(__LINK_STATE_XOFF, &dev->state); } >> >> >> >> This looks buggy. Network drivers are now not able to stop the queue >> >> when they run out of Tx descriptors. I think the __netif_schedule is >> >> okay to do in the context of netpoll, and certainly a set_bit is >> okay. >> Why are these hooks in place? I've tested alt-sysrq-t over >> netconsole >> and also netdump with these #ifdef's removed, and things >> work correctly. >> Compare this with alt-sysrq-t hanging the system >> with these tests in >> place. If I run netdump with this logic still in >> place, I get the >> following messages from the tg3 driver> >> >> eth0> BUG! Tx Ring full when queue awake! >> >> Shall I send a patch, or have I missed something? >> mpm> I don't remember the origin or motivation of this bit, so I'm not sure mpm> at the moment. Shoot me a patch and I'll poke at it. >> What tree would you like me to base the patch on? In absence of >> response, it will be the latest -mm. mpm> -mm is fine. mpm> Btw, did I send you my thoughts on your recursion patch? >> Yes, from your mail: >> mpm> But I still don't like this. dev->poll() is liable to attempt to mpm> recursively take its own driver lock again internally and we still mpm> deadlock. Have we already seen recursion here? If we do, I think we mpm> need to fix that in drivers. Meanwhile we should just bail here and mpm> maybe set a "something bad happened" flag. >> I'm not sure I follow. Which lock are you concerned about deadlocking >> on? mpm> A random lock private to a given driver, for instance one taken on mpm> entry to the IRQ handler. If said driver tries to do a printk inside mpm> that lock and we recursively call the handler in netconsole, we're in mpm> trouble. These are the issues that will have to be cleaned up in mpm> individual drivers. So far, I haven't seen any reports, but I'm pretty mpm> sure such cases exist. I suppose it's also possible for us to disable mpm> recursion in netconsole instead of at the netpoll level. Recursion in netconsole is protected by the console semaphore. -Jeff ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: netpoll trapped question 2004-09-07 16:53 ` Jeff Moyer @ 2004-09-07 16:59 ` Matt Mackall 2004-09-07 17:10 ` Jeff Moyer 0 siblings, 1 reply; 11+ messages in thread From: Matt Mackall @ 2004-09-07 16:59 UTC (permalink / raw) To: Jeff Moyer; +Cc: linux-kernel On Tue, Sep 07, 2004 at 12:53:17PM -0400, Jeff Moyer wrote: > mpm> A random lock private to a given driver, for instance one taken on > mpm> entry to the IRQ handler. If said driver tries to do a printk inside > mpm> that lock and we recursively call the handler in netconsole, we're in > mpm> trouble. These are the issues that will have to be cleaned up in > mpm> individual drivers. So far, I haven't seen any reports, but I'm pretty > mpm> sure such cases exist. I suppose it's also possible for us to disable > mpm> recursion in netconsole instead of at the netpoll level. > > Recursion in netconsole is protected by the console semaphore. Yes, true. But we're still in trouble if we have nic irq handler -> take private lock -> printk -> netconsole -> nic irq handler -> take private lock. See? -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: netpoll trapped question 2004-09-07 16:59 ` Matt Mackall @ 2004-09-07 17:10 ` Jeff Moyer 2004-09-07 17:22 ` Matt Mackall 2004-09-08 8:22 ` Herbert Xu 0 siblings, 2 replies; 11+ messages in thread From: Jeff Moyer @ 2004-09-07 17:10 UTC (permalink / raw) To: Matt Mackall; +Cc: linux-kernel ==> Regarding Re: netpoll trapped question; Matt Mackall <mpm@selenic.com> adds: mpm> On Tue, Sep 07, 2004 at 12:53:17PM -0400, Jeff Moyer wrote: A random mpm> lock private to a given driver, for instance one taken on entry to the mpm> IRQ handler. If said driver tries to do a printk inside that lock and mpm> we recursively call the handler in netconsole, we're in trouble. mpm> These are the issues that will have to be cleaned up in individual mpm> drivers. So far, I haven't seen any reports, but I'm pretty sure such mpm> cases exist. I suppose it's also possible for us to disable recursion mpm> in netconsole instead of at the netpoll level. >> Recursion in netconsole is protected by the console semaphore. mpm> Yes, true. But we're still in trouble if we have nic irq handler -> mpm> take private lock -> printk -> netconsole -> nic irq handler -> take mpm> private lock. See? Okay, so that one has to be addressed on a per-driver basis. There's no way for us to detect that situation. And how do drivers address this? Simply don't printk inside the lock? I think that's reasonable. -Jeff ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: netpoll trapped question 2004-09-07 17:10 ` Jeff Moyer @ 2004-09-07 17:22 ` Matt Mackall 2004-09-08 8:22 ` Herbert Xu 1 sibling, 0 replies; 11+ messages in thread From: Matt Mackall @ 2004-09-07 17:22 UTC (permalink / raw) To: Jeff Moyer; +Cc: linux-kernel On Tue, Sep 07, 2004 at 01:10:00PM -0400, Jeff Moyer wrote: > ==> Regarding Re: netpoll trapped question; Matt Mackall <mpm@selenic.com> adds: > > mpm> On Tue, Sep 07, 2004 at 12:53:17PM -0400, Jeff Moyer wrote: A random > mpm> lock private to a given driver, for instance one taken on entry to the > mpm> IRQ handler. If said driver tries to do a printk inside that lock and > mpm> we recursively call the handler in netconsole, we're in trouble. > mpm> These are the issues that will have to be cleaned up in individual > mpm> drivers. So far, I haven't seen any reports, but I'm pretty sure such > mpm> cases exist. I suppose it's also possible for us to disable recursion > mpm> in netconsole instead of at the netpoll level. > >> Recursion in netconsole is protected by the console semaphore. > > mpm> Yes, true. But we're still in trouble if we have nic irq handler -> > mpm> take private lock -> printk -> netconsole -> nic irq handler -> take > mpm> private lock. See? > > Okay, so that one has to be addressed on a per-driver basis. There's no > way for us to detect that situation. And how do drivers address this? > Simply don't printk inside the lock? I think that's reasonable. That or drop the lock where possible. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: netpoll trapped question 2004-09-07 17:10 ` Jeff Moyer 2004-09-07 17:22 ` Matt Mackall @ 2004-09-08 8:22 ` Herbert Xu 2004-09-08 8:34 ` Herbert Xu 1 sibling, 1 reply; 11+ messages in thread From: Herbert Xu @ 2004-09-08 8:22 UTC (permalink / raw) To: jmoyer; +Cc: mpm, linux-kernel Jeff Moyer <jmoyer@redhat.com> wrote: > > mpm> Yes, true. But we're still in trouble if we have nic irq handler -> > mpm> take private lock -> printk -> netconsole -> nic irq handler -> take > mpm> private lock. See? > > Okay, so that one has to be addressed on a per-driver basis. There's no > way for us to detect that situation. And how do drivers address this? > Simply don't printk inside the lock? I think that's reasonable. Why not queue the message whenever you're in IRQ context, and only print when you are not? -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: netpoll trapped question 2004-09-08 8:22 ` Herbert Xu @ 2004-09-08 8:34 ` Herbert Xu 2004-09-08 11:54 ` Jeff Moyer 0 siblings, 1 reply; 11+ messages in thread From: Herbert Xu @ 2004-09-08 8:34 UTC (permalink / raw) To: Herbert Xu; +Cc: jmoyer, mpm, linux-kernel Herbert Xu <herbert@gondor.apana.org.au> wrote: > Jeff Moyer <jmoyer@redhat.com> wrote: >> >> mpm> Yes, true. But we're still in trouble if we have nic irq handler -> >> mpm> take private lock -> printk -> netconsole -> nic irq handler -> take >> mpm> private lock. See? >> >> Okay, so that one has to be addressed on a per-driver basis. There's no >> way for us to detect that situation. And how do drivers address this? >> Simply don't printk inside the lock? I think that's reasonable. > > Why not queue the message whenever you're in IRQ context, and only > print when you are not? Actually how can this happen at all? The IRQ handler should've disabled local IRQs which prevents the second handler from occuring. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: netpoll trapped question 2004-09-08 8:34 ` Herbert Xu @ 2004-09-08 11:54 ` Jeff Moyer 0 siblings, 0 replies; 11+ messages in thread From: Jeff Moyer @ 2004-09-08 11:54 UTC (permalink / raw) To: Herbert Xu; +Cc: mpm, linux-kernel ==> Regarding Re: netpoll trapped question; Herbert Xu <herbert@gondor.apana.org.au> adds: herbert> Herbert Xu <herbert@gondor.apana.org.au> wrote: >> Jeff Moyer <jmoyer@redhat.com> wrote: >>> mpm> Yes, true. But we're still in trouble if we have nic irq handler -> mpm> take private lock -> printk -> netconsole -> nic irq handler -> take mpm> private lock. See? >>> Okay, so that one has to be addressed on a per-driver basis. There's >>> no way for us to detect that situation. And how do drivers address >>> this? Simply don't printk inside the lock? I think that's reasonable. >> Why not queue the message whenever you're in IRQ context, and only print >> when you are not? This question has been answered before. We don't want to defer the output of say an oops or panic message, since we may never get back to process context. herbert> Actually how can this happen at all? The IRQ handler should've herbert> disabled local IRQs which prevents the second handler from herbert> occuring. The netpoll infrastructure calls into the device's netpoll routine. This will, in turn, call the interrupt routine for the device. -Jeff ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2004-09-08 11:57 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-08-31 17:10 netpoll trapped question Jeff Moyer 2004-09-06 21:35 ` Matt Mackall 2004-09-07 16:33 ` Jeff Moyer 2004-09-07 16:50 ` Matt Mackall 2004-09-07 16:53 ` Jeff Moyer 2004-09-07 16:59 ` Matt Mackall 2004-09-07 17:10 ` Jeff Moyer 2004-09-07 17:22 ` Matt Mackall 2004-09-08 8:22 ` Herbert Xu 2004-09-08 8:34 ` Herbert Xu 2004-09-08 11:54 ` Jeff Moyer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox