netpoll trapped question

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* netpoll trapped question
@ 2004-08-31 17:10 Jeff Moyer
  2004-09-06 21:35 ` Matt Mackall
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Moyer @ 2004-08-31 17:10 UTC (permalink / raw)
  To: mpm; +Cc: linux-kernel

Hi, Matt,

This part of the netpoll trapped logic seems suspect to me, from
include/linux/netdevice.h:

static inline void netif_wake_queue(struct net_device *dev)
{
#ifdef CONFIG_NETPOLL_TRAP
	if (netpoll_trap())
		return;
#endif
	if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
		__netif_schedule(dev);
}

static inline void netif_stop_queue(struct net_device *dev)
{
#ifdef CONFIG_NETPOLL_TRAP
	if (netpoll_trap())
		return;
#endif
	set_bit(__LINK_STATE_XOFF, &dev->state);
}

This looks buggy.  Network drivers are now not able to stop the queue when
they run out of Tx descriptors.  I think the __netif_schedule is okay to do
in the context of netpoll, and certainly a set_bit is okay.  Why are these
hooks in place?  I've tested alt-sysrq-t over netconsole and also netdump
with these #ifdef's removed, and things work correctly.  Compare this with
alt-sysrq-t hanging the system with these tests in place.  If I run netdump
with this logic still in place, I get the following messages from the tg3
driver:

  eth0: BUG! Tx Ring full when queue awake!

Shall I send a patch, or have I missed something?

-Jeff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: netpoll trapped question
  2004-08-31 17:10 netpoll trapped question Jeff Moyer
@ 2004-09-06 21:35 ` Matt Mackall
  2004-09-07 16:33   ` Jeff Moyer
  0 siblings, 1 reply; 11+ messages in thread
From: Matt Mackall @ 2004-09-06 21:35 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Tue, Aug 31, 2004 at 01:10:43PM -0400, Jeff Moyer wrote:
> Hi, Matt,
> 
> This part of the netpoll trapped logic seems suspect to me, from
> include/linux/netdevice.h:
> 
> static inline void netif_wake_queue(struct net_device *dev)
> {
> #ifdef CONFIG_NETPOLL_TRAP
> 	if (netpoll_trap())
> 		return;
> #endif
> 	if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
> 		__netif_schedule(dev);
> }
> 
> static inline void netif_stop_queue(struct net_device *dev)
> {
> #ifdef CONFIG_NETPOLL_TRAP
> 	if (netpoll_trap())
> 		return;
> #endif
> 	set_bit(__LINK_STATE_XOFF, &dev->state);
> }
> 
> This looks buggy.  Network drivers are now not able to stop the queue when
> they run out of Tx descriptors.  I think the __netif_schedule is okay to do
> in the context of netpoll, and certainly a set_bit is okay.  Why are these
> hooks in place?  I've tested alt-sysrq-t over netconsole and also netdump
> with these #ifdef's removed, and things work correctly.  Compare this with
> alt-sysrq-t hanging the system with these tests in place.  If I run netdump
> with this logic still in place, I get the following messages from the tg3
> driver:
> 
>   eth0: BUG! Tx Ring full when queue awake!
> 
> Shall I send a patch, or have I missed something?

I don't remember the origin or motivation of this bit, so I'm not sure
at the moment. Shoot me a patch and I'll poke at it.

Btw, did I send you my thoughts on your recursion patch?

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: netpoll trapped question
  2004-09-06 21:35 ` Matt Mackall
@ 2004-09-07 16:33   ` Jeff Moyer
  2004-09-07 16:50     ` Matt Mackall
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Moyer @ 2004-09-07 16:33 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel

==> Regarding Re: netpoll trapped question; Matt Mackall <mpm@selenic.com> adds:

mpm> On Tue, Aug 31, 2004 at 01:10:43PM -0400, Jeff Moyer wrote:
>> Hi, Matt,
>> 
>> This part of the netpoll trapped logic seems suspect to me, from
>> include/linux/netdevice.h:
>> 
>> static inline void netif_wake_queue(struct net_device *dev) { #ifdef
>> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif if
>> (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
>> __netif_schedule(dev); }
>> 
>> static inline void netif_stop_queue(struct net_device *dev) { #ifdef
>> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif
>> set_bit(__LINK_STATE_XOFF, &dev->state); }
>> 
>> This looks buggy.  Network drivers are now not able to stop the queue
>> when they run out of Tx descriptors.  I think the __netif_schedule is
>> okay to do in the context of netpoll, and certainly a set_bit is okay.
>> Why are these hooks in place?  I've tested alt-sysrq-t over netconsole
>> and also netdump with these #ifdef's removed, and things work correctly.
>> Compare this with alt-sysrq-t hanging the system with these tests in
>> place.  If I run netdump with this logic still in place, I get the
>> following messages from the tg3
driver>
>>
eth0> BUG! Tx Ring full when queue awake!
>> Shall I send a patch, or have I missed something?

mpm> I don't remember the origin or motivation of this bit, so I'm not sure
mpm> at the moment. Shoot me a patch and I'll poke at it.

What tree would you like me to base the patch on?  In absence of response,
it will be the latest -mm.

mpm> Btw, did I send you my thoughts on your recursion patch?

Yes, from your mail:

mpm> But I still don't like this. dev->poll() is liable to attempt to
mpm> recursively take its own driver lock again internally and we still
mpm> deadlock. Have we already seen recursion here? If we do, I think we
mpm> need to fix that in drivers. Meanwhile we should just bail here and
mpm> maybe set a "something bad happened" flag.

I'm not sure I follow.  Which lock are you concerned about deadlocking on?
>From an earlier message, you say this:

mpm> It can [recurse] if the poll function does a printk or the like and
mpm> wants to recurse via netconsole.

So I think it's worth addressing in netpoll, as opposed to every driver.

-Jeff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: netpoll trapped question
  2004-09-07 16:33   ` Jeff Moyer
@ 2004-09-07 16:50     ` Matt Mackall
  2004-09-07 16:53       ` Jeff Moyer
  0 siblings, 1 reply; 11+ messages in thread
From: Matt Mackall @ 2004-09-07 16:50 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Tue, Sep 07, 2004 at 12:33:49PM -0400, Jeff Moyer wrote:
> ==> Regarding Re: netpoll trapped question; Matt Mackall <mpm@selenic.com> adds:
> 
> mpm> On Tue, Aug 31, 2004 at 01:10:43PM -0400, Jeff Moyer wrote:
> >> Hi, Matt,
> >> 
> >> This part of the netpoll trapped logic seems suspect to me, from
> >> include/linux/netdevice.h:
> >> 
> >> static inline void netif_wake_queue(struct net_device *dev) { #ifdef
> >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif if
> >> (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
> >> __netif_schedule(dev); }
> >> 
> >> static inline void netif_stop_queue(struct net_device *dev) { #ifdef
> >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif
> >> set_bit(__LINK_STATE_XOFF, &dev->state); }
> >> 
> >> This looks buggy.  Network drivers are now not able to stop the queue
> >> when they run out of Tx descriptors.  I think the __netif_schedule is
> >> okay to do in the context of netpoll, and certainly a set_bit is okay.
> >> Why are these hooks in place?  I've tested alt-sysrq-t over netconsole
> >> and also netdump with these #ifdef's removed, and things work correctly.
> >> Compare this with alt-sysrq-t hanging the system with these tests in
> >> place.  If I run netdump with this logic still in place, I get the
> >> following messages from the tg3
> driver>
> >>
> eth0> BUG! Tx Ring full when queue awake!
> >> Shall I send a patch, or have I missed something?
> 
> mpm> I don't remember the origin or motivation of this bit, so I'm not sure
> mpm> at the moment. Shoot me a patch and I'll poke at it.
> 
> What tree would you like me to base the patch on?  In absence of response,
> it will be the latest -mm.

-mm is fine.
 
> mpm> Btw, did I send you my thoughts on your recursion patch?
> 
> Yes, from your mail:
> 
> mpm> But I still don't like this. dev->poll() is liable to attempt to
> mpm> recursively take its own driver lock again internally and we still
> mpm> deadlock. Have we already seen recursion here? If we do, I think we
> mpm> need to fix that in drivers. Meanwhile we should just bail here and
> mpm> maybe set a "something bad happened" flag.
> 
> I'm not sure I follow.  Which lock are you concerned about deadlocking on?

A random lock private to a given driver, for instance one taken on entry to
the IRQ handler. If said driver tries to do a printk inside that lock
and we recursively call the handler in netconsole, we're in trouble.
These are the issues that will have to be cleaned up in individual
drivers. So far, I haven't seen any reports, but I'm pretty sure such
cases exist. I suppose it's also possible for us to disable recursion
in netconsole instead of at the netpoll level.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: netpoll trapped question
  2004-09-07 16:50     ` Matt Mackall
@ 2004-09-07 16:53       ` Jeff Moyer
  2004-09-07 16:59         ` Matt Mackall
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Moyer @ 2004-09-07 16:53 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel

==> Regarding Re: netpoll trapped question; Matt Mackall <mpm@selenic.com> adds:

mpm> On Tue, Sep 07, 2004 at 12:33:49PM -0400, Jeff Moyer wrote:
>> ==> Regarding Re: netpoll trapped question; Matt Mackall
>> <mpm@selenic.com> adds:
>> 
mpm> On Tue, Aug 31, 2004 at 01:10:43PM -0400, Jeff Moyer wrote:
>> >> Hi, Matt,
>> >> 
>> >> This part of the netpoll trapped logic seems suspect to me, from >>
>> include/linux/netdevice.h:
>> >> 
>> >> static inline void netif_wake_queue(struct net_device *dev) { #ifdef
>> >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif if >>
>> (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state)) >>
>> __netif_schedule(dev); }
>> >> 
>> >> static inline void netif_stop_queue(struct net_device *dev) { #ifdef
>> >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif >>
>> set_bit(__LINK_STATE_XOFF, &dev->state); }
>> >> 
>> >> This looks buggy.  Network drivers are now not able to stop the queue
>> >> when they run out of Tx descriptors.  I think the __netif_schedule is
>> >> okay to do in the context of netpoll, and certainly a set_bit is
>> okay.  >> Why are these hooks in place?  I've tested alt-sysrq-t over
>> netconsole >> and also netdump with these #ifdef's removed, and things
>> work correctly.  >> Compare this with alt-sysrq-t hanging the system
>> with these tests in >> place.  If I run netdump with this logic still in
>> place, I get the >> following messages from the tg3
driver>
>> >>
eth0> BUG! Tx Ring full when queue awake!
>> >> Shall I send a patch, or have I missed something?
>> 
mpm> I don't remember the origin or motivation of this bit, so I'm not sure
mpm> at the moment. Shoot me a patch and I'll poke at it.
>> What tree would you like me to base the patch on?  In absence of
>> response, it will be the latest -mm.

mpm> -mm is fine.
 
mpm> Btw, did I send you my thoughts on your recursion patch?
>> Yes, from your mail:
>> 
mpm> But I still don't like this. dev->poll() is liable to attempt to
mpm> recursively take its own driver lock again internally and we still
mpm> deadlock. Have we already seen recursion here? If we do, I think we
mpm> need to fix that in drivers. Meanwhile we should just bail here and
mpm> maybe set a "something bad happened" flag.
>> I'm not sure I follow.  Which lock are you concerned about deadlocking
>> on?

mpm> A random lock private to a given driver, for instance one taken on
mpm> entry to the IRQ handler. If said driver tries to do a printk inside
mpm> that lock and we recursively call the handler in netconsole, we're in
mpm> trouble.  These are the issues that will have to be cleaned up in
mpm> individual drivers. So far, I haven't seen any reports, but I'm pretty
mpm> sure such cases exist. I suppose it's also possible for us to disable
mpm> recursion in netconsole instead of at the netpoll level.

Recursion in netconsole is protected by the console semaphore.

-Jeff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: netpoll trapped question
  2004-09-07 16:53       ` Jeff Moyer
@ 2004-09-07 16:59         ` Matt Mackall
  2004-09-07 17:10           ` Jeff Moyer
  0 siblings, 1 reply; 11+ messages in thread
From: Matt Mackall @ 2004-09-07 16:59 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Tue, Sep 07, 2004 at 12:53:17PM -0400, Jeff Moyer wrote:
> mpm> A random lock private to a given driver, for instance one taken on
> mpm> entry to the IRQ handler. If said driver tries to do a printk inside
> mpm> that lock and we recursively call the handler in netconsole, we're in
> mpm> trouble.  These are the issues that will have to be cleaned up in
> mpm> individual drivers. So far, I haven't seen any reports, but I'm pretty
> mpm> sure such cases exist. I suppose it's also possible for us to disable
> mpm> recursion in netconsole instead of at the netpoll level.
> 
> Recursion in netconsole is protected by the console semaphore.

Yes, true. But we're still in trouble if we have nic irq handler ->
take private lock -> printk -> netconsole -> nic irq handler -> take
private lock. See?

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: netpoll trapped question
  2004-09-07 16:59         ` Matt Mackall
@ 2004-09-07 17:10           ` Jeff Moyer
  2004-09-07 17:22             ` Matt Mackall
  2004-09-08  8:22             ` Herbert Xu
  0 siblings, 2 replies; 11+ messages in thread
From: Jeff Moyer @ 2004-09-07 17:10 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel

==> Regarding Re: netpoll trapped question; Matt Mackall <mpm@selenic.com> adds:

mpm> On Tue, Sep 07, 2004 at 12:53:17PM -0400, Jeff Moyer wrote: A random
mpm> lock private to a given driver, for instance one taken on entry to the
mpm> IRQ handler. If said driver tries to do a printk inside that lock and
mpm> we recursively call the handler in netconsole, we're in trouble.
mpm> These are the issues that will have to be cleaned up in individual
mpm> drivers. So far, I haven't seen any reports, but I'm pretty sure such
mpm> cases exist. I suppose it's also possible for us to disable recursion
mpm> in netconsole instead of at the netpoll level.
>> Recursion in netconsole is protected by the console semaphore.

mpm> Yes, true. But we're still in trouble if we have nic irq handler ->
mpm> take private lock -> printk -> netconsole -> nic irq handler -> take
mpm> private lock. See?

Okay, so that one has to be addressed on a per-driver basis.  There's no
way for us to detect that situation.  And how do drivers address this?
Simply don't printk inside the lock?  I think that's reasonable.

-Jeff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: netpoll trapped question
  2004-09-07 17:10           ` Jeff Moyer
@ 2004-09-07 17:22             ` Matt Mackall
  2004-09-08  8:22             ` Herbert Xu
  1 sibling, 0 replies; 11+ messages in thread
From: Matt Mackall @ 2004-09-07 17:22 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Tue, Sep 07, 2004 at 01:10:00PM -0400, Jeff Moyer wrote:
> ==> Regarding Re: netpoll trapped question; Matt Mackall <mpm@selenic.com> adds:
> 
> mpm> On Tue, Sep 07, 2004 at 12:53:17PM -0400, Jeff Moyer wrote: A random
> mpm> lock private to a given driver, for instance one taken on entry to the
> mpm> IRQ handler. If said driver tries to do a printk inside that lock and
> mpm> we recursively call the handler in netconsole, we're in trouble.
> mpm> These are the issues that will have to be cleaned up in individual
> mpm> drivers. So far, I haven't seen any reports, but I'm pretty sure such
> mpm> cases exist. I suppose it's also possible for us to disable recursion
> mpm> in netconsole instead of at the netpoll level.
> >> Recursion in netconsole is protected by the console semaphore.
> 
> mpm> Yes, true. But we're still in trouble if we have nic irq handler ->
> mpm> take private lock -> printk -> netconsole -> nic irq handler -> take
> mpm> private lock. See?
> 
> Okay, so that one has to be addressed on a per-driver basis.  There's no
> way for us to detect that situation.  And how do drivers address this?
> Simply don't printk inside the lock?  I think that's reasonable.

That or drop the lock where possible.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: netpoll trapped question
  2004-09-07 17:10           ` Jeff Moyer
  2004-09-07 17:22             ` Matt Mackall
@ 2004-09-08  8:22             ` Herbert Xu
  2004-09-08  8:34               ` Herbert Xu
  1 sibling, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2004-09-08  8:22 UTC (permalink / raw)
  To: jmoyer; +Cc: mpm, linux-kernel

Jeff Moyer <jmoyer@redhat.com> wrote:
> 
> mpm> Yes, true. But we're still in trouble if we have nic irq handler ->
> mpm> take private lock -> printk -> netconsole -> nic irq handler -> take
> mpm> private lock. See?
> 
> Okay, so that one has to be addressed on a per-driver basis.  There's no
> way for us to detect that situation.  And how do drivers address this?
> Simply don't printk inside the lock?  I think that's reasonable.

Why not queue the message whenever you're in IRQ context, and only
print when you are not?
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: netpoll trapped question
  2004-09-08  8:22             ` Herbert Xu
@ 2004-09-08  8:34               ` Herbert Xu
  2004-09-08 11:54                 ` Jeff Moyer
  0 siblings, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2004-09-08  8:34 UTC (permalink / raw)
  To: Herbert Xu; +Cc: jmoyer, mpm, linux-kernel

Herbert Xu <herbert@gondor.apana.org.au> wrote:
> Jeff Moyer <jmoyer@redhat.com> wrote:
>> 
>> mpm> Yes, true. But we're still in trouble if we have nic irq handler ->
>> mpm> take private lock -> printk -> netconsole -> nic irq handler -> take
>> mpm> private lock. See?
>> 
>> Okay, so that one has to be addressed on a per-driver basis.  There's no
>> way for us to detect that situation.  And how do drivers address this?
>> Simply don't printk inside the lock?  I think that's reasonable.
> 
> Why not queue the message whenever you're in IRQ context, and only
> print when you are not?

Actually how can this happen at all? The IRQ handler should've disabled
local IRQs which prevents the second handler from occuring.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: netpoll trapped question
  2004-09-08  8:34               ` Herbert Xu
@ 2004-09-08 11:54                 ` Jeff Moyer
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff Moyer @ 2004-09-08 11:54 UTC (permalink / raw)
  To: Herbert Xu; +Cc: mpm, linux-kernel

==> Regarding Re: netpoll trapped question; Herbert Xu <herbert@gondor.apana.org.au> adds:

herbert> Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> Jeff Moyer <jmoyer@redhat.com> wrote:
>>>
mpm> Yes, true. But we're still in trouble if we have nic irq handler ->
mpm> take private lock -> printk -> netconsole -> nic irq handler -> take
mpm> private lock. See?
>>> Okay, so that one has to be addressed on a per-driver basis.  There's
>>> no way for us to detect that situation.  And how do drivers address
>>> this?  Simply don't printk inside the lock?  I think that's reasonable.

>> Why not queue the message whenever you're in IRQ context, and only print
>> when you are not?

This question has been answered before.  We don't want to defer the output
of say an oops or panic message, since we may never get back to process
context.

herbert> Actually how can this happen at all? The IRQ handler should've
herbert> disabled local IRQs which prevents the second handler from
herbert> occuring.

The netpoll infrastructure calls into the device's netpoll routine.  This
will, in turn, call the interrupt routine for the device.

-Jeff

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2004-09-08 11:57 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-08-31 17:10 netpoll trapped question Jeff Moyer
2004-09-06 21:35 ` Matt Mackall
2004-09-07 16:33   ` Jeff Moyer
2004-09-07 16:50     ` Matt Mackall
2004-09-07 16:53       ` Jeff Moyer
2004-09-07 16:59         ` Matt Mackall
2004-09-07 17:10           ` Jeff Moyer
2004-09-07 17:22             ` Matt Mackall
2004-09-08  8:22             ` Herbert Xu
2004-09-08  8:34               ` Herbert Xu
2004-09-08 11:54                 ` Jeff Moyer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox