Re: lockups with netconsole on e1000 on media insertion

public inbox for linux-kernel@vger.kernel.org

* Re: lockups with netconsole on e1000 on media insertion
       [not found] <42F347D2.7000207@home.se.suse.lists.linux.kernel>
@ 2005-08-05 11:45 ` Andi Kleen
  2005-08-05 12:44   ` John Bäckstrand
                     ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Andi Kleen @ 2005-08-05 11:45 UTC (permalink / raw)
  To: John Bäckstrand; +Cc: linux-kernel, netdev

John Bäckstrand <sandos@home.se> writes:

> I've been trying to hunt down a hard lockup issue with some hardware
> of mine, but I've possibly hit a kernel bug instead. When using
> netconsole on my e1000, if I unplug the cable and then re-plug it, the
> machine locks up hard. It manages to print the "link up" message on
> the screen, but nothing after that. Now, I wonder if this is supposed
> to be so? I tried this on 4 different configurations, 2.6.13-rc5 and
> 2.6.12 with and without "noapic acpi=off", same result on all of
> them. I've tried with 1 and 3 other NICs in the machine at the same
> time.

I ran into the same problem some time ago on e1000. The problem was
that if the link doesn't come up netconsole ends up waiting forever
for it.

The patch was for 2.6.12, did a quick untested port to 2.6.13rc5.

-Andi

Only try a limited number of times to send packets in netpoll

Avoids hangs on e1000 when link is not up.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/net/core/netpoll.c
===================================================================
--- linux.orig/net/core/netpoll.c
+++ linux/net/core/netpoll.c
@@ -247,9 +247,11 @@ static void netpoll_send_skb(struct netp
 {
 	int status;
 	struct netpoll_info *npinfo;
+	/* Only try 5 times in case the link is down etc. */
+	int try = 5;
 
 repeat:
-	if(!np || !np->dev || !netif_running(np->dev)) {
+	if(try-- == 0 || !np || !np->dev || !netif_running(np->dev)) {
 		__kfree_skb(skb);
 		return;
 	}
@@ -286,6 +288,9 @@ repeat:
 
 	/* transmit busy */
 	if(status) {
+		/* Don't count spinlock as try */
+		if (status == NETDEV_TX_LOCKED)
+			try++; 
 		netpoll_poll(np);
 		goto repeat;
 	}
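
The retry budget in the patch above can be sketched in user space as
follows. This is a minimal sketch under stated assumptions, not the
kernel code: the status codes, struct fake_dev, fake_xmit() and
send_bounded() are hypothetical stand-ins invented for illustration.
As in the patch, a "locked" result hands the try back, so only real
transmit failures use up the budget.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's transmit status codes. */
enum tx_status { TX_OK = 0, TX_BUSY = 1, TX_LOCKED = -1 };

/* Hypothetical device model: a scripted sequence of transmit results. */
struct fake_dev {
	const enum tx_status *script; /* results returned in order */
	int len;                      /* number of scripted results */
	int pos;                      /* next result to return */
};

static enum tx_status fake_xmit(struct fake_dev *dev)
{
	assert(dev && dev->len > 0);
	/* Once the script runs out, keep repeating its last entry. */
	if (dev->pos >= dev->len)
		return dev->script[dev->len - 1];
	return dev->script[dev->pos++];
}

/*
 * Returns true if the packet was sent, false if it was dropped because
 * the retry budget ran out. Lock contention does not consume a try.
 */
static bool send_bounded(struct fake_dev *dev, int budget)
{
	int tries = budget;

	for (;;) {
		enum tx_status status;

		if (tries-- == 0)
			return false; /* give up and drop the packet */

		status = fake_xmit(dev);
		if (status == TX_OK)
			return true;
		if (status == TX_LOCKED)
			tries++; /* contention is not a failed try */
	}
}
```

Note that, like the patch, this sketch still spins for as long as the
device keeps reporting "locked"; only busy results are bounded.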

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 11:45 ` lockups with netconsole on e1000 on media insertion Andi Kleen
@ 2005-08-05 12:44   ` John Bäckstrand
  2005-08-05 13:49   ` Steven Rostedt
  2005-08-05 20:12   ` Matt Mackall
  2 siblings, 0 replies; 38+ messages in thread
From: John Bäckstrand @ 2005-08-05 12:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, netdev

Andi Kleen wrote:
> The patch was for 2.6.12, did a quick untested port to 2.6.13rc5.
> 
> -Andi
> 
> Only try a limited number to send packets in netpoll

Thanks, worked nicely!

---
John Bäckstrand

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 11:45 ` lockups with netconsole on e1000 on media insertion Andi Kleen
  2005-08-05 12:44   ` John Bäckstrand
@ 2005-08-05 13:49   ` Steven Rostedt
  2005-08-05 13:55     ` Andi Kleen
  2005-08-07 21:12     ` lockups with netconsole on e1000 on media insertion John Bäckstrand
  2005-08-05 20:12   ` Matt Mackall
  2 siblings, 2 replies; 38+ messages in thread
From: Steven Rostedt @ 2005-08-05 13:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, netdev, linux-kernel, John Bäckstrand

On Fri, 2005-08-05 at 13:45 +0200, Andi Kleen wrote:
> John Bäckstrand <sandos@home.se> writes:
> 
> > I've been trying to hunt down a hard lockup issue with some hardware
> > of mine, but I've possibly hit a kernel bug instead. When using
> > netconsole on my e1000, if I unplug the cable and then re-plug it, the
> > machine locks up hard. It manages to print the "link up" message on
> > the screen, but nothing after that. Now, I wonder if this is supposed
> > to be so? I tried this on 4 different configurations, 2.6.13-rc5 and
> > 2.6.12 with and without "noapic acpi=off", same result on all of
> > them. I've tried with 1 and 3 other NICs in the machine at the same
> > time.
> 
> I ran into the same problem some time ago on e1000. The problem was
> that if the link doesn't come up netconsole ends up waiting forever
> for it.
> 
> The patch was for 2.6.12, did a quick untested port to 2.6.13rc5.
> 
> -Andi
> 
> Only try a limited number to send packets in netpoll
> 
> Avoids hangs on e1000 when link is not up.
> 
> Signed-off-by: Andi Kleen <ak@suse.de>
> 
> Index: linux/net/core/netpoll.c
> ===================================================================
> --- linux.orig/net/core/netpoll.c
> +++ linux/net/core/netpoll.c
> @@ -247,9 +247,11 @@ static void netpoll_send_skb(struct netp
>  {
>  	int status;
>  	struct netpoll_info *npinfo;
> +	/* Only try 5 times in case the link is down etc. */
> +	int try = 5;
>  
>  repeat:
> -	if(!np || !np->dev || !netif_running(np->dev)) {
> +	if(try-- == 0 || !np || !np->dev || !netif_running(np->dev)) {
>  		__kfree_skb(skb);
>  		return;
>  	}
> @@ -286,6 +288,9 @@ repeat:
>  
>  	/* transmit busy */
>  	if(status) {
> +		/* Don't count spinlock as try */
> +		if (status == NETDEV_TX_LOCKED)
> +			try++; 
>  		netpoll_poll(np);
>  		goto repeat;
>  	}
> -

This is fixing the symptom and is not the cure.  Unfortunately I don't
have an e1000 card so I can't try a fix. But I did have an e100 card that
would lock up the same way.  The problem was that netpoll_poll calls the
card's netpoll routine (in e1000_main.c e1000_netpoll).  In the e100
case, when the transmit buffer would fill up, the queue would go down.
But the netpoll routine in the e100 code never put it back up after it
was all transferred. So this would lock up the kernel when that happened.

I believe that the e1000 is suffering the same problem, but I can't fix
it since I don't have an e1000 to test, but what probably needs to be
done is to check to see if the transmit buffer can be cleaned and the
queue go back up.

e1000_netpoll calls e1000_intr which looks like this:

static irqreturn_t
e1000_intr(int irq, void *data, struct pt_regs *regs)
{
	struct net_device *netdev = data;
	struct e1000_adapter *adapter = netdev_priv(netdev);
	struct e1000_hw *hw = &adapter->hw;
	uint32_t icr = E1000_READ_REG(hw, ICR);
#ifndef CONFIG_E1000_NAPI
	unsigned int i;
#endif

	if(unlikely(!icr))
		return IRQ_NONE;  /* Not our interrupt */

^^^^^^^^
---- Here I'm wondering whether this is what gets returned in the netpoll case?


	if(unlikely(icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))) {
		hw->get_link_status = 1;
		mod_timer(&adapter->watchdog_timer, jiffies);
	}

#ifdef CONFIG_E1000_NAPI
	if(likely(netif_rx_schedule_prep(netdev))) {

		/* Disable interrupts and register for poll. The flush 
		  of the posted write is intentionally left out.
		*/

		atomic_inc(&adapter->irq_sem);
		E1000_WRITE_REG(hw, IMC, ~0);
		__netif_rx_schedule(netdev);
	}
#else
	/* Writing IMC and IMS is needed for 82547.
	   Due to Hub Link bus being occupied, an interrupt
	   de-assertion message is not able to be sent.
	   When an interrupt assertion message is generated later,
	   two messages are re-ordered and sent out.
	   That causes APIC to think 82547 is in de-assertion
	   state, while 82547 is in assertion state, resulting
	   in dead lock. Writing IMC forces 82547 into
	   de-assertion state.
	*/
	if(hw->mac_type == e1000_82547 || hw->mac_type == e1000_82547_rev_2){
		atomic_inc(&adapter->irq_sem);
		E1000_WRITE_REG(hw, IMC, ~0);
	}

	for(i = 0; i < E1000_MAX_INTR; i++)
		if(unlikely(!adapter->clean_rx(adapter) &
		   !e1000_clean_tx_irq(adapter)))
^^^^^
----  This should clean the transmit buffer, but it may not get here.

			break;

	if(hw->mac_type == e1000_82547 || hw->mac_type == e1000_82547_rev_2)
		e1000_irq_enable(adapter);
#endif

	return IRQ_HANDLED;
}



So maybe the patch should be something like:

--- linux-2.6.13-rc3/drivers/net/e1000/e1000_main.c.orig	2005-08-05 09:32:01.000000000 -0400
+++ linux-2.6.13-rc3/drivers/net/e1000/e1000_main.c	2005-08-05 09:33:56.000000000 -0400
@@ -3816,6 +3816,7 @@ e1000_netpoll(struct net_device *netdev)
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 	disable_irq(adapter->pdev->irq);
 	e1000_intr(adapter->pdev->irq, netdev, NULL);
+	e1000_clean_tx_irq(adapter);
 	enable_irq(adapter->pdev->irq);
 }
 #endif


I don't have the card, so I can't test it. But if this works (after
removing the previous patch) then this is the better solution.  If this
does work, then we should probably add the timeout in netpoll with a
warning that the netpoll of the driver is broken:

Here's a modified version of the other patch, so we know where the
problem is.

#### John, Delete this part if you apply the above. ####

--- linux-2.6.13-rc3/net/core/netpoll.c.orig	2005-08-05 09:37:00.000000000 -0400
+++ linux-2.6.13-rc3/net/core/netpoll.c	2005-08-05 09:44:19.000000000 -0400
@@ -247,9 +247,14 @@ static void netpoll_send_skb(struct netp
 {
 	int status;
 	struct netpoll_info *npinfo;
+	/* only try five times in case link is down */
+	int try=5;
 
 repeat:
-	if(!np || !np->dev || !netif_running(np->dev)) {
+	if(try-- == 0 || !np || !np->dev || !netif_running(np->dev)) {
+		if (!try)
+			printk(KERN_WARNING "net driver is stuck down, maybe a"
+					" problem with the driver's netpoll\n");
 		__kfree_skb(skb);
 		return;
 	}
@@ -286,6 +291,9 @@ repeat:
 
 	/* transmit busy */
 	if(status) {
+		/* Don't count spinlock as try */
+		if (status == NETDEV_TX_LOCKED)
+			try++;
 		netpoll_poll(np);
 		goto repeat;
 	}


-- Steve



* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 13:49   ` Steven Rostedt
@ 2005-08-05 13:55     ` Andi Kleen
  2005-08-05 14:10       ` Steven Rostedt
  2005-08-07 21:12     ` lockups with netconsole on e1000 on media insertion John Bäckstrand
  1 sibling, 1 reply; 38+ messages in thread
From: Andi Kleen @ 2005-08-05 13:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andi Kleen, Ingo Molnar, netdev, linux-kernel, John Bäckstrand

> This is fixing the symptom and is not the cure.  Unfortunately I don't
> have an e1000 card so I can't try a fix. But I did have an e100 card that
> would lock up the same way.  The problem was that netpoll_poll calls the
> card's netpoll routine (in e1000_main.c e1000_netpoll).  In the e100
> case, when the transmit buffer would fill up, the queue would go down.
> But the netpoll routine in the e100 code never put it back up after it
> was all transferred. So this would lock up the kernel when that happened.

In my case the hang happened when no cable was connected.

There is no way to handle this in any other way. You eventually
have to bail out.

>  
>  repeat:
> -	if(!np || !np->dev || !netif_running(np->dev)) {
> +	if(try-- == 0 || !np || !np->dev || !netif_running(np->dev)) {
> +		if (!try)
> +			printk(KERN_WARNING "net driver is stuck down, maybe a"
> +					" problem with the driver's netpoll\n");

... and nobody will see that. It will not even trigger an output.

-Andi


* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 13:55     ` Andi Kleen
@ 2005-08-05 14:10       ` Steven Rostedt
  2005-08-05 14:14         ` Andi Kleen
  0 siblings, 1 reply; 38+ messages in thread
From: Steven Rostedt @ 2005-08-05 14:10 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, netdev, linux-kernel, John Bäckstrand

On Fri, 2005-08-05 at 15:55 +0200, Andi Kleen wrote:
> > This is fixing the symptom and is not the cure.  Unfortunately I don't
> > have an e1000 card so I can't try a fix. But I did have an e100 card that
> > would lock up the same way.  The problem was that netpoll_poll calls the
> > card's netpoll routine (in e1000_main.c e1000_netpoll).  In the e100
> > case, when the transmit buffer would fill up, the queue would go down.
> > But the netpoll routine in the e100 code never put it back up after it
> > was all transferred. So this would lock up the kernel when that happened.
> 
> In my case the hang happened when no cable was connected.

But should come back when the cable is reconnected. 

OK, I admit, it shouldn't hang in the first place.

> 
> There is no way to handle this in any other way. You eventually
> have to bail out.
> 
> >  
> >  repeat:
> > -	if(!np || !np->dev || !netif_running(np->dev)) {
> > +	if(try-- == 0 || !np || !np->dev || !netif_running(np->dev)) {
> > +		if (!try)
> > +			printk(KERN_WARNING "net driver is stuck down, maybe a"
> > +					" problem with the driver's netpoll\n");
> 
> ... and nobody will see that. It will not even trigger an output.

Since one would be using net console right? :-)   Oops! I forgot that.
Well it may make it to the logs, since this patch also bails out.
That's why I think your first patch with this warning as well as a fix
for the e1000 should be submitted.  Since the e1000 shouldn't lock up
netpoll just because the queue was put down.

Hmm, how bad is it to have a printk in a routine that is registered to
printk?   If this does print, a "static once" variable should be added
so that this is only printed once and not every time it tries to print
this message.
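
The "static once" guard suggested here can be sketched in user space
as follows. It is a minimal illustration only: warn_stuck_once() and
the warnings_emitted counter are hypothetical names, fputs() stands in
for printk, and the sketch ignores the concurrency that real kernel
code would have to consider.

```c
#include <assert.h>
#include <stdio.h>

/* Counts how many times the warning was really written out. */
static int warnings_emitted;

static void warn_stuck_once(FILE *out)
{
	static int warned; /* zero-initialised, set after the first warning */

	assert(out != NULL);
	if (warned)
		return; /* already reported, stay quiet */
	warned = 1;
	warnings_emitted++;
	fputs("net driver is stuck down, maybe a problem with the driver's netpoll\n", out);
}
```

However many times the send path gives up, the message appears at most
once per boot of the program.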

-- Steve



* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 14:10       ` Steven Rostedt
@ 2005-08-05 14:14         ` Andi Kleen
  2005-08-05 14:27           ` Steven Rostedt
  2005-08-05 14:36           ` [PATCH] netpoll can lock up on low memory Steven Rostedt
  0 siblings, 2 replies; 38+ messages in thread
From: Andi Kleen @ 2005-08-05 14:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andi Kleen, Ingo Molnar, netdev, linux-kernel, John Bäckstrand

On Fri, Aug 05, 2005 at 10:10:13AM -0400, Steven Rostedt wrote:
> On Fri, 2005-08-05 at 15:55 +0200, Andi Kleen wrote:
> > > This is fixing the symptom and is not the cure.  Unfortunately I don't
> > > have an e1000 card so I can't try a fix. But I did have an e100 card that
> > > would lock up the same way.  The problem was that netpoll_poll calls the
> > > card's netpoll routine (in e1000_main.c e1000_netpoll).  In the e100
> > > case, when the transmit buffer would fill up, the queue would go down.
> > > But the netpoll routine in the e100 code never put it back up after it
> > > was all transferred. So this would lock up the kernel when that happened.
> > 
> > In my case the hang happened when no cable was connected.
> 
> But should come back when the cable is reconnected. 

Which might be never. Not an option.

> Hmm, how bad is it to have a printk in a routine that is registered to
> printk?   If this does print, a "static once" variable should be added
> so that this is only printed once and not everytime it tries to print
> this message.

printk notices it is recursing and will not try to output it.

-Andi

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 14:14         ` Andi Kleen
@ 2005-08-05 14:27           ` Steven Rostedt
  2005-08-05 14:36             ` David S. Miller
  2005-08-05 14:36           ` [PATCH] netpoll can lock up on low memory Steven Rostedt
  1 sibling, 1 reply; 38+ messages in thread
From: Steven Rostedt @ 2005-08-05 14:27 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, netdev, linux-kernel, John Bäckstrand

On Fri, 2005-08-05 at 16:14 +0200, Andi Kleen wrote:
> On Fri, Aug 05, 2005 at 10:10:13AM -0400, Steven Rostedt wrote:
> > On Fri, 2005-08-05 at 15:55 +0200, Andi Kleen wrote:
> > > > This is fixing the symptom and is not the cure.  Unfortunately I don't
> > > > have an e1000 card so I can't try a fix. But I did have an e100 card that
> > > > would lock up the same way.  The problem was that netpoll_poll calls the
> > > > card's netpoll routine (in e1000_main.c e1000_netpoll).  In the e100
> > > > case, when the transmit buffer would fill up, the queue would go down.
> > > > But the netpoll routine in the e100 code never put it back up after it
> > > > was all transferred. So this would lock up the kernel when that happened.
> > > 
> > > In my case the hang happened when no cable was connected.
> > 
> > But should come back when the cable is reconnected. 
> 
> Which might be never. Not an option.

Hey! You removed my admission to this. Don't make me look stupid
here ;-)

> 
> > Hmm, how bad is it to have a printk in a routine that is registered to
> > printk?   If this does print, a "static once" variable should be added
> > so that this is only printed once and not everytime it tries to print
> > this message.
> 
> printk notices it is recursing and will not try to output it.

Darn it, since this should really be reported.  Yes, the core netpoll
should bail out, but it is also a problem with the driver and should be
fixed.

Come to think of it, I should have submitted a patch that did what you
did when I discovered the problem with the e100. But that network card
was slow and could easily lock up when doing a sysrq-t.  I wasn't
removing cables, so I just submitted the fix for the e100, not thinking
that the netpoll shouldn't lock up itself.

-- Steve



* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 14:27           ` Steven Rostedt
@ 2005-08-05 14:36             ` David S. Miller
  2005-08-05 15:02               ` Steven Rostedt
  0 siblings, 1 reply; 38+ messages in thread
From: David S. Miller @ 2005-08-05 14:36 UTC (permalink / raw)
  To: rostedt; +Cc: ak, mingo, netdev, linux-kernel, sandos

From: Steven Rostedt <rostedt@goodmis.org>
Date: Fri, 05 Aug 2005 10:27:06 -0400

> Darn it, since this should really be reported.  Yes, the core netpoll
> should bail out, but it is also a problem with the driver and should be
> fixed.

I don't get how you can even remotely claim this to
be a problem with the driver.

If there is no cable plugged in, the link never comes
up, and that is a completely normal thing.  The netpoll
code should simply not try forever to wait for the link
to go up.

* [PATCH] netpoll can lock up on low memory.
  2005-08-05 14:14         ` Andi Kleen
  2005-08-05 14:27           ` Steven Rostedt
@ 2005-08-05 14:36           ` Steven Rostedt
  2005-08-05 20:01             ` Matt Mackall
  1 sibling, 1 reply; 38+ messages in thread
From: Steven Rostedt @ 2005-08-05 14:36 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Ingo Molnar, netdev, linux-kernel, John Bäckstrand

Looking at the netpoll routines, I noticed that the find_skb could
lock up if the memory is low.  This is because the allocations are called
with GFP_ATOMIC (since this is in interrupt context) and if it fails, it
will continue to fail. This is just by observing the code, I didn't have
this actually happen. So if this is not the case, please let me know how
it can get out. Otherwise, please accept this patch.  Also, as Andi told
me, the printk here would probably not show up anyway if this happens
with netconsole.

Here I changed it to break out instead of just looping.

-- Steve

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

--- linux-2.6.13-rc3/net/core/netpoll.c.orig	2005-08-05 09:37:00.000000000 -0400
+++ linux-2.6.13-rc3/net/core/netpoll.c	2005-08-05 10:29:32.000000000 -0400
@@ -229,8 +229,9 @@ repeat:
 	}

 	if(!skb) {
-		count++;
-		if (once && (count == 1000000)) {
+		if (count++ == 100000)
+			return NULL;
+		if (once) {
 			printk("out of netpoll skbs!\n");
 			once = 0;
 		}
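
The idea of breaking out of the allocation loop can be sketched in
user space as follows. This is a minimal model, not the netpoll code:
struct fake_pool, pool_take(), pool_poll() and find_buf_bounded() are
hypothetical names, and the pool simply counts buffers. Polling may or
may not hand buffers back, which is the case the discussion turns on.

```c
#include <assert.h>

/* Hypothetical buffer pool: counts only, no real memory involved. */
struct fake_pool {
	int free_bufs;     /* buffers available right now */
	int in_flight;     /* buffers still held by the device */
	int poll_reclaims; /* nonzero if polling hands finished buffers back */
};

/* Try to take one buffer; returns 1 on success, 0 if none is free. */
static int pool_take(struct fake_pool *p)
{
	if (p->free_bufs == 0)
		return 0;
	p->free_bufs--;
	p->in_flight++;
	return 1;
}

/* Give the device a chance to return one finished buffer. */
static void pool_poll(struct fake_pool *p)
{
	if (p->poll_reclaims && p->in_flight > 0) {
		p->in_flight--;
		p->free_bufs++;
	}
}

/* Returns 1 if a buffer was obtained, 0 after max_tries failed attempts. */
static int find_buf_bounded(struct fake_pool *p, long max_tries)
{
	long count;

	assert(p && max_tries > 0);
	for (count = 0; count < max_tries; count++) {
		if (pool_take(p))
			return 1;
		pool_poll(p); /* may release buffers the device is done with */
	}
	return 0; /* the caller has to cope with a dropped message */
}
```

When polling does reclaim buffers the loop succeeds on the second
pass; when it does not, the bound is the only way out.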

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 14:36             ` David S. Miller
@ 2005-08-05 15:02               ` Steven Rostedt
  0 siblings, 0 replies; 38+ messages in thread
From: Steven Rostedt @ 2005-08-05 15:02 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, mingo, netdev, linux-kernel, sandos

On Fri, 2005-08-05 at 07:36 -0700, David S. Miller wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> Date: Fri, 05 Aug 2005 10:27:06 -0400
> 
> > Darn it, since this should really be reported.  Yes, the core netpoll
> > should bail out, but it is also a problem with the driver and should be
> > fixed.
> 
> I don't get how you can even remotely claim this to
> be a problem with the driver.
> 
> If there is no cable plugged in, the link never comes
> up, and that is a completely normal thing.  The netpoll
> code should simply not try forever to wait for the link
> to go up.

You're right with that case. The problem with the driver is that it
doesn't clean up the transmits if it just happened to overflow the
transmit buffer and shut down the queue.  The netpoll should at least
see that the queue can be brought up again.  That's what I have a
problem with.  

In other words, I see two bugs:

1. The bug with the netpoll.  It locks up if the driver's queue is down
and never comes up. Which is fixed with Andi's patch.

2.  The bug with the driver. Its netpoll doesn't detect that the queue
can come back up again.  With the timeout on netpoll this may no longer
be a bug, since it should clean itself up after netpoll times out and
turns interrupts back on.  But if a timeout is avoidable by netpoll
being a little smarter, then I believe that it should be fixed.

Now do you understand where I'm coming from?

-- Steve

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-05 14:36           ` [PATCH] netpoll can lock up on low memory Steven Rostedt
@ 2005-08-05 20:01             ` Matt Mackall
  2005-08-05 20:57               ` Steven Rostedt
  2005-08-05 21:26               ` Andi Kleen
  0 siblings, 2 replies; 38+ messages in thread
From: Matt Mackall @ 2005-08-05 20:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andi Kleen, Andrew Morton, Ingo Molnar, netdev, linux-kernel,
	John Bäckstrand

On Fri, Aug 05, 2005 at 10:36:31AM -0400, Steven Rostedt wrote:
> Looking at the netpoll routines, I noticed that the find_skb could
> lock up if the memory is low. This is because the allocations are
> called with GFP_ATOMIC (since this is in interrupt context) and if
> it fails, it will continue to fail. This is just by observing the
> code, I didn't have this actually happen. So if this is not the
> case, please let me know how it can get out. Otherwise, please
> accept this patch.

By netpoll_poll() tickling the driver enough to free the currently
queued outgoing SKBs.

Also note that by the time we're in this loop, we're ready to take
desperate measures. We've already exhausted our private queue of SKBs
so we have no alternative but to keep kicking the driver until
something happens.

The netpoll philosophy is to assume that its traffic is an absolute
priority - it is better to potentially hang trying to deliver a panic
message than to give up and crash silently.

> Also, as Andi told me, the printk here would probably not show up
> anyway if this happens with netconsole.

That's fine. But in fact, it does show up occasionally - I've seen
it.

NAK'ed.

-- 
Mathematics is the supreme nostalgia of our time.

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 11:45 ` lockups with netconsole on e1000 on media insertion Andi Kleen
  2005-08-05 12:44   ` John Bäckstrand
  2005-08-05 13:49   ` Steven Rostedt
@ 2005-08-05 20:12   ` Matt Mackall
  2005-08-05 21:56     ` Andi Kleen
  2 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2005-08-05 20:12 UTC (permalink / raw)
  To: Andi Kleen; +Cc: John Bäckstrand, linux-kernel, netdev

On Fri, Aug 05, 2005 at 01:45:55PM +0200, Andi Kleen wrote:
> John Bäckstrand <sandos@home.se> writes:
> 
> > I've been trying to hunt down a hard lockup issue with some hardware
> > of mine, but I've possibly hit a kernel bug instead. When using
> > netconsole on my e1000, if I unplug the cable and then re-plug it, the
> > machine locks up hard. It manages to print the "link up" message on
> > the screen, but nothing after that. Now, I wonder if this is supposed
> > to be so? I tried this on 4 different configurations, 2.6.13-rc5 and
> > 2.6.12 with and without "noapic acpi=off", same result on all of
> > them. I've tried with 1 and 3 other NICs in the machine at the same
> > time.
> 
> I ran into the same problem some time ago on e1000. The problem was
> that if the link doesn't come up netconsole ends up waiting forever
> for it.

I still don't like this fix. Yes, you're right, it should eventually
give up. But here it gives up way too easily - 5 could easily
translate to 5 microseconds. This is analogous to giving up on serial
transmit if CTS is down for 5 loops.

I'd be much happier if there were some udelay or the like in here so
that we're not giving up on such a short timeframe.
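
The difference a delay makes can be sketched in user space as follows.
This is an illustration under assumptions: fake_udelay() only advances
a simulated clock, and struct fake_link, link_ready() and
send_with_delay() are hypothetical names. With a delay between tries,
the budget becomes a time bound of roughly tries times delay_us rather
than a handful of loop iterations.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in clock, advanced only by fake_udelay(). */
static unsigned long simulated_us;

static void fake_udelay(unsigned long us)
{
	simulated_us += us;
}

/* Hypothetical link model: ready once the clock reaches up_at_us. */
struct fake_link {
	unsigned long up_at_us;
};

static bool link_ready(const struct fake_link *l)
{
	return simulated_us >= l->up_at_us;
}

/* Try up to 'tries' times, waiting delay_us after each failed try. */
static bool send_with_delay(const struct fake_link *l, int tries,
			    unsigned long delay_us)
{
	assert(l && tries > 0);
	while (tries-- > 0) {
		if (link_ready(l))
			return true;
		fake_udelay(delay_us);
	}
	return false;
}
```

Five tries with no delay give up before the simulated clock moves at
all, while five tries spaced 50 microseconds apart cover a link that
needs 200 microseconds to come up.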

-- 
Mathematics is the supreme nostalgia of our time.

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-05 20:01             ` Matt Mackall
@ 2005-08-05 20:57               ` Steven Rostedt
  2005-08-05 21:28                 ` Matt Mackall
  2005-08-05 21:26               ` Andi Kleen
  1 sibling, 1 reply; 38+ messages in thread
From: Steven Rostedt @ 2005-08-05 20:57 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Andi Kleen, Andrew Morton, Ingo Molnar, netdev, linux-kernel,
	John Bäckstrand

On Fri, 2005-08-05 at 13:01 -0700, Matt Mackall wrote:
> On Fri, Aug 05, 2005 at 10:36:31AM -0400, Steven Rostedt wrote:
> > Looking at the netpoll routines, I noticed that the find_skb could
> > lock up if the memory is low. This is because the allocations are
> > called with GFP_ATOMIC (since this is in interrupt context) and if
> > it fails, it will continue to fail. This is just by observing the
> > code, I didn't have this actually happen. So if this is not the
> > case, please let me know how it can get out. Otherwise, please
> > accept this patch.
> 
> By netpoll_poll() tickling the driver enough to free the currently
> queued outgoing SKBs.

I believe that the e1000 won't free up any outgoing packets since the
netpoll call doesn't seem to get to the e1000_clean_tx part of the
e1000_intr, otherwise the system wouldn't lock under the
netpoll_send_skb when one disconnects the wire and puts it back in.  The
disconnect would lock it up anyway (with Andi's patch it now doesn't)
but since it won't come back up after the link is back up, there seems
to be something wrong with the e1000 netpoll driver.  This is because
the e1000_netpoll doesn't seem to be cleaning up the tx buffer and
starting the queue back up.

> 
> Also note that by the time we're in this loop, we're ready to take
> desperate measures. We've already exhausted our private queue of SKBs
> so we have no alternative but to keep kicking the driver until
> something happens.

OK, the system is under heavy memory load and starts eating up the
netpoll packets.  When the last packet is gone, and you have something
like the e1000 that doesn't clean up its packets with netpoll, then you
just locked up the system.

The scary part of this loop is that if the netpoll doesn't come up with
the goods, it's game over.  Say we are at desperate measures but it could
be a case where we need to output more information and we lock up here
before we can go out and free some memory.

> 
> The netpoll philosophy is to assume that its traffic is an absolute
> priority - it is better to potentially hang trying to deliver a panic
> message than to give up and crash silently.

So even a long timeout would not do?  So you don't even get a message to
the console?

> 
> > Also, as Andi told me, the printk here would probably not show up
> > anyway if this happens with netconsole.
> 
> That's fine. But in fact, it does show up occasionally - I've seen
> it.

Then maybe what Andi told me is not true ;-)

Oh, and did your machine crash when you saw it?  Have you seen it with
the e1000 driver?

> 
> NAK'ed.
> 
(ouch!)

OK, since my argument is currently only theory, and I don't have an e1000
card to test this on, I'll take out my fix to the e100 (where it cleaned
up its tx drivers in netpoll) and see if I can get the machine to
lock up here just by putting it under extreme memory loads.

-- Steve

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-05 20:01             ` Matt Mackall
  2005-08-05 20:57               ` Steven Rostedt
@ 2005-08-05 21:26               ` Andi Kleen
  2005-08-05 21:42                 ` Matt Mackall
                                   ` (2 more replies)
  1 sibling, 3 replies; 38+ messages in thread
From: Andi Kleen @ 2005-08-05 21:26 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Steven Rostedt, Andi Kleen, Andrew Morton, Ingo Molnar, netdev,
	linux-kernel, John Bäckstrand, davem

On Fri, Aug 05, 2005 at 01:01:57PM -0700, Matt Mackall wrote:
> The netpoll philosophy is to assume that its traffic is an absolute
> priority - it is better to potentially hang trying to deliver a panic
> message than to give up and crash silently.

That would be ok if netpoll was only used to deliver panics. But 
it is not. It delivers all messages, and you cannot hang the 
kernel during that. Actually even for panics it is wrong, because often
it is more important to reboot after a panic (with a panic timeout) than
to actually deliver the panic message. That's needed e.g. in a failover cluster.

If that was the policy it would be a quite dumb one and make netpoll
totally unsuitable for production use. I hope it is not.

> 
> > Also, as Andi told me, the printk here would probably not show up
> > anyway if this happens with netconsole.
> 
> That's fine. But in fact, it does show up occasionally - I've seen
> it.
> 
> NAK'ed.

Too bad. This would mean that all serious non-toy users of netpoll
would have to carry this patch on their own. But that wouldn't be good.

Dave, can you please apply the timeout patch anyways?

I suspect Steven's patch for the e1000 is needed in addition to
handle different cases too.

-Andi

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-05 20:57               ` Steven Rostedt
@ 2005-08-05 21:28                 ` Matt Mackall
  2005-08-06  0:23                   ` Steven Rostedt
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2005-08-05 21:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andi Kleen, Andrew Morton, Ingo Molnar, netdev, linux-kernel,
	John Bäckstrand

On Fri, Aug 05, 2005 at 04:57:00PM -0400, Steven Rostedt wrote:
> On Fri, 2005-08-05 at 13:01 -0700, Matt Mackall wrote:
> > On Fri, Aug 05, 2005 at 10:36:31AM -0400, Steven Rostedt wrote:
> > > Looking at the netpoll routines, I noticed that the find_skb could
> > > > lock up if the memory is low. This is because the allocations are
> > > called with GFP_ATOMIC (since this is in interrupt context) and if
> > > it fails, it will continue to fail. This is just by observing the
> > > code, I didn't have this actually happen. So if this is not the
> > > case, please let me know how it can get out. Otherwise, please
> > > accept this patch.
> > 
> > By netpoll_poll() tickling the driver enough to free the currently
> > queued outgoing SKBs.
> 
> I believe that the e1000 won't free up any outgoing packets since the
> netpoll call doesn't seem to get to the e1000_clean_tx part of the
> e1000_intr, otherwise the system wouldn't lock under the
> netpoll_send_skb when one disconnects the wire and puts it back in.  The
> disconnect would lock it up anyway (with Andi's patch it now doesn't)
> but since it won't come back up after the link is back up, there seems
> to be something wrong with the e1000 netpoll driver.  This is because
> the e1000_netpoll doesn't seem to be cleaning up the tx buffer and
> starting the queue back up.

That does seem like a driver problem.

> > Also note that by the time we're in this loop, we're ready to take
> > desperate measures. We've already exhausted our private queue of SKBs
> > so we have no alternative but to keep kicking the driver until
> > something happens.
> 
> OK, the system is under heavy memory load and starts eating up the
> netpoll packets.  When the last packet is gone, and you have something
> like the e1000 that doesn't clean up its packets with netpoll, then you
> just locked up the system.
> 
> The scary part of this loop is that if the netpoll doesn't come up with
> the goods, it's game over.  Say we are at desperate measures but it could
> be a case where we need to output more information and lockup here
> before we can go out and free some memory. 

Realistically, we were probably going to crash anyway at this point as
we're apparently failing to recycle SKBs.

Netpoll generally must assume it won't get a second chance, as it's
being called by things like oops() and panic() and used by things like
kgdb. If netpoll fails, the box is dead anyway.

> > The netpoll philosophy is to assume that its traffic is an absolute
> > priority - it is better to potentially hang trying to deliver a panic
> > message than to give up and crash silently.
> 
> So even a long timeout would not do?  So you don't even get a message to
> the console?

In general, there's no way to measure time here. And if we're
using netconsole, what makes you think there's any other console?

> > > Also, as Andi told me, the printk here would probably not show up
> > > anyway if this happens with netconsole.
> > 
> > That's fine. But in fact, it does show up occasionally - I've seen
> > it.
> 
> Then maybe what Andi told me is not true ;-)
> 
> Oh, and did your machine crash when you saw it?  Have you seen it with
> the e1000 driver?

No and no. Most of my own testing is done with tg3.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-05 21:26               ` Andi Kleen
@ 2005-08-05 21:42                 ` Matt Mackall
  2005-08-05 21:51                   ` Andi Kleen
  2005-08-06  0:30                 ` Steven Rostedt
  2005-08-06  7:45                 ` Ingo Molnar
  2 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2005-08-05 21:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Steven Rostedt, Andrew Morton, Ingo Molnar, netdev, linux-kernel,
	John Bäckstrand, davem

On Fri, Aug 05, 2005 at 11:26:10PM +0200, Andi Kleen wrote:
> On Fri, Aug 05, 2005 at 01:01:57PM -0700, Matt Mackall wrote:
> > The netpoll philosophy is to assume that its traffic is an absolute
> > priority - it is better to potentially hang trying to deliver a panic
> > message than to give up and crash silently.
> 
> That would be ok if netpoll was only used to deliver panics. But 
> it is not. It delivers all messages, and you cannot hang the 
> kernel during that. Actually even for panics it is wrong, because often
> it is more important to reboot in a panic than (with a panic timeout) 
> to actually deliver the panic. That's needed e.g. in a failover cluster.
> 
> If that was the policy it would be a quite dumb one and make netpoll
> totally unsuitable for production use. I hope it is not.

Suggest you rip __GFP_NOFAIL out of JBD before complaining about this.

> Dave, can you please apply the timeout patch anyways?

Yes, let's go right over the maintainer's objections almost
immediately after he chimes into the thread. I'll spare the list the
colorful language this inspires.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-05 21:42                 ` Matt Mackall
@ 2005-08-05 21:51                   ` Andi Kleen
  2005-08-06  1:16                     ` Matt Mackall
  0 siblings, 1 reply; 38+ messages in thread
From: Andi Kleen @ 2005-08-05 21:51 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Andi Kleen, Steven Rostedt, Andrew Morton, Ingo Molnar, netdev,
	linux-kernel, John Bäckstrand, davem

> > If that was the policy it would be a quite dumb one and make netpoll
> > totally unsuitable for production use. I hope it is not.
> 
> Suggest you rip __GFP_NOFAIL out of JBD before complaining about this.

So you're suggesting we should become as bad at handling networking
errors as we are at handling IO errors?  

> > Dave, can you please apply the timeout patch anyways?
> 
> Yes, let's go right over the maintainer's objections almost
> immediately after he chimes into the thread. I'll spare the list the
> colorful language this inspires.

Sure, when the maintainer has an unreasonable position on something
I think that's justified. Yours in this case is clearly unreasonable.

-Andi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 20:12   ` Matt Mackall
@ 2005-08-05 21:56     ` Andi Kleen
  2005-08-05 23:20       ` Matt Mackall
  0 siblings, 1 reply; 38+ messages in thread
From: Andi Kleen @ 2005-08-05 21:56 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Andi Kleen, John Bäckstrand, linux-kernel, netdev

> I still don't like this fix. Yes, you're right, it should eventually
> give up. But here it gives up way too easily - 5 could easily
> translate to 5 microseconds. This is analogous to giving up on serial
> transmit if CTS is down for 5 loops.
> 
> I'd be much happier if there were some udelay or the like in here so
> that we're not giving up on such a short timeframe.

Problem is that it could translate to a long aggregate delay
e.g. when the kernel tries to dump the backlog after console_init.
That is why I made the delay so short.

Longer delay would be possible, but then it would need some logic
to detect down links and not delay on them and then retry later etc.
Would be all far more complicated.

-Andi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 21:56     ` Andi Kleen
@ 2005-08-05 23:20       ` Matt Mackall
  2005-08-05 23:51         ` Andi Kleen
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2005-08-05 23:20 UTC (permalink / raw)
  To: Andi Kleen; +Cc: John Bäckstrand, linux-kernel, netdev

On Fri, Aug 05, 2005 at 11:56:50PM +0200, Andi Kleen wrote:
> > I still don't like this fix. Yes, you're right, it should eventually
> > give up. But here it gives up way too easily - 5 could easily
> > translate to 5 microseconds. This is analogous to giving up on serial
> > transmit if CTS is down for 5 loops.
> > 
> > I'd be much happier if there were some udelay or the like in here so
> > that we're not giving up on such a short timeframe.
> 
> Problem is that it could translate to a long aggregate delay
> e.g. when the kernel tries to dump the backlog after console_init.
> That is why I made the delay so short.

But why are we in a hurry to dump the backlog on the floor? Why are we
worrying about the performance of netpoll without the cable plugged in
at all? We shouldn't be optimizing the data loss case.

My primary concern here is that the loop have a non-negligible extent
in time. 5 loops is effectively equal to none. I'd be very surprised
if it was even enough for deglitching.

With serial console, we do polled I/O that runs at the serial rate -
milliseconds per line of output.

> Longer delay would be possible, but then it would need some logic
> to detect down links and not delay on them and then retry later etc.
> Would be all far more complicated.

I think we could probably have subsequent failures be much shorter
without too much added complexity. But I'm not sure it matters.
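
A minimal sketch of the "non-trivial first timeout, much shorter subsequent
failures" idea, written as plain userspace C rather than as a netpoll patch.
Everything in it is an assumption made for illustration: send_bounded(),
struct tx_state, the budget values and the test doubles are hypothetical, and
delay_us stands in for something like udelay().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunables; none of these values come from the thread. */
#define FIRST_BUDGET_US 10000	/* wait up to ~10 ms on the first failure */
#define LATER_BUDGET_US   100	/* once the link is known bad, give up fast */
#define POLL_STEP_US       50	/* delay between transmit attempts */

struct tx_state {
	bool link_seen_down;	/* set once a send has timed out */
};

typedef bool (*xmit_fn)(void *ctx);		/* true if the packet went out */
typedef void (*delay_fn)(unsigned int usecs);	/* stand-in for udelay() */

/* Retry until the packet is sent or the current time budget is used up. */
static bool send_bounded(struct tx_state *st, xmit_fn try_xmit, void *ctx,
			 delay_fn delay_us)
{
	unsigned int budget = st->link_seen_down ? LATER_BUDGET_US
						 : FIRST_BUDGET_US;
	unsigned int waited = 0;

	for (;;) {
		if (try_xmit(ctx)) {
			/* Link works again: restore the long budget. */
			st->link_seen_down = false;
			return true;
		}
		if (waited >= budget)
			break;
		delay_us(POLL_STEP_US);
		waited += POLL_STEP_US;
	}

	/* Remember the failure so later messages do not stall as long. */
	st->link_seen_down = true;
	return false;
}

/* Test doubles: a delay that only counts time, and fixed link states. */
static unsigned int total_delay_us;
static void fake_delay(unsigned int usecs) { total_delay_us += usecs; }
static bool link_down(void *ctx) { (void)ctx; return false; }
static bool link_up(void *ctx) { (void)ctx; return true; }
```

With the link down, the first call spends the whole FIRST_BUDGET_US before
giving up; later calls spend only LATER_BUDGET_US each until a send succeeds
again, which keeps the aggregate delay small when a backlog is dumped.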

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 23:20       ` Matt Mackall
@ 2005-08-05 23:51         ` Andi Kleen
  2005-08-06  1:22           ` Matt Mackall
  0 siblings, 1 reply; 38+ messages in thread
From: Andi Kleen @ 2005-08-05 23:51 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Andi Kleen, John Bäckstrand, linux-kernel, netdev

> But why are we in a hurry to dump the backlog on the floor? Why are we
> worrying about the performance of netpoll without the cable plugged in
> at all? We shouldn't be optimizing the data loss case.

Because a system shouldn't stall for minutes (or forever like right now) 
at boot just because the network cable isn't plugged in.

> 
> My primary concern here is that the loop have a non-negligible extent
> in time. 5 loops is effectively equal to none. I'd be very surprised
> if it was even enough for deglitching.

In the normal case the packets should just be sent out.

-Andi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-05 21:28                 ` Matt Mackall
@ 2005-08-06  0:23                   ` Steven Rostedt
  2005-08-06  1:53                     ` Matt Mackall
  0 siblings, 1 reply; 38+ messages in thread
From: Steven Rostedt @ 2005-08-06  0:23 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Andi Kleen, Andrew Morton, Ingo Molnar, netdev, linux-kernel,
	John Bäckstrand

On Fri, 2005-08-05 at 14:28 -0700, Matt Mackall wrote:
> 
> Netpoll generally must assume it won't get a second chance, as it's
> being called by things like oops() and panic() and used by things like
> kgdb. If netpoll fails, the box is dead anyway.
> 

But it is also being called by every printk in the kernel. What happens
when the printk that causes this lock up is not a panic but just some
info print.  One would use netconsole when they turn on more verbose
printing, to keep the output fast, right?  So if the system gets a
little memory tight, but not to the point of failing, this will cause a
lock up and no one would know why. 

If you need to really get the data out, then the design should be
changed.  Have some return value showing the failure, check for
oops_in_progress or whatever, and try again after turning interrupts
back on, and getting to a point where the system can free up memory
(write to swap, etc).  Just a busy loop without ever getting a skb is
just bad.
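
One way to read the suggestion above, as a hedged userspace sketch and not a
patch against netpoll: report failure to the caller for ordinary messages,
but keep polling while an oops is in progress. The names
get_buffer_or_fail(), oops_in_progress_model, MAX_TRIES_NORMAL and the
fake_* helpers are hypothetical; the flag only models the kernel's
oops_in_progress.

```c
#include <stdbool.h>
#include <errno.h>

/* Models the kernel's oops_in_progress flag for this sketch only. */
static int oops_in_progress_model;

#define MAX_TRIES_NORMAL 5	/* hypothetical bound for ordinary printk traffic */

typedef bool (*alloc_fn)(void *ctx);	/* true when a buffer was obtained */
typedef void (*poll_fn)(void *ctx);	/* prod the driver to recycle buffers */

/*
 * Returns 0 once a buffer is available, or -ENOMEM when giving up.
 * Outside an oops the caller gets an error after a few tries and can
 * retry later; during an oops there may be no "later", so keep polling.
 */
static int get_buffer_or_fail(alloc_fn try_alloc, poll_fn poll, void *ctx)
{
	int tries = 0;

	while (!try_alloc(ctx)) {
		if (!oops_in_progress_model && ++tries >= MAX_TRIES_NORMAL)
			return -ENOMEM;
		poll(ctx);
	}
	return 0;
}

/* Test double: allocation starts working after polls_needed polls. */
struct fake_dev {
	int polls_needed;
	int polls_done;
};

static bool fake_alloc(void *ctx)
{
	struct fake_dev *d = ctx;
	return d->polls_done >= d->polls_needed;
}

static void fake_poll(void *ctx)
{
	struct fake_dev *d = ctx;
	d->polls_done++;
}
```

Note that the oops path in this sketch can still spin forever if the driver
never frees anything, so it does not settle the timeout question on its own.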

> > > The netpoll philosophy is to assume that its traffic is an absolute
> > > priority - it is better to potentially hang trying to deliver a panic
> > > message than to give up and crash silently.
> > 
> > So even a long timeout would not do?  So you don't even get a message to
> > the console?
> 
> In general, there's no way to measure time here. And if we're
> using netconsole, what makes you think there's any other console?

Why assume that there isn't another console?  The screen may be used
with netconsole, you just lose whatever has been scrolled too far.

> 
> > > > Also, as Andi told me, the printk here would probably not show up
> > > > anyway if this happens with netconsole.
> > > 
> > > That's fine. But in fact, it does show up occasionally - I've seen
> > > it.
> > 
> > Then maybe what Andi told me is not true ;-)
> > 
> > Oh, and did your machine crash when you saw it?  Have you seen it with
> > the e1000 driver?
> 
> No and no. Most of my own testing is done with tg3.
> 

If you saw the message and the system didn't crash, then that's proof
that if the driver is not working properly, you would have locked up the
system, and the system was _not_ in a state that it _had_ to get the
message out.

-- Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-05 21:26               ` Andi Kleen
  2005-08-05 21:42                 ` Matt Mackall
@ 2005-08-06  0:30                 ` Steven Rostedt
  2005-08-06  7:45                 ` Ingo Molnar
  2 siblings, 0 replies; 38+ messages in thread
From: Steven Rostedt @ 2005-08-06  0:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Matt Mackall, Andrew Morton, Ingo Molnar, netdev, linux-kernel,
	John Bäckstrand, davem

On Fri, 2005-08-05 at 23:26 +0200, Andi Kleen wrote:

> I suspect Steven's patch for the e1000 is needed in addition to
> handle different cases too.
> 

I haven't tested it. Someone with an e1000 must see if it works. I
submitted the e100 fix that had the same problem, but I would feel
better if the patch I sent for the e1000 actually got tested.

To test, one would set up a box with the e1000 and netconsole. Run with
something doing several printks (possibly using sysrq-t or such), and
then unplug the cable (without Andi's patch) and replug it back in. If
the patch worked, the system would hang while the cable was detached,
but come back shortly after the cable was plugged back in.

-- Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-05 21:51                   ` Andi Kleen
@ 2005-08-06  1:16                     ` Matt Mackall
  0 siblings, 0 replies; 38+ messages in thread
From: Matt Mackall @ 2005-08-06  1:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Steven Rostedt, Andrew Morton, Ingo Molnar, netdev, linux-kernel,
	John Bäckstrand, davem

On Fri, Aug 05, 2005 at 11:51:18PM +0200, Andi Kleen wrote:
> > > If that was the policy it would be a quite dumb one and make netpoll
> > > totally unsuitable for production use. I hope it is not.
> > 
> > Suggest you rip __GFP_NOFAIL out of JBD before complaining about this.
> 
> So you're suggesting we should become as bad at handling networking
> errors as we are at handling IO errors?  

No, I'm suggesting that the machine will hang forever sometimes and no
amount of patching up netpoll or JBD, etc. will fix that. A hardware
watchdog is a requirement for robust failover anyway and if you think
otherwise, you're dreaming.

And for reference, both are examples of theoretical
should-never-happen memory allocation failures.

> > > Dave, can you please apply the timeout patch anyways?
> > 
> > Yes, let's go right over the maintainer's objections almost
> > immediately after he chimes into the thread. I'll spare the list the
> > colorful language this inspires.
> 
> Sure, when the maintainer has an unreasonable position on something
> I think that's justified. Yours in this case is clearly unreasonable.

What's clear is that you didn't like my position from my very first
post in this thread and immediately went for the nuclear option
without even trying to discuss it.

Are you even aware that the patch we're discussing here is for a problem
that has yet to be observed and that Steven's initial analysis had
missed a couple things?

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 23:51         ` Andi Kleen
@ 2005-08-06  1:22           ` Matt Mackall
  2005-08-06  1:37             ` Daniel Phillips
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2005-08-06  1:22 UTC (permalink / raw)
  To: Andi Kleen; +Cc: John Bäckstrand, linux-kernel, netdev

On Sat, Aug 06, 2005 at 01:51:22AM +0200, Andi Kleen wrote:
> > But why are we in a hurry to dump the backlog on the floor? Why are we
> > worrying about the performance of netpoll without the cable plugged in
> > at all? We shouldn't be optimizing the data loss case.
> 
> Because a system shouldn't stall for minutes (or forever like right now) 
> at boot just because the network cable isn't plugged in.

Using netconsole without a network cable could well be classified as a
serious configuration error. NFS also is a bit sluggish without a
network cable.

I've already agreed that forever is a problem. Can we work towards
agreeing on a non-trivial timeout, please?

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-06  1:22           ` Matt Mackall
@ 2005-08-06  1:37             ` Daniel Phillips
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Phillips @ 2005-08-06  1:37 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Andi Kleen, John Bäckstrand, linux-kernel, netdev

On Saturday 06 August 2005 11:22, Matt Mackall wrote:
> On Sat, Aug 06, 2005 at 01:51:22AM +0200, Andi Kleen wrote:
> > > But why are we in a hurry to dump the backlog on the floor? Why are we
> > > worrying about the performance of netpoll without the cable plugged in
> > > at all? We shouldn't be optimizing the data loss case.
> >
> > Because a system shouldn't stall for minutes (or forever like right now)
> > at boot just because the network cable isn't plugged in.
>
> Using netconsole without a network cable could well be classified as a
> serious configuration error.

But please don't.  An OS that slows to a crawl or crashes because a cable 
isn't plugged in is an OS that deserves to be ridiculed.  Silly timeouts on boot
are scary and a waste of user's time.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-06  0:23                   ` Steven Rostedt
@ 2005-08-06  1:53                     ` Matt Mackall
  2005-08-06  2:32                       ` Steven Rostedt
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2005-08-06  1:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andi Kleen, Andrew Morton, Ingo Molnar, netdev, linux-kernel,
	John Bäckstrand

On Fri, Aug 05, 2005 at 08:23:55PM -0400, Steven Rostedt wrote:
> On Fri, 2005-08-05 at 14:28 -0700, Matt Mackall wrote:
> > 
> > Netpoll generally must assume it won't get a second chance, as it's
> > being called by things like oops() and panic() and used by things like
> > kgdb. If netpoll fails, the box is dead anyway.
> 
> But it is also being called by every printk in the kernel. What happens
> when the printk that causes this lock up is not a panic but just some
> info print.  One would use netconsole when they turn on more verbose
> printing, to keep the output fast, right?  So if the system gets a
> little memory tight, but not to the point of failing, this will cause a
> lock up and no one would know why. 

This doesn't happen, or if it does, it happens far less often than
genuine crashes. This can only happen when netpoll's burned through
its entire pool of SKBs in a single interrupt and the card never
releases them, despite repeated prodding. In other words, you'll get
most of a dump out of the box, but you're probably screwed no matter
what you do.

Also note that _any_ printk can be the kernel's dying breath. This is
why for both serial and video we do polled I/O to be sure we actually
get our message out. Netconsole is no different.

> If you need to really get the data out, then the design should be
> changed.  Have some return value showing the failure, check for
> oops_in_progress or whatever, and try again after turning interrupts
> back on, and getting to a point where the system can free up memory
> (write to swap, etc).  Just a busy loop without ever getting a skb is
> just bad.

Why, pray tell, do you think there will be a second chance after
re-enabling interrupts? How does this work when we're panicking or
oopsing where we most care? How does this work when the netpoll client
is the kernel debugger and the machine is completely stopped because
we're tracing it?

As for busy loops, let me direct you to the "poll" part of the name.
It is in fact the whole point.

> > > So even a long timeout would not do?  So you don't even get a message to
> > > the console?
> > 
> > In general, there's no way to measure time here. And if we're
> > using netconsole, what makes you think there's any other console?
> 
> Why assume that there isn't another console?  The screen may be used
> with netconsole, you just lose whatever has been scrolled too far.

Yes, there may be another console, but we should by no means depend on
that being the case. We should in fact assume it's not.

> > > > > Also, as Andi told me, the printk here would probably not show up
> > > > > anyway if this happens with netconsole.
> > > > 
> > > > That's fine. But in fact, it does show up occasionally - I've seen
> > > > it.
> > > 
> > > Then maybe what Andi told me is not true ;-)
> > > 
> > > Oh, and did your machine crash when you saw it?  Have you seen it with
> > > the e1000 driver?
> > 
> > No and no. Most of my own testing is done with tg3.
> > 
> 
> If you saw the message and the system didn't crash, then that's proof
> that if the driver is not working properly, you would have locked up the
> system, and the system was _not_ in a state that it _had_ to get the
> message out.

Let me be more precise. I've seen it in the middle of an oops dump,
where it complained, then made further progress, and then died. In
other words, the code works. And I've since upped the pool size.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-06  1:53                     ` Matt Mackall
@ 2005-08-06  2:32                       ` Steven Rostedt
  2005-08-06  7:30                         ` Daniel Phillips
                                           ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Steven Rostedt @ 2005-08-06  2:32 UTC (permalink / raw)
  To: Matt Mackall
  Cc: David S. Miller, Andi Kleen, Andrew Morton, Ingo Molnar, netdev,
	linux-kernel, John Bäckstrand

On Fri, 2005-08-05 at 18:53 -0700, Matt Mackall wrote:
> On Fri, Aug 05, 2005 at 08:23:55PM -0400, Steven Rostedt wrote:
[...]
> > If you need to really get the data out, then the design should be
> > changed.  Have some return value showing the failure, check for
> > oops_in_progress or whatever, and try again after turning interrupts
> > back on, and getting to a point where the system can free up memory
> > (write to swap, etc).  Just a busy loop without ever getting a skb is
> > just bad.
> 
> Why, pray tell, do you think there will be a second chance after
> re-enabling interrupts? How does this work when we're panicking or
> oopsing where we most care? How does this work when the netpoll client
> is the kernel debugger and the machine is completely stopped because
> we're tracing it?

What I meant was to check for an oops and maybe then don't break out.
Otherwise let the system try to reclaim memory, since this locks up
when alloc_skb is called with GFP_ATOMIC and fails.

> 
> As for busy loops, let me direct you to the "poll" part of the name.
> It is in fact the whole point.

In the kernel I would think that a poll would probe for an event and let
the system continue if the event hasn't arrived.  Not block all
activities until an event has arrived.

> 
> > > > So even a long timeout would not do?  So you don't even get a message to
> > > > the console?
> > > 
> > > In general, there's no way to measure time here. And if we're
> > > using netconsole, what makes you think there's any other console?
> > 
> > Why assume that there isn't another console?  The screen may be used
> > with netconsole, you just lose whatever has been scrolled too far.
> 
> Yes, there may be another console, but we should by no means depend on
> that being the case. We should in fact assume it's not.
> 
> > > > > > Also, as Andi told me, the printk here would probably not show up
> > > > > > anyway if this happens with netconsole.
> > > > > 
> > > > > > That's fine. But in fact, it does show up occasionally - I've seen
> > > > > it.
> > > > 
> > > > Then maybe what Andi told me is not true ;-)
> > > > 
> > > > Oh, and did your machine crash when you saw it?  Have you seen it with
> > > > the e1000 driver?
> > > 
> > > No and no. Most of my own testing is done with tg3.
> > > 
> > 
> > If you saw the message and the system didn't crash, then that's proof
> > that if the driver is not working properly, you would have locked up the
> > system, and the system was _not_ in a state that it _had_ to get the
> > message out.
> 
> Let me be more precise. I've seen it in the middle of an oops dump,
> where it complained, then made further progress, and then died. In
> other words, the code works. And I've since upped the pool size.

OK, this is more clear than what you said previously.  When I asked if
the system crashed, I should have asked if the system was crashing.  I
thought that you meant that you saw this in normal activity with no
oops.

So, if anything, this discussion has pointed out that the e1000 has a
problem with its netpoll.  I wrote an earlier patch, but since I don't
own an e1000, someone will need to test it, or at least check to see if
it looks OK.

-- Steve



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-06  2:32                       ` Steven Rostedt
@ 2005-08-06  7:30                         ` Daniel Phillips
  2005-08-06  7:58                         ` Ingo Molnar
  2005-08-06  9:46                         ` David S. Miller
  2 siblings, 0 replies; 38+ messages in thread
From: Daniel Phillips @ 2005-08-06  7:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Matt Mackall, David S. Miller, Andi Kleen, Andrew Morton,
	Ingo Molnar, netdev, linux-kernel, John Bäckstrand

On Saturday 06 August 2005 12:32, Steven Rostedt wrote:
> > > If you need to really get the data out, then the design should be
> > > changed.  Have some return value showing the failure, check for
> > > oops_in_progress or whatever, and try again after turning interrupts
> > > back on, and getting to a point where the system can free up memory
> > > (write to swap, etc).  Just a busy loop without ever getting a skb is
> > > just bad.
> >
> > Why, pray tell, do you think there will be a second chance after
> > re-enabling interrupts? How does this work when we're panicking or
> > oopsing where we most care? How does this work when the netpoll client
> > is the kernel debugger and the machine is completely stopped because
> > we're tracing it?
>
> What I meant was to check for an oops and maybe then don't break out.
> Otherwise let the system try to reclaim memory, since this locks up
> when alloc_skb is called with GFP_ATOMIC and fails.

You might want to take a look at my stupid little __GFP_MEMALLOC hack in the 
network block IO deadlock thread on netdev.  It will let you use the memalloc 
reserve from atomic context.  As long as you can be sure your usage will be 
bounded and you will eventually give it back, this should be ok.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-05 21:26               ` Andi Kleen
  2005-08-05 21:42                 ` Matt Mackall
  2005-08-06  0:30                 ` Steven Rostedt
@ 2005-08-06  7:45                 ` Ingo Molnar
  2005-08-06 11:29                   ` Andi Kleen
  2 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2005-08-06  7:45 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Matt Mackall, Steven Rostedt, Andrew Morton, netdev, linux-kernel,
	John Bäckstrand, davem

* Andi Kleen <ak@suse.de> wrote:

> On Fri, Aug 05, 2005 at 01:01:57PM -0700, Matt Mackall wrote:
> > The netpoll philosophy is to assume that its traffic is an absolute
> > priority - it is better to potentially hang trying to deliver a panic
> > message than to give up and crash silently.
> 
> That would be ok if netpoll was only used to deliver panics. But it is 
> not. It delivers all messages, and you cannot hang the kernel during 
> that. Actually even for panics it is wrong, because often it is more 
> important to reboot in a panic than (with a panic timeout) to actually 
> deliver the panic. That's needed e.g. in a failover cluster.

without going into the merits of this discussion, reliable failover 
clusters must include (and do include) an external ability to cut power.  
No amount of in-kernel logic will prevent the kernel from hanging, given 
a bad enough kernel bug.

So the right question is not 'can we prevent the kernel from hanging, 
ever' (we cannot), but 'which change makes it less likely for the kernel 
to hang'. (and, obviously: assuming all other kernel components are 
functioning per specification, netpoll itself must not hang :-)

even a plain printk to VGA can hang in certain kernel crashes. Netpoll 
is more complex and thus has more exposure to hangs. E.g. netpoll relies 
on the network driver to correctly recycle skbs within a bound amount of 
time. If the network driver leaks skbs, it's game over for netpoll.

[ i'd prefer a hang over nondeterministic behavior, and e.g. losing 
  console messages is sure nondeterministic behavior. What if the 
  console message is "WARNING: the box has just been broken into"? ]

we could do one thing (see the patch below): i think it would be useful 
to fill up the netlogging skb queue straight at initialization time.  
Especially if netpoll is used for dumping alone, the system might not be 
in a situation to fill up the queue at the point of crash, so better be 
a bit more prepared and keep the pipeline filled.

	Ingo

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- net/core/netpoll.c.orig
+++ net/core/netpoll.c
@@ -720,6 +720,8 @@ int netpoll_setup(struct netpoll *np)
 	}
 	/* last thing to do is link it to the net device structure */
 	ndev->npinfo = npinfo;
+	/* fill up the skb queue */
+	refill_skbs();

 	return 0;
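
The effect of filling the queue at setup time can be modelled outside the
kernel. The following is only a sketch under stated assumptions: struct pool,
pool_setup(), pool_refill(), pool_take() and POOL_MAX are hypothetical names,
malloc() stands in for skb allocation, and the real refill_skbs() is not
reproduced here.

```c
#include <stdlib.h>
#include <stddef.h>

#define POOL_MAX 32	/* hypothetical pool size for this model */

struct buf {
	struct buf *next;
};

struct pool {
	struct buf *head;	/* singly linked list of spare buffers */
	int count;		/* number of buffers currently held */
};

/* Top the pool up to POOL_MAX; stop early if allocation fails. */
static void pool_refill(struct pool *p)
{
	while (p->count < POOL_MAX) {
		struct buf *b = malloc(sizeof(*b));

		if (!b)
			break;
		b->next = p->head;
		p->head = b;
		p->count++;
	}
}

/* Setup path: fill the pool right away instead of on first use. */
static void pool_setup(struct pool *p)
{
	p->head = NULL;
	p->count = 0;
	pool_refill(p);
}

/* Emergency path: hand out a spare buffer without allocating. */
static struct buf *pool_take(struct pool *p)
{
	struct buf *b = p->head;

	if (b) {
		p->head = b->next;
		b->next = NULL;
		p->count--;
	}
	return b;
}
```

Because pool_setup() already holds POOL_MAX buffers, a crash that happens
before any message was ever sent can still draw on the full reserve; once
pool_take() returns NULL the caller is back to waiting for the driver to
release buffers.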

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-06  2:32                       ` Steven Rostedt
  2005-08-06  7:30                         ` Daniel Phillips
@ 2005-08-06  7:58                         ` Ingo Molnar
  2005-08-06 23:10                           ` Matt Mackall
  2005-08-06  9:46                         ` David S. Miller
  2 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2005-08-06  7:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Matt Mackall, David S. Miller, Andi Kleen, Andrew Morton, netdev,
	linux-kernel, John Bäckstrand


btw., the current NR_SKBS 32 in netpoll.c seems quite low, especially 
e1000 can have a whole lot more skbs queued at once. Might be more 
robust to increase it to 128 or 256?

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-06  2:32                       ` Steven Rostedt
  2005-08-06  7:30                         ` Daniel Phillips
  2005-08-06  7:58                         ` Ingo Molnar
@ 2005-08-06  9:46                         ` David S. Miller
  2005-08-06  9:57                           ` Steven Rostedt
  2 siblings, 1 reply; 38+ messages in thread
From: David S. Miller @ 2005-08-06  9:46 UTC (permalink / raw)
  To: rostedt; +Cc: mpm, ak, akpm, mingo, netdev, linux-kernel, sandos


Can you guys stop peeing your pants over this, put aside
your differences, and work on a mutually acceptable fix
for these bugs?

Much appreciated, thanks :-)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-06  9:46                         ` David S. Miller
@ 2005-08-06  9:57                           ` Steven Rostedt
  2005-08-06 12:09                             ` John Bäckstrand
  2005-08-07  5:40                             ` Matt Mackall
  0 siblings, 2 replies; 38+ messages in thread
From: Steven Rostedt @ 2005-08-06  9:57 UTC (permalink / raw)
  To: David S. Miller; +Cc: mpm, ak, akpm, mingo, netdev, linux-kernel, sandos

On Sat, 2005-08-06 at 02:46 -0700, David S. Miller wrote:
> Can you guys stop peeing your pants over this, put aside
> your differences, and work on a mutually acceptable fix
> for these bugs?
> 
> Much appreciated, thanks :-)

In my last email, I stated that this discussion seems to have
demonstrated that the e1000 driver's netpoll is indeed broken, and needs
to be fixed.  I submitted a patch for this earlier, but it's untested
and someone who owns an e1000 needs to try it.

As for all the netpoll issues, I'm satisfied with whatever you guys
decide.  But I've seen lots of problems posted over the netpoll and
e1000, where people send in patches that do everything but fix the
e1000, and that's where I chimed in.

Thank you, my pants are dry now :-)

-- Steve

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-06  7:45                 ` Ingo Molnar
@ 2005-08-06 11:29                   ` Andi Kleen
  0 siblings, 0 replies; 38+ messages in thread
From: Andi Kleen @ 2005-08-06 11:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Matt Mackall, Steven Rostedt, Andrew Morton, netdev,
	linux-kernel, John Bäckstrand, davem

On Sat, Aug 06, 2005 at 09:45:03AM +0200, Ingo Molnar wrote:
> 
> * Andi Kleen <ak@suse.de> wrote:
> 
> > On Fri, Aug 05, 2005 at 01:01:57PM -0700, Matt Mackall wrote:
> > > The netpoll philosophy is to assume that its traffic is an absolute
> > > priority - it is better to potentially hang trying to deliver a panic
> > > message than to give up and crash silently.
> > 
> > That would be ok if netpoll was only used to deliver panics. But it is 
> > not. It delivers all messages, and you cannot hang the kernel during 
> > that. Actually even for panics it is wrong, because often it is more 
> > important to reboot in a panic than (with a panic timeout) to actually 
> > deliver the panic. That's needed e.g. in a failover cluster.
> 
> without going into the merits of this discussion, reliable failover 
> clusters must include (and do include) an external ability to cut power.  
> No amount of in-kernel logic will prevent the kernel from hanging, given 
> a bad enough kernel bug.

Ok, true, but we should do a best effort.

> 
> So the right question is not 'can we prevent the kernel from hanging, 
> ever' (we cannot), but 'which change makes it less likely for the kernel 
> to hang'. (and, obviously: assuming all other kernel components are 
> functioning per specification, netpoll itself must not hang :-)
> 
> even a plain printk to VGA can hang in certain kernel crashes. Netpoll 
> is more complex and thus has more exposure to hangs. E.g. netpoll relies 
> on the network driver to correctly recycle skbs within a bound amount of 
> time. If the network driver leaks skbs, it's game over for netpoll.

I don't think we even need to think about such rare cases
until the easy cases ("everything hangs when the cable is pulled")
are fixed.

> [ i'd prefer a hang over nondeterministic behavior, and e.g. losing 
>   console messages is sure nondeterministic behavior. What if the 
>   console message is "WARNING: the box has just been broken into"? ]

That just makes netconsole useless in production. If it causes frequent
hangs, people will not use it.


> 
> we could do one thing (see the patch below): i think it would be useful 
> to fill up the netlogging skb queue straight at initialization time.  
> Especially if netpoll is used for dumping alone, the system might not be 
> in a situation to fill up the queue at the point of crash, so better be 
> a bit more prepared and keep the pipeline filled.

You're solving a completely different issue here?

-Andi


* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-06  9:57                           ` Steven Rostedt
@ 2005-08-06 12:09                             ` John Bäckstrand
  2005-08-07  5:40                             ` Matt Mackall
  1 sibling, 0 replies; 38+ messages in thread
From: John Bäckstrand @ 2005-08-06 12:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David S. Miller, mpm, ak, akpm, mingo, netdev, linux-kernel

Steven Rostedt wrote:

> In my last email, I stated that this discussion seems to have
> demonstrated that the e1000 driver's netpoll is indeed broken, and needs
> to be fixed.  I submitted a patch for this earlier, but it's untested
> and someone who owns an e1000 needs to try it.

I can test this, but not right now: I'm trying, again, to find my hard
lockup issue, so I will run this machine until it locks up.
It lasted 9 days at one point, so it could potentially take some time,
I'm afraid.

---
John Bäckstrand

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-06  7:58                         ` Ingo Molnar
@ 2005-08-06 23:10                           ` Matt Mackall
  0 siblings, 0 replies; 38+ messages in thread
From: Matt Mackall @ 2005-08-06 23:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, David S. Miller, Andi Kleen, Andrew Morton,
	netdev, linux-kernel, John Bäckstrand

On Sat, Aug 06, 2005 at 09:58:27AM +0200, Ingo Molnar wrote:
> 
> btw., the current NR_SKBS 32 in netpoll.c seems quite low, especially 
> e1000 can have a whole lot more skbs queued at once. Might be more 
> robust to increase it to 128 or 256?

Not sure that the card's queueing really makes a difference. It either
eventually releases the queued SKBs or it doesn't. What's more
important is that we be able to survive bursts like the output of
sysrq-t. This seems to work already.

-- 
Mathematics is the supreme nostalgia of our time.

* Re: [PATCH] netpoll can lock up on low memory.
  2005-08-06  9:57                           ` Steven Rostedt
  2005-08-06 12:09                             ` John Bäckstrand
@ 2005-08-07  5:40                             ` Matt Mackall
  1 sibling, 0 replies; 38+ messages in thread
From: Matt Mackall @ 2005-08-07  5:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David S. Miller, ak, akpm, mingo, netdev, linux-kernel, sandos

On Sat, Aug 06, 2005 at 05:57:20AM -0400, Steven Rostedt wrote:
> On Sat, 2005-08-06 at 02:46 -0700, David S. Miller wrote:
> > Can you guys stop peeing your pants over this, put aside
> > your differences, and work on a mutually acceptable fix
> > for these bugs?
> > 
> > Much appreciated, thanks :-)
> 
> In my last email, I stated that this discussion seems to have
> demonstrated that the e1000 driver's netpoll is indeed broken, and needs
> to be fixed.  I submitted a patch for this earlier, but it's untested
> and someone who owns an e1000 needs to try it.

I've got your e1000 change in my queue and I'll try to test it
tomorrow (realized I've got e1000 in my laptop).
 
> As for all the netpoll issues, I'm satisfied with whatever you guys
> decide.  But I've seen lots of problems posted over the netpoll and
> e1000, where people send in patches that do everything but fix the
> e1000, and that's where I chimed in.

Andi's patch looks like it fixes a related but slightly different
problem. I'm working on a variant. And I'll try to make the skb
allocation code eventually give up too.
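
A rough user-space sketch of what "eventually give up" could look like
for a buffer allocation loop. This is not the netpoll code and not the
variant mentioned above: the names model_find_buf, model_flaky_alloc and
MODEL_MAX_TRIES are hypothetical, and the cap of 10 is an assumed value
chosen only for illustration.

```c
#include <stddef.h>

#define MODEL_MAX_TRIES 10	/* assumed cap, not a value from this thread */

/* Hypothetical allocator callback: returns a buffer or NULL on failure. */
typedef void *(*model_alloc_fn)(void *ctx);

/*
 * Try to get a buffer at most MODEL_MAX_TRIES times, then give up and
 * return NULL instead of spinning forever. *tries_used (if non-NULL)
 * receives the number of attempts made.
 */
static void *model_find_buf(model_alloc_fn alloc, void *ctx, int *tries_used)
{
	void *buf = NULL;
	int tries = 0;

	while (tries < MODEL_MAX_TRIES) {
		tries++;
		buf = alloc(ctx);
		if (buf)
			break;
		/* A real loop would poll the device here so that completed
		 * transmits get a chance to free buffers before retrying. */
	}
	if (tries_used)
		*tries_used = tries;
	return buf;
}

/* Test allocator that fails a fixed number of times, then succeeds. */
struct model_flaky {
	int failures_left;
	int storage;
};

static void *model_flaky_alloc(void *ctx)
{
	struct model_flaky *f = ctx;

	if (f->failures_left > 0) {
		f->failures_left--;
		return NULL;
	}
	return &f->storage;
}
```

The caller then has to cope with a NULL return, which in this context
would mean dropping the message rather than hanging the machine.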

-- 
Mathematics is the supreme nostalgia of our time.

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-05 13:49   ` Steven Rostedt
  2005-08-05 13:55     ` Andi Kleen
@ 2005-08-07 21:12     ` John Bäckstrand
  2005-08-08  2:29       ` Steven Rostedt
  1 sibling, 1 reply; 38+ messages in thread
From: John Bäckstrand @ 2005-08-07 21:12 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Andi Kleen, Ingo Molnar, netdev, linux-kernel

Steven Rostedt wrote:
> I don't have the card, so I can't test it. But if this works (after
> removing the previous patch) then this is the better solution. 

I can confirm that this alone does not work for the simple 
unplug/re-plug cycle I described, it still locks up hard. Tried this 
alone on -rc6.

---
John Bäckstrand

* Re: lockups with netconsole on e1000 on media insertion
  2005-08-07 21:12     ` lockups with netconsole on e1000 on media insertion John Bäckstrand
@ 2005-08-08  2:29       ` Steven Rostedt
  0 siblings, 0 replies; 38+ messages in thread
From: Steven Rostedt @ 2005-08-08  2:29 UTC (permalink / raw)
  To: John Bäckstrand
  Cc: Matt Mackall, Andi Kleen, Ingo Molnar, netdev, linux-kernel

On Sun, 2005-08-07 at 23:12 +0200, John Bäckstrand wrote:
> Steven Rostedt wrote:
> > I don't have the card, so I can't test it. But if this works (after
> > removing the previous patch) then this is the better solution. 
> 
> I can confirm that this alone does not work for the simple 
> unplug/re-plug cycle I described, it still locks up hard. Tried this 
> alone on -rc6.

Darn it.  If I had an e1000 I could debug it. I have other methods of
logging than printks in all their varieties (see relayfs and friends).
I still believe that the e1000_netpoll is not turning on the queue for
some reason and the netpoll_send_skb is locking up because of that.
Especially since Andi's patch fixes the problem.

e1000_clean_tx_irq, which I added to the e1000_netpoll call, has the
following lines:

        if(unlikely(cleaned && netif_queue_stopped(netdev) &&
                    netif_carrier_ok(netdev)))
                netif_wake_queue(netdev);

netif_queue_stopped is true, since that is what causes the looping in
netpoll_send_skb.  So either it didn't clean any buffers (cleaned is
false) or netif_carrier_ok is false.  I don't know what the e1000 does
when you pull the cable while it's transmitting. Does it call
e1000_down? If so, that could cause the carrier_ok check to fail.
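
The wake condition quoted above can be modelled in user space to show
why the queue may stay stopped. This is only a sketch, not e1000 code:
struct model_netdev, model_clean_tx and model_poll_until_awake are
hypothetical stand-ins, and the bounded poll count is an assumption made
so that the model terminates.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the device state; not kernel code. */
struct model_netdev {
	bool queue_stopped;	/* stands in for netif_queue_stopped() */
	bool carrier_ok;	/* stands in for netif_carrier_ok() */
};

/*
 * Mirrors the quoted condition: the queue is woken only when buffers
 * were cleaned, the queue is stopped, and the carrier is up.
 * Returns true if the queue was woken.
 */
static bool model_clean_tx(struct model_netdev *dev, bool cleaned)
{
	if (cleaned && dev->queue_stopped && dev->carrier_ok) {
		dev->queue_stopped = false;	/* netif_wake_queue() */
		return true;
	}
	return false;
}

/*
 * Poll up to max_tries times. Returns the number of polls needed to see
 * the queue awake, or -1 if it was still stopped after max_tries.
 */
static int model_poll_until_awake(struct model_netdev *dev, bool cleaned,
				  int max_tries)
{
	for (int i = 1; i <= max_tries; i++) {
		model_clean_tx(dev, cleaned);
		if (!dev->queue_stopped)
			return i;
	}
	return -1;
}
```

With carrier_ok false, or with cleaned false, no number of polls wakes
the queue in this model, which is the situation an unbounded retry loop
in the sender cannot get out of.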

Oh well, someone with an e1000 card will need to look into this. The
problem should be easy to find.  Good luck.

-- Steve

Thread overview: 38+ messages
     [not found] <42F347D2.7000207@home.se.suse.lists.linux.kernel>
2005-08-05 11:45 ` lockups with netconsole on e1000 on media insertion Andi Kleen
2005-08-05 12:44   ` John Bäckstrand
2005-08-05 13:49   ` Steven Rostedt
2005-08-05 13:55     ` Andi Kleen
2005-08-05 14:10       ` Steven Rostedt
2005-08-05 14:14         ` Andi Kleen
2005-08-05 14:27           ` Steven Rostedt
2005-08-05 14:36             ` David S. Miller
2005-08-05 15:02               ` Steven Rostedt
2005-08-05 14:36           ` [PATCH] netpoll can lock up on low memory Steven Rostedt
2005-08-05 20:01             ` Matt Mackall
2005-08-05 20:57               ` Steven Rostedt
2005-08-05 21:28                 ` Matt Mackall
2005-08-06  0:23                   ` Steven Rostedt
2005-08-06  1:53                     ` Matt Mackall
2005-08-06  2:32                       ` Steven Rostedt
2005-08-06  7:30                         ` Daniel Phillips
2005-08-06  7:58                         ` Ingo Molnar
2005-08-06 23:10                           ` Matt Mackall
2005-08-06  9:46                         ` David S. Miller
2005-08-06  9:57                           ` Steven Rostedt
2005-08-06 12:09                             ` John Bäckstrand
2005-08-07  5:40                             ` Matt Mackall
2005-08-05 21:26               ` Andi Kleen
2005-08-05 21:42                 ` Matt Mackall
2005-08-05 21:51                   ` Andi Kleen
2005-08-06  1:16                     ` Matt Mackall
2005-08-06  0:30                 ` Steven Rostedt
2005-08-06  7:45                 ` Ingo Molnar
2005-08-06 11:29                   ` Andi Kleen
2005-08-07 21:12     ` lockups with netconsole on e1000 on media insertion John Bäckstrand
2005-08-08  2:29       ` Steven Rostedt
2005-08-05 20:12   ` Matt Mackall
2005-08-05 21:56     ` Andi Kleen
2005-08-05 23:20       ` Matt Mackall
2005-08-05 23:51         ` Andi Kleen
2005-08-06  1:22           ` Matt Mackall
2005-08-06  1:37             ` Daniel Phillips
