Re: Kernel Panic every 2 weeks on ISP server (NULL pointer dereference)

From: Eric Dumazet <eric.dumazet@gmail.com>
To: Luciano Ruete <lruete@sequre.com.ar>
Cc: netdev@vger.kernel.org
Subject: Re: Kernel Panic every 2 weeks on ISP server (NULL pointer dereference)
Date: Sun, 23 Oct 2011 07:16:29 +0200
Message-ID: <1319346989.6180.71.camel@edumazet-laptop>
In-Reply-To: <201110222218.12524.lruete@sequre.com.ar>

On Saturday, 22 October 2011 at 22:18 -0300, Luciano Ruete wrote:
> Hi,
> 
> I'm the sysadmin at a 3500-customer ISP, which runs an iptables+tc solution
> for load balancing and QoS.
> 
> Every 2 or 3 weeks the server panics with a "NULL pointer dereference" and 
> with IP at "dev_queue_xmit"
> 
> It is curious that if I disable MSI on the network card driver these panics
> seem to disappear. Does this ring a bell?
> 
> The server is an IBM, previously with Broadcom NetXtreme II BCM5709 NICs and
> now with Intel 82576. I changed the NICs thinking that maybe the bug was in
> the Broadcom driver, but it seems to affect MSI in general.
> 
> The tc+iptables rules are auto-generated with sequreisp[1], an ISP solution
> that I wrote and that is open-sourced under AGPLv3.
>
> Tell me if you need any further information, and please CC me because I'm not
> subscribed.
> 
> 
> root@server:~# uname -a
> Linux server 2.6.35-30-server #60~lucid1-Ubuntu SMP Tue Sep 20 22:28:40 UTC 2011 x86_64 GNU/Linux
> 
> 
> [1]https://github.com/sequre/sequreisp
> 

Hi Luciano

[694250.472081] Code: f6
49 c1 e6 07               shl    $0x7,%r14
66 89 93 ac 00 00 00      mov    %dx,0xac(%rbx)

4d 03 b5 40 03 00 00      add    0x340(%r13),%r14
        txq = dev_pick_tx(dev, skb);

0f b7 83 a6 00 00 00      movzwl 0xa6(%rbx),%eax
4d 8b 66 08               mov    0x8(%r14),%r12
        q = rcu_dereference_bh(txq->qdisc);

80 e4 cf                  and    $0xcf,%ah
80 cc 20                  or     $0x20,%ah
66 89 83 a6 00 00 00      mov    %ax,0xa6(%rbx)
        skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);

<49> 83 3c 24 00          cmpq   $0x0,(%r12)
        if (q->enqueue)   CRASH because q is NULL.

0f 84 3b 02 00 00         je     ...
        rc = __dev_xmit_skb(skb, q, dev, txq);
49 8d 84 24 9c 00 00 00   lea    0x9c(%r12),%rax
48 89


This looks like a dev_pick_tx() bug: it uses an out-of-bounds
queue_index and returns a txq pointing past the end of
the device's allocated queue array.
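
To make the disassembly above a bit more concrete, here is a small
user-space sketch of the address arithmetic. It is not kernel code. It
assumes the "shl $0x7,%r14" means each queue entry is 128 bytes in this
build; QUEUE_STRIDE_SHIFT, txq_byte_offset() and txq_in_bounds() are
names made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 'shl $0x7,%r14' scales queue_index by a 128-byte entry size. */
#define QUEUE_STRIDE_SHIFT 7

/* Byte offset of entry 'queue_index' from the start of the tx queue array. */
static size_t txq_byte_offset(unsigned int queue_index)
{
	return (size_t)queue_index << QUEUE_STRIDE_SHIFT;
}

/* True if that offset still lands inside an array of 'num_queues' entries. */
static bool txq_in_bounds(unsigned int queue_index, unsigned int num_queues)
{
	return txq_byte_offset(queue_index) <
	       ((size_t)num_queues << QUEUE_STRIDE_SHIFT);
}
```

For a hypothetical device with 4 tx queues, index 4 gives offset 512,
which is the first byte past the array. Whatever memory sits there is
then treated as a queue, which would be consistent with reading a NULL
qdisc pointer from it.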



With recent kernels, this cannot happen anymore because
we added fixes in this area.

You could try the Ubuntu 11.10 kernel (based on Linux 3.0)
on your server, or apply the following patch:

commit df32cc193ad88f7b1326b90af799c927b27f7654
Author: Tom Herbert <therbert@google.com>
Date:   Mon Nov 1 12:55:52 2010 -0700

    net: check queue_index from sock is valid for device
    
    In dev_pick_tx recompute the queue index if the value stored in the
    socket is greater than or equal to the number of real queues for the
    device.  The saved index in the sock structure is not guaranteed to
    be appropriate for the egress device (this could happen on a route
    change or in presence of tunnelling).  The result of the queue index
    being bad would be to return a bogus queue (crash could presumably
    follow).
    
    Signed-off-by: Tom Herbert <therbert@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/dev.c b/net/core/dev.c
index 35dfb83..0dd54a6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2131,7 +2131,7 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 	} else {
 		struct sock *sk = skb->sk;
 		queue_index = sk_tx_queue_get(sk);
-		if (queue_index < 0) {
+		if (queue_index < 0 || queue_index >= dev->real_num_tx_queues) {
 
 			queue_index = 0;
 			if (dev->real_num_tx_queues > 1)
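
The effect of the one-line change can be sketched in user space as
follows. This is a simplified model under stated assumptions, not the
kernel function: pick_tx_index(), cached_index and fallback_hash are
hypothetical names, the modulo stands in for whatever dev_pick_tx()
recomputes when there is more than one queue, and real_num_tx_queues is
assumed to be at least 1.

```c
/*
 * Model of the patched check: a cached index is reused only if it is
 * non-negative and below the device's real queue count. Otherwise the
 * index is recomputed, so the result is always < real_num_tx_queues.
 */
static int pick_tx_index(int cached_index, unsigned int real_num_tx_queues,
			 unsigned int fallback_hash)
{
	int queue_index = cached_index;

	if (queue_index < 0 ||
	    (unsigned int)queue_index >= real_num_tx_queues) {
		queue_index = 0;
		if (real_num_tx_queues > 1)
			queue_index = (int)(fallback_hash % real_num_tx_queues);
	}
	return queue_index;
}
```

Before the patch, only the "queue_index < 0" half of the condition
existed, so a stale index such as 7 on a 4-queue device would have been
returned unchanged.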
