From: "Benjamin Li" <benli@broadcom.com>
To: "Bruno Prémont" <bonbons@linux-vserver.org>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"Michael Chan" <mchan@broadcom.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: BNX2: Kernel crashes with 2.6.31 and 2.6.31.9
Date: Tue, 29 Dec 2009 21:08:11 -0800
Message-ID: <1262149691.2788.63.camel@localhost>
In-Reply-To: <20091229145403.39f82773@pluto.restena.lu>

[-- Attachment #1: Type: text/plain, Size: 3529 bytes --]

Hi Bruno.

Could you try running with the attached patch?  This debug patch is
built against the linux-2.6.31.9 kernel.  I think the panic is occurring
right before a reset that is triggered by a TX timeout.  To check whether
this is the case, the patch prints hardware state information when a TX
timeout occurs.  If you could run with this patch and send the logs when
the panic occurs, I would really appreciate it.

Thanks again.

-Ben

On Tue, 2009-12-29 at 05:54 -0800, Bruno Prémont wrote:
> On Tue, 29 Dec 2009 01:05:40 "Benjamin Li" <benli@broadcom.com> wrote:
> > Hi Bruno,
> > 
> > It looks like the NULL dereference is happening at a0fc.
> > 
> > a0f8:       48 8b 42 70             mov 0x70(%rdx),%rax 
> > a0fc:       0f b7 10                movzwl (%rax),%edx
> > a0ff:       31 c0                   xor    %eax,%eax
> > 
> > The offset of 0x70 is the bp field in the bnx2_napi structure (seen
> > in the bnx2_napi structure dump below).  These lines come from the
> > routine bnx2_get_hw_tx_cons(), which appears to have been inlined by
> > the compiler.  More specifically, it looks like the dereference of
> > hw_tx_cons_ptr failed.
> > 
> > cons = *bnapi->hw_tx_cons_ptr;
> > 
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/net/bnx2.c;h=06b901152d4487fa04164437cc179661b44657fe;hb=74fca6a42863ffacaf7ba6f1936a9f228950f657#l2761
> > 
> > To be sure this is the case, could you send the .config file you are
> > using, or the bnx2 kernel module built with the CFLAG '-g'?  Then we
> > can verify exactly where in the code it is crashing.
> > 
> > Did you see anything suspicious in the system kernel logs?  If you
> > could isolate the logs from when the machine booted to when it crashed
> > and send them to us, it would be very helpful.
> 
> It crashes every now and then (since netconsole is enabled it does not
> survive 24 hours :( ) while or just after transmitting log messages with
> netconsole.  The messages being transmitted are generated by the
> netfilter 'LOG' target.
> 
> Sample output as seen by netconsole recipient (1 packet per line, IP
> addresses masked):
> 
> [ 2115.949606] (reject)output: IN= OUT=eth0
> SRC=***.**.*.** DST=**.***.**.***
> LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=29589
> DF
> PROTO=TCP
> SPT=58991 DPT=80
> WINDOW=5840
> RES=0x00
> SYN
> URGP=0
> 
> [ 2115.949704] (reject)output: IN= OUT=eth0
> SRC=***.**.*.** DST=**.***.**.***
> [ 2115.949729] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 2115.949732] IP: [<ffffffffa00680fc>] bnx2_poll_work+0x2c/0x12d0 [bnx2]
> [ 2115.949742] PGD 5b6f0067 PUD 59c04067 PMD 0
> [ 2115.949744] Oops: 0000 [#1] SMP
> [ 2115.949746] last sysfs file: /sys/kernel/uevent_seqnum
> [ 2115.949749] CPU 3
> [ 2115.949750] Modules linked in: dm_round_robin scsi_dh_rdac ipmi_devintf netconsole squashfs configfs zlib_inflate ext2 loop dm_multipath scsi_dh dm_mod sg sr_mod cdrom ata_piix hpwdt qla2xxx ipmi_si ahci bnx2 ipmi_msghandler libata uhci_hcd ehci_hcd
> [ 2115.949764] Pid: 7926, comm: php-cgi Not tainted 2.6.31.9-x86_64 #1 ProLiant DL360 G5
> [ 2115.949766] RIP: 0010:[<ffffffffa00680fc>]  [<ffffffffa00680fc>] bnx2_poll_work+0x2c/0x12d0 [bnx2]
> 
> Looks like netpoll is triggering suicide on BNX2.
> 
> Any way to make the NULL pointer dereference non-fatal would help a lot!
> (Is there any sensible thing to do when bnapi->hw_tx_cons_ptr is NULL
> that would allow the system to continue working without killing
> everything?)
> 
> 
> Regards,
> Bruno
> 
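
The two instructions quoted above load a pointer from offset 0x70 of a
structure and then read a 16-bit value through it, which is the shape of
`cons = *bnapi->hw_tx_cons_ptr`.  The userspace sketch below models that
read together with the kind of NULL guard Bruno asks about.  Everything
in it is an assumption made for illustration: struct napi_model,
get_hw_tx_cons_guarded() and MAX_TX_DESC_CNT_MODEL are hypothetical
names, not driver code, and the ring geometry is assumed.  A guard like
this would only hide the symptom; it does not explain why the pointer
is NULL in the first place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-NAPI state; only the field
 * that matters for the crash is modelled. */
struct napi_model {
	volatile uint16_t *hw_tx_cons_ptr; /* points into the status block */
};

/* Assumed ring geometry for this model: 256 descriptors per page, with
 * the last slot reserved, so the consumer index must skip over it. */
#define MAX_TX_DESC_CNT_MODEL 255u

/*
 * Read the hardware TX consumer index.
 * Returns 0 and stores the index in *cons on success, or -1 if the
 * pointer has not been set up (the case that oopsed in the report).
 */
static int get_hw_tx_cons_guarded(const struct napi_model *bnapi,
				  uint16_t *cons)
{
	uint16_t c;

	assert(cons != NULL); /* caller must supply an output slot */

	if (bnapi == NULL || bnapi->hw_tx_cons_ptr == NULL)
		return -1;

	c = *bnapi->hw_tx_cons_ptr; /* the movzwl (%rax),%edx step */
	if ((c & MAX_TX_DESC_CNT_MODEL) == MAX_TX_DESC_CNT_MODEL)
		c++; /* skip the reserved last slot of the page */
	*cons = c;
	return 0;
}
```

A caller in this model would treat a -1 return as "no TX work to do"
and skip the completion pass for that poll, instead of dereferencing
the missing pointer.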

[-- Attachment #2: bnx2_ftq_state_dump.diff --]
[-- Type: text/plain, Size: 5692 bytes --]

diff --git a/linux-2.6.31.9/drivers/net/bnx2.c b/linux-2.6.31.9/drivers/net/bnx2.c
index 06b9011..140bd48 100644
--- a/linux-2.6.31.9/drivers/net/bnx2.c
+++ b/linux-2.6.31.9/drivers/net/bnx2.c
@@ -6239,11 +6239,111 @@ bnx2_reset_task(struct work_struct *work)
 	bnx2_netif_start(bp);
 }
 
+
+static void bnx2_dump_ftq(struct bnx2 *bp)
+{
+	printk(KERN_ERR PFX "<--- start FTQ dump on %s --->\n", bp->dev->name);
+	printk(KERN_ERR PFX "%s: BNX2_RV2P_PFTQ_CTL %x\n", bp->dev->name,
+	       REG_RD(bp, BNX2_RV2P_PFTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_RV2P_TFTQ_CTL %x\n", bp->dev->name,
+	       REG_RD(bp, BNX2_RV2P_TFTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_RV2P_MFTQ_CTL %x\n", bp->dev->name,
+	       REG_RD(bp, BNX2_RV2P_MFTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_TBDR_FTQ_CTL %x\n", bp->dev->name,
+	       REG_RD(bp, BNX2_TBDR_FTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_TDMA_FTQ_CTL %x\n", bp->dev->name,
+	       REG_RD(bp, BNX2_TDMA_FTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_TXP_FTQ_CTL %x\n", bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_TXP_FTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_TPAT_FTQ_CTL %x\n", bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_TPAT_FTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_RXP_CFTQ_CTL %x\n", bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_RXP_CFTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_RXP_FTQ_CTL %x\n", bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_RXP_FTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_COM_COMXQ_FTQ_CTL %x\n", bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_COM_COMXQ_FTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_COM_COMTQ_FTQ_CTL %x\n", bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_COM_COMTQ_FTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_COM_COMQ_FTQ_CTL %x\n", bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_COM_COMQ_FTQ_CTL));
+	printk(KERN_ERR PFX "%s: BNX2_CP_CPQ_FTQ_CTL %x\n", bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_CP_CPQ_FTQ_CTL));
+	printk(KERN_ERR PFX
+	       "%s: TXP mode %x state %x evt_mask %x pc %x pc %x instr %x\n",
+	       bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_TXP_CPU_MODE),
+	       bnx2_reg_rd_ind(bp, BNX2_TXP_CPU_STATE),
+	       bnx2_reg_rd_ind(bp, BNX2_TXP_CPU_EVENT_MASK),
+	       bnx2_reg_rd_ind(bp, BNX2_TXP_CPU_PROGRAM_COUNTER),
+	       bnx2_reg_rd_ind(bp, BNX2_TXP_CPU_PROGRAM_COUNTER),
+	       bnx2_reg_rd_ind(bp, BNX2_TXP_CPU_INSTRUCTION));
+	printk(KERN_ERR PFX
+	       "%s: TPAT mode %x state %x evt_mask %x pc %x pc %x instr %x\n",
+	       bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_TPAT_CPU_MODE),
+	       bnx2_reg_rd_ind(bp, BNX2_TPAT_CPU_STATE),
+	       bnx2_reg_rd_ind(bp, BNX2_TPAT_CPU_EVENT_MASK),
+	       bnx2_reg_rd_ind(bp, BNX2_TPAT_CPU_PROGRAM_COUNTER),
+	       bnx2_reg_rd_ind(bp, BNX2_TPAT_CPU_PROGRAM_COUNTER),
+	       bnx2_reg_rd_ind(bp, BNX2_TPAT_CPU_INSTRUCTION));
+	printk(KERN_ERR PFX
+	       "%s: RXP mode %x state %x evt_mask %x pc %x pc %x instr %x\n",
+	       bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_RXP_CPU_MODE),
+	       bnx2_reg_rd_ind(bp, BNX2_RXP_CPU_STATE),
+	       bnx2_reg_rd_ind(bp, BNX2_RXP_CPU_EVENT_MASK),
+	       bnx2_reg_rd_ind(bp, BNX2_RXP_CPU_PROGRAM_COUNTER),
+	       bnx2_reg_rd_ind(bp, BNX2_RXP_CPU_PROGRAM_COUNTER),
+	       bnx2_reg_rd_ind(bp, BNX2_RXP_CPU_INSTRUCTION));
+	printk(KERN_ERR PFX
+	       "%s: COM mode %x state %x evt_mask %x pc %x pc %x instr %x\n",
+	       bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_COM_CPU_MODE),
+	       bnx2_reg_rd_ind(bp, BNX2_COM_CPU_STATE),
+	       bnx2_reg_rd_ind(bp, BNX2_COM_CPU_EVENT_MASK),
+	       bnx2_reg_rd_ind(bp, BNX2_COM_CPU_PROGRAM_COUNTER),
+	       bnx2_reg_rd_ind(bp, BNX2_COM_CPU_PROGRAM_COUNTER),
+	       bnx2_reg_rd_ind(bp, BNX2_COM_CPU_INSTRUCTION));
+	printk(KERN_ERR PFX
+	       "%s: CP mode %x state %x evt_mask %x pc %x pc %x instr %x\n",
+	       bp->dev->name,
+	       bnx2_reg_rd_ind(bp, BNX2_CP_CPU_MODE),
+	       bnx2_reg_rd_ind(bp, BNX2_CP_CPU_STATE),
+	       bnx2_reg_rd_ind(bp, BNX2_CP_CPU_EVENT_MASK),
+	       bnx2_reg_rd_ind(bp, BNX2_CP_CPU_PROGRAM_COUNTER),
+	       bnx2_reg_rd_ind(bp, BNX2_CP_CPU_PROGRAM_COUNTER),
+	       bnx2_reg_rd_ind(bp, BNX2_CP_CPU_INSTRUCTION));
+	printk(KERN_ERR PFX "<--- end FTQ dump on %s --->\n", bp->dev->name);
+}
+
+static void
+bnx2_dump_state(struct bnx2 *bp)
+{
+	printk(KERN_ERR PFX "DEBUG: intr_sem[%x]\n", 
+		atomic_read(&bp->intr_sem));
+	printk(KERN_ERR PFX "DEBUG: EMAC_TX_STATUS[%08x] RPM_MGMT_PKT_CTRL[%08x]\n",
+		REG_RD(bp, BNX2_EMAC_TX_STATUS),
+		REG_RD(bp, BNX2_RPM_MGMT_PKT_CTRL));
+	printk(KERN_ERR PFX "DEBUG: MCP_STATE_P0[%08x] MCP_STATE_P1[%08x]\n",
+		bnx2_reg_rd_ind(bp, BNX2_MCP_STATE_P0),
+		bnx2_reg_rd_ind(bp, BNX2_MCP_STATE_P1));
+	printk(KERN_ERR PFX "DEBUG: HC_STATS_INTERRUPT_STATUS[%08x]\n",
+		REG_RD(bp, BNX2_HC_STATS_INTERRUPT_STATUS));
+	if (bp->flags & BNX2_FLAG_USING_MSIX)
+		printk(KERN_ERR PFX "DEBUG: PBA[%08x]\n",
+			REG_RD(bp, BNX2_PCI_GRC_WINDOW3_BASE));
+}
+
+
 static void
 bnx2_tx_timeout(struct net_device *dev)
 {
 	struct bnx2 *bp = netdev_priv(dev);
 
+	bnx2_dump_ftq(bp);
+	bnx2_dump_state(bp);
+
 	/* This allows the netif to be shutdown gracefully before resetting */
 	schedule_work(&bp->reset_task);
 }
diff --git a/linux-2.6.31.9/drivers/net/bnx2.h b/linux-2.6.31.9/drivers/net/bnx2.h
index a4f12fd..0ec9df2 100644
--- a/linux-2.6.31.9/drivers/net/bnx2.h
+++ b/linux-2.6.31.9/drivers/net/bnx2.h
@@ -6342,6 +6342,8 @@ struct l2_fhdr {
 
 #define BNX2_MCP_ROM					0x00150000
 #define BNX2_MCP_SCRATCH				0x00160000
+#define BNX2_MCP_STATE_P1				0x0016f9c8
+#define BNX2_MCP_STATE_P0				0x0016fdc8
 
 #define BNX2_SHM_HDR_SIGNATURE				BNX2_MCP_SCRATCH
 #define BNX2_SHM_HDR_SIGNATURE_SIG_MASK			 0xffff0000
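
The dump function in the patch issues one printk per register.  A
table-driven variant is sketched below as a userspace model, in case it
helps anyone extending the dump.  All names here (reg_read_fn, struct
reg_desc, dump_regs, pc_advanced and the two mock readers) are
hypothetical and not part of the driver; in the driver the accessor role
is played by REG_RD() or bnx2_reg_rd_ind().  The pc_advanced() helper
models what the repeated PROGRAM_COUNTER read in the patch appears to be
for, namely comparing two samples to see whether an on-chip CPU is still
advancing.  That purpose is an inference, not something stated in the
message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register accessor: returns the value at 'offset'. */
typedef uint32_t (*reg_read_fn)(void *ctx, uint32_t offset);

/* One table entry per register to dump. */
struct reg_desc {
	const char *name;
	uint32_t offset;
};

/* Print one line per table entry; returns the number of lines written. */
static size_t dump_regs(FILE *out, const char *dev,
			const struct reg_desc *tbl, size_t n,
			reg_read_fn rd, void *ctx)
{
	size_t i, written = 0;

	assert(out != NULL && rd != NULL);

	for (i = 0; i < n; i++) {
		if (fprintf(out, "%s: %s %x\n", dev, tbl[i].name,
			    (unsigned int)rd(ctx, tbl[i].offset)) >= 0)
			written++;
	}
	return written;
}

/* Sample a program counter twice; nonzero means it moved between reads. */
static int pc_advanced(reg_read_fn rd, void *ctx, uint32_t pc_offset)
{
	uint32_t first = rd(ctx, pc_offset);
	uint32_t second = rd(ctx, pc_offset);

	return first != second;
}

/* Mock accessor: the value changes on every read (a running CPU). */
static uint32_t fake_read_ticking(void *ctx, uint32_t offset)
{
	uint32_t *ticks = ctx;

	return offset + (*ticks)++;
}

/* Mock accessor: the value never changes (a stuck CPU). */
static uint32_t fake_read_stuck(void *ctx, uint32_t offset)
{
	(void)ctx;
	return offset;
}
```

With a table of { name, offset } pairs, adding a register to the dump
becomes a one-line change, and the two mock readers let the helpers be
exercised without hardware.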
