netdev.vger.kernel.org archive mirror
* Detected Tx Unit Hang in ixgbe, kernel 2.6.25
@ 2008-05-06 17:04 Ben Greear
  2008-05-06 20:42 ` Brandeburg, Jesse
  0 siblings, 1 reply; 3+ messages in thread
From: Ben Greear @ 2008-05-06 17:04 UTC
  To: NetDev

I'm using a 10Gbps copper (CX4) dual-port NIC from silicomusa.com.
It uses the Intel chipset and ixgbe driver.  I'm using
kernel 2.6.25 plus some hacks (no patches to ixgbe).

This particular test case was to create 500 mac-vlans on
each of the two ports and generate UDP traffic between
them (I have a version of the send-to-self patch applied
to my kernel and enabled.)

During the setup for this test, the interfaces would have
been bounced (effectively ifdown, ifup), so that is the
reason for the link going up and down.

I noticed 90%+ drop rate when I first started the test,
and then after maybe 1-2 minutes, things calmed down and
started working.  I checked /var/log/messages and saw the
messages below.

I previously ran 5Gbps of traffic through the two ports
with them acting like a bridge for more than 24 hours without
any obvious problems, so I think the hardware is probably OK.

May  6 09:51:41 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:51:41 simech-ice kernel:   TDH                  <1e>
May  6 09:51:41 simech-ice kernel:   TDT                  <3ff>
May  6 09:51:41 simech-ice kernel:   next_to_use          <3ff>
May  6 09:51:41 simech-ice kernel:   next_to_clean        <1a>
May  6 09:51:41 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:51:41 simech-ice kernel:   time_stamp           <11e035210>
May  6 09:51:41 simech-ice kernel:   next_to_watch        <1b>
May  6 09:51:41 simech-ice kernel:   jiffies              <11e035862>
May  6 09:51:41 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:51:41 simech-ice kernel: ixgbe: eth2: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:51:41 simech-ice kernel:   TDH                  <3d6>
May  6 09:51:41 simech-ice kernel:   TDT                  <3b0>
May  6 09:51:41 simech-ice kernel:   next_to_use          <3b0>
May  6 09:51:41 simech-ice kernel:   next_to_clean        <3d2>
May  6 09:51:41 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:51:41 simech-ice kernel:   time_stamp           <11e035211>
May  6 09:51:41 simech-ice kernel:   next_to_watch        <3d3>
May  6 09:51:41 simech-ice kernel:   jiffies              <11e035887>
May  6 09:51:41 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:51:46 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:51:46 simech-ice kernel:   TDH                  <28d>
May  6 09:51:46 simech-ice kernel:   TDT                  <26c>
May  6 09:51:46 simech-ice kernel:   next_to_use          <26c>
May  6 09:51:46 simech-ice kernel:   next_to_clean        <289>
May  6 09:51:46 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:51:46 simech-ice kernel:   time_stamp           <11e0363e0>
May  6 09:51:46 simech-ice kernel:   next_to_watch        <28a>
May  6 09:51:46 simech-ice kernel:   jiffies              <11e036e8e>
May  6 09:51:46 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:51:46 simech-ice kernel: ixgbe: eth2: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:51:46 simech-ice kernel:   TDH                  <1bd>
May  6 09:51:46 simech-ice kernel:   TDT                  <19c>
May  6 09:51:46 simech-ice kernel:   next_to_use          <19c>
May  6 09:51:46 simech-ice kernel:   next_to_clean        <1b9>
May  6 09:51:46 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:51:46 simech-ice kernel:   time_stamp           <11e036346>
May  6 09:51:46 simech-ice kernel:   next_to_watch        <1ba>
May  6 09:51:46 simech-ice kernel:   jiffies              <11e036e9a>
May  6 09:51:46 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:51:47 simech-ice kernel: ixgbe: eth2: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:51:47 simech-ice kernel:   TDH                  <29e>
May  6 09:51:47 simech-ice kernel:   TDT                  <27c>
May  6 09:51:47 simech-ice kernel:   next_to_use          <27c>
May  6 09:51:47 simech-ice kernel:   next_to_clean        <29a>
May  6 09:51:47 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:51:47 simech-ice kernel:   time_stamp           <11e0363e0>
May  6 09:51:47 simech-ice kernel:   next_to_watch        <29b>
May  6 09:51:47 simech-ice kernel:   jiffies              <11e036fee>
May  6 09:51:47 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:51:47 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:51:47 simech-ice kernel:   TDH                  <33f>
May  6 09:51:47 simech-ice kernel:   TDT                  <321>
May  6 09:51:47 simech-ice kernel:   next_to_use          <321>
May  6 09:51:47 simech-ice kernel:   next_to_clean        <33b>
May  6 09:51:47 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:51:47 simech-ice kernel:   time_stamp           <11e0363e2>
May  6 09:51:47 simech-ice kernel:   next_to_watch        <33c>
May  6 09:51:47 simech-ice kernel:   jiffies              <11e036ff5>
May  6 09:51:47 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:51:51 simech-ice kernel: ixgbe: eth2: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:51:51 simech-ice kernel:   TDH                  <398>
May  6 09:51:51 simech-ice kernel:   TDT                  <374>
May  6 09:51:51 simech-ice kernel:   next_to_use          <374>
May  6 09:51:51 simech-ice kernel:   next_to_clean        <394>
May  6 09:51:51 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:51:51 simech-ice kernel:   time_stamp           <11e037748>
May  6 09:51:51 simech-ice kernel:   next_to_watch        <395>
May  6 09:51:51 simech-ice kernel:   jiffies              <11e038251>
May  6 09:51:51 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:51:51 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:51:51 simech-ice kernel:   TDH                  <101>
May  6 09:51:51 simech-ice kernel:   TDT                  <dd>
May  6 09:51:51 simech-ice kernel:   next_to_use          <dd>
May  6 09:51:51 simech-ice kernel:   next_to_clean        <fd>
May  6 09:51:51 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:51:51 simech-ice kernel:   time_stamp           <11e037743>
May  6 09:51:51 simech-ice kernel:   next_to_watch        <fe>
May  6 09:51:51 simech-ice kernel:   jiffies              <11e03825c>
May  6 09:51:51 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:52:00 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:52:00 simech-ice kernel:   TDH                  <2b5>
May  6 09:52:00 simech-ice kernel:   TDT                  <292>
May  6 09:52:00 simech-ice kernel:   next_to_use          <292>
May  6 09:52:00 simech-ice kernel:   next_to_clean        <2b1>
May  6 09:52:00 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:52:00 simech-ice kernel:   time_stamp           <11e038937>
May  6 09:52:00 simech-ice kernel:   next_to_watch        <2b2>
May  6 09:52:00 simech-ice kernel:   jiffies              <11e03a29c>
May  6 09:52:00 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:52:00 simech-ice kernel: ixgbe: eth2: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:52:00 simech-ice kernel:   TDH                  <8>
May  6 09:52:00 simech-ice kernel:   TDT                  <3e6>
May  6 09:52:00 simech-ice kernel:   next_to_use          <3e6>
May  6 09:52:00 simech-ice kernel:   next_to_clean        <4>
May  6 09:52:00 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:52:00 simech-ice kernel:   time_stamp           <11e038957>
May  6 09:52:00 simech-ice kernel:   next_to_watch        <5>
May  6 09:52:00 simech-ice kernel:   jiffies              <11e03a2d5>
May  6 09:52:00 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:52:11 simech-ice kernel: NETDEV WATCHDOG: eth3: transmit timed out
May  6 09:52:11 simech-ice kernel: NETDEV WATCHDOG: eth2: transmit timed out
May  6 09:52:11 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Down
May  6 09:52:11 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:52:11 simech-ice kernel:   TDH                  <18c>
May  6 09:52:11 simech-ice kernel:   TDT                  <12a>
May  6 09:52:11 simech-ice kernel:   next_to_use          <12a>
May  6 09:52:11 simech-ice kernel:   next_to_clean        <188>
May  6 09:52:11 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:52:11 simech-ice kernel:   time_stamp           <11e03aa83>
May  6 09:52:11 simech-ice kernel:   next_to_watch        <189>
May  6 09:52:11 simech-ice kernel:   jiffies              <11e03cde1>
May  6 09:52:11 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:52:11 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:11 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Down
May  6 09:52:11 simech-ice kernel: ADDRCONF(NETDEV_UP): eth3#435: link is not ready
May  6 09:52:11 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:11 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:11 simech-ice kernel: ADDRCONF(NETDEV_CHANGE): eth3#435: link becomes ready
May  6 09:52:22 simech-ice kernel: NETDEV WATCHDOG: eth3: transmit timed out
May  6 09:52:22 simech-ice kernel: NETDEV WATCHDOG: eth2: transmit timed out
May  6 09:52:23 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Down
May  6 09:52:23 simech-ice kernel: ixgbe: eth2: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:52:23 simech-ice kernel:   TDH                  <19b>
May  6 09:52:23 simech-ice kernel:   TDT                  <173>
May  6 09:52:23 simech-ice kernel:   next_to_use          <173>
May  6 09:52:23 simech-ice kernel:   next_to_clean        <197>
May  6 09:52:23 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:52:23 simech-ice kernel:   time_stamp           <11e03d200>
May  6 09:52:23 simech-ice kernel:   next_to_watch        <198>
May  6 09:52:23 simech-ice kernel:   jiffies              <11e03fcd1>
May  6 09:52:23 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:52:23 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:23 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Down
May  6 09:52:23 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:23 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Down
May  6 09:52:23 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:23 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Down
May  6 09:52:23 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:23 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:27 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:52:27 simech-ice kernel:   TDH                  <6>
May  6 09:52:27 simech-ice kernel:   TDT                  <3e4>
May  6 09:52:27 simech-ice kernel:   next_to_use          <3e4>
May  6 09:52:27 simech-ice kernel:   next_to_clean        <2>
May  6 09:52:27 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:52:27 simech-ice kernel:   time_stamp           <11e0400bb>
May  6 09:52:27 simech-ice kernel:   next_to_watch        <3>
May  6 09:52:27 simech-ice kernel:   jiffies              <11e040d75>
May  6 09:52:27 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:52:34 simech-ice kernel: NETDEV WATCHDOG: eth3: transmit timed out
May  6 09:52:34 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Down
May  6 09:52:34 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:34 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Down
May  6 09:52:34 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:34 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:42 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:52:42 simech-ice kernel:   TDH                  <189>
May  6 09:52:42 simech-ice kernel:   TDT                  <159>
May  6 09:52:42 simech-ice kernel:   next_to_use          <159>
May  6 09:52:42 simech-ice kernel:   next_to_clean        <184>
May  6 09:52:42 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:52:42 simech-ice kernel:   time_stamp           <11e042edb>
May  6 09:52:42 simech-ice kernel:   next_to_watch        <185>
May  6 09:52:42 simech-ice kernel:   jiffies              <11e0449ec>
May  6 09:52:42 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:52:45 simech-ice kernel: NETDEV WATCHDOG: eth2: transmit timed out
May  6 09:52:48 simech-ice kernel: ixgbe: eth2: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:52:48 simech-ice kernel:   TDH                  <d>
May  6 09:52:48 simech-ice kernel:   TDT                  <3e6>
May  6 09:52:48 simech-ice kernel:   next_to_use          <3e6>
May  6 09:52:48 simech-ice kernel:   next_to_clean        <9>
May  6 09:52:48 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:52:48 simech-ice kernel:   time_stamp           <11e042de5>
May  6 09:52:48 simech-ice kernel:   next_to_watch        <a>
May  6 09:52:48 simech-ice kernel:   jiffies              <11e045e0b>
May  6 09:52:48 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:52:48 simech-ice kernel: ixgbe: eth2: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:52:48 simech-ice kernel:   TDH                  <78>
May  6 09:52:48 simech-ice kernel:   TDT                  <52>
May  6 09:52:48 simech-ice kernel:   next_to_use          <52>
May  6 09:52:48 simech-ice kernel:   next_to_clean        <73>
May  6 09:52:48 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:52:48 simech-ice kernel:   time_stamp           <11e042e11>
May  6 09:52:48 simech-ice kernel:   next_to_watch        <74>
May  6 09:52:48 simech-ice kernel:   jiffies              <11e045e7c>
May  6 09:52:48 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:52:48 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Down
May  6 09:52:48 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:48 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Down
May  6 09:52:48 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:48 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:59 simech-ice kernel: NETDEV WATCHDOG: eth3: transmit timed out
May  6 09:52:59 simech-ice kernel: NETDEV WATCHDOG: eth2: transmit timed out
May  6 09:52:59 simech-ice kernel: ixgbe: eth2: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:52:59 simech-ice kernel: ixgbe: eth3: ixgbe_watchdog: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  6 09:53:07 simech-ice kernel: ixgbe: eth2: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:53:07 simech-ice kernel:   TDH                  <28>
May  6 09:53:07 simech-ice kernel:   TDT                  <3>
May  6 09:53:07 simech-ice kernel:   next_to_use          <3>
May  6 09:53:07 simech-ice kernel:   next_to_clean        <23>
May  6 09:53:07 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:53:07 simech-ice kernel:   time_stamp           <11e049a4d>
May  6 09:53:07 simech-ice kernel:   next_to_watch        <24>
May  6 09:53:07 simech-ice kernel:   jiffies              <11e04a866>
May  6 09:53:07 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:53:07 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:53:07 simech-ice kernel:   TDH                  <2ad>
May  6 09:53:07 simech-ice kernel:   TDT                  <28c>
May  6 09:53:07 simech-ice kernel:   next_to_use          <28c>
May  6 09:53:07 simech-ice kernel:   next_to_clean        <2a7>
May  6 09:53:07 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:53:07 simech-ice kernel:   time_stamp           <11e04979e>
May  6 09:53:07 simech-ice kernel:   next_to_watch        <2a8>
May  6 09:53:07 simech-ice kernel:   jiffies              <11e04a880>
May  6 09:53:07 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:53:10 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:53:10 simech-ice kernel:   TDH                  <129>
May  6 09:53:10 simech-ice kernel:   TDT                  <103>
May  6 09:53:10 simech-ice kernel:   next_to_use          <103>
May  6 09:53:10 simech-ice kernel:   next_to_clean        <125>
May  6 09:53:10 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:53:10 simech-ice kernel:   time_stamp           <11e04b236>
May  6 09:53:10 simech-ice kernel:   next_to_watch        <126>
May  6 09:53:10 simech-ice kernel:   jiffies              <11e04b61f>
May  6 09:53:10 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:53:14 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:53:14 simech-ice kernel:   TDH                  <18e>
May  6 09:53:14 simech-ice kernel:   TDT                  <165>
May  6 09:53:14 simech-ice kernel:   next_to_use          <165>
May  6 09:53:14 simech-ice kernel:   next_to_clean        <189>
May  6 09:53:14 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:53:14 simech-ice kernel:   time_stamp           <11e04b24c>
May  6 09:53:14 simech-ice kernel:   next_to_watch        <18a>
May  6 09:53:14 simech-ice kernel:   jiffies              <11e04c4e4>
May  6 09:53:14 simech-ice kernel:   next_to_watch.status <17a8209>
May  6 09:53:14 simech-ice kernel: ixgbe: eth2: ixgbe_check_tx_hang: Detected Tx Unit Hang
May  6 09:53:14 simech-ice kernel:   TDH                  <3b>
May  6 09:53:14 simech-ice kernel:   TDT                  <16>
May  6 09:53:14 simech-ice kernel:   next_to_use          <16>
May  6 09:53:14 simech-ice kernel:   next_to_clean        <37>
May  6 09:53:14 simech-ice kernel: tx_buffer_info[next_to_clean]
May  6 09:53:14 simech-ice kernel:   time_stamp           <11e04b1d7>
May  6 09:53:14 simech-ice kernel:   next_to_watch        <38>
May  6 09:53:14 simech-ice kernel:   jiffies              <11e04c6e3>
May  6 09:53:14 simech-ice kernel:   next_to_watch.status <17a8209>


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* RE: Detected Tx Unit Hang in ixgbe, kernel 2.6.25
  2008-05-06 17:04 Detected Tx Unit Hang in ixgbe, kernel 2.6.25 Ben Greear
@ 2008-05-06 20:42 ` Brandeburg, Jesse
  2008-05-06 20:58   ` Ben Greear
  0 siblings, 1 reply; 3+ messages in thread
From: Brandeburg, Jesse @ 2008-05-06 20:42 UTC
  To: Ben Greear, NetDev; +Cc: e1000-devel

Ben Greear wrote:
> I'm using a 10Gbps copper (CX4) dual-port NIC from silicomusa.com.
> It uses the Intel chipset and ixgbe driver.  I'm using
> kernel 2.6.25 plus some hacks (no patches to ixgbe).
> 
> This particular test case was to create 500 mac-vlans on
> each of the two ports and generate UDP traffic between
> them (I have a version of the send-to-self patch applied
> to my kernel and enabled.)
> 
> During the setup for this test, the interfaces would have
> been bounced (effectively ifdown, ifup), so that is the
> reason for the link going up and down.
> 
> I noticed 90%+ drop rate when I first started the test,
> and then after maybe 1-2 minutes, things calmed down and
> started working.  I checked /var/log/messages and saw the
> messages below.

Do you have IPv6 enabled?  I've seen behavior where, if a port is
flooded before the events/X thread finishes, lots of packets get dropped
and the events/X thread takes a long time to complete.  Not sure if it
is related.
 
> I previously ran 5Gbps of traffic through the two ports
> with them acting like a bridge for more than 24 hours without
> any obvious problems, so I think the hardware is probably OK.
> 
> May  6 09:51:41 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
> May  6 09:51:41 simech-ice kernel:   TDH                  <1e>
> May  6 09:51:41 simech-ice kernel:   TDT                  <3ff>
> May  6 09:51:46 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
> May  6 09:51:46 simech-ice kernel:   TDH                  <28d>
> May  6 09:51:46 simech-ice kernel:   TDT                  <26c>
> May  6 09:51:47 simech-ice kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
> May  6 09:51:47 simech-ice kernel:   TDH                  <33f>
> May  6 09:51:47 simech-ice kernel:   TDT                  <321>

Hm, I snipped the log above to demonstrate my point.  These appear to be
false hangs: TDH is still moving, which indicates the hardware is still
processing packets.  Do you have flow control enabled?  Can you try with
fewer descriptors?  You usually don't need more than 512.
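
The "TDH is still moving" check can be done mechanically on the posted log.
The following is a minimal Python sketch, not part of the driver or of any
existing tool.  The function names are hypothetical, and the 1024-entry
ring size is an assumption inferred only from 0x3ff being the largest
index that appears in the log.

```python
import re
from collections import defaultdict

# Header of one hang report; captures the interface name (e.g. "eth3").
HANG_RE = re.compile(r"ixgbe: (\w+): ixgbe_check_tx_hang:")
# Register lines such as "  TDH                  <1e>"; values are hex.
FIELD_RE = re.compile(r"\b(TDH|TDT)\s+<([0-9a-f]+)>")

# Assumption: a 1024-entry TX ring, inferred only from 0x3ff being the
# largest index seen in the posted log.
ASSUMED_RING_SIZE = 1024


def parse_hang_reports(log_text):
    """Return {iface: [{'TDH': int, 'TDT': int}, ...]} in log order."""
    reports = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        header = HANG_RE.search(line)
        if header:
            # Start a new report for this interface.
            current = {}
            reports[header.group(1)].append(current)
            continue
        field = FIELD_RE.search(line)
        if field and current is not None:
            current[field.group(1)] = int(field.group(2), 16)
    return dict(reports)


def tdh_is_moving(samples):
    """True if the head pointer differs between reports for one interface."""
    heads = {s["TDH"] for s in samples if "TDH" in s}
    return len(heads) > 1


def pending_descriptors(tdh, tdt, ring_size=ASSUMED_RING_SIZE):
    """Descriptors between head and tail, i.e. still owned by hardware."""
    return (tdt - tdh) % ring_size


if __name__ == "__main__":
    sample = """\
kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
kernel:   TDH                  <1e>
kernel:   TDT                  <3ff>
kernel: ixgbe: eth3: ixgbe_check_tx_hang: Detected Tx Unit Hang
kernel:   TDH                  <28d>
kernel:   TDT                  <26c>
"""
    for iface, samples in parse_hang_reports(sample).items():
        print(iface, "TDH moving:", tdh_is_moving(samples))
        for s in samples:
            print("  pending:", pending_descriptors(s["TDH"], s["TDT"]))
```

A head pointer that changes between reports only shows that the hardware
fetched descriptors in the meantime; it does not by itself rule out a
stall at the moment a report was printed.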

The driver defaults to flow control enabled (incorrectly; I will patch
that soon).  I suggest you disable it with ethtool -A.
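
As an illustration of the two suggestions above (disabling flow control
and using fewer descriptors), here is a small Python sketch that only
builds and prints the ethtool command lines.  The helper names are
hypothetical, the interface names eth2 and eth3 are taken from the log,
and the exact arguments ("rx off tx off" for -A, "tx 512" for -G) are an
assumption about what was intended; whether the ring can be resized
depends on the driver.

```python
def pause_off_cmd(iface):
    """ethtool -A: turn off RX and TX pause frames (flow control)."""
    return ["ethtool", "-A", iface, "rx", "off", "tx", "off"]


def tx_ring_cmd(iface, descriptors=512):
    """ethtool -G: set the size of the TX descriptor ring."""
    return ["ethtool", "-G", iface, "tx", str(descriptors)]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for iface in ("eth2", "eth3"):
        for cmd in (pause_off_cmd(iface), tx_ring_cmd(iface)):
            print(" ".join(cmd))
```

To apply the settings, each list could be passed to
subprocess.run(cmd, check=True) as root; the current values can be
inspected first with ethtool -a and ethtool -g.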

You might also just comment out the line that sets the detect_tx_hung
variable and see whether the problem goes away (in which case it is a
false hang for sure).

Jesse



* Re: Detected Tx Unit Hang in ixgbe, kernel 2.6.25
  2008-05-06 20:42 ` Brandeburg, Jesse
@ 2008-05-06 20:58   ` Ben Greear
  0 siblings, 0 replies; 3+ messages in thread
From: Ben Greear @ 2008-05-06 20:58 UTC
  To: Brandeburg, Jesse; +Cc: NetDev, e1000-devel

Brandeburg, Jesse wrote:
> Ben Greear wrote:
>> I'm using a 10Gbps copper (CX4) dual-port NIC from silicomusa.com.
>> It uses the Intel chipset and ixgbe driver.  I'm using
>> kernel 2.6.25 plus some hacks (no patches to ixgbe).
>>
>> This particular test case was to create 500 mac-vlans on
>> each of the two ports and generate UDP traffic between
>> them (I have a version of the send-to-self patch applied
>> to my kernel and enabled.)
>>
>> During the setup for this test, the interfaces would have
>> been bounced (effectively ifdown, ifup), so that is the
>> reason for the link going up and down.
>>
>> I noticed 90%+ drop rate when I first started the test,
>> and then after maybe 1-2 minutes, things calmed down and
>> started working.  I checked /var/log/messages and saw the
>> messages below.
> 
> Do you have IPv6 enabled?  I've seen behavior where, if a port is
> flooded before the events/X thread finishes, lots of packets get dropped
> and the events/X thread takes a long time to complete.  Not sure if it
> is related.

It is enabled, though I wasn't particularly using it (on purpose).

> Hm, I snipped the log above to demonstrate my point.  These appear to be
> false hangs: TDH is still moving, which indicates the hardware is still
> processing packets.  Do you have flow control enabled?  Can you try with
> fewer descriptors?  You usually don't need more than 512.
> 
> The driver defaults to flow control enabled (incorrectly; I will patch
> that soon).  I suggest you disable it with ethtool -A.
> 
> You might also just comment out the line that sets the detect_tx_hung
> variable and see whether the problem goes away (in which case it is a
> false hang for sure).

OK, I also noticed that ksoftirqd was at around 100% CPU (2 of them in
fact, on this 2 x 4-core system).  But the NICs were not obviously
transmitting many packets (as determined by looking at the tx/rx packet
counters).
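
One way to compare packet counters against CPU load is to sample
/proc/net/dev twice and compute per-interface rates.  The following
Python sketch assumes the standard /proc/net/dev column layout (packets
is the 2nd receive column and the 10th column overall for transmit); the
function names are hypothetical and this is not the method used for the
observation above.

```python
import time


def parse_proc_net_dev(text):
    """Return {iface: (rx_packets, tx_packets)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 10:
            continue
        # fields[1] = receive packets, fields[9] = transmit packets
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_rates(before, after, seconds):
    """Return {iface: (rx_pps, tx_pps)} for interfaces in both samples."""
    return {
        iface: (
            (after[iface][0] - before[iface][0]) / seconds,
            (after[iface][1] - before[iface][1]) / seconds,
        )
        for iface in after
        if iface in before
    }


def sample_live(interval=1.0):
    """Read /proc/net/dev twice, `interval` seconds apart (Linux only)."""
    with open("/proc/net/dev") as f:
        before = parse_proc_net_dev(f.read())
    time.sleep(interval)
    with open("/proc/net/dev") as f:
        after = parse_proc_net_dev(f.read())
    return packet_rates(before, after, interval)
```

Calling sample_live() while ksoftirqd is busy would show whether eth2 and
eth3 are actually moving packets at that moment.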

In subsequent tests, I see ksoftirqd CPU usage go quite high when adding
mac-vlans, before I ever start traffic.  But other applications (ntp, etc.)
do seem to listen for new devices, open sockets per interface, and probably
attempt to send some frames.

Also, this is a 64-bit kernel, with 8GB RAM, in case that matters.

Finally, I hit the soft lockup below a bit later.  I have no idea of the
root cause; mac-vlans seem to be implicated, but it could be something else.
The kernel is tainted by my module, but that module was supposedly not
doing anything.  I will also run some more tests without it loaded.

BUG: soft lockup - CPU#7 stuck for 61s! [ksoftirqd/7:25]
CPU 7:
Modules linked in: arc4 michael_mic wanlink(P) e1000e e1000 8021q redirdev macvlan pktgen rfcomm l2cap bluetooth autofs4 nfs lockd nfs_acl sunrpc ipv6 loop dm_multipath i5000_edac edac_core iTCO_wdt ixgbe i2c_i801 i2c_core pcspkr button iTCO_vendor_support sg sr_mod cdrom floppy dm_snapshot dm_zero dm_mirror dm_mod ata_generic pata_acpi ata_piix libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ssb ehci_hcd [last unloaded: x_tables]
Pid: 25, comm: ksoftirqd/7 Tainted: P         2.6.25 #1
RIP: 0010:[<ffffffff8120163d>]  [<ffffffff8120163d>] skb_clone+0x5a/0x5e
RSP: 0018:ffff81022f207d98  EFLAGS: 00000202
RAX: ffff81012173f300 RBX: ffff81022f207da8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff810131b0f168 RDI: ffff81012173f368
RBP: ffff81022f207d10 R08: ffff81012173f300 R09: ffff810131b0f100
R10: 0000000000000040 R11: 0000000000000000 R12: ffffffff8100cb56
R13: ffff81022f207d10 R14: ffff810131b0f100 R15: ffff81022d5b6000
FS:  0000000000000000(0000) GS:ffff81022f0b8c80(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007faf08544a90 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
  <IRQ>  [<ffffffff8120163d>] ? skb_clone+0x5a/0x5e
  [<ffffffff8827f54a>] ? :macvlan:macvlan_handle_frame+0x102/0x222
  [<ffffffff81094817>] ? add_partial+0x49/0x51
  [<ffffffff81206db1>] ? netif_receive_skb+0x346/0x4f3
  [<ffffffff88123ed2>] ? :ixgbe:ixgbe_clean_rx_irq+0x467/0x666
  [<ffffffff881266b7>] ? :ixgbe:ixgbe_clean_rxonly+0x4a/0xa4
  [<ffffffff8120931e>] ? net_rx_action+0xb0/0x1c6
  [<ffffffff8103a030>] ? __do_softirq+0x4a/0xa5
  [<ffffffff8103a3b8>] ? ksoftirqd+0x0/0x11e
  [<ffffffff8100d0ac>] ? call_softirq+0x1c/0x28
  <EOI>  [<ffffffff8100e978>] ? do_softirq+0x34/0x72
  [<ffffffff8103a41c>] ? ksoftirqd+0x64/0x11e
  [<ffffffff81048088>] ? kthread+0x49/0x79
  [<ffffffff8100cd38>] ? child_rip+0xa/0x12
  [<ffffffff8104803f>] ? kthread+0x0/0x79
  [<ffffffff8100cd2e>] ? child_rip+0x0/0x12

unregister_netdevice: waiting for eth3#352 to become free. Usage count = 3
unregister_netdevice: waiting for eth3#352 to become free. Usage count = 3
unregister_netdevice: waiting for eth3#352 to become free. Usage count = 3
unregister_netdevice: waiting for eth3#352 to become free. Usage count = 3
unregister_netdevice: waiting for eth3#352 to become free. Usage count = 3


I'll try disabling the flow-control, and if that doesn't help,
will compile out ipv6 and try that too.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

