Netdev List

* e1000 TX unit hang and a possibly kernel oops
From: Breno Leitao @ 2010-04-06 17:23 UTC (permalink / raw)
  To: e1000-devel@lists.sourceforge.net, netdev; +Cc: kamaleshb

Hi, 

During a test workload on kernel 2.6.27.19-5-ppc64 (SLES11), we are hitting
a "TX Unit Hang" issue, followed by a kernel oops.

We also tried the same workload with the SourceForge device driver, and the
problem is reproducible there as well. With TSO off, the problem does not
appear with either driver.

I am not sure whether the TX hang could be related to the oops, which happens
in the virtual memory subsystem; that is why I decided to ask you.

Thanks, 

Mar 17 02:34:34 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 02:34:34 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 02:34:34 c862f3sq03 kernel:   TDH                  <d6>
Mar 17 02:34:34 c862f3sq03 kernel:   TDT                  <d1>
Mar 17 02:34:34 c862f3sq03 kernel:   next_to_use          <d1>
Mar 17 02:34:34 c862f3sq03 kernel:   next_to_clean        <d5>
Mar 17 02:34:34 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 02:34:34 c862f3sq03 kernel:   time_stamp           <10053bc21>
Mar 17 02:34:34 c862f3sq03 kernel:   next_to_watch        <da>
Mar 17 02:34:34 c862f3sq03 kernel:   jiffies              <10053bcbc>
Mar 17 02:34:34 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 02:34:36 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 02:34:36 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 02:34:36 c862f3sq03 kernel:   TDH                  <d6>
Mar 17 02:34:36 c862f3sq03 kernel:   TDT                  <d1>
Mar 17 02:34:36 c862f3sq03 kernel:   next_to_use          <d1>
Mar 17 02:34:36 c862f3sq03 kernel:   next_to_clean        <d5>
Mar 17 02:34:36 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 02:34:36 c862f3sq03 kernel:   time_stamp           <10053bc21>
Mar 17 02:34:36 c862f3sq03 kernel:   next_to_watch        <da>
Mar 17 02:34:36 c862f3sq03 kernel:   jiffies              <10053bd84>
Mar 17 02:34:36 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 02:34:38 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 02:34:38 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 02:34:38 c862f3sq03 kernel:   TDH                  <d6>
Mar 17 02:34:38 c862f3sq03 kernel:   TDT                  <d1>
Mar 17 02:34:38 c862f3sq03 kernel:   next_to_use          <d1>
Mar 17 02:34:38 c862f3sq03 kernel:   next_to_clean        <d5>
Mar 17 02:34:38 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 02:34:38 c862f3sq03 kernel:   time_stamp           <10053bc21>
Mar 17 02:34:38 c862f3sq03 kernel:   next_to_watch        <da>
Mar 17 02:34:38 c862f3sq03 kernel:   jiffies              <10053be4c>
Mar 17 02:34:38 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 02:34:40 c862f3sq03 kernel: ------------[ cut here ]------------
Mar 17 02:34:40 c862f3sq03 kernel: Badness at net/sched/sch_generic.c:219
Mar 17 02:34:40 c862f3sq03 kernel: NIP: c000000000498eac LR: c000000000498d8c CTR: 0000000000000001
Mar 17 02:34:40 c862f3sq03 kernel: REGS: c00000000f68fa80 TRAP: 0700   Tainted: G           (2.6.27.19-5-ppc64)
Mar 17 02:34:40 c862f3sq03 kernel: MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 88000022  XER: 00000010
Mar 17 02:34:40 c862f3sq03 kernel: TASK = c000000000918340[0] 'swapper' THREAD: c0000000009d0000 CPU: 0
Mar 17 02:34:40 c862f3sq03 kernel: GPR00: 0000000000000000 c00000000f68fd00 c0000000009cbc80 0000000000000080 
Mar 17 02:34:40 c862f3sq03 kernel: GPR04: 0000000000000000 c000000000498ce4 c0000000009bb550 c0000003be260980 
Mar 17 02:34:40 c862f3sq03 kernel: GPR08: 0000000000000002 c000000000c91e68 0000000000000080 ffffffffffffff01 
Mar 17 02:34:40 c862f3sq03 kernel: GPR12: 0000000000000000 c000000000a92c80 0000000000051bc3 0000000000051aa1 
Mar 17 02:34:40 c862f3sq03 kernel: GPR16: 0000000000051bbb c000000000a51280 0000000000000000 0000000000000000 
Mar 17 02:34:40 c862f3sq03 kernel: GPR20: c000000000bda198 c000000000bda598 c000000000bda998 0000000000000002 
Mar 17 02:34:40 c862f3sq03 kernel: GPR24: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000 
Mar 17 02:34:40 c862f3sq03 kernel: GPR28: c0000003be2c9400 0000000000000001 c000000000961118 c0000003be260980 
Mar 17 02:34:40 c862f3sq03 kernel: NIP [c000000000498eac] .dev_watchdog+0x190/0x2cc
Mar 17 02:34:40 c862f3sq03 kernel: LR [c000000000498d8c] .dev_watchdog+0x70/0x2cc
Mar 17 02:34:40 c862f3sq03 kernel: Call Trace:
Mar 17 02:34:40 c862f3sq03 kernel: [c00000000f68fd00] [c000000000498d8c] .dev_watchdog+0x70/0x2cc (unreliable)
Mar 17 02:34:40 c862f3sq03 kernel: [c00000000f68fdc0] [c00000000009cec4] .run_timer_softirq+0x1e8/0x2c4
Mar 17 02:34:40 c862f3sq03 kernel: [c00000000f68fec0] [c00000000009705c] .__do_softirq+0x13c/0x284
Mar 17 02:34:40 c862f3sq03 kernel: [c00000000f68ff90] [c00000000002ab4c] .call_do_softirq+0x14/0x24
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3880] [c00000000000db5c] .do_softirq+0x88/0xf0
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3920] [c000000000096e38] .irq_exit+0x5c/0xb4
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d39a0] [c000000000027c1c] .timer_interrupt+0xd8/0x104
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3a30] [c000000000003718] decrementer_common+0x118/0x180
Mar 17 02:34:40 c862f3sq03 kernel: --- Exception: 901 at .pseries_dedicated_idle_sleep+0xf0/0x1b8
Mar 17 02:34:40 c862f3sq03 kernel:     LR = .pseries_dedicated_idle_sleep+0xe0/0x1b8
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3d20] [c0000000000501b8] .pseries_dedicated_idle_sleep+0x80/0x1b8 (unreliable)
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3dd0] [c000000000013114] .cpu_idle+0xfc/0x1a4
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3e60] [c00000000051b124] .rest_init+0x7c/0x94
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3ee0] [c000000000760d18] .start_kernel+0x52c/0x554
Mar 17 02:34:40 c862f3sq03 kernel: [c0000000009d3f90] [c000000000008568] .start_here_common+0x3c/0x54
Mar 17 02:34:40 c862f3sq03 kernel: Instruction dump:
Mar 17 02:34:40 c862f3sq03 kernel: 48000064 e97e8018 e81c0330 e93c033a 7d290214 e80b0000 7d604851 40800048 
Mar 17 02:34:40 c862f3sq03 kernel: e93e8028 80090000 2f800000 40fe0014 <0fe00000> 38000001 e93e8028 90090000 
Mar 17 02:34:40 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 02:34:40 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 02:34:40 c862f3sq03 kernel:   TDH                  <d6>
Mar 17 02:34:40 c862f3sq03 kernel:   TDT                  <d1>
Mar 17 02:34:40 c862f3sq03 kernel:   next_to_use          <d1>
Mar 17 02:34:40 c862f3sq03 kernel:   next_to_clean        <d5>
Mar 17 02:34:40 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 02:34:40 c862f3sq03 kernel:   time_stamp           <10053bc21>
Mar 17 02:34:40 c862f3sq03 kernel:   next_to_watch        <da>
Mar 17 02:34:40 c862f3sq03 kernel:   jiffies              <10053bf14>
Mar 17 02:34:40 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 02:34:44 c862f3sq03 kernel: e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
--snip--
Mar 17 15:31:05 c862f3sq04 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 15:31:05 c862f3sq04 kernel:   Tx Queue             <0>
Mar 17 15:31:05 c862f3sq04 kernel:   TDH                  <98>
Mar 17 15:31:05 c862f3sq04 kernel:   TDT                  <93>
Mar 17 15:31:05 c862f3sq04 kernel:   next_to_use          <93>
Mar 17 15:31:05 c862f3sq04 kernel:   next_to_clean        <97>
Mar 17 15:31:05 c862f3sq04 kernel: buffer_info[next_to_clean]
Mar 17 15:31:05 c862f3sq04 kernel:   time_stamp           <10080baf1>
Mar 17 15:31:05 c862f3sq04 kernel:   next_to_watch        <9a>
Mar 17 15:31:05 c862f3sq04 kernel:   jiffies              <10080bca8>
Mar 17 15:31:05 c862f3sq04 kernel:   next_to_watch.status <0>
Mar 17 15:31:06 c862f3sq04 kernel: ------------[ cut here ]------------
Mar 17 15:31:06 c862f3sq04 kernel: Badness at net/sched/sch_generic.c:219
Mar 17 15:31:06 c862f3sq04 kernel: NIP: c000000000498eac LR: c000000000498d8c CTR: 0000000000000001
Mar 17 15:31:06 c862f3sq04 kernel: REGS: c00000000f68fa80 TRAP: 0700   Tainted: G           (2.6.27.19-5-ppc64)
Mar 17 15:31:06 c862f3sq04 kernel: MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 88000022  XER: 00000010
Mar 17 15:31:06 c862f3sq04 kernel: TASK = c000000000918340[0] 'swapper' THREAD: c0000000009d0000 CPU: 0
Mar 17 15:31:06 c862f3sq04 kernel: GPR00: 0000000000000000 c00000000f68fd00 c0000000009cbc80 0000000000000080 
Mar 17 15:31:06 c862f3sq04 kernel: GPR04: 0000000000000000 c000000000498ce4 c0000000009bb550 c0000001dcf6d180 
Mar 17 15:31:06 c862f3sq04 kernel: GPR08: 0000000000000002 c000000000c91e68 0000000000000080 ffffffffffffffd9 
Mar 17 15:31:06 c862f3sq04 kernel: GPR12: 0000000000000000 c000000000a92c80 0000000000051bc3 0000000000051aa1 
Mar 17 15:31:06 c862f3sq04 kernel: GPR16: 0000000000051bbb c000000000a51280 0000000000000000 0000000000000000 
Mar 17 15:31:06 c862f3sq04 kernel: GPR20: c000000000bda198 c000000000bda598 c000000000bda998 0000000000000002 
Mar 17 15:31:06 c862f3sq04 kernel: GPR24: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000 
Mar 17 15:31:06 c862f3sq04 kernel: GPR28: c0000001dcb60380 0000000000000001 c000000000961118 c0000001dcf6d180 
Mar 17 15:31:06 c862f3sq04 kernel: NIP [c000000000498eac] .dev_watchdog+0x190/0x2cc
Mar 17 15:31:06 c862f3sq04 kernel: LR [c000000000498d8c] .dev_watchdog+0x70/0x2cc
Mar 17 15:31:06 c862f3sq04 kernel: Call Trace:
Mar 17 15:31:06 c862f3sq04 kernel: [c00000000f68fd00] [c000000000498d8c] .dev_watchdog+0x70/0x2cc (unreliable)
Mar 17 15:31:06 c862f3sq04 kernel: [c00000000f68fdc0] [c00000000009cec4] .run_timer_softirq+0x1e8/0x2c4
Mar 17 15:31:06 c862f3sq04 kernel: [c00000000f68fec0] [c00000000009705c] .__do_softirq+0x13c/0x284
Mar 17 15:31:06 c862f3sq04 kernel: [c00000000f68ff90] [c00000000002ab4c] .call_do_softirq+0x14/0x24
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3880] [c00000000000db5c] .do_softirq+0x88/0xf0
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3920] [c000000000096e38] .irq_exit+0x5c/0xb4
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d39a0] [c000000000027c1c] .timer_interrupt+0xd8/0x104
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3a30] [c000000000003718] decrementer_common+0x118/0x180
Mar 17 15:31:06 c862f3sq04 kernel: --- Exception: 901 at .pseries_dedicated_idle_sleep+0xe8/0x1b8
Mar 17 15:31:06 c862f3sq04 kernel:     LR = .pseries_dedicated_idle_sleep+0xe0/0x1b8
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3d20] [c0000000000501b8] .pseries_dedicated_idle_sleep+0x80/0x1b8 (unreliable)
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3dd0] [c000000000013114] .cpu_idle+0xfc/0x1a4
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3e60] [c00000000051b124] .rest_init+0x7c/0x94
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3ee0] [c000000000760d18] .start_kernel+0x52c/0x554
Mar 17 15:31:06 c862f3sq04 kernel: [c0000000009d3f90] [c000000000008568] .start_here_common+0x3c/0x54
Mar 17 15:31:06 c862f3sq04 kernel: Instruction dump:
Mar 17 15:31:06 c862f3sq04 kernel: 48000064 e97e8018 e81c0330 e93c033a 7d290214 e80b0000 7d604851 40800048 
Mar 17 15:31:06 c862f3sq04 kernel: e93e8028 80090000 2f800000 40fe0014 <0fe00000> 38000001 e93e8028 90090000 
Mar 17 15:31:09 c862f3sq04 kernel: e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
--snip--
Mar 17 15:59:54 c862f3sq02 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 15:59:54 c862f3sq02 kernel:   Tx Queue             <0>
Mar 17 15:59:54 c862f3sq02 kernel:   TDH                  <3d>
Mar 17 15:59:54 c862f3sq02 kernel:   TDT                  <1f>
Mar 17 15:59:54 c862f3sq02 kernel:   next_to_use          <1f>
Mar 17 15:59:54 c862f3sq02 kernel:   next_to_clean        <3c>
Mar 17 15:59:54 c862f3sq02 kernel: buffer_info[next_to_clean]
Mar 17 15:59:54 c862f3sq02 kernel:   time_stamp           <1004d6039>
Mar 17 15:59:54 c862f3sq02 kernel:   next_to_watch        <41>
Mar 17 15:59:54 c862f3sq02 kernel:   jiffies              <1004d6164>
Mar 17 15:59:54 c862f3sq02 kernel:   next_to_watch.status <0>
Mar 17 15:59:54 c862f3sq02 kernel: klogd 1.4.1, ---------- state change ----------
Mar 17 15:59:56 c862f3sq02 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 15:59:56 c862f3sq02 kernel:   Tx Queue             <0>
Mar 17 15:59:56 c862f3sq02 kernel:   TDH                  <3d>
Mar 17 15:59:56 c862f3sq02 kernel:   TDT                  <1f>
Mar 17 15:59:56 c862f3sq02 kernel:   next_to_use          <1f>
Mar 17 15:59:56 c862f3sq02 kernel:   next_to_clean        <3c>
Mar 17 15:59:56 c862f3sq02 kernel: buffer_info[next_to_clean]
Mar 17 15:59:56 c862f3sq02 kernel:   time_stamp           <1004d6039>
Mar 17 15:59:56 c862f3sq02 kernel:   next_to_watch        <41>
Mar 17 15:59:56 c862f3sq02 kernel:   jiffies              <1004d622c>
Mar 17 15:59:56 c862f3sq02 kernel:   next_to_watch.status <0>
Mar 17 15:59:58 c862f3sq02 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 15:59:58 c862f3sq02 kernel:   Tx Queue             <0>
Mar 17 15:59:58 c862f3sq02 kernel:   TDH                  <3d>
Mar 17 15:59:58 c862f3sq02 kernel:   TDT                  <1f>
Mar 17 15:59:58 c862f3sq02 kernel:   next_to_use          <1f>
Mar 17 15:59:58 c862f3sq02 kernel:   next_to_clean        <3c>
Mar 17 15:59:58 c862f3sq02 kernel: buffer_info[next_to_clean]
Mar 17 15:59:58 c862f3sq02 kernel:   time_stamp           <1004d6039>
Mar 17 15:59:58 c862f3sq02 kernel:   next_to_watch        <41>
Mar 17 15:59:58 c862f3sq02 kernel:   jiffies              <1004d62f4>
Mar 17 15:59:58 c862f3sq02 kernel:   next_to_watch.status <0>
Mar 17 16:00:03 c862f3sq02 kernel: e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX

Mar 17 17:42:59 c862f3sq02 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 17:42:59 c862f3sq02 kernel:   Tx Queue             <0>
Mar 17 17:42:59 c862f3sq02 kernel:   TDH                  <ec>
Mar 17 17:42:59 c862f3sq02 kernel:   TDT                  <e8>
Mar 17 17:42:59 c862f3sq02 kernel:   next_to_use          <e8>
Mar 17 17:42:59 c862f3sq02 kernel:   next_to_clean        <ec>
Mar 17 17:42:59 c862f3sq02 kernel: buffer_info[next_to_clean]
Mar 17 17:42:59 c862f3sq02 kernel:   time_stamp           <10056d08d>
Mar 17 17:42:59 c862f3sq02 kernel:   next_to_watch        <f0>
Mar 17 17:42:59 c862f3sq02 kernel:   jiffies              <10056d168>
Mar 17 17:42:59 c862f3sq02 kernel:   next_to_watch.status <0>
Mar 17 17:43:01 c862f3sq02 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 17:43:01 c862f3sq02 kernel:   Tx Queue             <0>
Mar 17 17:43:01 c862f3sq02 kernel:   TDH                  <ec>
Mar 17 17:43:01 c862f3sq02 kernel:   TDT                  <e8>
Mar 17 17:43:01 c862f3sq02 kernel:   next_to_use          <e8>
Mar 17 17:43:01 c862f3sq02 kernel:   next_to_clean        <ec>
Mar 17 17:43:01 c862f3sq02 kernel: buffer_info[next_to_clean]
Mar 17 17:43:01 c862f3sq02 kernel:   time_stamp           <10056d08d>
Mar 17 17:43:01 c862f3sq02 kernel:   next_to_watch        <f0>
Mar 17 17:43:01 c862f3sq02 kernel:   jiffies              <10056d230>
Mar 17 17:43:01 c862f3sq02 kernel:   next_to_watch.status <0>
Mar 17 17:43:06 c862f3sq02 kernel: e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: R
--snip--

Mar 17 18:43:32 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 18:43:32 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 18:43:32 c862f3sq03 kernel:   TDH                  <3d>
Mar 17 18:43:32 c862f3sq03 kernel:   TDT                  <37>
Mar 17 18:43:32 c862f3sq03 kernel:   next_to_use          <37>
Mar 17 18:43:32 c862f3sq03 kernel:   next_to_clean        <3c>
Mar 17 18:43:32 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 18:43:32 c862f3sq03 kernel:   time_stamp           <100ac71f5>
Mar 17 18:43:32 c862f3sq03 kernel:   next_to_watch        <41>
Mar 17 18:43:32 c862f3sq03 kernel:   jiffies              <100ac72e4>
Mar 17 18:43:32 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 18:43:32 c862f3sq03 kernel: klogd 1.4.1, ---------- state change ----------
Mar 17 18:43:34 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 18:43:34 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 18:43:34 c862f3sq03 kernel:   TDH                  <3d>
Mar 17 18:43:34 c862f3sq03 kernel:   TDT                  <37>
Mar 17 18:43:34 c862f3sq03 kernel:   next_to_use          <37>
Mar 17 18:43:34 c862f3sq03 kernel:   next_to_clean        <3c>
Mar 17 18:43:34 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 18:43:34 c862f3sq03 kernel:   time_stamp           <100ac71f5>
Mar 17 18:43:34 c862f3sq03 kernel:   next_to_watch        <41>
Mar 17 18:43:34 c862f3sq03 kernel:   jiffies              <100ac73ac>
Mar 17 18:43:34 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 18:43:36 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 18:43:36 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 18:43:36 c862f3sq03 kernel:   TDH                  <3d>
Mar 17 18:43:36 c862f3sq03 kernel:   TDT                  <37>
Mar 17 18:43:36 c862f3sq03 kernel:   next_to_use          <37>
Mar 17 18:43:36 c862f3sq03 kernel:   next_to_clean        <3c>
Mar 17 18:43:36 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 18:43:36 c862f3sq03 kernel:   time_stamp           <100ac71f5>
Mar 17 18:43:36 c862f3sq03 kernel:   next_to_watch        <41>
Mar 17 18:43:36 c862f3sq03 kernel:   jiffies              <100ac7474>
Mar 17 18:43:36 c862f3sq03 kernel:   next_to_watch.status <0>
Mar 17 18:43:38 c862f3sq03 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 17 18:43:38 c862f3sq03 kernel:   Tx Queue             <0>
Mar 17 18:43:38 c862f3sq03 kernel:   TDH                  <3d>
Mar 17 18:43:38 c862f3sq03 kernel:   TDT                  <37>
Mar 17 18:43:38 c862f3sq03 kernel:   next_to_use          <37>
Mar 17 18:43:38 c862f3sq03 kernel:   next_to_clean        <3c>
Mar 17 18:43:38 c862f3sq03 kernel: buffer_info[next_to_clean]
Mar 17 18:43:38 c862f3sq03 kernel:   time_stamp           <100ac71f5>
Mar 17 18:43:38 c862f3sq03 kernel:   next_to_watch        <41>
Mar 17 18:43:38 c862f3sq03 kernel:   jiffies              <100ac753c>
Mar 17 18:43:38 c862f3sq03 kernel:   next_to_watch.status <0>
--snip--
Mar 17 19:21:15 c862f3sq02 kernel: Unable to handle kernel paging request for data at address 0x00000000
Mar 17 19:21:15 c862f3sq02 kernel: Faulting instruction address: 0xc00000000010776c
Mar 17 19:21:15 c862f3sq02 kernel: Oops: Kernel access of bad area, sig: 11 [#1]
Mar 17 19:21:15 c862f3sq02 kernel: SMP NR_CPUS=1024 NUMA pSeries
Mar 17 19:21:15 c862f3sq02 kernel: Modules linked in: mmfs26(X) mmfslinux(X) tracedev(X) nfs lockd nfs_acl sunrpc dm_round_robin scsi_dh_rdac dm_multipath scsi_dh ipv6 af_packet fuse loop dm_mod lpfc sr_mod cdrom ses scsi_transport_fc sg enclosure e100 e1000 scsi_tgt ib_ehca mii ib_core sd_mod crc_t10dif ipr(X) pata_pdc2027x libata scsi_mod
Mar 17 19:21:15 c862f3sq02 kernel: Supported: Yes, External
Mar 17 19:21:15 c862f3sq02 kernel: NIP: c00000000010776c LR: c000000000107734 CTR: 0000000000000000
Mar 17 19:21:15 c862f3sq02 kernel: REGS: c0000001d0f6b560 TRAP: 0300   Tainted: G        W  (2.6.27.19-5-ppc64)
Mar 17 19:21:15 c862f3sq02 kernel: MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 44000002  XER: 00000010
Mar 17 19:21:15 c862f3sq02 kernel: DAR: 0000000000000000, DSISR: 0000000040000000
Mar 17 19:21:15 c862f3sq02 kernel: TASK = c0000001db8bada0[11171] 'bash' THREAD: c0000001d0f68000 CPU: 2
Mar 17 19:21:15 c862f3sq02 kernel: GPR00: 0000000000000000 c0000001d0f6b7e0 c0000000009cbc80 0000000000000000 
Mar 17 19:21:15 c862f3sq02 kernel: GPR04: c0000001da3c2300 00000000000000d0 0000000000000000 c0000001a7310000 
Mar 17 19:21:15 c862f3sq02 kernel: GPR08: 000000000001a731 00000000000001a7 0000000000000000 0000000000000000 
Mar 17 19:21:15 c862f3sq02 kernel: GPR12: 0000000000000005 c000000000a93080 c000000000094818 0000000000000000 
Mar 17 19:21:15 c862f3sq02 kernel: GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000002000000 
Mar 17 19:21:15 c862f3sq02 kernel: GPR20: 00006b3600000093 c0000001da3c2300 000004000003dfd0 c0000001da55c5b0 
Mar 17 19:21:15 c862f3sq02 kernel: GPR24: c0000001db890000 c0000001a7310018 c000000001f50c50 c000000001fa3600 
Mar 17 19:21:15 c862f3sq02 kernel: GPR28: c0000000020d2c00 0000000000000000 c00000000094b070 00006b3600000093 
Mar 17 19:21:15 c862f3sq02 kernel: NIP [c00000000010776c] .do_wp_page+0x520/0x830
Mar 17 19:21:15 c862f3sq02 kernel: LR [c000000000107734] .do_wp_page+0x4e8/0x830
Mar 17 19:21:15 c862f3sq02 kernel: Call Trace:
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6b7e0] [c0000000001076f4] .do_wp_page+0x4a8/0x830 (unreliable)
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6b8b0] [c000000000109be8] .handle_mm_fault+0x36c/0x478
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6b990] [c000000000518b10] .do_page_fault+0x3c4/0x5c0
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6bac0] [c00000000000567c] handle_page_fault+0x20/0x5c
Mar 17 19:21:15 c862f3sq02 kernel: --- Exception: 301 at .schedule_tail+0x78/0x94
Mar 17 19:21:15 c862f3sq02 kernel:     LR = .schedule_tail+0x70/0x94
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6bdb0] [c00000000008bdd8] .schedule_tail+0x2c/0x94 (unreliable)
Mar 17 19:21:15 c862f3sq02 kernel: [c0000001d0f6be30] [c000000000008910] .ret_from_fork+0x4/0x74
Mar 17 19:21:15 c862f3sq02 kernel: Instruction dump:
Mar 17 19:21:15 c862f3sq02 kernel: 409e02c8 e8f80000 3c004000 e97e8008 39400000 780007c6 78e905a4 7d290214 
Mar 17 19:21:15 c862f3sq02 kernel: 7920e120 79288402 78001f24 79294602 <7d6b002a> 2fab0000 419e000c 79202428 
Mar 17 19:21:15 c862f3sq02 kernel: ---[ end trace e6c07b27e638f8ff ]---
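
Reading aid for the dumps above (a sketch, not part of the original report): in each Tx Unit Hang dump, time_stamp stays fixed while jiffies advances, so the difference shows how long the descriptor at next_to_clean has been pending. The helper names below are hypothetical, and the tick-rate estimate assumes the syslog timestamps (two seconds apart) are accurate.

```c
#include <assert.h>
#include <stdint.h>

/* Jiffies elapsed between a descriptor's time_stamp and a later jiffies
 * reading. Unsigned subtraction stays correct if the counter wraps. */
uint64_t jiffies_elapsed(uint64_t time_stamp, uint64_t now)
{
    return now - time_stamp;
}

/* Rough tick rate from two jiffies readings taken `seconds` apart.
 * The spacing is an assumption read off the log timestamps. */
uint64_t estimate_hz(uint64_t earlier, uint64_t later, uint64_t seconds)
{
    assert(seconds > 0);
    return (later - earlier) / seconds;
}
```

For the first c862f3sq03 dump, jiffies_elapsed(0x10053bc21, 0x10053bcbc) is 155 jiffies, and successive dumps are 200 jiffies apart, which would be consistent with HZ=100 if the two-second spacing holds.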


^ permalink raw reply

* Re: [RFC][PATCH] ipmr:  Fix struct mfcctl to be independent of MAXVIFS v2
From: Eric W. Biederman @ 2010-04-06 17:23 UTC (permalink / raw)
  To: Ben Greear
  Cc: netdev, David S. Miller, Eric Dumazet, Patrick McHardy, Ilia K,
	Tom Goff
In-Reply-To: <4BBB67FE.6020209@candelatech.com>

Ben Greear <greearb@candelatech.com> writes:

> On 04/06/2010 08:38 AM, Eric W. Biederman wrote:
>>
>> Right now if you recompile the kernel increasing MAXVIFS
>> to support more VIFS users of the MRT_ADD_VIF and MRT_DEL_VIF
>> will break because the ABI changed.
>>
>> My goal is an API that works with just a recompile of existing
>> applications, and an ABI that continues to work for old
>> applications.
>>
>> The unused/dead fields at the end of struct mfcctl make this
>> exercise more difficult than it should be.
>>
>> - Rename the existing struct mfcctl mfcctl_old.
>> - Define a new and larger struct mfcctl that we can detect
>>    by size.
>>
>>    The new and larger struct mfcctl won't have trailing garbage
>>    fields so we can accept anything of that size or larger,
>>    and simply ignore the entries that are above MAXVIFS.
>>
>> My new struct mfcctl is now 128 bytes which is noticeable on
>> the stack but should still be small enough not to cause problems.
>>
>> v2:  Rework the support larger arrays so that most/all? existing
>>     applications can simply be recompiled and work with a larger
>>     maximum number of VIFS.
>
> If we're going to change the ABI, can we not support an arbitrary
> number of VIFS instead of just a larger fixed maximum?

The ABI as I have specified it should work for any structure larger than
the one I have specified.  But, as with select(), many applications will limit
themselves to the definition of struct mfcctl that is passed to them.

Eric
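
A minimal sketch of the detect-by-size idea described above, under stated assumptions: the struct layouts, the array lengths and the demo_ names are hypothetical stand-ins and are not the layouts from the patch. It only illustrates how a handler could tell an old caller from a new one using the option length alone, and ignore entries beyond what it supports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_OLD_MAXVIFS 32   /* hypothetical legacy array length */
#define DEMO_NEW_MAXVIFS 64   /* hypothetical larger array length */

/* Hypothetical legacy layout with dead fields after the array. */
struct demo_ctl_old {
    unsigned int  parent;
    unsigned char ttls[DEMO_OLD_MAXVIFS];
    unsigned int  unused[3];
};

/* Hypothetical new layout: the array is last, nothing trails it. */
struct demo_ctl {
    unsigned int  parent;
    unsigned char ttls[DEMO_NEW_MAXVIFS];
};

/* Decode a caller-supplied buffer into the new layout, choosing the
 * source layout from optlen alone. Returns 0 on success, -1 if the
 * size matches neither case. */
int demo_decode(const void *optval, size_t optlen, struct demo_ctl *out)
{
    assert(optval != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    if (optlen == sizeof(struct demo_ctl_old)) {
        const struct demo_ctl_old *old = optval;
        out->parent = old->parent;
        memcpy(out->ttls, old->ttls, DEMO_OLD_MAXVIFS);
        return 0;
    }
    if (optlen >= sizeof(struct demo_ctl)) {
        /* New layout or larger: take what we support, ignore the rest. */
        memcpy(out, optval, sizeof(*out));
        return 0;
    }
    return -1;
}
```

Because the new layout ends with the array, any longer buffer is a valid prefix match, which is what lets the size check accept "this size or larger".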

^ permalink raw reply

* RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: John Linn @ 2010-04-06 17:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, john.williams,
	michal.simek, John Tyner
In-Reply-To: <1270573233.2081.47.camel@edumazet-laptop>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Tuesday, April 06, 2010 11:01 AM
> To: John Linn
> Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
> jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> Subject: RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> 
> On Tuesday 06 April 2010 at 10:12 -0600, John Linn wrote:
> > > -----Original Message-----
> > > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> > > Sent: Monday, April 05, 2010 3:30 PM
> > > To: John Linn
> > > Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
> > > jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> > > Subject: Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> > >
> > > On Monday 05 April 2010 at 15:11 -0600, John Linn wrote:
> > > > This patch adds support for using the LL TEMAC Ethernet driver on
> > > > non-Virtex 5 platforms by adding support for accessing the Soft DMA
> > > > registers as if they were memory mapped instead of solely through the
> > > > DCR's (available on the Virtex 5).
> > > >
> > > > The patch also updates the driver so that it runs on the MicroBlaze.
> > > > The changes were tested on the PowerPC 440, PowerPC 405, and the
> > > > MicroBlaze platforms.
> > > >
> > > > Signed-off-by: John Tyner <jtyner@cs.ucr.edu>
> > > > Signed-off-by: John Linn <john.linn@xilinx.com>
> > > >
> > > > ---
> > >
> > > > +/* Align the IP data in the packet on word boundaries as MicroBlaze
> > > > + * needs it.
> > > > + */
> > > > +
> > > >  #define XTE_ALIGN       32
> > > > -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
> > > > +#define BUFFER_ALIGN(adr) ((34 - ((u32) adr)) % XTE_ALIGN)
> > > >
> > >
> > > Very interesting way of doing this, but why such convoluted thing ?
> >
> > This is trying to align for a cache line (32 bytes) before my change.
> >
> > My change was then also making it align the IP data on a word boundary.
> >
> > >
> > > Because of the % 32, this is equivalent to :
> > >
> > > #define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)
> > >
> >
> > Yes, but I'm not sure that's clearer IMHO.
> >
> > > But wait, dont we recognise the magic constant NET_IP_ALIGN ?
> >
> > Yes it could be used.  I'm struggling with how to make this all be clearer.
> >
> 
> I am not saying it's clearer; I am saying we have a standard way to
> handle this exact problem (aligning the receive buffer so that the IP
> header is aligned).
> 
> There is no need to invent new ones, this makes reviewing of this driver
> more difficult.
> 
> 
> > How about this?
> > #define BUFFER_ALIGN(adr) (((XTE_ALIGN + NET_IP_ALIGN) - ((u32) adr)) % XTE_ALIGN)
> >
> 
> Sorry, I still don't understand why you need XTE_ALIGN + ...
> 
> ((A + B) - C) % A   is equal to (B - C) % A
> 
> Which one is more readable ?

I'm fine with your suggestion.

#define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)

> 
> Please take a look at existing and clean code, no magic macro, and we
> can understand the intention.
> 
> find drivers/net | xargs grep -n netdev_alloc_skb_ip_align
> 
> 

Yes I see how it's used, but it only allows you to reserve 2 bytes in the skb with no options.
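
For reference, a small sketch of where the 2 bytes come from, assuming a 14-byte Ethernet header and 4-byte word alignment. The demo_ names are hypothetical and this is plain user-space C, not the skb API.

```c
#include <assert.h>

#define DEMO_ETH_HLEN 14u  /* assumed Ethernet header length in bytes */

/* Bytes to reserve in front of a header of hdr_len bytes so that
 * whatever follows the header starts on an `align` boundary. */
unsigned int demo_reserve_needed(unsigned int hdr_len, unsigned int align)
{
    assert(align != 0);
    return (align - hdr_len % align) % align;
}
```

With these assumptions, demo_reserve_needed(DEMO_ETH_HLEN, 4) is 2: reserving 2 bytes puts the end of the Ethernet header, and so the start of the IP header, at offset 16.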




^ permalink raw reply

* RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: Eric Dumazet @ 2010-04-06 17:00 UTC (permalink / raw)
  To: John Linn
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, john.williams,
	michal.simek, John Tyner
In-Reply-To: <2fefb2a2-d0dc-461d-ac8c-3e7d177b7cf8@VA3EHSMHS032.ehs.local>

On Tuesday 06 April 2010 at 10:12 -0600, John Linn wrote:
> > -----Original Message-----
> > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> > Sent: Monday, April 05, 2010 3:30 PM
> > To: John Linn
> > Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
> > jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> > Subject: Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> > 
> > On Monday 05 April 2010 at 15:11 -0600, John Linn wrote:
> > > This patch adds support for using the LL TEMAC Ethernet driver on
> > > non-Virtex 5 platforms by adding support for accessing the Soft DMA
> > > registers as if they were memory mapped instead of solely through the
> > > DCR's (available on the Virtex 5).
> > >
> > > The patch also updates the driver so that it runs on the MicroBlaze.
> > > The changes were tested on the PowerPC 440, PowerPC 405, and the
> > > MicroBlaze platforms.
> > >
> > > Signed-off-by: John Tyner <jtyner@cs.ucr.edu>
> > > Signed-off-by: John Linn <john.linn@xilinx.com>
> > >
> > > ---
> > 
> > > +/* Align the IP data in the packet on word boundaries as MicroBlaze
> > > + * needs it.
> > > + */
> > > +
> > >  #define XTE_ALIGN       32
> > > -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
> > > +#define BUFFER_ALIGN(adr) ((34 - ((u32) adr)) % XTE_ALIGN)
> > >
> > 
> > Very interesting way of doing this, but why such convoluted thing ?
> 
> This is trying to align for a cache line (32 bytes) before my change.
> 
> My change was then also making it align the IP data on a word boundary. 
> 
> > 
> > Because of the % 32, this is equivalent to :
> > 
> > #define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)
> > 
> 
> Yes, but I'm not sure that's clearer IMHO.
> 
> > But wait, dont we recognise the magic constant NET_IP_ALIGN ?
> 
> Yes it could be used.  I'm struggling with how to make this all be clearer.
> 

I am not saying it's clearer; I am saying we have a standard way to
handle this exact problem (aligning the receive buffer so that the IP
header is aligned).

There is no need to invent new ones, this makes reviewing of this driver
more difficult.


> How about this?
> #define BUFFER_ALIGN(adr) (((XTE_ALIGN + NET_IP_ALIGN) - ((u32) adr)) % XTE_ALIGN)
> 

Sorry, I still don't understand why you need XTE_ALIGN + ...

((A + B) - C) % A   is equal to (B - C) % A

Which one is more readable ?
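
The identity above can be checked mechanically. A minimal sketch, assuming adr is handled as an unsigned 32-bit value as in the (u32) cast: unsigned subtraction wraps modulo 2^32, and 2^32 is a multiple of 32, so adding 32 before the subtraction cannot change the result modulo 32. The function names are hypothetical; only XTE_ALIGN and the two expressions come from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define XTE_ALIGN 32u

/* The two forms under discussion, as functions of a 32-bit address. */
uint32_t align_pad_34(uint32_t adr) { return (34u - adr) % XTE_ALIGN; }
uint32_t align_pad_2(uint32_t adr)  { return (2u - adr) % XTE_ALIGN; }

/* Count addresses in [start, start + count) where the forms disagree.
 * The address is allowed to wrap past the top of the 32-bit range. */
uint32_t count_mismatches(uint32_t start, uint32_t count)
{
    uint32_t bad = 0;
    for (uint32_t i = 0; i < count; i++) {
        uint32_t adr = start + i;
        if (align_pad_34(adr) != align_pad_2(adr))
            bad++;
    }
    return bad;
}

/* Self-check over a low range and a range that wraps around zero. */
void demo_check_equivalence(void)
{
    assert(count_mismatches(0u, 4096u) == 0);
    assert(count_mismatches(0xfffffff0u, 64u) == 0);
}
```

The unsigned type matters here: with signed operands, the % operator in C can return a negative value, and the two forms would need a separate argument.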

Please take a look at existing and clean code, no magic macro, and we
can understand the intention.

find drivers/net | xargs grep -n netdev_alloc_skb_ip_align



^ permalink raw reply

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
From: Patrick McHardy @ 2010-04-06 16:37 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev
In-Reply-To: <alpine.LSU.2.01.1004061814120.13186@obet.zrqbmnf.qr>

Jan Engelhardt wrote:
> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>> Or should we be using skb_alloc and copying the data portion over, like 
>>> ipt_REJECT does since v2.6.24-2931-g9ba99b0?
>> I guess pskb_copy() would be most optimal since we can modify
>> the header, but the non-linear area could be shared
> 
> Trying to improve my understanding: when doing skb_pull,
> does the skb->head that is relevant for pskb_copy move?

skb_pull() only changes skb->data.
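
A toy model of that answer, not the kernel's struct sk_buff: the demo_buf type and the demo_ helpers are hypothetical and only mimic the head/data relationship. Pulling advances data and shrinks the length, while head stays where the buffer was allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Toy packet buffer; field names echo sk_buff but this is not it. */
struct demo_buf {
    unsigned char *head;  /* start of the allocated area, never moved */
    unsigned char *data;  /* start of the current payload */
    size_t         len;   /* payload bytes from data onward */
};

/* Drop n bytes from the front of the payload. Only data and len
 * change; head is left untouched. */
unsigned char *demo_pull(struct demo_buf *b, size_t n)
{
    assert(n <= b->len);
    b->data += n;
    b->len  -= n;
    return b->data;
}

/* Headroom is the gap between head and data; it grows on each pull. */
size_t demo_headroom(const struct demo_buf *b)
{
    return (size_t)(b->data - b->head);
}
```

In this model, anything that copies from head onward still sees the pulled bytes, because pulling only moved the data pointer past them.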

^ permalink raw reply

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
From: Jan Engelhardt @ 2010-04-06 16:14 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev
In-Reply-To: <4BB49E10.8080608@trash.net>


On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>> Or should we be using skb_alloc and copying the data portion over, like 
>> ipt_REJECT does since v2.6.24-2931-g9ba99b0?
>
>I guess pskb_copy() would be most optimal since we can modify
>the header, but the non-linear area could be shared

Trying to improve my understanding: when doing skb_pull,
does the skb->head that is relevant for pskb_copy move?

^ permalink raw reply

* RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: John Linn @ 2010-04-06 16:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, john.williams,
	michal.simek, John Tyner
In-Reply-To: <1270502993.9013.36.camel@edumazet-laptop>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Monday, April 05, 2010 3:30 PM
> To: John Linn
> Cc: netdev@vger.kernel.org; linuxppc-dev@ozlabs.org; grant.likely@secretlab.ca;
> jwboyer@linux.vnet.ibm.com; john.williams@petalogix.com; michal.simek@petalogix.com; John Tyner
> Subject: Re: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
> 
> Le lundi 05 avril 2010 à 15:11 -0600, John Linn a écrit :
> > This patch adds support for using the LL TEMAC Ethernet driver on
> > non-Virtex 5 platforms by adding support for accessing the Soft DMA
> > registers as if they were memory mapped instead of solely through the
> > DCR's (available on the Virtex 5).
> >
> > The patch also updates the driver so that it runs on the MicroBlaze.
> > The changes were tested on the PowerPC 440, PowerPC 405, and the
> > MicroBlaze platforms.
> >
> > Signed-off-by: John Tyner <jtyner@cs.ucr.edu>
> > Signed-off-by: John Linn <john.linn@xilinx.com>
> >
> > ---
> 
> > +/* Align the IP data in the packet on word boundaries as MicroBlaze
> > + * needs it.
> > + */
> > +
> >  #define XTE_ALIGN       32
> > -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
> > +#define BUFFER_ALIGN(adr) ((34 - ((u32) adr)) % XTE_ALIGN)
> >
> 
> Very interesting way of doing this, but why such convoluted thing ?

This is trying to align for a cache line (32 bytes) before my change.

My change was then also making it align the IP data on a word boundary. 

> 
> Because of the % 32, this is equivalent to :
> 
> #define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)
> 

Yes, but I'm not sure that's clearer IMHO.

> But wait, dont we recognise the magic constant NET_IP_ALIGN ?

Yes it could be used.  I'm struggling with how to make this all be clearer.

How about this?

#define BUFFER_ALIGN(adr) (((XTE_ALIGN + NET_IP_ALIGN) - ((u32) adr)) % XTE_ALIGN)

> 
> So, I ask, cant you use netdev_alloc_skb_ip_align() in this driver ?

From what I can tell, this wouldn't work as it only reserves the 2 bytes to align with 
a word boundary.

Thanks,
John

> 
> 



^ permalink raw reply

* Re: HFSC classes going out of bounds, regression in recent kernels?
From: Denys Fedorysychenko @ 2010-04-06 15:45 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, Jeff Garzik, Eric Dumazet
In-Reply-To: <4BBB5543.40607@trash.net>

On Tuesday 06 April 2010 18:37:39 Patrick McHardy wrote:
> Denys Fedorysychenko wrote:
> > I notice on one of my QoS machines that HFSC start going out of bandwidth
> > limits. The most terrible thing - it happens suddenly, and if i just
> > relaunch QoS script - everything will work fine.
> 
> That sounds like there's an overflow somewhere.
> 
> > I'm not sure it is not my mistake, but most probably it is a bug.
> > I can't tell for sure when it is happened, last kernel was on this
> > machine 2.6.28 i guess, or maybe even older.
> 
> Looking through the recent patches in this area, my prime suspect
> is the attached patch. Does reverting it make any difference?
> 
I will try to upgrade soon; it is a critical router, so I will probably do
this tonight.
I guess reverting this patch will also hurt shaper resolution at high
speeds... not an issue for me, but it would be for other people.

^ permalink raw reply

* [PATCH] socket: remove duplicate declaration of struct timespec
From: Hagen Paul Pfeifer @ 2010-04-06 15:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, Hagen Paul Pfeifer

struct timespec ts was already defined. Reuse the previously
defined one and reduce the memory footprint on the stack by
16 bytes.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
---
 net/socket.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 769c386..ae904b5 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -619,10 +619,9 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 			put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
 				 sizeof(tv), &tv);
 		} else {
-			struct timespec ts;
-			skb_get_timestampns(skb, &ts);
+			skb_get_timestampns(skb, &ts[0]);
 			put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS,
-				 sizeof(ts), &ts);
+				 sizeof(ts[0]), &ts[0]);
 		}
 	}
 
-- 
1.6.6.196.g1f735.dirty


^ permalink raw reply related

* [RFC][PATCH] ipmr:  Fix struct mfcctl to be independent of MAXVIFS v2
From: Eric W. Biederman @ 2010-04-06 15:38 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet, Patrick McHardy, Ilia K, Tom Goff
In-Reply-To: <m1mxxgbwtl.fsf@fess.ebiederm.org>


Right now, if you recompile the kernel with a larger MAXVIFS
to support more VIFs, users of MRT_ADD_MFC and MRT_DEL_MFC
will break because the ABI changed.

My goal is an API that works with just a recompile of existing
applications, and an ABI that continues to work for old
applications.

The unused/dead fields at the end of struct mfcctl make this
exercise more difficult than it should be.

- Rename the existing struct mfcctl mfcctl_old.
- Define a new and larger struct mfcctl that we can detect
  by size.

  The new and larger struct mfcctl won't have trailing garbage
  fields so we can accept anything of that size or larger,
  and simply ignore the entries that are above MAXVIFS.

My new struct mfcctl is now 128 bytes which is noticeable on
the stack but should still be small enough not to cause problems.

v2:  Rework the support for larger arrays so that most/all? existing
   applications can simply be recompiled and work with a larger
   maximum number of VIFS.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/mroute.h |   20 +++++++++++++++-----
 net/ipv4/ipmr.c        |   17 ++++++++++++++---
 2 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index c5f3d53..6dbdebf 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -76,15 +76,25 @@ struct vifctl {
  *	Cache manipulation structures for mrouted and PIMd
  */
  
+struct mfcctl_old {
+#define MFCCTL_OLD_VIFS 32
+	struct in_addr mfcc_origin;		/* Origin of mcast	*/
+	struct in_addr mfcc_mcastgrp;		/* Group in question	*/
+	vifi_t	mfcc_parent;			/* Where it arrived	*/
+	unsigned char mfcc_ttls[MFCCTL_OLD_VIFS];		/* Where it is going	*/
+	unsigned int mfcc_pkt_cnt;		/* dead */
+	unsigned int mfcc_byte_cnt;		/* dead */
+	unsigned int mfcc_wrong_if;		/* dead */
+	int	     mfcc_expire;		/* dead */
+};
+
 struct mfcctl {
+#define MFCCTL_VIFS 118
 	struct in_addr mfcc_origin;		/* Origin of mcast	*/
 	struct in_addr mfcc_mcastgrp;		/* Group in question	*/
 	vifi_t	mfcc_parent;			/* Where it arrived	*/
-	unsigned char mfcc_ttls[MAXVIFS];	/* Where it is going	*/
-	unsigned int mfcc_pkt_cnt;		/* pkt count for src-grp */
-	unsigned int mfcc_byte_cnt;
-	unsigned int mfcc_wrong_if;
-	int	     mfcc_expire;
+	unsigned char mfcc_ttls[MFCCTL_VIFS];	/* Where it is going 	*/
+	/* Don't put anything here as mfcc_ttls should grow into here */
 };
 
 /* 
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 0b9d03c..516289b 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1012,10 +1012,18 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 		 */
 	case MRT_ADD_MFC:
 	case MRT_DEL_MFC:
-		if (optlen != sizeof(mfc))
+
+		if (optlen == sizeof(struct mfcctl_old)) {
+			if (copy_from_user(&mfc, optval, sizeof(struct mfcctl_old)))
+				return -EFAULT;
+			memset(mfc.mfcc_ttls + MFCCTL_OLD_VIFS, 255,
+			       MFCCTL_VIFS - MFCCTL_OLD_VIFS);
+		} else if (optlen >= (sizeof(struct mfcctl))) {
+			if (copy_from_user(&mfc, optval, sizeof(mfc)))
+				return -EFAULT;
+		} else
 			return -EINVAL;
-		if (copy_from_user(&mfc, optval, sizeof(mfc)))
-			return -EFAULT;
+
 		rtnl_lock();
 		if (optname == MRT_DEL_MFC)
 			ret = ipmr_mfc_delete(net, &mfc);
@@ -2032,6 +2040,9 @@ int __init ip_mr_init(void)
 {
 	int err;
 
+	BUILD_BUG_ON(MFCCTL_VIFS < MAXVIFS);
+	BUILD_BUG_ON(sizeof(struct mfcctl) <= sizeof(struct mfcctl_old));
+
 	mrt_cachep = kmem_cache_create("ip_mrt_cache",
 				       sizeof(struct mfc_cache),
 				       0, SLAB_HWCACHE_ALIGN|SLAB_PANIC,
-- 
1.6.5.2.143.g8cc62


^ permalink raw reply related

* Re: HFSC classes going out of bounds, regression in recent kernels?
From: Patrick McHardy @ 2010-04-06 15:37 UTC (permalink / raw)
  To: Denys Fedorysychenko; +Cc: netdev, Jeff Garzik, Eric Dumazet
In-Reply-To: <201004061822.21735.nuclearcat@nuclearcat.com>

[-- Attachment #1: Type: text/plain, Size: 593 bytes --]

Denys Fedorysychenko wrote:
> I notice on one of my QoS machines that HFSC start going out of bandwidth 
> limits. The most terrible thing - it happens suddenly, and if i just relaunch 
> QoS script - everything will work fine.

That sounds like there's an overflow somewhere.

> I'm not sure it is not my mistake, but most probably it is a bug.
> I can't tell for sure when it is happened, last kernel was on this machine 
> 2.6.28 i guess, or maybe even older.

Looking through the recent patches in this area, my prime suspect
is the attached patch. Does reverting it make any difference?


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1726 bytes --]

commit a4a710c4a7490587406462bf1d54504b7783d7d7
Author: Jarek Poplawski <jarkao2@gmail.com>
Date:   Mon Jun 8 22:05:13 2009 +0000

    pkt_sched: Change PSCHED_SHIFT from 10 to 6
    
    Change PSCHED_SHIFT from 10 to 6 to increase schedulers time
    resolution. This will increase 16x a number of (internal) ticks per
    nanosecond, and is needed to improve accuracy of schedulers based on
    rate tables, like HTB, TBF or CBQ, with rates above 100Mbit. It is
    assumed this change is safe for 32bit accounting of time diffs up
    to 2 minutes, which should be enough for common use (extremely low
    rate values may overflow, so get inaccurate instead). To make full
    use of this change an updated iproute2 will be needed. (But using
    older iproute2 should be safe too.)
    
    This change breaks ticks - microseconds similarity, so some minor code
    fixes might be needed. It is also planned to change naming adequately
    eg. to PSCHED_TICKS2NS() etc. in the near future.
    
    Reported-by: Antonio Almeida <vexwek@gmail.com>
    Tested-by: Antonio Almeida <vexwek@gmail.com>
    Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index cd0e026..120935b 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -41,8 +41,8 @@ static inline void *qdisc_priv(struct Qdisc *q)
 typedef u64	psched_time_t;
 typedef long	psched_tdiff_t;
 
-/* Avoid doing 64 bit divide by 1000 */
-#define PSCHED_SHIFT			10
+/* Avoid doing 64 bit divide */
+#define PSCHED_SHIFT			6
 #define PSCHED_US2NS(x)			((s64)(x) << PSCHED_SHIFT)
 #define PSCHED_NS2US(x)			((x) >> PSCHED_SHIFT)
 

^ permalink raw reply related

* HFSC classes going out of bounds, regression in recent kernels?
From: Denys Fedorysychenko @ 2010-04-06 15:22 UTC (permalink / raw)
  To: netdev, Jeff Garzik, Eric Dumazet, Patrick McHardy

Hi

I noticed on one of my QoS machines that HFSC classes start exceeding their
bandwidth limits. The worst part is that it happens suddenly, and if I just
rerun the QoS script, everything works fine again.
I can't rule out a mistake on my side, but most probably it is a bug.
I can't tell for sure when it started; the previous kernel on this machine
was 2.6.28, I guess, or maybe even older.

The only cause I can think of is that I may need to set up stab for each
qdisc, not only the root. But the graphs show it overshooting badly (up to
25-26 Mbps for a 5 min average).
Here is qdisc setup:

qdisc bfifo 110: parent 1:110 limit 100000b
qdisc hfsc 1: root refcnt 2
 linklayer ethernet overhead 4 mpu 64 mtu 2047 tsize 512
qdisc bfifo 120: parent 1:120 limit 100000b
qdisc bfifo 130: parent 1:130 limit 100000b
qdisc bfifo 140: parent 1:140 limit 100000b
qdisc bfifo 150: parent 1:150 limit 1000000b
qdisc bfifo 160: parent 1:160 limit 100000b
qdisc bfifo 170: parent 1:170 limit 100000b
qdisc bfifo 180: parent 1:180 limit 100000b
qdisc bfifo 190: parent 1:190 limit 100000b


All offloading (except checksumming) is disabled on eth0 and eth0.33.


Here is an example of the stats. In this snapshot the total bandwidth goes
out of bounds, from 20 Mbit/s to 21.6 Mbit/s.

Router# tc -s -d class show dev eth0.33;sleep 10;tc -s -d class show dev
eth0.33;
class hfsc 1:110 parent 1:100 leaf 110: sc m1 0bit d 0us m2 8000bit
 Sent 1544928 bytes 18392 pkt (dropped 0, overlimits 0 requeues 0)
 rate 1344bit 2pps backlog 0b 0p requeues 0
 period 18392 work 1544928 bytes rtwork 1411368 bytes level 0

class hfsc 1: root
 Sent 4180 bytes 24 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 2

class hfsc 1:100 parent 1: sc m1 0bit d 0us m2 20000Kbit ul m1 0bit d 0us m2
20000Kbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 period 74 work 22978893888 bytes level 1

class hfsc 1:130 parent 1:100 leaf 130: sc m1 0bit d 0us m2 7000Kbit ul m1
0bit d 0us m2 20000Kbit
 Sent 5900522796 bytes 19334198 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5826Kbit 2360pps backlog 0b 0p requeues 0
 period 13454851 work 5900522796 bytes rtwork 3473525428 bytes level 0

class hfsc 1:120 parent 1:100 leaf 120: sc m1 0bit d 0us m2 3000Kbit ul m1
0bit d 0us m2 20000Kbit
 Sent 659289392 bytes 3151731 pkt (dropped 484, overlimits 0 requeues 0)
 rate 490560bit 329pps backlog 0b 0p requeues 0
 period 2902223 work 659289392 bytes rtwork 391958652 bytes level 0

class hfsc 1:150 parent 1:100 leaf 150: sc m1 0bit d 0us m2 2000Kbit ul m1
0bit d 0us m2 20000Kbit
 Sent 15792563632 bytes 79195890 pkt (dropped 376595, overlimits 0 requeues 0)
 rate 14354Kbit 8650pps backlog 0b 0p requeues 0
 period 9906952 work 15792563632 bytes rtwork 2219169504 bytes level 0

class hfsc 1:140 parent 1:100 leaf 140: sc m1 0bit d 0us m2 2000Kbit ul m1
0bit d 0us m2 20000Kbit
 Sent 87850052 bytes 540018 pkt (dropped 0, overlimits 0 requeues 0)
 rate 47024bit 70pps backlog 0b 0p requeues 0
 period 527595 work 87850052 bytes rtwork 82404272 bytes level 0

class hfsc 1:170 parent 1:100 leaf 170: sc m1 0bit d 0us m2 256000bit ul m1
0bit d 0us m2 20000Kbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 period 0 level 0

class hfsc 1:160 parent 1:100 leaf 160: sc m1 0bit d 0us m2 256000bit ul m1
0bit d 0us m2 20000Kbit
 Sent 75659416 bytes 58924 pkt (dropped 201, overlimits 0 requeues 0)
 rate 256bit 0pps backlog 0b 0p requeues 0
 period 27786 work 75659416 bytes rtwork 14074548 bytes level 0

class hfsc 1:190 parent 1:100 leaf 190: sc m1 0bit d 0us m2 200000bit ul m1
0bit d 0us m2 400000bit
 Sent 459905752 bytes 5083942 pkt (dropped 2305517, overlimits 0 requeues 0)
 rate 400112bit 548pps backlog 0b 1094p requeues 0
 period 3 work 459806144 bytes rtwork 229903112 bytes level 0

class hfsc 1:180 parent 1:100 leaf 180: sc m1 0bit d 0us m2 2000Kbit ul m1
0bit d 0us m2 20000Kbit
 Sent 1657528 bytes 7985 pkt (dropped 0, overlimits 0 requeues 0)
 rate 1320bit 1pps backlog 0b 0p requeues 0
 period 7097 work 1657528 bytes rtwork 1380512 bytes level 0

------------------------
After 10 seconds

class hfsc 1:110 parent 1:100 leaf 110: sc m1 0bit d 0us m2 8000bit
 Sent 1546608 bytes 18412 pkt (dropped 0, overlimits 0 requeues 0)
 rate 1344bit 2pps backlog 0b 0p requeues 0
 period 18412 work 1546608 bytes rtwork 1413048 bytes level 0

class hfsc 1: root
 Sent 4180 bytes 24 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 0 level 2

class hfsc 1:100 parent 1: sc m1 0bit d 0us m2 20000Kbit ul m1 0bit d 0us m2
20000Kbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 period 74 work 23004936640 bytes level 1

class hfsc 1:130 parent 1:100 leaf 130: sc m1 0bit d 0us m2 7000Kbit ul m1 
0bit d 0us m2 20000Kbit
 Sent 5907643408 bytes 19355979 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5663Kbit 2226pps backlog 0b 6p requeues 0
 period 13471026 work 5907643020 bytes rtwork 3477346092 bytes level 0

class hfsc 1:120 parent 1:100 leaf 120: sc m1 0bit d 0us m2 3000Kbit ul m1 
0bit d 0us m2 20000Kbit
 Sent 659705168 bytes 3154632 pkt (dropped 484, overlimits 0 requeues 0)
 rate 380008bit 303pps backlog 0b 0p requeues 0
 period 2905063 work 659705168 bytes rtwork 392246028 bytes level 0

class hfsc 1:150 parent 1:100 leaf 150: sc m1 0bit d 0us m2 2000Kbit ul m1 
0bit d 0us m2 20000Kbit
 Sent 15810495520 bytes 79281120 pkt (dropped 376595, overlimits 0 requeues 0)
 rate 14362Kbit 8547pps backlog 0b 0p requeues 0
 period 9928987 work 15810495520 bytes rtwork 2221499996 bytes level 0

class hfsc 1:140 parent 1:100 leaf 140: sc m1 0bit d 0us m2 2000Kbit ul m1 
0bit d 0us m2 20000Kbit
 Sent 87918300 bytes 540801 pkt (dropped 0, overlimits 0 requeues 0)
 rate 50528bit 74pps backlog 0b 0p requeues 0
 period 528353 work 87918300 bytes rtwork 82463492 bytes level 0

class hfsc 1:170 parent 1:100 leaf 170: sc m1 0bit d 0us m2 256000bit ul m1 
0bit d 0us m2 20000Kbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 period 0 level 0

class hfsc 1:160 parent 1:100 leaf 160: sc m1 0bit d 0us m2 256000bit ul m1 
0bit d 0us m2 20000Kbit
 Sent 75659480 bytes 58925 pkt (dropped 201, overlimits 0 requeues 0)
 rate 96bit 0pps backlog 0b 0p requeues 0
 period 27787 work 75659480 bytes rtwork 14074612 bytes level 0

class hfsc 1:190 parent 1:100 leaf 190: sc m1 0bit d 0us m2 200000bit ul m1 
0bit d 0us m2 400000bit
 Sent 460405988 bytes 5089449 pkt (dropped 2307857, overlimits 0 requeues 0)
 rate 399960bit 548pps backlog 0b 1109p requeues 0
 period 3 work 460306304 bytes rtwork 230153172 bytes level 0

class hfsc 1:180 parent 1:100 leaf 180: sc m1 0bit d 0us m2 2000Kbit ul m1 
0bit d 0us m2 20000Kbit
 Sent 1662240 bytes 8007 pkt (dropped 0, overlimits 0 requeues 0)
 rate 2088bit 1pps backlog 0b 0p requeues 0
 period 7118 work 1662240 bytes rtwork 1384776 bytes level 0

qdisc setup

^ permalink raw reply

* Re: [PATCH v2] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-06 15:10 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <v2n65634d661004060725paf401b5bg1ee692caef565d33@mail.gmail.com>

Le mardi 06 avril 2010 à 07:25 -0700, Tom Herbert a écrit :

> > 2) inet_rps_save_rxhash(sk, skb->rxhash);
> >
> >        It should have a check to make sure some part of the stack doesnt feed
> > many different rxhash for a given socket (Make sure we dont pollute flow
> > table with pseudo random values)
> >
> If packets for a connection are always received on the same device, is
> it reasonable to assume the rxhash is constant for that connection?
> 
> I suppose it's possible that packets for a same sockets are being
> constantly received on two different devices that are giving different
> rxhashes.  This would already be bad in that OOO is probably happening
> anyway.  I don't know if thrashing the sock_flow_table is going to
> aggravate this scenario much.
> 
> Are there any other degenerative cases you're worried about?

No, I was considering this mostly as a debugging aid until RPS is
stable, because mismatches could go unnoticed and leave performance
suboptimal.
 



^ permalink raw reply

* RE: [PATCH] [V3] Add non-Virtex5 support for LL TEMAC driver
From: Steven J. Magnani @ 2010-04-06 15:08 UTC (permalink / raw)
  To: John Linn
  Cc: Eric Dumazet, netdev, linuxppc-dev, grant.likely, jwboyer,
	john.williams, michal.simek, John Tyner
In-Reply-To: <7a42d507-bf50-40f3-a2a0-8de682b314d7@SG2EHSMHS017.ehs.local>

On Mon, 2010-04-05 at 15:33 -0600, John Linn wrote:
> > > +/* Align the IP data in the packet on word boundaries as MicroBlaze
> > > + * needs it.
> > > + */
> > > +
> > >  #define XTE_ALIGN       32
> > > -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
> > > +#define BUFFER_ALIGN(adr) ((34 - ((u32) adr)) % XTE_ALIGN)
> > >
> > 
> > Very interesting way of doing this, but why such convoluted thing ?
> 
> Grant might have insight into why this started this way, I just updated to help with MicroBlaze alignment.
> 
> > 
> > Because of the % 32, this is equivalent to :
> > 
> > #define BUFFER_ALIGN(adr) ((2 - ((u32) adr)) % XTE_ALIGN)
> > 
> > But wait, dont we recognise the magic constant NET_IP_ALIGN ?
> > 
> > So, I ask, cant you use netdev_alloc_skb_ip_align() in this driver ?
> > 

That should work. I switched to that in the older xilinx_lltemac driver
without any problem. 
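
The equivalence Eric points out can be checked in userspace. This is a
minimal sketch, assuming uint32_t stands in for the kernel's u32; the
function names are made up for illustration. Unsigned subtraction wraps
modulo 2^32, and 2^32 is a multiple of 32, so (34 - adr) % 32 and
(2 - adr) % 32 give the same padding, and adding that padding to the
address always leaves it 2 bytes past a 32-byte boundary.

```c
#include <assert.h>
#include <stdint.h>

#define XTE_ALIGN 32

/* Form used in the quoted patch. */
static uint32_t buffer_align_34(uint32_t adr)
{
	return (34u - adr) % XTE_ALIGN;
}

/* Reduced form: 34 % 32 == 2. */
static uint32_t buffer_align_2(uint32_t adr)
{
	return (2u - adr) % XTE_ALIGN;
}

/* Return 1 if both forms agree for count addresses starting at start. */
static int align_forms_agree(uint32_t start, uint32_t count)
{
	uint32_t i;

	for (i = 0; i < count; i++)
		if (buffer_align_34(start + i) != buffer_align_2(start + i))
			return 0;
	return 1;
}
```
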

------------------------------------------------------------------------
 Steven J. Magnani               "I claim this network for MARS!
 www.digidescorp.com              Earthling, return my space modulator!"

 #include <standard.disclaimer>





* Re: [v2 Patch 3/3] bonding: make bonding support netpoll
From: Andy Gospodarek @ 2010-04-06 14:48 UTC (permalink / raw)
  To: Cong Wang
  Cc: linux-kernel, Matt Mackall, netdev, bridge, Andy Gospodarek,
	Neil Horman, Jeff Moyer, Stephen Hemminger, bonding-devel,
	Jay Vosburgh, David Miller
In-Reply-To: <4BBABAB8.4010401@redhat.com>

On Tue, Apr 06, 2010 at 12:38:16PM +0800, Cong Wang wrote:
> Cong Wang wrote:
>> Before I try to reproduce it, could you please try replacing the
>> 'read_lock()' in slaves_support_netpoll() with 'read_lock_bh()'
>> (and read_unlock() too)?  See if this helps.
>>
>
> Confirmed. Please use the attached patch instead, for your testing.
>
> Thanks!
>

Moving those locks to bh-locks will not resolve this.  I tried that
yesterday and tried your new patch today without success.  That warning
is a WARN_ON_ONCE so you need to reboot to see that it is still a
problem.  Simply unloading and loading the new module is not an accurate
test.
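
To illustrate why a module reload is not enough: a once-only warning keeps
a flag at its call site, and the call site here is in kernel/softirq.c
(core kernel, not the bonding module), so the flag is only cleared by a
reboot. What follows is a rough userspace sketch of that idea, not the
kernel's actual macro; warn_on_once_sketch() and warnings_printed are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed;	/* hypothetical counter, for illustration */

/* Report the condition at most once for the lifetime of the program. */
static bool warn_on_once_sketch(bool cond)
{
	static bool warned;	/* stays set until the process exits */

	if (cond && !warned) {
		warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: condition hit\n");
	}
	return cond;
}
```

The second and later hits still return true but print nothing, which is
why a silent log after reloading the module says little about the fix.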

Also, my system still hangs when removing the bonding module.  I do not
think you intended to fix this with the patch, but wanted it to be clear
to everyone on the list.

You should also configure your kernel with some of the lock debugging
options enabled.  I've been using the following:

CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y

Here is the output when I remove a slave from the bond.  My
xmit_roundrobin patch from earlier (replacing read_lock with
read_trylock) was applied.  It might be helpful for you when debugging
these issues.

------------[ cut here ]------------
WARNING: at kernel/softirq.c:143 local_bh_enable+0x43/0xba()
Hardware name: HP xw4400 Workstation
Modules linked in: netconsole bonding ipt_REJECT bridge stp autofs4 i2c_dev i2c_core hidp rfcomm
l2cap crc16 bluetooth rfki]
Pid: 10, comm: events/1 Not tainted 2.6.34-rc3 #6
Call Trace:
 [<ffffffff81058754>] ? cpu_clock+0x2d/0x41
 [<ffffffff810404d9>] ? local_bh_enable+0x43/0xba
 [<ffffffff8103a350>] warn_slowpath_common+0x77/0x8f
 [<ffffffff812a4659>] ? dev_queue_xmit+0x408/0x467
 [<ffffffff8103a377>] warn_slowpath_null+0xf/0x11
 [<ffffffff810404d9>] local_bh_enable+0x43/0xba
 [<ffffffff812a4659>] dev_queue_xmit+0x408/0x467
 [<ffffffff812a435e>] ? dev_queue_xmit+0x10d/0x467
 [<ffffffffa04a383f>] bond_dev_queue_xmit+0x1cd/0x1f9 [bonding]
 [<ffffffffa04a41ee>] bond_start_xmit+0x139/0x3e9 [bonding]
 [<ffffffff812b0e9a>] queue_process+0xa8/0x160
 [<ffffffff812b0df2>] ? queue_process+0x0/0x160
 [<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
 [<ffffffff81050ba2>] ? worker_thread+0x156/0x2ae
 [<ffffffff81053c34>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81050a4c>] ? worker_thread+0x0/0x2ae
 [<ffffffff81053901>] kthread+0x7d/0x85
 [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
 [<ffffffff813362bc>] ? restore_args+0x0/0x30
 [<ffffffff81053884>] ? kthread+0x0/0x85
 [<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
---[ end trace 241f49bf65e0f4f0 ]---

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.34-rc3 #6
---------------------------------------------------------
events/1/10 just changed the state of lock:
 (&bonding_netdev_xmit_lock_key){+.+...}, at: [<ffffffff812b0e75>] queue_process+0x83/0x160
but this lock was taken by another, SOFTIRQ-safe lock in the past:
 (&(&dev->tx_global_lock)->rlock){+.-...}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
4 locks held by events/1/10:
 #0:  (events){+.+.+.}, at: [<ffffffff81050ba2>] worker_thread+0x156/0x2ae
 #1:  ((&(&npinfo->tx_work)->work)){+.+...}, at: [<ffffffff81050ba2>] worker_thread+0x156/0x2ae
 #2:  (&bonding_netdev_xmit_lock_key){+.+...}, at: [<ffffffff812b0e75>] queue_process+0x83/0x160
 #3:  (&bond->lock){++.+..}, at: [<ffffffffa04a4107>] bond_start_xmit+0x52/0x3e9 [bonding]

the shortest dependencies between 2nd lock and 1st lock:
 -> (&(&dev->tx_global_lock)->rlock){+.-...} ops: 129 {
    HARDIRQ-ON-W at:
                          [<ffffffff810651ef>] __lock_acquire+0x643/0x813
                          [<ffffffff81065487>] lock_acquire+0xc8/0xed
                          [<ffffffff81335742>] _raw_spin_lock+0x31/0x66
                          [<ffffffff812b64bd>] dev_deactivate+0x6f/0x195
                          [<ffffffff812ad7c4>] linkwatch_do_dev+0x9a/0xae
                          [<ffffffff812ada6a>] __linkwatch_run_queue+0x106/0x14a
                          [<ffffffff812adad8>] linkwatch_event+0x2a/0x31
                          [<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
                          [<ffffffff81053901>] kthread+0x7d/0x85
                          [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
    IN-SOFTIRQ-W at:
                          [<ffffffff810651a3>] __lock_acquire+0x5f7/0x813
                          [<ffffffff81065487>] lock_acquire+0xc8/0xed
                          [<ffffffff81335742>] _raw_spin_lock+0x31/0x66
                          [<ffffffff812b6606>] dev_watchdog+0x23/0x1f2
                          [<ffffffff8104701b>] run_timer_softirq+0x1d1/0x285
                          [<ffffffff81040021>] __do_softirq+0xdb/0x1ab
                          [<ffffffff8100388c>] call_softirq+0x1c/0x34
                          [<ffffffff81004f9d>] do_softirq+0x38/0x83
                          [<ffffffff8103ff44>] irq_exit+0x45/0x47
                          [<ffffffff810193bc>] smp_apic_timer_interrupt+0x88/0x98
                          [<ffffffff81003353>] apic_timer_interrupt+0x13/0x20
                          [<ffffffff81001a21>] cpu_idle+0x4d/0x6b
                          [<ffffffff8131da3a>] rest_init+0xbe/0xc2
                          [<ffffffff81a00d4e>] start_kernel+0x38c/0x399
                          [<ffffffff81a002a5>] x86_64_start_reservations+0xb5/0xb9
                          [<ffffffff81a0038f>] x86_64_start_kernel+0xe6/0xed
    INITIAL USE at:
                         [<ffffffff8106525c>] __lock_acquire+0x6b0/0x813
                         [<ffffffff81065487>] lock_acquire+0xc8/0xed
                         [<ffffffff81335742>] _raw_spin_lock+0x31/0x66
                         [<ffffffff812b64bd>] dev_deactivate+0x6f/0x195
                         [<ffffffff812ad7c4>] linkwatch_do_dev+0x9a/0xae
                         [<ffffffff812ada6a>] __linkwatch_run_queue+0x106/0x14a
                         [<ffffffff812adad8>] linkwatch_event+0x2a/0x31
                         [<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
                         [<ffffffff81053901>] kthread+0x7d/0x85
                         [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
  }
  ... key      at: [<ffffffff8282ceb0>] __key.51521+0x0/0x8
  ... acquired at:
   [<ffffffff810649f9>] validate_chain+0xb87/0xd3a
   [<ffffffff81065359>] __lock_acquire+0x7ad/0x813
   [<ffffffff81065487>] lock_acquire+0xc8/0xed
   [<ffffffff81335742>] _raw_spin_lock+0x31/0x66
   [<ffffffff812b64e4>] dev_deactivate+0x96/0x195
   [<ffffffff812a17fc>] __dev_close+0x69/0x86
   [<ffffffff8129f8ed>] __dev_change_flags+0xa8/0x12b
   [<ffffffff812a148c>] dev_change_flags+0x1c/0x51
   [<ffffffff812eee8a>] devinet_ioctl+0x26e/0x5d0
   [<ffffffff812ef978>] inet_ioctl+0x8a/0xa2
   [<ffffffff8128fc28>] sock_do_ioctl+0x26/0x45
   [<ffffffff8128fe5a>] sock_ioctl+0x213/0x226
   [<ffffffff810e5988>] vfs_ioctl+0x2a/0x9d
   [<ffffffff810e5f13>] do_vfs_ioctl+0x491/0x4e2
   [<ffffffff810e5fbb>] sys_ioctl+0x57/0x7a
   [<ffffffff8100296b>] system_call_fastpath+0x16/0x1b

-> (&bonding_netdev_xmit_lock_key){+.+...} ops: 2 {
   HARDIRQ-ON-W at:
                        [<ffffffff810651ef>] __lock_acquire+0x643/0x813
                        [<ffffffff81065487>] lock_acquire+0xc8/0xed
                        [<ffffffff81335742>] _raw_spin_lock+0x31/0x66
                        [<ffffffff812b64e4>] dev_deactivate+0x96/0x195
                        [<ffffffff812a17fc>] __dev_close+0x69/0x86
                        [<ffffffff8129f8ed>] __dev_change_flags+0xa8/0x12b
                        [<ffffffff812a148c>] dev_change_flags+0x1c/0x51
                        [<ffffffff812eee8a>] devinet_ioctl+0x26e/0x5d0
                        [<ffffffff812ef978>] inet_ioctl+0x8a/0xa2
                        [<ffffffff8128fc28>] sock_do_ioctl+0x26/0x45
                        [<ffffffff8128fe5a>] sock_ioctl+0x213/0x226
                        [<ffffffff810e5988>] vfs_ioctl+0x2a/0x9d
                        [<ffffffff810e5f13>] do_vfs_ioctl+0x491/0x4e2
                        [<ffffffff810e5fbb>] sys_ioctl+0x57/0x7a
                        [<ffffffff8100296b>] system_call_fastpath+0x16/0x1b
   SOFTIRQ-ON-W at:
                        [<ffffffff81062006>] mark_held_locks+0x49/0x69
                        [<ffffffff81062139>] trace_hardirqs_on_caller+0x113/0x13e
                        [<ffffffff81062171>] trace_hardirqs_on+0xd/0xf
                        [<ffffffff81040548>] local_bh_enable+0xb2/0xba
                        [<ffffffff812a4659>] dev_queue_xmit+0x408/0x467
                        [<ffffffffa04a383f>] bond_dev_queue_xmit+0x1cd/0x1f9 [bonding]
                        [<ffffffffa04a41ee>] bond_start_xmit+0x139/0x3e9 [bonding]
                        [<ffffffff812b0e9a>] queue_process+0xa8/0x160
                        [<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
                        [<ffffffff81053901>] kthread+0x7d/0x85
                        [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
   INITIAL USE at:
                       [<ffffffff8106525c>] __lock_acquire+0x6b0/0x813
                       [<ffffffff81065487>] lock_acquire+0xc8/0xed
                       [<ffffffff81335742>] _raw_spin_lock+0x31/0x66
                       [<ffffffff812b64e4>] dev_deactivate+0x96/0x195
                       [<ffffffff812a17fc>] __dev_close+0x69/0x86
                       [<ffffffff8129f8ed>] __dev_change_flags+0xa8/0x12b
                       [<ffffffff812a148c>] dev_change_flags+0x1c/0x51
                       [<ffffffff812eee8a>] devinet_ioctl+0x26e/0x5d0
                       [<ffffffff812ef978>] inet_ioctl+0x8a/0xa2
                       [<ffffffff8128fc28>] sock_do_ioctl+0x26/0x45
                       [<ffffffff8128fe5a>] sock_ioctl+0x213/0x226
                       [<ffffffff810e5988>] vfs_ioctl+0x2a/0x9d
                       [<ffffffff810e5f13>] do_vfs_ioctl+0x491/0x4e2
                       [<ffffffff810e5fbb>] sys_ioctl+0x57/0x7a
                       [<ffffffff8100296b>] system_call_fastpath+0x16/0x1b
 }
 ... key      at: [<ffffffffa04b1968>] bonding_netdev_xmit_lock_key+0x0/0xffffffffffffa78c [bonding]
 ... acquired at:
   [<ffffffff8106386d>] check_usage_backwards+0xb8/0xc7
   [<ffffffff81061d81>] mark_lock+0x311/0x54d
   [<ffffffff81062006>] mark_held_locks+0x49/0x69
   [<ffffffff81062139>] trace_hardirqs_on_caller+0x113/0x13e
   [<ffffffff81062171>] trace_hardirqs_on+0xd/0xf
   [<ffffffff81040548>] local_bh_enable+0xb2/0xba
   [<ffffffff812a4659>] dev_queue_xmit+0x408/0x467
   [<ffffffffa04a383f>] bond_dev_queue_xmit+0x1cd/0x1f9 [bonding]
   [<ffffffffa04a41ee>] bond_start_xmit+0x139/0x3e9 [bonding]
   [<ffffffff812b0e9a>] queue_process+0xa8/0x160
   [<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
   [<ffffffff81053901>] kthread+0x7d/0x85
   [<ffffffff81003794>] kernel_thread_helper+0x4/0x10


stack backtrace:
Pid: 10, comm: events/1 Tainted: G        W  2.6.34-rc3 #6
Call Trace:
 [<ffffffff8106189e>] print_irq_inversion_bug+0x121/0x130
 [<ffffffff8106386d>] check_usage_backwards+0xb8/0xc7
 [<ffffffff810637b5>] ? check_usage_backwards+0x0/0xc7
 [<ffffffff81061d81>] mark_lock+0x311/0x54d
 [<ffffffff81062006>] mark_held_locks+0x49/0x69
 [<ffffffff81040548>] ? local_bh_enable+0xb2/0xba
 [<ffffffff81062139>] trace_hardirqs_on_caller+0x113/0x13e
 [<ffffffff812a4659>] ? dev_queue_xmit+0x408/0x467
 [<ffffffff81062171>] trace_hardirqs_on+0xd/0xf
 [<ffffffff81040548>] local_bh_enable+0xb2/0xba
 [<ffffffff812a4659>] dev_queue_xmit+0x408/0x467
 [<ffffffff812a435e>] ? dev_queue_xmit+0x10d/0x467
 [<ffffffffa04a383f>] bond_dev_queue_xmit+0x1cd/0x1f9 [bonding]
 [<ffffffffa04a41ee>] bond_start_xmit+0x139/0x3e9 [bonding]
 [<ffffffff812b0e9a>] queue_process+0xa8/0x160
 [<ffffffff812b0df2>] ? queue_process+0x0/0x160
 [<ffffffff81050bfb>] worker_thread+0x1af/0x2ae
 [<ffffffff81050ba2>] ? worker_thread+0x156/0x2ae
 [<ffffffff81053c34>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81050a4c>] ? worker_thread+0x0/0x2ae
 [<ffffffff81053901>] kthread+0x7d/0x85
 [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
 [<ffffffff813362bc>] ? restore_args+0x0/0x30
 [<ffffffff81053884>] ? kthread+0x0/0x85
 [<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
Dead loop on virtual device bond0, fix it urgently!


* Re: [PATCH] ethtool: add names of newer Marvell chips
From: Jeff Garzik @ 2010-04-06 14:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20100402081632.5401cf2f@nehalam>

On 04/02/2010 11:16 AM, Stephen Hemminger wrote:
> Fill in names of newer chips.
>
> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>

applied




* Re: [PATCH] ethtool: RXHASH flag support (v2)
From: Jeff Garzik @ 2010-04-06 14:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20100402081515.1b963562@nehalam>

On 04/02/2010 11:15 AM, Stephen Hemminger wrote:
> Add support for RXHASH flag in ethtool offload.
>
> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
> ---
>   ethtool-copy.h |    1 +
>   ethtool.8      |    4 ++++
>   ethtool.c      |   39 ++++++++++++++++++++++++++++++++++-----
>   3 files changed, 39 insertions(+), 5 deletions(-)

applied




* [PATCH] packet: support for TX time stamps on RAW sockets
From: Richard Cochran @ 2010-04-06 14:30 UTC (permalink / raw)
  To: netdev; +Cc: patrick.ohly

Enable the SO_TIMESTAMPING socket infrastructure for raw packet sockets.
For lack of a better idea, we have elected to use PACKET_RECV_OUTPUT for
the control message cmsg_type. This macro currently is not used anywhere
within the kernel.

Similar support for UDP and CAN sockets was added in commit
51f31cabe3ce5345b51e4a4f82138b38c4d5dc91

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
 net/packet/af_packet.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 59 insertions(+), 1 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b0f037c..4513222 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -81,6 +81,7 @@
 #include <linux/mutex.h>
 #include <linux/if_vlan.h>
 #include <linux/virtio_net.h>
+#include <linux/errqueue.h>
 
 #ifdef CONFIG_INET
 #include <net/inet_common.h>
@@ -314,6 +315,8 @@ static inline struct packet_sock *pkt_sk(struct sock *sk)
 
 static void packet_sock_destruct(struct sock *sk)
 {
+	skb_queue_purge(&sk->sk_error_queue);
+
 	WARN_ON(atomic_read(&sk->sk_rmem_alloc));
 	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
 
@@ -482,6 +485,9 @@ retry:
 	skb->dev = dev;
 	skb->priority = sk->sk_priority;
 	skb->mark = sk->sk_mark;
+	err = sock_tx_timestamp(msg, sk, skb_tx(skb));
+	if (err < 0)
+		goto out_unlock;
 
 	dev_queue_xmit(skb);
 	rcu_read_unlock();
@@ -1187,6 +1193,9 @@ static int packet_snd(struct socket *sock,
 	err = skb_copy_datagram_from_iovec(skb, offset, msg->msg_iov, 0, len);
 	if (err)
 		goto out_free;
+	err = sock_tx_timestamp(msg, sk, skb_tx(skb));
+	if (err < 0)
+		goto out_free;
 
 	skb->protocol = proto;
 	skb->dev = dev;
@@ -1486,6 +1495,50 @@ out:
 	return err;
 }
 
+static int packet_recv_error(struct sock *sk, struct msghdr *msg, int len)
+{
+	struct sock_exterr_skb *serr;
+	struct sk_buff *skb, *skb2;
+	int copied, err;
+
+	err = -EAGAIN;
+	skb = skb_dequeue(&sk->sk_error_queue);
+	if (skb == NULL)
+		goto out;
+
+	copied = skb->len;
+	if (copied > len) {
+		msg->msg_flags |= MSG_TRUNC;
+		copied = len;
+	}
+	err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+	if (err)
+		goto out_free_skb;
+
+	sock_recv_timestamp(msg, sk, skb);
+
+	serr = SKB_EXT_ERR(skb);
+	put_cmsg(msg,SOL_PACKET,PACKET_RECV_OUTPUT,sizeof(serr->ee),&serr->ee);
+
+	msg->msg_flags |= MSG_ERRQUEUE;
+	err = copied;
+
+	/* Reset and regenerate socket error */
+	spin_lock_bh(&sk->sk_error_queue.lock);
+	sk->sk_err = 0;
+	if ((skb2 = skb_peek(&sk->sk_error_queue)) != NULL) {
+		sk->sk_err = SKB_EXT_ERR(skb2)->ee.ee_errno;
+		spin_unlock_bh(&sk->sk_error_queue.lock);
+		sk->sk_error_report(sk);
+	} else
+		spin_unlock_bh(&sk->sk_error_queue.lock);
+
+out_free_skb:
+	kfree_skb(skb);
+out:
+	return err;
+}
+
 /*
  *	Pull a packet from our receive queue and hand it to the user.
  *	If necessary we block.
@@ -1501,7 +1554,7 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
 	int vnet_hdr_len = 0;
 
 	err = -EINVAL;
-	if (flags & ~(MSG_PEEK|MSG_DONTWAIT|MSG_TRUNC|MSG_CMSG_COMPAT))
+	if (flags & ~(MSG_PEEK|MSG_DONTWAIT|MSG_TRUNC|MSG_CMSG_COMPAT|MSG_ERRQUEUE))
 		goto out;
 
 #if 0
@@ -1510,6 +1563,11 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
 		return -ENODEV;
 #endif
 
+	if (flags & MSG_ERRQUEUE) {
+		err = packet_recv_error(sk, msg, len);
+		goto out;
+	}
+
 	/*
 	 *	Call the generic datagram receiver. This handles all sorts
 	 *	of horrible races and re-entrancy so we can forget about it
-- 
1.6.0.4



* Re: [PATCH v2] rfs: Receive Flow Steering
From: Tom Herbert @ 2010-04-06 14:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev
In-Reply-To: <1270559096.2081.35.camel@edumazet-laptop>

>
> Running on a preprod machine here, seems fine.
>
> Some questions :
>
> 1) The need to add "rps_flow_entries=xxx" at boot time is problematic.
>   Maybe we can allow it being dynamic (and use vmalloc() instead of
> alloc_large_system_hash())
>
Okay, could be a sysctl with vmalloc.

> 2) inet_rps_save_rxhash(sk, skb->rxhash);
>
>        It should have a check to make sure some part of the stack doesn't feed
> many different rxhash for a given socket (Make sure we don't pollute flow
> table with pseudo random values)
>
If packets for a connection are always received on the same device, is
it reasonable to assume the rxhash is constant for that connection?

I suppose it's possible that packets for the same socket are being
constantly received on two different devices that are giving different
rxhashes.  This would already be bad in that OOO is probably happening
anyway.  I don't know if thrashing the sock_flow_table is going to
aggravate this scenario much.

Are there any other degenerative cases you're worried about?

> 3) UDP connected sockets don't benefit from RFS currently
>   (Not sure many apps use connected UDP sockets, I do have some of them
> in house)
>
Makes sense to support that.

> I am trying following code for IPV4 only :
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 7af756d..5c2d37a 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1216,6 +1216,7 @@ int udp_disconnect(struct sock *sk, int flags)
>        sk->sk_state = TCP_CLOSE;
>        inet->inet_daddr = 0;
>        inet->inet_dport = 0;
> +       inet_rps_save_rxhash(sk, 0);
>        sk->sk_bound_dev_if = 0;
>        if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
>                inet_reset_saddr(sk);
> @@ -1257,8 +1258,12 @@ EXPORT_SYMBOL(udp_lib_unhash);
>
>  static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  {
> -       int rc = sock_queue_rcv_skb(sk, skb);
> +       int rc;
> +
> +       if (inet_sk(sk)->inet_daddr)
> +               inet_rps_save_rxhash(sk, skb->rxhash);
>
> +       rc = sock_queue_rcv_skb(sk, skb);
>        if (rc < 0) {
>                int is_udplite = IS_UDPLITE(sk);
>
>
>
>


* Re: Increased Latencies when upgrading kernel version
From: Xianghua Xiao @ 2010-04-06 14:10 UTC (permalink / raw)
  To: Taylor Lewick; +Cc: Eric Dumazet, netdev, linux-kernel
In-Reply-To: <m2jd585dc4f1004051034ka36301d6xdce95defe6388836@mail.gmail.com>

On Mon, Apr 5, 2010 at 12:34 PM, Taylor Lewick <taylor.lewick@gmail.com> wrote:
> Okay, don't know what to officially file this under, as a regression
> with regards to performance or what, but here is the data.  Again,
> I've noticed system and network latency appear to have worsened with
> later kernel versions.
>
> I was turned onto this problem via the following links:
> http://www.kernel.org/pub/linux/kernel/people/christoph/ols2009/ols-2009-paper.pdf
> and http://kerneltrap.org/mailarchive/linux-netdev/2009/4/16/5491284
>
> So I set up a test on two servers with identical hardware, NICs, etc.,
> and used hackbench, udpping, and an internally written app to compare
> latency.
>
> Here are just the hackbench results, averaged across 5 runs, for two
> different hackbench tests.  The 2.6.16 and 2.6.27 kernels as set up
> were configured with voluntary preemption and 250 HZ, so I just
> repeated that initially for the 2.6.33.1 test.  I also tested no
> preemption at the same HZ setting of 250.
>
> I ran 2.6.16.60 on one server, and the other kernel versions on
> another server.  These tests are repeatable across different servers;
> I verified that I don't have a bad server.
>
> Kernel Version        HB1 (25 process 300)    HB2 (100 process 300)
> 2.6.16.60             .5402                   1.8946
> 2.6.27.19             .619                    2.6268
> 2.6.32.3-voluntary    .5636                   2.3484
> 2.6.33.1-voluntary    .5404                   2.2872
> 2.6.33.1-nopreempt    .5606                   2.3466
>
> So 2.6.16.60 is fast, 2.6.27.19 is slow, and 2.6.33.1 with voluntary
> preemption is the next best, but the results didn't hold up well as
> the hackbench tests used larger numbers of groups.  For example,
> 2.6.16.60 and 2.6.33.1-voluntary were basically the same for HB1, but
> that didn't hold when the hackbench tests used more groups.
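
To put numbers on the table above, the slowdown relative to 2.6.16.60 can
be computed directly. This is a small sketch; slowdown_pct() is only an
illustrative helper, and the inputs are the averages quoted in the table.

```c
#include <assert.h>

/* Percentage by which other is slower than baseline (same units). */
static double slowdown_pct(double baseline, double other)
{
	return (other / baseline - 1.0) * 100.0;
}
```

For the HB2 column, slowdown_pct(1.8946, 2.6268) comes out at roughly 39%
for 2.6.27.19 and slowdown_pct(1.8946, 2.2872) at roughly 21% for
2.6.33.1-voluntary, while the HB1 difference for 2.6.33.1-voluntary,
slowdown_pct(0.5402, 0.5404), is well under 1%.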
>
> At this point, I'm looking for kernel build options to tweak (SLAB vs
> SLUB, sparse vs dense IRQ numbering, etc.), but I'm not a developer.
> Running a -rt kernel isn't an option at this time.  I did test that as
> well, and latencies were quite a bit worse, but I wasn't adjusting
> code to take advantage of a real-time OS.
>
> I can make some changes or repeat tests.
>
> Below are some hardware comparisons between the two machines.
> The differences I noticed were more interrupts and CPU flags on the
> later kernel version.
>
> HostA 2.6.16.60
> cat /proc/interrupts
>         CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
>    CPU6       CPU7
>  0:  108509762          0          0          0          0          0
>         0          0    IO-APIC-edge  timer
>  8:          1          0          0          0          0          0
>         0          0    IO-APIC-edge  rtc
>  9:          0          0          0          0          0          0
>         0          0   IO-APIC-level  acpi
>  58:        305          0    5157735        220    2980100       5927
>      1187          0   IO-APIC-level  libata
> 162:          0          0          0          0          0          0
>         0          0   IO-APIC-level  uhci_hcd:usb1
> 170:          0          0          0          0          0          0
>         0          0   IO-APIC-level  uhci_hcd:usb2
> 177:       6326          0     229018          0     283720      35597
>       367          0   IO-APIC-level  megasas
> 178:        122          0       1784       1103       3531         20
>      1457          0   IO-APIC-level  uhci_hcd:usb3, ehci_hcd:usb6
> 186:          0          0          0          0          0          0
>         0          0   IO-APIC-level  uhci_hcd:usb4
> 194:         22          0          0          0          0          0
>         0          0   IO-APIC-level  ehci_hcd:usb5
> 210:    1790109        577          0          0          0          0
>         0          0       PCI-MSI-X  eth4-0
> 218:     233811         93          0          0          0          0
>         0          0       PCI-MSI-X  eth4-1
> NMI:          0          0          0          0          0          0
>         0          0
> LOC:  108509683  108509662  108509637  108509614  108509588  108509566
>  108509541  108509516
> ERR:          7
> MIS:          0
>
> lspci
> 00:00.0 Host bridge: Intel Corporation QuickPath Architecture I/O Hub
> to ESI Port (rev 13)
> 00:01.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub
> PCI Express Root Port 1 (rev 13)
> 00:03.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub
> PCI Express Root Port 3 (rev 13)
> 00:07.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub
> PCI Express Root Port 7 (rev 13)
> 00:09.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub
> PCI Express Root Port 9 (rev 13)
> 00:14.0 PIC: Intel Corporation QuickPath Architecture I/O Hub System
> Management Registers (rev 13)
> 00:14.1 PIC: Intel Corporation QuickPath Architecture I/O Hub GPIO and
> Scratch Pad Registers (rev 13)
> 00:14.2 PIC: Intel Corporation QuickPath Architecture I/O Hub Control
> Status and RAS Registers (rev 13)
> 00:16.0 System peripheral: Intel Corporation DMA Engine (rev 13)
> 00:16.1 System peripheral: Intel Corporation DMA Engine (rev 13)
> 00:16.2 System peripheral: Intel Corporation DMA Engine (rev 13)
> 00:16.3 System peripheral: Intel Corporation DMA Engine (rev 13)
> 00:16.4 System peripheral: Intel Corporation DMA Engine (rev 13)
> 00:16.5 System peripheral: Intel Corporation DMA Engine (rev 13)
> 00:16.6 System peripheral: Intel Corporation DMA Engine (rev 13)
> 00:16.7 System peripheral: Intel Corporation DMA Engine (rev 13)
> 00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
> UHCI Controller #4 (rev 02)
> 00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
> UHCI Controller #5 (rev 02)
> 00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2
> EHCI Controller #2 (rev 02)
> 00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express
> Port 1 (rev 02)
> 00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
> UHCI Controller #1 (rev 02)
> 00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
> UHCI Controller #2 (rev 02)
> 00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2
> EHCI Controller #1 (rev 02)
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
> 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface
> Controller (rev 02)
> 00:1f.2 IDE interface: Intel Corporation 82801IB (ICH9) 2 port SATA
> IDE Controller (rev 02)
> 03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS
> 1078 (rev 04)
> 04:00.0 PCI bridge: Integrated Device Technology, Inc. Unknown device
> 8018 (rev 0e)
> 05:02.0 PCI bridge: Integrated Device Technology, Inc. Unknown device
> 8018 (rev 0e)
> 05:04.0 PCI bridge: Integrated Device Technology, Inc. Unknown device
> 8018 (rev 0e)
> 06:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network
> Connection (rev 02)
> 06:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network
> Connection (rev 02)
> 07:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network
> Connection (rev 02)
> 07:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network
> Connection (rev 02)
> 08:00.0 Ethernet controller: Solarflare Communications Unknown device
> 0710 (rev 02)
> 09:03.0 VGA compatible controller: Matrox Graphics, Inc. Unknown
> device 0532 (rev 0a)
>
> cat /proc/cpuinfo (just showing first CPU for brevity)
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 26
> model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
> stepping        : 5
> cpu MHz         : 2926.090
> cache size      : 8192 KB
> physical id     : 1
> siblings        : 4
> core id         : 0
> cpu cores       : 4
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 11
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
> nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr dca
> popcnt lahf_lm
> bogomips        : 5857.34
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 40 bits physical, 48 bits virtual
> power management:
>
> ethtool -c eth4
> Coalesce parameters for eth4:
> Adaptive RX: on  TX: off
> stats-block-usecs: 0
> sample-interval: 0
> pkt-rate-low: 0
> pkt-rate-high: 0
>
> rx-usecs: 0
> rx-frames: 0
> rx-usecs-irq: 60
> rx-frames-irq: 0
>
> tx-usecs: 0
> tx-frames: 0
> tx-usecs-irq: 0
> tx-frames-irq: 0
>
> rx-usecs-low: 0
> rx-frame-low: 0
> tx-usecs-low: 0
> tx-frame-low: 0
>
> rx-usecs-high: 0
> rx-frame-high: 0
> tx-usecs-high: 0
> tx-frame-high: 0
>
>
> HostB 2.6.33.1
>    CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
> CPU6       CPU7
>   0:       8637          0          0          0          0
> 0          0          0   IO-APIC-edge      timer
>   1:          2          0          0          0          0
> 0          0          0   IO-APIC-edge      i8042
>   3:          2          0          0          0          0
> 0          0          0   IO-APIC-edge
>   4:          2          0          0          0          0
> 0          0          0   IO-APIC-edge
>   8:          1          0          0          0          0
> 0          0          0   IO-APIC-edge      rtc0
>   9:          0          0          0          0          0
> 0          0          0   IO-APIC-fasteoi   acpi
>  12:          4          0          0          0          0
> 0          0          0   IO-APIC-edge      i8042
>  16:       7434        683          0          0          0
> 0          0          0   IO-APIC-fasteoi   megasas
>  17:          0          0          0          0          0
> 0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
>  18:          0          0          0          0          0
> 0          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>  19:         23          0          0          0          0
> 0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
>  20:          0          0          0          0          0
> 0          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
>  21:        129          0         15          0          0
> 0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2,
> uhci_hcd:usb5
>  23:        369          0          0          0          0
> 0          0          0   IO-APIC-fasteoi   ata_piix
>  67:       2346        731          0          0          0
> 0          0          0   PCI-MSI-edge      eth4-0
>  68:       1809        404          0          0          0
> 0          0          0   PCI-MSI-edge      eth4-1
>  NMI:          0          0          0          0          0
> 0          0          0   Non-maskable interrupts
>  LOC:      33071      38348      47397      23246      15715
> 11065       9004      10391   Local timer interrupts
>  SPU:          0          0          0          0          0
> 0          0          0   Spurious interrupts
>  PMI:          0          0          0          0          0
> 0          0          0   Performance monitoring interrupts
>  PND:          0          0          0          0          0
> 0          0          0   Performance pending work
>  RES:       2490       2124       4187       4974       1724
> 5548       1892       2871   Rescheduling interrupts
>  CAL:        497       2166        141        115        133
> 144        140        144   Function call interrupts
>  TLB:        243        244        928        945        289
> 187        134         93   TLB shootdowns
>  TRM:          0          0          0          0          0
> 0          0          0   Thermal event interrupts
>  THR:          0          0          0          0          0
> 0          0          0   Threshold APIC interrupts
>  MCE:          0          0          0          0          0
> 0          0          0   Machine check exceptions
>  MCP:          2          2          2          2          2
> 2          2          2   Machine check polls
>  ERR:          7
>  MIS:          0
>
> lspci
> 00:00.0 Host bridge: Intel Corporation X58 I/O Hub to ESI Port (rev 13)
> 00:01.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root
> Port 1 (rev 13)
> 00:03.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root
> Port 3 (rev 13)
> 00:07.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root
> Port 7 (rev 13)
> 00:09.0 PCI bridge: Intel Corporation X58 I/O Hub PCI Express Root
> Port 9 (rev 13)
> 00:14.0 PIC: Intel Corporation X58 I/O Hub System Management Registers (rev 13)
> 00:14.1 PIC: Intel Corporation X58 I/O Hub GPIO and Scratch Pad
> Registers (rev 13)
> 00:14.2 PIC: Intel Corporation X58 I/O Hub Control Status and RAS
> Registers (rev 13)
> 00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
> UHCI Controller #4 (rev 02)
> 00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
> UHCI Controller #5 (rev 02)
> 00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2
> EHCI Controller #2 (rev 02)
> 00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express
> Port 1 (rev 02)
> 00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
> UHCI Controller #1 (rev 02)
> 00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB
> UHCI Controller #2 (rev 02)
> 00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2
> EHCI Controller #1 (rev 02)
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
> 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface
> Controller (rev 02)
> 00:1f.2 IDE interface: Intel Corporation 82801IB (ICH9) 2 port SATA
> IDE Controller (rev 02)
> 03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS
> 1078 (rev 04)
> 04:00.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI
> Express Switch (rev 0e)
> 05:02.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI
> Express Switch (rev 0e)
> 05:04.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI
> Express Switch (rev 0e)
> 06:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network
> Connection (rev 02)
> 06:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network
> Connection (rev 02)
> 07:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network
> Connection (rev 02)
> 07:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network
> Connection (rev 02)
> 08:00.0 Ethernet controller: Solarflare Communications SFC4000 rev B
> [Solarstorm] (rev 02)
> 09:03.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW
> WPCM450 (rev 0a)
>
> cat /proc/cpuinfo (just showing first CPU for brevity)
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 26
> model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
> stepping        : 5
> cpu MHz         : 2925.888
> cache size      : 8192 KB
> physical id     : 1
> siblings        : 4
> core id         : 0
> cpu cores       : 4
> apicid          : 16
> initial apicid  : 16
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 11
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts
> rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl
> vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida
> tpr_shadow vnmi flexpriority ept vpid
> bogomips        : 5851.77
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 40 bits physical, 48 bits virtual
> power management:
>
> ethtool -c eth4
> Coalesce parameters for eth4:
> Adaptive RX: on  TX: off
> stats-block-usecs: 0
> sample-interval: 0
> pkt-rate-low: 0
> pkt-rate-high: 0
>
> rx-usecs: 0
> rx-frames: 0
> rx-usecs-irq: 60
> rx-frames-irq: 0
>
> tx-usecs: 0
> tx-frames: 0
> tx-usecs-irq: 0
> tx-frames-irq: 0
>
> rx-usecs-low: 0
> rx-frame-low: 0
> tx-usecs-low: 0
> tx-frame-low: 0
>
> rx-usecs-high: 0
> rx-frame-high: 0
> tx-usecs-high: 0
> tx-frame-high: 0
>
>
>
> On Thu, Apr 1, 2010 at 8:53 PM, Taylor Lewick <taylor.lewick@gmail.com> wrote:
>> Okay.  I will get this info out to the list Monday.  Briefly, I'm
>> using identical hardware (server), identical NICs, same drivers,
>> connected to same switch, and using udpping, hackbench, and an
>> internally written app to test latency.  Without exception the
>> evolution has looked like the following.
>>
>> 2.6.16.60 latencies for system and network are fast.  Meaning
>> hackbench and udpping win, and win by quite a bit.
>>
>> 2.6.27.19 was awful.  2.6.32.1 and 2.6.33.1 were better for networking
>> (with some tweaks, i.e. disable netfilter, etc), and I was able to get
>> networking latencies to within 1-3 microseconds of 2.6.16.60
>> latencies, but the hackbench results are still pretty bad.
>>
>> Again, I'll post numbers and more detailed hardware info on Monday
>> when I'm back at office...
>>
>> On Thu, Apr 1, 2010 at 4:19 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> Le jeudi 01 avril 2010 à 14:12 -0500, Taylor Lewick a écrit :
>>>> For some time now we've been running an older kernel, 2.6.16.60.  When
>>>> we tried to upgrade, first going to 2.6.27.19 and then to 2.6.32.1 and
>>>> 2.6.33.1 we noticed that latencies increased.  At first we noticed it
>>>> by doing network tests via udpping, netperf, etc.  We made some
>>>> tweaks, and were able to get network latency to within 1 to 2
>>>> microseconds of where we were previously on 2.6.16.60.  Then we did
>>>> some more testing, and noticed that system latency also seems higher.
>>>>
>>>> We've done our tests on identical hardware servers, same NICs,
>>>> connected through same network gear.  Basically, we've tried to keep
>>>> everything identical except the kernel versions, and we are unable to
>>>> achieve the same performance for system latency on the newer kernels,
>>>> despite adjusting various kernel settings and recompiling.
>>>>
>>>> The latency differences are about 15 microseconds per transaction.
>>>>
>>>> At this point, I don't know what else to try.  I haven't played around
>>>> with the /proc/sys/kernel/sched_* parameters under the newer kernels
>>>> yet.  I have tried changing preemption modes with little effect; in
>>>> fact, voluntary preemption seems to be performing the best for us.
>>>>
>>>> At this time the realtime patch isn't really an option for us to
>>>> consider, at least not yet.
>>>>
>>>> Any suggestions?  Is this a known issue when upgrading to more recent
>>>> kernel versions?
>>>>
>>>
>>> Hi Taylor
>>>
>>> Well, it is a bit difficult to give a generic answer to your generic
>>> question. 15 us more latency per transaction seems pretty bad.
>>>
>>> Some inputs would be nice, describing your workload and
>>> software/hardware architecture.
>>>
>>> lspci
>>> cat /proc/cpuinfo
>>> cat /proc/interrupts
>>> dmesg
>>> ethtool -S eth0
>>> ethtool -c eth0
>>>
>>>
>>>
>>>
>>
>
Just want to ack you here: I upgraded a shipping product from a 2.6.18
kernel to 2.6.33.1, and the performance (hackbench, latency, CPU usage,
etc.) is a lot worse on the same hardware platform. We tried 2.6.27
before and it was also bad. I'm trying various CONFIG options and so
far nothing has really helped. I'm using the RT patch.
Xianghua

^ permalink raw reply

* Re: [PATCH v2] rfs: Receive Flow Steering
From: Changli Gao @ 2010-04-06 13:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, davem, netdev
In-Reply-To: <1270559096.2081.35.camel@edumazet-laptop>

On Tue, Apr 6, 2010 at 9:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> 1) The need to add "rps_flow_entries=xxx" at boot time is problematic.
>   Maybe we can allow it being dynamic (and use vmalloc() instead of
> alloc_large_system_hash())

Is flex_array better than vmalloc()?

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* RE: [PATCH] bnx2x: use the dma state API instead of the pci equivalents
From: Eilon Greenstein @ 2010-04-06 13:41 UTC (permalink / raw)
  To: FUJITA Tomonori, davem@davemloft.net
  Cc: netdev@vger.kernel.org, Vladislav Zolotarov
In-Reply-To: <8628FE4E7912BF47A96AE7DD7BAC0AADDDC525AF62@SJEXCHCCR02.corp.ad.broadcom.com>

On Tue, 2010-04-06 at 00:39 -0700, Vladislav Zolotarov wrote:
> Thanks, Fujita.
> 
> The patch looks fine. I'll run some regression tests on the patched driver to check that things still work and if it's ok we will ack it shortly.
> 
> vlad
> 
> 

> > =
> > From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> > Subject: [PATCH] bnx2x: use the DMA API instead of the pci equivalents
> >
> > The DMA API is preferred.
> >
> > Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Vlad's testing with this patch finished successfully.

Thanks Fujita!
Acked-by: Vladislav Zolotarov <vladz@broadcom.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>



^ permalink raw reply

* Re: [Bugme-new] [Bug 15682] New: XFRM is not updating RTAX_ADVMSS metric
From: jamal @ 2010-04-06 13:40 UTC (permalink / raw)
  To: Andrew Morton, Herbert Xu
  Cc: netdev, bugzilla-daemon, bugme-daemon, eduardo.panisset,
	David S. Miller
In-Reply-To: <20100405125055.cdc1e279.akpm@linux-foundation.org>


Herbert would give better answers. I don't think what Eduardo is
doing is correct. You can't just start factoring in TCP headers
at the xfrm level - and besides, the MTU calculation
already takes care of tunnel headers - so TCP should be able to
compute the correct MSS.

cheers,
jamal

On Mon, 2010-04-05 at 12:50 -0700, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Fri, 2 Apr 2010 17:34:35 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=15682
> > 
> >            Summary: XFRM is not updating RTAX_ADVMSS metric
> >            Product: Networking
> >            Version: 2.5
> >     Kernel Version: 2.6.28-2
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >         AssignedTo: acme@ghostprotocols.net
> >         ReportedBy: eduardo.panisset@gmail.com
> >         Regression: No
> > 
> > 
> > > I have been testing DSMIPv6 code, which uses all kinds of advanced
> > > features of the XFRM framework, and I believe I have found a bug in
> > > how the RTAX_ADVMSS route metric is updated.
> > > The XFRM code in net/xfrm/xfrm_policy.c (functions xfrm_init_pmtu
> > > and xfrm_bundle_ok) updates the cached RTAX_MTU route metric.
> > > However, I believe it must also update RTAX_ADVMSS, as the latter is
> > > used by the TCP connect function for advertising the MSS value in SYN
> > > messages.
> > >
> > > As the MSS is not being updated by XFRM, TCP SYN messages (e.g.
> > > originated from an internet browser) advertise an incorrect MSS
> > > (one that does not take into account the overhead added to the IP
> > > packet size by XFRM transformations).  One result is that the browser
> > > "freezes" after starting a TCP connection, because TCP messages sent
> > > by the TCP server never reach it (the TCP server is sending segments
> > > that are too large to the browser).
> > 
> > Below I describe the changes I have done (on xfrm_init_pmtu and
> > xfrm_bundle_ok) and that seem to fix this problem:
> > 
> > xfrm_init_pmtu:
> >                  .
> >                  .
> >                  .
> > 
> >         dst->metrics[RTAX_MTU-1] = pmtu; // original code, below my changes
> > 
> >         if (dst->xfrm->props.mode == XFRM_MODE_TUNNEL)
> >                  switch (dst->xfrm->props.family)
> >                  {
> >                  case AF_INET:
> >                  dst->metrics[RTAX_ADVMSS-1] = max_t(unsigned int,
> > pmtu - sizeof(struct iphdr) - sizeof(struct tcphdr), 256);
> >                  break;
> > 
> >                  case AF_INET6:
> >                  dst->metrics[RTAX_ADVMSS-1] = max_t(unsigned int,
> > pmtu - sizeof(struct ipv6hdr) - sizeof(struct tcphdr),
> >                             dev_net(dst->dev)->ipv6.
> > sysctl.ip6_rt_min_advmss);
> >                  break;
> >                  }
> > 
> > xfrm_bundle_ok:
> > 
> >                .
> >                .
> >                .
> > 
> >         dst->metrics[RTAX_MTU-1] = mtu; // original code, below my changes
> > 
> >         if (dst->xfrm->props.mode == XFRM_MODE_TUNNEL)
> >                 switch (dst->xfrm->props.family)
> >                 {
> >                 case AF_INET:
> >                         dst->metrics[RTAX_ADVMSS-1] = max_t(unsigned
> > int, mtu - sizeof(struct iphdr) - sizeof(struct tcphdr), 256);
> >                 break;
> > 
> >                 case AF_INET6:
> >                         dst->metrics[RTAX_ADVMSS-1] = max_t(unsigned
> > int, mtu - sizeof(struct ipv6hdr) - sizeof(struct tcphdr),
> > 
> > dev_net(dst->dev)->ipv6.sysctl.ip6_rt_min_advmss);
> >                 break;
> >                 }
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
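
For readers following the arithmetic in the bug report quoted above, here
is a minimal userspace sketch of the MTU-to-MSS calculation it describes.
It is not kernel code: advmss_from_mtu() and the header-length constants
are hypothetical names introduced for illustration, and they assume IPv4,
IPv6 and TCP headers without options. Unlike the max_t() expression in the
report, the sketch also guards against an MTU smaller than the headers, so
the unsigned subtraction cannot wrap.

```c
#include <stdio.h>

/* Assumed fixed header sizes without options (illustrative constants). */
enum { IPV4_HDR_LEN = 20, IPV6_HDR_LEN = 40, TCP_HDR_LEN = 20 };

/*
 * Hypothetical helper: largest TCP payload that fits into `mtu` once the
 * IP and TCP headers are subtracted, never going below `min_mss`.
 */
static unsigned int advmss_from_mtu(unsigned int mtu,
                                    unsigned int ip_hdr_len,
                                    unsigned int min_mss)
{
    unsigned int overhead = ip_hdr_len + TCP_HDR_LEN;
    unsigned int mss;

    /* Guard against unsigned wrap-around when the MTU is tiny. */
    if (mtu <= overhead)
        return min_mss;

    mss = mtu - overhead;
    return mss > min_mss ? mss : min_mss;
}

/* Print the result for one MTU value, e.g. a path MTU reduced by a tunnel. */
static void show_advmss(unsigned int mtu)
{
    printf("mtu %u -> IPv4 advmss %u\n",
           mtu, advmss_from_mtu(mtu, IPV4_HDR_LEN, 256));
}
```

Calling show_advmss() with a plain Ethernet MTU and then with a smaller
tunnel path MTU shows how the advertised value shrinks with the MTU. The
floor of 256 for IPv4 is taken from the quoted snippet; the IPv6 floor is
a sysctl in the report, so the sketch leaves it as a parameter.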


^ permalink raw reply

* Re: [PATCH 1/4] flow: virtualize flow cache entry methods
From: Timo Teräs @ 2010-04-06 13:26 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev
In-Reply-To: <20100406123404.GA24294@gondor.apana.org.au>

Herbert Xu wrote:
> On Mon, Apr 05, 2010 at 08:01:24PM +0300, Timo Teras wrote:
>> This allows validating the cached object before returning it.
>> It also allows destructing the object properly if the last reference
>> was held in the flow cache. This is also preparation for caching
>> bundles in the flow cache.
>>
>> In return for virtualizing the methods, we save on:
>> - not having to regenerate the whole flow cache on policy removal:
>>   each flow matching a killed policy gets refreshed as the getter
>>   function notices it smartly.
>> - we do not have to call flow_cache_flush from policy gc, since the
>>   flow cache now properly deletes the object if it had any references
>>
>> Signed-off-by: Timo Teras <timo.teras@iki.fi>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Thanks a lot for the patch!

As noticed in the review of 2/4, this needs to be fixed by calling the
flow object's delete() unconditionally if the genid is outdated. I'll
repost with this fixed for next iteration.
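
A rough sketch of the behaviour described above, assuming a much simplified
cache entry: if the entry's generation id is stale, the cached object's
delete() method is always called before the caller re-resolves. All names
here (flow_obj, flow_entry, flow_entry_get, demo_delete) are hypothetical
and are not the structures from the patch series.

```c
#include <stddef.h>

struct flow_obj;

/* Virtual methods of a cached object; only delete() is modelled here. */
struct flow_obj_ops {
    void (*delete)(struct flow_obj *obj);
};

struct flow_obj {
    const struct flow_obj_ops *ops;
    int deleted;            /* demo-only flag set by demo_delete() */
};

/* Simplified cache entry: one object plus the genid it was cached under. */
struct flow_entry {
    unsigned int genid;
    struct flow_obj *obj;
};

/* Demo delete() implementation: just records that it was called. */
static void demo_delete(struct flow_obj *obj)
{
    obj->deleted = 1;
}

static const struct flow_obj_ops demo_ops = { demo_delete };

/*
 * Return the cached object if the entry is current. If the genid is
 * outdated, call delete() unconditionally, clear the entry, and return
 * NULL so that the caller runs its resolver again.
 */
static struct flow_obj *flow_entry_get(struct flow_entry *fle,
                                       unsigned int current_genid)
{
    if (fle->genid != current_genid) {
        if (fle->obj != NULL) {
            fle->obj->ops->delete(fle->obj);
            fle->obj = NULL;
        }
        return NULL;
    }
    return fle->obj;
}
```

The point of the sketch is only the ordering: the stale object is released
through its own delete() before any new lookup result is stored.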


^ permalink raw reply

* Re: [PATCH 2/4] xfrm: cache bundles instead of policies for outgoing flows
From: Herbert Xu @ 2010-04-06 13:11 UTC (permalink / raw)
  To: Timo Teräs; +Cc: netdev
In-Reply-To: <4BBB2F31.7090806@iki.fi>

On Tue, Apr 06, 2010 at 03:55:13PM +0300, Timo Teräs wrote:
>
> Which also makes me think of another issue. The resolver does
> not get notified if the genid was outdated. So it might end up
> using the old policies from the bundle after xfrm_policy_insert().
> I think we should explicitly call ops->delete() in flow_cache_lookup
> if the flow genid was outdated. (I remember actually doing this,
> but also removing it when I was hunting the one hlist-related
> corruption bug.)

Right, that makes sense.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply
