Netdev List
 help / color / mirror / Atom feed
* Re: skb_shinfo(skb)->nr_frags > 0 while skb_is_gso(skb) == 0?
From: Michael Chan @ 2009-10-07  1:37 UTC (permalink / raw)
  To: John Wright; +Cc: netdev@vger.kernel.org, Bob Montgomery
In-Reply-To: <20091007010315.GA26498@neptune.jswright>


On Tue, 2009-10-06 at 18:03 -0700, John Wright wrote:
> So, first, a question for someone who knows more about sk_buff's than
> I:
> is it reasonable/legal for an skb for which skb_is_gso(skb) == 0 to
> also
> have skb_shinfo(skb)->nr_frags > 0?
> 

As Stephen pointed out, yes.

> If yes, then for Michael, or someone familiar with bnx2 hardware: are
> "partial BD completions" (where the hw_tx_cons value is on a ring
> index
> that, on the tx_buf_ring, would have a NULL skb value, and on the
> tx_desc_ring, the tx_bd would not have the TX_BD_FLAGS_START flag set)
> possible only for gso skb's, or is it possible any time nr_frags > 0?
> 

Partial BD completions are only possible on TSO/GSO packets, unless
there is a hardware bug that we haven't found during all these years.

Several years ago, the same crash in bnx2 was found to be caused by HTB
corrupting nr_frags while the skb was queued by the driver.  That issue
has been fixed.  I wonder if skb->gso_size can change on us while the
skb is queued or we still have another case of changing nr_frags.

I think it will be good if you can run the same test on 2.6.31 where
is_gso and nr_frags are cached.  If the chip really does partial BD
completions on TX, you should still see the same issue whether we cache
these values or not.  If it doesn't crash on 2.6.31, then it may be
something else.

Thanks for the detailed debugging information.



^ permalink raw reply

* Re: [PATCH net-next] myri10ge: add adaptive coalescing
From: Rick Jones @ 2009-10-07  1:28 UTC (permalink / raw)
  To: David Miller; +Cc: brice, netdev, gallatin
In-Reply-To: <20091006.172537.148962576.davem@davemloft.net>

David Miller wrote:
> From: Brice Goglin <brice@myri.com>
> Date: Tue, 06 Oct 2009 18:52:43 +0200
> 
> 
>>This patch adds support for adaptive interrupt coalescing to the
>>myri10ge driver. It is based on the host periodically look at
>>statistics and update the NIC coalescing accordingly.
>>
>>The NIC only provides packet throughput and we feel that it is a
>>better heuristics than the packet rate heuristics currently used
>>in ethtool. Also, assuming that the packet packet rate heuristics
>>uses what is actually sent on the wire when using TSO, it would be
>>much more expensive to implement correctly, as the driver would
>>need to calculate how many packets were sent.
>>
>>Signed-off-by: Andrew Gallatin <gallatin@myri.com>
>>Signed-off-by: Brice Goglin <brice@myri.com>
> 
> 
> Drivers tried to do this as far back as 6 years ago (tg3) and we don't
> recommend doing this with NAPI drivers.

Doesn't e1000(e) still try to do adaptive coalescing?

rick jones

^ permalink raw reply

* Re: skb_shinfo(skb)->nr_frags > 0 while skb_is_gso(skb) == 0?
From: Stephen Hemminger @ 2009-10-07  1:21 UTC (permalink / raw)
  To: John Wright; +Cc: netdev, Michael Chan, Bob Montgomery
In-Reply-To: <20091007010315.GA26498@neptune.jswright>

On Tue, 6 Oct 2009 19:03:15 -0600
John Wright <john.wright@hp.com> wrote:

> Hello,
> 
> Bob Montgomery and I are debugging an OOPS in the bnx2 driver.  The
> driver OOPSes in bnx2_tx_int(), getting a NULL pointer dereference when
> checking if the skb is GSO.  (This is on 2.6.29, before is_gso was
> cached in the tx_buf (commit d62fda08), but bear with me - while kernels
> with that commit might not crash in the same place, I think we have
> discovered a bug that would manifest itself another way.)
> 
> So, first, a question for someone who knows more about sk_buff's than I:
> is it reasonable/legal for an skb for which skb_is_gso(skb) == 0 to also
> have skb_shinfo(skb)->nr_frags > 0?

Yes, if driver support Scatter/Gather and Checksum offload,
TCP (especially splice) will hand fragmented frames to device.

Don't know what assumptions driver is making that could cause your
issue.

^ permalink raw reply

* skb_shinfo(skb)->nr_frags > 0 while skb_is_gso(skb) == 0?
From: John Wright @ 2009-10-07  1:03 UTC (permalink / raw)
  To: netdev, Michael Chan; +Cc: Bob Montgomery

[-- Attachment #1: Type: text/plain, Size: 7447 bytes --]

Hello,

Bob Montgomery and I are debugging an OOPS in the bnx2 driver.  The
driver OOPSes in bnx2_tx_int(), getting a NULL pointer dereference when
checking if the skb is GSO.  (This is on 2.6.29, before is_gso was
cached in the tx_buf (commit d62fda08), but bear with me - while kernels
with that commit might not crash in the same place, I think we have
discovered a bug that would manifest itself another way.)

So, first, a question for someone who knows more about sk_buff's than I:
is it reasonable/legal for an skb for which skb_is_gso(skb) == 0 to also
have skb_shinfo(skb)->nr_frags > 0?

If yes, then for Michael, or someone familiar with bnx2 hardware: are
"partial BD completions" (where the hw_tx_cons value is on a ring index
that, on the tx_buf_ring, would have a NULL skb value, and on the
tx_desc_ring, the tx_bd would not have the TX_BD_FLAGS_START flag set)
possible only for gso skb's, or is it possible any time nr_frags > 0?

We have a crash where both of the above conditions seem to be met.  Our
crash actually occurs in skb_is_gso(skb) - when dereferencing skb, which
is NULL - called from this chunk of code in bnx2_tx_int():

...
                tx_buf = &txr->tx_buf_ring[sw_ring_cons];
                skb = tx_buf->skb;

                /* partial BD completions possible with TSO packets */
                if (skb_is_gso(skb)) {
                        u16 last_idx, last_ring_idx;

                        last_idx = sw_cons +
                                skb_shinfo(skb)->nr_frags + 1;
...

An analysis of the disassembly shows that, at the time of the OOPS, RCX
contains the bnapi passed to bnx2_tx_int, RBX contains in its lower two
bytes the value of sw_cons, and R12 contains the value of hw_cons, the
last read value of the hardware consumer pointer.  (See the disassembly
of bnx2_poll_work, which all this was inlined into, at [1].)  Now, from
the OOPS:

[120223.249838] RAX: 000000000000005b RBX: 000000005746575b RCX: ffff8801a43cd680
[120223.249838] R10: 0000000000000006 R11: ffff8800562f6bc0 R12: 000000000000575a

Already, we see hw_cons is 0x575a (corresponding to a ring index of
0x5a), and our sw_cons (which is never supposed to go beyond hw_cons) is
0x575b.  How did this happen?

Unfortunately, it looks like another CPU ran bnx2_start_xmit and put
stuff on this ring before the OOPS handler finished running and
crash_kexec() was called.  So in our actual dump, it looks like the skb
we tried to read was not really NULL:

crash> struct bnx2_napi.tx_ring ffff8801a43cd680
  tx_ring = {
    tx_prod_bseq = 0xf8c4f09b, 
    tx_prod = 0x5783, 
    tx_bidx_addr = 0x12588, 
    tx_bseq_addr = 0x12590, 
    tx_desc_ring = 0xffff8801a3cb4000, 
    tx_buf_ring = 0xffff880325e52000, 
    tx_cons = 0x5746, 
    hw_tx_cons = 0x5746, 
    tx_desc_mapping = 0x1a3cb4000
  }
crash> p *((struct sw_tx_bd *)(0xffff880325e52000) + 0x5b)
$20 = {
  skb = 0xffff8801574c2780
}

But we'll assume for now that the kernel didn't lie, and that it really
was NULL at the time of the OOPS. :)  What's interesting, though, is
the place where the card thinks it is:

crash> p *((struct sw_tx_bd *)(0xffff880325e52000) + 0x5a)
$21 = {
  skb = 0x0
}

Ok, we might expect this if we've just cleared that skb, but we couldn't
have, since that would mean our sw_cons value would equal hw_cons, and
we wouldn't have performed that iteration in the for loop.  (We only try
to clean up a particular ring index after the card has *passed* it.)
Inside of the tx_desc_ring for this bnapi is something more interesting:

crash> p *((struct tx_bd *)(0xffff8801a3cb4000) + 0x5a)
$22 = {
  tx_bd_haddr_hi = 0x2, 
  tx_bd_haddr_lo = 0x5a8acd31, 
  tx_bd_mss_nbytes = 0x36, 
  tx_bd_vlan_tag_flags = 0x42
}

But 0x42 == 0100 0010 == TX_BD_FLAGS_TCP_UDP_CKSUM | TX_BD_FLAGS_END.
Note, TX_BD_FLAGS_START is not one of its flags, and yet it's also not
GSO - if it were, we'd expect TX_BD_FLAGS_SW_LSO (1 << 15) to be set.
See bnx2_start_xmit:

...
        if ((mss = skb_shinfo(skb)->gso_size)) {
                u32 tcp_opt_len;
                struct iphdr *iph;

                vlan_tag_flags |= TX_BD_FLAGS_SW_LSO;
...

(I have verified that this flag, among others, is indeed set on a tx_bd
whose skb we still have and has non-zero gso_size.)

If we go back one, we find a tx_bd with the TX_BD_FLAGS_START set, but
not TX_BD_FLAGS_END:

crash> p *((struct tx_bd *)(0xffff8801a3cb4000) + 0x59)
$24 = {
  tx_bd_haddr_hi = 0x1, 
  tx_bd_haddr_lo = 0xf915d3e, 
  tx_bd_mss_nbytes = 0x42, 
  tx_bd_vlan_tag_flags = 0x82
}

So, presumably, the corresponding skb would have nr_frags = 1.


That said, here's what we *think* is happening:

At the end of one of the iterations through the
while(sw_cons != hw_cons) loop, we have sw_cons == 0x5759, and

  hw_cons = bnx2_get_hw_tx_cons(bnapi);  /* hw_cons = 0x575a */

Then, next time around the loop, sw_cons != hw_cons, so we go back
inside the loop.  But since the skb is not gso, we don't detect that the
card is in the middle of the fragmented skb, and then we go ahead and
increment sw_cons (nr_frags + 1) times, which puts it at 0x575b.  Then,
we clean up an skb and a DMA mapping that the card might still be using,
and go around the loop again.  And we'll keep doing so, until the card
happens to catch up with us or we crash.  I'm not sure what happens to
the card when the skb's and DMA mappings get ripped out from under it,
but I suppose it's not good - in this case, we happened not even to have
any more skb's in the tx_buf_ring to go through, so we crashed before we
could find out. :)

I think this patch (against v2.6.29 - after d62fda08, is_gso and
nr_frags are actually stored in the tx_buf - see attached patch for one
that would apply against linux-2.6.git HEAD) would solve this issue:


diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index a7e688a..db075c8 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2615,8 +2615,8 @@ bnx2_tx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int 
                tx_buf = &txr->tx_buf_ring[sw_ring_cons];
                skb = tx_buf->skb;
 
-               /* partial BD completions possible with TSO packets */
-               if (skb_is_gso(skb)) {
+               /* partial BD completions possible with fragmented packets */
+               if (skb_shinfo(skb)->nr_frags) {
                        u16 last_idx, last_ring_idx;
 
                        last_idx = sw_cons +


Unfortunately, we haven't been able to reproduce this crash reliably.
(Under heavy network load, we've seen multiple crashes on the same
instruction; but this one seems to be the first that clearly exhibits
the "sw_cons overtakes hw_cons" behavior.  We're still analyzing some
older dumps to see if we can find evidence of the same behavior causing
those crashes.)

Can somebody who knows a bit more about sk_buff's and fragmenting
and/or the bnx2 chipsets comment on this?

I have attached the OOPS message and an equivalent patch that applies
against current linux-2.6.git, and posted a copy of the bnx2_poll_work
disassembly at [1].  I can also provide the debug vmlinux, modules, and
crash dump if necessary.

 [1]: http://free.linux.hp.com/~jswright/bnx2-crash-data/

Thanks!

-- 
+----------------------------------------------------------+
| John Wright <john.wright@hp.com>                         |
| HP Mission Critical OS Enablement & Solution Test (MOST) |
+----------------------------------------------------------+

[-- Attachment #2: oops-message.txt --]
[-- Type: text/plain, Size: 4984 bytes --]

[120222.643171] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
[120222.644007] IP: [<ffffffffa0089c65>] bnx2_poll_work+0xd5/0x1133 [bnx2]
[120222.644007] PGD 178498067 PUD 11f002067 PMD 0 
[120222.644007] Oops: 0000 [#1] SMP 
[120222.644007] last sysfs file: /sys/devices/system/cpu/cpu15/crash_notes
[120222.644007] CPU 0 
[120222.644007] Modules linked in: sctp crc32c libcrc32c ipmi_devintf nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish des_generic cbc aes_x86_64 aes_generic xcbc rmd160 sha256_generic sha1_generic crypto_null af_key loop psmouse ipmi_si ipmi_msghandler hpilo serio_raw button container evdev ext3 jbd mbcache usbhid hid sg sr_mod cdrom ide_pci_generic ide_core ata_generic ata_piix libata ehci_hcd uhci_hcd cciss bnx2 zlib_inflate e1000e scsi_mod thermal processor fan thermal_sys
[120223.249838] Pid: 15233, comm: mirrorclient Not tainted 2.6.29-clim-3-amd64 #1 ProLiant DL380 G6
[120223.249838] RIP: 0010:[<ffffffffa0089c65>]  [<ffffffffa0089c65>] bnx2_poll_work+0xd5/0x1133 [bnx2]
[120223.249838] RSP: 0000:ffffffff8072ccf0  EFLAGS: 00010286
[120223.249838] RAX: 000000000000005b RBX: 000000005746575b RCX: ffff8801a43cd680
[120223.249838] RDX: ffff8801a43cd680 RSI: 000000000000005b RDI: ffff88007cfb4a28
[120223.249838] RBP: ffffffff8072ce60 R08: 0000000000000000 R09: 000000000000ad77
[120223.249838] R10: 0000000000000006 R11: ffff8800562f6bc0 R12: 000000000000575a
[120223.249838] R13: ffff880325e522d8 R14: 0000000000000000 R15: ffff8801a43c6b00
[120223.249838] FS:  0000000042df8950(0063) GS:ffffffff80735000(0000) knlGS:0000000000000000
[120223.249838] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[120223.249838] CR2: 00000000000000cc CR3: 000000003215c000 CR4: 00000000000006e0
[120223.249838] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[120223.249838] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[120223.249838] Process mirrorclient (pid: 15233, threadinfo ffff880035842000, task ffff880185c4b320)
[120223.249838] Stack:
[120223.249838]  000000000000001e 000000004d8d1012 0000000000000040 ffff8801a43cd680
[120223.249838]  ffff8801a43cc700 ffffffff80227322 ffffffff8072cd90 ffffffffa006765f
[120223.249838]  ffffffff8072cd70 ffff8801a4284000 ffff880325eb8800 ffff8801a425c940
[120223.249838] Call Trace:
[120223.249838]  <IRQ> <0> [<ffffffff80227322>] ? swiotlb_map_single_phys+0x0/0x18
[120223.249838]  [<ffffffffa006765f>] ? e1000_alloc_rx_buffers+0x13b/0x203 [e1000e]
[120225.260273]  [<ffffffffa0067a17>] ? e1000_clean_rx_irq+0x290/0x2cb [e1000e]
[120225.260273]  [<ffffffff80248291>] ? irq_exit+0x4c/0x79
[120225.401539]  [<ffffffffa008acf6>] bnx2_poll_msix+0x33/0xad [bnx2]
[120225.401539]  [<ffffffff804162e8>] net_rx_action+0xae/0x1a1
[120225.401539]  [<ffffffff80248584>] __do_softirq+0x8a/0x132
[120225.401539]  [<ffffffff8021251c>] call_softirq+0x1c/0x30
[120225.401539]  [<ffffffff8021362c>] do_softirq+0x44/0x8f
[120225.401539]  [<ffffffff80248284>] irq_exit+0x3f/0x79
[120225.770044]  [<ffffffff802138a6>] do_IRQ+0xc3/0xe5
[120225.770044]  [<ffffffff80211c93>] ret_from_intr+0x0/0x29
[120225.894313]  <EOI> <0> [<ffffffff8035d4fd>] ? copy_user_generic_string+0x2d/0x40
[120225.967993]  [<ffffffff80412b57>] ? memcpy_toiovec+0x37/0x67
[120226.056077]  [<ffffffff8041321e>] ? skb_copy_datagram_iovec+0x4b/0x1d8
[120226.056077]  [<ffffffff8044f389>] ? tcp_rcv_established+0x344/0xabb
[120226.220504]  [<ffffffff80456fa8>] ? tcp_v4_do_rcv+0x1b1/0x35e
[120226.289418]  [<ffffffff8025680d>] ? prepare_to_wait+0x60/0x69
[120226.289418]  [<ffffffff8040d2a7>] ? sk_wait_data+0xc3/0xd1
[120226.289418]  [<ffffffff80445955>] ? tcp_prequeue_process+0x73/0x89
[120226.289418]  [<ffffffff80446c40>] ? tcp_recvmsg+0x4eb/0xb18
[120226.289418]  [<ffffffff8040c022>] ? sock_common_recvmsg+0x32/0x47
[120226.289418]  [<ffffffff8040a6c2>] ? sock_recvmsg+0x10e/0x133
[120226.680001]  [<ffffffff8040a838>] ? sock_sendmsg+0xfd/0x120
[120226.680001]  [<ffffffff802565f0>] ? autoremove_wake_function+0x0/0x38
[120226.680001]  [<ffffffff802d0b4e>] ? core_sys_select+0x1df/0x254
[120226.680001]  [<ffffffff8024ca7a>] ? mod_timer+0x3d/0x43
[120226.680001]  [<ffffffff8040d0fe>] ? sk_reset_timer+0x17/0x27
[120226.680001]  [<ffffffff8020f6fc>] ? __switch_to+0xb9/0x3a5
[120226.680001]  [<ffffffff8040b805>] ? sys_recvfrom+0xa3/0xf8
[120226.680001]  [<ffffffff80240157>] ? finish_task_switch+0x2b/0xc8
[120226.680001]  [<ffffffff804c4a1d>] ? thread_return+0x3d/0xd4
[120226.680001]  [<ffffffff802111aa>] ? system_call_fastpath+0x16/0x1b
[120226.680001] Code: 00 00 00 41 01 c4 e9 f1 00 00 00 48 8b 8d a8 fe ff ff 0f b6 f3 0f b7 c6 4c 8d 2c c5 00 00 00 00 4c 03 a9 00 02 00 00 4d 8b 75 00 <41> 8b 86 cc 00 00 00 49 03 86 d0 00 00 00 66 83 78 06 00 74 24 
[120226.680001] RIP  [<ffffffffa0089c65>] bnx2_poll_work+0xd5/0x1133 [bnx2]
[120226.680001]  RSP <ffffffff8072ccf0>
[120226.680001] CR2: 00000000000000cc

[-- Attachment #3: bnx2-crash.patch --]
[-- Type: text/x-diff, Size: 567 bytes --]

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 08cddb6..7a2dc58 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2797,8 +2797,8 @@ bnx2_tx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 		/* prefetch skb_end_pointer() to speedup skb_shinfo(skb) */
 		prefetch(&skb->end);
 
-		/* partial BD completions possible with TSO packets */
-		if (tx_buf->is_gso) {
+		/* partial BD completions possible with fragmented packets */
+		if (tx_buf->nr_frags) {
 			u16 last_idx, last_ring_idx;
 
 			last_idx = sw_cons + tx_buf->nr_frags + 1;

^ permalink raw reply related

* Re: [RFC net-next-2.6] net: speedup sk_wake_async()
From: Rick Jones @ 2009-10-07  0:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Linux Netdev List
In-Reply-To: <4ACBCDD8.5000306@gmail.com>

Eric Dumazet wrote:
> Latency works, part 1
> 
> 
> An incoming datagram must bring into cpu cache *lot* of cache lines,
> in particular : (other parts omitted (hash chains, ip route cache...))
> 
> On 32bit arches :

How about 64-bit?

> offsetof(struct sock, sk_rcvbuf)       =0x30    (read)
> offsetof(struct sock, sk_lock)         =0x34   (rw)
> 
> offsetof(struct sock, sk_sleep)        =0x50 (read)
> offsetof(struct sock, sk_rmem_alloc)   =0x64   (rw)
> offsetof(struct sock, sk_receive_queue)=0x74   (rw)
> 
> offsetof(struct sock, sk_forward_alloc)=0x98   (rw)
> 
> offsetof(struct sock, sk_callback_lock)=0xcc    (rw)
> offsetof(struct sock, sk_drops)        =0xd8 (read if we add dropcount support, rw if frame dropped)
> offsetof(struct sock, sk_filter)       =0xf8    (read)
> 
> offsetof(struct sock, sk_socket)       =0x138 (read)
> 
> offsetof(struct sock, sk_data_ready)   =0x15c   (read)
> 
> 
> We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
> with no fasync() structures. (socket->fasync_list ptr is probably already in cache
> because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)
> 
> This avoids one cache line load per incoming packet for common cases (no fasync())
> 
> We can leave (or even move in a future patch) sk->sk_socket in a cold location
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Got any netperf service demand changes?

rick jones

^ permalink raw reply

* Re: [PATCH 0/4] ISDN patches for 2.6.32 (v3)
From: David Miller @ 2009-10-07  0:36 UTC (permalink / raw)
  To: tilman; +Cc: netdev, linux-kernel, isdn, keil, isdn4linux, i4ldeveloper
In-Reply-To: <20091006-patch-capi-0.tilman@imap.cc>

From: Tilman Schmidt <tilman@imap.cc>
Date: Wed,  7 Oct 2009 00:17:55 +0200 (CEST)

> as I still haven't heard anything from Karsten and none of the other
> ISDN developers has volunteered to step into the breach, I would like to
> get back to the suggestion of merging my ISDN patches through you.
> 
> Here is a respin of the four patches for the Kernel CAPI subsystem I
> first submitted on 2009-09-06. I have folded into patch 1 the amendment
> I posted as "patch 5/4" on 2009-09-25, and re-added to patches 2 and 3
> Karsten's original Acked-by that came out of the discussion on the
> isdn4linux mailing list. Apart from that, the series is unchanged from
> the second submission on 2009-09-19.
> 
> I'll be following up with the respun patches to the Gigaset driver.
> 
> It would be great if these could still be integrated into kernel release
> 2.6.32.

Ok, I'll look over this stuff, thanks for your patience Tilman.

^ permalink raw reply

* Re: [PATCH 3/3] be2net: Bug fix to properly update ethtool tx-checksumming after ethtool -K <ifname> tx off
From: David Miller @ 2009-10-07  0:35 UTC (permalink / raw)
  To: ajitk; +Cc: netdev
In-Reply-To: <20091005122207.GA22935@serverengines.com>

From: Ajit Khaparde <ajitk@serverengines.com>
Date: Mon, 5 Oct 2009 17:52:19 +0530

> This is a fix for a bug which was a result of wrong use of checksum offload flag.
> The status of tx-checksumming was not changed from on to off
> after a 'ethtool -K <ifname> tx off' operation.
> Use the proper checksum offload flag NETIF_F_HW_CSUM instead of 
> NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM.
> Patch is against net-2.6 tree.
> 
> Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/3] be2net: Fix a typo in be_cmds.h
From: David Miller @ 2009-10-07  0:35 UTC (permalink / raw)
  To: ajitk; +Cc: netdev
In-Reply-To: <20091005122154.GA22954@serverengines.com>

From: Ajit Khaparde <ajitk@serverengines.com>
Date: Mon, 5 Oct 2009 17:52:05 +0530

> MCC_STATUS_NOT_SUPPORTED should be decimal 66 not hex 66.
> This patch fixes this typo. Patch against net-2.6 tree.
> 
> Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/3] be2net: Bug Fix while accounting of multicast frames during netdev stats update
From: David Miller @ 2009-10-07  0:35 UTC (permalink / raw)
  To: ajitk; +Cc: netdev
In-Reply-To: <20091005122140.GA22946@serverengines.com>

From: Ajit Khaparde <ajitk@serverengines.com>
Date: Mon, 5 Oct 2009 17:51:51 +0530

> While updating the statistics to be passed via the get_stats,
> tx multicast frames were being accounted instead of rx multicast frames.
> This patch fixes the bug. This patch is against the net-2.6 tree.
> 
> Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>

Applied.

^ permalink raw reply

* Re: [net-2.6 PATCH 0/3] qlge: Fixes for qlge.
From: David Miller @ 2009-10-07  0:35 UTC (permalink / raw)
  To: ron.mercer; +Cc: netdev
In-Reply-To: <1254779209-15174-1-git-send-email-ron.mercer@qlogic.com>


All 3 patches applied, thanks Ron.

^ permalink raw reply

* Re: [RFC net-next-2.6] net: speedup sk_wake_async()
From: David Miller @ 2009-10-07  0:28 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <4ACBCDD8.5000306@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Oct 2009 01:08:08 +0200

> An incoming datagram must bring into cpu cache *lot* of cache lines,
> in particular : (other parts omitted (hash chains, ip route cache...))
> 
> On 32bit arches :
 ...
> We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
> with no fasync() structures. (socket->fasync_list ptr is probably already in cache
> because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)
> 
> This avoids one cache line load per incoming packet for common cases (no fasync())
> 
> We can leave (or even move in a future patch) sk->sk_socket in a cold location
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

I like it, applied to net-next-2.6, thanks!

^ permalink raw reply

* Re: [PATCH net-next] myri10ge: add adaptive coalescing
From: David Miller @ 2009-10-07  0:25 UTC (permalink / raw)
  To: brice; +Cc: netdev, gallatin
In-Reply-To: <4ACB75DB.4060003@myri.com>

From: Brice Goglin <brice@myri.com>
Date: Tue, 06 Oct 2009 18:52:43 +0200

> This patch adds support for adaptive interrupt coalescing to the
> myri10ge driver. It is based on the host periodically look at
> statistics and update the NIC coalescing accordingly.
> 
> The NIC only provides packet throughput and we feel that it is a
> better heuristics than the packet rate heuristics currently used
> in ethtool. Also, assuming that the packet packet rate heuristics
> uses what is actually sent on the wire when using TSO, it would be
> much more expensive to implement correctly, as the driver would
> need to calculate how many packets were sent.
> 
> Signed-off-by: Andrew Gallatin <gallatin@myri.com>
> Signed-off-by: Brice Goglin <brice@myri.com>

Drivers tried to do this as far back as 6 years ago (tg3) and we don't
recommend doing this with NAPI drivers.

You detection code can never respond quick enough in response to
changes in traffic conditions.  By the time you program the new values
into the registers things on the wire can change a lot.

It's also very easy to flap and hit the settings a lot.

That's why we recommend using low hw mitigation settings and simply
leaving it like that.

^ permalink raw reply

* Re: [Bugme-new] [Bug 14336] New: [Pardus] Soft Lockup Problem with Attansic Ethernet Card
From: J. K. Cliburn @ 2009-10-06 23:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: badibere, bugzilla-daemon, bugme-daemon, netdev, linux-acpi,
	Jesse Barnes, Chris Snook, Jie Yang
In-Reply-To: <20091006155502.f3fc86d9.akpm@linux-foundation.org>


On Oct 6, 2009, at 5:55 PM, Andrew Morton wrote:

>
> (switched to email.  Please respond via emailed reply-to-all, not  
> via the
> bugzilla web interface).
>
> On Tue, 6 Oct 2009 22:37:13 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> http://bugzilla.kernel.org/show_bug.cgi?id=14336
>>
>>            Summary: [Pardus] Soft Lockup Problem with Attansic  
>> Ethernet
>>                     Card
>>            Product: Networking
>>            Version: 2.5
>>     Kernel Version: 2.6.30.8
>>           Platform: All
>>         OS/Version: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: high
>>           Priority: P1
>>          Component: IPV4
>>         AssignedTo: shemminger@linux-foundation.org
>>         ReportedBy: badibere@gmail.com
>>         Regression: No
>>
>>
>> I have soft lockup problem with attansic ethernet card while  
>> activating eth0
>> interface.
>>
>> My ethernet card is 09:00.0 Ethernet controller: Attansic  
>> Technology Corp.
>> Device 1063 (rev c0)
>>
>> If I try to activate eth0 interface while network cable is pluged  
>> in, then
>> everything freezes.
>>
>> If I try to activate eth0 interface while network cable isn't  
>> pluged in, then
>> no problem. But, if i plug in, again freezes.
>>
>> My /var/log/syslog is in attachment.
>>
>
> From the dmesg output it appears that this card is driven by the atl1c
> driver:
>
> Oct  7 00:55:57 baDibere kernel: [  169.943463] atl1c 0000:09:00.0:  
> Unable to allocate MSI interrupt Error: -22
> Oct  7 00:55:57 baDibere kernel: [  169.943576] atl1c 0000:09:00.0:  
> atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> Oct  7 00:56:22 baDibere kernel: [  194.849361] NVRM: Xid  
> (0001:00): 16, Head 00000000 Count 00003651
> Oct  7 00:56:31 baDibere kernel: [  203.849307] NVRM: Xid  
> (0001:00): 16, Head 00000000 Count 00003653
> Oct  7 00:56:36 baDibere kernel: [  208.849339] NVRM: Xid  
> (0001:00): 8, Channel 00000003
> Oct  7 00:56:41 baDibere kernel: [  213.852320] NVRM: Xid  
> (0001:00): 16, Head 00000000 Count 00003655
>
>
> The "Unable to allocate MSI interrupt" might be the immediate problem.
>
> Who belongs to MSI interrupt allocation?  PCI?  ACPI?
>
> However the driver does attempt to handle and recover from the MSI
> interrupt allocation error so perhaps that's no the cause at all.
>

I'd be curious to know if the lockup occurs when booted with pci=nomsi.

Jay

^ permalink raw reply

* Re: [Bugme-new] [Bug 14336] New: [Pardus] Soft Lockup Problem with  Attansic Ethernet Card
From: J. K. Cliburn @ 2009-10-06 23:33 UTC (permalink / raw)
  To: Erdem ARTAN
  Cc: Andrew Morton, bugzilla-daemon, bugme-daemon, netdev, linux-acpi,
	Jesse Barnes, Chris Snook, Jie Yang
In-Reply-To: <3d54862b0910061601iba25b87hfba151f71d6ce4ef@mail.gmail.com>

On Oct 6, 2009, at 6:01 PM, Erdem ARTAN wrote:

> Yesterday, there wasn't any problem but 3 days ago there was same  
> problem.
>
> While there wasn't any problem, if I'm not wrong, both atl1c and  
> atl1e modules were loaded on boot. But, If I try to load atl1e now,  
> nothing changes.

The pci_id says your chip is an L1c.  Jie Yang is the maintainer of  
the atl1c driver and is cc'd.

Jay

^ permalink raw reply

* RE: [net-next-2.6 PATCH 1/9] vxge: Modify __vxge_hw_device_is_privilaged() to not assume function-0 as the privileged function: Resubmit#1
From: Ramkrishna Vepa @ 2009-10-06 23:22 UTC (permalink / raw)
  To: David Miller, Sreenivasa Honnur; +Cc: netdev, support
In-Reply-To: <20091006.152241.159698718.davem@davemloft.net>

> So I'm going to apply this patch set to net-next-2.6, omitting patch
> #8 from the series.
[Ram] Thanks!

Ram

^ permalink raw reply

* alt1c: ethernet driver bug
From: Stephen Hemminger @ 2009-10-06 23:10 UTC (permalink / raw)
  To: Jie Yang; +Cc: netdev

> http://bugzilla.kernel.org/show_bug.cgi?id=14336
> 
>            Summary: [Pardus] Soft Lockup Problem with Attansic Ethernet
>                     Card
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.30.8
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: badibere@gmail.com
>         Regression: No
> 
> 
> I have soft lockup problem with attansic ethernet card while activating eth0
> interface.
> 
> My ethernet card is 09:00.0 Ethernet controller: Attansic Technology Corp.
> Device 1063 (rev c0)
> 
> If I try to activate eth0 interface while network cable is pluged in, then
> everything freezes.
> 
> If I try to activate eth0 interface while network cable isn't pluged in, then
> no problem. But, if i plug in, again freezes.
> 
> My /var/log/syslog is in attachment.
> 

this is what was in the bug attachment

Oct  7 00:53:33 baDibere kernel: [   16.162099] atl1c 0000:09:00.0: enabling device (0000 -> 0002)
Oct  7 00:53:33 baDibere kernel: [   16.162117] atl1c 0000:09:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Oct  7 00:53:33 baDibere kernel: [   16.162141] atl1c 0000:09:00.0: setting latency timer to 64
Oct  7 00:53:33 baDibere kernel: [   16.162247] atl1c 0000:09:00.0: PME# disabled
Oct  7 00:53:33 baDibere kernel: [   16.162266] atl1c 0000:09:00.0: PME# disabled
Oct  7 00:53:33 baDibere kernel: [   16.230835] atl1c 0000:09:00.0: version 1.0.0.1-NAPI
Oct  7 00:55:57 baDibere kernel: [  169.943463] atl1c 0000:09:00.0: Unable to allocate MSI interrupt Error: -22
Oct  7 00:55:57 baDibere kernel: [  169.943576] atl1c 0000:09:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>

^ permalink raw reply

* [RFC net-next-2.6] net: speedup sk_wake_async()
From: Eric Dumazet @ 2009-10-06 23:08 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

Latency works, part 1


An incoming datagram must bring into cpu cache *lot* of cache lines,
in particular : (other parts omitted (hash chains, ip route cache...))

On 32bit arches :

offsetof(struct sock, sk_rcvbuf)       =0x30    (read)
offsetof(struct sock, sk_lock)         =0x34   (rw)

offsetof(struct sock, sk_sleep)        =0x50 (read)
offsetof(struct sock, sk_rmem_alloc)   =0x64   (rw)
offsetof(struct sock, sk_receive_queue)=0x74   (rw)

offsetof(struct sock, sk_forward_alloc)=0x98   (rw)

offsetof(struct sock, sk_callback_lock)=0xcc    (rw)
offsetof(struct sock, sk_drops)        =0xd8 (read if we add dropcount support, rw if frame dropped)
offsetof(struct sock, sk_filter)       =0xf8    (read)

offsetof(struct sock, sk_socket)       =0x138 (read)

offsetof(struct sock, sk_data_ready)   =0x15c   (read)


We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
with no fasync() structures. (socket->fasync_list ptr is probably already in cache
because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)

This avoids one cache line load per incoming packet for common cases (no fasync())

We can leave (or even move in a future patch) sk->sk_socket in a cold location

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h |    3 ++-
 net/socket.c       |    3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 1621935..98398bd 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -504,6 +504,7 @@ enum sock_flags {
 	SOCK_TIMESTAMPING_SOFTWARE,     /* %SOF_TIMESTAMPING_SOFTWARE */
 	SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SOF_TIMESTAMPING_RAW_HARDWARE */
 	SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */
+	SOCK_FASYNC, /* fasync() active */
 };
 
 static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
@@ -1396,7 +1397,7 @@ static inline unsigned long sock_wspace(struct sock *sk)
 
 static inline void sk_wake_async(struct sock *sk, int how, int band)
 {
-	if (sk->sk_socket && sk->sk_socket->fasync_list)
+	if (sock_flag(sk, SOCK_FASYNC))
 		sock_wake_async(sk->sk_socket, how, band);
 }
 
diff --git a/net/socket.c b/net/socket.c
index 7565536..d53ad11 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1100,11 +1100,14 @@ static int sock_fasync(int fd, struct file *filp, int on)
 		fna->fa_next = sock->fasync_list;
 		write_lock_bh(&sk->sk_callback_lock);
 		sock->fasync_list = fna;
+		sock_set_flag(sk, SOCK_FASYNC);
 		write_unlock_bh(&sk->sk_callback_lock);
 	} else {
 		if (fa != NULL) {
 			write_lock_bh(&sk->sk_callback_lock);
 			*prev = fa->fa_next;
+			if (!sock->fasync_list)
+				sock_reset_flag(sk, SOCK_FASYNC);
 			write_unlock_bh(&sk->sk_callback_lock);
 			kfree(fa);
 		}

^ permalink raw reply related

* Re: [Bugme-new] [Bug 14336] New: [Pardus] Soft Lockup Problem with Attansic Ethernet Card
From: Andrew Morton @ 2009-10-06 22:55 UTC (permalink / raw)
  To: badibere
  Cc: bugzilla-daemon, bugme-daemon, netdev, linux-acpi, Jesse Barnes,
	Jay Cliburn, Chris Snook, Jie Yang
In-Reply-To: <bug-14336-10286@http.bugzilla.kernel.org/>


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 6 Oct 2009 22:37:13 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=14336
> 
>            Summary: [Pardus] Soft Lockup Problem with Attansic Ethernet
>                     Card
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.30.8
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: badibere@gmail.com
>         Regression: No
> 
> 
> I have soft lockup problem with attansic ethernet card while activating eth0
> interface.
> 
> My ethernet card is 09:00.0 Ethernet controller: Attansic Technology Corp.
> Device 1063 (rev c0)
> 
> If I try to activate eth0 interface while network cable is pluged in, then
> everything freezes.
> 
> If I try to activate eth0 interface while network cable isn't pluged in, then
> no problem. But, if i plug in, again freezes.
> 
> My /var/log/syslog is in attachment.
> 

>From the dmesg output it appears that this card is driven by the atl1c
driver:

Oct  7 00:55:57 baDibere kernel: [  169.943463] atl1c 0000:09:00.0: Unable to allocate MSI interrupt Error: -22
Oct  7 00:55:57 baDibere kernel: [  169.943576] atl1c 0000:09:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
Oct  7 00:56:22 baDibere kernel: [  194.849361] NVRM: Xid (0001:00): 16, Head 00000000 Count 00003651
Oct  7 00:56:31 baDibere kernel: [  203.849307] NVRM: Xid (0001:00): 16, Head 00000000 Count 00003653
Oct  7 00:56:36 baDibere kernel: [  208.849339] NVRM: Xid (0001:00): 8, Channel 00000003
Oct  7 00:56:41 baDibere kernel: [  213.852320] NVRM: Xid (0001:00): 16, Head 00000000 Count 00003655


The "Unable to allocate MSI interrupt" might be the immediate problem.

Who belongs to MSI interrupt allocation?  PCI?  ACPI?

However the driver does attempt to handle and recover from the MSI
interrupt allocation error so perhaps that's no the cause at all.


^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/9] vxge: Modify __vxge_hw_device_is_privilaged() to not assume function-0 as the privileged function: Resubmit#1
From: David Miller @ 2009-10-06 22:22 UTC (permalink / raw)
  To: Sreenivasa.Honnur; +Cc: netdev, support
In-Reply-To: <Pine.GSO.4.10.10910050749130.9377-100000@guinness>


So I'm going to apply this patch set to net-next-2.6, omitting patch
#8 from the series.

Thanks.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 15/15] igb: add flushes between RAR writes when setting mac address
From: David Miller @ 2009-10-06 22:22 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20091005163601.32487.87871.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 05 Oct 2009 09:36:01 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> There are some switches that will do write combining when they see two
> sequential regions written. In order to avoid any possible write combining
> issues it is necessary to add a flush after writing each piece of a rar
> register.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 14/15] igb: add additional error handling to the phy code
From: David Miller @ 2009-10-06 22:22 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20091005163541.32487.2008.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 05 Oct 2009 09:35:42 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This update adds additional exception handling to the phy code to handle
> situations where it may be called incorrectly.  In addition it adds some
> bounds checking to the cable length checks to prevent an array overrun in
> the event that the hardware returned a different value than expected.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 13/15] igb: add code to retry a phy read in the event of failure on link check
From: David Miller @ 2009-10-06 22:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20091005163522.32487.5411.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 05 Oct 2009 09:35:23 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch adds a retry to phy reads in the event of failure.  The original
> code broke out of the loop on failure and this is a mistake as we should be
> trying to do the read twice.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 12/15] igb: move the generic copper link setup code into e1000_phy.c
From: David Miller @ 2009-10-06 22:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20091005163502.32487.1493.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 05 Oct 2009 09:35:03 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch moves the generic portion of the copper link setup into a
> seperate function in e1000_phy.c.  This helps to reduce the size of
> copper_link_setup_82575 and make it a bit more readable.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 11/15] igb: remove microwire support from igb
From: David Miller @ 2009-10-06 22:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20091005163443.32487.94787.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 05 Oct 2009 09:34:44 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> igb doesn't have any devices that use a microwire interface for NVM.  As
> such the code related to this can be removed.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 10/15] igb: change how we handle alternate mac addresses
From: David Miller @ 2009-10-06 22:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20091005163424.32487.97223.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 05 Oct 2009 09:34:25 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch allows us to treat the alternate mac address as though it is the
> physical address on the adapter.  This is accomplished by letting the
> alt_mac_address function to only fail on an NVM error.  If no errors occur
> and the alternate mac address is not present then RAR0 is read as the
> default mac address.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox