Netdev List

* Re: Linux TCP's Robustness to Multipath Packet Reordering
From: Dominik Kaspar @ 2011-04-26 21:27 UTC (permalink / raw)
  To: John Heffner; +Cc: Eric Dumazet, Carsten Wolff, netdev
In-Reply-To: <BANLkTikoQfbSxnJi_OR+N6sa5iVNcTO6Ug@mail.gmail.com>

Hi John,

Thanks for your advice. I am very well aware that TCP is not designed
to work under such conditions. I am still surprised how well Linux TCP
handles many situations of excessive, persistent packet reordering. In
scenarios of fairly heterogeneous path characteristics, Linux TCP
aggregates multiple paths close to ideally :-)

If I'm not mistaken, cwnd moderation is a measure to prevent TCP from
sending large bursts if a single ACK covers many segments. In what way
can cwnd moderation prevent TCP from increasing its estimate of packet
reordering?

Greetings,
Dominik
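
One way to produce a non-bursty striping pattern like the one quoted
further down in this message is an error-accumulator (Bresenham-style)
scheduler. The following is only a sketch under that assumption:
stripe_pattern() and count_ones() are hypothetical names, and the output
is not the exact bit string used in the experiments. For a weight of
400 out of 1000 over 128 packets it puts 51 packets on path 1 and never
sends two of them back to back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill out[0..n-1] with '0' (primary path) or '1' (secondary path) and
 * NUL-terminate it. Path 1 gets roughly weight1/total of the packets,
 * spread as evenly as the accumulator allows. The caller must provide
 * room for n + 1 bytes.
 */
void stripe_pattern(unsigned weight1, unsigned total, char *out, size_t n)
{
	unsigned acc = 0;
	size_t i;

	assert(total > 0 && weight1 <= total);

	for (i = 0; i < n; i++) {
		acc += weight1;
		if (acc >= total) {
			/* enough credit accumulated: send on path 1 */
			acc -= total;
			out[i] = '1';
		} else {
			out[i] = '0';
		}
	}
	out[n] = '\0';
}

/* Count how many packets in a pattern string go to path 1. */
size_t count_ones(const char *pattern)
{
	size_t ones = 0, i, len = strlen(pattern);

	for (i = 0; i < len; i++)
		if (pattern[i] == '1')
			ones++;
	return ones;
}
```

Calling stripe_pattern(400, 1000, buf, 128) with a 129-byte buffer
yields a pattern that starts with 0010100101 and repeats with period 5.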

On Tue, Apr 26, 2011 at 10:16 PM, John Heffner <johnwheffner@gmail.com> wrote:
> First, TCP is definitely not designed to work under such conditions.
> For example, assumptions behind RTO calculation and fast retransmit
> heuristics are violated.  However, in this particular case my first
> guess is that you are being limited by "cwnd moderation," which was
> the topic of recent discussion here.  Under persistent reordering,
> cwnd moderation can inhibit the ability of cwnd to grow.
>
> Thanks,
>  -John
>
>
> On Tue, Apr 26, 2011 at 2:00 PM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote:
>> Hi Eric,
>>
>> Here are the tcpdump files for the first TSO-disabled experiment, in a
>> full version and a short version with only the first 10000 packets:
>>
>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-full.pcap
>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-short.pcap
>>
>> By the way, the packets are sent from the server (x.x.x.189) to the
>> client interfaces (x.x.x.74) and (x.x.x.216) with the following
>> pattern (which is a non-bursty 128-bit approximation of scheduling
>> with a 600:400 ratio over primary path 0 and secondary path 1):
>>
>> 0010010100101001010010100101001010010100101001010010100101001010
>> 0101001010010100101001010010100101001010010100101001010010100101
>>
>> Greetings,
>> Dominik
>>
>> On Tue, Apr 26, 2011 at 7:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> Le mardi 26 avril 2011 à 18:58 +0200, Dominik Kaspar a écrit :
>>>> Hi Eric,
>>>>
>>>> On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> >
>>>> > Since you have at sender a rule to spoof destination address of packets,
>>>> > you should make sure you dont send "super packets (up to 64Kbytes)",
>>>> > because it would stress the multipath more than you wanted to. This way,
>>>> > you send only normal packets (1500 MTU).
>>>> >
>>>> > ethtool -K eth0 tso off
>>>> > ethtool -K eth0 gso off
>>>> >
>>>> > I am pretty sure it should help your (atypic) workload.
>>>>
>>>> I made new experiments with the exact same multipath setup as before,
>>>> but disabled TSO and GSO on all involved Ethernet interfaces. However,
>>>> this did not seem to change much about TCP's behavior when packets are
>>>> striped over heterogeneous paths. You can see the results of four
>>>> 20-minute experiments on this plot:
>>>>
>>>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png
>>>>
>>>> Cheers,
>>>> Dominik
>>>
>>> Hi Dominik
>>>
>>> Any chance to have a pcap file from sender side, of say first 10.000
>>> packets ?
>>>
>>>
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>


* Re: [Bugme-new] [Bug 33842] New: NULL pointer dereference in ip_fragment
From: Eric Dumazet @ 2011-04-26 21:24 UTC (permalink / raw)
  To: Bandan Das; +Cc: David Miller, netdev, akpm, tom
In-Reply-To: <20110426211946.GO15903@stratus.com>

On Tuesday, 26 April 2011 at 17:19 -0400, Bandan Das wrote:
> > 
> Yeah, I just rechecked and this is already in Linus' tree. So, Tomas you can
> either try pulling in those changes or you can apply this patch and see
> if it makes any difference. Thanks!

Better to pull Linus' tree, because there is another patch involved.

(commit c65353daf137dd41f3ede3baf62d561fca076228
ip: ip_options_compile() resilient to NULL skb route)





* Re: [Bugme-new] [Bug 33842] New: NULL pointer dereference in ip_fragment
From: Bandan Das @ 2011-04-26 21:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Bandan Das, David Miller, netdev, akpm, tom
In-Reply-To: <1303851718.2699.8.camel@edumazet-laptop>

> > Umm.. I could be wrong! I just did a quick grep for your name in the 
> > 2.6.39-rc4 changelog : 
> > http://www.kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.39-rc4
> > 
> > and didn't find it there.
> 
> Then it will be in rc5, dont worry ;)
> 
> 
Yeah, I just rechecked and this is already in Linus' tree. So, Tomas you can
either try pulling in those changes or you can apply this patch and see
if it makes any difference. Thanks!


diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 008ff6c..f3bc322 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -249,11 +249,9 @@ static int br_parse_ip_options(struct sk_buff *skb)
 		goto drop;
 	}
 
-	/* Zero out the CB buffer if no options present */
-	if (iph->ihl == 5) {
-		memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+	if (iph->ihl == 5)
 		return 0;
-	}
 
 	opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
 	if (ip_options_compile(dev_net(dev), opt, skb))


* Re: Linux TCP's Robustness to Multipath Packet Reordering
From: Eric Dumazet @ 2011-04-26 21:17 UTC (permalink / raw)
  To: Dominik Kaspar; +Cc: Carsten Wolff, netdev
In-Reply-To: <1303852092.2699.11.camel@edumazet-laptop>

On Tuesday, 26 April 2011 at 23:08 +0200, Eric Dumazet wrote:
> On Tuesday, 26 April 2011 at 23:04 +0200, Dominik Kaspar wrote:
> 
> > In these experiments, a queue size of 1000 packets was specified. I am
> > aware that this is typically referred to as "buffer bloat" and causes
> > the RTT and the cwnd to grow excessively. The smaller I configure the
> > queues, the more time it takes for TCP to "level up" to the aggregate
> > throughput. By keeping the queues so large, I hope to more quickly
> > identify the reason why TCP is actually able to adjust to the immense
> > multipath reordering. What parameters could be highly relevant, other
> > than the queue size?
> > 
> 
> losses of course ;)
> 
> Real internet is full of packet losses, and probability of these losses
> depends on queue sizes (RED like AQM)
> 
> 

BTW, linux-2.6.39 contains a lot of changes to the netem module

commit 661b79725fea030803a89a16cda
(netem: revised correlated loss generator)

    This is a patch originated with Stefano Salsano and Fabio Ludovici.
    It provides several alternative loss models for use with netem.
    This patch adds two state machine based loss models.
    

http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG





* Re: Linux TCP's Robustness to Multipath Packet Reordering
From: Dominik Kaspar @ 2011-04-26 21:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Carsten Wolff, netdev
In-Reply-To: <1303852092.2699.11.camel@edumazet-laptop>

On Tue, Apr 26, 2011 at 11:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday, 26 April 2011 at 23:04 +0200, Dominik Kaspar wrote:
>
>> In these experiments, a queue size of 1000 packets was specified. I am
>> aware that this is typically referred to as "buffer bloat" and causes
>> the RTT and the cwnd to grow excessively. The smaller I configure the
>> queues, the more time it takes for TCP to "level up" to the aggregate
>> throughput. By keeping the queues so large, I hope to more quickly
>> identify the reason why TCP is actually able to adjust to the immense
>> multipath reordering. What parameters could be highly relevant, other
>> than the queue size?
>>
>
> losses of course ;)
>
> Real internet is full of packet losses, and probability of these losses
> depends on queue sizes (RED like AQM)
>

No additional random loss is introduced (yet), so packet loss happens
only when the queue size of 1000 packets is hit. Since the queues are
configured overly large, packet loss rarely happens at all... of
course at the cost of a large RTT.

I suspect that artificially bloating the RTT somehow allows TCP to
better adjust to multipath reordering... just haven't got a clue why.

Cheers,
Dominik


* Re: Linux TCP's Robustness to Multipath Packet Reordering
From: Eric Dumazet @ 2011-04-26 21:08 UTC (permalink / raw)
  To: Dominik Kaspar; +Cc: Carsten Wolff, netdev
In-Reply-To: <BANLkTint2-Fg35T9SqWPm3nOaoc1d=ZEnQ@mail.gmail.com>

On Tuesday, 26 April 2011 at 23:04 +0200, Dominik Kaspar wrote:

> In these experiments, a queue size of 1000 packets was specified. I am
> aware that this is typically referred to as "buffer bloat" and causes
> the RTT and the cwnd to grow excessively. The smaller I configure the
> queues, the more time it takes for TCP to "level up" to the aggregate
> throughput. By keeping the queues so large, I hope to more quickly
> identify the reason why TCP is actually able to adjust to the immense
> multipath reordering. What parameters could be highly relevant, other
> than the queue size?
> 

losses of course ;)

Real internet is full of packet losses, and probability of these losses
depends on queue sizes (RED like AQM)


> Thanks for the tip about printing tc/netem statistics after each run,
> I will use "tc -s -d qdisc" next time.
> 




* Re: Linux TCP's Robustness to Multipath Packet Reordering
From: Dominik Kaspar @ 2011-04-26 21:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Carsten Wolff, netdev
In-Reply-To: <1303850622.2699.6.camel@edumazet-laptop>

On Tue, Apr 26, 2011 at 10:43 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Monday, 25 April 2011 at 16:35 +0200, Dominik Kaspar wrote:
>
>> For the experiments, all default TCP options were used, meaning that
>> SACK, DSACK, Timestamps, were all enabled. Not sure how to turn on/off
>> TSO... so that is probably enabled, too. Path emulation is done with
>> tc/netem at the receiver interfaces (eth1, eth2) with this script:
>>
>> http://home.simula.no/~kaspar/static/netem.sh
>>
>
> What are the exact parameters ? (queue size for instance)
>
> It would be nice to give detailed stats after one run, on receiver
> (since you have netem on ingress side)
>
> tc -s -d qdisc

In these experiments, a queue size of 1000 packets was specified. I am
aware that this is typically referred to as "buffer bloat" and causes
the RTT and the cwnd to grow excessively. The smaller I configure the
queues, the more time it takes for TCP to "level up" to the aggregate
throughput. By keeping the queues so large, I hope to more quickly
identify the reason why TCP is actually able to adjust to the immense
multipath reordering. What parameters could be highly relevant, other
than the queue size?

Thanks for the tip about printing tc/netem statistics after each run,
I will use "tc -s -d qdisc" next time.

Greetings,
Dominik


* Re: [Bugme-new] [Bug 33842] New: NULL pointer dereference in ip_fragment
From: Eric Dumazet @ 2011-04-26 21:01 UTC (permalink / raw)
  To: Bandan Das; +Cc: David Miller, netdev, akpm, tom
In-Reply-To: <20110426205901.GN15903@stratus.com>

On Tuesday, 26 April 2011 at 16:59 -0400, Bandan Das wrote:
> On  0, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tuesday, 26 April 2011 at 13:46 -0700, David Miller wrote:
> > 
> > > This patch is mangled by your email client, tab characters
> > > have been turned into spaces, so it won't be usable by anyone.
> > 
> > That's strange, I thought it was already in linux-2.6 anyway ?
> > 
> Umm.. I could be wrong! I just did a quick grep for your name in the 
> 2.6.39-rc4 changelog : 
> http://www.kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.39-rc4
> 
> and didn't find it there.

Then it will be in rc5, don't worry ;)





* Re: [Bugme-new] [Bug 33842] New: NULL pointer dereference in ip_fragment
From: Bandan Das @ 2011-04-26 20:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, bandan.das, netdev, akpm, tom
In-Reply-To: <1303851185.2699.7.camel@edumazet-laptop>

On  0, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday, 26 April 2011 at 13:46 -0700, David Miller wrote:
> 
> > This patch is mangled by your email client, tab characters
> > have been turned into spaces, so it won't be usable by anyone.
> 
> That's strange, I thought it was already in linux-2.6 anyway ?
> 
Umm.. I could be wrong! I just did a quick grep for your name in the 
2.6.39-rc4 changelog : 
http://www.kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.39-rc4

and didn't find it there.

--
Bandan


* Re: [Bugme-new] [Bug 33842] New: NULL pointer dereference in ip_fragment
From: Eric Dumazet @ 2011-04-26 20:53 UTC (permalink / raw)
  To: David Miller; +Cc: bandan.das, netdev, akpm, tom
In-Reply-To: <20110426.134637.48491363.davem@davemloft.net>

On Tuesday, 26 April 2011 at 13:46 -0700, David Miller wrote:

> This patch is mangled by your email client, tab characters
> have been turned into spaces, so it won't be usable by anyone.

That's strange, I thought it was already in linux-2.6 anyway ?





* Re: [Bugme-new] [Bug 33842] New: NULL pointer dereference in ip_fragment
From: David Miller @ 2011-04-26 20:46 UTC (permalink / raw)
  To: bandan.das; +Cc: netdev, akpm, tom, eric.dumazet
In-Reply-To: <20110426203154.GM15903@stratus.com>

From: Bandan Das <bandan.das@stratus.com>
Date: Tue, 26 Apr 2011 16:31:54 -0400

> https://bugzilla.kernel.org/show_bug.cgi?id=33842
> 
> I believe  Eric's recent change to br_parse_ip_options() 
> didn't make it to 2.6.39-rc4:
> 
> bridge: reset IPCB in br_parse_ip_options
> commit f8e9881c2aef1e982e5abc25c046820cd0b7cf64
> 
> diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
> index 008ff6c..b353f7c 100644
> --- a/net/bridge/br_netfilter.c
> +++ b/net/bridge/br_netfilter.c
> @@ -249,11 +249,9 @@  static int br_parse_ip_options(struct sk_buff *skb)
>            goto drop;
>            }
>  
> -       /* Zero out the CB buffer if no options present */
> -       if (iph->ihl == 5) {
> -          memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
> +          memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
> +          if (iph->ihl == 5)
>               return 0;
> -             }
>  
>         opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
>         if (ip_options_compile(dev_net(dev), opt, skb))
> 
> 
> 
> Tomas, could you please try a kernel that has the above 
> mentioned change and see if the crash re-occurs ?

This patch is mangled by your email client, tab characters
have been turned into spaces, so it won't be usable by anyone.


* Re: Linux TCP's Robustness to Multipath Packet Reordering
From: Eric Dumazet @ 2011-04-26 20:43 UTC (permalink / raw)
  To: Dominik Kaspar; +Cc: Carsten Wolff, netdev
In-Reply-To: <BANLkTi=xns1Gdjyt-SX3yDETSQfO23rXXg@mail.gmail.com>

On Monday, 25 April 2011 at 16:35 +0200, Dominik Kaspar wrote:

> For the experiments, all default TCP options were used, meaning that
> SACK, DSACK, Timestamps, were all enabled. Not sure how to turn on/off
> TSO... so that is probably enabled, too. Path emulation is done with
> tc/netem at the receiver interfaces (eth1, eth2) with this script:
> 
> http://home.simula.no/~kaspar/static/netem.sh
> 

What are the exact parameters ? (queue size for instance)

It would be nice to give detailed stats after one run, on receiver
(since you have netem on ingress side)

tc -s -d qdisc




* Re: [Bugme-new] [Bug 33842] New: NULL pointer dereference in ip_fragment
From: Bandan Das @ 2011-04-26 20:31 UTC (permalink / raw)
  To: NetDev; +Cc: akpm, tom, Eric Dumazet

https://bugzilla.kernel.org/show_bug.cgi?id=33842

I believe  Eric's recent change to br_parse_ip_options() 
didn't make it to 2.6.39-rc4:

bridge: reset IPCB in br_parse_ip_options
commit f8e9881c2aef1e982e5abc25c046820cd0b7cf64

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 008ff6c..b353f7c 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -249,11 +249,9 @@  static int br_parse_ip_options(struct sk_buff *skb)
           goto drop;
           }
 
-       /* Zero out the CB buffer if no options present */
-       if (iph->ihl == 5) {
-          memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+          memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+          if (iph->ihl == 5)
              return 0;
-             }
 
        opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
        if (ip_options_compile(dev_net(dev), opt, skb))



Tomas, could you please try a kernel that has the above 
mentioned change and see if the crash re-occurs ?

Thanks,
Bandan


* [PATCH] bnx2: cancel timer on device removal
From: Neil Horman @ 2011-04-26 20:30 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Michael Chan, David S. Miller

This oops was recently reported to me:

invalid opcode: 0000 [#1] SMP
last sysfs file:
/sys/devices/pci0000:00/0000:00:01.0/0000:01:0d.0/0000:02:05.0/device
CPU 1
Modules linked in: bnx2(+) sunrpc ipv6 dm_mirror dm_region_hash dm_log sg
microcode serio_raw amd64_edac_mod edac_core edac_mce_amd k8temp i2c_piix4
shpchp ext4 mbcache jbd2 sd_mod crc_t10dif mptsas mptscsih mptbase
scsi_transport_sas radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core
dm_mod [last unloaded: bnx2]

Modules linked in: bnx2(+) sunrpc ipv6 dm_mirror dm_region_hash dm_log sg
microcode serio_raw amd64_edac_mod edac_core edac_mce_amd k8temp i2c_piix4
shpchp ext4 mbcache jbd2 sd_mod crc_t10dif mptsas mptscsih mptbase
scsi_transport_sas radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core
dm_mod [last unloaded: bnx2]
Pid: 23900, comm: pidof Not tainted 2.6.32-130.el6.x86_64 #1 BladeCenter LS21
-[797251Z]-
RIP: 0010:[<ffffffffa058b270>]  [<ffffffffa058b270>] 0xffffffffa058b270
RSP: 0018:ffff880002083e48  EFLAGS: 00010246
RAX: ffff880002083e90 RBX: ffff88007ccd4000 RCX: 0000000000000000
RDX: 0000000000000100 RSI: dead000000200200 RDI: ffff8800007b8700
RBP: ffff880002083ed0 R08: ffff88000208db40 R09: 0000022d191d27c8
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800007b9bc8
R13: ffff880002083e90 R14: ffff8800007b8700 R15: ffffffffa058b270
FS:  00007fbb3bcf7700(0000) GS:ffff880002080000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001664a98 CR3: 0000000060395000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process pidof (pid: 23900, threadinfo ffff8800007e8000, task ffff8800091c0040)
Stack:
 ffffffff81079f77 ffffffff8109e010 ffff88007ccd5c20 ffff88007ccd5820
<0> ffff88007ccd5420 ffff8800007e9fd8 ffff8800007e9fd8 0000010000000000
<0> ffff88007ccd5020 ffff880002083e90 ffff880002083e90 ffffffff8102a00d
Call Trace:
 <IRQ>
 [<ffffffff81079f77>] ? run_timer_softirq+0x197/0x340
 [<ffffffff8109e010>] ? tick_sched_timer+0x0/0xc0
 [<ffffffff8102a00d>] ? lapic_next_event+0x1d/0x30
 [<ffffffff8106f737>] __do_softirq+0xb7/0x1e0
 [<ffffffff81092cc0>] ? hrtimer_interrupt+0x140/0x250
 [<ffffffff81185f90>] ? filldir+0x0/0xe0
 [<ffffffff8100c2cc>] call_softirq+0x1c/0x30
 [<ffffffff8100df05>] do_softirq+0x65/0xa0
 [<ffffffff8106f525>] irq_exit+0x85/0x90
 [<ffffffff814e3340>] smp_apic_timer_interrupt+0x70/0x9b
 [<ffffffff8100bc93>] apic_timer_interrupt+0x13/0x20
 <EOI>
 [<ffffffff81211ba5>] ? selinux_file_permission+0x45/0x150
 [<ffffffff81262a75>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff812050c6>] security_file_permission+0x16/0x20
 [<ffffffff811861c1>] vfs_readdir+0x71/0xe0
 [<ffffffff81186399>] sys_getdents+0x89/0xf0
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

It occurred during some stress testing, in which the reporter was repeatedly
removing and modprobing the bnx2 module while doing various other random
operations on the bnx2 registered net device.  Since this error occurred on
a serdes-based device, we noticed that there are a few ethtool operations (most
notably self_test and set_phys_id) that have execution paths that lead into
bnx2_setup_serdes_phy.  This function is notable because it executes a mod_timer
call, which starts the bp->timer running.  Currently bnx2 is set up to assume
that this timer only needs to be stopped when bnx2_close or bnx2_suspend is
called.  Since the above ethtool operations are not gated on the net device
having been opened, however, that assumption is incorrect, and can lead to the
timer still running after the module has been removed, leading to the oops above
(as well as other similar oopses).

Fix the problem by ensuring that the timer is stopped when pci_device_unregister
is called.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Hushan Jia <hjia@redhat.com>
CC: Michael Chan <mchan@broadcom.com>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/bnx2.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index bf729ee..7f76d4c 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -8358,6 +8358,8 @@ bnx2_remove_one(struct pci_dev *pdev)
 
 	unregister_netdev(dev);
 
+	del_timer_sync(&bp->timer);
+
 	if (bp->mips_firmware)
 		release_firmware(bp->mips_firmware);
 	if (bp->rv2p_firmware)
-- 
1.7.4.4



* Re: Linux TCP's Robustness to Multipath Packet Reordering
From: John Heffner @ 2011-04-26 20:16 UTC (permalink / raw)
  To: Dominik Kaspar; +Cc: Eric Dumazet, Carsten Wolff, netdev
In-Reply-To: <BANLkTikjcd6ZAF5wH_8CKA_OCrfprGokmw@mail.gmail.com>

First, TCP is definitely not designed to work under such conditions.
For example, assumptions behind RTO calculation and fast retransmit
heuristics are violated.  However, in this particular case my first
guess is that you are being limited by "cwnd moderation," which was
the topic of recent discussion here.  Under persistent reordering,
cwnd moderation can inhibit the ability of cwnd to grow.

Thanks,
  -John


On Tue, Apr 26, 2011 at 2:00 PM, Dominik Kaspar <dokaspar.ietf@gmail.com> wrote:
> Hi Eric,
>
> Here are the tcpdump files for the first TSO-disabled experiment, in a
> full version and a short version with only the first 10000 packets:
>
> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-full.pcap
> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0-exp1-short.pcap
>
> By the way, the packets are sent from the server (x.x.x.189) to the
> client interfaces (x.x.x.74) and (x.x.x.216) with the following
> pattern (which is a non-bursty 128-bit approximation of scheduling
> with a 600:400 ratio over primary path 0 and secondary path 1):
>
> 0010010100101001010010100101001010010100101001010010100101001010
> 0101001010010100101001010010100101001010010100101001010010100101
>
> Greetings,
> Dominik
>
> On Tue, Apr 26, 2011 at 7:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le mardi 26 avril 2011 à 18:58 +0200, Dominik Kaspar a écrit :
>>> Hi Eric,
>>>
>>> On Mon, Apr 25, 2011 at 5:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> >
>>> > Since you have at sender a rule to spoof destination address of packets,
>>> > you should make sure you dont send "super packets (up to 64Kbytes)",
>>> > because it would stress the multipath more than you wanted to. This way,
>>> > you send only normal packets (1500 MTU).
>>> >
>>> > ethtool -K eth0 tso off
>>> > ethtool -K eth0 gso off
>>> >
>>> > I am pretty sure it should help your (atypic) workload.
>>>
>>> I made new experiments with the exact same multipath setup as before,
>>> but disabled TSO and GSO on all involved Ethernet interfaces. However,
>>> this did not seem to change much about TCP's behavior when packets are
>>> striped over heterogeneous paths. You can see the results of four
>>> 20-minute experiments on this plot:
>>>
>>> http://home.simula.no/~kaspar/static/mptcp-emu-wlan-hspa-01-tos0.png
>>>
>>> Cheers,
>>> Dominik
>>
>> Hi Dominik
>>
>> Any chance to have a pcap file from sender side, of say first 10.000
>> packets ?
>>
>>
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* Re: [PATCH v2] iproute2: tc add mqprio qdisc support
From: John Fastabend @ 2011-04-26 19:52 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: shemminger@vyatta.com, netdev@vger.kernel.org
In-Reply-To: <1303845492.2850.7.camel@bwh-desktop>

On 4/26/2011 12:18 PM, Ben Hutchings wrote:
> On Tue, 2011-04-26 at 11:53 -0700, John Fastabend wrote:
>> Add mqprio qdisc support. Output matches the following,
> [...]
> 
> Stephen already applied the previous version of this, so you'll need to
> provide an incremental patch to change the argument parsing.
> 
> Ben.
> 

OK done. Thanks Ben.


* [PATCH] iproute2: improve mqprio inputs for queue offsets and counts
From: John Fastabend @ 2011-04-26 19:44 UTC (permalink / raw)
  To: shemminger, bhutchings; +Cc: netdev

This changes mqprio input format to be more user friendly.

Old usage,

# ./tc/tc qdisc add dev eth3 root mqprio help
Usage: ... mqprio [num_tc NUMBER] [map P0 P1...]
                  [offset txq0 txq1 ...] [count cnt0 cnt1 ...] [hw 1|0]

New usage,

# ./tc/tc qdisc add dev eth3 root mqprio help
Usage: ... mqprio [num_tc NUMBER] [map P0 P1 ...]
                  [queues count1@offset1 count2@offset2 ...] [hw 1|0]

Suggested-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 tc/q_mqprio.c |   27 +++++++++++++++++----------
 1 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/tc/q_mqprio.c b/tc/q_mqprio.c
index c589b4c..bf734a0 100644
--- a/tc/q_mqprio.c
+++ b/tc/q_mqprio.c
@@ -25,8 +25,8 @@
 static void explain(void)
 {
 	fprintf(stderr, "Usage: ... mqprio [num_tc NUMBER] [map P0 P1 ...]\n");
-	fprintf(stderr, "                  [offset txq0 txq1 ...] ");
-	fprintf(stderr, "[count cnt0,cnt1 ...] [hw 1|0]\n");
+	fprintf(stderr, "                  [queues count1@offset1 count2@offset2 ...] ");
+	fprintf(stderr, "[hw 1|0]\n");
 }
 
 static int mqprio_parse_opt(struct qdisc_util *qu, int argc,
@@ -58,22 +58,29 @@ static int mqprio_parse_opt(struct qdisc_util *qu, int argc,
 			}
 			for ( ; idx < TC_QOPT_MAX_QUEUE; idx++)
 				opt.prio_tc_map[idx] = 0;
-		} else if (strcmp(*argv, "offset") == 0) {
+		} else if (strcmp(*argv, "queues") == 0) {
+			char *tmp, *tok;
+
 			while (idx < TC_QOPT_MAX_QUEUE && NEXT_ARG_OK()) {
 				NEXT_ARG();
-				if (get_u16(&opt.offset[idx], *argv, 10)) {
+
+				tmp = strdup(*argv);
+				if (!tmp)
+					break;
+
+				tok = strtok(tmp, "@");
+				if (get_u16(&opt.count[idx], tok, 10)) {
+					free(tmp);
 					PREV_ARG();
 					break;
 				}
-				idx++;
-			}
-		} else if (strcmp(*argv, "count") == 0) {
-			while (idx < TC_QOPT_MAX_QUEUE && NEXT_ARG_OK()) {
-				NEXT_ARG();
-				if (get_u16(&opt.count[idx], *argv, 10)) {
+				tok = strtok(NULL, "@");
+				if (get_u16(&opt.offset[idx], tok, 10)) {
+					free(tmp);
 					PREV_ARG();
 					break;
 				}
+				free(tmp);
 				idx++;
 			}
 		} else if (strcmp(*argv, "hw") == 0) {



* Re: [PATCH] tg3: Convert u32 flag,flg2,flg3 uses to bitmap
From: David Miller @ 2011-04-26 19:49 UTC (permalink / raw)
  To: joe; +Cc: mcarlson, mchan, eric.dumazet, netdev
In-Reply-To: <1303841530.24299.37.camel@Joe-Laptop>

From: Joe Perches <joe@perches.com>
Date: Tue, 26 Apr 2011 11:12:10 -0700

> Using a bitmap instead of separate u32 flags allows a consistent, simpler
> and more extensible mechanism to determine capabilities.
> 
> Convert bitmasks to enum.
> Add tg3_flag, tg3_flag_clear and tg3_flag_set.
> Convert the flag & bitmask tests.
> 
> Signed-off-by: Joe Perches <joe@perches.com>

I like this, Broadcom folks please review.


* Re: [PATCH net-2.6 4/4] xfrm: Fix integer underrun on zero sized replay windows
From: David Miller @ 2011-04-26 19:48 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, netdev
In-Reply-To: <20110426105840.GJ5495@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 26 Apr 2011 12:58:40 +0200

> On Tue, Apr 26, 2011 at 07:42:32AM +0200, Steffen Klassert wrote:
>> The check if the replay window is contained within one subspace or
>> spans over two subspaces causes an unwanted integer underrun on
>> zero sized replay windows when we subtract minus one. We fix this by
>> changing this check to avoid the subtraction.
>> 
> 
> Don't apply this one, it does not fix the issue completely.
> I'll send a better one, sorry.

Ok.


* Re: [PATCH net-2.6 3/4] xfrm: Check for the new replay implementation if an esn state is inserted
From: David Miller @ 2011-04-26 19:47 UTC (permalink / raw)
  To: herbert; +Cc: steffen.klassert, netdev
In-Reply-To: <20110426054304.GD18896@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 26 Apr 2011 15:43:04 +1000

> On Tue, Apr 26, 2011 at 07:41:21AM +0200, Steffen Klassert wrote:
>> IPsec extended sequence numbers can be used only with the new
>> anti-replay window implementation. So check if the new implementation
>> is used if an esn state is inserted and return an error if it is not.
>> 
>> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.


* Re: [PATCH net-2.6 2/4] esp6: Fix scatterlist initialization
From: David Miller @ 2011-04-26 19:47 UTC (permalink / raw)
  To: herbert; +Cc: steffen.klassert, netdev
In-Reply-To: <20110426054158.GC18896@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 26 Apr 2011 15:41:58 +1000

> On Tue, Apr 26, 2011 at 07:40:23AM +0200, Steffen Klassert wrote:
>> When we use IPsec extended sequence numbers, we may overwrite
>> the last scatterlist of the associated data by the scatterlist
>> for the skb. This patch fixes this by placing the scatterlist
>> for the skb right behind the last scatterlist of the associated
>> data. esp4 does it already like that.
>> 
>> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.


* Re: [PATCH net-2.6 1/4] xfrm: Fix replay window size calculation on initialization
From: David Miller @ 2011-04-26 19:47 UTC (permalink / raw)
  To: herbert; +Cc: steffen.klassert, netdev
In-Reply-To: <20110426054107.GB18896@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 26 Apr 2011 15:41:07 +1000

> On Tue, Apr 26, 2011 at 07:39:24AM +0200, Steffen Klassert wrote:
>> On replay initialization, we compute the size of the replay
>> buffer to see if the replay window fits into the buffer.
>> This computation lacks a multiplication by 8 because we need
>> the size in bits, not in bytes. So we might return an error
>> even though the replay window would fit into the buffer.
>> This patch fixes this issue.
>> 
>> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.
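
The arithmetic in the quoted patch description can be sketched as
follows, assuming the replay bitmap is stored as an array of 32-bit
words. The helper name and signature are hypothetical and this is not
the kernel function; it only shows why the factor of 8 matters.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if a replay window of window_bits bits fits into a bitmap
 * made of bmp_words 32-bit words, 0 otherwise.
 */
int replay_window_fits(uint32_t window_bits, uint32_t bmp_words)
{
	uint64_t bitmap_bits;

	/* an empty bitmap is treated as a caller bug in this sketch */
	assert(bmp_words > 0);

	/* words -> bytes -> bits; the "* 8" is the step that was missing */
	bitmap_bits = (uint64_t)bmp_words * sizeof(uint32_t) * 8;

	return window_bits <= bitmap_bits;
}
```

For example, four words hold 16 bytes, or 128 bits. Comparing a 128-bit
window against the byte count (16) would reject it, while comparing
against the bit count (128) accepts it.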


* Re: linux-next: build failure after merge of the net tree
From: David Miller @ 2011-04-26 19:43 UTC (permalink / raw)
  To: sfr; +Cc: netdev, linux-next, linux-kernel, hayeswang, romieu
In-Reply-To: <20110426135146.c01f8395.sfr@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Tue, 26 Apr 2011 13:51:46 +1000

> Hi all,
> 
> After merging the net tree, today's linux-next build (x86_64 allmodconfig)
> failed like this:
> 
> drivers/net/r8169.c: In function 'rtl8168e_1_hw_phy_config':
> drivers/net/r8169.c:2578: error: too many arguments to function 'rtl_apply_firmware'
> drivers/net/r8169.c:2578: error: void value not ignored as it ought to be
> drivers/net/r8169.c: In function 'rtl8168e_2_hw_phy_config':
> drivers/net/r8169.c:2586: error: too many arguments to function 'rtl_apply_firmware'
> drivers/net/r8169.c:2586: error: void value not ignored as it ought to be
> 
> Caused by commit 953a12cc2889 ("r8169: don't request firmware when
> there's no userspace") from the net-current tree interacting with commit
> 01dc7fec4025 ("net/r8169: support RTL8168E").
> 
> I applied the following fixup patch:

Stephen, I just merged net-2.6 into net-next-2.6 and added your patch
to the merge commit to make sure the build is fine.

Thanks a lot!


* Re: [PATCH v2] iproute2: tc add mqprio qdisc support
From: Ben Hutchings @ 2011-04-26 19:18 UTC (permalink / raw)
  To: John Fastabend; +Cc: shemminger, netdev
In-Reply-To: <20110426185331.12236.41211.stgit@jf-dev1-dcblab>

On Tue, 2011-04-26 at 11:53 -0700, John Fastabend wrote:
> Add mqprio qdisc support. Output matches the following,
[...]

Stephen already applied the previous version of this, so you'll need to
provide an incremental patch to change the argument parsing.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [PATCH] dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
From: Peter Korsgaard @ 2011-04-26 19:08 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: davem, netdev
In-Reply-To: <20110426145740.GL1897@wantstofly.org>

>>>>> "Lennert" == Lennert Buytenhek <buytenh@wantstofly.org> writes:

Hi,

 >> The 88e6085 has a few differences from the other devices in the port
 >> control registers, causing unknown multicast/broadcast packets to get
 >> dropped when using the standard port setup.
 >> 
 >> At the same time update kconfig to clarify that the mv88e6085 is now
 >> supported.
 >> 
 >> Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>

 Lennert> Assuming that you've tested this.. :)

Yes, this time with multicast as well *red ears* ;)

Thanks.

-- 
Bye, Peter Korsgaard
