* Re: [patch] pm_qos update fixing mmotm 2010-05-11 -dies in pm_qos_update_request()
From: mgross @ 2010-05-17 0:12 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: markgross, Valdis.Kletnieks, mgross, akpm, davem, linux-kernel,
e1000-devel, netdev, linux-pm
In-Reply-To: <201005170021.25427.rjw@sisk.pl>
On Mon, May 17, 2010 at 12:21:25AM +0200, Rafael J. Wysocki wrote:
> On Saturday 15 May 2010, mgross wrote:
> > On Sat, May 15, 2010 at 09:38:47PM +0200, Rafael J. Wysocki wrote:
> > > On Saturday 15 May 2010, mgross wrote:
> > > > I apologize for the goofy email address.
> > > >
> > > > The following is a fix for the crash reported by Valdis.
> > > >
> > > > The problem was that the original pm_qos silently fails when a request
> > > > update is passed to a parameter that has not been added to the list
> > > > yet. It seems that the e1000e is doing this. This update restores this
> > > > behavior.
> > > >
> > > > I need to think about how to better handle such abuse, but for now this
> > > > restores the original behavior.
> > >
> > > Can you please post a signed-off incremental patch against
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git for-llinus
> > >
> > > that contains your original PM QOS update?
> >
> > No problem:
> >
> > Signed-off-by: markgross <markgross@thegnar.org>
>
> Thanks! Do you want to use this address for the sign-off or the Intel one?
This one, I guess. Ever since I switched groups within Intel last summer, my
mgross@linux.intel.com address isn't checked as often as this one.
The other option is to use my Outlook email (mark.gross@intel.com), but
I really hate posting from Outlook. Besides, doing upstream kernel
stuff isn't my day job any more, so using markgross@thegnar.org makes
sense to me.
thanks,
--mgross
>
> Rafael
>
>
> > From 487b8dcaeb66d3c226d4c06c1bd99689f93024be Mon Sep 17 00:00:00 2001
> > From: mgross <mgross@mgross-desktop.(none)>
> > Date: Sat, 15 May 2010 14:30:15 -0700
> > Subject: [PATCH] Guard against pm_qos users calling API before registering a proper
> > request.
> >
> > This update handles a use case where pm_qos update requests need to
> > silently fail if the update is being sent to a handle that is null.
> >
> > The problem was that the original pm_qos silently fails when a request
> > update is passed to a parameter that has not been added to the list yet.
> > This update restores that behavior.
> >
> > Signed-off-by: markgross <markgross@thegnar.org>
> >
> > ---
> > kernel/pm_qos_params.c | 26 ++++++++++++++------------
> > 1 files changed, 14 insertions(+), 12 deletions(-)
> >
> > diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> > index a1aea04..f42d3f7 100644
> > --- a/kernel/pm_qos_params.c
> > +++ b/kernel/pm_qos_params.c
> > @@ -252,19 +252,21 @@ void pm_qos_update_request(struct pm_qos_request_list *pm_qos_req,
> >  	int pending_update = 0;
> >  	s32 temp;
> >  
> > -	spin_lock_irqsave(&pm_qos_lock, flags);
> > -	if (new_value == PM_QOS_DEFAULT_VALUE)
> > -		temp = pm_qos_array[pm_qos_req->pm_qos_class]->default_value;
> > -	else
> > -		temp = new_value;
> > -
> > -	if (temp != pm_qos_req->value) {
> > -		pending_update = 1;
> > -		pm_qos_req->value = temp;
> > +	if (pm_qos_req) { /*guard against callers passing in null */
> > +		spin_lock_irqsave(&pm_qos_lock, flags);
> > +		if (new_value == PM_QOS_DEFAULT_VALUE)
> > +			temp = pm_qos_array[pm_qos_req->pm_qos_class]->default_value;
> > +		else
> > +			temp = new_value;
> > +
> > +		if (temp != pm_qos_req->value) {
> > +			pending_update = 1;
> > +			pm_qos_req->value = temp;
> > +		}
> > +		spin_unlock_irqrestore(&pm_qos_lock, flags);
> > +		if (pending_update)
> > +			update_target(pm_qos_req->pm_qos_class);
> >  	}
> > -	spin_unlock_irqrestore(&pm_qos_lock, flags);
> > -	if (pending_update)
> > -		update_target(pm_qos_req->pm_qos_class);
> >  }
> > EXPORT_SYMBOL_GPL(pm_qos_update_request);
> >
> >
>
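The guard added by the patch quoted above can be illustrated outside the kernel. The following is only a minimal userspace sketch, not the pm_qos code: struct qos_request, qos_update_request() and QOS_DEFAULT_VALUE are hypothetical stand-ins, and the locking and the update_target() call are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real pm_qos API. */
#define QOS_DEFAULT_VALUE (-1)

struct qos_request {
	int32_t value;		/* currently requested value */
	int32_t default_value;	/* used when the caller asks for the default */
};

/*
 * Update a request. Returns 1 if the stored value changed, 0 otherwise.
 * A NULL request (the caller never registered one) is silently ignored,
 * which is the behavior the patch restores.
 */
int qos_update_request(struct qos_request *req, int32_t new_value)
{
	int32_t temp;

	if (!req)	/* guard against callers passing in NULL */
		return 0;

	temp = (new_value == QOS_DEFAULT_VALUE) ? req->default_value : new_value;
	if (temp == req->value)
		return 0;

	req->value = temp;
	return 1;
}
```

An early return keeps the body at one indentation level; the patch itself wraps the body in an if block instead, which has the same effect.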
^ permalink raw reply
* Re: [PATCH 13/37] drivers/net/wireless/iwmc3200wifi: Use kmemdup
From: Samuel Ortiz @ 2010-05-16 23:01 UTC (permalink / raw)
To: Julia Lawall
Cc: Zhu, Yi, Intel Linux Wireless, John W. Linville,
linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, kernel-janitors@vger.kernel.org
In-Reply-To: <Pine.LNX.4.64.1005152316400.21345@ask.diku.dk>
On Sat, May 15, 2010 at 10:16:58PM +0100, Julia Lawall wrote:
> From: Julia Lawall <julia@diku.dk>
>
> Use kmemdup when some other buffer is immediately copied into the
> allocated region.
>
> A simplified version of the semantic patch that makes this change is as
> follows: (http://coccinelle.lip6.fr/)
>
> // <smpl>
> @@
> expression from,to,size,flag;
> statement S;
> @@
>
> - to = \(kmalloc\|kzalloc\)(size,flag);
> + to = kmemdup(from,size,flag);
> if (to==NULL || ...) S
> - memcpy(to, from, size);
> // </smpl>
>
> Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Cheers,
Samuel.
> ---
> drivers/net/wireless/iwmc3200wifi/rx.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff -u -p a/drivers/net/wireless/iwmc3200wifi/rx.c b/drivers/net/wireless/iwmc3200wifi/rx.c
> --- a/drivers/net/wireless/iwmc3200wifi/rx.c
> +++ b/drivers/net/wireless/iwmc3200wifi/rx.c
> @@ -321,14 +321,14 @@ iwm_rx_ticket_node_alloc(struct iwm_priv
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> -	ticket_node->ticket = kzalloc(sizeof(struct iwm_rx_ticket), GFP_KERNEL);
> +	ticket_node->ticket = kmemdup(ticket, sizeof(struct iwm_rx_ticket),
> +				      GFP_KERNEL);
>  	if (!ticket_node->ticket) {
>  		IWM_ERR(iwm, "Couldn't allocate RX ticket\n");
>  		kfree(ticket_node);
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> -	memcpy(ticket_node->ticket, ticket, sizeof(struct iwm_rx_ticket));
>  	INIT_LIST_HEAD(&ticket_node->node);
>  
>  	return ticket_node;
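For readers unfamiliar with kmemdup, the allocate-then-copy pattern that the semantic patch collapses can be sketched in userspace. This is an analogue built on malloc, not the kernel function; the name memdup_sketch() is made up for the example and there is no GFP flag argument.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue of the kernel's kmemdup(): allocate len bytes and
 * copy src into them. Returns NULL if the allocation fails, so the
 * caller's existing NULL check keeps working unchanged.
 */
void *memdup_sketch(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```

Because the copy overwrites every byte of the new buffer, the zeroing that kzalloc performed in the old code was redundant, which is why the replacement is safe in this hunk.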
--
Intel Open Source Technology Centre
http://oss.intel.com/
^ permalink raw reply
* Re: [patch] pm_qos update fixing mmotm 2010-05-11 -dies in pm_qos_update_request()
From: Rafael J. Wysocki @ 2010-05-16 22:21 UTC (permalink / raw)
To: markgross
Cc: mgross, Valdis.Kletnieks, e1000-devel, netdev, linux-kernel,
linux-pm, akpm, davem
In-Reply-To: <20100515214256.GA3506@thegnar.org>
On Saturday 15 May 2010, mgross wrote:
> On Sat, May 15, 2010 at 09:38:47PM +0200, Rafael J. Wysocki wrote:
> > On Saturday 15 May 2010, mgross wrote:
> > > I apologize for the goofy email address.
> > >
> > > The following is a fix for the crash reported by Valdis.
> > >
> > > The problem was that the original pm_qos silently fails when a request
> > > update is passed to a parameter that has not been added to the list
> > > yet. It seems that the e1000e is doing this. This update restores this
> > > behavior.
> > >
> > > I need to think about how to better handle such abuse, but for now this
> > > restores the original behavior.
> >
> > Can you please post a signed-off incremental patch against
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git for-llinus
> >
> > that contains your original PM QOS update?
>
> No problem:
>
> Signed-off-by: markgross <markgross@thegnar.org>
Thanks! Do you want to use this address for the sign-off or the Intel one?
Rafael
> From 487b8dcaeb66d3c226d4c06c1bd99689f93024be Mon Sep 17 00:00:00 2001
> From: mgross <mgross@mgross-desktop.(none)>
> Date: Sat, 15 May 2010 14:30:15 -0700
> Subject: [PATCH] Guard against pm_qos users calling API before registering a proper
> request.
>
> This update handles a use case where pm_qos update requests need to
> silently fail if the update is being sent to a handle that is null.
>
> The problem was that the original pm_qos silently fails when a request
> update is passed to a parameter that has not been added to the list yet.
> This update restores that behavior.
>
> Signed-off-by: markgross <markgross@thegnar.org>
>
> ---
> kernel/pm_qos_params.c | 26 ++++++++++++++------------
> 1 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> index a1aea04..f42d3f7 100644
> --- a/kernel/pm_qos_params.c
> +++ b/kernel/pm_qos_params.c
> @@ -252,19 +252,21 @@ void pm_qos_update_request(struct pm_qos_request_list *pm_qos_req,
>  	int pending_update = 0;
>  	s32 temp;
>  
> -	spin_lock_irqsave(&pm_qos_lock, flags);
> -	if (new_value == PM_QOS_DEFAULT_VALUE)
> -		temp = pm_qos_array[pm_qos_req->pm_qos_class]->default_value;
> -	else
> -		temp = new_value;
> -
> -	if (temp != pm_qos_req->value) {
> -		pending_update = 1;
> -		pm_qos_req->value = temp;
> +	if (pm_qos_req) { /*guard against callers passing in null */
> +		spin_lock_irqsave(&pm_qos_lock, flags);
> +		if (new_value == PM_QOS_DEFAULT_VALUE)
> +			temp = pm_qos_array[pm_qos_req->pm_qos_class]->default_value;
> +		else
> +			temp = new_value;
> +
> +		if (temp != pm_qos_req->value) {
> +			pending_update = 1;
> +			pm_qos_req->value = temp;
> +		}
> +		spin_unlock_irqrestore(&pm_qos_lock, flags);
> +		if (pending_update)
> +			update_target(pm_qos_req->pm_qos_class);
>  	}
> -	spin_unlock_irqrestore(&pm_qos_lock, flags);
> -	if (pending_update)
> -		update_target(pm_qos_req->pm_qos_class);
>  }
> EXPORT_SYMBOL_GPL(pm_qos_update_request);
>
>
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Eric Dumazet @ 2010-05-16 21:26 UTC (permalink / raw)
To: Krzysztof Olędzki; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF05FC2.4020804@ans.pl>
Le dimanche 16 mai 2010 à 23:12 +0200, Krzysztof Olędzki a écrit :
> On 2010-05-16 22:47, Eric Dumazet wrote:
> > Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
> >> On 2010-05-16 22:15, Eric Dumazet wrote:
> >
> >>> All tx packets through bonding will use txqueue 0, since bnx2 doesnt
> >>> provide a ndo_select_queue() function.
> >>
> >> OK, that explains everything. Thank you Eric. I assume it may take some
> >> time for bonding to become multiqueue aware and/or bnx2x to provide
> >> ndo_select_queue?
> >>
> >
> > bonding might become multiqueue aware, there are several patches
> > floating around.
> >
> > But with your ping tests, it wont change the selected txqueue anyway (it
> > will be the same for any targets, because skb_tx_hash() wont hash the
> > destination address, only the skb->protocol.
>
> What do you mean by "wont hash the destination address, only the
> skb->protocol"? It won't hash the destination address for ICMP or for
> all IP protocols?
Locally generated ICMP packets all use the same tx queue, because
sk->sk_hash is not set:

	if (skb->sk && skb->sk->sk_hash)
		hash = skb->sk->sk_hash;
	else
		hash = (__force u16) skb->protocol;

	hash = jhash_1word(hash, hashrnd);

	return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);

However, replies will be spread across the four queues if the hardware is
capable of hashing ICMP packets using the IP addresses (source/destination).
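The queue selection described above can be tried in userspace. The sketch below reuses the multiply-and-shift from the quoted return statement, but everything else is an assumption made for illustration: mix_sketch() is a stand-in mixer and not the kernel's jhash_1word(), and a zero sk_hash argument models "no socket hash set".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a 32-bit hash into [0, num_queues) without a modulo, using the
 * same arithmetic as the quoted return statement.
 */
uint16_t scale_hash_to_queue(uint32_t hash, unsigned int num_queues)
{
	assert(num_queues > 0);
	return (uint16_t)(((uint64_t)hash * num_queues) >> 32);
}

/*
 * Stand-in mixer, NOT the kernel's jhash_1word(); any function that
 * depends only on its two inputs makes the same point.
 */
uint32_t mix_sketch(uint32_t key, uint32_t seed)
{
	uint32_t h = key ^ seed;

	h ^= h >> 16;
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return h;
}

/*
 * Pick a tx queue the way the quoted snippet does: use the socket hash
 * when there is one, else fall back to the protocol number.
 */
uint16_t pick_tx_queue(uint32_t sk_hash, uint16_t protocol,
		       uint32_t seed, unsigned int num_queues)
{
	uint32_t hash = sk_hash ? sk_hash : protocol;

	return scale_hash_to_queue(mix_sketch(hash, seed), num_queues);
}
```

The destination address is not an input to pick_tx_queue(), so with no socket hash every ping target yields the same queue for a given protocol and seed. With num_queues equal to 1 the result is always 0, whatever the hash.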
>
> My normal workload is TCP and UDP based so if it is only ICMP then there
> is no problem. Actually I have noticeably more UDP traffic than an
> average network, mainly because of LWAPP/CAPWAP, so I'm interested in
> good performance for both TCP and UDP.
>
> During my initial tests ICMP ping showed the same behavior like UDP/TCP
> with iperf, so I sticked with it. I'll redo everyting with UDP and TCP
> of course. :)
>
> >> BTW: With a normal router workload, should I expect big performance drop
> >> when receiving and forwarding the same packet using different CPUs?
> >> Bonding provides very important functionality, I'm not able to drop it. :(
> >>
> >
> > Not sure what you mean by forwarding same packet using different CPUs.
> > You probably meant different queues, because in normal case, only one
> > cpu is involved (the one receiving the packet is also the one
> > transmitting it, unless you have congestion or trafic shaping)
>
> I mean to receive it on a one CPU and to send it on a different one. I
> would like to assing different vectors (eth1-0 .. eth1-4) to different
> CPUs, but with bnx2x+bonding packets are received on queues 1-4 (eth1-1
> .. eth1-4) and sent from queue 0 (eth1-0). So, for a one packet, two
> different CPUs will be involved (RX on q1-q4, TX on q0).
As I said, unless you use RPS, a forwarded packet uses only one CPU.
How the tx queue is selected is another story; we try to do a 1-1 mapping.
>
> > If you have 4 cpus, you can use following patch and have a transparent
> > bonding against multiqueue.
>
> Thanks! If I get it right: with the patch, packets should be sent using
> the same CPU (queue?) that was used when receiving?
Yes, for forwarding loads.
(You might use 5 or 8 instead of 4, because it's not clear to me whether bnx2
has 5 txqueues or 4 in your case.)
>
> > Still bonding xmit path hits a global
> > rwlock, so performance is not what you can get without bonding.
>
> It may not be perfect, but it should be much better than nothing, right?
>
Sure.
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 21:12 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274042826.2299.26.camel@edumazet-laptop>
On 2010-05-16 22:47, Eric Dumazet wrote:
> Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
>> On 2010-05-16 22:15, Eric Dumazet wrote:
>
>>> All tx packets through bonding will use txqueue 0, since bnx2 doesnt
>>> provide a ndo_select_queue() function.
>>
>> OK, that explains everything. Thank you Eric. I assume it may take some
>> time for bonding to become multiqueue aware and/or bnx2x to provide
>> ndo_select_queue?
>>
>
> bonding might become multiqueue aware, there are several patches
> floating around.
>
> But with your ping tests, it wont change the selected txqueue anyway (it
> will be the same for any targets, because skb_tx_hash() wont hash the
> destination address, only the skb->protocol.
What do you mean by "won't hash the destination address, only the
skb->protocol"? Does it skip the destination address only for ICMP, or for
all IP protocols?
My normal workload is TCP and UDP based, so if it is only ICMP then there
is no problem. Actually I have noticeably more UDP traffic than an
average network, mainly because of LWAPP/CAPWAP, so I'm interested in
good performance for both TCP and UDP.
During my initial tests ICMP ping showed the same behavior as UDP/TCP
with iperf, so I stuck with it. I'll redo everything with UDP and TCP
of course. :)
>> BTW: With a normal router workload, should I expect big performance drop
>> when receiving and forwarding the same packet using different CPUs?
>> Bonding provides very important functionality, I'm not able to drop it. :(
>>
>
> Not sure what you mean by forwarding same packet using different CPUs.
> You probably meant different queues, because in normal case, only one
> cpu is involved (the one receiving the packet is also the one
> transmitting it, unless you have congestion or trafic shaping)
I mean receiving it on one CPU and sending it on a different one. I
would like to assign different vectors (eth1-0 .. eth1-4) to different
CPUs, but with bnx2x+bonding packets are received on queues 1-4 (eth1-1
.. eth1-4) and sent from queue 0 (eth1-0). So, for one packet, two
different CPUs will be involved (RX on q1-q4, TX on q0).
> If you have 4 cpus, you can use following patch and have a transparent
> bonding against multiqueue.
Thanks! If I get it right: with the patch, packets should be sent using
the same CPU (queue?) that was used when receiving?
> Still bonding xmit path hits a global
> rwlock, so performance is not what you can get without bonding.
It may not be perfect, but it should be much better than nothing, right?
Best regards,
Krzysztof Olędzki
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: George B. @ 2010-05-16 21:06 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Krzysztof Olędzki, Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274042826.2299.26.camel@edumazet-laptop>
2010/5/16 Eric Dumazet <eric.dumazet@gmail.com>:
> Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
>> On 2010-05-16 22:15, Eric Dumazet wrote:
>
>> > All tx packets through bonding will use txqueue 0, since bnx2 doesnt
>> > provide a ndo_select_queue() function.
>>
>> OK, that explains everything. Thank you Eric. I assume it may take some
>> time for bonding to become multiqueue aware and/or bnx2x to provide
>> ndo_select_queue?
>>
>
> bonding might become multiqueue aware, there are several patches
> floating around.
>
> But with your ping tests, it wont change the selected txqueue anyway (it
> will be the same for any targets, because skb_tx_hash() wont hash the
> destination address, only the skb->protocol.
>
>> BTW: With a normal router workload, should I expect big performance drop
>> when receiving and forwarding the same packet using different CPUs?
>> Bonding provides very important functionality, I'm not able to drop it. :(
>>
>
> Not sure what you mean by forwarding same packet using different CPUs.
> You probably meant different queues, because in normal case, only one
> cpu is involved (the one receiving the packet is also the one
> transmitting it, unless you have congestion or trafic shaping)
>
> If you have 4 cpus, you can use following patch and have a transparent
> bonding against multiqueue. Still bonding xmit path hits a global
> rwlock, so performance is not what you can get without bonding.
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 5e12462..2c257f7 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)
>
>  	rtnl_lock();
>  
> -	bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
> -				bond_setup);
> +	bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
> +				   bond_setup, 4);
>  	if (!bond_dev) {
>  		pr_err("%s: eek! can't alloc netdev!\n", name);
>  		rtnl_unlock();
>
>
FWIW, later this week I will be comparing VLANs on bonded Ethernet
interfaces against bonds of VLAN interfaces (create a VLAN on each of two
interfaces and bond them together) to see if I can notice any performance
difference. I expect I will when two or more VLANs are experiencing
heavy traffic. What concerns me is whether, if one Ethernet link goes
away, the bond interface will see that the Ethernet interface underlying
the VLAN interface has gone down.
So in summary, rather than bonding Ethernet interfaces and then
applying VLANs to the bond, I intend to create VLANs on the Ethernet
interfaces and bond them: one bond interface per VLAN, plus one for
the "raw" interfaces. I am hoping that will allow better throughput
with multiple processors (and less head-of-line blocking for VLANs
with low traffic rates). Note: that configuration doesn't work with
2.6.32 and I haven't tried it with 2.6.33; 2.6.34-rc7 lets me configure
it, though I haven't tested it yet on a multiqueue Ethernet device with
multiple processors. I should have some systems to test with later
this week.
^ permalink raw reply
* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Eric Dumazet @ 2010-05-16 20:48 UTC (permalink / raw)
To: Bijay Singh
Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
<bhutchings@solarflare.com>, netdev, Ilpo Järvinen
In-Reply-To: <1273611036.2512.18.camel@edumazet-laptop>
Le mardi 11 mai 2010 à 22:50 +0200, Eric Dumazet a écrit :
> Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit :
> > Hi Eric,
> >
> > I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches.
> >
> > Thanks
>
>
> I believe third problem comes from commit 4957faad
> (TCPCT part 1g: Responder Cookie => Initiator), from William Allen
> Simpson.
>
> When a SYN-ACK packet is built (in tcp_synack_options()),
> it specifically forbids a TIMESTAMP option to be included if SACK is
> also selected :
>
> doing_ts &= !ireq->sack_ok;
>
> Problem is this mask is done on a local variable. socket is still marked
> as being timestamp enabled.
>
>
> Later, when we build tcp options for data packets, we _include_ a
> timestamp, while our SYNACK didnt mention the option.
>
> So the following trafic can happen (and fails) :
>
> 18:38:29.041966 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [S], seq 4014064674, win 8860, options [mss 4430,sackOK,TS val 519041 ecr 0,nop,wscale 7,nop,nop,md5can't check - 9b44126367effcf3247fcbf6da76b24d], length 0
> 18:38:29.042072 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [S.], seq 586328714, ack 4014064675, win 5792, options [nop,nop,md5can't check - badd847799ded46f39642c341cc7e92b,mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
> 18:38:29.042093 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], ack 1, win 70, options [nop,nop,md5can't check - 3994ef6987df02a592963fba04c5d313], length 0
> 18:38:29.043217 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], seq 1:1441, ack 1, win 70, options [nop,nop,md5can't check - 8399f7ccab3a6b8c5a3027ed58bba314], length 1440
> 18:38:29.043226 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [P.], seq 1441:2501, ack 1, win 70, options [nop,nop,md5can't check - 701ebf65b1894a6bed4cefbf7a56596a], length 1060
> 18:38:29.043374 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 1441, win 68, options [nop,nop,md5can't check - 1badb315ba436ab59bff5b37daa871be,nop,nop,TS val 113051377 ecr 519041], length 0
> 18:38:29.043383 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 2501, win 91, options [nop,nop,md5can't check - 120564dcb99f822f3b70910282a6ed9d,nop,nop,TS val 113051377 ecr 519041], length 0
> 18:38:29.043673 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051377 ecr 519041], length 1428
> 18:38:29.043681 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [P.], seq 1429:2500, ack 2501, win 91, options [nop,nop,md5can't check - 7a910cd5ff357bf0e2c8d3489aafaa86,nop,nop,TS val 113051377 ecr 519041], length 1071
> 18:38:32.037786 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051677 ecr 519041], length 1428
> 18:38:38.037708 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113052277 ecr 519041], length 1428
> 18:38:50.037524 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113053477 ecr 519041], length 1428
>
>
> Could you try following patch ?
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 5db3a2c..0be21cd 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -668,7 +668,7 @@ static unsigned tcp_synack_options(struct sock *sk,
>  	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
>  			 xvp->cookie_plus :
>  			 0;
> -	bool doing_ts = ireq->tstamp_ok;
> +	bool doing_ts;
>  
>  #ifdef CONFIG_TCP_MD5SIG
>  	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
> @@ -681,11 +681,12 @@ static unsigned tcp_synack_options(struct sock *sk,
>  		 * rather than TS in order to fit in better with old,
>  		 * buggy kernels, but that was deemed to be unnecessary.
>  		 */
> -		doing_ts &= !ireq->sack_ok;
> +		ireq->tstamp_ok &= !ireq->sack_ok;
>  	}
>  #else
>  	*md5 = NULL;
>  #endif
> +	doing_ts = ireq->tstamp_ok;
>  
>  	/* We always send an MSS option. */
>  	opts->mss = mss;
>
>
>
>
Bijay, have you tested this patch, by any chance?
Thanks
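The difference between masking a local copy and masking the stored flag, which is what the quoted patch changes, can be shown with a reduced model. This is a hedged sketch only: struct req_sketch and both functions are hypothetical, and the md5 parameter stands in for the condition that guards the masked line, which is not fully visible in the quoted hunk.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the request state in the patch. */
struct req_sketch {
	bool tstamp_ok;	/* timestamps enabled for this connection */
	bool sack_ok;	/* SACK enabled for this connection */
};

/* Old behavior: only a local copy is masked; the request is unchanged. */
bool synack_ts_local_only(const struct req_sketch *req, bool md5)
{
	bool doing_ts = req->tstamp_ok;

	if (md5)
		doing_ts &= !req->sack_ok;
	return doing_ts;	/* what the SYN-ACK advertises */
}

/* Patched behavior: the decision is written back to the request. */
bool synack_ts_write_back(struct req_sketch *req, bool md5)
{
	if (md5)
		req->tstamp_ok &= !req->sack_ok;
	return req->tstamp_ok;
}
```

With MD5 and SACK both on, the first function reports no timestamp for the SYN-ACK yet leaves tstamp_ok set, so later code that reads the stored flag would still add timestamps to data packets. The second function leaves both views in agreement.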
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Eric Dumazet @ 2010-05-16 20:47 UTC (permalink / raw)
To: Krzysztof Olędzki; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF056F0.8010008@ans.pl>
Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
> On 2010-05-16 22:15, Eric Dumazet wrote:
> > All tx packets through bonding will use txqueue 0, since bnx2 doesnt
> > provide a ndo_select_queue() function.
>
> OK, that explains everything. Thank you Eric. I assume it may take some
> time for bonding to become multiqueue aware and/or bnx2x to provide
> ndo_select_queue?
>
Bonding might become multiqueue aware; there are several patches
floating around.
But with your ping tests, it won't change the selected txqueue anyway: it
will be the same for any target, because skb_tx_hash() won't hash the
destination address, only skb->protocol.
> BTW: With a normal router workload, should I expect big performance drop
> when receiving and forwarding the same packet using different CPUs?
> Bonding provides very important functionality, I'm not able to drop it. :(
>
Not sure what you mean by forwarding the same packet using different CPUs.
You probably meant different queues, because in the normal case only one
CPU is involved (the one receiving the packet is also the one
transmitting it, unless you have congestion or traffic shaping).
If you have 4 CPUs, you can use the following patch to make bonding
transparent with respect to multiqueue. Still, the bonding xmit path hits a
global rwlock, so performance is not what you can get without bonding.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5e12462..2c257f7 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)
 
 	rtnl_lock();
 
-	bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
-				bond_setup);
+	bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
+				   bond_setup, 4);
 	if (!bond_dev) {
 		pr_err("%s: eek! can't alloc netdev!\n", name);
 		rtnl_unlock();
^ permalink raw reply related
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 20:34 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274040928.2299.17.camel@edumazet-laptop>
On 2010-05-16 22:15, Eric Dumazet wrote:
> Le dimanche 16 mai 2010 à 13:00 -0700, Michael Chan a écrit :
>> Krzysztof Oledzki wrote:
>>
>>> On 2010-05-16 20:51, Michael Chan wrote:
>>>> Krzysztof Oledzki wrote:
>>>>
>>>>>
>>>>> Why the driver registers 5 interrupts instead of 4? How to
>>>>> limit it to 4?
>>>>>
>>>>
>>>> The first vector (eth0-0) handles link interrupt and other slow
>>>> path events. It also has an RX ring for non-IP packets that are
>>>> not hashed by the RSS hash. The majority of the rx packets should
>>>> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
>>>> vectors to different CPUs.
>>>
>>> Thank you for your prompt response.
>>>
>>> In my case the first vector must be handling something more:
>>> - "ping -f 192.168.0.1" increases interrupts on both eth1-0
>>> and eth1-4
>>> - "ping -f 192.168.0.2" increases interrupts on both eth1-0
>>> and eth1-3
>>> - "ping -f 192.168.0.3" increases interrupts on both eth1-0
>>> and eth1-1
>>> - "ping -f 192.168.0.7" increases interrupts on both eth1-0
>>> and eth1-2
>>>
>>>            CPU0       CPU1       CPU2       CPU3
>>>  67:    1563979          0          0          0   PCI-MSI-edge      eth1-0
>>>  68:    1072869          0          0          0   PCI-MSI-edge      eth1-1
>>>  69:     137905          0          0          0   PCI-MSI-edge      eth1-2
>>>  70:     259246          0          0          0   PCI-MSI-edge      eth1-3
>>>  71:     760252          0          0          0   PCI-MSI-edge      eth1-4
>>>
>>> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
>>
>> I think that ICMP ping packets will always go to ring 0 (eth1-0)
>> because they are non-IP packets. I need to double check tomorrow
>> on how exactly the hashing works on RX. Can you try running IP
>> traffic? IP packets should theoretically go to rings 1 - 4.
>>
>
> ICMP packets are IP packets (Protocol=1)
Exactly. However, the firmware may handle ICMP and TCP in a different way.
>>> So, it seems that TX or RX is always handled by the first vector.
>>> I'll try to find if it is TX or RX.
>>>
>>> BTW: I'm using .1Q vlans over bonding, does it change anything?
>>
>> That should not matter, as the VLAN tag is stripped before hashing.
>
> warning, bonding currently is not multiqueue aware.
>
> All tx packets through bonding will use txqueue 0, since bnx2 doesnt
> provide a ndo_select_queue() function.
OK, that explains everything. Thank you Eric. I assume it may take some
time for bonding to become multiqueue aware and/or bnx2x to provide
ndo_select_queue?
BTW: With a normal router workload, should I expect big performance drop
when receiving and forwarding the same packet using different CPUs?
Bonding provides very important functionality, I'm not able to drop it. :(
Best regards,
Krzysztof Olędzki
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Michael Chan @ 2010-05-16 20:24 UTC (permalink / raw)
To: 'Eric Dumazet'
Cc: 'Krzysztof Oledzki', netdev@vger.kernel.org
In-Reply-To: <1274040928.2299.17.camel@edumazet-laptop>
Eric Dumazet wrote:
> > I think that ICMP ping packets will always go to ring 0 (eth1-0)
> > because they are non-IP packets. I need to double check tomorrow
> > on how exactly the hashing works on RX. Can you try running IP
> > traffic? IP packets should theoretically go to rings 1 - 4.
> >
>
> ICMP packets are IP packets (Protocol=1)
>
Sorry, Eric is right. Anyway, I'll check on the hashing to see how
it works on UDP, TCP, and other packets.
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Eric Dumazet @ 2010-05-16 20:15 UTC (permalink / raw)
To: Michael Chan; +Cc: 'Krzysztof Oledzki', netdev@vger.kernel.org
In-Reply-To: <C27F8246C663564A84BB7AB3439772421B7814753A@IRVEXCHCCR01.corp.ad.broadcom.com>
Le dimanche 16 mai 2010 à 13:00 -0700, Michael Chan a écrit :
> Krzysztof Oledzki wrote:
>
> > On 2010-05-16 20:51, Michael Chan wrote:
> > > Krzysztof Oledzki wrote:
> > >
> > >>
> > >> Why the driver registers 5 interrupts instead of 4? How to
> > >> limit it to 4?
> > >>
> > >
> > > The first vector (eth0-0) handles link interrupt and other slow
> > > path events. It also has an RX ring for non-IP packets that are
> > > not hashed by the RSS hash. The majority of the rx packets should
> > > be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> > > vectors to different CPUs.
> >
> > Thank you for your prompt response.
> >
> > In my case the first vector must be handling something more:
> > - "ping -f 192.168.0.1" increases interrupts on both eth1-0
> > and eth1-4
> > - "ping -f 192.168.0.2" increases interrupts on both eth1-0
> > and eth1-3
> > - "ping -f 192.168.0.3" increases interrupts on both eth1-0
> > and eth1-1
> > - "ping -f 192.168.0.7" increases interrupts on both eth1-0
> > and eth1-2
> >
> > >            CPU0       CPU1       CPU2       CPU3
> > >  67:    1563979          0          0          0   PCI-MSI-edge      eth1-0
> > >  68:    1072869          0          0          0   PCI-MSI-edge      eth1-1
> > >  69:     137905          0          0          0   PCI-MSI-edge      eth1-2
> > >  70:     259246          0          0          0   PCI-MSI-edge      eth1-3
> > >  71:     760252          0          0          0   PCI-MSI-edge      eth1-4
> >
> > As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
>
> I think that ICMP ping packets will always go to ring 0 (eth1-0)
> because they are non-IP packets. I need to double check tomorrow
> on how exactly the hashing works on RX. Can you try running IP
> traffic? IP packets should theoretically go to rings 1 - 4.
>
ICMP packets are IP packets (Protocol=1)
> >
> > So, it seems that TX or RX is always handled by the first vector.
> > I'll try to find if it is TX or RX.
> >
> > BTW: I'm using .1Q vlans over bonding, does it change anything?
>
> That should not matter, as the VLAN tag is stripped before hashing.
Warning: bonding is currently not multiqueue aware.
All tx packets through bonding will use txqueue 0, since bnx2 doesn't
provide a ndo_select_queue() function.
^ permalink raw reply
* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Eric Dumazet @ 2010-05-16 19:53 UTC (permalink / raw)
To: David Miller
Cc: shemminger, Bijay.Singh, bhaskie, bhutchings, netdev,
ilpo.jarvinen
In-Reply-To: <20100512.152406.193725816.davem@davemloft.net>
On Wednesday, 12 May 2010 at 15:24 -0700, David Miller wrote:
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Wed, 12 May 2010 15:22:07 -0700
>
> > Yes, that looks like a possible bug, not sure what hardware
> > generates frag_list.
>
> GRO generates frag_list
ixgbe (82599) too, if I understand this driver correctly (TCP Receive Side
Coalescing, RSC)
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Michael Chan @ 2010-05-16 20:00 UTC (permalink / raw)
To: 'Krzysztof Oledzki'; +Cc: netdev@vger.kernel.org
In-Reply-To: <4BF0465A.5030307@ans.pl>
Krzysztof Oledzki wrote:
> On 2010-05-16 20:51, Michael Chan wrote:
> > Krzysztof Oledzki wrote:
> >
> >>
> >> Why the driver registers 5 interrupts instead of 4? How to
> >> limit it to 4?
> >>
> >
> > The first vector (eth0-0) handles link interrupt and other slow
> > path events. It also has an RX ring for non-IP packets that are
> > not hashed by the RSS hash. The majority of the rx packets should
> > be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> > vectors to different CPUs.
>
> Thank you for your prompt response.
>
> In my case the first vector must be handling something more:
> - "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
> - "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
> - "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
> - "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2
>
> CPU0 CPU1 CPU2 CPU3
> 67: 1563979 0 0 0 PCI-MSI-edge eth1-0
> 68: 1072869 0 0 0 PCI-MSI-edge eth1-1
> 69: 137905 0 0 0 PCI-MSI-edge eth1-2
> 70: 259246 0 0 0 PCI-MSI-edge eth1-3
> 71: 760252 0 0 0 PCI-MSI-edge eth1-4
>
> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
I think that ICMP ping packets will always go to ring 0 (eth1-0)
because they are non-IP packets. I need to double check tomorrow
on how exactly the hashing works on RX. Can you try running IP
traffic? IP packets should theoretically go to rings 1 - 4.
>
> So, it seems that TX or RX is always handled by the first vector.
> I'll try to find if it is TX or RX.
>
> BTW: I'm using .1Q vlans over bonding, does it change anything?
That should not matter, as the VLAN tag is stripped before hashing.
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 19:49 UTC (permalink / raw)
To: Michael Chan; +Cc: netdev@vger.kernel.org
In-Reply-To: <4BF0465A.5030307@ans.pl>
On 2010-05-16 21:24, Krzysztof Olędzki wrote:
> On 2010-05-16 20:51, Michael Chan wrote:
>> Krzysztof Oledzki wrote:
>>
>>>
>>> Why the driver registers 5 interrupts instead of 4? How to
>>> limit it to 4?
>>>
>>
>> The first vector (eth0-0) handles link interrupt and other slow
>> path events. It also has an RX ring for non-IP packets that are
>> not hashed by the RSS hash. The majority of the rx packets should
>> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
>> vectors to different CPUs.
>
> Thank you for your prompt response.
>
> In my case the first vector must be handling something more:
> - "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
> - "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
> - "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
> - "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2
>
> CPU0 CPU1 CPU2 CPU3
> 67: 1563979 0 0 0 PCI-MSI-edge eth1-0
> 68: 1072869 0 0 0 PCI-MSI-edge eth1-1
> 69: 137905 0 0 0 PCI-MSI-edge eth1-2
> 70: 259246 0 0 0 PCI-MSI-edge eth1-3
> 71: 760252 0 0 0 PCI-MSI-edge eth1-4
>
> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
>
> So, it seems that TX or RX is always handled by the first vector.
> I'll try to find if it is TX or RX.
>
> BTW: I'm using .1Q vlans over bonding, does it change anything?
It looks like TX for locally generated packets is always performed on
eth1-0. I guess it should look differently for forwarded packets?
Best regards,
Krzysztof Olędzki
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 19:24 UTC (permalink / raw)
To: Michael Chan; +Cc: netdev@vger.kernel.org
In-Reply-To: <C27F8246C663564A84BB7AB3439772421B78147539@IRVEXCHCCR01.corp.ad.broadcom.com>
On 2010-05-16 20:51, Michael Chan wrote:
> Krzysztof Oledzki wrote:
>
>>
>> Why the driver registers 5 interrupts instead of 4? How to
>> limit it to 4?
>>
>
> The first vector (eth0-0) handles link interrupt and other slow
> path events. It also has an RX ring for non-IP packets that are
> not hashed by the RSS hash. The majority of the rx packets should
> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> vectors to different CPUs.
Thank you for your prompt response.
In my case the first vector must be handling something more:
- "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
- "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
- "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
- "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2
CPU0 CPU1 CPU2 CPU3
67: 1563979 0 0 0 PCI-MSI-edge eth1-0
68: 1072869 0 0 0 PCI-MSI-edge eth1-1
69: 137905 0 0 0 PCI-MSI-edge eth1-2
70: 259246 0 0 0 PCI-MSI-edge eth1-3
71: 760252 0 0 0 PCI-MSI-edge eth1-4
As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
So, it seems that TX or RX is always handled by the first vector.
I'll try to find if it is TX or RX.
BTW: I'm using .1Q vlans over bonding, does it change anything?
Best regards,
Krzysztof Olędzki
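
The comparison between eth1-0 and the other four vectors can be scripted rather than eyeballed. This is a rough sketch, assuming the simplified `/proc/interrupts` layout shown in this message (a CPU header row, then one row per IRQ with the vector name in the last column); the function name `parse_interrupts` is made up for illustration. The counters are cumulative, so taking a snapshot before and after a test run and subtracting would likely say more than a single reading.

```python
def parse_interrupts(text: str) -> dict:
    """Map each vector name (last column) to its count summed over all CPUs."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    totals = {}
    for ln in lines[1:]:
        fields = ln.split()
        counts = [int(c) for c in fields[1:1 + ncpus]]
        totals[fields[-1]] = sum(counts)
    return totals

# Table copied from the message above.
SAMPLE = """\
CPU0 CPU1 CPU2 CPU3
67: 1563979 0 0 0 PCI-MSI-edge eth1-0
68: 1072869 0 0 0 PCI-MSI-edge eth1-1
69: 137905 0 0 0 PCI-MSI-edge eth1-2
70: 259246 0 0 0 PCI-MSI-edge eth1-3
71: 760252 0 0 0 PCI-MSI-edge eth1-4
"""

totals = parse_interrupts(SAMPLE)
other_vectors = sum(v for name, v in totals.items() if name != "eth1-0")
print("eth1-0:", totals["eth1-0"], "eth1-1..4:", other_vectors)
```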
^ permalink raw reply
* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Michael Chan @ 2010-05-16 18:51 UTC (permalink / raw)
To: 'Krzysztof Oledzki'; +Cc: netdev@vger.kernel.org
In-Reply-To: <alpine.LNX.1.10.1005161511490.6004@bizon.gios.gov.pl>
Krzysztof Oledzki wrote:
>
> Why the driver registers 5 interrupts instead of 4? How to
> limit it to 4?
>
The first vector (eth0-0) handles link interrupt and other slow
path events. It also has an RX ring for non-IP packets that are
not hashed by the RSS hash. The majority of the rx packets should
be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
vectors to different CPUs.
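
One way to act on this advice is to write a one-CPU bitmask into `/proc/irq/<irq>/smp_affinity` for each RX-ring vector. The sketch below only prints the commands and assumes the IRQ numbers 62 to 65 from the original report (eth1-1 through eth1-4); the helper name `affinity_masks` is hypothetical, and the real IRQ numbers should be read from `/proc/interrupts` on the machine in question.

```python
def affinity_masks(irqs, ncpus):
    """Pair each IRQ with a single-CPU hex mask, round-robin over ncpus."""
    return {irq: format(1 << (i % ncpus), "x") for i, irq in enumerate(irqs)}

# IRQs of eth1-1 .. eth1-4 as listed in the original report.
rx_ring_irqs = [62, 63, 64, 65]

for irq, mask in affinity_masks(rx_ring_irqs, ncpus=4).items():
    # Printed rather than executed: writing smp_affinity needs root.
    print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
```

The slow-path vector (eth1-0) is left alone here; where to pin it is a judgment call the thread does not settle.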
^ permalink raw reply
* Re: [PATCH net-next-2.6] net: Consistent skb timestamping
From: Dimitris Michailidis @ 2010-05-16 18:30 UTC (permalink / raw)
To: David Miller; +Cc: therbert, eric.dumazet, netdev
In-Reply-To: <20100515.235635.63009445.davem@davemloft.net>
David Miller wrote:
> The real fix is to make the devices less stupid and give us timestamps
> directly, and thanks to things like PTP support in hardware that's
> actually more and more of a reality these days.
For cxgb4 a timestamp is written into Rx descriptors for each received
packet. The value comes from a TSC-like cycle counter. The raw timestamp
is very cheap to get, its value converted to system ktime a bit less so
though not too bad. It would be nicer though if the stack could hint the
driver whether it should do the conversion at all. Maybe export
netstamp_needed and add an inline wrapper to read it?
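
The idea of letting the stack hint whether conversion is needed can be modelled in a few lines. This is a toy user-space sketch, not driver code: the class `RxTimestamper`, its `needed` flag (standing in for a readable netstamp_needed) and the 250 MHz counter frequency are all assumptions made for illustration.

```python
class RxTimestamper:
    """Toy model: keep the raw counter, convert only when someone wants it."""

    def __init__(self, counter_hz: int):
        self.counter_hz = counter_hz
        self.needed = False      # stand-in for the stack's "timestamps wanted" hint
        self.conversions = 0     # how many conversions were actually performed

    def stamp(self, raw_cycles: int):
        if not self.needed:
            return None          # skip the conversion entirely
        self.conversions += 1
        return raw_cycles * 1_000_000_000 // self.counter_hz   # cycles to ns

ts = RxTimestamper(counter_hz=250_000_000)   # hypothetical 250 MHz counter
print(ts.stamp(500))     # hint off: no conversion is done
ts.needed = True
print(ts.stamp(500))     # hint on: raw cycles converted to nanoseconds
```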
^ permalink raw reply
* Re: Weird TCP retransmit behaviour in recent kernels
From: Michael Smith @ 2010-05-16 16:08 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: Netdev
In-Reply-To: <alpine.DEB.2.00.1005160118240.30522@melkinpaasi.cs.helsinki.fi>
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1091 bytes --]
On Sun, 16 May 2010, Ilpo Järvinen wrote:
> On Fri, 14 May 2010, Michael Smith wrote:
>> It seems like when consecutive packets are lost, the SLES11
>> server retransmits the first packet when the timeout fires. The client
>> ACKs, but the server doesn't retransmit the next lost packet; instead,
>> it sends a couple more new packets, which don't get ACKed.
> This is where your problem is, they should get acked in a _compliant_
> network (with duplicate ACKs).
> Some have seen similar phenomena, every time it has been fault in some
> middlebox/peer that does not do what it should. You can disable frto
> using tcp_frto sysctl if you like, however, I disagree with you as I'm
> pretty sure there is some broken middlebox in the network (which is trying
> to be too intelligent).
Thanks - tcp_frto=0 works around the problem here. The network in the
middle is provided by a number of other parties, so I can try to point
them in the right direction, but unless Microsoft turns on FRTO by default
sometime soon, I doubt they will have time to care. :)
Mike
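
For reference, the workaround corresponds to the `net.ipv4.tcp_frto` sysctl, which is exposed as a file under `/proc/sys`. A small sketch of reading and writing it follows; the helper names and the `root` parameter (there so the code can be pointed at a scratch directory) are illustrative, and writing the real file requires root.

```python
from pathlib import Path

def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Read a sysctl such as 'net.ipv4.tcp_frto' via its /proc/sys file."""
    return (Path(root) / name.replace(".", "/")).read_text().strip()

def write_sysctl(name: str, value: str, root: str = "/proc/sys") -> None:
    """Write a sysctl value; on a real system this needs root privileges."""
    (Path(root) / name.replace(".", "/")).write_text(f"{value}\n")

try:
    print("tcp_frto =", read_sysctl("net.ipv4.tcp_frto"))
except OSError as exc:           # non-Linux host or restricted /proc
    print("could not read tcp_frto:", exc)
```

Disabling F-RTO would then be `write_sysctl("net.ipv4.tcp_frto", "0")`, which only lasts until the next reboot unless it is also added to the system's sysctl configuration.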
^ permalink raw reply
* Re: VLAN I/F's and TX queue.
From: Joakim Tjernlund @ 2010-05-16 14:22 UTC (permalink / raw)
To: David Miller; +Cc: eric.dumazet, kaber, netdev
In-Reply-To: <20100516.004041.73702151.davem@davemloft.net>
David Miller <davem@davemloft.net> wrote on 2010/05/16 09:40:41:
>
> From: Joakim Tjernlund <joakim.tjernlund@transmode.se>
> Date: Mon, 10 May 2010 16:50:20 +0200
>
> > Patrick McHardy <kaber@trash.net> wrote on 2010/05/10 16:33:00:
> >>
> >> Joakim Tjernlund wrote:
> >> > Eric Dumazet <eric.dumazet@gmail.com> wrote on 2010/05/07 10:53:23:
> >> >>> 3) I would expect lost pkgs to be accounted on eth0 instead of
> >> >>> the VLAN interface(s) since that is where the pkg is lost, why
> >> >>> isn't it so?
> >> >> You try to send packets on eth0.XXX, some are dropped, and accounted for
> >> >> on eth0.XXX stats. What is wrong with this ?
> >> >
> >> > In this case one lost pkg is accounted for twice, once on eth0.1 and
> >> > once more on eth0.1.1. Note that eth0.1.1 is stacked on
> >> > top of eth0.1
> >> >
> >> > I would at least expect eth0 to also account lost pkgs too.
> >> > I was confused by the current accounting as I knew that
> >> > the underlying HW I/F should be the only I/F that could
> >> > drop pkgs.
> >>
> >> In case of NET_XMIT_CN, the packet is dropped by the qdisc before
> >> it reaches eth0, so its only accounted on the upper devices.
> >
> > hmm, I am afraid I don't follow this. Why would a pkg be dropped before
> > it reaches eth0?
>
> Because we have packet schedulers that sit before the device transmit
> happens, and those packet schedulers enforce limits based upon
> classification results or other criteria, and if those limits are
> exceeded packets are dropped and NET_XMIT_CN is returned back up into
> the transmit path of the networking stack.
OK, but what I don't get is if pkgs are dropped as soon as the underlying
device cannot handle the pkg directly (returns !NETDEV_TX_OK or stops the queue)?
Are !NETDEV_TX_OK and stopping the queue handled differently by upper layers?
I would have expected the pkg be added to the TX queue and transmitted somewhat later.
If not, what is the TX queue for?
>
> The device never sees that packet get submitted to its ->ndo_start_xmit()
> routine, and this is entirely intentional. And it is entirely intentional
> that NET_XMIT_CN gets passed up into the caller, where protocols such as
> TCP can key off this information to make congestion control decisions.
In this case it gets passed up to the VLAN driver, should the VLAN driver
do something else to use the TX queue?
Jocke
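
The behaviour David describes can be pictured with a toy model of a bounded queue sitting in front of the device. This is only a sketch under simplifying assumptions: `ToyQdisc` is a made-up class, and a real qdisc decides between NET_XMIT_DROP and NET_XMIT_CN with more care than this model, which follows the thread and returns NET_XMIT_CN whenever its limit is exceeded. Packets are queued while there is room and only refused once the limit is hit, so the device never sees the refused packet.

```python
from collections import deque

# Return codes as named in the thread.
NET_XMIT_SUCCESS = 0x00
NET_XMIT_DROP = 0x01
NET_XMIT_CN = 0x02

class ToyQdisc:
    """Bounded FIFO between the stack and the device transmit routine."""

    def __init__(self, limit):
        self.limit = limit
        self.queue = deque()
        self.drops = 0

    def enqueue(self, pkt):
        if len(self.queue) >= self.limit:
            self.drops += 1          # accounted here, not on the device
            return NET_XMIT_CN
        self.queue.append(pkt)       # device busy? the packet simply waits
        return NET_XMIT_SUCCESS

    def dequeue_to(self, device_xmit, budget):
        sent = 0
        while self.queue and sent < budget:
            device_xmit(self.queue.popleft())
            sent += 1
        return sent

seen_by_device = []
q = ToyQdisc(limit=2)
results = [q.enqueue(n) for n in range(3)]   # third packet exceeds the limit
q.dequeue_to(seen_by_device.append, budget=8)
print(results, seen_by_device, q.drops)
```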
^ permalink raw reply
* bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Oledzki @ 2010-05-16 13:33 UTC (permalink / raw)
To: Michael Chan; +Cc: netdev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1531 bytes --]
Hello,
I have a Dell R610 server with BCM5709 NICs. The server has one, 4-core
CPU (X5570) and four BCM5709 NICs onboard. I would like to assign each
NIC's interrupt to a different CPU for a better performance. However, as I
have 5 INTs to assign and only 4 CPUs available, it is not obvious how to
do it right:
CPU0 CPU1 CPU2 CPU3
61: 85085 0 0 0 PCI-MSI-edge eth1-0
62: 23046 0 0 0 PCI-MSI-edge eth1-1
63: 24525 0 0 0 PCI-MSI-edge eth1-2
64: 77801 0 0 0 PCI-MSI-edge eth1-3
65: 24006 0 0 0 PCI-MSI-edge eth1-4
# uname -r
2.6.33.3
# dmesg |grep 0000:01:00.0
bnx2 0000:01:00.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
bnx2 0000:01:00.0: setting latency timer to 64
bnx2 0000:01:00.0: firmware: requesting bnx2/bnx2-mips-09-5.0.0.j3.fw
bnx2 0000:01:00.0: firmware: requesting bnx2/bnx2-rv2p-09-5.0.0.j3.fw
bnx2 0000:01:00.0: irq 61 for MSI/MSI-X
bnx2 0000:01:00.0: irq 62 for MSI/MSI-X
bnx2 0000:01:00.0: irq 63 for MSI/MSI-X
bnx2 0000:01:00.0: irq 64 for MSI/MSI-X
bnx2 0000:01:00.0: irq 65 for MSI/MSI-X
bnx2 0000:01:00.0: irq 66 for MSI/MSI-X
bnx2 0000:01:00.0: irq 67 for MSI/MSI-X
bnx2 0000:01:00.0: irq 68 for MSI/MSI-X
bnx2 0000:01:00.0: irq 69 for MSI/MSI-X
Why the driver registers 5 interrupts instead of 4? How to limit it to 4?
Best regards,
Krzysztof Olędzki
^ permalink raw reply
* [PATCH] atm: select FW_LOADER in Kconfig for solos-pci
From: Nathan Williams @ 2010-05-16 13:12 UTC (permalink / raw)
To: netdev
solos-pci uses request_firmware() for firmware upgrades
Signed-off-by: Nathan Williams <nathan@traverse.com.au>
diff --git a/drivers/atm/Kconfig b/drivers/atm/Kconfig
index 191b85e..f1a0a00 100644
--- a/drivers/atm/Kconfig
+++ b/drivers/atm/Kconfig
@@ -394,6 +394,7 @@ config ATM_HE_USE_SUNI
config ATM_SOLOS
tristate "Solos ADSL2+ PCI Multiport card driver"
depends on PCI
+ select FW_LOADER
help
Support for the Solos multiport ADSL2+ card.
^ permalink raw reply related
* [PATCH] r6040: fix link checking with switches
From: Florian Fainelli @ 2010-05-16 12:30 UTC (permalink / raw)
To: netdev, David Miller
The current link checking logic only works for one port, which is not correct
for switches where multiple ports can have different link status. As a result
we would only check for link status on port 1 of the switch. Move the calls
to mii_check_media in r6040_timer which will be polling a single PHY chip
correctly and assume link is up for switches.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
---
drivers/net/r6040.c | 8 +++-----
1 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/net/r6040.c b/drivers/net/r6040.c
index 4122916..eeee379 100644
--- a/drivers/net/r6040.c
+++ b/drivers/net/r6040.c
@@ -400,9 +400,6 @@ static void r6040_init_mac_regs(struct net_device *dev)
* we may got called by r6040_tx_timeout which has left
* some unsent tx buffers */
iowrite16(0x01, ioaddr + MTPR);
-
- /* Check media */
- mii_check_media(&lp->mii_if, 1, 1);
}
static void r6040_tx_timeout(struct net_device *dev)
@@ -530,8 +527,6 @@ static int r6040_phy_mode_chk(struct net_device *dev)
phy_dat = 0x0000;
}
- mii_check_media(&lp->mii_if, 0, 1);
-
return phy_dat;
};
@@ -813,6 +808,9 @@ static void r6040_timer(unsigned long data)
/* Timer active again */
mod_timer(&lp->timer, round_jiffies(jiffies + HZ));
+
+ /* Check media */
+ mii_check_media(&lp->mii_if, 1, 1);
}
/* Read/set MAC address routines */
--
1.7.1
^ permalink raw reply related
* [PATCH] dm9000: fix "BUG: spinlock recursion"
From: Baruch Siach @ 2010-05-16 10:06 UTC (permalink / raw)
To: netdev; +Cc: Baruch Siach, stable, Sascha Hauer, Ben Dooks
dm9000_set_rx_csum and dm9000_hash_table are called from atomic context (in
dm9000_init_dm9000), and from non-atomic context (via ethtool_ops and
net_device_ops respectively). This causes a spinlock recursion BUG. Fix this by
renaming these functions to *_unlocked for the atomic context, and make the
original functions locking wrappers for use in the non-atomic context.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Cc: stable@kernel.org
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Ben Dooks <ben-linux@fluff.org>
---
drivers/net/dm9000.c | 38 +++++++++++++++++++++++++++-----------
1 files changed, 27 insertions(+), 11 deletions(-)
diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
index 7f9960f..3556b2c 100644
--- a/drivers/net/dm9000.c
+++ b/drivers/net/dm9000.c
@@ -476,17 +476,13 @@ static uint32_t dm9000_get_rx_csum(struct net_device *dev)
return dm->rx_csum;
}
-static int dm9000_set_rx_csum(struct net_device *dev, uint32_t data)
+static int dm9000_set_rx_csum_unlocked(struct net_device *dev, uint32_t data)
{
board_info_t *dm = to_dm9000_board(dev);
- unsigned long flags;
if (dm->can_csum) {
dm->rx_csum = data;
-
- spin_lock_irqsave(&dm->lock, flags);
iow(dm, DM9000_RCSR, dm->rx_csum ? RCSR_CSUM : 0);
- spin_unlock_irqrestore(&dm->lock, flags);
return 0;
}
@@ -494,6 +490,19 @@ static int dm9000_set_rx_csum(struct net_device *dev, uint32_t data)
return -EOPNOTSUPP;
}
+static int dm9000_set_rx_csum(struct net_device *dev, uint32_t data)
+{
+ board_info_t *dm = to_dm9000_board(dev);
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&dm->lock, flags);
+ ret = dm9000_set_rx_csum_unlocked(dev, data);
+ spin_unlock_irqrestore(&dm->lock, flags);
+
+ return ret;
+}
+
static int dm9000_set_tx_csum(struct net_device *dev, uint32_t data)
{
board_info_t *dm = to_dm9000_board(dev);
@@ -722,7 +731,7 @@ static unsigned char dm9000_type_to_char(enum dm9000_type type)
* Set DM9000 multicast address
*/
static void
-dm9000_hash_table(struct net_device *dev)
+dm9000_hash_table_unlocked(struct net_device *dev)
{
board_info_t *db = netdev_priv(dev);
struct dev_mc_list *mcptr;
@@ -730,12 +739,9 @@ dm9000_hash_table(struct net_device *dev)
u32 hash_val;
u16 hash_table[4];
u8 rcr = RCR_DIS_LONG | RCR_DIS_CRC | RCR_RXEN;
- unsigned long flags;
dm9000_dbg(db, 1, "entering %s\n", __func__);
- spin_lock_irqsave(&db->lock, flags);
-
for (i = 0, oft = DM9000_PAR; i < 6; i++, oft++)
iow(db, oft, dev->dev_addr[i]);
@@ -765,6 +771,16 @@ dm9000_hash_table(struct net_device *dev)
}
iow(db, DM9000_RCR, rcr);
+}
+
+static void
+dm9000_hash_table(struct net_device *dev)
+{
+ board_info_t *db = netdev_priv(dev);
+ unsigned long flags;
+
+ spin_lock_irqsave(&db->lock, flags);
+ dm9000_hash_table_unlocked(dev);
spin_unlock_irqrestore(&db->lock, flags);
}
@@ -784,7 +800,7 @@ dm9000_init_dm9000(struct net_device *dev)
db->io_mode = ior(db, DM9000_ISR) >> 6; /* ISR bit7:6 keeps I/O mode */
/* Checksum mode */
- dm9000_set_rx_csum(dev, db->rx_csum);
+ dm9000_set_rx_csum_unlocked(dev, db->rx_csum);
/* GPIO0 on pre-activate PHY */
iow(db, DM9000_GPR, 0); /* REG_1F bit0 activate phyxcer */
@@ -811,7 +827,7 @@ dm9000_init_dm9000(struct net_device *dev)
iow(db, DM9000_ISR, ISR_CLR_STATUS); /* Clear interrupt status */
/* Set address filter table */
- dm9000_hash_table(dev);
+ dm9000_hash_table_unlocked(dev);
imr = IMR_PAR | IMR_PTM | IMR_PRM;
if (db->type != TYPE_DM9000E)
--
1.7.1
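
The shape of this fix, a `*_unlocked` worker plus a thin locking wrapper, can be shown outside the kernel. The sketch below is a user-space analogy only: it uses Python's non-reentrant `threading.Lock` in place of the driver's spinlock, the class `Board` and its methods are hypothetical, and it models the init path as running with the lock already held.

```python
import threading

class Board:
    def __init__(self):
        self.lock = threading.Lock()   # non-reentrant, like a spinlock
        self.rx_csum = 0

    def set_rx_csum_unlocked(self, value):
        """Worker: the caller must already hold self.lock."""
        assert self.lock.locked(), "lock must be held by the caller"
        self.rx_csum = value

    def set_rx_csum(self, value):
        """Locking wrapper for callers that do not hold the lock."""
        with self.lock:
            self.set_rx_csum_unlocked(value)

    def init(self):
        """Runs with the lock held, so it must call the *_unlocked variant."""
        with self.lock:
            self.set_rx_csum_unlocked(1)
            # Calling self.set_rx_csum(1) here would try to take the same
            # lock a second time and never return.

board = Board()
board.init()            # path that already holds the lock
board.set_rx_csum(0)    # path that takes the lock itself
print(board.rx_csum)
```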
^ permalink raw reply related
* [GIT] Networking
From: David Miller @ 2010-05-16 8:32 UTC (permalink / raw)
To: torvalds; +Cc: akpm, netdev, linux-kernel
1) The new SR-IOV VF netlink messages had a critical error in their
design, they were not designed to be symmetric (therefore you can't
take a netlink message dump, and use the elements to recreate
configurations like you can with other netlink message types).
Cure this by using nested netlink attributes.
Better to fix this before it appears in a released kernel.
Fix from Chris Wright.
2) The per-cpu TCP md5 signature state was not properly softirq
protected, leading to all kinds of corruptions. Fix from Eric
Dumazet.
3) SCTP transport teardown can leave a timer running, fix from Wei
Yongjun.
Please pull, thanks!
The following changes since commit 4fc4c3ce0dc1096cbd0daa3fe8f6905cbec2b87e:
Linus Torvalds (1):
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
are available in the git repository at:
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master
Chris Wright (1):
rtnetlink: make SR-IOV VF interface symmetric
Eric Dumazet (1):
tcp: fix MD5 (RFC2385) support
Wei Yongjun (1):
sctp: delete active ICMP proto unreachable timer when free transport
include/linux/if_link.h | 23 ++++++-
include/net/tcp.h | 21 +-----
net/core/rtnetlink.c | 159 ++++++++++++++++++++++++++++++++--------------
net/ipv4/tcp.c | 34 +++++++---
net/sctp/transport.c | 4 +
5 files changed, 160 insertions(+), 81 deletions(-)
^ permalink raw reply
* Re: [PATCH] rtnetlink: make SR-IOV VF interface symmetric
From: David Miller @ 2010-05-16 8:05 UTC (permalink / raw)
To: chrisw; +Cc: kaber, mitch.a.williams, arnd, scofeldm, shemminger, netdev
In-Reply-To: <20100515031416.GE15313@sequoia.sous-sol.org>
From: Chris Wright <chrisw@sous-sol.org>
Date: Fri, 14 May 2010 20:14:16 -0700
> Now we have a set of nested attributes:
>
> IFLA_VFINFO_LIST (NESTED)
> IFLA_VF_INFO (NESTED)
> IFLA_VF_MAC
> IFLA_VF_VLAN
> IFLA_VF_TX_RATE
>
> This allows a single set to operate on multiple attributes if desired.
> Among other things, it means a dump can be replayed to set state.
>
> The current interface has yet to be released, so this seems like
> something to consider for 2.6.34.
>
> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Agreed, applied to net-2.6, thanks Chris!
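
To see why nesting makes the interface symmetric, it may help to pack and unpack such a layout by hand. The sketch below uses the generic netlink attribute framing (16-bit length, 16-bit type, payload padded to 4 bytes), but the numeric attribute values and the payload layouts are simplified placeholders chosen for illustration; the authoritative definitions are in include/linux/if_link.h.

```python
import struct

def nla_align(n):
    return (n + 3) & ~3

def pack_attr(atype, payload: bytes) -> bytes:
    """One attribute: u16 length, u16 type, payload, padding to 4 bytes."""
    length = 4 + len(payload)
    return struct.pack("=HH", length, atype) + payload + b"\0" * (nla_align(length) - length)

def unpack_attrs(buf: bytes):
    """Split a buffer into (type, payload) pairs."""
    attrs, off = [], 0
    while off + 4 <= len(buf):
        length, atype = struct.unpack_from("=HH", buf, off)
        if length < 4:
            break
        attrs.append((atype, buf[off + 4: off + length]))
        off += nla_align(length)
    return attrs

# Placeholder numeric values, for illustration only.
IFLA_VFINFO_LIST, IFLA_VF_INFO = 22, 1
IFLA_VF_MAC, IFLA_VF_VLAN, IFLA_VF_TX_RATE = 1, 2, 3

def encode_vf(vf, mac, vlan, rate):
    body = (pack_attr(IFLA_VF_MAC, struct.pack("=I6s", vf, mac))
            + pack_attr(IFLA_VF_VLAN, struct.pack("=II", vf, vlan))
            + pack_attr(IFLA_VF_TX_RATE, struct.pack("=II", vf, rate)))
    return pack_attr(IFLA_VF_INFO, body)

def decode_vf(payload):
    fields = dict(unpack_attrs(payload))
    vf, mac = struct.unpack("=I6s", fields[IFLA_VF_MAC])
    _, vlan = struct.unpack("=II", fields[IFLA_VF_VLAN])
    _, rate = struct.unpack("=II", fields[IFLA_VF_TX_RATE])
    return vf, mac, vlan, rate

vfs = [(0, bytes.fromhex("020000000001"), 100, 1000),
       (1, bytes.fromhex("020000000002"), 200, 0)]
message = pack_attr(IFLA_VFINFO_LIST, b"".join(encode_vf(*v) for v in vfs))

# A dump can be decoded and re-encoded into the same bytes.
(list_type, list_payload), = unpack_attrs(message)
decoded = [decode_vf(p) for _, p in unpack_attrs(list_payload)]
print(decoded == vfs)
```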
^ permalink raw reply