Netdev List


* Re: [patch] pm_qos update fixing mmotm 2010-05-11 -dies in pm_qos_update_request()
From: mgross @ 2010-05-17  0:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: markgross, Valdis.Kletnieks, mgross, akpm, davem, linux-kernel,
	e1000-devel, netdev, linux-pm
In-Reply-To: <201005170021.25427.rjw@sisk.pl>

On Mon, May 17, 2010 at 12:21:25AM +0200, Rafael J. Wysocki wrote:
> On Saturday 15 May 2010, mgross wrote:
> > On Sat, May 15, 2010 at 09:38:47PM +0200, Rafael J. Wysocki wrote:
> > > On Saturday 15 May 2010, mgross wrote:
> > > > I apologize for the goofy email address.  
> > > > 
> > > > The following is a fix for the crash reported by Valdis.
> > > > 
> > > > The problem was that the original pm_qos silently fails when a request
> > > > update is passed to a parameter that has not been added to the list
> > > > yet.  It seems that the e1000e is doing this.  This update restores this
> > > > behavior.
> > > > 
> > > > I need to think about how to better handle such abuse, but for now this
> > > > restores the original behavior.
> > > 
> > > Can you please post a signed-off incremental patch against
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git for-llinus
> > > 
> > > that contains your original PM QOS update?
> > 
> > No problem:
> >
> > Signed-off-by: markgross <markgross@thegnar.org>
> 
> Thanks!  Do you want to use this address for the sign-off or the Intel one?

I'd rather use this one.  Ever since I switched groups within Intel last
summer, my mgross@linux.intel.com address isn't checked as often as this one.

The other option is to use my Outlook email (mark.gross@intel.com), but
I really hate posting from Outlook.  Besides, doing upstream kernel
stuff isn't my day job any more, so using markgross@thegnar.org makes
sense to me.

thanks,

--mgross



> 
> Rafael
> 
>  
> > From 487b8dcaeb66d3c226d4c06c1bd99689f93024be Mon Sep 17 00:00:00 2001
> > From: mgross <mgross@mgross-desktop.(none)>
> > Date: Sat, 15 May 2010 14:30:15 -0700
> > Subject: [PATCH] Guard against pm_qos users calling API before registering a proper
> >  request.
> > 
> > This update handles a use case where pm_qos update requests need to
> > silently fail if the update is being sent to a handle that is null.
> > 
> > The problem was that the original pm_qos silently fails when a request
> > update is passed to a parameter that has not been added to the list yet.
> > This update restores that behavior.
> > 
> > Signed-off-by: markgross <markgross@thegnar.org>
> > 
> > ---
> >  kernel/pm_qos_params.c |   26 ++++++++++++++------------
> >  1 files changed, 14 insertions(+), 12 deletions(-)
> > 
> > diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> > index a1aea04..f42d3f7 100644
> > --- a/kernel/pm_qos_params.c
> > +++ b/kernel/pm_qos_params.c
> > @@ -252,19 +252,21 @@ void pm_qos_update_request(struct pm_qos_request_list *pm_qos_req,
> >  	int pending_update = 0;
> >  	s32 temp;
> >  
> > -	spin_lock_irqsave(&pm_qos_lock, flags);
> > -	if (new_value == PM_QOS_DEFAULT_VALUE)
> > -		temp = pm_qos_array[pm_qos_req->pm_qos_class]->default_value;
> > -	else
> > -		temp = new_value;
> > -
> > -	if (temp != pm_qos_req->value) {
> > -		pending_update = 1;
> > -		pm_qos_req->value = temp;
> > +	if (pm_qos_req) { /*guard against callers passing in null */
> > +		spin_lock_irqsave(&pm_qos_lock, flags);
> > +		if (new_value == PM_QOS_DEFAULT_VALUE)
> > +			temp = pm_qos_array[pm_qos_req->pm_qos_class]->default_value;
> > +		else
> > +			temp = new_value;
> > +
> > +		if (temp != pm_qos_req->value) {
> > +			pending_update = 1;
> > +			pm_qos_req->value = temp;
> > +		}
> > +		spin_unlock_irqrestore(&pm_qos_lock, flags);
> > +		if (pending_update)
> > +			update_target(pm_qos_req->pm_qos_class);
> >  	}
> > -	spin_unlock_irqrestore(&pm_qos_lock, flags);
> > -	if (pending_update)
> > -		update_target(pm_qos_req->pm_qos_class);
> >  }
> >  EXPORT_SYMBOL_GPL(pm_qos_update_request);
> >  
> > 
> 
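
The guard added by the quoted patch can be sketched in user space as
follows. This is only an illustration under stated assumptions: the
struct, the sentinel value, and the function name are hypothetical
stand-ins rather than the kernel's definitions, the spinlock is left out,
and an early return replaces the patch's enclosing if-block (the two
forms behave the same for a NULL handle).

```c
#include <stddef.h>
#include <stdint.h>

#define QOS_DEFAULT_VALUE (-1)	/* hypothetical sentinel, plays the role of PM_QOS_DEFAULT_VALUE */

struct qos_request {		/* hypothetical, simplified request handle */
	int32_t value;		/* value currently requested */
	int32_t class_default;	/* default value of the request's class */
};

/* Returns 1 if the stored value changed, 0 otherwise (including for NULL). */
static int qos_update_request(struct qos_request *req, int32_t new_value)
{
	int32_t temp;

	if (!req)		/* handle was never registered: fail silently */
		return 0;

	temp = (new_value == QOS_DEFAULT_VALUE) ? req->class_default : new_value;
	if (temp == req->value)	/* nothing to do, no target update needed */
		return 0;

	req->value = temp;
	return 1;		/* caller would now recompute the class target */
}
```

A caller that updates before registering (as e1000e appears to do,
according to the report) then gets a silent no-op instead of a NULL
dereference.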


* Re: [PATCH 13/37] drivers/net/wireless/iwmc3200wifi: Use kmemdup
From: Samuel Ortiz @ 2010-05-16 23:01 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Zhu, Yi, Intel Linux Wireless, John W. Linville,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-janitors@vger.kernel.org
In-Reply-To: <Pine.LNX.4.64.1005152316400.21345@ask.diku.dk>

On Sat, May 15, 2010 at 10:16:58PM +0100, Julia Lawall wrote:
> From: Julia Lawall <julia@diku.dk>
> 
> Use kmemdup when some other buffer is immediately copied into the
> allocated region.
> 
> A simplified version of the semantic patch that makes this change is as
> follows: (http://coccinelle.lip6.fr/)
> 
> // <smpl>
> @@
> expression from,to,size,flag;
> statement S;
> @@
> 
> -  to = \(kmalloc\|kzalloc\)(size,flag);
> +  to = kmemdup(from,size,flag);
>    if (to==NULL || ...) S
> -  memcpy(to, from, size);
> // </smpl>
> 
> Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>

Cheers,
Samuel.
 
> ---
>  drivers/net/wireless/iwmc3200wifi/rx.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff -u -p a/drivers/net/wireless/iwmc3200wifi/rx.c b/drivers/net/wireless/iwmc3200wifi/rx.c
> --- a/drivers/net/wireless/iwmc3200wifi/rx.c
> +++ b/drivers/net/wireless/iwmc3200wifi/rx.c
> @@ -321,14 +321,14 @@ iwm_rx_ticket_node_alloc(struct iwm_priv
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> -	ticket_node->ticket = kzalloc(sizeof(struct iwm_rx_ticket), GFP_KERNEL);
> +	ticket_node->ticket = kmemdup(ticket, sizeof(struct iwm_rx_ticket),
> +				      GFP_KERNEL);
>  	if (!ticket_node->ticket) {
>  		IWM_ERR(iwm, "Couldn't allocate RX ticket\n");
>  		kfree(ticket_node);
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> -	memcpy(ticket_node->ticket, ticket, sizeof(struct iwm_rx_ticket));
>  	INIT_LIST_HEAD(&ticket_node->node);
>  
>  	return ticket_node;
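
For readers outside the kernel, the allocate-then-copy pattern that the
semantic patch collapses can be sketched in user space like this. The
helper name dup_bytes is hypothetical; it only mirrors what kmemdup()
does, with malloc() standing in for the kernel allocator and no GFP
flags.

```c
#include <stdlib.h>
#include <string.h>

/*
 * User-space analogue of kmemdup(): allocate len bytes and copy src
 * into them. Returns NULL if the allocation fails, so the caller's
 * existing NULL check keeps working unchanged.
 */
static void *dup_bytes(const void *src, size_t len)
{
	void *dst = malloc(len);

	if (dst)
		memcpy(dst, src, len);
	return dst;
}
```

In the rx.c hunk the original allocation used kzalloc(), so the
conversion also drops a zeroing pass whose result was overwritten by
the memcpy() straight away.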

-- 
Intel Open Source Technology Centre
http://oss.intel.com/

* Re: [patch] pm_qos update fixing mmotm 2010-05-11 -dies in pm_qos_update_request()
From: Rafael J. Wysocki @ 2010-05-16 22:21 UTC (permalink / raw)
  To: markgross
  Cc: mgross, Valdis.Kletnieks, e1000-devel, netdev, linux-kernel,
	linux-pm, akpm, davem
In-Reply-To: <20100515214256.GA3506@thegnar.org>

On Saturday 15 May 2010, mgross wrote:
> On Sat, May 15, 2010 at 09:38:47PM +0200, Rafael J. Wysocki wrote:
> > On Saturday 15 May 2010, mgross wrote:
> > > I apologize for the goofy email address.  
> > > 
> > > The following is a fix for the crash reported by Valdis.
> > > 
> > > The problem was that the original pm_qos silently fails when a request
> > > update is passed to a parameter that has not been added to the list
> > > yet.  It seems that the e1000e is doing this.  This update restores this
> > > behavior.
> > > 
> > > I need to think about how to better handle such abuse, but for now this
> > > restores the original behavior.
> > 
> > Can you please post a signed-off incremental patch against
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git for-llinus
> > 
> > that contains your original PM QOS update?
> 
> No problem:
>
> Signed-off-by: markgross <markgross@thegnar.org>

Thanks!  Do you want to use this address for the sign-off or the Intel one?

Rafael

 
> From 487b8dcaeb66d3c226d4c06c1bd99689f93024be Mon Sep 17 00:00:00 2001
> From: mgross <mgross@mgross-desktop.(none)>
> Date: Sat, 15 May 2010 14:30:15 -0700
> Subject: [PATCH] Guard against pm_qos users calling API before registering a proper
>  request.
> 
> This update handles a use case where pm_qos update requests need to
> silently fail if the update is being sent to a handle that is null.
> 
> The problem was that the original pm_qos silently fails when a request
> update is passed to a parameter that has not been added to the list yet.
> This update restores that behavior.
> 
> Signed-off-by: markgross <markgross@thegnar.org>
> 
> ---
>  kernel/pm_qos_params.c |   26 ++++++++++++++------------
>  1 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> index a1aea04..f42d3f7 100644
> --- a/kernel/pm_qos_params.c
> +++ b/kernel/pm_qos_params.c
> @@ -252,19 +252,21 @@ void pm_qos_update_request(struct pm_qos_request_list *pm_qos_req,
>  	int pending_update = 0;
>  	s32 temp;
>  
> -	spin_lock_irqsave(&pm_qos_lock, flags);
> -	if (new_value == PM_QOS_DEFAULT_VALUE)
> -		temp = pm_qos_array[pm_qos_req->pm_qos_class]->default_value;
> -	else
> -		temp = new_value;
> -
> -	if (temp != pm_qos_req->value) {
> -		pending_update = 1;
> -		pm_qos_req->value = temp;
> +	if (pm_qos_req) { /*guard against callers passing in null */
> +		spin_lock_irqsave(&pm_qos_lock, flags);
> +		if (new_value == PM_QOS_DEFAULT_VALUE)
> +			temp = pm_qos_array[pm_qos_req->pm_qos_class]->default_value;
> +		else
> +			temp = new_value;
> +
> +		if (temp != pm_qos_req->value) {
> +			pending_update = 1;
> +			pm_qos_req->value = temp;
> +		}
> +		spin_unlock_irqrestore(&pm_qos_lock, flags);
> +		if (pending_update)
> +			update_target(pm_qos_req->pm_qos_class);
>  	}
> -	spin_unlock_irqrestore(&pm_qos_lock, flags);
> -	if (pending_update)
> -		update_target(pm_qos_req->pm_qos_class);
>  }
>  EXPORT_SYMBOL_GPL(pm_qos_update_request);
>  
> 



* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Eric Dumazet @ 2010-05-16 21:26 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF05FC2.4020804@ans.pl>

Le dimanche 16 mai 2010 à 23:12 +0200, Krzysztof Olędzki a écrit :
> On 2010-05-16 22:47, Eric Dumazet wrote:
> > Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
> >> On 2010-05-16 22:15, Eric Dumazet wrote:
> >
> >>> All tx packets through bonding will use txqueue 0, since bnx2 doesn't
> >>> provide a ndo_select_queue() function.
> >>
> >> OK, that explains everything. Thank you Eric. I assume it may take some
> >> time for bonding to become multiqueue aware and/or bnx2 to provide
> >> ndo_select_queue?
> >>
> >
> > bonding might become multiqueue aware, there are several patches
> > floating around.
> >
> > But with your ping tests, it won't change the selected txqueue anyway (it
> > will be the same for any target, because skb_tx_hash() won't hash the
> > destination address, only the skb->protocol).
> 
> What do you mean by "won't hash the destination address, only the 
> skb->protocol"? It won't hash the destination address for ICMP or for 
> all IP protocols?

Locally generated ICMP packets all use the same tx queue, because
sk->sk_hash is not set:

        if (skb->sk && skb->sk->sk_hash)
                hash = skb->sk->sk_hash;
        else
                hash = (__force u16) skb->protocol;

        hash = jhash_1word(hash, hashrnd);

        return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
 



However, replies will be spread across the four queues if the hardware is
capable of hashing ICMP packets using the IP addresses (source/destination).
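
A small user-space model of the selection logic quoted above may make the
effect easier to see. It is a sketch under assumptions, not the kernel
code: mix32() is a stand-in for jhash_1word(), and pick_txq() with its
parameters is a hypothetical name.

```c
#include <stdint.h>

/* Stand-in for the kernel's jhash_1word(); any 32-bit mixer works here. */
static uint32_t mix32(uint32_t x, uint32_t seed)
{
	x ^= seed;
	x ^= x >> 16;
	x *= 0x7feb352dU;
	x ^= x >> 15;
	x *= 0x846ca68bU;
	x ^= x >> 16;
	return x;
}

/* Models the quoted logic: use the socket hash if set, else the protocol. */
static uint16_t pick_txq(uint32_t sk_hash, uint16_t protocol,
			 uint32_t seed, unsigned int num_queues)
{
	uint32_t hash = sk_hash ? sk_hash : protocol;

	hash = mix32(hash, seed);
	/* Scale the 32-bit hash into [0, num_queues) without a modulo. */
	return (uint16_t)(((uint64_t)hash * num_queues) >> 32);
}
```

The destination address is not an input at all, so with sk_hash unset
every packet of a given protocol lands on the same queue no matter which
host is being pinged.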

> 
> My normal workload is TCP and UDP based so if it is only ICMP then there 
> is no problem. Actually I have noticeably more UDP traffic than an 
> average network, mainly because of LWAPP/CAPWAP, so I'm interested in 
> good performance for both TCP and UDP.
> 
> During my initial tests ICMP ping showed the same behavior as UDP/TCP 
> with iperf, so I stuck with it. I'll redo everything with UDP and TCP 
> of course. :)
> 
> >> BTW: With a normal router workload, should I expect big performance drop
> >> when receiving and forwarding the same packet using different CPUs?
> >> Bonding provides very important functionality, I'm not able to drop it. :(
> >>
> >
> > Not sure what you mean by forwarding same packet using different CPUs.
> > You probably meant different queues, because in normal case, only one
> > cpu is involved (the one receiving the packet is also the one
> > transmitting it, unless you have congestion or traffic shaping).
> 
> I mean receiving it on one CPU and sending it on a different one. I 
> would like to assign different vectors (eth1-0 .. eth1-4) to different 
> CPUs, but with bnx2+bonding packets are received on queues 1-4 (eth1-1 
> .. eth1-4) and sent from queue 0 (eth1-0). So, for one packet, two 
> different CPUs will be involved (RX on q1-q4, TX on q0).

As I said, (unless you use RPS), one forwarded packet only uses one CPU.
How tx queue is selected is another story. We try to do a 1-1 mapping.

> 
> > If you have 4 cpus, you can use following patch and have a transparent
> > bonding against multiqueue.
> 
> Thanks! If I get it right: with the patch, packets should be sent using 
> the same CPU (queue?) that was used when receiving?

Yes, for forwarding loads.

(You might use 5 or 8 instead of 4, because it's not clear to me whether
bnx2 has 5 txqueues or 4 in your case.)

> 
> > Still bonding xmit path hits a global
> > rwlock, so performance is not what you can get without bonding.
> 
> It may not be perfect, but it should be much better than nothing, right?
> 

Sure.




* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 21:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274042826.2299.26.camel@edumazet-laptop>

On 2010-05-16 22:47, Eric Dumazet wrote:
> Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
>> On 2010-05-16 22:15, Eric Dumazet wrote:
>
>>> All tx packets through bonding will use txqueue 0, since bnx2 doesn't
>>> provide a ndo_select_queue() function.
>>
>> OK, that explains everything. Thank you Eric. I assume it may take some
>> time for bonding to become multiqueue aware and/or bnx2 to provide
>> ndo_select_queue?
>>
>
> bonding might become multiqueue aware, there are several patches
> floating around.
>
> But with your ping tests, it won't change the selected txqueue anyway (it
> will be the same for any target, because skb_tx_hash() won't hash the
> destination address, only the skb->protocol).

What do you mean by "won't hash the destination address, only the 
skb->protocol"? It won't hash the destination address for ICMP or for 
all IP protocols?

My normal workload is TCP and UDP based so if it is only ICMP then there 
is no problem. Actually I have noticeably more UDP traffic than an 
average network, mainly because of LWAPP/CAPWAP, so I'm interested in 
good performance for both TCP and UDP.

During my initial tests ICMP ping showed the same behavior as UDP/TCP 
with iperf, so I stuck with it. I'll redo everything with UDP and TCP 
of course. :)

>> BTW: With a normal router workload, should I expect big performance drop
>> when receiving and forwarding the same packet using different CPUs?
>> Bonding provides very important functionality, I'm not able to drop it. :(
>>
>
> Not sure what you mean by forwarding same packet using different CPUs.
> You probably meant different queues, because in normal case, only one
> cpu is involved (the one receiving the packet is also the one
> transmitting it, unless you have congestion or traffic shaping).

I mean receiving it on one CPU and sending it on a different one. I 
would like to assign different vectors (eth1-0 .. eth1-4) to different 
CPUs, but with bnx2+bonding packets are received on queues 1-4 (eth1-1 
.. eth1-4) and sent from queue 0 (eth1-0). So, for one packet, two 
different CPUs will be involved (RX on q1-q4, TX on q0).

> If you have 4 cpus, you can use following patch and have a transparent
> bonding against multiqueue.

Thanks! If I get it right: with the patch, packets should be sent using 
the same CPU (queue?) that was used when receiving?

> Still bonding xmit path hits a global
> rwlock, so performance is not what you can get without bonding.

It may not be perfect, but it should be much better than nothing, right?

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: George B. @ 2010-05-16 21:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Krzysztof Olędzki, Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274042826.2299.26.camel@edumazet-laptop>

2010/5/16 Eric Dumazet <eric.dumazet@gmail.com>:
> Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
>> On 2010-05-16 22:15, Eric Dumazet wrote:
>
>> > All tx packets through bonding will use txqueue 0, since bnx2 doesn't
>> > provide a ndo_select_queue() function.
>>
>> OK, that explains everything. Thank you Eric. I assume it may take some
>> time for bonding to become multiqueue aware and/or bnx2 to provide
>> ndo_select_queue?
>>
>
> bonding might become multiqueue aware, there are several patches
> floating around.
>
> But with your ping tests, it won't change the selected txqueue anyway (it
> will be the same for any target, because skb_tx_hash() won't hash the
> destination address, only the skb->protocol).
>
>> BTW: With a normal router workload, should I expect big performance drop
>> when receiving and forwarding the same packet using different CPUs?
>> Bonding provides very important functionality, I'm not able to drop it. :(
>>
>
> Not sure what you mean by forwarding same packet using different CPUs.
> You probably meant different queues, because in normal case, only one
> cpu is involved (the one receiving the packet is also the one
> transmitting it, unless you have congestion or traffic shaping).
>
> If you have 4 cpus, you can use following patch and have a transparent
> bonding against multiqueue. Still bonding xmit path hits a global
> rwlock, so performance is not what you can get without bonding.
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 5e12462..2c257f7 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)
>
>        rtnl_lock();
>
> -       bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
> -                               bond_setup);
> +       bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
> +                               bond_setup, 4);
>        if (!bond_dev) {
>                pr_err("%s: eek! can't alloc netdev!\n", name);
>                rtnl_unlock();
>
>
>

FWIW, later this week I will compare VLANs on top of bonded Ethernet
interfaces against bonds built from VLAN interfaces (create a VLAN on each
of two interfaces and bond them together) to see whether I can notice any
performance difference. I expect I will when two or more VLANs are
carrying heavy traffic.  What concerns me is whether, if one Ethernet link
goes away, the bond interface will see that the Ethernet device underlying
the VLAN interface has gone down.

So in summary, rather than bonding ethernet interfaces and then
applying vlans to the bond, I intend to create vlans on the ethernet
interfaces and bond them. So one bond interface per vlan plus one for
the "raw" interfaces.  I am hoping that will allow better throughput
with multiple processors (and less head-of-line blocking for vlans
with low traffic rates).  Note: that configuration doesn't work with
2.6.32, I haven't tried with 2.6.33, and it allows me to configure it
with 2.6.34-rc7 though I haven't tested it yet on a multiqueue
ethernet with multiple processors.  I should have some systems to test
with later this week.


* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Eric Dumazet @ 2010-05-16 20:48 UTC (permalink / raw)
  To: Bijay Singh
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>, netdev, Ilpo Järvinen
In-Reply-To: <1273611036.2512.18.camel@edumazet-laptop>

Le mardi 11 mai 2010 à 22:50 +0200, Eric Dumazet a écrit :
> Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit :
> > Hi Eric,
> > 
> > I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches.
> > 
> > Thanks
> 
> 
> I believe third problem comes from commit 4957faad
> (TCPCT part 1g: Responder Cookie => Initiator), from William Allen
> Simpson.
> 
> When a SYN-ACK packet is built (in tcp_synack_options()),
> it specifically forbids a TIMESTAMP option to be included if SACK is
> also selected :
> 
> doing_ts &= !ireq->sack_ok;
> 
> Problem is this mask is done on a local variable. socket is still marked
> as being timestamp enabled.
> 
> 
> Later, when we build tcp options for data packets, we _include_ a
> timestamp, while our SYNACK didn't mention the option.  
> 
> So the following traffic can happen (and fails):
> 
> 18:38:29.041966 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [S], seq 4014064674, win 8860, options [mss 4430,sackOK,TS val 519041 ecr 0,nop,wscale 7,nop,nop,md5can't check - 9b44126367effcf3247fcbf6da76b24d], length 0
> 18:38:29.042072 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [S.], seq 586328714, ack 4014064675, win 5792, options [nop,nop,md5can't check - badd847799ded46f39642c341cc7e92b,mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
> 18:38:29.042093 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], ack 1, win 70, options [nop,nop,md5can't check - 3994ef6987df02a592963fba04c5d313], length 0
> 18:38:29.043217 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], seq 1:1441, ack 1, win 70, options [nop,nop,md5can't check - 8399f7ccab3a6b8c5a3027ed58bba314], length 1440
> 18:38:29.043226 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [P.], seq 1441:2501, ack 1, win 70, options [nop,nop,md5can't check - 701ebf65b1894a6bed4cefbf7a56596a], length 1060
> 18:38:29.043374 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 1441, win 68, options [nop,nop,md5can't check - 1badb315ba436ab59bff5b37daa871be,nop,nop,TS val 113051377 ecr 519041], length 0
> 18:38:29.043383 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 2501, win 91, options [nop,nop,md5can't check - 120564dcb99f822f3b70910282a6ed9d,nop,nop,TS val 113051377 ecr 519041], length 0
> 18:38:29.043673 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051377 ecr 519041], length 1428
> 18:38:29.043681 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [P.], seq 1429:2500, ack 2501, win 91, options [nop,nop,md5can't check - 7a910cd5ff357bf0e2c8d3489aafaa86,nop,nop,TS val 113051377 ecr 519041], length 1071
> 18:38:32.037786 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051677 ecr 519041], length 1428
> 18:38:38.037708 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113052277 ecr 519041], length 1428
> 18:38:50.037524 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113053477 ecr 519041], length 1428
> 
> 
> Could you try following patch ?
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 5db3a2c..0be21cd 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -668,7 +668,7 @@ static unsigned tcp_synack_options(struct sock *sk,
>  	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
>  			 xvp->cookie_plus :
>  			 0;
> -	bool doing_ts = ireq->tstamp_ok;
> +	bool doing_ts;
>  
>  #ifdef CONFIG_TCP_MD5SIG
>  	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
> @@ -681,11 +681,12 @@ static unsigned tcp_synack_options(struct sock *sk,
>  		 * rather than TS in order to fit in better with old,
>  		 * buggy kernels, but that was deemed to be unnecessary.
>  		 */
> -		doing_ts &= !ireq->sack_ok;
> +		ireq->tstamp_ok &= !ireq->sack_ok;
>  	}
>  #else
>  	*md5 = NULL;
>  #endif
> +	doing_ts = ireq->tstamp_ok;
>  
>  	/* We always send an MSS option. */
>  	opts->mss = mss;
> 
> 
> 
> 

Bijay, have you tested this patch, by any chance?

Thanks
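
The bug described in the quoted message comes down to clearing a local
copy of a flag instead of the shared one. A user-space sketch, with a
hypothetical struct and function names, and assuming (as the patch
context suggests) that the masking only happens when an MD5 key is in
use:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two request flags discussed above. */
struct req_flags {
	bool tstamp_ok;
	bool sack_ok;
};

/* Pre-patch behaviour: only the local copy is cleared. */
static bool synack_ts_buggy(struct req_flags *r, bool md5)
{
	bool doing_ts = r->tstamp_ok;

	if (md5)
		doing_ts &= !r->sack_ok;
	return doing_ts;	/* r->tstamp_ok still says "timestamps on" */
}

/* Post-patch behaviour: the decision is stored in the request itself. */
static bool synack_ts_fixed(struct req_flags *r, bool md5)
{
	if (md5)
		r->tstamp_ok &= !r->sack_ok;
	return r->tstamp_ok;
}
```

With the buggy variant the SYN-ACK omits the timestamp option while the
stored flag still enables it for later packets, which matches the
mismatch visible in the trace.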




* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Eric Dumazet @ 2010-05-16 20:47 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF056F0.8010008@ans.pl>

Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
> On 2010-05-16 22:15, Eric Dumazet wrote:

> > All tx packets through bonding will use txqueue 0, since bnx2 doesn't
> > provide a ndo_select_queue() function.
> 
> OK, that explains everything. Thank you Eric. I assume it may take some 
> time for bonding to become multiqueue aware and/or bnx2 to provide 
> ndo_select_queue?
> 

bonding might become multiqueue aware, there are several patches
floating around.

But with your ping tests, it won't change the selected txqueue anyway (it
will be the same for any target, because skb_tx_hash() won't hash the
destination address, only the skb->protocol).

> BTW: With a normal router workload, should I expect big performance drop 
> when receiving and forwarding the same packet using different CPUs? 
> Bonding provides very important functionality, I'm not able to drop it. :(
> 

Not sure what you mean by forwarding same packet using different CPUs.
You probably meant different queues, because in normal case, only one
cpu is involved (the one receiving the packet is also the one
transmitting it, unless you have congestion or traffic shaping).

If you have 4 cpus, you can use following patch and have a transparent
bonding against multiqueue. Still bonding xmit path hits a global
rwlock, so performance is not what you can get without bonding.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5e12462..2c257f7 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)

 	rtnl_lock();

-	bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
-				bond_setup);
+	bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
+				bond_setup, 4);
 	if (!bond_dev) {
 		pr_err("%s: eek! can't alloc netdev!\n", name);
 		rtnl_unlock();


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 20:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274040928.2299.17.camel@edumazet-laptop>

On 2010-05-16 22:15, Eric Dumazet wrote:
> Le dimanche 16 mai 2010 à 13:00 -0700, Michael Chan a écrit :
>> Krzysztof Oledzki wrote:
>>
>>> On 2010-05-16 20:51, Michael Chan wrote:
>>>> Krzysztof Oledzki wrote:
>>>>
>>>>>
>>>>> Why the driver registers 5 interrupts instead of 4? How to
>>>>> limit it to 4?
>>>>>
>>>>
>>>> The first vector (eth0-0) handles link interrupt and other slow
>>>> path events.  It also has an RX ring for non-IP packets that are
>>>> not hashed by the RSS hash.  The majority of the rx packets should
>>>> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
>>>> vectors to different CPUs.
>>>
>>> Thank you for your prompt response.
>>>
>>> In my case the first vector must be handling something more:
>>>   - "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
>>>   - "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
>>>   - "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
>>>   - "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2
>>>
>>>              CPU0       CPU1       CPU2       CPU3
>>>    67:    1563979          0          0          0   PCI-MSI-edge      eth1-0
>>>    68:    1072869          0          0          0   PCI-MSI-edge      eth1-1
>>>    69:     137905          0          0          0   PCI-MSI-edge      eth1-2
>>>    70:     259246          0          0          0   PCI-MSI-edge      eth1-3
>>>    71:     760252          0          0          0   PCI-MSI-edge      eth1-4
>>>
>>> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
>>
>> I think that ICMP ping packets will always go to ring 0 (eth1-0)
>> because they are non-IP packets.  I need to double check tomorrow
>> on how exactly the hashing works on RX.  Can you try running IP
>> traffic?  IP packets should theoretically go to rings 1 - 4.
>>
>
> ICMP packets are IP packets (Protocol=1)

Exactly. However, the firmware may handle ICMP and TCP in a different way.

>>> So, it seems that TX or RX is always handled by the first vector.
>>> I'll try to find if it is TX or RX.
>>>
>>> BTW: I'm using .1Q vlans over bonding, does it change anything?
>>
>> That should not matter, as the VLAN tag is stripped before hashing.
>
> warning, bonding currently is not multiqueue aware.
>
> All tx packets through bonding will use txqueue 0, since bnx2 doesn't
> provide a ndo_select_queue() function.

OK, that explains everything. Thank you Eric. I assume it may take some 
time for bonding to become multiqueue aware and/or bnx2 to provide 
ndo_select_queue?

BTW: With a normal router workload, should I expect big performance drop 
when receiving and forwarding the same packet using different CPUs? 
Bonding provides very important functionality, I'm not able to drop it. :(

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Michael Chan @ 2010-05-16 20:24 UTC (permalink / raw)
  To: 'Eric Dumazet'
  Cc: 'Krzysztof Oledzki', netdev@vger.kernel.org
In-Reply-To: <1274040928.2299.17.camel@edumazet-laptop>

Eric Dumazet wrote:

> > I think that ICMP ping packets will always go to ring 0 (eth1-0)
> > because they are non-IP packets.  I need to double check tomorrow
> > on how exactly the hashing works on RX.  Can you try running IP
> > traffic?  IP packets should theoretically go to rings 1 - 4.
> >
>
> ICMP packets are IP packets (Protocol=1)
>

Sorry, Eric is right.  Anyway, I'll check on the hashing to see how
it works on UDP, TCP, and other packets.



* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Eric Dumazet @ 2010-05-16 20:15 UTC (permalink / raw)
  To: Michael Chan; +Cc: 'Krzysztof Oledzki', netdev@vger.kernel.org
In-Reply-To: <C27F8246C663564A84BB7AB3439772421B7814753A@IRVEXCHCCR01.corp.ad.broadcom.com>

Le dimanche 16 mai 2010 à 13:00 -0700, Michael Chan a écrit :
> Krzysztof Oledzki wrote:
> 
> > On 2010-05-16 20:51, Michael Chan wrote:
> > > Krzysztof Oledzki wrote:
> > >
> > >>
> > >> Why the driver registers 5 interrupts instead of 4? How to
> > >> limit it to 4?
> > >>
> > >
> > > The first vector (eth0-0) handles link interrupt and other slow
> > > path events.  It also has an RX ring for non-IP packets that are
> > > not hashed by the RSS hash.  The majority of the rx packets should
> > > be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> > > vectors to different CPUs.
> >
> > Thank you for your prompt response.
> >
> > In my case the first vector must be handling something more:
> >  - "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
> >  - "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
> >  - "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
> >  - "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2
> >
> >             CPU0       CPU1       CPU2       CPU3
> >   67:    1563979          0          0          0   PCI-MSI-edge      eth1-0
> >   68:    1072869          0          0          0   PCI-MSI-edge      eth1-1
> >   69:     137905          0          0          0   PCI-MSI-edge      eth1-2
> >   70:     259246          0          0          0   PCI-MSI-edge      eth1-3
> >   71:     760252          0          0          0   PCI-MSI-edge      eth1-4
> >
> > As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
> 
> I think that ICMP ping packets will always go to ring 0 (eth1-0)
> because they are non-IP packets.  I need to double check tomorrow
> on how exactly the hashing works on RX.  Can you try running IP
> traffic?  IP packets should theoretically go to rings 1 - 4.
> 

ICMP packets are IP packets (Protocol=1)

> >
> > So, it seems that TX or RX is always handled by the first vector.
> > I'll try to find if it is TX or RX.
> >
> > BTW: I'm using .1Q vlans over bonding, does it change anything?
> 
> That should not matter, as the VLAN tag is stripped before hashing.

Warning: bonding is currently not multiqueue aware.

All tx packets through bonding will use txqueue 0, since bnx2 doesn't
provide an ndo_select_queue() function.
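
To make the difference concrete, here is a minimal userspace sketch contrasting "everything on TX queue 0" with a hash-based spread across queues. It is not the bonding or bnx2 code: `struct sketch_dev`, `sketch_select_queue_none()` and `sketch_select_queue_by_hash()` are hypothetical names, and the multiply-and-shift scaling of a 32-bit flow hash is only one possible policy, assumed here for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for a device that exposes several TX queues. */
struct sketch_dev {
	uint16_t num_tx_queues;
};

/* The situation described above: nothing in the path selects a queue,
 * so every packet ends up on TX queue 0 regardless of its flow. */
static uint16_t sketch_select_queue_none(const struct sketch_dev *dev,
					 uint32_t flow_hash)
{
	(void)dev;
	(void)flow_hash;
	return 0;
}

/* One possible multiqueue-aware policy (an assumption, not the driver's):
 * scale a 32-bit flow hash into the range [0, num_tx_queues). */
static uint16_t sketch_select_queue_by_hash(const struct sketch_dev *dev,
					    uint32_t flow_hash)
{
	return (uint16_t)(((uint64_t)flow_hash * dev->num_tx_queues) >> 32);
}
```

With four queues, hashes in the lowest quarter of the 32-bit range map to queue 0, the next quarter to queue 1, and so on, so packets of one flow stay on one queue while different flows spread out.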






^ permalink raw reply

* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Eric Dumazet @ 2010-05-16 19:53 UTC (permalink / raw)
  To: David Miller
  Cc: shemminger, Bijay.Singh, bhaskie, bhutchings, netdev,
	ilpo.jarvinen
In-Reply-To: <20100512.152406.193725816.davem@davemloft.net>

Le mercredi 12 mai 2010 à 15:24 -0700, David Miller a écrit :
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Wed, 12 May 2010 15:22:07 -0700
> 
> > Yes, that looks like a possible bug, not sure what hardware
> > generates frag_list.
> 
> GRO generates frag_list

ixgbe (82599) does too, if I understand this driver correctly (TCP Receive
Side Coalescing, RSC)




^ permalink raw reply

* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Michael Chan @ 2010-05-16 20:00 UTC (permalink / raw)
  To: 'Krzysztof Oledzki'; +Cc: netdev@vger.kernel.org
In-Reply-To: <4BF0465A.5030307@ans.pl>

Krzysztof Oledzki wrote:

> On 2010-05-16 20:51, Michael Chan wrote:
> > Krzysztof Oledzki wrote:
> >
> >>
> >> Why the driver registers 5 interrupts instead of 4? How to
> >> limit it to 4?
> >>
> >
> > The first vector (eth0-0) handles link interrupt and other slow
> > path events.  It also has an RX ring for non-IP packets that are
> > not hashed by the RSS hash.  The majority of the rx packets should
> > be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> > vectors to different CPUs.
>
> Thank you for your prompt response.
>
> In my case the first vector must be handling something more:
>  - "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
>  - "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
>  - "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
>  - "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2
>
>             CPU0       CPU1       CPU2       CPU3
>   67:    1563979          0          0          0   PCI-MSI-edge      eth1-0
>   68:    1072869          0          0          0   PCI-MSI-edge      eth1-1
>   69:     137905          0          0          0   PCI-MSI-edge      eth1-2
>   70:     259246          0          0          0   PCI-MSI-edge      eth1-3
>   71:     760252          0          0          0   PCI-MSI-edge      eth1-4
>
> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.

I think that ICMP ping packets will always go to ring 0 (eth1-0)
because they are non-IP packets.  I need to double check tomorrow
on how exactly the hashing works on RX.  Can you try running IP
traffic?  IP packets should theoretically go to rings 1 - 4.

>
> So, it seems that TX or RX is always handled by the first vector.
> I'll try to find if it is TX or RX.
>
> BTW: I'm using .1Q vlans over bonding, does it change anything?

That should not matter, as the VLAN tag is stripped before hashing.



^ permalink raw reply

* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 19:49 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev@vger.kernel.org
In-Reply-To: <4BF0465A.5030307@ans.pl>

On 2010-05-16 21:24, Krzysztof Olędzki wrote:
> On 2010-05-16 20:51, Michael Chan wrote:
>> Krzysztof Oledzki wrote:
>>
>>>
>>> Why the driver registers 5 interrupts instead of 4? How to
>>> limit it to 4?
>>>
>>
>> The first vector (eth0-0) handles link interrupt and other slow
>> path events.  It also has an RX ring for non-IP packets that are
>> not hashed by the RSS hash.  The majority of the rx packets should
>> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
>> vectors to different CPUs.
>
> Thank you for your prompt response.
>
> In my case the first vector must be handling something more:
>   - "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
>   - "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
>   - "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
>   - "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2
>
>              CPU0       CPU1       CPU2       CPU3
>    67:    1563979          0          0          0   PCI-MSI-edge      eth1-0
>    68:    1072869          0          0          0   PCI-MSI-edge      eth1-1
>    69:     137905          0          0          0   PCI-MSI-edge      eth1-2
>    70:     259246          0          0          0   PCI-MSI-edge      eth1-3
>    71:     760252          0          0          0   PCI-MSI-edge      eth1-4
>
> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
>
> So, it seems that TX or RX is always handled by the first vector.
> I'll try to find if it is TX or RX.
>
> BTW: I'm using .1Q vlans over bonding, does it change anything?

It looks like TX for locally generated packets is always performed on
eth1-0. I guess it should look different for forwarded packets?

Best regards,

			Krzysztof Olędzki

^ permalink raw reply

* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 19:24 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev@vger.kernel.org
In-Reply-To: <C27F8246C663564A84BB7AB3439772421B78147539@IRVEXCHCCR01.corp.ad.broadcom.com>

On 2010-05-16 20:51, Michael Chan wrote:
> Krzysztof Oledzki wrote:
> 
>>
>> Why the driver registers 5 interrupts instead of 4? How to
>> limit it to 4?
>>
> 
> The first vector (eth0-0) handles link interrupt and other slow
> path events.  It also has an RX ring for non-IP packets that are
> not hashed by the RSS hash.  The majority of the rx packets should
> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> vectors to different CPUs.

Thank you for your prompt response.

In my case the first vector must be handling something more:
 - "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
 - "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
 - "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
 - "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2

            CPU0       CPU1       CPU2       CPU3
  67:    1563979          0          0          0   PCI-MSI-edge      eth1-0
  68:    1072869          0          0          0   PCI-MSI-edge      eth1-1
  69:     137905          0          0          0   PCI-MSI-edge      eth1-2
  70:     259246          0          0          0   PCI-MSI-edge      eth1-3
  71:     760252          0          0          0   PCI-MSI-edge      eth1-4

As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.

So, it seems that TX or RX is always handled by the first vector.
I'll try to find if it is TX or RX.

BTW: I'm using .1Q vlans over bonding, does it change anything?

Best regards,

			Krzysztof Olędzki

^ permalink raw reply

* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Michael Chan @ 2010-05-16 18:51 UTC (permalink / raw)
  To: 'Krzysztof Oledzki'; +Cc: netdev@vger.kernel.org
In-Reply-To: <alpine.LNX.1.10.1005161511490.6004@bizon.gios.gov.pl>

Krzysztof Oledzki wrote:

>
> Why the driver registers 5 interrupts instead of 4? How to
> limit it to 4?
>

The first vector (eth0-0) handles link interrupt and other slow
path events.  It also has an RX ring for non-IP packets that are
not hashed by the RSS hash.  The majority of the rx packets should
be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
vectors to different CPUs.
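
As a rough sketch of that assignment: on Linux, an IRQ is pinned to a CPU by writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity, where bit n selects CPU n. The helpers below only compute the mask and format the command. Their names are hypothetical, and the IRQ number in the example is illustrative, to be read from /proc/interrupts on the actual system.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: bitmask selecting a single CPU, in the format
 * used by /proc/irq/<N>/smp_affinity (bit n set means CPU n). */
static unsigned int sketch_cpu_to_affinity_mask(unsigned int cpu)
{
	assert(cpu < 32);	/* keep the shift well defined */
	return 1u << cpu;
}

/* Format the shell command that would pin 'irq' to 'cpu'.
 * Returns the number of characters written, as snprintf() does. */
static int sketch_format_affinity_cmd(char *buf, size_t len,
				      unsigned int irq, unsigned int cpu)
{
	return snprintf(buf, len, "echo %x > /proc/irq/%u/smp_affinity",
			sketch_cpu_to_affinity_mask(cpu), irq);
}
```

For example, pinning a vector with IRQ 62 to CPU1 yields "echo 2 > /proc/irq/62/smp_affinity"; the four RX vectors would then get masks 1, 2, 4 and 8.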

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: Consistent skb timestamping
From: Dimitris Michailidis @ 2010-05-16 18:30 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, eric.dumazet, netdev
In-Reply-To: <20100515.235635.63009445.davem@davemloft.net>

David Miller wrote:

> The real fix is to make the devices less stupid and give us timestamps
> directly, and thanks to things like PTP support in hardware that's
> actually more and more of a reality these days.

For cxgb4 a timestamp is written into the Rx descriptor of each received
packet.  The value comes from a TSC-like cycle counter.  The raw timestamp
is very cheap to get; converting its value to system ktime is a bit less
cheap, though not too bad.  It would be nicer, though, if the stack could
hint to the driver whether it should do the conversion at all.  Maybe export
netstamp_needed and add an inline wrapper to read it?
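
A minimal userspace sketch of that idea, under stated assumptions: `sketch_stamp_needed` stands in for an exported netstamp_needed, `sketch_stamp_is_needed()` is the hypothetical inline wrapper, and the cycle counter is assumed to run at a fixed frequency below roughly 18 GHz so the intermediate product fits in 64 bits. None of these names come from cxgb4 or the core stack.

```c
#include <stdint.h>

/* Hypothetical flag standing in for "someone wants packet timestamps". */
static int sketch_stamp_needed;

/* Hypothetical inline wrapper a driver could call on its Rx path. */
static inline int sketch_stamp_is_needed(void)
{
	return sketch_stamp_needed != 0;
}

/* Convert a raw cycle count to nanoseconds for a counter running at
 * 'hz'.  Whole seconds and the remainder are converted separately so
 * the multiplication does not overflow 64 bits (rounds down). */
static uint64_t sketch_cycles_to_ns(uint64_t cycles, uint64_t hz)
{
	uint64_t secs = cycles / hz;
	uint64_t rem = cycles % hz;

	return secs * 1000000000ull + (rem * 1000000000ull) / hz;
}

/* Rx path: pay for the conversion only when a timestamp is wanted.
 * Returning 0 means "no timestamp taken". */
static uint64_t sketch_rx_stamp(uint64_t raw_cycles, uint64_t hz)
{
	if (!sketch_stamp_is_needed())
		return 0;
	return sketch_cycles_to_ns(raw_cycles, hz);
}
```

The cheap raw value is always available in the descriptor; only the conversion is gated on the hint.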

^ permalink raw reply

* Re: Weird TCP retransmit behaviour in recent kernels
From: Michael Smith @ 2010-05-16 16:08 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev
In-Reply-To: <alpine.DEB.2.00.1005160118240.30522@melkinpaasi.cs.helsinki.fi>


On Sun, 16 May 2010, Ilpo Järvinen wrote:

> On Fri, 14 May 2010, Michael Smith wrote:

>> It seems like when consecutive packets are lost, the SLES11
>> server retransmits the first packet when the timeout fires. The client
>> ACKs, but the server doesn't retransmit the next lost packet; instead,
>> it sends a couple more new packets, which don't get ACKed.

> This is where your problem is, they should get acked in a _compliant_
> network (with duplicate ACKs).

> Some have seen similar phenomena; every time it has been a fault in some
> middlebox/peer that does not do what it should. You can disable frto
> using tcp_frto sysctl if you like, however, I disagree with you as I'm
> pretty sure there is some broken middlebox in the network (which is trying
> to be too intelligent).

Thanks - tcp_frto=0 works around the problem here. The network in the 
middle is provided by a number of other parties, so I can try to point 
them in the right direction, but unless Microsoft turns on FRTO by default 
sometime soon, I doubt they will have time to care. :)

Mike

^ permalink raw reply

* Re: VLAN I/F's and TX queue.
From: Joakim Tjernlund @ 2010-05-16 14:22 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, kaber, netdev
In-Reply-To: <20100516.004041.73702151.davem@davemloft.net>


David Miller <davem@davemloft.net> wrote on 2010/05/16 09:40:41:
>
> From: Joakim Tjernlund <joakim.tjernlund@transmode.se>
> Date: Mon, 10 May 2010 16:50:20 +0200
>
> > Patrick McHardy <kaber@trash.net> wrote on 2010/05/10 16:33:00:
> >>
> >> Joakim Tjernlund wrote:
> >> > Eric Dumazet <eric.dumazet@gmail.com> wrote on 2010/05/07 10:53:23:
> >> >>> 3) I would expect lost pkgs to be accounted on eth0 instead of
> >> >>>    the VLAN interface(s) since that is where the pkg is lost, why
> >> >>>    isn't it so?
> >> >> You try to send packets on eth0.XXX, some are dropped, and accounted for
> >> >> on eth0.XXX stats. What is wrong with this ?
> >> >
> >> > In this case one lost pkg is accounted for twice, once on eth0.1 and
> >> > once more on eth0.1.1. Note that eth0.1.1 is stacked on
> >> > top of eth0.1
> >> >
> >> > I would at least expect eth0 to also account lost pkgs too.
> >> > I was confused by the current accounting as I knew that
> >> > the underlying HW I/F should be the only I/F that could
> >> > drop pkgs.
> >>
> >> In case of NET_XMIT_CN, the packet is dropped by the qdisc before
> >> it reaches eth0, so its only accounted on the upper devices.
> >
> > hmm, I am afraid I don't follow this. Why would a pkg be dropped before
> > it reaches eth0?
>
> Because we have packet schedulers that sit before the device transmit
> happens, and those packet schedulers enforce limits based upon
> classification results or other criteria, and if those limits are
> exceeded packets are dropped and NET_XMIT_CN is returned back up into
> the transmit path of the networking stack.

OK, but what I don't get is whether pkgs are dropped as soon as the underlying
device cannot handle the pkg directly (returns !NETDEV_TX_OK or stops the queue).
Are !NETDEV_TX_OK and stopping the queue handled differently by upper layers?
I would have expected the pkg to be added to the TX queue and transmitted somewhat later.
If not, what is the TX queue for?

>
> The device never sees that packet get submitted to its ->ndo_start_xmit()
> routine, and this is entirely intentional.  And it is entirely intentional
> that NET_XMIT_CN gets passed up into the caller, where protocols such as
> TCP can key off this information to make congestion control decisions.

In this case it gets passed up to the VLAN driver, should the VLAN driver
do something else to use the TX queue?

      Jocke


^ permalink raw reply

* bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Oledzki @ 2010-05-16 13:33 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev


Hello,

I have a Dell R610 server with BCM5709 NICs. The server has one 4-core
CPU (X5570) and four BCM5709 NICs onboard. I would like to assign each
NIC's interrupt to a different CPU for better performance. However, as I
have 5 INTs to assign and only 4 CPUs available, it is not obvious how to
do it right:

             CPU0       CPU1       CPU2       CPU3
   61:      85085          0          0          0   PCI-MSI-edge      eth1-0
   62:      23046          0          0          0   PCI-MSI-edge      eth1-1
   63:      24525          0          0          0   PCI-MSI-edge      eth1-2
   64:      77801          0          0          0   PCI-MSI-edge      eth1-3
   65:      24006          0          0          0   PCI-MSI-edge      eth1-4

# uname -r
2.6.33.3

# dmesg |grep  0000:01:00.0
bnx2 0000:01:00.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
bnx2 0000:01:00.0: setting latency timer to 64
bnx2 0000:01:00.0: firmware: requesting bnx2/bnx2-mips-09-5.0.0.j3.fw
bnx2 0000:01:00.0: firmware: requesting bnx2/bnx2-rv2p-09-5.0.0.j3.fw
bnx2 0000:01:00.0: irq 61 for MSI/MSI-X
bnx2 0000:01:00.0: irq 62 for MSI/MSI-X
bnx2 0000:01:00.0: irq 63 for MSI/MSI-X
bnx2 0000:01:00.0: irq 64 for MSI/MSI-X
bnx2 0000:01:00.0: irq 65 for MSI/MSI-X
bnx2 0000:01:00.0: irq 66 for MSI/MSI-X
bnx2 0000:01:00.0: irq 67 for MSI/MSI-X
bnx2 0000:01:00.0: irq 68 for MSI/MSI-X
bnx2 0000:01:00.0: irq 69 for MSI/MSI-X

Why does the driver register 5 interrupts instead of 4? How can I limit it to 4?

Best regards,

 				Krzysztof Olędzki

^ permalink raw reply

* [PATCH] atm: select FW_LOADER in Kconfig for solos-pci
From: Nathan Williams @ 2010-05-16 13:12 UTC (permalink / raw)
  To: netdev

solos-pci uses request_firmware() for firmware upgrades

Signed-off-by: Nathan Williams <nathan@traverse.com.au>

diff --git a/drivers/atm/Kconfig b/drivers/atm/Kconfig
index 191b85e..f1a0a00 100644
--- a/drivers/atm/Kconfig
+++ b/drivers/atm/Kconfig
@@ -394,6 +394,7 @@ config ATM_HE_USE_SUNI
 config ATM_SOLOS
 	tristate "Solos ADSL2+ PCI Multiport card driver"
 	depends on PCI
+	select FW_LOADER
 	help
 	  Support for the Solos multiport ADSL2+ card.


^ permalink raw reply related

* [PATCH] r6040: fix link checking with switches
From: Florian Fainelli @ 2010-05-16 12:30 UTC (permalink / raw)
  To: netdev, David Miller

The current link checking logic only works for one port, which is not correct
for switches, where multiple ports can have different link status. As a result
we would only check for link status on port 1 of the switch. Move the calls
to mii_check_media into r6040_timer, which will poll a single PHY chip
correctly, and assume the link is up for switches.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
---
 drivers/net/r6040.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/r6040.c b/drivers/net/r6040.c
index 4122916..eeee379 100644
--- a/drivers/net/r6040.c
+++ b/drivers/net/r6040.c
@@ -400,9 +400,6 @@ static void r6040_init_mac_regs(struct net_device *dev)
 	 * we may got called by r6040_tx_timeout which has left
 	 * some unsent tx buffers */
 	iowrite16(0x01, ioaddr + MTPR);
-
-	/* Check media */
-	mii_check_media(&lp->mii_if, 1, 1);
 }
 
 static void r6040_tx_timeout(struct net_device *dev)
@@ -530,8 +527,6 @@ static int r6040_phy_mode_chk(struct net_device *dev)
 			phy_dat = 0x0000;
 	}
 
-	mii_check_media(&lp->mii_if, 0, 1);
-
 	return phy_dat;
 };
 
@@ -813,6 +808,9 @@ static void r6040_timer(unsigned long data)
 
 	/* Timer active again */
 	mod_timer(&lp->timer, round_jiffies(jiffies + HZ));
+
+	/* Check media */
+	mii_check_media(&lp->mii_if, 1, 1);
 }
 
 /* Read/set MAC address routines */
-- 
1.7.1



^ permalink raw reply related

* [PATCH] dm9000: fix "BUG: spinlock recursion"
From: Baruch Siach @ 2010-05-16 10:06 UTC (permalink / raw)
  To: netdev; +Cc: Baruch Siach, stable, Sascha Hauer, Ben Dooks

dm9000_set_rx_csum and dm9000_hash_table are called from atomic context (in
dm9000_init_dm9000), and from non-atomic context (via ethtool_ops and
net_device_ops respectively). This causes a spinlock recursion BUG. Fix this by
renaming these functions to *_unlocked for the atomic context, and make the
original functions locking wrappers for use in the non-atomic context.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Cc: stable@kernel.org
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Ben Dooks <ben-linux@fluff.org>
---
 drivers/net/dm9000.c |   38 +++++++++++++++++++++++++++-----------
 1 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
index 7f9960f..3556b2c 100644
--- a/drivers/net/dm9000.c
+++ b/drivers/net/dm9000.c
@@ -476,17 +476,13 @@ static uint32_t dm9000_get_rx_csum(struct net_device *dev)
 	return dm->rx_csum;
 }
 
-static int dm9000_set_rx_csum(struct net_device *dev, uint32_t data)
+static int dm9000_set_rx_csum_unlocked(struct net_device *dev, uint32_t data)
 {
 	board_info_t *dm = to_dm9000_board(dev);
-	unsigned long flags;
 
 	if (dm->can_csum) {
 		dm->rx_csum = data;
-
-		spin_lock_irqsave(&dm->lock, flags);
 		iow(dm, DM9000_RCSR, dm->rx_csum ? RCSR_CSUM : 0);
-		spin_unlock_irqrestore(&dm->lock, flags);
 
 		return 0;
 	}
@@ -494,6 +490,19 @@ static int dm9000_set_rx_csum(struct net_device *dev, uint32_t data)
 	return -EOPNOTSUPP;
 }
 
+static int dm9000_set_rx_csum(struct net_device *dev, uint32_t data)
+{
+	board_info_t *dm = to_dm9000_board(dev);
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&dm->lock, flags);
+	ret = dm9000_set_rx_csum_unlocked(dev, data);
+	spin_unlock_irqrestore(&dm->lock, flags);
+
+	return ret;
+}
+
 static int dm9000_set_tx_csum(struct net_device *dev, uint32_t data)
 {
 	board_info_t *dm = to_dm9000_board(dev);
@@ -722,7 +731,7 @@ static unsigned char dm9000_type_to_char(enum dm9000_type type)
  *  Set DM9000 multicast address
  */
 static void
-dm9000_hash_table(struct net_device *dev)
+dm9000_hash_table_unlocked(struct net_device *dev)
 {
 	board_info_t *db = netdev_priv(dev);
 	struct dev_mc_list *mcptr;
@@ -730,12 +739,9 @@ dm9000_hash_table(struct net_device *dev)
 	u32 hash_val;
 	u16 hash_table[4];
 	u8 rcr = RCR_DIS_LONG | RCR_DIS_CRC | RCR_RXEN;
-	unsigned long flags;
 
 	dm9000_dbg(db, 1, "entering %s\n", __func__);
 
-	spin_lock_irqsave(&db->lock, flags);
-
 	for (i = 0, oft = DM9000_PAR; i < 6; i++, oft++)
 		iow(db, oft, dev->dev_addr[i]);
 
@@ -765,6 +771,16 @@ dm9000_hash_table(struct net_device *dev)
 	}
 
 	iow(db, DM9000_RCR, rcr);
+}
+
+static void
+dm9000_hash_table(struct net_device *dev)
+{
+	board_info_t *db = netdev_priv(dev);
+	unsigned long flags;
+
+	spin_lock_irqsave(&db->lock, flags);
+	dm9000_hash_table_unlocked(dev);
 	spin_unlock_irqrestore(&db->lock, flags);
 }
 
@@ -784,7 +800,7 @@ dm9000_init_dm9000(struct net_device *dev)
 	db->io_mode = ior(db, DM9000_ISR) >> 6;	/* ISR bit7:6 keeps I/O mode */
 
 	/* Checksum mode */
-	dm9000_set_rx_csum(dev, db->rx_csum);
+	dm9000_set_rx_csum_unlocked(dev, db->rx_csum);
 
 	/* GPIO0 on pre-activate PHY */
 	iow(db, DM9000_GPR, 0);	/* REG_1F bit0 activate phyxcer */
@@ -811,7 +827,7 @@ dm9000_init_dm9000(struct net_device *dev)
 	iow(db, DM9000_ISR, ISR_CLR_STATUS); /* Clear interrupt status */
 
 	/* Set address filter table */
-	dm9000_hash_table(dev);
+	dm9000_hash_table_unlocked(dev);
 
 	imr = IMR_PAR | IMR_PTM | IMR_PRM;
 	if (db->type != TYPE_DM9000E)
-- 
1.7.1
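
The pattern used by this patch (an *_unlocked worker plus a locking wrapper) can be sketched in userspace as follows. This is only an illustration under stated assumptions: a pthread mutex stands in for the driver's spinlock, and `struct sketch_board` and the `sketch_*` functions are hypothetical names, not the dm9000 code.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical board state; the mutex stands in for the driver spinlock. */
struct sketch_board {
	pthread_mutex_t lock;
	int can_csum;
	int rx_csum;
};

/* Worker: the caller must already hold b->lock. */
static int sketch_set_rx_csum_unlocked(struct sketch_board *b, int on)
{
	if (!b->can_csum)
		return -EOPNOTSUPP;
	b->rx_csum = on;
	return 0;
}

/* Locking wrapper for callers that do not hold b->lock. */
static int sketch_set_rx_csum(struct sketch_board *b, int on)
{
	int ret;

	pthread_mutex_lock(&b->lock);
	ret = sketch_set_rx_csum_unlocked(b, on);
	pthread_mutex_unlock(&b->lock);

	return ret;
}

/* Init-style path that already runs under the lock, so it calls the
 * _unlocked variant; calling the wrapper here would take the same
 * non-recursive lock twice. */
static int sketch_init(struct sketch_board *b)
{
	int ret;

	pthread_mutex_lock(&b->lock);
	ret = sketch_set_rx_csum_unlocked(b, 1);
	pthread_mutex_unlock(&b->lock);

	return ret;
}
```

Keeping the locking in one thin wrapper means each entry point acquires the lock exactly once, which is the property the recursion BUG was reporting as violated.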


^ permalink raw reply related

* [GIT] Networking
From: David Miller @ 2010-05-16  8:32 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel

1) The new SR-IOV VF netlink messages had a critical error in their
   design: they were not designed to be symmetric (therefore you can't
   take a netlink message dump and use the elements to recreate
   configurations, as you can with other netlink message types).

   Cure this by using nested netlink attributes.

   Better to fix this before it appears in a released kernel.

   Fix from Chris Wright.

2) The per-cpu TCP md5 signature state was not properly softirq
   protected, leading to all kinds of corruptions.  Fix from Eric
   Dumazet.

3) SCTP transport teardown can leave a timer running, fix from Wei
   Yongjun.

Please pull, thanks!

The following changes since commit 4fc4c3ce0dc1096cbd0daa3fe8f6905cbec2b87e:
  Linus Torvalds (1):
        Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Chris Wright (1):
      rtnetlink: make SR-IOV VF interface symmetric

Eric Dumazet (1):
      tcp: fix MD5 (RFC2385) support

Wei Yongjun (1):
      sctp: delete active ICMP proto unreachable timer when free transport

 include/linux/if_link.h |   23 ++++++-
 include/net/tcp.h       |   21 +-----
 net/core/rtnetlink.c    |  159 ++++++++++++++++++++++++++++++++--------------
 net/ipv4/tcp.c          |   34 +++++++---
 net/sctp/transport.c    |    4 +
 5 files changed, 160 insertions(+), 81 deletions(-)

^ permalink raw reply

* Re: [PATCH] rtnetlink: make SR-IOV VF interface symmetric
From: David Miller @ 2010-05-16  8:05 UTC (permalink / raw)
  To: chrisw; +Cc: kaber, mitch.a.williams, arnd, scofeldm, shemminger, netdev
In-Reply-To: <20100515031416.GE15313@sequoia.sous-sol.org>

From: Chris Wright <chrisw@sous-sol.org>
Date: Fri, 14 May 2010 20:14:16 -0700

> Now we have a set of nested attributes:
> 
>   IFLA_VFINFO_LIST (NESTED)
>     IFLA_VF_INFO (NESTED)
>       IFLA_VF_MAC
>       IFLA_VF_VLAN
>       IFLA_VF_TX_RATE
> 
> This allows a single set to operate on multiple attributes if desired.
> Among other things, it means a dump can be replayed to set state.
> 
> The current interface has yet to be released, so this seems like
> something to consider for 2.6.34.
> 
> Signed-off-by: Chris Wright <chrisw@sous-sol.org>

Agreed, applied to net-2.6, thanks Chris!

^ permalink raw reply
