Netdev List
* Re: [PATCH net-next-2.6] net: Consistent skb timestamping
From: Dimitris Michailidis @ 2010-05-16 18:30 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, eric.dumazet, netdev
In-Reply-To: <20100515.235635.63009445.davem@davemloft.net>

David Miller wrote:

> The real fix is to make the devices less stupid and give us timestamps
> directly, and thanks to things like PTP support in hardware that's
> actually more and more of a reality these days.

For cxgb4, a timestamp is written into the Rx descriptor of each received
packet.  The value comes from a TSC-like cycle counter.  The raw timestamp
is very cheap to get; converting it to system ktime costs a bit more,
though not too much.  It would be nicer if the stack could hint to the
driver whether it should do the conversion at all.  Maybe export
netstamp_needed and add an inline wrapper to read it?
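
To illustrate the idea, here is a minimal userspace sketch, not cxgb4 or
stack code.  The names stamp_needed, rx_stamp_wanted() and cycles_to_ns()
are hypothetical, the mult/shift conversion is just the usual fixed-point
style for cycle counters, and the 250 MHz clock is an assumed figure.

```c
#include <stdint.h>

/* Hypothetical stand-in for a stack-side "someone wants timestamps" flag. */
static int stamp_needed;

/* Hypothetical inline wrapper a driver could call on its Rx path. */
static inline int rx_stamp_wanted(void)
{
	return stamp_needed != 0;
}

/* Fixed-point conversion: ns = (cycles * mult) >> shift. */
static inline uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult,
				    uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/* Returns 0 when nobody asked for a timestamp, so the conversion is skipped. */
static uint64_t rx_timestamp_ns(uint64_t raw_cycles)
{
	/* Assumed 250 MHz counter: 4 ns per cycle, expressed as 1024 >> 8. */
	const uint32_t mult = 1024, shift = 8;

	if (!rx_stamp_wanted())
		return 0;
	return cycles_to_ns(raw_cycles, mult, shift);
}
```

With the flag clear, the driver would keep only the raw descriptor value
and pay for the conversion only once a consumer asks for timestamps.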



* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Michael Chan @ 2010-05-16 18:51 UTC (permalink / raw)
  To: 'Krzysztof Oledzki'; +Cc: netdev@vger.kernel.org
In-Reply-To: <alpine.LNX.1.10.1005161511490.6004@bizon.gios.gov.pl>

Krzysztof Oledzki wrote:

>
> Why the driver registers 5 interrupts instead of 4? How to
> limit it to 4?
>

The first vector (eth0-0) handles link interrupt and other slow
path events.  It also has an RX ring for non-IP packets that are
not hashed by the RSS hash.  The majority of the rx packets should
be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
vectors to different CPUs.
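
One way to act on that advice is /proc/irq/<N>/smp_affinity, which takes
a hexadecimal CPU bitmask.  The sketch below only formats the path and
the mask for pinning one IRQ to one CPU; format_affinity() is a
hypothetical helper name, and actually writing the file needs root.

```c
#include <stdio.h>

/*
 * Format the procfs path and the hex CPU mask that would pin one IRQ to
 * one CPU (cpu is assumed to be below 32).  Nothing is written here; the
 * caller decides what to do with the two strings.
 */
static void format_affinity(int irq, int cpu,
			    char *path, size_t path_len,
			    char *mask, size_t mask_len)
{
	snprintf(path, path_len, "/proc/irq/%d/smp_affinity", irq);
	snprintf(mask, mask_len, "%x", 1u << cpu);
}
```

For example, pinning a vector on IRQ 68 to CPU 2 yields the mask "4" for
/proc/irq/68/smp_affinity.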



* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 19:24 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev@vger.kernel.org
In-Reply-To: <C27F8246C663564A84BB7AB3439772421B78147539@IRVEXCHCCR01.corp.ad.broadcom.com>

On 2010-05-16 20:51, Michael Chan wrote:
> Krzysztof Oledzki wrote:
> 
>>
>> Why the driver registers 5 interrupts instead of 4? How to
>> limit it to 4?
>>
> 
> The first vector (eth0-0) handles link interrupt and other slow
> path events.  It also has an RX ring for non-IP packets that are
> not hashed by the RSS hash.  The majority of the rx packets should
> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> vectors to different CPUs.

Thank you for your prompt response.

In my case the first vector must be handling something more:
 - "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
 - "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
 - "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
 - "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2

            CPU0       CPU1       CPU2       CPU3
  67:    1563979          0          0          0   PCI-MSI-edge      eth1-0
  68:    1072869          0          0          0   PCI-MSI-edge      eth1-1
  69:     137905          0          0          0   PCI-MSI-edge      eth1-2
  70:     259246          0          0          0   PCI-MSI-edge      eth1-3
  71:     760252          0          0          0   PCI-MSI-edge      eth1-4

As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.

So, it seems that TX or RX is always handled by the first vector.
I'll try to find if it is TX or RX.

BTW: I'm using .1Q vlans over bonding, does it change anything?

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 19:49 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev@vger.kernel.org
In-Reply-To: <4BF0465A.5030307@ans.pl>

On 2010-05-16 21:24, Krzysztof Olędzki wrote:
> On 2010-05-16 20:51, Michael Chan wrote:
>> Krzysztof Oledzki wrote:
>>
>>>
>>> Why the driver registers 5 interrupts instead of 4? How to
>>> limit it to 4?
>>>
>>
>> The first vector (eth0-0) handles link interrupt and other slow
>> path events.  It also has an RX ring for non-IP packets that are
>> not hashed by the RSS hash.  The majority of the rx packets should
>> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
>> vectors to different CPUs.
>
> Thank you for your prompt response.
>
> In my case the first vector must be handling something more:
>   - "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
>   - "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
>   - "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
>   - "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2
>
>              CPU0       CPU1       CPU2       CPU3
>    67:    1563979          0          0          0   PCI-MSI-edge      eth1-0
>    68:    1072869          0          0          0   PCI-MSI-edge      eth1-1
>    69:     137905          0          0          0   PCI-MSI-edge      eth1-2
>    70:     259246          0          0          0   PCI-MSI-edge      eth1-3
>    71:     760252          0          0          0   PCI-MSI-edge      eth1-4
>
> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
>
> So, it seems that TX or RX is always handled by the first vector.
> I'll try to find if it is TX or RX.
>
> BTW: I'm using .1Q vlans over bonding, does it change anything?

It looks like TX for locally generated packets is always performed on
eth1-0. I guess it should look different for forwarded packets?

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Michael Chan @ 2010-05-16 20:00 UTC (permalink / raw)
  To: 'Krzysztof Oledzki'; +Cc: netdev@vger.kernel.org
In-Reply-To: <4BF0465A.5030307@ans.pl>

Krzysztof Oledzki wrote:

> On 2010-05-16 20:51, Michael Chan wrote:
> > Krzysztof Oledzki wrote:
> >
> >>
> >> Why the driver registers 5 interrupts instead of 4? How to
> >> limit it to 4?
> >>
> >
> > The first vector (eth0-0) handles link interrupt and other slow
> > path events.  It also has an RX ring for non-IP packets that are
> > not hashed by the RSS hash.  The majority of the rx packets should
> > be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> > vectors to different CPUs.
>
> Thank you for your prompt response.
>
> In my case the first vector must be handling something more:
>  - "ping -f 192.168.0.1" increases interrupts on both eth1-0
> and eth1-4
>  - "ping -f 192.168.0.2" increases interrupts on both eth1-0
> and eth1-3
>  - "ping -f 192.168.0.3" increases interrupts on both eth1-0
> and eth1-1
>  - "ping -f 192.168.0.7" increases interrupts on both eth1-0
> and eth1-2
>
>             CPU0       CPU1       CPU2       CPU3
>   67:    1563979          0          0          0
> PCI-MSI-edge      eth1-0
>   68:    1072869          0          0          0
> PCI-MSI-edge      eth1-1
>   69:     137905          0          0          0
> PCI-MSI-edge      eth1-2
>   70:     259246          0          0          0
> PCI-MSI-edge      eth1-3
>   71:     760252          0          0          0
> PCI-MSI-edge      eth1-4
>
> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.

I think that ICMP ping packets will always go to ring 0 (eth1-0)
because they are non-IP packets.  I need to double check tomorrow
on how exactly the hashing works on RX.  Can you try running IP
traffic?  IP packets should theoretically go to rings 1 - 4.

>
> So, it seems that TX or RX is always handled by the first vector.
> I'll try to find if it is TX or RX.
>
> BTW: I'm using .1Q vlans over bonding, does it change anything?

That should not matter, as the VLAN tag is stripped before hashing.




* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Eric Dumazet @ 2010-05-16 19:53 UTC (permalink / raw)
  To: David Miller
  Cc: shemminger, Bijay.Singh, bhaskie, bhutchings, netdev,
	ilpo.jarvinen
In-Reply-To: <20100512.152406.193725816.davem@davemloft.net>

Le mercredi 12 mai 2010 à 15:24 -0700, David Miller a écrit :
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Wed, 12 May 2010 15:22:07 -0700
> 
> > Yes, that looks like a possible bug, not sure what hardware
> > generates frag_list.
> 
> GRO generates frag_list

ixgbe (82599) does too, if I understand this driver correctly (Receive
Side Coalescing, RSC).





* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Eric Dumazet @ 2010-05-16 20:15 UTC (permalink / raw)
  To: Michael Chan; +Cc: 'Krzysztof Oledzki', netdev@vger.kernel.org
In-Reply-To: <C27F8246C663564A84BB7AB3439772421B7814753A@IRVEXCHCCR01.corp.ad.broadcom.com>

Le dimanche 16 mai 2010 à 13:00 -0700, Michael Chan a écrit :
> Krzysztof Oledzki wrote:
> 
> > On 2010-05-16 20:51, Michael Chan wrote:
> > > Krzysztof Oledzki wrote:
> > >
> > >>
> > >> Why the driver registers 5 interrupts instead of 4? How to
> > >> limit it to 4?
> > >>
> > >
> > > The first vector (eth0-0) handles link interrupt and other slow
> > > path events.  It also has an RX ring for non-IP packets that are
> > > not hashed by the RSS hash.  The majority of the rx packets should
> > > be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> > > vectors to different CPUs.
> >
> > Thank you for your prompt response.
> >
> > In my case the first vector must be handling something more:
> >  - "ping -f 192.168.0.1" increases interrupts on both eth1-0
> > and eth1-4
> >  - "ping -f 192.168.0.2" increases interrupts on both eth1-0
> > and eth1-3
> >  - "ping -f 192.168.0.3" increases interrupts on both eth1-0
> > and eth1-1
> >  - "ping -f 192.168.0.7" increases interrupts on both eth1-0
> > and eth1-2
> >
> >             CPU0       CPU1       CPU2       CPU3
> >   67:    1563979          0          0          0
> > PCI-MSI-edge      eth1-0
> >   68:    1072869          0          0          0
> > PCI-MSI-edge      eth1-1
> >   69:     137905          0          0          0
> > PCI-MSI-edge      eth1-2
> >   70:     259246          0          0          0
> > PCI-MSI-edge      eth1-3
> >   71:     760252          0          0          0
> > PCI-MSI-edge      eth1-4
> >
> > As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
> 
> I think that ICMP ping packets will always go to ring 0 (eth1-0)
> because they are non-IP packets.  I need to double check tomorrow
> on how exactly the hashing works on RX.  Can you try running IP
> traffic?  IP packets should theoretically go to rings 1 - 4.
> 

ICMP packets are IP packets (Protocol=1)

> >
> > So, it seems that TX or RX is always handled by the first vector.
> > I'll try to find if it is TX or RX.
> >
> > BTW: I'm using .1Q vlans over bonding, does it change anything?
> 
> That should not matter, as the VLAN tag is stripped before hashing.

Warning: bonding is currently not multiqueue aware.

All tx packets through bonding will use txqueue 0, since bnx2 doesn't
provide a ndo_select_queue() function.







* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Michael Chan @ 2010-05-16 20:24 UTC (permalink / raw)
  To: 'Eric Dumazet'
  Cc: 'Krzysztof Oledzki', netdev@vger.kernel.org
In-Reply-To: <1274040928.2299.17.camel@edumazet-laptop>

Eric Dumazet wrote:

> > I think that ICMP ping packets will always go to ring 0 (eth1-0)
> > because they are non-IP packets.  I need to double check tomorrow
> > on how exactly the hashing works on RX.  Can you try running IP
> > traffic?  IP packets should theoretically go to rings 1 - 4.
> >
>
> ICMP packets are IP packets (Protocol=1)
>

Sorry, Eric is right.  Anyway, I'll check on the hashing to see how
it works on UDP, TCP, and other packets.
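
The point conceded above can be sketched in a few lines: an ICMP echo is
an IPv4 packet (EtherType 0x0800) whose protocol field is 1.  This is a
minimal userspace sketch for untagged frames only; ipv4_protocol() is a
hypothetical helper and says nothing about how the bnx2 firmware
classifies packets.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4   0x0800
#define PROTO_NUM_ICMP   1	/* IANA protocol number for ICMP */
#define PROTO_NUM_TCP    6
#define PROTO_NUM_UDP    17

/*
 * Return the IPv4 protocol number of an untagged Ethernet frame,
 * or -1 if the frame is too short or is not IPv4.
 */
static int ipv4_protocol(const uint8_t *frame, size_t len)
{
	uint16_t ethertype;

	if (len < 14 + 20)		/* Ethernet header + minimal IPv4 header */
		return -1;
	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	if (ethertype != ETHERTYPE_IPV4)
		return -1;
	if ((frame[14] >> 4) != 4)	/* IP version nibble */
		return -1;
	return frame[14 + 9];		/* protocol field, byte 9 of the IPv4 header */
}
```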



* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 20:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274040928.2299.17.camel@edumazet-laptop>

On 2010-05-16 22:15, Eric Dumazet wrote:
> Le dimanche 16 mai 2010 à 13:00 -0700, Michael Chan a écrit :
>> Krzysztof Oledzki wrote:
>>
>>> On 2010-05-16 20:51, Michael Chan wrote:
>>>> Krzysztof Oledzki wrote:
>>>>
>>>>>
>>>>> Why the driver registers 5 interrupts instead of 4? How to
>>>>> limit it to 4?
>>>>>
>>>>
>>>> The first vector (eth0-0) handles link interrupt and other slow
>>>> path events.  It also has an RX ring for non-IP packets that are
>>>> not hashed by the RSS hash.  The majority of the rx packets should
>>>> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
>>>> vectors to different CPUs.
>>>
>>> Thank you for your prompt response.
>>>
>>> In my case the first vector must be handling something more:
>>>   - "ping -f 192.168.0.1" increases interrupts on both eth1-0
>>> and eth1-4
>>>   - "ping -f 192.168.0.2" increases interrupts on both eth1-0
>>> and eth1-3
>>>   - "ping -f 192.168.0.3" increases interrupts on both eth1-0
>>> and eth1-1
>>>   - "ping -f 192.168.0.7" increases interrupts on both eth1-0
>>> and eth1-2
>>>
>>>              CPU0       CPU1       CPU2       CPU3
>>>    67:    1563979          0          0          0
>>> PCI-MSI-edge      eth1-0
>>>    68:    1072869          0          0          0
>>> PCI-MSI-edge      eth1-1
>>>    69:     137905          0          0          0
>>> PCI-MSI-edge      eth1-2
>>>    70:     259246          0          0          0
>>> PCI-MSI-edge      eth1-3
>>>    71:     760252          0          0          0
>>> PCI-MSI-edge      eth1-4
>>>
>>> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
>>
>> I think that ICMP ping packets will always go to ring 0 (eth1-0)
>> because they are non-IP packets.  I need to double check tomorrow
>> on how exactly the hashing works on RX.  Can you try running IP
>> traffic?  IP packets should theoretically go to rings 1 - 4.
>>
>
> ICMP packets are IP packets (Protocol=1)

Exactly. However, the firmware may handle ICMP and TCP in a different way.

>>> So, it seems that TX or RX is always handled by the first vector.
>>> I'll try to find if it is TX or RX.
>>>
>>> BTW: I'm using .1Q vlans over bonding, does it change anything?
>>
>> That should not matter, as the VLAN tag is stripped before hashing.
>
> warning, bonding currently is not multiqueue aware.
>
> All tx packets through bonding will use txqueue 0, since bnx2 doesnt
> provide a ndo_select_queue() function.

OK, that explains everything. Thank you, Eric. I assume it may take some
time for bonding to become multiqueue aware and/or bnx2 to provide
ndo_select_queue?

BTW: With a normal router workload, should I expect a big performance drop
when receiving and forwarding the same packet using different CPUs?
Bonding provides very important functionality; I'm not able to drop it. :(

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Eric Dumazet @ 2010-05-16 20:47 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF056F0.8010008@ans.pl>

Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
> On 2010-05-16 22:15, Eric Dumazet wrote:

> > All tx packets through bonding will use txqueue 0, since bnx2 doesnt
> > provide a ndo_select_queue() function.
> 
> OK, that explains everything. Thank you Eric. I assume it may take some 
> time for bonding to become multiqueue aware and/or bnx2x to provide 
> ndo_select_queue?
> 

bonding might become multiqueue aware; there are several patches
floating around.

But with your ping tests, it won't change the selected txqueue anyway (it
will be the same for any target, because skb_tx_hash() won't hash the
destination address, only skb->protocol).

> BTW: With a normal router workload, should I expect big performance drop 
> when receiving and forwarding the same packet using different CPUs? 
> Bonding provides very important functionality, I'm not able to drop it. :(
> 

Not sure what you mean by forwarding the same packet using different CPUs.
You probably meant different queues, because in the normal case only one
cpu is involved (the one receiving the packet is also the one
transmitting it, unless you have congestion or traffic shaping).

If you have 4 cpus, you can use the following patch to make bonding
transparent to multiqueue. The bonding xmit path still hits a global
rwlock, though, so performance will not match what you get without bonding.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5e12462..2c257f7 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)
 
 	rtnl_lock();
 
-	bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
-				bond_setup);
+	bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
+				bond_setup, 4);
 	if (!bond_dev) {
 		pr_err("%s: eek! can't alloc netdev!\n", name);
 		rtnl_unlock();




* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Eric Dumazet @ 2010-05-16 20:48 UTC (permalink / raw)
  To: Bijay Singh
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>, netdev, Ilpo Järvinen
In-Reply-To: <1273611036.2512.18.camel@edumazet-laptop>

Le mardi 11 mai 2010 à 22:50 +0200, Eric Dumazet a écrit :
> Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit :
> > Hi Eric,
> > 
> > I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches.
> > 
> > Thanks
> 
> 
> I believe third problem comes from commit 4957faad
> (TCPCT part 1g: Responder Cookie => Initiator), from William Allen
> Simpson.
> 
> When a SYN-ACK packet is built (in tcp_synack_options()),
> it specifically forbids a TIMESTAMP option to be included if SACK is
> also selected :
> 
> doing_ts &= !ireq->sack_ok;
> 
> Problem is this mask is done on a local variable. socket is still marked
> as being timestamp enabled.
> 
> 
> Later, when we build tcp options for data packets, we _include_ a
> timestamp, while our SYNACK didnt mention the option.  
> 
> So the following trafic can happen (and fails) :
> 
> 18:38:29.041966 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [S], seq 4014064674, win 8860, options [mss 4430,sackOK,TS val 519041 ecr 0,nop,wscale 7,nop,nop,md5can't check - 9b44126367effcf3247fcbf6da76b24d], length 0
> 18:38:29.042072 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [S.], seq 586328714, ack 4014064675, win 5792, options [nop,nop,md5can't check - badd847799ded46f39642c341cc7e92b,mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
> 18:38:29.042093 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], ack 1, win 70, options [nop,nop,md5can't check - 3994ef6987df02a592963fba04c5d313], length 0
> 18:38:29.043217 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], seq 1:1441, ack 1, win 70, options [nop,nop,md5can't check - 8399f7ccab3a6b8c5a3027ed58bba314], length 1440
> 18:38:29.043226 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [P.], seq 1441:2501, ack 1, win 70, options [nop,nop,md5can't check - 701ebf65b1894a6bed4cefbf7a56596a], length 1060
> 18:38:29.043374 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 1441, win 68, options [nop,nop,md5can't check - 1badb315ba436ab59bff5b37daa871be,nop,nop,TS val 113051377 ecr 519041], length 0
> 18:38:29.043383 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 2501, win 91, options [nop,nop,md5can't check - 120564dcb99f822f3b70910282a6ed9d,nop,nop,TS val 113051377 ecr 519041], length 0
> 18:38:29.043673 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051377 ecr 519041], length 1428
> 18:38:29.043681 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [P.], seq 1429:2500, ack 2501, win 91, options [nop,nop,md5can't check - 7a910cd5ff357bf0e2c8d3489aafaa86,nop,nop,TS val 113051377 ecr 519041], length 1071
> 18:38:32.037786 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051677 ecr 519041], length 1428
> 18:38:38.037708 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113052277 ecr 519041], length 1428
> 18:38:50.037524 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113053477 ecr 519041], length 1428
> 
> 
> Could you try following patch ?
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 5db3a2c..0be21cd 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -668,7 +668,7 @@ static unsigned tcp_synack_options(struct sock *sk,
>  	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
>  			 xvp->cookie_plus :
>  			 0;
> -	bool doing_ts = ireq->tstamp_ok;
> +	bool doing_ts;
>  
>  #ifdef CONFIG_TCP_MD5SIG
>  	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
> @@ -681,11 +681,12 @@ static unsigned tcp_synack_options(struct sock *sk,
>  		 * rather than TS in order to fit in better with old,
>  		 * buggy kernels, but that was deemed to be unnecessary.
>  		 */
> -		doing_ts &= !ireq->sack_ok;
> +		ireq->tstamp_ok &= !ireq->sack_ok;
>  	}
>  #else
>  	*md5 = NULL;
>  #endif
> +	doing_ts = ireq->tstamp_ok;
>  
>  	/* We always send an MSS option. */
>  	opts->mss = mss;
> 
> 
> 
> 

Bijay, have you tested this patch by any chance?

Thanks
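
For readers following the quoted analysis, here is a minimal userspace
sketch of the failure mode: masking a local copy leaves the stored
request state untouched, so later packets disagree with the SYN-ACK.
struct fake_req and the three helpers are hypothetical stand-ins, not
the kernel's structures or functions.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for the request state. */
struct fake_req {
	bool tstamp_ok;
	bool sack_ok;
	bool md5;
};

/* Buggy shape: the mask is applied to a local copy only. */
static bool synack_has_ts_buggy(struct fake_req *req)
{
	bool doing_ts = req->tstamp_ok;

	if (req->md5)
		doing_ts &= !req->sack_ok;
	return doing_ts;
}

/* Fixed shape: the decision is stored back into the request. */
static bool synack_has_ts_fixed(struct fake_req *req)
{
	if (req->md5)
		req->tstamp_ok &= !req->sack_ok;
	return req->tstamp_ok;
}

/* Later packets consult the stored state, not the SYN-ACK's local. */
static bool data_has_ts(const struct fake_req *req)
{
	return req->tstamp_ok;
}
```

With MD5, SACK and timestamps all requested, the buggy shape sends a
SYN-ACK without a timestamp while data_has_ts() still reports true; the
fixed shape keeps the two in agreement.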




* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: George B. @ 2010-05-16 21:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Krzysztof Olędzki, Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274042826.2299.26.camel@edumazet-laptop>

2010/5/16 Eric Dumazet <eric.dumazet@gmail.com>:
> Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
>> On 2010-05-16 22:15, Eric Dumazet wrote:
>
>> > All tx packets through bonding will use txqueue 0, since bnx2 doesnt
>> > provide a ndo_select_queue() function.
>>
>> OK, that explains everything. Thank you Eric. I assume it may take some
>> time for bonding to become multiqueue aware and/or bnx2x to provide
>> ndo_select_queue?
>>
>
> bonding might become multiqueue aware, there are several patches
> floating around.
>
> But with your ping tests, it wont change the selected txqueue anyway (it
> will be the same for any targets, because skb_tx_hash() wont hash the
> destination address, only the skb->protocol.
>
>> BTW: With a normal router workload, should I expect big performance drop
>> when receiving and forwarding the same packet using different CPUs?
>> Bonding provides very important functionality, I'm not able to drop it. :(
>>
>
> Not sure what you mean by forwarding same packet using different CPUs.
> You probably meant different queues, because in normal case, only one
> cpu is involved (the one receiving the packet is also the one
> transmitting it, unless you have congestion or trafic shaping)
>
> If you have 4 cpus, you can use following patch and have a transparent
> bonding against multiqueue. Still bonding xmit path hits a global
> rwlock, so performance is not what you can get without bonding.
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 5e12462..2c257f7 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)
>
>        rtnl_lock();
>
> -       bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
> -                               bond_setup);
> +       bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
> +                               bond_setup, 4);
>        if (!bond_dev) {
>                pr_err("%s: eek! can't alloc netdev!\n", name);
>                rtnl_unlock();
>
>

FWIW, later this week I will be comparing VLANs on top of bonded ethernet
interfaces against bonds built from vlan interfaces (create a vlan on
each of two interfaces and bond those together) to see if I can notice
any performance difference. I expect I will when two or more vlans are
experiencing heavy traffic.  What concerns me is whether, if one ethernet
goes away, the bond interface will see that the ethernet underlying the
vlan interface has gone down.

So in summary, rather than bonding ethernet interfaces and then
applying vlans to the bond, I intend to create vlans on the ethernet
interfaces and bond them. So one bond interface per vlan plus one for
the "raw" interfaces.  I am hoping that will allow better throughput
with multiple processors (and less head-of-line blocking for vlans
with low traffic rates).  Note: that configuration doesn't work with
2.6.32, and I haven't tried it with 2.6.33; 2.6.34-rc7 allows me to
configure it, though I haven't tested it yet on a multiqueue
ethernet with multiple processors.  I should have some systems to test
with later this week.


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Krzysztof Olędzki @ 2010-05-16 21:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274042826.2299.26.camel@edumazet-laptop>

On 2010-05-16 22:47, Eric Dumazet wrote:
> Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
>> On 2010-05-16 22:15, Eric Dumazet wrote:
>
>>> All tx packets through bonding will use txqueue 0, since bnx2 doesnt
>>> provide a ndo_select_queue() function.
>>
>> OK, that explains everything. Thank you Eric. I assume it may take some
>> time for bonding to become multiqueue aware and/or bnx2x to provide
>> ndo_select_queue?
>>
>
> bonding might become multiqueue aware, there are several patches
> floating around.
>
> But with your ping tests, it wont change the selected txqueue anyway (it
> will be the same for any targets, because skb_tx_hash() wont hash the
> destination address, only the skb->protocol.

What do you mean by "wont hash the destination address, only the 
skb->protocol"? It won't hash the destination address for ICMP or for 
all IP protocols?

My normal workload is TCP and UDP based so if it is only ICMP then there 
is no problem. Actually I have noticeably more UDP traffic than an 
average network, mainly because of LWAPP/CAPWAP, so I'm interested in 
good performance for both TCP and UDP.

During my initial tests ICMP ping showed the same behavior as UDP/TCP
with iperf, so I stuck with it. I'll redo everything with UDP and TCP
of course. :)

>> BTW: With a normal router workload, should I expect big performance drop
>> when receiving and forwarding the same packet using different CPUs?
>> Bonding provides very important functionality, I'm not able to drop it. :(
>>
>
> Not sure what you mean by forwarding same packet using different CPUs.
> You probably meant different queues, because in normal case, only one
> cpu is involved (the one receiving the packet is also the one
> transmitting it, unless you have congestion or trafic shaping)

I mean receiving it on one CPU and sending it on a different one. I
would like to assign different vectors (eth1-0 .. eth1-4) to different
CPUs, but with bnx2+bonding packets are received on queues 1-4 (eth1-1
.. eth1-4) and sent from queue 0 (eth1-0). So, for one packet, two
different CPUs will be involved (RX on q1-q4, TX on q0).

> If you have 4 cpus, you can use following patch and have a transparent
> bonding against multiqueue.

Thanks! If I get it right: with the patch, packets should be sent using 
the same CPU (queue?) that was used when receiving?

> Still bonding xmit path hits a global
> rwlock, so performance is not what you can get without bonding.

It may not be perfect, but it should be much better than nothing, right?

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
From: Eric Dumazet @ 2010-05-16 21:26 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF05FC2.4020804@ans.pl>

Le dimanche 16 mai 2010 à 23:12 +0200, Krzysztof Olędzki a écrit :
> On 2010-05-16 22:47, Eric Dumazet wrote:
> > Le dimanche 16 mai 2010 à 22:34 +0200, Krzysztof Olędzki a écrit :
> >> On 2010-05-16 22:15, Eric Dumazet wrote:
> >
> >>> All tx packets through bonding will use txqueue 0, since bnx2 doesnt
> >>> provide a ndo_select_queue() function.
> >>
> >> OK, that explains everything. Thank you Eric. I assume it may take some
> >> time for bonding to become multiqueue aware and/or bnx2x to provide
> >> ndo_select_queue?
> >>
> >
> > bonding might become multiqueue aware, there are several patches
> > floating around.
> >
> > But with your ping tests, it wont change the selected txqueue anyway (it
> > will be the same for any targets, because skb_tx_hash() wont hash the
> > destination address, only the skb->protocol.
> 
> What do you mean by "wont hash the destination address, only the 
> skb->protocol"? It won't hash the destination address for ICMP or for 
> all IP protocols?

Locally generated ICMP packets all use the same tx queue, because
sk->sk_hash is not set:

        if (skb->sk && skb->sk->sk_hash)
                hash = skb->sk->sk_hash;
        else
                hash = (__force u16) skb->protocol;

        hash = jhash_1word(hash, hashrnd);

        return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
 



However, replies will be spread across four queues if the hardware is
capable of hashing ICMP packets using IP addresses (source/destination).
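
The selection logic in the snippet above can be tried in userspace with
the sketch below.  mix32() is a stand-in mixer and NOT the kernel's
jhash_1word(); pick_txq() and the seed parameter are hypothetical names.
The destination address is never an input, which is why every ping
target ends up on the same queue.

```c
#include <stdint.h>

/* Stand-in 32-bit mixer; NOT the kernel's jhash_1word(). */
static uint32_t mix32(uint32_t x, uint32_t seed)
{
	x ^= seed;
	x ^= x >> 16;
	x *= 0x7feb352dU;
	x ^= x >> 15;
	x *= 0x846ca68bU;
	x ^= x >> 16;
	return x;
}

/*
 * Pick a tx queue: use the socket hash when there is one, else fall back
 * to the protocol.  The multiply-shift scales a 32-bit hash into [0, nq).
 */
static uint16_t pick_txq(uint32_t sk_hash, uint16_t protocol,
			 uint32_t seed, uint32_t nq)
{
	uint32_t hash = sk_hash ? sk_hash : protocol;

	hash = mix32(hash, seed);
	return (uint16_t)(((uint64_t)hash * nq) >> 32);
}
```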

> 
> My normal workload is TCP and UDP based so if it is only ICMP then there 
> is no problem. Actually I have noticeably more UDP traffic than an 
> average network, mainly because of LWAPP/CAPWAP, so I'm interested in 
> good performance for both TCP and UDP.
> 
> During my initial tests ICMP ping showed the same behavior like UDP/TCP 
> with iperf, so I sticked with it. I'll redo everyting with UDP and TCP 
> of course. :)
> 
> >> BTW: With a normal router workload, should I expect big performance drop
> >> when receiving and forwarding the same packet using different CPUs?
> >> Bonding provides very important functionality, I'm not able to drop it. :(
> >>
> >
> > Not sure what you mean by forwarding same packet using different CPUs.
> > You probably meant different queues, because in normal case, only one
> > cpu is involved (the one receiving the packet is also the one
> > transmitting it, unless you have congestion or trafic shaping)
> 
> I mean to receive it on a one CPU and to send it on a different one. I 
> would like to assing different vectors (eth1-0 .. eth1-4) to different 
> CPUs, but with bnx2x+bonding packets are received on queues 1-4 (eth1-1 
> .. eth1-4) and sent from queue 0 (eth1-0). So, for a one packet, two 
> different CPUs will be involved (RX on q1-q4, TX on q0).

As I said, unless you use RPS, a forwarded packet only uses one CPU.
How the tx queue is selected is another story. We try to do a 1-1 mapping.

> 
> > If you have 4 cpus, you can use following patch and have a transparent
> > bonding against multiqueue.
> 
> Thanks! If I get it right: with the patch, packets should be sent using 
> the same CPU (queue?) that was used when receiving?

Yes, for forwarding loads.

(You might use 5 or 8 instead of 4, because it's not clear to me whether
bnx2 has 5 txqueues or 4 in your case.)

> 
> > Still bonding xmit path hits a global
> > rwlock, so performance is not what you can get without bonding.
> 
> It may not be perfect, but it should be much better than nothing, right?
> 

Sure.



^ permalink raw reply

* Re: [patch] pm_qos update fixing mmotm 2010-05-11 -dies in pm_qos_update_request()
From: Rafael J. Wysocki @ 2010-05-16 22:21 UTC (permalink / raw)
  To: markgross
  Cc: mgross, Valdis.Kletnieks, e1000-devel, netdev, linux-kernel,
	linux-pm, akpm, davem
In-Reply-To: <20100515214256.GA3506@thegnar.org>

On Saturday 15 May 2010, mgross wrote:
> On Sat, May 15, 2010 at 09:38:47PM +0200, Rafael J. Wysocki wrote:
> > On Saturday 15 May 2010, mgross wrote:
> > > I apologize for the goofy email address.  
> > > 
> > > The following is a fix for the crash reported by Valdis.
> > > 
> > > The problem was that the original pm_qos silently fails when a request
> > > update is passed to a parameter that has not been added to the list
> > > yet.  It seems that the e1000e is doing this.  This update restores this
> > > behavior.
> > > 
> > > I need to think about how to better handle such abuse, but for now this
> > > restores the original behavior.
> > 
> > Can you please post a signed-off incremental patch against
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git for-llinus
> > 
> > that contains your original PM QOS update?
> 
> No problem:
>
> Signed-off-by: markgross <markgross@thegnar.org>

Thanks!  Do you want to use this address for the sign-off or the Intel one?

Rafael

 
> From 487b8dcaeb66d3c226d4c06c1bd99689f93024be Mon Sep 17 00:00:00 2001
> From: mgross <mgross@mgross-desktop.(none)>
> Date: Sat, 15 May 2010 14:30:15 -0700
> Subject: [PATCH] Gard against pm_qos users calling API before registering a proper
>  request.
> 
> This update handles a use case where pm_qos update requests need to
> silently fail if the update is being sent to a handle that is null.
> 
> The problem was that the original pm_qos silently fails when a request
> update is passed to a parameter that has not been added to the list yet.
> This update restores that behavior.
> 
> Signed-off-by: markgross <markgross@thegnar.org>
> 
> ---
>  kernel/pm_qos_params.c |   26 ++++++++++++++------------
>  1 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> index a1aea04..f42d3f7 100644
> --- a/kernel/pm_qos_params.c
> +++ b/kernel/pm_qos_params.c
> @@ -252,19 +252,21 @@ void pm_qos_update_request(struct pm_qos_request_list *pm_qos_req,
>  	int pending_update = 0;
>  	s32 temp;
>  
> -	spin_lock_irqsave(&pm_qos_lock, flags);
> -	if (new_value == PM_QOS_DEFAULT_VALUE)
> -		temp = pm_qos_array[pm_qos_req->pm_qos_class]->default_value;
> -	else
> -		temp = new_value;
> -
> -	if (temp != pm_qos_req->value) {
> -		pending_update = 1;
> -		pm_qos_req->value = temp;
> +	if (pm_qos_req) { /*guard against callers passing in null */
> +		spin_lock_irqsave(&pm_qos_lock, flags);
> +		if (new_value == PM_QOS_DEFAULT_VALUE)
> +			temp = pm_qos_array[pm_qos_req->pm_qos_class]->default_value;
> +		else
> +			temp = new_value;
> +
> +		if (temp != pm_qos_req->value) {
> +			pending_update = 1;
> +			pm_qos_req->value = temp;
> +		}
> +		spin_unlock_irqrestore(&pm_qos_lock, flags);
> +		if (pending_update)
> +			update_target(pm_qos_req->pm_qos_class);
>  	}
> -	spin_unlock_irqrestore(&pm_qos_lock, flags);
> -	if (pending_update)
> -		update_target(pm_qos_req->pm_qos_class);
>  }
>  EXPORT_SYMBOL_GPL(pm_qos_update_request);
>  
> 
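
The guard described in the patch above can be modelled in userspace as
follows. This is only a sketch under stated assumptions: struct qos_request,
qos_update_request() and DEFAULT_VALUE are simplified stand-ins made up for
illustration, locking is omitted, and the early return is a stylistic
variation on the posted patch, which nests the body in an if instead.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_VALUE (-1)	/* stand-in for PM_QOS_DEFAULT_VALUE */

struct qos_request {		/* hypothetical, simplified request */
	int class_default;	/* default value of the request's class */
	int value;		/* currently requested value */
};

/* Returns 1 if the stored value changed, 0 otherwise. */
static int qos_update_request(struct qos_request *req, int new_value)
{
	int temp;

	if (req == NULL)	/* not registered yet: silently ignore */
		return 0;

	temp = (new_value == DEFAULT_VALUE) ? req->class_default : new_value;
	if (temp == req->value)
		return 0;

	req->value = temp;
	return 1;
}

/* Example: an update through a NULL handle is a no-op. */
static void qos_example(void)
{
	struct qos_request req = { .class_default = 2000, .value = 2000 };
	int changed;

	changed = qos_update_request(NULL, 55);
	assert(changed == 0);

	changed = qos_update_request(&req, 55);
	assert(changed == 1 && req.value == 55);

	changed = qos_update_request(&req, DEFAULT_VALUE);
	assert(changed == 1 && req.value == 2000);
}
```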


^ permalink raw reply

* Re: [PATCH 13/37] drivers/net/wireless/iwmc3200wifi: Use kmemdup
From: Samuel Ortiz @ 2010-05-16 23:01 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Zhu, Yi, Intel Linux Wireless, John W. Linville,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-janitors@vger.kernel.org
In-Reply-To: <Pine.LNX.4.64.1005152316400.21345@ask.diku.dk>

On Sat, May 15, 2010 at 10:16:58PM +0100, Julia Lawall wrote:
> From: Julia Lawall <julia@diku.dk>
> 
> Use kmemdup when some other buffer is immediately copied into the
> allocated region.
> 
> A simplified version of the semantic patch that makes this change is as
> follows: (http://coccinelle.lip6.fr/)
> 
> // <smpl>
> @@
> expression from,to,size,flag;
> statement S;
> @@
> 
> -  to = \(kmalloc\|kzalloc\)(size,flag);
> +  to = kmemdup(from,size,flag);
>    if (to==NULL || ...) S
> -  memcpy(to, from, size);
> // </smpl>
> 
> Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>

Cheers,
Samuel.
 
> ---
>  drivers/net/wireless/iwmc3200wifi/rx.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff -u -p a/drivers/net/wireless/iwmc3200wifi/rx.c b/drivers/net/wireless/iwmc3200wifi/rx.c
> --- a/drivers/net/wireless/iwmc3200wifi/rx.c
> +++ b/drivers/net/wireless/iwmc3200wifi/rx.c
> @@ -321,14 +321,14 @@ iwm_rx_ticket_node_alloc(struct iwm_priv
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> -	ticket_node->ticket = kzalloc(sizeof(struct iwm_rx_ticket), GFP_KERNEL);
> +	ticket_node->ticket = kmemdup(ticket, sizeof(struct iwm_rx_ticket),
> +				      GFP_KERNEL);
>  	if (!ticket_node->ticket) {
>  		IWM_ERR(iwm, "Couldn't allocate RX ticket\n");
>  		kfree(ticket_node);
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> -	memcpy(ticket_node->ticket, ticket, sizeof(struct iwm_rx_ticket));
>  	INIT_LIST_HEAD(&ticket_node->node);
>  
>  	return ticket_node;
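
For readers unfamiliar with the helper, the allocate-then-copy pattern that
the semantic patch above collapses can be sketched in userspace like this.
dup_bytes() and struct ticket are hypothetical names for illustration; the
sketch uses malloc() and is not the kernel's kmemdup() implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for a memdup-style helper: allocate len bytes and copy
 * src into them. Returns NULL on allocation failure; the caller frees the
 * result.
 */
static void *dup_bytes(const void *src, size_t len)
{
	void *p;

	assert(src != NULL);

	p = malloc(len);
	if (p)
		memcpy(p, src, len);
	return p;
}

struct ticket {			/* hypothetical payload type */
	int id;
	int flags;
};
```

A caller then needs one call and one error check, for example
struct ticket *copy = dup_bytes(&orig, sizeof(orig)); followed by a NULL
test, instead of a separate allocation and memcpy().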

-- 
Intel Open Source Technology Centre
http://oss.intel.com/

^ permalink raw reply

* Re: [patch] pm_qos update fixing mmotm 2010-05-11 -dies in pm_qos_update_request()
From: mgross @ 2010-05-17  0:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: markgross, Valdis.Kletnieks, mgross, akpm, davem, linux-kernel,
	e1000-devel, netdev, linux-pm
In-Reply-To: <201005170021.25427.rjw@sisk.pl>

On Mon, May 17, 2010 at 12:21:25AM +0200, Rafael J. Wysocki wrote:
> On Saturday 15 May 2010, mgross wrote:
> > On Sat, May 15, 2010 at 09:38:47PM +0200, Rafael J. Wysocki wrote:
> > > On Saturday 15 May 2010, mgross wrote:
> > > > I apologize for the goofy email address.  
> > > > 
> > > > The following is a fix for the crash reported by Valdis.
> > > > 
> > > > The problem was that the original pm_qos silently fails when a request
> > > > update is passed to a parameter that has not been added to the list
> > > > yet.  It seems that the e1000e is doing this.  This update restores this
> > > > behavior.
> > > > 
> > > > I need to think about how to better handle such abuse, but for now this
> > > > restores the original behavior.
> > > 
> > > Can you please post a signed-off incremental patch against
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git for-llinus
> > > 
> > > that contains your original PM QOS update?
> > 
> > No problem:
> >
> > Signed-off-by: markgross <markgross@thegnar.org>
> 
> Thanks!  Do you want to use this address for the sign-off or the Intel one?

I guess so.  Ever since switching groups within Intel last summer my
mgross@linux.intel.com address isn't checked as often as this one. 

The other option is to use my Outlook email (mark.gross@intel.com), but
I really hate posting from Outlook.  Besides, doing upstream kernel
stuff isn't my day job any more, so using markgross@thegnar.org makes
sense to me.

thanks,

--mgross




^ permalink raw reply

* [PATCH] phy/marvell: Add special settings for D-Link DNS-323 rev C1
From: Benjamin Herrenschmidt @ 2010-05-17  0:27 UTC (permalink / raw)
  To: netdev; +Cc: linux-arm-kernel, Nicolas Pitre, Herbert Valerio Riedel

Without this change, the network LED doesn't work on the device. The
value itself comes from the vendor kernel.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 drivers/net/phy/marvell.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 64c7fbe..22b1efa 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -34,6 +34,10 @@
 #include <asm/irq.h>
 #include <asm/uaccess.h>
 
+#ifdef CONFIG_ARM
+#include <asm/mach-types.h>
+#endif
+
 #define MII_M1011_IEVENT		0x13
 #define MII_M1011_IEVENT_CLEAR		0x0000
 
@@ -350,7 +354,14 @@ static int m88e1118_config_init(struct phy_device *phydev)
 		return err;
 
 	/* Adjust LED Control */
+#ifdef CONFIG_MACH_DNS323
+	/* The DNS-323 needs a special value in here for the LED to work */
+	if (machine_is_dns323())
+		err = phy_write(phydev, 0x10, 0x1100);
+	else
+#else
 	err = phy_write(phydev, 0x10, 0x021e);
+#endif
 	if (err < 0)
 		return err;
 



^ permalink raw reply related

* Re: [PATCH] phy/marvell: Add special settings for D-Link DNS-323 rev C1
From: Wolfram Sang @ 2010-05-17  0:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: netdev, Nicolas Pitre, linux-arm-kernel, Herbert Valerio Riedel
In-Reply-To: <1274056058.21352.697.camel@pasglop>


On Mon, May 17, 2010 at 10:27:38AM +1000, Benjamin Herrenschmidt wrote:
> Without this change, the network LED doesn't work on the device. The
> value itself comes from the vendor kernel.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  drivers/net/phy/marvell.c |   11 +++++++++++
>  1 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> index 64c7fbe..22b1efa 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -34,6 +34,10 @@
>  #include <asm/irq.h>
>  #include <asm/uaccess.h>
>  
> +#ifdef CONFIG_ARM
> +#include <asm/mach-types.h>
> +#endif
> +
>  #define MII_M1011_IEVENT		0x13
>  #define MII_M1011_IEVENT_CLEAR		0x0000
>  
> @@ -350,7 +354,14 @@ static int m88e1118_config_init(struct phy_device *phydev)
>  		return err;
>  
>  	/* Adjust LED Control */
> +#ifdef CONFIG_MACH_DNS323
> +	/* The DNS-323 needs a special value in here for the LED to work */
> +	if (machine_is_dns323())
> +		err = phy_write(phydev, 0x10, 0x1100);
> +	else
> +#else
>  	err = phy_write(phydev, 0x10, 0x021e);
> +#endif

There is a fixup()-callback to prevent boardcode in the drivers. See
Documentation/networking/phy.txt, last chapter.
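
To illustrate the idea of keeping board code out of the driver, here is a
small userspace model of a fixup registry. Everything in it is an
assumption made for illustration: struct phy_model, register_fixup() and
run_fixups() are invented names, not the phylib API, and the PHY ID used in
any caller is just an example value. The real mechanism is the one
described in Documentation/networking/phy.txt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct phy_model {		/* hypothetical stand-in for a PHY device */
	uint32_t phy_id;
	uint16_t led_ctrl;	/* models the LED control register (0x10) */
};

typedef int (*fixup_fn)(struct phy_model *phy);

struct fixup {
	uint32_t id;
	uint32_t mask;
	fixup_fn run;
};

#define MAX_FIXUPS 8
static struct fixup fixups[MAX_FIXUPS];
static size_t nr_fixups;

/* Board code registers a callback for PHYs whose ID matches under mask. */
static int register_fixup(uint32_t id, uint32_t mask, fixup_fn run)
{
	assert(run != NULL);

	if (nr_fixups == MAX_FIXUPS)
		return -1;
	fixups[nr_fixups++] = (struct fixup){ id, mask, run };
	return 0;
}

/* The generic driver calls this after its own init; it knows no boards. */
static int run_fixups(struct phy_model *phy)
{
	size_t i;
	int err;

	for (i = 0; i < nr_fixups; i++) {
		if ((phy->phy_id & fixups[i].mask) !=
		    (fixups[i].id & fixups[i].mask))
			continue;
		err = fixups[i].run(phy);
		if (err < 0)
			return err;
	}
	return 0;
}

/* Board-specific callback: the LED value from the patch quoted above. */
static int dns323_led_fixup(struct phy_model *phy)
{
	phy->led_ctrl = 0x1100;
	return 0;
}
```

In this model the driver keeps writing its default LED value and the board
file overrides it afterwards, so no #ifdef or machine check is needed in
the driver itself.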

-- 
Pengutronix e.K.                           | Wolfram Sang                |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |


^ permalink raw reply

* Re: [PATCH] phy/marvell: Add special settings for D-Link DNS-323 rev C1
From: Benjamin Herrenschmidt @ 2010-05-17  1:07 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: netdev, Nicolas Pitre, linux-arm-kernel, Herbert Valerio Riedel
In-Reply-To: <20100517005942.GB27301@pengutronix.de>

On Mon, 2010-05-17 at 02:59 +0200, Wolfram Sang wrote:
> There is a fixup()-callback to prevent boardcode in the drivers. See
> Documentation/networking/phy.txt, last chapter.

Ah nice ! I missed that bit. I'll add a fixup and see if it works.

The problem is that writing to this register seems to be part of a
specific initialization sequence, which is done one way in the linux
driver and differently in the vendor kernel. I don't know whether
I can just 'override' the value and I have no docs for that part.

But I'll definitely give it a go tonight.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 1/9] mm: add generic adaptive large memory allocation APIs
From: KOSAKI Motohiro @ 2010-05-17  1:34 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: kosaki.motohiro, peterz, xiaosuo, akpm, hnguyen, raisch, rolandd,
	sean.hefty, hal.rosenstock, divy, James.Bottomley, tytso, adilger,
	viro, menage, lizf, linux-rdma, linux-kernel, netdev, linux-scsi,
	linux-ext4, linux-fsdevel, linux-mm, containers, eric.dumazet
In-Reply-To: <201005132236.ADJ57893.FLFFMtOVJHOOSQ@I-love.SAKURA.ne.jp>

> Peter Zijlstra wrote:
> > NAK, I really utterly dislike that inatomic argument. The alloc side
> > doesn't function in atomic context either. Please keep the thing
> > symmetric in that regards.
> 
> Excuse me. kmalloc(GFP_KERNEL) may sleep (and therefore cannot be used in
> atomic context). However, kfree() for memory allocated with kmalloc(GFP_KERNEL)
> never sleeps (and therefore can be used in atomic context).
> Why are kmalloc() and kfree() NOT kept symmetric?

In the kmalloc case, we need to consider both the kmalloc(GFP_KERNEL)/kfree()
pair and the kmalloc(GFP_ATOMIC)/kfree() pair. The latter is mainly used in
atomic context. Making kfree() usable in atomic context helps keep the
implementation simple.

But kvmalloc doesn't have a GFP_ATOMIC feature. That's a big difference.




^ permalink raw reply

* linux-next: manual merge of the net tree with Linus' tree
From: Stephen Rothwell @ 2010-05-17  2:09 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Jan Engelhardt, Chris Wright

Hi all,

Today's linux-next merge of the net tree got a conflict in
include/linux/if_link.h between commit
c02db8c6290bb992442fec1407643c94cc414375 ("rtnetlink: make SR-IOV VF
interface symmetric") from Linus' tree and commit
10708f37ae729baba9b67bd134c3720709d4ae62 ("net: core: add IFLA_STATS64
support") from the net tree.

Just context changes.  I fixed it up (see below) and can carry the fix
for a while.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc include/linux/if_link.h
index d94963b,cfd420b..0000000
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@@ -79,7 -111,11 +111,8 @@@ enum 
  	IFLA_NET_NS_PID,
  	IFLA_IFALIAS,
  	IFLA_NUM_VF,		/* Number of VFs if device is SR-IOV PF */
 -	IFLA_VF_MAC,		/* Hardware queue specific attributes */
 -	IFLA_VF_VLAN,
 -	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
 -	IFLA_VFINFO,
 +	IFLA_VFINFO_LIST,
+ 	IFLA_STATS64,
  	__IFLA_MAX
  };
  

^ permalink raw reply

* linux-next: manual merge of the net tree with the m68k tree
From: Stephen Rothwell @ 2010-05-17  2:11 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Geert Uytterhoeven, David Woodhouse,
	Ben Hutchings

Hi all,

Today's linux-next merge of the net tree got a conflict in
include/linux/mod_devicetable.h between commit
52498c252690651f915aecbcd10e26bcbafbe2a3 ("m68k: amiga - Zorro bus
modalias support") from the m68k tree and commit
8626d3b4328061f5b82b11ae1d6918a0c3602f42 ("phylib: Support phy module
autoloading") from the net tree.

Just overlapping additions.  I fixed it up (see below) and can carry the
fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc include/linux/mod_devicetable.h
index 56fde43,55f1f9c..0000000
--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@@ -474,13 -474,30 +474,39 @@@ struct platform_device_id 
  			__attribute__((aligned(sizeof(kernel_ulong_t))));
  };
  
 +struct zorro_device_id {
 +	__u32 id;			/* Device ID or ZORRO_WILDCARD */
 +	kernel_ulong_t driver_data;	/* Data private to the driver */
 +};
 +
 +#define ZORRO_WILDCARD			(0xffffffff)	/* not official */
 +
 +#define ZORRO_DEVICE_MODALIAS_FMT	"zorro:i%08X"
 +
+ #define MDIO_MODULE_PREFIX	"mdio:"
+ 
+ #define MDIO_ID_FMT "%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d"
+ #define MDIO_ID_ARGS(_id) \
+ 	(_id)>>31, ((_id)>>30) & 1, ((_id)>>29) & 1, ((_id)>>28) & 1,	\
+ 	((_id)>>27) & 1, ((_id)>>26) & 1, ((_id)>>25) & 1, ((_id)>>24) & 1, \
+ 	((_id)>>23) & 1, ((_id)>>22) & 1, ((_id)>>21) & 1, ((_id)>>20) & 1, \
+ 	((_id)>>19) & 1, ((_id)>>18) & 1, ((_id)>>17) & 1, ((_id)>>16) & 1, \
+ 	((_id)>>15) & 1, ((_id)>>14) & 1, ((_id)>>13) & 1, ((_id)>>12) & 1, \
+ 	((_id)>>11) & 1, ((_id)>>10) & 1, ((_id)>>9) & 1, ((_id)>>8) & 1, \
+ 	((_id)>>7) & 1, ((_id)>>6) & 1, ((_id)>>5) & 1, ((_id)>>4) & 1, \
+ 	((_id)>>3) & 1, ((_id)>>2) & 1, ((_id)>>1) & 1, (_id) & 1
+ 
+ /**
+  * struct mdio_device_id - identifies PHY devices on an MDIO/MII bus
+  * @phy_id: The result of
+  *     (mdio_read(&MII_PHYSID1) << 16 | mdio_read(&PHYSID2)) & @phy_id_mask
+  *     for this PHY type
+  * @phy_id_mask: Defines the significant bits of @phy_id.  A value of 0
+  *     is used to terminate an array of struct mdio_device_id.
+  */
+ struct mdio_device_id {
+ 	__u32 phy_id;
+ 	__u32 phy_id_mask;
+ };
+ 
  #endif /* LINUX_MOD_DEVICETABLE_H */

^ permalink raw reply

* linux-next: manual merge of the net tree with the m68k tree
From: Stephen Rothwell @ 2010-05-17  2:14 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Geert Uytterhoeven, David Woodhouse,
	Ben Hutchings

Hi all,

Today's linux-next merge of the net tree got a conflict in
scripts/mod/file2alias.c between commit
52498c252690651f915aecbcd10e26bcbafbe2a3 ("m68k: amiga - Zorro bus
modalias support") from the m68k tree and commit
8626d3b4328061f5b82b11ae1d6918a0c3602f42 ("phylib: Support phy module
autoloading") from the net tree.

Again, just overlapping additions.  I fixed it up (see below) and can
carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc scripts/mod/file2alias.c
index df90f31,36a60a8..0000000
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@@ -796,16 -796,28 +796,38 @@@ static int do_platform_entry(const cha
  	return 1;
  }
  
 +/* Looks like: zorro:iN. */
 +static int do_zorro_entry(const char *filename, struct zorro_device_id *id,
 +			  char *alias)
 +{
 +	id->id = TO_NATIVE(id->id);
 +	strcpy(alias, "zorro:");
 +	ADD(alias, "i", id->id != ZORRO_WILDCARD, id->id);
 +	return 1;
 +}
 +
+ static int do_mdio_entry(const char *filename,
+ 			 struct mdio_device_id *id, char *alias)
+ {
+ 	int i;
+ 
+ 	alias += sprintf(alias, MDIO_MODULE_PREFIX);
+ 
+ 	for (i = 0; i < 32; i++) {
+ 		if (!((id->phy_id_mask >> (31-i)) & 1))
+ 			*(alias++) = '?';
+ 		else if ((id->phy_id >> (31-i)) & 1)
+ 			*(alias++) = '1';
+ 		else
+ 			*(alias++) = '0';
+ 	}
+ 
+ 	/* Terminate the string */
+ 	*alias = 0;
+ 
+ 	return 1;
+ }
+ 
  /* Ignore any prefix, eg. some architectures prepend _ */
  static inline int sym_is(const char *symbol, const char *name)
  {
@@@ -953,10 -965,10 +975,14 @@@ void handle_moddevtable(struct module *
  		do_table(symval, sym->st_size,
  			 sizeof(struct platform_device_id), "platform",
  			 do_platform_entry, mod);
 +	else if (sym_is(symname, "__mod_zorro_device_table"))
 +		do_table(symval, sym->st_size,
 +			 sizeof(struct zorro_device_id), "zorro",
 +			 do_zorro_entry, mod);
+ 	else if (sym_is(symname, "__mod_mdio_device_table"))
+ 		do_table(symval, sym->st_size,
+ 			 sizeof(struct mdio_device_id), "mdio",
+ 			 do_mdio_entry, mod);
  	free(zeros);
  }
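
The alias format built by do_mdio_entry() in the merge result above can be
tried out with a standalone sketch. mdio_alias() and MDIO_PREFIX are
hypothetical names chosen for this example; the loop mirrors the logic
shown in the diff but is a userspace model, not the file2alias.c code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MDIO_PREFIX "mdio:"	/* same text as MDIO_MODULE_PREFIX above */

/* Write "mdio:" followed by 32 characters, most significant bit first:
 * '?' where the mask bit is clear, otherwise the ID bit as '0' or '1'.
 * buf must hold at least sizeof(MDIO_PREFIX) + 32 bytes.
 */
static void mdio_alias(char *buf, uint32_t phy_id, uint32_t phy_id_mask)
{
	int i;

	assert(buf != NULL);

	strcpy(buf, MDIO_PREFIX);
	buf += strlen(MDIO_PREFIX);

	for (i = 31; i >= 0; i--) {
		if (!((phy_id_mask >> i) & 1))
			*buf++ = '?';
		else
			*buf++ = ((phy_id >> i) & 1) ? '1' : '0';
	}

	/* Terminate the string */
	*buf = '\0';
}
```

With a mask that only covers the two highest and two lowest bits, the
result is the prefix, two binary digits, 28 wildcard characters and two
more binary digits, which is the pattern a module alias match would then be
compared against.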
  

^ permalink raw reply

* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Bijay Singh @ 2010-05-17  3:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>, netdev, Ilpo Järvinen
In-Reply-To: <1274042939.2299.27.camel@edumazet-laptop>



On 17-May-2010, at 2:18 AM, Eric Dumazet wrote:

> Le mardi 11 mai 2010 à 22:50 +0200, Eric Dumazet a écrit :
>> Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit :
>>> Hi Eric,
>>> 
>>> I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches.
>>> 
>>> Thanks
>> 
>> 
>> I believe the third problem comes from commit 4957faad
>> (TCPCT part 1g: Responder Cookie => Initiator), from William Allen
>> Simpson.
>> 
>> When a SYN-ACK packet is built (in tcp_synack_options()),
>> it specifically forbids a TIMESTAMP option from being included if SACK is
>> also selected:
>> 
>> doing_ts &= !ireq->sack_ok;
>> 
>> The problem is that this mask is applied to a local variable. The socket is
>> still marked as being timestamp enabled.
>> 
>> 
>> Later, when we build TCP options for data packets, we _include_ a
>> timestamp, while our SYNACK didn't mention the option.
>> 
>> So the following traffic can happen (and fails):
>> 
>> 18:38:29.041966 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [S], seq 4014064674, win 8860, options [mss 4430,sackOK,TS val 519041 ecr 0,nop,wscale 7,nop,nop,md5can't check - 9b44126367effcf3247fcbf6da76b24d], length 0
>> 18:38:29.042072 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [S.], seq 586328714, ack 4014064675, win 5792, options [nop,nop,md5can't check - badd847799ded46f39642c341cc7e92b,mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
>> 18:38:29.042093 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], ack 1, win 70, options [nop,nop,md5can't check - 3994ef6987df02a592963fba04c5d313], length 0
>> 18:38:29.043217 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], seq 1:1441, ack 1, win 70, options [nop,nop,md5can't check - 8399f7ccab3a6b8c5a3027ed58bba314], length 1440
>> 18:38:29.043226 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [P.], seq 1441:2501, ack 1, win 70, options [nop,nop,md5can't check - 701ebf65b1894a6bed4cefbf7a56596a], length 1060
>> 18:38:29.043374 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 1441, win 68, options [nop,nop,md5can't check - 1badb315ba436ab59bff5b37daa871be,nop,nop,TS val 113051377 ecr 519041], length 0
>> 18:38:29.043383 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 2501, win 91, options [nop,nop,md5can't check - 120564dcb99f822f3b70910282a6ed9d,nop,nop,TS val 113051377 ecr 519041], length 0
>> 18:38:29.043673 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051377 ecr 519041], length 1428
>> 18:38:29.043681 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [P.], seq 1429:2500, ack 2501, win 91, options [nop,nop,md5can't check - 7a910cd5ff357bf0e2c8d3489aafaa86,nop,nop,TS val 113051377 ecr 519041], length 1071
>> 18:38:32.037786 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051677 ecr 519041], length 1428
>> 18:38:38.037708 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113052277 ecr 519041], length 1428
>> 18:38:50.037524 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113053477 ecr 519041], length 1428
>> 
>> 
>> Could you try the following patch?
>> 
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 5db3a2c..0be21cd 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -668,7 +668,7 @@ static unsigned tcp_synack_options(struct sock *sk,
>> 	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
>> 			 xvp->cookie_plus :
>> 			 0;
>> -	bool doing_ts = ireq->tstamp_ok;
>> +	bool doing_ts;
>> 
>> #ifdef CONFIG_TCP_MD5SIG
>> 	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
>> @@ -681,11 +681,12 @@ static unsigned tcp_synack_options(struct sock *sk,
>> 		 * rather than TS in order to fit in better with old,
>> 		 * buggy kernels, but that was deemed to be unnecessary.
>> 		 */
>> -		doing_ts &= !ireq->sack_ok;
>> +		ireq->tstamp_ok &= !ireq->sack_ok;
>> 	}
>> #else
>> 	*md5 = NULL;
>> #endif
>> +	doing_ts = ireq->tstamp_ok;
>> 
>> 	/* We always send an MSS option. */
>> 	opts->mss = mss;
>> 
>> 
>> 
>> 
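
The mismatch described in the quoted analysis can be reproduced with a tiny
userspace model. This is a sketch under assumptions: struct req_model and
both function names are invented for illustration, and only the two flags
relevant to the discussion are modelled, not the real request sock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct req_model {		/* hypothetical stand-in for the request sock */
	bool tstamp_ok;		/* peer offered timestamps */
	bool sack_ok;		/* peer offered SACK */
};

/* Before the fix: only a local copy is masked, so the connection state
 * still says "timestamps enabled" after a SYN-ACK sent without the option.
 */
static bool synack_ts_local_only(const struct req_model *req, bool md5)
{
	bool doing_ts;

	assert(req != NULL);

	doing_ts = req->tstamp_ok;
	if (md5)
		doing_ts &= !req->sack_ok;
	return doing_ts;
}

/* After the fix: the decision is written back, so later segments agree
 * with what the SYN-ACK advertised.
 */
static bool synack_ts_write_back(struct req_model *req, bool md5)
{
	assert(req != NULL);

	if (md5)
		req->tstamp_ok &= !req->sack_ok;
	return req->tstamp_ok;
}
```

In the first variant the return value and req->tstamp_ok disagree when MD5
and SACK are both in use, which corresponds to the capture above where data
segments carry a TS option that the SYN-ACK never announced.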
> 
> Bijay, have you tested this patch by any chance?
> 


I am on quite an old kernel, 2.6.27, and could not apply your patches.

I then moved on to kernel 2.6.32.11; however, since then I have not been able to bring up my card. This is something I need to fix before I can test your fix. Working on that.

> Thanks
> 
> 


^ permalink raw reply

