Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net] bonding: Avoid possible de-sync with arp_validate
From: David Miller @ 2013-09-06 17:58 UTC (permalink / raw)
  To: mleitner; +Cc: netdev, naleksan, fubar, andy
In-Reply-To: <994d2dc994a801fc3aae7aa912266b09ea45d95c.1378489255.git.mleitner@redhat.com>

From: Marcelo Ricardo Leitner <mleitner@redhat.com>
Date: Fri,  6 Sep 2013 14:41:53 -0300

> Please queue this one for -stable trees too.

This is very much not the correct way to submit a patch.

First of all, you haven't signed off on the patch.

Second of all, any supplemental information that doesn't belong in the
commit message needs to go after a "---" dileanator line.

^ permalink raw reply

* Re: [PATCH net] bonding: Avoid possible de-sync with arp_validate
From: Jay Vosburgh @ 2013-09-06 17:56 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: netdev, naleksan, andy
In-Reply-To: <994d2dc994a801fc3aae7aa912266b09ea45d95c.1378489255.git.mleitner@redhat.com>

Marcelo Ricardo Leitner <mleitner@redhat.com> wrote:

>Hi Dave,
>
>Please queue this one for -stable trees too.
>
>Thanks.
>
>---8<---
>
>As bond_store_arp_validation and bond_store_arp_interval doesn't
>deal with each other, we can't chain both at bond_open, otherwise
>we are open to this de-sync state, steps in sequence:
>- bond is down
>- arp_interval = 0
>- arp_validate = 1 or 2
>- bond is up
>- arp_interval = 1
>
>This would lead to bond issuing ARP requests but never listening
>to the replies, because the tap wasn't attached. After reaching
>this state, while bond is up, setting arp_validate won't help
>as it only acts if needed.

	You need to sign off you patch.

> drivers/net/bonding/bond_main.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 39e5b1c..805d098 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3180,11 +3180,10 @@ static int bond_open(struct net_device *bond_dev)
> 	if (bond->params.miimon)  /* link check interval, in milliseconds. */
> 		queue_delayed_work(bond->wq, &bond->mii_work, 0);
>
>-	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
>+	if (bond->params.arp_interval)    /* arp interval, in milliseconds. */
> 		queue_delayed_work(bond->wq, &bond->arp_work, 0);
>-		if (bond->params.arp_validate)
>-			bond->recv_probe = bond_arp_rcv;
>-	}
>+	if (bond->params.arp_validate)
>+		bond->recv_probe = bond_arp_rcv;

	Won't this set the recv_probe when arp_interval is 0 (disabled)?
That's going to make bonding look at a bunch of traffic it's not going
to do anything with.

	Why is this better than having bonding_store_arp_interval set
recv_probe if arp_validate is set when it queues the arp_work?  And,
presumably, clear the recv_probe if arp_interval is being cleared.

	-J

> 	if (bond->params.mode == BOND_MODE_8023AD) {
> 		queue_delayed_work(bond->wq, &bond->ad_work, 0);
>-- 
>1.8.3.1
>

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH net] bonding: Avoid possible de-sync with arp_validate
From: Nikolay Aleksandrov @ 2013-09-06 17:49 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: netdev, naleksan, fubar, andy
In-Reply-To: <994d2dc994a801fc3aae7aa912266b09ea45d95c.1378489255.git.mleitner@redhat.com>

On 09/06/2013 07:41 PM, Marcelo Ricardo Leitner wrote:
> Hi Dave,
> 
> Please queue this one for -stable trees too.
> 
> Thanks.
> 
> ---8<---
> 
> As bond_store_arp_validation and bond_store_arp_interval doesn't
> deal with each other, we can't chain both at bond_open, otherwise
> we are open to this de-sync state, steps in sequence:
> - bond is down
> - arp_interval = 0
> - arp_validate = 1 or 2
> - bond is up
> - arp_interval = 1
> 
> This would lead to bond issuing ARP requests but never listening
> to the replies, because the tap wasn't attached. After reaching
> this state, while bond is up, setting arp_validate won't help
> as it only acts if needed.
> ---
>  drivers/net/bonding/bond_main.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 39e5b1c..805d098 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3180,11 +3180,10 @@ static int bond_open(struct net_device *bond_dev)
>  	if (bond->params.miimon)  /* link check interval, in milliseconds. */
>  		queue_delayed_work(bond->wq, &bond->mii_work, 0);
>  
> -	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
> +	if (bond->params.arp_interval)    /* arp interval, in milliseconds. */
>  		queue_delayed_work(bond->wq, &bond->arp_work, 0);
> -		if (bond->params.arp_validate)
> -			bond->recv_probe = bond_arp_rcv;
> -	}
> +	if (bond->params.arp_validate)
> +		bond->recv_probe = bond_arp_rcv;
>  
>  	if (bond->params.mode == BOND_MODE_8023AD) {
>  		queue_delayed_work(bond->wq, &bond->ad_work, 0);
> 
Thanks for fixing this.

Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>

^ permalink raw reply

* [PATCH net] bonding: Avoid possible de-sync with arp_validate
From: Marcelo Ricardo Leitner @ 2013-09-06 17:41 UTC (permalink / raw)
  To: netdev; +Cc: naleksan, fubar, andy

Hi Dave,

Please queue this one for -stable trees too.

Thanks.

---8<---

As bond_store_arp_validation and bond_store_arp_interval doesn't
deal with each other, we can't chain both at bond_open, otherwise
we are open to this de-sync state, steps in sequence:
- bond is down
- arp_interval = 0
- arp_validate = 1 or 2
- bond is up
- arp_interval = 1

This would lead to bond issuing ARP requests but never listening
to the replies, because the tap wasn't attached. After reaching
this state, while bond is up, setting arp_validate won't help
as it only acts if needed.
---
 drivers/net/bonding/bond_main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 39e5b1c..805d098 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3180,11 +3180,10 @@ static int bond_open(struct net_device *bond_dev)
 	if (bond->params.miimon)  /* link check interval, in milliseconds. */
 		queue_delayed_work(bond->wq, &bond->mii_work, 0);
 
-	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
+	if (bond->params.arp_interval)    /* arp interval, in milliseconds. */
 		queue_delayed_work(bond->wq, &bond->arp_work, 0);
-		if (bond->params.arp_validate)
-			bond->recv_probe = bond_arp_rcv;
-	}
+	if (bond->params.arp_validate)
+		bond->recv_probe = bond_arp_rcv;
 
 	if (bond->params.mode == BOND_MODE_8023AD) {
 		queue_delayed_work(bond->wq, &bond->ad_work, 0);
-- 
1.8.3.1

^ permalink raw reply related

* Re: [E1000-devel] [net-next v5 8/8] i40e: include i40e in kernel proper
From: Stephen Hemminger @ 2013-09-06 17:39 UTC (permalink / raw)
  To: Williams, Mitch A
  Cc: Joe Perches, Brandeburg, Jesse, Kirsher, Jeffrey T,
	davem@davemloft.net, e1000-devel@lists.sourceforge.net,
	netdev@vger.kernel.org, gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <AAEA33E297BCAC4B9BB20A7C2DF0AB8D5C3B0A3D@FMSMSX113.amr.corp.intel.com>

On Fri, 6 Sep 2013 17:28:09 +0000
"Williams, Mitch A" <mitch.a.williams@intel.com> wrote:

> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
> > Behalf Of Joe Perches
> > Sent: Thursday, September 05, 2013 11:47 PM
> > To: Brandeburg, Jesse
> > Cc: Stephen Hemminger; Kirsher, Jeffrey T; davem@davemloft.net; e1000-
> > devel@lists.sourceforge.net; netdev@vger.kernel.org; gospo@redhat.com;
> > sassmann@redhat.com
> > Subject: Re: [E1000-devel] [net-next v5 8/8] i40e: include i40e in kernel
> > proper
> > 
> > On Fri, 2013-09-06 at 06:28 +0000, Brandeburg, Jesse wrote:
> > > On Sep 5, 2013, at 11:15 PM, "Stephen Hemminger"
> > <stephen@networkplumber.org<mailto:stephen@networkplumber.org>> wrote:
> > >
> > > Dumb question why is this named Kbuild instead of Makefile like almost
> > > every other network driver?
> > >
> > > All the new kids are doing it, we'll at least that is what I thought when
> > I made it.
> > >
> > > Re-reading https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt
> > >
> > > I see that Makefile is preferred but Kbuild overrides Makefile.
> > >
> > > Either way works, but I would prefer not to rename it at this point but
> > would gladly do so in a follow up patch
> > 
> > Please do.
> > 
> > I suppose it makes _some_ sense to use Kbuild in the
> > include paths but I think Makefile in places where
> > compilations are done is better.
> > 
> 
> That was me. I did it to make things less confusing for our customers and for us as engineers.
> 
> My goal was to make our out-of-tree driver as close as possible - including the makefiles - to the upstream driver. Doing this makes it simpler for us to backport and forward-port patches. It makes it less confusing for us when we're moving from one environment to the other.
> 
> The Kbuild file submitted here is exactly the same file that will be included in our out-of-tree tarball. To avoid confusion, I named it Kbuild, so nobody would just run make in the source directory. The toplevel makefile in our tarball is named Makefile, and I wanted that to be the only Makefile around so nobody would be confused (including me).
> 
> The kernel documentation specifically states that this is acceptable (Documentation/kbuild/modules.txt). Furthermore, Documentation/kbuild/makefiles.txt states that using Kbuild as a filename is acceptable.
> 
> Personally, I'd prefer to keep these named Kbuild. Doing so makes it simpler to keep our in-tree and out-of-tree drivers in sync. But if you're really passionate about having them named Makefile, we can deal with it. It's just one more detail for us to keep track of (and potentially screw up).
> 
> If there is a hard requirement that these files be named Makefile, then I would request that the kernel documentation be updated to reflect this.
> 
> -Mitch

First, upstream kernel does not care how or why you do something for out-of-tree driver.
As far as network developers are concerned, it is your own process (mistake) and you can
live with the consequences of that decision.

Second, why introduce something different to upstream with no apparent benefit

$ find drivers/net -name Kbuild | wc -l
0

^ permalink raw reply

* [PATCH] tcp: properly increase rcv_ssthresh for ofo packets
From: Eric Dumazet @ 2013-09-06 17:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

TCP receive window handling is multi staged.

A socket has a memory budget, static or dynamic, in sk_rcvbuf.

Because we do not really know how this memory budget translates to
a TCP window (payload), TCP announces a small initial window
(about 20 MSS).

When a packet is received, we increase TCP rcv_win depending
on the payload/truesize ratio of this packet. Good citizen
packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2

This heuristic takes place in tcp_grow_window()

Problem is : We currently call tcp_grow_window() only for in-order
packets.

This means that reorders or packet losses stop proper grow of
rcv_win, and senders are unable to benefit from fast recovery,
or proper reordering level detection.

Really, a packet being stored in OFO queue is not a bad citizen.
It should be part of the game as in-order packets.

In our traces, we very often see sender is limited by linux small
receive windows, even if linux hosts use autotuning (DRS) and should
allow rcv_win to grow to ~3MB.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1969e16..28708d3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4141,6 +4141,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 		if (!tcp_try_coalesce(sk, skb1, skb, &fragstolen)) {
 			__skb_queue_after(&tp->out_of_order_queue, skb1, skb);
 		} else {
+			tcp_grow_window(sk, skb);
 			kfree_skb_partial(skb, fragstolen);
 			skb = NULL;
 		}
@@ -4216,8 +4217,10 @@ add_sack:
 	if (tcp_is_sack(tp))
 		tcp_sack_new_ofo_skb(sk, seq, end_seq);
 end:
-	if (skb)
+	if (skb) {
+		tcp_grow_window(sk, skb);
 		skb_set_owner_r(skb, sk);
+	}
 }

 static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int hdrlen,

^ permalink raw reply related

* RE: [E1000-devel] [net-next v5 8/8] i40e: include i40e in kernel proper
From: Williams, Mitch A @ 2013-09-06 17:28 UTC (permalink / raw)
  To: Joe Perches, Brandeburg, Jesse
  Cc: Stephen Hemminger, Kirsher, Jeffrey T, davem@davemloft.net,
	e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
	gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <1378450038.19204.6.camel@joe-AO722>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
> Behalf Of Joe Perches
> Sent: Thursday, September 05, 2013 11:47 PM
> To: Brandeburg, Jesse
> Cc: Stephen Hemminger; Kirsher, Jeffrey T; davem@davemloft.net; e1000-
> devel@lists.sourceforge.net; netdev@vger.kernel.org; gospo@redhat.com;
> sassmann@redhat.com
> Subject: Re: [E1000-devel] [net-next v5 8/8] i40e: include i40e in kernel
> proper
> 
> On Fri, 2013-09-06 at 06:28 +0000, Brandeburg, Jesse wrote:
> > On Sep 5, 2013, at 11:15 PM, "Stephen Hemminger"
> <stephen@networkplumber.org<mailto:stephen@networkplumber.org>> wrote:
> >
> > Dumb question why is this named Kbuild instead of Makefile like almost
> > every other network driver?
> >
> > All the new kids are doing it, we'll at least that is what I thought when
> I made it.
> >
> > Re-reading https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt
> >
> > I see that Makefile is preferred but Kbuild overrides Makefile.
> >
> > Either way works, but I would prefer not to rename it at this point but
> would gladly do so in a follow up patch
> 
> Please do.
> 
> I suppose it makes _some_ sense to use Kbuild in the
> include paths but I think Makefile in places where
> compilations are done is better.
> 

That was me. I did it to make things less confusing for our customers and for us as engineers.

My goal was to make our out-of-tree driver as close as possible - including the makefiles - to the upstream driver. Doing this makes it simpler for us to backport and forward-port patches. It makes it less confusing for us when we're moving from one environment to the other.

The Kbuild file submitted here is exactly the same file that will be included in our out-of-tree tarball. To avoid confusion, I named it Kbuild, so nobody would just run make in the source directory. The toplevel makefile in our tarball is named Makefile, and I wanted that to be the only Makefile around so nobody would be confused (including me).

The kernel documentation specifically states that this is acceptable (Documentation/kbuild/modules.txt). Furthermore, Documentation/kbuild/makefiles.txt states that using Kbuild as a filename is acceptable.

Personally, I'd prefer to keep these named Kbuild. Doing so makes it simpler to keep our in-tree and out-of-tree drivers in sync. But if you're really passionate about having them named Makefile, we can deal with it. It's just one more detail for us to keep track of (and potentially screw up).

If there is a hard requirement that these files be named Makefile, then I would request that the kernel documentation be updated to reflect this.

-Mitch

^ permalink raw reply

* Re: TSQ accounting skb->truesize degrades throughput for large packets
From: Eric Dumazet @ 2013-09-06 17:00 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: Wei Liu, Jonathan Davies, Ian Campbell, netdev, xen-devel
In-Reply-To: <522A049A.7000105@citrix.com>

On Fri, 2013-09-06 at 17:36 +0100, Zoltan Kiss wrote:

> So I guess it would be good to revisit the default value of this 
> setting.

If ixgbe requires 3 TSO packets in TX ring to get line rate, you also
can tweak dev->gso_max_size from 65535 to 64000.

^ permalink raw reply

* Re: IPsec+SCTP+IPv6 bug
From: Daniel Borkmann @ 2013-09-06 16:59 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: netdev
In-Reply-To: <20130906165609.GA2795@p183.telecom.by>

Thanks Alexey, I'll have a look.

^ permalink raw reply

* Re: TSQ accounting skb->truesize degrades throughput for large packets
From: Eric Dumazet @ 2013-09-06 16:56 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: Wei Liu, Jonathan Davies, Ian Campbell, netdev, xen-devel
In-Reply-To: <522A049A.7000105@citrix.com>

On Fri, 2013-09-06 at 17:36 +0100, Zoltan Kiss wrote:
> On 06/09/13 13:57, Eric Dumazet wrote:
> > Well, I have no problem to get line rate on 20Gb with a single flow, so
> > other drivers have no problem.
> I've made some tests on bare metal:
> Dell PE R815, Intel 82599EB 10Gb, 3.11-rc4 32 bit kernel with 3.17.3 
> ixgbe (TSO, GSO on), iperf 2.0.5
> Transmitting packets toward the remote end (so running iperf -c on this 
> host) can make 8.3 Gbps with the default 128k tcp_limit_output_bytes. 
> When I increased this to 131.506 (128k + 434 bytes) suddenly it jumped 
> to 9.4 Gbps. Iperf CPU usage also jumped a few percent from ~36 to ~40% 
> (softint percentage in top also increased from ~3 to ~5%)

Typical tradeoff between latency and throughput

If you favor throughput, then you can increase tcp_limit_output_bytes

The default is quite reasonable IMHO.

> So I guess it would be good to revisit the default value of this 
> setting. What hw you used Eric for your 20Gb results?

Mellanox CX-3

Make sure your NIC doesn't hold TX packets in TX ring too long before
signaling an interrupt for TX completion.

For example I had to patch mellanox :

commit ecfd2ce1a9d5e6376ff5c00b366345160abdbbb7
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Nov 5 16:20:42 2012 +0000

    mlx4: change TX coalescing defaults
    
    mlx4 currently uses a too high tx coalescing setting, deferring
    TX completion interrupts by up to 128 us.
    
    With the recent skb_orphan() removal in commit 8112ec3b872,
    performance of a single TCP flow is capped to ~4 Gbps, unless
    we increase tcp_limit_output_bytes.
    
    I suggest using 16 us instead of 128 us, allowing a finer control.
    
    Performance of a single TCP flow is restored to previous levels,
    while keeping TCP small queues fully enabled with default sysctl.
    
    This patch is also a BQL prereq.
    
    Reported-by: Vimalkumar <j.vimal@gmail.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Yevgeny Petrilin <yevgenyp@mellanox.com>
    Cc: Or Gerlitz <ogerlitz@mellanox.com>
    Acked-by: Amir Vadai <amirv@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* IPsec+SCTP+IPv6 bug (was: Re: https://bugzilla.kernel.org/show_bug.cgi?id=24412)
From: Alexey Dobriyan @ 2013-09-06 16:56 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev
In-Reply-To: <5229B37A.2070900@redhat.com>

	[add netdev@ to CC]





My notes and recollections are below.

Bug reporter says traffic is unecnrypted which is technically a different thing,
but when I tried to reproduce absense of encryption, I couldn't even get past
established TCP connection.

IPv6 case works (worked) without IPsec (setkey -F; setkey -FP).
IPv4 case worked with IPsec.
IPv6 didn't work with IPsec.

setkey(8) setup:

	#!/usr/sbin/setkey -f
	flush;
	spdflush;
	add A B ah 0x42 -A hmac-sha256 0xKEY1;
	add B A ah 0x43 -A hmac-sha256 0xKEY2;
	add A B esp 0x44 -E blowfish-cbc 0xKEY3;
	add B A esp 0x45 -E blowfish-cbc 0xKEY4;
	spdadd A B any -P in ipsec esp/transport//require ah/transport//require;
	spdadd B A any -P in ipsec esp/transport//require ah/transport//require;

A, B -- IPv4 or IPv6 client/server addresses.
Client and server copy of the file should have matching keys and SPI numbers,
but "opposite" addresses (A <=> B).

Keys are written in hex.
IIRC setkey is picky about key lengths (they have to match exactly those of crypto algorithms).

--------------------------------------------------------------------

IPv4 reproducer:
	socat sctp-listen:3333 -
	echo plaintext | socat - sctp-connect:IP4IP4IP4IP4:3333

IPv6 reproducer:
	socat sctp6-listen:3333 -

#include <sys/socket.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
        struct sockaddr_in6 sa6 = {};
        int fd;

        fd = socket(PF_INET6, SOCK_STREAM, 0x84);
        sa6.sin6_family = AF_INET6;
        sa6.sin6_port = htons(3333);
        inet_pton(AF_INET6, "IP6IP6IP6IP6IP6IP6", &sa6.sin6_addr);
        sa6.sin6_scope_id = 2;	/* it depends */
        connect(fd, (struct sockaddr *)&sa6, sizeof(struct sockaddr_in6));
        write(fd, "plaintext\n", strlen("plaintext\n"));
        close(fd);
        return 0;
}

^ permalink raw reply

* Re: GSO/GRO and UDP performance
From: Rick Jones @ 2013-09-06 16:42 UTC (permalink / raw)
  To: James Yonan; +Cc: Eric Dumazet, netdev
In-Reply-To: <1378472829.31445.21.camel@edumazet-glaptop>

On 09/06/2013 06:07 AM, Eric Dumazet wrote:
> On Fri, 2013-09-06 at 03:22 -0600, James Yonan wrote:
>
>> So I think that playing well with GSO/GRO is essential to get speedup in
>> UDP apps because of this 43x multiplier.
>>
>
> Thats not true. GRO cannot aggregate more than 16+1 packets.
>
> I think we cannot aggregate UDP packets, because UDP lacks sequence
> numbers, so reorders would be a problem.
>
> You really need something that is not UDP generic.

It may  not be as sexy, and it cannot get the 43x multiplier (just what 
*is* the service demand change on a netperf TCP_STREAM test these days 
between GSO/GRO on and off anyway?), but looking for basic path-length 
reductions would be goodness.

rick jones

^ permalink raw reply

* Re: TSQ accounting skb->truesize degrades throughput for large packets
From: Zoltan Kiss @ 2013-09-06 16:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Wei Liu, Jonathan Davies, Ian Campbell, netdev, xen-devel
In-Reply-To: <1378472268.31445.15.camel@edumazet-glaptop>

On 06/09/13 13:57, Eric Dumazet wrote:
> Well, I have no problem to get line rate on 20Gb with a single flow, so
> other drivers have no problem.
I've made some tests on bare metal:
Dell PE R815, Intel 82599EB 10Gb, 3.11-rc4 32 bit kernel with 3.17.3 
ixgbe (TSO, GSO on), iperf 2.0.5
Transmitting packets toward the remote end (so running iperf -c on this 
host) can make 8.3 Gbps with the default 128k tcp_limit_output_bytes. 
When I increased this to 131.506 (128k + 434 bytes) suddenly it jumped 
to 9.4 Gbps. Iperf CPU usage also jumped a few percent from ~36 to ~40% 
(softint percentage in top also increased from ~3 to ~5%)
So I guess it would be good to revisit the default value of this 
setting. What hw you used Eric for your 20Gb results?

Regards,

Zoli

^ permalink raw reply

* Re: [PATCH net] bnx2x: fix broken compilation with CONFIG_BNX2X_SRIOV is not set
From: Eric Dumazet @ 2013-09-06 16:10 UTC (permalink / raw)
  To: Dmitry Kravkov; +Cc: netdev, davem, Ariel Elior
In-Reply-To: <1378452928-6072-1-git-send-email-dmitry@broadcom.com>

On Fri, 2013-09-06 at 10:35 +0300, Dmitry Kravkov wrote:
> Since commit 60cad4e67bd6ff400e7ea61fe762b3042b12ae9d
> "bnx2x: VF RSS support - VF side" fails to compile w/o
> CONFIG_BNX2X_SRIOV option.
>  
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Ariel Elior <ariele@broadcom.com>
> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> ---

Thanks !

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH net-next v2] net: add documentation for BQL helpers
From: Eric Dumazet @ 2013-09-06 16:09 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, davem
In-Reply-To: <1378483080-31632-1-git-send-email-f.fainelli@gmail.com>

On Fri, 2013-09-06 at 16:58 +0100, Florian Fainelli wrote:
> Provide a kernel-doc comment documentation for the BQL helpers:
> - netdev_sent_queue
> - netdev_completed_queue
> - netdev_reset_queue
> 
> Similarly to how it is done for the other functions, the documentation
> only covers the function operating on struct net_device and not struct
> netdev_queue.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> Change since v1:
> - incorporate suggestsions from Eric Dumazet and just specify the exact
>   match for netdev_sent_queue/netdev_completed_queue

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks

^ permalink raw reply

* [PATCH net-next v2] net: add documentation for BQL helpers
From: Florian Fainelli @ 2013-09-06 15:58 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, davem, Florian Fainelli
In-Reply-To: <1378465119-1894-1-git-send-email-f.fainelli@gmail.com>

Provide a kernel-doc comment documentation for the BQL helpers:
- netdev_sent_queue
- netdev_completed_queue
- netdev_reset_queue

Similarly to how it is done for the other functions, the documentation
only covers the function operating on struct net_device and not struct
netdev_queue.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Change since v1:
- incorporate suggestsions from Eric Dumazet and just specify the exact
  match for netdev_sent_queue/netdev_completed_queue

 include/linux/netdevice.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8ed4ae9..041b42a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2101,6 +2101,15 @@ static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
 #endif
 }
 
+/**
+ * 	netdev_sent_queue - report the number of bytes queued to hardware
+ * 	@dev: network device
+ * 	@bytes: number of bytes queued to the hardware device queue
+ *
+ * 	Report the number of bytes queued for sending/completion to the network
+ * 	device hardware queue. @bytes should be a good approximation and should
+ * 	exactly match netdev_completed_queue() @bytes
+ */
 static inline void netdev_sent_queue(struct net_device *dev, unsigned int bytes)
 {
 	netdev_tx_sent_queue(netdev_get_tx_queue(dev, 0), bytes);
@@ -2130,6 +2139,16 @@ static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue,
 #endif
 }
 
+/**
+ * 	netdev_completed_queue - report bytes and packets completed by device
+ * 	@dev: network device
+ * 	@pkts: actual number of packets sent over the medium
+ * 	@bytes: actual number of bytes sent over the medium
+ *
+ * 	Report the number of bytes and packets transmitted by the network device
+ * 	hardware queue over the physical medium, @bytes must exactly match the
+ * 	@bytes amount passed to netdev_sent_queue()
+ */
 static inline void netdev_completed_queue(struct net_device *dev,
 					  unsigned int pkts, unsigned int bytes)
 {
@@ -2144,6 +2163,13 @@ static inline void netdev_tx_reset_queue(struct netdev_queue *q)
 #endif
 }
 
+/**
+ * 	netdev_reset_queue - reset the packets and bytes count of a network device
+ * 	@dev_queue: network device
+ *
+ * 	Reset the bytes and packet count of a network device and clear the
+ * 	software flow control OFF bit for this network device
+ */
 static inline void netdev_reset_queue(struct net_device *dev_queue)
 {
 	netdev_tx_reset_queue(netdev_get_tx_queue(dev_queue, 0));
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH net-next] net: add documentation for BQL helpers
From: Eric Dumazet @ 2013-09-06 15:18 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, davem
In-Reply-To: <1378465119-1894-1-git-send-email-f.fainelli@gmail.com>

On Fri, 2013-09-06 at 11:58 +0100, Florian Fainelli wrote:

>  
> +/**
> + * 	netdev_completed_queue - report bytes and packets completed by device
> + * 	@dev: network device
> + * 	@pkts: actual number of packets sent over the medium
> + * 	@bytes: actual number of bytes sent over the medium
> + *
> + * 	Report the number of bytes and packets transmitted by the network device
> + * 	hardware queue over the physical medium (without prepended/appended
> + * 	control blocks, FCS...)

Here, you should instead document that the @bytes must exactly match the
amount given in netdev_sent_queue(), or else BQL can panic.

(each queued skb accounts for X bytes on BQL, so at TX completion, same
X must be unaccounted)

^ permalink raw reply

* [RFC PATCHv2 4/4] net: mvneta: add support for fixed links
From: Thomas Petazzoni @ 2013-09-06 15:18 UTC (permalink / raw)
  To: David S. Miller, netdev, devicetree
  Cc: Florian Fainelli, Lior Amsalem, Gregory Clement, Ezequiel Garcia,
	linux-arm-kernel, Mark Rutland, Sascha Hauer, Christian Gmeiner
In-Reply-To: <1378480701-12908-1-git-send-email-thomas.petazzoni@free-electrons.com>

Following the introduction of of_phy_register_fixed_link(), this patch
introduces fixed link support in the mvneta driver, for Marvell Armada
370/XP SOCs.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 .../devicetree/bindings/net/marvell-armada-370-neta.txt        |  4 ++--
 drivers/net/ethernet/marvell/mvneta.c                          | 10 ++++++----
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
index 859a6fa..4d07d4e 100644
--- a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
+++ b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
@@ -4,8 +4,8 @@ Required properties:
 - compatible: should be "marvell,armada-370-neta".
 - reg: address and length of the register set for the device.
 - interrupts: interrupt for the device
-- phy: A phandle to a phy node defining the PHY address (as the reg
-  property, a single integer).
+- phy: A phandle to the PHY node describing the PHY to which this
+  Ethernet controller is connected to.
 - phy-mode: The interface between the SoC and the PHY (a string that
   of_get_phy_mode() can understand)
 - clocks: a pointer to the reference clock for this device.
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index b017818..6da1516 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2711,10 +2711,12 @@ static int mvneta_probe(struct platform_device *pdev)
 	}
 
 	phy_node = of_parse_phandle(dn, "phy", 0);
-	if (!phy_node) {
-		dev_err(&pdev->dev, "no associated PHY\n");
-		err = -ENODEV;
-		goto err_free_irq;
+	if (of_phy_is_fixed_link(phy_node)) {
+		err = of_phy_register_fixed_link(phy_node);
+		if (err < 0) {
+			dev_err(&pdev->dev, "cannot register fixed PHY\n");
+			goto err_free_irq;
+		}
 	}
 
 	phy_mode = of_get_phy_mode(dn);
-- 
1.8.1.2

^ permalink raw reply related

* [RFC PATCHv2 3/4] of: provide a binding for fixed link PHYs
From: Thomas Petazzoni @ 2013-09-06 15:18 UTC (permalink / raw)
  To: David S. Miller, netdev, devicetree
  Cc: Florian Fainelli, Lior Amsalem, Gregory Clement, Ezequiel Garcia,
	linux-arm-kernel, Mark Rutland, Sascha Hauer, Christian Gmeiner
In-Reply-To: <1378480701-12908-1-git-send-email-thomas.petazzoni@free-electrons.com>

Some Ethernet MACs have a "fixed link", and are not connected to a
normal MDIO-managed PHY device. For those situations, a Device Tree
binding allows to describe a "fixed link" using a special PHY node.

This patch adds:

 * A documentation for the fixed PHY Device Tree binding.

 * An of_phy_is_fixed_link() function that an Ethernet driver can call
   on its PHY phandle to find out whether it's a fixed link PHY or
   not. It should typically be used to know if
   of_phy_register_fixed_link() should be called.

 * An of_phy_register_fixed_link() function that instantiates the
   fixed PHY into the PHY subsystem, so that when the driver calls
   of_phy_connect(), the PHY device associated to the OF node will be
   found.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 .../devicetree/bindings/net/fixed-link.txt         | 34 ++++++++++++++++++++++
 drivers/of/of_mdio.c                               | 24 +++++++++++++++
 include/linux/of_mdio.h                            | 15 ++++++++++
 3 files changed, 73 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/fixed-link.txt

diff --git a/Documentation/devicetree/bindings/net/fixed-link.txt b/Documentation/devicetree/bindings/net/fixed-link.txt
new file mode 100644
index 0000000..9f2a1a50
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/fixed-link.txt
@@ -0,0 +1,34 @@
+Fixed link Device Tree binding
+------------------------------
+
+Some Ethernet MACs have a "fixed link", and are not connected to a
+normal MDIO-managed PHY device. For those situations, a Device Tree
+binding allows to describe a "fixed link".
+
+Such a fixed link situation is described by creating a PHY node as a
+sub-node of an Ethernet device, with the following properties:
+
+* 'fixed-link' (boolean, mandatory), to indicate that this PHY is a
+  fixed link PHY.
+* 'speed' (integer, mandatory), to indicate the link speed. Accepted
+  values are 10, 100 and 1000
+* 'full-duplex' (boolean, optional), to indicate that full duplex is
+  used. When absent, half duplex is assumed.
+* 'pause' (boolean, optional), to indicate that pause should be
+  enabled.
+* 'asym-pause' (boolean, optional), to indicate that asym_pause should
+  be enabled.
+
+Example:
+
+ethernet@0 {
+	...
+	phy = <&phy0>;
+	phy0: phy@0 {
+	      fixed-link;
+	      speed = <1000>;
+	      full-duplex;
+	};
+	...
+};
+
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index d5a57a9..0507f8f 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -14,6 +14,7 @@
 #include <linux/netdevice.h>
 #include <linux/err.h>
 #include <linux/phy.h>
+#include <linux/phy_fixed.h>
 #include <linux/of.h>
 #include <linux/of_irq.h>
 #include <linux/of_mdio.h>
@@ -247,3 +248,26 @@ struct phy_device *of_phy_connect_fixed_link(struct net_device *dev,
 	return IS_ERR(phy) ? NULL : phy;
 }
 EXPORT_SYMBOL(of_phy_connect_fixed_link);
+
+#if defined(CONFIG_FIXED_PHY)
+bool of_phy_is_fixed_link(struct device_node *np)
+{
+	return of_property_read_bool(np, "fixed-link");
+}
+EXPORT_SYMBOL(of_phy_is_fixed_link);
+
+int of_phy_register_fixed_link(struct device_node *np)
+{
+	struct fixed_phy_status status = {};
+
+	status.link = 1;
+	status.duplex = of_property_read_bool(np, "full-duplex");
+	if (of_property_read_u32(np, "speed", &status.speed))
+		return -EINVAL;
+	status.pause = of_property_read_bool(np, "pause");
+	status.asym_pause = of_property_read_bool(np, "asym-pause");
+
+	return fixed_phy_register(PHY_POLL, &status, np);
+}
+EXPORT_SYMBOL(of_phy_register_fixed_link);
+#endif
diff --git a/include/linux/of_mdio.h b/include/linux/of_mdio.h
index 8163107..2f535ee 100644
--- a/include/linux/of_mdio.h
+++ b/include/linux/of_mdio.h
@@ -57,4 +57,19 @@ static inline struct mii_bus *of_mdio_find_bus(struct device_node *mdio_np)
 }
 #endif /* CONFIG_OF */
 
+#if defined(CONFIG_OF) && defined(CONFIG_FIXED_PHY)
+extern int of_phy_register_fixed_link(struct device_node *np);
+extern bool of_phy_is_fixed_link(struct device_node *np);
+#else
+static inline int of_phy_register_fixed_link(struct device_node *np)
+{
+	return -ENOSYS;
+}
+static inline bool of_phy_is_fixed_link(struct device_node *np)
+{
+	return false;
+}
+#endif
+
+
 #endif /* __LINUX_OF_MDIO_H */
-- 
1.8.1.2

^ permalink raw reply related

* [RFC PATCHv2 2/4] net: phy: extend fixed driver with fixed_phy_register()
From: Thomas Petazzoni @ 2013-09-06 15:18 UTC (permalink / raw)
  To: David S. Miller, netdev, devicetree
  Cc: Florian Fainelli, Lior Amsalem, Gregory Clement, Ezequiel Garcia,
	linux-arm-kernel, Mark Rutland, Sascha Hauer, Christian Gmeiner
In-Reply-To: <1378480701-12908-1-git-send-email-thomas.petazzoni@free-electrons.com>

The existing fixed_phy_add() function has several drawbacks that
prevents it from being used as is for OF-based declaration of fixed
PHYs:

 * The address of the PHY on the fake bus needs to be passed, while a
   dynamic allocation is desired.

 * Since the phy_device instantiation is post-poned until the next
   mdiobus scan, there is no way to associate the fixed PHY with its
   OF node, which later prevents of_phy_connect() from finding this
   fixed PHY from a given OF node.

To solve this, this commit introduces fixed_phy_register(), which will
allocate an available PHY address, add the PHY using fixed_phy_add()
and instantiate the phy_device structure associated with the provided
OF node.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/net/phy/fixed.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/phy_fixed.h | 11 +++++++++++
 2 files changed, 54 insertions(+)

diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
index 0f02403..3f9412b 100644
--- a/drivers/net/phy/fixed.c
+++ b/drivers/net/phy/fixed.c
@@ -21,6 +21,7 @@
 #include <linux/phy_fixed.h>
 #include <linux/err.h>
 #include <linux/slab.h>
+#include <linux/of.h>
 
 #define MII_REGS_NUM 29
 
@@ -203,6 +204,48 @@ err_regs:
 }
 EXPORT_SYMBOL_GPL(fixed_phy_add);
 
+static int phy_fixed_addr;
+static DEFINE_SPINLOCK(phy_fixed_addr_lock);
+
+int fixed_phy_register(unsigned int irq,
+		       struct fixed_phy_status *status,
+		       struct device_node *np)
+{
+	struct fixed_mdio_bus *fmb = &platform_fmb;
+	struct phy_device *phy;
+	int phy_addr;
+	int ret;
+
+	/* Get the next available PHY address, up to PHY_MAX_ADDR */
+	spin_lock(&phy_fixed_addr_lock);
+	if (phy_fixed_addr == PHY_MAX_ADDR) {
+		spin_unlock(&phy_fixed_addr_lock);
+		return -ENOSPC;
+	}
+	phy_addr = phy_fixed_addr++;
+	spin_unlock(&phy_fixed_addr_lock);
+
+	ret = fixed_phy_add(PHY_POLL, phy_addr, status);
+	if (ret < 0)
+		return ret;
+
+	phy = get_phy_device(fmb->mii_bus, phy_addr, false);
+	if (!phy || IS_ERR(phy))
+		return -EINVAL;
+
+	of_node_get(np);
+	phy->dev.of_node = np;
+
+	ret = phy_device_register(phy);
+	if (ret) {
+		phy_device_free(phy);
+		of_node_put(np);
+		return ret;
+	}
+
+	return 0;
+}
+
 static int __init fixed_mdio_bus_init(void)
 {
 	struct fixed_mdio_bus *fmb = &platform_fmb;
diff --git a/include/linux/phy_fixed.h b/include/linux/phy_fixed.h
index 509d8f5..902e8a1 100644
--- a/include/linux/phy_fixed.h
+++ b/include/linux/phy_fixed.h
@@ -9,15 +9,26 @@ struct fixed_phy_status {
 	int asym_pause;
 };
 
+struct device_node;
+
 #ifdef CONFIG_FIXED_PHY
 extern int fixed_phy_add(unsigned int irq, int phy_id,
 			 struct fixed_phy_status *status);
+extern int fixed_phy_register(unsigned int irq,
+			      struct fixed_phy_status *status,
+			      struct device_node *np);
 #else
 static inline int fixed_phy_add(unsigned int irq, int phy_id,
 				struct fixed_phy_status *status)
 {
 	return -ENODEV;
 }
+extern int fixed_phy_register(unsigned int irq,
+			      struct fixed_phy_status *status,
+			      struct device_node *np)
+{
+	return -ENODEV;
+}
 #endif /* CONFIG_FIXED_PHY */
 
 /*
-- 
1.8.1.2

^ permalink raw reply related

* [RFC PATCHv2 1/4] net: phy: decouple PHY id and PHY address in fixed PHY driver
From: Thomas Petazzoni @ 2013-09-06 15:18 UTC (permalink / raw)
  To: David S. Miller, netdev, devicetree
  Cc: Florian Fainelli, Lior Amsalem, Gregory Clement, Ezequiel Garcia,
	linux-arm-kernel, Mark Rutland, Sascha Hauer, Christian Gmeiner
In-Reply-To: <1378480701-12908-1-git-send-email-thomas.petazzoni@free-electrons.com>

Until now, the fixed_phy_add() function was taking as argument
'phy_id', which was used both as the PHY address on the fake fixed
MDIO bus, and as the PHY id, as available in the MII_PHYSID1 and
MII_PHYSID2 registers. However, those two informations are completely
unrelated.

This patch decouples them. The PHY id of fixed PHYs is hardcoded to be
0xdeadbeef. Ideally, a really reserved value would be nicer, but there
doesn't seem to be an easy of making sure a dummy value can be
assigned to the Linux kernel for such usage.

The PHY address remains passed by the caller of phy_fixed_add().

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/net/phy/fixed.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
index ba55adf..0f02403 100644
--- a/drivers/net/phy/fixed.c
+++ b/drivers/net/phy/fixed.c
@@ -31,7 +31,7 @@ struct fixed_mdio_bus {
 };
 
 struct fixed_phy {
-	int id;
+	int addr;
 	u16 regs[MII_REGS_NUM];
 	struct phy_device *phydev;
 	struct fixed_phy_status status;
@@ -104,8 +104,8 @@ static int fixed_phy_update_regs(struct fixed_phy *fp)
 	if (fp->status.asym_pause)
 		lpa |= LPA_PAUSE_ASYM;
 
-	fp->regs[MII_PHYSID1] = fp->id >> 16;
-	fp->regs[MII_PHYSID2] = fp->id;
+	fp->regs[MII_PHYSID1] = 0xdead;
+	fp->regs[MII_PHYSID2] = 0xbeef;
 
 	fp->regs[MII_BMSR] = bmsr;
 	fp->regs[MII_BMCR] = bmcr;
@@ -115,7 +115,7 @@ static int fixed_phy_update_regs(struct fixed_phy *fp)
 	return 0;
 }
 
-static int fixed_mdio_read(struct mii_bus *bus, int phy_id, int reg_num)
+static int fixed_mdio_read(struct mii_bus *bus, int phy_addr, int reg_num)
 {
 	struct fixed_mdio_bus *fmb = bus->priv;
 	struct fixed_phy *fp;
@@ -124,7 +124,7 @@ static int fixed_mdio_read(struct mii_bus *bus, int phy_id, int reg_num)
 		return -1;
 
 	list_for_each_entry(fp, &fmb->phys, node) {
-		if (fp->id == phy_id) {
+		if (fp->addr == phy_addr) {
 			/* Issue callback if user registered it. */
 			if (fp->link_update) {
 				fp->link_update(fp->phydev->attached_dev,
@@ -138,7 +138,7 @@ static int fixed_mdio_read(struct mii_bus *bus, int phy_id, int reg_num)
 	return 0xFFFF;
 }
 
-static int fixed_mdio_write(struct mii_bus *bus, int phy_id, int reg_num,
+static int fixed_mdio_write(struct mii_bus *bus, int phy_addr, int reg_num,
 			    u16 val)
 {
 	return 0;
@@ -160,7 +160,7 @@ int fixed_phy_set_link_update(struct phy_device *phydev,
 		return -EINVAL;
 
 	list_for_each_entry(fp, &fmb->phys, node) {
-		if (fp->id == phydev->phy_id) {
+		if (fp->addr == phydev->addr) {
 			fp->link_update = link_update;
 			fp->phydev = phydev;
 			return 0;
@@ -171,7 +171,7 @@ int fixed_phy_set_link_update(struct phy_device *phydev,
 }
 EXPORT_SYMBOL_GPL(fixed_phy_set_link_update);
 
-int fixed_phy_add(unsigned int irq, int phy_id,
+int fixed_phy_add(unsigned int irq, int phy_addr,
 		  struct fixed_phy_status *status)
 {
 	int ret;
@@ -184,9 +184,9 @@ int fixed_phy_add(unsigned int irq, int phy_id,
 
 	memset(fp->regs, 0xFF,  sizeof(fp->regs[0]) * MII_REGS_NUM);
 
-	fmb->irqs[phy_id] = irq;
+	fmb->irqs[phy_addr] = irq;
 
-	fp->id = phy_id;
+	fp->addr = phy_addr;
 	fp->status = *status;
 
 	ret = fixed_phy_update_regs(fp);
-- 
1.8.1.2

^ permalink raw reply related

* [RFC PATCHv2 0/4] Add DT support for fixed PHYs
From: Thomas Petazzoni @ 2013-09-06 15:18 UTC (permalink / raw)
  To: David S. Miller, netdev, devicetree
  Cc: Florian Fainelli, Lior Amsalem, Gregory Clement, Ezequiel Garcia,
	linux-arm-kernel, Mark Rutland, Sascha Hauer, Christian Gmeiner

Hello,

Here is a second version of the patch set that adds a Device Tree
binding and the related code to support fixed PHYs. Marked as RFC,
this patch set is obviously not intended for merging in 3.12.

Since the first version, the changes have been:

 * Instead of using a 'fixed-link' property inside the Ethernet device
   DT node, with a fairly cryptic succession of integer values, we now
   use a PHY subnode under the Ethernet device DT node, with explicit
   properties to configure the duplex, speed, pause and other PHY
   properties.

 * The PHY address is automatically allocated by the kernel and no
   longer visible in the Device Tree binding.

 * The PHY device is created directly when the network driver calls
   of_phy_connect_fixed_link(), and associated to the PHY DT node,
   which allows the existing of_phy_connect() function to work,
   without the need to use the deprecated of_phy_connect_fixed_link().

The things I am not entirely happy with yet are:

 * The PHY ID is hardcoded to 0xdeadbeef. Ideally, it should be a
   properly reserved vendor/device identifier, but it isn't clear how
   to get one allocated for this purpose.

 * The fixed_phy_register() function in drivers/net/phy/fixed.c has
   some OF references. So ideally, I would have preferred to put this
   code in drivers/of/of_mdio.c, but to call get_phy_device(), we need
   a reference to the mii_bus structure that represents the fixed MDIO
   bus.

 * There is some error management missing in fixed_phy_register(), but
   it can certainly be added easily. This RFC is meant to sort out the
   general idea.

Thanks,

Thomas

Thomas Petazzoni (4):
  net: phy: decouple PHY id and PHY address in fixed PHY driver
  net: phy: extend fixed driver with fixed_phy_register()
  of: provide a binding for fixed link PHYs
  net: mvneta: add support for fixed links

 .../devicetree/bindings/net/fixed-link.txt         | 34 ++++++++++++
 .../bindings/net/marvell-armada-370-neta.txt       |  4 +-
 drivers/net/ethernet/marvell/mvneta.c              | 10 ++--
 drivers/net/phy/fixed.c                            | 63 ++++++++++++++++++----
 drivers/of/of_mdio.c                               | 24 +++++++++
 include/linux/of_mdio.h                            | 15 ++++++
 include/linux/phy_fixed.h                          | 11 ++++
 7 files changed, 145 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/fixed-link.txt

-- 
1.8.1.2

^ permalink raw reply

* [patch net/stable] ipv6/exthdrs: accept tlv which includes only padding
From: Jiri Pirko @ 2013-09-06 14:02 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuznet, jmorris, yoshfuji, kaber, eldad

In rfc4942 and rfc2460 I cannot find anything which would implicate to
drop packets which have only padding in tlv.

Current behaviour breaks TAHI Test v6LC.1.2.6.

Problem was intruduced in:
9b905fe6843 "ipv6/exthdrs: strict Pad1 and PadN check"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/ipv6/exthdrs.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 07a7d65..8d67900 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -162,12 +162,6 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs, struct sk_buff *skb)
 		off += optlen;
 		len -= optlen;
 	}
-	/* This case will not be caught by above check since its padding
-	 * length is smaller than 7:
-	 * 1 byte NH + 1 byte Length + 6 bytes Padding
-	 */
-	if ((padlen == 6) && ((off - skb_network_header_len(skb)) == 8))
-		goto bad;
 
 	if (len == 0)
 		return true;
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH net-next] net: add documentation for BQL helpers
From: Florian Fainelli @ 2013-09-06 13:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, David Miller
In-Reply-To: <1378472605.31445.18.camel@edumazet-glaptop>

2013/9/6 Eric Dumazet <eric.dumazet@gmail.com>:
> On Fri, 2013-09-06 at 11:58 +0100, Florian Fainelli wrote:
>> Provide a kernel-doc comment documentation for the BQL helpers:
>> - netdev_sent_queue
>> - netdev_completed_queue
>> - netdev_reset_queue
>>
>> Similarly to how it is done for the other functions, the documentation
>> only covers the function operating on struct net_device and not struct
>> netdev_queue.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>>  include/linux/netdevice.h | 27 +++++++++++++++++++++++++++
>>  1 file changed, 27 insertions(+)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 8ed4ae9..ac36629 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -2101,6 +2101,16 @@ static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
>>  #endif
>>  }
>>
>> +/**
>> + *   netdev_sent_queue - report the number of bytes queued to hardware
>> + *   @dev: network device
>> + *   @bytes: number of bytes queued to the hardware device queue
>> + *
>> + *   Report the number of bytes queued for sending/completion to the network
>> + *   device hardware queue. @bytes should specify the number of bytes which
>> + *   will be sent over the physical medium (without prepended/appended
>> + *   control blocks, FCS...)
>
> There is no such requirement.
>
> @bytes should be a good approximation, and should match
> netdev_completed_queue() @bytes
>
>
> If you think of TSO, we know that skb->len does not exactly matches
> number of bytes on physical medium ( check qdisc_pkt_len_init() to see
> how Qdisc layer tries to get better estimation )

Thanks Eric, do you also want me to update the comment above
netdev_completed_queue() or are you happy with it?
-- 
Florian

^ permalink raw reply

* Re: [Xen-devel] [PATCH] net/core: Order-3 frag allocator causes SWIOTLB bouncing under Xen
From: Konrad Rzeszutek Wilk @ 2013-09-06 13:27 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Eric Dumazet, Eliezer Tamir, Neil Horman, netdev, linux-kernel,
	xen-devel, Eric Dumazet, Li Zefan, david.vrabel, Zoltan Kiss,
	malcolm.crossley, David S. Miller
In-Reply-To: <1378366746.6935.35.camel@dagon.hellion.org.uk>

On Thu, Sep 05, 2013 at 08:39:06AM +0100, Ian Campbell wrote:
> On Wed, 2013-09-04 at 17:11 -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Sep 04, 2013 at 02:00:40PM -0700, Eric Dumazet wrote:
> 
> > > Maybe you could add proper infrastructure to deal with Xen limitations.
> > 
> > I think Ian posted at some point an sysctl patch for that (more for
> > debugging that anything else). And it kind
> > of stalled: http://lists.xen.org/archives/html/xen-devel/2012-10/msg01832.html
> 
> I think I though you were looking into it from the swiotlb angle?

Yes. I didn't find anything immediately obvious - but I can still
reproduce with an skge DMA issues when booting baremetal with 'swiotlb=force'.

But only under 32-bit. I think there is some physical address
truncation with the new compound size skb's. Obviously needs
further investigation.

> 
> In any case I don't have time to look into this further.
> 
> > Is that what you mean by proper infrastructure ?
> > 
> > Oh wait, did you mean via dev and not the whole system wide sysctl?
> 
> The system wide sysctl was not acceptable AFAIR, which seems reasonable.

<nods>
> 
> Per-dev is hard because it affects the native drivers for each physical
> NIC when running under Xen, not the Xen PV NIC which we fixed by
> splitting compound frags into separate slots on the PV ring.

Right. Thank you for pointing that obvious issue.

> 
> Ian.
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox