Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] virtio-net: enable multiqueue by default
From: Michael S. Tsirkin @ 2016-11-25  4:43 UTC (permalink / raw)
  To: Jason Wang
  Cc: Neil Horman, Jeremy Eder, Hannes Frederic Sowa, netdev,
	linux-kernel, virtualization, Marko Myllynen, Maxime Coquelin
In-Reply-To: <1480048646-17536-1-git-send-email-jasowang@redhat.com>

On Fri, Nov 25, 2016 at 12:37:26PM +0800, Jason Wang wrote:
> We use single queue even if multiqueue is enabled and let admin to
> enable it through ethtool later. This is used to avoid possible
> regression (small packet TCP stream transmission). But looks like an
> overkill since:
> 
> - single queue user can disable multiqueue when launching qemu
> - brings extra troubles for the management since it needs extra admin
>   tool in guest to enable multiqueue
> - multiqueue performs much better than single queue in most of the
>   cases
> 
> So this patch enables multiqueue by default: if #queues is less than or
> equal to #vcpu, enable as much as queue pairs; if #queues is greater
> than #vcpu, enable #vcpu queue pairs.
> 
> Cc: Hannes Frederic Sowa <hannes@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Neil Horman <nhorman@redhat.com>
> Cc: Jeremy Eder <jeder@redhat.com>
> Cc: Marko Myllynen <myllynen@redhat.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

OK at some level but all uses of num_online_cpus()
like this are racy versus hotplug.
I know we already have this bug but shouldn't we fix it
before we add more?


> ---
>  drivers/net/virtio_net.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index d4ac7a6..a21d93a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1886,8 +1886,11 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	if (vi->any_header_sg)
>  		dev->needed_headroom = vi->hdr_len;
>  
> -	/* Use single tx/rx queue pair as default */
> -	vi->curr_queue_pairs = 1;
> +	/* Enable multiqueue by default */
> +	if (num_online_cpus() >= max_queue_pairs)
> +		vi->curr_queue_pairs = max_queue_pairs;
> +	else
> +		vi->curr_queue_pairs = num_online_cpus();
>  	vi->max_queue_pairs = max_queue_pairs;
>  
>  	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
> @@ -1918,6 +1921,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		goto free_unregister_netdev;
>  	}
>  
> +	virtnet_set_affinity(vi);
> +
>  	/* Assume link up if device can't report link status,
>  	   otherwise get link status from config. */
>  	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> -- 
> 2.7.4

^ permalink raw reply

* [PATCH net-next] virtio-net: enable multiqueue by default
From: Jason Wang @ 2016-11-25  4:37 UTC (permalink / raw)
  To: virtualization, netdev, linux-kernel
  Cc: Neil Horman, Jeremy Eder, Michael S . Tsirkin,
	Hannes Frederic Sowa, Marko Myllynen, Maxime Coquelin

We use single queue even if multiqueue is enabled and let admin to
enable it through ethtool later. This is used to avoid possible
regression (small packet TCP stream transmission). But looks like an
overkill since:

- single queue user can disable multiqueue when launching qemu
- brings extra troubles for the management since it needs extra admin
  tool in guest to enable multiqueue
- multiqueue performs much better than single queue in most of the
  cases

So this patch enables multiqueue by default: if #queues is less than or
equal to #vcpu, enable as much as queue pairs; if #queues is greater
than #vcpu, enable #vcpu queue pairs.

Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: Jeremy Eder <jeder@redhat.com>
Cc: Marko Myllynen <myllynen@redhat.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d4ac7a6..a21d93a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1886,8 +1886,11 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (vi->any_header_sg)
 		dev->needed_headroom = vi->hdr_len;
 
-	/* Use single tx/rx queue pair as default */
-	vi->curr_queue_pairs = 1;
+	/* Enable multiqueue by default */
+	if (num_online_cpus() >= max_queue_pairs)
+		vi->curr_queue_pairs = max_queue_pairs;
+	else
+		vi->curr_queue_pairs = num_online_cpus();
 	vi->max_queue_pairs = max_queue_pairs;
 
 	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
@@ -1918,6 +1921,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 		goto free_unregister_netdev;
 	}
 
+	virtnet_set_affinity(vi);
+
 	/* Assume link up if device can't report link status,
 	   otherwise get link status from config. */
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
-- 
2.7.4

^ permalink raw reply related

* Re: Large performance regression with 6in4 tunnel (sit)
From: Sven-Haegar Koch @ 2016-11-25  4:06 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: Eli Cooper, netdev
In-Reply-To: <20161125134520.68ca0969@canb.auug.org.au>

On Fri, 25 Nov 2016, Stephen Rothwell wrote:

> On Fri, 25 Nov 2016 10:18:12 +0800 Eli Cooper <elicooper@gmx.com> wrote:
> >
> > Sounds like TSO/GSO packets are not properly segmented and therefore
> > dropped.
> > 
> > Could you first try turning off segmentation offloading for the tunnel
> > interface?
> >     ethtool -K sit0 tso off gso off
> 
> On Thu, 24 Nov 2016 18:30:14 -0800 Eric Dumazet <eric.dumazet@gmail.com>
> >
> > You also could try to disable TSO and see if this makes a difference
> > 
> > ethtool -K sixtofour0 tso off
> 
> So turning off tso brings performance up to IPv4 levels ...
> 
> Thanks for that, it solves my immediate problem.

Somehow this problem description really reminds me of a report on 
netdev a bit ago, which the following patch fixed:

commit 9ee6c5dc816aa8256257f2cd4008a9291ec7e985
Author: Lance Richardson <lrichard@redhat.com>
Date:   Wed Nov 2 16:36:17 2016 -0400

    ipv4: allow local fragmentation in ip_finish_output_gso()
    
    Some configurations (e.g. geneve interface with default
    MTU of 1500 over an ethernet interface with 1500 MTU) result
    in the transmission of packets that exceed the configured MTU.
    While this should be considered to be a "bad" configuration,
    it is still allowed and should not result in the sending
    of packets that exceed the configured MTU.

Could this be related?

I suppose it would be difficult to test this patch on this machine?

c'ya
sven-haegar

-- 
Three may keep a secret, if two of them are dead.
- Ben F.

^ permalink raw reply

* Re: [scr265482] ip_tunnel.c
From: Eric Dumazet @ 2016-11-25  4:15 UTC (permalink / raw)
  To: Liyang Yu (于立洋1); +Cc: Cong Wang, netdev@vger.kernel.org
In-Reply-To: <F41877A268BCDE49BAEE2286FA8C269F6FECFD83@LETV-MBX-IDC09.letv.local>

On Thu, Nov 24, 2016 at 7:52 PM, Liyang Yu (于立洋1) <yuliyang1@le.com> wrote:
>  I accept that the issue is not a CVE candidate. But it's a bug isn't it


Bug is in the RFC for not describing how both ends would magically synchronize,
with a sequence protocol which is unidirectional, with no ACK packets
or whatever going back.

With no description, it means that RFC author(s) never considered one
side of the tunnel could die and restart.

Contact the author(s) to get his thoughts maybe ?

>
>  Thank you for your suggestions, about the format of mail, and not next time.
>
>  Everything has its meaning. If sequence number is a joke, why the guys put it into RFC , even implemented the feature.

You will learn that not everything put in RFC or other piece of paper
makes sense.
And the implementation is exactly implementing the RFC bug, because, why not ?

Now if you believe you can make this work, please send patches.


>
>  And if you means that SMP ( Symmetric Multi Processing )?

Yes.

If you enable sequences, GRE performance is abysmal on SMP,
 because linux uses an extra lock, instead of allowing multiple cpus
using GRE tunnel at the same time,
and GSO/TSO are disabled .



>
>
>
> On Thu, Nov 24, 2016 at 5:39 PM, Liyang Yu (于立洋1) <yuliyang1@le.com> wrote:
>>
>>
>>
>>
>>
>>
>>
>> BTW:
>>
>>    Which RFC suggests UINT_MAX as GRE sequence number?  Can you show me?
>>
>>
>>
>>
>>
>>
>
>
> RFC 2890
>
> In any cases, this is absolutely not a security issue nor a CVE candidate.
> Please remove security@kernel.org from CC, no need to spam security guys, they have enough on their plate.
>
> Please send text messages, no HTML is allowed on netdev
>
> Nobody sane uses GRE sequence numbers, precisely because GRE has no documented way to synchronize the source and destination of the tunnel.
> Basically, if you use GRE sequence numbers, you must re-start other side of the tunnel if one side had to restart, or risk dropping up to 2^31 packets.
>
> Really, this is not something that can be solved by using 'a different initial sequence number'
>
> linux GRE sequence number support is a joke, it does not support SMP for a start.
>
>
>
>>
>>
>>
>> On Wed, Nov 23, 2016 at 11:45 PM, Liyang Yu (于立洋1) <yuliyang1@le.com> wrote:
>>
>> > Yeah,I means that recreate the tunnel again, But I don’t think the
>>
>> > patch can fix the bug. It only can make the first packet received successed. And the follow packet will droped also.
>>
>> > In function __gre_xmit  line 366
>>
>> >   tunnel->o_seqno++;
>>
>> >
>>
>> > If you restart from UINT_MAX, the 'o_seqno' of second packet will return to 0 again.
>>
>>
>>
>> The first packet after restart: o_seqno == UINT_MAX, the other end:
>> i_seqno = 0 The second packet after restart: o_seqno == 0, the other
>> end: i_seqno = 1
>>
>>
>>
>> So traffic should be back to normal.
>>
>>
>>
>> UINT_MAX is also what RFC suggests.

^ permalink raw reply

* Re: [scr265482] ip_tunnel.c
From: Cong Wang @ 2016-11-25  4:12 UTC (permalink / raw)
  To: Liyang Yu (于立洋1)
  Cc: security@kernel.org, netdev@vger.kernel.org,
	cve-request@mitre.org
In-Reply-To: <F41877A268BCDE49BAEE2286FA8C269F6FECFC5E@LETV-MBX-IDC09.letv.local>

On Thu, Nov 24, 2016 at 5:39 PM, Liyang Yu (于立洋1) <yuliyang1@le.com> wrote:
> BTW:
>
>    Which RFC suggests UINT_MAX as GRE sequence number?  Can you show me?


https://tools.ietf.org/html/rfc2890

"The receiver maintains the sequence number value of the last successfully
decapsulated packet. Upon establishment of the GRE tunnel, this value
should be set to (2**32)-1."


I agree with Eric here, this is a minor issue, not a serious bug at all.
Please test my patch since you can reproduce it.

BTW, please don't top-post here.

Thanks.

^ permalink raw reply

* Re: Large performance regression with 6in4 tunnel (sit)
From: Eric Dumazet @ 2016-11-25  3:54 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: Eli Cooper, netdev
In-Reply-To: <20161125140938.24538f97@canb.auug.org.au>

On Fri, 2016-11-25 at 14:09 +1100, Stephen Rothwell wrote:
> Hi Eric,
> 
> On Thu, 24 Nov 2016 19:01:28 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Since I do not have this problem at all on my hosts, it could be a buggy
> > ethernet driver.
> > 
> > Could you share what NIC card and driver you are using ?
> 
> 
> # uname -a
> Linux bilbo 4.7.0-1-amd64 #1 SMP Debian 4.7.8-1 (2016-10-19) x86_64 GNU/Linux
> 
> # lspci | grep -i net
> 03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> 04:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> 
> from boot dmesg:
> 
> [    7.573725] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.3.0-k
> [    7.573726] igb: Copyright (c) 2007-2014 Intel Corporation.
> [    7.752918] igb 0000:03:00.0: added PHC on eth0
> [    7.752925] igb 0000:03:00.0: Intel(R) Gigabit Ethernet Network Connection
> [    7.752927] igb 0000:03:00.0: eth0: (PCIe:2.5Gb/s:Width x1) 00:1e:67:9f:d4:24
> [    7.753460] igb 0000:03:00.0: eth0: PBA No: 102100-000
> [    7.753460] igb 0000:03:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
> [    7.902433] igb 0000:04:00.0: added PHC on eth1
> [    7.902434] igb 0000:04:00.0: Intel(R) Gigabit Ethernet Network Connection
> [    7.902435] igb 0000:04:00.0: eth1: (PCIe:2.5Gb/s:Width x1) 00:1e:67:9f:d4:25
> [    7.902484] igb 0000:04:00.0: eth1: PBA No: 102100-000
> [    7.902485] igb 0000:04:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
> [   19.753325] igb 0000:03:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> 

Could you now report : 

ethtool -k eth0

Thanks

^ permalink raw reply

* Re: [scr265482] ip_tunnel.c
From: Liyang Yu (于立洋1) @ 2016-11-25  3:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Cong Wang, netdev@vger.kernel.org

 I accept that the issue is not a CVE candidate. But it's a bug isn't it
 
 Thank you for your suggestions, about the format of mail, and not next time. 

 Everything has its meaning. If sequence number is a joke, why the guys put it into RFC , even implemented the feature.

 And if you means that SMP ( Symmetric Multi Processing )? 



On Thu, Nov 24, 2016 at 5:39 PM, Liyang Yu (于立洋1) <yuliyang1@le.com> wrote:
>
>
>
>
>
>
>
> BTW:
>
>    Which RFC suggests UINT_MAX as GRE sequence number?  Can you show me?
>
>
>
>
>
>


RFC 2890

In any cases, this is absolutely not a security issue nor a CVE candidate.
Please remove security@kernel.org from CC, no need to spam security guys, they have enough on their plate.

Please send text messages, no HTML is allowed on netdev

Nobody sane uses GRE sequence numbers, precisely because GRE has no documented way to synchronize the source and destination of the tunnel.
Basically, if you use GRE sequence numbers, you must re-start other side of the tunnel if one side had to restart, or risk dropping up to 2^31 packets.

Really, this is not something that can be solved by using 'a different initial sequence number'

linux GRE sequence number support is a joke, it does not support SMP for a start.



>
>
>
> On Wed, Nov 23, 2016 at 11:45 PM, Liyang Yu (于立洋1) <yuliyang1@le.com> wrote:
>
> > Yeah,I means that recreate the tunnel again, But I don’t think the
>
> > patch can fix the bug. It only can make the first packet received successed. And the follow packet will droped also.
>
> > In function __gre_xmit  line 366
>
> >   tunnel->o_seqno++;
>
> >
>
> > If you restart from UINT_MAX, the 'o_seqno' of second packet will return to 0 again.
>
>
>
> The first packet after restart: o_seqno == UINT_MAX, the other end: 
> i_seqno = 0 The second packet after restart: o_seqno == 0, the other 
> end: i_seqno = 1
>
>
>
> So traffic should be back to normal.
>
>
>
> UINT_MAX is also what RFC suggests.

^ permalink raw reply

* Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
From: Mark Lord @ 2016-11-25  3:49 UTC (permalink / raw)
  To: Francois Romieu
  Cc: David Miller, hayeswang, netdev, nic_swsd, linux-kernel,
	linux-usb
In-Reply-To: <20161125002702.GA14085@electric-eye.fr.zoreil.com>

On 16-11-24 07:27 PM, Francois Romieu wrote:
>
> Through aliasing the URB was given a page that contains said (previously)
> received file. The ethernet chip/usb host does not write anything in it.

I don't see how that could be possible.  Please elaborate.

The URB buffers are statically allocated by the driver at probe time,
ten of them in all, allocated with usb_alloc_coherent() in the copy of
the driver I am testing with.

There is no possibility for them to be used for anything other than
USB receive buffers, for this driver only.  Nothing in the driver
or kernel ever writes to those buffers after initial allocation,
and only the driver and USB host controller ever have pointers to the buffers.
-- 
Mark Lord

^ permalink raw reply

* Re: Large performance regression with 6in4 tunnel (sit)
From: Stephen Rothwell @ 2016-11-25  3:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eli Cooper, netdev
In-Reply-To: <1480042888.8455.527.camel@edumazet-glaptop3.roam.corp.google.com>

Hi Eric,

On Thu, 24 Nov 2016 19:01:28 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Since I do not have this problem at all on my hosts, it could be a buggy
> ethernet driver.
> 
> Could you share what NIC card and driver you are using ?


# uname -a
Linux bilbo 4.7.0-1-amd64 #1 SMP Debian 4.7.8-1 (2016-10-19) x86_64 GNU/Linux

# lspci | grep -i net
03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
04:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

from boot dmesg:

[    7.573725] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.3.0-k
[    7.573726] igb: Copyright (c) 2007-2014 Intel Corporation.
[    7.752918] igb 0000:03:00.0: added PHC on eth0
[    7.752925] igb 0000:03:00.0: Intel(R) Gigabit Ethernet Network Connection
[    7.752927] igb 0000:03:00.0: eth0: (PCIe:2.5Gb/s:Width x1) 00:1e:67:9f:d4:24
[    7.753460] igb 0000:03:00.0: eth0: PBA No: 102100-000
[    7.753460] igb 0000:03:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[    7.902433] igb 0000:04:00.0: added PHC on eth1
[    7.902434] igb 0000:04:00.0: Intel(R) Gigabit Ethernet Network Connection
[    7.902435] igb 0000:04:00.0: eth1: (PCIe:2.5Gb/s:Width x1) 00:1e:67:9f:d4:25
[    7.902484] igb 0000:04:00.0: eth1: PBA No: 102100-000
[    7.902485] igb 0000:04:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[   19.753325] igb 0000:03:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* Re: Large performance regression with 6in4 tunnel (sit)
From: Eric Dumazet @ 2016-11-25  3:01 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: Eli Cooper, netdev
In-Reply-To: <20161125134520.68ca0969@canb.auug.org.au>

On Fri, 2016-11-25 at 13:45 +1100, Stephen Rothwell wrote:

> So turning off tso brings performance up to IPv4 levels ...

ok.

> 
> Thanks for that, it solves my immediate problem.

Since I do not have this problem at all on my hosts, it could be a buggy
ethernet driver.

Could you share what NIC card and driver you are using ?

^ permalink raw reply

* Re: [PATCH net 1/1] driver: macvtap: Unregister netdev rx_handler if macvtap_newlink fails
From: Jason Wang @ 2016-11-25  2:48 UTC (permalink / raw)
  To: fgao, davem, edumazet, netdev, gfree.wind
In-Reply-To: <1480039506-29740-1-git-send-email-fgao@ikuai8.com>



On 2016年11月25日 10:05, fgao@48lvckh6395k16k5.yundunddos.com wrote:
> From: Gao Feng <fgao@ikuai8.com>
>
> The macvtap_newlink registers the netdev rx_handler firstly, but it
> does not unregister the handler if macvlan_common_newlink failed.
>
> Signed-off-by: Gao Feng <fgao@ikuai8.com>
> ---
>   drivers/net/macvtap.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 070e329..bceca28 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -491,7 +491,13 @@ static int macvtap_newlink(struct net *src_net,
>   	/* Don't put anything that may fail after macvlan_common_newlink
>   	 * because we can't undo what it does.
>   	 */
> -	return macvlan_common_newlink(src_net, dev, tb, data);
> +	err = macvlan_common_newlink(src_net, dev, tb, data);
> +	if (err) {
> +		netdev_rx_handler_unregister(dev);
> +		return err;
> +	}
> +
> +	return 0;
>   }
>   
>   static void macvtap_dellink(struct net_device *dev,

Acked-by: Jason Wang <jasowang@redhat.com>

^ permalink raw reply

* Re: Large performance regression with 6in4 tunnel (sit)
From: Stephen Rothwell @ 2016-11-25  2:45 UTC (permalink / raw)
  To: Eli Cooper; +Cc: netdev
In-Reply-To: <cf2fff0a-7d41-1416-0056-3718ea50a5bd@gmx.com>

Hi Eli,

On Fri, 25 Nov 2016 10:18:12 +0800 Eli Cooper <elicooper@gmx.com> wrote:
>
> Sounds like TSO/GSO packets are not properly segmented and therefore
> dropped.
> 
> Could you first try turning off segmentation offloading for the tunnel
> interface?
>     ethtool -K sit0 tso off gso off

On Thu, 24 Nov 2016 18:30:14 -0800 Eric Dumazet <eric.dumazet@gmail.com>
>
> You also could try to disable TSO and see if this makes a difference
> 
> ethtool -K sixtofour0 tso off

So turning off tso brings performance up to IPv4 levels ...

Thanks for that, it solves my immediate problem.
-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* Re: [PATCH 3/3] tools/virtio: use {READ,WRITE}_ONCE() in uaccess.h
From: Jason Wang @ 2016-11-25  2:40 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: dave, kvm, mst, netdev, dbueso, virtualization, paulmck, dvyukov
In-Reply-To: <1479983114-17190-4-git-send-email-mark.rutland@arm.com>



On 2016年11月24日 18:25, Mark Rutland wrote:
> As a step towards killing off ACCESS_ONCE, use {READ,WRITE}_ONCE() for the
> virtio tools uaccess primitives, pulling these in from <linux/compiler.h>.
>
> With this done, we can kill off the now-unused ACCESS_ONCE() definition.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> ---
>   tools/virtio/linux/uaccess.h | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/tools/virtio/linux/uaccess.h b/tools/virtio/linux/uaccess.h
> index 0a578fe..fa05d01 100644
> --- a/tools/virtio/linux/uaccess.h
> +++ b/tools/virtio/linux/uaccess.h
> @@ -1,8 +1,9 @@
>   #ifndef UACCESS_H
>   #define UACCESS_H
> -extern void *__user_addr_min, *__user_addr_max;
>   
> -#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> +#include <linux/compiler.h>
> +
> +extern void *__user_addr_min, *__user_addr_max;
>   
>   static inline void __chk_user_ptr(const volatile void *p, size_t size)
>   {
> @@ -13,7 +14,7 @@ static inline void __chk_user_ptr(const volatile void *p, size_t size)
>   ({								\
>   	typeof(ptr) __pu_ptr = (ptr);				\
>   	__chk_user_ptr(__pu_ptr, sizeof(*__pu_ptr));		\
> -	ACCESS_ONCE(*(__pu_ptr)) = x;				\
> +	WRITE_ONCE(*(__pu_ptr), x);				\
>   	0;							\
>   })
>   
> @@ -21,7 +22,7 @@ static inline void __chk_user_ptr(const volatile void *p, size_t size)
>   ({								\
>   	typeof(ptr) __pu_ptr = (ptr);				\
>   	__chk_user_ptr(__pu_ptr, sizeof(*__pu_ptr));		\
> -	x = ACCESS_ONCE(*(__pu_ptr));				\
> +	x = READ_ONCE(*(__pu_ptr));				\
>   	0;							\
>   })
>   

Reviewed-by: Jason Wang <jasowang@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 2/3] vringh: kill off ACCESS_ONCE()
From: Jason Wang @ 2016-11-25  2:40 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: dave, kvm, mst, netdev, dbueso, virtualization, paulmck, dvyukov
In-Reply-To: <1479983114-17190-3-git-send-email-mark.rutland@arm.com>



On 2016年11月24日 18:25, Mark Rutland wrote:
> Despite living under drivers/ vringh.c is also used as part of the userspace
> virtio tools. Before we can kill off the ACCESS_ONCE()definition in the tools,
> we must convert vringh.c to use {READ,WRITE}_ONCE().
>
> This patch does so, along with the required include of <linux/compiler.h> for
> the relevant definitions. The userspace tools provide their own definitions in
> their own <linux/compiler.h>.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> ---
>   drivers/vhost/vringh.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 3bb02c6..bb8971f 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -3,6 +3,7 @@
>    *
>    * Since these may be in userspace, we use (inline) accessors.
>    */
> +#include <linux/compiler.h>
>   #include <linux/module.h>
>   #include <linux/vringh.h>
>   #include <linux/virtio_ring.h>
> @@ -820,13 +821,13 @@ EXPORT_SYMBOL(vringh_need_notify_user);
>   static inline int getu16_kern(const struct vringh *vrh,
>   			      u16 *val, const __virtio16 *p)
>   {
> -	*val = vringh16_to_cpu(vrh, ACCESS_ONCE(*p));
> +	*val = vringh16_to_cpu(vrh, READ_ONCE(*p));
>   	return 0;
>   }
>   
>   static inline int putu16_kern(const struct vringh *vrh, __virtio16 *p, u16 val)
>   {
> -	ACCESS_ONCE(*p) = cpu_to_vringh16(vrh, val);
> +	WRITE_ONCE(*p, cpu_to_vringh16(vrh, val));
>   	return 0;
>   }
>   

Reviewed-by: Jason Wang <jasowang@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 1/3] tools/virtio: fix READ_ONCE()
From: Jason Wang @ 2016-11-25  2:38 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: dave, kvm, mst, netdev, dbueso, virtualization, paulmck, dvyukov
In-Reply-To: <1479983114-17190-2-git-send-email-mark.rutland@arm.com>



On 2016年11月24日 18:25, Mark Rutland wrote:
> The virtio tools implementation of READ_ONCE() has a single parameter called
> 'var', but erroneously refers to 'val' for its cast, and thus won't work unless
> there's a variable of the correct type that happens to be called 'var'.
>
> Fix this with s/var/val/, making READ_ONCE() work as expected regardless.
>
> Fixes: a7c490333df3cff5 ("tools/virtio: use virt_xxx barriers")
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> ---
>   tools/virtio/linux/compiler.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/virtio/linux/compiler.h b/tools/virtio/linux/compiler.h
> index 845960e..c9ccfd4 100644
> --- a/tools/virtio/linux/compiler.h
> +++ b/tools/virtio/linux/compiler.h
> @@ -4,6 +4,6 @@
>   #define WRITE_ONCE(var, val) \
>   	(*((volatile typeof(val) *)(&(var))) = (val))
>   
> -#define READ_ONCE(var) (*((volatile typeof(val) *)(&(var))))
> +#define READ_ONCE(var) (*((volatile typeof(var) *)(&(var))))
>   
>   #endif

Reviewed-by: Jason Wang <jasowang@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: Large performance regression with 6in4 tunnel (sit)
From: Eric Dumazet @ 2016-11-25  2:30 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: netdev
In-Reply-To: <20161125120947.142c3af5@canb.auug.org.au>

On Fri, 2016-11-25 at 12:09 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> This is a typical user error report i.e. a net well specified one :-)
> 
> I am using a 6in4 tunnel from my Linux server at home (since my ISP
> does not provide native IPv6) to another hosted Linus server (that has
> native IPv6 connectivity).  The throughput for IPv6 connections has
> dropped from megabits per second to 10s of kilobits per second.
> 
> First, I am using Debian supplied kernels, so strike one, right?
> 
> Second, I don't actually remember when the problem started - it probably
> started when I upgraded from a v4.4 based kernel to a v4.7 based one.
> This server does not get rebooted very often as it runs hosted services
> for quite a few people (its is ozlabs.org ...).
> 
> I tried creating the same tunnel to another hosted server I have access
> to that is running a v3.16 based kernel and the performance is fine
> (actually upward of 40MB/s).
> 
> I noticed from a tcpdump on the hosted server that (when I fetch a
> large file over HTTP) the server is sending packets larger than the MTU
> of the tunnel.  These packets don't get acked and are later resent as
> MTU sized packets.  I will then send more larger packets and repeat ...


tcpdump shows big packets because SIT supports TSO (since linux-3.13)

lpaa23:~# ip -6 ro get 2002:af6:798::1
2002:af6:798::1 via fe80:: dev sixtofour0  src 2002:af6:797::  metric
1024  pref medium
lpaa23:~# ./netperf -H 2002:af6:798::1
MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2002:af6:798::1 () port 0 AF_INET6
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    10374.64   

lpaa23:~# ethtool -k sixtofour0|grep seg
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: on
	tx-tcp6-segmentation: on
	tx-tcp-psp-segmentation: off [fixed]
generic-segmentation-offload: on
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
tx-ggre-segmentation: off [fixed]


> 
> The mtu of the tunnel is set to 1280 (though leaving it unset and using
> the default gave the same results).  The tunnel is using sit and is
> statically set up at both ends (though the hosted server end does not
> specify a remote ipv4 end point).
> 
> Is there anything else I can tell you?  Testing patches is a bit of a
> pain, unfortunately, but I was hoping that someone may remember
> something that may have caused this.

You could use "perf record -a -g -e skb:kfree_skb" to see where packets
are dropped on the sender.

You also could try to disable TSO and see if this makes a difference

ethtool -K sixtofour0 tso off

^ permalink raw reply

* Re: [scr265482] ip_tunnel.c
From: Eric Dumazet @ 2016-11-25  2:27 UTC (permalink / raw)
  To: Liyang Yu (于立洋1)
  Cc: Cong Wang, security@kernel.org, netdev@vger.kernel.org,
	cve-request@mitre.org
In-Reply-To: <F41877A268BCDE49BAEE2286FA8C269F6FECFC5E@LETV-MBX-IDC09.letv.local>

On Thu, Nov 24, 2016 at 5:39 PM, Liyang Yu (于立洋1) <yuliyang1@le.com> wrote:
>
>
>
>
>
>
>
> BTW:
>
>    Which RFC suggests UINT_MAX as GRE sequence number?  Can you show me?
>
>
>
>
>
>


RFC 2890

In any cases, this is absolutely not a security issue nor a CVE candidate.
Please remove security@kernel.org from CC, no need to spam security guys,
they have enough on their plate.

Please send text messages, no HTML is allowed on netdev

Nobody sane uses GRE sequence numbers, precisely because GRE has
no documented way to synchronize the source and destination of the tunnel.
Basically, if you use GRE sequence numbers, you must re-start other side
of the tunnel if one side had to restart, or risk dropping up to 2^31 packets.

Really, this is not something that can be solved by using
'a different initial sequence number'

linux GRE sequence number support is a joke, it does not support SMP
for a start.



>
>
>
> On Wed, Nov 23, 2016 at 11:45 PM, Liyang Yu (于立洋1) <yuliyang1@le.com> wrote:
>
> > Yeah,I means that recreate the tunnel again, But I don’t think the
>
> > patch can fix the bug. It only can make the first packet received successed. And the follow packet will droped also.
>
> > In function __gre_xmit  line 366
>
> >   tunnel->o_seqno++;
>
> >
>
> > If you restart from UINT_MAX, the 'o_seqno' of second packet will return to 0 again.
>
>
>
> The first packet after restart: o_seqno == UINT_MAX, the other end: i_seqno = 0 The second packet after restart: o_seqno == 0, the other end: i_seqno = 1
>
>
>
> So traffic should be back to normal.
>
>
>
> UINT_MAX is also what RFC suggests.

^ permalink raw reply

* Re: Large performance regression with 6in4 tunnel (sit)
From: Eli Cooper @ 2016-11-25  2:18 UTC (permalink / raw)
  To: Stephen Rothwell, netdev
In-Reply-To: <20161125120947.142c3af5@canb.auug.org.au>

Hi Stephen,

On 2016/11/25 9:09, Stephen Rothwell wrote:
> Hi all,
>
> This is a typical user error report i.e. a net well specified one :-)
>
> I am using a 6in4 tunnel from my Linux server at home (since my ISP
> does not provide native IPv6) to another hosted Linus server (that has
> native IPv6 connectivity).  The throughput for IPv6 connections has
> dropped from megabits per second to 10s of kilobits per second.
>
> First, I am using Debian supplied kernels, so strike one, right?
>
> Second, I don't actually remember when the problem started - it probably
> started when I upgraded from a v4.4 based kernel to a v4.7 based one.
> This server does not get rebooted very often as it runs hosted services
> for quite a few people (its is ozlabs.org ...).
>
> I tried creating the same tunnel to another hosted server I have access
> to that is running a v3.16 based kernel and the performance is fine
> (actually upward of 40MB/s).
>
> I noticed from a tcpdump on the hosted server that (when I fetch a
> large file over HTTP) the server is sending packets larger than the MTU
> of the tunnel.  These packets don't get acked and are later resent as
> MTU sized packets.  I will then send more larger packets and repeat ...

Sounds like TSO/GSO packets are not properly segmented and therefore
dropped.

Could you first try turning off segmentation offloading for the tunnel
interface?
    ethtool -K sit0 tso off gso off

> The mtu of the tunnel is set to 1280 (though leaving it unset and using
> the default gave the same results).  The tunnel is using sit and is
> statically set up at both ends (though the hosted server end does not
> specify a remote ipv4 end point).
>
> Is there anything else I can tell you?  Testing patches is a bit of a
> pain, unfortunately, but I was hoping that someone may remember
> something that may have caused this.

Regards,
Eli

^ permalink raw reply

* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: Vishwanathapura, Niranjana @ 2016-11-25  2:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: ira.weiny, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Dennis Dalessandro
In-Reply-To: <20161124161545.GA20818-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Thu, Nov 24, 2016 at 09:15:45AM -0700, Jason Gunthorpe wrote:
>On Wed, Nov 23, 2016 at 04:08:25PM -0800, Vishwanathapura, Niranjana wrote:
>
>> In order to pass the hfi function pointers to the hfi_vnic ULP, I can,
>> a) Have hfi_vnic ULP define an interface API for hfi1 driver to call to
>> register its callback (as you pointed). Unfortunately there will be a module
>> dependency here.
>> Or,
>
>That is probably backwards
>
>> b) Add a new member ‘struct vnic_ops’ either to the ib_device structure or
>> ib_port_immutable structure. As it is hfi1 specific, only hfi1 driver will
>> set it. No module dependency here.
>
>You can add a hfi1_get_vnic_ops(struct ib_device *) and implement it
>in your module..
>

In order to be truely device independent the hfi_vnic ULP should not depend on 
a device exported symbol. Instead device should register its functions with the 
ULP. Hence the approaches a) and b).

Niranjana

>Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net 1/1] driver: macvtap: Unregister netdev rx_handler if macvtap_newlink fails
From: fgao @ 2016-11-25  2:05 UTC (permalink / raw)
  To: davem, jasowang, edumazet, netdev, gfree.wind; +Cc: Gao Feng

From: Gao Feng <fgao@ikuai8.com>

The macvtap_newlink registers the netdev rx_handler firstly, but it
does not unregister the handler if macvlan_common_newlink failed.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
---
 drivers/net/macvtap.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 070e329..bceca28 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -491,7 +491,13 @@ static int macvtap_newlink(struct net *src_net,
 	/* Don't put anything that may fail after macvlan_common_newlink
 	 * because we can't undo what it does.
 	 */
-	return macvlan_common_newlink(src_net, dev, tb, data);
+	err = macvlan_common_newlink(src_net, dev, tb, data);
+	if (err) {
+		netdev_rx_handler_unregister(dev);
+		return err;
+	}
+
+	return 0;
 }
 
 static void macvtap_dellink(struct net_device *dev,
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH v2] net: dsa: mv88e6xxx: add MV88E6097 switch
From: Andrew Lunn @ 2016-11-25  2:03 UTC (permalink / raw)
  To: Stefan Eichenberger
  Cc: David Miller, vivien.didelot, f.fainelli, netdev,
	stefan.eichenberger
In-Reply-To: <20161124211449.GB2246@eichest-notebook>

On Thu, Nov 24, 2016 at 10:14:49PM +0100, Stefan Eichenberger wrote:
> Hi David
> 
> On Thu, Nov 24, 2016 at 03:29:21PM -0500, David Miller wrote:
> > From: Stefan Eichenberger <eichest@gmail.com>
> > Date: Tue, 22 Nov 2016 17:47:21 +0100
> > 
> > > Add support for the MV88E6097 switch. The change was tested on an Armada
> > > based platform with a MV88E6097 switch.
> > > 
> > > Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
> > 
> > Applied to net-next, thanks.
> 
> I'm afraid this is the wrong patch version, Andrew and Vivien had some
> findings, this would be the correct patch series that include all
> necessary changes:

Hi Stefan

David does not remove patches once they are accepted. Please send
followup patches which convert the current state to what we actually
want.

	Andrew

^ permalink raw reply

* GOOD DAY FRIEND
From: Mr. Piyush Gupta @ 2016-11-24 17:01 UTC (permalink / raw)




-- 
Dear Friend,

Good Day, I am Mr. Piyush Gupta as a  (DBSHK) banker, I have funds worth 
of $25.500,000.00 to secretly secure and transfer in to your account of 
my late client who dead with next of kin, which i will like to invest in 
profitable business in your country.

Please if you are interested to help me without betrayed me, do and 
write me for more details. Here is my Email: infopiyushg@gmail.com

Regards,
Mr. Piyush Gupta.
(DBSHK)
Hong Kong.

^ permalink raw reply

* Large performance regression with 6in4 tunnel (sit)
From: Stephen Rothwell @ 2016-11-25  1:09 UTC (permalink / raw)
  To: netdev

Hi all,

This is a typical user error report i.e. a net well specified one :-)

I am using a 6in4 tunnel from my Linux server at home (since my ISP
does not provide native IPv6) to another hosted Linus server (that has
native IPv6 connectivity).  The throughput for IPv6 connections has
dropped from megabits per second to 10s of kilobits per second.

First, I am using Debian supplied kernels, so strike one, right?

Second, I don't actually remember when the problem started - it probably
started when I upgraded from a v4.4 based kernel to a v4.7 based one.
This server does not get rebooted very often as it runs hosted services
for quite a few people (its is ozlabs.org ...).

I tried creating the same tunnel to another hosted server I have access
to that is running a v3.16 based kernel and the performance is fine
(actually upward of 40MB/s).

I noticed from a tcpdump on the hosted server that (when I fetch a
large file over HTTP) the server is sending packets larger than the MTU
of the tunnel.  These packets don't get acked and are later resent as
MTU sized packets.  I will then send more larger packets and repeat ...

The mtu of the tunnel is set to 1280 (though leaving it unset and using
the default gave the same results).  The tunnel is using sit and is
statically set up at both ends (though the hosted server end does not
specify a remote ipv4 end point).

Is there anything else I can tell you?  Testing patches is a bit of a
pain, unfortunately, but I was hoping that someone may remember
something that may have caused this.
-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* Re: [net-next PATCH v1 0/2] stmmac: dwmac-meson8b: configurable RGMII TX delay
From: Martin Blumenstingl @ 2016-11-25  0:41 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jerome Brunet, Sebastian Frias, linux-amlogic, devicetree, netdev,
	davem, khilman, mark.rutland, robh+dt, linux-arm-kernel,
	alexandre.torgue, peppe.cavallaro, carlo, Mans Rullgard,
	Andrew Lunn
In-Reply-To: <e6ca0941-e2e3-dd93-d4d3-8fbd76b60e17@gmail.com>

On Thu, Nov 24, 2016 at 7:55 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> Le 24/11/2016 à 09:05, Martin Blumenstingl a écrit :
>> On Thu, Nov 24, 2016 at 4:56 PM, Jerome Brunet <jbrunet@baylibre.com> wrote:
>>> On Thu, 2016-11-24 at 15:34 +0100, Martin Blumenstingl wrote:
>>>> Currently the dwmac-meson8b stmmac glue driver uses a hardcoded 1/4
>>>> cycle TX clock delay. This seems to work fine for many boards (for
>>>> example Odroid-C2 or Amlogic's reference boards) but there are some
>>>> others where TX traffic is simply broken.
>>>> There are probably multiple reasons why it's working on some boards
>>>> while it's broken on others:
>>>> - some of Amlogic's reference boards are using a Micrel PHY
>>>> - hardware circuit design
>>>> - maybe more...
>>>>
>>>> This raises a question though:
>>>> Which device is supposed to enable the TX delay when both MAC and PHY
>>>> support it? And should we implement it for each PHY / MAC separately
>>>> or should we think about a more generic solution (currently it's not
>>>> possible to disable the TX delay generated by the RTL8211F PHY via
>>>> devicetree when using phy-mode "rgmii")?
>>>
>>> Actually you can skip the part which activate the Tx-delay on the phy
>>> by setting "phy-mode = "rgmii-id" instead of "rgmii"
>>>
>>> phy->interface will no longer be PHY_INTERFACE_MODE_RGMII
>>> but PHY_INTERFACE_MODE_RGMII_ID.
>> unfortunately this is not true for RTL8211F (I did my previous tests
>> with the same expectation in mind)!
>> the code seems to suggest that TX-delay is disabled whenever mode !=
>> PHY_INTERFACE_MODE_RGMII.
>> BUT: on my device RTL8211F_TX_DELAY is set even before
>> "phy_write(phydev, 0x11, reg);"!
>
> (Adding Sebastian (and Mans, and Andrew) since he raised the same
> question a while ago. I think I now understand a bit better what
> Sebastian was after a couple of weeks ago)
>
>>
>> Based on what I found it seems that rgmii-id, rgmii-txid and
>> rgmii-rxid are supposed to be handled by the PHY.
>
> Correct, the meaning of PHY_INTERFACE_MODE should be from the
> perspective of the PHY device:
>
> - PHY_INTERFACE_MODE_RGMII_TXID means that the PHY is responsible for
> adding a delay when the MAC transmits (TX MAC -> PHY (delay) -> wire)
> - PHY_INTERFACE_MODE_RGMII_RXID means that the PHY is responsible for
> adding a delay when the MAC receives (RX MAC <- (delay) PHY) <- wire)
and PHY_INTERFACE_MODE_RGMII_ID is basically _TXID and _RXID combined
(meaning that the PHY is responsible for the TX and RX delays)

>> That would mean that we have two problems here:
>> 1) drivers/net/phy/realtek.c:rtl8211f_config_init should check for
>> PHY_INTERFACE_MODE_RGMII_ID or PHY_INTERFACE_MODE_RGMII_TXID and
>> enable the TX-delay in that case - otherwise explicitly disable it
>
> Agreed.
(on a side-not: it seems that the RTL8211F's TX-delay setting is
either untouched by a hardware reset via GPIO or enabled automatically
during hardware reset via GPIO)

>> 2) dwmac-meson8b.c should only use the configured TX-delay for
>> PHY_INTERFACE_MODE_RGMII
>> @Florian: could you please share your thoughts on this (who handles
>> the TX delay in which case)?
>
> This also seems reasonable to do, provided that the PHY is also properly
> configured not to add delays in both directions, and therefore assumes
> that the MAC does it.
on Amlogic Meson systems (at least on the ARM64 ones) all customer
devices with Gbit ethernet are using the RTL8211F PHY. The only
exception are some development/reference boards from Amlogic
themselves, which seem to be using a Micrel RGMII PHY.

> We have a fairly large problem with how RGMII delays are done in PHYLIB
> and Ethernet MAC drivers (or just in general), where we can't really
> intersect properly what a PHY is supporting (in terms of internal
> delays), and what the MAC supports either. One possible approach could
> be to update PHY drivers a list of PHY_INTERFACE_MODE_* that they
> support (ideally, even with normalized nanosecond delay values), and
> then intersect that with the requested phy_interface_t during
> phy_{attach,connect} time, and feed this back to the MAC with a special
> error code/callback, so we could gracefully try to choose another
> PHY_INTERFACE_MODE_* value that the MAC supports....
>
> A larger problem is that a number of drivers have been deployed, and
> Device Trees, possibly with the meaning of "phy-mode" and
> "phy-connection-type" being from the MAC perspective, and not the PHY
> perspective *sigh*, good luck auditing those.
>
> So from there, here is possibly what we could do
>
> - submit a series of patches that update the PHYLIB documentation (there
> are other things missing here) and make it clear from which entity (PHY
> or MAC) does the delay apply to, document the "intersection" problem here
sounds like a good idea, maybe we should move this to a separate thread (I guess
this is the part which is especially interesting for Sebastian, Mans
and Andrew)?

> - have you document the configured behavior for dwmac-meson8b that we
> just discussed here in v2 of this patch series
I would add something like "...using the phy-modes rgmii-id or
rgmii-txid means that the TX-delay will be added by the PHY, thus no
TX-delay should be configured for the MAC/dwmac-meson8b glue" (I'll
improve this, I just want a quick confirmation that I get your idea
right)

Thanks for the quick and helpful answer Florian!


Regards,
Martin

^ permalink raw reply

* Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
From: Francois Romieu @ 2016-11-25  0:27 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Miller, hayeswang-Rasf1IRRPZFBDgjK7y7TUQ,
	netdev-u79uwXL29TY76Z2rM5mHXA, nic_swsd-Rasf1IRRPZFBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <c3ca9e10-64de-2454-b99a-19e8302f5c12-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>

Mark Lord <mlord-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org> :
[...]
> >From tracing through the powerpc arch code, this is the buffer that
> is being directly DMA'd into.  And the USB layer does an invalidate_dcache
> on that entire buffer before initiating the DMA (confirmed via printk).
> 
> The driver itself NEVER writes anything to that buffer,
> and nobody else has a pointer to it other than the USB host controller,
> so there's nothing else that can write to it either.
> 
> According to the driver writer, the chip should only ever write a fresh
> rx_desc struct at the beginning of a buffer, never ASCII data.
> 
> So how does that buffer end up containing ASCII data from the NFS transfers?

Through aliasing the URB was given a page that contains said (previously)
received file. The ethernet chip/usb host does not write anything in it.
There could be a device or a driver problem but it may not be the real
problem.

So far the analysis focused on "how was this corrupted content written into
this receive buffer page ?". If I read David correctly (?) the "nobody
else has a pointer to it other than the USB host controller" point may be
replaced with "the pointer to it aliases some already used page".

-- 
Ueimor
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox