Netdev List


* Re: [PATCH 0/5] VSOCK: add vsock_test test suite
From: Jorgen S. Hansen @ 2017-12-20 14:48 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev@vger.kernel.org, Dexuan Cui
In-Reply-To: <20171213144911.6428-1-stefanha@redhat.com>

> On Dec 13, 2017, at 3:49 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> The vsock_diag.ko module already has a test suite but the core AF_VSOCK
> functionality has no tests.  This patch series adds several test cases that
> exercise AF_VSOCK SOCK_STREAM socket semantics (send/recv, connect/accept,
> half-closed connections, simultaneous connections).
> 
> The test suite is modest but I hope to cover additional cases in the future.
> My goal is to have a shared test suite so VMCI, Hyper-V, and KVM can ensure
> that our transports behave the same.
> 
> I have tested virtio-vsock.
> 
> Jorgen: Please test the VMCI transport and let me know if anything needs to be
> adjusted.  See tools/testing/vsock/README for information on how to run the
> test suite.
> 

I tried running the vsock_test on VMCI, and all the tests failed in one way or
another:
1) connection reset test: when the guest tries to connect to the host, we
  get EINVAL as the error instead of ECONNRESET. I’ll fix that.
2) client close and server close tests: On the host side, VMCI doesn’t
  support reading data from a socket that has been closed by the
  guest. When the guest closes a connection, all data is gone, and
  we return EOF on the host side. So the tests that try to read data
  after close should not attempt that on the VMCI host side. I got the
  tests to pass by adding a getsockname call to determine whether
  the local CID is the host CID, and skipping the read attempt
  in that case. We could add a vmci flag that would enable
  this behavior.
3) send_byte(fd, -EPIPE): for the VMCI transport, the close
 isn’t necessarily visible immediately on the peer. So in most
 cases, these send operations would complete with success.
 I was running these tests using nested virtualization, so I
 suspect that the problem is more likely to occur here, but
 I had to add a sleep to be sure to get the EPIPE error.
4) server close test: the connect would sometimes fail. This looks
  like an issue where we detect the peer close on the client side
  before we complete the connection handshake on the client
  side. There are two different channels used for the connection
  handshake and the disconnect. I’ll look into this to see what
  exactly is going on.
5) multiple connections tests: with the standard socket sizes,
  VMCI is only able to support about 100 concurrent stream
  connections so this test passes with MULTICONN_NFDS
  set to 100.
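
For reference, the getsockname check described in item 2 could look roughly
like the sketch below. It is not the change that was actually made to
vsock_test; the helper name local_cid_is_host() is hypothetical, and the
sketch assumes the userspace definitions from <linux/vm_sockets.h>
(struct sockaddr_vm, VMADDR_CID_HOST).

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/*
 * Hypothetical helper (not part of vsock_test): returns 1 if the socket's
 * local CID is the host CID, 0 if it is not, and -1 if the query fails or
 * the socket is not an AF_VSOCK socket.
 */
int local_cid_is_host(int fd)
{
	struct sockaddr_vm addr;
	socklen_t len = sizeof(addr);

	memset(&addr, 0, sizeof(addr));
	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		return -1;
	if (len < sizeof(addr) || addr.svm_family != AF_VSOCK)
		return -1;

	return addr.svm_cid == VMADDR_CID_HOST;
}
```

A test could call this on the connected socket and skip the read-after-close
step when it returns 1.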

Thanks,
Jorgen

^ permalink raw reply

* Re: [PATCH] net: Fix double free and memory corruption in get_net_ns_by_id()
From: Nicolas Dichtel @ 2017-12-20 15:00 UTC (permalink / raw)
  To: Eric W. Biederman, netdev
  Cc: David Miller, ktkhai, security, secalert, eric.dumazet, stephen
In-Reply-To: <87d13aaaqr.fsf@xmission.com>

On 19/12/2017 at 18:27, Eric W. Biederman wrote:
> 
> (I can trivially verify that that idr_remove in cleanup_net happens
>  after the network namespace count has dropped to zero --EWB)
> 
> Function get_net_ns_by_id() does not check for net::count
> after it has found a peer in netns_ids idr.
> 
> It may dereference a peer, after its count has already been
> finally decremented. This leads to double free and memory
> corruption:
> 
> put_net(peer)                                   rtnl_lock()
> atomic_dec_and_test(&peer->count) [count=0]     ...
> __put_net(peer)                                 get_net_ns_by_id(net, id)
>   spin_lock(&cleanup_list_lock)
>   list_add(&net->cleanup_list, &cleanup_list)
>   spin_unlock(&cleanup_list_lock)
> queue_work()                                      peer = idr_find(&net->netns_ids, id)
>   |                                               get_net(peer) [count=1]
>   |                                               ...
>   |                                               (use after final put)
>   v                                               ...
>   cleanup_net()                                   ...
>     spin_lock(&cleanup_list_lock)                 ...
>     list_replace_init(&cleanup_list, ..)          ...
>     spin_unlock(&cleanup_list_lock)               ...
>     ...                                           ...
>     ...                                           put_net(peer)
>     ...                                             atomic_dec_and_test(&peer->count) [count=0]
>     ...                                               spin_lock(&cleanup_list_lock)
>     ...                                               list_add(&net->cleanup_list, &cleanup_list)
>     ...                                               spin_unlock(&cleanup_list_lock)
>     ...                                             queue_work()
>     ...                                           rtnl_unlock()
>     rtnl_lock()                                   ...
>     for_each_net(tmp) {                           ...
>       id = __peernet2id(tmp, peer)                ...
>       spin_lock_irq(&tmp->nsid_lock)              ...
>       idr_remove(&tmp->netns_ids, id)             ...
>       ...                                         ...
>       net_drop_ns()                               ...
> 	net_free(peer)                            ...
>     }                                             ...
>   |
>   v
>   cleanup_net()
>     ...
>     (Second free of peer)
> 
> Also, put_net() on the right cpu may reorder with left's cpu
> list_replace_init(&cleanup_list, ..), and then cleanup_list
> will be corrupted.
> 
> Since cleanup_net() is executed in worker thread, while
> put_net(peer) can happen everywhere, there should be
> enough time for concurrent get_net_ns_by_id() to pick
> the peer up, and the race does not seem to be unlikely.
> The patch fixes the problem in standard way.
> 
> (Also, there is a possible problem in peernet2id_alloc(), which requires
> a check for net::count under nsid_lock and maybe_get_net(peer), but
> in the current stable kernel it's used under rtnl_lock() and it has to be
> safe. Open vSwitch has begun to use peernet2id_alloc(), and possibly it should
> be fixed too. This is not in a stable kernel yet, so I'll send
> a separate message to netdev@ later).
> 
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Fixes: 0c7aecd4bde4 "netns: add rtnl cmd to add and get peer netns ids"
> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Good catch, thank you.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
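
For readers following the race diagram above: the usual way to close this
kind of window is to take a reference only if the count has not already
reached zero (the quoted text mentions maybe_get_net() in that role). The
userspace sketch below illustrates that pattern with C11 atomics. It is not
the kernel code; struct demo_ref, demo_get_unless_zero() and demo_put() are
hypothetical names used only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for an object with a net::count-style reference counter. */
struct demo_ref {
	atomic_int count;
};

/* Take a reference only if the count has not already dropped to zero. */
bool demo_get_unless_zero(struct demo_ref *r)
{
	int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the final put. */
bool demo_put(struct demo_ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

With this pattern, a lookup that finds an object whose final put has already
happened gets a failure instead of resurrecting the count from 0 to 1.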

^ permalink raw reply

* Re: [PATCH bpf-next 1/8] bpf: offload: don't require rtnl for dev list manipulation
From: Kirill Tkhai @ 2017-12-20 15:00 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, alexei.starovoitov, daniel; +Cc: oss-drivers
In-Reply-To: <20171220041006.25629-2-jakub.kicinski@netronome.com>

Hi, Jakub,

thanks for looking into this.

Sadly, __bpf_prog_offload_destroy() needs rtnl_lock() context,
but the rwsem is still good, as it becomes useful for the next patches in the series.

Please see one small nit near the last hunk. Everything else looks good
to me.

On 20.12.2017 07:09, Jakub Kicinski wrote:
> We only need to hold rtnl_lock() around ndo calls.  The device
> offload initialization doesn't require it.  Neither will soon-
> -to-come querying the offload info.  Use struct rw_semaphore
> because map offload will require sleeping with the semaphore
> held for read.
> 
> Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>

> ---
>  kernel/bpf/offload.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> index 8455b89d1bbf..b88e5ebdc61d 100644
> --- a/kernel/bpf/offload.c
> +++ b/kernel/bpf/offload.c
> @@ -20,8 +20,12 @@
>  #include <linux/netdevice.h>
>  #include <linux/printk.h>
>  #include <linux/rtnetlink.h>
> +#include <linux/rwsem.h>
>  
> -/* protected by RTNL */
> +/* Protects bpf_prog_offload_devs and offload members of all progs.
> + * RTNL lock cannot be taken when holding this lock.
> + */
> +static struct rw_semaphore bpf_devs_lock;
>  static LIST_HEAD(bpf_prog_offload_devs);
>  
>  int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> @@ -43,17 +47,21 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
>  	offload->prog = prog;
>  	init_waitqueue_head(&offload->verifier_done);
>  
> -	rtnl_lock();
> +	/* Our UNREGISTER notifier will grab bpf_devs_lock, so we are safe
> +	 * to assume the netdev doesn't get unregistered as long as we hold
> +	 * bpf_devs_lock.
> +	 */
> +	down_write(&bpf_devs_lock);
>  	offload->netdev = __dev_get_by_index(net, attr->prog_ifindex);
>  	if (!offload->netdev) {
> -		rtnl_unlock();
> +		up_write(&bpf_devs_lock);
>  		kfree(offload);
>  		return -EINVAL;
>  	}
>  
>  	prog->aux->offload = offload;
>  	list_add_tail(&offload->offloads, &bpf_prog_offload_devs);
> -	rtnl_unlock();
> +	up_write(&bpf_devs_lock);
>  
>  	return 0;
>  }
> @@ -126,7 +134,9 @@ void bpf_prog_offload_destroy(struct bpf_prog *prog)
>  	wake_up(&offload->verifier_done);
>  
>  	rtnl_lock();
> +	down_write(&bpf_devs_lock);
>  	__bpf_prog_offload_destroy(prog);
> +	up_write(&bpf_devs_lock);
>  	rtnl_unlock();
>  
>  	kfree(offload);
> @@ -181,11 +191,13 @@ static int bpf_offload_notification(struct notifier_block *notifier,
>  		if (netdev->reg_state != NETREG_UNREGISTERING)
>  			break;
>  
> +		down_write(&bpf_devs_lock);
>  		list_for_each_entry_safe(offload, tmp, &bpf_prog_offload_devs,
>  					 offloads) {
>  			if (offload->netdev == netdev)
>  				__bpf_prog_offload_destroy(offload->prog);
>  		}
> +		up_write(&bpf_devs_lock);
>  		break;
>  	default:
>  		break;
> @@ -199,6 +211,7 @@ static struct notifier_block bpf_offload_notifier = {
>  
>  static int __init bpf_offload_init(void)
>  {
> +	init_rwsem(&bpf_devs_lock);

DECLARE_RWSEM() could be used instead of this.

>  	register_netdevice_notifier(&bpf_offload_notifier);
>  	return 0;
>  }
> 

^ permalink raw reply

* Re: [patch iproute2] tc: add -bs option for batch mode
From: Stephen Hemminger @ 2017-12-20 15:17 UTC (permalink / raw)
  To: Chris Mi; +Cc: netdev@vger.kernel.org, gerlitz.or@gmail.com
In-Reply-To: <VI1PR0501MB21433CE21CEF8212A8BBE21AAB0C0@VI1PR0501MB2143.eurprd05.prod.outlook.com>

On Wed, 20 Dec 2017 09:23:34 +0000
Chris Mi <chrism@mellanox.com> wrote:

> > Your real performance win is just not asking for ACK for every rule.  
> No. Even if batch_size > 1, we ack every rule. The real performance win is
> to send multiple rules in one system call. If we are not asking for ACK for every rule,
> the performance will be improved further.

Try the no ACK method.

When we were optimizing routing daemons like Quagga, it was discovered
that an ACK for every route insert was the main bottleneck. Doing asynchronous
error handling got a bigger win than your batching.

Please try that; sending multiple messages using iov is not necessary.
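
To illustrate what "no ACK" means at the message level: a netlink request
asks for an explicit acknowledgement by setting NLM_F_ACK in nlmsg_flags,
and leaving the flag out means the kernel only answers when there is an
error to report. The sketch below is not tc's code; demo_init_request() is
a hypothetical helper that only fills in the header.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: fill in a netlink request header, asking the kernel
 * for an explicit ACK only when the caller wants one.
 */
void demo_init_request(struct nlmsghdr *nlh, unsigned short type,
		       unsigned int seq, int want_ack)
{
	memset(nlh, 0, sizeof(*nlh));
	nlh->nlmsg_len = NLMSG_LENGTH(0);	/* header only, no payload yet */
	nlh->nlmsg_type = type;
	nlh->nlmsg_seq = seq;
	nlh->nlmsg_flags = NLM_F_REQUEST;
	if (want_ack)
		nlh->nlmsg_flags |= NLM_F_ACK;
}
```

A sender using the no-ACK variant would need to match any later error
message to its request via nlmsg_seq.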

^ permalink raw reply

* Re: [PATCH net v2] ipv4: Fix use-after-free when flushing FIB tables
From: Alexander Duyck @ 2017-12-20 15:32 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Netdev, David Miller, Duyck, Alexander H, David Ahern,
	Fengguang Wu, mlxsw
In-Reply-To: <20171220085156.27991-1-idosch@mellanox.com>

On Wed, Dec 20, 2017 at 12:51 AM, Ido Schimmel <idosch@mellanox.com> wrote:
> Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the
> local table uses the same trie allocated for the main table when custom
> rules are not in use.
>
> When a net namespace is dismantled, the main table is flushed and freed
> (via an RCU callback) before the local table. In case the callback is
> invoked before the local table is iterated, a use-after-free can occur.
>
> Fix this by iterating over the FIB tables in reverse order, so that the
> main table is always freed after the local table.
>
> v2: Add a comment to make the fix more explicit per Dave's and Alex's
> feedback.
>
> Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
> ---
>  net/ipv4/fib_frontend.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index f52d27a422c3..08ce7f14ede1 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -1298,14 +1298,17 @@ static int __net_init ip_fib_net_init(struct net *net)
>
>  static void ip_fib_net_exit(struct net *net)
>  {
> -       unsigned int i;
> +       int i;
>
>         rtnl_lock();
>  #ifdef CONFIG_IP_MULTIPLE_TABLES
>         RCU_INIT_POINTER(net->ipv4.fib_main, NULL);
>         RCU_INIT_POINTER(net->ipv4.fib_default, NULL);
>  #endif
> -       for (i = 0; i < FIB_TABLE_HASHSZ; i++) {
> +       /* The local table must be destroyed before the main table,
> +        * as it might be using main's trie.
> +        */

I think we might want even more description here. Specifically, it should
explain why reversing the order allows local to be destroyed before main.
I was thinking something along the lines of:

Destroy the tables in reverse order to guarantee that the local table,
ID 255, is destroyed before main table, ID 254. This is necessary as
local may contain references to data contained in main.

> +       for (i = FIB_TABLE_HASHSZ - 1; i >= 0; i--) {
>                 struct hlist_head *head = &net->ipv4.fib_table_hash[i];
>                 struct hlist_node *tmp;
>                 struct fib_table *tb;
> --
> 2.14.3
>
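
The ordering argument can be checked with a small standalone sketch. It
assumes 256 hash buckets and a bucket index of "table ID modulo 256"; those
values and all DEMO_* / demo_* names are assumptions for illustration, not
taken from fib_frontend.c. It also shows why the loop counter has to be
signed once the loop counts down to zero.

```c
/* Assumed values for illustration: 256 hash buckets, bucket = id mod 256. */
#define DEMO_TABLE_HASHSZ 256
#define DEMO_TABLE_MAIN   254
#define DEMO_TABLE_LOCAL  255

/*
 * Returns how many buckets are visited before the one holding 'table_id'
 * when walking the buckets from the highest index down to 0.
 */
int demo_reverse_visit_step(unsigned int table_id)
{
	int bucket = (int)(table_id & (DEMO_TABLE_HASHSZ - 1));
	int step = 0;
	int i;	/* must be signed: with unsigned, i >= 0 is always true */

	for (i = DEMO_TABLE_HASHSZ - 1; i >= 0; i--) {
		if (i == bucket)
			return step;
		step++;
	}
	return -1;
}
```

Under these assumptions the local table (ID 255) is reached at step 0 and
the main table (ID 254) at step 1, so local is always handled first.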

^ permalink raw reply

* [PATCH net v2] openvswitch: Fix pop_vlan action for double tagged frames
From: Eric Garver @ 2017-12-20 15:39 UTC (permalink / raw)
  To: netdev; +Cc: ovs-dev, Jiri Benc

skb_vlan_pop() expects skb->protocol to be a valid TPID for double
tagged frames, but skb->protocol is set to the ethertype by
key_extract(). So temporarily set it to the TPID when doing a pop_vlan.

Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
Signed-off-by: Eric Garver <e@erig.me>
---
 net/openvswitch/actions.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 30a5df27116e..c484e0941047 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -280,6 +280,13 @@ static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
 static int pop_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 {
 	int err;
+	__be16 proto = skb->protocol;
+
+	/* skb->protocol is set to the inner most parsed ethertype. To satisfy
+	 * skb_vlan_pop() for multi-tagged frames we must set it to the tpid.
+	 */
+	if (is_flow_key_valid(key) && key->eth.vlan.tci && key->eth.cvlan.tci)
+		skb->protocol = key->eth.cvlan.tpid;
 
 	err = skb_vlan_pop(skb);
 	if (skb_vlan_tag_present(skb)) {
@@ -288,6 +295,9 @@ static int pop_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 		key->eth.vlan.tci = 0;
 		key->eth.vlan.tpid = 0;
 	}
+
+	skb->protocol = proto;
+
 	return err;
 }
 
-- 
2.12.0

^ permalink raw reply related

* Re: [PATCH v1 2/4] lib/net_utils: Introduce mac_pton_from_user()
From: David Miller @ 2017-12-20 15:51 UTC (permalink / raw)
  To: gregkh; +Cc: andriy.shevchenko, netdev, Larry.Finger, florian.c.schilhabel,
	devel
In-Reply-To: <20171220071355.GB1957@kroah.com>

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Wed, 20 Dec 2017 08:13:55 +0100

> On Tue, Dec 19, 2017 at 09:14:10PM +0200, Andy Shevchenko wrote:
>> Some drivers are getting MAC from user space. Make a helper for them.
>> 
>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>> ---
>>  include/linux/kernel.h |  1 +
>>  lib/net_utils.c        | 12 ++++++++++++
>>  2 files changed, 13 insertions(+)
> 
> Don't do this just for some horrid staging drivers.  They can just drop
> that functionality entirely and use the "normal" way of doing this if
> they really want it.

Agreed.

^ permalink raw reply

* Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working
From: Pavel Machek @ 2017-12-20 15:54 UTC (permalink / raw)
  To: Fujinaka, Todd
  Cc: Neftin, Sasha, Keller, Jacob E, bpoirier@suse.com,
	nix.or.die@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	lsorense@csclub.uwaterloo.ca, David Miller
In-Reply-To: <9B4A1B1917080E46B64F07F2989DADD69845BBE2@ORSMSX110.amr.corp.intel.com>

Hi!

> >> Before asking to revert 19110cfbb..., please check whether Benjamin's
> >> follow-up patch works for you: http://patchwork.ozlabs.org/patch/846825/

> >
> Pavel, before asking for a revert, let's check Benjamin's patch that follows up on his previous patch. The previous patch was not complete, and the latest one completes the changes.
>

v4.15-rc4+:

Ethernet works with 19110cfbb reverted.

Ethernet works with patchwork.ozlabs.org/patch/846825/ applied.


									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
From: Andreas Hartmann @ 2017-12-20 15:56 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development
In-Reply-To: <6f75bdf5-839b-8c84-c8be-e83d071b245e@maya.org>

On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:
> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
[...]
>> I have been able to reproduce the hang by sending a UFO packet
>> between two guests running v4.13 on a host running v4.15-rc1.
>>
>> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
>> vhost_zerocopy_callback being called for each segment of a
>> segmented UFO skb. This refcount is decremented then on each
>> segment, but incremented only once for the entire UFO skb.
>>
>> Before v4.14, these packets would be converted in skb_segment to
>> regular copy packets with skb_orphan_frags and the callback function
>> called once at this point. v4.14 added support for reference counted
>> zerocopy skb that can pass through skb_orphan_frags unmodified and
>> have their zerocopy state safely cloned with skb_zerocopy_clone.
>>
>> The call to skb_zerocopy_clone must come after skb_orphan_frags
>> to limit cloning of this state to those skbs that can do so safely.
>>
>> Please try a host with the following patch. This fixes it for me. I intend to
>> send it to net.
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index a592ca025fc4..d2d985418819 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>                                               SKBTX_SHARED_FRAG;
>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>> -                       goto err;
>>
>>                 while (pos < offset + len) {
>>                         if (i >= nfrags) {
>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>                                 goto err;
>> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>> +                               goto err;
>>
>>                         *nskb_frag = *frag;
>>                         __skb_frag_ref(nskb_frag);
>>
>>
>> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
>> in the frags[] array. I will follow-up with a patch to net-next that only
>> checks once per skb:
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 466581cf4cdc..a293a33604ec 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>                                               SKBTX_SHARED_FRAG;
>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>                         goto err;
>>
>>                 while (pos < offset + len) {
>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                                 BUG_ON(!nfrags);
>>
>> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> +                                   skb_zerocopy_clone(nskb, frag_skb,
>> +                                                      GFP_ATOMIC))
>> +                                       goto err;
>> +
>>                                 list_skb = list_skb->next;
>>                         }
>>
>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>                                 goto err;
>>                         }
>>
>> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>> -                               goto err;
>> -
> 
> I'm currently testing this one.
> 

Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
accept UFO datagrams from tuntap and packet".

At first, I tested an unpatched 4.14.7. The problem (a qemu process that
can no longer be killed) occurred promptly on shutdown of the machine.
This was expected.

Next, I applied the above patch (the second one). So far, I haven't
seen any more problems on shutdown of VMs. Looks promising.
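
As a reading aid for the quoted analysis: the imbalance is one reference
taken for the whole UFO skb against one completion callback per segment.
The arithmetic can be sketched as below. This is toy accounting only, not
vhost's or skbuff's real bookkeeping, and demo_refs_after_callbacks() is a
hypothetical name.

```c
/*
 * Toy accounting: references are taken either once for the whole skb or
 * once per segment, and one completion callback per segment drops one.
 * Returns the resulting count; a negative value means an underflow.
 */
int demo_refs_after_callbacks(int segments, int get_per_segment)
{
	int refs = 0;
	int i;

	if (get_per_segment) {
		for (i = 0; i < segments; i++)
			refs++;
	} else {
		refs++;		/* single get for the entire skb */
	}

	for (i = 0; i < segments; i++)
		refs--;		/* one callback per segment */

	return refs;
}
```

With two segments and a single get the count ends at -1; taking one
reference per segment brings it back to 0.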


Thanks,
regards,
Andreas

^ permalink raw reply

* Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working
From: Pavel Machek @ 2017-12-20 16:01 UTC (permalink / raw)
  To: Fujinaka, Todd
  Cc: Neftin, Sasha, Keller, Jacob E, bpoirier@suse.com,
	nix.or.die@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	lsorense@csclub.uwaterloo.ca, David Miller
In-Reply-To: <20171220155421.GA3849@amd>

On Wed 2017-12-20 16:54:21, Pavel Machek wrote:
> Hi!
> 
> > >> Before asking to revert 19110cfbb..., please check whether Benjamin's
> > >> follow-up patch works for you: http://patchwork.ozlabs.org/patch/846825/
> 
> > >
> > Pavel, before asking for a revert, let's check Benjamin's patch that follows up on his previous patch. The previous patch was not complete, and the latest one completes the changes.
> >
> 
> v4.15-rc4+:
> 
> Ethernet works with 19110cfbb reverted.
> 
> Ethernet works with patchwork.ozlabs.org/patch/846825/ applied.

Hmm. So... ethernet originally did not work with patch/846825/ applied
or 19110cfbb reverted, so I re-plugged ethernet cables. Now it works
even with plain v4.15-rc4+.

So it looks like the bug was fixed in the mainline in the meantime...?

Sorry for the noise,
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* [PATCH v2] net: ibm: emac: support RGMII-[RX|TX]ID phymode
From: Christian Lamparter @ 2017-12-20 16:02 UTC (permalink / raw)
  To: netdev; +Cc: David S . Miller, Andrew Lunn, Christophe Jaillet

The RGMII spec allows compliance for devices that implement an internal
delay on TXC and/or RXC inside the transmitter. This patch adds the
necessary RGMII_[RX|TX]ID mode code to handle such PHYs with the
emac driver.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>

---
v2: - utilize phy_interface_mode_is_rgmii()
---
 drivers/net/ethernet/ibm/emac/core.c  |  4 ++--
 drivers/net/ethernet/ibm/emac/emac.h  |  3 +++
 drivers/net/ethernet/ibm/emac/rgmii.c | 10 ++++++++--
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 7feff2450ed6..043e72e28bba 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -199,8 +199,8 @@ static void __emac_set_multicast_list(struct emac_instance *dev);
 
 static inline int emac_phy_supports_gige(int phy_mode)
 {
-	return  phy_mode == PHY_MODE_GMII ||
-		phy_mode == PHY_MODE_RGMII ||
+	return  phy_interface_mode_is_rgmii(phy_mode) ||
+		phy_mode == PHY_MODE_GMII ||
 		phy_mode == PHY_MODE_SGMII ||
 		phy_mode == PHY_MODE_TBI ||
 		phy_mode == PHY_MODE_RTBI;
diff --git a/drivers/net/ethernet/ibm/emac/emac.h b/drivers/net/ethernet/ibm/emac/emac.h
index 5afcc27ceebb..8c6d2af7281b 100644
--- a/drivers/net/ethernet/ibm/emac/emac.h
+++ b/drivers/net/ethernet/ibm/emac/emac.h
@@ -112,6 +112,9 @@ struct emac_regs {
 #define PHY_MODE_RMII	PHY_INTERFACE_MODE_RMII
 #define PHY_MODE_SMII	PHY_INTERFACE_MODE_SMII
 #define PHY_MODE_RGMII	PHY_INTERFACE_MODE_RGMII
+#define PHY_MODE_RGMII_ID	PHY_INTERFACE_MODE_RGMII_ID
+#define PHY_MODE_RGMII_RXID	PHY_INTERFACE_MODE_RGMII_RXID
+#define PHY_MODE_RGMII_TXID	PHY_INTERFACE_MODE_RGMII_TXID
 #define PHY_MODE_TBI	PHY_INTERFACE_MODE_TBI
 #define PHY_MODE_GMII	PHY_INTERFACE_MODE_GMII
 #define PHY_MODE_RTBI	PHY_INTERFACE_MODE_RTBI
diff --git a/drivers/net/ethernet/ibm/emac/rgmii.c b/drivers/net/ethernet/ibm/emac/rgmii.c
index c4a1ac38bba8..124b0473d2b7 100644
--- a/drivers/net/ethernet/ibm/emac/rgmii.c
+++ b/drivers/net/ethernet/ibm/emac/rgmii.c
@@ -52,9 +52,9 @@
 /* RGMII bridge supports only GMII/TBI and RGMII/RTBI PHYs */
 static inline int rgmii_valid_mode(int phy_mode)
 {
-	return  phy_mode == PHY_MODE_GMII ||
+	return  phy_interface_mode_is_rgmii(phy_mode) ||
+		phy_mode == PHY_MODE_GMII ||
 		phy_mode == PHY_MODE_MII ||
-		phy_mode == PHY_MODE_RGMII ||
 		phy_mode == PHY_MODE_TBI ||
 		phy_mode == PHY_MODE_RTBI;
 }
@@ -63,6 +63,9 @@ static inline const char *rgmii_mode_name(int mode)
 {
 	switch (mode) {
 	case PHY_MODE_RGMII:
+	case PHY_MODE_RGMII_ID:
+	case PHY_MODE_RGMII_RXID:
+	case PHY_MODE_RGMII_TXID:
 		return "RGMII";
 	case PHY_MODE_TBI:
 		return "TBI";
@@ -81,6 +84,9 @@ static inline u32 rgmii_mode_mask(int mode, int input)
 {
 	switch (mode) {
 	case PHY_MODE_RGMII:
+	case PHY_MODE_RGMII_ID:
+	case PHY_MODE_RGMII_RXID:
+	case PHY_MODE_RGMII_TXID:
 		return RGMII_FER_RGMII(input);
 	case PHY_MODE_TBI:
 		return RGMII_FER_TBI(input);
-- 
2.15.1

^ permalink raw reply related

* Re: [PATCH v3 ipsec-next 3/3] xfrm: wrap xfrmdev_ops with offload config
From: Marcelo Ricardo Leitner @ 2017-12-20 16:03 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: steffen.klassert, netdev
In-Reply-To: <1513726549-7065-4-git-send-email-shannon.nelson@oracle.com>

On Tue, Dec 19, 2017 at 03:35:49PM -0800, Shannon Nelson wrote:
> There's no reason to define netdev->xfrmdev_ops if
> the offload facility is not CONFIG'd in.
> 
> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>

This one could use a Fixes tag perhaps:
Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")

as in theory the build was broken since then, as it added:
+#ifdef CONFIG_XFRM_OFFLOAD
+struct xfrmdev_ops {
...
+#ifdef CONFIG_XFRM
+       const struct xfrmdev_ops *xfrmdev_ops;

So the pointer would have an undefined type
  if CONFIG_XFRM && !CONFIG_XFRM_OFFLOAD
Though I couldn't reproduce this, not sure why.

But is it buildable with this patch? Mine failed:

obj-$(CONFIG_XFRM) := xfrm_policy.o xfrm_state.o xfrm_hash.o \
                      xfrm_input.o xfrm_output.o \
                      xfrm_sysctl.o xfrm_replay.o xfrm_device.o

so xfrm_device is always in if CONFIG_XFRM is there,
xfrm_dev_init(), via xfrm_dev_notifier -> xfrm_dev_event() ->
  xfrm_dev_register() and then:

static int xfrm_dev_register(struct net_device *dev)
{
        if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
                                                 ^^^^^^^^^^^^^^^^

We can't control CONFIG_XFRM_OFFLOAD directly, so unless you
unselected other offloadings such as INET_ESP_OFFLOAD, it is still on.

linux/net/xfrm/xfrm_device.c: In function ‘xfrm_dev_register’:
linux/net/xfrm/xfrm_device.c:147:48: error: ‘struct net_device’ has no member named ‘xfrmdev_ops’; did you mean ‘netdev_ops’?
  if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
                                                ^~~~~~~~~~~
                                                netdev_ops


> ---
>  include/linux/netdevice.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 2eaac7d..145d0de 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1697,7 +1697,7 @@ struct net_device {
>  	const struct ndisc_ops *ndisc_ops;
>  #endif
>  
> -#ifdef CONFIG_XFRM
> +#ifdef CONFIG_XFRM_OFFLOAD
>  	const struct xfrmdev_ops *xfrmdev_ops;
>  #endif
>  
> -- 
> 2.7.4
> 
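
The build failure above comes from a struct member and its user being
guarded by different config symbols. One possible way to keep them
consistent is to guard the user with the same symbol as the member, as in
the standalone sketch below. The DEMO_* macros and demo_* names are
hypothetical; this is not a proposed patch for xfrm_device.c.

```c
/* DEMO_CONFIG_XFRM_OFFLOAD is deliberately left undefined in this sketch. */

#define DEMO_F_HW_ESP 0x1u

struct demo_xfrmdev_ops {
	int unused;
};

struct demo_net_device {
	unsigned int features;
#ifdef DEMO_CONFIG_XFRM_OFFLOAD
	const struct demo_xfrmdev_ops *xfrmdev_ops;
#endif
};

/*
 * Returns 0 if the device may register, -1 if it advertises ESP offload
 * without providing the ops. The member is only touched under the same
 * guard that defines it.
 */
int demo_dev_register(const struct demo_net_device *dev)
{
#ifdef DEMO_CONFIG_XFRM_OFFLOAD
	if ((dev->features & DEMO_F_HW_ESP) && !dev->xfrmdev_ops)
		return -1;
#else
	(void)dev;
#endif
	return 0;
}
```

With the offload symbol undefined, the sketch still compiles because no
code path refers to the missing member.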

^ permalink raw reply

* Re: [PATCH net-next] dev: Correctly get length of alias string in dev_set_alias()
From: David Miller @ 2017-12-20 16:05 UTC (permalink / raw)
  To: serhe.popovych; +Cc: netdev
In-Reply-To: <1513633115-16940-1-git-send-email-serhe.popovych@gmail.com>

From: Serhey Popovych <serhe.popovych@gmail.com>
Date: Mon, 18 Dec 2017 23:38:35 +0200

> We supply number of bytes available in @alias via @len
> parameter to dev_set_alias() which is not the same
> as zero terminated string length that can be shorter.
> 
> Both dev_set_alias() users (rtnetlink and sysfs) can
> submit number of bytes up to IFALIASZ with actual string
> length slightly shorter by putting '\0' not at @len - 1.
> 
> Use strnlen() to get length of zero terminated string
> and not access beyond @len. Correct comment about @len
> and explain how to unset alias (i.e. use zero for @len).
> 
> Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

I don't really see this as useful.

In the sysfs case, we are not presented with a NULL terminated string.
Instead, the net sysfs code gives us a length that goes up until the
trailing newline character.  The sysfs case is never larger than the
actual string size + 1.

The netlink attribute is usually sized appropriately for whatever the
string length actually is.

This therefore just seems to add a new strnlen() unnecessarily to
this code path, which rarely does anything helpful.

Thanks.
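
For context on the distinction being discussed (the byte count passed as
@len versus the length of the string inside the buffer), a bounded length
computation can be sketched in plain ISO C as below. It behaves like
strnlen() but is written with memchr(); demo_bounded_len() is a
hypothetical name and this is not the code from the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Length of the string in buf, never looking past the first 'len' bytes.
 * If no terminating NUL is found within 'len' bytes, 'len' is returned.
 */
size_t demo_bounded_len(const char *buf, size_t len)
{
	const char *nul = memchr(buf, '\0', len);

	return nul ? (size_t)(nul - buf) : len;
}
```

The two values differ only when a NUL byte appears before @len, which is
the case the reply above argues is rare for both callers.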

^ permalink raw reply

* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure
From: David Miller @ 2017-12-20 16:12 UTC (permalink / raw)
  To: borisp; +Cc: jiri, ilyal, netdev, davejwatson, tom, hannes, aviadye, liranl
In-Reply-To: <HE1PR0501MB223554DC2840A95B594ED0D1B00C0@HE1PR0501MB2235.eurprd05.prod.outlook.com>

From: Boris Pismenny <borisp@mellanox.com>
Date: Wed, 20 Dec 2017 08:28:03 +0000

> Dave, would you prefer to get the driver patches that use this infra
> before the infra?

The arguments you present are silly.

In order to analyze any proposed API, the users of it must be presented
for the reviewers to see as well.

Logically, you must have tried to make use of the APIs to see how well
they work and are usable for at least one such user, right?

Therefore, the use case exists, and you must present it alongside the
API proposal.

Whether you provide the API addition patches and the user in the same
patch series, or a separate one, doesn't really matter.  What is
important is that this is accessible to the reviewer at the same
time.

^ permalink raw reply

* Re: sparc64 verifier failures..
From: David Miller @ 2017-12-20 16:17 UTC (permalink / raw)
  To: daniel; +Cc: netdev, alexei.starovoitov
In-Reply-To: <58565df5-2c45-6ab3-fb48-de011da70e6a@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 20 Dec 2017 14:05:36 +0100

> On 12/19/2017 09:36 PM, David Miller wrote:
>> 
>> I'm getting about 100 verifier failures on sparc64.
>> 
>> The vast majority of them seem to be due to misaligned packet
>> accesses.  Here is a sample of some of the failures.
> 
> Thanks, I'll check it in the next few days. It would probably make sense
> to have a --strict-align mode for test_verifier that enables it on all
> tests unconditionally, so the selftests suite would always run in both
> modes (at least on archs that don't have this restriction).

Indeed, and for some reason I thought we were already doing this.

^ permalink raw reply

* RE: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure
From: Ilya Lesokhin @ 2017-12-20 16:23 UTC (permalink / raw)
  To: David Miller, Boris Pismenny
  Cc: jiri@resnulli.us, netdev@vger.kernel.org, davejwatson@fb.com,
	tom@herbertland.com, hannes@stressinduktion.org, Aviad Yehezkel,
	Liran Liss
In-Reply-To: <20171220.111208.1328340432834146497.davem@davemloft.net>

> 
> > Dave, would you prefer to get the driver patches that use this infra
> > before the infra?
> 
> The arguments you present are silly.
> 
> In order to analyze any proposed API, the users of it must be presented for the
> reviewers to see as well.
> 
> Logically, you must have tried to make use of the APIs to see how well they
> work and are usable for at least one such user, right?
Right, we agree.
> 
> Therefore, the use case exists, and you must present it alongside the API
> proposal.
> 
> Whether you provide the API addition patches and the user in the same patch
> series, or a separate one, doesn't really matter.  What is important is that this
> is accessible to the reviewer at the same time.

Note that we did provide a user in an accessible place.
https://github.com/Mellanox/tls-offload/tree/tls_device_v3
The link was at the bottom of the cover letter.

We just feel that the code there is not yet ready for upstream submission, and it might have
conflicts with other stuff submitted by Mellanox.

Would it be better if we submitted the mlx5e TLS support as an RFC alongside the TLS
Infrastructure patches?

^ permalink raw reply

* Re: [PATCH v3 ipsec-next 3/3] xfrm: wrap xfrmdev_ops with offload config
From: Shannon Nelson @ 2017-12-20 16:22 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: steffen.klassert, netdev
In-Reply-To: <20171220160240.GK6122@localhost.localdomain>

On 12/20/2017 8:03 AM, Marcelo Ricardo Leitner wrote:
> On Tue, Dec 19, 2017 at 03:35:49PM -0800, Shannon Nelson wrote:
>> There's no reason to define netdev->xfrmdev_ops if
>> the offload facility is not CONFIG'd in.
>>
>> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
> 
> This one could use a Fixes tag perhaps:
> Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
> 
> as in theory the build was broken since then, as it added:
> +#ifdef CONFIG_XFRM_OFFLOAD
> +struct xfrmdev_ops {
> ...
> +#ifdef CONFIG_XFRM
> +       const struct xfrmdev_ops *xfrmdev_ops;
> 
> So the pointer would have an undefined type
>    if CONFIG_XFRM && !CONFIG_XFRM_OFFLOAD
> Though I couldn't reproduce this, not sure why.

Hmmm, I don't think this requires a "Fixes" tag, as the code all worked
just fine; I'm just doing a little cleaning.

Patch 2/3 takes a closer look at the data structure, so I needed to
change the guard to CONFIG_XFRM_OFFLOAD so as not to break the build.
Since the xfrmdev_ops field is now never used unless we have
CONFIG_XFRM_OFFLOAD, we can change the net_device definition to be just
a bit smaller without it.
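
As a rough userspace sketch of that idea (not the actual patch; every
name ending in _sketch, SKETCH_F_HW_ESP, and the return values are
assumptions made for illustration), the field and every access to it
sit behind the same guard, so the struct shrinks when the option is off
and nothing references a member that does not exist:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the Kconfig symbol; remove to model offload disabled. */
#define CONFIG_XFRM_OFFLOAD 1

#define SKETCH_F_HW_ESP 0x1UL	/* hypothetical feature bit */

#ifdef CONFIG_XFRM_OFFLOAD
struct xfrmdev_ops_sketch {
	const char *name;	/* placeholder for the real callbacks */
};
#endif

struct net_device_sketch {
	unsigned long features;
#ifdef CONFIG_XFRM_OFFLOAD
	const struct xfrmdev_ops_sketch *xfrmdev_ops;
#endif
};

/* Returns 0 if the device may register, -1 if it must be rejected. */
static int sketch_dev_register(const struct net_device_sketch *dev)
{
	assert(dev != NULL);
#ifdef CONFIG_XFRM_OFFLOAD
	/* Feature advertised without ops: reject. */
	if ((dev->features & SKETCH_F_HW_ESP) && !dev->xfrmdev_ops)
		return -1;
#else
	/* Without offload support the feature bit itself is invalid. */
	if (dev->features & SKETCH_F_HW_ESP)
		return -1;
#endif
	return 0;
}
```

The point of the sketch is only that the member access and the member
definition share one #ifdef, which is what keeps the build working in
both configurations.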

> 
> But... is it buildable with this patch? Mine failed:
> 
> obj-$(CONFIG_XFRM) := xfrm_policy.o xfrm_state.o xfrm_hash.o \
>                        xfrm_input.o xfrm_output.o \
>                        xfrm_sysctl.o xfrm_replay.o xfrm_device.o
> 
> so xfrm_device is always in if CONFIG_XFRM is there,
> xfrm_dev_init(), via xfrm_dev_notifier -> xfrm_dev_event() ->
>    xfrm_dev_register() and then:
> 
> static int xfrm_dev_register(struct net_device *dev)
> {
>          if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)

This looks like you haven't applied version 3 of the 2nd patch "xfrm:
check for xdo_dev_ops add and delete".  I missed this in the earlier
version (not enough compile tests), but version 3 of patch 2/3 should
address it.

sln

>                                                   ^^^^^^^^^^^^^^^^
> 
> We can't control CONFIG_XFRM_OFFLOAD directly, so unless you
> unselected other offloadings such as INET_ESP_OFFLOAD, it is still on.
> 
> linux/net/xfrm/xfrm_device.c: In function ‘xfrm_dev_register’:
> linux/net/xfrm/xfrm_device.c:147:48: error: ‘struct net_device’ has no member named ‘xfrmdev_ops’; did you mean ‘netdev_ops’?
>    if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
>                                                  ^~~~~~~~~~~
>                                                  netdev_ops
> 
> 
>> ---
>>   include/linux/netdevice.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 2eaac7d..145d0de 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1697,7 +1697,7 @@ struct net_device {
>>   	const struct ndisc_ops *ndisc_ops;
>>   #endif
>>   
>> -#ifdef CONFIG_XFRM
>> +#ifdef CONFIG_XFRM_OFFLOAD
>>   	const struct xfrmdev_ops *xfrmdev_ops;
>>   #endif
>>   
>> -- 
>> 2.7.4
>>

^ permalink raw reply

* Re: [QUESTION] Doubt about NAPI_GRO_CB(skb)->is_atomic in tcpv4 gro process
From: Alexander Duyck @ 2017-12-20 16:24 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: netdev@vger.kernel.org, davem@davemloft.net, linuxarm@huawei.com,
	yuxiaowu, wzhen.wang, Xuehuahu
In-Reply-To: <ca367d0d-b0d7-3357-7196-c0da17ef9890@huawei.com>

On Wed, Dec 20, 2017 at 1:09 AM, Yunsheng Lin <linyunsheng@huawei.com> wrote:
> Hi, all
>         I have some doubt about NAPI_GRO_CB(skb)->is_atomic when
> analyzing the tcpv4 gro process:
>
> Firstly we set NAPI_GRO_CB(skb)->is_atomic to 1 in dev_gro_receive:
> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/core/dev.c#L4838
>
> And then in inet_gro_receive, we check the NAPI_GRO_CB(skb)->is_atomic
> before setting NAPI_GRO_CB(skb)->is_atomic according to IP_DF bit in the ip header:
> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/ipv4/af_inet.c#L1319
>
> struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb)
> {
> .....................
>         for (p = *head; p; p = p->next) {
> ........................
>
>                 /* If the previous IP ID value was based on an atomic
>                  * datagram we can overwrite the value and ignore it.
>                  */
>                 if (NAPI_GRO_CB(skb)->is_atomic)                      //we check it here
>                         NAPI_GRO_CB(p)->flush_id = flush_id;
>                 else
>                         NAPI_GRO_CB(p)->flush_id |= flush_id;
>         }
>
>         NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));  //we set it here
>         NAPI_GRO_CB(skb)->flush |= flush;
>         skb_set_network_header(skb, off);
> ................................
> }
>
> My question is whether we should check the NAPI_GRO_CB(skb)->is_atomic or NAPI_GRO_CB(p)->is_atomic?
> If we should check NAPI_GRO_CB(skb)->is_atomic, then maybe it is unnecessary because it is always true.
> If we should check NAPI_GRO_CB(p)->is_atomic, maybe there is a bug here.
>
> So what is the logic here? I have just started analyzing GRO, so maybe I am missing something obvious here.

The logic there is to address the multiple IP header case, where there
are 2 or more IP headers due to things like VXLAN or GRE tunnels. What
happens is that an outer IP header sent with DF not set clears the
is_atomic value, and we then want to OR in the flush_id of the next
header that is processed. It defaults to assignment because is_atomic
starts out set, so the first IP header encounters flush_id with no
previous value occupying it.

The part I am not sure about is if we should be using assignment for
is_atomic or using an "&=" to clear the bit and leave it cleared. I
don't know if there has been much testing of multiple levels of tunnel
header.
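
To make the assign-versus-OR behaviour concrete, here is a small
userspace model. It is only a sketch: struct gro_cb_sketch folds the
skb's is_atomic and the held packet's flush_id into one struct, and the
per-header flush_id is passed in rather than computed from IP IDs, both
of which are simplifications that are not in the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the relevant NAPI_GRO_CB() fields. */
struct gro_cb_sketch {
	unsigned int flush_id;
	bool is_atomic;
};

/* Models dev_gro_receive() starting each packet with is_atomic = 1. */
static void gro_cb_init(struct gro_cb_sketch *cb)
{
	assert(cb != NULL);
	cb->flush_id = 0;
	cb->is_atomic = true;
}

/* Models one pass of inet_gro_receive() for a single IP header,
 * outermost header first. @df is that header's DF bit.
 */
static void gro_ip_header_seen(struct gro_cb_sketch *cb,
			       unsigned int flush_id, bool df)
{
	assert(cb != NULL);
	if (cb->is_atomic)
		cb->flush_id = flush_id;	/* previous value can be ignored */
	else
		cb->flush_id |= flush_id;	/* keep the outer header's result */
	cb->is_atomic = df;
}
```

In this model, an outer header without DF and a non-zero flush_id
followed by an inner header with flush_id 0 leaves flush_id non-zero,
while an outer header with DF set lets the inner header overwrite it.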

Thanks.

- Alex

^ permalink raw reply

* [PATCH net] ip6_gre: fix device features for ioctl setup
From: Alexey Kodanev @ 2017-12-20 16:36 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, David Miller, Alexey Kodanev

When ip6gre is created using ioctl, its features, such as
scatter-gather, GSO and tx-checksumming will be turned off:

  # ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1
  # ethtool -k gre6 (truncated output)
    tx-checksumming: off
    scatter-gather: off
    tcp-segmentation-offload: off
    generic-segmentation-offload: off [requested on]

But when netlink is used, they will be enabled:
  # ip link add gre6 type ip6gre remote fd00::1
  # ethtool -k gre6 (truncated output)
    tx-checksumming: on
    scatter-gather: on
    tcp-segmentation-offload: on
    generic-segmentation-offload: on

This results in a loss of performance when gre6 is created via ioctl.
The issue was found with LTP/gre tests.

Fix it by moving the setup of device features to a separate function
and invoking it from the ndo_init callback, because both netlink and
ioctl will eventually call it via register_netdevice():

   register_netdevice()
       - ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init()
           - ip6gre_tunnel_init_common()
                - ip6gre_tnl_init_features()

The moved code also contains two minor style fixes:
  * removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line.
  * fixed the issue reported by checkpatch: "Unnecessary parentheses around
    'nt->encap.type == TUNNEL_ENCAP_NONE'"

Fixes: ac4eb009e477 ("ip6gre: Add support for basic offloads offloads excluding GSO")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
 net/ipv6/ip6_gre.c |   57 +++++++++++++++++++++++++++++----------------------
 1 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 4cfd8e0..9a4b376 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1014,6 +1014,36 @@ static void ip6gre_tunnel_setup(struct net_device *dev)
 	eth_random_addr(dev->perm_addr);
 }
 
+#define GRE6_FEATURES (NETIF_F_SG |		\
+		       NETIF_F_FRAGLIST |	\
+		       NETIF_F_HIGHDMA |	\
+		       NETIF_F_HW_CSUM)
+
+static void ip6gre_tnl_init_features(struct net_device *dev)
+{
+	struct ip6_tnl *nt = netdev_priv(dev);
+
+	dev->features		|= GRE6_FEATURES;
+	dev->hw_features	|= GRE6_FEATURES;
+
+	if (!(nt->parms.o_flags & TUNNEL_SEQ)) {
+		/* TCP offload with GRE SEQ is not supported, nor
+		 * can we support 2 levels of outer headers requiring
+		 * an update.
+		 */
+		if (!(nt->parms.o_flags & TUNNEL_CSUM) ||
+		    nt->encap.type == TUNNEL_ENCAP_NONE) {
+			dev->features    |= NETIF_F_GSO_SOFTWARE;
+			dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+		}
+
+		/* Can use a lockless transmit, unless we generate
+		 * output sequences
+		 */
+		dev->features |= NETIF_F_LLTX;
+	}
+}
+
 static int ip6gre_tunnel_init_common(struct net_device *dev)
 {
 	struct ip6_tnl *tunnel;
@@ -1048,6 +1078,8 @@ static int ip6gre_tunnel_init_common(struct net_device *dev)
 	if (!(tunnel->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
 		dev->mtu -= 8;
 
+	ip6gre_tnl_init_features(dev);
+
 	return 0;
 }
 
@@ -1298,11 +1330,6 @@ static int ip6gre_tap_init(struct net_device *dev)
 	.ndo_get_iflink = ip6_tnl_get_iflink,
 };
 
-#define GRE6_FEATURES (NETIF_F_SG |		\
-		       NETIF_F_FRAGLIST |	\
-		       NETIF_F_HIGHDMA |		\
-		       NETIF_F_HW_CSUM)
-
 static void ip6gre_tap_setup(struct net_device *dev)
 {
 
@@ -1382,26 +1409,6 @@ static int ip6gre_newlink(struct net *src_net, struct net_device *dev,
 	nt->net = dev_net(dev);
 	ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
 
-	dev->features		|= GRE6_FEATURES;
-	dev->hw_features	|= GRE6_FEATURES;
-
-	if (!(nt->parms.o_flags & TUNNEL_SEQ)) {
-		/* TCP offload with GRE SEQ is not supported, nor
-		 * can we support 2 levels of outer headers requiring
-		 * an update.
-		 */
-		if (!(nt->parms.o_flags & TUNNEL_CSUM) ||
-		    (nt->encap.type == TUNNEL_ENCAP_NONE)) {
-			dev->features    |= NETIF_F_GSO_SOFTWARE;
-			dev->hw_features |= NETIF_F_GSO_SOFTWARE;
-		}
-
-		/* Can use a lockless transmit, unless we generate
-		 * output sequences
-		 */
-		dev->features |= NETIF_F_LLTX;
-	}
-
 	err = register_netdevice(dev);
 	if (err)
 		goto out;
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCHv3 net-next 00/14] net: sched: sch: introduce extack support
From: David Miller @ 2017-12-20 16:32 UTC (permalink / raw)
  To: aring; +Cc: jhs, xiyou.wangcong, jiri, netdev, kernel, dsahern
In-Reply-To: <20171218224513.29836-1-aring@mojatatu.com>

From: Alexander Aring <aring@mojatatu.com>
Date: Mon, 18 Dec 2017 17:44:59 -0500

> This patch series adds support for extack in common qdisc handling.
> Additionally, it adds an extack pointer to the common qdisc callbacks,
> which lets each qdisc implementation set the extack message for each
> failure over netlink.

This patch series doesn't apply cleanly to net-next.

^ permalink raw reply

* Re: [PATCHv3 net-next 00/14] net: sched: sch: introduce extack support
From: Alexander Aring @ 2017-12-20 16:35 UTC (permalink / raw)
  To: David Miller
  Cc: Jamal Hadi Salim, Cong Wang, Jiří Pírko, netdev,
	kernel, David Ahern
In-Reply-To: <20171220.113249.514392331552783606.davem@davemloft.net>

Hi,

On Wed, Dec 20, 2017 at 11:32 AM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Aring <aring@mojatatu.com>
> Date: Mon, 18 Dec 2017 17:44:59 -0500
>
>> This patch series adds support for extack in common qdisc handling.
>> Additionally, it adds an extack pointer to the common qdisc callbacks,
>> which lets each qdisc implementation set the extack message for each
>> failure over netlink.
>
> This patch series doesn't apply cleanly to net-next.

okay, I will rebase it and send v4.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure
From: David Miller @ 2017-12-20 16:36 UTC (permalink / raw)
  To: ilyal; +Cc: borisp, jiri, netdev, davejwatson, tom, hannes, aviadye, liranl
In-Reply-To: <AM4PR0501MB27231C4EC47FB287D7FF1EC0D40C0@AM4PR0501MB2723.eurprd05.prod.outlook.com>

From: Ilya Lesokhin <ilyal@mellanox.com>
Date: Wed, 20 Dec 2017 16:23:03 +0000

>> Whether you provide the API addition patches and the user in the same patch
>> series, or a separate one, doesn't really matter.  What is important is that this
>> is accessible to the reviewer at the same time.
> 
> Note that we did provide a user in an accessible place.

That is not accessible for people reading netdev; it needs to be posted
on the netdev list.

It is never appropriate to require a reviewer to look at some external
site to review a patch series posted here.

^ permalink raw reply

* Re: [PATCH net-next v4] ip6_vti: adjust vti mtu according to mtu of lower device
From: David Miller @ 2017-12-20 16:53 UTC (permalink / raw)
  To: alexey.kodanev; +Cc: netdev, steffen.klassert, pvorel, shannon.nelson
In-Reply-To: <1513691961-19692-1-git-send-email-alexey.kodanev@oracle.com>

From: Alexey Kodanev <alexey.kodanev@oracle.com>
Date: Tue, 19 Dec 2017 16:59:21 +0300

> LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams over
> ip6_vti that require fragmentation and the underlying device has an
> MTU smaller than 1500 plus some extra space for headers. This happens
> because ip6_vti, by default, sets MTU to ETH_DATA_LEN and does not
> update it depending on a destination address or link parameter. Further
> attempts to send UDP packets may succeed because pmtu gets updated on
> ICMPV6_PKT_TOOBIG in vti6_err().
> 
> In case the lower device has a larger MTU, e.g. 9000, ip6_vti works
> but does not use the maximum possible size; output packets are limited
> to 1500.
> 
> The above cases require manual MTU setup after ip6_vti creation. However
> ip_vti already updates MTU based on lower device with ip_tunnel_bind_dev().
> 
> Here is the example when the lower device MTU is set to 9000:
 ...
> Reported-by: Petr Vorel <pvorel@suse.cz>
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>

Applied, thanks Alexey.

^ permalink raw reply

* Re: [PATCH net-next] qed*: Utilize FW 8.33.1.0
From: David Miller @ 2017-12-20 16:54 UTC (permalink / raw)
  To: jakub.kicinski
  Cc: Tomer.Tayar, netdev, linux-rdma, linux-scsi, Ariel.Elior,
	Michal.Kalderon, Yuval.Bason, Ram.Amrani, Manish.Chopra,
	Chad.Dupuis, Manish.Rangankar
In-Reply-To: <20171219154651.30930d2a@cakuba.netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue, 19 Dec 2017 15:46:51 -0800

> On Tue, 19 Dec 2017 16:05:23 +0200, Tomer Tayar wrote:
>> Sorry for the very long patch.
>> The firmware changes are spread all over w/o a good modularity.
> 
> Rings false.  Significant portion of this patch is just whitespace 
> and comment changes.

Totally agreed.

This thing is beyond huge already; adding unrelated whitespace and
comment changes makes reviewing it even harder.

^ permalink raw reply

* [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem

When using IPsec tunnel mode, VTIs provide many benefits compared
to direct configuration of xfrm policies / states. However, one
limitation is that there can only be one VTI between a given pair
of IP addresses. This does not allow configuring multiple IPsec
tunnels to the same security gateway. This is required by some
deployments, for example I-WLAN [3GPP TS 24.327].

This patchset introduces a new VTI_KEYED flag that allows
configuration of multiple VTIs between the same IP address
pairs. The semantics are as follows:

- The output path is the same as current VTI behaviour, where a
  routing lookup selects a VTI interface, and the VTI's okey
  specifies the mark to use in the XFRM lookup.
- The input and ICMP error paths instead work by first looking up
  an SA with a loose match that ignores the mark. That mark is
  then used to find the tunnel by ikey (for input packets) or
  okey (for ICMP errors).

In order for ICMP errors to work, flags are added to the common
IP lookup functions to ignore the tunnel ikey and to look up
tunnels by okey instead of ikey.
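
The key-selection part of the lookup described above can be sketched in
userspace as follows. This is a hedged illustration, not code from the
series: struct vti_sketch, enum vti_key_sel and vti_find_by_mark() are
hypothetical names, addresses are reduced to plain integers, and the
mark is assumed to have already been obtained from the loosely matched
SA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tunnel entry. */
struct vti_sketch {
	uint32_t local, remote;	/* address pair, reduced to integers */
	uint32_t ikey, okey;
};

/* Input packets match on ikey; ICMP errors match on okey. */
enum vti_key_sel { VTI_MATCH_IKEY, VTI_MATCH_OKEY };

static const struct vti_sketch *
vti_find_by_mark(const struct vti_sketch *tbl, size_t n,
		 uint32_t local, uint32_t remote,
		 uint32_t mark, enum vti_key_sel sel)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		uint32_t key = (sel == VTI_MATCH_IKEY) ? tbl[i].ikey
						       : tbl[i].okey;

		/* Same address pair, distinguished only by the key. */
		if (tbl[i].local == local && tbl[i].remote == remote &&
		    key == mark)
			return &tbl[i];
	}
	return NULL;
}
```

Two entries with the same address pair but different ikeys resolve to
different tunnels, which is the property the keyed mode relies on.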

On the same IP address pair, keyed VTIs can coexist with each
other (as long as the ikeys are different), but cannot coexist
with keyless VTIs. This is because the existing keyless VTI
behaviour (which this series does not change) is to always accept
packets matching an input policy, regardless of whether there is
any matching XFRM state. Thus, the keyless VTI would accept the
traffic for the keyed tunnel and drop it because it would not
match the keyed tunnel's state.

Changes from RFC series:
- Processing of ICMP errors now works when ikey != okey.
- Series now contains changes to the common tunnel lookup
  functions to match tunnels by okey and to ignore ikey when
  matching.
- Fixed missing EXPORT_SYMBOL for xfrm_state_lookup_loose.
- Made vti6_lookup static as it should have been.

^ permalink raw reply
