Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH bpf-next 1/8] bpf: offload: don't require rtnl for dev list manipulation
From: Kirill Tkhai @ 2017-12-20 15:00 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, alexei.starovoitov, daniel; +Cc: oss-drivers
In-Reply-To: <20171220041006.25629-2-jakub.kicinski@netronome.com>

Hi, Jakub,

thanks for looking into this.

Sadly, that __bpf_prog_offload_destroy() needs rtnl_lock() context,
but rwsem is still good as it became useful for next patches from the series.

Please, see one small minor nit near the last hunk. Everything else looks good
for me.

On 20.12.2017 07:09, Jakub Kicinski wrote:
> We only need to hold rtnl_lock() around ndo calls.  The device
> offload initialization doesn't require it.  Neither will soon-
> -to-come querying the offload info.  Use struct rw_semaphore
> because map offload will require sleeping with the semaphore
> held for read.
> 
> Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>

> ---
>  kernel/bpf/offload.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> index 8455b89d1bbf..b88e5ebdc61d 100644
> --- a/kernel/bpf/offload.c
> +++ b/kernel/bpf/offload.c
> @@ -20,8 +20,12 @@
>  #include <linux/netdevice.h>
>  #include <linux/printk.h>
>  #include <linux/rtnetlink.h>
> +#include <linux/rwsem.h>
>  
> -/* protected by RTNL */
> +/* Protects bpf_prog_offload_devs and offload members of all progs.
> + * RTNL lock cannot be taken when holding this lock.
> + */
> +static struct rw_semaphore bpf_devs_lock;
>  static LIST_HEAD(bpf_prog_offload_devs);
>  
>  int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> @@ -43,17 +47,21 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
>  	offload->prog = prog;
>  	init_waitqueue_head(&offload->verifier_done);
>  
> -	rtnl_lock();
> +	/* Our UNREGISTER notifier will grab bpf_devs_lock, so we are safe
> +	 * to assume the netdev doesn't get unregistered as long as we hold
> +	 * bpf_devs_lock.
> +	 */
> +	down_write(&bpf_devs_lock);
>  	offload->netdev = __dev_get_by_index(net, attr->prog_ifindex);
>  	if (!offload->netdev) {
> -		rtnl_unlock();
> +		up_write(&bpf_devs_lock);
>  		kfree(offload);
>  		return -EINVAL;
>  	}
>  
>  	prog->aux->offload = offload;
>  	list_add_tail(&offload->offloads, &bpf_prog_offload_devs);
> -	rtnl_unlock();
> +	up_write(&bpf_devs_lock);
>  
>  	return 0;
>  }
> @@ -126,7 +134,9 @@ void bpf_prog_offload_destroy(struct bpf_prog *prog)
>  	wake_up(&offload->verifier_done);
>  
>  	rtnl_lock();
> +	down_write(&bpf_devs_lock);
>  	__bpf_prog_offload_destroy(prog);
> +	up_write(&bpf_devs_lock);
>  	rtnl_unlock();
>  
>  	kfree(offload);
> @@ -181,11 +191,13 @@ static int bpf_offload_notification(struct notifier_block *notifier,
>  		if (netdev->reg_state != NETREG_UNREGISTERING)
>  			break;
>  
> +		down_write(&bpf_devs_lock);
>  		list_for_each_entry_safe(offload, tmp, &bpf_prog_offload_devs,
>  					 offloads) {
>  			if (offload->netdev == netdev)
>  				__bpf_prog_offload_destroy(offload->prog);
>  		}
> +		up_write(&bpf_devs_lock);
>  		break;
>  	default:
>  		break;
> @@ -199,6 +211,7 @@ static struct notifier_block bpf_offload_notifier = {
>  
>  static int __init bpf_offload_init(void)
>  {
> +	init_rwsem(&bpf_devs_lock);

DECLARE_RWSEM() could be used instead of this.

>  	register_netdevice_notifier(&bpf_offload_notifier);
>  	return 0;
>  }
> 

^ permalink raw reply

* Re: [patch iproute2] tc: add -bs option for batch mode
From: Stephen Hemminger @ 2017-12-20 15:17 UTC (permalink / raw)
  To: Chris Mi; +Cc: netdev@vger.kernel.org, gerlitz.or@gmail.com
In-Reply-To: <VI1PR0501MB21433CE21CEF8212A8BBE21AAB0C0@VI1PR0501MB2143.eurprd05.prod.outlook.com>

On Wed, 20 Dec 2017 09:23:34 +0000
Chris Mi <chrism@mellanox.com> wrote:

> > Your real performance win is just not asking for ACK for every rule.  
> No. Even if batch_size > 1, we ack every rule. The real performance win is
> to send multiple rules in one system call. If we are not asking for ACK for every rule,
> the performance will be improved further.

Try the no ACK method.

When we were optimizing routing daemons like Quagga, it was discovered
that an ACK for every route insert was the main bottleneck. Doing asynchronous
error handling got a bigger win than your batching.

Please try that, doing multiple messages using iov is not necessary.

^ permalink raw reply

* Re: [PATCH net v2] ipv4: Fix use-after-free when flushing FIB tables
From: Alexander Duyck @ 2017-12-20 15:32 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Netdev, David Miller, Duyck, Alexander H, David Ahern,
	Fengguang Wu, mlxsw
In-Reply-To: <20171220085156.27991-1-idosch@mellanox.com>

On Wed, Dec 20, 2017 at 12:51 AM, Ido Schimmel <idosch@mellanox.com> wrote:
> Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the
> local table uses the same trie allocated for the main table when custom
> rules are not in use.
>
> When a net namespace is dismantled, the main table is flushed and freed
> (via an RCU callback) before the local table. In case the callback is
> invoked before the local table is iterated, a use-after-free can occur.
>
> Fix this by iterating over the FIB tables in reverse order, so that the
> main table is always freed after the local table.
>
> v2: Add a comment to make the fix more explicit per Dave's and Alex's
> feedback.
>
> Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
> ---
>  net/ipv4/fib_frontend.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index f52d27a422c3..08ce7f14ede1 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -1298,14 +1298,17 @@ static int __net_init ip_fib_net_init(struct net *net)
>
>  static void ip_fib_net_exit(struct net *net)
>  {
> -       unsigned int i;
> +       int i;
>
>         rtnl_lock();
>  #ifdef CONFIG_IP_MULTIPLE_TABLES
>         RCU_INIT_POINTER(net->ipv4.fib_main, NULL);
>         RCU_INIT_POINTER(net->ipv4.fib_default, NULL);
>  #endif
> -       for (i = 0; i < FIB_TABLE_HASHSZ; i++) {
> +       /* The local table must be destroyed before the main table,
> +        * as it might be using main's trie.
> +        */

I think we might want even more description here. Specifically why
reversing the order allows local to be destroyed before main. I was
thinking something along the lines of:

Destroy the tables in reverse order to guarantee that the local table,
ID 255, is destroyed before main table, ID 254. This is necessary as
local may contain references to data contained in main.

> +       for (i = FIB_TABLE_HASHSZ - 1; i >= 0; i--) {
>                 struct hlist_head *head = &net->ipv4.fib_table_hash[i];
>                 struct hlist_node *tmp;
>                 struct fib_table *tb;
> --
> 2.14.3
>

^ permalink raw reply

* [PATCH net v2] openvswitch: Fix pop_vlan action for double tagged frames
From: Eric Garver @ 2017-12-20 15:39 UTC (permalink / raw)
  To: netdev; +Cc: ovs-dev, Jiri Benc

skb_vlan_pop() expects skb->protocol to be a valid TPID for double
tagged frames, but skb->protocol is set to the ethertype by
key_extract(). So temporarily set it to the TPID when doing a pop_vlan.

Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
Signed-off-by: Eric Garver <e@erig.me>
---
 net/openvswitch/actions.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 30a5df27116e..c484e0941047 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -280,6 +280,13 @@ static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
 static int pop_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 {
 	int err;
+	__be16 proto = skb->protocol;
+
+	/* skb->protocol is set to the inner most parsed ethertype. To satisfy
+	 * skb_vlan_pop() for multi-tagged frames we must set it to the tpid.
+	 */
+	if (is_flow_key_valid(key) && key->eth.vlan.tci && key->eth.cvlan.tci)
+		skb->protocol = key->eth.cvlan.tpid;
 
 	err = skb_vlan_pop(skb);
 	if (skb_vlan_tag_present(skb)) {
@@ -288,6 +295,9 @@ static int pop_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 		key->eth.vlan.tci = 0;
 		key->eth.vlan.tpid = 0;
 	}
+
+	skb->protocol = proto;
+
 	return err;
 }
 
-- 
2.12.0

^ permalink raw reply related

* Re: [PATCH v1 2/4] lib/net_utils: Introduce mac_pton_from_user()
From: David Miller @ 2017-12-20 15:51 UTC (permalink / raw)
  To: gregkh; +Cc: andriy.shevchenko, netdev, Larry.Finger, florian.c.schilhabel,
	devel
In-Reply-To: <20171220071355.GB1957@kroah.com>

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Wed, 20 Dec 2017 08:13:55 +0100

> On Tue, Dec 19, 2017 at 09:14:10PM +0200, Andy Shevchenko wrote:
>> Some drivers are getting MAC from user space. Make a helper for them.
>> 
>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>> ---
>>  include/linux/kernel.h |  1 +
>>  lib/net_utils.c        | 12 ++++++++++++
>>  2 files changed, 13 insertions(+)
> 
> Don't do this just for some horrid staging drivers.  They can just drop
> that functionality entirely and use the "normal" way of doing this if
> they really want it.

Agreed.

^ permalink raw reply

* Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working
From: Pavel Machek @ 2017-12-20 15:54 UTC (permalink / raw)
  To: Fujinaka, Todd
  Cc: Neftin, Sasha, Keller, Jacob E, bpoirier@suse.com,
	nix.or.die@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	lsorense@csclub.uwaterloo.ca, David Miller
In-Reply-To: <9B4A1B1917080E46B64F07F2989DADD69845BBE2@ORSMSX110.amr.corp.intel.com>

[-- Attachment #1: Type: text/plain, Size: 619 bytes --]

Hi!

> >> Before ask for reverting 19110cfbb..., please, check if follow patch 
> >> of Benjamin work for you http://patchwork.ozlabs.org/patch/846825/

> >
> Pavel, before ask for revert - let's check Benjamin's patch following to his previous patch. Previous patch was not competed and latest one come to complete changes.
>

v4.15-rc4+:

Ethernet works with 19110cfbb reverted.

Ethernet works With patchwork.ozlabs.org/patch/846825/ applied.


									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
From: Andreas Hartmann @ 2017-12-20 15:56 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development
In-Reply-To: <6f75bdf5-839b-8c84-c8be-e83d071b245e@maya.org>

On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:
> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
[...]
>> I have been able to reproduce the hang by sending a UFO packet
>> between two guests running v4.13 on a host running v4.15-rc1.
>>
>> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
>> vhost_zerocopy_callback being called for each segment of a
>> segmented UFO skb. This refcount is decremented then on each
>> segment, but incremented only once for the entire UFO skb.
>>
>> Before v4.14, these packets would be converted in skb_segment to
>> regular copy packets with skb_orphan_frags and the callback function
>> called once at this point. v4.14 added support for reference counted
>> zerocopy skb that can pass through skb_orphan_frags unmodified and
>> have their zerocopy state safely cloned with skb_zerocopy_clone.
>>
>> The call to skb_zerocopy_clone must come after skb_orphan_frags
>> to limit cloning of this state to those skbs that can do so safely.
>>
>> Please try a host with the following patch. This fixes it for me. I intend to
>> send it to net.
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index a592ca025fc4..d2d985418819 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>                                               SKBTX_SHARED_FRAG;
>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>> -                       goto err;
>>
>>                 while (pos < offset + len) {
>>                         if (i >= nfrags) {
>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>                                 goto err;
>> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>> +                               goto err;
>>
>>                         *nskb_frag = *frag;
>>                         __skb_frag_ref(nskb_frag);
>>
>>
>> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
>> in the frags[] array. I will follow-up with a patch to net-next that only
>> checks once per skb:
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 466581cf4cdc..a293a33604ec 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>                                               SKBTX_SHARED_FRAG;
>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>                         goto err;
>>
>>                 while (pos < offset + len) {
>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                                 BUG_ON(!nfrags);
>>
>> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> +                                   skb_zerocopy_clone(nskb, frag_skb,
>> +                                                      GFP_ATOMIC))
>> +                                       goto err;
>> +
>>                                 list_skb = list_skb->next;
>>                         }
>>
>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>                                 goto err;
>>                         }
>>
>> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>> -                               goto err;
>> -
> 
> I'm currently testing this one.
> 

Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
accept UFO datagrams from tuntap and packet".

At first, I tested an unpatched 4.14.7 - the problem (no more killable
qemu-process) did occur promptly on shutdown of the machine. This was
expected.

Next, I applied the above patch (the second one). Until now, I didn't
face any problem any more on shutdown of VMs. Looks promising.


Thanks,
regards,
Andreas

^ permalink raw reply

* Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working
From: Pavel Machek @ 2017-12-20 16:01 UTC (permalink / raw)
  To: Fujinaka, Todd
  Cc: Neftin, Sasha, Keller, Jacob E, bpoirier@suse.com,
	nix.or.die@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	lsorense@csclub.uwaterloo.ca, David Miller
In-Reply-To: <20171220155421.GA3849@amd>

[-- Attachment #1: Type: text/plain, Size: 962 bytes --]

On Wed 2017-12-20 16:54:21, Pavel Machek wrote:
> Hi!
> 
> > >> Before ask for reverting 19110cfbb..., please, check if follow patch 
> > >> of Benjamin work for you http://patchwork.ozlabs.org/patch/846825/
> 
> > >
> > Pavel, before ask for revert - let's check Benjamin's patch following to his previous patch. Previous patch was not competed and latest one come to complete changes.
> >
> 
> v4.15-rc4+:
> 
> Ethernet works with 19110cfbb reverted.
> 
> Ethernet works With patchwork.ozlabs.org/patch/846825/ applied.

Hmm. So... ethernet originally did not work with patch/846825/ applied
or 19110cfbb reverted, so I re-plugged ethernet cables. Now it works
even with plain v4.15-rc4+.

So it looks like the bug was fixed in the mainline in the meantime...?

Sorry for the noise,
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* [PATCH v2] net: ibm: emac: support RGMII-[RX|TX]ID phymode
From: Christian Lamparter @ 2017-12-20 16:02 UTC (permalink / raw)
  To: netdev; +Cc: David S . Miller, Andrew Lunn, Christophe Jaillet

The RGMII spec allows compliance for devices that implement an internal
delay on TXC and/or RXC inside the transmitter. This patch adds the
necessary RGMII_[RX|TX]ID mode code to handle such PHYs with the
emac driver.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>

---
v2: - utilize phy_interface_mode_is_rgmii()
---
 drivers/net/ethernet/ibm/emac/core.c  |  4 ++--
 drivers/net/ethernet/ibm/emac/emac.h  |  3 +++
 drivers/net/ethernet/ibm/emac/rgmii.c | 10 ++++++++--
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 7feff2450ed6..043e72e28bba 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -199,8 +199,8 @@ static void __emac_set_multicast_list(struct emac_instance *dev);
 
 static inline int emac_phy_supports_gige(int phy_mode)
 {
-	return  phy_mode == PHY_MODE_GMII ||
-		phy_mode == PHY_MODE_RGMII ||
+	return  phy_interface_mode_is_rgmii(phy_mode) ||
+		phy_mode == PHY_MODE_GMII ||
 		phy_mode == PHY_MODE_SGMII ||
 		phy_mode == PHY_MODE_TBI ||
 		phy_mode == PHY_MODE_RTBI;
diff --git a/drivers/net/ethernet/ibm/emac/emac.h b/drivers/net/ethernet/ibm/emac/emac.h
index 5afcc27ceebb..8c6d2af7281b 100644
--- a/drivers/net/ethernet/ibm/emac/emac.h
+++ b/drivers/net/ethernet/ibm/emac/emac.h
@@ -112,6 +112,9 @@ struct emac_regs {
 #define PHY_MODE_RMII	PHY_INTERFACE_MODE_RMII
 #define PHY_MODE_SMII	PHY_INTERFACE_MODE_SMII
 #define PHY_MODE_RGMII	PHY_INTERFACE_MODE_RGMII
+#define PHY_MODE_RGMII_ID	PHY_INTERFACE_MODE_RGMII_ID
+#define PHY_MODE_RGMII_RXID	PHY_INTERFACE_MODE_RGMII_RXID
+#define PHY_MODE_RGMII_TXID	PHY_INTERFACE_MODE_RGMII_TXID
 #define PHY_MODE_TBI	PHY_INTERFACE_MODE_TBI
 #define PHY_MODE_GMII	PHY_INTERFACE_MODE_GMII
 #define PHY_MODE_RTBI	PHY_INTERFACE_MODE_RTBI
diff --git a/drivers/net/ethernet/ibm/emac/rgmii.c b/drivers/net/ethernet/ibm/emac/rgmii.c
index c4a1ac38bba8..124b0473d2b7 100644
--- a/drivers/net/ethernet/ibm/emac/rgmii.c
+++ b/drivers/net/ethernet/ibm/emac/rgmii.c
@@ -52,9 +52,9 @@
 /* RGMII bridge supports only GMII/TBI and RGMII/RTBI PHYs */
 static inline int rgmii_valid_mode(int phy_mode)
 {
-	return  phy_mode == PHY_MODE_GMII ||
+	return  phy_interface_mode_is_rgmii(phy_mode) ||
+		phy_mode == PHY_MODE_GMII ||
 		phy_mode == PHY_MODE_MII ||
-		phy_mode == PHY_MODE_RGMII ||
 		phy_mode == PHY_MODE_TBI ||
 		phy_mode == PHY_MODE_RTBI;
 }
@@ -63,6 +63,9 @@ static inline const char *rgmii_mode_name(int mode)
 {
 	switch (mode) {
 	case PHY_MODE_RGMII:
+	case PHY_MODE_RGMII_ID:
+	case PHY_MODE_RGMII_RXID:
+	case PHY_MODE_RGMII_TXID:
 		return "RGMII";
 	case PHY_MODE_TBI:
 		return "TBI";
@@ -81,6 +84,9 @@ static inline u32 rgmii_mode_mask(int mode, int input)
 {
 	switch (mode) {
 	case PHY_MODE_RGMII:
+	case PHY_MODE_RGMII_ID:
+	case PHY_MODE_RGMII_RXID:
+	case PHY_MODE_RGMII_TXID:
 		return RGMII_FER_RGMII(input);
 	case PHY_MODE_TBI:
 		return RGMII_FER_TBI(input);
-- 
2.15.1

^ permalink raw reply related

* Re: [PATCH v3 ipsec-next 3/3] xfrm: wrap xfrmdev_ops with offload config
From: Marcelo Ricardo Leitner @ 2017-12-20 16:03 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: steffen.klassert, netdev
In-Reply-To: <1513726549-7065-4-git-send-email-shannon.nelson@oracle.com>

On Tue, Dec 19, 2017 at 03:35:49PM -0800, Shannon Nelson wrote:
> There's no reason to define netdev->xfrmdev_ops if
> the offload facility is not CONFIG'd in.
> 
> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>

This one could use a Fixes tag perhaps:
Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")

as in theory the build was broken since then, as it added:
+#ifdef CONFIG_XFRM_OFFLOAD
+struct xfrmdev_ops {
...
+#ifdef CONFIG_XFRM
+       const struct xfrmdev_ops *xfrmdev_ops;

So the pointer would have an undefined type
  if CONFIG_XFRM && !CONFIG_XFRM_OFFLOAD
Though I couldn't reproduce this, not sure why.

But.. is it buildable with this patch? I mine failed:

obj-$(CONFIG_XFRM) := xfrm_policy.o xfrm_state.o xfrm_hash.o \
                      xfrm_input.o xfrm_output.o \
                      xfrm_sysctl.o xfrm_replay.o xfrm_device.o

so xfrm_device is always in if CONFIG_XFRM is there,
xfrm_dev_init(), via xfrm_dev_notifier -> xfrm_dev_event() ->
  xfrm_dev_register() and then:

static int xfrm_dev_register(struct net_device *dev)
{
        if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
                                                 ^^^^^^^^^^^^^^^^

We can't control CONFIG_XFRM_OFFLOAD directly, so unless you
unselected other offloadings such as INET_ESP_OFFLOAD, it is still on.

linux/net/xfrm/xfrm_device.c: In function ‘xfrm_dev_register’:
linux/net/xfrm/xfrm_device.c:147:48: error: ‘struct net_device’ has no member named ‘xfrmdev_ops’; did you mean ‘netdev_ops’?
  if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
                                                ^~~~~~~~~~~
                                                netdev_ops


> ---
>  include/linux/netdevice.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 2eaac7d..145d0de 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1697,7 +1697,7 @@ struct net_device {
>  	const struct ndisc_ops *ndisc_ops;
>  #endif
>  
> -#ifdef CONFIG_XFRM
> +#ifdef CONFIG_XFRM_OFFLOAD
>  	const struct xfrmdev_ops *xfrmdev_ops;
>  #endif
>  
> -- 
> 2.7.4
> 

^ permalink raw reply

* Re: [PATCH net-next] dev: Correctly get length of alias string in dev_set_alias()
From: David Miller @ 2017-12-20 16:05 UTC (permalink / raw)
  To: serhe.popovych; +Cc: netdev
In-Reply-To: <1513633115-16940-1-git-send-email-serhe.popovych@gmail.com>

From: Serhey Popovych <serhe.popovych@gmail.com>
Date: Mon, 18 Dec 2017 23:38:35 +0200

> We supply number of bytes available in @alias via @len
> parameter to dev_set_alias() which is not the same
> as zero terminated string length that can be shorter.
> 
> Both dev_set_alias() users (rtnetlink and sysfs) can
> submit number of bytes up to IFALIASZ with actual string
> length slightly shorter by putting '\0' not at @len - 1.
> 
> Use strnlen() to get length of zero terminated string
> and not access beyond @len. Correct comment about @len
> and explain how to unset alias (i.e. use zero for @len).
> 
> Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>

I don't really see this as useful, really.

In the sysfs case, we are not presented with a NULL terminated string.
Instead, the net sysfs code gives us a length that goes up until the
trailing newline character.  The sysfs case is never larger than the
actual string size + 1.

The netlink attribute is usually sized appropriately for whatever the
string length actually is.

This therefore just seems to add an new strnlen() unnecessarily to
this code path, which rarely does anything helpful.

Thanks.

^ permalink raw reply

* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure
From: David Miller @ 2017-12-20 16:12 UTC (permalink / raw)
  To: borisp; +Cc: jiri, ilyal, netdev, davejwatson, tom, hannes, aviadye, liranl
In-Reply-To: <HE1PR0501MB223554DC2840A95B594ED0D1B00C0@HE1PR0501MB2235.eurprd05.prod.outlook.com>

From: Boris Pismenny <borisp@mellanox.com>
Date: Wed, 20 Dec 2017 08:28:03 +0000

> Dave, would you prefer to get the driver patches that use this infra
> before the infra?

The arguments you present are silly.

In order to analyze any proposed API, the users of it must be presented
for the reviewers to see as well.

Logically, you must have tried to make use of the APIs to see how well
they work and are usable for at least one such user, right?

Therefore, the use case exists, and you must present it alongside the
API proposal.

Whether you provide the API addition patches and the user in the same
patch series, or a separate one, doesn't really matter.  What is
important is that this is accessible to the reviewer at the same
time.

^ permalink raw reply

* Re: sparc64 verifier failures..
From: David Miller @ 2017-12-20 16:17 UTC (permalink / raw)
  To: daniel; +Cc: netdev, alexei.starovoitov
In-Reply-To: <58565df5-2c45-6ab3-fb48-de011da70e6a@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 20 Dec 2017 14:05:36 +0100

> On 12/19/2017 09:36 PM, David Miller wrote:
>> 
>> I'm getting about 100 verifier failures on sparc64.
>> 
>> The vast majority of them seem to be due to misaligned packet
>> accesses.  Here is a sample of some of the failures.
> 
> Thanks, I'll check it next days. It would probably make sense to have
> a --strict-align mode for test_verifier that enables it on all tests
> unconditionally so selftests suite would always run in both modes (at
> least on archs that don't have this restriction).

Indeed, and for some reason I thought we were already doing this.

^ permalink raw reply

* RE: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure
From: Ilya Lesokhin @ 2017-12-20 16:23 UTC (permalink / raw)
  To: David Miller, Boris Pismenny
  Cc: jiri@resnulli.us, netdev@vger.kernel.org, davejwatson@fb.com,
	tom@herbertland.com, hannes@stressinduktion.org, Aviad Yehezkel,
	Liran Liss
In-Reply-To: <20171220.111208.1328340432834146497.davem@davemloft.net>

> 
> > Dave, would you prefer to get the driver patches that use this infra
> > before the infra?
> 
> The arguments you present are silly.
> 
> In order to analyze any proposed API, the users of it must be presented for the
> reviewers to see as well.
> 
> Logically, you must have tried to make use of the APIs to see how well they
> work and are usable for at least one such user, right?
Right, we agree.
> 
> Therefore, the use case exists, and you must present it alongside the API
> proposal.
> 
> Whether you provide the API addition patches and the user in the same patch
> series, or a separate one, doesn't really matter.  What is important is that this
> is accessible to the reviewer at the same time.

Note that we did provide a user in an accessible place.
https://github.com/Mellanox/tls-offload/tree/tls_device_v3
The link was at the bottom of the cover letter.

We just feel that the code there is not yet ready for upstream submission, and it might have
conflicts with other stuff submitted by Mellanox.

Would it be better if we submitted the mlx5e TLS support as an RFC alongside the TLS
Infrastructure patches?

^ permalink raw reply

* Re: [PATCH v3 ipsec-next 3/3] xfrm: wrap xfrmdev_ops with offload config
From: Shannon Nelson @ 2017-12-20 16:22 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: steffen.klassert, netdev
In-Reply-To: <20171220160240.GK6122@localhost.localdomain>

On 12/20/2017 8:03 AM, Marcelo Ricardo Leitner wrote:
> On Tue, Dec 19, 2017 at 03:35:49PM -0800, Shannon Nelson wrote:
>> There's no reason to define netdev->xfrmdev_ops if
>> the offload facility is not CONFIG'd in.
>>
>> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
> 
> This one could use a Fixes tag perhaps:
> Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
> 
> as in theory the build was broken since then, as it added:
> +#ifdef CONFIG_XFRM_OFFLOAD
> +struct xfrmdev_ops {
> ...
> +#ifdef CONFIG_XFRM
> +       const struct xfrmdev_ops *xfrmdev_ops;
> 
> So the pointer would have an undefined type
>    if CONFIG_XFRM && !CONFIG_XFRM_OFFLOAD
> Though I couldn't reproduce this, not sure why.

Hmmm, I don't think this requires a "Fixes" tag, as the code all worked 
just fine, I'm just doing a little cleaning.

Patch 2/3 adds a more intense look at the data structure, so I needed to 
change it to the CONFIG_XFRM_OFFLOAD so as to not break the build. 
Since the xfrmdev_ops field is now never used unless we have 
CONFIG_XFRM_OFFLOAD, we can change the net_device definition to be just 
a bit smaller without it.

> 
> But.. is it buildable with this patch? I mine failed:
> 
> obj-$(CONFIG_XFRM) := xfrm_policy.o xfrm_state.o xfrm_hash.o \
>                        xfrm_input.o xfrm_output.o \
>                        xfrm_sysctl.o xfrm_replay.o xfrm_device.o
> 
> so xfrm_device is always in if CONFIG_XFRM is there,
> xfrm_dev_init(), via xfrm_dev_notifier -> xfrm_dev_event() ->
>    xfrm_dev_register() and then:
> 
> static int xfrm_dev_register(struct net_device *dev)
> {
>          if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)

This looks like you haven't applied version 3 of the 2nd patch "xfrm: 
check for xdo_dev_ops add and delete".  I missed this in the earlier 
version (not enough compile tests), but version 3 of patch 2/3  should 
address it.

sln

>                                                   ^^^^^^^^^^^^^^^^
> 
> We can't control CONFIG_XFRM_OFFLOAD directly, so unless you
> unselected other offloadings such as INET_ESP_OFFLOAD, it is still on.
> 
> linux/net/xfrm/xfrm_device.c: In function ‘xfrm_dev_register’:
> linux/net/xfrm/xfrm_device.c:147:48: error: ‘struct net_device’ has no member named ‘xfrmdev_ops’; did you mean ‘netdev_ops’?
>    if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
>                                                  ^~~~~~~~~~~
>                                                  netdev_ops
> 
> 
>> ---
>>   include/linux/netdevice.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 2eaac7d..145d0de 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1697,7 +1697,7 @@ struct net_device {
>>   	const struct ndisc_ops *ndisc_ops;
>>   #endif
>>   
>> -#ifdef CONFIG_XFRM
>> +#ifdef CONFIG_XFRM_OFFLOAD
>>   	const struct xfrmdev_ops *xfrmdev_ops;
>>   #endif
>>   
>> -- 
>> 2.7.4
>>

^ permalink raw reply

* Re: [QUESTION] Doubt about NAPI_GRO_CB(skb)->is_atomic in tcpv4 gro process
From: Alexander Duyck @ 2017-12-20 16:24 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: netdev@vger.kernel.org, davem@davemloft.net, linuxarm@huawei.com,
	yuxiaowu, wzhen.wang, Xuehuahu
In-Reply-To: <ca367d0d-b0d7-3357-7196-c0da17ef9890@huawei.com>

On Wed, Dec 20, 2017 at 1:09 AM, Yunsheng Lin <linyunsheng@huawei.com> wrote:
> Hi, all
>         I have some doubt about NAPI_GRO_CB(skb)->is_atomic when
> analyzing the tcpv4 gro process:
>
> Firstly we set NAPI_GRO_CB(skb)->is_atomic to 1 in dev_gro_receive:
> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/core/dev.c#L4838
>
> And then in inet_gro_receive, we check the NAPI_GRO_CB(skb)->is_atomic
> before setting NAPI_GRO_CB(skb)->is_atomic according to IP_DF bit in the ip header:
> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/ipv4/af_inet.c#L1319
>
> struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb)
> {
> .....................
>         for (p = *head; p; p = p->next) {
> ........................
>
>                 /* If the previous IP ID value was based on an atomic
>                  * datagram we can overwrite the value and ignore it.
>                  */
>                 if (NAPI_GRO_CB(skb)->is_atomic)                      //we check it here
>                         NAPI_GRO_CB(p)->flush_id = flush_id;
>                 else
>                         NAPI_GRO_CB(p)->flush_id |= flush_id;
>         }
>
>         NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));  //we set it here
>         NAPI_GRO_CB(skb)->flush |= flush;
>         skb_set_network_header(skb, off);
> ................................
> }
>
> My question is whether we should check the NAPI_GRO_CB(skb)->is_atomic or NAPI_GRO_CB(p)->is_atomic?
> If we should check NAPI_GRO_CB(skb)->is_atomic, then maybe it is unnecessary because it is alway true.
> If we should check NAPI_GRO_CB(p)->is_atomic, maybe there is a bug here.
>
> So what is the logic here? I am just start analyzing the gro, maybe I miss something obvious here.

The logic there is to address the multiple IP header case where there
are 2 or more IP headers due to things like VXLAN or GRE tunnels. So
what will happen is that an outer IP header will end up being sent
with DF not set and will clear the is_atomic value then we want to OR
in the next header that is applied. It defaults to assignment on
is_atomic because the first IP header will encounter flush_id with no
previous configuration occupying it.

The part I am not sure about is if we should be using assignment for
is_atomic or using an "&=" to clear the bit and leave it cleared. I
don't know if there has been much testing of multiple levels of tunnel
header.

Thanks.

- Alex

^ permalink raw reply

* [PATCH net] ip6_gre: fix device features for ioctl setup
From: Alexey Kodanev @ 2017-12-20 16:36 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Duyck, David Miller, Alexey Kodanev

When ip6gre is created using ioctl, its features, such as
scatter-gather, GSO and tx-checksumming will be turned off:

  # ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1
  # ethtool -k gre6 (truncated output)
    tx-checksumming: off
    scatter-gather: off
    tcp-segmentation-offload: off
    generic-segmentation-offload: off [requested on]

But when netlink is used, they will be enabled:
  # ip link add gre6 type ip6gre remote fd00::1
  # ethtool -k gre6 (truncated output)
    tx-checksumming: on
    scatter-gather: on
    tcp-segmentation-offload: on
    generic-segmentation-offload: on

This results in a loss of performance when gre6 is created via ioctl.
The issue was found with LTP/gre tests.

Fix it by moving the setup of device features to a separate function
and invoke it with ndo_init callback because both netlink and ioctl
will eventually call it via register_netdevice():

   register_netdevice()
       - ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init()
           - ip6gre_tunnel_init_common()
                - ip6gre_tnl_init_features()

The moved code also contains two minor style fixes:
  * removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line.
  * fixed the issue reported by checkpatch: "Unnecessary parentheses around
    'nt->encap.type == TUNNEL_ENCAP_NONE'"

Fixes: ac4eb009e477 ("ip6gre: Add support for basic offloads offloads excluding GSO")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
 net/ipv6/ip6_gre.c |   57 +++++++++++++++++++++++++++++----------------------
 1 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 4cfd8e0..9a4b376 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1014,6 +1014,36 @@ static void ip6gre_tunnel_setup(struct net_device *dev)
 	eth_random_addr(dev->perm_addr);
 }
 
+#define GRE6_FEATURES (NETIF_F_SG |		\
+		       NETIF_F_FRAGLIST |	\
+		       NETIF_F_HIGHDMA |	\
+		       NETIF_F_HW_CSUM)
+
+static void ip6gre_tnl_init_features(struct net_device *dev)
+{
+	struct ip6_tnl *nt = netdev_priv(dev);
+
+	dev->features		|= GRE6_FEATURES;
+	dev->hw_features	|= GRE6_FEATURES;
+
+	if (!(nt->parms.o_flags & TUNNEL_SEQ)) {
+		/* TCP offload with GRE SEQ is not supported, nor
+		 * can we support 2 levels of outer headers requiring
+		 * an update.
+		 */
+		if (!(nt->parms.o_flags & TUNNEL_CSUM) ||
+		    nt->encap.type == TUNNEL_ENCAP_NONE) {
+			dev->features    |= NETIF_F_GSO_SOFTWARE;
+			dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+		}
+
+		/* Can use a lockless transmit, unless we generate
+		 * output sequences
+		 */
+		dev->features |= NETIF_F_LLTX;
+	}
+}
+
 static int ip6gre_tunnel_init_common(struct net_device *dev)
 {
 	struct ip6_tnl *tunnel;
@@ -1048,6 +1078,8 @@ static int ip6gre_tunnel_init_common(struct net_device *dev)
 	if (!(tunnel->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
 		dev->mtu -= 8;
 
+	ip6gre_tnl_init_features(dev);
+
 	return 0;
 }
 
@@ -1298,11 +1330,6 @@ static int ip6gre_tap_init(struct net_device *dev)
 	.ndo_get_iflink = ip6_tnl_get_iflink,
 };
 
-#define GRE6_FEATURES (NETIF_F_SG |		\
-		       NETIF_F_FRAGLIST |	\
-		       NETIF_F_HIGHDMA |		\
-		       NETIF_F_HW_CSUM)
-
 static void ip6gre_tap_setup(struct net_device *dev)
 {
 
@@ -1382,26 +1409,6 @@ static int ip6gre_newlink(struct net *src_net, struct net_device *dev,
 	nt->net = dev_net(dev);
 	ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
 
-	dev->features		|= GRE6_FEATURES;
-	dev->hw_features	|= GRE6_FEATURES;
-
-	if (!(nt->parms.o_flags & TUNNEL_SEQ)) {
-		/* TCP offload with GRE SEQ is not supported, nor
-		 * can we support 2 levels of outer headers requiring
-		 * an update.
-		 */
-		if (!(nt->parms.o_flags & TUNNEL_CSUM) ||
-		    (nt->encap.type == TUNNEL_ENCAP_NONE)) {
-			dev->features    |= NETIF_F_GSO_SOFTWARE;
-			dev->hw_features |= NETIF_F_GSO_SOFTWARE;
-		}
-
-		/* Can use a lockless transmit, unless we generate
-		 * output sequences
-		 */
-		dev->features |= NETIF_F_LLTX;
-	}
-
 	err = register_netdevice(dev);
 	if (err)
 		goto out;
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCHv3 net-next 00/14] net: sched: sch: introduce extack support
From: David Miller @ 2017-12-20 16:32 UTC (permalink / raw)
  To: aring; +Cc: jhs, xiyou.wangcong, jiri, netdev, kernel, dsahern
In-Reply-To: <20171218224513.29836-1-aring@mojatatu.com>

From: Alexander Aring <aring@mojatatu.com>
Date: Mon, 18 Dec 2017 17:44:59 -0500

> this patch series basically add support for extack in common qdisc handling.
> Additional it adds extack pointer to common qdisc callback handling this
> offers per qdisc implementation to setting the extack message for each
> failure over netlink.

This patch series doesn't apply cleanly to net-next.

^ permalink raw reply

* Re: [PATCHv3 net-next 00/14] net: sched: sch: introduce extack support
From: Alexander Aring @ 2017-12-20 16:35 UTC (permalink / raw)
  To: David Miller
  Cc: Jamal Hadi Salim, Cong Wang, Jiří Pírko, netdev,
	kernel, David Ahern
In-Reply-To: <20171220.113249.514392331552783606.davem@davemloft.net>

Hi,

On Wed, Dec 20, 2017 at 11:32 AM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Aring <aring@mojatatu.com>
> Date: Mon, 18 Dec 2017 17:44:59 -0500
>
>> this patch series basically add support for extack in common qdisc handling.
>> Additional it adds extack pointer to common qdisc callback handling this
>> offers per qdisc implementation to setting the extack message for each
>> failure over netlink.
>
> This patch series doesn't apply cleanly to net-next.

okay, I will rebase it and send v4.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCH v3 net-next 0/6] tls: Add generic NIC offload infrastructure
From: David Miller @ 2017-12-20 16:36 UTC (permalink / raw)
  To: ilyal; +Cc: borisp, jiri, netdev, davejwatson, tom, hannes, aviadye, liranl
In-Reply-To: <AM4PR0501MB27231C4EC47FB287D7FF1EC0D40C0@AM4PR0501MB2723.eurprd05.prod.outlook.com>

From: Ilya Lesokhin <ilyal@mellanox.com>
Date: Wed, 20 Dec 2017 16:23:03 +0000

>> Whether you provide the API addition patches and the user in the same patch
>> series, or a separate one, doesn't really matter.  What is important is that this
>> is accessible to the reviewer at the same time.
> 
> Note that we did provide a user in an accessible place.

That is not accessible for people reading netdev, it needs to be posted
on the netdev list.

It is never appropriate to require a reviewer to look at some external
site to review a patch series posted here.

^ permalink raw reply

* Re: [PATCH net-next v4] ip6_vti: adjust vti mtu according to mtu of lower device
From: David Miller @ 2017-12-20 16:53 UTC (permalink / raw)
  To: alexey.kodanev; +Cc: netdev, steffen.klassert, pvorel, shannon.nelson
In-Reply-To: <1513691961-19692-1-git-send-email-alexey.kodanev@oracle.com>

From: Alexey Kodanev <alexey.kodanev@oracle.com>
Date: Tue, 19 Dec 2017 16:59:21 +0300

> LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams over
> ip6_vti that require fragmentation and the underlying device has an
> MTU smaller than 1500 plus some extra space for headers. This happens
> because ip6_vti, by default, sets MTU to ETH_DATA_LEN and not updating
> it depending on a destination address or link parameter. Further
> attempts to send UDP packets may succeed because pmtu gets updated on
> ICMPV6_PKT_TOOBIG in vti6_err().
> 
> In case the lower device has larger MTU size, e.g. 9000, ip6_vti works
> but not using the possible maximum size, output packets have 1500 limit.
> 
> The above cases require manual MTU setup after ip6_vti creation. However
> ip_vti already updates MTU based on lower device with ip_tunnel_bind_dev().
> 
> Here is the example when the lower device MTU is set to 9000:
 ...
> Reported-by: Petr Vorel <pvorel@suse.cz>
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>

Applied, thanks Alexey.

^ permalink raw reply

* Re: [PATCH net-next] qed*: Utilize FW 8.33.1.0
From: David Miller @ 2017-12-20 16:54 UTC (permalink / raw)
  To: jakub.kicinski
  Cc: Tomer.Tayar, netdev, linux-rdma, linux-scsi, Ariel.Elior,
	Michal.Kalderon, Yuval.Bason, Ram.Amrani, Manish.Chopra,
	Chad.Dupuis, Manish.Rangankar
In-Reply-To: <20171219154651.30930d2a@cakuba.netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue, 19 Dec 2017 15:46:51 -0800

> On Tue, 19 Dec 2017 16:05:23 +0200, Tomer Tayar wrote:
>> Sorry for the very long patch.
>> The firmware changes are spread all over w/o a good modularity.
> 
> Rings false.  Significant portion of this patch is just whitespace 
> and comment changes.

Totally agreed.

This thing is beyond huge already, adding unrelated whitespace and
comment changes make reviewing it even more impossible.

^ permalink raw reply

* [PATCH ipsec-next 0/7]: Support multiple VTIs with the same src+dst pair
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem

When using IPsec tunnel mode, VTIs provide many benefits compared
to direct configuration of xfrm policies / states. However, one
limitation is that there can only be one VTI between a given pair
of IP addresses. This does not allow configuring multiple IPsec
tunnels to the same security gateway. This is required by some
deployments, for example I-WLAN [3GPP TS 24.327].

This patchset introduces a new VTI_KEYED flag that allows
configuration of multiple VTIs between the same IP address
pairs. The semantics are as follows:

- The output path is the same as current VTI behaviour, where a
  routing lookup selects a VTI interface, and the VTI's okey
  specifies the mark to use in the XFRM lookup.
- The input and ICMP error paths instead work by first looking up
  an SA with a loose match that ignores the mark. That mark is
  then used to find the tunnel by ikey (for input packets) or
  okey (for ICMP errors).

In order for ICMP errors to work, flags are added to the common
IP lookup functions to ignore the tunnel ikey and to look up
tunnels by okey instead of ikey.

On the same IP address pair, keyed VTIs can coexist with each
other (as long as the ikeys are different), but cannot coexist
with keyless VTIs. This is because the existing keyless VTI
behaviour (which this series does not change) is to always accept
packets matching an input policy, regardless of whether there is
any matching XFRM state. Thus, the keyless VTI would accept the
traffic for the keyed tunnel and drop it because it would not
match the keyed tunnel's state.

Changes from RFC series:
- Processing of ICMP errors now works when ikey != okey.
- Series now contains changes to the common tunnel lookup
  functions to match tunnels by okey and to ignore ikey when
  matching.
- Fixed missing EXPORT_SYMBOL for xfrm_state_lookup_loose.
- Made vti6_lookup static as it should have been.

^ permalink raw reply

* [PATCH ipsec-next 1/7] net: xfrm: Don't check for TUNNEL_KEY when hashing VTI tunnels.
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem, Lorenzo Colitti
In-Reply-To: <20171220170607.41516-1-lorenzo@google.com>

Currently, ip_bucket sets the lookup i_key to 0 if the tunnel's
i_flags have VTI_ISVTI flag set but not TUNNEL_KEY. However, it
can can never happen that TUNNEL_KEY is set if VTI_ISVTI is also
set (see below). Therefore, just drop the check for TUNNEL_KEY
and only set i_key to 0 on VTI_ISVTI.

This will allow the VTI code to set TUNNEL_KEY on certain
tunnels in a future change.

None of the callers of ip_bucket pass in TUNNEL_KEY | VTI_ISVTI.
The call graph is as follows:

- ip_tunnel_add
  - ip_tunnel_create
    - ip_tunnel_ioctl
      - ipgre_tunnel_ioctl: can set TUNNEL_KEY but not VTI_ISVTI
      - ipip_tunnel_ioctl: hardcodes i_flags to 0
      - vti_tunnel_ioctl: hardcodes i_flags to VTI_ISVTI
  - ip_tunnel_update: doesn't touch i_flags
  - ip_tunnel_init_net: memsets flags to 0
  - ip_tunnel_newlink
    - ipgre_newlink
      - ipgre_netlink_parms: can set TUNNEL_KEY but not VTI_ISVTI
    - vti_newlink: hardcodes i_flags to VTI_ISVTI
  - ip_tunnel_changelink: doesn't set flags
- ip_tunnel_find
  - ip_tunnel_ioctl (see above)
  - ip_tunnel_newlink (see above)
  - ip_tunnel_changelink (see above)

VTI_ISVTI has the same value as TUNNEL_DONT_FRAGMENT, but that
is never set into tunnel parameters.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 net/ipv4/ip_tunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 5ddb1cb52b..539c8f22c4 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -198,7 +198,7 @@ static struct hlist_head *ip_bucket(struct ip_tunnel_net *itn,
 	else
 		remote = 0;
 
-	if (!(parms->i_flags & TUNNEL_KEY) && (parms->i_flags & VTI_ISVTI))
+	if (parms->i_flags & VTI_ISVTI)
 		i_key = 0;
 
 	h = ip_tunnel_hash(i_key, remote);
-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply related

* [PATCH ipsec-next 2/7] net: ipv4: Add new flags to tunnel lookup.
From: Lorenzo Colitti @ 2017-12-20 17:06 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, davem, Lorenzo Colitti
In-Reply-To: <20171220170607.41516-1-lorenzo@google.com>

This patch adds support for two new flags to ip_tunnel_lookup.

- TUNNEL_LOOKUP_NO_KEY: ignores the tunnel's i_key when
  determining which hash bucket to look up the tunnel in. This is
  useful for tunnels such as VTI, which have i_keys, but are
  always hashed into bucket 0.
- TUNNEL_LOOKUP_OKEY: finds the tunnel by o_key instead of i_key.

Together, these flags allow processing ICMP errors correctly on
keyed tunnels where i_key != o_key. If such tunnels receive an
ICMP error, the only information available in the packet is the
o_key, so we must be able to find a tunnel by o_key alone. For
that to work, the tunnel hash must not depend on the i_key,
because if it does, we won't be able to find it by o_key alone.

TUNNEL_LOOKUP_NO_KEY is very similar to TUNNEL_NO_KEY so it might
be possible just to use TUNNEL_NO_KEY instead. However, it might
be confusing to see code simultaneously pass in both TUNNEL_NO_KEY
and TUNNEL_KEY.

These flags are numbered separately from tunnel flags because
they are not tunnel properties but properties of tunnel lookups.
(Also, the tunnel flags are 16 bits and all but one is unused.)

The flags are passed into ip_lookup by adding a new parameter.
This could also be done by expanding the existing flags parameter
from __be16 to __be32 and ensuring that the new flags are all
above the 16-bit boundary.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/net/ip_tunnels.h |  6 +++++-
 net/ipv4/ip_gre.c        |  6 +++---
 net/ipv4/ip_tunnel.c     | 22 +++++++++++++---------
 net/ipv4/ip_vti.c        |  4 ++--
 net/ipv4/ipip.c          |  6 +++---
 5 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 1f16773cfd..19d97b993a 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -163,6 +163,10 @@ struct ip_tunnel {
 #define TUNNEL_OPTIONS_PRESENT \
 		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
 
+/* Flags for ip_tunnel_lookup. */
+#define TUNNEL_LOOKUP_NO_KEY	0x01
+#define TUNNEL_LOOKUP_OKEY	0x02
+
 struct tnl_ptk_info {
 	__be16 flags;
 	__be16 proto;
@@ -276,7 +280,7 @@ int ip_tunnel_change_mtu(struct net_device *dev, int new_mtu);
 void ip_tunnel_get_stats64(struct net_device *dev,
 			   struct rtnl_link_stats64 *tot);
 struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
-				   int link, __be16 flags,
+				   int link, __be16 flags, u8 lookup_flags,
 				   __be32 remote, __be32 local,
 				   __be32 key);
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index fd4d6e96da..f16a46cb19 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -182,7 +182,7 @@ static void ipgre_err(struct sk_buff *skb, u32 info,
 		itn = net_generic(net, ipgre_net_id);
 
 	iph = (const struct iphdr *)(icmp_hdr(skb) + 1);
-	t = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags,
+	t = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags, 0,
 			     iph->daddr, iph->saddr, tpi->key);
 
 	if (!t)
@@ -280,7 +280,7 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 	 */
 	tpi->key = cpu_to_be32(ntohs(ershdr->session_id) & ID_MASK);
 	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex,
-				  tpi->flags | TUNNEL_KEY,
+				  tpi->flags | TUNNEL_KEY, 0,
 				  iph->saddr, iph->daddr, tpi->key);
 
 	if (tunnel) {
@@ -356,7 +356,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
 	struct ip_tunnel *tunnel;
 
 	iph = ip_hdr(skb);
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags,
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, tpi->flags, 0,
 				  iph->saddr, iph->daddr, tpi->key);
 
 	if (tunnel) {
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 539c8f22c4..f45968bb81 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -70,11 +70,13 @@ static unsigned int ip_tunnel_hash(__be32 key, __be32 remote)
 }
 
 static bool ip_tunnel_key_match(const struct ip_tunnel_parm *p,
-				__be16 flags, __be32 key)
+				__be32 flags, u8 lookup_flags, __be32 key)
 {
+	__be32 tunnel_key = (lookup_flags & TUNNEL_LOOKUP_OKEY) ? p->o_key :
+								  p->i_key;
 	if (p->i_flags & TUNNEL_KEY) {
 		if (flags & TUNNEL_KEY)
-			return key == p->i_key;
+			return key == tunnel_key;
 		else
 			/* key expected, none present */
 			return false;
@@ -94,15 +96,17 @@ static bool ip_tunnel_key_match(const struct ip_tunnel_parm *p,
    Given src, dst and key, find appropriate for input tunnel.
 */
 struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
-				   int link, __be16 flags,
+				   int link, __be16 flags, u8 lookup_flags,
 				   __be32 remote, __be32 local,
 				   __be32 key)
 {
 	unsigned int hash;
 	struct ip_tunnel *t, *cand = NULL;
 	struct hlist_head *head;
+	__be32 hash_key;
 
-	hash = ip_tunnel_hash(key, remote);
+	hash_key = (lookup_flags & TUNNEL_LOOKUP_NO_KEY) ? 0 : key;
+	hash = ip_tunnel_hash(hash_key, remote);
 	head = &itn->tunnels[hash];
 
 	hlist_for_each_entry_rcu(t, head, hash_node) {
@@ -111,7 +115,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 		    !(t->dev->flags & IFF_UP))
 			continue;
 
-		if (!ip_tunnel_key_match(&t->parms, flags, key))
+		if (!ip_tunnel_key_match(&t->parms, flags, lookup_flags, key))
 			continue;
 
 		if (t->parms.link == link)
@@ -126,7 +130,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 		    !(t->dev->flags & IFF_UP))
 			continue;
 
-		if (!ip_tunnel_key_match(&t->parms, flags, key))
+		if (!ip_tunnel_key_match(&t->parms, flags, lookup_flags, key))
 			continue;
 
 		if (t->parms.link == link)
@@ -135,7 +139,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 			cand = t;
 	}
 
-	hash = ip_tunnel_hash(key, 0);
+	hash = ip_tunnel_hash(hash_key, 0);
 	head = &itn->tunnels[hash];
 
 	hlist_for_each_entry_rcu(t, head, hash_node) {
@@ -146,7 +150,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 		if (!(t->dev->flags & IFF_UP))
 			continue;
 
-		if (!ip_tunnel_key_match(&t->parms, flags, key))
+		if (!ip_tunnel_key_match(&t->parms, flags, lookup_flags, key))
 			continue;
 
 		if (t->parms.link == link)
@@ -238,7 +242,7 @@ static struct ip_tunnel *ip_tunnel_find(struct ip_tunnel_net *itn,
 		    remote == t->parms.iph.daddr &&
 		    link == t->parms.link &&
 		    type == t->dev->type &&
-		    ip_tunnel_key_match(&t->parms, flags, key))
+		    ip_tunnel_key_match(&t->parms, flags, 0, key))
 			break;
 	}
 	return t;
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 949f432a5f..804cee8126 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -57,7 +57,7 @@ static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
 	struct net *net = dev_net(skb->dev);
 	struct ip_tunnel_net *itn = net_generic(net, vti_net_id);
 
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, 0,
 				  iph->saddr, iph->daddr, 0);
 	if (tunnel) {
 		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
@@ -278,7 +278,7 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 	int protocol = iph->protocol;
 	struct ip_tunnel_net *itn = net_generic(net, vti_net_id);
 
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, 0,
 				  iph->daddr, iph->saddr, 0);
 	if (!tunnel)
 		return -1;
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index c891235b49..81f94ffb92 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -167,7 +167,7 @@ static int ipip_err(struct sk_buff *skb, u32 info)
 		goto out;
 	}
 
-	t = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
+	t = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, 0,
 			     iph->daddr, iph->saddr, 0);
 	if (!t) {
 		err = -ENOENT;
@@ -224,8 +224,8 @@ static int ipip_tunnel_rcv(struct sk_buff *skb, u8 ipproto)
 	const struct iphdr *iph;
 
 	iph = ip_hdr(skb);
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
-			iph->saddr, iph->daddr, 0);
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, 0,
+				  iph->saddr, iph->daddr, 0);
 	if (tunnel) {
 		const struct tnl_ptk_info *tpi;
 
-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox