Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] net: phy: marvell: avoid configuring fiber page for SGMII-to-Copper
From: David Miller @ 2017-12-13 21:13 UTC (permalink / raw)
  To: rmk+kernel; +Cc: andrew, f.fainelli, jon, netdev
In-Reply-To: <E1eP3Ep-0000ZC-UN@rmk-PC.armlinux.org.uk>

From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Wed, 13 Dec 2017 09:22:03 +0000

> When in SGMII-to-Copper mode, the fiber page is used for the MAC facing
> link, and does not require configuration of the fiber auto-negotiation
> settings.  Avoid trying.
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

Applied.

^ permalink raw reply

* Re: [PATCH] net: phy: marvell: avoid pause mode on SGMII-to-Copper for 88e151x
From: David Miller @ 2017-12-13 21:14 UTC (permalink / raw)
  To: rmk+kernel; +Cc: andrew, f.fainelli, jon, netdev
In-Reply-To: <E1eP3Ev-0000ZJ-1L@rmk-PC.armlinux.org.uk>

From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Wed, 13 Dec 2017 09:22:09 +0000

> @@ -924,6 +924,16 @@ static int m88e1510_config_init(struct phy_device *phydev)
>  		err = marvell_set_page(phydev, MII_MARVELL_COPPER_PAGE);
>  		if (err < 0)
>  			return err;
> +
> +		/* There appears to be a bug in the 88e1512 when used in
> +		 * SGMII to copper mode, where the AN advertisment register
> +		 * clears the pause bits each time a negotiation occurs.
> +		 * This means we can never be truely sure what was advertised,
> +		 * so disable Pause support.
> +		 */
> +		pause = SUPPORTED_Pause | SUPPORTED_Asym_Pause;
> +		phydev->supported &= ~pause;
> +		phydev->advertising &= ~pause;
>  	}
>  

This function doesn't have a local 'pause' variable in any of my trees.

I wonder what you generated this against, and even more importantly, what
tree the reviewers were considering when looking at this patch :)

^ permalink raw reply

* Re: [PATCH 1/2] hp100: Fix a possible sleep-in-atomic bug in hp100_login_to_vg_hub
From: David Miller @ 2017-12-13 21:20 UTC (permalink / raw)
  To: baijiaju1990; +Cc: perex, floeff, acme, netdev, linux-kernel
In-Reply-To: <1513158468-14382-1-git-send-email-baijiaju1990@gmail.com>


I want you to review all of your patches and resend them after you
have checked them carefully.

The first patch I even looked at, this one, is buggy.

You changed a schedule_timeout_interruptible(1) into a udelay(10)

That's not right.

schedule_timeout_interruptible() takes a "jiffies" argument, which
is a completely different unit than udelay() takes.  You would have
to scale the argument to udelay() in some way using HZ.

Furthermore, the udelay argument you would come up with would
be way too long to be appropirate in this atomic context.

That's why the code tries to use a sleeping timeout, a long wait is
necessary here.

^ permalink raw reply

* Re: [PATCH net-next 2/2] net/sched: act_csum: don't use spinlock in the fast path
From: David Miller @ 2017-12-13 21:23 UTC (permalink / raw)
  To: dcaratti; +Cc: xiyou.wangcong, jiri, netdev
In-Reply-To: <16125b3f16fb9f8c068ec80ce3f6550d0465521c.1513104506.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Wed, 13 Dec 2017 10:48:38 +0100

> Then, in the data path, use READ_ONCE() to
> read those values, to avoid lock contention among multiple readers.
 ...
> @@ -544,14 +543,12 @@ static int tcf_csum(struct sk_buff *skb, const struct tc_action *a,
>  
>  	tcf_lastuse_update(&p->tcf_tm);
>  	bstats_cpu_update(this_cpu_ptr(p->common.cpu_bstats), skb);
> -	spin_lock(&p->tcf_lock);
> -	action = p->tcf_action;
> -	update_flags = p->update_flags;
> -	spin_unlock(&p->tcf_lock);
>  
> +	action = READ_ONCE(p->tcf_action);
>  	if (unlikely(action == TC_ACT_SHOT))
>  		goto drop;
>  
> +	update_flags = READ_ONCE(p->update_flags);
>  	switch (tc_skb_protocol(skb)) {
>  	case cpu_to_be16(ETH_P_IP):
>  		if (!tcf_csum_ipv4(skb, update_flags))

That's not why the lock is here.

We must read both action and flags atomically so that they are consistent
with eachother.

We must never use action from one configuration change and flags from
yet another.

Find a way to load both of these values with a single cpu load, then you
can legally remove the lock.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: dsa: lan9303: Introduce lan9303_read_wait
From: David Miller @ 2017-12-13 21:25 UTC (permalink / raw)
  To: privat; +Cc: andrew, vivien.didelot, f.fainelli, netdev, linux-kernel
In-Reply-To: <20171213144250.27120-1-privat@egil-hjelmeland.no>

From: Egil Hjelmeland <privat@egil-hjelmeland.no>
Date: Wed, 13 Dec 2017 15:42:50 +0100

> Simplify lan9303_indirect_phy_wait_for_completion()
> and lan9303_switch_wait_for_completion() by using a new function
> lan9303_read_wait()
> 
> Changes v1 -> v2:
>  - param 'mask' type u32
>  - removed param 'value' (will probably never be used)
>  - add newline before return
> 
> Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH 2/5] VSOCK: extract connect/accept functions from vsock_diag_test.c
From: David Miller @ 2017-12-13 21:32 UTC (permalink / raw)
  To: stefanha; +Cc: netdev, jhansen, decui
In-Reply-To: <20171213144911.6428-3-stefanha@redhat.com>

From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Wed, 13 Dec 2017 14:49:08 +0000

> +#include <sys/socket.h>
> +#include "../../../include/uapi/linux/vm_sockets.h"
> +
 ...
> -#include "../../../include/uapi/linux/vm_sockets.h"
>  #include "../../../include/uapi/linux/vm_sockets_diag.h"

This really should never be necessary.

As part of the build, before tunning tests, one is required to do
"make headers_install" and whether as root or a normal user, this
will make the UAPI headers of the current kernel available to the
build of the testcases.

So please undo this weird relative include stuff.

Thanks.

^ permalink raw reply

* Re: [PATCH net V2 0/3] mlx4 misc fixes
From: David Miller @ 2017-12-13 21:40 UTC (permalink / raw)
  To: tariqt; +Cc: netdev, eranbe
In-Reply-To: <1513181531-17522-1-git-send-email-tariqt@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>
Date: Wed, 13 Dec 2017 18:12:08 +0200

> This patchset contains misc bug fixes from the team
> to the mlx4 Core and Eth drivers.
> 
> Patch 1 by Eugenia fixes an MTU issue in selftest.
> Patch 2 by Eran fixes an accounting issue in the resource tracker.
> Patch 3 by Eran fixes a race condition that causes counter inconsistency.
> 
> Series generated against net commit:
> 200809716aed fou: fix some member types in guehdr
 ...
> v2:
> Patch 2: Add reviewer credit, rephrase commit message.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Dmitry Torokhov @ 2017-12-13 21:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tariq Toukan, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <CANn89iKiKvzz6_R+T4kO8ztXFC7ke299+vOsGr92q2NfGBwFjw@mail.gmail.com>

Hi Eric,

On Thu, Oct 19, 2017 at 7:11 AM, Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Oct 19, 2017 at 4:48 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
> >
> > Hi Eric,
> >
> > I just wanted to check if this is solved already, as I don't want to keep an
> > unnecessary revert patch in our internal branches.
> > According to my check bug still exists.
> >
> I will handle this today, thanks for the reminder.

Did you have a chance to do this? It looks like the original change
landed on mainline and causes modules to be autoloaded on KOBJ_UNBIND
again.

Thanks!

-- 
Dmitry



-- 
Dmitry

^ permalink raw reply

* Re: [PATCH net-next v2] bpf/tracing: fix kernel/events/core.c compilation error
From: Daniel Borkmann @ 2017-12-13 21:51 UTC (permalink / raw)
  To: Yonghong Song, ast, sfr, netdev; +Cc: kernel-team
In-Reply-To: <20171213183537.83534-1-yhs@fb.com>

On 12/13/2017 07:35 PM, Yonghong Song wrote:
> Commit f371b304f12e ("bpf/tracing: allow user space to
> query prog array on the same tp") introduced a perf
> ioctl command to query prog array attached to the
> same perf tracepoint. The commit introduced a
> compilation error under certain config conditions, e.g.,
>   (1). CONFIG_BPF_SYSCALL is not defined, or
>   (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS
>        nor CONFIG_KPROBE_EVENTS is defined.
> 
> Error message:
>   kernel/events/core.o: In function `perf_ioctl':
>   core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array'
> 
> This patch fixed this error by guarding the real definition under
> CONFIG_BPF_EVENTS and provided static inline dummy function
> if CONFIG_BPF_EVENTS was not defined.
> It renamed the function from bpf_event_query_prog_array to
> perf_event_query_prog_array and moved the definition from linux/bpf.h
> to linux/trace_events.h so the definition is in proximity to
> other prog_array related functions.
> 
> Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp")
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Yonghong Song <yhs@fb.com>

That's better, applied to bpf-next, thanks Yonghong!

^ permalink raw reply

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Eric Dumazet @ 2017-12-13 21:52 UTC (permalink / raw)
  To: Dmitry Torokhov, Eric Dumazet
  Cc: Tariq Toukan, David S . Miller, netdev, Eric W . Biederman,
	Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <CAKdAkRTpnxxW0h5smyUDQKm4f94D6-i7G5UK_NnGj_i-cwj35Q@mail.gmail.com>

On Wed, 2017-12-13 at 13:43 -0800, Dmitry Torokhov wrote:
> Hi Eric,
> 
> On Thu, Oct 19, 2017 at 7:11 AM, Eric Dumazet <edumazet@google.com> wrote:
> > 
> > On Thu, Oct 19, 2017 at 4:48 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
> > > 
> > > Hi Eric,
> > > 
> > > I just wanted to check if this is solved already, as I don't want to keep an
> > > unnecessary revert patch in our internal branches.
> > > According to my check bug still exists.
> > > 
> > 
> > I will handle this today, thanks for the reminder.
> 
> Did you have a chance to do this? It looks like the original change
> landed on mainline and causes modules to be autoloaded on KOBJ_UNBIND
> again.
> 
> Thanks!

I sent the following to Tariq, and he tested it successfully.

I will submit this formally.

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index c3e84edc47c965d40199b652ba78876cdaa9c70c..0795482b15d5a8f1b65b570a071aa1419cb923d8 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -346,19 +346,25 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
 static void zap_modalias_env(struct kobj_uevent_env *env)
 {
 	static const char modalias_prefix[] = "MODALIAS=";
+	size_t offset = 0, len;
 	int i;
 
 	for (i = 0; i < env->envp_idx;) {
+		len = strlen(env->envp[i]) + 1;
 		if (strncmp(env->envp[i], modalias_prefix,
 			    sizeof(modalias_prefix) - 1)) {
 			i++;
+			offset += len;
 			continue;
 		}
 
-		if (i != env->envp_idx - 1)
+		env->buflen -= len;
+		if (i != env->envp_idx - 1) {
+			memmove(env->envp[i], env->envp[i + 1],
+				env->buflen - offset);
 			memmove(&env->envp[i], &env->envp[i + 1],
 				sizeof(env->envp[i]) * env->envp_idx - 1);
-
+		}
 		env->envp_idx--;
 	}
 }

^ permalink raw reply related

* Re: [PATCH v4 net-next 4/4] bpftool: implement cgroup bpf operations
From: Jakub Kicinski @ 2017-12-13 22:12 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: netdev, linux-kernel, kernel-team, ast, daniel, kafai,
	Quentin Monnet, David Ahern
In-Reply-To: <20171213151854.21960-5-guro@fb.com>

On Wed, 13 Dec 2017 15:18:54 +0000, Roman Gushchin wrote:
> This patch adds basic cgroup bpf operations to bpftool:
> cgroup list, attach and detach commands.
> 
> Usage is described in the corresponding man pages,
> and examples are provided.
> 
> Syntax:
> $ bpftool cgroup list CGROUP
> $ bpftool cgroup attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]
> $ bpftool cgroup detach CGROUP ATTACH_TYPE PROG
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Quentin Monnet <quentin.monnet@netronome.com>
> Reviewed-by: David Ahern <dsahern@gmail.com>

Excellent, thank you!

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* pull-request: bpf 2017-12-13
From: Daniel Borkmann @ 2017-12-13 22:19 UTC (permalink / raw)
  To: davem; +Cc: daniel, ast, netdev

Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Addition of explicit scheduling points to map alloc/free
   in order to avoid having to hold the CPU for too long,
   from Eric.

2) Fixing of a corruption in overlapping perf_event_output
   calls from different BPF prog types on the same CPU out
   of different contexts, from Daniel.

3) Fallout fixes for recent correction of broken uapi for
   BPF_PROG_TYPE_PERF_EVENT. um had a missing asm header
   that needed to be pulled in from asm-generic and for
   BPF selftests the asm-generic include did not work,
   so similar asm include scheme was adapted for that
   problematic header that perf is having with other
   header files under tools, from Daniel.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit 30791ac41927ebd3e75486f9504b6d2280463bf0:

  tcp md5sig: Use skb's saddr when replying to an incoming segment (2017-12-12 11:15:42 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 9147efcbe0b7cc96b18eb64b1a3f0d4bba81443c:

  bpf: add schedule points to map alloc/free (2017-12-12 15:27:22 -0800)

----------------------------------------------------------------
Alexei Starovoitov (1):
      Merge branch 'bpf-misc-fixes'

Daniel Borkmann (3):
      bpf: fix corruption on concurrent perf_event_output calls
      bpf: fix build issues on um due to mising bpf_perf_event.h
      bpf: fix broken BPF selftest build

Eric Dumazet (1):
      bpf: add schedule points to map alloc/free

 arch/um/include/asm/Kbuild              |  1 +
 kernel/bpf/hashtab.c                    |  2 ++
 kernel/trace/bpf_trace.c                | 19 ++++++++++++-------
 tools/include/uapi/asm/bpf_perf_event.h |  7 +++++++
 tools/testing/selftests/bpf/Makefile    | 13 +------------
 5 files changed, 23 insertions(+), 19 deletions(-)
 create mode 100644 tools/include/uapi/asm/bpf_perf_event.h

^ permalink raw reply

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Dmitry Torokhov @ 2017-12-13 22:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, Tariq Toukan, David S . Miller, netdev,
	Eric W . Biederman, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <1513201949.25033.68.camel@gmail.com>

On Wed, Dec 13, 2017 at 1:52 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2017-12-13 at 13:43 -0800, Dmitry Torokhov wrote:
>> Hi Eric,
>>
>> On Thu, Oct 19, 2017 at 7:11 AM, Eric Dumazet <edumazet@google.com> wrote:
>> >
>> > On Thu, Oct 19, 2017 at 4:48 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>> > >
>> > > Hi Eric,
>> > >
>> > > I just wanted to check if this is solved already, as I don't want to keep an
>> > > unnecessary revert patch in our internal branches.
>> > > According to my check bug still exists.
>> > >
>> >
>> > I will handle this today, thanks for the reminder.
>>
>> Did you have a chance to do this? It looks like the original change
>> landed on mainline and causes modules to be autoloaded on KOBJ_UNBIND
>> again.
>>
>> Thanks!
>
> I sent the following to Tariq, and he tested it successfully.
>
> I will submit this formally.
>
> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> index c3e84edc47c965d40199b652ba78876cdaa9c70c..0795482b15d5a8f1b65b570a071aa1419cb923d8 100644
> --- a/lib/kobject_uevent.c
> +++ b/lib/kobject_uevent.c
> @@ -346,19 +346,25 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
>  static void zap_modalias_env(struct kobj_uevent_env *env)
>  {
>         static const char modalias_prefix[] = "MODALIAS=";
> +       size_t offset = 0, len;
>         int i;
>
>         for (i = 0; i < env->envp_idx;) {
> +               len = strlen(env->envp[i]) + 1;
>                 if (strncmp(env->envp[i], modalias_prefix,
>                             sizeof(modalias_prefix) - 1)) {
>                         i++;
> +                       offset += len;
>                         continue;
>                 }
>
> -               if (i != env->envp_idx - 1)
> +               env->buflen -= len;
> +               if (i != env->envp_idx - 1) {
> +                       memmove(env->envp[i], env->envp[i + 1],
> +                               env->buflen - offset);
>                         memmove(&env->envp[i], &env->envp[i + 1],
>                                 sizeof(env->envp[i]) * env->envp_idx - 1);
> -
> +               }
>                 env->envp_idx--;
>         }
>  }
>

As I mentioned in the other thread, that works for netlink, but breaks
if you actually using env->envp pointers, as they also need to be
adjusted. I have a patch that fixes it properly.

Thanks!


-- 
Dmitry

^ permalink raw reply

* Re: pull-request: bpf 2017-12-13
From: David Miller @ 2017-12-13 22:30 UTC (permalink / raw)
  To: daniel; +Cc: ast, netdev
In-Reply-To: <20171213221912.6036-1-daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 13 Dec 2017 23:19:12 +0100

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) Addition of explicit scheduling points to map alloc/free
>    in order to avoid having to hold the CPU for too long,
>    from Eric.
> 
> 2) Fixing of a corruption in overlapping perf_event_output
>    calls from different BPF prog types on the same CPU out
>    of different contexts, from Daniel.
> 
> 3) Fallout fixes for recent correction of broken uapi for
>    BPF_PROG_TYPE_PERF_EVENT. um had a missing asm header
>    that needed to be pulled in from asm-generic and for
>    BPF selftests the asm-generic include did not work,
>    so similar asm include scheme was adapted for that
>    problematic header that perf is having with other
>    header files under tools, from Daniel.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Daniel.

^ permalink raw reply

* [PATCH net] vxlan: Restore initial MTU setting based on lower device
From: Stefano Brivio @ 2017-12-13 22:37 UTC (permalink / raw)
  To: David S . Miller, netdev
  Cc: Matthias Schiffer, Junhan Yan, Jiri Benc, Hangbin Liu

Commit a985343ba906 ("vxlan: refactor verification and
application of configuration") introduced a change in the
behaviour of initial MTU setting: earlier, the MTU for a link
created on top of a given lower device, without an initial MTU
specification, was set to the MTU of the lower device minus
headroom as a result of this path in vxlan_dev_configure():

	if (!conf->mtu)
		dev->mtu = lowerdev->mtu -
			   (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);

which is now gone. Now, the initial MTU, in absence of a
configured value, is simply set by ether_setup() to ETH_DATA_LEN
(1500 bytes).

This breaks userspace expectations in case the MTU of
the lower device is higher than 1500 bytes minus headroom.

Restore the previous behaviour by calculating, for a new link,
the MTU from the lower device, if present, and if no value is
explicitly configured.

Reported-by: Junhan Yan <juyan@redhat.com>
Fixes: a985343ba906 ("vxlan: refactor verification and application of configuration")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
I guess this should be queued up for -stable, back to 4.13.

I'm actually introducing the third occurrence of this calculation (there's
another one in vxlan_config_apply(), and one in vxlan_change_mtu()). I would
anyway fix the userspace breakage first, and then plan on getting rid of several
bits of MTU logic duplication, which spans further than this.

 drivers/net/vxlan.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 19b9cc51079e..3a7e36cdf2c7 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3085,6 +3085,9 @@ static void vxlan_config_apply(struct net_device *dev,
 
 		if (conf->mtu)
 			dev->mtu = conf->mtu;
+		else if (lowerdev)
+			dev->mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
+							       VXLAN_HEADROOM);
 
 		vxlan->net = src_net;
 	}
-- 
2.9.4

^ permalink raw reply related

* Re: BUG REPORT: iproute2 seems to have bug with dsfield/tos in ip-rule and ip-route
From: David Ahern @ 2017-12-13 22:40 UTC (permalink / raw)
  To: Daniel Lakeland, Stephen Hemminger; +Cc: netdev
In-Reply-To: <4f4e5fc5-2141-9e38-2f5f-da5f7558d505@street-artists.org>

On 12/13/17 12:05 PM, Daniel Lakeland wrote:
> Following up my previous email with output from the machine:
> Note that like some of those other people I was able to get ip rule to
> accept tos values with just low order bits set
> 
> Here is example of how ip rule accepts low order dsfield bits but not
> modern DSCP type bits, also including some version info
> 
> dlakelan@pingpong:~$ sudo ip rule add dsfield 0x0c table 100
> dlakelan@pingpong:~$ ip rule show
> 0:    from all lookup local
> 32765:    from all tos 0x0c lookup 100
> 32766:    from all lookup main
> 32767:    from all lookup default
> 
> 
> dlakelan@pingpong:~$ sudo ip rule add dsfield 0xc0 table 100
> RTNETLINK answers: Invalid argument

In fib4_rule_configure, this the check that is failing:

    if (frh->tos & ~IPTOS_TOS_MASK)
        goto errout;

and EINVAL is returned.

IPv4 routes has not checking on tos -- it is passed from user and
rtm_tos to fc_tos to fib alias tos.

^ permalink raw reply

* [PATCH next 0/2] ipvlan: packet scrub
From: Mahesh Bandewar @ 2017-12-13 22:40 UTC (permalink / raw)
  To: David Miller, Netdev; +Cc: Eric Dumazet, Mahesh Bandewar, Mahesh Bandewar

From: Mahesh Bandewar <maheshb@google.com>

While crossing namespace boundary IPvlan aggressively scrubs packets.
This is creating problems. First thing is that scrubbing changes the 
packet type in skb meta-data to PACKET_HOST. This causes erroneous
packet delivery when dev_forward_skb() has already marked the packet
type as OTHER_HOST.

On the egress side scrubbing just before calling dev_queue_xmit()
creates another set of problems. Scrubbing remove skb->sk so the
prio update gets missed and more seriously, socket back-pressure
fails making TSQ not function correctly.

The first patch in the series just reverts the earlier change which
was adding a mac-check, but that is unnecessary if packet_type that
dev_forward_skb() has set is honored. The second path removes two of
the scrubs which are causing problems described above.


Mahesh Bandewar (2):
  Revert "ipvlan: add L2 check for packets arriving via virtual devices"
  ipvlan: remove excessive packet scrubbing

 drivers/net/ipvlan/ipvlan_core.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

-- 
2.15.1.424.g9478a66081-goog

^ permalink raw reply

* [PATCH next 1/2] Revert "ipvlan: add L2 check for packets arriving via virtual devices"
From: Mahesh Bandewar @ 2017-12-13 22:40 UTC (permalink / raw)
  To: David Miller, Netdev; +Cc: Eric Dumazet, Mahesh Bandewar, Mahesh Bandewar

From: Mahesh Bandewar <maheshb@google.com>

This reverts commit 92ff42645028fa6f9b8aa767718457b9264316b4.

Even though the check added is not that taxing, it's not really needed.
First of all this will be per packet cost and second thing is that the
eth_type_trans() already does this correctly. The excessive scrubbing
in IPvlan was changing the pkt-type skb metadata of the packet which
made it necessary to re-check the mac. The subsequent patch in this
series removes the faulty packet-scrub.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/ipvlan/ipvlan_core.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 0bc7f721b717..9774c96ac7bb 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -322,10 +322,6 @@ static int ipvlan_rcv_frame(struct ipvl_addr *addr, struct sk_buff **pskb,
 		if (dev_forward_skb(ipvlan->dev, skb) == NET_RX_SUCCESS)
 			success = true;
 	} else {
-		if (!ether_addr_equal_64bits(eth_hdr(skb)->h_dest,
-					     ipvlan->phy_dev->dev_addr))
-			skb->pkt_type = PACKET_OTHERHOST;
-
 		ret = RX_HANDLER_ANOTHER;
 		success = true;
 	}
-- 
2.15.1.424.g9478a66081-goog

^ permalink raw reply related

* [PATCH next 2/2] ipvlan: remove excessive packet scrubbing
From: Mahesh Bandewar @ 2017-12-13 22:40 UTC (permalink / raw)
  To: David Miller, Netdev; +Cc: Eric Dumazet, Mahesh Bandewar, Mahesh Bandewar

From: Mahesh Bandewar <maheshb@google.com>

IPvlan currently scrubs packets at every location where packets may be
crossing namespace boundary. Though this is desirable, currently IPvlan
does it more than necessary. e.g. packets that are going to take
dev_forward_skb() path will get scrubbed so no point in scrubbing them
before forwarding. Another side-effect of scrubbing is that pkt-type gets
set to PACKET_HOST which overrides what was already been set by the
earlier path making erroneous delivery of the packets.

Also scrubbing packets just before calling dev_queue_xmit() has detrimental
effects since packets lose skb->sk and because of that miss prio updates,
incorrect socket back-pressure and would even break TSQ.

Fixes: b93dd49c1a35 ('ipvlan: Scrub skb before crossing the namespace boundary')
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/ipvlan/ipvlan_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 9774c96ac7bb..c1f008fe4e1d 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -315,13 +315,13 @@ static int ipvlan_rcv_frame(struct ipvl_addr *addr, struct sk_buff **pskb,
 
 		*pskb = skb;
 	}
-	ipvlan_skb_crossing_ns(skb, dev);
 
 	if (local) {
 		skb->pkt_type = PACKET_HOST;
 		if (dev_forward_skb(ipvlan->dev, skb) == NET_RX_SUCCESS)
 			success = true;
 	} else {
+		skb->dev = dev;
 		ret = RX_HANDLER_ANOTHER;
 		success = true;
 	}
@@ -586,7 +586,7 @@ static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
 		return NET_XMIT_SUCCESS;
 	}
 
-	ipvlan_skb_crossing_ns(skb, ipvlan->phy_dev);
+	skb->dev = ipvlan->phy_dev;
 	return dev_queue_xmit(skb);
 }
 
-- 
2.15.1.424.g9478a66081-goog

^ permalink raw reply related

* [PATCH net-next 0/2] nfp: ethtool flash updates
From: Jakub Kicinski @ 2017-12-13 22:45 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski

Dirk says:

This series adds the ability to update the control FW with ethtool.

It should be noted that the locking scheme here is to release the RTNL
lock before the flashing operation and to take it again afterwards to
ensure consistent state from the core code point of view. In this time,
we take a reference to the device to prevent the device being freed
while its being flashed.

This provides protection for the device being flashed while at the same
time not holding up any networking related functions which would
otherwise be locked out due to RTNL being held.


Dirk van der Merwe (2):
  nfp: extend NSP infrastructure for configurable timeouts
  nfp: implement firmware flashing

 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 54 +++++++++++++++++
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   | 68 +++++++++++++++++-----
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |  1 +
 3 files changed, 107 insertions(+), 16 deletions(-)

-- 
2.15.1

^ permalink raw reply

* [PATCH net-next 1/2] nfp: extend NSP infrastructure for configurable timeouts
From: Jakub Kicinski @ 2017-12-13 22:45 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Dirk van der Merwe
In-Reply-To: <20171213224502.25407-1-jakub.kicinski@netronome.com>

From: Dirk van der Merwe <dirk.vandermerwe@netronome.com>

The firmware flashing NSP operation takes longer to execute than the
current default timeout. We need a mechanism to set a longer timeout for
some commands. This patch adds the infrastructure to this.

The default timeout is still 30 seconds.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   | 60 ++++++++++++++++------
 1 file changed, 43 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 14a6d1ba51a9..a6b40091968d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -51,6 +51,9 @@
 #include "nfp_cpp.h"
 #include "nfp_nsp.h"
 
+#define NFP_NSP_TIMEOUT_DEFAULT	30
+#define NFP_NSP_TIMEOUT_BOOT	30
+
 /* Offsets relative to the CSR base */
 #define NSP_STATUS		0x00
 #define   NSP_STATUS_MAGIC	GENMASK_ULL(63, 48)
@@ -260,10 +263,10 @@ u16 nfp_nsp_get_abi_ver_minor(struct nfp_nsp *state)
 }
 
 static int
-nfp_nsp_wait_reg(struct nfp_cpp *cpp, u64 *reg,
-		 u32 nsp_cpp, u64 addr, u64 mask, u64 val)
+nfp_nsp_wait_reg(struct nfp_cpp *cpp, u64 *reg, u32 nsp_cpp, u64 addr,
+		 u64 mask, u64 val, u32 timeout_sec)
 {
-	const unsigned long wait_until = jiffies + 30 * HZ;
+	const unsigned long wait_until = jiffies + timeout_sec * HZ;
 	int err;
 
 	for (;;) {
@@ -285,12 +288,13 @@ nfp_nsp_wait_reg(struct nfp_cpp *cpp, u64 *reg,
 }
 
 /**
- * nfp_nsp_command() - Execute a command on the NFP Service Processor
+ * __nfp_nsp_command() - Execute a command on the NFP Service Processor
  * @state:	NFP SP state
  * @code:	NFP SP Command Code
  * @option:	NFP SP Command Argument
  * @buff_cpp:	NFP SP Buffer CPP Address info
  * @buff_addr:	NFP SP Buffer Host address
+ * @timeout_sec:Timeout value to wait for completion in seconds
  *
  * Return: 0 for success with no result
  *
@@ -300,10 +304,11 @@ nfp_nsp_wait_reg(struct nfp_cpp *cpp, u64 *reg,
  *	-ENODEV if the NSP is not a supported model
  *	-EBUSY if the NSP is stuck
  *	-EINTR if interrupted while waiting for completion
- *	-ETIMEDOUT if the NSP took longer than 30 seconds to complete
+ *	-ETIMEDOUT if the NSP took longer than @timeout_sec seconds to complete
  */
-static int nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option,
-			   u32 buff_cpp, u64 buff_addr)
+static int
+__nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option, u32 buff_cpp,
+		  u64 buff_addr, u32 timeout_sec)
 {
 	u64 reg, ret_val, nsp_base, nsp_buffer, nsp_status, nsp_command;
 	struct nfp_cpp *cpp = state->cpp;
@@ -341,8 +346,8 @@ static int nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option,
 		return err;
 
 	/* Wait for NSP_COMMAND_START to go to 0 */
-	err = nfp_nsp_wait_reg(cpp, &reg,
-			       nsp_cpp, nsp_command, NSP_COMMAND_START, 0);
+	err = nfp_nsp_wait_reg(cpp, &reg, nsp_cpp, nsp_command,
+			       NSP_COMMAND_START, 0, NFP_NSP_TIMEOUT_DEFAULT);
 	if (err) {
 		nfp_err(cpp, "Error %d waiting for code 0x%04x to start\n",
 			err, code);
@@ -350,8 +355,8 @@ static int nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option,
 	}
 
 	/* Wait for NSP_STATUS_BUSY to go to 0 */
-	err = nfp_nsp_wait_reg(cpp, &reg,
-			       nsp_cpp, nsp_status, NSP_STATUS_BUSY, 0);
+	err = nfp_nsp_wait_reg(cpp, &reg, nsp_cpp, nsp_status, NSP_STATUS_BUSY,
+			       0, timeout_sec);
 	if (err) {
 		nfp_err(cpp, "Error %d waiting for code 0x%04x to complete\n",
 			err, code);
@@ -374,9 +379,18 @@ static int nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option,
 	return ret_val;
 }
 
-static int nfp_nsp_command_buf(struct nfp_nsp *nsp, u16 code, u32 option,
-			       const void *in_buf, unsigned int in_size,
-			       void *out_buf, unsigned int out_size)
+static int
+nfp_nsp_command(struct nfp_nsp *state, u16 code, u32 option, u32 buff_cpp,
+		u64 buff_addr)
+{
+	return __nfp_nsp_command(state, code, option, buff_cpp, buff_addr,
+				 NFP_NSP_TIMEOUT_DEFAULT);
+}
+
+static int
+__nfp_nsp_command_buf(struct nfp_nsp *nsp, u16 code, u32 option,
+		      const void *in_buf, unsigned int in_size, void *out_buf,
+		      unsigned int out_size, u32 timeout_sec)
 {
 	struct nfp_cpp *cpp = nsp->cpp;
 	unsigned int max_size;
@@ -429,7 +443,8 @@ static int nfp_nsp_command_buf(struct nfp_nsp *nsp, u16 code, u32 option,
 			return err;
 	}
 
-	ret = nfp_nsp_command(nsp, code, option, cpp_id, cpp_buf);
+	ret = __nfp_nsp_command(nsp, code, option, cpp_id, cpp_buf,
+				timeout_sec);
 	if (ret < 0)
 		return ret;
 
@@ -442,12 +457,23 @@ static int nfp_nsp_command_buf(struct nfp_nsp *nsp, u16 code, u32 option,
 	return ret;
 }
 
+static int
+nfp_nsp_command_buf(struct nfp_nsp *nsp, u16 code, u32 option,
+		    const void *in_buf, unsigned int in_size, void *out_buf,
+		    unsigned int out_size)
+{
+	return __nfp_nsp_command_buf(nsp, code, option, in_buf, in_size,
+				     out_buf, out_size,
+				     NFP_NSP_TIMEOUT_DEFAULT);
+}
+
 int nfp_nsp_wait(struct nfp_nsp *state)
 {
-	const unsigned long wait_until = jiffies + 30 * HZ;
+	const unsigned long wait_until = jiffies + NFP_NSP_TIMEOUT_BOOT * HZ;
 	int err;
 
-	nfp_dbg(state->cpp, "Waiting for NSP to respond (30 sec max).\n");
+	nfp_dbg(state->cpp, "Waiting for NSP to respond (%u sec max).\n",
+		NFP_NSP_TIMEOUT_BOOT);
 
 	for (;;) {
 		const unsigned long start_time = jiffies;
-- 
2.15.1

^ permalink raw reply related

* [PATCH net-next 2/2] nfp: implement firmware flashing
From: Jakub Kicinski @ 2017-12-13 22:45 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Dirk van der Merwe
In-Reply-To: <20171213224502.25407-1-jakub.kicinski@netronome.com>

From: Dirk van der Merwe <dirk.vandermerwe@netronome.com>

Firmware flashing takes around 60s (specified to not take more than
70s). Prevent hogging the RTNL lock in this time and make use of the
longer timeout for the NSP command. The timeout is set to 2.5 * 70
seconds.

We only allow flashing the firmware from reprs or PF netdevs. VFs do not
have an app reference.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 54 ++++++++++++++++++++++
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   | 12 +++++
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |  1 +
 3 files changed, 67 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 2cde0eb00ee3..00b8c642e672 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -47,6 +47,7 @@
 #include <linux/interrupt.h>
 #include <linux/pci.h>
 #include <linux/ethtool.h>
+#include <linux/firmware.h>
 
 #include "nfpcore/nfp.h"
 #include "nfpcore/nfp_nsp.h"
@@ -1269,6 +1270,57 @@ static int nfp_net_set_channels(struct net_device *netdev,
 	return nfp_net_set_num_rings(nn, total_rx, total_tx);
 }
 
+static int
+nfp_net_flash_device(struct net_device *netdev, struct ethtool_flash *flash)
+{
+	const struct firmware *fw;
+	struct nfp_app *app;
+	struct nfp_nsp *nsp;
+	struct device *dev;
+	int err;
+
+	if (flash->region != ETHTOOL_FLASH_ALL_REGIONS)
+		return -EOPNOTSUPP;
+
+	app = nfp_app_from_netdev(netdev);
+	if (!app)
+		return -EOPNOTSUPP;
+
+	dev = &app->pdev->dev;
+
+	nsp = nfp_nsp_open(app->cpp);
+	if (IS_ERR(nsp)) {
+		err = PTR_ERR(nsp);
+		dev_err(dev, "Failed to access the NSP: %d\n", err);
+		return err;
+	}
+
+	err = request_firmware_direct(&fw, flash->data, dev);
+	if (err)
+		goto exit_close_nsp;
+
+	dev_info(dev, "Please be patient while writing flash image: %s\n",
+		 flash->data);
+	dev_hold(netdev);
+	rtnl_unlock();
+
+	err = nfp_nsp_write_flash(nsp, fw);
+	if (err < 0) {
+		dev_err(dev, "Flash write failed: %d\n", err);
+		goto exit_rtnl_lock;
+	}
+	dev_info(dev, "Finished writing flash image\n");
+
+exit_rtnl_lock:
+	rtnl_lock();
+	dev_put(netdev);
+	release_firmware(fw);
+
+exit_close_nsp:
+	nfp_nsp_close(nsp);
+	return err;
+}
+
 static const struct ethtool_ops nfp_net_ethtool_ops = {
 	.get_drvinfo		= nfp_net_get_drvinfo,
 	.get_link		= ethtool_op_get_link,
@@ -1279,6 +1331,7 @@ static const struct ethtool_ops nfp_net_ethtool_ops = {
 	.get_sset_count		= nfp_net_get_sset_count,
 	.get_rxnfc		= nfp_net_get_rxnfc,
 	.set_rxnfc		= nfp_net_set_rxnfc,
+	.flash_device		= nfp_net_flash_device,
 	.get_rxfh_indir_size	= nfp_net_get_rxfh_indir_size,
 	.get_rxfh_key_size	= nfp_net_get_rxfh_key_size,
 	.get_rxfh		= nfp_net_get_rxfh,
@@ -1304,6 +1357,7 @@ const struct ethtool_ops nfp_port_ethtool_ops = {
 	.get_strings		= nfp_port_get_strings,
 	.get_ethtool_stats	= nfp_port_get_stats,
 	.get_sset_count		= nfp_port_get_sset_count,
+	.flash_device		= nfp_net_flash_device,
 	.set_dump		= nfp_app_set_dump,
 	.get_dump_flag		= nfp_app_get_dump_flag,
 	.get_dump_data		= nfp_app_get_dump_data,
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index a6b40091968d..39abac678b71 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -96,6 +96,7 @@ enum nfp_nsp_cmd {
 	SPCODE_FW_LOAD		= 6, /* Load fw from buffer, len in option */
 	SPCODE_ETH_RESCAN	= 7, /* Rescan ETHs, write ETH_TABLE to buf */
 	SPCODE_ETH_CONTROL	= 8, /* Update media config from buffer */
+	SPCODE_NSP_WRITE_FLASH	= 11, /* Load and flash image from buffer */
 	SPCODE_NSP_SENSORS	= 12, /* Read NSP sensor(s) */
 	SPCODE_NSP_IDENTIFY	= 13, /* Read NSP version */
 };
@@ -514,6 +515,17 @@ int nfp_nsp_load_fw(struct nfp_nsp *state, const struct firmware *fw)
 				   fw->size, NULL, 0);
 }
 
+int nfp_nsp_write_flash(struct nfp_nsp *state, const struct firmware *fw)
+{
+	/* The flash time is specified to take a maximum of 70s so we add an
+	 * additional factor to this spec time.
+	 */
+	u32 timeout_sec = 2.5 * 70;
+
+	return __nfp_nsp_command_buf(state, SPCODE_NSP_WRITE_FLASH, fw->size,
+				     fw->data, fw->size, NULL, 0, timeout_sec);
+}
+
 int nfp_nsp_read_eth_table(struct nfp_nsp *state, void *buf, unsigned int size)
 {
 	return nfp_nsp_command_buf(state, SPCODE_ETH_RESCAN, size, NULL, 0,
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index 650ca1a5bd21..e983c9d7f86c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
@@ -48,6 +48,7 @@ u16 nfp_nsp_get_abi_ver_minor(struct nfp_nsp *state);
 int nfp_nsp_wait(struct nfp_nsp *state);
 int nfp_nsp_device_soft_reset(struct nfp_nsp *state);
 int nfp_nsp_load_fw(struct nfp_nsp *state, const struct firmware *fw);
+int nfp_nsp_write_flash(struct nfp_nsp *state, const struct firmware *fw);
 int nfp_nsp_mac_reinit(struct nfp_nsp *state);
 
 static inline bool nfp_nsp_has_mac_reinit(struct nfp_nsp *state)
-- 
2.15.1

^ permalink raw reply related

* Re: [bpf-next V1-RFC PATCH 12/14] xdp: generic XDP handling of xdp_rxq_info
From: Saeed Mahameed @ 2017-12-13 22:50 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Daniel Borkmann, Alexei Starovoitov
  Cc: netdev, dsahern, gospo, bjorn.topel, michael.chan
In-Reply-To: <151316402196.14967.14959244245059519467.stgit@firesoul>



On 12/13/2017 3:20 AM, Jesper Dangaard Brouer wrote:
> Hook points for xdp_rxq_info:
>   * init+reg: netif_alloc_rx_queues
>   * unreg   : netif_free_rx_queues
> 
> The net_device have some members (num_rx_queues + real_num_rx_queues)
> and data-area (dev->_rx with struct netdev_rx_queue's) that were
> primarily used for exporting information about RPS (CONFIG_RPS) queues
> to sysfs (CONFIG_SYSFS).
> 
> For generic XDP extend struct netdev_rx_queue with the xdp_rxq_info,
> and remove some of the CONFIG_SYSFS ifdefs.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   include/linux/netdevice.h |    2 ++
>   net/core/dev.c            |   60 ++++++++++++++++++++++++++++++++++++++-------
>   2 files changed, 52 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index cc4ce7456e38..43595b037872 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -44,6 +44,7 @@
>   #include <net/dcbnl.h>
>   #endif
>   #include <net/netprio_cgroup.h>
> +#include <net/xdp.h>
>   
>   #include <linux/netdev_features.h>
>   #include <linux/neighbour.h>
> @@ -686,6 +687,7 @@ struct netdev_rx_queue {
>   #endif
>   	struct kobject			kobj;
>   	struct net_device		*dev;
> +	struct xdp_rxq_info		xdp_rxq;
>   } ____cacheline_aligned_in_smp;
>   

Instead of duplicating this xdp_rxq_info and have 2 instances of it for 
drivers that do support XDP (the generic one and the driver internal 
xdp_rxq_info), drivers can use the generic netdev_rx_queue.xdp_rxq to 
register their own xdp_rxq_info.
I suggest the following API for drivers to use:

xdp_rxq_info_reg(netdev, rxq_index)
{
	rxqueue = dev->_rx + rxq_index;
	xdp_rxq = rxqueue.xdp_rxq;
         xdp_rxq_info_init(xdp_rxq);
	xdp_rxq.dev = netdev;
	xdp_rxq.queue_index = rxq_index;
}

xdp_rxq_info_unreg(netdev, rxq_index)
{
...
}

This way you can avoid the xdp_rxq_info structure management by the 
drivers them selves and reduce duplicated code to init, fill the 
xdp_rxq_info per driver.

-Saeed.

>   /*
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6bea8931bb62..44932d6194a2 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3873,9 +3873,33 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
>   	return NET_RX_DROP;
>   }
>   
> +static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
> +{
> +	struct net_device *dev = skb->dev;
> +	struct netdev_rx_queue *rxqueue;
> +
> +	rxqueue = dev->_rx;
> +
> +	if (skb_rx_queue_recorded(skb)) {
> +		u16 index = skb_get_rx_queue(skb);
> +
> +		if (unlikely(index >= dev->real_num_rx_queues)) {
> +			WARN_ONCE(dev->real_num_rx_queues > 1,
> +				  "%s received packet on queue %u, but number "
> +				  "of RX queues is %u\n",
> +				  dev->name, index, dev->real_num_rx_queues);
> +
> +			return rxqueue; /* Return first rxqueue */
> +		}
> +		rxqueue += index;
> +	}
> +	return rxqueue;
> +}
> +
>   static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>   				     struct bpf_prog *xdp_prog)
>   {
> +	struct netdev_rx_queue *rxqueue;
>   	u32 metalen, act = XDP_DROP;
>   	struct xdp_buff xdp;
>   	void *orig_data;
> @@ -3919,6 +3943,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>   	xdp.data_hard_start = skb->data - skb_headroom(skb);
>   	orig_data = xdp.data;
>   
> +	rxqueue = netif_get_rxqueue(skb);
> +	xdp.rxq = &rxqueue->xdp_rxq;
> +
>   	act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   
>   	off = xdp.data - orig_data;
> @@ -7538,7 +7565,6 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
>   }
>   EXPORT_SYMBOL(netif_stacked_transfer_operstate);
>   
> -#ifdef CONFIG_SYSFS
>   static int netif_alloc_rx_queues(struct net_device *dev)
>   {
>   	unsigned int i, count = dev->num_rx_queues;
> @@ -7553,11 +7579,31 @@ static int netif_alloc_rx_queues(struct net_device *dev)
>   
>   	dev->_rx = rx;
>   
> -	for (i = 0; i < count; i++)
> +	for (i = 0; i < count; i++) {
>   		rx[i].dev = dev;
> +
> +		/* XDP RX-queue setup */
> +		xdp_rxq_info_init(&rx[i].xdp_rxq);
> +		rx[i].xdp_rxq.dev = dev;
> +		rx[i].xdp_rxq.queue_index = i;
> +		xdp_rxq_info_reg(&rx[i].xdp_rxq);
> +	}
>   	return 0;
>   }
> -#endif
> +
> +static void netif_free_rx_queues(struct net_device *dev)
> +{
> +	unsigned int i, count = dev->num_rx_queues;
> +	struct netdev_rx_queue *rx;
> +
> +	if (!dev->_rx) /* netif_alloc_rx_queues alloc failed */
> +		return;
> +
> +	rx = dev->_rx;
> +
> +	for (i = 0; i < count; i++)
> +		xdp_rxq_info_unreg(&rx[i].xdp_rxq);
> +}
>   
>   static void netdev_init_one_queue(struct net_device *dev,
>   				  struct netdev_queue *queue, void *_unused)
> @@ -8118,12 +8164,10 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
>   		return NULL;
>   	}
>   
> -#ifdef CONFIG_SYSFS
>   	if (rxqs < 1) {
>   		pr_err("alloc_netdev: Unable to allocate device with zero RX queues\n");
>   		return NULL;
>   	}
> -#endif
>   
>   	alloc_size = sizeof(struct net_device);
>   	if (sizeof_priv) {
> @@ -8180,12 +8224,10 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
>   	if (netif_alloc_netdev_queues(dev))
>   		goto free_all;
>   
> -#ifdef CONFIG_SYSFS
>   	dev->num_rx_queues = rxqs;
>   	dev->real_num_rx_queues = rxqs;
>   	if (netif_alloc_rx_queues(dev))
>   		goto free_all;
> -#endif
>   
>   	strcpy(dev->name, name);
>   	dev->name_assign_type = name_assign_type;
> @@ -8224,9 +8266,7 @@ void free_netdev(struct net_device *dev)
>   
>   	might_sleep();
>   	netif_free_tx_queues(dev);
> -#ifdef CONFIG_SYSFS
> -	kvfree(dev->_rx);
> -#endif
> +	netif_free_rx_queues(dev);
>   
>   	kfree(rcu_dereference_protected(dev->ingress_queue, 1));
>   
> 

^ permalink raw reply

* Re: BUG REPORT: iproute2 seems to have bug with dsfield/tos in ip-rule and ip-route
From: Daniel Lakeland @ 2017-12-13 22:52 UTC (permalink / raw)
  To: David Ahern, Stephen Hemminger; +Cc: netdev
In-Reply-To: <224860aa-17a8-37a1-cbfe-66899c1bba94@gmail.com>

On 12/13/2017 02:40 PM, David Ahern wrote:
>
> In fib4_rule_configure, this the check that is failing:
>
>      if (frh->tos & ~IPTOS_TOS_MASK)
>          goto errout;
>
> and EINVAL is returned.
>
> IPv4 routes has not checking on tos -- it is passed from user and
> rtm_tos to fc_tos to fib alias tos.

it seems to me that this IPTOS_TOS_MASK check should be either gotten 
rid of, or equal to 0x03 in modern usage. The bottom 2 bits are ECN and 
I suppose someone might want to route based on congestion... and hence 
maybe the mask should be dropped entirely, but if you refuse to allow 
routes on ECN then you'd want 0x03 as the mask

it seems to me this is left over from before DSCP.

apparently most people don't route on DSCP or work around this with 
firewall marks, and so this doesn't cause trouble enough to have been 
reported before?

I think the follow up question is does anyone have any idea why someone 
who set up routes with dsfield settings is not seeing packets routed? 
The kernel may not handle ip rule with DSCP, but it takes

ip route add default dsfield CS6 dev veth0

just fine... and shows up in the route table, but for example the person 
is not seeing CS6 marked packets going to veth2 and instead is seeing 
them routed to veth0 the default route...

^ permalink raw reply

* Re: [bpf-next V1-RFC PATCH 02/14] xdp/mlx5: setup xdp_rxq_info and extend with qtype
From: Saeed Mahameed @ 2017-12-13 23:03 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Tariq Toukan
  Cc: Daniel Borkmann, Alexei Starovoitov, netdev, dsahern, Matan Barak,
	gospo, bjorn.topel, michael.chan
In-Reply-To: <20171213144439.2036c59b@redhat.com>



On 12/13/2017 5:44 AM, Jesper Dangaard Brouer wrote:
> On Wed, 13 Dec 2017 14:27:08 +0200
> Tariq Toukan <tariqt@mellanox.com> wrote:
> 
>> Hi Jesper,
>> Thanks for taking care of the drop RQ.
>>
>> In general, mlx5 part looks ok to me.
>> Find a few comments below. Mostly pointing out some typos.
>>
>> On 13/12/2017 1:19 PM, Jesper Dangaard Brouer wrote:
>>> The mlx5 driver have a special drop-RQ queue (one per interface) that
>>> simply drops all incoming traffic. It helps driver keep other HW
>>> objects (flow steering) alive upon down/up operations.  It is
>>> temporarily pointed by flow steering objects during the interface
>>> setup, and when interface is down. It lacks many fields that are set
>>> in a regular RQ (for example its state is never switched to
>>> MLX5_RQC_STATE_RDY). (Thanks to Tariq Toukan for explaination).
>> typo: explanation
> 
> Fixed
> 
>>>
>>> The XDP RX-queue info API is extended with a queue-type, and mlx5 uses
>>> this kind of drop/sink-type (RXQ_TYPE_SINK) for this kind of sink queue.
>>>
>>> Driver hook points for xdp_rxq_info:
>>>    * init+reg: mlx5e_alloc_rq()
>>>    * init+reg: mlx5e_alloc_drop_rq()
>>>    * unreg   : mlx5e_free_rq()
>>>
>>> Tested on actual hardware with samples/bpf program
>>>
>>> Cc: Saeed Mahameed <saeedm@mellanox.com>
>>> Cc: Matan Barak <matanb@mellanox.com>
>>> Cc: Tariq Toukan <tariqt@mellanox.com>
>>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>>> ---
>>>    drivers/net/ethernet/mellanox/mlx5/core/en.h      |    4 ++++
>>>    drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   14 +++++++++++++
>>>    drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |    1 +
>>>    include/net/xdp.h                                 |   23 +++++++++++++++++++++
>>>    net/core/xdp.c                                    |    6 +++++
>>>    5 files changed, 48 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>>> index c0872b3284cb..fe10a042783b 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>>> @@ -46,6 +46,7 @@
>>>    #include <linux/mlx5/transobj.h>
>>>    #include <linux/rhashtable.h>
>>>    #include <net/switchdev.h>
>>> +#include <net/xdp.h>
>>>    #include "wq.h"
>>>    #include "mlx5_core.h"
>>>    #include "en_stats.h"
>>> @@ -568,6 +569,9 @@ struct mlx5e_rq {
>>>    	u32                    rqn;
>>>    	struct mlx5_core_dev  *mdev;
>>>    	struct mlx5_core_mkey  umr_mkey;
>>> +
>>> +	/* XDP read-mostly */
>>> +	struct xdp_rxq_info xdp_rxq;
>>>    } ____cacheline_aligned_in_smp;
>>>    
>>>    struct mlx5e_channel {
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> index 0f5c012de52e..ea44b5f25e11 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> @@ -582,6 +582,12 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>>>    	rq->ix      = c->ix;
>>>    	rq->mdev    = mdev;
>>>    
>>> +	/* XDP RX-queue info */
>>> +	xdp_rxq_info_init(&rq->xdp_rxq);
>>> +	rq->xdp_rxq.dev		= rq->netdev;
>>> +	rq->xdp_rxq.queue_index = rq->ix;
>>> +	xdp_rxq_info_reg(&rq->xdp_rxq);
>>> +

See my comment below and my comment on patch #12 I believe we can reduce 
the amount of code duplication, and have a more generic way to register 
XDP RXQs, without the need for drivers to take care of xdp_rxq_info 
declaration and handling.

>> You don't set type here. This is ok as long as the following hold:
>> 1) RXQ_TYPE_DEFAULT is zero
> 
> True
> 
>> 2) xdp_rxq is zalloc'ed.
> 
> xdp_rxq memory area is part of rq allocation, but in
> xdp_rxq_info_init() I memset/zero the area explicit.
> 
>   
>>>    	rq->xdp_prog = params->xdp_prog ?
>>> bpf_prog_inc(params->xdp_prog) : NULL; if (IS_ERR(rq->xdp_prog)) {
>>>    		err = PTR_ERR(rq->xdp_prog);
>>> @@ -695,6 +701,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel
>>> *c, err_rq_wq_destroy:
>>>    	if (rq->xdp_prog)
>>>    		bpf_prog_put(rq->xdp_prog);
>>> +	xdp_rxq_info_unreg(&rq->xdp_rxq);
>>>    	mlx5_wq_destroy(&rq->wq_ctrl);
>>>    
>>>    	return err;
>>> @@ -707,6 +714,8 @@ static void mlx5e_free_rq(struct mlx5e_rq *rq)
>>>    	if (rq->xdp_prog)
>>>    		bpf_prog_put(rq->xdp_prog);
>>>    
>>> +	xdp_rxq_info_unreg(&rq->xdp_rxq);
>>> +
>>>    	switch (rq->wq_type) {
>>>    	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
>>>    		mlx5e_rq_free_mpwqe_info(rq);
>>> @@ -2768,6 +2777,11 @@ static int mlx5e_alloc_drop_rq(struct
>>> mlx5_core_dev *mdev, if (err)
>>>    		return err;
>>>    
>>> +	/* XDP RX-queue info for "Drop-RQ", packets never reach
>>> XDP */
>>> +	xdp_rxq_info_init(&rq->xdp_rxq);
>>> +	xdp_rxq_info_type(&rq->xdp_rxq, RXQ_TYPE_SINK);
>>> +	xdp_rxq_info_reg(&rq->xdp_rxq);
>>> +

I don't see why you need this, This RQ is not even assigned to any 
netdev_rxq! it is a pure HW object that drops traffic in HW when netdev 
is down, it even has no buffers or napi handling, just ignore it's 
existence for the sake of mlx5 xdp_rxq_info reg/unreg stuff and remove 
RXQ_TYPE_SINK, bottom line it is not a real RQ and for sure XDP has 
nothing to do with it.

>>>    	rq->mdev = mdev;
>>>    
>>>    	return 0;
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index
>>> 5b499c7a698f..7b38480811d4 100644 ---
>>> a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -812,6 +812,7
>>> @@ static inline int mlx5e_xdp_handle(struct mlx5e_rq *rq,
>>> xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + *len;
>>>    	xdp.data_hard_start = va;
>>> +	xdp.rxq = &rq->xdp_rxq;
>>>    
>>>    	act = bpf_prog_run_xdp(prog, &xdp);
>>>    	switch (act) {
>>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>>> index e4acd198fd60..5be560d943e1 100644
>>> --- a/include/net/xdp.h
>>> +++ b/include/net/xdp.h
>>> @@ -36,10 +36,33 @@ struct xdp_rxq_info {
>>>    	struct net_device *dev;
>>>    	u32 queue_index;
>>>    	u32 reg_state;
>>> +	u32 qtype;
>>>    } ____cacheline_aligned; /* perf critical, avoid false-sharing */
>>>    
>>>    void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq);
>>>    void xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq);
>>>    void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
>>>    
>>> +/**
>>> + * DOC: XDP RX-queue type
>>> + *
>>> + * The XDP RX-queue info can have associated a type.
>>> + *
>>> + * @RXQ_TYPE_DEFAULT: default no specifik queue type need to be
>>> specified
>>
>> typo: specific
> 
> Thanks, this is a Danish typo (it's spelled that way in Danish).
>   
>>> + *
>>> + * @RXQ_TYPE_SINK: indicate a fake queue that never reach XDP RX
>>> + *	code.  Some drivers have a need to maintain a lower layer
>>> + *	RX-queue as a sink queue, while reconfiguring other
>>> RX-queues.
>>> + */
>>> +#define RXQ_TYPE_DEFAULT	0
>>> +#define RXQ_TYPE_SINK		1
>>> +#define RXQ_TYPE_MAX		RXQ_TYPE_SINK
>>
>> Definitions of incremental numbers, enum might be best here, you can
>> give them some enum type and use it in xdp_rxq_info->qtype.
> 
> I use defines to make the below BUILD_BUG_ON work, as enums does not
> get expanded to their values in the C-preprocessor stage.
> 
>>> +
>>> +static inline
>>> +void xdp_rxq_info_type(struct xdp_rxq_info *xdp_rxq, u32 qtype)
>>> +{
>>> +	BUILD_BUG_ON(qtype > RXQ_TYPE_MAX);
>>> +	xdp_rxq->qtype = qtype;
>>> +}
>>> +
>>>    #endif /* __LINUX_NET_XDP_H__ */
>>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>>> index a9d2dd7b1ede..2a111f5987f6 100644
>>> --- a/net/core/xdp.c
>>> +++ b/net/core/xdp.c
>>> @@ -32,8 +32,14 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_init);
>>>    
>>>    void xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq)
>>>    {
>>> +	if (xdp_rxq->qtype == RXQ_TYPE_SINK)
>>> +		goto skip_content_check;
>>> +
>>> +	/* Check information setup by driver code */
>>>    	WARN(!xdp_rxq->dev, "Missing net_device from driver");
>>>    	WARN(xdp_rxq->queue_index == U32_MAX, "Miss queue_index from driver"); +
>>> +skip_content_check:
>>>    	WARN(!(xdp_rxq->reg_state == REG_STATE_NEW),"API violation, miss init");
>>>      xdp_rxq->reg_state = REG_STATE_REGISTRED;
>> typo: REGISTERED (introduced in a previous patch)
> 
> Thanks for catching that! :-)
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox