Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: stable/linux-4.10.y build: 203 builds: 3 failed, 200 passed, 3 errors, 5 warnings (v4.10.16)
From: David Miller @ 2017-05-15 14:23 UTC (permalink / raw)
  To: arnd; +Cc: bot, kernel-build-reports, edumazet, gregkh, stable, netdev
In-Reply-To: <CAK8P3a2gFXScsizfvd-2=yEWBvPoE7L-r5CCf63MzPuWejPzAQ@mail.gmail.com>

From: Arnd Bergmann <arnd@arndb.de>
Date: Mon, 15 May 2017 14:57:08 +0200

> On Sun, May 14, 2017 at 3:48 PM, kernelci.org bot <bot@kernelci.org> wrote:
>>
>> stable/linux-4.10.y build: 203 builds: 3 failed, 200 passed, 3 errors, 5 warnings (v4.10.16)
>> Full Build Summary: https://kernelci.org/build/stable/branch/linux-4.10.y/kernel/v4.10.16/
>> Tree: stable
>> Branch: linux-4.10.y
>> Git Describe: v4.10.16
>> Git Commit: 6e8e9958691907e8d7eb3b2107619dddbdaeb175
>> Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
>> Built: 4 unique architectures
>>
>> Build Failures Detected:
>>
>> arm: gcc version 5.3.1 20160412 (Linaro GCC 5.3-2016.05)
>> spear3xx_defconfig FAIL
>> spear6xx_defconfig FAIL
>> tct_hammer_defconfig FAIL
>> Errors summary:
>> 3 net/core/skbuff.c:1575:37: error: 'sock_edemux' undeclared (first use in this function)
>> Warnings summary:
> 
> This is a regression against v4.10.15, caused by the backport of
> c21b48cc1bbf ("net: adjust skb->truesize in ___pskb_trim()") by Eric
> Dumazet.
> 
> Part of another commit that Eric did earlier fixes it:
> 
> 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()")
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1534,7 +1534,7 @@ void sock_efree(struct sk_buff *skb);
>  #ifdef CONFIG_INET
>  void sock_edemux(struct sk_buff *skb);
>  #else
> -#define sock_edemux(skb) sock_efree(skb)
> +#define sock_edemux sock_efree
>  #endif
> 
>  int sock_setsockopt(struct socket *sock, int level, int op,
> 
> This commit is not marked 'Cc: stable' upstream, but is referenced
> in the one that was backported and looks like it might be appropriate
> for stable as well. Eric, can you clarify?

Yeah we should definitely merge this into -stable, sorry for not
noticing this during the backport.

^ permalink raw reply

* RE: [PATCH 1/3] Fix ERROR: trailing statements should be on next line
From: David Laight @ 2017-05-15 14:25 UTC (permalink / raw)
  To: 'Alex Williamson', Michael S. Tsirkin
  Cc: Maciek Fijalkowski, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <20170514212048.24ee7667@t450s.home>

From: Alex Williamson
> Sent: 15 May 2017 04:21
...
> > >  	/* Find end of list, sew whole thing into vi->rq.pages. */
> > > -	for (end = page; end->private; end = (struct page *)end->private);
> > > +	for (end = page; end->private; end = (struct page *)end->private)
> > > +		;
> 
> FWIW, I generally like to put a comment on the next line to make it
> abundantly clear that there's nothing in the body of the loop, it's
> also more aesthetically pleasing than a semi-colon on the line by
> itself, ex. /* Nothing */;  It's just too easy to misinterpret the
> loop otherwise, especially without gratuitous white space.  Thanks,

My preference is to put 'continue;' on a line by itself.
Or even move the termination condition into the loop:
	for (end = page;; end = (struct page *)end->private)
		if (!end->private)
			break;

(oh, is that cast needed??)

	David

^ permalink raw reply

* switchdev offload & ecmp
From: Nicolas Dichtel @ 2017-05-15 14:25 UTC (permalink / raw)
  To: Jiri Pirko, Ido Schimmel; +Cc: Nikolay Aleksandrov, Roopa Prabhu, netdev

Hi Jiri and Ido,

I'm trying to understand how ecmp offloading works. It seems that rocker doesn't
support it:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/rocker/rocker_ofdpa.c#n2409.
But I saw that the support was added in spectrum:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?h=684a95c064fc.

Is there a consistency between the ecmp algorithm of the kernel and the one from
spectrum?
I suspect that there can be scenarii where some packets of a flow are forwarded
by the driver and some other are forwarded by the kernel.
For example, an ecmp route with two nexthops: a connected route and a gw? In
that case, the periodic nexthops update
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c#n987)
won't help. How do you ensure that all packets of the flow are always forwarded
through the same nexthop?

Regards,
Nicolas

^ permalink raw reply

* Re: [PATCH] net: ethernet: stmmac: properly set PS bit in MII configurations during reset
From: Thomas Petazzoni @ 2017-05-15 14:27 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: Corentin Labbe, Alexandre Torgue, netdev, stable
In-Reply-To: <20170510091817.58f4e843@free-electrons.com>

Hello Giuseppe,

On Wed, 10 May 2017 09:18:17 +0200, Thomas Petazzoni wrote:

> On Wed, 10 May 2017 09:03:12 +0200, Giuseppe CAVALLARO wrote:
> 
> > > Please, read again my patch and the description of the problem that I
> > > have sent. But basically, any solution that does not allow to set the
> > > PS bit between asserting the DMA reset bit and polling for it to clear
> > > will not work for MII PHYs.    
> > 
> > yes your point was clear to me, I was just wondering if we could find an 
> > easier way
> > to solve it w/o changing the API, adding  the set_ps and propagating the 
> > "interface"
> > inside the DMA reset.
> > 
> > Maybe this could be fixed in the glue-logic in some way. Let me know 
> > what do you think.  
> 
> Well, it's more up to you to tell me how you would like this be solved.
> We figured out what the problem was, but I don't know well enough the
> architecture of the driver to decide how the solution to this problem
> should be designed. I made an initial simple proposal to show what is
> needed, but I'm definitely open to suggestions.

Do you have any suggestion on how to move forward with this?

Thanks a lot,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH] rt2x00: improve calling conventions for register accessors
From: Stanislaw Gruszka @ 2017-05-15 14:28 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Helmut Schaa, Kalle Valo, Daniel Golle, Mathias Kresin,
	Johannes Berg, Tomislav Požega, Serge Vasilugin,
	Roman Yeryomin, linux-wireless, netdev, linux-kernel
In-Reply-To: <20170515134711.2770374-1-arnd@arndb.de>

On Mon, May 15, 2017 at 03:46:55PM +0200, Arnd Bergmann wrote:
> With CONFIG_KASAN enabled and gcc-7, we get a warning about rather high
> stack usage (with a private patch set I have to turn on this warning,
> which I intend to get into the next kernel release):
> 
> wireless/ralink/rt2x00/rt2800lib.c: In function 'rt2800_bw_filter_calibration':
> wireless/ralink/rt2x00/rt2800lib.c:7990:1: error: the frame size of 2144 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]
> 
> The problem is that KASAN inserts a redzone around each local variable that
> gets passed by reference, and the newly added function has a lot of them.
> We can easily avoid that here by changing the calling convention to have
> the output as the return value of the function. This should also results in
> smaller object code, saving around 4KB in .text with KASAN, or 2KB without
> KASAN.
> 
> Fixes: 41977e86c984 ("rt2x00: add support for MT7620")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  drivers/net/wireless/ralink/rt2x00/rt2800lib.c | 319 +++++++++++++------------
>  1 file changed, 164 insertions(+), 155 deletions(-)

We have read(, &val) calling convention since forever in rt2x00 and that
was never a problem. I dislike to change that now to make some tools
happy, I think problem should be fixed in the tools instead.

Stanislaw

^ permalink raw reply

* Re: [PATCH] dt-bindings: net: move FMan binding
From: David Miller @ 2017-05-15 14:30 UTC (permalink / raw)
  To: madalin.bucur
  Cc: robh+dt, mark.rutland, oss, benh, paulus, mpe, linuxppc-dev,
	linux-arm-kernel, devicetree, netdev, linux-kernel
In-Reply-To: <1494855398-26215-1-git-send-email-madalin.bucur@nxp.com>

From: Madalin Bucur <madalin.bucur@nxp.com>
Date: Mon, 15 May 2017 16:36:38 +0300

> Besides the PPC SoCs, the QorIQ DPAA FMan is also present on ARM SoCs,
> moving the device tree binding document into the bindings/net folder.
> 
> Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>

What tree is this targetted at for merging?

^ permalink raw reply

* Re: [PATCH net v2] qed: Fix uninitialized data in aRFS infrastructure
From: Arnd Bergmann @ 2017-05-15 14:35 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: David Miller, Networking
In-Reply-To: <1494753683-3429-1-git-send-email-Yuval.Mintz@cavium.com>

On Sun, May 14, 2017 at 11:21 AM, Yuval Mintz <Yuval.Mintz@cavium.com> wrote:
> Current memset is using incorrect type of variable, causing the
> upper-half of the strucutre to be left uninitialized and causing:
>
>   ethernet/qlogic/qed/qed_init_fw_funcs.c: In function 'qed_set_rfs_mode_disable':
>   ethernet/qlogic/qed/qed_init_fw_funcs.c:993:3: error: '*((void *)&ramline+4)' is used uninitialized in this function [-Werror=uninitialized]
>
> Fixes: d51e4af5c209 ("qed: aRFS infrastructure support")
> Reported-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>

> I don't see the original warning you've reported [probably due to
> compiler differences], so other than via code reading I couldn't
> actually test this. Can you please compile-test this one?

I don't see the warning any more either with the compiler and defconfig
I used last time, maybe some compiler flags have changed in the
meantime.

Anyway, your patch is definitely required to make it work correctly, so

Reviewed-by: Arnd Bergmann <arnd@arndb.de>

       Arnd

^ permalink raw reply

* Re: [PATCH] rt2x00: improve calling conventions for register accessors
From: Arnd Bergmann @ 2017-05-15 14:36 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Helmut Schaa, Kalle Valo, Daniel Golle, Mathias Kresin,
	Johannes Berg, Tomislav Požega, Serge Vasilugin,
	Roman Yeryomin, linux-wireless, Networking,
	Linux Kernel Mailing List
In-Reply-To: <20170515142520.GA13996@redhat.com>

On Mon, May 15, 2017 at 4:28 PM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> On Mon, May 15, 2017 at 03:46:55PM +0200, Arnd Bergmann wrote:
>> With CONFIG_KASAN enabled and gcc-7, we get a warning about rather high
>> stack usage (with a private patch set I have to turn on this warning,
>> which I intend to get into the next kernel release):
>>
>> wireless/ralink/rt2x00/rt2800lib.c: In function 'rt2800_bw_filter_calibration':
>> wireless/ralink/rt2x00/rt2800lib.c:7990:1: error: the frame size of 2144 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]
>>
>> The problem is that KASAN inserts a redzone around each local variable that
>> gets passed by reference, and the newly added function has a lot of them.
>> We can easily avoid that here by changing the calling convention to have
>> the output as the return value of the function. This should also results in
>> smaller object code, saving around 4KB in .text with KASAN, or 2KB without
>> KASAN.
>>
>> Fixes: 41977e86c984 ("rt2x00: add support for MT7620")
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>> ---
>>  drivers/net/wireless/ralink/rt2x00/rt2800lib.c | 319 +++++++++++++------------
>>  1 file changed, 164 insertions(+), 155 deletions(-)
>
> We have read(, &val) calling convention since forever in rt2x00 and that
> was never a problem. I dislike to change that now to make some tools
> happy, I think problem should be fixed in the tools instead.

How about adding 'depends on !KASAN' in Kconfig instead?

       Arnd

^ permalink raw reply

* Re: [PATCH] bridge: netlink: check vlan_default_pvid range
From: Tobias Jungel @ 2017-05-15 14:38 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Sabrina Dubroca
  Cc: Stephen Hemminger, David S. Miller, netdev
In-Reply-To: <54d47a4c-2bbc-ba5c-e147-73c2f48b3cc6@cumulusnetworks.com>

On Mon, 2017-05-15 at 16:31 +0300, Nikolay Aleksandrov wrote:
> On 5/15/17 4:29 PM, Nikolay Aleksandrov wrote:
> > On 5/15/17 4:21 PM, Tobias Jungel wrote:
> > > Thanks Sabrina and Nik.
> > > 
> > > On Mon, 2017-05-15 at 14:01 +0200, Sabrina Dubroca wrote:
> > > > Hi Tobias,
> > > > 
> > > > 2017-05-15, 13:08:19 +0200, Tobias Jungel wrote:
> > > > > Currently it is allowed to set the default pvid of a bridge
> > > > > to a
> > > > > value
> > > > > above VLAN_VID_MASK (0xfff). This patch checks the passed
> > > > > pvid and
> > > > > disables the pvid in case it is out of bounds.
> > > > 
> > > > Could we return an error (-EINVAL) to userspace
> > > > instead?  Silently
> > > > disabling the feature seems confusing to me.  This would
> > > > probably be
> > > > better in br_validate() (like the IFLA_BR_VLAN_PROTOCOL check),
> > > > since
> > > > there's already such a check when setting default_pvid from
> > > > sysfs (in
> > > > br_vlan_set_default_pvid()).
> > > 
> > > I will send a v2 that returns -EINVAL. br_validate seems to be
> > > the
> > > wrong place to me since it deals with the bridge ports.
> > > 
> > 
> > Could you elaborate ? br_validate should be called for all and is a
> > very good
> > suggestion.
> 
> I meant for the bridge newlink/changelink of course. :-)

Sorry had a wrong understanding of that function. Will come up with a
v3 later.

> 
> > 
> > > > 
> > > > > 
> > > > > Reproduce by calling:
> > > > > 
> > > > > [root@test ~]# ip l a type bridge
> > > > > [root@test ~]# ip l a type dummy
> > > > > [root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
> > > > > [root@test ~]# ip l s bridge0 type bridge vlan_default_pvid
> > > > > 9999
> > > > > [root@test ~]# ip l s dummy0 master bridge0
> > > > > [root@test ~]# bridge vlan
> > > > > port    vlan ids
> > > > > bridge0     9999 PVID Egress Untagged
> > > > > 
> > > > > dummy0     9999 PVID Egress Untagged
> > > > 
> > > > You'll also need to add a Signed-off-by, and a Fixes tag would
> > > > be
> > > > nice.
> > > > 
> > > 
> > > Right, will add this as well.
> > > 
> > > > 
> > > > Thanks,
> > > > 
> 
> 

^ permalink raw reply

* Re: [PATCH] rt2x00: improve calling conventions for register accessors
From: David Miller @ 2017-05-15 14:39 UTC (permalink / raw)
  To: sgruszka
  Cc: arnd, helmut.schaa, kvalo, daniel, dev, johannes.berg,
	pozega.tomislav, vasilugin, roman, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <20170515142520.GA13996@redhat.com>

From: Stanislaw Gruszka <sgruszka@redhat.com>
Date: Mon, 15 May 2017 16:28:01 +0200

> On Mon, May 15, 2017 at 03:46:55PM +0200, Arnd Bergmann wrote:
>> With CONFIG_KASAN enabled and gcc-7, we get a warning about rather high
>> stack usage (with a private patch set I have to turn on this warning,
>> which I intend to get into the next kernel release):
>> 
>> wireless/ralink/rt2x00/rt2800lib.c: In function 'rt2800_bw_filter_calibration':
>> wireless/ralink/rt2x00/rt2800lib.c:7990:1: error: the frame size of 2144 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]
>> 
>> The problem is that KASAN inserts a redzone around each local variable that
>> gets passed by reference, and the newly added function has a lot of them.
>> We can easily avoid that here by changing the calling convention to have
>> the output as the return value of the function. This should also results in
>> smaller object code, saving around 4KB in .text with KASAN, or 2KB without
>> KASAN.
>> 
>> Fixes: 41977e86c984 ("rt2x00: add support for MT7620")
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>> ---
>>  drivers/net/wireless/ralink/rt2x00/rt2800lib.c | 319 +++++++++++++------------
>>  1 file changed, 164 insertions(+), 155 deletions(-)
> 
> We have read(, &val) calling convention since forever in rt2x00 and that
> was never a problem. I dislike to change that now to make some tools
> happy, I think problem should be fixed in the tools instead.

Passing return values by reference is and always has been a really
poor way to achieve what these functions are doing.

And frankly, whilst the tool could see what's going on here better, we
should be making code easier rather than more difficult to audit.

I am therefore very much in favor of Arnd's change.

This isn't even a situation where there are multiple return values,
such as needing to signal an error and return an unsigned value at the
same time.

These functions return _one_ value, and therefore they should be
returned as a true return value.

Thanks.

^ permalink raw reply

* Re: [PATCH] rt2x00: improve calling conventions for register accessors
From: David Miller @ 2017-05-15 14:40 UTC (permalink / raw)
  To: arnd
  Cc: sgruszka, helmut.schaa, kvalo, daniel, dev, johannes.berg,
	pozega.tomislav, vasilugin, roman, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <CAK8P3a1xYah7_4=DAcN+M83noUQyLZ9hV=WWD0Q79sQrFsoroA@mail.gmail.com>

From: Arnd Bergmann <arnd@arndb.de>
Date: Mon, 15 May 2017 16:36:45 +0200

> On Mon, May 15, 2017 at 4:28 PM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>> On Mon, May 15, 2017 at 03:46:55PM +0200, Arnd Bergmann wrote:
>>> With CONFIG_KASAN enabled and gcc-7, we get a warning about rather high
>>> stack usage (with a private patch set I have to turn on this warning,
>>> which I intend to get into the next kernel release):
>>>
>>> wireless/ralink/rt2x00/rt2800lib.c: In function 'rt2800_bw_filter_calibration':
>>> wireless/ralink/rt2x00/rt2800lib.c:7990:1: error: the frame size of 2144 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]
>>>
>>> The problem is that KASAN inserts a redzone around each local variable that
>>> gets passed by reference, and the newly added function has a lot of them.
>>> We can easily avoid that here by changing the calling convention to have
>>> the output as the return value of the function. This should also results in
>>> smaller object code, saving around 4KB in .text with KASAN, or 2KB without
>>> KASAN.
>>>
>>> Fixes: 41977e86c984 ("rt2x00: add support for MT7620")
>>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>>> ---
>>>  drivers/net/wireless/ralink/rt2x00/rt2800lib.c | 319 +++++++++++++------------
>>>  1 file changed, 164 insertions(+), 155 deletions(-)
>>
>> We have read(, &val) calling convention since forever in rt2x00 and that
>> was never a problem. I dislike to change that now to make some tools
>> happy, I think problem should be fixed in the tools instead.
> 
> How about adding 'depends on !KASAN' in Kconfig instead?

Please let's not go down that route and make such facilities less
useful due to decreased coverage.

Thanks.

^ permalink raw reply

* Re: [PATCH net] net: netcp: fix check of requested timestamping filter
From: Richard Cochran @ 2017-05-15 15:03 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: netdev, WingMan Kwok
In-Reply-To: <20170515140436.12175-1-mlichvar@redhat.com>

On Mon, May 15, 2017 at 04:04:36PM +0200, Miroslav Lichvar wrote:
> The driver doesn't support timestamping of all received packets and
> should return error when trying to enable the HWTSTAMP_FILTER_ALL
> filter.
> 
> Cc: WingMan Kwok <w-kwok2@ti.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>

Acked-by: Richard Cochran <richardcochran@gmail.com>

^ permalink raw reply

* Re: [PATCH] usbnet: no address filtering on RNDIS
From: Oliver Neukum @ 2017-05-15 15:08 UTC (permalink / raw)
  To: David Miller, bjorn; +Cc: netdev
In-Reply-To: <20170515.101658.471352019262168789.davem@davemloft.net>

Am Montag, den 15.05.2017, 10:16 -0400 schrieb David Miller:
> From: Bjørn Mork <bjorn@mork.no>
> Date: Mon, 15 May 2017 14:20:20 +0200
> 
> > Oliver Neukum <oneukum@suse.com> writes:
> > 
> >> RNDIS does not do multicast filtering and the commands crash a few devices.
> >> Make it conditional.
> > 
> > 
> > Strange.  I thought we already discussed this when the filter reset
> > request was initially added 3 years ago.  Ref
> > https://patchwork.ozlabs.org/patch/374137/
> 
> I think the suggestion at the end of that thread is a better way to
> handle this.

Indeed. I will redo the patch.

	Regards
		Oliver

^ permalink raw reply

* Re: iproute2 build error due to sr-ipv6 lwtunnel
From: Stephen Hemminger @ 2017-05-15 15:24 UTC (permalink / raw)
  To: David Lebrun; +Cc: Daniel Borkmann, netdev
In-Reply-To: <6c156160-706e-7f87-117e-5c2c9784ec6f@uclouvain.be>

[-- Attachment #1: Type: text/plain, Size: 427 bytes --]

On Mon, 15 May 2017 09:56:18 +0200
David Lebrun <david.lebrun@uclouvain.be> wrote:

> On 05/14/2017 03:26 PM, Daniel Borkmann wrote:
> > 
> > David, are you still looking into fixing this?  
> 
> Sorry, I left the issue on the side and then forgot about it.
> 
> Could you try the following iproute2 patch ? Thanks

The headers in iproute2 come from kernel headers.
Any changes need to be done upstream in kernel.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH V3 net 0/1] net/smc and the RDMA core
From: Ursula Braun @ 2017-05-15 15:33 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: hch-jcswGhMUV9g, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA,
	jwi-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	heiko.carstens-tA70FqPdS9bQT0dZR+AlfA,
	raspl-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

From: Ursula Braun <ursula.braun-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>

Hi Dave,

as requested, here is V3 of the smc-patch with an updated commit log.

V3: update commit log
V2: do not use _internal_mr
V1: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY

Kind regards, Ursula

Ursula Braun (1):
  smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY

 net/smc/smc_clc.c  |  4 ++--
 net/smc/smc_core.c | 16 +++-------------
 net/smc/smc_core.h |  2 +-
 net/smc/smc_ib.c   | 21 ++-------------------
 net/smc/smc_ib.h   |  2 --
 5 files changed, 8 insertions(+), 37 deletions(-)

-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH V3 net 1/1] smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY
From: Ursula Braun @ 2017-05-15 15:33 UTC (permalink / raw)
  To: davem
  Cc: hch, netdev, linux-rdma, linux-s390, jwi, schwidefsky,
	heiko.carstens, raspl, ubraun
In-Reply-To: <20170515153337.31262-1-ubraun@linux.vnet.ibm.com>

Currently, SMC enables remote access to physical memory when a user
has successfully configured and established an SMC-connection until ten
minutes after the last SMC connection is closed. Because this is considered
a security risk, drivers are supposed to use IB_PD_UNSAFE_GLOBAL_RKEY in
such a case.

This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY.
This improves user awareness, but does not remove the security risk itself.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
---
 net/smc/smc_clc.c  |  4 ++--
 net/smc/smc_core.c | 16 +++-------------
 net/smc/smc_core.h |  2 +-
 net/smc/smc_ib.c   | 21 ++-------------------
 net/smc/smc_ib.h   |  2 --
 5 files changed, 8 insertions(+), 37 deletions(-)

diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index e41f594..03ec058 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -204,7 +204,7 @@ int smc_clc_send_confirm(struct smc_sock *smc)
 	memcpy(&cclc.lcl.mac, &link->smcibdev->mac[link->ibport - 1], ETH_ALEN);
 	hton24(cclc.qpn, link->roce_qp->qp_num);
 	cclc.rmb_rkey =
-		htonl(conn->rmb_desc->mr_rx[SMC_SINGLE_LINK]->rkey);
+		htonl(conn->rmb_desc->rkey[SMC_SINGLE_LINK]);
 	cclc.conn_idx = 1; /* for now: 1 RMB = 1 RMBE */
 	cclc.rmbe_alert_token = htonl(conn->alert_token_local);
 	cclc.qp_mtu = min(link->path_mtu, link->peer_mtu);
@@ -256,7 +256,7 @@ int smc_clc_send_accept(struct smc_sock *new_smc, int srv_first_contact)
 	memcpy(&aclc.lcl.mac, link->smcibdev->mac[link->ibport - 1], ETH_ALEN);
 	hton24(aclc.qpn, link->roce_qp->qp_num);
 	aclc.rmb_rkey =
-		htonl(conn->rmb_desc->mr_rx[SMC_SINGLE_LINK]->rkey);
+		htonl(conn->rmb_desc->rkey[SMC_SINGLE_LINK]);
 	aclc.conn_idx = 1;			/* as long as 1 RMB = 1 RMBE */
 	aclc.rmbe_alert_token = htonl(conn->alert_token_local);
 	aclc.qp_mtu = link->path_mtu;
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 65020e9..3ac09a6 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -613,19 +613,8 @@ int smc_rmb_create(struct smc_sock *smc)
 			rmb_desc = NULL;
 			continue; /* if mapping failed, try smaller one */
 		}
-		rc = smc_ib_get_memory_region(lgr->lnk[SMC_SINGLE_LINK].roce_pd,
-					      IB_ACCESS_REMOTE_WRITE |
-					      IB_ACCESS_LOCAL_WRITE,
-					     &rmb_desc->mr_rx[SMC_SINGLE_LINK]);
-		if (rc) {
-			smc_ib_buf_unmap(lgr->lnk[SMC_SINGLE_LINK].smcibdev,
-					 tmp_bufsize, rmb_desc,
-					 DMA_FROM_DEVICE);
-			kfree(rmb_desc->cpu_addr);
-			kfree(rmb_desc);
-			rmb_desc = NULL;
-			continue;
-		}
+		rmb_desc->rkey[SMC_SINGLE_LINK] =
+			lgr->lnk[SMC_SINGLE_LINK].roce_pd->unsafe_global_rkey;
 		rmb_desc->used = 1;
 		write_lock_bh(&lgr->rmbs_lock);
 		list_add(&rmb_desc->list,
@@ -668,6 +657,7 @@ int smc_rmb_rtoken_handling(struct smc_connection *conn,
 
 	for (i = 0; i < SMC_RMBS_PER_LGR_MAX; i++) {
 		if ((lgr->rtokens[i][SMC_SINGLE_LINK].rkey == rkey) &&
+		    (lgr->rtokens[i][SMC_SINGLE_LINK].dma_addr == dma_addr) &&
 		    test_bit(i, lgr->rtokens_used_mask)) {
 			conn->rtoken_idx = i;
 			return 0;
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 27eb3805..b013cb4 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -93,7 +93,7 @@ struct smc_buf_desc {
 	u64			dma_addr[SMC_LINKS_PER_LGR_MAX];
 						/* mapped address of buffer */
 	void			*cpu_addr;	/* virtual address of buffer */
-	struct ib_mr		*mr_rx[SMC_LINKS_PER_LGR_MAX];
+	u32			rkey[SMC_LINKS_PER_LGR_MAX];
 						/* for rmb only:
 						 * rkey provided to peer
 						 */
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index cb69ab9..b317155 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -37,24 +37,6 @@ u8 local_systemid[SMC_SYSTEMID_LEN] = SMC_LOCAL_SYSTEMID_RESET;	/* unique system
 								 * identifier
 								 */
 
-int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags,
-			     struct ib_mr **mr)
-{
-	int rc;
-
-	if (*mr)
-		return 0; /* already done */
-
-	/* obtain unique key -
-	 * next invocation of get_dma_mr returns a different key!
-	 */
-	*mr = pd->device->get_dma_mr(pd, access_flags);
-	rc = PTR_ERR_OR_ZERO(*mr);
-	if (IS_ERR(*mr))
-		*mr = NULL;
-	return rc;
-}
-
 static int smc_ib_modify_qp_init(struct smc_link *lnk)
 {
 	struct ib_qp_attr qp_attr;
@@ -210,7 +192,8 @@ int smc_ib_create_protection_domain(struct smc_link *lnk)
 {
 	int rc;
 
-	lnk->roce_pd = ib_alloc_pd(lnk->smcibdev->ibdev, 0);
+	lnk->roce_pd = ib_alloc_pd(lnk->smcibdev->ibdev,
+				   IB_PD_UNSAFE_GLOBAL_RKEY);
 	rc = PTR_ERR_OR_ZERO(lnk->roce_pd);
 	if (IS_ERR(lnk->roce_pd))
 		lnk->roce_pd = NULL;
diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h
index 7e1f0e2..b567152 100644
--- a/net/smc/smc_ib.h
+++ b/net/smc/smc_ib.h
@@ -61,8 +61,6 @@ void smc_ib_dealloc_protection_domain(struct smc_link *lnk);
 int smc_ib_create_protection_domain(struct smc_link *lnk);
 void smc_ib_destroy_queue_pair(struct smc_link *lnk);
 int smc_ib_create_queue_pair(struct smc_link *lnk);
-int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags,
-			     struct ib_mr **mr);
 int smc_ib_ready_link(struct smc_link *lnk);
 int smc_ib_modify_qp_rts(struct smc_link *lnk);
 int smc_ib_modify_qp_reset(struct smc_link *lnk);
-- 
2.10.2

^ permalink raw reply related

* Re: [PATCH 2/3] bpf: Track alignment of MAP pointers in verifier.
From: David Miller @ 2017-05-15 15:34 UTC (permalink / raw)
  To: daniel; +Cc: ast, alexei.starovoitov, netdev
In-Reply-To: <5919A8AA.9010401@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Mon, 15 May 2017 15:10:02 +0200

>>> What are the semantics of using id here? In ptr_to_pkt, we have it,
>>> so that eventually, in find_good_pkt_pointers() we can match on id
>>> and update the range for all such regs with the same id. I'm just
>>> wondering as the side effect of this is that this makes state
>>> pruning worse.

Daniel, I looked at the state pruning for maps.  The situation is
quite interesting.

Once we have env->varlen_map_value_access (and load or store via a
PTR_TO_MAP_VALUE_ADJ pointer), the state pruning gets more strict, the
relevant tests are:

		if (memcmp(rold, rcur, sizeof(*rold)) == 0)
			continue;

		/* If the ranges were not the same, but everything else was and
		 * we didn't do a variable access into a map then we are a-ok.
		 */
		if (!varlen_map_access &&
		    memcmp(rold, rcur, offsetofend(struct bpf_reg_state, id)) == 0)
			continue;

The first memcmp() is not going to match any time we adjust any
component of a MAP pointer reg.  The offset, the alignment, anything.
That means any side effect whatsoever performed by check_pointer_add()
even if we changed the code to not modify reg->id

The second check elides:

	s64 min_value;
	u64 max_value;
	u32 min_align;
	u32 aux_off;
	u32 aux_off_align;

from the comparison but only if we haven't done a variable length
MAP access.

The only conclusion I can come to is that changing reg->id for
PTR_TO_MAP_VALUE_ADJ has no side effect for state pruning, unless we
perform PTR_TO_MAP_VALUE_ADJ register adjustments without ever
accessing the map via that pointer in the entire program.

I could add some new state to avoid the reg->id change, but given
the above I don't think that it is really necessary.

^ permalink raw reply

* Re: [PATCH net-next 1/3] net/sock: factor out dequeue/peek with offset code
From: Eric Dumazet @ 2017-05-15 16:02 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, David S. Miller, Eric Dumazet
In-Reply-To: <503e324ecd4085f256474df1a352a92814fd29f4.1494837879.git.pabeni@redhat.com>

On Mon, 2017-05-15 at 11:01 +0200, Paolo Abeni wrote:
> And update __sk_queue_drop_skb() to work on the specified queue.
> This will help the udp protocol to use an additional private
> rx queue in a later patch.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* [PATCH v2 0/3] bpf: Track MAP pointer alignment
From: David Miller @ 2017-05-15 16:04 UTC (permalink / raw)
  To: daniel; +Cc: ast, alexei.starovoitov, netdev

This patch series updates the BPF verifier to properly track MAP
access alignment, just as we already do for packet pointers.

The two main elements of the implementation are putting the register
offset into a shared rather than reg type specific area of the
verifier register tracking strcutre, and also eliding packet pointer
specific checks when managing MAP pointer alignment.

More details of the implementation are in the commit message for patch
#2.

Patch #3 removes alignment restrictions from various test_verifier.c
tests which are no longer necessary with this MAP alignment
infrastructure in place.  Further improvements are possible, but
I didn't want to be overly ambitious here.

In the future we can add some MAP tests to test_align.c as well.

changes since v1:
 - Enforce offset limits just like for packet pointers
 - Handle "CONST_IMM += MAP_VALUE_ADJ"
 - Document effects on state pruning in commit message of patch #2

Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
From: David Miller @ 2017-05-15 16:04 UTC (permalink / raw)
  To: daniel; +Cc: ast, alexei.starovoitov, netdev


If we use 1<<31, then sequences like:

	R1 = 0
	R1 <<= 2

do silly things.  Examples of this actually exist in
the MAP tests of test_verifier.c

Update test_align.c expectation strings.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 kernel/bpf/verifier.c                    |  2 +-
 tools/testing/selftests/bpf/test_align.c | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 39f2dcb..0f33db0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1715,7 +1715,7 @@ static void check_reg_overflow(struct bpf_reg_state *reg)
 static u32 calc_align(u32 imm)
 {
 	if (!imm)
-		return 1U << 31;
+		return 1U << 16;
 	return imm - ((imm - 1) & imm);
 }
 
diff --git a/tools/testing/selftests/bpf/test_align.c b/tools/testing/selftests/bpf/test_align.c
index 9644d4e..afaa297 100644
--- a/tools/testing/selftests/bpf/test_align.c
+++ b/tools/testing/selftests/bpf/test_align.c
@@ -235,12 +235,12 @@ static struct bpf_align_test tests[] = {
 		},
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 		.matches = {
-			"4: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=0,r=0) R10=fp",
-			"5: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=14,r=0) R10=fp",
-			"6: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R4=pkt(id=0,off=14,r=0) R5=pkt(id=0,off=14,r=0) R10=fp",
-			"10: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv56 R5=pkt(id=0,off=14,r=18) R10=fp",
-			"14: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) R10=fp",
-			"15: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) R10=fp",
+			"4: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=0,r=0) R10=fp",
+			"5: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=14,r=0) R10=fp",
+			"6: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R4=pkt(id=0,off=14,r=0) R5=pkt(id=0,off=14,r=0) R10=fp",
+			"10: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv56 R5=pkt(id=0,off=14,r=18) R10=fp",
+			"14: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) R10=fp",
+			"15: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) R10=fp",
 		},
 	},
 	{
-- 
2.1.2.532.g19b5d50

^ permalink raw reply related

* [PATCH v2 2/3] bpf: Track alignment of MAP pointers in verifier.
From: David Miller @ 2017-05-15 16:04 UTC (permalink / raw)
  To: daniel; +Cc: ast, alexei.starovoitov, netdev


Just like packet pointers, track the known alignment of MAP pointers.

In order to facilitate the state tracking, move the register offset
field into where there is an unused 32-bit padding slot on 64-bit.

The check logic is the same as for packet pointers, except we do not
apply NET_IP_ALIGN to the calculations.

Also, there are several restrictions that apply to packet pointers
which we do not extend to MAP pointers.  For example, the
MAX_PACKET_OFF limitation and the "adding integer with < 48 upper zero
bits" thing.

When we add a variable to the MAP pointer, all of the state
transitions are identical except that we elide the reg->range clear
because it is a packet pointer specific piece of state.

The side effects here should not make state pruning worse.  The only
case where state pruning would be less effective is if we performed
arithmatic on MAP pointers but never actually accesses them.

This changes the string emitted when an unaligned access is trapped by
the verifier.  Therefore, we need to adjust the search string used by
test_verifier.c

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/bpf_verifier.h                |   6 +-
 kernel/bpf/verifier.c                       | 127 ++++++++++++++++++++--------
 tools/testing/selftests/bpf/test_verifier.c |   3 +-
 3 files changed, 94 insertions(+), 42 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index d5093b5..acf711b 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -18,15 +18,13 @@
 
 struct bpf_reg_state {
 	enum bpf_reg_type type;
+	u32 off;
 	union {
 		/* valid when type == CONST_IMM | PTR_TO_STACK | UNKNOWN_VALUE */
 		s64 imm;
 
 		/* valid when type == PTR_TO_PACKET* */
-		struct {
-			u16 off;
-			u16 range;
-		};
+		u32 range;
 
 		/* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
 		 *   PTR_TO_MAP_VALUE_OR_NULL
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0f33db0..8c1ae23 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -231,10 +231,10 @@ static void print_verifier_state(struct bpf_verifier_state *state)
 		else if (t == CONST_PTR_TO_MAP || t == PTR_TO_MAP_VALUE ||
 			 t == PTR_TO_MAP_VALUE_OR_NULL ||
 			 t == PTR_TO_MAP_VALUE_ADJ)
-			verbose("(ks=%d,vs=%d,id=%u)",
+			verbose("(ks=%d,vs=%d,id=%u,off=%u)",
 				reg->map_ptr->key_size,
 				reg->map_ptr->value_size,
-				reg->id);
+				reg->id, reg->off);
 		if (reg->min_value != BPF_REGISTER_MIN_RANGE)
 			verbose(",min_value=%lld",
 				(long long)reg->min_value);
@@ -469,6 +469,7 @@ static void init_reg_state(struct bpf_reg_state *regs)
 
 	for (i = 0; i < MAX_BPF_REG; i++) {
 		regs[i].type = NOT_INIT;
+		regs[i].off = 0;
 		regs[i].imm = 0;
 		regs[i].min_value = BPF_REGISTER_MIN_RANGE;
 		regs[i].max_value = BPF_REGISTER_MAX_RANGE;
@@ -487,6 +488,7 @@ static void init_reg_state(struct bpf_reg_state *regs)
 static void __mark_reg_unknown_value(struct bpf_reg_state *regs, u32 regno)
 {
 	regs[regno].type = UNKNOWN_VALUE;
+	regs[regno].off = 0;
 	regs[regno].id = 0;
 	regs[regno].imm = 0;
 }
@@ -713,6 +715,7 @@ static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
 }
 
 #define MAX_PACKET_OFF 0xffff
+#define MAX_MAP_OFF KMALLOC_MAX_SIZE
 
 static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
 				       const struct bpf_call_arg_meta *meta,
@@ -823,10 +826,27 @@ static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg,
 }
 
 static int check_val_ptr_alignment(const struct bpf_reg_state *reg,
-				   int size, bool strict)
+				   int off, int size, bool strict)
 {
-	if (strict && size != 1) {
-		verbose("Unknown alignment. Only byte-sized access allowed in value access.\n");
+	int reg_off;
+
+	/* Byte size accesses are always allowed. */
+	if (!strict || size == 1)
+		return 0;
+
+	reg_off = reg->off;
+	if (reg->id) {
+		if (reg->aux_off_align % size) {
+			verbose("Value access is only %u byte aligned, %d byte access not allowed\n",
+				reg->aux_off_align, size);
+			return -EACCES;
+		}
+		reg_off += reg->aux_off;
+	}
+
+	if ((reg_off + off) % size != 0) {
+		verbose("misaligned value access off %d+%d size %d\n",
+			reg_off, off, size);
 		return -EACCES;
 	}
 
@@ -846,7 +866,7 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
 	case PTR_TO_PACKET:
 		return check_pkt_ptr_alignment(reg, off, size, strict);
 	case PTR_TO_MAP_VALUE_ADJ:
-		return check_val_ptr_alignment(reg, size, strict);
+		return check_val_ptr_alignment(reg, off, size, strict);
 	default:
 		if (off % size != 0) {
 			verbose("misaligned access off %d size %d\n",
@@ -1336,6 +1356,7 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
 		    reg->type != PTR_TO_PACKET_END)
 			continue;
 		reg->type = UNKNOWN_VALUE;
+		reg->off = 0;
 		reg->imm = 0;
 	}
 }
@@ -1415,6 +1436,7 @@ static int check_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
 	for (i = 0; i < CALLER_SAVED_REGS; i++) {
 		reg = regs + caller_saved[i];
 		reg->type = NOT_INIT;
+		reg->off = 0;
 		reg->imm = 0;
 	}
 
@@ -1458,8 +1480,8 @@ static int check_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
 	return 0;
 }
 
-static int check_packet_ptr_add(struct bpf_verifier_env *env,
-				struct bpf_insn *insn)
+static int check_pointer_add(struct bpf_verifier_env *env,
+			     struct bpf_insn *insn, bool is_packet)
 {
 	struct bpf_reg_state *regs = env->cur_state.regs;
 	struct bpf_reg_state *dst_reg = &regs[insn->dst_reg];
@@ -1468,81 +1490,98 @@ static int check_packet_ptr_add(struct bpf_verifier_env *env,
 	s32 imm;
 
 	if (BPF_SRC(insn->code) == BPF_K) {
-		/* pkt_ptr += imm */
+		int max_offset;
+
+		/* pointer += imm */
 		imm = insn->imm;
 
 add_imm:
 		if (imm < 0) {
-			verbose("addition of negative constant to packet pointer is not allowed\n");
+			verbose("addition of negative constant to pointer is not allowed\n");
 			return -EACCES;
 		}
-		if (imm >= MAX_PACKET_OFF ||
-		    imm + dst_reg->off >= MAX_PACKET_OFF) {
-			verbose("constant %d is too large to add to packet pointer\n",
-				imm);
+		max_offset = (is_packet ? MAX_PACKET_OFF : MAX_MAP_OFF);
+		if (imm >= max_offset ||
+		    imm + dst_reg->off >= max_offset) {
+			verbose("constant %d is too large to add to pointer\n", imm);
 			return -EACCES;
 		}
-		/* a constant was added to pkt_ptr.
+		/* a constant was added to the pointer.
 		 * Remember it while keeping the same 'id'
 		 */
 		dst_reg->off += imm;
 	} else {
-		bool had_id;
-
-		if (src_reg->type == PTR_TO_PACKET) {
-			/* R6=pkt(id=0,off=0,r=62) R7=imm22; r7 += r6 */
+		if ((is_packet && src_reg->type == PTR_TO_PACKET) ||
+		    (!is_packet && src_reg->type == PTR_TO_MAP_VALUE_ADJ)) {
+			/* R6={pkt,map}(id=0,off=0,r=62) R7=imm22; r7 += r6 */
 			tmp_reg = *dst_reg;  /* save r7 state */
-			*dst_reg = *src_reg; /* copy pkt_ptr state r6 into r7 */
+			*dst_reg = *src_reg; /* copy ptr state r6 into r7 */
 			src_reg = &tmp_reg;  /* pretend it's src_reg state */
 			/* if the checks below reject it, the copy won't matter,
 			 * since we're rejecting the whole program. If all ok,
 			 * then imm22 state will be added to r7
-			 * and r7 will be pkt(id=0,off=22,r=62) while
-			 * r6 will stay as pkt(id=0,off=0,r=62)
+			 * and r7 will be {pkt,map}(id=0,off=22,r=62) while
+			 * r6 will stay as {pkt,map}(id=0,off=0,r=62)
 			 */
 		}
 
 		if (src_reg->type == CONST_IMM) {
-			/* pkt_ptr += reg where reg is known constant */
+			/* pointer += reg where reg is known constant */
 			imm = src_reg->imm;
 			goto add_imm;
 		}
-		/* disallow pkt_ptr += reg
+		/* disallow pointer += reg
 		 * if reg is not uknown_value with guaranteed zero upper bits
-		 * otherwise pkt_ptr may overflow and addition will become
+		 * otherwise pointer_ptr may overflow and addition will become
 		 * subtraction which is not allowed
 		 */
 		if (src_reg->type != UNKNOWN_VALUE) {
-			verbose("cannot add '%s' to ptr_to_packet\n",
+			verbose("cannot add '%s' to pointer\n",
 				reg_type_str[src_reg->type]);
 			return -EACCES;
 		}
-		if (src_reg->imm < 48) {
+		if (is_packet && src_reg->imm < 48) {
 			verbose("cannot add integer value with %lld upper zero bits to ptr_to_packet\n",
 				src_reg->imm);
 			return -EACCES;
 		}
 
-		had_id = (dst_reg->id != 0);
-
-		/* dst_reg stays as pkt_ptr type and since some positive
+		/* dst_reg stays as the same type and since some positive
 		 * integer value was added to the pointer, increment its 'id'
 		 */
 		dst_reg->id = ++env->id_gen;
 
-		/* something was added to pkt_ptr, set range to zero */
 		dst_reg->aux_off += dst_reg->off;
 		dst_reg->off = 0;
-		dst_reg->range = 0;
-		if (had_id)
+
+		if (is_packet) {
+			/* something was added to packet ptr, set range to zero */
+			dst_reg->range = 0;
+		}
+		if (dst_reg->aux_off_align) {
 			dst_reg->aux_off_align = min(dst_reg->aux_off_align,
 						     src_reg->min_align);
-		else
+		} else {
 			dst_reg->aux_off_align = src_reg->min_align;
+		}
+		if (!dst_reg->aux_off_align)
+			dst_reg->aux_off_align = 1;
 	}
 	return 0;
 }
 
+static int check_packet_ptr_add(struct bpf_verifier_env *env,
+				struct bpf_insn *insn)
+{
+	return check_pointer_add(env, insn, true);
+}
+
+static int check_map_ptr_add(struct bpf_verifier_env *env,
+			     struct bpf_insn *insn)
+{
+	return check_pointer_add(env, insn, false);
+}
+
 static int evaluate_reg_alu(struct bpf_verifier_env *env, struct bpf_insn *insn)
 {
 	struct bpf_reg_state *regs = env->cur_state.regs;
@@ -2056,10 +2095,23 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		if (env->allow_ptr_leaks &&
 		    BPF_CLASS(insn->code) == BPF_ALU64 && opcode == BPF_ADD &&
 		    (dst_reg->type == PTR_TO_MAP_VALUE ||
-		     dst_reg->type == PTR_TO_MAP_VALUE_ADJ))
-			dst_reg->type = PTR_TO_MAP_VALUE_ADJ;
-		else
+		     dst_reg->type == PTR_TO_MAP_VALUE_ADJ ||
+		     (BPF_SRC(insn->code) == BPF_X &&
+		      (regs[insn->src_reg].type == PTR_TO_MAP_VALUE ||
+		       regs[insn->src_reg].type == PTR_TO_MAP_VALUE_ADJ)))) {
+			if (dst_reg->type == PTR_TO_MAP_VALUE) {
+				dst_reg->type = PTR_TO_MAP_VALUE_ADJ;
+			} else if (BPF_SRC(insn->code) == BPF_X) {
+				struct bpf_reg_state *src_reg;
+
+				src_reg = &regs[insn->src_reg];
+				if (src_reg->type == PTR_TO_MAP_VALUE)
+					src_reg->type = PTR_TO_MAP_VALUE_ADJ;
+			}
+			check_map_ptr_add(env, insn);
+		} else {
 			mark_reg_unknown_value(regs, insn->dst_reg);
+		}
 	}
 
 	return 0;
@@ -2480,6 +2532,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	for (i = 0; i < CALLER_SAVED_REGS; i++) {
 		reg = regs + caller_saved[i];
 		reg->type = NOT_INIT;
+		reg->off = 0;
 		reg->imm = 0;
 	}
 
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 3773562..889fb5a 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -5070,7 +5070,8 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 
 	reject_from_alignment = fd_prog < 0 &&
 				(test->flags & F_NEEDS_EFFICIENT_UNALIGNED_ACCESS) &&
-				strstr(bpf_vlog, "Unknown alignment.");
+				(strstr(bpf_vlog, "misaligned value access") ||
+				 strstr(bpf_vlog, "Value access is only"));
 #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 	if (reject_from_alignment) {
 		printf("FAIL\nFailed due to alignment despite having efficient unaligned access: '%s'!\n",
-- 
2.1.2.532.g19b5d50

^ permalink raw reply related

* [PATCH v2 3/3] bpf: Update MAP test_verifier.c tests wrt. alignment.
From: David Miller @ 2017-05-15 16:04 UTC (permalink / raw)
  To: daniel; +Cc: ast, alexei.starovoitov, netdev


Now that we perform proper alignment tracking of MAP accesses in the
verifier, most of the MAP tests in test_verifier.c no longer need the
special F_NEEDS_EFFICIENT_UNALIGNED_ACCESS flag or can trivially be
made to not need it.

struct test_val is an integer count followed by an array of integers.
So for most tests, we adjust the memory access to be 32-bit rather
than 64-bit.

The exception is for tests where the 64-bit access is an important
aspect for what the test is trying to validate.  For example,
the test "invalid map access into an array with a invalid max check"
is trying to do a 64-bit store at the final 4 bytes of the test
array which violates the valid access range.

We could fix this by arranging the test array size such that
the last entry is 8 byte aligned, but that is for future
consideration.

Two other tests which retain the flag are intentionally doing
unaligned accesses to a MAP element.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 tools/testing/selftests/bpf/test_verifier.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 889fb5a..73ac23a 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -3117,7 +3117,6 @@ static struct bpf_test tests[] = {
 		.errstr_unpriv = "R0 pointer arithmetic prohibited",
 		.result_unpriv = REJECT,
 		.result = ACCEPT,
-		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
 		"valid map access into an array with a variable",
@@ -3133,7 +3132,7 @@ static struct bpf_test tests[] = {
 			BPF_JMP_IMM(BPF_JGE, BPF_REG_1, MAX_ENTRIES, 3),
 			BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 2),
 			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
-			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0,
+			BPF_ST_MEM(BPF_W, BPF_REG_0, 0,
 				   offsetof(struct test_val, foo)),
 			BPF_EXIT_INSN(),
 		},
@@ -3141,7 +3140,6 @@ static struct bpf_test tests[] = {
 		.errstr_unpriv = "R0 pointer arithmetic prohibited",
 		.result_unpriv = REJECT,
 		.result = ACCEPT,
-		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
 		"valid map access into an array with a signed variable",
@@ -3161,7 +3159,7 @@ static struct bpf_test tests[] = {
 			BPF_MOV32_IMM(BPF_REG_1, 0),
 			BPF_ALU32_IMM(BPF_LSH, BPF_REG_1, 2),
 			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
-			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0,
+			BPF_ST_MEM(BPF_W, BPF_REG_0, 0,
 				   offsetof(struct test_val, foo)),
 			BPF_EXIT_INSN(),
 		},
@@ -3169,7 +3167,6 @@ static struct bpf_test tests[] = {
 		.errstr_unpriv = "R0 pointer arithmetic prohibited",
 		.result_unpriv = REJECT,
 		.result = ACCEPT,
-		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
 		"invalid map access into an array with a constant",
@@ -3211,7 +3208,6 @@ static struct bpf_test tests[] = {
 		.errstr = "R0 min value is outside of the array range",
 		.result_unpriv = REJECT,
 		.result = REJECT,
-		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
 		"invalid map access into an array with a variable",
@@ -3226,7 +3222,7 @@ static struct bpf_test tests[] = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0),
 			BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 2),
 			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
-			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0,
+			BPF_ST_MEM(BPF_W, BPF_REG_0, 0,
 				   offsetof(struct test_val, foo)),
 			BPF_EXIT_INSN(),
 		},
@@ -3235,7 +3231,6 @@ static struct bpf_test tests[] = {
 		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
 		.result_unpriv = REJECT,
 		.result = REJECT,
-		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
 		"invalid map access into an array with no floor check",
@@ -3253,7 +3248,7 @@ static struct bpf_test tests[] = {
 			BPF_MOV32_IMM(BPF_REG_1, 0),
 			BPF_ALU32_IMM(BPF_LSH, BPF_REG_1, 2),
 			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
-			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0,
+			BPF_ST_MEM(BPF_W, BPF_REG_0, 0,
 				   offsetof(struct test_val, foo)),
 			BPF_EXIT_INSN(),
 		},
@@ -3262,7 +3257,6 @@ static struct bpf_test tests[] = {
 		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
 		.result_unpriv = REJECT,
 		.result = REJECT,
-		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
 		"invalid map access into an array with a invalid max check",
@@ -3319,7 +3313,6 @@ static struct bpf_test tests[] = {
 		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
 		.result_unpriv = REJECT,
 		.result = REJECT,
-		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
 		"multiple registers share map_lookup_elem result",
@@ -3435,7 +3428,7 @@ static struct bpf_test tests[] = {
 			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 1),
 			BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 2),
 			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
-			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, offsetof(struct test_val, foo)),
+			BPF_ST_MEM(BPF_W, BPF_REG_0, 0, offsetof(struct test_val, foo)),
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
@@ -3443,7 +3436,6 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.errstr_unpriv = "R0 pointer arithmetic prohibited",
 		.result_unpriv = REJECT,
-		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
 		"constant register |= constant should keep constant type",
@@ -4835,7 +4827,6 @@ static struct bpf_test tests[] = {
 		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
 		.result = REJECT,
 		.result_unpriv = REJECT,
-		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
 		"invalid range check",
@@ -4867,7 +4858,6 @@ static struct bpf_test tests[] = {
 		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
 		.result = REJECT,
 		.result_unpriv = REJECT,
-		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
 		"map in map access",
-- 
2.1.2.532.g19b5d50

^ permalink raw reply related

* Re: [PATCH net-next 2/3] udp: use a separate rx queue for packet reception
From: Eric Dumazet @ 2017-05-15 16:10 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, David S. Miller, Eric Dumazet
In-Reply-To: <e97d459f9b0af848acde4a51d1ac434b15e5a6b7.1494837879.git.pabeni@redhat.com>

On Mon, 2017-05-15 at 11:01 +0200, Paolo Abeni wrote:
> under udp flood the sk_receive_queue spinlock is heavily contended.
> This patch try to reduce the contention on such lock adding a
> second receive queue to the udp sockets; recvmsg() looks first
> in such queue and, only if empty, tries to fetch the data from
> sk_receive_queue. The latter is spliced into the newly added
> queue every time the receive path has to acquire the
> sk_receive_queue lock.
> 
> The accounting of forward allocated memory is still protected with
> the sk_receive_queue lock, so udp_rmem_release() needs to acquire
> both locks when the forward deficit is flushed.
> 
> On specific scenarios we can end up acquiring and releasing the
> sk_receive_queue lock multiple times; that will be covered by
> the next patch
> 
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---

Very nice, thanks for working on this Paolo.

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH net-next 3/3] udp: keep the sk_receive_queue held when splicing
From: Eric Dumazet @ 2017-05-15 16:11 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, David S. Miller, Eric Dumazet
In-Reply-To: <41a898cfb2cf7ee618237a62bc48df1defed8957.1494837879.git.pabeni@redhat.com>

On Mon, 2017-05-15 at 11:01 +0200, Paolo Abeni wrote:
> On packet reception, when we are forced to splice the
> sk_receive_queue, we can keep the related lock held, so
> that we can avoid re-acquiring it, if fwd memory
> scheduling is required.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/ipv4/udp.c | 36 ++++++++++++++++++++++++++----------
>  1 file changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 492c76b..d698973 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1164,7 +1164,8 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
>  }
>  
>  /* fully reclaim rmem/fwd memory allocated for skb */
> -static void udp_rmem_release(struct sock *sk, int size, int partial)
> +static void udp_rmem_release(struct sock *sk, int size, int partial,
> +			     int rx_queue_lock_held)

Looks good, but please use a bool rx_queue_lock_held ?

Thanks !

^ permalink raw reply

* Re: switchdev offload & ecmp
From: Ido Schimmel @ 2017-05-15 16:40 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: Jiri Pirko, Nikolay Aleksandrov, Roopa Prabhu, netdev
In-Reply-To: <84b9801f-0c2a-cd55-61a4-411261943024@6wind.com>

Hi,

On Mon, May 15, 2017 at 04:25:43PM +0200, Nicolas Dichtel wrote:
> Hi Jiri and Ido,
> 
> I'm trying to understand how ecmp offloading works. It seems that rocker doesn't
> support it:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/rocker/rocker_ofdpa.c#n2409.
> But I saw that the support was added in spectrum:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?h=684a95c064fc.
> 
> Is there a consistency between the ecmp algorithm of the kernel and the one from
> spectrum?

We currently use the hardware's defaults for ECMP hashing, which include
both L3 and L4 fields. I'm aware of Nik's patch, but we've yet to
reflect that. Note that the L4 fields aren't considered for fragmented
packets.

> I suspect that there can be scenarii where some packets of a flow are forwarded
> by the driver and some other are forwarded by the kernel.

Can you elaborate? The kernel only sees specific packets, which were
trapped to the CPU. See:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/ethernet/mellanox/mlxsw/spectrum.c#n2996

> For example, an ecmp route with two nexthops: a connected route and a gw? 

Not sure I'm following you. A packet will either hit a remote route or a
directly connected one. We distinguish between the two based on the
scope of the first nexthop in the group. See:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c#n2043

> In that case, the periodic nexthops update
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c#n987)
> won't help. How do you ensure that all packets of the flow are always forwarded
> through the same nexthop?

I don't think we can ensure that for a flow in which some packets are
forwarded by the kernel and some by the device, but I failed to
understand your example of such a flow.

Thanks

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox