Netdev List
 help / color / mirror / Atom feed
* Re: Re: BUG: corrupted list in p9_read_work
From: Dominique Martinet @ 2018-10-10 16:00 UTC (permalink / raw)
  To: syzbot
  Cc: davem, dvyukov, ericvh, linux-kernel, lucho, netdev, rminnich,
	syzkaller-bugs, v9fs-developer
In-Reply-To: <00000000000010889e0577e0f5f2@google.com>

syzbot wrote on Wed, Oct 10, 2018:
>>> But note that syzbot can test fixes itself on request. It boils down
>>> to just giving it the patch and the base tree:
>>> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
>
>> .. and for clarifying that bit, let's try that! :)
>
>> #syz test: git://github.com/martinetd/linux
>> e4ca13f7d075e551dc158df6af18fb412a1dba0a
>
> "git://github.com/martinetd/linux" does not look like a valid git
> repo address.

It works though, is it just picky because I didn't end it in .git? let's
try again, sorry for the noise...

#syz test: git://github.com/martinetd/linux.git e4ca13f7d075e551dc158df6af18fb412a1dba0a

-- 
Dominique

^ permalink raw reply

* Re: BUG: corrupted list in p9_read_work
From: syzbot @ 2018-10-10 16:02 UTC (permalink / raw)
  To: asmadeus, davem, dvyukov, ericvh, linux-kernel, lucho, netdev,
	rminnich, syzkaller-bugs, v9fs-developer
In-Reply-To: <20181010160022.GD20918@nautica>

Hello,

syzbot tried to test the proposed patch but build/boot failed:

failed to checkout kernel repo git://github.com/martinetd/linux.git on  
commit e4ca13f7d075e551dc158df6af18fb412a1dba0a: failed to run  
["git" "checkout" "e4ca13f7d075e551dc158df6af18fb412a1dba0a"]: exit status  
128
fatal: reference is not a tree: e4ca13f7d075e551dc158df6af18fb412a1dba0a



Tested on:

commit:         [unknown]
git tree:       git://github.com/martinetd/linux.git  
e4ca13f7d075e551dc158df6af18fb412a1dba0a
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

^ permalink raw reply

* Re: [PATCH] selftests: bpf: add config fragment LWTUNNEL
From: Song Liu @ 2018-10-10 16:06 UTC (permalink / raw)
  To: Anders Roxell
  Cc: Alexei Starovoitov, Daniel Borkmann, shuah, Networking, open list,
	linux-kselftest
In-Reply-To: <20181010092123.26218-1-anders.roxell@linaro.org>

On Wed, Oct 10, 2018 at 2:23 AM Anders Roxell <anders.roxell@linaro.org> wrote:
>
> When test_lwt_seg6local.sh was added commit c99a84eac026
> ("selftests/bpf: test for seg6local End.BPF action") config fragment
> wasn't added, and without CONFIG_LWTUNNEL enabled we see this:
> Error: CONFIG_LWTUNNEL is not enabled in this kernel.
> selftests: test_lwt_seg6local [FAILED]
>
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  tools/testing/selftests/bpf/config | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
> index 3655508f95fd..dd49df5e2df4 100644
> --- a/tools/testing/selftests/bpf/config
> +++ b/tools/testing/selftests/bpf/config
> @@ -19,3 +19,4 @@ CONFIG_CRYPTO_SHA256=m
>  CONFIG_VXLAN=y
>  CONFIG_GENEVE=y
>  CONFIG_NET_CLS_FLOWER=m
> +CONFIG_LWTUNNEL=y
> --
> 2.19.1
>

^ permalink raw reply

* Re: [PATCH] selftests: bpf: install script with_addr.sh
From: Song Liu @ 2018-10-10 16:07 UTC (permalink / raw)
  To: Anders Roxell
  Cc: Alexei Starovoitov, Daniel Borkmann, shuah, Networking, open list,
	linux-kselftest
In-Reply-To: <20181010142704.26990-1-anders.roxell@linaro.org>

On Wed, Oct 10, 2018 at 7:28 AM Anders Roxell <anders.roxell@linaro.org> wrote:
>
> When test_flow_dissector.sh runs it complains that it can't find script
> with_addr.sh:
>
> ./test_flow_dissector.sh: line 81: ./with_addr.sh: No such file or
> directory
>
> Rework so that with_addr.sh gets installed, add it to
> TEST_PROGS_EXTENDED variable.
>
> Fixes: 50b3ed57dee9 ("selftests/bpf: test bpf flow dissection")
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  tools/testing/selftests/bpf/Makefile | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 1381ab81099c..efd108ff737d 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -51,6 +51,8 @@ TEST_PROGS := test_kmod.sh \
>         test_skb_cgroup_id.sh \
>         test_flow_dissector.sh
>
> +TEST_PROGS_EXTENDED := with_addr.sh
> +
>  # Compile but not part of 'make run_tests'
>  TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user \
>         flow_dissector_load test_flow_dissector
> --
> 2.19.1
>

^ permalink raw reply

* Re: Re: BUG: corrupted list in p9_read_work
From: Dominique Martinet @ 2018-10-10 16:10 UTC (permalink / raw)
  To: syzbot
  Cc: davem, dvyukov, ericvh, linux-kernel, lucho, netdev, rminnich,
	syzkaller-bugs, v9fs-developer
In-Reply-To: <20181010160022.GD20918@nautica>

Dominique Martinet wrote on Wed, Oct 10, 2018:
> It works though, is it just picky because I didn't end it in .git? let's
> try again, sorry for the noise...
> 
> #syz test: git://github.com/martinetd/linux.git e4ca13f7d075e551dc158df6af18fb412a1dba0a

And I guess the commit hash needs to be in the default clone branch to
work ?
('git fetch <repo> <hash>' happily fetches the commit in a new clone for
me... But that feels like a github specific behaviour maybe)

Oh, well; made a branch for it, last try for me.

#syz test: git://github.com/martinetd/linux.git for-syzbot

-- 
Dominique

^ permalink raw reply

* RE: [PATCH net-next v2] net: core: change bool members of struct net_device to bitfield members
From: David Laight @ 2018-10-10  8:59 UTC (permalink / raw)
  To: 'Eric Dumazet', Heiner Kallweit, David Ahern,
	David Miller
  Cc: netdev@vger.kernel.org
In-Reply-To: <a7286ffb-0de5-bb9c-7dc4-7d88d952bb2f@gmail.com>

From: Eric Dumazet
> Sent: 09 October 2018 21:52
> 
> On 10/09/2018 01:24 PM, Heiner Kallweit wrote:
> 
> > Reordering the struct members to fill the holes could be a little tricky
> > and could have side effects because it may make a performance difference
> > whether certain members are in one cacheline or not.
> > And whether it's worth to spend this effort (incl. the related risks)
> > just to save a few bytes (also considering that typically we have quite
> > few instances of struct net_device)?
> 
> Not really.
> 
> In fact we probably should spend time reordering fields for performance,
> since some new fields were added a bit randomly, breaking the goal of data locality.
> 
> Some fields are used in control path only can could be moved out of the cache lines
> needed in data path (fast path).

Interesting thought....
The memory allocator rounds sizes up to a power of 2 and gives out memory
aligned to that value.
This means that the cache lines just above powers of 2 are used far
more frequently than those below one.
This will be made worse because the commonly used fields are normally at
the start of a structure.
This ought to be measurable?

Has anyone tried randomly splitting the padding between the start
and end of the allocation (while maintaining cache alignment)?
(Not sure how this would affect kfree().)

Or splitting pages (or small groups of pages) into non-power of 2
size blocks?
For instance you get three 1344 (21*64) byte blocks and five 768 byte
blocks into a 4k page.
These could give a significant footprint reduction as well as
balancing out cache line usage. 

I also wonder whether it is right to add a lot of padding to cache-line
align structure members on systems with large cache lines.
The intention is probably to get a few fields into the same cache line
not to add padding that may be larger than aggregate size of the fields.

Oh - and it is somewhat pointless because kmalloc() isn't guaranteed
to give out cache-line aligned buffers.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply

* Re: [PATCH net-next] net: mscc: allow extracting the FCS into the skb
From: Florian Fainelli @ 2018-10-10 16:25 UTC (permalink / raw)
  To: Antoine Tenart, Andrew Lunn
  Cc: davem, netdev, linux-kernel, thomas.petazzoni, alexandre.belloni,
	quentin.schulz, allan.nielsen
In-Reply-To: <20181010144631.GE3368@kwain>



On October 10, 2018 7:46:31 AM PDT, Antoine Tenart <antoine.tenart@bootlin.com> wrote:
>Hi all,
>
>On Tue, Oct 02, 2018 at 02:43:23PM +0200, Andrew Lunn wrote:
>> > The question could be "do we need len += ETH_FCS_LEN" to account
>for the
>> > FCS when NETIF_F_RXFCS is used", but I looked at other drivers and
>it
>> > seemed to me the FCS is not accounted in the stats. Should it be?
>> 
>> There does not appear to be a good answer to that. I submitted a
>patch
>> to a driver i'm using to not count it, so that the stats counters we
>> consistent with another driver. The patch was rejected.
>> 
>> I think the best you can do is flip a coin, and then document it
>using
>> a comment about if it is/is not included.
>
>I'll leave it as-is then :)
>
>@Dave, Florian: it seems to me no modification was requested after
>discussing those changes. Where do we stand regarding the patch?

Silence means agreement, thanks for your explanation, no more questions or concerns on my side.
-- 
Florian

^ permalink raw reply

* Re: BUG: corrupted list in p9_read_work
From: syzbot @ 2018-10-10 16:29 UTC (permalink / raw)
  To: asmadeus, davem, dvyukov, ericvh, linux-kernel, lucho, netdev,
	rminnich, syzkaller-bugs, v9fs-developer
In-Reply-To: <20181010161003.GA5371@nautica>

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:

Reported-and-tested-by:  
syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com

Tested on:

commit:         e4ca13f7d075 9p/trans_fd: abort p9_read_work if req status..
git tree:       git://github.com/martinetd/linux.git for-syzbot
kernel config:  https://syzkaller.appspot.com/x/.config?x=fada1c387645ed03
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply

* RE: [PATCH v2 lora-next 1/4] net: lora: sx1301: convert burst spi functions to regmap raw
From: Ben Whitten @ 2018-10-10  9:22 UTC (permalink / raw)
  To: Andreas Färber, Ben Whitten, Mark Brown
  Cc: starnight@g.ncu.edu.tw, hasnain.virk@arm.com,
	netdev@vger.kernel.org, liuxuenetmail@gmail.com,
	shess@hessware.de, linux-spi@vger.kernel.org, Stefan Popa
In-Reply-To: <3959c93b-edb5-c6ff-ec36-57f84ab8623f@suse.de>

> Subject: Re: [PATCH v2 lora-next 1/4] net: lora: sx1301:
> convert burst spi functions to regmap raw
> 
> Hi Ben and Mark,
> 
> Am 09.10.18 um 14:52 schrieb Ben Whitten:
> > As we have caching disabled we can access the regmap
> using raw for our
> > firmware reading and writing bursts.
> > We also remove the now defunct spi element from the
> structure as this
> > completes the move to regmap.
> >
> > Signed-off-by: Ben Whitten
> <ben.whitten@lairdtech.com>
> > ---
> >  drivers/net/lora/sx1301.c | 26 +++++++++++++++++------
> ---
> >  drivers/net/lora/sx1301.h |  2 --
> >  2 files changed, 17 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/net/lora/sx1301.c
> b/drivers/net/lora/sx1301.c
> > index fd29258..5ab0e2d 100644
> > --- a/drivers/net/lora/sx1301.c
> > +++ b/drivers/net/lora/sx1301.c
> > @@ -76,19 +76,28 @@ static struct regmap_config
> sx1301_regmap_config = {
> >
> >  static int sx1301_read_burst(struct sx1301_priv *priv, u8
> reg, u8 *val, size_t len)
> >  {
> > -	u8 addr = reg & 0x7f;
> > -	return spi_write_then_read(priv->spi, &addr, 1, val,
> len);
> > +	size_t max;
> > +
> > +	max = regmap_get_raw_read_max(priv->regmap);
> > +	if (max && max < len) {
> > +		dev_err(priv->dev, "Burst greater then max
> raw read\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	return regmap_raw_read(priv->regmap, reg, val,
> len);
> 
> I believe the way we are using it with a single register needs
> regmap_noinc_read() instead, in case we ever want to
> enable caching? ...

Certainly, the noinc was introduced in 4.19 lets rebase lora-next
and use this API instead.

> 
> >  }
> >
> >  static int sx1301_write_burst(struct sx1301_priv *priv, u8
> reg, const u8 *val, size_t len)
> >  {
> > -	u8 addr = reg | BIT(7);
> > -	struct spi_transfer xfr[2] = {
> > -		{ .tx_buf = &addr, .len = 1 },
> > -		{ .tx_buf = val, .len = len },
> > -	};
> > +	size_t max;
> > +
> > +	max = regmap_get_raw_write_max(priv->regmap);
> > +	if (max && max < len) {
> > +		dev_err(priv->dev, "Burst greater then max
> raw write\n");
> > +		return -EINVAL;
> > +	}
> >
> > -	return spi_sync_transfer(priv->spi, xfr, 2);
> > +	return regmap_raw_write(priv->regmap, reg, val,
> len);
> 
> ... Which would mean we are lacking a noinc API for write
> here!
> 
> @Mark/Stefan, any reason noinc was not implemented
> symmetrically?
> 
> For sx1276 I have local regmap conversion patches, but I
> kept getting
> -EINVAL for bursts and haven't figured out why yet. Need to
> compare to
> your code - I assume you successfully tested this, Ben.

Yes I was able to load and verify/run the firmware using this
patch on my Sentrius RG1xx board.

Thanks,
Ben

^ permalink raw reply

* Re: [PATCH net 0/3] devlink param type string fixes
From: David Miller @ 2018-10-10 17:20 UTC (permalink / raw)
  To: moshe; +Cc: jiri, netdev, linux-kernel
In-Reply-To: <1539176967-22172-1-git-send-email-moshe@mellanox.com>

From: Moshe Shemesh <moshe@mellanox.com>
Date: Wed, 10 Oct 2018 16:09:24 +0300

> This patchset fixes devlink param infrastructure for string param type.
> 
> The devlink param infrastructure doesn't handle copying the string data
> correctly.  The first two patches fix it and the third patch adds helper
> function to safely copy string value without exceeding
> DEVLINK_PARAM_MAX_STRING_VALUE.

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] net: mscc: allow extracting the FCS into the skb
From: David Miller @ 2018-10-10 17:26 UTC (permalink / raw)
  To: f.fainelli
  Cc: antoine.tenart, andrew, netdev, linux-kernel, thomas.petazzoni,
	alexandre.belloni, quentin.schulz, allan.nielsen
In-Reply-To: <28179B83-2C71-47FA-9D9C-49237E4A07AE@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed, 10 Oct 2018 09:25:01 -0700

> 
> 
> On October 10, 2018 7:46:31 AM PDT, Antoine Tenart <antoine.tenart@bootlin.com> wrote:
>>Hi all,
>>
>>On Tue, Oct 02, 2018 at 02:43:23PM +0200, Andrew Lunn wrote:
>>> > The question could be "do we need len += ETH_FCS_LEN" to account
>>for the
>>> > FCS when NETIF_F_RXFCS is used", but I looked at other drivers and
>>it
>>> > seemed to me the FCS is not accounted in the stats. Should it be?
>>> 
>>> There does not appear to be a good answer to that. I submitted a
>>patch
>>> to a driver i'm using to not count it, so that the stats counters we
>>> consistent with another driver. The patch was rejected.
>>> 
>>> I think the best you can do is flip a coin, and then document it
>>using
>>> a comment about if it is/is not included.
>>
>>I'll leave it as-is then :)
>>
>>@Dave, Florian: it seems to me no modification was requested after
>>discussing those changes. Where do we stand regarding the patch?
> 
> Silence means agreement, thanks for your explanation, no more questions or concerns on my side.

You'll probably need to resubmit the patch anew though.

^ permalink raw reply

* Re: [PATCH V1 net-next 00/12] Improving performance and reducing latencies, by using latest capabilities exposed in ENA device
From: Jesper Dangaard Brouer @ 2018-10-10 10:15 UTC (permalink / raw)
  To: Bshara, Saeed
  Cc: Bshara, Nafea, Kiyanovski, Arthur, davem@davemloft.net,
	netdev@vger.kernel.org, Woodhouse, David, Machulsky, Zorik,
	Matushevsky, Alexander, Wilson, Matt, Liguori, Anthony,
	Tzalik, Guy, Belgazal, Netanel, Saidi, Ali, Björn Töpel,
	brouer
In-Reply-To: <1539119909353.25034@amazon.com>


On Tue, 9 Oct 2018 21:18:30 +0000 "Bshara, Saeed" <saeedb@amazon.com> wrote:

> Currently the driver allocate page per rx buffer, but we are
> considering to support mode where page split to 2 buffers in order to
> overcome memory fragmentation issue on low memory systems. but, this
> won't work with XDP, right?
> what's your advice?

XDP is easiest to implement with 1-page per rx buffer, but given ixgbe
and i40e violated this, it is possible to use a page-split approach
with 2 frames per page, like they do. Do notice that you have to
deviate from the standard 256 bytes headroom for that to fit (which
basically killed my idea of placing the SKB in this headroom).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

 
 
> From: Bshara, Nafea
> Sent: Tuesday, October 9, 2018 10:33 PM
> To: Jesper Dangaard Brouer; Kiyanovski, Arthur
> Cc: davem@davemloft.net; netdev@vger.kernel.org; Woodhouse, David; Machulsky, Zorik; Matushevsky, Alexander; Bshara, Saeed; Wilson, Matt; Liguori, Anthony; Tzalik, Guy; Belgazal, Netanel; Saidi, Ali; Björn Töpel
> Subject: Re: [PATCH V1 net-next 00/12] Improving performance and reducing latencies, by using latest capabilities exposed in ENA device
>     
> It is high priority for us right after this major release get merged.
> 
> On 10/9/18, 12:31 PM, "Jesper Dangaard Brouer" <brouer@redhat.com> wrote:
> 
>     
>     On Tue, 9 Oct 2018 21:44:57 +0300 <akiyano@amazon.com> wrote:
>     
>     > From: Arthur Kiyanovski <akiyano@amazon.com>
>     > 
>     > This patchset introduces the following:
>     > 1. A new placement policy of Tx headers and descriptors, which takes
>     > advantage of an option to place headers + descriptors in device memory
>     > space. This is sometimes referred to as LLQ - low latency queue.
>     > The patch set defines the admin capability, maps the device memory as
>     > write-combined, and adds a mode in transmit datapath to do header +
>     > descriptor placement on the device.
>     > 2. Support for RX checksum offloading
>     > 3. Miscelaneous small improvements and code cleanups
>     
>     What are your plans for XDP?
>     
>     You are unsure ask your-colleague David Woodhouse, who I've discussed
>     this with when he attended my talk at Kernel-Recipes[1], slide[2].
>     
>     [1]  https://kernel-recipes.org/en/2018/talks/xdp-a-new-programmable-network-layer/
>     [2]  http://people.netfilter.org/hawk/presentations/KernelRecipes2018/XDP_Kernel_Recipes_2018.pdf
>     -- 
>     Best regards,
>       Jesper Dangaard Brouer
>       MSc.CS, Principal Kernel Engineer at Red Hat
>       LinkedIn: http://www.linkedin.com/in/brouer
>     
> 
>     

^ permalink raw reply

* Re: [PATCH v2 lora-next 1/4] net: lora: sx1301: convert burst spi functions to regmap raw
From: Mark Brown @ 2018-10-10 10:19 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Ben Whitten, starnight, hasnain.virk, netdev, liuxuenetmail,
	shess, Ben Whitten, linux-spi@vger.kernel.org, Stefan Popa
In-Reply-To: <3959c93b-edb5-c6ff-ec36-57f84ab8623f@suse.de>

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]

On Wed, Oct 10, 2018 at 09:59:29AM +0200, Andreas Färber wrote:
> Am 09.10.18 um 14:52 schrieb Ben Whitten:

> > -	return spi_sync_transfer(priv->spi, xfr, 2);
> > +	return regmap_raw_write(priv->regmap, reg, val, len);

> ... Which would mean we are lacking a noinc API for write here!

> @Mark/Stefan, any reason noinc was not implemented symmetrically?

Nobody wanted it enough to do the work of implementing it, there's
nothing stopping someone doing that.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [BUG] Bluetooth 4.14.74-v7+: hci_connect_le_scan_cleanup() crash NULL deref Raspberry Pi 3B+
From: Andreas Mohr @ 2018-10-10 18:00 UTC (permalink / raw)
  To: Marcel Holtmann, Johan Hedberg, David S. Miller
  Cc: linux-bluetooth, netdev, linux-kernel

Hello all,

[long time no see, good to be back!]

just newly got the following crash with a new setup:

[    6.234357] Adding 102396k swap on /var/swap.  Priority:-2 extents:1 across:102396k SSFS
[    7.114565] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[    7.114582] brcmfmac: power management disabled
[    7.522803] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[    9.102052] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[   11.887650] Bluetooth: HCI UART driver ver 2.3
[   11.887660] Bluetooth: HCI UART protocol H4 registered
[   11.887663] Bluetooth: HCI UART protocol Three-wire (H5) registered
[   11.887811] Bluetooth: HCI UART protocol Broadcom registered
[   13.399873] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   13.402896] Bridge firewalling registered
[   13.424852] nf_conntrack version 0.5.0 (15360 buckets, 61440 max)
[   14.284667] Netfilter messages via NETLINK v0.30.
[   14.290169] ctnetlink v0.93: registering with nfnetlink.
[   14.405316] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
[   33.422932] NET: Registered protocol family 38
[75215.105699] Unable to handle kernel NULL pointer dereference at virtual address 00000012
[75215.109883] pgd = 80004000
[75215.112020] [00000012] *pgd=00000000
[75215.114129] Internal error: Oops: 17 [#1] SMP ARM
[75215.116262] Modules linked in: aes_arm_bs crypto_simd cryptd algif_skcipher af_alg ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc overlay hci_uart serdev rfcomm cmac bnep brcmfmac brcmutil cfg80211 snd_bcm2835(C) snd_pcm snd_timer snd uio_pdrv_genirq uio fixed ftdi_sio usbserial btusb btrtl btintel btbcm bluetooth ecdh_generic rfkill i2c_dev ip_tables x_tables ipv6
[75215.133930] CPU: 2 PID: 167 Comm: kworker/u9:2 Tainted: G         C      4.14.74-v7+ #1149
[75215.139286] Hardware name: BCM2835
[75215.142121] Workqueue: hci0 hci_rx_work [bluetooth]
[75215.144853] task: b9315a00 task.stack: b71f8000
[75215.147707] PC is at hci_connect_le_scan_cleanup+0x14/0x124 [bluetooth]
[75215.150642] LR is at create_le_conn_complete+0xcc/0xd8 [bluetooth]
[75215.153455] pc : [<7f0dd0cc>]    lr : [<7f0df4d4>]    psr: 60000013
[75215.156326] sp : b71f9df0  ip : b71f9e10  fp : b71f9e0c
[75215.159124] r10: b2e86180  r9 : 00000088  r8 : b924b780
[75215.161834] r7 : b9da97e8  r6 : b9da9008  r5 : 00000000  r4 : b9da98b4
[75215.164616] r3 : 00200400  r2 : b9da98b4  r1 : 00000000  r0 : 00000000
[75215.167400] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[75215.170254] Control: 10c5383d  Table: 195ac06a  DAC: 00000055
[75215.173069] Process kworker/u9:2 (pid: 167, stack limit = 0xb71f8210)
[75215.175869] Stack: (0xb71f9df0 to 0xb71fa000)
[75215.178574] 9de0:                                     b9da98b4 00000000 b9da9008 b9da97e8
[75215.183810] 9e00: b71f9e2c b71f9e10 7f0df4d4 7f0dd0c4 7f0df408 b9da9000 b2e86180 0000000e
[75215.188960] 9e20: b71f9ebc b71f9e30 7f0e57bc 7f0df414 b71f9e70 b71f9e74 b9315a80 b9315a00
[75215.194193] 9e40: ba372d78 00000000 b9315e60 80b8ed40 b71f9e8c 80b8ed40 a72136c0 b9da97f4
[75215.199699] 9e60: 60000013 00000000 00da97e8 b9da200d 7f0df408 00000000 b71f9e9c b71f9e88
[75215.205455] 9e80: 807a3eec 801ee19c b9da97e8 b2e86180 b71f9ebc b9da9708 b9da98b4 b9da9000
[75215.211367] 9ea0: b9da97e8 b9da9008 00000088 b2e86180 b71f9efc b71f9ec0 7f0d8b3c 7f0e54f4
[75215.217508] 9ec0: b9da9718 b9da901c b908c300 80137520 b71f9efc b9092580 b9da9708 b70dae00
[75215.223972] 9ee0: b908c300 00000000 00000088 00000000 b71f9f34 b71f9f00 801376f0 7f0d89d0
[75215.230596] 9f00: b70dae18 80c02d00 00000088 b70dae00 b9092598 b70dae00 b70dae18 80c02d00
[75215.237352] 9f20: 00000088 b9092580 b71f9f7c b71f9f38 80137a50 801375a4 b71f9f5c b71f8000
[75215.244292] 9f40: 00000000 80c02d00 80c88562 b71f8038 b979b01c b979b000 00000000 b722f300
[75215.251303] 9f60: b9092580 801379ec b979b01c b732fe80 b71f9fac b71f9f80 8013dad4 801379f8
[75215.258312] 9f80: 80102d94 b722f300 8013d998 00000000 00000000 00000000 00000000 00000000
[75215.265319] 9fa0: 00000000 b71f9fb0 8010810c 8013d9a4 00000000 00000000 00000000 00000000
[75215.272327] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[75215.279336] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[75215.286614] [<7f0dd0cc>] (hci_connect_le_scan_cleanup [bluetooth]) from [<7f0df4d4>] (create_le_conn_complete+0xcc/0xd8 [bluetooth])
[75215.293988] [<7f0df4d4>] (create_le_conn_complete [bluetooth]) from [<7f0e57bc>] (hci_event_packet+0x2d4/0x2de8 [bluetooth])
[75215.301354] [<7f0e57bc>] (hci_event_packet [bluetooth]) from [<7f0d8b3c>] (hci_rx_work+0x178/0x250 [bluetooth])
[75215.308585] [<7f0d8b3c>] (hci_rx_work [bluetooth]) from [<801376f0>] (process_one_work+0x158/0x454)
[75215.315681] [<801376f0>] (process_one_work) from [<80137a50>] (worker_thread+0x64/0x5b8)
[75215.322757] [<80137a50>] (worker_thread) from [<8013dad4>] (kthread+0x13c/0x16c)
[75215.329829] [<8013dad4>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)
[75215.333466] Code: e92dd8f0 e24cb004 e52de004 e8bd4000 (e5d04012) 
[75215.337146] ---[ end trace f3a6771e5232874d ]---



Context notes:
- Raspberry Pi 3B (not 3B+), Raspbian
- Linux [HOSTNAME] 4.14.74-v7+ #1149 SMP Mon Oct 8 17:39:42 BST 2018 armv7l GNU/Linux
- updated to *custom* manual setup (very newish kernel) yesterday (via rpi-update), rebooted
- builtin Bluetooth (hci0)
- external Bluetooth (hci1, CSR I believe)
- FHEM, with various modules regularly polling various Bluetooth clients, via hci0 only
- very newly erected wiring (relocated to different location, new external Bluetooth, etc., yesterday) setup

Returned to the location today, merely to
find external Bluetooth LED blinking *very* rapidly, then
fumbled the stick, at which moment LED went solid.
I returned a couple seconds later, LED had gone dark.

Few minutes later, connected to box, only to find this crash dump trace there.
It seems quite obvious that
this crash may have happened due to
implementation stack mis-handling a (temporary??) connector disconnect/power-loss issue.

Ran
git log v4.14..master -- net/bluetooth/hci_conn.c
, did not immediately see (m)any obviously relevant items.

Thus now reporting here (get_maintainer.pl...).

Thank you!

Greetings,

Andreas Mohr

-- 
GNU/Linux. It's not the software that's free, it's you.

^ permalink raw reply

* Re: KASAN: use-after-free Read in sctp_id2assoc
From: Marcelo Ricardo Leitner @ 2018-10-10 18:13 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, David Miller, LKML, linux-sctp, netdev, Neil Horman,
	syzkaller-bugs, Vladislav Yasevich
In-Reply-To: <CACT4Y+YVRWSF+VyoBB1vwDevt0gwZYz69bSAivmZDG6r4J-kaA@mail.gmail.com>

On Wed, Oct 10, 2018 at 05:28:12PM +0200, Dmitry Vyukov wrote:
> On Fri, Oct 5, 2018 at 4:58 PM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > On Thu, Oct 04, 2018 at 01:48:03AM -0700, syzbot wrote:
> >> Hello,
> >>
> >> syzbot found the following crash on:
> >>
> >> HEAD commit:    4e6d47206c32 tls: Add support for inplace records encryption
> >> git tree:       net-next
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=13834b81400000
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=e569aa5632ebd436
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=c7dd55d7aec49d48e49a
> >> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> >>
> >> Unfortunately, I don't have any reproducer for this crash yet.
> >>
> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
> >>
> >> netlink: 'syz-executor1': attribute type 1 has an invalid length.
> >> ==================================================================
> >> BUG: KASAN: use-after-free in sctp_id2assoc+0x3a7/0x3e0
> >> net/sctp/socket.c:276
> >> Read of size 8 at addr ffff880195b3eb20 by task syz-executor2/15454
> >>
> >> CPU: 1 PID: 15454 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #242
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> Google 01/01/2011
> >> Call Trace:
> >>  __dump_stack lib/dump_stack.c:77 [inline]
> >>  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
> >>  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
> >>  kasan_report_error mm/kasan/report.c:354 [inline]
> >>  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
> >>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
> >>  sctp_id2assoc+0x3a7/0x3e0 net/sctp/socket.c:276
> >
> > I'm not seeing yet how this could happen.
> > All sockopts here are serialized by sock_lock.
> > do_peeloff here would create another socket, but the issue was
> > triggered before that.
> > The same function that freed this memory, also removes the entry from
> > idr mapping, so this entry shouldn't be there anymore.
> >
> > I have only two theories so far:
> > - an issue with IDR/RCU.
> > - something else happened that just the call stacks are not revealing.
> 
> The "asoc->base.sk != sk" check after idr_find suggests that we don't
> actually know what sock it belongs to. And if we don't know then

Right. The check is more because the IDR is global and not per socket
(and we don't want sockets accessing asocs from other sockets), and not
that the asoc may move to another socket in between, but it also
protects from such cases, yes.

> locking this sock can't help keeping another sock association alive.
> Am I missing something obvious here? Should we take assoc ref while we

Not sure. Maybe I am.  Thanks for looking into this, btw.

> are still holding sctp_assocs_id_lock?

Shouldn't be needed.

Solely by the call stacks:
- we tried to establish a new asoc from a sctp_connect() call,
  blocking one.
- it slept waiting for the connect
- (something closed the asoc in between the sleeps, because it freed
  the asoc right when waking up on sctp_wait_for_connect())
- it freed the asoc after sleeping on it on sctp_wait_for_connect [A]
- another thread tried to peeloff that asoc [B]

For [B] to access the asoc in question, it had to take the same sock
lock [A] had taken, and then the idr should not return an asoc in
sctp_i2asoc(). Note that we can't peeloff an asoc twice, thus why
the certainty here.

If [B] actually kicked in before the sleep resumed, that should have
been fine because it took the same sock lock [A] would have to
re-take. In this case an asoc would have been returned by
sctp_id2asoc(), the asoc would have been moved to a new socket, but
all while holding the original socket sock lock.

^ permalink raw reply

* Re: KASAN: use-after-free Read in sctp_id2assoc
From: Dmitry Vyukov @ 2018-10-10 18:28 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: syzbot, David Miller, LKML, linux-sctp, netdev, Neil Horman,
	syzkaller-bugs, Vladislav Yasevich
In-Reply-To: <20181010181325.GD6761@localhost.localdomain>

On Wed, Oct 10, 2018 at 8:13 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Wed, Oct 10, 2018 at 05:28:12PM +0200, Dmitry Vyukov wrote:
>> On Fri, Oct 5, 2018 at 4:58 PM, Marcelo Ricardo Leitner
>> <marcelo.leitner@gmail.com> wrote:
>> > On Thu, Oct 04, 2018 at 01:48:03AM -0700, syzbot wrote:
>> >> Hello,
>> >>
>> >> syzbot found the following crash on:
>> >>
>> >> HEAD commit:    4e6d47206c32 tls: Add support for inplace records encryption
>> >> git tree:       net-next
>> >> console output: https://syzkaller.appspot.com/x/log.txt?x=13834b81400000
>> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=e569aa5632ebd436
>> >> dashboard link: https://syzkaller.appspot.com/bug?extid=c7dd55d7aec49d48e49a
>> >> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> >>
>> >> Unfortunately, I don't have any reproducer for this crash yet.
>> >>
>> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
>> >>
>> >> netlink: 'syz-executor1': attribute type 1 has an invalid length.
>> >> ==================================================================
>> >> BUG: KASAN: use-after-free in sctp_id2assoc+0x3a7/0x3e0
>> >> net/sctp/socket.c:276
>> >> Read of size 8 at addr ffff880195b3eb20 by task syz-executor2/15454
>> >>
>> >> CPU: 1 PID: 15454 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #242
>> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> >> Google 01/01/2011
>> >> Call Trace:
>> >>  __dump_stack lib/dump_stack.c:77 [inline]
>> >>  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
>> >>  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
>> >>  kasan_report_error mm/kasan/report.c:354 [inline]
>> >>  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
>> >>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>> >>  sctp_id2assoc+0x3a7/0x3e0 net/sctp/socket.c:276
>> >
>> > I'm not seeing yet how this could happen.
>> > All sockopts here are serialized by sock_lock.
>> > do_peeloff here would create another socket, but the issue was
>> > triggered before that.
>> > The same function that freed this memory, also removes the entry from
>> > idr mapping, so this entry shouldn't be there anymore.
>> >
>> > I have only two theories so far:
>> > - an issue with IDR/RCU.
>> > - something else happened that just the call stacks are not revealing.
>>
>> The "asoc->base.sk != sk" check after idr_find suggests that we don't
>> actually know what sock it belongs to. And if we don't know then
>
> Right. The check is more because the IDR is global and not per socket
> (and we don't want sockets accessing asocs from other sockets), and not
> that the asoc may move to another socket in between, but it also
> protects from such cases, yes.
>
>> locking this sock can't help keeping another sock association alive.
>> Am I missing something obvious here? Should we take assoc ref while we
>
> Not sure. Maybe I am.  Thanks for looking into this, btw.
>
>> are still holding sctp_assocs_id_lock?
>
> Shouldn't be needed.
>
> Solely by the call stacks:
> - we tried to establish a new asoc from a sctp_connect() call,
>   blocking one.
> - it slept waiting for the connect
> - (something closed the asoc in between the sleeps, because it freed
>   the asoc right when waking up on sctp_wait_for_connect())
> - it freed the asoc after sleeping on it on sctp_wait_for_connect [A]
> - another thread tried to peeloff that asoc [B]
>
> For [B] to access the asoc in question, it had to take the same sock
> lock [A] had taken, and then the idr should not return an asoc in
> sctp_i2asoc(). Note that we can't peeloff an asoc twice, thus why
> the certainty here.
>
> If [B] actually kicked in before the sleep resumed, that should have
> been fine because it took the same sock lock [A] would have to
> re-take. In this case an asoc would have been returned by
> sctp_id2asoc(), the asoc would have been moved to a new socket, but
> all while holding the original socket sock lock.

But why A and B use the same lock?

sctp_assocs_id is global, so it contains asocs from all sockets, right?
assoc id comes straight from userspaces.
So isn't it possible that B uses completely different sock but passes
assoc id from the A sock? Then B should find assoc in sctp_assocs_id,
and at the point of "asoc->base.sk != sk" check the assoc can be
already freed.

^ permalink raw reply

* Re: KASAN: use-after-free Read in sctp_id2assoc
From: Marcelo Ricardo Leitner @ 2018-10-10 18:40 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, David Miller, LKML, linux-sctp, netdev, Neil Horman,
	syzkaller-bugs, Vladislav Yasevich
In-Reply-To: <CACT4Y+YYpjjvW-UCkHEXWZA50F==7JC1cnMvoAZHKmOqq=9vAw@mail.gmail.com>

On Wed, Oct 10, 2018 at 08:28:22PM +0200, Dmitry Vyukov wrote:
> On Wed, Oct 10, 2018 at 8:13 PM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> > On Wed, Oct 10, 2018 at 05:28:12PM +0200, Dmitry Vyukov wrote:
> >> On Fri, Oct 5, 2018 at 4:58 PM, Marcelo Ricardo Leitner
> >> <marcelo.leitner@gmail.com> wrote:
> >> > On Thu, Oct 04, 2018 at 01:48:03AM -0700, syzbot wrote:
> >> >> Hello,
> >> >>
> >> >> syzbot found the following crash on:
> >> >>
> >> >> HEAD commit:    4e6d47206c32 tls: Add support for inplace records encryption
> >> >> git tree:       net-next
> >> >> console output: https://syzkaller.appspot.com/x/log.txt?x=13834b81400000
> >> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=e569aa5632ebd436
> >> >> dashboard link: https://syzkaller.appspot.com/bug?extid=c7dd55d7aec49d48e49a
> >> >> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> >> >>
> >> >> Unfortunately, I don't have any reproducer for this crash yet.
> >> >>
> >> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
> >> >>
> >> >> netlink: 'syz-executor1': attribute type 1 has an invalid length.
> >> >> ==================================================================
> >> >> BUG: KASAN: use-after-free in sctp_id2assoc+0x3a7/0x3e0
> >> >> net/sctp/socket.c:276
> >> >> Read of size 8 at addr ffff880195b3eb20 by task syz-executor2/15454
> >> >>
> >> >> CPU: 1 PID: 15454 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #242
> >> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> >> Google 01/01/2011
> >> >> Call Trace:
> >> >>  __dump_stack lib/dump_stack.c:77 [inline]
> >> >>  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
> >> >>  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
> >> >>  kasan_report_error mm/kasan/report.c:354 [inline]
> >> >>  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
> >> >>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
> >> >>  sctp_id2assoc+0x3a7/0x3e0 net/sctp/socket.c:276
> >> >
> >> > I'm not seeing yet how this could happen.
> >> > All sockopts here are serialized by sock_lock.
> >> > do_peeloff here would create another socket, but the issue was
> >> > triggered before that.
> >> > The same function that freed this memory, also removes the entry from
> >> > idr mapping, so this entry shouldn't be there anymore.
> >> >
> >> > I have only two theories so far:
> >> > - an issue with IDR/RCU.
> >> > - something else happened that just the call stacks are not revealing.
> >>
> >> The "asoc->base.sk != sk" check after idr_find suggests that we don't
> >> actually know what sock it belongs to. And if we don't know then
> >
> > Right. The check is more because the IDR is global and not per socket
> > (and we don't want sockets accessing asocs from other sockets), and not
> > that the asoc may move to another socket in between, but it also
> > protects from such cases, yes.
> >
> >> locking this sock can't help keeping another sock association alive.
> >> Am I missing something obvious here? Should we take assoc ref while we
> >
> > Not sure. Maybe I am.  Thanks for looking into this, btw.
> >
> >> are still holding sctp_assocs_id_lock?
> >
> > Shouldn't be needed.
> >
> > Solely by the call stacks:
> > - we tried to establish a new asoc from a sctp_connect() call,
> >   blocking one.
> > - it slept waiting for the connect
> > - (something closed the asoc in between the sleeps, because it freed
> >   the asoc right when waking up on sctp_wait_for_connect())
> > - it freed the asoc after sleeping on it on sctp_wait_for_connect [A]
> > - another thread tried to peeloff that asoc [B]
> >
> > For [B] to access the asoc in question, it had to take the same sock
> > lock [A] had taken, and then the idr should not return an asoc in
> > sctp_i2asoc(). Note that we can't peeloff an asoc twice, thus why
> > the certainty here.
> >
> > If [B] actually kicked in before the sleep resumed, that should have
> > been fine because it took the same sock lock [A] would have to
> > re-take. In this case an asoc would have been returned by
> > sctp_id2asoc(), the asoc would have been moved to a new socket, but
> > all while holding the original socket sock lock.
> 
> But why A and B use the same lock?
> 
> sctp_assocs_id is global, so it contains asocs from all sockets, right?
> assoc id comes straight from userspaces.
> So isn't it possible that B uses completely different sock but passes
> assoc id from the A sock? Then B should find assoc in sctp_assocs_id,
> and at the point of "asoc->base.sk != sk" check the assoc can be
> already freed.

That explains it. Somehow I was thinking the issue was for reading
->dead instead.  Now it's pretty clear. Then this should be the patch
we want. Can you please give it a spin? (only compile tested)

While holding the spinlock, an entry cannot be removed from the idr
and thus it cannot be freed. So even if the app uses an id from
another socket, it will still be there.

---8<---

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index f73e9d38d5ba..a7722f43aa69 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -271,11 +271,10 @@ struct sctp_association *sctp_id2assoc(struct sock *sk, sctp_assoc_t id)
 
 	spin_lock_bh(&sctp_assocs_id_lock);
 	asoc = (struct sctp_association *)idr_find(&sctp_assocs_id, (int)id);
+	if (asoc && (asoc->base.sk != sk || asoc->base.dead))
+		asoc = NULL;
 	spin_unlock_bh(&sctp_assocs_id_lock);
 
-	if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
-		return NULL;
-
 	return asoc;
 }
 

^ permalink raw reply related

* Re: [PATCH 06/31] netfilter: nf_tables: add xfrm expression
From: Eyal Birger @ 2018-10-10 11:39 UTC (permalink / raw)
  To: fw; +Cc: Pablo Neira Ayuso, netfilter-devel, netdev
In-Reply-To: <20181008230125.2330-7-pablo@netfilter.org>

Hi Florian,

On Tue,  9 Oct 2018 01:01:00 +0200
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> From: Florian Westphal <fw@strlen.de>
> 
> supports fetching saddr/daddr of tunnel mode states, request id and
> spi. If direction is 'in', use inbound skb secpath, else dst->xfrm.
> 
> Joint work with Máté Eckl.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---

<snip>

> +
> +/* Return true if key asks for daddr/saddr and current
> + * state does have a valid address (BEET, TUNNEL).
> + */
> +static bool xfrm_state_addr_ok(enum nft_xfrm_keys k, u8 family, u8
> mode) +{
> +	switch (k) {
> +	case NFT_XFRM_KEY_DADDR_IP4:
> +	case NFT_XFRM_KEY_SADDR_IP4:
> +		if (family == NFPROTO_IPV4)
> +			break;
> +		return false;
> +	case NFT_XFRM_KEY_DADDR_IP6:
> +	case NFT_XFRM_KEY_SADDR_IP6:
> +		if (family == NFPROTO_IPV6)
> +			break;
> +		return false;
> +	default:
> +		return true;
> +	}
> +
> +	return mode == XFRM_MODE_BEET || mode == XFRM_MODE_TUNNEL;
> +}
> +
> +static void nft_xfrm_state_get_key(const struct nft_xfrm *priv,
> +				   struct nft_regs *regs,
> +				   const struct xfrm_state *state,
> +				   u8 family)
> +{
> +	u32 *dest = &regs->data[priv->dreg];
> +
> +	if (!xfrm_state_addr_ok(priv->key, family,
> state->props.mode)) {
> +		regs->verdict.code = NFT_BREAK;
> +		return;
> +	}
> +
> +	switch (priv->key) {
> +	case NFT_XFRM_KEY_UNSPEC:
> +	case __NFT_XFRM_KEY_MAX:
> +		WARN_ON_ONCE(1);
> +		break;
> +	case NFT_XFRM_KEY_DADDR_IP4:
> +		*dest = state->id.daddr.a4;
> +		return;
> +	case NFT_XFRM_KEY_DADDR_IP6:
> +		memcpy(dest, &state->id.daddr.in6, sizeof(struct
> in6_addr));
> +		return;
> +	case NFT_XFRM_KEY_SADDR_IP4:
> +		*dest = state->props.saddr.a4;
> +		return;
> +	case NFT_XFRM_KEY_SADDR_IP6:
> +		memcpy(dest, &state->props.saddr.in6, sizeof(struct
> in6_addr));
> +		return;
> +	case NFT_XFRM_KEY_REQID:
> +		*dest = state->props.reqid;
> +		return;
> +	case NFT_XFRM_KEY_SPI:
> +		*dest = state->id.spi;
> +		return;
> +	}
> +
> +	regs->verdict.code = NFT_BREAK;
> +}
> +
> +static void nft_xfrm_get_eval_in(const struct nft_xfrm *priv,
> +				    struct nft_regs *regs,
> +				    const struct nft_pktinfo *pkt)
> +{
> +	const struct sec_path *sp = pkt->skb->sp;
> +	const struct xfrm_state *state;
> +
> +	if (sp == NULL || sp->len <= priv->spnum) {
> +		regs->verdict.code = NFT_BREAK;
> +		return;
> +	}
> +
> +	state = sp->xvec[priv->spnum];
> +	nft_xfrm_state_get_key(priv, regs, state, nft_pf(pkt));

I'm not familiar enough with nftables to be sure, but doesn't the use
of nft_pf(pkt) in this context limit the matching of encapsulated
packets to the same family?

IIUC when an e.g. IPv6-in-IPv4 packet is matched, the nft_pf(pkt) will
be the decapsulated packet family - IPv6 - whereas the state may be
IPv4. So this check would not allow matching the 'underlay' address in
such cases.

I know this was a limitation in xt_policy. but is this intentional in
this matcher? or is it possible to use state->props.family when
validating the match instead of nft_pf(pkt)?

Eyal.

^ permalink raw reply

* Re: [PATCH net-next 04/19] net: usb: aqc111: Various callbacks implementation
From: Oliver Neukum @ 2018-10-10 11:33 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: Dmitry Bezrukov, Igor Russkikh, David S . Miller,
	linux-usb@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <87efcz9rn1.fsf@miraculix.mork.no>

On Di, 2018-10-09 at 15:27 +0200, Bjørn Mork  wrote:
> Oliver Neukum <oneukum@suse.com> writes:
> 
> > On Fr, 2018-10-05 at 10:24 +0000, Igor Russkikh wrote:
> > > From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
> > > 
> > > Reset, stop callbacks, driver unbind callback.
> > > More register defines required for these callbacks.
> > > 
> > > Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
> > > Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
> > > ---
> > >  drivers/net/usb/aqc111.c |  48 ++++++++++++++++++++++
> > >  drivers/net/usb/aqc111.h | 101 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 149 insertions(+)
> > > 
> > > diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c
> > > index 7f3e5a615750..22bb259d71fb 100644
> > > --- a/drivers/net/usb/aqc111.c
> > > +++ b/drivers/net/usb/aqc111.c
> > > @@ -169,12 +169,60 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf)
> > >  
> > >  static void aqc111_unbind(struct usbnet *dev, struct usb_interface *intf)
> > >  {
> > > +	u8 reg8;
> > > +	u16 reg16;
> > > +
> > > +	/* Force bz */
> > > +	reg16 = SFR_PHYPWR_RSTCTL_BZ;
> > > +	aqc111_write_cmd_nopm(dev, AQ_ACCESS_MAC, SFR_PHYPWR_RSTCTL,
> > > +			      2, 2, &reg16);
> > 
> > No, I am sorry, you are doing DMA on the kernel stack. That is not
> > allowed. These functions will all have to be fixed.
> 
> Huh?  No, he doesn't.  That's the whole point with
> usbnet_read_cmd_nopm(), isn't it?

Right. Too many indirections for me.
Please disregard my comment.

	Regards
		Oliver

^ permalink raw reply

* [PATCH net-next 0/4] Adds support of RSS to HNS3 Driver for Rev 2(=0x21) H/W
From: Salil Mehta @ 2018-10-10 19:05 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil, netdev,
	linux-kernel, linuxarm

This patch-set mainly adds new additions related to RSS for the new
hardware Revision 0x21. It also adds support to use RSS hash value
provided by the hardware along with descriptor.

Jian Shen (3):
  net: hns3: Add new RSS hash algorithm support for PF
  net: hns3: Add RSS general configuration support for VF
  net: hns3: Add RSS tuples support for VF

Peng Li (1):
  net: hns3: Add HW RSS hash information to RX skb

 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |   1 +
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    |  17 +
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  10 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    |  49 ++-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h   |   8 +-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 366 ++++++++++++++++++---
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h  |  19 ++
 7 files changed, 406 insertions(+), 64 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH net-next 1/4] net: hns3: Add new RSS hash algorithm support for PF
From: Salil Mehta @ 2018-10-10 19:05 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil, netdev,
	linux-kernel, linuxarm, Jian Shen
In-Reply-To: <20181010190537.20972-1-salil.mehta@huawei.com>

From: Jian Shen <shenjian15@huawei.com>

This patch adds ETH_RSS_HASH_XOR hash algorithm supports, which
is supported by hw revision 0x21.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  9 ++++---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 31 +++++++++++++++++-----
 2 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 7d79a07..cbf9a65 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -678,12 +678,13 @@ static int hns3_set_rss(struct net_device *netdev, const u32 *indir,
 	if (!h->ae_algo || !h->ae_algo->ops || !h->ae_algo->ops->set_rss)
 		return -EOPNOTSUPP;
 
-	/* currently we only support Toeplitz hash */
-	if ((hfunc != ETH_RSS_HASH_NO_CHANGE) && (hfunc != ETH_RSS_HASH_TOP)) {
-		netdev_err(netdev,
-			   "hash func not supported (only Toeplitz hash)\n");
+	if ((h->pdev->revision == 0x20 &&
+	     hfunc != ETH_RSS_HASH_TOP) || (hfunc != ETH_RSS_HASH_NO_CHANGE &&
+	     hfunc != ETH_RSS_HASH_TOP && hfunc != ETH_RSS_HASH_XOR)) {
+		netdev_err(netdev, "hash func not supported\n");
 		return -EOPNOTSUPP;
 	}
+
 	if (!indir) {
 		netdev_err(netdev,
 			   "set rss failed for indir is empty\n");
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index ca1a936..0c5a053 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2806,8 +2806,19 @@ static int hclge_get_rss(struct hnae3_handle *handle, u32 *indir,
 	int i;
 
 	/* Get hash algorithm */
-	if (hfunc)
-		*hfunc = vport->rss_algo;
+	if (hfunc) {
+		switch (vport->rss_algo) {
+		case HCLGE_RSS_HASH_ALGO_TOEPLITZ:
+			*hfunc = ETH_RSS_HASH_TOP;
+			break;
+		case HCLGE_RSS_HASH_ALGO_SIMPLE:
+			*hfunc = ETH_RSS_HASH_XOR;
+			break;
+		default:
+			*hfunc = ETH_RSS_HASH_UNKNOWN;
+			break;
+		}
+	}
 
 	/* Get the RSS Key required by the user */
 	if (key)
@@ -2831,12 +2842,20 @@ static int hclge_set_rss(struct hnae3_handle *handle, const u32 *indir,
 
 	/* Set the RSS Hash Key if specififed by the user */
 	if (key) {
-
-		if (hfunc == ETH_RSS_HASH_TOP ||
-		    hfunc == ETH_RSS_HASH_NO_CHANGE)
+		switch (hfunc) {
+		case ETH_RSS_HASH_TOP:
 			hash_algo = HCLGE_RSS_HASH_ALGO_TOEPLITZ;
-		else
+			break;
+		case ETH_RSS_HASH_XOR:
+			hash_algo = HCLGE_RSS_HASH_ALGO_SIMPLE;
+			break;
+		case ETH_RSS_HASH_NO_CHANGE:
+			hash_algo = vport->rss_algo;
+			break;
+		default:
 			return -EINVAL;
+		}
+
 		ret = hclge_set_rss_algo_key(hdev, hash_algo, key);
 		if (ret)
 			return ret;
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 2/4] net: hns3: Add RSS general configuration support for VF
From: Salil Mehta @ 2018-10-10 19:05 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil, netdev,
	linux-kernel, linuxarm, Jian Shen
In-Reply-To: <20181010190537.20972-1-salil.mehta@huawei.com>

From: Jian Shen <shenjian15@huawei.com>

This patch adds RSS key, hash algorithm configuration support
for VF in revision 0x21.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h   |   3 +-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 155 ++++++++++++++-------
 2 files changed, 106 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h
index 19b3286..eb8ed11 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h
@@ -148,7 +148,8 @@ struct hclgevf_query_res_cmd {
 	__le16 rsv[7];
 };
 
-#define HCLGEVF_RSS_HASH_KEY_OFFSET	4
+#define HCLGEVF_RSS_DEFAULT_OUTPORT_B	4
+#define HCLGEVF_RSS_HASH_KEY_OFFSET_B	4
 #define HCLGEVF_RSS_HASH_KEY_NUM	16
 struct hclgevf_rss_config_cmd {
 	u8 hash_config;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index ca4a9f7..f21196b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -386,6 +386,47 @@ static int hclgevf_get_vector_index(struct hclgevf_dev *hdev, int vector)
 	return -EINVAL;
 }
 
+static int hclgevf_set_rss_algo_key(struct hclgevf_dev *hdev,
+				    const u8 hfunc, const u8 *key)
+{
+	struct hclgevf_rss_config_cmd *req;
+	struct hclgevf_desc desc;
+	int key_offset;
+	int key_size;
+	int ret;
+
+	req = (struct hclgevf_rss_config_cmd *)desc.data;
+
+	for (key_offset = 0; key_offset < 3; key_offset++) {
+		hclgevf_cmd_setup_basic_desc(&desc,
+					     HCLGEVF_OPC_RSS_GENERIC_CONFIG,
+					     false);
+
+		req->hash_config |= (hfunc & HCLGEVF_RSS_HASH_ALGO_MASK);
+		req->hash_config |=
+			(key_offset << HCLGEVF_RSS_HASH_KEY_OFFSET_B);
+
+		if (key_offset == 2)
+			key_size =
+			HCLGEVF_RSS_KEY_SIZE - HCLGEVF_RSS_HASH_KEY_NUM * 2;
+		else
+			key_size = HCLGEVF_RSS_HASH_KEY_NUM;
+
+		memcpy(req->hash_key,
+		       key + key_offset * HCLGEVF_RSS_HASH_KEY_NUM, key_size);
+
+		ret = hclgevf_cmd_send(&hdev->hw, &desc, 1);
+		if (ret) {
+			dev_err(&hdev->pdev->dev,
+				"Configure RSS config fail, status = %d\n",
+				ret);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
 static u32 hclgevf_get_rss_key_size(struct hnae3_handle *handle)
 {
 	return HCLGEVF_RSS_KEY_SIZE;
@@ -466,68 +507,40 @@ static int hclgevf_set_rss_tc_mode(struct hclgevf_dev *hdev,  u16 rss_size)
 	return status;
 }
 
-static int hclgevf_get_rss_hw_cfg(struct hnae3_handle *handle, u8 *hash,
-				  u8 *key)
+static int hclgevf_get_rss(struct hnae3_handle *handle, u32 *indir, u8 *key,
+			   u8 *hfunc)
 {
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
-	struct hclgevf_rss_config_cmd *req;
-	int lkup_times = key ? 3 : 1;
-	struct hclgevf_desc desc;
-	int key_offset;
-	int key_size;
-	int status;
-
-	req = (struct hclgevf_rss_config_cmd *)desc.data;
-	lkup_times = (lkup_times == 3) ? 3 : ((hash) ? 1 : 0);
-
-	for (key_offset = 0; key_offset < lkup_times; key_offset++) {
-		hclgevf_cmd_setup_basic_desc(&desc,
-					     HCLGEVF_OPC_RSS_GENERIC_CONFIG,
-					     true);
-		req->hash_config |= (key_offset << HCLGEVF_RSS_HASH_KEY_OFFSET);
+	struct hclgevf_rss_cfg *rss_cfg = &hdev->rss_cfg;
+	int i;
 
-		status = hclgevf_cmd_send(&hdev->hw, &desc, 1);
-		if (status) {
-			dev_err(&hdev->pdev->dev,
-				"failed to get hardware RSS cfg, status = %d\n",
-				status);
-			return status;
+	if (handle->pdev->revision >= 0x21) {
+		/* Get hash algorithm */
+		if (hfunc) {
+			switch (rss_cfg->hash_algo) {
+			case HCLGEVF_RSS_HASH_ALGO_TOEPLITZ:
+				*hfunc = ETH_RSS_HASH_TOP;
+				break;
+			case HCLGEVF_RSS_HASH_ALGO_SIMPLE:
+				*hfunc = ETH_RSS_HASH_XOR;
+				break;
+			default:
+				*hfunc = ETH_RSS_HASH_UNKNOWN;
+				break;
+			}
 		}
 
-		if (key_offset == 2)
-			key_size =
-			HCLGEVF_RSS_KEY_SIZE - HCLGEVF_RSS_HASH_KEY_NUM * 2;
-		else
-			key_size = HCLGEVF_RSS_HASH_KEY_NUM;
-
+		/* Get the RSS Key required by the user */
 		if (key)
-			memcpy(key + key_offset * HCLGEVF_RSS_HASH_KEY_NUM,
-			       req->hash_key,
-			       key_size);
+			memcpy(key, rss_cfg->rss_hash_key,
+			       HCLGEVF_RSS_KEY_SIZE);
 	}
 
-	if (hash) {
-		if ((req->hash_config & 0xf) == HCLGEVF_RSS_HASH_ALGO_TOEPLITZ)
-			*hash = ETH_RSS_HASH_TOP;
-		else
-			*hash = ETH_RSS_HASH_UNKNOWN;
-	}
-
-	return 0;
-}
-
-static int hclgevf_get_rss(struct hnae3_handle *handle, u32 *indir, u8 *key,
-			   u8 *hfunc)
-{
-	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
-	struct hclgevf_rss_cfg *rss_cfg = &hdev->rss_cfg;
-	int i;
-
 	if (indir)
 		for (i = 0; i < HCLGEVF_RSS_IND_TBL_SIZE; i++)
 			indir[i] = rss_cfg->rss_indirection_tbl[i];
 
-	return hclgevf_get_rss_hw_cfg(handle, hfunc, key);
+	return 0;
 }
 
 static int hclgevf_set_rss(struct hnae3_handle *handle, const u32 *indir,
@@ -535,7 +548,36 @@ static int hclgevf_set_rss(struct hnae3_handle *handle, const u32 *indir,
 {
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
 	struct hclgevf_rss_cfg *rss_cfg = &hdev->rss_cfg;
-	int i;
+	int ret, i;
+
+	if (handle->pdev->revision >= 0x21) {
+		/* Set the RSS Hash Key if specififed by the user */
+		if (key) {
+			switch (hfunc) {
+			case ETH_RSS_HASH_TOP:
+				rss_cfg->hash_algo =
+					HCLGEVF_RSS_HASH_ALGO_TOEPLITZ;
+				break;
+			case ETH_RSS_HASH_XOR:
+				rss_cfg->hash_algo =
+					HCLGEVF_RSS_HASH_ALGO_SIMPLE;
+				break;
+			case ETH_RSS_HASH_NO_CHANGE:
+				break;
+			default:
+				return -EINVAL;
+			}
+
+			ret = hclgevf_set_rss_algo_key(hdev, rss_cfg->hash_algo,
+						       key);
+			if (ret)
+				return ret;
+
+			/* Update the shadow RSS key with user specified qids */
+			memcpy(rss_cfg->rss_hash_key, key,
+			       HCLGEVF_RSS_KEY_SIZE);
+		}
+	}
 
 	/* update the shadow RSS table with user specified qids */
 	for (i = 0; i < HCLGEVF_RSS_IND_TBL_SIZE; i++)
@@ -1276,6 +1318,17 @@ static int hclgevf_rss_init_hw(struct hclgevf_dev *hdev)
 
 	rss_cfg->rss_size = hdev->rss_size_max;
 
+	if (hdev->pdev->revision >= 0x21) {
+		rss_cfg->hash_algo = HCLGEVF_RSS_HASH_ALGO_TOEPLITZ;
+		netdev_rss_key_fill(rss_cfg->rss_hash_key,
+				    HCLGEVF_RSS_KEY_SIZE);
+
+		ret = hclgevf_set_rss_algo_key(hdev, rss_cfg->hash_algo,
+					       rss_cfg->rss_hash_key);
+		if (ret)
+			return ret;
+	}
+
 	/* Initialize RSS indirect table for each vport */
 	for (i = 0; i < HCLGEVF_RSS_IND_TBL_SIZE; i++)
 		rss_cfg->rss_indirection_tbl[i] = i % hdev->rss_size_max;
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 3/4] net: hns3: Add RSS tuples support for VF
From: Salil Mehta @ 2018-10-10 19:05 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil, netdev,
	linux-kernel, linuxarm, Jian Shen
In-Reply-To: <20181010190537.20972-1-salil.mehta@huawei.com>

From: Jian Shen <shenjian15@huawei.com>

This patch adds RSS tuple support for VF in revision
0x21.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |   1 +
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h   |   5 +-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 211 +++++++++++++++++++++
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h  |  19 ++
 4 files changed, 234 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index cbf9a65..f1354f6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1078,6 +1078,7 @@ static const struct ethtool_ops hns3vf_ethtool_ops = {
 	.get_ethtool_stats = hns3_get_stats,
 	.get_sset_count = hns3_get_sset_count,
 	.get_rxnfc = hns3_get_rxnfc,
+	.set_rxnfc = hns3_set_rxnfc,
 	.get_rxfh_key_size = hns3_get_rss_key_size,
 	.get_rxfh_indir_size = hns3_get_rss_indir_size,
 	.get_rxfh = hns3_get_rss,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h
index eb8ed11..bc294b0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h
@@ -89,6 +89,7 @@ enum hclgevf_opcode_type {
 	HCLGEVF_OPC_CFG_COM_TQP_QUEUE	= 0x0B20,
 	/* RSS cmd */
 	HCLGEVF_OPC_RSS_GENERIC_CONFIG	= 0x0D01,
+	HCLGEVF_OPC_RSS_INPUT_TUPLE     = 0x0D02,
 	HCLGEVF_OPC_RSS_INDIR_TABLE	= 0x0D07,
 	HCLGEVF_OPC_RSS_TC_MODE		= 0x0D08,
 	/* Mailbox cmd */
@@ -160,11 +161,11 @@ struct hclgevf_rss_config_cmd {
 struct hclgevf_rss_input_tuple_cmd {
 	u8 ipv4_tcp_en;
 	u8 ipv4_udp_en;
-	u8 ipv4_stcp_en;
+	u8 ipv4_sctp_en;
 	u8 ipv4_fragment_en;
 	u8 ipv6_tcp_en;
 	u8 ipv6_udp_en;
-	u8 ipv6_stcp_en;
+	u8 ipv6_sctp_en;
 	u8 ipv6_fragment_en;
 	u8 rsv[16];
 };
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index f21196b..ac67fec 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -587,6 +587,193 @@ static int hclgevf_set_rss(struct hnae3_handle *handle, const u32 *indir,
 	return hclgevf_set_rss_indir_table(hdev);
 }
 
+static u8 hclgevf_get_rss_hash_bits(struct ethtool_rxnfc *nfc)
+{
+	u8 hash_sets = nfc->data & RXH_L4_B_0_1 ? HCLGEVF_S_PORT_BIT : 0;
+
+	if (nfc->data & RXH_L4_B_2_3)
+		hash_sets |= HCLGEVF_D_PORT_BIT;
+	else
+		hash_sets &= ~HCLGEVF_D_PORT_BIT;
+
+	if (nfc->data & RXH_IP_SRC)
+		hash_sets |= HCLGEVF_S_IP_BIT;
+	else
+		hash_sets &= ~HCLGEVF_S_IP_BIT;
+
+	if (nfc->data & RXH_IP_DST)
+		hash_sets |= HCLGEVF_D_IP_BIT;
+	else
+		hash_sets &= ~HCLGEVF_D_IP_BIT;
+
+	if (nfc->flow_type == SCTP_V4_FLOW || nfc->flow_type == SCTP_V6_FLOW)
+		hash_sets |= HCLGEVF_V_TAG_BIT;
+
+	return hash_sets;
+}
+
+static int hclgevf_set_rss_tuple(struct hnae3_handle *handle,
+				 struct ethtool_rxnfc *nfc)
+{
+	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+	struct hclgevf_rss_cfg *rss_cfg = &hdev->rss_cfg;
+	struct hclgevf_rss_input_tuple_cmd *req;
+	struct hclgevf_desc desc;
+	u8 tuple_sets;
+	int ret;
+
+	if (handle->pdev->revision == 0x20)
+		return -EOPNOTSUPP;
+
+	if (nfc->data &
+	    ~(RXH_IP_SRC | RXH_IP_DST | RXH_L4_B_0_1 | RXH_L4_B_2_3))
+		return -EINVAL;
+
+	req = (struct hclgevf_rss_input_tuple_cmd *)desc.data;
+	hclgevf_cmd_setup_basic_desc(&desc, HCLGEVF_OPC_RSS_INPUT_TUPLE, false);
+
+	req->ipv4_tcp_en = rss_cfg->rss_tuple_sets.ipv4_tcp_en;
+	req->ipv4_udp_en = rss_cfg->rss_tuple_sets.ipv4_udp_en;
+	req->ipv4_sctp_en = rss_cfg->rss_tuple_sets.ipv4_sctp_en;
+	req->ipv4_fragment_en = rss_cfg->rss_tuple_sets.ipv4_fragment_en;
+	req->ipv6_tcp_en = rss_cfg->rss_tuple_sets.ipv6_tcp_en;
+	req->ipv6_udp_en = rss_cfg->rss_tuple_sets.ipv6_udp_en;
+	req->ipv6_sctp_en = rss_cfg->rss_tuple_sets.ipv6_sctp_en;
+	req->ipv6_fragment_en = rss_cfg->rss_tuple_sets.ipv6_fragment_en;
+
+	tuple_sets = hclgevf_get_rss_hash_bits(nfc);
+	switch (nfc->flow_type) {
+	case TCP_V4_FLOW:
+		req->ipv4_tcp_en = tuple_sets;
+		break;
+	case TCP_V6_FLOW:
+		req->ipv6_tcp_en = tuple_sets;
+		break;
+	case UDP_V4_FLOW:
+		req->ipv4_udp_en = tuple_sets;
+		break;
+	case UDP_V6_FLOW:
+		req->ipv6_udp_en = tuple_sets;
+		break;
+	case SCTP_V4_FLOW:
+		req->ipv4_sctp_en = tuple_sets;
+		break;
+	case SCTP_V6_FLOW:
+		if ((nfc->data & RXH_L4_B_0_1) ||
+		    (nfc->data & RXH_L4_B_2_3))
+			return -EINVAL;
+
+		req->ipv6_sctp_en = tuple_sets;
+		break;
+	case IPV4_FLOW:
+		req->ipv4_fragment_en = HCLGEVF_RSS_INPUT_TUPLE_OTHER;
+		break;
+	case IPV6_FLOW:
+		req->ipv6_fragment_en = HCLGEVF_RSS_INPUT_TUPLE_OTHER;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = hclgevf_cmd_send(&hdev->hw, &desc, 1);
+	if (ret) {
+		dev_err(&hdev->pdev->dev,
+			"Set rss tuple fail, status = %d\n", ret);
+		return ret;
+	}
+
+	rss_cfg->rss_tuple_sets.ipv4_tcp_en = req->ipv4_tcp_en;
+	rss_cfg->rss_tuple_sets.ipv4_udp_en = req->ipv4_udp_en;
+	rss_cfg->rss_tuple_sets.ipv4_sctp_en = req->ipv4_sctp_en;
+	rss_cfg->rss_tuple_sets.ipv4_fragment_en = req->ipv4_fragment_en;
+	rss_cfg->rss_tuple_sets.ipv6_tcp_en = req->ipv6_tcp_en;
+	rss_cfg->rss_tuple_sets.ipv6_udp_en = req->ipv6_udp_en;
+	rss_cfg->rss_tuple_sets.ipv6_sctp_en = req->ipv6_sctp_en;
+	rss_cfg->rss_tuple_sets.ipv6_fragment_en = req->ipv6_fragment_en;
+	return 0;
+}
+
+static int hclgevf_get_rss_tuple(struct hnae3_handle *handle,
+				 struct ethtool_rxnfc *nfc)
+{
+	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+	struct hclgevf_rss_cfg *rss_cfg = &hdev->rss_cfg;
+	u8 tuple_sets;
+
+	if (handle->pdev->revision == 0x20)
+		return -EOPNOTSUPP;
+
+	nfc->data = 0;
+
+	switch (nfc->flow_type) {
+	case TCP_V4_FLOW:
+		tuple_sets = rss_cfg->rss_tuple_sets.ipv4_tcp_en;
+		break;
+	case UDP_V4_FLOW:
+		tuple_sets = rss_cfg->rss_tuple_sets.ipv4_udp_en;
+		break;
+	case TCP_V6_FLOW:
+		tuple_sets = rss_cfg->rss_tuple_sets.ipv6_tcp_en;
+		break;
+	case UDP_V6_FLOW:
+		tuple_sets = rss_cfg->rss_tuple_sets.ipv6_udp_en;
+		break;
+	case SCTP_V4_FLOW:
+		tuple_sets = rss_cfg->rss_tuple_sets.ipv4_sctp_en;
+		break;
+	case SCTP_V6_FLOW:
+		tuple_sets = rss_cfg->rss_tuple_sets.ipv6_sctp_en;
+		break;
+	case IPV4_FLOW:
+	case IPV6_FLOW:
+		tuple_sets = HCLGEVF_S_IP_BIT | HCLGEVF_D_IP_BIT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (!tuple_sets)
+		return 0;
+
+	if (tuple_sets & HCLGEVF_D_PORT_BIT)
+		nfc->data |= RXH_L4_B_2_3;
+	if (tuple_sets & HCLGEVF_S_PORT_BIT)
+		nfc->data |= RXH_L4_B_0_1;
+	if (tuple_sets & HCLGEVF_D_IP_BIT)
+		nfc->data |= RXH_IP_DST;
+	if (tuple_sets & HCLGEVF_S_IP_BIT)
+		nfc->data |= RXH_IP_SRC;
+
+	return 0;
+}
+
+static int hclgevf_set_rss_input_tuple(struct hclgevf_dev *hdev,
+				       struct hclgevf_rss_cfg *rss_cfg)
+{
+	struct hclgevf_rss_input_tuple_cmd *req;
+	struct hclgevf_desc desc;
+	int ret;
+
+	hclgevf_cmd_setup_basic_desc(&desc, HCLGEVF_OPC_RSS_INPUT_TUPLE, false);
+
+	req = (struct hclgevf_rss_input_tuple_cmd *)desc.data;
+
+	req->ipv4_tcp_en = rss_cfg->rss_tuple_sets.ipv4_tcp_en;
+	req->ipv4_udp_en = rss_cfg->rss_tuple_sets.ipv4_udp_en;
+	req->ipv4_sctp_en = rss_cfg->rss_tuple_sets.ipv4_sctp_en;
+	req->ipv4_fragment_en = rss_cfg->rss_tuple_sets.ipv4_fragment_en;
+	req->ipv6_tcp_en = rss_cfg->rss_tuple_sets.ipv6_tcp_en;
+	req->ipv6_udp_en = rss_cfg->rss_tuple_sets.ipv6_udp_en;
+	req->ipv6_sctp_en = rss_cfg->rss_tuple_sets.ipv6_sctp_en;
+	req->ipv6_fragment_en = rss_cfg->rss_tuple_sets.ipv6_fragment_en;
+
+	ret = hclgevf_cmd_send(&hdev->hw, &desc, 1);
+	if (ret)
+		dev_err(&hdev->pdev->dev,
+			"Configure rss input fail, status = %d\n", ret);
+	return ret;
+}
+
 static int hclgevf_get_tc_size(struct hnae3_handle *handle)
 {
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
@@ -1327,6 +1514,28 @@ static int hclgevf_rss_init_hw(struct hclgevf_dev *hdev)
 					       rss_cfg->rss_hash_key);
 		if (ret)
 			return ret;
+
+		rss_cfg->rss_tuple_sets.ipv4_tcp_en =
+					HCLGEVF_RSS_INPUT_TUPLE_OTHER;
+		rss_cfg->rss_tuple_sets.ipv4_udp_en =
+					HCLGEVF_RSS_INPUT_TUPLE_OTHER;
+		rss_cfg->rss_tuple_sets.ipv4_sctp_en =
+					HCLGEVF_RSS_INPUT_TUPLE_SCTP;
+		rss_cfg->rss_tuple_sets.ipv4_fragment_en =
+					HCLGEVF_RSS_INPUT_TUPLE_OTHER;
+		rss_cfg->rss_tuple_sets.ipv6_tcp_en =
+					HCLGEVF_RSS_INPUT_TUPLE_OTHER;
+		rss_cfg->rss_tuple_sets.ipv6_udp_en =
+					HCLGEVF_RSS_INPUT_TUPLE_OTHER;
+		rss_cfg->rss_tuple_sets.ipv6_sctp_en =
+					HCLGEVF_RSS_INPUT_TUPLE_SCTP;
+		rss_cfg->rss_tuple_sets.ipv6_fragment_en =
+					HCLGEVF_RSS_INPUT_TUPLE_OTHER;
+
+		ret = hclgevf_set_rss_input_tuple(hdev, rss_cfg);
+		if (ret)
+			return ret;
+
 	}
 
 	/* Initialize RSS indirect table for each vport */
@@ -1971,6 +2180,8 @@ static const struct hnae3_ae_ops hclgevf_ops = {
 	.get_rss_indir_size = hclgevf_get_rss_indir_size,
 	.get_rss = hclgevf_get_rss,
 	.set_rss = hclgevf_set_rss,
+	.get_rss_tuple = hclgevf_get_rss_tuple,
+	.set_rss_tuple = hclgevf_set_rss_tuple,
 	.get_tc_size = hclgevf_get_tc_size,
 	.get_fw_version = hclgevf_get_fw_version,
 	.set_vlan_filter = hclgevf_set_vlan_filter,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
index cf5fbf7..aed241e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
@@ -46,6 +46,13 @@
 #define HCLGEVF_RSS_HASH_ALGO_MASK	0xf
 #define HCLGEVF_RSS_CFG_TBL_NUM \
 	(HCLGEVF_RSS_IND_TBL_SIZE / HCLGEVF_RSS_CFG_TBL_SIZE)
+#define HCLGEVF_RSS_INPUT_TUPLE_OTHER	GENMASK(3, 0)
+#define HCLGEVF_RSS_INPUT_TUPLE_SCTP	GENMASK(4, 0)
+#define HCLGEVF_D_PORT_BIT		BIT(0)
+#define HCLGEVF_S_PORT_BIT		BIT(1)
+#define HCLGEVF_D_IP_BIT		BIT(2)
+#define HCLGEVF_S_IP_BIT		BIT(3)
+#define HCLGEVF_V_TAG_BIT		BIT(4)
 
 /* states of hclgevf device & tasks */
 enum hclgevf_states {
@@ -106,12 +113,24 @@ struct hclgevf_cfg {
 	u32 numa_node_map;
 };
 
+struct hclgevf_rss_tuple_cfg {
+	u8 ipv4_tcp_en;
+	u8 ipv4_udp_en;
+	u8 ipv4_sctp_en;
+	u8 ipv4_fragment_en;
+	u8 ipv6_tcp_en;
+	u8 ipv6_udp_en;
+	u8 ipv6_sctp_en;
+	u8 ipv6_fragment_en;
+};
+
 struct hclgevf_rss_cfg {
 	u8  rss_hash_key[HCLGEVF_RSS_KEY_SIZE]; /* user configured hash keys */
 	u32 hash_algo;
 	u32 rss_size;
 	u8 hw_tc_map;
 	u8  rss_indirection_tbl[HCLGEVF_RSS_IND_TBL_SIZE]; /* shadow table */
+	struct hclgevf_rss_tuple_cfg rss_tuple_sets;
 };
 
 struct hclgevf_misc_vector {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 4/4] net: hns3: Add HW RSS hash information to RX skb
From: Salil Mehta @ 2018-10-10 19:05 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil, netdev,
	linux-kernel, linuxarm
In-Reply-To: <20181010190537.20972-1-salil.mehta@huawei.com>

From: Peng Li <lipeng321@huawei.com>

Drivers should call skb_set_hash to set the hash and its type
in an skbuff.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h            |  1 +
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c        | 17 +++++++++++++++++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 18 ++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 1b49c5d..3df62a5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -479,6 +479,7 @@ struct hnae3_knic_private_info {
 	const struct hnae3_dcb_ops *dcb_ops;
 
 	u16 int_rl_setting;
+	enum pkt_hash_types rss_type;
 };
 
 struct hnae3_roce_private_info {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index e9d4564..9bbb53c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2230,6 +2230,21 @@ static bool hns3_parse_vlan_tag(struct hns3_enet_ring *ring,
 	}
 }
 
+static void hns3_set_rx_skb_rss_type(struct hns3_enet_ring *ring,
+				     struct sk_buff *skb)
+{
+	struct hns3_desc *desc = &ring->desc[ring->next_to_clean];
+	struct hnae3_handle *handle = ring->tqp->handle;
+	enum pkt_hash_types rss_type;
+
+	if (le32_to_cpu(desc->rx.rss_hash))
+		rss_type = handle->kinfo.rss_type;
+	else
+		rss_type = PKT_HASH_TYPE_NONE;
+
+	skb_set_hash(skb, le32_to_cpu(desc->rx.rss_hash), rss_type);
+}
+
 static int hns3_handle_rx_bd(struct hns3_enet_ring *ring,
 			     struct sk_buff **out_skb, int *out_bnum)
 {
@@ -2371,6 +2386,8 @@ static int hns3_handle_rx_bd(struct hns3_enet_ring *ring,
 	ring->tqp_vector->rx_group.total_bytes += skb->len;
 
 	hns3_rx_checksum(ring, skb, desc);
+	hns3_set_rx_skb_rss_type(ring, skb);
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 0c5a053..db97f6a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2773,6 +2773,22 @@ static int hclge_set_rss_tc_mode(struct hclge_dev *hdev, u16 *tc_valid,
 	return ret;
 }
 
+static void hclge_get_rss_type(struct hclge_vport *vport)
+{
+	if (vport->rss_tuple_sets.ipv4_tcp_en ||
+	    vport->rss_tuple_sets.ipv4_udp_en ||
+	    vport->rss_tuple_sets.ipv4_sctp_en ||
+	    vport->rss_tuple_sets.ipv6_tcp_en ||
+	    vport->rss_tuple_sets.ipv6_udp_en ||
+	    vport->rss_tuple_sets.ipv6_sctp_en)
+		vport->nic.kinfo.rss_type = PKT_HASH_TYPE_L4;
+	else if (vport->rss_tuple_sets.ipv4_fragment_en ||
+		 vport->rss_tuple_sets.ipv6_fragment_en)
+		vport->nic.kinfo.rss_type = PKT_HASH_TYPE_L3;
+	else
+		vport->nic.kinfo.rss_type = PKT_HASH_TYPE_NONE;
+}
+
 static int hclge_set_rss_input_tuple(struct hclge_dev *hdev)
 {
 	struct hclge_rss_input_tuple_cmd *req;
@@ -2792,6 +2808,7 @@ static int hclge_set_rss_input_tuple(struct hclge_dev *hdev)
 	req->ipv6_udp_en = hdev->vport[0].rss_tuple_sets.ipv6_udp_en;
 	req->ipv6_sctp_en = hdev->vport[0].rss_tuple_sets.ipv6_sctp_en;
 	req->ipv6_fragment_en = hdev->vport[0].rss_tuple_sets.ipv6_fragment_en;
+	hclge_get_rss_type(&hdev->vport[0]);
 	ret = hclge_cmd_send(&hdev->hw, &desc, 1);
 	if (ret)
 		dev_err(&hdev->pdev->dev,
@@ -2973,6 +2990,7 @@ static int hclge_set_rss_tuple(struct hnae3_handle *handle,
 	vport->rss_tuple_sets.ipv6_udp_en = req->ipv6_udp_en;
 	vport->rss_tuple_sets.ipv6_sctp_en = req->ipv6_sctp_en;
 	vport->rss_tuple_sets.ipv6_fragment_en = req->ipv6_fragment_en;
+	hclge_get_rss_type(vport);
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related

* Re: KASAN: use-after-free Read in sctp_id2assoc
From: Dmitry Vyukov @ 2018-10-10 19:10 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: syzbot, David Miller, LKML, linux-sctp, netdev, Neil Horman,
	syzkaller-bugs, Vladislav Yasevich
In-Reply-To: <20181010184011.GE6761@localhost.localdomain>

On Wed, Oct 10, 2018 at 8:40 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Wed, Oct 10, 2018 at 08:28:22PM +0200, Dmitry Vyukov wrote:
>> On Wed, Oct 10, 2018 at 8:13 PM, Marcelo Ricardo Leitner
>> <marcelo.leitner@gmail.com> wrote:
>> > On Wed, Oct 10, 2018 at 05:28:12PM +0200, Dmitry Vyukov wrote:
>> >> On Fri, Oct 5, 2018 at 4:58 PM, Marcelo Ricardo Leitner
>> >> <marcelo.leitner@gmail.com> wrote:
>> >> > On Thu, Oct 04, 2018 at 01:48:03AM -0700, syzbot wrote:
>> >> >> Hello,
>> >> >>
>> >> >> syzbot found the following crash on:
>> >> >>
>> >> >> HEAD commit:    4e6d47206c32 tls: Add support for inplace records encryption
>> >> >> git tree:       net-next
>> >> >> console output: https://syzkaller.appspot.com/x/log.txt?x=13834b81400000
>> >> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=e569aa5632ebd436
>> >> >> dashboard link: https://syzkaller.appspot.com/bug?extid=c7dd55d7aec49d48e49a
>> >> >> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> >> >>
>> >> >> Unfortunately, I don't have any reproducer for this crash yet.
>> >> >>
>> >> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> >> Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
>> >> >>
>> >> >> netlink: 'syz-executor1': attribute type 1 has an invalid length.
>> >> >> ==================================================================
>> >> >> BUG: KASAN: use-after-free in sctp_id2assoc+0x3a7/0x3e0
>> >> >> net/sctp/socket.c:276
>> >> >> Read of size 8 at addr ffff880195b3eb20 by task syz-executor2/15454
>> >> >>
>> >> >> CPU: 1 PID: 15454 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #242
>> >> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> >> >> Google 01/01/2011
>> >> >> Call Trace:
>> >> >>  __dump_stack lib/dump_stack.c:77 [inline]
>> >> >>  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
>> >> >>  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
>> >> >>  kasan_report_error mm/kasan/report.c:354 [inline]
>> >> >>  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
>> >> >>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>> >> >>  sctp_id2assoc+0x3a7/0x3e0 net/sctp/socket.c:276
>> >> >
>> >> > I'm not seeing yet how this could happen.
>> >> > All sockopts here are serialized by sock_lock.
>> >> > do_peeloff here would create another socket, but the issue was
>> >> > triggered before that.
>> >> > The same function that freed this memory, also removes the entry from
>> >> > idr mapping, so this entry shouldn't be there anymore.
>> >> >
>> >> > I have only two theories so far:
>> >> > - an issue with IDR/RCU.
>> >> > - something else happened that just the call stacks are not revealing.
>> >>
>> >> The "asoc->base.sk != sk" check after idr_find suggests that we don't
>> >> actually know what sock it belongs to. And if we don't know then
>> >
>> > Right. The check is more because the IDR is global and not per socket
>> > (and we don't want sockets accessing asocs from other sockets), and not
>> > that the asoc may move to another socket in between, but it also
>> > protects from such cases, yes.
>> >
>> >> locking this sock can't help keeping another sock association alive.
>> >> Am I missing something obvious here? Should we take assoc ref while we
>> >
>> > Not sure. Maybe I am.  Thanks for looking into this, btw.
>> >
>> >> are still holding sctp_assocs_id_lock?
>> >
>> > Shouldn't be needed.
>> >
>> > Solely by the call stacks:
>> > - we tried to establish a new asoc from a sctp_connect() call,
>> >   blocking one.
>> > - it slept waiting for the connect
>> > - (something closed the asoc in between the sleeps, because it freed
>> >   the asoc right when waking up on sctp_wait_for_connect())
>> > - it freed the asoc after sleeping on it on sctp_wait_for_connect [A]
>> > - another thread tried to peeloff that asoc [B]
>> >
>> > For [B] to access the asoc in question, it had to take the same sock
>> > lock [A] had taken, and then the idr should not return an asoc in
>> > sctp_i2asoc(). Note that we can't peeloff an asoc twice, thus why
>> > the certainty here.
>> >
>> > If [B] actually kicked in before the sleep resumed, that should have
>> > been fine because it took the same sock lock [A] would have to
>> > re-take. In this case an asoc would have been returned by
>> > sctp_id2asoc(), the asoc would have been moved to a new socket, but
>> > all while holding the original socket sock lock.
>>
>> But why A and B use the same lock?
>>
>> sctp_assocs_id is global, so it contains asocs from all sockets, right?
>> assoc id comes straight from userspaces.
>> So isn't it possible that B uses completely different sock but passes
>> assoc id from the A sock? Then B should find assoc in sctp_assocs_id,
>> and at the point of "asoc->base.sk != sk" check the assoc can be
>> already freed.
>
> That explains it. Somehow I was thinking the issue was for reading
> ->dead instead.  Now it's pretty clear. Then this should be the patch
> we want. Can you please give it a spin? (only compile tested)

syzbot can only test patches for bug with reproducers, this one
doesn't have one (not surprising taking into account subtliness of the
race):
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches

It's not possible squeeze in custom patches either:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#no-custom-patches

But the change looks good to me.

Acked-by: Dmitry Vyukov <dvyukov@google.com>


> While holding the spinlock, an entry cannot be removed from the idr
> and thus it cannot be freed. So even if the app uses an id from
> another socket, it will still be there.
>
> ---8<---
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index f73e9d38d5ba..a7722f43aa69 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -271,11 +271,10 @@ struct sctp_association *sctp_id2assoc(struct sock *sk, sctp_assoc_t id)
>
>         spin_lock_bh(&sctp_assocs_id_lock);
>         asoc = (struct sctp_association *)idr_find(&sctp_assocs_id, (int)id);
> +       if (asoc && (asoc->base.sk != sk || asoc->base.dead))
> +               asoc = NULL;
>         spin_unlock_bh(&sctp_assocs_id_lock);
>
> -       if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
> -               return NULL;
> -
>         return asoc;
>  }
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox