Netdev List
 help / color / mirror / Atom feed
* Re: bpfilter causes a leftover kernel process
From: Olivier Brunel @ 2018-08-27 16:31 UTC (permalink / raw)
  To: netdev; +Cc: Olivier Brunel
In-Reply-To: <20180826180816.04ef7d16@jjacky.com>


Small addentum: Of course I failed to realize this bpfilter helper is
also mentioned in the kernel log:

kern.info: [    8.997711] bpfilter: Loaded bpfilter_umh pid 920


It also seems to be absolutely required when CONFIG_BPFILTER is
enabled, that is I tried blacklisting the module bpfilter, but then
other things (e.g. iptables-restore) just fail to work.

So the process is required, never ends and prevents umouting the
rootfs on shutdown. Unless I'm missing something, there's definitely a
bug there?

Thanks,

^ permalink raw reply

* Re: [PATCH RFC net-next] net/fib: Poptrie based FIB lookup
From: Stephen Hemminger @ 2018-08-27 16:24 UTC (permalink / raw)
  To: Md. Islam
  Cc: Netdev, David Miller, David Ahern, Eric Dumazet, Alexey Kuznetsov,
	makita.toshiaki, panda, yasuhiro.ohara, john.fastabend,
	alexei.starovoitov
In-Reply-To: <CAFgPn1Dz3cuNMULfRVKjaN0vndUbXe7V8yUoSURus8UbvyMkqA@mail.gmail.com>

On Sun, 26 Aug 2018 22:28:48 -0400
"Md. Islam" <mislam4@kent.edu> wrote:

> This patch implements Poptrie [1] based FIB lookup. It exhibits pretty
> impressive lookup performance compared to LC-trie. This poptrie
> implementation however somewhat deviates from the original
> implementation [2]. I tested this patch very rigorously with several
> FIB tables containing half a million routes. I got same result as
> LC-trie based fib_lookup().
> 
> Poptrie is intended to work in conjunction with LC-trie (not replace
> it). It is primarily designed to overcome many issues of TCAM based
> router [1]. It [1] shows that the Poptrie can achieve very impressive
> lookup performance on CPU. This patch will mainly be used by XDP
> forwarding.
> 
> 1. Asai, Hirochika, and Yasuhiro Ohara. "Poptrie: A compressed trie
> with population count for fast and scalable software IP routing table
> lookup." ACM SIGCOMM Computer Communication Review. 2015.
> 
> 2. https://github.com/pixos/poptrie


I am glad to see more research in to lookup speed. Here are some non-technical
feedback. Looking deeper takes longer.

The license in github version is not compatiable with GPL. If you based your
code off that, you need to get approval from original copyright holder.

The code is not formatted according to current kernel coding style.
Please use checkpatch to see what the issues are.

It is preferred that a function return a value, rather than being void
and returing result by reference. Example:

> +
> +/*We assume that pt->root is not NULL*/
> +void poptrie_lookup(struct poptrie *pt, __be32 dest, struct net_device **dev)
> +{
...

> +        *dev = get_fib(&pt->nhs, fib_index);
> +        return;
> +    }

Why not?
static struct net_device *poptrie_lookup(struct poptrie *pt, __be32 dest)

Also, as Dave mentioned any implementation needs to handle multiple namespaces
and routing tables.

Could this alternative lookup be enabled via sysctl at runtime rather than kernel config?

^ permalink raw reply

* Re: any reason for "!!netif_carrier_ok" and "!!netif_dormant" in net-sysfs.c?
From: rpjday @ 2018-08-27 16:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20180827.090424.977658416500279863.davem@davemloft.net>


Quoting David Miller <davem@davemloft.net>:

> From: "Robert P. J. Day" <rpjday@crashcourse.ca>
> Date: Mon, 27 Aug 2018 04:55:29 -0400 (EDT)
>
>>   another pedantic oddity -- is there a reason for these two double
>> negations in net/core/net-sysfs.c?
>
> It turns an arbitrary integer into a boolean, this is a common
> construct across the kernel tree so I'm surprised you've never seen
> it before.
>
> Although, I don't know how much more hand holding we're willing to
> tolerate continuing to give to you at this point.
>
> Thanks.

   I mentioned in my earlier email that I know what that construct is
*typically* used for; I also pointed out that, AFAICT, it was totally
unnecessary in the context of the two routines I mentioned, which
would appear to ever return only one of two boolean values, 0 or 1.

   My experience with kernel code is that one should not introduce
unnecessary complexities, which suggests (as I stated) that there
seems to be no value in the "!!" construct *in those particular
cases*, hence my curiosity.

   If you want to criticize my question, at least have the courtesy
to represent it accurately.

rday

^ permalink raw reply

* Re: [PATCH v2 iproute2-next 1/3] tc: support conversions to or from 64 bit nanosecond-based time
From: Stephen Hemminger @ 2018-08-27 16:11 UTC (permalink / raw)
  To: Yousuk Seung
  Cc: netdev, David Ahern, Michael McLennan, Priyaranjan Jha, Dave Taht,
	Neal Cardwell
In-Reply-To: <20180827024230.246445-2-ysseung@google.com>

On Sun, 26 Aug 2018 19:42:28 -0700
Yousuk Seung <ysseung@google.com> wrote:

> +int get_time(unsigned int *time, const char *str)
> +{
> +	double t;
> +	char *p;
> +
> +	t = strtod(str, &p);
> +	if (p == str)
> +		return -1;
> +
> +	if (*p) {
> +		if (strcasecmp(p, "s") == 0 || strcasecmp(p, "sec") == 0 ||
> +		    strcasecmp(p, "secs") == 0)
> +			t *= TIME_UNITS_PER_SEC;
> +		else if (strcasecmp(p, "ms") == 0 || strcasecmp(p, "msec") == 0 ||
> +			 strcasecmp(p, "msecs") == 0)
> +			t *= TIME_UNITS_PER_SEC/1000;
> +		else if (strcasecmp(p, "us") == 0 || strcasecmp(p, "usec") == 0 ||
> +			 strcasecmp(p, "usecs") == 0)
> +			t *= TIME_UNITS_PER_SEC/1000000;
> +		else
> +			return -1;

Do we need to really support UPPER case.
Isn't existing matches semantics good enough?

^ permalink raw reply

* Re: any reason for "!!netif_carrier_ok" and "!!netif_dormant" in net-sysfs.c?
From: David Miller @ 2018-08-27 16:04 UTC (permalink / raw)
  To: rpjday; +Cc: netdev
In-Reply-To: <alpine.LFD.2.21.1808270448030.28019@localhost.localdomain>

From: "Robert P. J. Day" <rpjday@crashcourse.ca>
Date: Mon, 27 Aug 2018 04:55:29 -0400 (EDT)

>   another pedantic oddity -- is there a reason for these two double
> negations in net/core/net-sysfs.c?

It turns an arbitrary integer into a boolean, this is a common
construct across the kernel tree so I'm surprised you've never seen
it before.

Although, I don't know how much more hand holding we're willing to
tolerate continuing to give to you at this point.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 01/17] asm: simd context helper API
From: Palmer Dabbelt @ 2018-08-27 19:50 UTC (permalink / raw)
  To: Jason; +Cc: linux-kernel, netdev, davem, Jason, luto, Greg KH, sneves,
	linux-arch
In-Reply-To: <20180824213849.23647-2-Jason@zx2c4.com>

On Fri, 24 Aug 2018 14:38:33 PDT (-0700), Jason@zx2c4.com wrote:
> Sometimes it's useful to amortize calls to XSAVE/XRSTOR and the related
> FPU/SIMD functions over a number of calls, because FPU restoration is
> quite expensive. This adds a simple header for carrying out this pattern:
>
>     simd_context_t simd_context = simd_get();
>     while ((item = get_item_from_queue()) != NULL) {
>         encrypt_item(item, simd_context);
>         simd_context = simd_relax(simd_context);
>     }
>     simd_put(simd_context);
>
> The relaxation step ensures that we don't trample over preemption, and
> the get/put API should be a familiar paradigm in the kernel.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Samuel Neves <sneves@dei.uc.pt>
> Cc: linux-arch@vger.kernel.org
> ---
>  arch/alpha/include/asm/Kbuild      |  5 ++--
>  arch/arc/include/asm/Kbuild        |  1 +
>  arch/arm/include/asm/simd.h        | 42 ++++++++++++++++++++++++++++++
>  arch/arm64/include/asm/simd.h      | 37 +++++++++++++++++++++-----
>  arch/c6x/include/asm/Kbuild        |  3 ++-
>  arch/h8300/include/asm/Kbuild      |  3 ++-
>  arch/hexagon/include/asm/Kbuild    |  1 +
>  arch/ia64/include/asm/Kbuild       |  1 +
>  arch/m68k/include/asm/Kbuild       |  1 +
>  arch/microblaze/include/asm/Kbuild |  1 +
>  arch/mips/include/asm/Kbuild       |  1 +
>  arch/nds32/include/asm/Kbuild      |  7 ++---
>  arch/nios2/include/asm/Kbuild      |  1 +
>  arch/openrisc/include/asm/Kbuild   |  7 ++---
>  arch/parisc/include/asm/Kbuild     |  1 +
>  arch/powerpc/include/asm/Kbuild    |  3 ++-
>  arch/riscv/include/asm/Kbuild      |  3 ++-
>  arch/s390/include/asm/Kbuild       |  3 ++-
>  arch/sh/include/asm/Kbuild         |  1 +
>  arch/sparc/include/asm/Kbuild      |  1 +
>  arch/um/include/asm/Kbuild         |  3 ++-
>  arch/unicore32/include/asm/Kbuild  |  1 +
>  arch/x86/include/asm/simd.h        | 30 ++++++++++++++++++++-
>  arch/xtensa/include/asm/Kbuild     |  1 +
>  include/asm-generic/simd.h         | 15 +++++++++++
>  include/linux/simd.h               | 28 ++++++++++++++++++++
>  26 files changed, 180 insertions(+), 21 deletions(-)
>  create mode 100644 arch/arm/include/asm/simd.h
>  create mode 100644 include/linux/simd.h

...

> diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> index 576ffdca06ba..8d3e7aef3234 100644
> --- a/arch/riscv/include/asm/Kbuild
> +++ b/arch/riscv/include/asm/Kbuild
> @@ -4,9 +4,9 @@ generic-y += checksum.h
>  generic-y += cputime.h
>  generic-y += device.h
>  generic-y += div64.h
> -generic-y += dma.h
>  generic-y += dma-contiguous.h
>  generic-y += dma-mapping.h
> +generic-y += dma.h
>  generic-y += emergency-restart.h
>  generic-y += errno.h
>  generic-y += exec.h

If this is the canonical ordering and doing so makes your life easier then I'm 
OK taking this as a separate patch into the RISC-V tree, but if not then feel 
free to roll something like this up into your next patch set.

> @@ -45,6 +45,7 @@ generic-y += setup.h
>  generic-y += shmbuf.h
>  generic-y += shmparam.h
>  generic-y += signal.h
> +generic-y += simd.h
>  generic-y += socket.h
>  generic-y += sockios.h
>  generic-y += stat.h

Either way, this looks fine for as far as the RISC-V stuff goes as it's pretty 
much a NOP.  As long as it stays a NOP then feel free to add a 

Reviewed-by: Palmer Dabbelt <palmer@sifive.com>

as far as the RISC-V parts are conceded.  It looks like there's a lot of other 
issues, though, so it's not much of a review :)

^ permalink raw reply

* Re: [PATCH v2 2/2] can: rcar: use SPDX identifier for Renesas drivers
From: Wolfram Sang @ 2018-08-27 19:45 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Wolfram Sang, linux-renesas-soc, Kuninori Morimoto,
	Wolfgang Grandegger, David S. Miller, linux-can, netdev,
	linux-kernel
In-Reply-To: <89e744d5-8739-5215-1b37-fbd2f4be7aee@pengutronix.de>

[-- Attachment #1: Type: text/plain, Size: 229 bytes --]


> Applied to linux-can-next. Please add a patch description to the patch.
> My $UPSTREAM doesn't like empty patch descriptions :) I've shamelessly
> used Fabio Estevam patch description from his flexcan SPDX patch.

Thx, Marc!


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] net: mvpp2: avoid bouncing buffers
From: Christoph Hellwig @ 2018-08-27 15:48 UTC (permalink / raw)
  To: Brian Brooks
  Cc: Christoph Hellwig, David Miller, antoine.tenart,
	maxime.chevallier, ymarkman, stefanc, netdev, linux-kernel,
	bjorn.topel, brian.brooks
In-Reply-To: <20180827135524.fv4mxkwjn5bv7p5e@di3>

WE should basically never have dev->dma_mask = &dev->coherent_dma_mask,
so until that is the case you are doctoring around the symptoms and
not the problem.

Does the patch below help your case?

----
>From 6294e0e330851ee06e66ab85b348f1d92d375d7a Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 27 Aug 2018 17:23:24 +0200
Subject: driver core: initialize a default DMA mask for platform device

We still treat devices without a DMA mask as defaulting to 32-bits for
both mask, but a few releases ago we've started warning about such
cases, as they require special cases to work around this sloppyness.
Add a dma_mask field to struct platform_object so that we can initialize
the dma_mask pointer in struct device and initialize both masks to
32-bits by default.  Architectures can still override this in
arch_setup_pdev_archdata if needed.

Note that the code looks a little odd with the various conditionals
because we have to support platform_device structures that are
statically allocated.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/base/platform.c         | 15 +++++++++++++--
 include/linux/platform_device.h |  1 +
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index dff82a3c2caa..baf4b06cf2d9 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -225,6 +225,17 @@ struct platform_object {
 	char name[];
 };
 
+static void setup_pdev_archdata(struct platform_device *pdev)
+{
+	if (!pdev->dev.coherent_dma_mask)
+		pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+	if (!pdev->dma_mask)
+		pdev->dma_mask = DMA_BIT_MASK(32);
+	if (!pdev->dev.dma_mask)
+		pdev->dev.dma_mask = &pdev->dma_mask;
+	arch_setup_pdev_archdata(pdev);
+};
+
 /**
  * platform_device_put - destroy a platform device
  * @pdev: platform device to free
@@ -271,7 +282,7 @@ struct platform_device *platform_device_alloc(const char *name, int id)
 		pa->pdev.id = id;
 		device_initialize(&pa->pdev.dev);
 		pa->pdev.dev.release = platform_device_release;
-		arch_setup_pdev_archdata(&pa->pdev);
+		setup_pdev_archdata(&pa->pdev);
 	}
 
 	return pa ? &pa->pdev : NULL;
@@ -472,7 +483,7 @@ EXPORT_SYMBOL_GPL(platform_device_del);
 int platform_device_register(struct platform_device *pdev)
 {
 	device_initialize(&pdev->dev);
-	arch_setup_pdev_archdata(pdev);
+	setup_pdev_archdata(pdev);
 	return platform_device_add(pdev);
 }
 EXPORT_SYMBOL_GPL(platform_device_register);
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 1a9f38f27f65..d84ec1de6022 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -25,6 +25,7 @@ struct platform_device {
 	int		id;
 	bool		id_auto;
 	struct device	dev;
+	dma_addr_t	dma_mask;
 	u32		num_resources;
 	struct resource	*resource;
 
-- 
2.18.0

^ permalink raw reply related

* [PATCH] bpf: fix build error with clang
From: Stefan Agner @ 2018-08-27 19:30 UTC (permalink / raw)
  To: daniel; +Cc: kafai, ast, mka, netdev, linux-kernel, Stefan Agner

Building the newly introduced BPF_PROG_TYPE_SK_REUSEPORT leads to
a compile time error when building with clang:
net/core/filter.o: In function `sk_reuseport_convert_ctx_access':
  ../net/core/filter.c:7284: undefined reference to `__compiletime_assert_7284'

It seems that clang has issues resolving hweight_long at compile
time. Since SK_FL_PROTO_MASK is a constant, we can use the interface
for known constant arguments which works fine with clang.

Fixes: 2dbb9b9e6df6 ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Stefan Agner <stefan@agner.ch>
---
 net/core/filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index c25eb36f1320..7a2430945c71 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7281,7 +7281,7 @@ static u32 sk_reuseport_convert_ctx_access(enum bpf_access_type type,
 		break;
 
 	case offsetof(struct sk_reuseport_md, ip_protocol):
-		BUILD_BUG_ON(hweight_long(SK_FL_PROTO_MASK) != BITS_PER_BYTE);
+		BUILD_BUG_ON(HWEIGHT32(SK_FL_PROTO_MASK) != BITS_PER_BYTE);
 		SK_REUSEPORT_LOAD_SK_FIELD_SIZE_OFF(__sk_flags_offset,
 						    BPF_W, 0);
 		*insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, SK_FL_PROTO_MASK);
-- 
2.18.0

^ permalink raw reply related

* Re: [PATCH RFC net-next] net: Poptrie based FIB lookup
From: Jesper Dangaard Brouer @ 2018-08-27 15:33 UTC (permalink / raw)
  To: Md. Islam
  Cc: Netdev, David Miller, David Ahern, Eric Dumazet, Alexey Kuznetsov,
	Stephen Hemminger, makita.toshiaki, panda, yasuhiro.ohara, brouer,
	John Fastabend
In-Reply-To: <CAFgPn1CHgSKRMy1RgutEh9sragUK7WtqO_COzzcqZEaugP8+7Q@mail.gmail.com>


On Sun, 26 Aug 2018 22:13:53 -0400 "Md. Islam" <mislam4@kent.edu> wrote:

> This patch implements Poptrie [1] based FIB lookup. It exhibits pretty
> impressive lookup performance compared to LC-trie. This poptrie
> implementation however somewhat deviates from the original
> implementation [2]. I tested this patch very rigorously with several
> FIB tables containing half a million routes. I got same result as
> LC-trie based fib_lookup().

It sounds really promising performance wise. The article [1] claim
lookup speeds up-to 240 Million lookups per second.  That is a crazy
speed. This is 4.166 nanosec per lookup (1/240*1000), and their test
CPU is 3.9GHz, which gives them 16.25 CPU cycles for a lookup, with 3
insn per cycle, that gives them max 48 perfectly pipelined instructions
per lookup.

> Poptrie is intended to work in conjunction with LC-trie (not replace
> it). It is primarily designed to overcome many issues of TCAM based
> router [1]. [1] shows that the Poptrie can achieve very impressive
> lookup performance on CPU. This patch will mainly be used by XDP
> forwarding.
> 
> 1. Asai, Hirochika, and Yasuhiro Ohara. "Poptrie: A compressed trie
> with population count for fast and scalable software IP routing table
> lookup." ACM SIGCOMM Computer Communication Review. 2015.
> 
> 2. https://github.com/pixos/poptrie
> 
> From c5e05ea66b06eb9313749bc8969b4c2798fcf96a Mon Sep 17 00:00:00 2001
> From: tamimcse <tamim@csebuet.org>
> Date: Sun, 26 Aug 2018 21:12:38 -0400
> Subject: [PATCH] Implented Poptrie

This above "commit-info" should not be part of the patch description.

> Signed-off-by: tamimcse <tamim@csebuet.org>

Use you real/full name here.

> ---

First of order of business: You need to conform to the kernels coding
standards!

https://www.kernel.org/doc/html/v4.18/process/coding-style.html

There is a script avail to check this called: scripts/checkpatch.pl
It summary says:
 total: 139 errors, 238 warnings, 6 checks, 372 lines checked
(Not good, more error+warnings than lines...)

Please fix up those...


>  include/net/ip_fib.h   |  40 +++++++
>  net/ipv4/Makefile      |   2 +-
>  net/ipv4/fib_poptrie.c | 295 +++++++++++++++++++++++++++++++++++++++++++++++++
>  net/ipv4/fib_trie.c    |   3 +
>  4 files changed, 339 insertions(+), 1 deletion(-)
>  create mode 100644 net/ipv4/fib_poptrie.c
[...]

> diff --git a/net/ipv4/fib_poptrie.c b/net/ipv4/fib_poptrie.c
> new file mode 100644
> index 0000000..b3a88ab
> --- /dev/null
> +++ b/net/ipv4/fib_poptrie.c
> @@ -0,0 +1,295 @@
[....]
> +/*Insert a new node at index*/
> +static void insert_chield_node(struct poptrie_node *node,
> +        char index)
> +{

You are misspelling "child" as "chield"

> +    int i, j;
> +    struct poptrie_node *arr;
> +    int arr_size  = (int)hweight64(node->nodevec);
> +
> +    arr = kcalloc(arr_size + 1, sizeof(*arr), GFP_ATOMIC);
> +    for (i = 0, j = 0; i < (arr_size + 1); i++) {
> +        if (i != index && j < arr_size)
> +            arr[i] = node->chield_nodes[j++];
> +    }
> +
> +    kfree(node->chield_nodes);
> +    node->chield_nodes = arr;
> +}

[...]
> +/*We assume that pt->root is not NULL*/
> +void poptrie_lookup(struct poptrie *pt, __be32 dest, struct net_device **dev)
> +{
> +    register u32 index;
> +    register u64 bitmap, bitmask;
> +    register unsigned long leaf_index;
> +    register unsigned long node_index;
> +    register struct poptrie_node *node = pt->root;
> +    register u8 fib_index = pt->def_nh;
> +    register u8 carry = 0;
> +    register u8 carry_bit = 2;

Do you have performance data, that tell you that "register" is needed here?

> +    while (1) {
> +        /*Extract 6 bytes from dest */
> +        if (likely(carry_bit != 8)) {
> +            index = ((dest & 252) >> carry_bit) | carry;
> +            carry = (dest & ((1 << carry_bit) - 1)) << (6 - carry_bit);
> +            carry_bit = carry_bit + 2;
> +            dest = dest >> 8;
> +        } else {
> +            index = carry;
> +            carry = 0;
> +            carry_bit = 2;
> +        }
> +
> +        /*Create a bitmap based on the the extracted value*/
> +        bitmap = 1ULL << index;
> +        bitmask = bitmap - 1;
> +
> +        /*Find corresponding leaf*/
> +        if (likely(node->vector & bitmap)) {
> +            leaf_index = hweight64(node->leafvec & bitmask);

Just as help for reviewers, the popcnt instruction is here.

 hweight64 == popcnt

> +            if (!(node->leafvec & bitmap))
> +                leaf_index--;
> +            fib_index = node->leaves[leaf_index];
> +        }
> +
> +        /*Find corresponding node*/
> +        if (likely(node->nodevec & bitmap)) {
> +            node_index = hweight64(node->nodevec & bitmask);

And here.

> +            node = &node->chield_nodes[node_index];
> +            continue;
> +        }
> +
> +        *dev = get_fib(&pt->nhs, fib_index);
> +        return;
> +    }
> +}


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* GPL compliance issue with liquidio/lio_23xx_vsw.bin firmware
From: Florian Weimer @ 2018-08-27 15:01 UTC (permalink / raw)
  To: linux-firmware
  Cc: netdev@vger.kernel.org, Derek Chickles, Satanand Burla,
	Felix Manlunas, Raghu Vatsavayi, Manish Awasthi

liquidio/lio_23xx_vsw.bin contains a compiled MIPS Linux kernel:

$ tail --bytes=+1313 liquidio/lio_23xx_vsw.bin > elf
$ readelf -aW elf
[…]
   [ 6] __ksymtab         PROGBITS        ffffffff80e495f8 64a5f8 00d130 
00   A  0   0  8
   [ 7] __ksymtab_gpl     PROGBITS        ffffffff80e56728 657728 008400 
00   A  0   0  8
   [ 8] __ksymtab_strings PROGBITS        ffffffff80e5eb28 65fb28 018868 
00   A  0   0  1
[…]
Symbol table '.symtab' contains 1349 entries:
    Num:    Value          Size Type    Bind   Vis      Ndx Name
      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
      1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS 
arch/mips/kernel/head.o
      2: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS init/main.c
      3: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS 
include/linux/types.h
[…]

Yet there is no corresponding source provided, and LICENCE.cavium lacks 
the required notices.

Thanks,
Florian

^ permalink raw reply

* [PATCH 4/4] RFC: r8169: Disable clk during suspend / resume
From: Hans de Goede @ 2018-08-27 14:32 UTC (permalink / raw)
  To: David S . Miller, Heiner Kallweit, Michael Turquette,
	Stephen Boyd, Andy Shevchenko, Irina Tirdea
  Cc: Hans de Goede, netdev, Johannes Stezenbach, Carlo Caione,
	linux-clk
In-Reply-To: <20180827143200.8597-1-hdegoede@redhat.com>

Disable the clk during suspend to save power. Note that tp->clk may be
NULL, the clk core functions handle this without problems.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/net/ethernet/realtek/r8169.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 779b02979493..aebc90158bd9 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7312,8 +7312,10 @@ static int rtl8169_suspend(struct device *device)
 {
 	struct pci_dev *pdev = to_pci_dev(device);
 	struct net_device *dev = pci_get_drvdata(pdev);
+	struct rtl8169_private *tp = netdev_priv(dev);
 
 	rtl8169_net_suspend(dev);
+	clk_disable_unprepare(tp->clk);
 
 	return 0;
 }
@@ -7340,6 +7342,7 @@ static int rtl8169_resume(struct device *device)
 	struct net_device *dev = pci_get_drvdata(pdev);
 	struct rtl8169_private *tp = netdev_priv(dev);
 
+	clk_prepare_enable(tp->clk);
 	rtl8169_init_phy(dev, tp);
 
 	if (netif_running(dev))
-- 
2.18.0

^ permalink raw reply related

* [PATCH 3/4] clk: x86: Stop marking clocks as CLK_IS_CRITICAL
From: Hans de Goede @ 2018-08-27 14:31 UTC (permalink / raw)
  To: David S . Miller, Heiner Kallweit, Michael Turquette,
	Stephen Boyd, Andy Shevchenko, Irina Tirdea
  Cc: Hans de Goede, netdev, Johannes Stezenbach, Carlo Caione,
	linux-clk
In-Reply-To: <20180827143200.8597-1-hdegoede@redhat.com>

Commit d31fd43c0f9a ("clk: x86: Do not gate clocks enabled by the
firmware"), which added the code to mark clocks as CLK_IS_CRITICAL, causes
all unclaimed PMC clocks on Cherry Trail devices to be on all the time,
resulting on the device not being able to reach S0i3 when suspended.

The reason for this commit is that on some Bay Trail / Cherry Trail devices
the r8169 ethernet controller uses pmc_plt_clk_4. Now that the clk-pmc-atom
driver exports an "ether_clk" alias for pmc_plt_clk_4 and the r8169 driver
has been modified to get and enable this clock (if present) the marking of
the clocks as CLK_IS_CRITICAL is no longer necessary.

This commit removes the CLK_IS_CRITICAL marking, fixing Cherry Trail
devices not being able to reach S0i3 greatly decreasing their battery
drain when suspended.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102
Cc: Johannes Stezenbach <js@sig21.net>
Cc: Carlo Caione <carlo@endlessm.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/clk/x86/clk-pmc-atom.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/clk/x86/clk-pmc-atom.c b/drivers/clk/x86/clk-pmc-atom.c
index 75151901ff7d..d977193842df 100644
--- a/drivers/clk/x86/clk-pmc-atom.c
+++ b/drivers/clk/x86/clk-pmc-atom.c
@@ -187,13 +187,6 @@ static struct clk_plt *plt_clk_register(struct platform_device *pdev, int id,
 	pclk->reg = base + PMC_CLK_CTL_OFFSET + id * PMC_CLK_CTL_SIZE;
 	spin_lock_init(&pclk->lock);
 
-	/*
-	 * If the clock was already enabled by the firmware mark it as critical
-	 * to avoid it being gated by the clock framework if no driver owns it.
-	 */
-	if (plt_clk_is_enabled(&pclk->hw))
-		init.flags |= CLK_IS_CRITICAL;
-
 	ret = devm_clk_hw_register(&pdev->dev, &pclk->hw);
 	if (ret) {
 		pclk = ERR_PTR(ret);
-- 
2.18.0

^ permalink raw reply related

* [PATCH 2/4] r8169: Get and enable optional ether_clk clock
From: Hans de Goede @ 2018-08-27 14:31 UTC (permalink / raw)
  To: David S . Miller, Heiner Kallweit, Michael Turquette,
	Stephen Boyd, Andy Shevchenko, Irina Tirdea
  Cc: Hans de Goede, netdev, Johannes Stezenbach, Carlo Caione,
	linux-clk
In-Reply-To: <20180827143200.8597-1-hdegoede@redhat.com>

On some boards a platform clock is used as clock for the r8169 chip,
this commit adds support for getting and enabling this clock (assuming
it has an "ether_clk" alias set on it).

This is related to commit d31fd43c0f9a ("clk: x86: Do not gate clocks
enabled by the firmware") which is a previous attempt to fix this for some
x86 boards, but this causes all Cherry Trail SoC using boards to not reach
there lowest power states when suspending.

This commit (together with an atom-pmc-clk driver commit adding the alias)
fixes things properly by making the r8169 get the clock and enable it when
it needs it.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102
Cc: Johannes Stezenbach <js@sig21.net>
Cc: Carlo Caione <carlo@endlessm.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/net/ethernet/realtek/r8169.c | 33 ++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index eaedc11ed686..779b02979493 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -13,6 +13,7 @@
 #include <linux/pci.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
+#include <linux/clk.h>
 #include <linux/delay.h>
 #include <linux/ethtool.h>
 #include <linux/mii.h>
@@ -765,6 +766,7 @@ struct rtl8169_private {
 
 	u16 event_slow;
 	const struct rtl_coalesce_info *coalesce_info;
+	struct clk *clk;
 
 	struct mdio_ops {
 		void (*write)(struct rtl8169_private *, int, int);
@@ -7614,6 +7616,11 @@ static void rtl_hw_initialize(struct rtl8169_private *tp)
 	}
 }
 
+static void rtl_disable_clk(void *data)
+{
+	clk_disable_unprepare(data);
+}
+
 static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	const struct rtl_cfg_info *cfg = rtl_cfg_infos + ent->driver_data;
@@ -7647,6 +7654,32 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	mii->reg_num_mask = 0x1f;
 	mii->supports_gmii = cfg->has_gmii;
 
+	/* Get the *optional* external "ether_clk" used on some boards */
+	tp->clk = devm_clk_get(&pdev->dev, "ether_clk");
+	if (IS_ERR(tp->clk)) {
+		rc = PTR_ERR(tp->clk);
+		if (rc == -ENOENT) {
+			/* clk-core allows NULL (for suspend / resume) */
+			tp->clk = NULL;
+		} else if (rc == -EPROBE_DEFER) {
+			return rc;
+		} else {
+			dev_err(&pdev->dev, "failed to get clk: %d\n", rc);
+			return rc;
+		}
+	} else {
+		rc = clk_prepare_enable(tp->clk);
+		if (rc) {
+			dev_err(&pdev->dev, "failed to enable clk: %d\n", rc);
+			return rc;
+		}
+
+		rc = devm_add_action_or_reset(&pdev->dev, rtl_disable_clk,
+					      tp->clk);
+		if (rc)
+			return rc;
+	}
+
 	/* disable ASPM completely as that cause random device stop working
 	 * problems as well as full system hangs for some PCIe devices users */
 	pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 |
-- 
2.18.0

^ permalink raw reply related

* [PATCH 1/4] clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail
From: Hans de Goede @ 2018-08-27 14:31 UTC (permalink / raw)
  To: David S . Miller, Heiner Kallweit, Michael Turquette,
	Stephen Boyd, Andy Shevchenko, Irina Tirdea
  Cc: Hans de Goede, netdev, Johannes Stezenbach, Carlo Caione,
	linux-clk
In-Reply-To: <20180827143200.8597-1-hdegoede@redhat.com>

Commit d31fd43c0f9a ("clk: x86: Do not gate clocks enabled by the
firmware") causes all unclaimed PMC clocks on Cherry Trail devices to be on
all the time, resulting on the device not being able to reach S0i2 or S0i3
when suspended.

The reason for this commit is that on some Bay Trail / Cherry Trail devices
the ethernet controller uses pmc_plt_clk_4. This commit adds an "ether_clk"
alias, so that the relevant ethernet drivers can try to (optionally) use
this, without needing X86 specific code / hacks, thus fixing ethernet on
these devices without breaking S0i3 support.

This commit uses clkdev_hw_create() to create the alias, mirroring the code
for the already existing "mclk" alias for pmc_plt_clk_3.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102
Cc: Johannes Stezenbach <js@sig21.net>
Cc: Carlo Caione <carlo@endlessm.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/clk/x86/clk-pmc-atom.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/clk/x86/clk-pmc-atom.c b/drivers/clk/x86/clk-pmc-atom.c
index 08ef69945ffb..75151901ff7d 100644
--- a/drivers/clk/x86/clk-pmc-atom.c
+++ b/drivers/clk/x86/clk-pmc-atom.c
@@ -55,6 +55,7 @@ struct clk_plt_data {
 	u8 nparents;
 	struct clk_plt *clks[PMC_CLK_NUM];
 	struct clk_lookup *mclk_lookup;
+	struct clk_lookup *ether_clk_lookup;
 };
 
 /* Return an index in parent table */
@@ -351,11 +352,20 @@ static int plt_clk_probe(struct platform_device *pdev)
 		goto err_unreg_clk_plt;
 	}
 
+	data->ether_clk_lookup = clkdev_hw_create(&data->clks[4]->hw,
+						  "ether_clk", NULL);
+	if (!data->ether_clk_lookup) {
+		err = -ENOMEM;
+		goto err_drop_mclk;
+	}
+
 	plt_clk_free_parent_names_loop(parent_names, data->nparents);
 
 	platform_set_drvdata(pdev, data);
 	return 0;
 
+err_drop_mclk:
+	clkdev_drop(data->mclk_lookup);
 err_unreg_clk_plt:
 	plt_clk_unregister_loop(data, i);
 	plt_clk_unregister_parents(data);
@@ -369,6 +379,7 @@ static int plt_clk_remove(struct platform_device *pdev)
 
 	data = platform_get_drvdata(pdev);
 
+	clkdev_drop(data->ether_clk_lookup);
 	clkdev_drop(data->mclk_lookup);
 	plt_clk_unregister_loop(data, PMC_CLK_NUM);
 	plt_clk_unregister_parents(data);
-- 
2.18.0

^ permalink raw reply related

* [PATCH 0/4] clk-pmc-atom + r8169: Add ether_clk handling to fix suspend issues
From: Hans de Goede @ 2018-08-27 14:31 UTC (permalink / raw)
  To: David S . Miller, Heiner Kallweit, Michael Turquette,
	Stephen Boyd, Andy Shevchenko, Irina Tirdea
  Cc: Hans de Goede, netdev, Johannes Stezenbach, Carlo Caione,
	linux-clk

Hi All,

This series has as goal to revert commit d31fd43c0f9a ("clk: x86: Do not gate
clocks enabled by the firmware"), because that commit causes almost all
Cherry Trail devices to not use the S0i3 powerstate when suspending, leading
to them quickly draining their battery when suspended.

This commit was added to fix issues with the r8169 NIC on some Bay Trail
devices, where the pmc_plt_clk_4 was used as clk for the r8169 chip.

This series fixes this properly, so that we can undo the trouble some
commit.

The first patch adds an "ether_clk" alias to the clk-pmc-atom driver so
that the r8169 can pass "ether_clk" as generic id to clk_get instead of
having to add x86 specific code to the r8169 driver.

The second patch makes the r8169 driver do a clk_get for "ether_clk"
(ignoring -ENOENT errors so this is optional) and if that succeeds then
it enables the clock.

The third patch effectively revert the troublesome commit.

This series has been tested on a HP Stream x360 - 11-p000nd, which uses
a Bay Trail SoC with a r8169 ethernet chip and I can confirm that the
pmc_plt_clk_4 gets properly enabled, so this series should not cause any
regressions wrt devices needing pmc_plt_clk_4 enabled (which is not the
case on the Stream x360).

To avoid regressions we need to have patches 1 and 2 merged before 3,
so it is probably best if this entire series gets merged through a single
tree. Given that clk-pmc-atom.c has not seen any changes for over a
year I suggest that all 3 patches are merged through the netdev tree,
with an ack from the clk maintainers. Assuming that is ok with the clk
maintainers of course.

Note there is a fourth patch in this series, this patch is necessary to
reach S0i3 state on Bay Trail devices wich have a r8169 ethernet chip,
since I do not have access to hardware where the r8169 actually needs
the pmc_plt_clk_4 I have not been able to test this, hence it is marked
as RFC for now.

Carlos can you test the 4th patch (when you have time) and let us know if
it works?

Regards,

Hans

^ permalink raw reply

* Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL
From: Roman Mashak @ 2018-08-27 14:26 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jamal Hadi Salim, Al Viro, LKML, Cong Wang, Jiri Pirko,
	David S. Miller, Network Development
In-Reply-To: <CAGXu5jJD++hegH4j0ajZNejg8mgcHjqVyjveWj9BFaXwRCV_dQ@mail.gmail.com>

Kees Cook <keescook@chromium.org> writes:

> On Mon, Aug 27, 2018 at 4:46 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> On 2018-08-26 5:56 p.m., Kees Cook wrote:
>>>
>>> On Sun, Aug 26, 2018 at 10:30 AM, Jamal Hadi Salim <jhs@mojatatu.com>
>>> wrote:
>>>>
>>>> We should add an nla_policy later.
>>>
>>>
>>> What's the right way to do that for cases like this?
>>
>>
>> Meant something like attached which you alluded-to in your comments
>> would give an upper bound (Max allowed keys is 128).
>
> The problem is that policy doesn't parse the contents: "nkeys"
> determines the size, so we have to both validate minimum size (to be
> sure the location of "nkeys" is valid) and check that the size is at
> least nkeys * struct long. I don't think there is a way to do this
> with the existing policy language.

While at these changes, could you also add and export in UAPI max
allowed keys count, which is currently 128? For example,
TCA_U32_NKEYS_MAX in pkt_cls.h

^ permalink raw reply

* Re: KASAN: invalid-free in p9stat_free
From: Dmitry Vyukov @ 2018-08-27 14:25 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov,
	netdev, syzkaller-bugs, v9fs-developer
In-Reply-To: <20180827052412.GA26294@nautica>

On Sun, Aug 26, 2018 at 10:24 PM, Dominique Martinet
<asmadeus@codewreck.org> wrote:
> syzbot wrote on Sun, Aug 26, 2018:
>> HEAD commit:    e27bc174c9c6 Add linux-next specific files for 20180824
>> git tree:       linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=15dc19a6400000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=28446088176757ea
>> dashboard link: https://syzkaller.appspot.com/bug?extid=d4252148d198410b864f
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15f8efba400000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1178256a400000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+d4252148d198410b864f@syzkaller.appspotmail.com
>>
>> random: sshd: uninitialized urandom read (32 bytes read)
>> random: sshd: uninitialized urandom read (32 bytes read)
>> random: sshd: uninitialized urandom read (32 bytes read)
>> random: sshd: uninitialized urandom read (32 bytes read)
>> ==================================================================
>> BUG: KASAN: double-free or invalid-free in p9stat_free+0x35/0x100
>> net/9p/protocol.c:48
>
> That looks straight-forward enough, p9pdu_vreadf does p9stat_free on
> error then v9fs_dir_readdir does the same ; there is nothing else that
> could return an error without going through the first free so we could
> just remove the later one...
>
> There are a couple other users of the 'S' pdu read (that reads the stat
> struct and frees it on error), so it's probably best to keep the current
> behaviour as far as this is concerned, what we could do though is make
> the free function idempotent (write NULLs in the freed fields), but I do
> not see this being done often, do you know what the policy is about
> this kind of pattern nowadays?


Hi Dominique,

kfree and then null pointer is pretty common, try to run:

find -name "*.c" -exec grep -A 1 "kfree(" {} \; | grep -B 1 " = NULL;"

Leaving dangling pointers behind is not the best idea.
And from what I remember a bunch of similar double frees were fixed by
nulling the pointer after the first kfree.


> The struct is cleanly zeroed before being read so there is no risk of
> double-frees between iterations so zeroing pointers is not strictly
> required, but it does make things safer in general.

^ permalink raw reply

* Re: WARNING in format_decode (2)
From: Steven Rostedt @ 2018-08-27 17:46 UTC (permalink / raw)
  To: syzbot
  Cc: linux-kernel, mingo, syzkaller-bugs, Alexei Starovoitov,
	Daniel Borkmann, netdev
In-Reply-To: <000000000000e12d4105746dcb0d@google.com>

On Mon, 27 Aug 2018 10:10:04 -0700
syzbot <syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com> wrote:

> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    2ad0d5269970 Merge git://git.kernel.org/pub/scm/linux/kern..
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=15b8efba400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=79e695838ce7a210
> dashboard link: https://syzkaller.appspot.com/bug?extid=1ec5c5ec949c4adaa0c4
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1626f761400000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com
> 
> **                                                      **
> **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
> **********************************************************
> ------------[ cut here ]------------
> Please remove unsupported %WARNING: CPU: 0 PID: 6453 at lib/vsprintf.c:2149 format_decode+0x8fc/0xaf0  
> lib/vsprintf.c:2149
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 0 PID: 6453 Comm: syz-executor7 Not tainted 4.18.0+ #190
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
> Google 01/01/2011
> Call Trace:
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
>   panic+0x238/0x4e7 kernel/panic.c:184
>   __warn.cold.8+0x163/0x1ba kernel/panic.c:536
>   report_bug+0x252/0x2d0 lib/bug.c:186
>   fixup_bug arch/x86/kernel/traps.c:178 [inline]
>   do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
>   do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
>   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:993
> RIP: 0010:format_decode+0x8fc/0xaf0 lib/vsprintf.c:2149
> Code: e8 59 59 c9 fa 41 c6 04 24 12 e9 94 fd ff ff e8 4a 59 c9 fa 0f be f3  
> 48 c7 c7 60 bc 89 87 c6 05 28 aa d2 01 01 e8 e4 e9 93 fa <0f> 0b 4d 8b 7d  
> c0 e9 56 fe ff ff 48 8b bd 68 ff ff ff e8 cd 4f 08
> RSP: 0018:ffff8801b6b27688 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffffffff816422b1 RDI: ffff8801b6b27378
> RBP: ffff8801b6b27730 R08: ffff8801b69a0040 R09: 0000000000000006
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b6b277a8
> R13: ffff8801b6b27708 R14: 0000000000000000 R15: ffff8801b6b27b04
>   vsnprintf+0x185/0x1b60 lib/vsprintf.c:2245
>   vscnprintf+0x2d/0x80 lib/vsprintf.c:2396
>   __trace_array_vprintk.part.60+0xc7/0x330 kernel/trace/trace.c:2990
>   __trace_array_vprintk kernel/trace/trace.c:3021 [inline]
>   trace_array_vprintk kernel/trace/trace.c:3021 [inline]
>   trace_vprintk+0x5f/0x90 kernel/trace/trace.c:3059
>   __trace_printk+0xce/0x120 kernel/trace/trace_printk.c:237
>   ____bpf_trace_printk kernel/trace/bpf_trace.c:274 [inline]

Looks like a bug in the bpf trace printk code.

-- Steve

>   bpf_trace_printk+0xb16/0xc30 kernel/trace/bpf_trace.c:166
>   bpf_prog_e51deac1441bc083+0x94a/0x1000
> Dumping ftrace buffer:
>     (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
> syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH] net: mvpp2: initialize port of_node pointer
From: Andrew Lunn @ 2018-08-27 13:47 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Maxime Chevallier, Antoine Tenart, netdev, Russell King,
	Ori Shem-Tov, Jason Cooper, Gregory Clement,
	Sebastian Hesselbarth
In-Reply-To: <96a50db3fefc2f1be9ecb6bf8ca91c71508e8eb1.1535371973.git.baruch@tkos.co.il>

On Mon, Aug 27, 2018 at 03:12:53PM +0300, Baruch Siach wrote:
> Without a valid of_node in struct device we can't find the mvpp2 port
> device by its DT node. Specifically, this breaks
> of_find_net_device_by_node().

Hi Baruch

We need to be a little bit careful here. I've seen this done wrongly
before, breaking DSA support. Is you intention to use DSA? Can you
quote a section of DT, and indicate which node is port_node.

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH net] sctp: remove useless start_fail from sctp_ht_iter in proc
From: Marcelo Ricardo Leitner @ 2018-08-27 13:30 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Neil Horman
In-Reply-To: <265533f54ceb4684bb8323c9601a743eed409527.1535366418.git.lucien.xin@gmail.com>

On Mon, Aug 27, 2018 at 06:40:18PM +0800, Xin Long wrote:
> After changing rhashtable_walk_start to return void, start_fail would
> never be set other value than 0, and the checking for start_fail is
> pointless, so remove it.
> 
> Fixes: 97a6ec4ac021 ("rhashtable: Change rhashtable_walk_start to return void")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

> ---
>  net/sctp/proc.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> index 4d6f1c8..a644292 100644
> --- a/net/sctp/proc.c
> +++ b/net/sctp/proc.c
> @@ -215,7 +215,6 @@ static const struct seq_operations sctp_eps_ops = {
>  struct sctp_ht_iter {
>  	struct seq_net_private p;
>  	struct rhashtable_iter hti;
> -	int start_fail;
>  };
>  
>  static void *sctp_transport_seq_start(struct seq_file *seq, loff_t *pos)
> @@ -224,7 +223,6 @@ static void *sctp_transport_seq_start(struct seq_file *seq, loff_t *pos)
>  
>  	sctp_transport_walk_start(&iter->hti);
>  
> -	iter->start_fail = 0;
>  	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
>  }
>  
> @@ -232,8 +230,6 @@ static void sctp_transport_seq_stop(struct seq_file *seq, void *v)
>  {
>  	struct sctp_ht_iter *iter = seq->private;
>  
> -	if (iter->start_fail)
> -		return;
>  	sctp_transport_walk_stop(&iter->hti);
>  }
>  
> -- 
> 2.1.0
> 

^ permalink raw reply

* Re: [PATCH net] sctp: hold transport before accessing its asoc in sctp_transport_get_next
From: Marcelo Ricardo Leitner @ 2018-08-27 13:28 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Neil Horman
In-Reply-To: <607dd2950d09fc83404d670a73099523087d4963.1535366311.git.lucien.xin@gmail.com>

On Mon, Aug 27, 2018 at 06:38:31PM +0800, Xin Long wrote:
> As Marcelo noticed, in sctp_transport_get_next, it is iterating over
> transports but then also accessing the association directly, without
> checking any refcnts before that, which can cause an use-after-free
> Read.
> 
> So fix it by holding transport before accessing the association. With
> that, sctp_transport_hold calls can be removed in the later places.
> 
> Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
> Reported-by: syzbot+fe62a0c9aa6a85c6de16@syzkaller.appspotmail.com
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

> ---
>  net/sctp/proc.c   |  4 ----
>  net/sctp/socket.c | 22 +++++++++++++++-------
>  2 files changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> index ef5c9a8..4d6f1c8 100644
> --- a/net/sctp/proc.c
> +++ b/net/sctp/proc.c
> @@ -264,8 +264,6 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
>  	}
>  
>  	transport = (struct sctp_transport *)v;
> -	if (!sctp_transport_hold(transport))
> -		return 0;
>  	assoc = transport->asoc;
>  	epb = &assoc->base;
>  	sk = epb->sk;
> @@ -322,8 +320,6 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
>  	}
>  
>  	transport = (struct sctp_transport *)v;
> -	if (!sctp_transport_hold(transport))
> -		return 0;
>  	assoc = transport->asoc;
>  
>  	list_for_each_entry_rcu(tsp, &assoc->peer.transport_addr_list,
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index e96b15a..aa76586 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -5005,9 +5005,14 @@ struct sctp_transport *sctp_transport_get_next(struct net *net,
>  			break;
>  		}
>  
> +		if (!sctp_transport_hold(t))
> +			continue;
> +
>  		if (net_eq(sock_net(t->asoc->base.sk), net) &&
>  		    t->asoc->peer.primary_path == t)
>  			break;
> +
> +		sctp_transport_put(t);
>  	}
>  
>  	return t;
> @@ -5017,13 +5022,18 @@ struct sctp_transport *sctp_transport_get_idx(struct net *net,
>  					      struct rhashtable_iter *iter,
>  					      int pos)
>  {
> -	void *obj = SEQ_START_TOKEN;
> +	struct sctp_transport *t;
>  
> -	while (pos && (obj = sctp_transport_get_next(net, iter)) &&
> -	       !IS_ERR(obj))
> -		pos--;
> +	if (!pos)
> +		return SEQ_START_TOKEN;
>  
> -	return obj;
> +	while ((t = sctp_transport_get_next(net, iter)) && !IS_ERR(t)) {
> +		if (!--pos)
> +			break;
> +		sctp_transport_put(t);
> +	}
> +
> +	return t;
>  }
>  
>  int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
> @@ -5082,8 +5092,6 @@ int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
>  
>  	tsp = sctp_transport_get_idx(net, &hti, *pos + 1);
>  	for (; !IS_ERR_OR_NULL(tsp); tsp = sctp_transport_get_next(net, &hti)) {
> -		if (!sctp_transport_hold(tsp))
> -			continue;
>  		ret = cb(tsp, p);
>  		if (ret)
>  			break;
> -- 
> 2.1.0
> 

^ permalink raw reply

* Re: [PATCH net] sctp: remove useless start_fail from sctp_ht_iter in proc
From: Neil Horman @ 2018-08-27 13:13 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Marcelo Ricardo Leitner
In-Reply-To: <265533f54ceb4684bb8323c9601a743eed409527.1535366418.git.lucien.xin@gmail.com>

On Mon, Aug 27, 2018 at 06:40:18PM +0800, Xin Long wrote:
> After changing rhashtable_walk_start to return void, start_fail would
> never be set other value than 0, and the checking for start_fail is
> pointless, so remove it.
> 
> Fixes: 97a6ec4ac021 ("rhashtable: Change rhashtable_walk_start to return void")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/sctp/proc.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> index 4d6f1c8..a644292 100644
> --- a/net/sctp/proc.c
> +++ b/net/sctp/proc.c
> @@ -215,7 +215,6 @@ static const struct seq_operations sctp_eps_ops = {
>  struct sctp_ht_iter {
>  	struct seq_net_private p;
>  	struct rhashtable_iter hti;
> -	int start_fail;
>  };
>  
>  static void *sctp_transport_seq_start(struct seq_file *seq, loff_t *pos)
> @@ -224,7 +223,6 @@ static void *sctp_transport_seq_start(struct seq_file *seq, loff_t *pos)
>  
>  	sctp_transport_walk_start(&iter->hti);
>  
> -	iter->start_fail = 0;
>  	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
>  }
>  
> @@ -232,8 +230,6 @@ static void sctp_transport_seq_stop(struct seq_file *seq, void *v)
>  {
>  	struct sctp_ht_iter *iter = seq->private;
>  
> -	if (iter->start_fail)
> -		return;
>  	sctp_transport_walk_stop(&iter->hti);
>  }
>  
> -- 
> 2.1.0
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* Re: [PATCH net] sctp: hold transport before accessing its asoc in sctp_transport_get_next
From: Neil Horman @ 2018-08-27 13:08 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Marcelo Ricardo Leitner
In-Reply-To: <607dd2950d09fc83404d670a73099523087d4963.1535366311.git.lucien.xin@gmail.com>

On Mon, Aug 27, 2018 at 06:38:31PM +0800, Xin Long wrote:
> As Marcelo noticed, in sctp_transport_get_next, it is iterating over
> transports but then also accessing the association directly, without
> checking any refcnts before that, which can cause an use-after-free
> Read.
> 
> So fix it by holding transport before accessing the association. With
> that, sctp_transport_hold calls can be removed in the later places.
> 
> Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
> Reported-by: syzbot+fe62a0c9aa6a85c6de16@syzkaller.appspotmail.com
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/sctp/proc.c   |  4 ----
>  net/sctp/socket.c | 22 +++++++++++++++-------
>  2 files changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> index ef5c9a8..4d6f1c8 100644
> --- a/net/sctp/proc.c
> +++ b/net/sctp/proc.c
> @@ -264,8 +264,6 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
>  	}
>  
>  	transport = (struct sctp_transport *)v;
> -	if (!sctp_transport_hold(transport))
> -		return 0;
>  	assoc = transport->asoc;
>  	epb = &assoc->base;
>  	sk = epb->sk;
> @@ -322,8 +320,6 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
>  	}
>  
>  	transport = (struct sctp_transport *)v;
> -	if (!sctp_transport_hold(transport))
> -		return 0;
>  	assoc = transport->asoc;
>  
>  	list_for_each_entry_rcu(tsp, &assoc->peer.transport_addr_list,
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index e96b15a..aa76586 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -5005,9 +5005,14 @@ struct sctp_transport *sctp_transport_get_next(struct net *net,
>  			break;
>  		}
>  
> +		if (!sctp_transport_hold(t))
> +			continue;
> +
>  		if (net_eq(sock_net(t->asoc->base.sk), net) &&
>  		    t->asoc->peer.primary_path == t)
>  			break;
> +
> +		sctp_transport_put(t);
>  	}
>  
>  	return t;
> @@ -5017,13 +5022,18 @@ struct sctp_transport *sctp_transport_get_idx(struct net *net,
>  					      struct rhashtable_iter *iter,
>  					      int pos)
>  {
> -	void *obj = SEQ_START_TOKEN;
> +	struct sctp_transport *t;
>  
> -	while (pos && (obj = sctp_transport_get_next(net, iter)) &&
> -	       !IS_ERR(obj))
> -		pos--;
> +	if (!pos)
> +		return SEQ_START_TOKEN;
>  
> -	return obj;
> +	while ((t = sctp_transport_get_next(net, iter)) && !IS_ERR(t)) {
> +		if (!--pos)
> +			break;
> +		sctp_transport_put(t);
> +	}
> +
> +	return t;
>  }
>  
>  int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
> @@ -5082,8 +5092,6 @@ int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
>  
>  	tsp = sctp_transport_get_idx(net, &hti, *pos + 1);
>  	for (; !IS_ERR_OR_NULL(tsp); tsp = sctp_transport_get_next(net, &hti)) {
> -		if (!sctp_transport_hold(tsp))
> -			continue;
>  		ret = cb(tsp, p);
>  		if (ret)
>  			break;
> -- 
> 2.1.0
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

Additionally, its not germaine to this particular fix, but why are we still
using that pos variable in sctp_transport_get_idx?  With the conversion to
rhashtables, it doesn't seem particularly useful anymore.

Neil

^ permalink raw reply

* Re: [PATCH net-next v3 02/10] net: mvpp2: phylink support
From: Russell King - ARM Linux @ 2018-08-27 16:50 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, kishon, gregory.clement, andrew, jason,
	sebastian.hesselbarth, netdev, linux-kernel, thomas.petazzoni,
	maxime.chevallier, miquel.raynal, nadavh, stefanc, ymarkman, mw,
	linux-arm-kernel
In-Reply-To: <20180517082939.14598-3-antoine.tenart@bootlin.com>

On Thu, May 17, 2018 at 10:29:31AM +0200, Antoine Tenart wrote:
> +static void mvpp2_mac_config(struct net_device *dev, unsigned int mode,
> +			     const struct phylink_link_state *state)
> +{
> +	struct mvpp2_port *port = netdev_priv(dev);
> +
> +	/* Check for invalid configuration */
> +	if (state->interface == PHY_INTERFACE_MODE_10GKR && port->gop_id != 0) {
> +		netdev_err(dev, "Invalid mode on %s\n", dev->name);
> +		return;
> +	}
> +
> +	netif_tx_stop_all_queues(port->dev);
> +	if (!port->has_phy)
> +		netif_carrier_off(port->dev);
...
> +	/* If the port already was up, make sure it's still in the same state */
> +	if (state->link || !port->has_phy) {
> +		mvpp2_port_enable(port);
> +
> +		mvpp2_egress_enable(port);
> +		mvpp2_ingress_enable(port);
> +		if (!port->has_phy)
> +			netif_carrier_on(dev);
> +		netif_tx_wake_all_queues(dev);
> +	}

I'm guessing that the above is what is classed as "small fixes"
in the cover letter for this series - as such I didn't look closely
at this patch, but I should have, because this is not a "small fix".
I don't find the above code in previous patches, and I don't find
any description about why this has changed.

I'm on record for having said (many times) that drivers that are
converted to phylink should _not_ mess with the net device carrier
state themselves, but should leave it to phylink to manage.

The above appears to cause a problem when testing on a singleshot
mcbin board - as a direct result of the above, I find that _all_
the network devices apparently have link, including those which
have no SFP inserted into the cage.  This results in debian jessie
hanging at boot for over 1 minute while systemd tries to bring up
all network devices.

The same should be true of the doubleshot board's 1G cage - which
is the same as the singleshot in every respect, and the singleshot
exhibits this problem there as well.  I haven't actually tested
this on the doubleshot though.

Have you identified a case where mac_config() is called with the
link up for which a major reconfiguration is required?  mac_config()
will get called for small reconfigurations such as changing the
pause parameters (which should _not_ require the queues to be
restarted) but the link should always be down for major
reconfigurations such sa changing the PHY interface mode.

mac_config() can also be called when nothing has changed - if we've
decided to re-run the link state resolve, it can be called with the
same parameters as previously, especially now that we have polling
of a GPIO for link state monitoring.  With your code above, this will
cause mvpp2 to down/up the carrier state every second.

Note that state->link here is the _future_ state of the link, which
may change state before the mac_link_up()/down() functions have been
called - turning the carrier on/off like this will prevent these
callbacks being made, and the link state being printed by phylink.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox