Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v3 bpf-next 05/12] libbpf: add resizable non-thread safe internal hashmap
From: Andrii Nakryiko @ 2019-07-18  3:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Networking,
	bpf, Kernel Team, linux-perf-users, Jiri Olsa, Namhyung Kim,
	Adrian Hunter
In-Reply-To: <20190718002436.GA18962@kernel.org>

On Wed, Jul 17, 2019 at 5:24 PM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> Em Fri, May 24, 2019 at 11:59:00AM -0700, Andrii Nakryiko escreveu:
> > There is a need for fast point lookups inside libbpf for multiple use
> > cases (e.g., name resolution for BTF-to-C conversion, by-name lookups in
> > BTF for upcoming BPF CO-RE relocation support, etc). This patch
> > implements simple resizable non-thread safe hashmap using single linked
> > list chains.
>
> This is breaking the tools/perf build in some systems where __WORDSIZE use in
> tools/lib/bpf/hashmap.h is not finding the definition of __WORDSIZE,
> which I think requires using:
>
> #include <limits.h>
>
> Or perhaps
>
> #include <stdint.h>
>
> Non-exhaustive search on a fedora:30 system:
>
> [acme@quaco perf]$ grep "define.*__WORDSIZE" /usr/include/*/*.h
> /usr/include/bits/elfclass.h:#define __ELF_NATIVE_CLASS __WORDSIZE
> /usr/include/bits/siginfo-arch.h:#if defined __x86_64__ && __WORDSIZE == 32
> /usr/include/bits/timesize.h:# define __TIMESIZE        __WORDSIZE
> /usr/include/bits/wordsize.h:# define __WORDSIZE        64
> /usr/include/bits/wordsize.h:# define __WORDSIZE        32
> /usr/include/bits/wordsize.h:#define __WORDSIZE32_SIZE_ULONG            0
> /usr/include/bits/wordsize.h:#define __WORDSIZE32_PTRDIFF_LONG  0
> /usr/include/bits/wordsize.h:# define __WORDSIZE_TIME64_COMPAT32        1
> /usr/include/bits/wordsize.h:# define __WORDSIZE_TIME64_COMPAT32        0
> [acme@quaco perf]$ grep bits\/wordsize.h /usr/include/*.h
> /usr/include/bfd.h:#include <bits/wordsize.h>
> /usr/include/limits.h:#include <bits/wordsize.h>
> /usr/include/pthread.h:#include <bits/wordsize.h>
> /usr/include/stdint.h:#include <bits/wordsize.h>
> /usr/include/tiffconf.h:#include <bits/wordsize.h>
> [acme@quaco perf]$
>
> On fedora:30 it works, probably because some header included from there
> ends up adding the file that has that def, but these fail, for instance:
>
> [perfbuilder@quaco linux-perf-tools-build]$ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0.tar.xz
> [perfbuilder@quaco linux-perf-tools-build]$ time dm
>    1    13.57 alpine:3.4                    : FAIL gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
>    2    13.78 alpine:3.5                    : FAIL gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
>    3    12.42 alpine:3.6                    : FAIL gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
>    4    15.01 alpine:3.7                    : FAIL gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
>    5    13.80 alpine:3.8                    : FAIL gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
>    6    15.37 alpine:3.9                    : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
>    7    15.10 alpine:3.10                   : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
>    8    15.26 alpine:edge                   : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 7.0.1 (tags/RELEASE_701/final) (based on LLVM 7.0.1)
>    9: amazonlinux:1^C
>
> I'm trying to see if adding #include <limits.h> fixes the problems in
> all the distros in my container based build test suite.
>
> Lets see with at least alpine:3.4:
>
> Nope, didn't work, will try revisiting this tomorrow...
>
> I'll let it build anyway to see if this fails in other systems/libcs,
> BTW, Alpine uses musl libc.

So it seems like __WORDSIZE is declared in bits/wordsize.h for glibc
and in bits/reg.h for musl, so something like the following should fix
this issue, but I can't easily verify. Can you please see if these
changes fix the issue on your side?

#ifdef __GLIBC__
#include <bits/wordsize.h>
#else
#include <bits/reg.h>
#endif

>
> - Arnaldo
>
> > Four different insert strategies are supported:
> >  - HASHMAP_ADD - only add key/value if key doesn't exist yet;
> >  - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise,
> >    update value;
> >  - HASHMAP_UPDATE - update value, if key already exists; otherwise, do
> >    nothing and return -ENOENT;
> >  - HASHMAP_APPEND - always add key/value pair, even if key already exists.
> >    This turns hashmap into a multimap by allowing multiple values to be
> >    associated with the same key. Most useful read API for such hashmap is
> >    hashmap__for_each_key_entry() iteration. If hashmap__find() is still
> >    used, it will return last inserted key/value entry (first in a bucket
> >    chain).
> >
> > For HASHMAP_SET and HASHMAP_UPDATE, old key/value pair is returned, so
> > that calling code can handle proper memory management, if necessary.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> >  tools/lib/bpf/Build     |   2 +-
> >  tools/lib/bpf/hashmap.c | 229 ++++++++++++++++++++++++++++++++++++++++
> >  tools/lib/bpf/hashmap.h | 173 ++++++++++++++++++++++++++++++
> >  3 files changed, 403 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/lib/bpf/hashmap.c
> >  create mode 100644 tools/lib/bpf/hashmap.h
> >
> > diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
> > index ee9d5362f35b..dcf0d36598e0 100644
> > --- a/tools/lib/bpf/Build
> > +++ b/tools/lib/bpf/Build
> > @@ -1 +1 @@
> > -libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o
> > +libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o hashmap.o
> > diff --git a/tools/lib/bpf/hashmap.c b/tools/lib/bpf/hashmap.c
> > new file mode 100644
> > index 000000000000..6122272943e6
> > --- /dev/null
> > +++ b/tools/lib/bpf/hashmap.c
> > @@ -0,0 +1,229 @@
> > +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> > +
> > +/*
> > + * Generic non-thread safe hash map implementation.
> > + *
> > + * Copyright (c) 2019 Facebook
> > + */
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +#include <stdio.h>
> > +#include <errno.h>
> > +#include <linux/err.h>
> > +#include "hashmap.h"
> > +
> > +/* start with 4 buckets */
> > +#define HASHMAP_MIN_CAP_BITS 2
> > +
> > +static void hashmap_add_entry(struct hashmap_entry **pprev,
> > +                           struct hashmap_entry *entry)
> > +{
> > +     entry->next = *pprev;
> > +     *pprev = entry;
> > +}
> > +
> > +static void hashmap_del_entry(struct hashmap_entry **pprev,
> > +                           struct hashmap_entry *entry)
> > +{
> > +     *pprev = entry->next;
> > +     entry->next = NULL;
> > +}
> > +
> > +void hashmap__init(struct hashmap *map, hashmap_hash_fn hash_fn,
> > +                hashmap_equal_fn equal_fn, void *ctx)
> > +{
> > +     map->hash_fn = hash_fn;
> > +     map->equal_fn = equal_fn;
> > +     map->ctx = ctx;
> > +
> > +     map->buckets = NULL;
> > +     map->cap = 0;
> > +     map->cap_bits = 0;
> > +     map->sz = 0;
> > +}
> > +
> > +struct hashmap *hashmap__new(hashmap_hash_fn hash_fn,
> > +                          hashmap_equal_fn equal_fn,
> > +                          void *ctx)
> > +{
> > +     struct hashmap *map = malloc(sizeof(struct hashmap));
> > +
> > +     if (!map)
> > +             return ERR_PTR(-ENOMEM);
> > +     hashmap__init(map, hash_fn, equal_fn, ctx);
> > +     return map;
> > +}
> > +
> > +void hashmap__clear(struct hashmap *map)
> > +{
> > +     free(map->buckets);
> > +     map->cap = map->cap_bits = map->sz = 0;
> > +}
> > +
> > +void hashmap__free(struct hashmap *map)
> > +{
> > +     if (!map)
> > +             return;
> > +
> > +     hashmap__clear(map);
> > +     free(map);
> > +}
> > +
> > +size_t hashmap__size(const struct hashmap *map)
> > +{
> > +     return map->sz;
> > +}
> > +
> > +size_t hashmap__capacity(const struct hashmap *map)
> > +{
> > +     return map->cap;
> > +}
> > +
> > +static bool hashmap_needs_to_grow(struct hashmap *map)
> > +{
> > +     /* grow if empty or more than 75% filled */
> > +     return (map->cap == 0) || ((map->sz + 1) * 4 / 3 > map->cap);
> > +}
> > +
> > +static int hashmap_grow(struct hashmap *map)
> > +{
> > +     struct hashmap_entry **new_buckets;
> > +     struct hashmap_entry *cur, *tmp;
> > +     size_t new_cap_bits, new_cap;
> > +     size_t h;
> > +     int bkt;
> > +
> > +     new_cap_bits = map->cap_bits + 1;
> > +     if (new_cap_bits < HASHMAP_MIN_CAP_BITS)
> > +             new_cap_bits = HASHMAP_MIN_CAP_BITS;
> > +
> > +     new_cap = 1UL << new_cap_bits;
> > +     new_buckets = calloc(new_cap, sizeof(new_buckets[0]));
> > +     if (!new_buckets)
> > +             return -ENOMEM;
> > +
> > +     hashmap__for_each_entry_safe(map, cur, tmp, bkt) {
> > +             h = hash_bits(map->hash_fn(cur->key, map->ctx), new_cap_bits);
> > +             hashmap_add_entry(&new_buckets[h], cur);
> > +     }
> > +
> > +     map->cap = new_cap;
> > +     map->cap_bits = new_cap_bits;
> > +     free(map->buckets);
> > +     map->buckets = new_buckets;
> > +
> > +     return 0;
> > +}
> > +
> > +static bool hashmap_find_entry(const struct hashmap *map,
> > +                            const void *key, size_t hash,
> > +                            struct hashmap_entry ***pprev,
> > +                            struct hashmap_entry **entry)
> > +{
> > +     struct hashmap_entry *cur, **prev_ptr;
> > +
> > +     if (!map->buckets)
> > +             return false;
> > +
> > +     for (prev_ptr = &map->buckets[hash], cur = *prev_ptr;
> > +          cur;
> > +          prev_ptr = &cur->next, cur = cur->next) {
> > +             if (map->equal_fn(cur->key, key, map->ctx)) {
> > +                     if (pprev)
> > +                             *pprev = prev_ptr;
> > +                     *entry = cur;
> > +                     return true;
> > +             }
> > +     }
> > +
> > +     return false;
> > +}
> > +
> > +int hashmap__insert(struct hashmap *map, const void *key, void *value,
> > +                 enum hashmap_insert_strategy strategy,
> > +                 const void **old_key, void **old_value)
> > +{
> > +     struct hashmap_entry *entry;
> > +     size_t h;
> > +     int err;
> > +
> > +     if (old_key)
> > +             *old_key = NULL;
> > +     if (old_value)
> > +             *old_value = NULL;
> > +
> > +     h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> > +     if (strategy != HASHMAP_APPEND &&
> > +         hashmap_find_entry(map, key, h, NULL, &entry)) {
> > +             if (old_key)
> > +                     *old_key = entry->key;
> > +             if (old_value)
> > +                     *old_value = entry->value;
> > +
> > +             if (strategy == HASHMAP_SET || strategy == HASHMAP_UPDATE) {
> > +                     entry->key = key;
> > +                     entry->value = value;
> > +                     return 0;
> > +             } else if (strategy == HASHMAP_ADD) {
> > +                     return -EEXIST;
> > +             }
> > +     }
> > +
> > +     if (strategy == HASHMAP_UPDATE)
> > +             return -ENOENT;
> > +
> > +     if (hashmap_needs_to_grow(map)) {
> > +             err = hashmap_grow(map);
> > +             if (err)
> > +                     return err;
> > +             h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> > +     }
> > +
> > +     entry = malloc(sizeof(struct hashmap_entry));
> > +     if (!entry)
> > +             return -ENOMEM;
> > +
> > +     entry->key = key;
> > +     entry->value = value;
> > +     hashmap_add_entry(&map->buckets[h], entry);
> > +     map->sz++;
> > +
> > +     return 0;
> > +}
> > +
> > +bool hashmap__find(const struct hashmap *map, const void *key, void **value)
> > +{
> > +     struct hashmap_entry *entry;
> > +     size_t h;
> > +
> > +     h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> > +     if (!hashmap_find_entry(map, key, h, NULL, &entry))
> > +             return false;
> > +
> > +     if (value)
> > +             *value = entry->value;
> > +     return true;
> > +}
> > +
> > +bool hashmap__delete(struct hashmap *map, const void *key,
> > +                  const void **old_key, void **old_value)
> > +{
> > +     struct hashmap_entry **pprev, *entry;
> > +     size_t h;
> > +
> > +     h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> > +     if (!hashmap_find_entry(map, key, h, &pprev, &entry))
> > +             return false;
> > +
> > +     if (old_key)
> > +             *old_key = entry->key;
> > +     if (old_value)
> > +             *old_value = entry->value;
> > +
> > +     hashmap_del_entry(pprev, entry);
> > +     free(entry);
> > +     map->sz--;
> > +
> > +     return true;
> > +}
> > +
> > diff --git a/tools/lib/bpf/hashmap.h b/tools/lib/bpf/hashmap.h
> > new file mode 100644
> > index 000000000000..03748a742146
> > --- /dev/null
> > +++ b/tools/lib/bpf/hashmap.h
> > @@ -0,0 +1,173 @@
> > +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> > +
> > +/*
> > + * Generic non-thread safe hash map implementation.
> > + *
> > + * Copyright (c) 2019 Facebook
> > + */
> > +#ifndef __LIBBPF_HASHMAP_H
> > +#define __LIBBPF_HASHMAP_H
> > +
> > +#include <stdbool.h>
> > +#include <stddef.h>
> > +#include "libbpf_internal.h"
> > +
> > +static inline size_t hash_bits(size_t h, int bits)
> > +{
> > +     /* shuffle bits and return requested number of upper bits */
> > +     return (h * 11400714819323198485llu) >> (__WORDSIZE - bits);
> > +}
> > +
> > +typedef size_t (*hashmap_hash_fn)(const void *key, void *ctx);
> > +typedef bool (*hashmap_equal_fn)(const void *key1, const void *key2, void *ctx);
> > +
> > +struct hashmap_entry {
> > +     const void *key;
> > +     void *value;
> > +     struct hashmap_entry *next;
> > +};
> > +
> > +struct hashmap {
> > +     hashmap_hash_fn hash_fn;
> > +     hashmap_equal_fn equal_fn;
> > +     void *ctx;
> > +
> > +     struct hashmap_entry **buckets;
> > +     size_t cap;
> > +     size_t cap_bits;
> > +     size_t sz;
> > +};
> > +
> > +#define HASHMAP_INIT(hash_fn, equal_fn, ctx) {       \
> > +     .hash_fn = (hash_fn),                   \
> > +     .equal_fn = (equal_fn),                 \
> > +     .ctx = (ctx),                           \
> > +     .buckets = NULL,                        \
> > +     .cap = 0,                               \
> > +     .cap_bits = 0,                          \
> > +     .sz = 0,                                \
> > +}
> > +
> > +void hashmap__init(struct hashmap *map, hashmap_hash_fn hash_fn,
> > +                hashmap_equal_fn equal_fn, void *ctx);
> > +struct hashmap *hashmap__new(hashmap_hash_fn hash_fn,
> > +                          hashmap_equal_fn equal_fn,
> > +                          void *ctx);
> > +void hashmap__clear(struct hashmap *map);
> > +void hashmap__free(struct hashmap *map);
> > +
> > +size_t hashmap__size(const struct hashmap *map);
> > +size_t hashmap__capacity(const struct hashmap *map);
> > +
> > +/*
> > + * Hashmap insertion strategy:
> > + * - HASHMAP_ADD - only add key/value if key doesn't exist yet;
> > + * - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise,
> > + *   update value;
> > + * - HASHMAP_UPDATE - update value, if key already exists; otherwise, do
> > + *   nothing and return -ENOENT;
> > + * - HASHMAP_APPEND - always add key/value pair, even if key already exists.
> > + *   This turns hashmap into a multimap by allowing multiple values to be
> > + *   associated with the same key. Most useful read API for such hashmap is
> > + *   hashmap__for_each_key_entry() iteration. If hashmap__find() is still
> > + *   used, it will return last inserted key/value entry (first in a bucket
> > + *   chain).
> > + */
> > +enum hashmap_insert_strategy {
> > +     HASHMAP_ADD,
> > +     HASHMAP_SET,
> > +     HASHMAP_UPDATE,
> > +     HASHMAP_APPEND,
> > +};
> > +
> > +/*
> > + * hashmap__insert() adds key/value entry w/ various semantics, depending on
> > + * provided strategy value. If a given key/value pair replaced already
> > + * existing key/value pair, both old key and old value will be returned
> > + * through old_key and old_value to allow calling code do proper memory
> > + * management.
> > + */
> > +int hashmap__insert(struct hashmap *map, const void *key, void *value,
> > +                 enum hashmap_insert_strategy strategy,
> > +                 const void **old_key, void **old_value);
> > +
> > +static inline int hashmap__add(struct hashmap *map,
> > +                            const void *key, void *value)
> > +{
> > +     return hashmap__insert(map, key, value, HASHMAP_ADD, NULL, NULL);
> > +}
> > +
> > +static inline int hashmap__set(struct hashmap *map,
> > +                            const void *key, void *value,
> > +                            const void **old_key, void **old_value)
> > +{
> > +     return hashmap__insert(map, key, value, HASHMAP_SET,
> > +                            old_key, old_value);
> > +}
> > +
> > +static inline int hashmap__update(struct hashmap *map,
> > +                               const void *key, void *value,
> > +                               const void **old_key, void **old_value)
> > +{
> > +     return hashmap__insert(map, key, value, HASHMAP_UPDATE,
> > +                            old_key, old_value);
> > +}
> > +
> > +static inline int hashmap__append(struct hashmap *map,
> > +                               const void *key, void *value)
> > +{
> > +     return hashmap__insert(map, key, value, HASHMAP_APPEND, NULL, NULL);
> > +}
> > +
> > +bool hashmap__delete(struct hashmap *map, const void *key,
> > +                  const void **old_key, void **old_value);
> > +
> > +bool hashmap__find(const struct hashmap *map, const void *key, void **value);
> > +
> > +/*
> > + * hashmap__for_each_entry - iterate over all entries in hashmap
> > + * @map: hashmap to iterate
> > + * @cur: struct hashmap_entry * used as a loop cursor
> > + * @bkt: integer used as a bucket loop cursor
> > + */
> > +#define hashmap__for_each_entry(map, cur, bkt)                                   \
> > +     for (bkt = 0; bkt < map->cap; bkt++)                                \
> > +             for (cur = map->buckets[bkt]; cur; cur = cur->next)
> > +
> > +/*
> > + * hashmap__for_each_entry_safe - iterate over all entries in hashmap, safe
> > + * against removals
> > + * @map: hashmap to iterate
> > + * @cur: struct hashmap_entry * used as a loop cursor
> > + * @tmp: struct hashmap_entry * used as a temporary next cursor storage
> > + * @bkt: integer used as a bucket loop cursor
> > + */
> > +#define hashmap__for_each_entry_safe(map, cur, tmp, bkt)                 \
> > +     for (bkt = 0; bkt < map->cap; bkt++)                                \
> > +             for (cur = map->buckets[bkt];                               \
> > +                  cur && ({tmp = cur->next; true; });                    \
> > +                  cur = tmp)
> > +
> > +/*
> > + * hashmap__for_each_key_entry - iterate over entries associated with given key
> > + * @map: hashmap to iterate
> > + * @cur: struct hashmap_entry * used as a loop cursor
> > + * @key: key to iterate entries for
> > + */
> > +#define hashmap__for_each_key_entry(map, cur, _key)                      \
> > +     for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\
> > +                                          map->cap_bits);                \
> > +                  map->buckets ? map->buckets[bkt] : NULL; });           \
> > +          cur;                                                           \
> > +          cur = cur->next)                                               \
> > +             if (map->equal_fn(cur->key, (_key), map->ctx))
> > +
> > +#define hashmap__for_each_key_entry_safe(map, cur, tmp, _key)                    \
> > +     for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\
> > +                                          map->cap_bits);                \
> > +                  cur = map->buckets ? map->buckets[bkt] : NULL; });     \
> > +          cur && ({ tmp = cur->next; true; });                           \
> > +          cur = tmp)                                                     \
> > +             if (map->equal_fn(cur->key, (_key), map->ctx))
> > +
> > +#endif /* __LIBBPF_HASHMAP_H */
> > --
> > 2.17.1
>
> --
>
> - Arnaldo

^ permalink raw reply

* Re: [PATCH] net: bcmgenet: use promisc for unsupported filters
From: Florian Fainelli @ 2019-07-18  3:38 UTC (permalink / raw)
  To: justinpopo6, netdev
  Cc: linux-kernel, bcm-kernel-feedback-list, davem, opendmb
In-Reply-To: <1563400733-39451-1-git-send-email-justinpopo6@gmail.com>



On 7/17/2019 2:58 PM, justinpopo6@gmail.com wrote:
> From: Justin Chen <justinpopo6@gmail.com>
> 
> Currently we silently ignore filters if we cannot meet the filter
> requirements. This will lead to the MAC dropping packets that are
> expected to pass. A better solution would be to set the NIC to promisc
> mode when the required filters cannot be met.
> 
> Also correct the number of MDF filters supported. It should be 17,
> not 16.
> 
> Signed-off-by: Justin Chen <justinpopo6@gmail.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

Thanks Justin!
-- 
Florian

^ permalink raw reply

* Re: KASAN: use-after-free Read in nr_release
From: syzbot @ 2019-07-18  4:05 UTC (permalink / raw)
  To: davem, hdanton, linux-hams, linux-kernel, netdev, ralf,
	syzkaller-bugs, xiyou.wangcong
In-Reply-To: <0000000000007e8b70058acbd60f@google.com>

syzbot has bisected this bug to:

commit c8c8218ec5af5d2598381883acbefbf604e56b5e
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Thu Jun 27 21:30:58 2019 +0000

     netrom: fix a memory leak in nr_rx_frame()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=10a3bcd0600000
start commit:   192f0f8e Merge tag 'powerpc-5.3-1' of git://git.kernel.org..
git tree:       net-next
final crash:    https://syzkaller.appspot.com/x/report.txt?x=12a3bcd0600000
console output: https://syzkaller.appspot.com/x/log.txt?x=14a3bcd0600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=87305c3ca9c25c70
dashboard link: https://syzkaller.appspot.com/bug?extid=6eaef7158b19e3fec3a0
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15882cd0600000

Reported-by: syzbot+6eaef7158b19e3fec3a0@syzkaller.appspotmail.com
Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply

* [PATCH] connector: remove redundant input callback from cn_dev
From: Vasily Averin @ 2019-07-18  4:26 UTC (permalink / raw)
  To: linux-kernel, Evgeniy Polyakov
  Cc: stanislav.kinsburskiy, Linux Kernel Network Developers

A small cleanup: this callback is never used.
Originally fixed by Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
for OpenVZ7 bug OVZ-6877

cc: stanislav.kinsburskiy@gmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
 drivers/connector/connector.c | 6 +-----
 include/linux/connector.h     | 1 -
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 23553ed6b548..2d22d6bf52f2 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -248,16 +248,12 @@ static int __maybe_unused cn_proc_show(struct seq_file *m, void *v)
 	return 0;
 }
 
-static struct cn_dev cdev = {
-	.input   = cn_rx_skb,
-};
-
 static int cn_init(void)
 {
 	struct cn_dev *dev = &cdev;
 	struct netlink_kernel_cfg cfg = {
 		.groups	= CN_NETLINK_USERS + 0xf,
-		.input	= dev->input,
+		.input	= cn_rx_skb,
 	};
 
 	dev->nls = netlink_kernel_create(&init_net, NETLINK_CONNECTOR, &cfg);
diff --git a/include/linux/connector.h b/include/linux/connector.h
index 1d72ef76f24f..bc18f04e8b46 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -50,7 +50,6 @@ struct cn_dev {
 
 	u32 seq, groups;
 	struct sock *nls;
-	void (*input) (struct sk_buff *skb);
 
 	struct cn_queue_dev *cbdev;
 };
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH] net: dsa: sja1105: Release lock in error case
From: Yuehaibing @ 2019-07-18  6:53 UTC (permalink / raw)
  To: Hariprasad Kelam, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S. Miller, netdev, linux-kernel
In-Reply-To: <20190718022110.GA19222@hari-Inspiron-1545>

https://patchwork.ozlabs.org/patch/1133135/

This has been fixed.

On 2019/7/18 10:21, Hariprasad Kelam wrote:
> This patch adds release of unlock in fail case.
> 
> Issue identified by coccicheck
> 
> Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
> ---
>  net/dsa/tag_sja1105.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
> index 1d96c9d..26363d7 100644
> --- a/net/dsa/tag_sja1105.c
> +++ b/net/dsa/tag_sja1105.c
> @@ -216,6 +216,7 @@ static struct sk_buff
>  		if (!skb) {
>  			dev_err_ratelimited(dp->ds->dev,
>  					    "Failed to copy stampable skb\n");
> +			spin_unlock(&sp->data->meta_lock);
>  			return NULL;
>  		}
>  		sja1105_transfer_meta(skb, meta);
> 


^ permalink raw reply

* Re: regression with napi/softirq ?
From: Eric Dumazet @ 2019-07-18  6:58 UTC (permalink / raw)
  To: Thomas Gleixner, Sudip Mukherjee
  Cc: Peter Zijlstra (Intel), David S. Miller, netdev, linux-kernel
In-Reply-To: <alpine.DEB.2.21.1907172345360.1778@nanos.tec.linutronix.de>



On 7/17/19 11:52 PM, Thomas Gleixner wrote:
> Sudip,
> 
> On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
>> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>> You can hack ksoftirq_running() to return always false to avoid this, but
>>> that might cause application starvation and a huge packet buffer backlog
>>> when the amount of incoming packets makes the CPU do nothing else than
>>> softirq processing.
>>
>> I tried that now, it is better but still not as good as v3.8
>> Now I am getting 375.9usec as the maximum time between raising the softirq
>> and it starting to execute and packet drops still there.
>>
>> And just a thought, do you think there should be a CONFIG_ option for
>> this feature of ksoftirqd_running() so that it can be disabled if needed
>> by users like us?
> 
> If at all then a sysctl to allow runtime control.
>  
>> Can you please think of anything else that might have changed which I still need
>> to change to make the time comparable to v3.8..
> 
> Something with in that small range of:
> 
>  63592 files changed, 13783320 insertions(+), 5155492 deletions(-)
> 
> :)
> 
> Seriously, that can be anything.
> 
> Can you please test with Linus' head of tree and add some more
> instrumentation, so we can see what holds off softirqs from being
> processed. If the ksoftirqd enforcement is disabled, then the only reason
> can be a long lasting softirq disabled region. Tracing should tell.

ksoftirqd might be spuriously scheduled from tx path, when
__qdisc_run() also reacts to need_resched().

By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
in response to incoming packet), we force __do_softirq() to perform
another loop, but before doing an other round, it will also check need_resched()
and eventually call wakeup_softirqd()

I wonder if following patch makes any difference.

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
        int packets;
 
        while (qdisc_restart(q, &packets)) {
+               if (qdisc_is_empty(q))
+                       break;
                /*
                 * Ordered by possible occurrence: Postpone processing if
                 * 1. we've exceeded packet quota


^ permalink raw reply related

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-18  7:29 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <29dcc161-f7c8-026e-c3cc-5adb04df128c@nvidia.com>

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/17/2019, 19:58:53 (UTC+00:00)

> I am seeing a boot regression on one of our Tegra boards with both
> mainline and -next. Bisecting is pointing to this commit and reverting
> this commit on top of mainline fixes the problem. Unfortunately, there
> is not much of a backtrace but what I have captured is below. 
> 
> Please note that this is seen on a system that is using NFS to mount
> the rootfs and the crash occurs right around the point the rootfs is
> mounted.
> 
> Let me know if you have any thoughts.
> 
> Cheers
> Jon 
> 
> [   12.221843] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [   12.229485] CPU: 5 PID: 1 Comm: init Tainted: G S                5.2.0-11500-g916f562fb28a #18
> [   12.238076] Hardware name: NVIDIA Tegra186 P2771-0000 Development Board (DT)
> [   12.245105] Call trace:
> [   12.247548]  dump_backtrace+0x0/0x150
> [   12.251199]  show_stack+0x14/0x20
> [   12.254505]  dump_stack+0x9c/0xc4
> [   12.257809]  panic+0x13c/0x32c
> [   12.260853]  complete_and_exit+0x0/0x20
> [   12.264676]  do_group_exit+0x34/0x98
> [   12.268241]  get_signal+0x104/0x668
> [   12.271718]  do_notify_resume+0x2ac/0x380
> [   12.275716]  work_pending+0x8/0x10
> [   12.279109] SMP: stopping secondary CPUs
> [   12.283025] Kernel Offset: disabled
> [   12.286502] CPU features: 0x0002,20806000
> [   12.290499] Memory Limit: none
> [   12.293548] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> 
> -- 
> nvpublic

You don't have any more data ? Can you activate DMA-API debug and check 
if there is any more info outputted ?

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH AUTOSEL 5.2 226/249] selftests: bpf: fix inlines in test_lwt_seg6local
From: Jiri Benc @ 2019-07-18  7:36 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Yonghong Song, Daniel Borkmann,
	linux-kselftest, netdev, bpf, clang-built-linux
In-Reply-To: <20190717234757.GD3079@sasha-vm>

On Wed, 17 Jul 2019 19:47:57 -0400, Sasha Levin wrote:
> It fixes a bug, right?

A bug in selftests. And quite likely, it probably happens only with
some compiler versions.

I don't think patches only touching tools/testing/selftests/ qualify
for stable in general. They don't affect the end users.

 Jiri

^ permalink raw reply

* Re: [PATCH ipsec v2 0/4] xfrm interface: bugs fixes
From: Steffen Klassert @ 2019-07-18  7:37 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: davem, netdev
In-Reply-To: <20190715100023.8475-1-nicolas.dichtel@6wind.com>

On Mon, Jul 15, 2019 at 12:00:19PM +0200, Nicolas Dichtel wrote:
> 
> Here is a bunch of bugs fixes. Some have been seen by code review and some when
> playing with x-netns.
> The details are in each patch.
> 
> v1 -> v2:
>  - add patch #3 and #4

Series applied, thanks Nicolas!


^ permalink raw reply

* Re: [PATCH] net: ag71xx: fix return value check in ag71xx_probe()
From: Oleksij Rempel @ 2019-07-18  7:39 UTC (permalink / raw)
  To: Wei Yongjun, Jay Cliburn, Chris Snook; +Cc: netdev, kernel-janitors
In-Reply-To: <20190717115225.23047-1-weiyongjun1@huawei.com>



On 17.07.19 13:52, Wei Yongjun wrote:
> In case of error, the function of_get_mac_address() returns ERR_PTR()
> and never returns NULL. The NULL test in the return value check should
> be replaced with IS_ERR().
> 
> Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>

> ---
>   drivers/net/ethernet/atheros/ag71xx.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
> index 72a57c6cd254..3088a43e6436 100644
> --- a/drivers/net/ethernet/atheros/ag71xx.c
> +++ b/drivers/net/ethernet/atheros/ag71xx.c
> @@ -1732,9 +1732,9 @@ static int ag71xx_probe(struct platform_device *pdev)
>   	ag->stop_desc->next = (u32)ag->stop_desc_dma;
>   
>   	mac_addr = of_get_mac_address(np);
> -	if (mac_addr)
> +	if (!IS_ERR(mac_addr))
>   		memcpy(ndev->dev_addr, mac_addr, ETH_ALEN);
> -	if (!mac_addr || !is_valid_ether_addr(ndev->dev_addr)) {
> +	if (IS_ERR(mac_addr) || !is_valid_ether_addr(ndev->dev_addr)) {
>   		netif_err(ag, probe, ndev, "invalid MAC address, using random address\n");
>   		eth_random_addr(ndev->dev_addr);
>   	}
> 
> 
> 
> 

Kind regards,
Oleksij Rempel

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH net-next 00/12] mlx5 TLS TX HW offload support
From: Tariq Toukan @ 2019-07-18  7:40 UTC (permalink / raw)
  To: Jakub Kicinski, Tariq Toukan
  Cc: David Miller, netdev@vger.kernel.org, Eran Ben Elisha,
	Saeed Mahameed, Moshe Shemesh
In-Reply-To: <20190717104141.37333cc9@cakuba.netronome.com>



On 7/17/2019 8:41 PM, Jakub Kicinski wrote:
> On Sun, 7 Jul 2019 06:44:27 +0000, Tariq Toukan wrote:
>> On 7/6/2019 2:29 AM, David Miller wrote:
>>> From: Tariq Toukan <tariqt@mellanox.com>
>>> Date: Fri,  5 Jul 2019 18:30:10 +0300
>>>    
>>>> This series from Eran and me, adds TLS TX HW offload support to
>>>> the mlx5 driver.
>>>
>>> Series applied, please deal with any further feedback you get from
>>> Jakub et al.
>>
>> I will followup with patches addressing Jakub's feedback.
> 
> Ping.
> 

Hi Jakub,

I'm waiting for the window to open:
http://vger.kernel.org/~davem/net-next.html

Do you think these can already go to net as fixes?

Regards,
Tariq

^ permalink raw reply

* Re: [PATCH] net: ethernet: fix error return code in ag71xx_probe()
From: Oleksij Rempel @ 2019-07-18  7:41 UTC (permalink / raw)
  To: Wei Yongjun, Jay Cliburn, Chris Snook; +Cc: netdev, kernel-janitors
In-Reply-To: <20190717115215.22965-1-weiyongjun1@huawei.com>



On 17.07.19 13:52, Wei Yongjun wrote:
> Fix to return error code -ENOMEM from the dmam_alloc_coherent() error
> handling case instead of 0, as done elsewhere in this function.
> 
> Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>

> ---
>   drivers/net/ethernet/atheros/ag71xx.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
> index 72a57c6cd254..446d62e93439 100644
> --- a/drivers/net/ethernet/atheros/ag71xx.c
> +++ b/drivers/net/ethernet/atheros/ag71xx.c
> @@ -1724,8 +1724,10 @@ static int ag71xx_probe(struct platform_device *pdev)
>   	ag->stop_desc = dmam_alloc_coherent(&pdev->dev,
>   					    sizeof(struct ag71xx_desc),
>   					    &ag->stop_desc_dma, GFP_KERNEL);
> -	if (!ag->stop_desc)
> +	if (!ag->stop_desc) {
> +		err = -ENOMEM;
>   		goto err_free;
> +	}
>   
>   	ag->stop_desc->data = 0;
>   	ag->stop_desc->ctrl = 0;
> 
> 
> 
> 

Kind regards,
Oleksij Rempel

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
From: Stefano Garzarella @ 2019-07-18  7:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190717105056-mutt-send-email-mst@kernel.org>

On Wed, Jul 17, 2019 at 4:51 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:28PM +0200, Stefano Garzarella wrote:
> > fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
> > the same spinlock also if we are in the TX path.
> >
> > Move also buf_alloc under the same lock.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
> Wait a second is this a bugfix?
> If it's used under the wrong lock won't values get corrupted?
> Won't traffic then stall or more data get to sent than
> credits?

Before this series, we only read vvs->fwd_cnt and vvs->buf_alloc in this
function, but using a different lock than the one used to write them.
I'm not sure if a corruption can happen, but if we want to avoid the
lock, we should use an atomic operation or memory barriers.

Since now we also need to update vvs->last_fwd_cnt, in order to limit the
credit message, I decided to take the same lock used to protect vvs->fwd_cnt
and vvs->last_fwd_cnt.


Thanks,
Stefano

>
> > ---
> >  include/linux/virtio_vsock.h            | 2 +-
> >  net/vmw_vsock/virtio_transport_common.c | 4 ++--
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 49fc9d20bc43..4c7781f4b29b 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -35,7 +35,6 @@ struct virtio_vsock_sock {
> >
> >       /* Protected by tx_lock */
> >       u32 tx_cnt;
> > -     u32 buf_alloc;
> >       u32 peer_fwd_cnt;
> >       u32 peer_buf_alloc;
> >
> > @@ -43,6 +42,7 @@ struct virtio_vsock_sock {
> >       u32 fwd_cnt;
> >       u32 last_fwd_cnt;
> >       u32 rx_bytes;
> > +     u32 buf_alloc;
> >       struct list_head rx_queue;
> >  };
> >
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index a85559d4d974..34a2b42313b7 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> >
> >  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> >  {
> > -     spin_lock_bh(&vvs->tx_lock);
> > +     spin_lock_bh(&vvs->rx_lock);
> >       vvs->last_fwd_cnt = vvs->fwd_cnt;
> >       pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> >       pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> > -     spin_unlock_bh(&vvs->tx_lock);
> > +     spin_unlock_bh(&vvs->rx_lock);
> >  }
> >  EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
> >
> > --
> > 2.20.1

^ permalink raw reply

* [PATCH 2/3] liquidio: Replace vmalloc + memset with vzalloc
From: Chuhong Yuan @ 2019-07-18  7:45 UTC (permalink / raw)
  Cc: Derek Chickles, Satanand Burla, Felix Manlunas, David S . Miller,
	netdev, linux-kernel, Chuhong Yuan

Use vzalloc and vzalloc_node instead of using vmalloc and
vmalloc_node and then zeroing the allocated memory by
memset 0.
This simplifies the code.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/cavium/liquidio/request_manager.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c
index fcf20a8f92d9..032224178b64 100644
--- a/drivers/net/ethernet/cavium/liquidio/request_manager.c
+++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c
@@ -218,15 +218,13 @@ int octeon_setup_iq(struct octeon_device *oct,
 		return 0;
 	}
 	oct->instr_queue[iq_no] =
-	    vmalloc_node(sizeof(struct octeon_instr_queue), numa_node);
+	    vzalloc_node(sizeof(struct octeon_instr_queue), numa_node);
 	if (!oct->instr_queue[iq_no])
 		oct->instr_queue[iq_no] =
-		    vmalloc(sizeof(struct octeon_instr_queue));
+		    vzalloc(sizeof(struct octeon_instr_queue));
 	if (!oct->instr_queue[iq_no])
 		return 1;
 
-	memset(oct->instr_queue[iq_no], 0,
-	       sizeof(struct octeon_instr_queue));
 
 	oct->instr_queue[iq_no]->q_index = q_index;
 	oct->instr_queue[iq_no]->app_ctx = app_ctx;
-- 
2.20.1


^ permalink raw reply related

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-18  7:48 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <29dcc161-f7c8-026e-c3cc-5adb04df128c@nvidia.com>

[-- Attachment #1: Type: text/plain, Size: 192 bytes --]

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/17/2019, 19:58:53 (UTC+00:00)

> Let me know if you have any thoughts.

Can you try attached patch ?

---
Thanks,
Jose Miguel Abreu

[-- Attachment #2: 0001-net-stmmac-RX-Descriptors-need-to-be-clean-before-se.patch --]
[-- Type: application/octet-stream, Size: 2042 bytes --]

From 00bfde6f589e60ba1a2d0671c8ba0fcd0964d6e7 Mon Sep 17 00:00:00 2001
Message-Id: <00bfde6f589e60ba1a2d0671c8ba0fcd0964d6e7.1563435927.git.joabreu@synopsys.com>
From: Jose Abreu <joabreu@synopsys.com>
Date: Thu, 18 Jul 2019 09:42:31 +0200
Subject: [PATCH net] net: stmmac: RX Descriptors need to be clean before
 setting buffers

RX Descriptors are being cleaned after setting the buffers which may
lead to buffer addresses being wiped out.

Fix this by clearing earlier the RX Descriptors.

Reported-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 2af6106ae949 ("net: stmmac: Introducing support for Page Pool")
Signed-off-by: Jose Abreu <joabreu@synopsys.com>

---
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c7c9e5f162e6..5f1294ce0216 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1295,6 +1295,8 @@ static int init_dma_rx_desc_rings(struct net_device *dev, gfp_t flags)
 			  "(%s) dma_rx_phy=0x%08x\n", __func__,
 			  (u32)rx_q->dma_rx_phy);
 
+		stmmac_clear_rx_descriptors(priv, queue);
+
 		for (i = 0; i < DMA_RX_SIZE; i++) {
 			struct dma_desc *p;
 
@@ -1312,8 +1314,6 @@ static int init_dma_rx_desc_rings(struct net_device *dev, gfp_t flags)
 		rx_q->cur_rx = 0;
 		rx_q->dirty_rx = (unsigned int)(i - DMA_RX_SIZE);
 
-		stmmac_clear_rx_descriptors(priv, queue);
-
 		/* Setup the chained descriptor addresses */
 		if (priv->mode == STMMAC_CHAIN_MODE) {
 			if (priv->extend_desc)
-- 
2.7.4


^ permalink raw reply related

* IP GRO verifies csum again?
From: Jacob Wen @ 2019-07-18  7:49 UTC (permalink / raw)
  To: netdev; +Cc: herbert

Hi,

inet_gro_receive verifies IP csum but a NIC already did so and set 
CHECKSUM_UNNECESSARY.

https://github.com/torvalds/linux/blob/v5.2/net/ipv4/af_inet.c#L1432-L1433

if (unlikely(ip_fast_csum((u8 *)iph, 5)))

         goto out_unlock;

Is this a bug?

^ permalink raw reply

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Stefano Garzarella @ 2019-07-18  7:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190717105336-mutt-send-email-mst@kernel.org>

On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > If the packets to sent to the guest are bigger than the buffer
> > available, we can split them, using multiple buffers and fixing
> > the length in the packet header.
> > This is safe since virtio-vsock supports only stream sockets.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
> So how does it work right now? If an app
> does sendmsg with a 64K buffer and the other
> side publishes 4K buffers - does it just stall?

Before this series, the 64K (or bigger) user messages was split in 4K packets
(fixed in the code) and queued in an internal list for the TX worker.

After this series, we will queue up to 64K packets and then it will be split in
the TX worker, depending on the size of the buffers available in the
vring. (The idea was to allow EWMA or a configuration of the buffers size, but
for now we postponed it)

Note: virtio-vsock only supports stream socket for now.

Thanks,
Stefano

^ permalink raw reply

* Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
From: Stefano Garzarella @ 2019-07-18  7:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190717105703-mutt-send-email-mst@kernel.org>

On Wed, Jul 17, 2019 at 5:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> > Since now we are able to split packets, we can avoid limiting
> > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > packet size.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
>
> OK so this is kind of like GSO where we are passing
> 64K packets to the vsock and then split at the
> low level.

Exactly, something like that in the Host->Guest path, instead in the
Guest->Host we use the entire 64K packet.

Thanks,
Stefano

^ permalink raw reply

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Michael S. Tsirkin @ 2019-07-18  8:13 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <CAGxU2F45v40qAOHkm1Hk2E69gCS0UwVgS5NS+tDXXuzdF4EixA@mail.gmail.com>

On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > If the packets to sent to the guest are bigger than the buffer
> > > available, we can split them, using multiple buffers and fixing
> > > the length in the packet header.
> > > This is safe since virtio-vsock supports only stream sockets.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >
> > So how does it work right now? If an app
> > does sendmsg with a 64K buffer and the other
> > side publishes 4K buffers - does it just stall?
> 
> Before this series, the 64K (or bigger) user messages was split in 4K packets
> (fixed in the code) and queued in an internal list for the TX worker.
> 
> After this series, we will queue up to 64K packets and then it will be split in
> the TX worker, depending on the size of the buffers available in the
> vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> for now we postponed it)

Got it. Using workers for xmit is IMHO a bad idea btw.
Why is it done like this?

> Note: virtio-vsock only supports stream socket for now.
> 
> Thanks,
> Stefano

^ permalink raw reply

* RE: [EXT] [PATCH] qed: Prefer pcie_capability_read_word()
From: Michal Kalderon @ 2019-07-18  8:22 UTC (permalink / raw)
  To: Frederick Lawler, Ariel Elior, GR-everest-linux-l2
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bhelgaas@google.com
In-Reply-To: <20190718020745.8867-7-fred@fredlawl.com>

> From: Frederick Lawler <fred@fredlawl.com>
> Sent: Thursday, July 18, 2019 5:08 AM
> 
> External Email
> 
> ----------------------------------------------------------------------
> Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability") added
> accessors for the PCI Express Capability so that drivers didn't need to be
> aware of differences between v1 and v2 of the PCI Express Capability.
> 
> Replace pci_read_config_word() and pci_write_config_word() calls with
> pcie_capability_read_word() and pcie_capability_write_word().
> 
> Signed-off-by: Frederick Lawler <fred@fredlawl.com>
> ---
>  drivers/net/ethernet/qlogic/qed/qed_rdma.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> index 7873d6dfd91f..8d8a920c3195 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> @@ -530,9 +530,8 @@ static void qed_rdma_init_devinfo(struct qed_hwfn
> *p_hwfn,
>  	SET_FIELD(dev->dev_caps,
> QED_RDMA_DEV_CAP_LOCAL_INV_FENCE, 1);
> 
>  	/* Check atomic operations support in PCI configuration space. */
> -	pci_read_config_dword(cdev->pdev,
> -			      cdev->pdev->pcie_cap + PCI_EXP_DEVCTL2,
> -			      &pci_status_control);
> +	pcie_capability_read_dword(cdev->pdev, PCI_EXP_DEVCTL2,
> +				   &pci_status_control);
> 
>  	if (pci_status_control & PCI_EXP_DEVCTL2_LTR_EN)
>  		SET_FIELD(dev->dev_caps,
> QED_RDMA_DEV_CAP_ATOMIC_OP, 1);
> --
> 2.17.1

Thanks, 

Acked-by: Michal Kalderon <michal.kalderon@marvell.com>



^ permalink raw reply

* Re: IP GRO verifies csum again?
From: Eric Dumazet @ 2019-07-18  8:27 UTC (permalink / raw)
  To: Jacob Wen, netdev; +Cc: herbert
In-Reply-To: <6cc12686-a13d-81a8-ad3c-4601397c900e@oracle.com>

On 7/18/19 9:49 AM, Jacob Wen wrote:
> Hi,
> 
> inet_gro_receive verifies IP csum but a NIC already did so and set CHECKSUM_UNNECESSARY.
> 
> 
> https://github.com/torvalds/linux/blob/v5.2/net/ipv4/af_inet.c#L1432-L1433
> 
> if (unlikely(ip_fast_csum((u8 *)iph, 5)))
> 
>         goto out_unlock;
> 
> 
> Is this a bug?
> 

This checksum validates the TCP one, which is the real cost, since we need to touch all
the packet.

We do not bother 'offloading' IPV4 checksum over 20 bytes or so, since in modern cpus,
having the cache line hot in cpu caches means the checksum is almost free.

Adding a test here would not always be a win, say for CHECKSUM_COMPLETE cases,
which we try to generalize in favor of old CHECKSUM_UNNECESSARY.

^ permalink raw reply

* Re: [RFC PATCH 0/5] PTP: add support for Intel's TGPIO controller
From: Felipe Balbi @ 2019-07-18  8:58 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <20190717173915.GE1464@localhost>


Hi,

Richard Cochran <richardcochran@gmail.com> writes:

> On Wed, Jul 17, 2019 at 09:52:55AM +0300, Felipe Balbi wrote:
>> 
>> It's just a pin, like a GPIO. So it would be a PCB trace, flat flex,
>> copper wire... Anything, really.
>
> Cool.  Are there any Intel CPUs available that have this feature?

At least canon lake has it, but its BIOS doesn't enable it. This is
something that will be more important on future CPUs.

-- 
balbi

^ permalink raw reply

* Re: [RFC PATCH 4/5] PTP: Add flag for non-periodic output
From: Felipe Balbi @ 2019-07-18  8:59 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <20190717173645.GD1464@localhost>


Hi,

Richard Cochran <richardcochran@gmail.com> writes:

> On Wed, Jul 17, 2019 at 09:49:13AM +0300, Felipe Balbi wrote:
>> No worries, I'll work on this after vacations (I'll off for 2 weeks
>> starting next week). I thought about adding a new IOCTL until I saw that
>> rsv field. Oh well :-)
>
> It would be great if you could fix up the PTP ioctls as a preface to
> your series.

no problem, anything in particular in mind? Just create new versions of
all the IOCTLs so we can actually use the reserved fields in the future?

cheers

-- 
balbi

^ permalink raw reply

* [PATCH bpf] selftests/bpf: fix "valid read map access into a read-only array 1" on s390
From: Ilya Leoshkevich @ 2019-07-18  9:13 UTC (permalink / raw)
  To: bpf, netdev; +Cc: gor, heiko.carstens, Ilya Leoshkevich

This test looks up a 32-bit map element and then loads it using a 64-bit
load. This does not work on s390, which is a big-endian machine.

Since the point of this test doesn't seem to be loading a smaller value
using a larger load, simply use a 32-bit load.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tools/testing/selftests/bpf/verifier/array_access.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/verifier/array_access.c b/tools/testing/selftests/bpf/verifier/array_access.c
index bcb83196e459..f3c33e128709 100644
--- a/tools/testing/selftests/bpf/verifier/array_access.c
+++ b/tools/testing/selftests/bpf/verifier/array_access.c
@@ -226,7 +226,7 @@
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
 	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
-	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
 	.fixup_map_array_ro = { 3 },
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-18  9:16 UTC (permalink / raw)
  To: Jose Abreu, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <BN8PR12MB32661E919A8DEBC7095BAA12D3C80@BN8PR12MB3266.namprd12.prod.outlook.com>


On 18/07/2019 08:48, Jose Abreu wrote:
> From: Jon Hunter <jonathanh@nvidia.com>
> Date: Jul/17/2019, 19:58:53 (UTC+00:00)
> 
>> Let me know if you have any thoughts.
> 
> Can you try attached patch ?

Yes this did not help. I tried enabling the following but no more output
is seen.

CONFIG_DMA_API_DEBUG=y
CONFIG_DMA_API_DEBUG_SG=y

Have you tried using NFS on a board with this ethernet controller?

Cheers,
Jon

-- 
nvpublic

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox