* Re: [PATCH v3 bpf-next 05/12] libbpf: add resizable non-thread safe internal hashmap
From: Andrii Nakryiko @ 2019-07-18 3:35 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Networking,
bpf, Kernel Team, linux-perf-users, Jiri Olsa, Namhyung Kim,
Adrian Hunter
In-Reply-To: <20190718002436.GA18962@kernel.org>
On Wed, Jul 17, 2019 at 5:24 PM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> Em Fri, May 24, 2019 at 11:59:00AM -0700, Andrii Nakryiko escreveu:
> > There is a need for fast point lookups inside libbpf for multiple use
> > cases (e.g., name resolution for BTF-to-C conversion, by-name lookups in
> > BTF for upcoming BPF CO-RE relocation support, etc). This patch
> > implements simple resizable non-thread safe hashmap using single linked
> > list chains.
>
> This is breaking the tools/perf build in some systems where __WORDSIZE use in
> tools/lib/bpf/hashmap.h is not finding the definition of __WORDSIZE,
> which I think requires using:
>
> #include <limits.h>
>
> Or perhaps
>
> #include <stdint.h>
>
> Non-exhaustive search on a fedora:30 system:
>
> [acme@quaco perf]$ grep "define.*__WORDSIZE" /usr/include/*/*.h
> /usr/include/bits/elfclass.h:#define __ELF_NATIVE_CLASS __WORDSIZE
> /usr/include/bits/siginfo-arch.h:#if defined __x86_64__ && __WORDSIZE == 32
> /usr/include/bits/timesize.h:# define __TIMESIZE __WORDSIZE
> /usr/include/bits/wordsize.h:# define __WORDSIZE 64
> /usr/include/bits/wordsize.h:# define __WORDSIZE 32
> /usr/include/bits/wordsize.h:#define __WORDSIZE32_SIZE_ULONG 0
> /usr/include/bits/wordsize.h:#define __WORDSIZE32_PTRDIFF_LONG 0
> /usr/include/bits/wordsize.h:# define __WORDSIZE_TIME64_COMPAT32 1
> /usr/include/bits/wordsize.h:# define __WORDSIZE_TIME64_COMPAT32 0
> [acme@quaco perf]$ grep bits\/wordsize.h /usr/include/*.h
> /usr/include/bfd.h:#include <bits/wordsize.h>
> /usr/include/limits.h:#include <bits/wordsize.h>
> /usr/include/pthread.h:#include <bits/wordsize.h>
> /usr/include/stdint.h:#include <bits/wordsize.h>
> /usr/include/tiffconf.h:#include <bits/wordsize.h>
> [acme@quaco perf]$
>
> On fedora:30 it works, probably because some header included from there
> ends up adding the file that has that def, but these fail, for instance:
>
> [perfbuilder@quaco linux-perf-tools-build]$ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0.tar.xz
> [perfbuilder@quaco linux-perf-tools-build]$ time dm
> 1 13.57 alpine:3.4 : FAIL gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
> 2 13.78 alpine:3.5 : FAIL gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
> 3 12.42 alpine:3.6 : FAIL gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
> 4 15.01 alpine:3.7 : FAIL gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
> 5 13.80 alpine:3.8 : FAIL gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
> 6 15.37 alpine:3.9 : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
> 7 15.10 alpine:3.10 : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
> 8 15.26 alpine:edge : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 7.0.1 (tags/RELEASE_701/final) (based on LLVM 7.0.1)
> 9: amazonlinux:1^C
>
> I'm trying to see if adding #include <limits.h> fixes the problems in
> all the distros in my container based build test suite.
>
> Lets see with at least alpine:3.4:
>
> Nope, didn't work, will try revisiting this tomorrow...
>
> I'll let it build anyway to see if this fails in other systems/libcs,
> BTW, Alpine uses musl libc.
So it seems like __WORDSIZE is declared in bits/wordsize.h for glibc
and in bits/reg.h for musl, so something like the following should fix
this issue, but I can't easily verify. Can you please see if these
changes fix the issue on your side?
#ifdef __GLIBC__
#include <bits/wordsize.h>
#else
#include <bits/reg.h>
#endif
>
> - Arnaldo
>
> > Four different insert strategies are supported:
> > - HASHMAP_ADD - only add key/value if key doesn't exist yet;
> > - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise,
> > update value;
> > - HASHMAP_UPDATE - update value, if key already exists; otherwise, do
> > nothing and return -ENOENT;
> > - HASHMAP_APPEND - always add key/value pair, even if key already exists.
> > This turns hashmap into a multimap by allowing multiple values to be
> > associated with the same key. Most useful read API for such hashmap is
> > hashmap__for_each_key_entry() iteration. If hashmap__find() is still
> > used, it will return last inserted key/value entry (first in a bucket
> > chain).
> >
> > For HASHMAP_SET and HASHMAP_UPDATE, old key/value pair is returned, so
> > that calling code can handle proper memory management, if necessary.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> > tools/lib/bpf/Build | 2 +-
> > tools/lib/bpf/hashmap.c | 229 ++++++++++++++++++++++++++++++++++++++++
> > tools/lib/bpf/hashmap.h | 173 ++++++++++++++++++++++++++++++
> > 3 files changed, 403 insertions(+), 1 deletion(-)
> > create mode 100644 tools/lib/bpf/hashmap.c
> > create mode 100644 tools/lib/bpf/hashmap.h
> >
> > diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
> > index ee9d5362f35b..dcf0d36598e0 100644
> > --- a/tools/lib/bpf/Build
> > +++ b/tools/lib/bpf/Build
> > @@ -1 +1 @@
> > -libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o
> > +libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o hashmap.o
> > diff --git a/tools/lib/bpf/hashmap.c b/tools/lib/bpf/hashmap.c
> > new file mode 100644
> > index 000000000000..6122272943e6
> > --- /dev/null
> > +++ b/tools/lib/bpf/hashmap.c
> > @@ -0,0 +1,229 @@
> > +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> > +
> > +/*
> > + * Generic non-thread safe hash map implementation.
> > + *
> > + * Copyright (c) 2019 Facebook
> > + */
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +#include <stdio.h>
> > +#include <errno.h>
> > +#include <linux/err.h>
> > +#include "hashmap.h"
> > +
> > +/* start with 4 buckets */
> > +#define HASHMAP_MIN_CAP_BITS 2
> > +
> > +static void hashmap_add_entry(struct hashmap_entry **pprev,
> > + struct hashmap_entry *entry)
> > +{
> > + entry->next = *pprev;
> > + *pprev = entry;
> > +}
> > +
> > +static void hashmap_del_entry(struct hashmap_entry **pprev,
> > + struct hashmap_entry *entry)
> > +{
> > + *pprev = entry->next;
> > + entry->next = NULL;
> > +}
> > +
> > +void hashmap__init(struct hashmap *map, hashmap_hash_fn hash_fn,
> > + hashmap_equal_fn equal_fn, void *ctx)
> > +{
> > + map->hash_fn = hash_fn;
> > + map->equal_fn = equal_fn;
> > + map->ctx = ctx;
> > +
> > + map->buckets = NULL;
> > + map->cap = 0;
> > + map->cap_bits = 0;
> > + map->sz = 0;
> > +}
> > +
> > +struct hashmap *hashmap__new(hashmap_hash_fn hash_fn,
> > + hashmap_equal_fn equal_fn,
> > + void *ctx)
> > +{
> > + struct hashmap *map = malloc(sizeof(struct hashmap));
> > +
> > + if (!map)
> > + return ERR_PTR(-ENOMEM);
> > + hashmap__init(map, hash_fn, equal_fn, ctx);
> > + return map;
> > +}
> > +
> > +void hashmap__clear(struct hashmap *map)
> > +{
> > + free(map->buckets);
> > + map->cap = map->cap_bits = map->sz = 0;
> > +}
> > +
> > +void hashmap__free(struct hashmap *map)
> > +{
> > + if (!map)
> > + return;
> > +
> > + hashmap__clear(map);
> > + free(map);
> > +}
> > +
> > +size_t hashmap__size(const struct hashmap *map)
> > +{
> > + return map->sz;
> > +}
> > +
> > +size_t hashmap__capacity(const struct hashmap *map)
> > +{
> > + return map->cap;
> > +}
> > +
> > +static bool hashmap_needs_to_grow(struct hashmap *map)
> > +{
> > + /* grow if empty or more than 75% filled */
> > + return (map->cap == 0) || ((map->sz + 1) * 4 / 3 > map->cap);
> > +}
> > +
> > +static int hashmap_grow(struct hashmap *map)
> > +{
> > + struct hashmap_entry **new_buckets;
> > + struct hashmap_entry *cur, *tmp;
> > + size_t new_cap_bits, new_cap;
> > + size_t h;
> > + int bkt;
> > +
> > + new_cap_bits = map->cap_bits + 1;
> > + if (new_cap_bits < HASHMAP_MIN_CAP_BITS)
> > + new_cap_bits = HASHMAP_MIN_CAP_BITS;
> > +
> > + new_cap = 1UL << new_cap_bits;
> > + new_buckets = calloc(new_cap, sizeof(new_buckets[0]));
> > + if (!new_buckets)
> > + return -ENOMEM;
> > +
> > + hashmap__for_each_entry_safe(map, cur, tmp, bkt) {
> > + h = hash_bits(map->hash_fn(cur->key, map->ctx), new_cap_bits);
> > + hashmap_add_entry(&new_buckets[h], cur);
> > + }
> > +
> > + map->cap = new_cap;
> > + map->cap_bits = new_cap_bits;
> > + free(map->buckets);
> > + map->buckets = new_buckets;
> > +
> > + return 0;
> > +}
> > +
> > +static bool hashmap_find_entry(const struct hashmap *map,
> > + const void *key, size_t hash,
> > + struct hashmap_entry ***pprev,
> > + struct hashmap_entry **entry)
> > +{
> > + struct hashmap_entry *cur, **prev_ptr;
> > +
> > + if (!map->buckets)
> > + return false;
> > +
> > + for (prev_ptr = &map->buckets[hash], cur = *prev_ptr;
> > + cur;
> > + prev_ptr = &cur->next, cur = cur->next) {
> > + if (map->equal_fn(cur->key, key, map->ctx)) {
> > + if (pprev)
> > + *pprev = prev_ptr;
> > + *entry = cur;
> > + return true;
> > + }
> > + }
> > +
> > + return false;
> > +}
> > +
> > +int hashmap__insert(struct hashmap *map, const void *key, void *value,
> > + enum hashmap_insert_strategy strategy,
> > + const void **old_key, void **old_value)
> > +{
> > + struct hashmap_entry *entry;
> > + size_t h;
> > + int err;
> > +
> > + if (old_key)
> > + *old_key = NULL;
> > + if (old_value)
> > + *old_value = NULL;
> > +
> > + h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> > + if (strategy != HASHMAP_APPEND &&
> > + hashmap_find_entry(map, key, h, NULL, &entry)) {
> > + if (old_key)
> > + *old_key = entry->key;
> > + if (old_value)
> > + *old_value = entry->value;
> > +
> > + if (strategy == HASHMAP_SET || strategy == HASHMAP_UPDATE) {
> > + entry->key = key;
> > + entry->value = value;
> > + return 0;
> > + } else if (strategy == HASHMAP_ADD) {
> > + return -EEXIST;
> > + }
> > + }
> > +
> > + if (strategy == HASHMAP_UPDATE)
> > + return -ENOENT;
> > +
> > + if (hashmap_needs_to_grow(map)) {
> > + err = hashmap_grow(map);
> > + if (err)
> > + return err;
> > + h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> > + }
> > +
> > + entry = malloc(sizeof(struct hashmap_entry));
> > + if (!entry)
> > + return -ENOMEM;
> > +
> > + entry->key = key;
> > + entry->value = value;
> > + hashmap_add_entry(&map->buckets[h], entry);
> > + map->sz++;
> > +
> > + return 0;
> > +}
> > +
> > +bool hashmap__find(const struct hashmap *map, const void *key, void **value)
> > +{
> > + struct hashmap_entry *entry;
> > + size_t h;
> > +
> > + h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> > + if (!hashmap_find_entry(map, key, h, NULL, &entry))
> > + return false;
> > +
> > + if (value)
> > + *value = entry->value;
> > + return true;
> > +}
> > +
> > +bool hashmap__delete(struct hashmap *map, const void *key,
> > + const void **old_key, void **old_value)
> > +{
> > + struct hashmap_entry **pprev, *entry;
> > + size_t h;
> > +
> > + h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> > + if (!hashmap_find_entry(map, key, h, &pprev, &entry))
> > + return false;
> > +
> > + if (old_key)
> > + *old_key = entry->key;
> > + if (old_value)
> > + *old_value = entry->value;
> > +
> > + hashmap_del_entry(pprev, entry);
> > + free(entry);
> > + map->sz--;
> > +
> > + return true;
> > +}
> > +
> > diff --git a/tools/lib/bpf/hashmap.h b/tools/lib/bpf/hashmap.h
> > new file mode 100644
> > index 000000000000..03748a742146
> > --- /dev/null
> > +++ b/tools/lib/bpf/hashmap.h
> > @@ -0,0 +1,173 @@
> > +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> > +
> > +/*
> > + * Generic non-thread safe hash map implementation.
> > + *
> > + * Copyright (c) 2019 Facebook
> > + */
> > +#ifndef __LIBBPF_HASHMAP_H
> > +#define __LIBBPF_HASHMAP_H
> > +
> > +#include <stdbool.h>
> > +#include <stddef.h>
> > +#include "libbpf_internal.h"
> > +
> > +static inline size_t hash_bits(size_t h, int bits)
> > +{
> > + /* shuffle bits and return requested number of upper bits */
> > + return (h * 11400714819323198485llu) >> (__WORDSIZE - bits);
> > +}
> > +
> > +typedef size_t (*hashmap_hash_fn)(const void *key, void *ctx);
> > +typedef bool (*hashmap_equal_fn)(const void *key1, const void *key2, void *ctx);
> > +
> > +struct hashmap_entry {
> > + const void *key;
> > + void *value;
> > + struct hashmap_entry *next;
> > +};
> > +
> > +struct hashmap {
> > + hashmap_hash_fn hash_fn;
> > + hashmap_equal_fn equal_fn;
> > + void *ctx;
> > +
> > + struct hashmap_entry **buckets;
> > + size_t cap;
> > + size_t cap_bits;
> > + size_t sz;
> > +};
> > +
> > +#define HASHMAP_INIT(hash_fn, equal_fn, ctx) { \
> > + .hash_fn = (hash_fn), \
> > + .equal_fn = (equal_fn), \
> > + .ctx = (ctx), \
> > + .buckets = NULL, \
> > + .cap = 0, \
> > + .cap_bits = 0, \
> > + .sz = 0, \
> > +}
> > +
> > +void hashmap__init(struct hashmap *map, hashmap_hash_fn hash_fn,
> > + hashmap_equal_fn equal_fn, void *ctx);
> > +struct hashmap *hashmap__new(hashmap_hash_fn hash_fn,
> > + hashmap_equal_fn equal_fn,
> > + void *ctx);
> > +void hashmap__clear(struct hashmap *map);
> > +void hashmap__free(struct hashmap *map);
> > +
> > +size_t hashmap__size(const struct hashmap *map);
> > +size_t hashmap__capacity(const struct hashmap *map);
> > +
> > +/*
> > + * Hashmap insertion strategy:
> > + * - HASHMAP_ADD - only add key/value if key doesn't exist yet;
> > + * - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise,
> > + * update value;
> > + * - HASHMAP_UPDATE - update value, if key already exists; otherwise, do
> > + * nothing and return -ENOENT;
> > + * - HASHMAP_APPEND - always add key/value pair, even if key already exists.
> > + * This turns hashmap into a multimap by allowing multiple values to be
> > + * associated with the same key. Most useful read API for such hashmap is
> > + * hashmap__for_each_key_entry() iteration. If hashmap__find() is still
> > + * used, it will return last inserted key/value entry (first in a bucket
> > + * chain).
> > + */
> > +enum hashmap_insert_strategy {
> > + HASHMAP_ADD,
> > + HASHMAP_SET,
> > + HASHMAP_UPDATE,
> > + HASHMAP_APPEND,
> > +};
> > +
> > +/*
> > + * hashmap__insert() adds key/value entry w/ various semantics, depending on
> > + * provided strategy value. If a given key/value pair replaced already
> > + * existing key/value pair, both old key and old value will be returned
> > + * through old_key and old_value to allow calling code do proper memory
> > + * management.
> > + */
> > +int hashmap__insert(struct hashmap *map, const void *key, void *value,
> > + enum hashmap_insert_strategy strategy,
> > + const void **old_key, void **old_value);
> > +
> > +static inline int hashmap__add(struct hashmap *map,
> > + const void *key, void *value)
> > +{
> > + return hashmap__insert(map, key, value, HASHMAP_ADD, NULL, NULL);
> > +}
> > +
> > +static inline int hashmap__set(struct hashmap *map,
> > + const void *key, void *value,
> > + const void **old_key, void **old_value)
> > +{
> > + return hashmap__insert(map, key, value, HASHMAP_SET,
> > + old_key, old_value);
> > +}
> > +
> > +static inline int hashmap__update(struct hashmap *map,
> > + const void *key, void *value,
> > + const void **old_key, void **old_value)
> > +{
> > + return hashmap__insert(map, key, value, HASHMAP_UPDATE,
> > + old_key, old_value);
> > +}
> > +
> > +static inline int hashmap__append(struct hashmap *map,
> > + const void *key, void *value)
> > +{
> > + return hashmap__insert(map, key, value, HASHMAP_APPEND, NULL, NULL);
> > +}
> > +
> > +bool hashmap__delete(struct hashmap *map, const void *key,
> > + const void **old_key, void **old_value);
> > +
> > +bool hashmap__find(const struct hashmap *map, const void *key, void **value);
> > +
> > +/*
> > + * hashmap__for_each_entry - iterate over all entries in hashmap
> > + * @map: hashmap to iterate
> > + * @cur: struct hashmap_entry * used as a loop cursor
> > + * @bkt: integer used as a bucket loop cursor
> > + */
> > +#define hashmap__for_each_entry(map, cur, bkt) \
> > + for (bkt = 0; bkt < map->cap; bkt++) \
> > + for (cur = map->buckets[bkt]; cur; cur = cur->next)
> > +
> > +/*
> > + * hashmap__for_each_entry_safe - iterate over all entries in hashmap, safe
> > + * against removals
> > + * @map: hashmap to iterate
> > + * @cur: struct hashmap_entry * used as a loop cursor
> > + * @tmp: struct hashmap_entry * used as a temporary next cursor storage
> > + * @bkt: integer used as a bucket loop cursor
> > + */
> > +#define hashmap__for_each_entry_safe(map, cur, tmp, bkt) \
> > + for (bkt = 0; bkt < map->cap; bkt++) \
> > + for (cur = map->buckets[bkt]; \
> > + cur && ({tmp = cur->next; true; }); \
> > + cur = tmp)
> > +
> > +/*
> > + * hashmap__for_each_key_entry - iterate over entries associated with given key
> > + * @map: hashmap to iterate
> > + * @cur: struct hashmap_entry * used as a loop cursor
> > + * @key: key to iterate entries for
> > + */
> > +#define hashmap__for_each_key_entry(map, cur, _key) \
> > + for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\
> > + map->cap_bits); \
> > + map->buckets ? map->buckets[bkt] : NULL; }); \
> > + cur; \
> > + cur = cur->next) \
> > + if (map->equal_fn(cur->key, (_key), map->ctx))
> > +
> > +#define hashmap__for_each_key_entry_safe(map, cur, tmp, _key) \
> > + for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\
> > + map->cap_bits); \
> > + cur = map->buckets ? map->buckets[bkt] : NULL; }); \
> > + cur && ({ tmp = cur->next; true; }); \
> > + cur = tmp) \
> > + if (map->equal_fn(cur->key, (_key), map->ctx))
> > +
> > +#endif /* __LIBBPF_HASHMAP_H */
> > --
> > 2.17.1
>
> --
>
> - Arnaldo
^ permalink raw reply
* Re: [PATCH v3 net-next 13/19] ionic: Add initial ethtool support
From: Andrew Lunn @ 2019-07-18 3:31 UTC (permalink / raw)
To: Shannon Nelson; +Cc: netdev
In-Reply-To: <79f2da6f-4568-4bc8-2fa4-3aa5a41bbff1@pensando.io>
On Fri, Jul 12, 2019 at 10:32:38PM -0700, Shannon Nelson wrote:
> On 7/8/19 7:30 PM, Andrew Lunn wrote:
> >>+static int ionic_nway_reset(struct net_device *netdev)
> >>+{
> >>+ struct lif *lif = netdev_priv(netdev);
> >>+ int err = 0;
> >>+
> >>+ if (netif_running(netdev))
> >>+ err = ionic_reset_queues(lif);
> >What does ionic_reset_queues() do? It sounds nothing like restarting
> >auto negotiation?
> >
> > Andrew
> Basically, it's a rip-it-all-down-and-start-over way of restarting the
> connection, and is also useful for fixing queues that are misbehaving. It's
> a little old-fashioned, taken from the ixgbe example, but is effective when
> there isn't an actual "restart auto-negotiation" command in the firmware.
O.K. More comments please.
Did you consider throwing the firmware away and just letting Linux
control the hardware? It would make this all much more transparent and
debuggable.
Andrew
^ permalink raw reply
* Re: [PATCH v3 net-next 13/19] ionic: Add initial ethtool support
From: Andrew Lunn @ 2019-07-18 3:28 UTC (permalink / raw)
To: Shannon Nelson; +Cc: netdev
In-Reply-To: <10c66ecf-6d67-d206-b496-6f8139f218a8@pensando.io>
On Fri, Jul 12, 2019 at 10:16:31PM -0700, Shannon Nelson wrote:
> On 7/8/19 7:14 PM, Andrew Lunn wrote:
> >>+static int ionic_set_pauseparam(struct net_device *netdev,
> >>+ struct ethtool_pauseparam *pause)
> >>+{
> >>+ struct lif *lif = netdev_priv(netdev);
> >>+ struct ionic *ionic = lif->ionic;
> >>+ struct ionic_dev *idev = &lif->ionic->idev;
> >>+
> >>+ u32 requested_pause;
> >>+ u32 cur_autoneg;
> >>+ int err;
> >>+
> >>+ cur_autoneg = idev->port_info->config.an_enable ? AUTONEG_ENABLE :
> >>+ AUTONEG_DISABLE;
> >>+ if (pause->autoneg != cur_autoneg) {
> >>+ netdev_info(netdev, "Please use 'ethtool -s ...' to change autoneg\n");
> >>+ return -EOPNOTSUPP;
> >>+ }
> >>+
> >>+ /* change both at the same time */
> >>+ requested_pause = PORT_PAUSE_TYPE_LINK;
> >>+ if (pause->rx_pause)
> >>+ requested_pause |= IONIC_PAUSE_F_RX;
> >>+ if (pause->tx_pause)
> >>+ requested_pause |= IONIC_PAUSE_F_TX;
> >>+
> >>+ if (requested_pause == idev->port_info->config.pause_type)
> >>+ return 0;
> >>+
> >>+ idev->port_info->config.pause_type = requested_pause;
> >>+
> >>+ mutex_lock(&ionic->dev_cmd_lock);
> >>+ ionic_dev_cmd_port_pause(idev, requested_pause);
> >>+ err = ionic_dev_cmd_wait(ionic, devcmd_timeout);
> >>+ mutex_unlock(&ionic->dev_cmd_lock);
> >>+ if (err)
> >>+ return err;
> >Hi Shannon
> >
> >I've no idea what the firmware black box is doing, but this looks
> >wrong.
> >
> >pause->autoneg is about if the results of auto-neg should be used or
> >not. If false, just configure the MAC with the pause settings and you
> >are done. If the interface is being forced, so autoneg in general is
> >disabled, just configure the MAC and you are done.
> >
> >If pause->autoneg is true and the interface is using auto-neg as a
> >whole, you pass the pause values to the PHY for it to advertise and
> >trigger an auto-neg. Once autoneg has completed, and the resolved
> >settings are available, the MAC is configured with the resolved
> >values.
> >
> >Looking at this code, i don't see any difference between configuring
> >the MAC or configuring the PHY. I would expect pause->autoneg to be
> >part of requested_pause somehow, so the firmware knows what is should
> >do.
> >
> > Andrew
>
> In this device there's actually very little the driver can do to directly
> configure the mac or phy besides passing through to the firmware what the
> user has requested - that happens here for the pause values, and in
> ionic_set_link_ksettings() for autoneg. The firmware is managing the port
> based on these requests with the help of internally configured rules defined
> in a customer setting.
I get that. But the firmware needs to conform to what Linux
expects. And what i see here does not conform. That is why i gave a
bit of detail in my reply.
What exactly does the firmware do? Once we know that, we can figure
out when the driver should return -EOPNOTSUPP because of firmware
limitations, and what it can configure and should return 0.
Andrew
^ permalink raw reply
* Re: [PATCH v3 net-next 13/19] ionic: Add initial ethtool support
From: Andrew Lunn @ 2019-07-18 3:21 UTC (permalink / raw)
To: Shannon Nelson; +Cc: netdev
In-Reply-To: <649f7e96-a76b-79f0-db25-d3ce57397df5@pensando.io>
On Tue, Jul 09, 2019 at 03:42:39PM -0700, Shannon Nelson wrote:
> On 7/8/19 7:27 PM, Andrew Lunn wrote:
> >>+static int ionic_get_module_eeprom(struct net_device *netdev,
> >>+ struct ethtool_eeprom *ee,
> >>+ u8 *data)
> >>+{
> >>+ struct lif *lif = netdev_priv(netdev);
> >>+ struct ionic_dev *idev = &lif->ionic->idev;
> >>+ struct xcvr_status *xcvr;
> >>+ u32 len;
> >>+
> >>+ /* copy the module bytes into data */
> >>+ xcvr = &idev->port_info->status.xcvr;
> >>+ len = min_t(u32, sizeof(xcvr->sprom), ee->len);
> >>+ memcpy(data, xcvr->sprom, len);
> >Hi Shannon
> >
> >This also looks odd. Where is the call into the firmware to get the
> >eeprom contents? Even though it is called 'eeprom', the data is not
> >static. It contains real time diagnostic values, temperature, transmit
> >power, receiver power, voltages etc.
>
> idev->port_info is a memory mapped space that the device keeps up-to-date.
Hi Shannon
It at least needs a comment. How frequently does the device update
this chunk of memory? It would be good to comment about that as
well. Or do MMIO reads block while i2c operations occur to update the
memory?
> >>+
> >>+ dev_dbg(&lif->netdev->dev, "notifyblock eid=0x%llx link_status=0x%x link_speed=0x%x link_down_cnt=0x%x\n",
> >>+ lif->info->status.eid,
> >>+ lif->info->status.link_status,
> >>+ lif->info->status.link_speed,
> >>+ lif->info->status.link_down_count);
> >>+ dev_dbg(&lif->netdev->dev, " port_status id=0x%x status=0x%x speed=0x%x\n",
> >>+ idev->port_info->status.id,
> >>+ idev->port_info->status.status,
> >>+ idev->port_info->status.speed);
> >>+ dev_dbg(&lif->netdev->dev, " xcvr status state=0x%x phy=0x%x pid=0x%x\n",
> >>+ xcvr->state, xcvr->phy, xcvr->pid);
> >>+ dev_dbg(&lif->netdev->dev, " port_config state=0x%x speed=0x%x mtu=0x%x an_enable=0x%x fec_type=0x%x pause_type=0x%x loopback_mode=0x%x\n",
> >>+ idev->port_info->config.state,
> >>+ idev->port_info->config.speed,
> >>+ idev->port_info->config.mtu,
> >>+ idev->port_info->config.an_enable,
> >>+ idev->port_info->config.fec_type,
> >>+ idev->port_info->config.pause_type,
> >>+ idev->port_info->config.loopback_mode);
> >It is a funny place to have all the debug code.
>
> Someone wanted a hook for getting some link info on the fly on request.
I would suggest deleting it all. Most of it is already available via
ethtool.
Andrew
^ permalink raw reply
* Re: [PATCH RFC 0/4] Add support to directly attach BPF program to ftrace
From: Joel Fernandes @ 2019-07-18 2:51 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: LKML, bpf, Daniel Borkmann, Network Development, Steven Rostedt,
kernel-team
In-Reply-To: <CAADnVQJY_=yeY0C3k1ZKpRFu5oNbB4zhQf5tQnLr=Mi8i6cgeQ@mail.gmail.com>
Hi Alexei,
On Wed, Jul 17, 2019 at 02:40:42PM -0700, Alexei Starovoitov wrote:
> On Wed, Jul 17, 2019 at 6:01 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> I trimmed cc. some emails were bouncing.
Ok, thanks.
> > > I think allowing one tracepoint and disallowing another is pointless
> > > from security point of view. Tracing bpf program can do bpf_probe_read
> > > of anything.
> >
> > I think the assumption here is the user controls the program instructions at
> > runtime, but that's not the case. The BPF program we are loading is not
> > dynamically generated, it is built at build time and it is loaded from a
> > secure verified partition, so even though it can do bpf_probe_read, it is
> > still not something that the user can change.
>
> so you're saying that by having a set of signed bpf programs which
> instructions are known to be non-malicious and allowed set of tracepoints
> to attach via selinux whitelist, such setup will be safe?
> Have you considered how mix and match will behave?
Do you mean the effect of mixing tracepoints and programs? I have not
considered this. I am Ok with further enforcing of this (only certain
tracepoints can be attached to certain programs) if needed. What do
you think? We could have a new bpf(2) syscall attribute specify which
tracepoint is expected, or similar.
I wanted to walk you through our 2 usecases we are working on:
1. timeinstate: By hooking 2 programs onto sched_switch and cpu_frequency
tracepoints, we are able to collect CPU power per-UID (specific app). Connor
O'Brien is working on that.
2. inode to file path mapping: By hooking onto VFS tracepoints we are adding to
the android kernels, we can collect data when the kernel resolves a file path
to a inode/device number. A BPF map stores the inode/dev number (key) and the
path (value). We have usecases where we need a high speed lookup of this
without having to scan all the files in the filesystem.
For the first usecase, the BPF program will be loaded and attached to the
scheduler and cpufreq tracepoints at boot time and will stay attached
forever. This is why I was saying having a daemon to stay alive all the time
is pointless. However, if since you are completely against using tracefs
which it sounds like, then we can do a daemon that is always alive.
For the second usecase, the program attach is needed on-demand unlike the
first usecase, and then after the usecase completes, it is detached to avoid
overhead.
For the second usecase, privacy is important and we want the data to not be
available to any process. So we want to make sure only selected processes can
attach to that tracepoint. This is the reason why I was doing working on
these patches which use the tracefs as well, since we get that level of
control.
As you can see, I was trying to solve the sticky tracepoint problem in
usecase 1 and the privacy problem in usecase 2.
I had some discussions today at office and we think we can use the daemon
approach to solve both these problems as well which I think would make you
happy as well.
What do you think about all of this? Any other feedback?
> > And, we are planning to make it
> > even more secure by making it kernel verify the program at load time as well
> > (you were on some discussions about that a few months ago).
>
> It sounds like api decisions for this sticky raw_tp feature are
> driven by security choices which are not actually secure.
> I'm suggesting to avoid bringing up point of security as a reason for
> this api design, since it's making the opposite effect.
Ok, that's a fair point.
thanks,
- Joel
^ permalink raw reply
* Re: [PATCH] rtlwifi: btcoex: fix issue possible condition with no effect (if == else)
From: Pkshih @ 2019-07-18 2:45 UTC (permalink / raw)
To: Larry.Finger@lwfinger.net, linux-kernel@vger.kernel.org,
yuehaibing@huawei.com, kvalo@codeaurora.org,
natechancellor@gmail.com, hariprasad.kelam@gmail.com,
netdev@vger.kernel.org, davem@davemloft.net,
linux-wireless@vger.kernel.org
In-Reply-To: <20190712191535.GA4215@hari-Inspiron-1545>
On Fri, 2019-07-12 at 19:15 +0000, Hariprasad Kelam wrote:
> fix below issue reported by coccicheck
> drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c:514:1-3:
> WARNING: possible condition with no effect (if == else)
>
> Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
> ---
> drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c | 8 +-------
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
> b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
> index 152242a..191dafd0 100644
> --- a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
> +++ b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
> @@ -509,13 +509,7 @@ static u32 halbtc_get_wifi_link_status(struct btc_coexist
> *btcoexist)
>
> static s32 halbtc_get_wifi_rssi(struct rtl_priv *rtlpriv)
> {
> - int undec_sm_pwdb = 0;
> -
> - if (rtlpriv->mac80211.link_state >= MAC80211_LINKED)
> - undec_sm_pwdb = rtlpriv->dm.undec_sm_pwdb;
> - else /* associated entry pwdb */
> - undec_sm_pwdb = rtlpriv->dm.undec_sm_pwdb;
> - return undec_sm_pwdb;
> + return rtlpriv->dm.undec_sm_pwdb;
> }
>
> static bool halbtc_get(void *void_btcoexist, u8 get_type, void *out_buf)
It looks good to me. Thank you.
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
^ permalink raw reply
* [PATCH] net: dsa: sja1105: Release lock in error case
From: Hariprasad Kelam @ 2019-07-18 2:21 UTC (permalink / raw)
To: Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller,
netdev, linux-kernel
This patch adds release of unlock in fail case.
Issue identified by coccicheck
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
---
net/dsa/tag_sja1105.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
index 1d96c9d..26363d7 100644
--- a/net/dsa/tag_sja1105.c
+++ b/net/dsa/tag_sja1105.c
@@ -216,6 +216,7 @@ static struct sk_buff
if (!skb) {
dev_err_ratelimited(dp->ds->dev,
"Failed to copy stampable skb\n");
+ spin_unlock(&sp->data->meta_lock);
return NULL;
}
sja1105_transfer_meta(skb, meta);
--
2.7.4
^ permalink raw reply related
* [PATCH] udp: Fix typo in net/ipv4/udp.c
From: Su Yanjun @ 2019-07-18 2:19 UTC (permalink / raw)
To: davem, kuznet; +Cc: netdev, linux-kernel, Su Yanjun
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
---
net/ipv4/udp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c21862b..d88821c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2170,7 +2170,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
/* Initialize UDP checksum. If exited with zero value (success),
* CHECKSUM_UNNECESSARY means, that no more checks are required.
- * Otherwise, csum completion requires chacksumming packet body,
+ * Otherwise, csum completion requires checksumming packet body,
* including udp header and folding it to skb->csum.
*/
static inline int udp4_csum_init(struct sk_buff *skb, struct udphdr *uh,
--
2.7.4
^ permalink raw reply related
* [PATCH] udp: Fix typo in udpv4/p.c
From: Su Yanjun @ 2019-07-18 2:13 UTC (permalink / raw)
To: davem, kuznet; +Cc: netdev, linux-kernel, Su Yanjun
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
---
net/ipv4/udp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c21862b..d88821c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2170,7 +2170,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
/* Initialize UDP checksum. If exited with zero value (success),
* CHECKSUM_UNNECESSARY means, that no more checks are required.
- * Otherwise, csum completion requires chacksumming packet body,
+ * Otherwise, csum completion requires checksumming packet body,
* including udp header and folding it to skb->csum.
*/
static inline int udp4_csum_init(struct sk_buff *skb, struct udphdr *uh,
--
2.7.4
^ permalink raw reply related
* [PATCH] qed: Prefer pcie_capability_read_word()
From: Frederick Lawler @ 2019-07-18 2:07 UTC (permalink / raw)
To: aelior, GR-everest-linux-l2
Cc: Frederick Lawler, netdev, linux-kernel, bhelgaas
In-Reply-To: <20190718020745.8867-1-fred@fredlawl.com>
Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
---
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index 7873d6dfd91f..8d8a920c3195 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -530,9 +530,8 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn,
SET_FIELD(dev->dev_caps, QED_RDMA_DEV_CAP_LOCAL_INV_FENCE, 1);
/* Check atomic operations support in PCI configuration space. */
- pci_read_config_dword(cdev->pdev,
- cdev->pdev->pcie_cap + PCI_EXP_DEVCTL2,
- &pci_status_control);
+ pcie_capability_read_dword(cdev->pdev, PCI_EXP_DEVCTL2,
+ &pci_status_control);
if (pci_status_control & PCI_EXP_DEVCTL2_LTR_EN)
SET_FIELD(dev->dev_caps, QED_RDMA_DEV_CAP_ATOMIC_OP, 1);
--
2.17.1
^ permalink raw reply related
* [PATCH] igc: Prefer pcie_capability_read_word()
From: Frederick Lawler @ 2019-07-18 2:07 UTC (permalink / raw)
To: jeffrey.t.kirsher
Cc: Frederick Lawler, intel-wired-lan, netdev, linux-kernel, bhelgaas
In-Reply-To: <20190718020745.8867-1-fred@fredlawl.com>
Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
---
drivers/net/ethernet/intel/igc/igc_main.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 34fa0e60a780..8e8ad07a5776 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -3891,13 +3891,11 @@ void igc_write_pci_cfg(struct igc_hw *hw, u32 reg, u16 *value)
s32 igc_read_pcie_cap_reg(struct igc_hw *hw, u32 reg, u16 *value)
{
struct igc_adapter *adapter = hw->back;
- u16 cap_offset;
- cap_offset = pci_find_capability(adapter->pdev, PCI_CAP_ID_EXP);
- if (!cap_offset)
+ if (!pci_is_pcie(adapter->pdev))
return -IGC_ERR_CONFIG;
- pci_read_config_word(adapter->pdev, cap_offset + reg, value);
+ pcie_capability_read_word(adapter->pdev, reg, value);
return IGC_SUCCESS;
}
@@ -3905,13 +3903,11 @@ s32 igc_read_pcie_cap_reg(struct igc_hw *hw, u32 reg, u16 *value)
s32 igc_write_pcie_cap_reg(struct igc_hw *hw, u32 reg, u16 *value)
{
struct igc_adapter *adapter = hw->back;
- u16 cap_offset;
- cap_offset = pci_find_capability(adapter->pdev, PCI_CAP_ID_EXP);
- if (!cap_offset)
+ if (!pci_is_pcie(adapter->pdev))
return -IGC_ERR_CONFIG;
- pci_write_config_word(adapter->pdev, cap_offset + reg, *value);
+ pcie_capability_write_word(adapter->pdev, reg, *value);
return IGC_SUCCESS;
}
--
2.17.1
^ permalink raw reply related
* [PATCH] cxgb4: Prefer pcie_capability_read_word()
From: Frederick Lawler @ 2019-07-18 2:07 UTC (permalink / raw)
To: vishal; +Cc: Frederick Lawler, netdev, linux-kernel, bhelgaas
Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 6 ++----
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 9 +++------
2 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 715e4edcf4a2..98ff71434673 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5441,7 +5441,6 @@ static int cxgb4_iov_configure(struct pci_dev *pdev, int num_vfs)
char name[IFNAMSIZ];
u32 devcap2;
u16 flags;
- int pos;
/* If we want to instantiate Virtual Functions, then our
* parent bridge's PCI-E needs to support Alternative Routing
@@ -5449,9 +5448,8 @@ static int cxgb4_iov_configure(struct pci_dev *pdev, int num_vfs)
* and above.
*/
pbridge = pdev->bus->self;
- pos = pci_find_capability(pbridge, PCI_CAP_ID_EXP);
- pci_read_config_word(pbridge, pos + PCI_EXP_FLAGS, &flags);
- pci_read_config_dword(pbridge, pos + PCI_EXP_DEVCAP2, &devcap2);
+ pcie_capability_read_word(pbridge, PCI_EXP_FLAGS, &flags);
+ pcie_capability_read_dword(pbridge, PCI_EXP_DEVCAP2, &devcap2);
if ((flags & PCI_EXP_FLAGS_VERS) < 2 ||
!(devcap2 & PCI_EXP_DEVCAP2_ARI)) {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index f9b70be59792..346d7b59c50b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -7267,7 +7267,6 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
} else {
unsigned int pack_align;
unsigned int ingpad, ingpack;
- unsigned int pcie_cap;
/* T5 introduced the separation of the Free List Padding and
* Packing Boundaries. Thus, we can select a smaller Padding
@@ -7292,8 +7291,7 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
* multiple of the Maximum Payload Size.
*/
pack_align = fl_align;
- pcie_cap = pci_find_capability(adap->pdev, PCI_CAP_ID_EXP);
- if (pcie_cap) {
+ if (pci_is_pcie(adap->pdev)) {
unsigned int mps, mps_log;
u16 devctl;
@@ -7301,9 +7299,8 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
* [bits 7:5] encodes sizes as powers of 2 starting at
* 128 bytes.
*/
- pci_read_config_word(adap->pdev,
- pcie_cap + PCI_EXP_DEVCTL,
- &devctl);
+ pcie_capability_read_word(adap->pdev, PCI_EXP_DEVCTL,
+ &devctl);
mps_log = ((devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5) + 7;
mps = 1 << mps_log;
if (mps > pack_align)
--
2.17.1
^ permalink raw reply related
* [PATCH net] be2net: Synchronize be_update_queues with dev_watchdog
From: Benjamin Poirier @ 2019-07-18 1:42 UTC (permalink / raw)
To: David Miller
Cc: Sathya Perla, Ajit Khaparde, Sriharsha Basavapatna, Somnath Kotur,
Firo Yang, Saeed Mahameed, netdev
As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
ethtool set_channels operation is started. be_tx_timeout(), which dumps
some queue structures, is not written to run concurrently with
be_update_queues(), which frees/allocates those queues structures. Add some
synchronization between the two.
Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
drivers/net/ethernet/emulex/benet/be_main.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index b7a246b33599..2edb86ec9fe9 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4698,8 +4698,13 @@ int be_update_queues(struct be_adapter *adapter)
int status;
if (netif_running(netdev)) {
+ /* be_tx_timeout() must not run concurrently with this
+ * function, synchronize with an already-running dev_watchdog
+ */
+ netif_tx_lock_bh(netdev);
/* device cannot transmit now, avoid dev_watchdog timeouts */
netif_carrier_off(netdev);
+ netif_tx_unlock_bh(netdev);
be_close(netdev);
}
--
2.22.0
^ permalink raw reply related
* Re: [PATCH 1/4] dt-bindings: allow up to four clocks for orion-mdio
From: Andrew Lunn @ 2019-07-18 1:31 UTC (permalink / raw)
To: Rob Herring; +Cc: josua, netdev, stable, David S. Miller, Mark Rutland
In-Reply-To: <CAL_JsqK=qpCi6whqmjW2L8O=3u4oZemH=czm60q9QnC09Gr_ig@mail.gmail.com>
On Tue, Jul 09, 2019 at 04:03:28PM -0600, Rob Herring wrote:
> On Mon, Jul 8, 2019 at 8:41 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > > Optional properties:
> > > > - interrupts: interrupt line number for the SMI error/done interrupt
> > > > -- clocks: phandle for up to three required clocks for the MDIO instance
> > > > +- clocks: phandle for up to four required clocks for the MDIO instance
> > >
> > > This needs to enumerate exactly what the clocks are. Shouldn't there
> > > be an additional clock-names value too?
> >
> > Hi Rob
> >
> > The driver does not care what they are called. It just turns them all
> > on, and turns them off again when removed.
>
> That's fine for the driver to do, but this is the hardware description.
>
> It's not just what they are called, but how many too. Is 1 clock in
> the DT valid? 0? It would be unusual for a given piece of h/w to
> function with a variable number of clocks.
Hi Rob
The orion5x has 0 clocks. kirkwood, dove, Armada XP, 370 375, 380
has 1 clock. Armada 37xx has 4.
So yes, 1 clock is valid. 0 clocks is also valid. The piece of
hardware itself does not care how many clocks are feeding it, so long
as they are all turned on.
Andrew
^ permalink raw reply
* RE: [EXT] [PATCH v1] net: fec: optionally reset PHY via a reset-controller
From: Andy Duan @ 2019-07-18 1:24 UTC (permalink / raw)
To: Sven Van Asbroeck
Cc: David S . Miller, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <CAGngYiU7B4uuqSAawNE6RFsjGPzbj5gzK9S299H+Qy+CWFjaAg@mail.gmail.com>
From: Sven Van Asbroeck <thesven73@gmail.com> Sent: Wednesday, July 17, 2019 8:48 PM
> On Tue, Jul 16, 2019 at 9:32 PM Andy Duan <fugang.duan@nxp.com> wrote:
> >
> > Yes, so the old legacy code is kept there. But it is better to clean
> > up all if there have enough boards to verify them.
>
> Would it make sense to print a warning message to the log whenever
> someone tries to use the legacy phy reset on the fec?
It is feasible, just as reminder to drop such usage.
^ permalink raw reply
* [PATCH iproute2] json: fix backslash escape typo in jsonw_puts
From: Ivan Delalande @ 2019-07-18 1:15 UTC (permalink / raw)
To: Stephen Hemminger, David Ahern; +Cc: netdev
Fixes: fcc16c22 ("provide common json output formatter")
Signed-off-by: Ivan Delalande <colona@arista.com>
---
lib/json_writer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/json_writer.c b/lib/json_writer.c
index 5004c181..88c5eb88 100644
--- a/lib/json_writer.c
+++ b/lib/json_writer.c
@@ -75,7 +75,7 @@ static void jsonw_puts(json_writer_t *self, const char *str)
fputs("\\b", self->out);
break;
case '\\':
- fputs("\\n", self->out);
+ fputs("\\\\", self->out);
break;
case '"':
fputs("\\\"", self->out);
--
2.22.0
^ permalink raw reply related
* Re: [PATCH ghak90 V6 02/10] audit: add container id
From: Richard Guy Briggs @ 2019-07-18 0:51 UTC (permalink / raw)
To: Paul Moore
Cc: Serge E. Hallyn, Tycho Andersen, containers, linux-api,
Linux-Audit Mailing List, linux-fsdevel, LKML, netdev,
netfilter-devel, sgrubb, omosnace, dhowells, simo, Eric Paris,
ebiederm, nhorman
In-Reply-To: <CAHC9VhScHizB2r5q3T5s0P3jkYdvzBPPudDksosYFp_TO7W9-Q@mail.gmail.com>
On 2019-07-16 19:30, Paul Moore wrote:
> On Tue, Jul 16, 2019 at 6:03 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2019-07-15 17:04, Paul Moore wrote:
> > > On Mon, Jul 8, 2019 at 2:06 PM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> ...
>
> > > > If we can't trust ns_capable() then why are we passing on
> > > > CAP_AUDIT_CONTROL? It is being passed down and not stripped purposely
> > > > by the orchestrator/engine. If ns_capable() isn't inherited how is it
> > > > gained otherwise? Can it be inserted by cotainer image? I think the
> > > > answer is "no". Either we trust ns_capable() or we have audit
> > > > namespaces (recommend based on user namespace) (or both).
> > >
> > > My thinking is that since ns_capable() checks the credentials with
> > > respect to the current user namespace we can't rely on it to control
> > > access since it would be possible for a privileged process running
> > > inside an unprivileged container to manipulate the audit container ID
> > > (containerized process has CAP_AUDIT_CONTROL, e.g. running as root in
> > > the container, while the container itself does not).
> >
> > What makes an unprivileged container unprivileged? "root", or "CAP_*"?
>
> My understanding is that when most people refer to an unprivileged
> container they are referring to a container run without capabilities
> or a container run by a user other than root. I'm sure there are
> better definitions out there, by folks much smarter than me on these
> things, but that's my working definition.
Close enough to my understanding...
> > If CAP_AUDIT_CONTROL is granted, does "root" matter?
>
> Our discussions here have been about capabilities, not UIDs. The only
> reason root might matter is that it generally has the full capability
> set.
Good, that's my understanding.
> > Does it matter what user namespace it is in?
>
> What likely matters is what check is called: capable() or
> ns_capable(). Those can yield very different results.
Ok, I finally found what I was looking for to better understand the
challenge with trusting ns_capable(). Sorry for being so dense and slow
on this one. I thought I had gone through the code carefully enough,
but this time I finally found it. set_cred_user_ns() sets a full set of
capabilities rather than inheriting them from the parent user_ns, called
from userns_install() or create_userns(). Even if the container
orchestrator/engine restricts those capabilities on its own containers,
they could easily unshare a userns and get a full set unless it also
restricted CAP_SYS_ADMIN, which is used too many other places to be
practical to restrict.
> > I understand that root is *gained* in an
> > unprivileged user namespace, but capabilities are inherited or permitted
> > and that process either has it or it doesn't and an unprivileged user
> > namespace can't gain a capability that has been rescinded. Different
> > subsystems use the userid or capabilities or both to determine
> > privileges.
>
> Once again, I believe the important thing to focus on here is
> capable() vs ns_capable(). We can't safely rely on ns_capable() for
> the audit container ID policy since that is easily met inside the
> container regardless of the process' creds which started the
> container.
Agreed.
> > In this case, is the userid relevant?
>
> We don't do UID checks, we do capability checks, so yes, the UID is irrelevant.
Agreed.
> > > > At this point I would say we are at an impasse unless we trust
> > > > ns_capable() or we implement audit namespaces.
> > >
> > > I'm not sure how we can trust ns_capable(), but if you can think of a
> > > way I would love to hear it. I'm also not sure how namespacing audit
> > > is helpful (see my above comments), but if you think it is please
> > > explain.
> >
> > So if we are not namespacing, why do we not trust capabilities?
>
> We can trust capable(CAP_AUDIT_CONTROL) for enforcing audit container
> ID policy, we can not trust ns_capable(CAP_AUDIT_CONTROL).
Ok. So does a process in a non-init user namespace have two (or more)
sets of capabilities stored in creds, one in the init_user_ns, and one
in current_user_ns? Or does it get stripped of all its capabilities in
init_user_ns once it has its own set in current_user_ns? If the former,
then we can use capable(). If the latter, we need another mechanism, as
you have suggested might be needed.
If some random unprivileged user wants to fire up a container
orchestrator/engine in his own user namespace, then audit needs to be
namespaced. Can we safely discard this scenario for now? That user can
use a VM.
> paul moore
- RGB
--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
^ permalink raw reply
* Re: [PATCH v3 bpf-next 05/12] libbpf: add resizable non-thread safe internal hashmap
From: Arnaldo Carvalho de Melo @ 2019-07-18 0:24 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: andrii.nakryiko, ast, daniel, netdev, bpf, kernel-team,
linux-perf-users, Jiri Olsa, Namhyung Kim, Adrian Hunter
In-Reply-To: <20190524185908.3562231-6-andriin@fb.com>
Em Fri, May 24, 2019 at 11:59:00AM -0700, Andrii Nakryiko escreveu:
> There is a need for fast point lookups inside libbpf for multiple use
> cases (e.g., name resolution for BTF-to-C conversion, by-name lookups in
> BTF for upcoming BPF CO-RE relocation support, etc). This patch
> implements simple resizable non-thread safe hashmap using single linked
> list chains.
This is breaking the tools/perf build in some systems where __WORDSIZE use in
tools/lib/bpf/hashmap.h is not finding the definition of __WORDSIZE,
which I think requires using:
#include <limits.h>
Or perhaps
#include <stdint.h>
Non-exhaustive search on a fedora:30 system:
[acme@quaco perf]$ grep "define.*__WORDSIZE" /usr/include/*/*.h
/usr/include/bits/elfclass.h:#define __ELF_NATIVE_CLASS __WORDSIZE
/usr/include/bits/siginfo-arch.h:#if defined __x86_64__ && __WORDSIZE == 32
/usr/include/bits/timesize.h:# define __TIMESIZE __WORDSIZE
/usr/include/bits/wordsize.h:# define __WORDSIZE 64
/usr/include/bits/wordsize.h:# define __WORDSIZE 32
/usr/include/bits/wordsize.h:#define __WORDSIZE32_SIZE_ULONG 0
/usr/include/bits/wordsize.h:#define __WORDSIZE32_PTRDIFF_LONG 0
/usr/include/bits/wordsize.h:# define __WORDSIZE_TIME64_COMPAT32 1
/usr/include/bits/wordsize.h:# define __WORDSIZE_TIME64_COMPAT32 0
[acme@quaco perf]$ grep bits\/wordsize.h /usr/include/*.h
/usr/include/bfd.h:#include <bits/wordsize.h>
/usr/include/limits.h:#include <bits/wordsize.h>
/usr/include/pthread.h:#include <bits/wordsize.h>
/usr/include/stdint.h:#include <bits/wordsize.h>
/usr/include/tiffconf.h:#include <bits/wordsize.h>
[acme@quaco perf]$
On fedora:30 it works, probably because some header included from there
ends up adding the file that has that def, but these fail, for instance:
[perfbuilder@quaco linux-perf-tools-build]$ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0.tar.xz
[perfbuilder@quaco linux-perf-tools-build]$ time dm
1 13.57 alpine:3.4 : FAIL gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
2 13.78 alpine:3.5 : FAIL gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
3 12.42 alpine:3.6 : FAIL gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
4 15.01 alpine:3.7 : FAIL gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
5 13.80 alpine:3.8 : FAIL gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
6 15.37 alpine:3.9 : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
7 15.10 alpine:3.10 : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
8 15.26 alpine:edge : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 7.0.1 (tags/RELEASE_701/final) (based on LLVM 7.0.1)
9: amazonlinux:1^C
I'm trying to see if adding #include <limits.h> fixes the problems in
all the distros in my container based build test suite.
Lets see with at least alpine:3.4:
Nope, didn't work, will try revisiting this tomorrow...
I'll let it build anyway to see if this fails in other systems/libcs,
BTW, Alpine uses musl libc.
- Arnaldo
> Four different insert strategies are supported:
> - HASHMAP_ADD - only add key/value if key doesn't exist yet;
> - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise,
> update value;
> - HASHMAP_UPDATE - update value, if key already exists; otherwise, do
> nothing and return -ENOENT;
> - HASHMAP_APPEND - always add key/value pair, even if key already exists.
> This turns hashmap into a multimap by allowing multiple values to be
> associated with the same key. Most useful read API for such hashmap is
> hashmap__for_each_key_entry() iteration. If hashmap__find() is still
> used, it will return last inserted key/value entry (first in a bucket
> chain).
>
> For HASHMAP_SET and HASHMAP_UPDATE, old key/value pair is returned, so
> that calling code can handle proper memory management, if necessary.
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
> tools/lib/bpf/Build | 2 +-
> tools/lib/bpf/hashmap.c | 229 ++++++++++++++++++++++++++++++++++++++++
> tools/lib/bpf/hashmap.h | 173 ++++++++++++++++++++++++++++++
> 3 files changed, 403 insertions(+), 1 deletion(-)
> create mode 100644 tools/lib/bpf/hashmap.c
> create mode 100644 tools/lib/bpf/hashmap.h
>
> diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
> index ee9d5362f35b..dcf0d36598e0 100644
> --- a/tools/lib/bpf/Build
> +++ b/tools/lib/bpf/Build
> @@ -1 +1 @@
> -libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o
> +libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o hashmap.o
> diff --git a/tools/lib/bpf/hashmap.c b/tools/lib/bpf/hashmap.c
> new file mode 100644
> index 000000000000..6122272943e6
> --- /dev/null
> +++ b/tools/lib/bpf/hashmap.c
> @@ -0,0 +1,229 @@
> +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> +
> +/*
> + * Generic non-thread safe hash map implementation.
> + *
> + * Copyright (c) 2019 Facebook
> + */
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <errno.h>
> +#include <linux/err.h>
> +#include "hashmap.h"
> +
> +/* start with 4 buckets */
> +#define HASHMAP_MIN_CAP_BITS 2
> +
> +static void hashmap_add_entry(struct hashmap_entry **pprev,
> + struct hashmap_entry *entry)
> +{
> + entry->next = *pprev;
> + *pprev = entry;
> +}
> +
> +static void hashmap_del_entry(struct hashmap_entry **pprev,
> + struct hashmap_entry *entry)
> +{
> + *pprev = entry->next;
> + entry->next = NULL;
> +}
> +
> +void hashmap__init(struct hashmap *map, hashmap_hash_fn hash_fn,
> + hashmap_equal_fn equal_fn, void *ctx)
> +{
> + map->hash_fn = hash_fn;
> + map->equal_fn = equal_fn;
> + map->ctx = ctx;
> +
> + map->buckets = NULL;
> + map->cap = 0;
> + map->cap_bits = 0;
> + map->sz = 0;
> +}
> +
> +struct hashmap *hashmap__new(hashmap_hash_fn hash_fn,
> + hashmap_equal_fn equal_fn,
> + void *ctx)
> +{
> + struct hashmap *map = malloc(sizeof(struct hashmap));
> +
> + if (!map)
> + return ERR_PTR(-ENOMEM);
> + hashmap__init(map, hash_fn, equal_fn, ctx);
> + return map;
> +}
> +
> +void hashmap__clear(struct hashmap *map)
> +{
> + free(map->buckets);
> + map->cap = map->cap_bits = map->sz = 0;
> +}
> +
> +void hashmap__free(struct hashmap *map)
> +{
> + if (!map)
> + return;
> +
> + hashmap__clear(map);
> + free(map);
> +}
> +
> +size_t hashmap__size(const struct hashmap *map)
> +{
> + return map->sz;
> +}
> +
> +size_t hashmap__capacity(const struct hashmap *map)
> +{
> + return map->cap;
> +}
> +
> +static bool hashmap_needs_to_grow(struct hashmap *map)
> +{
> + /* grow if empty or more than 75% filled */
> + return (map->cap == 0) || ((map->sz + 1) * 4 / 3 > map->cap);
> +}
> +
> +static int hashmap_grow(struct hashmap *map)
> +{
> + struct hashmap_entry **new_buckets;
> + struct hashmap_entry *cur, *tmp;
> + size_t new_cap_bits, new_cap;
> + size_t h;
> + int bkt;
> +
> + new_cap_bits = map->cap_bits + 1;
> + if (new_cap_bits < HASHMAP_MIN_CAP_BITS)
> + new_cap_bits = HASHMAP_MIN_CAP_BITS;
> +
> + new_cap = 1UL << new_cap_bits;
> + new_buckets = calloc(new_cap, sizeof(new_buckets[0]));
> + if (!new_buckets)
> + return -ENOMEM;
> +
> + hashmap__for_each_entry_safe(map, cur, tmp, bkt) {
> + h = hash_bits(map->hash_fn(cur->key, map->ctx), new_cap_bits);
> + hashmap_add_entry(&new_buckets[h], cur);
> + }
> +
> + map->cap = new_cap;
> + map->cap_bits = new_cap_bits;
> + free(map->buckets);
> + map->buckets = new_buckets;
> +
> + return 0;
> +}
> +
> +static bool hashmap_find_entry(const struct hashmap *map,
> + const void *key, size_t hash,
> + struct hashmap_entry ***pprev,
> + struct hashmap_entry **entry)
> +{
> + struct hashmap_entry *cur, **prev_ptr;
> +
> + if (!map->buckets)
> + return false;
> +
> + for (prev_ptr = &map->buckets[hash], cur = *prev_ptr;
> + cur;
> + prev_ptr = &cur->next, cur = cur->next) {
> + if (map->equal_fn(cur->key, key, map->ctx)) {
> + if (pprev)
> + *pprev = prev_ptr;
> + *entry = cur;
> + return true;
> + }
> + }
> +
> + return false;
> +}
> +
> +int hashmap__insert(struct hashmap *map, const void *key, void *value,
> + enum hashmap_insert_strategy strategy,
> + const void **old_key, void **old_value)
> +{
> + struct hashmap_entry *entry;
> + size_t h;
> + int err;
> +
> + if (old_key)
> + *old_key = NULL;
> + if (old_value)
> + *old_value = NULL;
> +
> + h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> + if (strategy != HASHMAP_APPEND &&
> + hashmap_find_entry(map, key, h, NULL, &entry)) {
> + if (old_key)
> + *old_key = entry->key;
> + if (old_value)
> + *old_value = entry->value;
> +
> + if (strategy == HASHMAP_SET || strategy == HASHMAP_UPDATE) {
> + entry->key = key;
> + entry->value = value;
> + return 0;
> + } else if (strategy == HASHMAP_ADD) {
> + return -EEXIST;
> + }
> + }
> +
> + if (strategy == HASHMAP_UPDATE)
> + return -ENOENT;
> +
> + if (hashmap_needs_to_grow(map)) {
> + err = hashmap_grow(map);
> + if (err)
> + return err;
> + h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> + }
> +
> + entry = malloc(sizeof(struct hashmap_entry));
> + if (!entry)
> + return -ENOMEM;
> +
> + entry->key = key;
> + entry->value = value;
> + hashmap_add_entry(&map->buckets[h], entry);
> + map->sz++;
> +
> + return 0;
> +}
> +
> +bool hashmap__find(const struct hashmap *map, const void *key, void **value)
> +{
> + struct hashmap_entry *entry;
> + size_t h;
> +
> + h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> + if (!hashmap_find_entry(map, key, h, NULL, &entry))
> + return false;
> +
> + if (value)
> + *value = entry->value;
> + return true;
> +}
> +
> +bool hashmap__delete(struct hashmap *map, const void *key,
> + const void **old_key, void **old_value)
> +{
> + struct hashmap_entry **pprev, *entry;
> + size_t h;
> +
> + h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> + if (!hashmap_find_entry(map, key, h, &pprev, &entry))
> + return false;
> +
> + if (old_key)
> + *old_key = entry->key;
> + if (old_value)
> + *old_value = entry->value;
> +
> + hashmap_del_entry(pprev, entry);
> + free(entry);
> + map->sz--;
> +
> + return true;
> +}
> +
> diff --git a/tools/lib/bpf/hashmap.h b/tools/lib/bpf/hashmap.h
> new file mode 100644
> index 000000000000..03748a742146
> --- /dev/null
> +++ b/tools/lib/bpf/hashmap.h
> @@ -0,0 +1,173 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +
> +/*
> + * Generic non-thread safe hash map implementation.
> + *
> + * Copyright (c) 2019 Facebook
> + */
> +#ifndef __LIBBPF_HASHMAP_H
> +#define __LIBBPF_HASHMAP_H
> +
> +#include <stdbool.h>
> +#include <stddef.h>
> +#include "libbpf_internal.h"
> +
> +static inline size_t hash_bits(size_t h, int bits)
> +{
> + /* shuffle bits and return requested number of upper bits */
> + return (h * 11400714819323198485llu) >> (__WORDSIZE - bits);
> +}
> +
> +typedef size_t (*hashmap_hash_fn)(const void *key, void *ctx);
> +typedef bool (*hashmap_equal_fn)(const void *key1, const void *key2, void *ctx);
> +
> +struct hashmap_entry {
> + const void *key;
> + void *value;
> + struct hashmap_entry *next;
> +};
> +
> +struct hashmap {
> + hashmap_hash_fn hash_fn;
> + hashmap_equal_fn equal_fn;
> + void *ctx;
> +
> + struct hashmap_entry **buckets;
> + size_t cap;
> + size_t cap_bits;
> + size_t sz;
> +};
> +
> +#define HASHMAP_INIT(hash_fn, equal_fn, ctx) { \
> + .hash_fn = (hash_fn), \
> + .equal_fn = (equal_fn), \
> + .ctx = (ctx), \
> + .buckets = NULL, \
> + .cap = 0, \
> + .cap_bits = 0, \
> + .sz = 0, \
> +}
> +
> +void hashmap__init(struct hashmap *map, hashmap_hash_fn hash_fn,
> + hashmap_equal_fn equal_fn, void *ctx);
> +struct hashmap *hashmap__new(hashmap_hash_fn hash_fn,
> + hashmap_equal_fn equal_fn,
> + void *ctx);
> +void hashmap__clear(struct hashmap *map);
> +void hashmap__free(struct hashmap *map);
> +
> +size_t hashmap__size(const struct hashmap *map);
> +size_t hashmap__capacity(const struct hashmap *map);
> +
> +/*
> + * Hashmap insertion strategy:
> + * - HASHMAP_ADD - only add key/value if key doesn't exist yet;
> + * - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise,
> + * update value;
> + * - HASHMAP_UPDATE - update value, if key already exists; otherwise, do
> + * nothing and return -ENOENT;
> + * - HASHMAP_APPEND - always add key/value pair, even if key already exists.
> + * This turns hashmap into a multimap by allowing multiple values to be
> + * associated with the same key. Most useful read API for such hashmap is
> + * hashmap__for_each_key_entry() iteration. If hashmap__find() is still
> + * used, it will return last inserted key/value entry (first in a bucket
> + * chain).
> + */
> +enum hashmap_insert_strategy {
> + HASHMAP_ADD,
> + HASHMAP_SET,
> + HASHMAP_UPDATE,
> + HASHMAP_APPEND,
> +};
> +
> +/*
> + * hashmap__insert() adds key/value entry w/ various semantics, depending on
> + * provided strategy value. If a given key/value pair replaced already
> + * existing key/value pair, both old key and old value will be returned
> + * through old_key and old_value to allow calling code do proper memory
> + * management.
> + */
> +int hashmap__insert(struct hashmap *map, const void *key, void *value,
> + enum hashmap_insert_strategy strategy,
> + const void **old_key, void **old_value);
> +
> +static inline int hashmap__add(struct hashmap *map,
> + const void *key, void *value)
> +{
> + return hashmap__insert(map, key, value, HASHMAP_ADD, NULL, NULL);
> +}
> +
> +static inline int hashmap__set(struct hashmap *map,
> + const void *key, void *value,
> + const void **old_key, void **old_value)
> +{
> + return hashmap__insert(map, key, value, HASHMAP_SET,
> + old_key, old_value);
> +}
> +
> +static inline int hashmap__update(struct hashmap *map,
> + const void *key, void *value,
> + const void **old_key, void **old_value)
> +{
> + return hashmap__insert(map, key, value, HASHMAP_UPDATE,
> + old_key, old_value);
> +}
> +
> +static inline int hashmap__append(struct hashmap *map,
> + const void *key, void *value)
> +{
> + return hashmap__insert(map, key, value, HASHMAP_APPEND, NULL, NULL);
> +}
> +
> +bool hashmap__delete(struct hashmap *map, const void *key,
> + const void **old_key, void **old_value);
> +
> +bool hashmap__find(const struct hashmap *map, const void *key, void **value);
> +
> +/*
> + * hashmap__for_each_entry - iterate over all entries in hashmap
> + * @map: hashmap to iterate
> + * @cur: struct hashmap_entry * used as a loop cursor
> + * @bkt: integer used as a bucket loop cursor
> + */
> +#define hashmap__for_each_entry(map, cur, bkt) \
> + for (bkt = 0; bkt < map->cap; bkt++) \
> + for (cur = map->buckets[bkt]; cur; cur = cur->next)
> +
> +/*
> + * hashmap__for_each_entry_safe - iterate over all entries in hashmap, safe
> + * against removals
> + * @map: hashmap to iterate
> + * @cur: struct hashmap_entry * used as a loop cursor
> + * @tmp: struct hashmap_entry * used as a temporary next cursor storage
> + * @bkt: integer used as a bucket loop cursor
> + */
> +#define hashmap__for_each_entry_safe(map, cur, tmp, bkt) \
> + for (bkt = 0; bkt < map->cap; bkt++) \
> + for (cur = map->buckets[bkt]; \
> + cur && ({tmp = cur->next; true; }); \
> + cur = tmp)
> +
> +/*
> + * hashmap__for_each_key_entry - iterate over entries associated with given key
> + * @map: hashmap to iterate
> + * @cur: struct hashmap_entry * used as a loop cursor
> + * @key: key to iterate entries for
> + */
> +#define hashmap__for_each_key_entry(map, cur, _key) \
> + for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\
> + map->cap_bits); \
> + map->buckets ? map->buckets[bkt] : NULL; }); \
> + cur; \
> + cur = cur->next) \
> + if (map->equal_fn(cur->key, (_key), map->ctx))
> +
> +#define hashmap__for_each_key_entry_safe(map, cur, tmp, _key) \
> + for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\
> + map->cap_bits); \
> + cur = map->buckets ? map->buckets[bkt] : NULL; }); \
> + cur && ({ tmp = cur->next; true; }); \
> + cur = tmp) \
> + if (map->equal_fn(cur->key, (_key), map->ctx))
> +
> +#endif /* __LIBBPF_HASHMAP_H */
> --
> 2.17.1
--
- Arnaldo
^ permalink raw reply
* Re: [PATCH AUTOSEL 5.2 226/249] selftests: bpf: fix inlines in test_lwt_seg6local
From: Sasha Levin @ 2019-07-17 23:47 UTC (permalink / raw)
To: Jiri Benc
Cc: linux-kernel, stable, Yonghong Song, Daniel Borkmann,
linux-kselftest, netdev, bpf, clang-built-linux
In-Reply-To: <20190717114334.5556a14e@redhat.com>
On Wed, Jul 17, 2019 at 11:43:34AM +0200, Jiri Benc wrote:
>On Mon, 15 Jul 2019 09:46:31 -0400, Sasha Levin wrote:
>> From: Jiri Benc <jbenc@redhat.com>
>>
>> [ Upstream commit 11aca65ec4db09527d3e9b6b41a0615b7da4386b ]
>>
>> Selftests are reporting this failure in test_lwt_seg6local.sh:
>
>I don't think this is critical in any way and I don't think this is a
>stable material. How was this selected?
It fixes a bug, right?
--
Thanks,
Sasha
^ permalink raw reply
* Re: phylink: flow control on fixed-link not working.
From: René van Dorst @ 2019-07-17 22:58 UTC (permalink / raw)
To: Russell King - ARM Linux admin; +Cc: netdev
In-Reply-To: <20190717215150.tk3gvq7lqc56scac@shell.armlinux.org.uk>
Quoting Russell King - ARM Linux admin <linux@armlinux.org.uk>:
> On Wed, Jul 17, 2019 at 09:31:11PM +0000, René van Dorst wrote:
>> Hi,
>>
>> I am trying to enable flow control/pause on PHYLINK and fixed-link.
>>
>> My setup SOC mac (mt7621) <-> RGMII <-> SWITCH mac (mt7530).
>>
>> It seems that in fixed-link mode all the flow control/pause bits are cleared
>> in
>> phylink_parse_fixedlink(). If I read phylink_parse_fixedlink() [0]
>> correctly,
>> I see that pl->link_config.advertising is AND with pl->supprted
>> which has only
>> the PHY_SETTING() modes bits set. pl->link_config.advertising is
>> losing Pause
>> bits. pl->link_config.advertising is used in phylink_resolve_flow()
>> to set the
>> MLO_PAUSE_RX/TX BITS.
>>
>> I think this is an error.
>> Because in phylink_start() see this part [1].
>>
>> /* Apply the link configuration to the MAC when starting. This allows
>> * a fixed-link to start with the correct parameters, and also
>> * ensures that we set the appropriate advertisement for Serdes links.
>> */
>> phylink_resolve_flow(pl, &pl->link_config);
>> phylink_mac_config(pl, &pl->link_config);
>>
>>
>> If I add a this hacky patch below, flow control is enabled on the
>> fixed-link.
>> if (s) {
>> __set_bit(s->bit, pl->supported);
>> + if (phylink_test(pl->link_config.advertising, Pause))
>> + phylink_set(pl->supported, Pause);
>> } else {
>>
>> So is phylink_parse_fixedlink() broken or should it handled in a other way?
>
> Quite simply, if the MAC says it doesn't support pause modes (i.o.w.
> the validate callback clears them) then pause modes aren't supported.
Hi Russel,
Thanks for your response.
I believe that I am setting pause bits right on both ends see SOC [0] and
SWITCH [1] and also in the DTS [2].
Correct me if it is not the right way.
Maybe I am looking in the wrong part of the code.
But I added many debug lines in phylink_parse_fixedlink() [3] to see what
happens with the Pause bit in the pl->link_config.advertising and
pl->supported.
This is the dmesg output.
[ 1.991245] libphy: Fixed MDIO Bus: probed
[ 2.031260] phylink_create: config0: Pause
[ 2.039410] phylink_create: supported: Pause
[ 2.047904] mtk_validate: mask: Pause
[ 2.055186] mtk_validate: supported: Pause
[ 2.063332] mtk_validate: advertising: Pause
[ 2.071825] phylink_create: config1: Pause
[ 2.079966] phylink_create: config2: Pause
[ 2.088132] phylink_parse_fixedlink: config: Pause
[ 2.097660] phylink_parse_fixedlink: support: Pause
[ 2.107366] mtk_validate: mask: Pause
[ 2.114647] mtk_validate: supported: Pause
[ 2.122792] mtk_validate: advertising: Pause
[ 2.131283] phylink_parse_fixedlink: config2: Pause
[ 2.140971] phylink_parse_fixedlink: support2: Pause
[ 2.150845] phylink_parse_fixedlink: config3: Pause
[ 2.160546] phylink_parse_fixedlink: support3: Pause
[ 2.170420] phylink_parse_fixedlink: config4: Pause
[ 2.180120] phylink_parse_fixedlink: config5: Pause
[ 5.854674] mt7530 mdio-bus:1f: configuring for fixed/trgmii link mode
[ 5.867665] phylink_resolve_flow: PAUSE_AN: pause: 0, 12, 8dfba630
[ 5.867670] phylink_resolve_flow: new_pause: 0
[ 5.879980] mt7530 mdio-bus:1f: phylink_mac_config:
mode=fixed/trgmii/1Gbps/Full adv=00,00000000,00000220 pause=12 link=1
an=1
[ 6.651239] DSA: tree 0 setup
[ 6.658192] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[ 6.672108] mt7530 mdio-bus:1f: phylink_mac_config:
mode=fixed/trgmii/1Gbps/Full adv=00,00000000,00000220 pause=12 link=1
an=1
[ 28.937543] mtk_soc_eth 1e100000.ethernet eth0: configuring for
fixed/trgmii link mode
[ 28.965884] mtk_soc_eth 1e100000.ethernet eth0: phylink_mac_config:
mode=fixed/trgmii/1Gbps/Full adv=00,00000000,00000220 pause=12 link=1
an=1
[ 29.000740] mtk_soc_eth 1e100000.ethernet eth0: phylink_mac_config:
mode=fixed/trgmii/1Gbps/Full adv=00,00000000,00000220 pause=12 link=1
an=1
[ 29.026392] mtk_soc_eth 1e100000.ethernet eth0: Link is Up -
1Gbps/Full - flow control off
[ 29.373577] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
I don't see the "config6:" [4] debug.
I think the pause bits are always cleared in
pl->link_config.advertising by phylink_parse_fixedlink()
Again I may understand the code wrong or I am looking at the wrong place.
So I hope you can point me in the right direction.
Greats,
René
[0]:
https://github.com/vDorst/linux-1/blob/8538cdefd425592d249a71445c466159b0f27475/drivers/net/ethernet/mediatek/mtk_eth_soc.c#L502
[1]:
https://github.com/vDorst/linux-1/blob/8538cdefd425592d249a71445c466159b0f27475/drivers/net/dsa/mt7530.c#L1468
[2]:
https://github.com/vDorst/linux-1/blob/8538cdefd425592d249a71445c466159b0f27475/drivers/staging/mt7621-dts/UBNT-ER-e50.dtsi#L122
[3]:
https://github.com/vDorst/linux-1/blob/8538cdefd425592d249a71445c466159b0f27475/drivers/net/phy/phylink.c#L214
[4]:
https://github.com/vDorst/linux-1/blob/8538cdefd425592d249a71445c466159b0f27475/drivers/net/phy/phylink.c#L263
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up
^ permalink raw reply
* Re: [PATCH] qlge: Move drivers/net/ethernet/qlogic/qlge/ to drivers/staging/qlge/
From: Benjamin Poirier @ 2019-07-17 22:52 UTC (permalink / raw)
To: David Miller; +Cc: gregkh, GR-Linux-NIC-Dev, manishc, netdev
In-Reply-To: <20190717.120208.205802053970227674.davem@davemloft.net>
On 2019/07/17 12:02, David Miller wrote:
> From: Benjamin Poirier <bpoirier@suse.com>
> Date: Tue, 16 Jul 2019 11:34:59 +0900
>
> > The hardware has been declared EOL by the vendor more than 5 years ago.
> > What's more relevant to the Linux kernel is that the quality of this driver
> > is not on par with many other mainline drivers.
> >
> > Cc: Manish Chopra <manishc@marvell.com>
> > Message-id: <20190617074858.32467-1-bpoirier@suse.com>
> > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
>
> Please resubmit this when the net-next tree opens back up.
>
Sorry, I thought this was gonna go through Greg's tree. Will do.
^ permalink raw reply
* Re: [PATCH net] ipv6: rt6_check should return NULL if 'from' is NULL
From: David Miller @ 2019-07-17 22:26 UTC (permalink / raw)
To: dsahern; +Cc: netdev, linux-kernel, dsahern
In-Reply-To: <20190717220843.974-1-dsahern@kernel.org>
From: David Ahern <dsahern@kernel.org>
Date: Wed, 17 Jul 2019 15:08:43 -0700
> From: David Ahern <dsahern@gmail.com>
>
> Paul reported that l2tp sessions were broken after the commit referenced
> in the Fixes tag. Prior to this commit rt6_check returned NULL if the
> rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB
> entry. Restore that behavior.
>
> Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
> Reported-by: Paul Donohue <linux-kernel@PaulSD.com>
> Tested-by: Paul Donohue <linux-kernel@PaulSD.com>
> Signed-off-by: David Ahern <dsahern@gmail.com>
Applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [net 1/1] tipc: initialize 'validated' field of received packets
From: David Miller @ 2019-07-17 22:24 UTC (permalink / raw)
To: jon.maloy
Cc: netdev, gordan.mihaljevic, tung.q.nguyen, hoang.h.le, canh.d.luu,
ying.xue, tipc-discussion
In-Reply-To: <1563399824-4462-1-git-send-email-jon.maloy@ericsson.com>
From: Jon Maloy <jon.maloy@ericsson.com>
Date: Wed, 17 Jul 2019 23:43:44 +0200
> The tipc_msg_validate() function leaves a boolean flag 'validated' in
> the validated buffer's control block, to avoid performing this action
> more than once. However, at reception of new packets, the position of
> this field may already have been set by lower layer protocols, so
> that the packet is erroneously perceived as already validated by TIPC.
>
> We fix this by initializing the said field to 'false' before performing
> the initial validation.
>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Applied.
^ permalink raw reply
* Re: [PATCH bpf] bpf: fix narrower loads on s390
From: Y Song @ 2019-07-17 22:23 UTC (permalink / raw)
To: Ilya Leoshkevich; +Cc: bpf, netdev, gor, heiko.carstens
In-Reply-To: <B91434A8-6056-49E2-852D-6DE5FFD53B29@linux.ibm.com>
On Wed, Jul 17, 2019 at 1:52 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Am 17.07.2019 um 18:25 schrieb Y Song <ys114321@gmail.com>:
> >
> > On Wed, Jul 17, 2019 at 3:36 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>
> >>
> >> Here is a better one: len=0x11223344 and we would like to do
> >> ((u8 *)&len)[3].
> >>
> >> len is represented as `11 22 33 44` in memory, so the desired result is
> >> 0x44. It can be obtained by doing (*(u32 *)&len) & 0xff, but today the
> >> verifier does ((*(u32 *)&len) >> 24) & 0xff instead.
> >
> > What you described above for the memory layout all makes sense.
> > The root cause is for big endian, we should do *((u8 *)&len + 3).
> > This is exactly what macros in test_pkt_md_access.c tries to do.
> >
> > if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> > #define TEST_FIELD(TYPE, FIELD, MASK) \
> > { \
> > TYPE tmp = *(volatile TYPE *)&skb->FIELD; \
> > if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK)) \
> > return TC_ACT_SHOT; \
> > }
> > #else
> > #define TEST_FIELD_OFFSET(a, b) ((sizeof(a) - sizeof(b)) / sizeof(b))
> > #define TEST_FIELD(TYPE, FIELD, MASK) \
> > { \
> > TYPE tmp = *((volatile TYPE *)&skb->FIELD + \
> > TEST_FIELD_OFFSET(skb->FIELD, TYPE)); \
> > if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK)) \
> > return TC_ACT_SHOT; \
> > }
> > #endif
> >
> > Could you check whether your __BYTE_ORDER__ is set
> > correctly or not for this case? You may need to tweak Makefile
> > if you are doing cross compilation, I am not sure how as I
> > did not have environment.
>
> I’m building natively on s390.
>
> Here is the (formatted) preprocessed C code for the first condition:
>
> {
> __u8 tmp = *((volatile __u8 *)&skb->len +
> ((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
> if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2;
> };
>
> So I believe the endianness is chosen correctly.
>
> Here is the clang-generated BPF bytecode for the first condition:
>
> # llvm-objdump -d test_pkt_md_access.o
> 0000000000000000 process:
> 0: 71 21 00 03 00 00 00 00 r2 = *(u8 *)(r1 + 3)
> 1: 61 31 00 00 00 00 00 00 r3 = *(u32 *)(r1 + 0)
> 2: 57 30 00 00 00 00 00 ff r3 &= 255
> 3: 5d 23 00 1d 00 00 00 00 if r2 != r3 goto +29 <LBB0_10>
>
> This also looks good to me.
>
> Finally, here is the verifier-generated BPF bytecode:
>
> # bpftool prog dump xlated id 14
> ; TEST_FIELD(__u8, len, 0xFF);
> 0: (61) r2 = *(u32 *)(r1 +104)
> 1: (bc) w2 = w2
> 2: (74) w2 >>= 24
> 3: (bc) w2 = w2
> 4: (54) w2 &= 255
> 5: (bc) w2 = w2
>
> Here we can see the shift that I'm referring to. I believe we should
> translate *(u8 *)(r1 + 3) in this case without this shift on big-endian
> machines.
Thanks for the detailed illustration.
Now, with your detailed output of byte codes and xlated program, it
indeed becomes apparent
that verifier should not do shift at insn 2.
I was correct that after insn 0, register r2 should hold the same
value for big and little endian.
But I missed the fact in the first review that insn->off for original
first insn is different.
r2 = *(u8 *)(r1 + 3), the first insn on big endian, and r2 = *(u8 *)r1
for little endian.
They should really have the same shift amount.
Therefore, indeed, shifting amount is actually different between big
and little endians.
So your code is correct. Could you add a macro in linux/filter.h? Most
narrow load related
macros are there? This way, we maintain verifier.c __BYTE_ORDER__ macro free.
Also, could you put your above analysis in the commit message? This will help
reasoning the change easily later on.
Thanks!
>
> Best regards,
> Ilya
^ permalink raw reply
* Re: [Patch net v3 0/2] ipv4: relax source validation check for loopback packets
From: David Miller @ 2019-07-17 22:23 UTC (permalink / raw)
To: xiyou.wangcong; +Cc: netdev, dsahern
In-Reply-To: <20190717214159.25959-1-xiyou.wangcong@gmail.com>
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed, 17 Jul 2019 14:41:57 -0700
> This patchset fixes a corner case when loopback packets get dropped
> by rp_filter when we route them from veth to lo. Patch 1 is the fix
> and patch 2 provides a simplified test case for this scenario.
Series applied, thanks Cong.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox