Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v3 bpf-next 02/12] libbpf: implement BPF CO-RE offset relocation algorithm
From: Andrii Nakryiko @ 2019-08-03  6:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song, Song Liu, Kernel Team
In-Reply-To: <20190802215604.onihsysinwiu3shl@ast-mbp.dhcp.thefacebook.com>

On Fri, Aug 2, 2019 at 2:56 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Aug 02, 2019 at 12:16:52AM -0700, Andrii Nakryiko wrote:
> > On Thu, Aug 1, 2019 at 4:50 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Jul 31, 2019 at 11:47:53PM -0700, Andrii Nakryiko wrote:
> > > > This patch implements the core logic for BPF CO-RE offsets relocations.
> > > > Every instruction that needs to be relocated has corresponding
> > > > bpf_offset_reloc as part of BTF.ext. Relocations are performed by trying
> > > > to match recorded "local" relocation spec against potentially many
> > > > compatible "target" types, creating corresponding spec. Details of the
> > > > algorithm are noted in corresponding comments in the code.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > Acked-by: Song Liu <songliubraving@fb.com>
> > > ...
> > > > +             if (btf_is_composite(t)) {
> > > > +                     const struct btf_member *m = (void *)(t + 1);
> > > > +                     __u32 offset;
> > > > +
> > > > +                     if (access_idx >= BTF_INFO_VLEN(t->info))
> > > > +                             return -EINVAL;
> > > > +
> > > > +                     m = &m[access_idx];
> > > > +
> > > > +                     if (BTF_INFO_KFLAG(t->info)) {
> > > > +                             if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
> > > > +                                     return -EINVAL;
> > > > +                             offset = BTF_MEMBER_BIT_OFFSET(m->offset);
> > > > +                     } else {
> > > > +                             offset = m->offset;
> > > > +                     }
> > >
> > > very similar logic exists in btf_dump.c
> > > probably makes sense to make a common helper at some point.
> >
> > Will add btf_member_bit_offset(type, member) and
> > btf_member_bit_size(type, member).
> >
> > >
> > > > +static size_t bpf_core_essential_name_len(const char *name)
> > > > +{
> > > > +     size_t n = strlen(name);
> > > > +     int i = n - 3;
> > > > +
> > > > +     while (i > 0) {
> > > > +             if (name[i] == '_' && name[i + 1] == '_' && name[i + 2] == '_')
> > > > +                     return i;
> > > > +             i--;
> > > > +     }
> > > > +     return n;
> > > > +}
> > >
> > > that's a bit of an eye irritant. How about?
> > >         size_t n = strlen(name);
> > >         int i, cnt = 0;
> > >
> > >         for (i = n - 1; i >= 0; i--) {
> > >                 if (name[i] == '_') {
> > >                     cnt++;
> > >                 } else {
> > >                    if (cnt == 3)
> > >                       return i + 1;
> > >                    cnt = 0;
> > >                 }
> > >         }
> > >         return n;
> >
> > I find this one much harder to read and understand. What's
> > eye-irritating about that loop?
> >
> > Your loop will also handle `a____b` differently. My version will
> > return "a_" as essential name, yours "a____b". Was this intentional on
> > your part?
>
> hmm. I think both will return sizeof("a") == 1

nope, there are 4 underscores, your implementation will bump cnt to 4
without checking it for `cnt == 3`, so will never detect flavor and
will just return sizeof("a____b"). It's easily fixable, but the point
is that my original irritating code is very straightforward and harder
to get wrong an, easier to understand at a glimpse, yours require much
more conscious thought to understand.

>
> > I'd rather use this instead, if you hate the first one:
> >
> > size_t n = strlen(name);
> > int i;
> >
> > for (i = n - 3; i > 0; i--) {
> >     if (strncmp(name + i, "___", 3) == 0)
> >         return i;
> > }
> >
> > Is this better?
>
> that is worse.
> What I don't like about it is that every byte is
> compared N=sizeof(string-to-found) times.
> I guess it's not such a big performance criticial path,
> but libbpf has to keep the bar high.

We are talking about searching for *three* characters in a short
string. Performance difference is negligible at best, unnoticeable at
worst. I'd rather have straightforward and easy code, but I'll rewrite
it as a state machine the way you proposed.

>
> > >
> > > > +     case BTF_KIND_ARRAY: {
> > > > +             const struct btf_array *loc_a, *targ_a;
> > > > +
> > > > +             loc_a = (void *)(local_type + 1);
> > > > +             targ_a = (void *)(targ_type + 1);
> > > > +             local_id = loc_a->type;
> > > > +             targ_id = targ_a->type;
> > >
> > > can we add a helper like:
> >
> > Yes, we can. I was thinking about that, but decided to not expand
> > patch set. But we do need to extract all those small, but nice
> > helpers. I'll put them in libbpf_internal.h for now, but I think it
> > might be good idea to expose them as part of btf.h. Thoughts?
>
> part of btf.h make sense to me.
>
> >
> > > const struct btf_array *btf_array(cosnt struct btf_type *t)
> > > {
> > >         return (const struct btf_array *)(t + 1);
> > > }
> > >
> > > then above will be:
> > >         case BTF_KIND_ARRAY: {
> > >                 local_id = btf_array(local_type)->type;
> > >                 targ_id = btf_array(targ_type)->type;
> > >
> > > and a bunch of code in btf.c and btf_dump.c would be cleaner as well?
> >
> > Yep, some of those are already scattered around btf.c and btf_dump.c,
> > will clean up and add patch to this patch set.
> >
> > >
> > > > +             goto recur;
> > > > +     }
> > > > +     default:
> > > > +             pr_warning("unexpected kind %d relocated, local [%d], target [%d]\n",
> > > > +                        kind, local_id, targ_id);
> > > > +             return 0;
> > > > +     }
> > > > +}
> > > > +
> > > > +/*
> > > > + * Given single high-level named field accessor in local type, find
> > > > + * corresponding high-level accessor for a target type. Along the way,
> > > > + * maintain low-level spec for target as well. Also keep updating target
> > > > + * offset.
> > > > + *
> > > > + * Searching is performed through recursive exhaustive enumeration of all
> > > > + * fields of a struct/union. If there are any anonymous (embedded)
> > > > + * structs/unions, they are recursively searched as well. If field with
> > > > + * desired name is found, check compatibility between local and target types,
> > > > + * before returning result.
> > > > + *
> > > > + * 1 is returned, if field is found.
> > > > + * 0 is returned if no compatible field is found.
> > > > + * <0 is returned on error.
> > > > + */
> > > > +static int bpf_core_match_member(const struct btf *local_btf,
> > > > +                              const struct bpf_core_accessor *local_acc,
> > > > +                              const struct btf *targ_btf,
> > > > +                              __u32 targ_id,
> > > > +                              struct bpf_core_spec *spec,
> > > > +                              __u32 *next_targ_id)
> > > > +{
> > > > +     const struct btf_type *local_type, *targ_type;
> > > > +     const struct btf_member *local_member, *m;
> > > > +     const char *local_name, *targ_name;
> > > > +     __u32 local_id;
> > > > +     int i, n, found;
> > > > +
> > > > +     targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
> > > > +     if (!targ_type)
> > > > +             return -EINVAL;
> > > > +     if (!btf_is_composite(targ_type))
> > > > +             return 0;
> > > > +
> > > > +     local_id = local_acc->type_id;
> > > > +     local_type = btf__type_by_id(local_btf, local_id);
> > > > +     local_member = (void *)(local_type + 1);
> > > > +     local_member += local_acc->idx;
> > > > +     local_name = btf__name_by_offset(local_btf, local_member->name_off);
> > > > +
> > > > +     n = BTF_INFO_VLEN(targ_type->info);
> > > > +     m = (void *)(targ_type + 1);
> > >
> > > new btf_member() helper?
> > >
> > > > +     for (i = 0; i < n; i++, m++) {
> > > > +             __u32 offset;
> > > > +
> > > > +             /* bitfield relocations not supported */
> > > > +             if (BTF_INFO_KFLAG(targ_type->info)) {
> > > > +                     if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
> > > > +                             continue;
> > > > +                     offset = BTF_MEMBER_BIT_OFFSET(m->offset);
> > > > +             } else {
> > > > +                     offset = m->offset;
> > > > +             }
> > > > +             if (offset % 8)
> > > > +                     continue;
> > >
> > > same bit of code again?
> > > definitely could use a helper.
> >
> > Different handling (continue here, return error above), but can use
> > those helpers I mentioned above.
> >
> > >
> > > > +     for (i = 0; i < local_spec->len; i++, local_acc++, targ_acc++) {
> > > > +             targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id,
> > > > +                                                &targ_id);
> > > > +             if (!targ_type)
> > > > +                     return -EINVAL;
> > > > +
> > > > +             if (local_acc->name) {
> > > > +                     matched = bpf_core_match_member(local_spec->btf,
> > > > +                                                     local_acc,
> > > > +                                                     targ_btf, targ_id,
> > > > +                                                     targ_spec, &targ_id);
> > > > +                     if (matched <= 0)
> > > > +                             return matched;
> > > > +             } else {
> > > > +                     /* for i=0, targ_id is already treated as array element
> > > > +                      * type (because it's the original struct), for others
> > > > +                      * we should find array element type first
> > > > +                      */
> > > > +                     if (i > 0) {
> > > > +                             const struct btf_array *a;
> > > > +
> > > > +                             if (!btf_is_array(targ_type))
> > > > +                                     return 0;
> > > > +
> > > > +                             a = (void *)(targ_type + 1);
> > > > +                             if (local_acc->idx >= a->nelems)
> > > > +                                     return 0;
> > >
> > > am I reading it correctly that the local spec requested out-of-bounds
> > > index in the target array type?
> > > Why this is 'ignore' instead of -EINVAL?
> >
> > Similar to any other mismatch (e.g., int in local type vs int64 in
> > target type). It just makes candidate not matching. Why would that be
> > error that will stop the whole relocation and subsequently object
> > loading process?
>
> Did the field name match or this is for anon types?
> I've read it as names matched and type miscompared.

No, not anonymous.

struct my_struct___local {
    int a;
};

struct my_struct___target {
    long long a;
};

my_struct___local->a will not match my_struct___target->a, but it's
not a reason to stop relocation process due to error.

>
> >
> > >
> > > > +/*
> > > > + * Probe few well-known locations for vmlinux kernel image and try to load BTF
> > > > + * data out of it to use for target BTF.
> > > > + */
> > > > +static struct btf *bpf_core_find_kernel_btf(void)
> > > > +{
> > > > +     const char *locations[] = {
> > > > +             "/lib/modules/%1$s/vmlinux-%1$s",
> > > > +             "/usr/lib/modules/%1$s/kernel/vmlinux",
> > > > +     };
> > > > +     char path[PATH_MAX + 1];
> > > > +     struct utsname buf;
> > > > +     struct btf *btf;
> > > > +     int i, err;
> > > > +
> > > > +     err = uname(&buf);
> > > > +     if (err) {
> > > > +             pr_warning("failed to uname(): %d\n", err);
> > >
> > > defensive programming ?
> > > I think uname() can fail only if &buf points to non-existing page like null.
> >
> > I haven't checked source for this syscall, but man page specified that
> > it might return -1 on error.
>
> man page says that it can only return EFAULT.

Ah, yeah, seems to be the only reason. I'll remove the check, it
wasn't paranoia :)

>
> >
> > >
> > > > +             return ERR_PTR(err);
> > > > +     }
> > > > +
> > > > +     for (i = 0; i < ARRAY_SIZE(locations); i++) {
> > > > +             snprintf(path, PATH_MAX, locations[i], buf.release);
> > > > +             pr_debug("attempting to load kernel BTF from '%s'\n", path);
> > >
> > > I think this debug message would have been more useful after access().
> >
> > Sure, will move.
> >
> > >
> > > > +
> > > > +             if (access(path, R_OK))
> > > > +                     continue;
> > > > +
> > > > +             btf = btf__parse_elf(path, NULL);
> > > > +             if (IS_ERR(btf))
> > > > +                     continue;
> > > > +
> > > > +             pr_debug("successfully loaded kernel BTF from '%s'\n", path);
> > > > +             return btf;
> > > > +     }
> > > > +
> > > > +     pr_warning("failed to find valid kernel BTF\n");
> > > > +     return ERR_PTR(-ESRCH);
> > > > +}
> > > > +
> > > > +/* Output spec definition in the format:
> > > > + * [<type-id>] (<type-name>) + <raw-spec> => <offset>@<spec>,
> > > > + * where <spec> is a C-syntax view of recorded field access, e.g.: x.a[3].b
> > > > + */
> > > > +static void bpf_core_dump_spec(int level, const struct bpf_core_spec *spec)
> > > > +{
> > > > +     const struct btf_type *t;
> > > > +     const char *s;
> > > > +     __u32 type_id;
> > > > +     int i;
> > > > +
> > > > +     type_id = spec->spec[0].type_id;
> > > > +     t = btf__type_by_id(spec->btf, type_id);
> > > > +     s = btf__name_by_offset(spec->btf, t->name_off);
> > > > +     libbpf_print(level, "[%u] (%s) + ", type_id, s);
> > >
> > > imo extra []() don't improve readability of the dump.
> >
> > [<num>] is the general convention I've been using throughout libbpf to
> > specify type ID, so I'd rather keep it for consistency. I can drop
> > parens, though, no problem.
> >
> > >
> > > > +
> > > > +     for (i = 0; i < spec->raw_len; i++)
> > > > +             libbpf_print(level, "%d%s", spec->raw_spec[i],
> > > > +                          i == spec->raw_len - 1 ? " => " : ":");
> > > > +
> > > > +     libbpf_print(level, "%u @ &x", spec->offset);
> > > > +
> > > > +     for (i = 0; i < spec->len; i++) {
> > > > +             if (spec->spec[i].name)
> > > > +                     libbpf_print(level, ".%s", spec->spec[i].name);
> > > > +             else
> > > > +                     libbpf_print(level, "[%u]", spec->spec[i].idx);
> > > > +     }
> > > > +
> > > > +}
> > > > +
> > > > +static size_t bpf_core_hash_fn(const void *key, void *ctx)
> > > > +{
> > > > +     return (size_t)key;
> > > > +}
> > > > +
> > > > +static bool bpf_core_equal_fn(const void *k1, const void *k2, void *ctx)
> > > > +{
> > > > +     return k1 == k2;
> > > > +}
> > > > +
> > > > +static void *u32_to_ptr(__u32 x)
> > > > +{
> > > > +     return (void *)(uintptr_t)x;
> > > > +}
> > >
> > > u32 to pointer on 64-bit arch?!
> > > That surely needs a comment.
> >
> > I should probably call it u32_to_hash_key() to make it obvious it's
> > conversion to hashmap's generic `void *` key type.
> >
> > >
> > > > +
> > > > +/*
> > > > + * CO-RE relocate single instruction.
> > > > + *
> > > > + * The outline and important points of the algorithm:
> > > > + * 1. For given local type, find corresponding candidate target types.
> > > > + *    Candidate type is a type with the same "essential" name, ignoring
> > > > + *    everything after last triple underscore (___). E.g., `sample`,
> > > > + *    `sample___flavor_one`, `sample___flavor_another_one`, are all candidates
> > > > + *    for each other. Names with triple underscore are referred to as
> > > > + *    "flavors" and are useful, among other things, to allow to
> > > > + *    specify/support incompatible variations of the same kernel struct, which
> > > > + *    might differ between different kernel versions and/or build
> > > > + *    configurations.
> > > > + *
> > > > + *    N.B. Struct "flavors" could be generated by bpftool's BTF-to-C
> > > > + *    converter, when deduplicated BTF of a kernel still contains more than
> > > > + *    one different types with the same name. In that case, ___2, ___3, etc
> > > > + *    are appended starting from second name conflict. But start flavors are
> > > > + *    also useful to be defined "locally", in BPF program, to extract same
> > > > + *    data from incompatible changes between different kernel
> > > > + *    versions/configurations. For instance, to handle field renames between
> > > > + *    kernel versions, one can use two flavors of the struct name with the
> > > > + *    same common name and use conditional relocations to extract that field,
> > > > + *    depending on target kernel version.
> > >
> > > there are actual kernel types that have ___ in the name.
> > > Ex: struct lmc___media
> > > We probably need to revisit this 'flavor' convention.
> >
> > There are only these:
> > - lmc___softc
> > - lmc___media
> > - lmc___ctl (all three in drivers/net/wan/lmc/lmc_var.h)
> > - ____ftrace_##name set of structs
> >
> > I couldn't come up with anything cleaner-looking. I think we can still
> > keep ___ convention, but:
> >
> > 1. Match only exactly 3 underscores, delimited by non-underscore from
> > both sides (so similar to your proposed loop above);
> > 2. We can also try matching candidates assuming full name without
> > ___xxx part removed, in addition to current logic. This seems like an
> > overkill at this point and unlikely to be useful in practice, so I'd
> > postpone implementing this until we really need it.
> >
> > What do you think? Which other convention did you have in mind?
>
> may be match ___[0-9]+ instead for now?
> Not as flexible, but user supplied "flavors" is not an immediate task.

All the tests I added use non-numeric flavors. While technically I can
use just ___1, ___2 and so on, it will greatly reduce readability,
while not really solving any problem (nothing prevents someone to add
something like lmc___1 eventually).

I think it's not worth it to complicate this logic just for
lmc___{softc,media,ctl}, but we can do 2) - try to match any struct as
is. If that fails, see if it's a "flavor" and match flavors.

>
> >
> > >
> > > > +     for (i = 0, j = 0; i < cand_ids->len; i++) {
> > > > +             cand_id = cand_ids->data[i];
> > > > +             cand_type = btf__type_by_id(targ_btf, cand_id);
> > > > +             cand_name = btf__name_by_offset(targ_btf, cand_type->name_off);
> > > > +
> > > > +             err = bpf_core_spec_match(&local_spec, targ_btf,
> > > > +                                       cand_id, &cand_spec);
> > > > +             if (err < 0) {
> > > > +                     pr_warning("prog '%s': relo #%d: failed to match spec ",
> > > > +                                prog_name, relo_idx);
> > > > +                     bpf_core_dump_spec(LIBBPF_WARN, &local_spec);
> > > > +                     libbpf_print(LIBBPF_WARN,
> > > > +                                  " to candidate #%d [%d] (%s): %d\n",
> > > > +                                  i, cand_id, cand_name, err);
> > > > +                     return err;
> > > > +             }
> > > > +             if (err == 0) {
> > > > +                     pr_debug("prog '%s': relo #%d: candidate #%d [%d] (%s) doesn't match spec ",
> > > > +                              prog_name, relo_idx, i, cand_id, cand_name);
> > > > +                     bpf_core_dump_spec(LIBBPF_DEBUG, &local_spec);
> > > > +                     libbpf_print(LIBBPF_DEBUG, "\n");
> > > > +                     continue;
> > > > +             }
> > > > +
> > > > +             pr_debug("prog '%s': relo #%d: candidate #%d matched as spec ",
> > > > +                      prog_name, relo_idx, i);
> > >
> > > did you mention that you're going to make a helper for this debug dumps?
> >
> > yeah, I added bpf_core_dump_spec(), but I don't know how to shorten
> > this further... This output is extremely useful to understand what's
> > happening and will be invaluable when users will inevitably report
> > confusing behavior in some cases, so I still want to keep it.
>
> not sure yet. Just pointing out that this function has more debug printfs
> than actual code which doesn't look right.
> We have complex algorithms in the kernel (like verifier).
> Yet we don't sprinkle printfs in there to this degree.
>

We do have a verbose verifier logging, though, exactly to help users
to debug issues, which is extremely helpful and is greatly appreciated
by users.
There is nothing worse for developer experience than getting -EINVAL
without any useful log message. Been there, banged my head against the
wall wishing for a bit more verbose log. What are we trying to
optimize for here?

^ permalink raw reply

* Re: [PATCH] isdn: hysdn: Fix error spaces around '*'
From: Greg KH @ 2019-08-03  6:32 UTC (permalink / raw)
  To: Jose Carlos Cazarin Filho; +Cc: isdn, devel, netdev, linux-kernel
In-Reply-To: <20190802195602.28414-1-joseespiriki@gmail.com>

On Fri, Aug 02, 2019 at 07:56:02PM +0000, Jose Carlos Cazarin Filho wrote:
> Fix checkpath error:
> CHECK: spaces preferred around that '*' (ctx:WxV)
> +extern hysdn_card *card_root;        /* pointer to first card */
> 
> Signed-off-by: Jose Carlos Cazarin Filho <joseespiriki@gmail.com>
> ---
>  Hello all!
>  This is my first commit to the Linux Kernel, I'm doing this to learn and be able
>  to contribute more in the future
>  Peace all! 
> 
>  drivers/staging/isdn/hysdn/hysdn_defs.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/isdn/hysdn/hysdn_defs.h b/drivers/staging/isdn/hysdn/hysdn_defs.h
> index cdac46a21..f20150d62 100644
> --- a/drivers/staging/isdn/hysdn/hysdn_defs.h
> +++ b/drivers/staging/isdn/hysdn/hysdn_defs.h
> @@ -220,7 +220,7 @@ typedef struct hycapictrl_info hycapictrl_info;
>  /*****************/
>  /* exported vars */
>  /*****************/
> -extern hysdn_card *card_root;	/* pointer to first card */
> +extern hysdn_card * card_root;	/* pointer to first card */

The original code here is correct, checkpatch must be reporting this
incorrectly.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] drivers:staging:isdn:hysdn brace same line if
From: Greg KH @ 2019-08-03  6:35 UTC (permalink / raw)
  To: Fernando Eckhardt Valle; +Cc: isdn, netdev, devel, linux-kernel
In-Reply-To: <20190802195105.27788-1-phervalle@gmail.com>

On Fri, Aug 02, 2019 at 07:51:05PM +0000, Fernando Eckhardt Valle wrote:
> Fix checkpatch error "ERROR: that open brace { should be on the previous
> line" in drivers/staging/isdn/hysdn/hycapi.c:51.
> 
> Signed-off-by: Fernando Eckhardt Valle <phervalle@gmail.com>
> ---
>  drivers/staging/isdn/hysdn/hycapi.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Your subject line does not make sense :(


^ permalink raw reply

* Re: [PATCH 15/34] staging/vc04_services: convert put_page() to put_user_page*()
From: Greg Kroah-Hartman @ 2019-08-03  7:06 UTC (permalink / raw)
  To: john.hubbard
  Cc: Andrew Morton, linux-fbdev, Jan Kara, kvm, Dave Hansen,
	Dave Chinner, dri-devel, linux-mm, sparclinux, ceph-devel, devel,
	rds-devel, linux-rdma, Suniel Mahesh, x86, amd-gfx,
	Christoph Hellwig, Jason Gunthorpe, Mihaela Muraru, xen-devel,
	devel, linux-media, Stefan Wahren, John Hubbard, intel-gfx,
	Kishore KP, linux-block, Jérôme Glisse,
	linux-rpi-kernel, Dan Williams, Sidong Yang, linux-arm-kernel,
	linux-nfs, Eric Anholt, netdev, LKML, linux-xfs, linux-crypto,
	linux-fsdevel, Al Viro
In-Reply-To: <20190802022005.5117-16-jhubbard@nvidia.com>

On Thu, Aug 01, 2019 at 07:19:46PM -0700, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
> 
> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> Cc: Eric Anholt <eric@anholt.net>
> Cc: Stefan Wahren <stefan.wahren@i2se.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Mihaela Muraru <mihaela.muraru21@gmail.com>
> Cc: Suniel Mahesh <sunil.m@techveda.org>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Sidong Yang <realwakka@gmail.com>
> Cc: Kishore KP <kishore.p@techveda.org>
> Cc: linux-rpi-kernel@lists.infradead.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: devel@driverdev.osuosl.org
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  .../vc04_services/interface/vchiq_arm/vchiq_2835_arm.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply

* Re: [PATCH 10/34] genwqe: convert put_page() to put_user_page*()
From: Greg Kroah-Hartman @ 2019-08-03  7:06 UTC (permalink / raw)
  To: john.hubbard
  Cc: Andrew Morton, linux-fbdev, Jan Kara, kvm, Dave Hansen,
	Dave Chinner, dri-devel, linux-mm, sparclinux, ceph-devel, devel,
	rds-devel, linux-rdma, x86, amd-gfx, Christoph Hellwig,
	Jason Gunthorpe, xen-devel, devel, linux-media, Arnd Bergmann,
	Guilherme G. Piccoli, John Hubbard, intel-gfx, linux-block,
	Jérôme Glisse, linux-rpi-kernel, Dan Williams,
	linux-arm-kernel, linux-nfs, netdev, LKML, linux-xfs,
	linux-crypto, linux-fsdevel, Frank Haverkamp
In-Reply-To: <20190802022005.5117-11-jhubbard@nvidia.com>

On Thu, Aug 01, 2019 at 07:19:41PM -0700, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
> 
> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> This changes the release code slightly, because each page slot in the
> page_list[] array is no longer checked for NULL. However, that check
> was wrong anyway, because the get_user_pages() pattern of usage here
> never allowed for NULL entries within a range of pinned pages.
> 
> Cc: Frank Haverkamp <haver@linux.vnet.ibm.com>
> Cc: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/misc/genwqe/card_utils.c | 17 +++--------------
>  1 file changed, 3 insertions(+), 14 deletions(-)

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply

* Re: [PATCH net 0/2] flow_offload hardware priority fixes
From: Pablo Neira Ayuso @ 2019-08-03  7:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netfilter-devel, davem, netdev, marcelo.leitner, jiri, wenxu,
	saeedm, paulb, gerlitz.or
In-Reply-To: <20190802152549.357784a7@cakuba.netronome.com>

Hi Jakub,

On Fri, Aug 02, 2019 at 03:25:49PM -0700, Jakub Kicinski wrote:
> On Sat, 3 Aug 2019 00:04:09 +0200, Pablo Neira Ayuso wrote:
> > That patch removed the reference to tcf_auto_prio() already, please
> > let me know if you have any more specific update you would like to see
> > on that patch.
> 
> Please explain why the artificial priorities are needed at all.
> Hardware should order tables based on table type - ethtool, TC, nft.
> As I mentioned in the first email, and unless you can make a strong 
> case against that.
> Within those tables we should follow the same ordering rules as we 
> do in software (modulo ethtool but ordering is pretty clear there).

The idea is that every subsystem (ethtool, tc, nf) sets up/binds its
own flow_block object. And each flow_block object has its own priority
range space. So whatever priority the user specifies only applies to
the specific subsystem. Drivers still need to be updated to support
for more than one flow_block/subsystem binding at this stage though.

^ permalink raw reply

* Re: KASAN: use-after-free Read in blkdev_direct_IO
From: syzbot @ 2019-08-03  7:43 UTC (permalink / raw)
  To: arvid.brodin, davem, linux-fsdevel, linux-kernel, netdev,
	syzkaller-bugs, viro, xiyou.wangcong
In-Reply-To: <000000000000a2db16058f2514fa@google.com>

syzbot has bisected this bug to:

commit b9a1e627405d68d475a3c1f35e685ccfb5bbe668
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Thu Jul 4 00:21:13 2019 +0000

     hsr: implement dellink to clean up resources

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1589f3e8600000
start commit:   1e78030e Merge tag 'mmc-v5.3-rc1' of git://git.kernel.org/..
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=1789f3e8600000
console output: https://syzkaller.appspot.com/x/log.txt?x=1389f3e8600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4c7b914a2680c9c6
dashboard link: https://syzkaller.appspot.com/bug?extid=0a0e5f37746013dc7476
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11fa7830600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12f31c8a600000

Reported-by: syzbot+0a0e5f37746013dc7476@syzkaller.appspotmail.com
Fixes: b9a1e627405d ("hsr: implement dellink to clean up resources")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply

* [PATCH iproute2-next] ss: sctp: fix typo for nodelay
From: Patrick Talbert @ 2019-08-03  8:37 UTC (permalink / raw)
  To: netdev; +Cc: stephen

nodealy should be nodelay.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
---
 misc/ss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index 0927b192..01b47fed 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2414,7 +2414,7 @@ static void sctp_stats_print(struct sctp_info *s)
 	if (s->sctpi_s_pd_point)
 		out(" pdpoint:%d", s->sctpi_s_pd_point);
 	if (s->sctpi_s_nodelay)
-		out(" nodealy:%d", s->sctpi_s_nodelay);
+		out(" nodelay:%d", s->sctpi_s_nodelay);
 	if (s->sctpi_s_disable_fragments)
 		out(" nofrag:%d", s->sctpi_s_disable_fragments);
 	if (s->sctpi_s_v4mapped)
-- 
2.18.1


^ permalink raw reply related

* [PATCH iproute2-next] ss: sctp: Formatting tweak in sctp_show_info for locals
From: Patrick Talbert @ 2019-08-03  8:47 UTC (permalink / raw)
  To: netdev; +Cc: stephen

'locals' output does not include a leading space so it runs up against
skmem:() output. Add a leading space to fix it.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
---
 misc/ss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index 0927b192..5e70709d 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2937,7 +2937,7 @@ static void sctp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 		len = RTA_PAYLOAD(tb[INET_DIAG_LOCALS]);
 		sa = RTA_DATA(tb[INET_DIAG_LOCALS]);
 
-		out("locals:%s", format_host_sa(sa));
+		out(" locals:%s", format_host_sa(sa));
 		for (sa++, len -= sizeof(*sa); len > 0; sa++, len -= sizeof(*sa))
 			out(",%s", format_host_sa(sa));
 
-- 
2.18.1


^ permalink raw reply related

* Re: [v2,1/2] tools: bpftool: add net attach command to attach XDP on interface
From: Daniel T. Lee @ 2019-08-03  9:39 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190802113935.63be803a@cakuba.netronome.com>

On Sat, Aug 3, 2019 at 3:39 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Fri, 2 Aug 2019 14:02:29 +0900, Daniel T. Lee wrote:
> > On Fri, Aug 2, 2019 at 8:36 AM Jakub Kicinski  wrote:
> > > On Thu,  1 Aug 2019 17:11:32 +0900, Daniel T. Lee wrote:
> > > > By this commit, using `bpftool net attach`, user can attach XDP prog on
> > > > interface. New type of enum 'net_attach_type' has been made, as stated at
> > > > cover-letter, the meaning of 'attach' is, prog will be attached on interface.
> > > >
> > > > BPF prog will be attached through libbpf 'bpf_set_link_xdp_fd'.
> > > >
> > > > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > > > ---
> > > > Changes in v2:
> > > >   - command 'load' changed to 'attach' for the consistency
> > > >   - 'NET_ATTACH_TYPE_XDP_DRIVE' changed to 'NET_ATTACH_TYPE_XDP_DRIVER'
> > > >
> > > >  tools/bpf/bpftool/net.c | 107 +++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 106 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> > > > index 67e99c56bc88..f3b57660b303 100644
> > > > --- a/tools/bpf/bpftool/net.c
> > > > +++ b/tools/bpf/bpftool/net.c
> > > > @@ -55,6 +55,35 @@ struct bpf_attach_info {
> > > >       __u32 flow_dissector_id;
> > > >  };
> > > >
> > > > +enum net_attach_type {
> > > > +     NET_ATTACH_TYPE_XDP,
> > > > +     NET_ATTACH_TYPE_XDP_GENERIC,
> > > > +     NET_ATTACH_TYPE_XDP_DRIVER,
> > > > +     NET_ATTACH_TYPE_XDP_OFFLOAD,
> > > > +     __MAX_NET_ATTACH_TYPE
> > > > +};
> > > > +
> > > > +static const char * const attach_type_strings[] = {
> > > > +     [NET_ATTACH_TYPE_XDP] = "xdp",
> > > > +     [NET_ATTACH_TYPE_XDP_GENERIC] = "xdpgeneric",
> > > > +     [NET_ATTACH_TYPE_XDP_DRIVER] = "xdpdrv",
> > > > +     [NET_ATTACH_TYPE_XDP_OFFLOAD] = "xdpoffload",
> > > > +     [__MAX_NET_ATTACH_TYPE] = NULL,
> > >
> > > Not sure if the terminator is necessary,
> > > ARRAY_SIZE(attach_type_strings) should suffice?
> >
> > Yes, ARRAY_SIZE is fine though. But I was just trying to make below
> > 'parse_attach_type' consistent with 'parse_attach_type' from the 'prog.c'.
> > At 'prog.c', It has same terminator at 'attach_type_strings'.
> >
> > Should I change it or keep it?
>
> Oh well, I guess there is some precedent for that :S
>
> Quick grep for const char * const reveals we have around 7 non-NULL
> terminated arrays, and 2 NULL terminated. Plus the NULL-terminated
> don't align the '=' sign, while most do.
>
> it's not a big deal, my preference is for not NULL terminating here,
> and aligning '='.
>

I think following the majority, is better for this case.

Like you mentioned, those 2 NULL-terminated arrays are at 'cgroup.c'
and 'prog.c'
and the strings they are defining are 'bpf_attach_type' which is from
uapi 'bpf.h'.

And, in my guess, the reason for those 2 arrays uses NULL-terminator
is because enum from 'bpf_attach_type' has '__MAX_BPF_ATTACH_TYPE' at
the end.

Actually I was kind of uncomfortable with adding an enum since it's
not used globally and *only* used in 'net.c' context. Instead of using
hacky enum entry, stick to the majority, ARRAY_SIZE, is better to keep
consistency.

I'll update this to next version with '=' alignment.

>
> > > > +     NEXT_ARG();
> > > > +     if (!REQ_ARGS(1))
> > > > +             return -EINVAL;
> > >
> > > Error message needed here.
> > >
> >
> > Actually it provides error message like:
> > Error: 'xdp' needs at least 1 arguments, 0 found
> >
> > are you suggesting that any additional error message is necessary?
>
> Ah, sorry, I missed REQ_ARGS() there!
>
> > > > +             return -EINVAL;
> > > > +     }
> > >
> > > Please require the dev keyword before the interface name.
> > > That'll make it feel closer to prog load syntax.
> >
> > If adding the dev keyword before interface name, will it be too long to type in?
>
> I think it's probably muscle memory for most. Plus we have excellent
> bash completions.
>
> > and also `bpftool prog` use extra keyword (such as dev) when it is
> > optional keyword.
> >
> >        bpftool prog dump jited  PROG [{ file FILE | opcodes | linum }]
> >        bpftool prog pin   PROG FILE
> >        bpftool prog { load | loadall } OBJ  PATH \
> >
> > as you can see here, FILE uses optional keyword 'file' when the
> > argument is optional.
>
> Not sure I follow
>

command 'dump' has optional argument '[ file FILE ]',
and command 'pin' has required argument with 'FILE'.

I thought the preceding optional keyword 'file' with lower-case is
intended for the optional argument since it isn't preceded when the
argument is required.

> >        bpftool prog { load | loadall } OBJ  PATH \
> >                          [type TYPE] [dev NAME] \
> >                          [map { idx IDX | name NAME } MAP]\
> >                          [pinmaps MAP_DIR]
> >
> > Yes, bpftool prog load has dev keyword with it,
> >
> > but first, like previous, the argument is optional so i think it is
> > unnecessary to use optional keyword 'dev'.
>
> The keyword should not be optional if device name is specified.
> Maybe lack of coffee on my side..
>
> > and secondly, 'bpftool net attach' isn't really related to 'bpftool prog load'.
> >
> > At previous version patch, I was using word 'load' instead of
> > 'attach', since XDP program is not
> > considered as 'BPF_PROG_ATTACH', so it might give a confusion. However
> > by the last patch discussion,
> > word 'load' has been replaced to 'attach'.
> >
> > Keeping the consistency is very important, but I was just wandering
> > about making command
> > similar to 'bpftool prog load' syntax.
>
> In case of TC the device argument is optional. You may specify it, or
> you can refer to TC blocks instead. So for that reason alone I think
> it'll be much cleaner to require dev before the interface name.
>

Previously I didn't really considered TC.

Considering the extensibility, since device argument could be optional,
requiring dev before the interface name seems necessary.

Thank you for letting me know! :)

> > > > +     return 0;
> > > > +}
> > > > +
> > > > +static int do_attach_detach_xdp(int *progfd, enum net_attach_type *attach_type,
> > > > +                             int *ifindex)
> > > > +{
> > > > +     __u32 flags;
> > > > +     int err;
> > > > +
> > > > +     flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> > >
> > > Please add this as an option so that user can decide whether overwrite
> > > is allowed or not.
> >
> > Adding force flag to bpftool seems necessary.
> > I will add an optional argument for this.
>
> Right, I was wondering if we want to call it force, though? force is
> sort of a reuse of iproute2 concept. But it's kind of hard to come up
> with names.
>
> Just to be sure - I mean something like:
>
> bpftool net attach xdp id xyz dev ethN noreplace
>
> Rather than:
>
> bpftool -f net attach xdp id xyz dev ethN
>

How about a word 'immutable'? 'noreplace' seems good though.
Just suggesting an option.

> > > > +     if (*attach_type == NET_ATTACH_TYPE_XDP_GENERIC)
> > > > +             flags |= XDP_FLAGS_SKB_MODE;
> > > > +     if (*attach_type == NET_ATTACH_TYPE_XDP_DRIVER)
> > > > +             flags |= XDP_FLAGS_DRV_MODE;
> > > > +     if (*attach_type == NET_ATTACH_TYPE_XDP_OFFLOAD)
> > > > +             flags |= XDP_FLAGS_HW_MODE;
> > > > +
> > > > +     err = bpf_set_link_xdp_fd(*ifindex, *progfd, flags);
> > > > +
> > > > +     return err;
> > >
> > > no need for the err variable here.
> >
> > My apologies, but I'm not sure why err variable isn't needed at here.
> > AFAIK, 'bpf_set_link_xdp_fd' from libbpf returns the netlink_recv result,
> > and in order to propagate error, err variable is necessary, I guess?
>
>         return bpf_set_link_xdp_fd(*ifindex, *progfd, flags);
>
> Is what I meant.
>

Oops. I've got it wrong.
I'll update to next patch.

> > > > +}
> > > > +
> > > > +static int do_attach(int argc, char **argv)
> > > > +{
> > > > +     enum net_attach_type attach_type;
> > > > +     int err, progfd, ifindex;
> > > > +
> > > > +     err = parse_attach_args(argc, argv, &progfd, &attach_type, &ifindex);
> > > > +     if (err)
> > > > +             return err;
> > >
> > > Probably not the best idea to move this out into a helper.
> >
> > Again, just trying to make consistent with 'prog.c'.
> >
> > But clearly it has differences with do_attach/detach from 'prog.c'.
> > From it, it uses the same parse logic 'parse_attach_detach_args' since
> > the two command 'bpftool prog attach/detach' uses the same argument format.
> >
> > However, in here, 'bpftool net' attach and detach requires different number of
> > argument, so function for parse argument has been defined separately.
> > The situation is little bit different, but keeping argument parse logic as an
> > helper, I think it's better in terms of consistency.
>
> Well they won't share the same arguments if you add the keyword for
> controlling IF_NOEXIST :(
>

My apologies, but I think I'm not following with you.

Currently detach/attach isn't sharing same arguments.
And even after adding the argument for IF_NOEXIST such as  [ noreplace ],
it'll stays the same for not sharing same arguments.

I'm not sure why it is not the best idea to move arg parse logic into a helper.

> > About the moving parse logic to a helper, I was trying to keep command
> > entry (do_attach)
> > as simple as possible. Parse all the argument in command entry will
> > make function longer
> > and might make harder to understand what it does.
> >
> > And I'm not pretty sure that argument parse logic will stays the same
> > after other attachment
> > type comes in. What I mean is, the argument count or type might be
> > added and to fulfill
> > all that specific cases, the code might grow larger.
> >
> > So for the consistency, simplicity and extensibility, I prefer to keep
> > it as a helper.
> >
> > > > +     if (is_prefix("xdp", attach_type_strings[attach_type]))
> > > > +             err = do_attach_detach_xdp(&progfd, &attach_type, &ifindex);
> > >
> > > Hm. We either need an error to be reported if it's not xdp or since we
> > > only accept XDP now perhaps the if() is superfluous?
> >
> > Well, if the attach_type isn't xdp, the error will be occurred from
> > the argument parse,
> > Will it be necessary to reinforce with error logic to make it more secure?
>
> Hm. it should already be fine, no? For non-xdp parse_attach_type() will
> return __MAX_NET_ATTACH_TYPE, then parsing returns EINVAL and we exit.
> Not sure I follow.

Yes. That was what i meant.

Thank you for taking your time for the review.

^ permalink raw reply

* Re: [PATCH] isdn: hysdn: Fix error spaces around '*'
From: Joe Perches @ 2019-08-03 11:15 UTC (permalink / raw)
  To: Greg KH, Jose Carlos Cazarin Filho; +Cc: isdn, devel, netdev, linux-kernel
In-Reply-To: <20190803063246.GA10186@kroah.com>

On Sat, 2019-08-03 at 08:32 +0200, Greg KH wrote:
> On Fri, Aug 02, 2019 at 07:56:02PM +0000, Jose Carlos Cazarin Filho wrote:
> > Fix checkpath error:
> > CHECK: spaces preferred around that '*' (ctx:WxV)
> > +extern hysdn_card *card_root;        /* pointer to first card */
[]
> > diff --git a/drivers/staging/isdn/hysdn/hysdn_defs.h b/drivers/staging/isdn/hysdn/hysdn_defs.h
[]
> > @@ -220,7 +220,7 @@ typedef struct hycapictrl_info hycapictrl_info;
> >  /*****************/
> >  /* exported vars */
> >  /*****************/
> > -extern hysdn_card *card_root;	/* pointer to first card */
> > +extern hysdn_card * card_root;	/* pointer to first card */
> 
> The original code here is correct, checkpatch must be reporting this
> incorrectly.

Here checkpatch thinks that hydsn_card is an identifier rather
than a typedef.

It's defined as:
	typedef struct HYSDN_CARD {
	...
	} hysdn_card;

And that confuses checkpatch.

kernel source code style would not use a typedef for a struct.

A change would be to remove the typedef and declare this as:
	struct hysdn_card {
		...
	};

And then do a global:
	sed 's/\bhysdn_card\b/struct hysdn_card/g'

But that's not necessary as the driver is likely to be removed.



^ permalink raw reply

* Re: [PATCH] isdn: hysdn: Fix error spaces around '*'
From: Greg KH @ 2019-08-03 11:24 UTC (permalink / raw)
  To: Joe Perches; +Cc: Jose Carlos Cazarin Filho, devel, netdev, isdn, linux-kernel
In-Reply-To: <6ff800ceda4b1c1f1d9e519aac13db42dc703294.camel@perches.com>

On Sat, Aug 03, 2019 at 04:15:05AM -0700, Joe Perches wrote:
> On Sat, 2019-08-03 at 08:32 +0200, Greg KH wrote:
> > On Fri, Aug 02, 2019 at 07:56:02PM +0000, Jose Carlos Cazarin Filho wrote:
> > > Fix checkpath error:
> > > CHECK: spaces preferred around that '*' (ctx:WxV)
> > > +extern hysdn_card *card_root;        /* pointer to first card */
> []
> > > diff --git a/drivers/staging/isdn/hysdn/hysdn_defs.h b/drivers/staging/isdn/hysdn/hysdn_defs.h
> []
> > > @@ -220,7 +220,7 @@ typedef struct hycapictrl_info hycapictrl_info;
> > >  /*****************/
> > >  /* exported vars */
> > >  /*****************/
> > > -extern hysdn_card *card_root;	/* pointer to first card */
> > > +extern hysdn_card * card_root;	/* pointer to first card */
> > 
> > The original code here is correct, checkpatch must be reporting this
> > incorrectly.
> 
> Here checkpatch thinks that hydsn_card is an identifier rather
> than a typedef.
> 
> It's defined as:
> 	typedef struct HYSDN_CARD {
> 	...
> 	} hysdn_card;
> 
> And that confuses checkpatch.
> 
> kernel source code style would not use a typedef for a struct.
> 
> A change would be to remove the typedef and declare this as:
> 	struct hysdn_card {
> 		...
> 	};
> 
> And then do a global:
> 	sed 's/\bhysdn_card\b/struct hysdn_card/g'
> 
> But that's not necessary as the driver is likely to be removed.

Ah, that makes sense why checkpatch did this, thanks for the
information.

And yes, it's not worth being changed, as this is going to be deleted.
But, I bet we get this sent a lot until it is as it's "easy pickings" :)

thanks,

greg k-h

^ permalink raw reply

* [PATCH v1 2/3] net: hisilicon: fix hip04-xmit never return TX_BUSY
From: Jiangfeng Xiao @ 2019-08-03 12:31 UTC (permalink / raw)
  To: davem, yisen.zhuang, salil.mehta, xiaojiangfeng
  Cc: netdev, linux-kernel, leeyou.li, xiaowei774, nixiaoming
In-Reply-To: <1564835501-90257-1-git-send-email-xiaojiangfeng@huawei.com>

TX_DESC_NUM is 256, in tx_count, the maximum value of
mod(TX_DESC_NUM - 1) is 254, the variable "count" in
the hip04_mac_start_xmit function is never equal to
(TX_DESC_NUM - 1), so hip04_mac_start_xmit never
return NETDEV_TX_BUSY.

tx_count is modified to mod(TX_DESC_NUM) so that
the maximum value of tx_count can reach
(TX_DESC_NUM - 1), then hip04_mac_start_xmit can reurn
NETDEV_TX_BUSY.

Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 1e1b154..d775b98 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -248,7 +248,7 @@ struct hip04_priv {
 
 static inline unsigned int tx_count(unsigned int head, unsigned int tail)
 {
-	return (head - tail) % (TX_DESC_NUM - 1);
+	return (head - tail) % TX_DESC_NUM;
 }
 
 static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
-- 
1.8.5.6


^ permalink raw reply related

* [PATCH v1 1/3] net: hisilicon: make hip04_tx_reclaim non-reentrant
From: Jiangfeng Xiao @ 2019-08-03 12:31 UTC (permalink / raw)
  To: davem, yisen.zhuang, salil.mehta, xiaojiangfeng
  Cc: netdev, linux-kernel, leeyou.li, xiaowei774, nixiaoming
In-Reply-To: <1564835501-90257-1-git-send-email-xiaojiangfeng@huawei.com>

If hip04_tx_reclaim is interrupted while it is running
and then __napi_schedule continues to execute
hip04_rx_poll->hip04_tx_reclaim, reentrancy occurs
and oops is generated. So you need to mask the interrupt
during the hip04_tx_reclaim run.

The kernel oops exception stack is as follows:

Unable to handle kernel NULL pointer dereference
at virtual address 00000050
pgd = c0003000
[00000050] *pgd=80000000a04003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP ARM
Modules linked in: hip04_eth mtdblock mtd_blkdevs mtd
ohci_platform ehci_platform ohci_hcd ehci_hcd
vfat fat sd_mod usb_storage scsi_mod usbcore usb_common
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.4.185 #1
Hardware name: Hisilicon A15
task: c0a250e0 task.stack: c0a00000
PC is at hip04_tx_reclaim+0xe0/0x17c [hip04_eth]
LR is at hip04_tx_reclaim+0x30/0x17c [hip04_eth]
pc : [<bf30c3a4>]    lr : [<bf30c2f4>]    psr: 600e0313
sp : c0a01d88  ip : 00000000  fp : c0601f9c
r10: 00000000  r9 : c3482380  r8 : 00000001
r7 : 00000000  r6 : 000000e1  r5 : c3482000  r4 : 0000000c
r3 : f2209800  r2 : 00000000  r1 : 00000000  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 32c5387d  Table: 03d28c80  DAC: 55555555
Process swapper/0 (pid: 0, stack limit = 0xc0a00190)
Stack: (0xc0a01d88 to 0xc0a02000)
[<bf30c3a4>] (hip04_tx_reclaim [hip04_eth]) from [<bf30d2e0>]
                                                (hip04_rx_poll+0x88/0x368 [hip04_eth])
[<bf30d2e0>] (hip04_rx_poll [hip04_eth]) from [<c04c2d9c>] (net_rx_action+0x114/0x34c)
[<c04c2d9c>] (net_rx_action) from [<c021eed8>] (__do_softirq+0x218/0x318)
[<c021eed8>] (__do_softirq) from [<c021f284>] (irq_exit+0x88/0xac)
[<c021f284>] (irq_exit) from [<c0240090>] (msa_irq_exit+0x11c/0x1d4)
[<c0240090>] (msa_irq_exit) from [<c02677e0>] (__handle_domain_irq+0x110/0x148)
[<c02677e0>] (__handle_domain_irq) from [<c0201588>] (gic_handle_irq+0xd4/0x118)
[<c0201588>] (gic_handle_irq) from [<c0551700>] (__irq_svc+0x40/0x58)
Exception stack(0xc0a01f30 to 0xc0a01f78)
1f20:                                     c0ae8b40 00000000 00000000 00000000
1f40: 00000002 ffffe000 c0601f9c 00000000 ffffffff c0a2257c c0a22440 c0831a38
1f60: c0a01ec4 c0a01f80 c0203714 c0203718 600e0213 ffffffff
[<c0551700>] (__irq_svc) from [<c0203718>] (arch_cpu_idle+0x20/0x3c)
[<c0203718>] (arch_cpu_idle) from [<c025bfd8>] (cpu_startup_entry+0x244/0x29c)
[<c025bfd8>] (cpu_startup_entry) from [<c054b0d8>] (rest_init+0xc8/0x10c)
[<c054b0d8>] (rest_init) from [<c0800c58>] (start_kernel+0x468/0x514)
Code: a40599e5 016086e2 018088e2 7660efe6 (503090e5)
---[ end trace 1db21d6d09c49d74 ]---
Kernel panic - not syncing: Fatal exception in interrupt
CPU3: stopping
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D    O    4.4.185 #1

Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
index d604528..1e1b154 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -585,6 +585,9 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
 	u16 len;
 	u32 err;
 
+	/* clean up tx descriptors */
+	tx_remaining = hip04_tx_reclaim(ndev, false);
+
 	while (cnt && !last) {
 		buf = priv->rx_buf[priv->rx_head];
 		skb = build_skb(buf, priv->rx_buf_size);
@@ -645,8 +648,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
 	}
 	napi_complete_done(napi, rx);
 done:
-	/* clean up tx descriptors and start a new timer if necessary */
-	tx_remaining = hip04_tx_reclaim(ndev, false);
+	/* start a new timer if necessary */
 	if (rx < budget && tx_remaining)
 		hip04_start_tx_timer(priv);
 
-- 
1.8.5.6


^ permalink raw reply related

* [PATCH v1 3/3] net: hisilicon: Fix dma_map_single failed on arm64
From: Jiangfeng Xiao @ 2019-08-03 12:31 UTC (permalink / raw)
  To: davem, yisen.zhuang, salil.mehta, xiaojiangfeng
  Cc: netdev, linux-kernel, leeyou.li, xiaowei774, nixiaoming
In-Reply-To: <1564835501-90257-1-git-send-email-xiaojiangfeng@huawei.com>

On the arm64 platform, executing "ifconfig eth0 up" will fail,
returning "ifconfig: SIOCSIFFLAGS: Input/output error."

ndev->dev is not initialized, dma_map_single->get_dma_ops->
dummy_dma_ops->__dummy_map_page will return DMA_ERROR_CODE
directly, so when we use dma_map_single, the first parameter
is to use the device of platform_device.

Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
index d775b98..c841674 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -220,6 +220,7 @@ struct hip04_priv {
 	unsigned int reg_inten;
 
 	struct napi_struct napi;
+	struct device *dev;
 	struct net_device *ndev;
 
 	struct tx_desc *tx_desc;
@@ -465,7 +466,7 @@ static int hip04_tx_reclaim(struct net_device *ndev, bool force)
 		}
 
 		if (priv->tx_phys[tx_tail]) {
-			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+			dma_unmap_single(priv->dev, priv->tx_phys[tx_tail],
 					 priv->tx_skb[tx_tail]->len,
 					 DMA_TO_DEVICE);
 			priv->tx_phys[tx_tail] = 0;
@@ -516,8 +517,8 @@ static void hip04_start_tx_timer(struct hip04_priv *priv)
 		return NETDEV_TX_BUSY;
 	}
 
-	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
-	if (dma_mapping_error(&ndev->dev, phys)) {
+	phys = dma_map_single(priv->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(priv->dev, phys)) {
 		dev_kfree_skb(skb);
 		return NETDEV_TX_OK;
 	}
@@ -596,7 +597,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
 			goto refill;
 		}
 
-		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+		dma_unmap_single(priv->dev, priv->rx_phys[priv->rx_head],
 				 RX_BUF_SIZE, DMA_FROM_DEVICE);
 		priv->rx_phys[priv->rx_head] = 0;
 
@@ -625,9 +626,9 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
 		buf = netdev_alloc_frag(priv->rx_buf_size);
 		if (!buf)
 			goto done;
-		phys = dma_map_single(&ndev->dev, buf,
+		phys = dma_map_single(priv->dev, buf,
 				      RX_BUF_SIZE, DMA_FROM_DEVICE);
-		if (dma_mapping_error(&ndev->dev, phys))
+		if (dma_mapping_error(priv->dev, phys))
 			goto done;
 		priv->rx_buf[priv->rx_head] = buf;
 		priv->rx_phys[priv->rx_head] = phys;
@@ -730,9 +731,9 @@ static int hip04_mac_open(struct net_device *ndev)
 	for (i = 0; i < RX_DESC_NUM; i++) {
 		dma_addr_t phys;
 
-		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+		phys = dma_map_single(priv->dev, priv->rx_buf[i],
 				      RX_BUF_SIZE, DMA_FROM_DEVICE);
-		if (dma_mapping_error(&ndev->dev, phys))
+		if (dma_mapping_error(priv->dev, phys))
 			return -EIO;
 
 		priv->rx_phys[i] = phys;
@@ -766,7 +767,7 @@ static int hip04_mac_stop(struct net_device *ndev)
 
 	for (i = 0; i < RX_DESC_NUM; i++) {
 		if (priv->rx_phys[i]) {
-			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+			dma_unmap_single(priv->dev, priv->rx_phys[i],
 					 RX_BUF_SIZE, DMA_FROM_DEVICE);
 			priv->rx_phys[i] = 0;
 		}
@@ -909,6 +910,7 @@ static int hip04_mac_probe(struct platform_device *pdev)
 		return -ENOMEM;
 
 	priv = netdev_priv(ndev);
+	priv->dev = d;
 	priv->ndev = ndev;
 	platform_set_drvdata(pdev, ndev);
 	SET_NETDEV_DEV(ndev, &pdev->dev);
-- 
1.8.5.6


^ permalink raw reply related

* [PATCH v1 0/3] net: hisilicon: Fix a few problems with hip04_eth
From: Jiangfeng Xiao @ 2019-08-03 12:31 UTC (permalink / raw)
  To: davem, yisen.zhuang, salil.mehta, xiaojiangfeng
  Cc: netdev, linux-kernel, leeyou.li, xiaowei774, nixiaoming

During the use of the hip04_eth driver,
several problems were found,
which solved the hip04_tx_reclaim reentry problem,
fixed the problem that hip04_mac_start_xmit never
returns NETDEV_TX_BUSY
and the dma_map_single failed on the arm64 platform.

Jiangfeng Xiao (3):
  net: hisilicon: make hip04_tx_reclaim non-reentrant
  net: hisilicon: fix hip04-xmit never return TX_BUSY
  net: hisilicon: Fix dma_map_single failed on arm64

 drivers/net/ethernet/hisilicon/hip04_eth.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

-- 
1.8.5.6


^ permalink raw reply

* [PATCH net 0/2] action fixes for flow_offload infra compatibility
From: Vlad Buslov @ 2019-08-03 13:36 UTC (permalink / raw)
  To: netdev
  Cc: pieter.jansenvanvuuren, jhs, xiyou.wangcong, jiri, davem,
	Vlad Buslov

Fix rcu warnings due to usage of action helpers that expect rcu read lock
protection from rtnl-protected context of flow_offload infra.

Vlad Buslov (2):
  net: sched: police: allow accessing police->params with rtnl
  net: sched: sample: allow accessing psample_group with rtnl

 include/net/tc_act/tc_police.h | 4 ++--
 include/net/tc_act/tc_sample.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [PATCH net 2/2] net: sched: sample: allow accessing psample_group with rtnl
From: Vlad Buslov @ 2019-08-03 13:36 UTC (permalink / raw)
  To: netdev
  Cc: pieter.jansenvanvuuren, jhs, xiyou.wangcong, jiri, davem,
	Vlad Buslov
In-Reply-To: <20190803133619.10574-1-vladbu@mellanox.com>

Recently implemented support for sample action in flow_offload infra leads
to following rcu usage warning:

[ 1938.234856] =============================
[ 1938.234858] WARNING: suspicious RCU usage
[ 1938.234863] 5.3.0-rc1+ #574 Not tainted
[ 1938.234866] -----------------------------
[ 1938.234869] include/net/tc_act/tc_sample.h:47 suspicious rcu_dereference_check() usage!
[ 1938.234872]
               other info that might help us debug this:

[ 1938.234875]
               rcu_scheduler_active = 2, debug_locks = 1
[ 1938.234879] 1 lock held by tc/19540:
[ 1938.234881]  #0: 00000000b03cb918 (rtnl_mutex){+.+.}, at: tc_new_tfilter+0x47c/0x970
[ 1938.234900]
               stack backtrace:
[ 1938.234905] CPU: 2 PID: 19540 Comm: tc Not tainted 5.3.0-rc1+ #574
[ 1938.234908] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 1938.234911] Call Trace:
[ 1938.234922]  dump_stack+0x85/0xc0
[ 1938.234930]  tc_setup_flow_action+0xed5/0x2040
[ 1938.234944]  fl_hw_replace_filter+0x11f/0x2e0 [cls_flower]
[ 1938.234965]  fl_change+0xd24/0x1b30 [cls_flower]
[ 1938.234990]  tc_new_tfilter+0x3e0/0x970
[ 1938.235021]  ? tc_del_tfilter+0x720/0x720
[ 1938.235028]  rtnetlink_rcv_msg+0x389/0x4b0
[ 1938.235038]  ? netlink_deliver_tap+0x95/0x400
[ 1938.235044]  ? rtnl_dellink+0x2d0/0x2d0
[ 1938.235053]  netlink_rcv_skb+0x49/0x110
[ 1938.235063]  netlink_unicast+0x171/0x200
[ 1938.235073]  netlink_sendmsg+0x224/0x3f0
[ 1938.235091]  sock_sendmsg+0x5e/0x60
[ 1938.235097]  ___sys_sendmsg+0x2ae/0x330
[ 1938.235111]  ? __handle_mm_fault+0x12cd/0x19e0
[ 1938.235125]  ? __handle_mm_fault+0x12cd/0x19e0
[ 1938.235138]  ? find_held_lock+0x2b/0x80
[ 1938.235147]  ? do_user_addr_fault+0x22d/0x490
[ 1938.235160]  __sys_sendmsg+0x59/0xa0
[ 1938.235178]  do_syscall_64+0x5c/0xb0
[ 1938.235187]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1938.235192] RIP: 0033:0x7ff9a4d597b8
[ 1938.235197] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83
 ec 28 89 54
[ 1938.235200] RSP: 002b:00007ffcfe381c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1938.235205] RAX: ffffffffffffffda RBX: 000000005d4497f9 RCX: 00007ff9a4d597b8
[ 1938.235208] RDX: 0000000000000000 RSI: 00007ffcfe381cb0 RDI: 0000000000000003
[ 1938.235211] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 1938.235214] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000001
[ 1938.235217] R13: 0000000000480640 R14: 0000000000000012 R15: 0000000000000001

Change tcf_sample_psample_group() helper to allow using it from both rtnl
and rcu protected contexts.

Fixes: a7a7be6087b0 ("net/sched: add sample action to the hardware intermediate representation")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 include/net/tc_act/tc_sample.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/tc_act/tc_sample.h b/include/net/tc_act/tc_sample.h
index 0a559d4b6f0f..b4fce0fae645 100644
--- a/include/net/tc_act/tc_sample.h
+++ b/include/net/tc_act/tc_sample.h
@@ -44,7 +44,7 @@ static inline int tcf_sample_trunc_size(const struct tc_action *a)
 static inline struct psample_group *
 tcf_sample_psample_group(const struct tc_action *a)
 {
-	return rcu_dereference(to_sample(a)->psample_group);
+	return rcu_dereference_rtnl(to_sample(a)->psample_group);
 }
 
 #endif /* __NET_TC_SAMPLE_H */
-- 
2.21.0


^ permalink raw reply related

* [PATCH net 1/2] net: sched: police: allow accessing police->params with rtnl
From: Vlad Buslov @ 2019-08-03 13:36 UTC (permalink / raw)
  To: netdev
  Cc: pieter.jansenvanvuuren, jhs, xiyou.wangcong, jiri, davem,
	Vlad Buslov
In-Reply-To: <20190803133619.10574-1-vladbu@mellanox.com>

Recently implemented support for police action in flow_offload infra leads
to following rcu usage warning:

[ 1925.881092] =============================
[ 1925.881094] WARNING: suspicious RCU usage
[ 1925.881098] 5.3.0-rc1+ #574 Not tainted
[ 1925.881100] -----------------------------
[ 1925.881104] include/net/tc_act/tc_police.h:57 suspicious rcu_dereference_check() usage!
[ 1925.881106]
               other info that might help us debug this:

[ 1925.881109]
               rcu_scheduler_active = 2, debug_locks = 1
[ 1925.881112] 1 lock held by tc/18591:
[ 1925.881115]  #0: 00000000b03cb918 (rtnl_mutex){+.+.}, at: tc_new_tfilter+0x47c/0x970
[ 1925.881124]
               stack backtrace:
[ 1925.881127] CPU: 2 PID: 18591 Comm: tc Not tainted 5.3.0-rc1+ #574
[ 1925.881130] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 1925.881132] Call Trace:
[ 1925.881138]  dump_stack+0x85/0xc0
[ 1925.881145]  tc_setup_flow_action+0x1771/0x2040
[ 1925.881155]  fl_hw_replace_filter+0x11f/0x2e0 [cls_flower]
[ 1925.881175]  fl_change+0xd24/0x1b30 [cls_flower]
[ 1925.881200]  tc_new_tfilter+0x3e0/0x970
[ 1925.881231]  ? tc_del_tfilter+0x720/0x720
[ 1925.881243]  rtnetlink_rcv_msg+0x389/0x4b0
[ 1925.881250]  ? netlink_deliver_tap+0x95/0x400
[ 1925.881257]  ? rtnl_dellink+0x2d0/0x2d0
[ 1925.881264]  netlink_rcv_skb+0x49/0x110
[ 1925.881275]  netlink_unicast+0x171/0x200
[ 1925.881284]  netlink_sendmsg+0x224/0x3f0
[ 1925.881299]  sock_sendmsg+0x5e/0x60
[ 1925.881305]  ___sys_sendmsg+0x2ae/0x330
[ 1925.881309]  ? task_work_add+0x43/0x50
[ 1925.881314]  ? fput_many+0x45/0x80
[ 1925.881329]  ? __lock_acquire+0x248/0x1930
[ 1925.881342]  ? find_held_lock+0x2b/0x80
[ 1925.881347]  ? task_work_run+0x7b/0xd0
[ 1925.881359]  __sys_sendmsg+0x59/0xa0
[ 1925.881375]  do_syscall_64+0x5c/0xb0
[ 1925.881381]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1925.881384] RIP: 0033:0x7feb245047b8
[ 1925.881388] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83
 ec 28 89 54
[ 1925.881391] RSP: 002b:00007ffc2d2a5788 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1925.881395] RAX: ffffffffffffffda RBX: 000000005d4497ed RCX: 00007feb245047b8
[ 1925.881398] RDX: 0000000000000000 RSI: 00007ffc2d2a57f0 RDI: 0000000000000003
[ 1925.881400] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 1925.881403] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000001
[ 1925.881406] R13: 0000000000480640 R14: 0000000000000012 R15: 0000000000000001

Change tcf_police_rate_bytes_ps() and tcf_police_tcfp_burst() helpers to
allow using them from both rtnl and rcu protected contexts.

Fixes: 8c8cfc6ed274 ("net/sched: add police action to the hardware intermediate representation")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 include/net/tc_act/tc_police.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/tc_act/tc_police.h b/include/net/tc_act/tc_police.h
index 8b9ef3664262..cfdc7cb82cad 100644
--- a/include/net/tc_act/tc_police.h
+++ b/include/net/tc_act/tc_police.h
@@ -54,7 +54,7 @@ static inline u64 tcf_police_rate_bytes_ps(const struct tc_action *act)
 	struct tcf_police *police = to_police(act);
 	struct tcf_police_params *params;
 
-	params = rcu_dereference_bh(police->params);
+	params = rcu_dereference_bh_rtnl(police->params);
 	return params->rate.rate_bytes_ps;
 }
 
@@ -63,7 +63,7 @@ static inline s64 tcf_police_tcfp_burst(const struct tc_action *act)
 	struct tcf_police *police = to_police(act);
 	struct tcf_police_params *params;
 
-	params = rcu_dereference_bh(police->params);
+	params = rcu_dereference_bh_rtnl(police->params);
 	return params->tcfp_burst;
 }
 
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block immediately
From: wenxu @ 2019-08-03 13:42 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: jiri, pablo, fw, netfilter-devel, netdev, John Hurley
In-Reply-To: <20190802172155.7a36713d@cakuba.netronome.com>


在 2019/8/3 8:21, Jakub Kicinski 写道:
> On Sat, 3 Aug 2019 07:19:31 +0800, wenxu wrote:
>>> Or:
>>>
>>> device unregister:
>>>   - nft block destroy
>>>     - UNBIND cb
>>>       - free driver's block state
>>>   - driver notifier callback
>>>     - free driver's state
>>>
>>> No?  
>> For the second case maybe can't unbind cb? because the nft block is
>> destroied. There is no way to find the block(chain) in nft.
> But before the block is destroyed doesn't nft send an UNBIND event to
> the drivers, always?
you are correct, it will be send an UBIND event when the block is destroyed

^ permalink raw reply

* Re: [PATCH net-next v3] net: phy: broadcom: add 1000Base-X support for BCM54616S
From: Vladimir Oltean @ 2019-08-03 13:49 UTC (permalink / raw)
  To: Tao Ren
  Cc: Andrew Lunn, Florian Fainelli, Heiner Kallweit, David S . Miller,
	Arun Parameswaran, Justin Chen, netdev, lkml, openbmc
In-Reply-To: <20190802215419.313512-1-taoren@fb.com>

Hi Tao,

On Sat, 3 Aug 2019 at 00:56, Tao Ren <taoren@fb.com> wrote:
>
> genphy_read_status() cannot report correct link speed when BCM54616S PHY
> is configured in RGMII->1000Base-KX mode (for example, on Facebook CMM
> BMC platform), and it is because speed-related fields in MII registers
> are assigned different meanings in 1000X register set. Actually there
> is no speed field in 1000X register set because link speed is always
> 1000 Mb/s.
>
> The patch adds "probe" callback to detect PHY's operation mode based on
> INTERF_SEL[1:0] pins and 1000X/100FX selection bit in SerDES 100-FX
> Control register. Besides, link speed is manually set to 1000 Mb/s in
> "read_status" callback if PHY-switch link is 1000Base-X.
>
> Signed-off-by: Tao Ren <taoren@fb.com>
> ---
>  Changes in v3:
>   - rename bcm5482_read_status to bcm54xx_read_status so the callback can
>     be shared by BCM5482 and BCM54616S.
>  Changes in v2:
>   - Auto-detect PHY operation mode instead of passing DT node.
>   - move PHY mode auto-detect logic from config_init to probe callback.
>   - only set speed (not including duplex) in read_status callback.
>   - update patch description with more background to avoid confusion.
>   - patch #1 in the series ("net: phy: broadcom: set features explicitly
>     for BCM54616") is dropped: the fix should go to get_features callback
>     which may potentially depend on this patch.
>
>  drivers/net/phy/broadcom.c | 41 +++++++++++++++++++++++++++++++++-----
>  include/linux/brcmphy.h    | 10 ++++++++--
>  2 files changed, 44 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
> index 937d0059e8ac..ecad8a201a09 100644
> --- a/drivers/net/phy/broadcom.c
> +++ b/drivers/net/phy/broadcom.c
> @@ -383,9 +383,9 @@ static int bcm5482_config_init(struct phy_device *phydev)
>                 /*
>                  * Select 1000BASE-X register set (primary SerDes)
>                  */
> -               reg = bcm_phy_read_shadow(phydev, BCM5482_SHD_MODE);
> -               bcm_phy_write_shadow(phydev, BCM5482_SHD_MODE,
> -                                    reg | BCM5482_SHD_MODE_1000BX);
> +               reg = bcm_phy_read_shadow(phydev, BCM54XX_SHD_MODE);
> +               bcm_phy_write_shadow(phydev, BCM54XX_SHD_MODE,
> +                                    reg | BCM54XX_SHD_MODE_1000BX);
>
>                 /*
>                  * LED1=ACTIVITYLED, LED3=LINKSPD[2]
> @@ -409,7 +409,7 @@ static int bcm5482_config_init(struct phy_device *phydev)
>         return err;
>  }
>
> -static int bcm5482_read_status(struct phy_device *phydev)
> +static int bcm54xx_read_status(struct phy_device *phydev)
>  {
>         int err;
>
> @@ -464,6 +464,35 @@ static int bcm54616s_config_aneg(struct phy_device *phydev)
>         return ret;
>  }
>
> +static int bcm54616s_probe(struct phy_device *phydev)
> +{
> +       int val, intf_sel;
> +
> +       val = bcm_phy_read_shadow(phydev, BCM54XX_SHD_MODE);
> +       if (val < 0)
> +               return val;
> +
> +       /* The PHY is strapped in RGMII to fiber mode when INTERF_SEL[1:0]
> +        * is 01b.
> +        */
> +       intf_sel = (val & BCM54XX_SHD_INTF_SEL_MASK) >> 1;
> +       if (intf_sel == 1) {
> +               val = bcm_phy_read_shadow(phydev, BCM54616S_SHD_100FX_CTRL);
> +               if (val < 0)
> +                       return val;
> +
> +               /* Bit 0 of the SerDes 100-FX Control register, when set
> +                * to 1, sets the MII/RGMII -> 100BASE-FX configuration.
> +                * When this bit is set to 0, it sets the GMII/RGMII ->
> +                * 1000BASE-X configuration.
> +                */
> +               if (!(val & BCM54616S_100FX_MODE))
> +                       phydev->dev_flags |= PHY_BCM_FLAGS_MODE_1000BX;
> +       }
> +
> +       return 0;
> +}
> +
>  static int brcm_phy_setbits(struct phy_device *phydev, int reg, int set)
>  {
>         int val;
> @@ -655,6 +684,8 @@ static struct phy_driver broadcom_drivers[] = {
>         .config_aneg    = bcm54616s_config_aneg,
>         .ack_interrupt  = bcm_phy_ack_intr,
>         .config_intr    = bcm_phy_config_intr,
> +       .read_status    = bcm54xx_read_status,
> +       .probe          = bcm54616s_probe,
>  }, {
>         .phy_id         = PHY_ID_BCM5464,
>         .phy_id_mask    = 0xfffffff0,
> @@ -689,7 +720,7 @@ static struct phy_driver broadcom_drivers[] = {
>         .name           = "Broadcom BCM5482",
>         /* PHY_GBIT_FEATURES */
>         .config_init    = bcm5482_config_init,
> -       .read_status    = bcm5482_read_status,
> +       .read_status    = bcm54xx_read_status,
>         .ack_interrupt  = bcm_phy_ack_intr,
>         .config_intr    = bcm_phy_config_intr,
>  }, {
> diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
> index 6db2d9a6e503..b475e7f20d28 100644
> --- a/include/linux/brcmphy.h
> +++ b/include/linux/brcmphy.h
> @@ -200,9 +200,15 @@
>  #define BCM5482_SHD_SSD                0x14    /* 10100: Secondary SerDes control */
>  #define BCM5482_SHD_SSD_LEDM   0x0008  /* SSD LED Mode enable */
>  #define BCM5482_SHD_SSD_EN     0x0001  /* SSD enable */
> -#define BCM5482_SHD_MODE       0x1f    /* 11111: Mode Control Register */
> -#define BCM5482_SHD_MODE_1000BX        0x0001  /* Enable 1000BASE-X registers */
>
> +/* 10011: SerDes 100-FX Control Register */
> +#define BCM54616S_SHD_100FX_CTRL       0x13
> +#define        BCM54616S_100FX_MODE            BIT(0)  /* 100-FX SerDes Enable */
> +
> +/* 11111: Mode Control Register */
> +#define BCM54XX_SHD_MODE               0x1f
> +#define BCM54XX_SHD_INTF_SEL_MASK      GENMASK(2, 1)   /* INTERF_SEL[1:0] */
> +#define BCM54XX_SHD_MODE_1000BX                BIT(0)  /* Enable 1000-X registers */
>
>  /*
>   * EXPANSION SHADOW ACCESS REGISTERS.  (PHY REG 0x15, 0x16, and 0x17)
> --
> 2.17.1
>

The patchset looks better now. But is it ok, I wonder, to keep
PHY_BCM_FLAGS_MODE_1000BX in phydev->dev_flags, considering that
phy_attach_direct is overwriting it?

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH][net-next][V2] net/mlx5: remove self-assignment on esw->dev
From: Parav Pandit @ 2019-08-03 15:02 UTC (permalink / raw)
  To: Colin King
  Cc: Saeed Mahameed, Leon Romanovsky, David S . Miller, netdev,
	linux-rdma, kernel-janitors, Linux Kernel Mailing List
In-Reply-To: <20190802151316.16011-1-colin.king@canonical.com>

On Sat, Aug 3, 2019 at 7:54 PM Colin King <colin.king@canonical.com> wrote:
>
> From: Colin Ian King <colin.king@canonical.com>
>
> There is a self assignment of esw->dev to itself, clean this up by
> removing it. Also make dev a const pointer.
>
> Addresses-Coverity: ("Self assignment")
> Fixes: 6cedde451399 ("net/mlx5: E-Switch, Verify support QoS element type")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>
> V2: make dev const
>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index f4ace5f8e884..de0894b695e3 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -1413,7 +1413,7 @@ static int esw_vport_egress_config(struct mlx5_eswitch *esw,
>
>  static bool element_type_supported(struct mlx5_eswitch *esw, int type)
>  {
> -       struct mlx5_core_dev *dev = esw->dev = esw->dev;
> +       const struct mlx5_core_dev *dev = esw->dev;
>
>         switch (type) {
>         case SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR:
> --
> 2.20.1
>
Reviewed-by: Parav Pandit <parav@mellanox.com>

^ permalink raw reply

* Re: [PATCH v3 bpf-next 02/12] libbpf: implement BPF CO-RE offset relocation algorithm
From: Alexei Starovoitov @ 2019-08-03 16:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song, Song Liu, Kernel Team
In-Reply-To: <CAEf4BzY46=Vosd+kha+_Yh_iXNXhgfSW3ihePApb4GfuzoUU6w@mail.gmail.com>

On Fri, Aug 02, 2019 at 11:30:21PM -0700, Andrii Nakryiko wrote:
> 
> No, not anonymous.
> 
> struct my_struct___local {
>     int a;
> };
> 
> struct my_struct___target {
>     long long a;
> };
> 
> my_struct___local->a will not match my_struct___target->a, but it's
> not a reason to stop relocation process due to error.

why? It feels that this is exactly the reason to fail relocation.
struct names matched. field names matched.
but the types are different. Why proceed further?

Also what about str->a followed by str->b.
Is it possible that str->a will pick up one flavor when str->b different one?
That will likely lead to wrong behavior?

> 
> All the tests I added use non-numeric flavors. While technically I can
> use just ___1, ___2 and so on, it will greatly reduce readability,
> while not really solving any problem (nothing prevents someone to add
> something like lmc___1 eventually).
> 
> I think it's not worth it to complicate this logic just for
> lmc___{softc,media,ctl}, but we can do 2) - try to match any struct as
> is. If that fails, see if it's a "flavor" and match flavors.

Could you please share benchmarking results of largish bpf prog
with couple thousands relocations against typical vmlinux.h ?
I'm concerned that this double check will be noticeable.
May be llvm should recognize "flavor" in the type name and
encode them differently in BTF ?
Or add a pre-pass to libbpf to sort out all types into flavored and not.
If flavored search is expensive may be all flavors could be a linked list
from the base type. The typical case is one or two flavors, right?

> > > > > +     for (i = 0, j = 0; i < cand_ids->len; i++) {
> > > > > +             cand_id = cand_ids->data[i];
> > > > > +             cand_type = btf__type_by_id(targ_btf, cand_id);
> > > > > +             cand_name = btf__name_by_offset(targ_btf, cand_type->name_off);
> > > > > +
> > > > > +             err = bpf_core_spec_match(&local_spec, targ_btf,
> > > > > +                                       cand_id, &cand_spec);
> > > > > +             if (err < 0) {
> > > > > +                     pr_warning("prog '%s': relo #%d: failed to match spec ",
> > > > > +                                prog_name, relo_idx);
> > > > > +                     bpf_core_dump_spec(LIBBPF_WARN, &local_spec);
> > > > > +                     libbpf_print(LIBBPF_WARN,
> > > > > +                                  " to candidate #%d [%d] (%s): %d\n",
> > > > > +                                  i, cand_id, cand_name, err);
> > > > > +                     return err;
> > > > > +             }
> > > > > +             if (err == 0) {
> > > > > +                     pr_debug("prog '%s': relo #%d: candidate #%d [%d] (%s) doesn't match spec ",
> > > > > +                              prog_name, relo_idx, i, cand_id, cand_name);
> > > > > +                     bpf_core_dump_spec(LIBBPF_DEBUG, &local_spec);
> > > > > +                     libbpf_print(LIBBPF_DEBUG, "\n");
> > > > > +                     continue;
> > > > > +             }
> > > > > +
> > > > > +             pr_debug("prog '%s': relo #%d: candidate #%d matched as spec ",
> > > > > +                      prog_name, relo_idx, i);
> > > >
> > > > did you mention that you're going to make a helper for this debug dumps?
> > >
> > > yeah, I added bpf_core_dump_spec(), but I don't know how to shorten
> > > this further... This output is extremely useful to understand what's
> > > happening and will be invaluable when users will inevitably report
> > > confusing behavior in some cases, so I still want to keep it.
> >
> > not sure yet. Just pointing out that this function has more debug printfs
> > than actual code which doesn't look right.
> > We have complex algorithms in the kernel (like verifier).
> > Yet we don't sprinkle printfs in there to this degree.
> >
> 
> We do have a verbose verifier logging, though, exactly to help users
> to debug issues, which is extremely helpful and is greatly appreciated
> by users.
> There is nothing worse for developer experience than getting -EINVAL
> without any useful log message. Been there, banged my head against the
> wall wishing for a bit more verbose log. What are we trying to
> optimize for here?

All I'm saying that three printfs in a row that essentially convey the same info
look like clowntown. Some level of verbosity is certainly useful.


^ permalink raw reply

* Re: [PATCH net-next] netfilter: conntrack: use shared sysctl constants
From: Matteo Croce @ 2019-08-03 16:34 UTC (permalink / raw)
  To: netdev, netfilter-devel
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal, LKML,
	Andrew Morton, Tonghao Zhang, Kees Cook
In-Reply-To: <20190723012303.2221-1-mcroce@redhat.com>

On Tue, Jul 23, 2019 at 3:23 AM Matteo Croce <mcroce@redhat.com> wrote:
>
> Use shared sysctl variables for zero and one constants, as in commit
> eec4844fae7c ("proc/sysctl: add shared variables for range check")
>
> Fixes: 8f14c99c7eda ("netfilter: conntrack: limit sysctl setting for boolean options")
> Signed-off-by: Matteo Croce <mcroce@redhat.com>
>

followup, can anyone review it?

Thanks,
-- 
Matteo Croce
per aspera ad upstream

^ permalink raw reply

* Re: [PATCH net v3] ipvs: Improve robustness to the ipvs sysctl
From: Pablo Neira Ayuso @ 2019-08-03 16:40 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: hujunwei, wensong, horms, kadlec, Florian Westphal, davem,
	netdev@vger.kernel.org, lvs-devel, netfilter-devel, coreteam,
	Mingfangsen, wangxiaogang3, xuhanbing
In-Reply-To: <alpine.LFD.2.21.1907312052310.3631@ja.home.ssi.bg>

On Wed, Jul 31, 2019 at 08:53:47PM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Thu, 1 Aug 2019, hujunwei wrote:
> 
> > From: Junwei Hu <hujunwei4@huawei.com>
> > 
> > The ipvs module parse the user buffer and save it to sysctl,
> > then check if the value is valid. invalid value occurs
> > over a period of time.
> > Here, I add a variable, struct ctl_table tmp, used to read
> > the value from the user buffer, and save only when it is valid.
> > I delete proc_do_sync_mode and use extra1/2 in table for the
> > proc_dointvec_minmax call.
> > 
> > Fixes: f73181c8288f ("ipvs: add support for sync threads")
> > Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
> > Acked-by: Julian Anastasov <ja@ssi.bg>
> 
> 	Yep, Acked-by: Julian Anastasov <ja@ssi.bg>

Applied, thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox