Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v3 bpf-next 02/12] libbpf: implement BPF CO-RE offset relocation algorithm
From: Andrii Nakryiko @ 2019-08-03  6:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song, Song Liu, Kernel Team
In-Reply-To: <20190802215604.onihsysinwiu3shl@ast-mbp.dhcp.thefacebook.com>

On Fri, Aug 2, 2019 at 2:56 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Aug 02, 2019 at 12:16:52AM -0700, Andrii Nakryiko wrote:
> > On Thu, Aug 1, 2019 at 4:50 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Jul 31, 2019 at 11:47:53PM -0700, Andrii Nakryiko wrote:
> > > > This patch implements the core logic for BPF CO-RE offsets relocations.
> > > > Every instruction that needs to be relocated has corresponding
> > > > bpf_offset_reloc as part of BTF.ext. Relocations are performed by trying
> > > > to match recorded "local" relocation spec against potentially many
> > > > compatible "target" types, creating corresponding spec. Details of the
> > > > algorithm are noted in corresponding comments in the code.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > Acked-by: Song Liu <songliubraving@fb.com>
> > > ...
> > > > +             if (btf_is_composite(t)) {
> > > > +                     const struct btf_member *m = (void *)(t + 1);
> > > > +                     __u32 offset;
> > > > +
> > > > +                     if (access_idx >= BTF_INFO_VLEN(t->info))
> > > > +                             return -EINVAL;
> > > > +
> > > > +                     m = &m[access_idx];
> > > > +
> > > > +                     if (BTF_INFO_KFLAG(t->info)) {
> > > > +                             if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
> > > > +                                     return -EINVAL;
> > > > +                             offset = BTF_MEMBER_BIT_OFFSET(m->offset);
> > > > +                     } else {
> > > > +                             offset = m->offset;
> > > > +                     }
> > >
> > > very similar logic exists in btf_dump.c
> > > probably makes sense to make a common helper at some point.
> >
> > Will add btf_member_bit_offset(type, member) and
> > btf_member_bit_size(type, member).
> >
> > >
> > > > +static size_t bpf_core_essential_name_len(const char *name)
> > > > +{
> > > > +     size_t n = strlen(name);
> > > > +     int i = n - 3;
> > > > +
> > > > +     while (i > 0) {
> > > > +             if (name[i] == '_' && name[i + 1] == '_' && name[i + 2] == '_')
> > > > +                     return i;
> > > > +             i--;
> > > > +     }
> > > > +     return n;
> > > > +}
> > >
> > > that's a bit of an eye irritant. How about?
> > >         size_t n = strlen(name);
> > >         int i, cnt = 0;
> > >
> > >         for (i = n - 1; i >= 0; i--) {
> > >                 if (name[i] == '_') {
> > >                     cnt++;
> > >                 } else {
> > >                    if (cnt == 3)
> > >                       return i + 1;
> > >                    cnt = 0;
> > >                 }
> > >         }
> > >         return n;
> >
> > I find this one much harder to read and understand. What's
> > eye-irritating about that loop?
> >
> > Your loop will also handle `a____b` differently. My version will
> > return "a_" as essential name, yours "a____b". Was this intentional on
> > your part?
>
> hmm. I think both will return sizeof("a") == 1

nope, there are 4 underscores, your implementation will bump cnt to 4
without checking it for `cnt == 3`, so will never detect flavor and
will just return sizeof("a____b"). It's easily fixable, but the point
is that my original irritating code is very straightforward and harder
to get wrong an, easier to understand at a glimpse, yours require much
more conscious thought to understand.

>
> > I'd rather use this instead, if you hate the first one:
> >
> > size_t n = strlen(name);
> > int i;
> >
> > for (i = n - 3; i > 0; i--) {
> >     if (strncmp(name + i, "___", 3) == 0)
> >         return i;
> > }
> >
> > Is this better?
>
> that is worse.
> What I don't like about it is that every byte is
> compared N=sizeof(string-to-found) times.
> I guess it's not such a big performance criticial path,
> but libbpf has to keep the bar high.

We are talking about searching for *three* characters in a short
string. Performance difference is negligible at best, unnoticeable at
worst. I'd rather have straightforward and easy code, but I'll rewrite
it as a state machine the way you proposed.

>
> > >
> > > > +     case BTF_KIND_ARRAY: {
> > > > +             const struct btf_array *loc_a, *targ_a;
> > > > +
> > > > +             loc_a = (void *)(local_type + 1);
> > > > +             targ_a = (void *)(targ_type + 1);
> > > > +             local_id = loc_a->type;
> > > > +             targ_id = targ_a->type;
> > >
> > > can we add a helper like:
> >
> > Yes, we can. I was thinking about that, but decided to not expand
> > patch set. But we do need to extract all those small, but nice
> > helpers. I'll put them in libbpf_internal.h for now, but I think it
> > might be good idea to expose them as part of btf.h. Thoughts?
>
> part of btf.h make sense to me.
>
> >
> > > const struct btf_array *btf_array(cosnt struct btf_type *t)
> > > {
> > >         return (const struct btf_array *)(t + 1);
> > > }
> > >
> > > then above will be:
> > >         case BTF_KIND_ARRAY: {
> > >                 local_id = btf_array(local_type)->type;
> > >                 targ_id = btf_array(targ_type)->type;
> > >
> > > and a bunch of code in btf.c and btf_dump.c would be cleaner as well?
> >
> > Yep, some of those are already scattered around btf.c and btf_dump.c,
> > will clean up and add patch to this patch set.
> >
> > >
> > > > +             goto recur;
> > > > +     }
> > > > +     default:
> > > > +             pr_warning("unexpected kind %d relocated, local [%d], target [%d]\n",
> > > > +                        kind, local_id, targ_id);
> > > > +             return 0;
> > > > +     }
> > > > +}
> > > > +
> > > > +/*
> > > > + * Given single high-level named field accessor in local type, find
> > > > + * corresponding high-level accessor for a target type. Along the way,
> > > > + * maintain low-level spec for target as well. Also keep updating target
> > > > + * offset.
> > > > + *
> > > > + * Searching is performed through recursive exhaustive enumeration of all
> > > > + * fields of a struct/union. If there are any anonymous (embedded)
> > > > + * structs/unions, they are recursively searched as well. If field with
> > > > + * desired name is found, check compatibility between local and target types,
> > > > + * before returning result.
> > > > + *
> > > > + * 1 is returned, if field is found.
> > > > + * 0 is returned if no compatible field is found.
> > > > + * <0 is returned on error.
> > > > + */
> > > > +static int bpf_core_match_member(const struct btf *local_btf,
> > > > +                              const struct bpf_core_accessor *local_acc,
> > > > +                              const struct btf *targ_btf,
> > > > +                              __u32 targ_id,
> > > > +                              struct bpf_core_spec *spec,
> > > > +                              __u32 *next_targ_id)
> > > > +{
> > > > +     const struct btf_type *local_type, *targ_type;
> > > > +     const struct btf_member *local_member, *m;
> > > > +     const char *local_name, *targ_name;
> > > > +     __u32 local_id;
> > > > +     int i, n, found;
> > > > +
> > > > +     targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
> > > > +     if (!targ_type)
> > > > +             return -EINVAL;
> > > > +     if (!btf_is_composite(targ_type))
> > > > +             return 0;
> > > > +
> > > > +     local_id = local_acc->type_id;
> > > > +     local_type = btf__type_by_id(local_btf, local_id);
> > > > +     local_member = (void *)(local_type + 1);
> > > > +     local_member += local_acc->idx;
> > > > +     local_name = btf__name_by_offset(local_btf, local_member->name_off);
> > > > +
> > > > +     n = BTF_INFO_VLEN(targ_type->info);
> > > > +     m = (void *)(targ_type + 1);
> > >
> > > new btf_member() helper?
> > >
> > > > +     for (i = 0; i < n; i++, m++) {
> > > > +             __u32 offset;
> > > > +
> > > > +             /* bitfield relocations not supported */
> > > > +             if (BTF_INFO_KFLAG(targ_type->info)) {
> > > > +                     if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
> > > > +                             continue;
> > > > +                     offset = BTF_MEMBER_BIT_OFFSET(m->offset);
> > > > +             } else {
> > > > +                     offset = m->offset;
> > > > +             }
> > > > +             if (offset % 8)
> > > > +                     continue;
> > >
> > > same bit of code again?
> > > definitely could use a helper.
> >
> > Different handling (continue here, return error above), but can use
> > those helpers I mentioned above.
> >
> > >
> > > > +     for (i = 0; i < local_spec->len; i++, local_acc++, targ_acc++) {
> > > > +             targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id,
> > > > +                                                &targ_id);
> > > > +             if (!targ_type)
> > > > +                     return -EINVAL;
> > > > +
> > > > +             if (local_acc->name) {
> > > > +                     matched = bpf_core_match_member(local_spec->btf,
> > > > +                                                     local_acc,
> > > > +                                                     targ_btf, targ_id,
> > > > +                                                     targ_spec, &targ_id);
> > > > +                     if (matched <= 0)
> > > > +                             return matched;
> > > > +             } else {
> > > > +                     /* for i=0, targ_id is already treated as array element
> > > > +                      * type (because it's the original struct), for others
> > > > +                      * we should find array element type first
> > > > +                      */
> > > > +                     if (i > 0) {
> > > > +                             const struct btf_array *a;
> > > > +
> > > > +                             if (!btf_is_array(targ_type))
> > > > +                                     return 0;
> > > > +
> > > > +                             a = (void *)(targ_type + 1);
> > > > +                             if (local_acc->idx >= a->nelems)
> > > > +                                     return 0;
> > >
> > > am I reading it correctly that the local spec requested out-of-bounds
> > > index in the target array type?
> > > Why this is 'ignore' instead of -EINVAL?
> >
> > Similar to any other mismatch (e.g., int in local type vs int64 in
> > target type). It just makes candidate not matching. Why would that be
> > error that will stop the whole relocation and subsequently object
> > loading process?
>
> Did the field name match or this is for anon types?
> I've read it as names matched and type miscompared.

No, not anonymous.

struct my_struct___local {
    int a;
};

struct my_struct___target {
    long long a;
};

my_struct___local->a will not match my_struct___target->a, but it's
not a reason to stop relocation process due to error.

>
> >
> > >
> > > > +/*
> > > > + * Probe few well-known locations for vmlinux kernel image and try to load BTF
> > > > + * data out of it to use for target BTF.
> > > > + */
> > > > +static struct btf *bpf_core_find_kernel_btf(void)
> > > > +{
> > > > +     const char *locations[] = {
> > > > +             "/lib/modules/%1$s/vmlinux-%1$s",
> > > > +             "/usr/lib/modules/%1$s/kernel/vmlinux",
> > > > +     };
> > > > +     char path[PATH_MAX + 1];
> > > > +     struct utsname buf;
> > > > +     struct btf *btf;
> > > > +     int i, err;
> > > > +
> > > > +     err = uname(&buf);
> > > > +     if (err) {
> > > > +             pr_warning("failed to uname(): %d\n", err);
> > >
> > > defensive programming ?
> > > I think uname() can fail only if &buf points to non-existing page like null.
> >
> > I haven't checked source for this syscall, but man page specified that
> > it might return -1 on error.
>
> man page says that it can only return EFAULT.

Ah, yeah, seems to be the only reason. I'll remove the check, it
wasn't paranoia :)

>
> >
> > >
> > > > +             return ERR_PTR(err);
> > > > +     }
> > > > +
> > > > +     for (i = 0; i < ARRAY_SIZE(locations); i++) {
> > > > +             snprintf(path, PATH_MAX, locations[i], buf.release);
> > > > +             pr_debug("attempting to load kernel BTF from '%s'\n", path);
> > >
> > > I think this debug message would have been more useful after access().
> >
> > Sure, will move.
> >
> > >
> > > > +
> > > > +             if (access(path, R_OK))
> > > > +                     continue;
> > > > +
> > > > +             btf = btf__parse_elf(path, NULL);
> > > > +             if (IS_ERR(btf))
> > > > +                     continue;
> > > > +
> > > > +             pr_debug("successfully loaded kernel BTF from '%s'\n", path);
> > > > +             return btf;
> > > > +     }
> > > > +
> > > > +     pr_warning("failed to find valid kernel BTF\n");
> > > > +     return ERR_PTR(-ESRCH);
> > > > +}
> > > > +
> > > > +/* Output spec definition in the format:
> > > > + * [<type-id>] (<type-name>) + <raw-spec> => <offset>@<spec>,
> > > > + * where <spec> is a C-syntax view of recorded field access, e.g.: x.a[3].b
> > > > + */
> > > > +static void bpf_core_dump_spec(int level, const struct bpf_core_spec *spec)
> > > > +{
> > > > +     const struct btf_type *t;
> > > > +     const char *s;
> > > > +     __u32 type_id;
> > > > +     int i;
> > > > +
> > > > +     type_id = spec->spec[0].type_id;
> > > > +     t = btf__type_by_id(spec->btf, type_id);
> > > > +     s = btf__name_by_offset(spec->btf, t->name_off);
> > > > +     libbpf_print(level, "[%u] (%s) + ", type_id, s);
> > >
> > > imo extra []() don't improve readability of the dump.
> >
> > [<num>] is the general convention I've been using throughout libbpf to
> > specify type ID, so I'd rather keep it for consistency. I can drop
> > parens, though, no problem.
> >
> > >
> > > > +
> > > > +     for (i = 0; i < spec->raw_len; i++)
> > > > +             libbpf_print(level, "%d%s", spec->raw_spec[i],
> > > > +                          i == spec->raw_len - 1 ? " => " : ":");
> > > > +
> > > > +     libbpf_print(level, "%u @ &x", spec->offset);
> > > > +
> > > > +     for (i = 0; i < spec->len; i++) {
> > > > +             if (spec->spec[i].name)
> > > > +                     libbpf_print(level, ".%s", spec->spec[i].name);
> > > > +             else
> > > > +                     libbpf_print(level, "[%u]", spec->spec[i].idx);
> > > > +     }
> > > > +
> > > > +}
> > > > +
> > > > +static size_t bpf_core_hash_fn(const void *key, void *ctx)
> > > > +{
> > > > +     return (size_t)key;
> > > > +}
> > > > +
> > > > +static bool bpf_core_equal_fn(const void *k1, const void *k2, void *ctx)
> > > > +{
> > > > +     return k1 == k2;
> > > > +}
> > > > +
> > > > +static void *u32_to_ptr(__u32 x)
> > > > +{
> > > > +     return (void *)(uintptr_t)x;
> > > > +}
> > >
> > > u32 to pointer on 64-bit arch?!
> > > That surely needs a comment.
> >
> > I should probably call it u32_to_hash_key() to make it obvious it's
> > conversion to hashmap's generic `void *` key type.
> >
> > >
> > > > +
> > > > +/*
> > > > + * CO-RE relocate single instruction.
> > > > + *
> > > > + * The outline and important points of the algorithm:
> > > > + * 1. For given local type, find corresponding candidate target types.
> > > > + *    Candidate type is a type with the same "essential" name, ignoring
> > > > + *    everything after last triple underscore (___). E.g., `sample`,
> > > > + *    `sample___flavor_one`, `sample___flavor_another_one`, are all candidates
> > > > + *    for each other. Names with triple underscore are referred to as
> > > > + *    "flavors" and are useful, among other things, to allow to
> > > > + *    specify/support incompatible variations of the same kernel struct, which
> > > > + *    might differ between different kernel versions and/or build
> > > > + *    configurations.
> > > > + *
> > > > + *    N.B. Struct "flavors" could be generated by bpftool's BTF-to-C
> > > > + *    converter, when deduplicated BTF of a kernel still contains more than
> > > > + *    one different types with the same name. In that case, ___2, ___3, etc
> > > > + *    are appended starting from second name conflict. But start flavors are
> > > > + *    also useful to be defined "locally", in BPF program, to extract same
> > > > + *    data from incompatible changes between different kernel
> > > > + *    versions/configurations. For instance, to handle field renames between
> > > > + *    kernel versions, one can use two flavors of the struct name with the
> > > > + *    same common name and use conditional relocations to extract that field,
> > > > + *    depending on target kernel version.
> > >
> > > there are actual kernel types that have ___ in the name.
> > > Ex: struct lmc___media
> > > We probably need to revisit this 'flavor' convention.
> >
> > There are only these:
> > - lmc___softc
> > - lmc___media
> > - lmc___ctl (all three in drivers/net/wan/lmc/lmc_var.h)
> > - ____ftrace_##name set of structs
> >
> > I couldn't come up with anything cleaner-looking. I think we can still
> > keep ___ convention, but:
> >
> > 1. Match only exactly 3 underscores, delimited by non-underscore from
> > both sides (so similar to your proposed loop above);
> > 2. We can also try matching candidates assuming full name without
> > ___xxx part removed, in addition to current logic. This seems like an
> > overkill at this point and unlikely to be useful in practice, so I'd
> > postpone implementing this until we really need it.
> >
> > What do you think? Which other convention did you have in mind?
>
> may be match ___[0-9]+ instead for now?
> Not as flexible, but user supplied "flavors" is not an immediate task.

All the tests I added use non-numeric flavors. While technically I can
use just ___1, ___2 and so on, it will greatly reduce readability,
while not really solving any problem (nothing prevents someone to add
something like lmc___1 eventually).

I think it's not worth it to complicate this logic just for
lmc___{softc,media,ctl}, but we can do 2) - try to match any struct as
is. If that fails, see if it's a "flavor" and match flavors.

>
> >
> > >
> > > > +     for (i = 0, j = 0; i < cand_ids->len; i++) {
> > > > +             cand_id = cand_ids->data[i];
> > > > +             cand_type = btf__type_by_id(targ_btf, cand_id);
> > > > +             cand_name = btf__name_by_offset(targ_btf, cand_type->name_off);
> > > > +
> > > > +             err = bpf_core_spec_match(&local_spec, targ_btf,
> > > > +                                       cand_id, &cand_spec);
> > > > +             if (err < 0) {
> > > > +                     pr_warning("prog '%s': relo #%d: failed to match spec ",
> > > > +                                prog_name, relo_idx);
> > > > +                     bpf_core_dump_spec(LIBBPF_WARN, &local_spec);
> > > > +                     libbpf_print(LIBBPF_WARN,
> > > > +                                  " to candidate #%d [%d] (%s): %d\n",
> > > > +                                  i, cand_id, cand_name, err);
> > > > +                     return err;
> > > > +             }
> > > > +             if (err == 0) {
> > > > +                     pr_debug("prog '%s': relo #%d: candidate #%d [%d] (%s) doesn't match spec ",
> > > > +                              prog_name, relo_idx, i, cand_id, cand_name);
> > > > +                     bpf_core_dump_spec(LIBBPF_DEBUG, &local_spec);
> > > > +                     libbpf_print(LIBBPF_DEBUG, "\n");
> > > > +                     continue;
> > > > +             }
> > > > +
> > > > +             pr_debug("prog '%s': relo #%d: candidate #%d matched as spec ",
> > > > +                      prog_name, relo_idx, i);
> > >
> > > did you mention that you're going to make a helper for this debug dumps?
> >
> > yeah, I added bpf_core_dump_spec(), but I don't know how to shorten
> > this further... This output is extremely useful to understand what's
> > happening and will be invaluable when users will inevitably report
> > confusing behavior in some cases, so I still want to keep it.
>
> not sure yet. Just pointing out that this function has more debug printfs
> than actual code which doesn't look right.
> We have complex algorithms in the kernel (like verifier).
> Yet we don't sprinkle printfs in there to this degree.
>

We do have a verbose verifier logging, though, exactly to help users
to debug issues, which is extremely helpful and is greatly appreciated
by users.
There is nothing worse for developer experience than getting -EINVAL
without any useful log message. Been there, banged my head against the
wall wishing for a bit more verbose log. What are we trying to
optimize for here?

^ permalink raw reply

* Re: [PATCH] net: mdio-octeon: Fix build error and Kconfig warning
From: Nathan Chancellor @ 2019-08-03  6:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Miller, andrew, broonie, devel, f.fainelli, gregkh,
	hkallweit1, kernel-build-reports, linux-arm-kernel, linux-next,
	netdev, lkp, rdunlap
In-Reply-To: <20190803013952.GF5597@bombadil.infradead.org>

On Fri, Aug 02, 2019 at 06:39:52PM -0700, Matthew Wilcox wrote:
> On Fri, Aug 02, 2019 at 06:30:31PM -0700, Nathan Chancellor wrote:
> > On Fri, Aug 02, 2019 at 06:11:32PM -0700, David Miller wrote:
> > > The proper way to fix this is to include either
> > > 
> > > 	linux/io-64-nonatomic-hi-lo.h
> > > 
> > > or
> > > 
> > > 	linux/io-64-nonatomic-lo-hi.h
> > > 
> > > whichever is appropriate.
> > 
> > Hmmmm, is that not what I did?
> > 
> > Although I did not know about io-64-nonatomic-hi-lo.h. What is the
> > difference and which one is needed here?
> 
> Whether you write the high or low 32 bits first.  For this, it doesn't
> matter, since the compiled driver will never be run on real hardware.

That's what I figured. I have only seen lo-hi used personally, which is
what I went with here. Thanks for the confirmation!

> 
> > There is apparently another failure when OF_MDIO is not set, I guess I
> > can try to look into that as well and respin into a series if
> > necessary.
> 
> Thanks for taking care of that!

^ permalink raw reply

* [PATCH v2] net: mdio-octeon: Fix Kconfig warnings and build errors
From: Nathan Chancellor @ 2019-08-03  6:01 UTC (permalink / raw)
  To: natechancellor
  Cc: andrew, broonie, davem, devel, f.fainelli, gregkh, hkallweit1,
	kernel-build-reports, linux-arm-kernel, linux-next, lkp, netdev,
	rdunlap, willy
In-Reply-To: <20190731185023.20954-1-natechancellor@gmail.com>

After commit 171a9bae68c7 ("staging/octeon: Allow test build on
!MIPS"), the following combination of configs cause a few Kconfig
warnings and build errors (distilled from arm allyesconfig and Randy's
randconfig builds):

    CONFIG_NETDEVICES=y
    CONFIG_STAGING=y
    CONFIG_COMPILE_TEST=y

and CONFIG_OCTEON_ETHERNET as either a module or built-in.

WARNING: unmet direct dependencies detected for MDIO_OCTEON
  Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=y] && MDIO_BUS [=y]
&& 64BIT [=n] && HAS_IOMEM [=y] && OF_MDIO [=n]
  Selected by [y]:
  - OCTEON_ETHERNET [=y] && STAGING [=y] && (CAVIUM_OCTEON_SOC ||
COMPILE_TEST [=y]) && NETDEVICES [=y]

In file included from ../drivers/net/phy/mdio-octeon.c:14:
../drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of
function ‘writeq’; did you mean ‘writel’?
[-Werror=implicit-function-declaration]
  111 | #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
      |                                    ^~~~~~

CONFIG_64BIT is not strictly necessary if the proper readq/writeq
definitions are included from io-64-nonatomic-lo-hi.h.

CONFIG_OF_MDIO is not needed when CONFIG_COMPILE_TEST is enabled because
of commit f9dc9ac51610 ("of/mdio: Add dummy functions in of_mdio.h.").

Fixes: 171a9bae68c7 ("staging/octeon: Allow test build on !MIPS")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Mark Brown <broonie@kernel.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---

v1 -> v2:

* Address Randy's reported failure here: https://lore.kernel.org/netdev/b3444283-7a77-ece8-7ac6-41756aa7dc60@infradead.org/
  by not requiring CONFIG_OF_MDIO when CONFIG_COMPILE_TEST is set.

* Improve commit message

 drivers/net/phy/Kconfig       | 4 ++--
 drivers/net/phy/mdio-cavium.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 20f14c5fbb7e..0e3d9e3d3533 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -159,8 +159,8 @@ config MDIO_MSCC_MIIM
 
 config MDIO_OCTEON
 	tristate "Octeon and some ThunderX SOCs MDIO buses"
-	depends on 64BIT
-	depends on HAS_IOMEM && OF_MDIO
+	depends on (64BIT && OF_MDIO) || COMPILE_TEST
+	depends on HAS_IOMEM
 	select MDIO_CAVIUM
 	help
 	  This module provides a driver for the Octeon and ThunderX MDIO
diff --git a/drivers/net/phy/mdio-cavium.h b/drivers/net/phy/mdio-cavium.h
index ed5f9bb5448d..b7f89ad27465 100644
--- a/drivers/net/phy/mdio-cavium.h
+++ b/drivers/net/phy/mdio-cavium.h
@@ -108,6 +108,8 @@ static inline u64 oct_mdio_readq(u64 addr)
 	return cvmx_read_csr(addr);
 }
 #else
+#include <linux/io-64-nonatomic-lo-hi.h>
+
 #define oct_mdio_writeq(val, addr)	writeq(val, (void *)addr)
 #define oct_mdio_readq(addr)		readq((void *)addr)
 #endif
-- 
2.23.0.rc1


^ permalink raw reply related

* Re: [PATCH bpf-next 1/3] selftests/bpf: test_progs: switch to open_memstream
From: Andrii Nakryiko @ 2019-08-03  6:00 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Andrii Nakryiko, Stanislav Fomichev, netdev@vger.kernel.org,
	bpf@vger.kernel.org, davem@davemloft.net, ast@kernel.org,
	daniel@iogearbox.net
In-Reply-To: <20190802201456.GB4544@mini-arch>

On Fri, Aug 2, 2019 at 1:14 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 08/02, Andrii Nakryiko wrote:
> > On 8/2/19 10:17 AM, Stanislav Fomichev wrote:
> > > Use open_memstream to override stdout during test execution.
> > > The copy of the original stdout is held in env.stdout and used
> > > to print subtest info and dump failed log.
> >
> > I really like the idea. I didn't know about open_memstream, it's awesome. Thanks!
> One possible downside of using open_memstream is that it's glibc
> specific. I probably need to wrap it in #ifdef __GLIBC__ to make
> it work with other libcs and just print everything as it was before :-(.
> I'm not sure we care though.

Given this is selftests/bpf, it is probably OK.

>
> > > test_{v,}printf are now simple wrappers around stdout and will be
> > > removed in the next patch.
> > >
> > > Cc: Andrii Nakryiko <andriin@fb.com>
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  tools/testing/selftests/bpf/test_progs.c | 100 ++++++++++-------------
> > >  tools/testing/selftests/bpf/test_progs.h |   2 +-
> > >  2 files changed, 46 insertions(+), 56 deletions(-)
> > >
> > > diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> > > index db00196c8315..00d1565d01a3 100644
> > > --- a/tools/testing/selftests/bpf/test_progs.c
> > > +++ b/tools/testing/selftests/bpf/test_progs.c
> > > @@ -40,14 +40,22 @@ static bool should_run(struct test_selector *sel, int num, const char *name)
> > >
> > >  static void dump_test_log(const struct prog_test_def *test, bool failed)
> > >  {
> > > -   if (env.verbose || test->force_log || failed) {
> > > -           if (env.log_cnt) {
> > > -                   fprintf(stdout, "%s", env.log_buf);
> > > -                   if (env.log_buf[env.log_cnt - 1] != '\n')
> > > -                           fprintf(stdout, "\n");
> > > +   if (stdout == env.stdout)
> > > +           return;
> > > +
> > > +   fflush(stdout); /* exports env.log_buf & env.log_cap */
> > > +
> > > +   if (env.log_cap && (env.verbose || test->force_log || failed)) {
> > > +           int len = strlen(env.log_buf);
> >
> > env.log_cap is not really a capacity, it's actual number of bytes (without terminating zero), so there is no need to do strlen and it's probably better to rename env.log_cap into env.log_cnt.
> I'll rename it to log_size to match open_memstream args.
> We probably still need to do strlen because open_memstream can allocate
> bigger buffer to hold the data.

If I read man page correctly, env.log_cnt will be exactly the value
that strlen will return - number of actual bytes written (omitting
terminal zero), not number of pre-allocated bytes, thus I'm saying
that strlen is redundant. Please take a look again.

>
> > > +
> > > +           if (len) {
> > > +                   fprintf(env.stdout, "%s", env.log_buf);
> > > +                   if (env.log_buf[len - 1] != '\n')
> > > +                           fprintf(env.stdout, "\n");
> > > +
> > > +                   fseeko(stdout, 0, SEEK_SET);
> > Same bug as I already fixed with env.log_cnt = 0 being inside this if. You want to do seek always, not just when you print output log.
> SG, will move to where we currently clear log_cnt, thanks!
>
> > >  /* rewind */
> > >             }
> > >     }
> > > -   env.log_cnt = 0;
> > >  }
> > >
> > >  void test__end_subtest()
> > > @@ -62,7 +70,7 @@ void test__end_subtest()
> > >
> > >     dump_test_log(test, sub_error_cnt);
> > >
> > > -   printf("#%d/%d %s:%s\n",
> > > +   fprintf(env.stdout, "#%d/%d %s:%s\n",
> > >            test->test_num, test->subtest_num,
> > >            test->subtest_name, sub_error_cnt ? "FAIL" : "OK");
> > >  }
> > > @@ -100,53 +108,7 @@ void test__force_log() {
> > >
> > >  void test__vprintf(const char *fmt, va_list args)
> > >  {
> > > -   size_t rem_sz;
> > > -   int ret = 0;
> > > -
> > > -   if (env.verbose || (env.test && env.test->force_log)) {
> > > -           vfprintf(stderr, fmt, args);
> > > -           return;
> > > -   }
> > > -
> > > -try_again:
> > > -   rem_sz = env.log_cap - env.log_cnt;
> > > -   if (rem_sz) {
> > > -           va_list ap;
> > > -
> > > -           va_copy(ap, args);
> > > -           /* we reserved extra byte for \0 at the end */
> > > -           ret = vsnprintf(env.log_buf + env.log_cnt, rem_sz + 1, fmt, ap);
> > > -           va_end(ap);
> > > -
> > > -           if (ret < 0) {
> > > -                   env.log_buf[env.log_cnt] = '\0';
> > > -                   fprintf(stderr, "failed to log w/ fmt '%s'\n", fmt);
> > > -                   return;
> > > -           }
> > > -   }
> > > -
> > > -   if (!rem_sz || ret > rem_sz) {
> > > -           size_t new_sz = env.log_cap * 3 / 2;
> > > -           char *new_buf;
> > > -
> > > -           if (new_sz < 4096)
> > > -                   new_sz = 4096;
> > > -           if (new_sz < ret + env.log_cnt)
> > > -                   new_sz = ret + env.log_cnt;
> > > -
> > > -           /* +1 for guaranteed space for terminating \0 */
> > > -           new_buf = realloc(env.log_buf, new_sz + 1);
> > > -           if (!new_buf) {
> > > -                   fprintf(stderr, "failed to realloc log buffer: %d\n",
> > > -                           errno);
> > > -                   return;
> > > -           }
> > > -           env.log_buf = new_buf;
> > > -           env.log_cap = new_sz;
> > > -           goto try_again;
> > > -   }
> > > -
> > > -   env.log_cnt += ret;
> > > +   vprintf(fmt, args);
> > >  }
> > >
> > >  void test__printf(const char *fmt, ...)
> > > @@ -477,6 +439,32 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
> > >     return 0;
> > >  }
> > >
> > > +static void stdout_hijack(void)
> > > +{
> > > +   if (env.verbose || (env.test && env.test->force_log)) {
> > > +           /* nothing to do, output to stdout by default */
> > > +           return;
> > > +   }
> > > +
> > > +   /* stdout -> buffer */
> > > +   fflush(stdout);
> > > +   stdout = open_memstream(&env.log_buf, &env.log_cap);
> > Check errors and restore original stdout if something went wrong? (And emit some warning to stderr).
> Good point, will do.
>
> > > +}
> > > +
> > > +static void stdout_restore(void)
> > > +{
> > > +   if (stdout == env.stdout)
> > > +           return;
> > > +
> > > +   fclose(stdout);
> > > +   free(env.log_buf);
> > > +
> > > +   env.log_buf = NULL;
> > > +   env.log_cap = 0;
> > > +
> > > +   stdout = env.stdout;
> > > +}
> > > +
> > >  int main(int argc, char **argv)
> > >  {
> > >     static const struct argp argp = {
> > > @@ -495,6 +483,7 @@ int main(int argc, char **argv)
> > >     srand(time(NULL));
> > >
> > >     env.jit_enabled = is_jit_enabled();
> > > +   env.stdout = stdout;
> > >
> > >     for (i = 0; i < prog_test_cnt; i++) {
> > >             struct prog_test_def *test = &prog_test_defs[i];
> > > @@ -508,6 +497,7 @@ int main(int argc, char **argv)
> > >                             test->test_num, test->test_name))
> > >                     continue;
> > >
> > > +           stdout_hijack();
> > Why do you do this for every test? Just do once before all the tests and restore after?
> We can do that, my thinking was to limit the area of hijacking :-)

But why? We actually want to hijack stdout/stderr for entire duration
of all the tests. If test_progs needs some "infrastructural" mandatory
output, we have env.stdout/env.stderr for that.

> But that would work as well, less allocations per test, I guess. Will
> do.
>
> > >             test->run_test();
> > >             /* ensure last sub-test is finalized properly */
> > >             if (test->subtest_name)
> > > @@ -522,6 +512,7 @@ int main(int argc, char **argv)
> > >                     env.succ_cnt++;
> > >
> > >             dump_test_log(test, test->error_cnt);
> > > +           stdout_restore();
> > >
> > >             printf("#%d %s:%s\n", test->test_num, test->test_name,
> > >                    test->error_cnt ? "FAIL" : "OK");
> > > @@ -529,7 +520,6 @@ int main(int argc, char **argv)
> > >     printf("Summary: %d/%d PASSED, %d FAILED\n",
> > >            env.succ_cnt, env.sub_succ_cnt, env.fail_cnt);
> > >
> > > -   free(env.log_buf);
> > >     free(env.test_selector.num_set);
> > >     free(env.subtest_selector.num_set);
> > >
> > > diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
> > > index afd14962456f..9fd89078494f 100644
> > > --- a/tools/testing/selftests/bpf/test_progs.h
> > > +++ b/tools/testing/selftests/bpf/test_progs.h
> > > @@ -56,8 +56,8 @@ struct test_env {
> > >     bool jit_enabled;
> > >
> > >     struct prog_test_def *test;
> > > +   FILE *stdout;
> > >     char *log_buf;
> > > -   size_t log_cnt;
> > >     size_t log_cap;
> > So it's actually log_cnt that's assigned on fflush for memstream, according to man page, so probably keep log_cnt, delete log_cap.
> Ack. See above, will rename to log_size, let me know if you disagree.
>
> > >     int succ_cnt; /* successful tests */

^ permalink raw reply

* [PATCH 1/1] bpf: introduce new helper udp_flow_src_port
From: Farid Zakaria @ 2019-08-03  4:43 UTC (permalink / raw)
  To: ast, daniel, netdev, bpf; +Cc: Farid Zakaria
In-Reply-To: <20190803044320.5530-1-farid.m.zakaria@gmail.com>

Foo over UDP uses UDP encapsulation to add additional entropy
into the packets so that they get beter distribution across EMCP
routes.

Expose udp_flow_src_port as a bpf helper so that tunnel filters
can benefit from the helper.

Signed-off-by: Farid Zakaria <farid.m.zakaria@gmail.com>
---
 include/uapi/linux/bpf.h                      | 21 +++++++--
 net/core/filter.c                             | 20 ++++++++
 tools/include/uapi/linux/bpf.h                | 21 +++++++--
 tools/testing/selftests/bpf/bpf_helpers.h     |  2 +
 .../bpf/prog_tests/udp_flow_src_port.c        | 28 +++++++++++
 .../bpf/progs/test_udp_flow_src_port_kern.c   | 47 +++++++++++++++++++
 6 files changed, 131 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/udp_flow_src_port.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_udp_flow_src_port_kern.c

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4393bd4b2419..90e814153dec 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2545,9 +2545,21 @@ union bpf_attr {
  * 		*th* points to the start of the TCP header, while *th_len*
  * 		contains **sizeof**\ (**struct tcphdr**).
  *
- * 	Return
- * 		0 if *iph* and *th* are a valid SYN cookie ACK, or a negative
- * 		error otherwise.
+ *  Return
+ *      0 if *iph* and *th* are a valid SYN cookie ACK, or a negative
+ *      error otherwise.
+ *
+ * int bpf_udp_flow_src_port(struct sk_buff *skb, int min, int max, int use_eth)
+ *  Description
+ *      It's common to implement tunnelling inside a UDP protocol to provide
+ *      additional randomness to the packet. The destination port of the UDP
+ *      header indicates the inner packet type whereas the source port is used
+ *      for additional entropy.
+ *
+ *  Return
+ *      An obfuscated hash of the packet that falls within the
+ *      min & max port range.
+ *      If min >= max, the default port range is used
  *
  * int bpf_sysctl_get_name(struct bpf_sysctl *ctx, char *buf, size_t buf_len, u64 flags)
  *	Description
@@ -2853,7 +2865,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),  \
+	FN(udp_flow_src_port),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 5a2707918629..fdf0ebb8c2c8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2341,6 +2341,24 @@ static const struct bpf_func_proto bpf_msg_pull_data_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_udp_flow_src_port, struct sk_buff *, skb, int, min,
+	   int, max, int, use_eth)
+{
+	struct net *net = dev_net(skb->dev);
+
+	return udp_flow_src_port(net, skb, min, max, use_eth);
+}
+
+static const struct bpf_func_proto bpf_udp_flow_src_port_proto = {
+	.func           = bpf_udp_flow_src_port,
+	.gpl_only       = false,
+	.ret_type       = RET_INTEGER,
+	.arg1_type		= ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_ANYTHING,
+	.arg3_type		= ARG_ANYTHING,
+	.arg4_type		= ARG_ANYTHING,
+};
+
 BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	   u32, len, u64, flags)
 {
@@ -6186,6 +6204,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sk_storage_get_proto;
 	case BPF_FUNC_sk_storage_delete:
 		return &bpf_sk_storage_delete_proto;
+	case BPF_FUNC_udp_flow_src_port:
+		return &bpf_udp_flow_src_port_proto;
 #ifdef CONFIG_XFRM
 	case BPF_FUNC_skb_get_xfrm_state:
 		return &bpf_skb_get_xfrm_state_proto;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4393bd4b2419..90e814153dec 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2545,9 +2545,21 @@ union bpf_attr {
  * 		*th* points to the start of the TCP header, while *th_len*
  * 		contains **sizeof**\ (**struct tcphdr**).
  *
- * 	Return
- * 		0 if *iph* and *th* are a valid SYN cookie ACK, or a negative
- * 		error otherwise.
+ *  Return
+ *      0 if *iph* and *th* are a valid SYN cookie ACK, or a negative
+ *      error otherwise.
+ *
+ * int bpf_udp_flow_src_port(struct sk_buff *skb, int min, int max, int use_eth)
+ *  Description
+ *      It's common to implement tunnelling inside a UDP protocol to provide
+ *      additional randomness to the packet. The destination port of the UDP
+ *      header indicates the inner packet type whereas the source port is used
+ *      for additional entropy.
+ *
+ *  Return
+ *      An obfuscated hash of the packet that falls within the
+ *      min & max port range.
+ *      If min >= max, the default port range is used
  *
  * int bpf_sysctl_get_name(struct bpf_sysctl *ctx, char *buf, size_t buf_len, u64 flags)
  *	Description
@@ -2853,7 +2865,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),  \
+	FN(udp_flow_src_port),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 120aa86c58d3..385bfd8b7436 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -313,6 +313,8 @@ static unsigned int (*bpf_set_hash)(void *ctx, __u32 hash) =
 static int (*bpf_skb_adjust_room)(void *ctx, __s32 len_diff, __u32 mode,
 				  unsigned long long flags) =
 	(void *) BPF_FUNC_skb_adjust_room;
+static int (*bpf_udp_flow_src_port)(void *ctx, int min, int max, int use_eth) =
+	(void *) BPF_FUNC_udp_flow_src_port;
 
 /* Scan the ARCH passed in from ARCH env variable (see Makefile) */
 #if defined(__TARGET_ARCH_x86)
diff --git a/tools/testing/selftests/bpf/prog_tests/udp_flow_src_port.c b/tools/testing/selftests/bpf/prog_tests/udp_flow_src_port.c
new file mode 100644
index 000000000000..0f7303b51d1d
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/udp_flow_src_port.c
@@ -0,0 +1,28 @@
+#include <test_progs.h>
+#include <linux/pkt_cls.h>
+
+void test_udp_flow_src_port(void)
+{
+	const char *file = "./test_udp_flow_src_port_kern.o";
+	struct bpf_object *obj;
+	__u32 duration, retval, size;
+	int err, prog_fd;
+    char buf[128];
+    struct tcphdr *tcph = (void *)buf + sizeof(struct ethhdr) + sizeof(struct iphdr);
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
+	if (err) {
+		error_cnt++;
+		return;
+	}
+
+    short original = tcph->source;
+
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				buf, &size, &retval, &duration);
+	CHECK(err || retval != TC_ACT_OK || tcph->source == original, "ipv4",
+	      "err %d errno %d retval %d sport %d duration %d\n",
+	      err, errno, retval, tcph->source, duration);
+
+	bpf_object__close(obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_udp_flow_src_port_kern.c b/tools/testing/selftests/bpf/progs/test_udp_flow_src_port_kern.c
new file mode 100644
index 000000000000..6238bd5fa856
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_udp_flow_src_port_kern.c
@@ -0,0 +1,47 @@
+#include <stddef.h>
+#include <stdbool.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/tcp.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";
+
+SEC("skb_udp_flow")
+int process(struct __sk_buff *skb)
+{
+    void *data = (void *)(size_t)skb->data;
+    void *data_end = (void *)(size_t)skb->data_end;
+    /* Is it an Ethernet frame? */
+    struct ethhdr *ethernet_header = (struct ethhdr *)data;
+    data += sizeof(*ethernet_header);
+    if (data > data_end) {
+        return TC_ACT_SHOT;
+    }
+
+    struct iphdr *ip_header = (struct iphdr *)data;
+    data += sizeof(*ip_header);
+    if (data > data_end) {
+        return TC_ACT_SHOT;
+    }
+
+    struct tcphdr *tcp_header = (struct tcphdr*)data;
+    data += sizeof(*tcp_header);
+    if (data > data_end) {
+        return TC_ACT_SHOT;
+    }
+
+    //lets assign the calculated source port in the
+    // tcp packet and verify it in the test
+    int sport = bpf_udp_flow_src_port(skb, 0, 0, 0);
+    tcp_header->source = sport;
+
+    return TC_ACT_OK;
+}
-- 
2.21.0


^ permalink raw reply related

* [PATCH 0/1] bpf: introduce new helper udp_flow_src_port
From: Farid Zakaria @ 2019-08-03  4:43 UTC (permalink / raw)
  To: ast, daniel, netdev, bpf; +Cc: Farid Zakaria

This is a submission to expose a new eBPF helper method
to allow access to udp_flow_src_port -- which is useful
when doing any Foo Over UDP network tunneling.

I hope this change adheres to the submission guidelines.
I've included a test and verified it passes:

./test_progs -t udp_flow_src_port
#31 udp_flow_src_port:OK
Summary: 1/0 PASSED, 0 FAILED

* shoutout to https://github.com/g2p/vido/blob/master/vido
which made testing the kernel + change super easy *
(This is my first contribution)

Farid Zakaria (1):
  bpf: introduce new helper udp_flow_src_port

 include/uapi/linux/bpf.h                      | 21 +++++++--
 net/core/filter.c                             | 20 ++++++++
 tools/include/uapi/linux/bpf.h                | 21 +++++++--
 tools/testing/selftests/bpf/bpf_helpers.h     |  2 +
 .../bpf/prog_tests/udp_flow_src_port.c        | 28 +++++++++++
 .../bpf/progs/test_udp_flow_src_port_kern.c   | 47 +++++++++++++++++++
 6 files changed, 131 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/udp_flow_src_port.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_udp_flow_src_port_kern.c

-- 
2.21.0


^ permalink raw reply

* Re: [PATCH] net/mlx4_core: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-03  2:42 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: linux-kernel@vger.kernel.org, davem@davemloft.net, Tariq Toukan,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <47bb83d0111f1132bbf532c16be483c5efbe839f.camel@mellanox.com>

Saeed Mahameed <saeedm@mellanox.com> 于2019年8月3日周六 上午2:38写道：
>
> On Sat, 2019-08-03 at 00:10 +0800, Chuhong Yuan wrote:
> > Chuhong Yuan <hslester96@gmail.com> 于2019年8月2日周五 下午8:10写道：
> > > refcount_t is better for reference counters since its
> > > implementation can prevent overflows.
> > > So convert atomic_t ref counters to refcount_t.
> > >
> > > Also convert refcount from 0-based to 1-based.
> > >
> >
> > It seems that directly converting refcount from 0-based
> > to 1-based is infeasible.
> > I am sorry for this mistake.
>
> Just curious, why not keep it 0 based and use refcout_t ?
>
> refcount API should have the same semantics as atomic_t API .. no ?

refcount API will warn when increase a 0 refcount.
It regards this as a use-after-free.

^ permalink raw reply

* Re: [PATCH 31/34] nfs: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-08-03  1:41 UTC (permalink / raw)
  To: Calum Mackay, john.hubbard, Andrew Morton
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
	Ira Weiny, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx,
	kvm, linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel, Trond Myklebust, Anna Schumaker
In-Reply-To: <1738cb1e-15d8-0bbe-5362-341664f6efc8@oracle.com>

On 8/2/19 6:27 PM, Calum Mackay wrote:
> On 02/08/2019 3:20 am, john.hubbard@gmail.com wrote:
... 
> Since it's static, and only called twice, might it be better to change its two callers [nfs_direct_{read,write}_schedule_iovec()] to call put_user_pages() directly, and remove nfs_direct_release_pages() entirely?
> 
> thanks,
> calum.
> 
> 
>>     void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
>>
 
Hi Calum,

Absolutely! Is it OK to add your reviewed-by, with the following incremental
patch made to this one?

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index b00b89dda3c5..c0c1b9f2c069 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -276,11 +276,6 @@ ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
        return nfs_file_direct_write(iocb, iter);
 }
 
-static void nfs_direct_release_pages(struct page **pages, unsigned int npages)
-{
-       put_user_pages(pages, npages);
-}
-
 void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
                              struct nfs_direct_req *dreq)
 {
@@ -510,7 +505,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
                        pos += req_len;
                        dreq->bytes_left -= req_len;
                }
-               nfs_direct_release_pages(pagevec, npages);
+               put_user_pages(pagevec, npages);
                kvfree(pagevec);
                if (result < 0)
                        break;
@@ -933,7 +928,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
                        pos += req_len;
                        dreq->bytes_left -= req_len;
                }
-               nfs_direct_release_pages(pagevec, npages);
+               put_user_pages(pagevec, npages);
                kvfree(pagevec);
                if (result < 0)
                        break;



thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply related

* Re: [PATCH] net: mdio-octeon: Fix build error and Kconfig warning
From: Matthew Wilcox @ 2019-08-03  1:39 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: David Miller, andrew, broonie, devel, f.fainelli, gregkh,
	hkallweit1, kernel-build-reports, linux-arm-kernel, linux-next,
	netdev, lkp, rdunlap
In-Reply-To: <20190803013031.GA76252@archlinux-threadripper>

On Fri, Aug 02, 2019 at 06:30:31PM -0700, Nathan Chancellor wrote:
> On Fri, Aug 02, 2019 at 06:11:32PM -0700, David Miller wrote:
> > The proper way to fix this is to include either
> > 
> > 	linux/io-64-nonatomic-hi-lo.h
> > 
> > or
> > 
> > 	linux/io-64-nonatomic-lo-hi.h
> > 
> > whichever is appropriate.
> 
> Hmmmm, is that not what I did?
> 
> Although I did not know about io-64-nonatomic-hi-lo.h. What is the
> difference and which one is needed here?

Whether you write the high or low 32 bits first.  For this, it doesn't
matter, since the compiled driver will never be run on real hardware.

> There is apparently another failure when OF_MDIO is not set, I guess I
> can try to look into that as well and respin into a series if
> necessary.

Thanks for taking care of that!

^ permalink raw reply

* Re: [PATCH 31/34] nfs: convert put_page() to put_user_page*()
From: Calum Mackay @ 2019-08-03  1:27 UTC (permalink / raw)
  To: john.hubbard, Andrew Morton
  Cc: calum.mackay, Christoph Hellwig, Dan Williams, Dave Chinner,
	Dave Hansen, Ira Weiny, Jan Kara, Jason Gunthorpe,
	Jérôme Glisse, LKML, amd-gfx, ceph-devel, devel, devel,
	dri-devel, intel-gfx, kvm, linux-arm-kernel, linux-block,
	linux-crypto, linux-fbdev, linux-fsdevel, linux-media, linux-mm,
	linux-nfs, linux-rdma, linux-rpi-kernel, linux-xfs, netdev,
	rds-devel, sparclinux, x86, xen-devel, John Hubbard,
	Trond Myklebust, Anna Schumaker
In-Reply-To: <20190802022005.5117-32-jhubbard@nvidia.com>

On 02/08/2019 3:20 am, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
> 
> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> Cc: Anna Schumaker <anna.schumaker@netapp.com>
> Cc: linux-nfs@vger.kernel.org
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>   fs/nfs/direct.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 0cb442406168..b00b89dda3c5 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -278,9 +278,7 @@ ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>   
>   static void nfs_direct_release_pages(struct page **pages, unsigned int npages)
>   {
> -	unsigned int i;
> -	for (i = 0; i < npages; i++)
> -		put_page(pages[i]);
> +	put_user_pages(pages, npages);
>   }

Since it's static, and only called twice, might it be better to change 
its two callers [nfs_direct_{read,write}_schedule_iovec()] to call 
put_user_pages() directly, and remove nfs_direct_release_pages() entirely?

thanks,
calum.


>   
>   void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
> 

^ permalink raw reply

* Re: [PATCH] net: mdio-octeon: Fix build error and Kconfig warning
From: Nathan Chancellor @ 2019-08-03  1:30 UTC (permalink / raw)
  To: David Miller
  Cc: andrew, broonie, devel, f.fainelli, gregkh, hkallweit1,
	kernel-build-reports, linux-arm-kernel, linux-next, netdev, willy,
	lkp, rdunlap
In-Reply-To: <20190802.181132.1425585873361511856.davem@davemloft.net>

On Fri, Aug 02, 2019 at 06:11:32PM -0700, David Miller wrote:
> From: Nathan Chancellor <natechancellor@gmail.com>
> Date: Wed, 31 Jul 2019 11:50:24 -0700
> 
> > arm allyesconfig warns:
> > 
> > WARNING: unmet direct dependencies detected for MDIO_OCTEON
> >   Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=y] && MDIO_BUS [=y]
> > && 64BIT && HAS_IOMEM [=y] && OF_MDIO [=y]
> >   Selected by [y]:
> >   - OCTEON_ETHERNET [=y] && STAGING [=y] && (CAVIUM_OCTEON_SOC &&
> > NETDEVICES [=y] || COMPILE_TEST [=y])
> > 
> > and errors:
> > 
> > In file included from ../drivers/net/phy/mdio-octeon.c:14:
> > ../drivers/net/phy/mdio-octeon.c: In function 'octeon_mdiobus_probe':
> > ../drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of
> > function 'writeq'; did you mean 'writeb'?
> 
> The proper way to fix this is to include either
> 
> 	linux/io-64-nonatomic-hi-lo.h
> 
> or
> 
> 	linux/io-64-nonatomic-lo-hi.h
> 
> whichever is appropriate.

Hmmmm, is that not what I did?

Although I did not know about io-64-nonatomic-hi-lo.h. What is the
difference and which one is needed here?

There is apparently another failure when OF_MDIO is not set, I guess I
can try to look into that as well and respin into a series if
necessary.

Cheers,
Nathan

^ permalink raw reply

* NFSv4: rare bug *and* root cause captured in the wild
From: John Hubbard @ 2019-08-03  1:27 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, David S. Miller, Chuck Lever,
	J. Bruce Fields, linux-nfs, netdev, LKML

Hi,

While testing unrelated (put_user_pages) work on Linux 5.3-rc2+,
I rebooted the NFS *server*, tried to ssh to the client, and the
client dumped a backtrace as shown below.

Good news: I found that I can reliably reproduce it with those steps, 
at commit 1e78030e5e5b (in linux.git) plus my 34-patch series [1], which
off course is completely unrelated. :) Anyway, I'm making a note of the 
exact repro commit, so I don't lose the repro.

I see what's wrong, but I do *not* see an easy fix. Can one of the
designers please recommend an approach to fixing this?

This is almost certainly caused by commit 7e0a0e38fcfe ("SUNRPC: 
Replace the queue timer with a delayed work function"), which changed 
over to running things in process (kthread) context. The commit is dated
May 1, 2019, but I've only been running NFSv4 for a couple days, so
the problem has likely been there all along, not specific to 5.3.

The call stack starts off in atomic context, so we get the bug:

nfs4_do_reclaim
    rcu_read_lock /* we are now in_atomic() and must not sleep */
        nfs4_purge_state_owners
            nfs4_free_state_owner
                nfs4_destroy_seqid_counter
                    rpc_destroy_wait_queue
                        cancel_delayed_work_sync
                            __cancel_work_timer
                                __flush_work
                                    start_flush_work
                                        might_sleep: 
                                         (kernel/workqueue.c:2975: BUG)

Details: actual backtrace I am seeing:

BUG: sleeping function called from invalid context at kernel/workqueue.c:2975
in_atomic(): 1, irqs_disabled(): 0, pid: 2224, name: 10.110.48.28-ma
1 lock held by 10.110.48.28-ma/2224:
 #0: 00000000d338d2ec (rcu_read_lock){....}, at: nfs4_do_reclaim+0x22/0x6b0 [nfsv4]
CPU: 8 PID: 2224 Comm: 10.110.48.28-ma Not tainted 5.3.0-rc2-hubbard-github+ #52
Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1704 02/14/2019
Call Trace:
 dump_stack+0x46/0x60
 ___might_sleep.cold+0x8e/0x9b
 __flush_work+0x61/0x370
 ? find_held_lock+0x2b/0x80
 ? add_timer+0x100/0x200
 ? _raw_spin_lock_irqsave+0x35/0x40
 __cancel_work_timer+0xfb/0x180
 ? nfs4_purge_state_owners+0xf4/0x170 [nfsv4]
 nfs4_free_state_owner+0x10/0x50 [nfsv4]
 nfs4_purge_state_owners+0x139/0x170 [nfsv4]
 nfs4_do_reclaim+0x7a/0x6b0 [nfsv4]
 ? pnfs_destroy_layouts_byclid+0xc4/0x100 [nfsv4]
 nfs4_state_manager+0x6be/0x7f0 [nfsv4]
 nfs4_run_state_manager+0x1b/0x40 [nfsv4]
 kthread+0xfb/0x130
 ? nfs4_state_manager+0x7f0/0x7f0 [nfsv4]
 ? kthread_bind+0x20/0x20
 ret_from_fork+0x35/0x40

And last but not least, some words of encouragement: the reason I moved 
from NFSv3 to NFSv4 is that the easy authentication (matching UIDs on
client and server) now works perfectly. Yea! So I'm enjoying v4, despite 
the occasional minor glitch. :)

[1] https://lore.kernel.org/r/20190802022005.5117-1-jhubbard@nvidia.com

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH net] r8152: fix typo in register name
From: David Miller @ 2019-08-03  1:17 UTC (permalink / raw)
  To: kevlo; +Cc: hayeswang, netdev
In-Reply-To: <20190801032938.GA22256@ns.kevlo.org>

From: Kevin Lo <kevlo@kevlo.org>
Date: Thu, 1 Aug 2019 11:29:38 +0800

> It is likely that PAL_BDC_CR should be PLA_BDC_CR.
> 
> Signed-off-by: Kevin Lo <kevlo@kevlo.org>

Applied.

^ permalink raw reply

* Re: [PATCH] enetc: Select PHYLIB while CONFIG_FSL_ENETC_VF is set
From: David Miller @ 2019-08-03  1:15 UTC (permalink / raw)
  To: yuehaibing; +Cc: claudiu.manoil, linux-kernel, netdev
In-Reply-To: <20190801012419.9728-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Thu, 1 Aug 2019 09:24:19 +0800

> Like FSL_ENETC, when CONFIG_FSL_ENETC_VF is set,
> we should select PHYLIB, otherwise building still fails:
> 
> drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_open':
> enetc.c:(.text+0x2744): undefined reference to `phy_start'
> enetc.c:(.text+0x282c): undefined reference to `phy_disconnect'
> drivers/net/ethernet/freescale/enetc/enetc.o: In function `enetc_close':
> enetc.c:(.text+0x28f8): undefined reference to `phy_stop'
> enetc.c:(.text+0x2904): undefined reference to `phy_disconnect'
> drivers/net/ethernet/freescale/enetc/enetc_ethtool.o:(.rodata+0x3f8): undefined reference to `phy_ethtool_get_link_ksettings'
> drivers/net/ethernet/freescale/enetc/enetc_ethtool.o:(.rodata+0x400): undefined reference to `phy_ethtool_set_link_ksettings'
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] net: phy: fix race in genphy_update_link
From: David Miller @ 2019-08-03  1:15 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, netdev, liuyonglong
In-Reply-To: <19122a98-cfcd-424c-a598-e034c1a9349d@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Wed, 31 Jul 2019 23:05:10 +0200

> In phy_start_aneg() autoneg is started, and immediately after that
> link and autoneg status are read. As reported in [0] it can happen that
> at time of this read the PHY has reset the "aneg complete" bit but not
> yet the "link up" bit, what can result in a false link-up detection.
> To fix this don't report link as up if we're in aneg mode and PHY
> doesn't signal "aneg complete".
> 
> [0] https://marc.info/?t=156413509900003&r=1&w=2
> 
> Fixes: 4950c2ba49cc ("net: phy: fix autoneg mismatch case in genphy_read_status")
> Reported-by: liuyonglong <liuyonglong@huawei.com>
> Tested-by: liuyonglong <liuyonglong@huawei.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: mdio-octeon: Fix build error and Kconfig warning
From: David Miller @ 2019-08-03  1:11 UTC (permalink / raw)
  To: natechancellor
  Cc: andrew, broonie, devel, f.fainelli, gregkh, hkallweit1,
	kernel-build-reports, linux-arm-kernel, linux-next, netdev, willy,
	lkp, rdunlap
In-Reply-To: <20190731185023.20954-1-natechancellor@gmail.com>

From: Nathan Chancellor <natechancellor@gmail.com>
Date: Wed, 31 Jul 2019 11:50:24 -0700

> arm allyesconfig warns:
> 
> WARNING: unmet direct dependencies detected for MDIO_OCTEON
>   Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=y] && MDIO_BUS [=y]
> && 64BIT && HAS_IOMEM [=y] && OF_MDIO [=y]
>   Selected by [y]:
>   - OCTEON_ETHERNET [=y] && STAGING [=y] && (CAVIUM_OCTEON_SOC &&
> NETDEVICES [=y] || COMPILE_TEST [=y])
> 
> and errors:
> 
> In file included from ../drivers/net/phy/mdio-octeon.c:14:
> ../drivers/net/phy/mdio-octeon.c: In function 'octeon_mdiobus_probe':
> ../drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of
> function 'writeq'; did you mean 'writeb'?

The proper way to fix this is to include either

	linux/io-64-nonatomic-hi-lo.h

or

	linux/io-64-nonatomic-lo-hi.h

whichever is appropriate.

^ permalink raw reply

* Re: [PATCH] atm: iphase: Fix Spectre v1 vulnerability
From: Gustavo A. R. Silva @ 2019-08-03  0:42 UTC (permalink / raw)
  To: David Miller; +Cc: 3chas3, linux-atm-general, netdev, linux-kernel
In-Reply-To: <20190802.173125.399771294364845553.davem@davemloft.net>

Hi Dave,

On 8/2/19 7:31 PM, David Miller wrote:
> From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
> Date: Tue, 30 Jul 2019 22:21:41 -0500
> 
>> board is controlled by user-space, hence leading to a potential
>> exploitation of the Spectre variant 1 vulnerability.
>>
>> This issue was detected with the help of Smatch:
> 
> Applied and queued up for -stable.
> 
> Do not CC: -stable for networking fixes, we take care of the stable
> submissions manually for this subsystem.
> 

Yeah. I'm aware of that. The thing is that you don't appear
as a maintainer of this file:

$ scripts/get_maintainer.pl --nokeywords --nogit --nogit-fallback -f drivers/atm/iphase.c
Chas Williams <3chas3@gmail.com> (maintainer:ATM)
linux-atm-general@lists.sourceforge.net (moderated list:ATM)
netdev@vger.kernel.org (open list:ATM)
linux-kernel@vger.kernel.org (open list)

so, I didn't know this patch would be applied to net.

Thanks
--
Gustavo

^ permalink raw reply

* Re: [PATCH net-next v2 0/6] net: dsa: mv88e6xxx: add support for MV88E6220
From: David Miller @ 2019-08-03  0:59 UTC (permalink / raw)
  To: h.feurstein
  Cc: netdev, linux-kernel, andrew, vivien.didelot, f.fainelli,
	rasmus.villemoes
In-Reply-To: <20190731082351.3157-1-h.feurstein@gmail.com>

From: Hubert Feurstein <h.feurstein@gmail.com>
Date: Wed, 31 Jul 2019 10:23:45 +0200

> This patch series adds support for the MV88E6220 chip to the mv88e6xxx driver.
> The MV88E6220 is almost the same as MV88E6250 except that the ports 2-4 are
> not routed to pins.
> 
> Furthermore, PTP support is added to the MV88E6250 family.
> 
> v2:
>  - insert all 6220 entries in correct numerical order
>  - introduce invalid_port_mask
>  - move ptp_cc_mult* to ptp_ops and restored original ptp_adjfine code
>  - added Andrews Reviewed-By to patch 2 and 4

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH] net/ethernet/qlogic/qed: force the string buffer NULL-terminated
From: David Miller @ 2019-08-03  0:57 UTC (permalink / raw)
  To: xywang.sjtu; +Cc: aelior, GR-everest-linux-l2, netdev
In-Reply-To: <20190731081542.14178-1-xywang.sjtu@sjtu.edu.cn>

From: Wang Xiayang <xywang.sjtu@sjtu.edu.cn>
Date: Wed, 31 Jul 2019 16:15:42 +0800

> strncpy() does not ensure NULL-termination when the input string
> size equals to the destination buffer size 30.
> The output string is passed to qed_int_deassertion_aeu_bit()
> which calls DP_INFO() and relies NULL-termination.
> 
> Use strlcpy instead. The other conditional branch above strncpy()
> needs no fix as snprintf() ensures NULL-termination.
> 
> This issue is identified by a Coccinelle script.
> 
> Signed-off-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn>

Applied.

^ permalink raw reply

* RE: [PATCH v2 net] hv_sock: Fix hang when a connection is closed
From: Dexuan Cui @ 2019-08-03  0:49 UTC (permalink / raw)
  To: David Miller
  Cc: Sunil Muthuswamy, netdev@vger.kernel.org, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, sashal@kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <20190802.172729.1656276508211556851.davem@davemloft.net>

> From: linux-hyperv-owner@vger.kernel.org
> Sent: Friday, August 2, 2019 5:27 PM
> ...
> Applied and queued up for -stable.
> 
> Do not ever CC: stable for networking patches, we submit to -stable manually.
 
Thanks, David!
I'll remember to not add the stable tag for network patches.

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH net-next v2 0/4] net: phy: Add AST2600 MDIO support
From: David Miller @ 2019-08-03  0:33 UTC (permalink / raw)
  To: andrew
  Cc: netdev, robh+dt, mark.rutland, joel, andrew, f.fainelli,
	hkallweit1, devicetree, linux-arm-kernel, linux-aspeed,
	linux-kernel
In-Reply-To: <20190731053959.16293-1-andrew@aj.id.au>

From: Andrew Jeffery <andrew@aj.id.au>
Date: Wed, 31 Jul 2019 15:09:55 +0930

> v2 of the ASPEED MDIO series addresses comments from Rob on the devicetree
> bindings and Andrew on the driver itself.
> 
> v1 of the series can be found here:
> 
> http://patchwork.ozlabs.org/cover/1138140/
> 
> Please review!

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH] atm: iphase: Fix Spectre v1 vulnerability
From: David Miller @ 2019-08-03  0:31 UTC (permalink / raw)
  To: gustavo; +Cc: 3chas3, linux-atm-general, netdev, linux-kernel
In-Reply-To: <20190731032141.GA30246@embeddedor>

From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Date: Tue, 30 Jul 2019 22:21:41 -0500

> board is controlled by user-space, hence leading to a potential
> exploitation of the Spectre variant 1 vulnerability.
> 
> This issue was detected with the help of Smatch:

Applied and queued up for -stable.

Do not CC: -stable for networking fixes, we take care of the stable
submissions manually for this subsystem.

Thank you.

^ permalink raw reply

* Re: [PATCH net v3] net: ipv6: Fix a bug in ndisc_send_ns when netdev only has a global address
From: David Miller @ 2019-08-03  0:29 UTC (permalink / raw)
  To: suyj.fnst; +Cc: kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <1564537972-76503-1-git-send-email-suyj.fnst@cn.fujitsu.com>

From: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Date: Wed, 31 Jul 2019 09:52:52 +0800

> When the egress interface does not have a link local address, it can
> not communicate with other hosts.
> 
> In RFC4861, 7.2.2 says
> "If the source address of the packet prompting the solicitation is the
> same as one of the addresses assigned to the outgoing interface, that
> address SHOULD be placed in the IP Source Address of the outgoing
> solicitation.  Otherwise, any one of the addresses assigned to the
> interface should be used."
> 
> In this patch we try get a global address if we get ll address failed.
> 
> Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
> ---
> Changes since V2:
> 	- Let banned_flags under the scope of its use.

I do not want to apply this.

The only situation where this can occur is when userland is managing the
interface addresses and has failed to properly add a link local address.

That is a failure by userspace to uphold it's responsibilites when it
has taken over management of these issues, not a situation the kernel
should handle.

Sorry.

^ permalink raw reply

* Re: [PATCH v2 net] hv_sock: Fix hang when a connection is closed
From: David Miller @ 2019-08-03  0:27 UTC (permalink / raw)
  To: decui
  Cc: sunilmut, netdev, kys, haiyangz, sthemmin, sashal, mikelley,
	linux-hyperv, linux-kernel, olaf, apw, jasowang, vkuznets,
	marcelo.cerri
In-Reply-To: <PU1P153MB01696DDD3A3F601370701DD2BFDF0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com>
Date: Wed, 31 Jul 2019 01:25:45 +0000

> 
> There is a race condition for an established connection that is being closed
> by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
> 'remove_sock' is false):
> 
> 1 for the initial value;
> 1 for the sk being in the bound list;
> 1 for the sk being in the connected list;
> 1 for the delayed close_work.
> 
> After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
> decrease the refcnt to 3.
> 
> Concurrently, hvs_close_connection() runs in another thread:
>   calls vsock_remove_sock() to decrease the refcnt by 2;
>   call sock_put() to decrease the refcnt to 0, and free the sk;
>   next, the "release_sock(sk)" may hang due to use-after-free.
> 
> In the above, after hvs_release() finishes, if hvs_close_connection() runs
> faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
> because at the beginning of hvs_close_connection(), the refcnt is still 4.
> 
> The issue can be resolved if an extra reference is taken when the
> connection is established.
> 
> Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

Applied and queued up for -stable.

Do not ever CC: stable for networking patches, we submit to -stable manually.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] net/tls: prevent skb_orphan() from leaking TLS plain text with offload
From: David Miller @ 2019-08-03  0:24 UTC (permalink / raw)
  To: jakub.kicinski
  Cc: netdev, oss-drivers, edumazet, borisp, aviadye, davejwatson,
	john.fastabend, daniel, willemb, xiyou.wangcong, fw,
	alexei.starovoitov
In-Reply-To: <20190730211258.13748-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue, 30 Jul 2019 14:12:58 -0700

> I'm sending this for net-next because of lack of confidence
> in my own abilities. It should apply cleanly to net... :)

It looks like there will be changes to this.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox