* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: Jiri Pirko @ 2019-08-02 7:42 UTC (permalink / raw)
To: David Ahern; +Cc: Jakub Kicinski, netdev, davem, sthemmin, mlxsw
In-Reply-To: <de94a881-cb87-e251-d55c-9f40d24b08d5@gmail.com>
Thu, Aug 01, 2019 at 12:31:52AM CEST, dsahern@gmail.com wrote:
>On 7/31/19 4:28 PM, Jakub Kicinski wrote:
>> On Wed, 31 Jul 2019 16:07:31 -0600, David Ahern wrote:
>>> On 7/31/19 4:02 PM, Jakub Kicinski wrote:
>>>> Can you elaborate further? Ports for most purposes are represented by
>>>> netdevices. Devlink port instances expose global topological view of
>>>> the ports which is primarily relevant if you can see the entire ASIC.
>>>> I think the global configuration and global view of resources is still
>>>> the most relevant need, so in your diagram you must account for some
>>>> "all-seeing" instance, e.g.:
>>>>
>>>> namespace 1 | namespace 2 | ... | namespace N
>>>> | | |
>>>> { ports 1 } | { ports 2 } | ... | { ports N }
>>>> | | |
>>>> subdevlink 1 | subdevlink 2 | ... | subdevlink N
>>>> \______ | _______/
>>>> master ASIC devlink
>>>> =================================================
>>>> driver
>>>>
>>>> No?
>>>
>>> sure, there could be a master devlink visible to the user if that makes
>>> sense or the driver can account for it behind the scenes as the sum of
>>> the devlink instances.
>>>
>>> The goal is to allow ports within an asic [1] to be divided across
>>> network namespace where each namespace sees a subset of the ports. This
>>> allows creating multiple logical switches from a single physical asic.
>>>
>>> [1] within constraints imposed by the driver/hardware - for example to
>>> account for resources shared by a set of ports. e.g., front panel ports
>>> 1 - 4 have shared resources and must always be in the same devlink instance.
>>
>> So the ASIC would start out all partitioned? Presumably some would
>> still like to use it non-partitioned? What follows there must be a top
>> level instance to decide on partitioning, and moving resources between
>> sub-instances.
>>
>> Right now I don't think there is much info in devlink ports which would
>> be relevant without full view of the ASIC..
>>
>
>not sure how it would play out. really just 'thinking out loud' about
>the above use case to make sure devlink with proper namespace support
>allows it - or does not prevent it.
I Don't see reason or usecase to have ports or other objects of devlink
in separate namespaces. Devlink and it's objects are one big item,
should be always together.
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
From: Björn Töpel @ 2019-08-02 7:26 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Kevin Laatz, Netdev, Alexei Starovoitov, Daniel Borkmann,
Björn Töpel, Karlsson, Magnus, Jakub Kicinski,
Jonathan Lemon, Saeed Mahameed, Maxim Mikityanskiy,
Stephen Hemminger, Bruce Richardson, ciara.loftus,
intel-wired-lan, bpf
In-Reply-To: <CAEf4Bzar-KgCjUEfKVeWzcB77xvXDagZFRQKDvWo1o9-JzCirw@mail.gmail.com>
On Fri, 2 Aug 2019 at 09:19, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Aug 1, 2019 at 12:34 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
> >
[...]
> >
> > Old application, dynamically linked to new libbpf.so will crash,
> > right? Old application passes old version of xsk_umem_config, and new
> > library accesses (non-existing) flag struct member.
>
> I think we have similar problems for all the _xattr type of commands
> (as well some of btf stuff accepting extra opts structs). How is this
> problem solved in general? Do we version same function multiple times,
> one for each added field? It feels like there should be some better
> way to handle this...
>
If the size of the struct was passed as an argument (and extra care is
taken when adding members to the struct), it could be handled w/o
versioning... but that's not the case here. :-( Versioning is a mess
to deal with, so I'd be happy if it could be avoided...
^ permalink raw reply
* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Song Liu @ 2019-08-02 7:21 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Kees Cook, Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
Kernel Team, Lorenz Bauer, Jann Horn, Greg KH, Linux API,
LSM List
In-Reply-To: <CALCETrVoZL1YGUxx3kM-d21TWVRKdKw=f2B8aE5wc2zmX1cQ4g@mail.gmail.com>
Hi Andy,
> On Jul 31, 2019, at 12:09 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> On Wed, Jul 31, 2019 at 1:10 AM Song Liu <songliubraving@fb.com> wrote:
>>
>>
>>
>>> On Jul 30, 2019, at 1:24 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>
>>> On Mon, Jul 29, 2019 at 10:07 PM Song Liu <songliubraving@fb.com> wrote:
>>>>
>>>> Hi Andy,
>>>>
>>>>> On Jul 27, 2019, at 11:20 AM, Song Liu <songliubraving@fb.com> wrote:
>>>>>
>>>>> Hi Andy,
>>>>>
>>>>>
>>
>> [...]
>>
>>>>>
>>>>
>>>> I would like more comments on this.
>>>>
>>>> Currently, bpf permission is more or less "root or nothing", which we
>>>> would like to change.
>>>>
>>>> The short term goal is to separate bpf from root, in other words, it is
>>>> "all or nothing". Special user space utilities, such as systemd, would
>>>> benefit from this. Once this is implemented, systemd can call sys_bpf()
>>>> when it is not running as root.
>>>
>>> As generally nasty as Linux capabilities are, this sounds like a good
>>> use for CAP_BPF_ADMIN.
>>
>> I actually agree CAP_BPF_ADMIN makes sense. The hard part is to make
>> existing tools (setcap, getcap, etc.) and libraries aware of the new CAP.
>
> It's been done before -- it's not that hard. IMO the main tricky bit
> would be try be somewhat careful about defining exactly what
> CAP_BPF_ADMIN does.
Agreed. I think defining CAP_BPF_ADMIN could be a good topic for the
Plumbers conference.
OTOH, I don't think we have to wait for CAP_BPF_ADMIN to allow daemons
like systemd to do sys_bpf() without root.
>
>>> I don't see why you need to invent a whole new mechanism for this.
>>> The entire cgroup ecosystem outside bpf() does just fine using the
>>> write permission on files in cgroupfs to control access. Why can't
>>> bpf() do the same thing?
>>
>> It is easier to use write permission for BPF_PROG_ATTACH. But it is
>> not easy to do the same for other bpf commands: BPF_PROG_LOAD and
>> BPF_MAP_*. A lot of these commands don't have target concept. Maybe
>> we should have target concept for all these commands. But that is a
>> much bigger project. OTOH, "all or nothing" model allows all these
>> commands at once.
>
> For BPF_PROG_LOAD, I admit I've never understood why permission is
> required at all. I think that CAP_SYS_ADMIN or similar should be
> needed to get is_priv in the verifier, but I think that should mainly
> be useful for tracing, and that requires lots of privilege anyway.
> BPF_MAP_* is probably the trickiest part. One solution would be some
> kind of bpffs, but I'm sure other solutions are possible.
Improving permission management of cgroup_bpf is another good topic to
discuss. However, it is also an overkill for current use case.
Let me get more details about the use case, so that we have a clear
picture about short term and long term goals.
Thanks again for your suggestions.
Song
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH bpf-next v4 03/11] libbpf: add flags to umem config
From: Andrii Nakryiko @ 2019-08-02 7:19 UTC (permalink / raw)
To: Björn Töpel
Cc: Kevin Laatz, Netdev, Alexei Starovoitov, Daniel Borkmann,
Björn Töpel, Karlsson, Magnus, Jakub Kicinski,
Jonathan Lemon, Saeed Mahameed, Maxim Mikityanskiy,
Stephen Hemminger, Bruce Richardson, ciara.loftus,
intel-wired-lan, bpf
In-Reply-To: <CAJ+HfNhYe_FgV0tGTLzaFGVSiimVnthgESN8Psdtpxw696w0OQ@mail.gmail.com>
On Thu, Aug 1, 2019 at 12:34 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> On Thu, 1 Aug 2019 at 08:59, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Jul 31, 2019 at 8:21 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
> > >
> > > On Tue, 30 Jul 2019 at 19:43, Kevin Laatz <kevin.laatz@intel.com> wrote:
> > > >
> > > > This patch adds a 'flags' field to the umem_config and umem_reg structs.
> > > > This will allow for more options to be added for configuring umems.
> > > >
> > > > The first use for the flags field is to add a flag for unaligned chunks
> > > > mode. These flags can either be user-provided or filled with a default.
> > > >
> > > > Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> > > > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> > > >
> > > > ---
> > > > v2:
> > > > - Removed the headroom check from this patch. It has moved to the
> > > > previous patch.
> > > >
> > > > v4:
> > > > - modified chunk flag define
> > > > ---
> >
> > [...]
> >
> > > > diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> > > > index 833a6e60d065..44a03d8c34b9 100644
> > > > --- a/tools/lib/bpf/xsk.h
> > > > +++ b/tools/lib/bpf/xsk.h
> > > > @@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
> > > > #define XSK_UMEM__DEFAULT_FRAME_SHIFT 12 /* 4096 bytes */
> > > > #define XSK_UMEM__DEFAULT_FRAME_SIZE (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
> > > > #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
> > > > +#define XSK_UMEM__DEFAULT_FLAGS 0
> > > >
> > > > struct xsk_umem_config {
> > > > __u32 fill_size;
> > > > __u32 comp_size;
> > > > __u32 frame_size;
> > > > __u32 frame_headroom;
> > > > + __u32 flags;
> > >
> > > And the flags addition here, unfortunately, requires symbol versioning
> > > of xsk_umem__create(). That'll be the first in libbpf! :-)
> >
> > xsk_umem_config is passed by pointer to xsk_umem__create(), so this
> > doesn't break ABI, does it?
> >
>
> Old application, dynamically linked to new libbpf.so will crash,
> right? Old application passes old version of xsk_umem_config, and new
> library accesses (non-existing) flag struct member.
I think we have similar problems for all the _xattr type of commands
(as well some of btf stuff accepting extra opts structs). How is this
problem solved in general? Do we version same function multiple times,
one for each added field? It feels like there should be some better
way to handle this...
>
>
> Björn
>
>
> > >
> > >
> > > Björn
> > >
> > > > };
> > > >
> > > > /* Flags for the libbpf_flags field. */
> > > > --
> > > > 2.17.1
> > > >
> > > > _______________________________________________
> > > > Intel-wired-lan mailing list
> > > > Intel-wired-lan@osuosl.org
> > > > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
^ permalink raw reply
* Re: [PATCH v3 bpf-next 02/12] libbpf: implement BPF CO-RE offset relocation algorithm
From: Andrii Nakryiko @ 2019-08-02 7:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
Daniel Borkmann, Yonghong Song, Song Liu, Kernel Team
In-Reply-To: <20190801235030.bzssmwzuvzdy7h7t@ast-mbp.dhcp.thefacebook.com>
On Thu, Aug 1, 2019 at 4:50 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jul 31, 2019 at 11:47:53PM -0700, Andrii Nakryiko wrote:
> > This patch implements the core logic for BPF CO-RE offsets relocations.
> > Every instruction that needs to be relocated has corresponding
> > bpf_offset_reloc as part of BTF.ext. Relocations are performed by trying
> > to match recorded "local" relocation spec against potentially many
> > compatible "target" types, creating corresponding spec. Details of the
> > algorithm are noted in corresponding comments in the code.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > Acked-by: Song Liu <songliubraving@fb.com>
> ...
> > + if (btf_is_composite(t)) {
> > + const struct btf_member *m = (void *)(t + 1);
> > + __u32 offset;
> > +
> > + if (access_idx >= BTF_INFO_VLEN(t->info))
> > + return -EINVAL;
> > +
> > + m = &m[access_idx];
> > +
> > + if (BTF_INFO_KFLAG(t->info)) {
> > + if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
> > + return -EINVAL;
> > + offset = BTF_MEMBER_BIT_OFFSET(m->offset);
> > + } else {
> > + offset = m->offset;
> > + }
>
> very similar logic exists in btf_dump.c
> probably makes sense to make a common helper at some point.
Will add btf_member_bit_offset(type, member) and
btf_member_bit_size(type, member).
>
> > +static size_t bpf_core_essential_name_len(const char *name)
> > +{
> > + size_t n = strlen(name);
> > + int i = n - 3;
> > +
> > + while (i > 0) {
> > + if (name[i] == '_' && name[i + 1] == '_' && name[i + 2] == '_')
> > + return i;
> > + i--;
> > + }
> > + return n;
> > +}
>
> that's a bit of an eye irritant. How about?
> size_t n = strlen(name);
> int i, cnt = 0;
>
> for (i = n - 1; i >= 0; i--) {
> if (name[i] == '_') {
> cnt++;
> } else {
> if (cnt == 3)
> return i + 1;
> cnt = 0;
> }
> }
> return n;
I find this one much harder to read and understand. What's
eye-irritating about that loop?
Your loop will also handle `a____b` differently. My version will
return "a_" as essential name, yours "a____b". Was this intentional on
your part?
I'd rather use this instead, if you hate the first one:
size_t n = strlen(name);
int i;
for (i = n - 3; i > 0; i--) {
if (strncmp(name + i, "___", 3) == 0)
return i;
}
Is this better?
>
> > + case BTF_KIND_ARRAY: {
> > + const struct btf_array *loc_a, *targ_a;
> > +
> > + loc_a = (void *)(local_type + 1);
> > + targ_a = (void *)(targ_type + 1);
> > + local_id = loc_a->type;
> > + targ_id = targ_a->type;
>
> can we add a helper like:
Yes, we can. I was thinking about that, but decided to not expand
patch set. But we do need to extract all those small, but nice
helpers. I'll put them in libbpf_internal.h for now, but I think it
might be good idea to expose them as part of btf.h. Thoughts?
> const struct btf_array *btf_array(cosnt struct btf_type *t)
> {
> return (const struct btf_array *)(t + 1);
> }
>
> then above will be:
> case BTF_KIND_ARRAY: {
> local_id = btf_array(local_type)->type;
> targ_id = btf_array(targ_type)->type;
>
> and a bunch of code in btf.c and btf_dump.c would be cleaner as well?
Yep, some of those are already scattered around btf.c and btf_dump.c,
will clean up and add patch to this patch set.
>
> > + goto recur;
> > + }
> > + default:
> > + pr_warning("unexpected kind %d relocated, local [%d], target [%d]\n",
> > + kind, local_id, targ_id);
> > + return 0;
> > + }
> > +}
> > +
> > +/*
> > + * Given single high-level named field accessor in local type, find
> > + * corresponding high-level accessor for a target type. Along the way,
> > + * maintain low-level spec for target as well. Also keep updating target
> > + * offset.
> > + *
> > + * Searching is performed through recursive exhaustive enumeration of all
> > + * fields of a struct/union. If there are any anonymous (embedded)
> > + * structs/unions, they are recursively searched as well. If field with
> > + * desired name is found, check compatibility between local and target types,
> > + * before returning result.
> > + *
> > + * 1 is returned, if field is found.
> > + * 0 is returned if no compatible field is found.
> > + * <0 is returned on error.
> > + */
> > +static int bpf_core_match_member(const struct btf *local_btf,
> > + const struct bpf_core_accessor *local_acc,
> > + const struct btf *targ_btf,
> > + __u32 targ_id,
> > + struct bpf_core_spec *spec,
> > + __u32 *next_targ_id)
> > +{
> > + const struct btf_type *local_type, *targ_type;
> > + const struct btf_member *local_member, *m;
> > + const char *local_name, *targ_name;
> > + __u32 local_id;
> > + int i, n, found;
> > +
> > + targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
> > + if (!targ_type)
> > + return -EINVAL;
> > + if (!btf_is_composite(targ_type))
> > + return 0;
> > +
> > + local_id = local_acc->type_id;
> > + local_type = btf__type_by_id(local_btf, local_id);
> > + local_member = (void *)(local_type + 1);
> > + local_member += local_acc->idx;
> > + local_name = btf__name_by_offset(local_btf, local_member->name_off);
> > +
> > + n = BTF_INFO_VLEN(targ_type->info);
> > + m = (void *)(targ_type + 1);
>
> new btf_member() helper?
>
> > + for (i = 0; i < n; i++, m++) {
> > + __u32 offset;
> > +
> > + /* bitfield relocations not supported */
> > + if (BTF_INFO_KFLAG(targ_type->info)) {
> > + if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
> > + continue;
> > + offset = BTF_MEMBER_BIT_OFFSET(m->offset);
> > + } else {
> > + offset = m->offset;
> > + }
> > + if (offset % 8)
> > + continue;
>
> same bit of code again?
> definitely could use a helper.
Different handling (continue here, return error above), but can use
those helpers I mentioned above.
>
> > + for (i = 0; i < local_spec->len; i++, local_acc++, targ_acc++) {
> > + targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id,
> > + &targ_id);
> > + if (!targ_type)
> > + return -EINVAL;
> > +
> > + if (local_acc->name) {
> > + matched = bpf_core_match_member(local_spec->btf,
> > + local_acc,
> > + targ_btf, targ_id,
> > + targ_spec, &targ_id);
> > + if (matched <= 0)
> > + return matched;
> > + } else {
> > + /* for i=0, targ_id is already treated as array element
> > + * type (because it's the original struct), for others
> > + * we should find array element type first
> > + */
> > + if (i > 0) {
> > + const struct btf_array *a;
> > +
> > + if (!btf_is_array(targ_type))
> > + return 0;
> > +
> > + a = (void *)(targ_type + 1);
> > + if (local_acc->idx >= a->nelems)
> > + return 0;
>
> am I reading it correctly that the local spec requested out-of-bounds
> index in the target array type?
> Why this is 'ignore' instead of -EINVAL?
Similar to any other mismatch (e.g., int in local type vs int64 in
target type). It just makes candidate not matching. Why would that be
error that will stop the whole relocation and subsequently object
loading process?
>
> > +/*
> > + * Probe few well-known locations for vmlinux kernel image and try to load BTF
> > + * data out of it to use for target BTF.
> > + */
> > +static struct btf *bpf_core_find_kernel_btf(void)
> > +{
> > + const char *locations[] = {
> > + "/lib/modules/%1$s/vmlinux-%1$s",
> > + "/usr/lib/modules/%1$s/kernel/vmlinux",
> > + };
> > + char path[PATH_MAX + 1];
> > + struct utsname buf;
> > + struct btf *btf;
> > + int i, err;
> > +
> > + err = uname(&buf);
> > + if (err) {
> > + pr_warning("failed to uname(): %d\n", err);
>
> defensive programming ?
> I think uname() can fail only if &buf points to non-existing page like null.
I haven't checked source for this syscall, but man page specified that
it might return -1 on error.
>
> > + return ERR_PTR(err);
> > + }
> > +
> > + for (i = 0; i < ARRAY_SIZE(locations); i++) {
> > + snprintf(path, PATH_MAX, locations[i], buf.release);
> > + pr_debug("attempting to load kernel BTF from '%s'\n", path);
>
> I think this debug message would have been more useful after access().
Sure, will move.
>
> > +
> > + if (access(path, R_OK))
> > + continue;
> > +
> > + btf = btf__parse_elf(path, NULL);
> > + if (IS_ERR(btf))
> > + continue;
> > +
> > + pr_debug("successfully loaded kernel BTF from '%s'\n", path);
> > + return btf;
> > + }
> > +
> > + pr_warning("failed to find valid kernel BTF\n");
> > + return ERR_PTR(-ESRCH);
> > +}
> > +
> > +/* Output spec definition in the format:
> > + * [<type-id>] (<type-name>) + <raw-spec> => <offset>@<spec>,
> > + * where <spec> is a C-syntax view of recorded field access, e.g.: x.a[3].b
> > + */
> > +static void bpf_core_dump_spec(int level, const struct bpf_core_spec *spec)
> > +{
> > + const struct btf_type *t;
> > + const char *s;
> > + __u32 type_id;
> > + int i;
> > +
> > + type_id = spec->spec[0].type_id;
> > + t = btf__type_by_id(spec->btf, type_id);
> > + s = btf__name_by_offset(spec->btf, t->name_off);
> > + libbpf_print(level, "[%u] (%s) + ", type_id, s);
>
> imo extra []() don't improve readability of the dump.
[<num>] is the general convention I've been using throughout libbpf to
specify type ID, so I'd rather keep it for consistency. I can drop
parens, though, no problem.
>
> > +
> > + for (i = 0; i < spec->raw_len; i++)
> > + libbpf_print(level, "%d%s", spec->raw_spec[i],
> > + i == spec->raw_len - 1 ? " => " : ":");
> > +
> > + libbpf_print(level, "%u @ &x", spec->offset);
> > +
> > + for (i = 0; i < spec->len; i++) {
> > + if (spec->spec[i].name)
> > + libbpf_print(level, ".%s", spec->spec[i].name);
> > + else
> > + libbpf_print(level, "[%u]", spec->spec[i].idx);
> > + }
> > +
> > +}
> > +
> > +static size_t bpf_core_hash_fn(const void *key, void *ctx)
> > +{
> > + return (size_t)key;
> > +}
> > +
> > +static bool bpf_core_equal_fn(const void *k1, const void *k2, void *ctx)
> > +{
> > + return k1 == k2;
> > +}
> > +
> > +static void *u32_to_ptr(__u32 x)
> > +{
> > + return (void *)(uintptr_t)x;
> > +}
>
> u32 to pointer on 64-bit arch?!
> That surely needs a comment.
I should probably call it u32_to_hash_key() to make it obvious it's
conversion to hashmap's generic `void *` key type.
>
> > +
> > +/*
> > + * CO-RE relocate single instruction.
> > + *
> > + * The outline and important points of the algorithm:
> > + * 1. For given local type, find corresponding candidate target types.
> > + * Candidate type is a type with the same "essential" name, ignoring
> > + * everything after last triple underscore (___). E.g., `sample`,
> > + * `sample___flavor_one`, `sample___flavor_another_one`, are all candidates
> > + * for each other. Names with triple underscore are referred to as
> > + * "flavors" and are useful, among other things, to allow to
> > + * specify/support incompatible variations of the same kernel struct, which
> > + * might differ between different kernel versions and/or build
> > + * configurations.
> > + *
> > + * N.B. Struct "flavors" could be generated by bpftool's BTF-to-C
> > + * converter, when deduplicated BTF of a kernel still contains more than
> > + * one different types with the same name. In that case, ___2, ___3, etc
> > + * are appended starting from second name conflict. But start flavors are
> > + * also useful to be defined "locally", in BPF program, to extract same
> > + * data from incompatible changes between different kernel
> > + * versions/configurations. For instance, to handle field renames between
> > + * kernel versions, one can use two flavors of the struct name with the
> > + * same common name and use conditional relocations to extract that field,
> > + * depending on target kernel version.
>
> there are actual kernel types that have ___ in the name.
> Ex: struct lmc___media
> We probably need to revisit this 'flavor' convention.
There are only these:
- lmc___softc
- lmc___media
- lmc___ctl (all three in drivers/net/wan/lmc/lmc_var.h)
- ____ftrace_##name set of structs
I couldn't come up with anything cleaner-looking. I think we can still
keep ___ convention, but:
1. Match only exactly 3 underscores, delimited by non-underscore from
both sides (so similar to your proposed loop above);
2. We can also try matching candidates assuming full name without
___xxx part removed, in addition to current logic. This seems like an
overkill at this point and unlikely to be useful in practice, so I'd
postpone implementing this until we really need it.
What do you think? Which other convention did you have in mind?
>
> > + for (i = 0, j = 0; i < cand_ids->len; i++) {
> > + cand_id = cand_ids->data[i];
> > + cand_type = btf__type_by_id(targ_btf, cand_id);
> > + cand_name = btf__name_by_offset(targ_btf, cand_type->name_off);
> > +
> > + err = bpf_core_spec_match(&local_spec, targ_btf,
> > + cand_id, &cand_spec);
> > + if (err < 0) {
> > + pr_warning("prog '%s': relo #%d: failed to match spec ",
> > + prog_name, relo_idx);
> > + bpf_core_dump_spec(LIBBPF_WARN, &local_spec);
> > + libbpf_print(LIBBPF_WARN,
> > + " to candidate #%d [%d] (%s): %d\n",
> > + i, cand_id, cand_name, err);
> > + return err;
> > + }
> > + if (err == 0) {
> > + pr_debug("prog '%s': relo #%d: candidate #%d [%d] (%s) doesn't match spec ",
> > + prog_name, relo_idx, i, cand_id, cand_name);
> > + bpf_core_dump_spec(LIBBPF_DEBUG, &local_spec);
> > + libbpf_print(LIBBPF_DEBUG, "\n");
> > + continue;
> > + }
> > +
> > + pr_debug("prog '%s': relo #%d: candidate #%d matched as spec ",
> > + prog_name, relo_idx, i);
>
> did you mention that you're going to make a helper for this debug dumps?
yeah, I added bpf_core_dump_spec(), but I don't know how to shorten
this further... This output is extremely useful to understand what's
happening and will be invaluable when users will inevitably report
confusing behavior in some cases, so I still want to keep it.
>
^ permalink raw reply
* Re: [PATCH 05/14] gpio: lpc32xx: allow building on non-lpc32xx targets
From: Bartosz Golaszewski @ 2019-08-02 7:10 UTC (permalink / raw)
To: Arnd Bergmann
Cc: soc, arm-soc, Vladimir Zapolskiy, Sylvain Lemieux, Russell King,
Gregory Clement, Linus Walleij, Jason Cooper, Andrew Lunn,
Sebastian Hesselbarth, David S. Miller, Greg Kroah-Hartman,
Alan Stern, Guenter Roeck, linux-gpio, netdev, linux-serial,
linux-usb, LINUXWATCHDOG, Lee Jones, LKML
In-Reply-To: <20190731195713.3150463-6-arnd@arndb.de>
śr., 31 lip 2019 o 22:06 Arnd Bergmann <arnd@arndb.de> napisał(a):
>
> The driver uses hardwire MMIO addresses instead of the data
> that is passed in device tree. Change it over to only
> hardcode the register offset values and allow compile-testing.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Hi Arnd,
thanks for working on this.
> ---
> drivers/gpio/Kconfig | 8 +++++
> drivers/gpio/Makefile | 2 +-
> drivers/gpio/gpio-lpc32xx.c | 63 ++++++++++++++++++++++++-------------
> 3 files changed, 50 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
> index bb13c266c329..ae86ee963eae 100644
> --- a/drivers/gpio/Kconfig
> +++ b/drivers/gpio/Kconfig
> @@ -311,6 +311,14 @@ config GPIO_LPC18XX
> Select this option to enable GPIO driver for
> NXP LPC18XX/43XX devices.
>
> +config GPIO_LPC32XX
> + tristate "NXP LPC32XX GPIO support"
> + default ARCH_LPC32XX
> + depends on OF_GPIO && (ARCH_LPC32XX || COMPILE_TEST)
> + help
> + Select this option to enable GPIO driver for
> + NXP LPC32XX devices.
> +
> config GPIO_LYNXPOINT
> tristate "Intel Lynxpoint GPIO support"
> depends on ACPI && X86
> diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
> index a4e91175c708..87d659ae95eb 100644
> --- a/drivers/gpio/Makefile
> +++ b/drivers/gpio/Makefile
> @@ -74,7 +74,7 @@ obj-$(CONFIG_GPIO_LP3943) += gpio-lp3943.o
> obj-$(CONFIG_GPIO_LP873X) += gpio-lp873x.o
> obj-$(CONFIG_GPIO_LP87565) += gpio-lp87565.o
> obj-$(CONFIG_GPIO_LPC18XX) += gpio-lpc18xx.o
> -obj-$(CONFIG_ARCH_LPC32XX) += gpio-lpc32xx.o
> +obj-$(CONFIG_GPIO_LPC32XX) += gpio-lpc32xx.o
> obj-$(CONFIG_GPIO_LYNXPOINT) += gpio-lynxpoint.o
> obj-$(CONFIG_GPIO_MADERA) += gpio-madera.o
> obj-$(CONFIG_GPIO_MAX3191X) += gpio-max3191x.o
> diff --git a/drivers/gpio/gpio-lpc32xx.c b/drivers/gpio/gpio-lpc32xx.c
> index 24885b3db3d5..548f7cb69386 100644
> --- a/drivers/gpio/gpio-lpc32xx.c
> +++ b/drivers/gpio/gpio-lpc32xx.c
> @@ -16,8 +16,7 @@
> #include <linux/platform_device.h>
> #include <linux/module.h>
>
> -#include <mach/hardware.h>
> -#include <mach/platform.h>
> +#define _GPREG(x) (x)
What purpose does this macro serve?
>
> #define LPC32XX_GPIO_P3_INP_STATE _GPREG(0x000)
> #define LPC32XX_GPIO_P3_OUTP_SET _GPREG(0x004)
> @@ -72,12 +71,12 @@
> #define LPC32XX_GPO_P3_GRP (LPC32XX_GPI_P3_GRP + LPC32XX_GPI_P3_MAX)
>
> struct gpio_regs {
> - void __iomem *inp_state;
> - void __iomem *outp_state;
> - void __iomem *outp_set;
> - void __iomem *outp_clr;
> - void __iomem *dir_set;
> - void __iomem *dir_clr;
> + unsigned long inp_state;
> + unsigned long outp_state;
> + unsigned long outp_set;
> + unsigned long outp_clr;
> + unsigned long dir_set;
> + unsigned long dir_clr;
> };
>
> /*
> @@ -167,14 +166,26 @@ struct lpc32xx_gpio_chip {
> struct gpio_regs *gpio_grp;
> };
>
> +void __iomem *gpio_reg_base;
Any reason why this can't be made part of struct lpc32xx_gpio_chip?
> +
> +static inline u32 gpreg_read(unsigned long offset)
Here and elsewhere: could you please keep the lpc32xx_gpio prefix for
all symbols?
> +{
> + return __raw_readl(gpio_reg_base + offset);
> +}
> +
> +static inline void gpreg_write(u32 val, unsigned long offset)
> +{
> + __raw_writel(val, gpio_reg_base + offset);
> +}
> +
> static void __set_gpio_dir_p012(struct lpc32xx_gpio_chip *group,
> unsigned pin, int input)
> {
> if (input)
> - __raw_writel(GPIO012_PIN_TO_BIT(pin),
> + gpreg_write(GPIO012_PIN_TO_BIT(pin),
> group->gpio_grp->dir_clr);
> else
> - __raw_writel(GPIO012_PIN_TO_BIT(pin),
> + gpreg_write(GPIO012_PIN_TO_BIT(pin),
> group->gpio_grp->dir_set);
> }
>
> @@ -184,19 +195,19 @@ static void __set_gpio_dir_p3(struct lpc32xx_gpio_chip *group,
> u32 u = GPIO3_PIN_TO_BIT(pin);
>
> if (input)
> - __raw_writel(u, group->gpio_grp->dir_clr);
> + gpreg_write(u, group->gpio_grp->dir_clr);
> else
> - __raw_writel(u, group->gpio_grp->dir_set);
> + gpreg_write(u, group->gpio_grp->dir_set);
> }
>
> static void __set_gpio_level_p012(struct lpc32xx_gpio_chip *group,
> unsigned pin, int high)
> {
> if (high)
> - __raw_writel(GPIO012_PIN_TO_BIT(pin),
> + gpreg_write(GPIO012_PIN_TO_BIT(pin),
> group->gpio_grp->outp_set);
> else
> - __raw_writel(GPIO012_PIN_TO_BIT(pin),
> + gpreg_write(GPIO012_PIN_TO_BIT(pin),
> group->gpio_grp->outp_clr);
> }
>
> @@ -206,31 +217,31 @@ static void __set_gpio_level_p3(struct lpc32xx_gpio_chip *group,
> u32 u = GPIO3_PIN_TO_BIT(pin);
>
> if (high)
> - __raw_writel(u, group->gpio_grp->outp_set);
> + gpreg_write(u, group->gpio_grp->outp_set);
> else
> - __raw_writel(u, group->gpio_grp->outp_clr);
> + gpreg_write(u, group->gpio_grp->outp_clr);
> }
>
> static void __set_gpo_level_p3(struct lpc32xx_gpio_chip *group,
> unsigned pin, int high)
> {
> if (high)
> - __raw_writel(GPO3_PIN_TO_BIT(pin), group->gpio_grp->outp_set);
> + gpreg_write(GPO3_PIN_TO_BIT(pin), group->gpio_grp->outp_set);
> else
> - __raw_writel(GPO3_PIN_TO_BIT(pin), group->gpio_grp->outp_clr);
> + gpreg_write(GPO3_PIN_TO_BIT(pin), group->gpio_grp->outp_clr);
> }
>
> static int __get_gpio_state_p012(struct lpc32xx_gpio_chip *group,
> unsigned pin)
> {
> - return GPIO012_PIN_IN_SEL(__raw_readl(group->gpio_grp->inp_state),
> + return GPIO012_PIN_IN_SEL(gpreg_read(group->gpio_grp->inp_state),
> pin);
> }
>
> static int __get_gpio_state_p3(struct lpc32xx_gpio_chip *group,
> unsigned pin)
> {
> - int state = __raw_readl(group->gpio_grp->inp_state);
> + int state = gpreg_read(group->gpio_grp->inp_state);
>
> /*
> * P3 GPIO pin input mapping is not contiguous, GPIOP3-0..4 is mapped
> @@ -242,13 +253,13 @@ static int __get_gpio_state_p3(struct lpc32xx_gpio_chip *group,
> static int __get_gpi_state_p3(struct lpc32xx_gpio_chip *group,
> unsigned pin)
> {
> - return GPI3_PIN_IN_SEL(__raw_readl(group->gpio_grp->inp_state), pin);
> + return GPI3_PIN_IN_SEL(gpreg_read(group->gpio_grp->inp_state), pin);
> }
>
> static int __get_gpo_state_p3(struct lpc32xx_gpio_chip *group,
> unsigned pin)
> {
> - return GPO3_PIN_IN_SEL(__raw_readl(group->gpio_grp->outp_state), pin);
> + return GPO3_PIN_IN_SEL(gpreg_read(group->gpio_grp->outp_state), pin);
> }
>
> /*
> @@ -498,6 +509,10 @@ static int lpc32xx_gpio_probe(struct platform_device *pdev)
> {
> int i;
>
> + gpio_reg_base = devm_platform_ioremap_resource(pdev, 0);
> + if (gpio_reg_base)
> + return -ENXIO;
> +
> for (i = 0; i < ARRAY_SIZE(lpc32xx_gpiochip); i++) {
> if (pdev->dev.of_node) {
> lpc32xx_gpiochip[i].chip.of_xlate = lpc32xx_of_xlate;
> @@ -527,3 +542,7 @@ static struct platform_driver lpc32xx_gpio_driver = {
> };
>
> module_platform_driver(lpc32xx_gpio_driver);
> +
> +MODULE_AUTHOR("Kevin Wells <kevin.wells@nxp.com>");
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("GPIO driver for LPC32xx SoC");
> --
> 2.20.0
>
Bart
^ permalink raw reply
* Re: [PATCH 16/34] drivers/tee: convert put_page() to put_user_page*()
From: Jens Wiklander @ 2019-08-02 6:29 UTC (permalink / raw)
To: john.hubbard
Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
Dave Hansen, Ira Weiny, Jan Kara, Jason Gunthorpe,
Jérôme Glisse, LKML, amd-gfx, ceph-devel, devel, devel,
dri-devel, intel-gfx, kvm, Linux ARM, linux-block,
open list:HARDWARE RANDOM NUMBER GENERATOR CORE, linux-fbdev,
linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
xen-devel, John Hubbard
In-Reply-To: <20190802022005.5117-17-jhubbard@nvidia.com>
On Fri, Aug 2, 2019 at 4:20 AM <john.hubbard@gmail.com> wrote:
>
> From: John Hubbard <jhubbard@nvidia.com>
>
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
>
> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
>
> Cc: Jens Wiklander <jens.wiklander@linaro.org>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
> drivers/tee/tee_shm.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
Acked-by: Jens Wiklander <jens.wiklander@linaro.org>
I suppose you're taking this via your own tree or such.
Thanks,
Jens
>
> diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
> index 2da026fd12c9..c967d0420b67 100644
> --- a/drivers/tee/tee_shm.c
> +++ b/drivers/tee/tee_shm.c
> @@ -31,16 +31,13 @@ static void tee_shm_release(struct tee_shm *shm)
>
> poolm->ops->free(poolm, shm);
> } else if (shm->flags & TEE_SHM_REGISTER) {
> - size_t n;
> int rc = teedev->desc->ops->shm_unregister(shm->ctx, shm);
>
> if (rc)
> dev_err(teedev->dev.parent,
> "unregister shm %p failed: %d", shm, rc);
>
> - for (n = 0; n < shm->num_pages; n++)
> - put_page(shm->pages[n]);
> -
> + put_user_pages(shm->pages, shm->num_pages);
> kfree(shm->pages);
> }
>
> @@ -313,16 +310,13 @@ struct tee_shm *tee_shm_register(struct tee_context *ctx, unsigned long addr,
> return shm;
> err:
> if (shm) {
> - size_t n;
> -
> if (shm->id >= 0) {
> mutex_lock(&teedev->mutex);
> idr_remove(&teedev->idr, shm->id);
> mutex_unlock(&teedev->mutex);
> }
> if (shm->pages) {
> - for (n = 0; n < shm->num_pages; n++)
> - put_page(shm->pages[n]);
> + put_user_pages(shm->pages, shm->num_pages);
> kfree(shm->pages);
> }
> }
> --
> 2.22.0
>
^ permalink raw reply
* Re: [v2,0/2] tools: bpftool: add net attach/detach command to attach XDP prog
From: Y Song @ 2019-08-02 6:26 UTC (permalink / raw)
To: Daniel T. Lee; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190801081133.13200-1-danieltimlee@gmail.com>
On Thu, Aug 1, 2019 at 1:33 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
>
> Currently, bpftool net only supports dumping progs attached on the
> interface. To attach XDP prog on interface, user must use other tool
> (eg. iproute2). By this patch, with `bpftool net attach/detach`, user
> can attach/detach XDP prog on interface.
>
> $ ./bpftool prog
> ...
> 208: xdp name xdp_prog1 tag ad822e38b629553f gpl
> loaded_at 2019-07-28T18:03:11+0900 uid 0
> ...
> $ ./bpftool net attach id 208 xdpdrv enp6s0np1
> $ ./bpftool net
> xdp:
> enp6s0np1(5) driver id 208
> ...
> $ ./bpftool net detach xdpdrv enp6s0np1
> $ ./bpftool net
> xdp:
> ...
>
> While this patch only contains support for XDP, through `net
> attach/detach`, bpftool can further support other prog attach types.
new bpftool net subcommands are added. I guess doc and bash
completion needs update as well.
>
> XDP attach/detach tested on Mellanox ConnectX-4 and Netronome Agilio.
>
> ---
> Changes in v2:
> - command 'load/unload' changed to 'attach/detach' for the consistency
>
> Daniel T. Lee (2):
> tools: bpftool: add net attach command to attach XDP on interface
> tools: bpftool: add net detach command to detach XDP on interface
>
> tools/bpf/bpftool/net.c | 160 +++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 159 insertions(+), 1 deletion(-)
>
> --
> 2.20.1
>
^ permalink raw reply
* Re: [v2,2/2] tools: bpftool: add net detach command to detach XDP on interface
From: Y Song @ 2019-08-02 6:25 UTC (permalink / raw)
To: Daniel T. Lee; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190801081133.13200-3-danieltimlee@gmail.com>
On Thu, Aug 1, 2019 at 2:04 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
>
> By this commit, using `bpftool net detach`, the attached XDP prog can
> be detached. Detaching the BPF prog will be done through libbpf
> 'bpf_set_link_xdp_fd' with the progfd set to -1.
>
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> ---
> Changes in v2:
> - command 'unload' changed to 'detach' for the consistency
>
> tools/bpf/bpftool/net.c | 55 ++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> index f3b57660b303..2ae9a613b05c 100644
> --- a/tools/bpf/bpftool/net.c
> +++ b/tools/bpf/bpftool/net.c
> @@ -281,6 +281,31 @@ static int parse_attach_args(int argc, char **argv, int *progfd,
> return 0;
> }
>
> +static int parse_detach_args(int argc, char **argv,
> + enum net_attach_type *attach_type, int *ifindex)
> +{
> + if (!REQ_ARGS(2))
> + return -EINVAL;
> +
> + *attach_type = parse_attach_type(*argv);
> + if (*attach_type == __MAX_NET_ATTACH_TYPE) {
> + p_err("invalid net attach/detach type");
> + return -EINVAL;
> + }
> +
> + NEXT_ARG();
> + if (!REQ_ARGS(1))
> + return -EINVAL;
> +
> + *ifindex = if_nametoindex(*argv);
> + if (!*ifindex) {
> + p_err("Invalid ifname");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> static int do_attach_detach_xdp(int *progfd, enum net_attach_type *attach_type,
> int *ifindex)
> {
> @@ -323,6 +348,31 @@ static int do_attach(int argc, char **argv)
> return 0;
> }
>
> +static int do_detach(int argc, char **argv)
> +{
> + enum net_attach_type attach_type;
> + int err, progfd, ifindex;
> +
> + err = parse_detach_args(argc, argv, &attach_type, &ifindex);
> + if (err)
> + return err;
> +
> + /* to detach xdp prog */
> + progfd = -1;
> + if (is_prefix("xdp", attach_type_strings[attach_type]))
> + err = do_attach_detach_xdp(&progfd, &attach_type, &ifindex);
Similar to previous patch, parameters no need to be pointer.
> +
> + if (err < 0) {
> + p_err("link set %s failed", attach_type_strings[attach_type]);
> + return -1;
Maybe "return err"?
> + }
> +
> + if (json_output)
> + jsonw_null(json_wtr);
> +
> + return 0;
> +}
> +
> static int do_show(int argc, char **argv)
> {
> struct bpf_attach_info attach_info = {};
> @@ -406,6 +456,7 @@ static int do_help(int argc, char **argv)
> fprintf(stderr,
> "Usage: %s %s { show | list } [dev <devname>]\n"
> " %s %s attach PROG LOAD_TYPE <devname>\n"
> + " %s %s detach LOAD_TYPE <devname>\n"
> " %s %s help\n"
> "\n"
> " " HELP_SPEC_PROGRAM "\n"
> @@ -415,7 +466,8 @@ static int do_help(int argc, char **argv)
> " to dump program attachments. For program types\n"
> " sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
> " consult iproute2.\n",
> - bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
> + bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
> + bin_name, argv[-2]);
>
> return 0;
> }
> @@ -424,6 +476,7 @@ static const struct cmd cmds[] = {
> { "show", do_show },
> { "list", do_show },
> { "attach", do_attach },
> + { "detach", do_detach },
> { "help", do_help },
> { 0 }
> };
> --
> 2.20.1
>
^ permalink raw reply
* Re: [v2,1/2] tools: bpftool: add net attach command to attach XDP on interface
From: Y Song @ 2019-08-02 6:23 UTC (permalink / raw)
To: Daniel T. Lee; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190801081133.13200-2-danieltimlee@gmail.com>
On Thu, Aug 1, 2019 at 2:04 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
>
> By this commit, using `bpftool net attach`, user can attach XDP prog on
> interface. New type of enum 'net_attach_type' has been made, as stated at
> cover-letter, the meaning of 'attach' is, prog will be attached on interface.
>
> BPF prog will be attached through libbpf 'bpf_set_link_xdp_fd'.
>
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> ---
> Changes in v2:
> - command 'load' changed to 'attach' for the consistency
> - 'NET_ATTACH_TYPE_XDP_DRIVE' changed to 'NET_ATTACH_TYPE_XDP_DRIVER'
>
> tools/bpf/bpftool/net.c | 107 +++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 106 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> index 67e99c56bc88..f3b57660b303 100644
> --- a/tools/bpf/bpftool/net.c
> +++ b/tools/bpf/bpftool/net.c
> @@ -55,6 +55,35 @@ struct bpf_attach_info {
> __u32 flow_dissector_id;
> };
>
> +enum net_attach_type {
> + NET_ATTACH_TYPE_XDP,
> + NET_ATTACH_TYPE_XDP_GENERIC,
> + NET_ATTACH_TYPE_XDP_DRIVER,
> + NET_ATTACH_TYPE_XDP_OFFLOAD,
> + __MAX_NET_ATTACH_TYPE
> +};
> +
> +static const char * const attach_type_strings[] = {
> + [NET_ATTACH_TYPE_XDP] = "xdp",
> + [NET_ATTACH_TYPE_XDP_GENERIC] = "xdpgeneric",
> + [NET_ATTACH_TYPE_XDP_DRIVER] = "xdpdrv",
> + [NET_ATTACH_TYPE_XDP_OFFLOAD] = "xdpoffload",
> + [__MAX_NET_ATTACH_TYPE] = NULL,
> +};
> +
> +static enum net_attach_type parse_attach_type(const char *str)
> +{
> + enum net_attach_type type;
> +
> + for (type = 0; type < __MAX_NET_ATTACH_TYPE; type++) {
> + if (attach_type_strings[type] &&
> + is_prefix(str, attach_type_strings[type]))
> + return type;
> + }
> +
> + return __MAX_NET_ATTACH_TYPE;
> +}
> +
> static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
> {
> struct bpf_netdev_t *netinfo = cookie;
> @@ -223,6 +252,77 @@ static int query_flow_dissector(struct bpf_attach_info *attach_info)
> return 0;
> }
>
> +static int parse_attach_args(int argc, char **argv, int *progfd,
> + enum net_attach_type *attach_type, int *ifindex)
> +{
> + if (!REQ_ARGS(3))
> + return -EINVAL;
> +
> + *progfd = prog_parse_fd(&argc, &argv);
> + if (*progfd < 0)
> + return *progfd;
> +
> + *attach_type = parse_attach_type(*argv);
> + if (*attach_type == __MAX_NET_ATTACH_TYPE) {
> + p_err("invalid net attach/detach type");
> + return -EINVAL;
> + }
> +
> + NEXT_ARG();
> + if (!REQ_ARGS(1))
> + return -EINVAL;
> +
> + *ifindex = if_nametoindex(*argv);
> + if (!*ifindex) {
> + p_err("Invalid ifname");
Do you want to use the full function name "invalid if_nametoindex" here?
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int do_attach_detach_xdp(int *progfd, enum net_attach_type *attach_type,
> + int *ifindex)
You can just use plain int as the argument type here.
> +{
> + __u32 flags;
> + int err;
> +
> + flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> + if (*attach_type == NET_ATTACH_TYPE_XDP_GENERIC)
> + flags |= XDP_FLAGS_SKB_MODE;
> + if (*attach_type == NET_ATTACH_TYPE_XDP_DRIVER)
> + flags |= XDP_FLAGS_DRV_MODE;
> + if (*attach_type == NET_ATTACH_TYPE_XDP_OFFLOAD)
> + flags |= XDP_FLAGS_HW_MODE;
> +
> + err = bpf_set_link_xdp_fd(*ifindex, *progfd, flags);
> +
> + return err;
Just do "return bpf_set_link_xdp_fd(...)" and you do not need variable err.
> +}
> +
> +static int do_attach(int argc, char **argv)
> +{
> + enum net_attach_type attach_type;
> + int err, progfd, ifindex;
> +
> + err = parse_attach_args(argc, argv, &progfd, &attach_type, &ifindex);
> + if (err)
> + return err;
> +
> + if (is_prefix("xdp", attach_type_strings[attach_type]))
> + err = do_attach_detach_xdp(&progfd, &attach_type, &ifindex);
> +
> + if (err < 0) {
> + p_err("link set %s failed", attach_type_strings[attach_type]);
> + return -1;
> + }
The above "if (err < 0)" can be true only under the above "if (is_prefix(..))"
conditions. But compiler may optimize this. So I think current form is okay.
But could you change the return value to "return err" instead of "return -1"?
> +
> + if (json_output)
> + jsonw_null(json_wtr);
> +
> + return 0;
> +}
> +
> static int do_show(int argc, char **argv)
> {
> struct bpf_attach_info attach_info = {};
> @@ -305,13 +405,17 @@ static int do_help(int argc, char **argv)
>
> fprintf(stderr,
> "Usage: %s %s { show | list } [dev <devname>]\n"
> + " %s %s attach PROG LOAD_TYPE <devname>\n"
> " %s %s help\n"
> + "\n"
> + " " HELP_SPEC_PROGRAM "\n"
> + " LOAD_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
> "Note: Only xdp and tc attachments are supported now.\n"
> " For progs attached to cgroups, use \"bpftool cgroup\"\n"
> " to dump program attachments. For program types\n"
> " sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
> " consult iproute2.\n",
> - bin_name, argv[-2], bin_name, argv[-2]);
> + bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
>
> return 0;
> }
> @@ -319,6 +423,7 @@ static int do_help(int argc, char **argv)
> static const struct cmd cmds[] = {
> { "show", do_show },
> { "list", do_show },
> + { "attach", do_attach },
> { "help", do_help },
> { 0 }
> };
> --
> 2.20.1
>
^ permalink raw reply
* [PATCH net-next] cnic: Explicitly initialize all reference counts to 0.
From: Michael Chan @ 2019-08-02 6:17 UTC (permalink / raw)
To: davem; +Cc: netdev, hslester96, Rasesh Mody, GR-Linux-NIC-Dev
The driver is relying on zero'ed allocated memory and does not
explicitly call atomic_set() to initialize the ref counts to 0. Add
these atomic_set() calls so that it will be more straight forward
to convert atomic ref counts to refcount_t.
Reported-by: Chuhong Yuan <hslester96@gmail.com>
Cc: Rasesh Mody <rmody@marvell.com>
Cc: <GR-Linux-NIC-Dev@marvell.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/cnic.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 57dc3cb..155599d 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -4096,12 +4096,16 @@ static int cnic_cm_alloc_mem(struct cnic_dev *dev)
{
struct cnic_local *cp = dev->cnic_priv;
u32 port_id;
+ int i;
cp->csk_tbl = kvcalloc(MAX_CM_SK_TBL_SZ, sizeof(struct cnic_sock),
GFP_KERNEL);
if (!cp->csk_tbl)
return -ENOMEM;
+ for (i = 0; i < MAX_CM_SK_TBL_SZ; i++)
+ atomic_set(&cp->csk_tbl[i].ref_count, 0);
+
port_id = prandom_u32();
port_id %= CNIC_LOCAL_PORT_RANGE;
if (cnic_init_id_tbl(&cp->csk_port_tbl, CNIC_LOCAL_PORT_RANGE,
@@ -5480,6 +5484,7 @@ static struct cnic_dev *cnic_alloc_dev(struct net_device *dev,
cdev->unregister_device = cnic_unregister_device;
cdev->iscsi_nl_msg_recv = cnic_iscsi_nl_msg_recv;
cdev->get_fc_npiv_tbl = cnic_get_fc_npiv_tbl;
+ atomic_set(&cdev->ref_count, 0);
cp = cdev->cnic_priv;
cp->dev = cdev;
--
2.5.1
^ permalink raw reply related
* Re: [PATCH 20/34] xen: convert put_page() to put_user_page*()
From: Juergen Gross @ 2019-08-02 6:10 UTC (permalink / raw)
To: John Hubbard, john.hubbard, Andrew Morton
Cc: devel, Dave Chinner, Christoph Hellwig, Dan Williams, Ira Weiny,
x86, linux-mm, Dave Hansen, amd-gfx, dri-devel, intel-gfx,
linux-arm-kernel, linux-rpi-kernel, devel, xen-devel,
Boris Ostrovsky, rds-devel, Jérôme Glisse, Jan Kara,
ceph-devel, kvm, linux-block, linux-crypto, linux-fbdev,
linux-fsdevel, LKML, linux-media, linux-nfs, linux-rdma,
linux-xfs, netdev, sparclinux, Jason Gunthorpe
In-Reply-To: <d5140833-e9ee-beb5-ff0a-2d13a4fe819f@nvidia.com>
[-- Attachment #1: Type: text/plain, Size: 2441 bytes --]
On 02.08.19 07:48, John Hubbard wrote:
> On 8/1/19 9:36 PM, Juergen Gross wrote:
>> On 02.08.19 04:19, john.hubbard@gmail.com wrote:
>>> From: John Hubbard <jhubbard@nvidia.com>
> ...
>>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>>> index 2f5ce7230a43..29e461dbee2d 100644
>>> --- a/drivers/xen/privcmd.c
>>> +++ b/drivers/xen/privcmd.c
>>> @@ -611,15 +611,10 @@ static int lock_pages(
>>> static void unlock_pages(struct page *pages[], unsigned int nr_pages)
>>> {
>>> - unsigned int i;
>>> -
>>> if (!pages)
>>> return;
>>> - for (i = 0; i < nr_pages; i++) {
>>> - if (pages[i])
>>> - put_page(pages[i]);
>>> - }
>>> + put_user_pages(pages, nr_pages);
>>
>> You are not handling the case where pages[i] is NULL here. Or am I
>> missing a pending patch to put_user_pages() here?
>>
>
> Hi Juergen,
>
> You are correct--this no longer handles the cases where pages[i]
> is NULL. It's intentional, though possibly wrong. :)
>
> I see that I should have added my standard blurb to this
> commit description. I missed this one, but some of the other patches
> have it. It makes the following, possibly incorrect claim:
>
> "This changes the release code slightly, because each page slot in the
> page_list[] array is no longer checked for NULL. However, that check
> was wrong anyway, because the get_user_pages() pattern of usage here
> never allowed for NULL entries within a range of pinned pages."
>
> The way I've seen these page arrays used with get_user_pages(),
> things are either done single page, or with a contiguous range. So
> unless I'm missing a case where someone is either
>
> a) releasing individual pages within a range (and thus likely messing
> up their count of pages they have), or
>
> b) allocating two gup ranges within the same pages[] array, with a
> gap between the allocations,
>
> ...then it should be correct. If so, then I'll add the above blurb
> to this patch's commit description.
>
> If that's not the case (both here, and in 3 or 4 other patches in this
> series, then as you said, I should add NULL checks to put_user_pages()
> and put_user_pages_dirty_lock().
In this case it is not correct, but can easily be handled. The NULL case
can occur only in an error case with the pages array filled partially or
not at all.
I'd prefer something like the attached patch here.
Juergen
[-- Attachment #2: gup.patch --]
[-- Type: text/x-patch, Size: 2001 bytes --]
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 2f5ce7230a43..12bd3154126d 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -582,10 +582,11 @@ static long privcmd_ioctl_mmap_batch(
static int lock_pages(
struct privcmd_dm_op_buf kbufs[], unsigned int num,
- struct page *pages[], unsigned int nr_pages)
+ struct page *pages[], unsigned int *nr_pages)
{
- unsigned int i;
+ unsigned int i, free = *nr_pages;
+ *nr_pages = 0;
for (i = 0; i < num; i++) {
unsigned int requested;
int pinned;
@@ -593,35 +594,22 @@ static int lock_pages(
requested = DIV_ROUND_UP(
offset_in_page(kbufs[i].uptr) + kbufs[i].size,
PAGE_SIZE);
- if (requested > nr_pages)
+ if (requested > free)
return -ENOSPC;
pinned = get_user_pages_fast(
(unsigned long) kbufs[i].uptr,
- requested, FOLL_WRITE, pages);
+ requested, FOLL_WRITE, pages + *nr_pages);
if (pinned < 0)
return pinned;
- nr_pages -= pinned;
- pages += pinned;
+ free -= pinned;
+ *nr_pages += pinned;
}
return 0;
}
-static void unlock_pages(struct page *pages[], unsigned int nr_pages)
-{
- unsigned int i;
-
- if (!pages)
- return;
-
- for (i = 0; i < nr_pages; i++) {
- if (pages[i])
- put_page(pages[i]);
- }
-}
-
static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
{
struct privcmd_data *data = file->private_data;
@@ -681,11 +669,12 @@ static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
xbufs = kcalloc(kdata.num, sizeof(*xbufs), GFP_KERNEL);
if (!xbufs) {
+ nr_pages = 0;
rc = -ENOMEM;
goto out;
}
- rc = lock_pages(kbufs, kdata.num, pages, nr_pages);
+ rc = lock_pages(kbufs, kdata.num, pages, &nr_pages);
if (rc)
goto out;
@@ -699,7 +688,8 @@ static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
xen_preemptible_hcall_end();
out:
- unlock_pages(pages, nr_pages);
+ if (pages)
+ put_user_pages(pages, nr_pages);
kfree(xbufs);
kfree(pages);
kfree(kbufs);
^ permalink raw reply related
* Re: [PATCH 20/34] xen: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-08-02 5:48 UTC (permalink / raw)
To: Juergen Gross, john.hubbard, Andrew Morton
Cc: devel, Dave Chinner, Christoph Hellwig, Dan Williams, Ira Weiny,
x86, linux-mm, Dave Hansen, amd-gfx, dri-devel, intel-gfx,
linux-arm-kernel, linux-rpi-kernel, devel, xen-devel,
Boris Ostrovsky, rds-devel, Jérôme Glisse, Jan Kara,
ceph-devel, kvm, linux-block, linux-crypto, linux-fbdev,
linux-fsdevel, LKML, linux-media, linux-nfs, linux-rdma,
linux-xfs, netdev, sparclinux, Jason Gunthorpe
In-Reply-To: <4471e9dc-a315-42c1-0c3c-55ba4eeeb106@suse.com>
On 8/1/19 9:36 PM, Juergen Gross wrote:
> On 02.08.19 04:19, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
...
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index 2f5ce7230a43..29e461dbee2d 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
>> @@ -611,15 +611,10 @@ static int lock_pages(
>> static void unlock_pages(struct page *pages[], unsigned int nr_pages)
>> {
>> - unsigned int i;
>> -
>> if (!pages)
>> return;
>> - for (i = 0; i < nr_pages; i++) {
>> - if (pages[i])
>> - put_page(pages[i]);
>> - }
>> + put_user_pages(pages, nr_pages);
>
> You are not handling the case where pages[i] is NULL here. Or am I
> missing a pending patch to put_user_pages() here?
>
Hi Juergen,
You are correct--this no longer handles the cases where pages[i]
is NULL. It's intentional, though possibly wrong. :)
I see that I should have added my standard blurb to this
commit description. I missed this one, but some of the other patches
have it. It makes the following, possibly incorrect claim:
"This changes the release code slightly, because each page slot in the
page_list[] array is no longer checked for NULL. However, that check
was wrong anyway, because the get_user_pages() pattern of usage here
never allowed for NULL entries within a range of pinned pages."
The way I've seen these page arrays used with get_user_pages(),
things are either done single page, or with a contiguous range. So
unless I'm missing a case where someone is either
a) releasing individual pages within a range (and thus likely messing
up their count of pages they have), or
b) allocating two gup ranges within the same pages[] array, with a
gap between the allocations,
...then it should be correct. If so, then I'll add the above blurb
to this patch's commit description.
If that's not the case (both here, and in 3 or 4 other patches in this
series, then as you said, I should add NULL checks to put_user_pages()
and put_user_pages_dirty_lock().
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply
* Re: Slowness forming TIPC cluster with explicit node addresses
From: Chris Packham @ 2019-08-02 5:11 UTC (permalink / raw)
To: jon.maloy@ericsson.com, tipc-discussion@lists.sourceforge.net
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1564347861.9737.25.camel@alliedtelesis.co.nz>
On Mon, 2019-07-29 at 09:04 +1200, Chris Packham wrote:
> On Fri, 2019-07-26 at 13:31 +0000, Jon Maloy wrote:
> >
> >
> > >
> > >
> > > -----Original Message-----
> > > From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org>
> > > On
> > > Behalf Of Chris Packham
> > > Sent: 25-Jul-19 19:37
> > > To: tipc-discussion@lists.sourceforge.net
> > > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> > > Subject: Slowness forming TIPC cluster with explicit node
> > > addresses
> > >
> > > Hi,
> > >
> > > I'm having problems forming a TIPC cluster between 2 nodes.
> > >
> > > This is the basic steps I'm going through on each node.
> > >
> > > modprobe tipc
> > > ip link set eth2 up
> > > tipc node set addr 1.1.5 # or 1.1.6
> > > tipc bearer enable media eth dev eth0
> > eth2, I assume...
> >
> Yes sorry I keep switching between between Ethernet ports for testing
> so I hand edited the email.
>
> >
> > >
> > >
> > >
> > > Then to confirm if the cluster is formed I use tipc link list
> > >
> > > [root@node-5 ~]# tipc link list
> > > broadcast-link: up
> > > ...
> > >
> > > Looking at tcpdump the two nodes are sending packets
> > >
> > > 22:30:05.782320 TIPC v2.0 1.1.5 > 0.0.0, headerlength 60 bytes,
> > > MessageSize
> > > 76 bytes, Neighbor Detection Protocol internal, messageType Link
> > > request
> > > 22:30:05.863555 TIPC v2.0 1.1.6 > 0.0.0, headerlength 60 bytes,
> > > MessageSize
> > > 76 bytes, Neighbor Detection Protocol internal, messageType Link
> > > request
> > >
> > > Eventually (after a few minutes) the link does come up
> > >
> > > [root@node-6 ~]# tipc link list
> > > broadcast-link: up
> > > 1001006:eth2-1001005:eth2: up
> > >
> > > [root@node-5 ~]# tipc link list
> > > broadcast-link: up
> > > 1001005:eth2-1001006:eth2: up
> > >
> > > When I remove the "tipc node set addr" things seem to kick into
> > > life straight
> > > away
> > >
> > > [root@node-5 ~]# tipc link list
> > > broadcast-link: up
> > > 0050b61bd2aa:eth2-0050b61e6dfa:eth2: up
> > >
> > > So there appears to be some difference in behaviour between
> > > having
> > > an
> > > explicit node address and using the default. Unfortunately our
> > > application
> > > relies on setting the node addresses.
> > I do this many times a day, without any problems. If there would be
> > any time difference, I would expect the 'auto configurable' version
> > to be slower, because it involves a DAD step.
> > Are you sure you don't have any other nodes running in your system?
> >
> > ///jon
> >
> Nope the two nodes are connected back to back. Does the number of
> Ethernet interfaces make a difference? As you can see I've got 3 on
> each node. One is completely disconnected, one is for booting over
> TFTP
> (only used by U-boot) and the other is the USB Ethernet I'm using
> for
> testing.
>
So I can still reproduce this on nodes that only have one network
interface and are the only things connected.
I did find one thing that helps
diff --git a/net/tipc/discover.c b/net/tipc/discover.c
index c138d68e8a69..49921dad404a 100644
--- a/net/tipc/discover.c
+++ b/net/tipc/discover.c
@@ -358,10 +358,10 @@ int tipc_disc_create(struct net *net, struct
tipc_bearer *b,
tipc_disc_init_msg(net, d->skb, DSC_REQ_MSG, b);
/* Do we need an address trial period first ? */
- if (!tipc_own_addr(net)) {
+// if (!tipc_own_addr(net)) {
tn->addr_trial_end = jiffies + msecs_to_jiffies(1000);
msg_set_type(buf_msg(d->skb), DSC_TRIAL_MSG);
- }
+// }
memcpy(&d->dest, dest, sizeof(*dest));
d->net = net;
d->bearer_id = b->identity;
I think because with pre-configured addresses the duplicate address
detection is skipped the shorter init phase is skipped. Would is make
sense to unconditionally do the trial step? Or is there some better way
to get things to transition with pre-assigned addresses.
^ permalink raw reply related
* Re: [v2,0/2] tools: bpftool: add net attach/detach command to attach XDP prog
From: Daniel T. Lee @ 2019-08-02 5:07 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190801162129.0a775fde@cakuba.netronome.com>
Thank you for letting me know.
I will add to next version of patch.
And, thank you for the detailed review. :)
On Fri, Aug 2, 2019 at 8:21 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Thu, 1 Aug 2019 17:11:31 +0900, Daniel T. Lee wrote:
> > Currently, bpftool net only supports dumping progs attached on the
> > interface. To attach XDP prog on interface, user must use other tool
> > (eg. iproute2). By this patch, with `bpftool net attach/detach`, user
> > can attach/detach XDP prog on interface.
> >
> > $ ./bpftool prog
> > ...
> > 208: xdp name xdp_prog1 tag ad822e38b629553f gpl
> > loaded_at 2019-07-28T18:03:11+0900 uid 0
> > ...
> > $ ./bpftool net attach id 208 xdpdrv enp6s0np1
> > $ ./bpftool net
> > xdp:
> > enp6s0np1(5) driver id 208
> > ...
> > $ ./bpftool net detach xdpdrv enp6s0np1
> > $ ./bpftool net
> > xdp:
> > ...
> >
> > While this patch only contains support for XDP, through `net
> > attach/detach`, bpftool can further support other prog attach types.
> >
> > XDP attach/detach tested on Mellanox ConnectX-4 and Netronome Agilio.
>
> Please provide documentation for man pages, and bash completions.
^ permalink raw reply
* Re: [v2,1/2] tools: bpftool: add net attach command to attach XDP on interface
From: Daniel T. Lee @ 2019-08-02 5:02 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190801163638.71700f6d@cakuba.netronome.com>
On Fri, Aug 2, 2019 at 8:36 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Thu, 1 Aug 2019 17:11:32 +0900, Daniel T. Lee wrote:
> > By this commit, using `bpftool net attach`, user can attach XDP prog on
> > interface. New type of enum 'net_attach_type' has been made, as stated at
> > cover-letter, the meaning of 'attach' is, prog will be attached on interface.
> >
> > BPF prog will be attached through libbpf 'bpf_set_link_xdp_fd'.
> >
> > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > ---
> > Changes in v2:
> > - command 'load' changed to 'attach' for the consistency
> > - 'NET_ATTACH_TYPE_XDP_DRIVE' changed to 'NET_ATTACH_TYPE_XDP_DRIVER'
> >
> > tools/bpf/bpftool/net.c | 107 +++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 106 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> > index 67e99c56bc88..f3b57660b303 100644
> > --- a/tools/bpf/bpftool/net.c
> > +++ b/tools/bpf/bpftool/net.c
> > @@ -55,6 +55,35 @@ struct bpf_attach_info {
> > __u32 flow_dissector_id;
> > };
> >
> > +enum net_attach_type {
> > + NET_ATTACH_TYPE_XDP,
> > + NET_ATTACH_TYPE_XDP_GENERIC,
> > + NET_ATTACH_TYPE_XDP_DRIVER,
> > + NET_ATTACH_TYPE_XDP_OFFLOAD,
> > + __MAX_NET_ATTACH_TYPE
> > +};
> > +
> > +static const char * const attach_type_strings[] = {
> > + [NET_ATTACH_TYPE_XDP] = "xdp",
> > + [NET_ATTACH_TYPE_XDP_GENERIC] = "xdpgeneric",
> > + [NET_ATTACH_TYPE_XDP_DRIVER] = "xdpdrv",
> > + [NET_ATTACH_TYPE_XDP_OFFLOAD] = "xdpoffload",
> > + [__MAX_NET_ATTACH_TYPE] = NULL,
>
> Not sure if the terminator is necessary,
> ARRAY_SIZE(attach_type_strings) should suffice?
Yes, ARRAY_SIZE is fine though. But I was just trying to make below
'parse_attach_type' consistent with 'parse_attach_type' from the 'prog.c'.
At 'prog.c', It has same terminator at 'attach_type_strings'.
Should I change it or keep it?
> > +};
> > +
> > +static enum net_attach_type parse_attach_type(const char *str)
> > +{
> > + enum net_attach_type type;
> > +
> > + for (type = 0; type < __MAX_NET_ATTACH_TYPE; type++) {
> > + if (attach_type_strings[type] &&
> > + is_prefix(str, attach_type_strings[type]))
> > + return type;
> > + }
> > +
> > + return __MAX_NET_ATTACH_TYPE;
> > +}
> > +
> > static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
> > {
> > struct bpf_netdev_t *netinfo = cookie;
> > @@ -223,6 +252,77 @@ static int query_flow_dissector(struct bpf_attach_info *attach_info)
> > return 0;
> > }
> >
> > +static int parse_attach_args(int argc, char **argv, int *progfd,
> > + enum net_attach_type *attach_type, int *ifindex)
> > +{
> > + if (!REQ_ARGS(3))
> > + return -EINVAL;
> > +
> > + *progfd = prog_parse_fd(&argc, &argv);
> > + if (*progfd < 0)
> > + return *progfd;
> > +
> > + *attach_type = parse_attach_type(*argv);
> > + if (*attach_type == __MAX_NET_ATTACH_TYPE) {
> > + p_err("invalid net attach/detach type");
> > + return -EINVAL;
>
> You should close the progfd on error paths.
I will add 'close(*progfd);' at next version of patch.
>
> > + }
>
> Hm. I'm not too sure about the ordering of arguments, type should
> probably be right after attach.
>
> If we ever add tc attach support or some other hook, that's more
> fundamental part of the command than the program. So I think:
>
> bpftool net attach xdp id xyz dev ethN
I think it is more reasonable than current format.
I'll change the argument order as attach type comes in first.
> > + NEXT_ARG();
> > + if (!REQ_ARGS(1))
> > + return -EINVAL;
>
> Error message needed here.
>
Actually it provides error message like:
Error: 'xdp' needs at least 1 arguments, 0 found
are you suggesting that any additional error message is necessary?
> > + *ifindex = if_nametoindex(*argv);
> > + if (!*ifindex) {
> > + p_err("Invalid ifname");
>
> "ifname" is not mentioned in help, it'd be best to keep this error
> message consistent with bpftool prog load.
I will change it to a 'devname', since the word is mentioned in help.
> > + return -EINVAL;
> > + }
>
> Please require the dev keyword before the interface name.
> That'll make it feel closer to prog load syntax.
If adding the dev keyword before interface name, will it be too long to type in?
and also `bpftool prog` use extra keyword (such as dev) when it is
optional keyword.
bpftool prog dump jited PROG [{ file FILE | opcodes | linum }]
bpftool prog pin PROG FILE
bpftool prog { load | loadall } OBJ PATH \
as you can see here, FILE uses optional keyword 'file' when the
argument is optional.
bpftool prog { load | loadall } OBJ PATH \
[type TYPE] [dev NAME] \
[map { idx IDX | name NAME } MAP]\
[pinmaps MAP_DIR]
Yes, bpftool prog load has dev keyword with it,
but first, like previous, the argument is optional so i think it is
unnecessary to use optional keyword 'dev'.
and secondly, 'bpftool net attach' isn't really related to 'bpftool prog load'.
At previous version patch, I was using word 'load' instead of
'attach', since XDP program is not
considered as 'BPF_PROG_ATTACH', so it might give a confusion. However
by the last patch discussion,
word 'load' has been replaced to 'attach'.
Keeping the consistency is very important, but I was just wandering
about making command
similar to 'bpftool prog load' syntax.
>
> > + return 0;
> > +}
> > +
> > +static int do_attach_detach_xdp(int *progfd, enum net_attach_type *attach_type,
> > + int *ifindex)
> > +{
> > + __u32 flags;
> > + int err;
> > +
> > + flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
>
> Please add this as an option so that user can decide whether overwrite
> is allowed or not.
Adding force flag to bpftool seems necessary.
I will add an optional argument for this.
> > + if (*attach_type == NET_ATTACH_TYPE_XDP_GENERIC)
> > + flags |= XDP_FLAGS_SKB_MODE;
> > + if (*attach_type == NET_ATTACH_TYPE_XDP_DRIVER)
> > + flags |= XDP_FLAGS_DRV_MODE;
> > + if (*attach_type == NET_ATTACH_TYPE_XDP_OFFLOAD)
> > + flags |= XDP_FLAGS_HW_MODE;
> > +
> > + err = bpf_set_link_xdp_fd(*ifindex, *progfd, flags);
> > +
> > + return err;
>
> no need for the err variable here.
My apologies, but I'm not sure why err variable isn't needed at here.
AFAIK, 'bpf_set_link_xdp_fd' from libbpf returns the netlink_recv result,
and in order to propagate error, err variable is necessary, I guess?
> > +}
> > +
> > +static int do_attach(int argc, char **argv)
> > +{
> > + enum net_attach_type attach_type;
> > + int err, progfd, ifindex;
> > +
> > + err = parse_attach_args(argc, argv, &progfd, &attach_type, &ifindex);
> > + if (err)
> > + return err;
>
> Probably not the best idea to move this out into a helper.
Again, just trying to make consistent with 'prog.c'.
But clearly it has differences with do_attach/detach from 'prog.c'.
From it, it uses the same parse logic 'parse_attach_detach_args' since
the two command 'bpftool prog attach/detach' uses the same argument format.
However, in here, 'bpftool net' attach and detach requires different number of
argument, so function for parse argument has been defined separately.
The situation is little bit different, but keeping argument parse logic as an
helper, I think it's better in terms of consistency.
About the moving parse logic to a helper, I was trying to keep command
entry (do_attach)
as simple as possible. Parse all the argument in command entry will
make function longer
and might make harder to understand what it does.
And I'm not pretty sure that argument parse logic will stays the same
after other attachment
type comes in. What I mean is, the argument count or type might be
added and to fulfill
all that specific cases, the code might grow larger.
So for the consistency, simplicity and extensibility, I prefer to keep
it as a helper.
> > + if (is_prefix("xdp", attach_type_strings[attach_type]))
> > + err = do_attach_detach_xdp(&progfd, &attach_type, &ifindex);
>
> Hm. We either need an error to be reported if it's not xdp or since we
> only accept XDP now perhaps the if() is superfluous?
Well, if the attach_type isn't xdp, the error will be occurred from
the argument parse,
Will it be necessary to reinforce with error logic to make it more secure?
> > + if (err < 0) {
> > + p_err("link set %s failed", attach_type_strings[attach_type]);
>
> "link set"? So you are familiar with iproute2 syntax! :)
Well at first, to avoid confusion about using a command for end-user,
I referenced iproute2 and tried to keep similar message form with it.
But through the discussion about iproute2 and bpftool, it would be better not to
use word such as 'link set'. Maybe "interface %s attach failed" would be better
for clear understanding.
I will fix it at next version of patch.
> > + return -1;
> > + }
> > +
> > + if (json_output)
> > + jsonw_null(json_wtr);
> > +
> > + return 0;
> > +}
> > +
> > static int do_show(int argc, char **argv)
> > {
> > struct bpf_attach_info attach_info = {};
> > @@ -305,13 +405,17 @@ static int do_help(int argc, char **argv)
> >
> > fprintf(stderr,
> > "Usage: %s %s { show | list } [dev <devname>]\n"
> > + " %s %s attach PROG LOAD_TYPE <devname>\n"
> > " %s %s help\n"
> > + "\n"
> > + " " HELP_SPEC_PROGRAM "\n"
> > + " LOAD_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
>
> ATTACH_TYPE now?
>
> Perhaps a new line before the "Note"?
My bad.
Will change it right away.
> > "Note: Only xdp and tc attachments are supported now.\n"
> > " For progs attached to cgroups, use \"bpftool cgroup\"\n"
> > " to dump program attachments. For program types\n"
> > " sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
> > " consult iproute2.\n",
> > - bin_name, argv[-2], bin_name, argv[-2]);
> > + bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
> >
> > return 0;
> > }
> > @@ -319,6 +423,7 @@ static int do_help(int argc, char **argv)
> > static const struct cmd cmds[] = {
> > { "show", do_show },
> > { "list", do_show },
> > + { "attach", do_attach },
> > { "help", do_help },
> > { 0 }
> > };
>
Thanks for the detailed review! :)
^ permalink raw reply
* Re: [PATCH 20/34] xen: convert put_page() to put_user_page*()
From: Juergen Gross @ 2019-08-02 4:36 UTC (permalink / raw)
To: john.hubbard, Andrew Morton
Cc: devel, Dave Chinner, Christoph Hellwig, Dan Williams, Ira Weiny,
x86, linux-mm, Dave Hansen, amd-gfx, dri-devel, intel-gfx,
linux-arm-kernel, linux-rpi-kernel, devel, xen-devel,
John Hubbard, Boris Ostrovsky, rds-devel, Jérôme Glisse,
Jan Kara, ceph-devel, kvm, linux-block, linux-crypto, linux-fbdev,
linux-fsdevel, LKML, linux-media, linux-nfs, linux-rdma,
linux-xfs, netdev, sparclinux, Jason Gunthorpe
In-Reply-To: <20190802022005.5117-21-jhubbard@nvidia.com>
On 02.08.19 04:19, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
>
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
>
> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: xen-devel@lists.xenproject.org
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
> drivers/xen/gntdev.c | 5 +----
> drivers/xen/privcmd.c | 7 +------
> 2 files changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 4c339c7e66e5..2586b3df2bb6 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -864,10 +864,7 @@ static int gntdev_get_page(struct gntdev_copy_batch *batch, void __user *virt,
>
> static void gntdev_put_pages(struct gntdev_copy_batch *batch)
> {
> - unsigned int i;
> -
> - for (i = 0; i < batch->nr_pages; i++)
> - put_page(batch->pages[i]);
> + put_user_pages(batch->pages, batch->nr_pages);
> batch->nr_pages = 0;
> }
>
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 2f5ce7230a43..29e461dbee2d 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -611,15 +611,10 @@ static int lock_pages(
>
> static void unlock_pages(struct page *pages[], unsigned int nr_pages)
> {
> - unsigned int i;
> -
> if (!pages)
> return;
>
> - for (i = 0; i < nr_pages; i++) {
> - if (pages[i])
> - put_page(pages[i]);
> - }
> + put_user_pages(pages, nr_pages);
You are not handling the case where pages[i] is NULL here. Or am I
missing a pending patch to put_user_pages() here?
Juergen
^ permalink raw reply
* Re: [PATCH net] ipv4/route: do not check saddr dev if iif is LOOPBACK_IFINDEX
From: David Ahern @ 2019-08-02 4:16 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Stefano Brivio, Marcelo Ricardo Leitner, David S . Miller
In-Reply-To: <20190802041358.GT18865@dhcp-12-139.nay.redhat.com>
On 8/1/19 10:13 PM, Hangbin Liu wrote:
> On Thu, Aug 01, 2019 at 01:51:25PM -0600, David Ahern wrote:
>> On 8/1/19 2:29 AM, Hangbin Liu wrote:
>>> Jianlin reported a bug that for IPv4, ip route get from src_addr would fail
>>> if src_addr is not an address on local system.
>>>
>>> \# ip route get 1.1.1.1 from 2.2.2.2
>>> RTNETLINK answers: Invalid argument
>>
>> so this is a forwarding lookup in which case iif should be set. Based on
>
> with out setting iif in userspace, the kernel set iif to lo by default.
right, it presumes locally generated traffic.
>
>> the above 'route get' inet_rtm_getroute is doing a lookup as if it is
>> locally generated traffic.
>
> yeah... but what about the IPv6 part. That cause a different behavior in
> userspace.
just one of many, many annoying differences between v4 and v6. We could
try to catalog it.
^ permalink raw reply
* Re: [PATCH net] ipv4/route: do not check saddr dev if iif is LOOPBACK_IFINDEX
From: Hangbin Liu @ 2019-08-02 4:13 UTC (permalink / raw)
To: David Ahern
Cc: netdev, Stefano Brivio, Marcelo Ricardo Leitner, David S . Miller
In-Reply-To: <f44d9f26-046d-38a2-13aa-d25b92419d11@gmail.com>
On Thu, Aug 01, 2019 at 01:51:25PM -0600, David Ahern wrote:
> On 8/1/19 2:29 AM, Hangbin Liu wrote:
> > Jianlin reported a bug that for IPv4, ip route get from src_addr would fail
> > if src_addr is not an address on local system.
> >
> > \# ip route get 1.1.1.1 from 2.2.2.2
> > RTNETLINK answers: Invalid argument
>
> so this is a forwarding lookup in which case iif should be set. Based on
with out setting iif in userspace, the kernel set iif to lo by default.
> the above 'route get' inet_rtm_getroute is doing a lookup as if it is
> locally generated traffic.
yeah... but what about the IPv6 part. That cause a different behavior in
userspace.
Thanks
Hangbin
^ permalink raw reply
* Re: [PATCH net-next 00/15] net: Add functional tests for L3 and L4
From: David Ahern @ 2019-08-02 4:11 UTC (permalink / raw)
To: Alexei Starovoitov, David Ahern; +Cc: davem, netdev
In-Reply-To: <20190802001900.uyuryet2lrr3hgsq@ast-mbp.dhcp.thefacebook.com>
On 8/1/19 6:19 PM, Alexei Starovoitov wrote:
> Do you really need 'sleep 1' everywhere?
> It makes them so slow to run...
> What happens if you just remove it ? Tests will fail? Why?
yes, the sleep 1 is needed. A server process is getting launched into
the background. It's status is not relevant; the client's is the key.
The server needs time to initialize. And, yes I tried forking and it
gets hung up with bash and capturing the output for verbose mode.
^ permalink raw reply
* Re: [PATCH net-next 00/15] net: Add functional tests for L3 and L4
From: David Ahern @ 2019-08-02 4:04 UTC (permalink / raw)
To: Alexei Starovoitov, David Ahern; +Cc: davem, netdev
In-Reply-To: <20190802001900.uyuryet2lrr3hgsq@ast-mbp.dhcp.thefacebook.com>
On 8/1/19 6:19 PM, Alexei Starovoitov wrote:
> On Thu, Aug 01, 2019 at 11:56:33AM -0700, David Ahern wrote:
>> From: David Ahern <dsahern@gmail.com>
>>
>> This is a port the functional test cases created during the development
>> of the VRF feature. It covers various permutations of icmp, tcp and udp
>> for IPv4 and IPv6 including negative tests.
>
> Thanks a lot for doing this!
>
> Is there expected output ?
My tests are quiet by default with a verbose option and pause on fail.
$ fcnal-test.sh -h
usage: fcnal-test.sh OPTS
-4 IPv4 tests only
-6 IPv6 tests only
-t <test> Test name/set to run
-p Pause on fail
-P Pause after each test
-v Be verbose
> All tests suppose to pass on the latest net-next?
they do for me. Buster image. 5.3-rc1 kernel.
>
> I'm seeing:
> ./fcnal-test.sh
> ...
> SYSCTL: net.ipv4.raw_l3mdev_accept=0
> TEST: ping out - ns-B IP [ OK ]
> TEST: ping out, device bind - ns-B IP [ OK ]
> TEST: ping out, address bind - ns-B IP [FAIL]
> TEST: ping out - ns-B loopback IP [ OK ]
> TEST: ping out, device bind - ns-B loopback IP [ OK ]
> TEST: ping out, address bind - ns-B loopback IP [FAIL]
> TEST: ping in - ns-A IP [ OK ]
> TEST: ping in - ns-A loopback IP [ OK ]
> TEST: ping local - ns-A IP [ OK ]
> TEST: ping local - ns-A loopback IP [ OK ]
> TEST: ping local - loopback [ OK ]
> TEST: ping local, device bind - ns-A IP [ OK ]
> TEST: ping local, device bind - ns-A loopback IP [FAIL]
> TEST: ping local, device bind - loopback [FAIL]
...
TEST: ping out - ns-B IP [ OK ]
TEST: ping out, device bind - ns-B IP [ OK ]
TEST: ping out, address bind - ns-B IP [ OK ]
TEST: ping out - ns-B loopback IP [ OK ]
TEST: ping out, device bind - ns-B loopback IP [ OK ]
TEST: ping out, address bind - ns-B loopback IP [ OK ]
TEST: ping in - ns-A IP [ OK ]
TEST: ping in - ns-A loopback IP [ OK ]
TEST: ping local - ns-A IP [ OK ]
TEST: ping local - ns-A loopback IP [ OK ]
TEST: ping local - loopback [ OK ]
TEST: ping local, device bind - ns-A IP [ OK ]
TEST: ping local, device bind - ns-A loopback IP [ OK ]
TEST: ping local, device bind - loopback [ OK ]
TEST: ping out, blocked by rule - ns-B loopback IP [ OK ]
TEST: ping in, blocked by rule - ns-A loopback IP [ OK ]
TEST: ping out, blocked by route - ns-B loopback IP [ OK ]
TEST: ping in, blocked by route - ns-A loopback IP [ OK ]
TEST: ping out, unreachable default route - ns-B loopback IP [ OK ]
...
>
> with -v I see:
> COMMAND: ip netns exec ns-A ping -c1 -w1 -I 172.16.2.1 172.16.1.2
> ping: unknown iface 172.16.2.1
> TEST: ping out, address bind - ns-B IP [FAIL]
With ping from iputils-ping -I can be an address or a device.
$ man ping
-I interface
interface is either an address, or an interface name. If
interface is an address, it
sets source address to specified interface address. If
interface in an interface name,
it sets source interface to specified interface. NOTE: For
IPv6, when doing ping to a
link-local scope address, link specification (by the
'%'-notation in destination, or
by this option) can be used but it is no longer required.
^ permalink raw reply
* [PATCH net-next] net: can: Fix compiling warning
From: Mao Wenan @ 2019-08-02 3:36 UTC (permalink / raw)
To: socketcan, davem, netdev; +Cc: linux-kernel, kernel-janitors, Mao Wenan
There are two warings in net/can, fix them by setting bcm_sock_no_ioctlcmd
and raw_sock_no_ioctlcmd as static.
net/can/bcm.c:1683:5: warning: symbol 'bcm_sock_no_ioctlcmd' was not declared. Should it be static?
net/can/raw.c:840:5: warning: symbol 'raw_sock_no_ioctlcmd' was not declared. Should it be static?
Fixes: 473d924d7d46 ("can: fix ioctl function removal")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
net/can/bcm.c | 2 +-
net/can/raw.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/can/bcm.c b/net/can/bcm.c
index bf1d0bbecec8..b8a32b4ac368 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1680,7 +1680,7 @@ static int bcm_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
return size;
}
-int bcm_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
+static int bcm_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
unsigned long arg)
{
/* no ioctls for socket layer -> hand it down to NIC layer */
diff --git a/net/can/raw.c b/net/can/raw.c
index da386f1fa815..a01848ff9b12 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -837,7 +837,7 @@ static int raw_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
return size;
}
-int raw_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
+static int raw_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
unsigned long arg)
{
/* no ioctls for socket layer -> hand it down to NIC layer */
--
2.20.1
^ permalink raw reply related
* Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block immediately
From: Jakub Kicinski @ 2019-08-02 2:56 UTC (permalink / raw)
To: wenxu; +Cc: jiri, pablo, fw, netfilter-devel, netdev, John Hurley
In-Reply-To: <43995d3c-dff5-3000-317c-09b119c61565@ucloud.cn>
On Fri, 2 Aug 2019 10:47:26 +0800, wenxu wrote:
> > After all the same device may have both a TC block and a NFT block.
>
> Only one subsystem can be used for the same device for both indr-dev and hw-dev
> the flow_block_cb_is_busy avoid the situation you mentioned.
AFAIU that's a temporary limitation until drivers learn to manage
multiple tables.
^ permalink raw reply
* Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block immediately
From: wenxu @ 2019-08-02 2:47 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: jiri, pablo, fw, netfilter-devel, netdev, John Hurley
In-Reply-To: <20190801161129.25fee619@cakuba.netronome.com>
On 8/2/2019 7:11 AM, Jakub Kicinski wrote:
> On Thu, 1 Aug 2019 11:03:46 +0800, wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> The new flow-indr-block can't get the tcf_block
>> directly. It provide a callback list to find the flow_block immediately
>> when the device register and contain a ingress block.
>>
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
> First of all thanks for splitting the series up into more patches,
> it is easier to follow the logic now!
>
>> @@ -328,6 +348,7 @@ struct flow_indr_block_dev {
>>
>> INIT_LIST_HEAD(&indr_dev->cb_list);
>> indr_dev->dev = dev;
>> + flow_get_default_block(indr_dev);
>> if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
>> flow_indr_setup_block_ht_params)) {
>> kfree(indr_dev);
> I wonder if it's still practical to keep the block information in the
> indr_dev structure at all. The way this used to work was:
>
>
> [hash table of devices] --------------
> | | netdev |
> | | refcnt |
> indir_dev[tun0]| ------ | cached block | ---- [ TC block ]
> | | callbacks | .
> | -------------- \__ [cb, cb_priv, cb_ident]
> | [cb, cb_priv, cb_ident]
> | --------------
> | | netdev |
> | | refcnt |
> indir_dev[tun1]| ------ | cached block | ---- [ TC block ]
> | | callbacks |.
> ----------------- -------------- \__ [cb, cb_priv, cb_ident]
> [cb, cb_priv, cb_ident]
>
>
> In the example above we have two tunnels tun0 and tun1, each one has a
> indr_dev structure allocated, and for each one of them two drivers
> registered for callbacks (hence the callbacks list has two entries).
>
> We used to cache the TC block in the indr_dev structure, but now that
> there are multiple subsytems using the indr_dev we either have to have
> a list of cached blocks (with entries for each subsystem) or just always
> iterate over the subsystems :(
>
> After all the same device may have both a TC block and a NFT block.
Only one subsystem can be used for the same device for both indr-dev and hw-dev
the flow_block_cb_is_busy avoid the situation you mentioned.
>
> I think always iterating would be easier:
>
> The indr_dev struct would no longer have the block pointer, instead
> when new driver registers for the callback instead of:
>
> if (indr_dev->ing_cmd_cb)
> indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block,
> indr_block_cb->cb, indr_block_cb->cb_priv,
> FLOW_BLOCK_BIND);
>
> We'd have something like the loop in flow_get_default_block():
>
> for each (subsystem)
> subsystem->handle_new_indir_cb(indr_dev, cb);
>
> And then per-subsystem logic would actually call the cb. Or:
>
> for each (subsystem)
> block = get_default_block(indir_dev)
> indr_dev->ing_cmd_cb(...)
>
> I hope this makes sense.
>
>
>
^ permalink raw reply
* Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
From: John Hubbard @ 2019-08-02 2:39 UTC (permalink / raw)
To: john.hubbard, Andrew Morton
Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
Ira Weiny, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
LKML, amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx,
kvm, linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
xen-devel
In-Reply-To: <20190802021653.4882-1-jhubbard@nvidia.com>
On 8/1/19 7:16 PM, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
>
> Hi,
>
> These are best characterized as miscellaneous conversions: many (not all)
> call sites that don't involve biovec or iov_iter, nor mm/. It also leaves
> out a few call sites that require some more work. These are mostly pretty
> simple ones.
>
> It's probably best to send all of these via Andrew's -mm tree, assuming
> that there are no significant merge conflicts with ongoing work in other
> trees (which I doubt, given that these are small changes).
>
In case anyone is wondering, this truncated series is due to a script failure:
git-send-email chokes when it hits email addresses whose names have a
comma in them, as happened here with patch 0003.
Please disregard this set and reply to the other thread.
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox