Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH bpf] selftests/bpf: fix test_send_signal_nmi on s390
From: Andrii Nakryiko @ 2019-07-12 19:59 UTC (permalink / raw)
  To: Y Song; +Cc: Ilya Leoshkevich, bpf, Networking, gor, heiko.carstens
In-Reply-To: <CAH3MdRWEfrQt6P4eMYgGRE9OgLkjQLqoZnCwFbrxwqKPyrrHpQ@mail.gmail.com>

On Fri, Jul 12, 2019 at 12:55 PM Y Song <ys114321@gmail.com> wrote:
>
> On Fri, Jul 12, 2019 at 11:24 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Jul 12, 2019 at 10:46 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> > >
> > > Many s390 setups (most notably, KVM guests) do not have access to
> > > hardware performance events.
> > >
> > > Therefore, use the software event instead.
> > >
> > > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > > Acked-by: Vasily Gorbik <gor@linux.ibm.com>
> > > ---
> > >  tools/testing/selftests/bpf/prog_tests/send_signal.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> > > index 67cea1686305..4a45ea0b8448 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> > > @@ -176,10 +176,19 @@ static int test_send_signal_tracepoint(void)
> > >  static int test_send_signal_nmi(void)
> > >  {
> > >         struct perf_event_attr attr = {
> > > +#if defined(__s390__)
> > > +               /* Many s390 setups (most notably, KVM guests) do not have
> > > +                * access to hardware performance events.
> > > +                */
> > > +               .sample_period = 1,
> > > +               .type = PERF_TYPE_SOFTWARE,
> > > +               .config = PERF_COUNT_SW_CPU_CLOCK,
> > > +#else
> >
> > Is there any harm in switching all archs to software event? I'd rather
> > avoid all those special arch cases, which will be really hard to test
> > for people without direct access to them.
>
> I still like to do hardware cpu_cycles in order to test nmi.
> In a physical box.
> $ perf list
> List of pre-defined events (to be used in -e):
>
>   branch-instructions OR branches                    [Hardware event]
>   branch-misses                                      [Hardware event]
>   bus-cycles                                         [Hardware event]
>   cache-misses                                       [Hardware event]
>   cache-references                                   [Hardware event]
>   cpu-cycles OR cycles                               [Hardware event]
>   instructions                                       [Hardware event]
>   ref-cycles                                         [Hardware event]
>
>   alignment-faults                                   [Software event]
>   bpf-output                                         [Software event]
>   context-switches OR cs                             [Software event]
>   cpu-clock                                          [Software event]
>   cpu-migrations OR migrations                       [Software event]
>   dummy                                              [Software event]
>   emulation-faults                                   [Software event]
>   major-faults                                       [Software event]
>   minor-faults                                       [Software event]
>   page-faults OR faults                              [Software event]
>   task-clock                                         [Software event]
>
>   L1-dcache-load-misses                              [Hardware cache event]
> ...
>
> In a VM
> $ perf list
> List of pre-defined events (to be used in -e):
>
>   alignment-faults                                   [Software event]
>   bpf-output                                         [Software event]
>   context-switches OR cs                             [Software event]
>   cpu-clock                                          [Software event]
>   cpu-migrations OR migrations                       [Software event]
>   dummy                                              [Software event]
>   emulation-faults                                   [Software event]
>   major-faults                                       [Software event]
>   minor-faults                                       [Software event]
>   page-faults OR faults                              [Software event]
>   task-clock                                         [Software event]
>
>   msr/smi/                                           [Kernel PMU
> event]
>   msr/tsc/                                           [Kernel PMU event]
> .....
>
> Is it possible that we detect at runtime whether the hardware
> cpu_cycles available or not?
> If available, let us do hardware one. Otherwise, skip or do the
> software one? The software one does not really do nmi so it will take
> the same code path in kernel as tracepoint.

Yeah, that's what I was worried about.

Ilya, could you please take a look how hard would it be to do this HW
vs SW perf event support?

>
> >
> > >                 .sample_freq = 50,
> > >                 .freq = 1,
> > >                 .type = PERF_TYPE_HARDWARE,
> > >                 .config = PERF_COUNT_HW_CPU_CYCLES,
> > > +#endif
> > >         };
> > >
> > >         return test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT, "perf_event");
> > > --
> > > 2.21.0
> > >

^ permalink raw reply

* Re: [PATCH bpf] selftests/bpf: make directory prerequisites order-only
From: Andrii Nakryiko @ 2019-07-12 19:57 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: bpf, Networking, gor, heiko.carstens
In-Reply-To: <20190712135631.91398-1-iii@linux.ibm.com>

On Fri, Jul 12, 2019 at 6:57 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> When directories are used as prerequisites in Makefiles, they can cause
> a lot of unnecessary rebuilds, because a directory is considered changed
> whenever a file in this directory is added, removed or modified.
>
> If the only thing a target is interested in is the existence of the
> directory it depends on, which is the case for selftests/bpf, this
> directory should be specified as an order-only prerequisite: it would
> still be created in case it does not exist, but it would not trigger a
> rebuild of a target in case it's considered changed.
>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---

Thanks!

Acked-by: Andrii Nakryiko <andriin@fb.com>


>  tools/testing/selftests/bpf/Makefile | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>

[...]

^ permalink raw reply

* Re: [PATCH bpf] selftests/bpf: fix attach_probe on s390
From: Andrii Nakryiko @ 2019-07-12 19:55 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: bpf, Networking, gor, heiko.carstens, Daniel Borkmann,
	Alexei Starovoitov, Yonghong Song
In-Reply-To: <20190712134142.90668-1-iii@linux.ibm.com>

On Fri, Jul 12, 2019 at 6:42 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> attach_probe test fails, because it cannot install a kprobe on a
> non-existent sys_nanosleep symbol.
>
> Use the correct symbol name for the nanosleep syscall on 64-bit s390.
> Don't bother adding one for 31-bit mode, since tests are compiled only
> in 64-bit mode.
>
> Fixes: 1e8611bbdfc9 ("selftests/bpf: add kprobe/uprobe selftests")
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> Acked-by: Vasily Gorbik <gor@linux.ibm.com>
> ---

Acked-by: Andrii Nakryiko <andriin@fb.com>

This arch-specific naming is very unfortunate. I'm thinking of doing
this automatically in libbpf to help usability.


>  tools/testing/selftests/bpf/prog_tests/attach_probe.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> index a4686395522c..47af4afc5013 100644
> --- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> +++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> @@ -23,6 +23,8 @@ ssize_t get_base_addr() {
>
>  #ifdef __x86_64__
>  #define SYS_KPROBE_NAME "__x64_sys_nanosleep"
> +#elif defined(__s390x__)
> +#define SYS_KPROBE_NAME "__s390x_sys_nanosleep"
>  #else
>  #define SYS_KPROBE_NAME "sys_nanosleep"
>  #endif
> --
> 2.21.0
>

^ permalink raw reply

* Re: [PATCH bpf] selftests/bpf: fix test_send_signal_nmi on s390
From: Y Song @ 2019-07-12 19:54 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Ilya Leoshkevich, bpf, Networking, gor, heiko.carstens
In-Reply-To: <CAEf4BzbZ4gUZb67EKiNZTc0MnqqGz7sTB20u-M+sF+YG0Sr3pg@mail.gmail.com>

On Fri, Jul 12, 2019 at 11:24 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Jul 12, 2019 at 10:46 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >
> > Many s390 setups (most notably, KVM guests) do not have access to
> > hardware performance events.
> >
> > Therefore, use the software event instead.
> >
> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > Acked-by: Vasily Gorbik <gor@linux.ibm.com>
> > ---
> >  tools/testing/selftests/bpf/prog_tests/send_signal.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> > index 67cea1686305..4a45ea0b8448 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> > @@ -176,10 +176,19 @@ static int test_send_signal_tracepoint(void)
> >  static int test_send_signal_nmi(void)
> >  {
> >         struct perf_event_attr attr = {
> > +#if defined(__s390__)
> > +               /* Many s390 setups (most notably, KVM guests) do not have
> > +                * access to hardware performance events.
> > +                */
> > +               .sample_period = 1,
> > +               .type = PERF_TYPE_SOFTWARE,
> > +               .config = PERF_COUNT_SW_CPU_CLOCK,
> > +#else
>
> Is there any harm in switching all archs to software event? I'd rather
> avoid all those special arch cases, which will be really hard to test
> for people without direct access to them.

I still like to do hardware cpu_cycles in order to test nmi.
In a physical box.
$ perf list
List of pre-defined events (to be used in -e):

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  ref-cycles                                         [Hardware event]

  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]

  L1-dcache-load-misses                              [Hardware cache event]
...

In a VM
$ perf list
List of pre-defined events (to be used in -e):

  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]

  msr/smi/                                           [Kernel PMU
event]
  msr/tsc/                                           [Kernel PMU event]
.....

Is it possible that we detect at runtime whether the hardware
cpu_cycles available or not?
If available, let us do hardware one. Otherwise, skip or do the
software one? The software one does not really do nmi so it will take
the same code path in kernel as tracepoint.

>
> >                 .sample_freq = 50,
> >                 .freq = 1,
> >                 .type = PERF_TYPE_HARDWARE,
> >                 .config = PERF_COUNT_HW_CPU_CYCLES,
> > +#endif
> >         };
> >
> >         return test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT, "perf_event");
> > --
> > 2.21.0
> >

^ permalink raw reply

* Re: [oss-drivers] Re: [RFC bpf-next 2/8] bpf: extend list based insn patching infra to verification layer
From: Andrii Nakryiko @ 2019-07-12 19:51 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Jakub Kicinski, bpf, Networking, oss-drivers
In-Reply-To: <87muhk2264.fsf@netronome.com>

On Thu, Jul 11, 2019 at 5:20 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
>
> Jiong Wang writes:
>
> > Andrii Nakryiko writes:
> >
> >> On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> >>>
> >>> Verification layer also needs to handle auxiliar info as well as adjusting
> >>> subprog start.
> >>>
> >>> At this layer, insns inside patch buffer could be jump, but they should
> >>> have been resolved, meaning they shouldn't jump to insn outside of the
> >>> patch buffer. Lineration function for this layer won't touch insns inside
> >>> patch buffer.
> >>>
> >>> Adjusting subprog is finished along with adjusting jump target when the
> >>> input will cover bpf to bpf call insn, re-register subprog start is cheap.
> >>> But adjustment when there is insn deleteion is not considered yet.
> >>>
> >>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >>> ---
> >>>  kernel/bpf/verifier.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 150 insertions(+)
> >>>
> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>> index a2e7637..2026d64 100644
> >>> --- a/kernel/bpf/verifier.c
> >>> +++ b/kernel/bpf/verifier.c
> >>> @@ -8350,6 +8350,156 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env)
> >>>         }
> >>>  }
> >>>
> >>> +/* Linearize bpf list insn to array (verifier layer). */
> >>> +static struct bpf_verifier_env *
> >>> +verifier_linearize_list_insn(struct bpf_verifier_env *env,
> >>> +                            struct bpf_list_insn *list)
> >>
> >> It's unclear why this returns env back? It's not allocating a new env,
> >> so it's weird and unnecessary. Just return error code.
> >
> > The reason is I was thinking we have two layers in BPF, the core and the
> > verifier.
> >
> > For core layer (the relevant file is core.c), when doing patching, the
> > input is insn list and bpf_prog, the linearization should linearize the
> > insn list into insn array, and also whatever others affect inside bpf_prog
> > due to changing on insns, for example line info inside prog->aux. So the
> > return value is bpf_prog for core layer linearization hook.
> >
> > For verifier layer, it is similar, but the context if bpf_verifier_env, the
> > linearization hook should linearize the insn list, and also those affected
> > inside env, for example bpf_insn_aux_data, so the return value is
> > bpf_verifier_env, meaning returning an updated verifier context
> > (bpf_verifier_env) after insn list linearization.
>
> Realized your point is no new env is allocated, so just return error
> code. Yes, the env pointer is not changed, just internal data is
> updated. Return bpf_verifier_env mostly is trying to make the hook more
> clear that it returns an updated "context" where the linearization happens,
> for verifier layer, it is bpf_verifier_env, and for core layer, it is
> bpf_prog, so return value was designed to return these two types.

Oh, I missed that core layer returns bpf_prog*. I think this is
confusing as hell and is very contrary to what one would expect. If
the function doesn't allocate those objects, it shouldn't return them,
except for rare cases of some accessor functions. Me reading this,
I'll always be suprised and will have to go skim code just to check
whether those functions really return new bpf_prog or
bpf_verifier_env, respectively.

Please change them both to just return error code.

>
> >
> > Make sense?
> >
> > Regards,
> > Jiong
> >
> >>
> >>> +{
> >>> +       u32 *idx_map, idx, orig_cnt, fini_cnt = 0;
> >>> +       struct bpf_subprog_info *new_subinfo;
> >>> +       struct bpf_insn_aux_data *new_data;
> >>> +       struct bpf_prog *prog = env->prog;
> >>> +       struct bpf_verifier_env *ret_env;
> >>> +       struct bpf_insn *insns, *insn;
> >>> +       struct bpf_list_insn *elem;
> >>> +       int ret;
> >>> +
> >>> +       /* Calculate final size. */
> >>> +       for (elem = list; elem; elem = elem->next)
> >>> +               if (!(elem->flag & LIST_INSN_FLAG_REMOVED))
> >>> +                       fini_cnt++;
> >>> +
> >>> +       orig_cnt = prog->len;
> >>> +       insns = prog->insnsi;
> >>> +       /* If prog length remains same, nothing else to do. */
> >>> +       if (fini_cnt == orig_cnt) {
> >>> +               for (insn = insns, elem = list; elem; elem = elem->next, insn++)
> >>> +                       *insn = elem->insn;
> >>> +               return env;
> >>> +       }
> >>> +       /* Realloc insn buffer when necessary. */
> >>> +       if (fini_cnt > orig_cnt)
> >>> +               prog = bpf_prog_realloc(prog, bpf_prog_size(fini_cnt),
> >>> +                                       GFP_USER);
> >>> +       if (!prog)
> >>> +               return ERR_PTR(-ENOMEM);
> >>> +       insns = prog->insnsi;
> >>> +       prog->len = fini_cnt;
> >>> +       ret_env = env;
> >>> +
> >>> +       /* idx_map[OLD_IDX] = NEW_IDX */
> >>> +       idx_map = kvmalloc(orig_cnt * sizeof(u32), GFP_KERNEL);
> >>> +       if (!idx_map)
> >>> +               return ERR_PTR(-ENOMEM);
> >>> +       memset(idx_map, 0xff, orig_cnt * sizeof(u32));
> >>> +
> >>> +       /* Use the same alloc method used when allocating env->insn_aux_data. */
> >>> +       new_data = vzalloc(array_size(sizeof(*new_data), fini_cnt));
> >>> +       if (!new_data) {
> >>> +               kvfree(idx_map);
> >>> +               return ERR_PTR(-ENOMEM);
> >>> +       }
> >>> +
> >>> +       /* Copy over insn + calculate idx_map. */
> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
> >>> +               int orig_idx = elem->orig_idx - 1;
> >>> +
> >>> +               if (orig_idx >= 0) {
> >>> +                       idx_map[orig_idx] = idx;
> >>> +
> >>> +                       if (elem->flag & LIST_INSN_FLAG_REMOVED)
> >>> +                               continue;
> >>> +
> >>> +                       new_data[idx] = env->insn_aux_data[orig_idx];
> >>> +
> >>> +                       if (elem->flag & LIST_INSN_FLAG_PATCHED)
> >>> +                               new_data[idx].zext_dst =
> >>> +                                       insn_has_def32(env, &elem->insn);
> >>> +               } else {
> >>> +                       new_data[idx].seen = true;
> >>> +                       new_data[idx].zext_dst = insn_has_def32(env,
> >>> +                                                               &elem->insn);
> >>> +               }
> >>> +               insns[idx++] = elem->insn;
> >>> +       }
> >>> +
> >>> +       new_subinfo = kvzalloc(sizeof(env->subprog_info), GFP_KERNEL);
> >>> +       if (!new_subinfo) {
> >>> +               kvfree(idx_map);
> >>> +               vfree(new_data);
> >>> +               return ERR_PTR(-ENOMEM);
> >>> +       }
> >>> +       memcpy(new_subinfo, env->subprog_info, sizeof(env->subprog_info));
> >>> +       memset(env->subprog_info, 0, sizeof(env->subprog_info));
> >>> +       env->subprog_cnt = 0;
> >>> +       env->prog = prog;
> >>> +       ret = add_subprog(env, 0);
> >>> +       if (ret < 0) {
> >>> +               ret_env = ERR_PTR(ret);
> >>> +               goto free_all_ret;
> >>> +       }
> >>> +       /* Relocate jumps using idx_map.
> >>> +        *   old_dst = jmp_insn.old_target + old_pc + 1;
> >>> +        *   new_dst = idx_map[old_dst] = jmp_insn.new_target + new_pc + 1;
> >>> +        *   jmp_insn.new_target = new_dst - new_pc - 1;
> >>> +        */
> >>> +       for (idx = 0, elem = list; elem; elem = elem->next) {
> >>> +               int orig_idx = elem->orig_idx;
> >>> +
> >>> +               if (elem->flag & LIST_INSN_FLAG_REMOVED)
> >>> +                       continue;
> >>> +               if ((elem->flag & LIST_INSN_FLAG_PATCHED) || !orig_idx) {
> >>> +                       idx++;
> >>> +                       continue;
> >>> +               }
> >>> +
> >>> +               ret = bpf_jit_adj_imm_off(&insns[idx], orig_idx - 1, idx,
> >>> +                                         idx_map);
> >>> +               if (ret < 0) {
> >>> +                       ret_env = ERR_PTR(ret);
> >>> +                       goto free_all_ret;
> >>> +               }
> >>> +               /* Recalculate subprog start as we are at bpf2bpf call insn. */
> >>> +               if (ret > 0) {
> >>> +                       ret = add_subprog(env, idx + insns[idx].imm + 1);
> >>> +                       if (ret < 0) {
> >>> +                               ret_env = ERR_PTR(ret);
> >>> +                               goto free_all_ret;
> >>> +                       }
> >>> +               }
> >>> +               idx++;
> >>> +       }
> >>> +       if (ret < 0) {
> >>> +               ret_env = ERR_PTR(ret);
> >>> +               goto free_all_ret;
> >>> +       }
> >>> +
> >>> +       env->subprog_info[env->subprog_cnt].start = fini_cnt;
> >>> +       for (idx = 0; idx <= env->subprog_cnt; idx++)
> >>> +               new_subinfo[idx].start = env->subprog_info[idx].start;
> >>> +       memcpy(env->subprog_info, new_subinfo, sizeof(env->subprog_info));
> >>> +
> >>> +       /* Adjust linfo.
> >>> +        * FIXME: no support for insn removal at the moment.
> >>> +        */
> >>> +       if (prog->aux->nr_linfo) {
> >>> +               struct bpf_line_info *linfo = prog->aux->linfo;
> >>> +               u32 nr_linfo = prog->aux->nr_linfo;
> >>> +
> >>> +               for (idx = 0; idx < nr_linfo; idx++)
> >>> +                       linfo[idx].insn_off = idx_map[linfo[idx].insn_off];
> >>> +       }
> >>> +       vfree(env->insn_aux_data);
> >>> +       env->insn_aux_data = new_data;
> >>> +       goto free_mem_list_ret;
> >>> +free_all_ret:
> >>> +       vfree(new_data);
> >>> +free_mem_list_ret:
> >>> +       kvfree(new_subinfo);
> >>> +       kvfree(idx_map);
> >>> +       return ret_env;
> >>> +}
> >>> +
> >>>  static int opt_remove_dead_code(struct bpf_verifier_env *env)
> >>>  {
> >>>         struct bpf_insn_aux_data *aux_data = env->insn_aux_data;
> >>> --
> >>> 2.7.4
> >>>
>

^ permalink raw reply

* Re: [RFC bpf-next 1/8] bpf: introducing list based insn patching infra to core layer
From: Andrii Nakryiko @ 2019-07-12 19:48 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers
In-Reply-To: <87pnmg23fc.fsf@netronome.com>

On Thu, Jul 11, 2019 at 4:53 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
>
> Andrii Nakryiko writes:
>
> > On Thu, Jul 4, 2019 at 2:32 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> >>
> >> This patch introduces list based bpf insn patching infra to bpf core layer
> >> which is lower than verification layer.
> >>
> >> This layer has bpf insn sequence as the solo input, therefore the tasks
> >> to be finished during list linerization is:
> >>   - copy insn
> >>   - relocate jumps
> >>   - relocation line info.
> >>
> >> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> >> Suggested-by: Edward Cree <ecree@solarflare.com>
> >> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >> ---
> >>  include/linux/filter.h |  25 +++++
> >>  kernel/bpf/core.c      | 268 +++++++++++++++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 293 insertions(+)
> >>
> >> diff --git a/include/linux/filter.h b/include/linux/filter.h
> >> index 1fe53e7..1fea68c 100644
> >> --- a/include/linux/filter.h
> >> +++ b/include/linux/filter.h
> >> @@ -842,6 +842,31 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
> >>                                        const struct bpf_insn *patch, u32 len);
> >>  int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt);
> >>
> >> +int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
> >> +                       int idx_map[]);
> >> +
> >> +#define LIST_INSN_FLAG_PATCHED 0x1
> >> +#define LIST_INSN_FLAG_REMOVED 0x2
> >> +struct bpf_list_insn {
> >> +       struct bpf_insn insn;
> >> +       struct bpf_list_insn *next;
> >> +       s32 orig_idx;
> >> +       u32 flag;
> >> +};
> >> +
> >> +struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog);
> >> +void bpf_destroy_list_insn(struct bpf_list_insn *list);
> >> +/* Replace LIST_INSN with new list insns generated from PATCH. */
> >> +struct bpf_list_insn *bpf_patch_list_insn(struct bpf_list_insn *list_insn,
> >> +                                         const struct bpf_insn *patch,
> >> +                                         u32 len);
> >> +/* Pre-patch list_insn with insns inside PATCH, meaning LIST_INSN is not
> >> + * touched. New list insns are inserted before it.
> >> + */
> >> +struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
> >> +                                            const struct bpf_insn *patch,
> >> +                                            u32 len);
> >> +
> >>  void bpf_clear_redirect_map(struct bpf_map *map);
> >>
> >>  static inline bool xdp_return_frame_no_direct(void)
> >> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> >> index e2c1b43..e60703e 100644
> >> --- a/kernel/bpf/core.c
> >> +++ b/kernel/bpf/core.c
> >> @@ -502,6 +502,274 @@ int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt)
> >>         return WARN_ON_ONCE(bpf_adj_branches(prog, off, off + cnt, off, false));
> >>  }
> >>
> >> +int bpf_jit_adj_imm_off(struct bpf_insn *insn, int old_idx, int new_idx,
> >> +                       s32 idx_map[])
> >> +{
> >> +       u8 code = insn->code;
> >> +       s64 imm;
> >> +       s32 off;
> >> +
> >> +       if (BPF_CLASS(code) != BPF_JMP && BPF_CLASS(code) != BPF_JMP32)
> >> +               return 0;
> >> +
> >> +       if (BPF_CLASS(code) == BPF_JMP &&
> >> +           (BPF_OP(code) == BPF_EXIT ||
> >> +            (BPF_OP(code) == BPF_CALL && insn->src_reg != BPF_PSEUDO_CALL)))
> >> +               return 0;
> >> +
> >> +       /* BPF to BPF call. */
> >> +       if (BPF_OP(code) == BPF_CALL) {
> >> +               imm = idx_map[old_idx + insn->imm + 1] - new_idx - 1;
> >> +               if (imm < S32_MIN || imm > S32_MAX)
> >> +                       return -ERANGE;
> >> +               insn->imm = imm;
> >> +               return 1;
> >> +       }
> >> +
> >> +       /* Jump. */
> >> +       off = idx_map[old_idx + insn->off + 1] - new_idx - 1;
> >> +       if (off < S16_MIN || off > S16_MAX)
> >> +               return -ERANGE;
> >> +       insn->off = off;
> >> +       return 0;
> >> +}
> >> +
> >> +void bpf_destroy_list_insn(struct bpf_list_insn *list)
> >> +{
> >> +       struct bpf_list_insn *elem, *next;
> >> +
> >> +       for (elem = list; elem; elem = next) {
> >> +               next = elem->next;
> >> +               kvfree(elem);
> >> +       }
> >> +}
> >> +
> >> +struct bpf_list_insn *bpf_create_list_insn(struct bpf_prog *prog)
> >> +{
> >> +       unsigned int idx, len = prog->len;
> >> +       struct bpf_list_insn *hdr, *prev;
> >> +       struct bpf_insn *insns;
> >> +
> >> +       hdr = kvzalloc(sizeof(*hdr), GFP_KERNEL);
> >> +       if (!hdr)
> >> +               return ERR_PTR(-ENOMEM);
> >> +
> >> +       insns = prog->insnsi;
> >> +       hdr->insn = insns[0];
> >> +       hdr->orig_idx = 1;
> >> +       prev = hdr;
> >
> > I'm not sure why you need this "prologue" instead of handling first
> > instruction uniformly in for loop below?
>
> It is because the head of the list doesn't have precessor, so no need of
> the prev->next assignment, not could do a check inside the loop to rule the
> head out when doing it.

yeah, prev = NULL initially. Then

if (prev) prev->next = node;

Or see my suggestiong about having patchabel_insns_list wrapper struct
(in cover letter thread).

>
> >> +
> >> +       for (idx = 1; idx < len; idx++) {
> >> +               struct bpf_list_insn *node = kvzalloc(sizeof(*node),
> >> +                                                     GFP_KERNEL);
> >> +
> >> +               if (!node) {
> >> +                       /* Destroy what has been allocated. */
> >> +                       bpf_destroy_list_insn(hdr);
> >> +                       return ERR_PTR(-ENOMEM);
> >> +               }
> >> +               node->insn = insns[idx];
> >> +               node->orig_idx = idx + 1;
> >
> > Why orig_idx is 1-based? It's really confusing.
>
> orig_idx == 0 means one insn is without original insn, means it is an new
> insn generated for patching purpose.
>
> While the LIST_INSN_FLAG_PATCHED in the RFC means one insn in original prog
> is patched.
>
> I had been trying to differenciate above two cases, but yes, they are
> confusing and differenciating them might be useless, if an insn in original
> prog is patched, all its info could be treated as clobbered and needing
> re-calculating or should do conservative assumption.

Instruction will be new and not patched only in patch_buffer. Once you
add them to the list, they are patched, no? Not sure what's the
distinction you are trying to maintain here.

>
> >
> >> +               prev->next = node;
> >> +               prev = node;
> >> +       }
> >> +
> >> +       return hdr;
> >> +}
> >> +

[...]

> >> +
> >> +       len--;
> >> +       patch++;
> >> +
> >> +       prev = list_insn;
> >> +       next = list_insn->next;
> >> +       for (idx = 0; idx < len; idx++) {
> >> +               struct bpf_list_insn *node = kvzalloc(sizeof(*node),
> >> +                                                     GFP_KERNEL);
> >> +
> >> +               if (!node) {
> >> +                       /* Link what's allocated, so list destroyer could
> >> +                        * free them.
> >> +                        */
> >> +                       prev->next = next;
> >
> > Why this special handling, if you can just insert element so that list
> > is well-formed after each instruction?
>
> Good idea, just always do "node->next = next", the "prev->next = node" in
> next round will fix it.
>
> >
> >> +                       return ERR_PTR(-ENOMEM);
> >> +               }
> >> +
> >> +               node->insn = patch[idx];
> >> +               prev->next = node;
> >> +               prev = node;
> >
> > E.g.,
> >
> > node->next = next;
> > prev->next = node;
> > prev = node;
> >
> >> +       }
> >> +
> >> +       prev->next = next;
> >
> > And no need for this either.
> >
> >> +       return prev;
> >> +}
> >> +
> >> +struct bpf_list_insn *bpf_prepatch_list_insn(struct bpf_list_insn *list_insn,
> >> +                                            const struct bpf_insn *patch,
> >> +                                            u32 len)
> >
> > prepatch and patch functions should share the same logic.
> >
> > Prepend is just that - insert all instructions from buffer before current insns.
> > Patch -> replace current one with first instriction in a buffer, then
> > prepend remaining ones before the next instruction (so patch should
> > call info prepend, with adjusted count and array pointer).
>
> Ack, there indeed has quite a few things to simplify.
>
> >
> >> +{
> >> +       struct bpf_list_insn *prev, *node, *begin_node;
> >> +       u32 idx;
> >> +
> >> +       if (!len)
> >> +               return list_insn;
> >> +
> >> +       node = kvzalloc(sizeof(*node), GFP_KERNEL);
> >> +       if (!node)
> >> +               return ERR_PTR(-ENOMEM);
> >> +       node->insn = patch[0];
> >> +       begin_node = node;
> >> +       prev = node;
> >> +
> >> +       for (idx = 1; idx < len; idx++) {
> >> +               node = kvzalloc(sizeof(*node), GFP_KERNEL);
> >> +               if (!node) {
> >> +                       node = begin_node;
> >> +                       /* Release what's has been allocated. */
> >> +                       while (node) {
> >> +                               struct bpf_list_insn *next = node->next;
> >> +
> >> +                               kvfree(node);
> >> +                               node = next;
> >> +                       }
> >> +                       return ERR_PTR(-ENOMEM);
> >> +               }
> >> +               node->insn = patch[idx];
> >> +               prev->next = node;
> >> +               prev = node;
> >> +       }
> >> +
> >> +       prev->next = list_insn;
> >> +       return begin_node;
> >> +}
> >> +
> >>  void bpf_prog_kallsyms_del_subprogs(struct bpf_prog *fp)
> >>  {
> >>         int i;
> >> --
> >> 2.7.4
> >>
>

^ permalink raw reply

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
From: Andrii Nakryiko @ 2019-07-12 19:43 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, Edward Cree, Naveen N. Rao,
	Andrii Nakryiko, Jakub Kicinski, bpf, Networking, oss-drivers,
	Yonghong Song
In-Reply-To: <87r26w24v4.fsf@netronome.com>

On Thu, Jul 11, 2019 at 4:22 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
>
> Andrii Nakryiko writes:
>
> > On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> >>
> >> This is an RFC based on latest bpf-next about acclerating insn patching
> >> speed, it is now near the shape of final PATCH set, and we could see the
> >> changes migrating to list patching would brings, so send out for
> >> comments. Most of the info are in cover letter. I splitted the code in a
> >> way to show API migration more easily.
> >
> >
> > Hey Jiong,
> >
> >
> > Sorry, took me a while to get to this and learn more about instruction
> > patching. Overall this looks good and I think is a good direction.
> > I'll post high-level feedback here, and some more
> > implementation-specific ones in corresponding patches.
>
> Great, thanks very much for the feedbacks. Most of your feedbacks are
> hitting those pain points I exactly had ran into. For some of them, I
> thought similar solutions like yours, but failed due to various
> reasons. Let's go through them again, I could have missed some important
> things.
>
> Please see my replies below.

Thanks for thoughtful reply :)

>
> >>
> >> Test Results
> >> ===
> >>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
> >>     modes (interpreter, JIT, JIT with blinding).
> >>
> >>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
> >>     patching time from 5100s (nearly one and a half hour) to less than
> >>     0.5s for 1M insn patching.
> >>
> >> Known Issues
> >> ===
> >>   - The following warning is triggered when running scale test which
> >>     contains 1M insns and patching:
> >>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
> >>
> >>     This is caused by existing code, it can be reproduced on bpf-next
> >>     master with jit blinding enabled, then run scale unit test, it will
> >>     shown up after half an hour. After this set, patching is very fast, so
> >>     it shows up quickly.
> >>
> >>   - No line info adjustment support when doing insn delete, subprog adj
> >>     is with bug when doing insn delete as well. Generally, removal of insns
> >>     could possibly cause remove of entire line or subprog, therefore
> >>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
> >>     don't have good idea and clean code for integrating this into the
> >>     linearization code at the moment, will do more experimenting,
> >>     appreciate ideas and suggestions on this.
> >
> > Is there any specific problem to detect which line info to delete? Or
> > what am I missing besides careful implementation?
>
> Mostly line info and subprog info are range info which covers a range of
> insns. Deleting insns could causing you adjusting the range or removing one
> range entirely. subprog info could be fully recalcuated during
> linearization while line info I need some careful implementation and I
> failed to have clean code for this during linearization also as said no
> unit tests to help me understand whether the code is correct or not.
>

Ok, that's good that it's just about clean implementation. Try to
implement it as clearly as possible. Then post it here, and if it can
be improved someone (me?) will try to help to clean it up further.

Not a big expert on line info, so can't comment on that,
unfortunately. Maybe Yonghong can chime in (cc'ed)


> I will described this latter, spent too much time writing the following
> reply. Might worth an separate discussion thread.
>
> >>
> >>     Insn delete doesn't happen on normal programs, for example Cilium
> >>     benchmarks, and happens rarely on test_progs, so the test coverage is
> >>     not good. That's also why this RFC have a full pass on selftest with
> >>     this known issue.
> >
> > I hope you'll add test for deletion (and w/ corresponding line info)
> > in final patch set :)
>
> Will try. Need to spend some time on BTF format.
> >
> >>
> >>   - Could further use mem pool to accelerate the speed, changes are trivial
> >>     on top of this RFC, and could be 2x extra faster. Not included in this
> >>     RFC as reducing the algo complexity from quadratic to linear of insn
> >>     number is the first step.
> >
> > Honestly, I think that would add more complexity than necessary, and I
> > think we can further speed up performance without that, see below.
> >
> >>
> >> Background
> >> ===
> >> This RFC aims to accelerate BPF insn patching speed, patching means expand
> >> one bpf insn at any offset inside bpf prog into a set of new insns, or
> >> remove insns.
> >>
> >> At the moment, insn patching is quadratic of insn number, this is due to
> >> branch targets of jump insns needs to be adjusted, and the algo used is:
> >>
> >>   for insn inside prog
> >>     patch insn + regeneate bpf prog
> >>     for insn inside new prog
> >>       adjust jump target
> >>
> >> This is causing significant time spending when a bpf prog requires large
> >> amount of patching on different insns. Benchmarking shows it could take
> >> more than half minutes to finish patching when patching number is more
> >> than 50K, and the time spent could be more than one hour when patching
> >> number is around 1M.
> >>
> >>   15000   :    3s
> >>   45000   :   29s
> >>   95000   :  125s
> >>   195000  :  712s
> >>   1000000 : 5100s
> >>
> >> This RFC introduces new patching infrastructure. Before doing insn
> >> patching, insns in bpf prog are turned into a singly linked list, insert
> >> new insns just insert new list node, delete insns just set delete flag.
> >> And finally, the list is linearized back into array, and branch target
> >> adjustment is done for all jump insns during linearization. This algo
> >> brings the time complexity from quadratic to linear of insn number.
> >>
> >> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
> >> on medium sized prog, and for a 1M patching it reduce the time from 5100s
> >> to less than 0.5s.
> >>
> >> Patching API
> >> ===
> >> Insn patching could happen on two layers inside BPF. One is "core layer"
> >> where only BPF insns are patched. The other is "verification layer" where
> >> insns have corresponding aux info as well high level subprog info, so
> >> insn patching means aux info needs to be patched as well, and subprog info
> >> needs to be adjusted. BPF prog also has debug info associated, so line info
> >> should always be updated after insn patching.
> >>
> >> So, list creation, destroy, insert, delete is the same for both layer,
> >> but lineration is different. "verification layer" patching require extra
> >> work. Therefore the patch APIs are:
> >>
> >>    list creation:                bpf_create_list_insn
> >>    list patch:                   bpf_patch_list_insn
> >>    list pre-patch:               bpf_prepatch_list_insn
> >
> > I think pre-patch name is very confusing, until I read full
> > description I couldn't understand what it's supposed to be used for.
> > Speaking of bpf_patch_list_insn, patch is also generic enough to leave
> > me wondering whether instruction buffer is inserted after instruction,
> > or instruction is replaced with a bunch of instructions.
> >
> > So how about two more specific names:
> > bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
> > instruction with a list of patch instructions)
> > bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
> > one is pretty clear).
>
> My sense on English word is not great, will switch to above which indeed
> reads more clear.
>
> >>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
> >>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)
> >
> > These two functions are both quite involved, as well as share a lot of
> > common code. I'd rather have one linearize instruction, that takes env
> > as an optional parameter. If env is specified (which is the case for
> > all cases except for constant blinding pass), then adjust aux_data and
> > subprogs along the way.
>
> Two version of lineration and how to unify them was a painpoint to me. I
> thought to factor out some of the common code out, but it actually doesn't
> count much, the final size counting + insnsi resize parts are the same,
> then things start to diverge since the "Copy over insn" loop.
>
> verifier layer needs to copy and initialize aux data etc. And jump
> relocation is different. At core layer, the use case is JIT blinding which
> could expand an jump_imm insn into a and/or/jump_reg sequence, and the

Sorry, I didn't get what "could expand an jump_imm insn into a
and/or/jump_reg sequence", maybe you can clarify if I'm missing
something.

But from your cover letter description, core layer has no jumps at
all, while verifier has jumps inside patch buffer. So, if you support
jumps inside of patch buffer, it will automatically work for core
layer. Or what am I missing?

Just compared two version of linearize side by side. From what I can
see, unified version could look like this, high-level:

1. Count final insn count (but see my other suggestions how to avoid
that altogether). If not changed - exit.
2. Realloc insn buffer, copy just instructions (not aux_data yet).
Build idx_map, if necessary.
3. (if env) then bpf_patchable_insn has aux_data, so now do another
pass and copy it into resulting array.
4. (if env) Copy sub info. Though I'd see if we can just reuse old
ones and just adjust offsets. I'm not sure why we need to allocate new
array, subprogram count shouldn't change, right?
5. (common) Relocate jumps. Not clear why core layer doesn't care
about PATCHED (or, alternatively, why verifier layer cares). And
again, with targets pointer it will look totally different (and
simpler).
6. (if env) adjust subprogs
7. (common) Adjust prog's line info.

The devil is in the details, but I think this will still be better if
contained in one function if a bunch of `if (env)` checks. Still
pretty linear.

> jump_reg is at the end of the patch buffer, it should be relocated. While
> all use case in verifier layer, no jump in the prog will be patched and all
> new jumps in patch buffer will jump inside the buffer locally so no need to
> resolve.
>
> And yes we could unify them into one and control the diverge using
> argument, but then where to place the function is an issue. My
> understanding is verifier.c is designed to be on top of core.c and core.c
> should not reference and no need to be aware of any verifier specific data
> structures, for example env or bpf_aux_insn_data etc.

Func prototype where it is. Maybe forward-declare verifier env struct.
Implementation in verifier.c?

>
> So, in this RFC, I had choosed to write separate linerization function for
> core and verifier layer. Does this make sense?

See above. Let's still try to make it better.

>
> >
> > This would keep logic less duplicated and shouldn't complexity beyond
> > few null checks in few places.
> >
> >>    list destroy:                 bpf_destroy_list_insn
> >>
> >
> > I'd also add a macro foreach_list_insn instead of explicit for loops
> > in multiple places. That would also allow to skip deleted instructions
> > transparently.
> >
> >> list patch could change the insn at patch point, it will invalid the aux
> >
> > typo: invalid -> invalidate
>
> Ack.
>
> >
> >> info at patching point. list pre-patch insert new insns before patch point
> >> where the insn and associated aux info are not touched, it is used for
> >> example in convert_ctx_access when generating prologue.
> >>
> >> Typical API sequence for one patching pass:
> >>
> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
> >>    for (elem = list; elem; elem = elem->next)
> >>       patch_buf = gen_patch_buf_logic;
> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> >>    bpf_prog = bpf_linearize_list_insn(list)
> >>    bpf_destroy_list_insn(list)
> >>
> >> Several patching passes could also share the same list:
> >>
> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
> >>    for (elem = list; elem; elem = elem->next)
> >>       patch_buf = gen_patch_buf_logic1;
> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> >>    for (elem = list; elem; elem = elem->next)
> >>       patch_buf = gen_patch_buf_logic2;
> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
> >>    bpf_prog = bpf_linearize_list_insn(list)
> >>    bpf_destroy_list_insn(list)
> >>
> >> but note new inserted insns int early passes won't have aux info except
> >> zext info. So, if one patch pass requires all aux info updated and
> >> recalculated for all insns including those pathced, it should first
> >> linearize the old list, then re-create the list. The RFC always create and
> >> linearize the list for each migrated patching pass separately.
> >
> > I think we should do just one list creation, few passes of patching
> > and then linearize once. That will save quite a lot of memory
> > allocation and will speed up a lot of things. All the verifier
> > patching happens one after the other without any other functionality
> > in between, so there shouldn't be any problem.
>
> Yes, as mentioned above, it is possible and I had tried to do it in an very
> initial impl. IIRC convert_ctx_access + fixup_bpf_calls could share the
> same list, but then the 32-bit zero extension insertion pass requires
> aux.zext_dst set properly for all instructions including those patched

So zext_dst. Seems like it's easily calculatable, so doesn't seem like
it even needs to be accessed from aux_data.

But. I can see at least two ways to do this:
1. those patching passes that care about aux_data, should just do
extra check for NULL. Because when we adjust insns now, we just leave
zero-initialized aux_data, except for zext_dst and seen. So it's easy
to default to them if aux_data is NULL for patchable_insn.
2. just allocate and fill them out them when applying patch insns
buffer. It's not a duplication, we already fill them out during
patching today. So just do the same, except through malloc()'ed
pointer instead. At the end they will be copied into linear resulting
array during linearization (uniformly with non-patched insns).

> one which we need to linearize the list first (as we set zext_dst during
> linerization), or the other choice is we do the zext_dst initialization
> during bpf_patch_list_insn, but this then make bpf_patch_list_insn diverge
> between core and verifier layer.

List construction is much simpler, even if we have to have extra
check, similar to `if (env) { do_extra(); }`, IMO, it's fine.

>
> > As for aux_data. We can solve that even more simply and reliably by
> > storing a pointer along the struct bpf_list_insn
>
> This is exactly what I had implemented initially, but then the issue is how
> to handle aux_data for patched insn? IIRC I was leave it as a NULL pointer,
> but later found zext_dst info is required for all insns, so I end up
> duplicating zext_dst in bpf_list_insn.

See above. No duplication. You have a pointer. Whether aux_data is in
original array or was malloc()'ed, doesn't matter. But no duplication
of fields.

>
> This leads me worrying we need to keep duplicating fields there as soon as
> there is new similar requirements in future patching pass and I thought it
> might be better to just reference the aux_data inside env using orig_idx,
> this avoids duplicating information, but we need to make sure used fields
> inside aux_data for patched insn update-to-date during linearization or
> patching list.
>
> > (btw, how about calling it bpf_patchable_insn?).
>
> No preference, will use this one.
>
> > Here's how I propose to represent this patchable instruction:
> >
> > struct bpf_list_insn {
> >        struct bpf_insn insn;
> >        struct bpf_list_insn *next;
> >        struct bpf_list_insn *target;
> >        struct bpf_insn_aux_data *aux_data;
> >        s32 orig_idx; // can repurpose this to have three meanings:
> >                      // -2 - deleted
> >                      // -1 - patched/inserted insn
> >                      // >=0 - original idx
>
> I actually had experimented the -2/-1/0 trick, exactly the same number
> assignment :) IIRC the code was not clear compared with using flag, the
> reason seems to be:
>   1. we still need orig_idx of an patched insn somehow, meaning negate the
>      index.

Not following, original index with be >=0, no?


>   2. somehow somecode need to know whether one insn is deleted or patched
>      after the negation, so I end up with some ugly code.

So that's why you'll have constants with descriptive name for -2 and -1.

>
> Anyway, I might had not thought hard enough on this, I will retry using the
> special index instead of flag, hopefully I could have clean code this time.
>

Yeah, please try again. All those `orig_idx = insn->orig_idx - 1; if
(orig_idx >= 0) { ... }` are very confusing.

> > };
> >
> > The idea would be as follows:
> > 1. when creating original list, target pointer will point directly to
> > a patchable instruction wrapper for jumps/calls. This will allow to
> > stop tracking and re-calculating jump offsets and instruction indicies
> > until linearization.
>
> Not sure I have followed the idea of "target" pointer. At the moment we are
> using index mapping array (generated as by-product during coping insn).
>
> While the "target" pointer means to during list initialization, each jump
> insn will have target initialized to the list node of the converted jump
> destination insn, and all those non-jump insns are with NULL? Then during
> linearization you assign index to each list node (could be done as
> by-product of other pass) before insn coping which could then relocate the
> insn during the coping as the "target" would have final index calculated?
> Am I following correctly?

Yes, I think you are understanding correctly what I'm saying. For
implementation, you can do it in few ways, through few passes or with
some additional data, is less important. See what's cleanest.

>
> > 2. aux_data is also filled at that point. Later at linearization time
> > you'd just iterate over all the instructions in final order and copy
> > original aux_data, if it's present. And then just repace env's
> > aux_data array at the end, should be very simple and fast.
>
> As explained, I am worried making aux_data a pointer will causing
> duplicating some fields into list_insn if the fields are required for
> patched insns.

Addressed above, I don't think there will be any duplication, because
we pass aux_data by pointer.

>
> > 3. during fix_bpf_calls, zext, ctx rewrite passes, we'll reuse the
> > same list of instructions and those passes will just keep inserting
> > instruction buffers. Given we have restriction that all the jumps are
> > only within patch buffer, it will be trivial to construct proper
> > patchable instruction wrappers for newly added instructions, with NULL
> > for aux_data and possibly non-NULL target (if it's a JMP insn).
> > 4. After those passes, linearize, adjust subprogs (for this you'll
> > probably still need to create index mapping, right?), copy or create
> > new aux_data.
> > 5. Done.
> >
> > What do you think? I think this should be overall simpler and faster.
> > But let me know if I'm missing something.
>
> Thanks for all these thoughts, they are very good suggestions and reminds
> me to revisit some points I had forgotten. I will do the following things:
>
>   1. retry the negative index solution to eliminate flag if the result code
>      could be clean.
>   2. the "target" pointer seems make sense, it makes list_insn bigger but
>      normally space trade with time, so I will try to implement it to see
>      how the code looks like.
>   3. I still have concerns on making aux_data as pointer. Mostly due to
>      patched insn will have NULL pointer and in case aux info of patched
>      insn is required, we need to duplicate info inside list_insn. For
>      example 32-bit zext opt requires zext_dst.
>


So one more thing I wanted to suggest. I'll try to keep high-level
suggestions here.

What about having a wrapper for patchable_insn list, where you can
store some additional data, like final count and whatever else. It
will eliminate some passes (counting) and will make list handling
easier (because you can have a dummy head pointer, so no special
handling of first element, you had this concern in patch #1, I
believe). But it will be clear if it's beneficial once implemented.

> Regards,
> Jiong
>
> >>
> >> Compared with old patching code, this new infrastructure has much less core
> >> code, even though the final code has a couple of extra lines but that is
> >> mostly due to for list based infrastructure, we need to do more error
> >> checks, so the list and associated aux data structure could be freed when
> >> errors happens.
> >>
> >> Patching Restrictions
> >> ===
> >>   - For core layer, the linearization assume no new jumps inside patch buf.
> >>     Currently, the only user of this layer is jit blinding.
> >>   - For verifier layer, there could be new jumps inside patch buf, but
> >>     they should have branch target resolved themselves, meaning new jumps
> >>     doesn't jump to insns out of the patch buf. This is the case for all
> >>     existing verifier layer users.
> >>   - bpf_insn_aux_data for all patched insns including the one at patch
> >>     point are invalidated, only 32-bit zext info will be recalcuated.
> >>     If the aux data of insn at patch point needs to be retained, it is
> >>     purely insn insertion, so need to use the pre-patch API.
> >>
> >> I plan to send out a PATCH set once I finished insn deletion line info adj
> >> support, please have a looks at this RFC, and appreciate feedbacks.
> >>
> >> Jiong Wang (8):
> >>   bpf: introducing list based insn patching infra to core layer
> >>   bpf: extend list based insn patching infra to verification layer
> >>   bpf: migrate jit blinding to list patching infra
> >>   bpf: migrate convert_ctx_accesses to list patching infra
> >>   bpf: migrate fixup_bpf_calls to list patching infra
> >>   bpf: migrate zero extension opt to list patching infra
> >>   bpf: migrate insn remove to list patching infra
> >>   bpf: delete all those code around old insn patching infrastructure
> >>
> >>  include/linux/bpf_verifier.h |   1 -
> >>  include/linux/filter.h       |  27 +-
> >>  kernel/bpf/core.c            | 431 +++++++++++++++++-----------
> >>  kernel/bpf/verifier.c        | 649 +++++++++++++++++++------------------------
> >>  4 files changed, 580 insertions(+), 528 deletions(-)
> >>
> >> --
> >> 2.7.4
> >>
>

^ permalink raw reply

* Re: [PATCH v1 1/6] rcu: Add support for consolidated-RCU reader checking
From: Joel Fernandes @ 2019-07-12 19:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Josh Triplett, keescook, kernel-hardening,
	Lai Jiangshan, Len Brown, linux-acpi, linux-pci, linux-pm,
	Mathieu Desnoyers, neilb, netdev, oleg, Pavel Machek,
	Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190712174630.GX26519@linux.ibm.com>

On Fri, Jul 12, 2019 at 10:46:30AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 12, 2019 at 01:06:31PM -0400, Joel Fernandes wrote:
> > On Fri, Jul 12, 2019 at 09:45:31AM -0700, Paul E. McKenney wrote:
> > > On Fri, Jul 12, 2019 at 11:10:51AM -0400, Joel Fernandes wrote:
> > > > On Fri, Jul 12, 2019 at 01:11:25PM +0200, Peter Zijlstra wrote:
> > > > > On Thu, Jul 11, 2019 at 07:43:56PM -0400, Joel Fernandes (Google) wrote:
> > > > > > +int rcu_read_lock_any_held(void)
> > > > > > +{
> > > > > > +	int lockdep_opinion = 0;
> > > > > > +
> > > > > > +	if (!debug_lockdep_rcu_enabled())
> > > > > > +		return 1;
> > > > > > +	if (!rcu_is_watching())
> > > > > > +		return 0;
> > > > > > +	if (!rcu_lockdep_current_cpu_online())
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	/* Preemptible RCU flavor */
> > > > > > +	if (lock_is_held(&rcu_lock_map))
> > > > > 
> > > > > you forgot debug_locks here.
> > > > 
> > > > Actually, it turns out debug_locks checking is not even needed. If
> > > > debug_locks == 0, then debug_lockdep_rcu_enabled() returns 0 and we would not
> > > > get to this point.
> > > > 
> > > > > > +		return 1;
> > > > > > +
> > > > > > +	/* BH flavor */
> > > > > > +	if (in_softirq() || irqs_disabled())
> > > > > 
> > > > > I'm not sure I'd put irqs_disabled() under BH, also this entire
> > > > > condition is superfluous, see below.
> > > > > 
> > > > > > +		return 1;
> > > > > > +
> > > > > > +	/* Sched flavor */
> > > > > > +	if (debug_locks)
> > > > > > +		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
> > > > > > +	return lockdep_opinion || !preemptible();
> > > > > 
> > > > > that !preemptible() turns into:
> > > > > 
> > > > >   !(preempt_count()==0 && !irqs_disabled())
> > > > > 
> > > > > which is:
> > > > > 
> > > > >   preempt_count() != 0 || irqs_disabled()
> > > > > 
> > > > > and already includes irqs_disabled() and in_softirq().
> > > > > 
> > > > > > +}
> > > > > 
> > > > > So maybe something lke:
> > > > > 
> > > > > 	if (debug_locks && (lock_is_held(&rcu_lock_map) ||
> > > > > 			    lock_is_held(&rcu_sched_lock_map)))
> > > > > 		return true;
> > > > 
> > > > Agreed, I will do it this way (without the debug_locks) like:
> > > > 
> > > > ---8<-----------------------
> > > > 
> > > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > > index ba861d1716d3..339aebc330db 100644
> > > > --- a/kernel/rcu/update.c
> > > > +++ b/kernel/rcu/update.c
> > > > @@ -296,27 +296,15 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
> > > >  
> > > >  int rcu_read_lock_any_held(void)
> > > >  {
> > > > -	int lockdep_opinion = 0;
> > > > -
> > > >  	if (!debug_lockdep_rcu_enabled())
> > > >  		return 1;
> > > >  	if (!rcu_is_watching())
> > > >  		return 0;
> > > >  	if (!rcu_lockdep_current_cpu_online())
> > > >  		return 0;
> > > > -
> > > > -	/* Preemptible RCU flavor */
> > > > -	if (lock_is_held(&rcu_lock_map))
> > > > -		return 1;
> > > > -
> > > > -	/* BH flavor */
> > > > -	if (in_softirq() || irqs_disabled())
> > > > -		return 1;
> > > > -
> > > > -	/* Sched flavor */
> > > > -	if (debug_locks)
> > > > -		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
> > > > -	return lockdep_opinion || !preemptible();
> > > > +	if (lock_is_held(&rcu_lock_map) || lock_is_held(&rcu_sched_lock_map))
> > > 
> > > OK, I will bite...  Why not also lock_is_held(&rcu_bh_lock_map)?
> > 
> > Hmm, I was borrowing the strategy from rcu_read_lock_bh_held() which does not
> > check for a lock held in this map.
> > 
> > Honestly, even  lock_is_held(&rcu_sched_lock_map) seems unnecessary per-se
> > since !preemptible() will catch that? rcu_read_lock_sched() disables
> > preemption already, so lockdep's opinion of the matter seems redundant there.
> 
> Good point!  At least as long as the lockdep splats list RCU-bh among
> the locks held, which they did last I checked.
> 
> Of course, you could make the same argument for getting rid of
> rcu_sched_lock_map.  Does it make sense to have the one without
> the other?

It probably makes it inconsistent in the least. I will add the check for
the rcu_bh_lock_map in a separate patch, if that's Ok with you - since I also
want to update the rcu_read_lock_bh_held() logic in the same patch.

That rcu_read_lock_bh_held() could also just return !preemptible as Peter
suggested for the bh case.

> > Sorry I already sent out patches again before seeing your comment but I can
> > rework and resend them based on any other suggestions.
> 
> Not a problem!

Thanks. Depending on whether there is any other feedback, I will work on the
bh_ stuff as a separate patch on top of this series, or work it into the next
series revision if I'm reposting. Hopefully that sounds Ok to you.

thanks,

 - Joel



^ permalink raw reply

* [PATCH] be2net: fix adapter->big_page_size miscaculation
From: Qian Cai @ 2019-07-12 19:23 UTC (permalink / raw)
  To: davem
  Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna, somnath.kotur,
	arnd, dhowells, hpa, netdev, linux-arch, linux-kernel, Qian Cai

The commit d66acc39c7ce ("bitops: Optimise get_order()") introduced a
problem for the be2net driver as "rx_frag_size" could be a module
parameter that can be changed while loading the module. That commit
checks __builtin_constant_p() first in get_order() which cause
"adapter->big_page_size" to be assigned a value based on the
the default "rx_frag_size" value at the compilation time. It also
generate a compilation warning,

In file included from ./arch/powerpc/include/asm/page_64.h:107,
                 from ./arch/powerpc/include/asm/page.h:242,
                 from ./arch/powerpc/include/asm/mmu.h:132,
                 from ./arch/powerpc/include/asm/lppaca.h:47,
                 from ./arch/powerpc/include/asm/paca.h:17,
                 from ./arch/powerpc/include/asm/current.h:13,
                 from ./include/linux/thread_info.h:21,
                 from ./arch/powerpc/include/asm/processor.h:39,
                 from ./include/linux/prefetch.h:15,
                 from drivers/net/ethernet/emulex/benet/be_main.c:14:
drivers/net/ethernet/emulex/benet/be_main.c: In function
'be_rx_cqs_create':
./include/asm-generic/getorder.h:54:9: warning: comparison is always
true due to limited range of data type [-Wtype-limits]
   (((n) < (1UL << PAGE_SHIFT)) ? 0 :  \
         ^
drivers/net/ethernet/emulex/benet/be_main.c:3138:33: note: in expansion
of macro 'get_order'
  adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE;
                                 ^~~~~~~~~

Fix it by using __get_order() instead which will calculate in runtime.

Fixes: d66acc39c7ce ("bitops: Optimise get_order()")
Signed-off-by: Qian Cai <cai@lca.pw>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 82015c8a5ed7..db13e714df7c 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3135,7 +3135,7 @@ static int be_rx_cqs_create(struct be_adapter *adapter)
 	if (adapter->num_rx_qs == 0)
 		adapter->num_rx_qs = 1;
 
-	adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE;
+	adapter->big_page_size = (1 << __get_order(rx_frag_size)) * PAGE_SIZE;
 	for_all_rx_queues(adapter, rxo, i) {
 		rxo->adapter = adapter;
 		cq = &rxo->cq;
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH] rtlwifi: btcoex: fix issue possible condition with no effect (if == else)
From: Hariprasad Kelam @ 2019-07-12 19:15 UTC (permalink / raw)
  To: Ping-Ke Shih, Kalle Valo, David S. Miller, YueHaibing,
	Hariprasad Kelam, Larry Finger, Nathan Chancellor, linux-wireless,
	netdev, linux-kernel

fix below issue reported by coccicheck
drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c:514:1-3:
WARNING: possible condition with no effect (if == else)

Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
---
 drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
index 152242a..191dafd0 100644
--- a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
@@ -509,13 +509,7 @@ static u32 halbtc_get_wifi_link_status(struct btc_coexist *btcoexist)
 
 static s32 halbtc_get_wifi_rssi(struct rtl_priv *rtlpriv)
 {
-	int undec_sm_pwdb = 0;
-
-	if (rtlpriv->mac80211.link_state >= MAC80211_LINKED)
-		undec_sm_pwdb = rtlpriv->dm.undec_sm_pwdb;
-	else /* associated entry pwdb */
-		undec_sm_pwdb = rtlpriv->dm.undec_sm_pwdb;
-	return undec_sm_pwdb;
+	return rtlpriv->dm.undec_sm_pwdb;
 }
 
 static bool halbtc_get(void *void_btcoexist, u8 get_type, void *out_buf)
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH v3 bpf 0/3] fix BTF verification size resolution
From: Yonghong Song @ 2019-07-12 18:47 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov, daniel@iogearbox.net
  Cc: andrii.nakryiko@gmail.com, Kernel Team
In-Reply-To: <20190712172557.4039121-1-andriin@fb.com>



On 7/12/19 10:25 AM, Andrii Nakryiko wrote:
> BTF size resolution logic isn't always resolving type size correctly, leading
> to erroneous map creation failures due to value size mismatch.
> 
> This patch set:
> 1. fixes the issue (patch #1);
> 2. adds tests for trickier cases (patch #2);
> 3. and converts few test cases utilizing BTF-defined maps, that previously
>     couldn't use typedef'ed arrays due to kernel bug (patch #3).
> 
> Andrii Nakryiko (3):
>    bpf: fix BTF verifier size resolution logic
>    selftests/bpf: add trickier size resolution tests
>    selftests/bpf: use typedef'ed arrays as map values

Looks good to me. Ack for the whole series.
Acked-by: Yonghong Song <yhs@fb.com>

> 
>   kernel/bpf/btf.c                              | 19 ++--
>   .../bpf/progs/test_get_stack_rawtp.c          |  3 +-
>   .../bpf/progs/test_stacktrace_build_id.c      |  3 +-
>   .../selftests/bpf/progs/test_stacktrace_map.c |  2 +-
>   tools/testing/selftests/bpf/test_btf.c        | 88 +++++++++++++++++++
>   5 files changed, 104 insertions(+), 11 deletions(-)
> 

^ permalink raw reply

* Re: [PATCH net-next v3] net/mlx5e: Convert single case statement switch statements into if statements
From: David Miller @ 2019-07-12 18:37 UTC (permalink / raw)
  To: natechancellor
  Cc: saeedm, leon, borisp, netdev, linux-rdma, linux-kernel,
	clang-built-linux, ndesaulniers
In-Reply-To: <20190710060614.6155-1-natechancellor@gmail.com>

From: Nathan Chancellor <natechancellor@gmail.com>
Date: Tue,  9 Jul 2019 23:06:15 -0700

> During the review of commit 1ff2f0fa450e ("net/mlx5e: Return in default
> case statement in tx_post_resync_params"), Leon and Nick pointed out
> that the switch statements can be converted to single if statements
> that return early so that the code is easier to follow.
> 
> Suggested-by: Leon Romanovsky <leon@kernel.org>
> Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>

Applied, thanks for following up Nathan.

^ permalink raw reply

* Re: [RFC PATCH net-next 0/3] net: batched receive in GRO path
From: David Miller @ 2019-07-12 18:30 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ecree, pabeni, netdev
In-Reply-To: <d7ca6e7a-b80e-12e8-9050-c25b8b92bf26@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 12 Jul 2019 18:48:29 +0200

> I should have mentioned that we have a patch that I forgot to
> upstream adding the PSH flag to all TSO packets, meaning the
> receiver can automatically learn the boundary of a GRO packet and
> not have to wait for the napi->poll() end (busypolling or not)

Wow, I thought we were already doing this...

^ permalink raw reply

* Re: [PATCH v3 0/7] Convert skb_frag_t to bio_vec
From: David Miller @ 2019-07-12 18:27 UTC (permalink / raw)
  To: willy; +Cc: hch, netdev
In-Reply-To: <20190712134345.19767-1-willy@infradead.org>

From: Matthew Wilcox <willy@infradead.org>
Date: Fri, 12 Jul 2019 06:43:38 -0700

> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> The skb_frag_t and bio_vec are fundamentally the same (page, offset,
> length) tuple.  This patch series unifies the two, leaving the
> skb_frag_t typedef in place.  This has the immediate advantage that
> we already have iov_iter support for bvecs and don't need to add
> support for iterating skbuffs.  It enables a long-term plan to use
> bvecs more broadly within the kernel and should make network-storage
> drivers able to do less work converting between skbuffs and biovecs.
> 
> It will consume more memory on 32-bit kernels.  If that proves
> problematic, we can look at ways of addressing it.
> 
> v3: Rebase on latest Linus with net-next merged.
>   - Reorder the uncontroversial 'Use skb accessors' patches first so you
>     can apply just those two if you want to hold off on the full
>     conversion.
>   - Convert all the users of 'struct skb_frag_struct' to skb_frag_t.

I have no objections to this series, please resubmit it (taking into
consideration any more feedback you get) when net-next opens back
up.

^ permalink raw reply

* Re: [PATCH] [net-next, netfilter] mlx5: avoid unused variable warning
From: Saeed Mahameed @ 2019-07-12 18:25 UTC (permalink / raw)
  To: davem@davemloft.net, arnd@arndb.de, leon@kernel.org
  Cc: Aya Levin, Maxim Mikityanskiy, Or Gerlitz, pablo@netfilter.org,
	Tariq Toukan, linux-rdma@vger.kernel.org,
	jakub.kicinski@netronome.com, Eran Ben Elisha,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190712085823.4111911-1-arnd@arndb.de>

On Fri, 2019-07-12 at 10:57 +0200, Arnd Bergmann wrote:
> Without CONFIG_MLX5_ESWITCH we get a harmless warning:
> 
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3467:21: error:
> unused variable 'priv' [-Werror,-Wunused-variable]
>         struct mlx5e_priv *priv = netdev_priv(dev);
> 

Hi Arnd,

thanks for your patch, a similar patch that addresses this issue was
already submitted and applied to net-next [1]

[1] https://www.spinics.net/lists/netdev/msg585433.html

> Hide the declaration in the same #ifdef as its usage.
> 
> Fixes: 4e95bc268b91 ("net: flow_offload: add
> flow_block_cb_setup_simple()")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 6d0ae87c8ded..b562ba904ea1 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -3464,7 +3464,9 @@ static LIST_HEAD(mlx5e_block_cb_list);
>  static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type
> type,
>  			  void *type_data)
>  {
> +#ifdef CONFIG_MLX5_ESWITCH
>  	struct mlx5e_priv *priv = netdev_priv(dev);
> +#endif
>  
>  	switch (type) {
>  #ifdef CONFIG_MLX5_ESWITCH

^ permalink raw reply

* Re: [PATCH bpf] selftests/bpf: fix test_send_signal_nmi on s390
From: Andrii Nakryiko @ 2019-07-12 18:22 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: bpf, Networking, gor, heiko.carstens
In-Reply-To: <20190712174528.1767-1-iii@linux.ibm.com>

On Fri, Jul 12, 2019 at 10:46 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> Many s390 setups (most notably, KVM guests) do not have access to
> hardware performance events.
>
> Therefore, use the software event instead.
>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> Acked-by: Vasily Gorbik <gor@linux.ibm.com>
> ---
>  tools/testing/selftests/bpf/prog_tests/send_signal.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> index 67cea1686305..4a45ea0b8448 100644
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -176,10 +176,19 @@ static int test_send_signal_tracepoint(void)
>  static int test_send_signal_nmi(void)
>  {
>         struct perf_event_attr attr = {
> +#if defined(__s390__)
> +               /* Many s390 setups (most notably, KVM guests) do not have
> +                * access to hardware performance events.
> +                */
> +               .sample_period = 1,
> +               .type = PERF_TYPE_SOFTWARE,
> +               .config = PERF_COUNT_SW_CPU_CLOCK,
> +#else

Is there any harm in switching all archs to software event? I'd rather
avoid all those special arch cases, which will be really hard to test
for people without direct access to them.

>                 .sample_freq = 50,
>                 .freq = 1,
>                 .type = PERF_TYPE_HARDWARE,
>                 .config = PERF_COUNT_HW_CPU_CYCLES,
> +#endif
>         };
>
>         return test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT, "perf_event");
> --
> 2.21.0
>

^ permalink raw reply

* Re: [bpf-next v3 11/12] selftests/bpf: Add tests for bpf_prog_test_run for perf events progs
From: Andrii Nakryiko @ 2019-07-12 17:49 UTC (permalink / raw)
  To: Krzesimir Nowak
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <CAGGp+cGMnumMx+GnKbD_ty1C+UWib70s0oBzqdS-=mA-L0jyHA@mail.gmail.com>

On Fri, Jul 12, 2019 at 10:37 AM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
>
> On Fri, Jul 12, 2019 at 2:38 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jul 8, 2019 at 3:42 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
> > >
> > > The tests check if ctx and data are correctly prepared from ctx_in and
> > > data_in, so accessing the ctx and using the bpf_perf_prog_read_value
> > > work as expected.
> > >
> >
> > These are x86_64-specific tests, aren't they? Should probably guard
> > them behind #ifdef's.
>
> Yeah, they are x86_64 specific, because pt_regs are arch specific. I
> was wondering what to do here in the cover letter. Ifdef? Ifdef and
> cover also other arches (please no)? Do some weird tricks with
> overriding the definition of pt_regs? Else?

So one way to go about this would be to use bpf_helpers.h's
PT_REGS_PARM{1-5} and PT_REGS_RC, which seem to be define for all
"supported" platforms. You won't be testing all possible registers,
but those that are most commonly used by BPF programs (to get input
params and func result) would be tested, which is probably the most
important one. That way your test will be arch-agnostic.

>
> >
> > > Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> > > ---
> > >  tools/testing/selftests/bpf/test_verifier.c   | 48 ++++++++++
> > >  .../selftests/bpf/verifier/perf_event_run.c   | 96 +++++++++++++++++++
> > >  2 files changed, 144 insertions(+)
> > >  create mode 100644 tools/testing/selftests/bpf/verifier/perf_event_run.c
> > >
> > > diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> > > index 6f124cc4ee34..484ea8842b06 100644
> > > --- a/tools/testing/selftests/bpf/test_verifier.c
> > > +++ b/tools/testing/selftests/bpf/test_verifier.c
> > > @@ -295,6 +295,54 @@ static void bpf_fill_scale(struct bpf_test *self)
> > >         }
> > >  }
> > >
> > > +static void bpf_fill_perf_event_test_run_check(struct bpf_test *self)
> > > +{
> > > +       compiletime_assert(
> > > +               sizeof(struct bpf_perf_event_data) <= TEST_CTX_LEN,
> > > +               "buffer for ctx is too short to fit struct bpf_perf_event_data");
> > > +       compiletime_assert(
> > > +               sizeof(struct bpf_perf_event_value) <= TEST_DATA_LEN,
> > > +               "buffer for data is too short to fit struct bpf_perf_event_value");
> > > +
> > > +       struct bpf_perf_event_data ctx = {
> > > +               .regs = (bpf_user_pt_regs_t) {
> > > +                       .r15 = 1,
> > > +                       .r14 = 2,
> > > +                       .r13 = 3,
> > > +                       .r12 = 4,
> > > +                       .rbp = 5,
> > > +                       .rbx = 6,
> > > +                       .r11 = 7,
> > > +                       .r10 = 8,
> > > +                       .r9 = 9,
> > > +                       .r8 = 10,
> > > +                       .rax = 11,
> > > +                       .rcx = 12,
> > > +                       .rdx = 13,
> > > +                       .rsi = 14,
> > > +                       .rdi = 15,
> > > +                       .orig_rax = 16,
> > > +                       .rip = 17,
> > > +                       .cs = 18,
> > > +                       .eflags = 19,
> > > +                       .rsp = 20,
> > > +                       .ss = 21,
> > > +               },
> > > +               .sample_period = 1,
> > > +               .addr = 2,
> > > +       };
> > > +       struct bpf_perf_event_value data = {
> > > +               .counter = 1,
> > > +               .enabled = 2,
> > > +               .running = 3,
> > > +       };
> > > +
> > > +       memcpy(self->ctx, &ctx, sizeof(ctx));
> > > +       memcpy(self->data, &data, sizeof(data));
> >
> > Just curious, just assignment didn't work?
> >
> > > +       free(self->fill_insns);
> > > +       self->fill_insns = NULL;
> > > +}
> > > +
> > >  /* BPF_SK_LOOKUP contains 13 instructions, if you need to fix up maps */
> > >  #define BPF_SK_LOOKUP(func)                                            \
> > >         /* struct bpf_sock_tuple tuple = {} */                          \
> > > diff --git a/tools/testing/selftests/bpf/verifier/perf_event_run.c b/tools/testing/selftests/bpf/verifier/perf_event_run.c
> > > new file mode 100644
> > > index 000000000000..3f877458a7f8
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/verifier/perf_event_run.c
> > > @@ -0,0 +1,96 @@
> > > +#define PER_LOAD_AND_CHECK_PTREG(PT_REG_FIELD, VALUE)                  \
> > > +       PER_LOAD_AND_CHECK_CTX(offsetof(bpf_user_pt_regs_t, PT_REG_FIELD), VALUE)
> > > +#define PER_LOAD_AND_CHECK_EVENT(PED_FIELD, VALUE)                     \
> > > +       PER_LOAD_AND_CHECK_CTX(offsetof(struct bpf_perf_event_data, PED_FIELD), VALUE)
> > > +#define PER_LOAD_AND_CHECK_CTX(OFFSET, VALUE)                          \
> > > +       PER_LOAD_AND_CHECK_64(BPF_REG_4, BPF_REG_1, OFFSET, VALUE)
> > > +#define PER_LOAD_AND_CHECK_VALUE(PEV_FIELD, VALUE)                     \
> > > +       PER_LOAD_AND_CHECK_64(BPF_REG_7, BPF_REG_6, offsetof(struct bpf_perf_event_value, PEV_FIELD), VALUE)
> >
> > Wrap long lines? Try also running scripts/checkpatch.pl again these
> > files you are modifying.
>
> Will wrap. Checkpatch was also complaining about complex macro not
> being inside parens, but I can't see how to wrap it in parens and keep
> it working at the same time.
>
> >
> > > +#define PER_LOAD_AND_CHECK_64(DST, SRC, OFFSET, VALUE)                 \
> > > +       BPF_LDX_MEM(BPF_DW, DST, SRC, OFFSET),                          \
> > > +       BPF_JMP_IMM(BPF_JEQ, DST, VALUE, 2),                            \
> > > +       BPF_MOV64_IMM(BPF_REG_0, VALUE),                                \
> > > +       BPF_EXIT_INSN()
> > > +
> > > +{
> > > +       "check if regs contain expected values",
> > > +       .insns = {
> > > +       PER_LOAD_AND_CHECK_PTREG(r15, 1),
> > > +       PER_LOAD_AND_CHECK_PTREG(r14, 2),
> > > +       PER_LOAD_AND_CHECK_PTREG(r13, 3),
> > > +       PER_LOAD_AND_CHECK_PTREG(r12, 4),
> > > +       PER_LOAD_AND_CHECK_PTREG(rbp, 5),
> > > +       PER_LOAD_AND_CHECK_PTREG(rbx, 6),
> > > +       PER_LOAD_AND_CHECK_PTREG(r11, 7),
> > > +       PER_LOAD_AND_CHECK_PTREG(r10, 8),
> > > +       PER_LOAD_AND_CHECK_PTREG(r9, 9),
> > > +       PER_LOAD_AND_CHECK_PTREG(r8, 10),
> > > +       PER_LOAD_AND_CHECK_PTREG(rax, 11),
> > > +       PER_LOAD_AND_CHECK_PTREG(rcx, 12),
> > > +       PER_LOAD_AND_CHECK_PTREG(rdx, 13),
> > > +       PER_LOAD_AND_CHECK_PTREG(rsi, 14),
> > > +       PER_LOAD_AND_CHECK_PTREG(rdi, 15),
> > > +       PER_LOAD_AND_CHECK_PTREG(orig_rax, 16),
> > > +       PER_LOAD_AND_CHECK_PTREG(rip, 17),
> > > +       PER_LOAD_AND_CHECK_PTREG(cs, 18),
> > > +       PER_LOAD_AND_CHECK_PTREG(eflags, 19),
> > > +       PER_LOAD_AND_CHECK_PTREG(rsp, 20),
> > > +       PER_LOAD_AND_CHECK_PTREG(ss, 21),
> > > +       BPF_MOV64_IMM(BPF_REG_0, 0),
> > > +       BPF_EXIT_INSN(),
> > > +       },
> > > +       .result = ACCEPT,
> > > +       .prog_type = BPF_PROG_TYPE_PERF_EVENT,
> > > +       .ctx_len = sizeof(struct bpf_perf_event_data),
> > > +       .data_len = sizeof(struct bpf_perf_event_value),
> > > +       .fill_helper = bpf_fill_perf_event_test_run_check,
> > > +       .override_data_out_len = true,
> > > +},
> > > +{
> > > +       "check if sample period and addr contain expected values",
> > > +       .insns = {
> > > +       PER_LOAD_AND_CHECK_EVENT(sample_period, 1),
> > > +       PER_LOAD_AND_CHECK_EVENT(addr, 2),
> > > +       BPF_MOV64_IMM(BPF_REG_0, 0),
> > > +       BPF_EXIT_INSN(),
> > > +       },
> > > +       .result = ACCEPT,
> > > +       .prog_type = BPF_PROG_TYPE_PERF_EVENT,
> > > +       .ctx_len = sizeof(struct bpf_perf_event_data),
> > > +       .data_len = sizeof(struct bpf_perf_event_value),
> > > +       .fill_helper = bpf_fill_perf_event_test_run_check,
> > > +       .override_data_out_len = true,
> > > +},
> > > +{
> > > +       "check if bpf_perf_prog_read_value returns expected data",
> > > +       .insns = {
> > > +       // allocate space for a struct bpf_perf_event_value
> > > +       BPF_MOV64_REG(BPF_REG_6, BPF_REG_10),
> > > +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -(int)sizeof(struct bpf_perf_event_value)),
> > > +       // prepare parameters for bpf_perf_prog_read_value(ctx, struct bpf_perf_event_value*, u32)
> > > +       // BPF_REG_1 already contains the context
> > > +       BPF_MOV64_REG(BPF_REG_2, BPF_REG_6),
> > > +       BPF_MOV64_IMM(BPF_REG_3, sizeof(struct bpf_perf_event_value)),
> > > +       BPF_EMIT_CALL(BPF_FUNC_perf_prog_read_value),
> > > +       // check the return value
> > > +       BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
> > > +       BPF_EXIT_INSN(),
> > > +       // check if the fields match the expected values
> >
> > Use /* */ comments.
>
> Oops. Will fix.
>
> >
> > > +       PER_LOAD_AND_CHECK_VALUE(counter, 1),
> > > +       PER_LOAD_AND_CHECK_VALUE(enabled, 2),
> > > +       PER_LOAD_AND_CHECK_VALUE(running, 3),
> > > +       BPF_MOV64_IMM(BPF_REG_0, 0),
> > > +       BPF_EXIT_INSN(),
> > > +       },
> > > +       .result = ACCEPT,
> > > +       .prog_type = BPF_PROG_TYPE_PERF_EVENT,
> > > +       .ctx_len = sizeof(struct bpf_perf_event_data),
> > > +       .data_len = sizeof(struct bpf_perf_event_value),
> > > +       .fill_helper = bpf_fill_perf_event_test_run_check,
> > > +       .override_data_out_len = true,
> > > +},
> > > +#undef PER_LOAD_AND_CHECK_64
> > > +#undef PER_LOAD_AND_CHECK_VALUE
> > > +#undef PER_LOAD_AND_CHECK_CTX
> > > +#undef PER_LOAD_AND_CHECK_EVENT
> > > +#undef PER_LOAD_AND_CHECK_PTREG
> > > --
> > > 2.20.1
> > >
>
>
>
> --
> Kinvolk GmbH | Adalbertstr.6a, 10999 Berlin | tel: +491755589364
> Geschäftsführer/Directors: Alban Crequy, Chris Kühl, Iago López Galeiras
> Registergericht/Court of registration: Amtsgericht Charlottenburg
> Registernummer/Registration number: HRB 171414 B
> Ust-ID-Nummer/VAT ID number: DE302207000

^ permalink raw reply

* Re: [PATCH v1 1/6] rcu: Add support for consolidated-RCU reader checking
From: Paul E. McKenney @ 2019-07-12 17:46 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Peter Zijlstra, linux-kernel, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Josh Triplett, keescook, kernel-hardening,
	Lai Jiangshan, Len Brown, linux-acpi, linux-pci, linux-pm,
	Mathieu Desnoyers, neilb, netdev, oleg, Pavel Machek,
	Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190712170631.GA111598@google.com>

On Fri, Jul 12, 2019 at 01:06:31PM -0400, Joel Fernandes wrote:
> On Fri, Jul 12, 2019 at 09:45:31AM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 12, 2019 at 11:10:51AM -0400, Joel Fernandes wrote:
> > > On Fri, Jul 12, 2019 at 01:11:25PM +0200, Peter Zijlstra wrote:
> > > > On Thu, Jul 11, 2019 at 07:43:56PM -0400, Joel Fernandes (Google) wrote:
> > > > > +int rcu_read_lock_any_held(void)
> > > > > +{
> > > > > +	int lockdep_opinion = 0;
> > > > > +
> > > > > +	if (!debug_lockdep_rcu_enabled())
> > > > > +		return 1;
> > > > > +	if (!rcu_is_watching())
> > > > > +		return 0;
> > > > > +	if (!rcu_lockdep_current_cpu_online())
> > > > > +		return 0;
> > > > > +
> > > > > +	/* Preemptible RCU flavor */
> > > > > +	if (lock_is_held(&rcu_lock_map))
> > > > 
> > > > you forgot debug_locks here.
> > > 
> > > Actually, it turns out debug_locks checking is not even needed. If
> > > debug_locks == 0, then debug_lockdep_rcu_enabled() returns 0 and we would not
> > > get to this point.
> > > 
> > > > > +		return 1;
> > > > > +
> > > > > +	/* BH flavor */
> > > > > +	if (in_softirq() || irqs_disabled())
> > > > 
> > > > I'm not sure I'd put irqs_disabled() under BH, also this entire
> > > > condition is superfluous, see below.
> > > > 
> > > > > +		return 1;
> > > > > +
> > > > > +	/* Sched flavor */
> > > > > +	if (debug_locks)
> > > > > +		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
> > > > > +	return lockdep_opinion || !preemptible();
> > > > 
> > > > that !preemptible() turns into:
> > > > 
> > > >   !(preempt_count()==0 && !irqs_disabled())
> > > > 
> > > > which is:
> > > > 
> > > >   preempt_count() != 0 || irqs_disabled()
> > > > 
> > > > and already includes irqs_disabled() and in_softirq().
> > > > 
> > > > > +}
> > > > 
> > > > So maybe something lke:
> > > > 
> > > > 	if (debug_locks && (lock_is_held(&rcu_lock_map) ||
> > > > 			    lock_is_held(&rcu_sched_lock_map)))
> > > > 		return true;
> > > 
> > > Agreed, I will do it this way (without the debug_locks) like:
> > > 
> > > ---8<-----------------------
> > > 
> > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > index ba861d1716d3..339aebc330db 100644
> > > --- a/kernel/rcu/update.c
> > > +++ b/kernel/rcu/update.c
> > > @@ -296,27 +296,15 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
> > >  
> > >  int rcu_read_lock_any_held(void)
> > >  {
> > > -	int lockdep_opinion = 0;
> > > -
> > >  	if (!debug_lockdep_rcu_enabled())
> > >  		return 1;
> > >  	if (!rcu_is_watching())
> > >  		return 0;
> > >  	if (!rcu_lockdep_current_cpu_online())
> > >  		return 0;
> > > -
> > > -	/* Preemptible RCU flavor */
> > > -	if (lock_is_held(&rcu_lock_map))
> > > -		return 1;
> > > -
> > > -	/* BH flavor */
> > > -	if (in_softirq() || irqs_disabled())
> > > -		return 1;
> > > -
> > > -	/* Sched flavor */
> > > -	if (debug_locks)
> > > -		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
> > > -	return lockdep_opinion || !preemptible();
> > > +	if (lock_is_held(&rcu_lock_map) || lock_is_held(&rcu_sched_lock_map))
> > 
> > OK, I will bite...  Why not also lock_is_held(&rcu_bh_lock_map)?
> 
> Hmm, I was borrowing the strategy from rcu_read_lock_bh_held() which does not
> check for a lock held in this map.
> 
> Honestly, even  lock_is_held(&rcu_sched_lock_map) seems unnecessary per-se
> since !preemptible() will catch that? rcu_read_lock_sched() disables
> preemption already, so lockdep's opinion of the matter seems redundant there.

Good point!  At least as long as the lockdep splats list RCU-bh among
the locks held, which they did last I checked.

Of course, you could make the same argument for getting rid of
rcu_sched_lock_map.  Does it make sense to have the one without
the other?

> Sorry I already sent out patches again before seeing your comment but I can
> rework and resend them based on any other suggestions.

Not a problem!

							Thax, Paul


^ permalink raw reply

* [PATCH bpf] selftests/bpf: fix test_send_signal_nmi on s390
From: Ilya Leoshkevich @ 2019-07-12 17:45 UTC (permalink / raw)
  To: bpf, netdev; +Cc: gor, heiko.carstens, Ilya Leoshkevich

Many s390 setups (most notably, KVM guests) do not have access to
hardware performance events.

Therefore, use the software event instead.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
---
 tools/testing/selftests/bpf/prog_tests/send_signal.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
index 67cea1686305..4a45ea0b8448 100644
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -176,10 +176,19 @@ static int test_send_signal_tracepoint(void)
 static int test_send_signal_nmi(void)
 {
 	struct perf_event_attr attr = {
+#if defined(__s390__)
+		/* Many s390 setups (most notably, KVM guests) do not have
+		 * access to hardware performance events.
+		 */
+		.sample_period = 1,
+		.type = PERF_TYPE_SOFTWARE,
+		.config = PERF_COUNT_SW_CPU_CLOCK,
+#else
 		.sample_freq = 50,
 		.freq = 1,
 		.type = PERF_TYPE_HARDWARE,
 		.config = PERF_COUNT_HW_CPU_CYCLES,
+#endif
 	};
 
 	return test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT, "perf_event");
-- 
2.21.0


^ permalink raw reply related

* Re: [net-next 0/2] tipc: link changeover issues
From: David Miller @ 2019-07-12 17:45 UTC (permalink / raw)
  To: tuong.t.lien; +Cc: jon.maloy, maloy, ying.xue, netdev, tipc-discussion
In-Reply-To: <20190712051537.10826-1-tuong.t.lien@dektech.com.au>


net-next is closed right now, resubmit this when the tree opens back up.

^ permalink raw reply

* [PATCH v2 bpf-next] selftests/bpf: remove logic duplication in test_verifier.c
From: Andrii Nakryiko @ 2019-07-12 17:44 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Krzesimir Nowak

test_verifier tests can specify single- and multi-runs tests. Internally
logic of handling them is duplicated. Get rid of it by making single run
retval/data specification to be a first run spec.

Cc: Krzesimir Nowak <krzesimir@kinvolk.io>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/test_verifier.c | 35 +++++++++------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index b0773291012a..84135d5f4b35 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -86,7 +86,7 @@ struct bpf_test {
 	int fixup_sk_storage_map[MAX_FIXUPS];
 	const char *errstr;
 	const char *errstr_unpriv;
-	uint32_t retval, retval_unpriv, insn_processed;
+	uint32_t insn_processed;
 	int prog_len;
 	enum {
 		UNDEF,
@@ -95,16 +95,20 @@ struct bpf_test {
 	} result, result_unpriv;
 	enum bpf_prog_type prog_type;
 	uint8_t flags;
-	__u8 data[TEST_DATA_LEN];
 	void (*fill_helper)(struct bpf_test *self);
 	uint8_t runs;
-	struct {
-		uint32_t retval, retval_unpriv;
-		union {
-			__u8 data[TEST_DATA_LEN];
-			__u64 data64[TEST_DATA_LEN / 8];
-		};
-	} retvals[MAX_TEST_RUNS];
+#define bpf_testdata_struct_t					\
+	struct {						\
+		uint32_t retval, retval_unpriv;			\
+		union {						\
+			__u8 data[TEST_DATA_LEN];		\
+			__u64 data64[TEST_DATA_LEN / 8];	\
+		};						\
+	}
+	union {
+		bpf_testdata_struct_t;
+		bpf_testdata_struct_t retvals[MAX_TEST_RUNS];
+	};
 	enum bpf_attach_type expected_attach_type;
 };
 
@@ -949,17 +953,8 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 		uint32_t expected_val;
 		int i;
 
-		if (!test->runs) {
-			expected_val = unpriv && test->retval_unpriv ?
-				test->retval_unpriv : test->retval;
-
-			err = do_prog_test_run(fd_prog, unpriv, expected_val,
-					       test->data, sizeof(test->data));
-			if (err)
-				run_errs++;
-			else
-				run_successes++;
-		}
+		if (!test->runs)
+			test->runs = 1;
 
 		for (i = 0; i < test->runs; i++) {
 			if (unpriv && test->retvals[i].retval_unpriv)
-- 
2.17.1


^ permalink raw reply related

* Re: [bpf-next v3 11/12] selftests/bpf: Add tests for bpf_prog_test_run for perf events progs
From: Krzesimir Nowak @ 2019-07-12 17:37 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <CAEf4BzYaV=AxYZna225qKzyWPteU4YFPiBRE4cO30tYmyN_pJQ@mail.gmail.com>

On Fri, Jul 12, 2019 at 2:38 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jul 8, 2019 at 3:42 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
> >
> > The tests check if ctx and data are correctly prepared from ctx_in and
> > data_in, so accessing the ctx and using the bpf_perf_prog_read_value
> > work as expected.
> >
>
> These are x86_64-specific tests, aren't they? Should probably guard
> them behind #ifdef's.

Yeah, they are x86_64 specific, because pt_regs are arch specific. I
was wondering what to do here in the cover letter. Ifdef? Ifdef and
cover also other arches (please no)? Do some weird tricks with
overriding the definition of pt_regs? Else?

>
> > Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> > ---
> >  tools/testing/selftests/bpf/test_verifier.c   | 48 ++++++++++
> >  .../selftests/bpf/verifier/perf_event_run.c   | 96 +++++++++++++++++++
> >  2 files changed, 144 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/verifier/perf_event_run.c
> >
> > diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> > index 6f124cc4ee34..484ea8842b06 100644
> > --- a/tools/testing/selftests/bpf/test_verifier.c
> > +++ b/tools/testing/selftests/bpf/test_verifier.c
> > @@ -295,6 +295,54 @@ static void bpf_fill_scale(struct bpf_test *self)
> >         }
> >  }
> >
> > +static void bpf_fill_perf_event_test_run_check(struct bpf_test *self)
> > +{
> > +       compiletime_assert(
> > +               sizeof(struct bpf_perf_event_data) <= TEST_CTX_LEN,
> > +               "buffer for ctx is too short to fit struct bpf_perf_event_data");
> > +       compiletime_assert(
> > +               sizeof(struct bpf_perf_event_value) <= TEST_DATA_LEN,
> > +               "buffer for data is too short to fit struct bpf_perf_event_value");
> > +
> > +       struct bpf_perf_event_data ctx = {
> > +               .regs = (bpf_user_pt_regs_t) {
> > +                       .r15 = 1,
> > +                       .r14 = 2,
> > +                       .r13 = 3,
> > +                       .r12 = 4,
> > +                       .rbp = 5,
> > +                       .rbx = 6,
> > +                       .r11 = 7,
> > +                       .r10 = 8,
> > +                       .r9 = 9,
> > +                       .r8 = 10,
> > +                       .rax = 11,
> > +                       .rcx = 12,
> > +                       .rdx = 13,
> > +                       .rsi = 14,
> > +                       .rdi = 15,
> > +                       .orig_rax = 16,
> > +                       .rip = 17,
> > +                       .cs = 18,
> > +                       .eflags = 19,
> > +                       .rsp = 20,
> > +                       .ss = 21,
> > +               },
> > +               .sample_period = 1,
> > +               .addr = 2,
> > +       };
> > +       struct bpf_perf_event_value data = {
> > +               .counter = 1,
> > +               .enabled = 2,
> > +               .running = 3,
> > +       };
> > +
> > +       memcpy(self->ctx, &ctx, sizeof(ctx));
> > +       memcpy(self->data, &data, sizeof(data));
>
> Just curious, just assignment didn't work?
>
> > +       free(self->fill_insns);
> > +       self->fill_insns = NULL;
> > +}
> > +
> >  /* BPF_SK_LOOKUP contains 13 instructions, if you need to fix up maps */
> >  #define BPF_SK_LOOKUP(func)                                            \
> >         /* struct bpf_sock_tuple tuple = {} */                          \
> > diff --git a/tools/testing/selftests/bpf/verifier/perf_event_run.c b/tools/testing/selftests/bpf/verifier/perf_event_run.c
> > new file mode 100644
> > index 000000000000..3f877458a7f8
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/verifier/perf_event_run.c
> > @@ -0,0 +1,96 @@
> > +#define PER_LOAD_AND_CHECK_PTREG(PT_REG_FIELD, VALUE)                  \
> > +       PER_LOAD_AND_CHECK_CTX(offsetof(bpf_user_pt_regs_t, PT_REG_FIELD), VALUE)
> > +#define PER_LOAD_AND_CHECK_EVENT(PED_FIELD, VALUE)                     \
> > +       PER_LOAD_AND_CHECK_CTX(offsetof(struct bpf_perf_event_data, PED_FIELD), VALUE)
> > +#define PER_LOAD_AND_CHECK_CTX(OFFSET, VALUE)                          \
> > +       PER_LOAD_AND_CHECK_64(BPF_REG_4, BPF_REG_1, OFFSET, VALUE)
> > +#define PER_LOAD_AND_CHECK_VALUE(PEV_FIELD, VALUE)                     \
> > +       PER_LOAD_AND_CHECK_64(BPF_REG_7, BPF_REG_6, offsetof(struct bpf_perf_event_value, PEV_FIELD), VALUE)
>
> Wrap long lines? Try also running scripts/checkpatch.pl again these
> files you are modifying.

Will wrap. Checkpatch was also complaining about complex macro not
being inside parens, but I can't see how to wrap it in parens and keep
it working at the same time.

>
> > +#define PER_LOAD_AND_CHECK_64(DST, SRC, OFFSET, VALUE)                 \
> > +       BPF_LDX_MEM(BPF_DW, DST, SRC, OFFSET),                          \
> > +       BPF_JMP_IMM(BPF_JEQ, DST, VALUE, 2),                            \
> > +       BPF_MOV64_IMM(BPF_REG_0, VALUE),                                \
> > +       BPF_EXIT_INSN()
> > +
> > +{
> > +       "check if regs contain expected values",
> > +       .insns = {
> > +       PER_LOAD_AND_CHECK_PTREG(r15, 1),
> > +       PER_LOAD_AND_CHECK_PTREG(r14, 2),
> > +       PER_LOAD_AND_CHECK_PTREG(r13, 3),
> > +       PER_LOAD_AND_CHECK_PTREG(r12, 4),
> > +       PER_LOAD_AND_CHECK_PTREG(rbp, 5),
> > +       PER_LOAD_AND_CHECK_PTREG(rbx, 6),
> > +       PER_LOAD_AND_CHECK_PTREG(r11, 7),
> > +       PER_LOAD_AND_CHECK_PTREG(r10, 8),
> > +       PER_LOAD_AND_CHECK_PTREG(r9, 9),
> > +       PER_LOAD_AND_CHECK_PTREG(r8, 10),
> > +       PER_LOAD_AND_CHECK_PTREG(rax, 11),
> > +       PER_LOAD_AND_CHECK_PTREG(rcx, 12),
> > +       PER_LOAD_AND_CHECK_PTREG(rdx, 13),
> > +       PER_LOAD_AND_CHECK_PTREG(rsi, 14),
> > +       PER_LOAD_AND_CHECK_PTREG(rdi, 15),
> > +       PER_LOAD_AND_CHECK_PTREG(orig_rax, 16),
> > +       PER_LOAD_AND_CHECK_PTREG(rip, 17),
> > +       PER_LOAD_AND_CHECK_PTREG(cs, 18),
> > +       PER_LOAD_AND_CHECK_PTREG(eflags, 19),
> > +       PER_LOAD_AND_CHECK_PTREG(rsp, 20),
> > +       PER_LOAD_AND_CHECK_PTREG(ss, 21),
> > +       BPF_MOV64_IMM(BPF_REG_0, 0),
> > +       BPF_EXIT_INSN(),
> > +       },
> > +       .result = ACCEPT,
> > +       .prog_type = BPF_PROG_TYPE_PERF_EVENT,
> > +       .ctx_len = sizeof(struct bpf_perf_event_data),
> > +       .data_len = sizeof(struct bpf_perf_event_value),
> > +       .fill_helper = bpf_fill_perf_event_test_run_check,
> > +       .override_data_out_len = true,
> > +},
> > +{
> > +       "check if sample period and addr contain expected values",
> > +       .insns = {
> > +       PER_LOAD_AND_CHECK_EVENT(sample_period, 1),
> > +       PER_LOAD_AND_CHECK_EVENT(addr, 2),
> > +       BPF_MOV64_IMM(BPF_REG_0, 0),
> > +       BPF_EXIT_INSN(),
> > +       },
> > +       .result = ACCEPT,
> > +       .prog_type = BPF_PROG_TYPE_PERF_EVENT,
> > +       .ctx_len = sizeof(struct bpf_perf_event_data),
> > +       .data_len = sizeof(struct bpf_perf_event_value),
> > +       .fill_helper = bpf_fill_perf_event_test_run_check,
> > +       .override_data_out_len = true,
> > +},
> > +{
> > +       "check if bpf_perf_prog_read_value returns expected data",
> > +       .insns = {
> > +       // allocate space for a struct bpf_perf_event_value
> > +       BPF_MOV64_REG(BPF_REG_6, BPF_REG_10),
> > +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -(int)sizeof(struct bpf_perf_event_value)),
> > +       // prepare parameters for bpf_perf_prog_read_value(ctx, struct bpf_perf_event_value*, u32)
> > +       // BPF_REG_1 already contains the context
> > +       BPF_MOV64_REG(BPF_REG_2, BPF_REG_6),
> > +       BPF_MOV64_IMM(BPF_REG_3, sizeof(struct bpf_perf_event_value)),
> > +       BPF_EMIT_CALL(BPF_FUNC_perf_prog_read_value),
> > +       // check the return value
> > +       BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
> > +       BPF_EXIT_INSN(),
> > +       // check if the fields match the expected values
>
> Use /* */ comments.

Oops. Will fix.

>
> > +       PER_LOAD_AND_CHECK_VALUE(counter, 1),
> > +       PER_LOAD_AND_CHECK_VALUE(enabled, 2),
> > +       PER_LOAD_AND_CHECK_VALUE(running, 3),
> > +       BPF_MOV64_IMM(BPF_REG_0, 0),
> > +       BPF_EXIT_INSN(),
> > +       },
> > +       .result = ACCEPT,
> > +       .prog_type = BPF_PROG_TYPE_PERF_EVENT,
> > +       .ctx_len = sizeof(struct bpf_perf_event_data),
> > +       .data_len = sizeof(struct bpf_perf_event_value),
> > +       .fill_helper = bpf_fill_perf_event_test_run_check,
> > +       .override_data_out_len = true,
> > +},
> > +#undef PER_LOAD_AND_CHECK_64
> > +#undef PER_LOAD_AND_CHECK_VALUE
> > +#undef PER_LOAD_AND_CHECK_CTX
> > +#undef PER_LOAD_AND_CHECK_EVENT
> > +#undef PER_LOAD_AND_CHECK_PTREG
> > --
> > 2.20.1
> >



--
Kinvolk GmbH | Adalbertstr.6a, 10999 Berlin | tel: +491755589364
Geschäftsführer/Directors: Alban Crequy, Chris Kühl, Iago López Galeiras
Registergericht/Court of registration: Amtsgericht Charlottenburg
Registernummer/Registration number: HRB 171414 B
Ust-ID-Nummer/VAT ID number: DE302207000

^ permalink raw reply

* Re: [bpf-next v3 02/12] selftests/bpf: Avoid a clobbering of errno
From: Krzesimir Nowak @ 2019-07-12 17:31 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Networking, bpf, xdp-newbies
In-Reply-To: <CAEf4Bzb-KW+p1zFcz39OSUuH0=DLFRNLa3NYT4V_-zz0Q_TJ5g@mail.gmail.com>

On Fri, Jul 12, 2019 at 2:59 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Jul 11, 2019 at 5:04 AM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
> >
> > On Thu, Jul 11, 2019 at 1:52 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Jul 8, 2019 at 3:42 PM Krzesimir Nowak <krzesimir@kinvolk.io> wrote:
> > > >
> > > > Save errno right after bpf_prog_test_run returns, so we later check
> > > > the error code actually set by bpf_prog_test_run, not by some libcap
> > > > function.
> > > >
> > > > Changes since v1:
> > > > - Fix the "Fixes:" tag to mention actual commit that introduced the
> > > >   bug
> > > >
> > > > Changes since v2:
> > > > - Move the declaration so it fits the reverse christmas tree style.
> > > >
> > > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > > Fixes: 832c6f2c29ec ("bpf: test make sure to run unpriv test cases in test_verifier")
> > > > Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> > > > ---
> > > >  tools/testing/selftests/bpf/test_verifier.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> > > > index b8d065623ead..3fe126e0083b 100644
> > > > --- a/tools/testing/selftests/bpf/test_verifier.c
> > > > +++ b/tools/testing/selftests/bpf/test_verifier.c
> > > > @@ -823,16 +823,18 @@ static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
> > > >         __u8 tmp[TEST_DATA_LEN << 2];
> > > >         __u32 size_tmp = sizeof(tmp);
> > > >         uint32_t retval;
> > > > +       int saved_errno;
> > > >         int err;
> > > >
> > > >         if (unpriv)
> > > >                 set_admin(true);
> > > >         err = bpf_prog_test_run(fd_prog, 1, data, size_data,
> > > >                                 tmp, &size_tmp, &retval, NULL);
> > >
> > > Given err is either 0 or -1, how about instead making err useful right
> > > here without extra variable?
> > >
> > > if (bpf_prog_test_run(...))
> > >         err = errno;
> >
> > I change it later to bpf_prog_test_run_xattr, which can also return
> > -EINVAL and then errno is not set. But this one probably should not be
>
> This is wrong. bpf_prog_test_run/bpf_prog_test_run_xattr should either
> always return -1 and set errno to actual error (like syscalls do), or
> always use return code with proper error. Give they are pretending to
> be just pure syscall, it's probably better to set errno to EINVAL and
> return -1 on invalid input args?

Yeah, this is inconsistent at best. But seems to be kind of expected?
See tools/testing/selftests/bpf/prog_tests/prog_run_xattr.c.

>
> > triggered by the test code. So not sure, probably would be better to
> > keep it as is for consistency?
> >
> > >
> > > > +       saved_errno = errno;
> > > >         if (unpriv)
> > > >                 set_admin(false);
> > > >         if (err) {
> > > > -               switch (errno) {
> > > > +               switch (saved_errno) {
> > > >                 case 524/*ENOTSUPP*/:
> > >
> > > ENOTSUPP is defined in include/linux/errno.h, is there any problem
> > > with using this in selftests?
> >
> > I just used whatever there was earlier. Seems like <linux/errno.h> is
> > not copied to tools include directory.
>
> Ok, let's leave it as is, thanks!
>
> >
> > >
> > > >                         printf("Did not run the program (not supported) ");
> > > >                         return 0;
> > > > --
> > > > 2.20.1
> > > >
> >
> >
> >
> > --
> > Kinvolk GmbH | Adalbertstr.6a, 10999 Berlin | tel: +491755589364
> > Geschäftsführer/Directors: Alban Crequy, Chris Kühl, Iago López Galeiras
> > Registergericht/Court of registration: Amtsgericht Charlottenburg
> > Registernummer/Registration number: HRB 171414 B
> > Ust-ID-Nummer/VAT ID number: DE302207000



-- 
Kinvolk GmbH | Adalbertstr.6a, 10999 Berlin | tel: +491755589364
Geschäftsführer/Directors: Alban Crequy, Chris Kühl, Iago López Galeiras
Registergericht/Court of registration: Amtsgericht Charlottenburg
Registernummer/Registration number: HRB 171414 B
Ust-ID-Nummer/VAT ID number: DE302207000

^ permalink raw reply

* Re: [PATCH v4 bpf-next 1/4] selftests/bpf: compile progs with -D__TARGET_ARCH_$(SRCARCH)
From: Andrii Nakryiko @ 2019-07-12 17:28 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: bpf, Networking, Y Song, Daniel Borkmann, Stanislav Fomichev,
	David S. Miller, Alexei Starovoitov
In-Reply-To: <6E5C9DDE-FF1D-4BFA-813E-7A0C3232B5F0@linux.ibm.com>

On Fri, Jul 12, 2019 at 2:00 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Am 12.07.2019 um 02:53 schrieb Andrii Nakryiko <andrii.nakryiko@gmail.com>:
> >
> > On Thu, Jul 11, 2019 at 7:32 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>
> >> This opens up the possibility of accessing registers in an
> >> arch-independent way.
> >>
> >> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> >> ---
> >> tools/testing/selftests/bpf/Makefile | 4 +++-
> >> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> >> index 2620406a53ec..ad84450e4ab8 100644
> >> --- a/tools/testing/selftests/bpf/Makefile
> >> +++ b/tools/testing/selftests/bpf/Makefile
> >> @@ -1,4 +1,5 @@
> >> # SPDX-License-Identifier: GPL-2.0
> >> +include ../../../scripts/Makefile.arch
> >>
> >> LIBDIR := ../../../lib
> >> BPFDIR := $(LIBDIR)/bpf
> >> @@ -138,7 +139,8 @@ CLANG_SYS_INCLUDES := $(shell $(CLANG) -v -E - </dev/null 2>&1 \
> >>
> >> CLANG_FLAGS = -I. -I./include/uapi -I../../../include/uapi \
> >>              $(CLANG_SYS_INCLUDES) \
> >> -             -Wno-compare-distinct-pointer-types
> >> +             -Wno-compare-distinct-pointer-types \
> >> +             -D__TARGET_ARCH_$(SRCARCH)
> >
> > samples/bpf/Makefile uses $(ARCH), why does it work for samples?
> > Should we update samples/bpf/Makefile as well?
>
> I believe that in common cases both are okay, but judging by
> linux:Makefile and linux:tools/scripts/Makefile.arch, one could use e.g.
> ARCH=i686, and that would be converted to SRCARCH=x86. So IMHO SRCARCH
> is safer, and we should change bpf/samples/Makefile. I could send a
> patch separately.

Yeah, let's do that then, thanks!

^ permalink raw reply

* Re: [PATCH v2 6/7] dt-bindings: net: realtek: Add property to configure LED mode
From: Matthias Kaehlcke @ 2019-07-12 17:28 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Rob Herring, Andrew Lunn, Heiner Kallweit, David S . Miller,
	Mark Rutland, netdev, devicetree, linux-kernel@vger.kernel.org,
	Douglas Anderson
In-Reply-To: <7d102d81-750d-32d9-a554-95f018e69f2f@gmail.com>

Hi Florian,

On Wed, Jul 10, 2019 at 09:28:39AM -0700, Florian Fainelli wrote:
> On 7/10/19 8:55 AM, Rob Herring wrote:
> > On Wed, Jul 3, 2019 at 5:23 PM Matthias Kaehlcke <mka@chromium.org> wrote:
> >>
> >> Hi Florian,
> >>
> >> On Wed, Jul 03, 2019 at 02:37:47PM -0700, Florian Fainelli wrote:
> >>> On 7/3/19 12:37 PM, Matthias Kaehlcke wrote:
> >>>> The LED behavior of some Realtek PHYs is configurable. Add the
> >>>> property 'realtek,led-modes' to specify the configuration of the
> >>>> LEDs.
> >>>>
> >>>> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> >>>> ---
> >>>> Changes in v2:
> >>>> - patch added to the series
> >>>> ---
> >>>>  .../devicetree/bindings/net/realtek.txt         |  9 +++++++++
> >>>>  include/dt-bindings/net/realtek.h               | 17 +++++++++++++++++
> >>>>  2 files changed, 26 insertions(+)
> >>>>  create mode 100644 include/dt-bindings/net/realtek.h
> >>>>
> >>>> diff --git a/Documentation/devicetree/bindings/net/realtek.txt b/Documentation/devicetree/bindings/net/realtek.txt
> >>>> index 71d386c78269..40b0d6f9ee21 100644
> >>>> --- a/Documentation/devicetree/bindings/net/realtek.txt
> >>>> +++ b/Documentation/devicetree/bindings/net/realtek.txt
> >>>> @@ -9,6 +9,12 @@ Optional properties:
> >>>>
> >>>>     SSC is only available on some Realtek PHYs (e.g. RTL8211E).
> >>>>
> >>>> +- realtek,led-modes: LED mode configuration.
> >>>> +
> >>>> +   A 0..3 element vector, with each element configuring the operating
> >>>> +   mode of an LED. Omitted LEDs are turned off. Allowed values are
> >>>> +   defined in "include/dt-bindings/net/realtek.h".
> >>>
> >>> This should probably be made more general and we should define LED modes
> >>> that makes sense regardless of the PHY device, introduce a set of
> >>> generic functions for validating and then add new function pointer for
> >>> setting the LED configuration to the PHY driver. This would allow to be
> >>> more future proof where each PHY driver could expose standard LEDs class
> >>> devices to user-space, and it would also allow facilities like: ethtool
> >>> -p to plug into that.
> >>>
> >>> Right now, each driver invents its own way of configuring LEDs, that
> >>> does not scale, and there is not really a good reason for that other
> >>> than reviewing drivers in isolation and therefore making it harder to
> >>> extract the commonality. Yes, I realize that since you are the latest
> >>> person submitting something in that area, you are being selected :)
> > 
> > I agree.
> > 
> >> I see the merit of your proposal to come up with a generic mechanism
> >> to configure Ethernet LEDs, however I can't justify spending much of
> >> my work time on this. If it is deemed useful I'm happy to send another
> >> version of the current patchset that addresses the reviewer's comments,
> >> but if the implementation of a generic LED configuration interface is
> >> a requirement I will have to abandon at least the LED configuration
> >> part of this series.
> > 
> > Can you at least define a common binding for this. Maybe that's just
> > removing 'realtek'. While the kernel side can evolve to a common
> > infrastructure, the DT bindings can't.
> 
> That would be a great start, and that is actually what I had in mind
> (should have been more specific), I was not going to have you Matthias
> do the grand slam and convert all this LED configuration into the LEDs
> class etc. that would not be fair.
> 
> It seems to be that we can fairly easily agree on a common binding for
> LED configuration, I would define something along those lines to be
> flexible:
> 
> phy-led-configuration = <LED_NUM_MASK LED_CFG_MASK>;
> 
> where LED_NUM_MASK is one of:
> 
> 0 -> link
> 1 -> activity
> 2 -> speed

I don't understand this proposal completely. Is LED_NUM_MASK actually
a mask/set (potentially containing multiple LEDs) or is it "one of"
the LEDs?

Are you suggesting to assign each LED a specific role (link, activity,
speed)?

Could you maybe post a specific example involving multiple LEDs?

Thanks

Matthias

> that way you can define single/dual/triple LED configurations by
> updating the bitmask.
> 
> LED_CFG_MASK is one of:
> 
> 0 -> LED_CFG_10
> 1 -> LED_CFG_100
> 2 -> LED_CFG_1000
> 
> (let's assume 1Gbps or less for now)
> 
> or this can be combined in a single cell with a left shift.
> 
> Andrew, Heiner, do you see that approach working correctly and scaling
> appropriately?

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox