* Re: [PATCH 16/16] compiler_attributes.h: add note about __section
From: Sedat Dilek @ 2019-08-19 18:01 UTC (permalink / raw)
To: Nick Desaulniers
Cc: akpm, jpoimboe, yhs, Miguel Ojeda, Clang-Built-Linux ML,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
linux-kernel, netdev, bpf
In-Reply-To: <20190812215052.71840-16-ndesaulniers@google.com>
On Mon, Aug 12, 2019 at 11:53 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> The antipattern described can be found with:
> $ grep -e __section\(\" -r -e __section__\(\"
>
> Link: https://bugs.llvm.org/show_bug.cgi?id=42950
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> [ Linux v5.3-rc5 ]
Patchset "for-5.3/x86-section-name-escaping":
include/linux/compiler.h: remove unused KENTRY macro
include/linux: prefer __section from compiler_attributes.h
include/asm-generic: prefer __section from compiler_attributes.h
x86: prefer __section from compiler_attributes.h
Thanks.
- Sedat -
> ---
> include/linux/compiler_attributes.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
> index 6b318efd8a74..f8c008d7f616 100644
> --- a/include/linux/compiler_attributes.h
> +++ b/include/linux/compiler_attributes.h
> @@ -225,6 +225,16 @@
> #define __pure __attribute__((__pure__))
>
> /*
> + * Note: Since this macro makes use of the "stringification operator" `#`, a
> + * quoted string literal should not be passed to it. eg.
> + * prefer:
> + * __section(.foo)
> + * to:
> + * __section(".foo")
> + * unless the section name is dynamically built up, in which case the
> + * verbose __attribute__((__section__(".foo" x))) should be preferred.
> + * See also: https://bugs.llvm.org/show_bug.cgi?id=42950
> + *
> * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-section-function-attribute
> * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-section-variable-attribute
> * clang: https://clang.llvm.org/docs/AttributeReference.html#section-declspec-allocate
> --
> 2.23.0.rc1.153.gdeed80330f-goog
>
^ permalink raw reply
* Re: [PATCH net-next 3/6] net: dsa: enable and disable all ports
From: Vivien Didelot @ 2019-08-19 18:03 UTC (permalink / raw)
To: Marek Behun; +Cc: netdev, davem, f.fainelli, andrew
In-Reply-To: <20190819193246.0e40a1d4@nic.cz>
Hi Marek,
On Mon, 19 Aug 2019 19:32:46 +0200, Marek Behun <marek.behun@nic.cz> wrote:
> > Call the .port_enable and .port_disable functions for all ports,
> > not only the user ports, so that drivers may optimize the power
> > consumption of all ports after a successful setup.
> >
> > Unused ports are now disabled on setup. CPU and DSA ports are now
> > enabled on setup and disabled on teardown. User ports were already
> > enabled at slave creation and disabled at slave destruction.
>
> My original reason for enabling CPU and DSA ports is that enabling
> serdes irq could not be done in .setup in mv88e6xxx, since the required
> phylink structures did not yet exists for those ports.
>
> The case after this patch would be that .port_enable is called for
> CPU/DSA ports right after these required phylink structures are created
> for this port. A thought came to me while reading this that some driver
> in the future can expect, in their implementation of
> port_enable/port_disable, that phylink structures already exist for all
> ports, not just the one being currently enabled/disabled.
>
> Wouldn't it be safer if CPU/DSA ports were enabled in setup after all
> ports are registered, and disabled in teardown before ports are
> unregistered?
>
> Current:
> ->setup()
> for each port
> dsa_port_link_register_of()
> dsa_port_enable()
>
> Proposed:
> ->setup()
> for each port
> dsa_port_link_register_of()
> for each port
> dsa_port_enable()
I understand what you mean, but the scope of .port_enable really is the
port itself, so I do not expect a driver to configure something else. Also,
I prefer to keep it simple at the moment and not speculate about what a
driver may need in the future, as we may also wonder which switch must be
enabled first in a tree, etc. If that case happens in the future, we can
definitely isolate the ports enabling code after the tree is setup.
So if this series meets your requirement for now, I'd keep it like that.
Thanks,
Vivien
^ permalink raw reply
* Re: [PATCH 15/16] include/linux/compiler.h: remove unused KENTRY macro
From: Sedat Dilek @ 2019-08-19 18:03 UTC (permalink / raw)
To: Nick Desaulniers
Cc: akpm, jpoimboe, yhs, Miguel Ojeda, Clang-Built-Linux ML,
Luc Van Oostenryck, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Song Liu, linux-sparse, linux-kernel, netdev,
bpf
In-Reply-To: <20190812215052.71840-15-ndesaulniers@google.com>
On Mon, Aug 12, 2019 at 11:53 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> This macro is not used throughout the kernel. Delete it rather than
> update the __section to be a fully spelled out
> __attribute__((__section__())) to avoid
> https://bugs.llvm.org/show_bug.cgi?id=42950.
>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> [ Linux v5.3-rc5 ]
Patchset "for-5.3/x86-section-name-escaping" (5 patches):
compiler_attributes.h: add note about __section
include/linux/compiler.h: remove unused KENTRY macro
include/linux: prefer __section from compiler_attributes.h
include/asm-generic: prefer __section from compiler_attributes.h
x86: prefer __section from compiler_attributes.h
Thanks.
- Sedat -
> ---
> include/linux/compiler.h | 23 -----------------------
> 1 file changed, 23 deletions(-)
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..f01c1e527f85 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -136,29 +136,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> } while (0)
> #endif
>
> -/*
> - * KENTRY - kernel entry point
> - * This can be used to annotate symbols (functions or data) that are used
> - * without their linker symbol being referenced explicitly. For example,
> - * interrupt vector handlers, or functions in the kernel image that are found
> - * programatically.
> - *
> - * Not required for symbols exported with EXPORT_SYMBOL, or initcalls. Those
> - * are handled in their own way (with KEEP() in linker scripts).
> - *
> - * KENTRY can be avoided if the symbols in question are marked as KEEP() in the
> - * linker script. For example an architecture could KEEP() its entire
> - * boot/exception vector code rather than annotate each function and data.
> - */
> -#ifndef KENTRY
> -# define KENTRY(sym) \
> - extern typeof(sym) sym; \
> - static const unsigned long __kentry_##sym \
> - __used \
> - __section("___kentry" "+" #sym ) \
> - = (unsigned long)&sym;
> -#endif
> -
> #ifndef RELOC_HIDE
> # define RELOC_HIDE(ptr, off) \
> ({ unsigned long __ptr; \
> --
> 2.23.0.rc1.153.gdeed80330f-goog
>
^ permalink raw reply
* Re: [PATCH 14/16] include/linux: prefer __section from compiler_attributes.h
From: Sedat Dilek @ 2019-08-19 18:05 UTC (permalink / raw)
To: Nick Desaulniers
Cc: akpm, jpoimboe, yhs, Miguel Ojeda, Clang-Built-Linux ML,
Luc Van Oostenryck, Lai Jiangshan, Paul E. McKenney,
Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra (Intel),
Nicholas Piggin, Jiri Kosina, Will Deacon, Ard Biesheuvel,
Michael Ellerman, Masahiro Yamada, Hans Liljestrand,
Elena Reshetova, David Windsor, Marc Zyngier, Ming Lei,
Dou Liyang, Julien Thierry, Mauro Carvalho Chehab, Jens Axboe,
linux-kernel, linux-sparse, rcu, netdev, bpf
In-Reply-To: <20190812215052.71840-14-ndesaulniers@google.com>
On Mon, Aug 12, 2019 at 11:53 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/619
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> [ Linux v5.3-rc5 ]
Patchset "for-5.3/x86-section-name-escaping" (5 patches):
compiler_attributes.h: add note about __section
include/linux/compiler.h: remove unused KENTRY macro
include/linux: prefer __section from compiler_attributes.h
include/asm-generic: prefer __section from compiler_attributes.h
x86: prefer __section from compiler_attributes.h
Thanks.
- Sedat -
> ---
> include/linux/cache.h | 6 +++---
> include/linux/compiler.h | 8 ++++----
> include/linux/cpu.h | 2 +-
> include/linux/export.h | 2 +-
> include/linux/init_task.h | 4 ++--
> include/linux/interrupt.h | 5 ++---
> include/linux/sched/debug.h | 2 +-
> include/linux/srcutree.h | 2 +-
> 8 files changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/cache.h b/include/linux/cache.h
> index 750621e41d1c..3f4df9eef1e1 100644
> --- a/include/linux/cache.h
> +++ b/include/linux/cache.h
> @@ -28,7 +28,7 @@
> * but may get written to during init, so can't live in .rodata (via "const").
> */
> #ifndef __ro_after_init
> -#define __ro_after_init __attribute__((__section__(".data..ro_after_init")))
> +#define __ro_after_init __section(.data..ro_after_init)
> #endif
>
> #ifndef ____cacheline_aligned
> @@ -45,8 +45,8 @@
>
> #ifndef __cacheline_aligned
> #define __cacheline_aligned \
> - __attribute__((__aligned__(SMP_CACHE_BYTES), \
> - __section__(".data..cacheline_aligned")))
> + __aligned(SMP_CACHE_BYTES) \
> + __section(.data..cacheline_aligned)
> #endif /* __cacheline_aligned */
>
> #ifndef __cacheline_aligned_in_smp
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index f0fd5636fddb..5e88e7e33abe 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -24,7 +24,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> long ______r; \
> static struct ftrace_likely_data \
> __aligned(4) \
> - __section("_ftrace_annotated_branch") \
> + __section(_ftrace_annotated_branch) \
> ______f = { \
> .data.func = __func__, \
> .data.file = __FILE__, \
> @@ -60,7 +60,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> #define __trace_if_value(cond) ({ \
> static struct ftrace_branch_data \
> __aligned(4) \
> - __section("_ftrace_branch") \
> + __section(_ftrace_branch) \
> __if_trace = { \
> .func = __func__, \
> .file = __FILE__, \
> @@ -118,7 +118,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> ".popsection\n\t"
>
> /* Annotate a C jump table to allow objtool to follow the code flow */
> -#define __annotate_jump_table __section(".rodata..c_jump_table")
> +#define __annotate_jump_table __section(.rodata..c_jump_table)
>
> #else
> #define annotate_reachable()
> @@ -298,7 +298,7 @@ unsigned long read_word_at_a_time(const void *addr)
> * visible to the compiler.
> */
> #define __ADDRESSABLE(sym) \
> - static void * __section(".discard.addressable") __used \
> + static void * __section(.discard.addressable) __used \
> __PASTE(__addressable_##sym, __LINE__) = (void *)&sym;
>
> /**
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index fcb1386bb0d4..186bbd79d6ce 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -166,7 +166,7 @@ void cpu_startup_entry(enum cpuhp_state state);
> void cpu_idle_poll_ctrl(bool enable);
>
> /* Attach to any functions which should be considered cpuidle. */
> -#define __cpuidle __attribute__((__section__(".cpuidle.text")))
> +#define __cpuidle __section(.cpuidle.text)
>
> bool cpu_in_idle(unsigned long pc);
>
> diff --git a/include/linux/export.h b/include/linux/export.h
> index fd8711ed9ac4..808c1a0c2ef9 100644
> --- a/include/linux/export.h
> +++ b/include/linux/export.h
> @@ -104,7 +104,7 @@ struct kernel_symbol {
> * discarded in the final link stage.
> */
> #define __ksym_marker(sym) \
> - static int __ksym_marker_##sym[0] __section(".discard.ksym") __used
> + static int __ksym_marker_##sym[0] __section(.discard.ksym) __used
>
> #define __EXPORT_SYMBOL(sym, sec) \
> __ksym_marker(sym); \
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 6049baa5b8bc..50139505da34 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -51,12 +51,12 @@ extern struct cred init_cred;
>
> /* Attach to the init_task data structure for proper alignment */
> #ifdef CONFIG_ARCH_TASK_STRUCT_ON_STACK
> -#define __init_task_data __attribute__((__section__(".data..init_task")))
> +#define __init_task_data __section(.data..init_task)
> #else
> #define __init_task_data /**/
> #endif
>
> /* Attach to the thread_info data structure for proper alignment */
> -#define __init_thread_info __attribute__((__section__(".data..init_thread_info")))
> +#define __init_thread_info __section(.data..init_thread_info)
>
> #endif
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 5b8328a99b2a..29debfe4dd0f 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -741,8 +741,7 @@ extern int arch_early_irq_init(void);
> /*
> * We want to know which function is an entrypoint of a hardirq or a softirq.
> */
> -#define __irq_entry __attribute__((__section__(".irqentry.text")))
> -#define __softirq_entry \
> - __attribute__((__section__(".softirqentry.text")))
> +#define __irq_entry __section(.irqentry.text)
> +#define __softirq_entry __section(.softirqentry.text)
>
> #endif
> diff --git a/include/linux/sched/debug.h b/include/linux/sched/debug.h
> index 95fb9e025247..e17b66221fdd 100644
> --- a/include/linux/sched/debug.h
> +++ b/include/linux/sched/debug.h
> @@ -42,7 +42,7 @@ extern void proc_sched_set_task(struct task_struct *p);
> #endif
>
> /* Attach to any functions which should be ignored in wchan output. */
> -#define __sched __attribute__((__section__(".sched.text")))
> +#define __sched __section(.sched.text)
>
> /* Linker adds these: start and end of __sched functions */
> extern char __sched_text_start[], __sched_text_end[];
> diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> index 9cfcc8a756ae..9de652f4e1bd 100644
> --- a/include/linux/srcutree.h
> +++ b/include/linux/srcutree.h
> @@ -124,7 +124,7 @@ struct srcu_struct {
> # define __DEFINE_SRCU(name, is_static) \
> is_static struct srcu_struct name; \
> struct srcu_struct * const __srcu_struct_##name \
> - __section("___srcu_struct_ptrs") = &name
> + __section(___srcu_struct_ptrs) = &name
> #else
> # define __DEFINE_SRCU(name, is_static) \
> static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data); \
> --
> 2.23.0.rc1.153.gdeed80330f-goog
>
^ permalink raw reply
* Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP
From: Jakub Kicinski @ 2019-08-19 18:15 UTC (permalink / raw)
To: Toshiaki Makita
Cc: Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Song Liu, Yonghong Song, David S. Miller,
Jesper Dangaard Brouer, John Fastabend, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, netdev, bpf, William Tu
In-Reply-To: <5e9bee13-a746-f148-00de-feb7cb7b1403@gmail.com>
On Sat, 17 Aug 2019 23:01:59 +0900, Toshiaki Makita wrote:
> On 19/08/17 (土) 3:52:24, Jakub Kicinski wrote:
> > On Fri, 16 Aug 2019 10:28:10 +0900, Toshiaki Makita wrote:
> >> On 2019/08/16 4:22, Jakub Kicinski wrote:
> >>> There's a certain allure in bringing the in-kernel BPF translation
> >>> infrastructure forward. OTOH from system architecture perspective IMHO
> >>> it does seem like a task best handed in user space. bpfilter can replace
> >>> iptables completely, here we're looking at an acceleration relatively
> >>> loosely coupled with flower.
> >>
> >> I don't think it's loosely coupled. Emulating TC behavior in userspace
> >> is not so easy.
> >>
> >> Think about recent multi-mask support in flower. Previously userspace could
> >> assume there is one mask and hash table for each preference in TC. After the
> >> change TC accepts different masks with the same pref. Such a change tends to
> >> break userspace emulation. It may ignore masks passed from flow insertion
> >> and use the mask remembered when the first flow of the pref is inserted. It
> >> may override the mask of all existing flows with the pref. It may fail to
> >> insert such flows. Any of them would result in unexpected wrong datapath
> >> handling which is critical.
> >> I think such an emulation layer needs to be updated in sync with TC.
> >
> > Oh, so you're saying that if xdp_flow is merged all patches to
> > cls_flower and netfilter which affect flow offload will be required
> > to update xdp_flow as well?
>
> Hmm... you are saying that we are allowed to break other in-kernel
> subsystem by some change? Sounds strange...
No I'm not saying that, please don't put words in my mouth.
I'm asking you if that's your intention.
Having an implementation nor support a feature of another implementation
and degrade gracefully to the slower one is not necessarily breakage.
We need to make a concious decision here, hence the clarifying question.
> > That's a question of policy. Technically the implementation in user
> > space is equivalent.
> >
> > The advantage of user space implementation is that you can add more
> > to it and explore use cases which do not fit in the flow offload API,
> > but are trivial for BPF. Not to mention the obvious advantage of
> > decoupling the upgrade path.
>
> I understand the advantage, but I can't trust such a third-party kernel
> emulation solution for this kind of thing which handles critical data path.
That's a strange argument to make. All production data path BPF today
comes from user space.
> > Personally I'm not happy with the way this patch set messes with the
> > flow infrastructure. You should use the indirect callback
> > infrastructure instead, and that way you can build the whole thing
> > touching none of the flow offload core.
>
> I don't want to mess up the core flow infrastructure either. I'm all
> ears about less invasive ways. Using indirect callback sounds like a
> good idea. Will give it a try. Many thanks.
Excellent, thanks!
^ permalink raw reply
* Re: [PATCH 00/16] treewide: prefer __section from compiler_attributes.h
From: Sedat Dilek @ 2019-08-19 18:18 UTC (permalink / raw)
To: Nick Desaulniers
Cc: akpm, jpoimboe, yhs, Miguel Ojeda, Clang-Built-Linux ML,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
netdev, bpf
In-Reply-To: <20190812215052.71840-17-ndesaulniers@google.com>
On Mon, Aug 12, 2019 at 11:53 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> GCC unescapes escaped string section names while Clang does not. Because
> __section uses the `#` stringification operator for the section name, it
> doesn't need to be escaped.
>
> This fixes an Oops observed in distro's that use systemd and not
> net.core.bpf_jit_enable=1, when their kernels are compiled with Clang.
>
> Instead, we should:
> 1. Prefer __section(.section_name_no_quotes).
> 2. Only use __attribute__((__section(".section"))) when creating the
> section name via C preprocessor (see the definition of __define_initcall
> in arch/um/include/shared/init.h).
>
> This antipattern was found with:
> $ grep -e __section\(\" -e __section__\(\" -r
>
> See the discussions in:
> https://bugs.llvm.org/show_bug.cgi?id=42950
> https://marc.info/?l=linux-netdev&m=156412960619946&w=2
>
> Nick Desaulniers (16):
> s390/boot: fix section name escaping
> arc: prefer __section from compiler_attributes.h
> parisc: prefer __section from compiler_attributes.h
> um: prefer __section from compiler_attributes.h
> sh: prefer __section from compiler_attributes.h
> ia64: prefer __section from compiler_attributes.h
> arm: prefer __section from compiler_attributes.h
> mips: prefer __section from compiler_attributes.h
> sparc: prefer __section from compiler_attributes.h
> powerpc: prefer __section and __printf from compiler_attributes.h
> x86: prefer __section from compiler_attributes.h
> arm64: prefer __section from compiler_attributes.h
> include/asm-generic: prefer __section from compiler_attributes.h
> include/linux: prefer __section from compiler_attributes.h
> include/linux/compiler.h: remove unused KENTRY macro
> compiler_attributes.h: add note about __section
>
Hi Nick,
thanks for this patchset and the nice section names cleanup and simplification.
I have tested 5 relevant patches for my x86-64 Debian/buster system.
Patchset "for-5.3/x86-section-name-escaping" (5 patches):
compiler_attributes.h: add note about __section
include/linux/compiler.h: remove unused KENTRY macro
include/linux: prefer __section from compiler_attributes.h
include/asm-generic: prefer __section from compiler_attributes.h
x86: prefer __section from compiler_attributes.h
Toolchain: LLVM/Clang compiler and LLD linker version 9.0.0-rc2 (from
Debian/experimental)
I can boot on bare metal.
$ cat /proc/version
Linux version 5.3.0-rc5-1-amd64-cbl-asmgoto
(sedat.dilek@gmail.com@iniza) (clang version 9.0.0-+rc2-1~exp1
(tags/RELEASE_900/rc2)) #1~buster+dileks1 SMP 2019-08-19
I have sent by Tested-by to the single patches.
Have a nice day,
- Sedat -
> arch/arc/include/asm/linkage.h | 8 +++----
> arch/arc/include/asm/mach_desc.h | 3 +--
> arch/arm/include/asm/cache.h | 2 +-
> arch/arm/include/asm/mach/arch.h | 4 ++--
> arch/arm/include/asm/setup.h | 2 +-
> arch/arm64/include/asm/cache.h | 2 +-
> arch/arm64/kernel/smp_spin_table.c | 2 +-
> arch/ia64/include/asm/cache.h | 2 +-
> arch/mips/include/asm/cache.h | 2 +-
> arch/parisc/include/asm/cache.h | 2 +-
> arch/parisc/include/asm/ldcw.h | 2 +-
> arch/powerpc/boot/main.c | 3 +--
> arch/powerpc/boot/ps3.c | 6 ++----
> arch/powerpc/include/asm/cache.h | 2 +-
> arch/powerpc/kernel/btext.c | 2 +-
> arch/s390/boot/startup.c | 2 +-
> arch/sh/include/asm/cache.h | 2 +-
> arch/sparc/include/asm/cache.h | 2 +-
> arch/sparc/kernel/btext.c | 2 +-
> arch/um/kernel/um_arch.c | 6 +++---
> arch/x86/include/asm/cache.h | 2 +-
> arch/x86/include/asm/intel-mid.h | 2 +-
> arch/x86/include/asm/iommu_table.h | 5 ++---
> arch/x86/include/asm/irqflags.h | 2 +-
> arch/x86/include/asm/mem_encrypt.h | 2 +-
> arch/x86/kernel/cpu/cpu.h | 3 +--
> include/asm-generic/error-injection.h | 2 +-
> include/asm-generic/kprobes.h | 5 ++---
> include/linux/cache.h | 6 +++---
> include/linux/compiler.h | 31 ++++-----------------------
> include/linux/compiler_attributes.h | 10 +++++++++
> include/linux/cpu.h | 2 +-
> include/linux/export.h | 2 +-
> include/linux/init_task.h | 4 ++--
> include/linux/interrupt.h | 5 ++---
> include/linux/sched/debug.h | 2 +-
> include/linux/srcutree.h | 2 +-
> 37 files changed, 62 insertions(+), 83 deletions(-)
>
> --
> 2.23.0.rc1.153.gdeed80330f-goog
>
^ permalink raw reply
* [net PATCH] net/smc: make sure EPOLLOUT is raised
From: Jason Baron @ 2019-08-19 18:36 UTC (permalink / raw)
To: davem; +Cc: netdev, Eric Dumazet, Ursula Braun, Karsten Graul
Currently, we are only explicitly setting SOCK_NOSPACE on a write timeout
for non-blocking sockets. Epoll() edge-trigger mode relies on SOCK_NOSPACE
being set when -EAGAIN is returned to ensure that EPOLLOUT is raised.
Expand the setting of SOCK_NOSPACE to non-blocking sockets as well that can
use SO_SNDTIMEO to adjust their write timeout. This mirrors the behavior
that Eric Dumazet introduced for tcp sockets.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ursula Braun <ubraun@linux.ibm.com>
Cc: Karsten Graul <kgraul@linux.ibm.com>
---
net/smc/smc_tx.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index f0de323..6c8f09c 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -76,13 +76,11 @@ static int smc_tx_wait(struct smc_sock *smc, int flags)
DEFINE_WAIT_FUNC(wait, woken_wake_function);
struct smc_connection *conn = &smc->conn;
struct sock *sk = &smc->sk;
- bool noblock;
long timeo;
int rc = 0;
/* similar to sk_stream_wait_memory */
timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
- noblock = timeo ? false : true;
add_wait_queue(sk_sleep(sk), &wait);
while (1) {
sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk);
@@ -97,8 +95,8 @@ static int smc_tx_wait(struct smc_sock *smc, int flags)
break;
}
if (!timeo) {
- if (noblock)
- set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+ /* ensure EPOLLOUT is subsequently generated */
+ set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
rc = -EAGAIN;
break;
}
--
2.7.4
^ permalink raw reply related
* Re: [PATCH net] tcp: make sure EPOLLOUT wont be missed
From: Jason Baron @ 2019-08-19 18:40 UTC (permalink / raw)
To: Eric Dumazet, Eric Dumazet, David S . Miller
Cc: netdev, Soheil Hassas Yeganeh, Neal Cardwell, Vladimir Rutsky
In-Reply-To: <c31d7c04-9d6f-aefb-500e-5b9b635ff221@gmail.com>
On 8/17/19 12:26 PM, Eric Dumazet wrote:
>
>
> On 8/17/19 4:19 PM, Jason Baron wrote:
>>
>>
>> On 8/17/19 12:26 AM, Eric Dumazet wrote:
>>> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
>>> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
>>> when needed.
>>>
>>> However, Jason patch had a bug, because the 'nonblocking' status
>>> as far as sk_stream_wait_memory() is concerned is governed
>>> by MSG_DONTWAIT flag passed at sendmsg() time :
>>>
>>> long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>>>
>>> So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
>>> and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
>>> cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
>>> value.
>>
>> Is MSG_DONTWAIT not set in this case? The original patch was intended
>> only for the explicit non-blocking case. The epoll manpage says:
>> "EPOLLET flag should use nonblocking file descriptors". So the original
>> intention was not to impact the blocking case. This seems to me like
>> a different use-case.
>>
>
> I guess the problem is how we define 'non-blocking' ...
>
> SO_SNDTIMEO can be used by application to implement a variation of non-blocking,
> by waiting for a socket event with a short timeout, to maybe recover
> from memory pressure conditions in a more efficient way than simply looping.
>
> Note that the man page for epoll() only _suggests_ to use nonblocking file descriptors.
>
> <quote>
> The suggested way to use epoll as an edge-triggered (EPOLLET)
> interface is as follows:
>
> i with nonblocking file descriptors; and
>
> ii by waiting for an event only after read(2) or
> write(2) return EAGAIN.
> </quote>
>
>
Ok, seems reasonable:
Acked-by: Jason Baron <jbaron@akamai.com>
I found a similar pattern in net/smc/smc_tx.c, which I also just sent a
patch for.
Thanks,
-Jason
^ permalink raw reply
* [PATCH 4/5] netfilter: xt_nfacct: Fix alignment mismatch in xt_nfacct_match_info
From: Pablo Neira Ayuso @ 2019-08-19 18:49 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190819184911.15263-1-pablo@netfilter.org>
From: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
When running a 64-bit kernel with a 32-bit iptables binary, the size of
the xt_nfacct_match_info struct diverges.
kernel: sizeof(struct xt_nfacct_match_info) : 40
iptables: sizeof(struct xt_nfacct_match_info)) : 36
Trying to append nfacct related rules results in an unhelpful message.
Although it is suggested to look for more information in dmesg, nothing
can be found there.
# iptables -A <chain> -m nfacct --nfacct-name <acct-object>
iptables: Invalid argument. Run `dmesg' for more information.
This patch fixes the memory misalignment by enforcing 8-byte alignment
within the struct's first revision. This solution is often used in many
other uapi netfilter headers.
Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/uapi/linux/netfilter/xt_nfacct.h | 5 +++++
net/netfilter/xt_nfacct.c | 36 ++++++++++++++++++++++----------
2 files changed, 30 insertions(+), 11 deletions(-)
diff --git a/include/uapi/linux/netfilter/xt_nfacct.h b/include/uapi/linux/netfilter/xt_nfacct.h
index 5c8a4d760ee3..b5123ab8d54a 100644
--- a/include/uapi/linux/netfilter/xt_nfacct.h
+++ b/include/uapi/linux/netfilter/xt_nfacct.h
@@ -11,4 +11,9 @@ struct xt_nfacct_match_info {
struct nf_acct *nfacct;
};
+struct xt_nfacct_match_info_v1 {
+ char name[NFACCT_NAME_MAX];
+ struct nf_acct *nfacct __attribute__((aligned(8)));
+};
+
#endif /* _XT_NFACCT_MATCH_H */
diff --git a/net/netfilter/xt_nfacct.c b/net/netfilter/xt_nfacct.c
index d0ab1adf5bff..5aab6df74e0f 100644
--- a/net/netfilter/xt_nfacct.c
+++ b/net/netfilter/xt_nfacct.c
@@ -54,25 +54,39 @@ nfacct_mt_destroy(const struct xt_mtdtor_param *par)
nfnl_acct_put(info->nfacct);
}
-static struct xt_match nfacct_mt_reg __read_mostly = {
- .name = "nfacct",
- .family = NFPROTO_UNSPEC,
- .checkentry = nfacct_mt_checkentry,
- .match = nfacct_mt,
- .destroy = nfacct_mt_destroy,
- .matchsize = sizeof(struct xt_nfacct_match_info),
- .usersize = offsetof(struct xt_nfacct_match_info, nfacct),
- .me = THIS_MODULE,
+static struct xt_match nfacct_mt_reg[] __read_mostly = {
+ {
+ .name = "nfacct",
+ .revision = 0,
+ .family = NFPROTO_UNSPEC,
+ .checkentry = nfacct_mt_checkentry,
+ .match = nfacct_mt,
+ .destroy = nfacct_mt_destroy,
+ .matchsize = sizeof(struct xt_nfacct_match_info),
+ .usersize = offsetof(struct xt_nfacct_match_info, nfacct),
+ .me = THIS_MODULE,
+ },
+ {
+ .name = "nfacct",
+ .revision = 1,
+ .family = NFPROTO_UNSPEC,
+ .checkentry = nfacct_mt_checkentry,
+ .match = nfacct_mt,
+ .destroy = nfacct_mt_destroy,
+ .matchsize = sizeof(struct xt_nfacct_match_info_v1),
+ .usersize = offsetof(struct xt_nfacct_match_info_v1, nfacct),
+ .me = THIS_MODULE,
+ },
};
static int __init nfacct_mt_init(void)
{
- return xt_register_match(&nfacct_mt_reg);
+ return xt_register_matches(nfacct_mt_reg, ARRAY_SIZE(nfacct_mt_reg));
}
static void __exit nfacct_mt_exit(void)
{
- xt_unregister_match(&nfacct_mt_reg);
+ xt_unregister_matches(nfacct_mt_reg, ARRAY_SIZE(nfacct_mt_reg));
}
module_init(nfacct_mt_init);
--
2.11.0
^ permalink raw reply related
* [PATCH 1/5] MAINTAINERS: Remove IP MASQUERADING record
From: Pablo Neira Ayuso @ 2019-08-19 18:49 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190819184911.15263-1-pablo@netfilter.org>
From: Denis Efremov <efremov@linux.com>
This entry is in MAINTAINERS for historical purpose.
It doesn't match current sources since the commit
adf82accc5f5 ("netfilter: x_tables: merge ip and
ipv6 masquerade modules") moved the module.
The net/netfilter/xt_MASQUERADE.c module is already under
the netfilter section. Thus, there is no purpose to keep this
separate entry in MAINTAINERS.
Cc: Florian Westphal <fw@strlen.de>
Cc: Juanjo Ciarlante <jjciarla@raiz.uncu.edu.ar>
Cc: netfilter-devel@vger.kernel.org
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
MAINTAINERS | 5 -----
1 file changed, 5 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index a416574780d6..6839cfd91dde 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8439,11 +8439,6 @@ S: Maintained
F: fs/io_uring.c
F: include/uapi/linux/io_uring.h
-IP MASQUERADING
-M: Juanjo Ciarlante <jjciarla@raiz.uncu.edu.ar>
-S: Maintained
-F: net/ipv4/netfilter/ipt_MASQUERADE.c
-
IPMI SUBSYSTEM
M: Corey Minyard <minyard@acm.org>
L: openipmi-developer@lists.sourceforge.net (moderated for non-subscribers)
--
2.11.0
^ permalink raw reply related
* [PATCH 0/5] Netfilter fixes for net
From: Pablo Neira Ayuso @ 2019-08-19 18:49 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
Hi,
The following patchset contains Netfilter fixes for net:
1) Remove IP MASQUERADING record in MAINTAINERS file,
from Denis Efremov.
2) Counter arguments are swapped in ebtables, from
Todd Seidelmann.
3) Missing netlink attribute validation in flow_offload
extension.
4) Incorrect alignment in xt_nfacct that breaks 32-bits
userspace / 64-bits kernels, from Juliana Rodrigueiro.
5) Missing include guard in nf_conntrack_h323_types.h,
from Masahiro Yamada.
You can pull these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git
Thanks.
----------------------------------------------------------------
The following changes since commit cfef46d692efd852a0da6803f920cc756eea2855:
ravb: Fix use-after-free ravb_tstamp_skb (2019-08-18 14:19:14 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD
for you to fetch changes up to 38a429c898ddd210cc35463b096389f97c3c5a73:
netfilter: add include guard to nf_conntrack_h323_types.h (2019-08-19 13:59:57 +0200)
----------------------------------------------------------------
Denis Efremov (1):
MAINTAINERS: Remove IP MASQUERADING record
Juliana Rodrigueiro (1):
netfilter: xt_nfacct: Fix alignment mismatch in xt_nfacct_match_info
Masahiro Yamada (1):
netfilter: add include guard to nf_conntrack_h323_types.h
Pablo Neira Ayuso (1):
netfilter: nft_flow_offload: missing netlink attribute policy
Todd Seidelmann (1):
netfilter: ebtables: Fix argument order to ADD_COUNTER
MAINTAINERS | 5 ----
include/linux/netfilter/nf_conntrack_h323_types.h | 5 ++++
include/uapi/linux/netfilter/xt_nfacct.h | 5 ++++
net/bridge/netfilter/ebtables.c | 8 ++---
net/netfilter/nft_flow_offload.c | 6 ++++
net/netfilter/xt_nfacct.c | 36 ++++++++++++++++-------
6 files changed, 45 insertions(+), 20 deletions(-)
^ permalink raw reply
* [PATCH 5/5] netfilter: add include guard to nf_conntrack_h323_types.h
From: Pablo Neira Ayuso @ 2019-08-19 18:49 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190819184911.15263-1-pablo@netfilter.org>
From: Masahiro Yamada <yamada.masahiro@socionext.com>
Add a header include guard just in case.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/linux/netfilter/nf_conntrack_h323_types.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/include/linux/netfilter/nf_conntrack_h323_types.h b/include/linux/netfilter/nf_conntrack_h323_types.h
index 7a6871ac8784..74c6f9241944 100644
--- a/include/linux/netfilter/nf_conntrack_h323_types.h
+++ b/include/linux/netfilter/nf_conntrack_h323_types.h
@@ -4,6 +4,9 @@
* Copyright (c) 2006 Jing Min Zhao <zhaojingmin@users.sourceforge.net>
*/
+#ifndef _NF_CONNTRACK_H323_TYPES_H
+#define _NF_CONNTRACK_H323_TYPES_H
+
typedef struct TransportAddress_ipAddress { /* SEQUENCE */
int options; /* No use */
unsigned int ip;
@@ -931,3 +934,5 @@ typedef struct RasMessage { /* CHOICE */
InfoRequestResponse infoRequestResponse;
};
} RasMessage;
+
+#endif /* _NF_CONNTRACK_H323_TYPES_H */
--
2.11.0
^ permalink raw reply related
* [PATCH 2/5] netfilter: ebtables: Fix argument order to ADD_COUNTER
From: Pablo Neira Ayuso @ 2019-08-19 18:49 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190819184911.15263-1-pablo@netfilter.org>
From: Todd Seidelmann <tseidelmann@linode.com>
The ordering of arguments to the x_tables ADD_COUNTER macro
appears to be wrong in ebtables (cf. ip_tables.c, ip6_tables.c,
and arp_tables.c).
This causes data corruption in the ebtables userspace tools
because they get incorrect packet & byte counts from the kernel.
Fixes: d72133e628803 ("netfilter: ebtables: use ADD_COUNTER macro")
Signed-off-by: Todd Seidelmann <tseidelmann@linode.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/bridge/netfilter/ebtables.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index c8177a89f52c..4096d8a74a2b 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -221,7 +221,7 @@ unsigned int ebt_do_table(struct sk_buff *skb,
return NF_DROP;
}
- ADD_COUNTER(*(counter_base + i), 1, skb->len);
+ ADD_COUNTER(*(counter_base + i), skb->len, 1);
/* these should only watch: not modify, nor tell us
* what to do with the packet
@@ -959,8 +959,8 @@ static void get_counters(const struct ebt_counter *oldcounters,
continue;
counter_base = COUNTER_BASE(oldcounters, nentries, cpu);
for (i = 0; i < nentries; i++)
- ADD_COUNTER(counters[i], counter_base[i].pcnt,
- counter_base[i].bcnt);
+ ADD_COUNTER(counters[i], counter_base[i].bcnt,
+ counter_base[i].pcnt);
}
}
@@ -1280,7 +1280,7 @@ static int do_update_counters(struct net *net, const char *name,
/* we add to the counters of the first cpu */
for (i = 0; i < num_counters; i++)
- ADD_COUNTER(t->private->counters[i], tmp[i].pcnt, tmp[i].bcnt);
+ ADD_COUNTER(t->private->counters[i], tmp[i].bcnt, tmp[i].pcnt);
write_unlock_bh(&t->lock);
ret = 0;
--
2.11.0
^ permalink raw reply related
* [PATCH 3/5] netfilter: nft_flow_offload: missing netlink attribute policy
From: Pablo Neira Ayuso @ 2019-08-19 18:49 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190819184911.15263-1-pablo@netfilter.org>
The netlink attribute policy for NFTA_FLOW_TABLE_NAME is missing.
Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nft_flow_offload.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index 060a4ed46d5e..01705ad74a9a 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -149,6 +149,11 @@ static int nft_flow_offload_validate(const struct nft_ctx *ctx,
return nft_chain_validate_hooks(ctx->chain, hook_mask);
}
+static const struct nla_policy nft_flow_offload_policy[NFTA_FLOW_MAX + 1] = {
+ [NFTA_FLOW_TABLE_NAME] = { .type = NLA_STRING,
+ .len = NFT_NAME_MAXLEN - 1 },
+};
+
static int nft_flow_offload_init(const struct nft_ctx *ctx,
const struct nft_expr *expr,
const struct nlattr * const tb[])
@@ -207,6 +212,7 @@ static const struct nft_expr_ops nft_flow_offload_ops = {
static struct nft_expr_type nft_flow_offload_type __read_mostly = {
.name = "flow_offload",
.ops = &nft_flow_offload_ops,
+ .policy = nft_flow_offload_policy,
.maxattr = NFTA_FLOW_MAX,
.owner = THIS_MODULE,
};
--
2.11.0
^ permalink raw reply related
* Re: [PATCH net-next 0/1] Add BASE-T1 PHY support
From: Heiner Kallweit @ 2019-08-19 19:07 UTC (permalink / raw)
To: Christian Herber, davem@davemloft.net
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <AM6PR0402MB379864B810F08D3698618B5F86A80@AM6PR0402MB3798.eurprd04.prod.outlook.com>
On 19.08.2019 08:32, Christian Herber wrote:
> On 16.08.2019 22:59, Heiner Kallweit wrote:
>> On 15.08.2019 17:32, Christian Herber wrote:
>>> This patch adds basic support for BASE-T1 PHYs in the framework.
>>> BASE-T1 PHYs main area of application are automotive and industrial.
>>> BASE-T1 is standardized in IEEE 802.3, namely
>>> - IEEE 802.3bw: 100BASE-T1
>>> - IEEE 802.3bp 1000BASE-T1
>>> - IEEE 802.3cg: 10BASE-T1L and 10BASE-T1S
>>>
>>> There are no products which contain BASE-T1 and consumer type PHYs like
>>> 1000BASE-T. However, devices exist which combine 100BASE-T1 and 1000BASE-T1
>>> PHYs with auto-negotiation.
>>
>> Is this meant in a way that *currently* there are no PHY's combining Base-T1
>> with normal Base-T modes? Or are there reasons why this isn't possible in
>> general? I'm asking because we have PHY's combining copper and fiber, and e.g.
>> the mentioned Aquantia PHY that combines NBase-T with 1000Base-T2.
>>
>>>
>>> The intention of this patch is to make use of the existing Clause 45 functions.
>>> BASE-T1 adds some additional registers e.g. for aneg control, which follow a
>>> similiar register layout as the existing devices. The bits which are used in
>>> BASE-T1 specific registers are the same as in basic registers, thus the
>>> existing functions can be resued, with get_aneg_ctrl() selecting the correct
>>> register address.
>>>
>> If Base-T1 can't be combined with other modes then at a first glance I see no
>> benefit in defining new registers e.g. for aneg control, and the standard ones
>> are unused. Why not using the standard registers? Can you shed some light on that?
>>
>> Are the new registers internally shadowed to the standard location?
>> That's something I've seen on other PHY's: one register appears in different
>> places in different devices.
>>
>>> The current version of ethtool has been prepared for 100/1000BASE-T1 and works
>>> with this patch. 10BASE-T1 needs to be added to ethtool.
>>>
>>> Christian Herber (1):
>>> Added BASE-T1 PHY support to PHY Subsystem
>>>
>>> drivers/net/phy/phy-c45.c | 113 +++++++++++++++++++++++++++++++----
>>> drivers/net/phy/phy-core.c | 4 +-
>>> include/uapi/linux/ethtool.h | 2 +
>>> include/uapi/linux/mdio.h | 21 +++++++
>>> 4 files changed, 129 insertions(+), 11 deletions(-)
>>>
>>
>> Heiner
>>
>
> Hi Heiner,
>
> I do not think the Aquantia part you are describing is publicly
> documented, so i cannot comment on that part.
Right, datasheet isn't publicly available. All I wanted to say with
mentioning this PHY: It's not a rare exception that a PHY combines
standard BaseT modes with "non-consumer" modes for special purposes.
One practical use case of this proprietary 1000Base-T2 mode is
re-using existing 2-pair cabling in aircrafts.
> There are multiple reasons why e.g. xBASE-T1 plus 1000BASE-T is
> unlikely. First, the is no use-case known to me, where this would be
> required. Second, there is no way that you can do an auto-negotiation
> between the two, as these both have their own auto-neg defined (Clause
> 28/73 vs. Clause 98). Thirdly, if you would ever have a product with
> both, I believe it would just include two full PHYs and a way to select
> which flavor you want. Of course, this is the theory until proven
> otherwise, but to me it is sufficient to use a single driver.
>
I'm with you if you say it's unlikely. However your statement in the
commit message leaves the impression that there can't be such a device.
And that's a difference.
Regarding "including two full PHYs":
This case we have already, there are PHYs combining different IP blocks,
each one supporting a specific mode (e.g. copper and fiber). There you
also have the case of different autoneg methods, clause 28 vs. clause 37.
> The registers are different in the fields they include. It is just that
> the flags which are used by the Linux driver, like restarting auto-neg,
> are at the same position.
>
Good to know. Your commit description doesn't mention any specific PHY.
I suppose you have PHYs you'd like to operate with the genphy_c45 driver.
Could you give an example? And ideally, is a public datasheet available?
> Christian
>
>
Heiner
^ permalink raw reply
* [PATCH bpf-next v2 0/4] selftests/bpf: test_progs: misc fixes
From: Stanislav Fomichev @ 2019-08-19 19:17 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Andrii Nakryiko
* add test__skip to indicate skipped tests
* remove global success/error counts (use environment)
* remove asserts from the tests
* remove unused ret from send_signal test
v2:
* drop patch that changes output to keep consistent with test_verifier
(Alexei Starovoitov)
* QCHECK instead of test__fail (Andrii Nakryiko)
* test__skip count number of subtests (Andrii Nakryiko)
Cc: Andrii Nakryiko <andriin@fb.com>
Stanislav Fomichev (4):
selftests/bpf: test_progs: test__skip
selftests/bpf: test_progs: remove global fail/success counts
selftests/bpf: test_progs: remove asserts from subtests
selftests/bpf: test_progs: remove unused ret
.../selftests/bpf/prog_tests/bpf_obj_id.c | 20 +++++----
.../bpf/prog_tests/bpf_verif_scale.c | 9 +---
.../selftests/bpf/prog_tests/flow_dissector.c | 4 +-
.../bpf/prog_tests/get_stack_raw_tp.c | 3 --
.../selftests/bpf/prog_tests/global_data.c | 20 +++------
.../selftests/bpf/prog_tests/l4lb_all.c | 8 +---
.../selftests/bpf/prog_tests/map_lock.c | 37 ++++++++--------
.../selftests/bpf/prog_tests/pkt_access.c | 4 +-
.../selftests/bpf/prog_tests/pkt_md_access.c | 4 +-
.../bpf/prog_tests/queue_stack_map.c | 8 +---
.../bpf/prog_tests/reference_tracking.c | 4 +-
.../selftests/bpf/prog_tests/send_signal.c | 43 +++++++++----------
.../selftests/bpf/prog_tests/spinlock.c | 16 +++----
.../bpf/prog_tests/stacktrace_build_id.c | 7 +--
.../bpf/prog_tests/stacktrace_build_id_nmi.c | 7 +--
.../selftests/bpf/prog_tests/stacktrace_map.c | 17 +++-----
.../bpf/prog_tests/stacktrace_map_raw_tp.c | 9 ++--
.../bpf/prog_tests/task_fd_query_rawtp.c | 3 --
.../bpf/prog_tests/task_fd_query_tp.c | 5 ---
.../selftests/bpf/prog_tests/tcp_estats.c | 4 +-
tools/testing/selftests/bpf/prog_tests/xdp.c | 4 +-
.../bpf/prog_tests/xdp_adjust_tail.c | 4 +-
.../selftests/bpf/prog_tests/xdp_noinline.c | 7 +--
tools/testing/selftests/bpf/test_progs.c | 41 ++++++++++++------
tools/testing/selftests/bpf/test_progs.h | 19 +++++---
25 files changed, 135 insertions(+), 172 deletions(-)
--
2.23.0.rc1.153.gdeed80330f-goog
^ permalink raw reply
* [PATCH bpf-next v2 1/4] selftests/bpf: test_progs: test__skip
From: Stanislav Fomichev @ 2019-08-19 19:17 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Andrii Nakryiko
In-Reply-To: <20190819191752.241637-1-sdf@google.com>
Export test__skip() to indicate skipped tests and use it in
test_send_signal_nmi().
Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
.../selftests/bpf/prog_tests/send_signal.c | 1 +
tools/testing/selftests/bpf/test_progs.c | 20 +++++++++++++++++--
tools/testing/selftests/bpf/test_progs.h | 2 ++
3 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
index 1575f0a1f586..40c2c5efdd3e 100644
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -204,6 +204,7 @@ static int test_send_signal_nmi(void)
if (errno == ENOENT) {
printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n",
__func__);
+ test__skip();
return 0;
}
/* Let the test fail with a more informative message */
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 12895d03d58b..e545dfb55872 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -17,6 +17,7 @@ struct prog_test_def {
bool force_log;
int pass_cnt;
int error_cnt;
+ int skip_cnt;
bool tested;
const char *subtest_name;
@@ -56,6 +57,14 @@ static void dump_test_log(const struct prog_test_def *test, bool failed)
fseeko(stdout, 0, SEEK_SET); /* rewind */
}
+static void skip_account(void)
+{
+ if (env.test->skip_cnt) {
+ env.skip_cnt++;
+ env.test->skip_cnt = 0;
+ }
+}
+
void test__end_subtest()
{
struct prog_test_def *test = env.test;
@@ -65,6 +74,7 @@ void test__end_subtest()
env.fail_cnt++;
else
env.sub_succ_cnt++;
+ skip_account();
dump_test_log(test, sub_error_cnt);
@@ -105,6 +115,11 @@ void test__force_log() {
env.test->force_log = true;
}
+void test__skip(void)
+{
+ env.test->skip_cnt++;
+}
+
struct ipv4_packet pkt_v4 = {
.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
.iph.ihl = 5,
@@ -510,6 +525,7 @@ int main(int argc, char **argv)
env.fail_cnt++;
else
env.succ_cnt++;
+ skip_account();
dump_test_log(test, test->error_cnt);
@@ -518,8 +534,8 @@ int main(int argc, char **argv)
test->error_cnt ? "FAIL" : "OK");
}
stdio_restore();
- printf("Summary: %d/%d PASSED, %d FAILED\n",
- env.succ_cnt, env.sub_succ_cnt, env.fail_cnt);
+ printf("Summary: %d/%d PASSED, %d SKIPPED, %d FAILED\n",
+ env.succ_cnt, env.sub_succ_cnt, env.skip_cnt, env.fail_cnt);
free(env.test_selector.num_set);
free(env.subtest_selector.num_set);
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 37d427f5a1e5..9defd35cb6c0 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -64,6 +64,7 @@ struct test_env {
int succ_cnt; /* successful tests */
int sub_succ_cnt; /* successful sub-tests */
int fail_cnt; /* total failed tests + sub-tests */
+ int skip_cnt; /* skipped tests */
};
extern int error_cnt;
@@ -72,6 +73,7 @@ extern struct test_env env;
extern void test__force_log();
extern bool test__start_subtest(const char *name);
+extern void test__skip(void);
#define MAGIC_BYTES 123
--
2.23.0.rc1.153.gdeed80330f-goog
^ permalink raw reply related
* [PATCH bpf-next v2 2/4] selftests/bpf: test_progs: remove global fail/success counts
From: Stanislav Fomichev @ 2019-08-19 19:17 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Andrii Nakryiko
In-Reply-To: <20190819191752.241637-1-sdf@google.com>
Now that we have a global per-test/per-environment state, there
is no longer need to have global fail/success counters (and there
is no need to save/get the diff before/after the test).
Introduce QCHECK macro (suggested by Andrii) and covert existing tests
to it. QCHECK uses new test__fail() to record the failure.
Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
.../selftests/bpf/prog_tests/bpf_obj_id.c | 3 +--
.../bpf/prog_tests/bpf_verif_scale.c | 9 +-------
.../selftests/bpf/prog_tests/flow_dissector.c | 4 +---
.../bpf/prog_tests/get_stack_raw_tp.c | 3 ---
.../selftests/bpf/prog_tests/global_data.c | 20 +++++-------------
.../selftests/bpf/prog_tests/l4lb_all.c | 8 ++-----
.../selftests/bpf/prog_tests/map_lock.c | 17 ++++++---------
.../selftests/bpf/prog_tests/pkt_access.c | 4 +---
.../selftests/bpf/prog_tests/pkt_md_access.c | 4 +---
.../bpf/prog_tests/queue_stack_map.c | 8 ++-----
.../bpf/prog_tests/reference_tracking.c | 4 +---
.../selftests/bpf/prog_tests/spinlock.c | 6 ++----
.../selftests/bpf/prog_tests/stacktrace_map.c | 17 +++++++--------
.../bpf/prog_tests/stacktrace_map_raw_tp.c | 9 +++-----
.../bpf/prog_tests/task_fd_query_rawtp.c | 3 ---
.../bpf/prog_tests/task_fd_query_tp.c | 5 -----
.../selftests/bpf/prog_tests/tcp_estats.c | 4 +---
tools/testing/selftests/bpf/prog_tests/xdp.c | 4 +---
.../bpf/prog_tests/xdp_adjust_tail.c | 4 +---
.../selftests/bpf/prog_tests/xdp_noinline.c | 7 ++-----
tools/testing/selftests/bpf/test_progs.c | 21 ++++++++-----------
tools/testing/selftests/bpf/test_progs.h | 17 +++++++++------
22 files changed, 58 insertions(+), 123 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
index fb5840a62548..b9d0cd312839 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
@@ -48,8 +48,7 @@ void test_bpf_obj_id(void)
/* test_obj_id.o is a dumb prog. It should never fail
* to load.
*/
- if (err)
- error_cnt++;
+ QCHECK(err);
assert(!err);
/* Insert a magic value to the map */
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
index 1a1eae356f81..d0fd20d021a7 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
@@ -28,8 +28,6 @@ static int check_load(const char *file, enum bpf_prog_type type)
attr.prog_flags = BPF_F_TEST_RND_HI32;
err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);
bpf_object__close(obj);
- if (err)
- error_cnt++;
return err;
}
@@ -105,12 +103,7 @@ void test_bpf_verif_scale(void)
continue;
err = check_load(test->file, test->attach_type);
- if (test->fails) { /* expected to fail */
- if (err)
- error_cnt--;
- else
- error_cnt++;
- }
+ QCHECK(err && !test->fails);
}
if (env.verifier_stats)
diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
index 6892b88ae065..dbd20338a667 100644
--- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
@@ -452,10 +452,8 @@ void test_flow_dissector(void)
err = bpf_flow_load(&obj, "./bpf_flow.o", "flow_dissector",
"jmp_table", "last_dissection", &prog_fd, &keys_fd);
- if (err) {
- error_cnt++;
+ if (QCHECK(err))
return;
- }
for (i = 0; i < ARRAY_SIZE(tests); i++) {
struct bpf_flow_keys flow_keys;
diff --git a/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c b/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
index 3d59b3c841fe..eba9a970703b 100644
--- a/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
+++ b/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
@@ -135,10 +135,7 @@ void test_get_stack_raw_tp(void)
exp_cnt -= err;
}
- goto close_prog_noerr;
close_prog:
- error_cnt++;
-close_prog_noerr:
if (!IS_ERR_OR_NULL(link))
bpf_link__destroy(link);
if (!IS_ERR_OR_NULL(pb))
diff --git a/tools/testing/selftests/bpf/prog_tests/global_data.c b/tools/testing/selftests/bpf/prog_tests/global_data.c
index d011079fb0bf..4978cfc5ea25 100644
--- a/tools/testing/selftests/bpf/prog_tests/global_data.c
+++ b/tools/testing/selftests/bpf/prog_tests/global_data.c
@@ -7,10 +7,8 @@ static void test_global_data_number(struct bpf_object *obj, __u32 duration)
uint64_t num;
map_fd = bpf_find_map(__func__, obj, "result_number");
- if (map_fd < 0) {
- error_cnt++;
+ if (QCHECK(map_fd < 0))
return;
- }
struct {
char *name;
@@ -44,10 +42,8 @@ static void test_global_data_string(struct bpf_object *obj, __u32 duration)
char str[32];
map_fd = bpf_find_map(__func__, obj, "result_string");
- if (map_fd < 0) {
- error_cnt++;
+ if (QCHECK(map_fd < 0))
return;
- }
struct {
char *name;
@@ -81,10 +77,8 @@ static void test_global_data_struct(struct bpf_object *obj, __u32 duration)
struct foo val;
map_fd = bpf_find_map(__func__, obj, "result_struct");
- if (map_fd < 0) {
- error_cnt++;
+ if (QCHECK(map_fd < 0))
return;
- }
struct {
char *name;
@@ -112,16 +106,12 @@ static void test_global_data_rdonly(struct bpf_object *obj, __u32 duration)
__u8 *buff;
map = bpf_object__find_map_by_name(obj, "test_glo.rodata");
- if (!map || !bpf_map__is_internal(map)) {
- error_cnt++;
+ if (QCHECK(!map || !bpf_map__is_internal(map)))
return;
- }
map_fd = bpf_map__fd(map);
- if (map_fd < 0) {
- error_cnt++;
+ if (QCHECK(map_fd < 0))
return;
- }
buff = malloc(bpf_map__def(map)->value_size);
if (buff)
diff --git a/tools/testing/selftests/bpf/prog_tests/l4lb_all.c b/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
index 20ddca830e68..5c6de66c1b82 100644
--- a/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
+++ b/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
@@ -30,10 +30,8 @@ static void test_l4lb(const char *file)
u32 *magic = (u32 *)buf;
err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
- if (err) {
- error_cnt++;
+ if (QCHECK(err))
return;
- }
map_fd = bpf_find_map(__func__, obj, "vip_map");
if (map_fd < 0)
@@ -72,10 +70,8 @@ static void test_l4lb(const char *file)
bytes += stats[i].bytes;
pkts += stats[i].pkts;
}
- if (bytes != MAGIC_BYTES * NUM_ITER * 2 || pkts != NUM_ITER * 2) {
- error_cnt++;
+ if (QCHECK(bytes != MAGIC_BYTES * NUM_ITER * 2 || pkts != NUM_ITER * 2))
printf("test_l4lb:FAIL:stats %lld %lld\n", bytes, pkts);
- }
out:
bpf_object__close(obj);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/map_lock.c b/tools/testing/selftests/bpf/prog_tests/map_lock.c
index ee99368c595c..c1bddc433a5a 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_lock.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_lock.c
@@ -8,14 +8,12 @@ static void *parallel_map_access(void *arg)
for (i = 0; i < 10000; i++) {
err = bpf_map_lookup_elem_flags(map_fd, &key, vars, BPF_F_LOCK);
- if (err) {
+ if (QCHECK(err)) {
printf("lookup failed\n");
- error_cnt++;
goto out;
}
- if (vars[0] != 0) {
+ if (QCHECK(vars[0] != 0)) {
printf("lookup #%d var[0]=%d\n", i, vars[0]);
- error_cnt++;
goto out;
}
rnd = vars[1];
@@ -24,7 +22,7 @@ static void *parallel_map_access(void *arg)
continue;
printf("lookup #%d var[1]=%d var[%d]=%d\n",
i, rnd, j, vars[j]);
- error_cnt++;
+ QCHECK(vars[j] != rnd);
goto out;
}
}
@@ -42,15 +40,15 @@ void test_map_lock(void)
void *ret;
err = bpf_prog_load(file, BPF_PROG_TYPE_CGROUP_SKB, &obj, &prog_fd);
- if (err) {
+ if (QCHECK(err)) {
printf("test_map_lock:bpf_prog_load errno %d\n", errno);
goto close_prog;
}
map_fd[0] = bpf_find_map(__func__, obj, "hash_map");
- if (map_fd[0] < 0)
+ if (QCHECK(map_fd[0] < 0))
goto close_prog;
map_fd[1] = bpf_find_map(__func__, obj, "array_map");
- if (map_fd[1] < 0)
+ if (QCHECK(map_fd[1] < 0))
goto close_prog;
bpf_map_update_elem(map_fd[0], &key, vars, BPF_F_LOCK);
@@ -67,9 +65,6 @@ void test_map_lock(void)
for (i = 4; i < 6; i++)
assert(pthread_join(thread_id[i], &ret) == 0 &&
ret == (void *)&map_fd[i - 4]);
- goto close_prog_noerr;
close_prog:
- error_cnt++;
-close_prog_noerr:
bpf_object__close(obj);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/pkt_access.c b/tools/testing/selftests/bpf/prog_tests/pkt_access.c
index 4ecfd721a044..345c5201acee 100644
--- a/tools/testing/selftests/bpf/prog_tests/pkt_access.c
+++ b/tools/testing/selftests/bpf/prog_tests/pkt_access.c
@@ -9,10 +9,8 @@ void test_pkt_access(void)
int err, prog_fd;
err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
- if (err) {
- error_cnt++;
+ if (QCHECK(err))
return;
- }
err = bpf_prog_test_run(prog_fd, 100000, &pkt_v4, sizeof(pkt_v4),
NULL, NULL, &retval, &duration);
diff --git a/tools/testing/selftests/bpf/prog_tests/pkt_md_access.c b/tools/testing/selftests/bpf/prog_tests/pkt_md_access.c
index ac0d43435806..c0723e306f6f 100644
--- a/tools/testing/selftests/bpf/prog_tests/pkt_md_access.c
+++ b/tools/testing/selftests/bpf/prog_tests/pkt_md_access.c
@@ -9,10 +9,8 @@ void test_pkt_md_access(void)
int err, prog_fd;
err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
- if (err) {
- error_cnt++;
+ if (QCHECK(err))
return;
- }
err = bpf_prog_test_run(prog_fd, 10, &pkt_v4, sizeof(pkt_v4),
NULL, NULL, &retval, &duration);
diff --git a/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c b/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c
index e60cd5ff1f55..99e346717767 100644
--- a/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c
+++ b/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c
@@ -27,10 +27,8 @@ static void test_queue_stack_map_by_type(int type)
return;
err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
- if (err) {
- error_cnt++;
+ if (QCHECK(err))
return;
- }
map_in_fd = bpf_find_map(__func__, obj, "map_in");
if (map_in_fd < 0)
@@ -43,10 +41,8 @@ static void test_queue_stack_map_by_type(int type)
/* Push 32 elements to the input map */
for (i = 0; i < MAP_SIZE; i++) {
err = bpf_map_update_elem(map_in_fd, NULL, &vals[i], 0);
- if (err) {
- error_cnt++;
+ if (QCHECK(err))
goto out;
- }
}
/* The eBPF program pushes iph.saddr in the output map,
diff --git a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
index 4a4f428d1a78..cb2fe7b0dc46 100644
--- a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
+++ b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
@@ -10,10 +10,8 @@ void test_reference_tracking(void)
int err = 0;
obj = bpf_object__open(file);
- if (IS_ERR(obj)) {
- error_cnt++;
+ if (QCHECK(IS_ERR(obj)))
return;
- }
bpf_object__for_each_program(prog, obj) {
const char *title;
diff --git a/tools/testing/selftests/bpf/prog_tests/spinlock.c b/tools/testing/selftests/bpf/prog_tests/spinlock.c
index 114ebe6a438e..e4294a7fdf1a 100644
--- a/tools/testing/selftests/bpf/prog_tests/spinlock.c
+++ b/tools/testing/selftests/bpf/prog_tests/spinlock.c
@@ -11,7 +11,7 @@ void test_spinlock(void)
void *ret;
err = bpf_prog_load(file, BPF_PROG_TYPE_CGROUP_SKB, &obj, &prog_fd);
- if (err) {
+ if (QCHECK(err)) {
printf("test_spin_lock:bpf_prog_load errno %d\n", errno);
goto close_prog;
}
@@ -21,9 +21,7 @@ void test_spinlock(void)
for (i = 0; i < 4; i++)
assert(pthread_join(thread_id[i], &ret) == 0 &&
ret == (void *)&prog_fd);
- goto close_prog_noerr;
+
close_prog:
- error_cnt++;
-close_prog_noerr:
bpf_object__close(obj);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c
index fc539335c5b3..708d4cded52a 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c
@@ -26,19 +26,19 @@ void test_stacktrace_map(void)
/* find map fds */
control_map_fd = bpf_find_map(__func__, obj, "control_map");
- if (control_map_fd < 0)
+ if (QCHECK(control_map_fd < 0))
goto disable_pmu;
stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
- if (stackid_hmap_fd < 0)
+ if (QCHECK(stackid_hmap_fd < 0))
goto disable_pmu;
stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
- if (stackmap_fd < 0)
+ if (QCHECK(stackmap_fd < 0))
goto disable_pmu;
stack_amap_fd = bpf_find_map(__func__, obj, "stack_amap");
- if (stack_amap_fd < 0)
+ if (QCHECK(stack_amap_fd < 0))
goto disable_pmu;
/* give some time for bpf program run */
@@ -55,23 +55,20 @@ void test_stacktrace_map(void)
err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
"err %d errno %d\n", err, errno))
- goto disable_pmu_noerr;
+ goto disable_pmu;
err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
"err %d errno %d\n", err, errno))
- goto disable_pmu_noerr;
+ goto disable_pmu;
stack_trace_len = PERF_MAX_STACK_DEPTH * sizeof(__u64);
err = compare_stack_ips(stackmap_fd, stack_amap_fd, stack_trace_len);
if (CHECK(err, "compare_stack_ips stackmap vs. stack_amap",
"err %d errno %d\n", err, errno))
- goto disable_pmu_noerr;
+ goto disable_pmu;
- goto disable_pmu_noerr;
disable_pmu:
- error_cnt++;
-disable_pmu_noerr:
bpf_link__destroy(link);
close_prog:
bpf_object__close(obj);
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c
index fbfa8e76cf63..eb0208863fa3 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c
@@ -26,15 +26,15 @@ void test_stacktrace_map_raw_tp(void)
/* find map fds */
control_map_fd = bpf_find_map(__func__, obj, "control_map");
- if (control_map_fd < 0)
+ if (QCHECK(control_map_fd < 0))
goto close_prog;
stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
- if (stackid_hmap_fd < 0)
+ if (QCHECK(stackid_hmap_fd < 0))
goto close_prog;
stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
- if (stackmap_fd < 0)
+ if (QCHECK(stackmap_fd < 0))
goto close_prog;
/* give some time for bpf program run */
@@ -58,10 +58,7 @@ void test_stacktrace_map_raw_tp(void)
"err %d errno %d\n", err, errno))
goto close_prog;
- goto close_prog_noerr;
close_prog:
- error_cnt++;
-close_prog_noerr:
if (!IS_ERR_OR_NULL(link))
bpf_link__destroy(link);
bpf_object__close(obj);
diff --git a/tools/testing/selftests/bpf/prog_tests/task_fd_query_rawtp.c b/tools/testing/selftests/bpf/prog_tests/task_fd_query_rawtp.c
index 958a3d88de99..1bdc1d86a50c 100644
--- a/tools/testing/selftests/bpf/prog_tests/task_fd_query_rawtp.c
+++ b/tools/testing/selftests/bpf/prog_tests/task_fd_query_rawtp.c
@@ -70,9 +70,6 @@ void test_task_fd_query_rawtp(void)
if (CHECK(!err, "check_results", "fd_type %d len %u\n", fd_type, len))
goto close_prog;
- goto close_prog_noerr;
close_prog:
- error_cnt++;
-close_prog_noerr:
bpf_object__close(obj);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/task_fd_query_tp.c b/tools/testing/selftests/bpf/prog_tests/task_fd_query_tp.c
index f9b70e81682b..3f131b8fe328 100644
--- a/tools/testing/selftests/bpf/prog_tests/task_fd_query_tp.c
+++ b/tools/testing/selftests/bpf/prog_tests/task_fd_query_tp.c
@@ -62,14 +62,9 @@ static void test_task_fd_query_tp_core(const char *probe_name,
fd_type, buf))
goto close_pmu;
- close(pmu_fd);
- goto close_prog_noerr;
-
close_pmu:
close(pmu_fd);
close_prog:
- error_cnt++;
-close_prog_noerr:
bpf_object__close(obj);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_estats.c b/tools/testing/selftests/bpf/prog_tests/tcp_estats.c
index bb8759d69099..594307dffd13 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_estats.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_estats.c
@@ -10,10 +10,8 @@ void test_tcp_estats(void)
err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
CHECK(err, "", "err %d errno %d\n", err, errno);
- if (err) {
- error_cnt++;
+ if (err)
return;
- }
bpf_object__close(obj);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp.c b/tools/testing/selftests/bpf/prog_tests/xdp.c
index a74167289545..a5da2bb221d4 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp.c
@@ -16,10 +16,8 @@ void test_xdp(void)
int err, prog_fd, map_fd;
err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
- if (err) {
- error_cnt++;
+ if (QCHECK(err))
return;
- }
map_fd = bpf_find_map(__func__, obj, "vip2tnl");
if (map_fd < 0)
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index 922aa0a19764..115a2b883d1e 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -10,10 +10,8 @@ void test_xdp_adjust_tail(void)
int err, prog_fd;
err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
- if (err) {
- error_cnt++;
+ if (QCHECK(err))
return;
- }
err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
buf, &size, &retval, &duration);
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c b/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
index 15f7c272edb0..edfff407ac69 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
@@ -31,10 +31,8 @@ void test_xdp_noinline(void)
u32 *magic = (u32 *)buf;
err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
- if (err) {
- error_cnt++;
+ if (QCHECK(err))
return;
- }
map_fd = bpf_find_map(__func__, obj, "vip_map");
if (map_fd < 0)
@@ -73,8 +71,7 @@ void test_xdp_noinline(void)
bytes += stats[i].bytes;
pkts += stats[i].pkts;
}
- if (bytes != MAGIC_BYTES * NUM_ITER * 2 || pkts != NUM_ITER * 2) {
- error_cnt++;
+ if (QCHECK(bytes != MAGIC_BYTES * NUM_ITER * 2 || pkts != NUM_ITER * 2)) {
printf("test_xdp_noinline:FAIL:stats %lld %lld\n",
bytes, pkts);
}
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index e545dfb55872..e5892cb60eca 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -8,14 +8,12 @@
/* defined in test_progs.h */
struct test_env env;
-int error_cnt, pass_cnt;
struct prog_test_def {
const char *test_name;
int test_num;
void (*run_test)(void);
bool force_log;
- int pass_cnt;
int error_cnt;
int skip_cnt;
bool tested;
@@ -24,7 +22,6 @@ struct prog_test_def {
int subtest_num;
/* store counts before subtest started */
- int old_pass_cnt;
int old_error_cnt;
};
@@ -68,7 +65,7 @@ static void skip_account(void)
void test__end_subtest()
{
struct prog_test_def *test = env.test;
- int sub_error_cnt = error_cnt - test->old_error_cnt;
+ int sub_error_cnt = test->error_cnt - test->old_error_cnt;
if (sub_error_cnt)
env.fail_cnt++;
@@ -105,8 +102,7 @@ bool test__start_subtest(const char *name)
return false;
test->subtest_name = name;
- env.test->old_pass_cnt = pass_cnt;
- env.test->old_error_cnt = error_cnt;
+ env.test->old_error_cnt = env.test->error_cnt;
return true;
}
@@ -120,6 +116,11 @@ void test__skip(void)
env.test->skip_cnt++;
}
+void test__fail(void)
+{
+ env.test->error_cnt++;
+}
+
struct ipv4_packet pkt_v4 = {
.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
.iph.ihl = 5,
@@ -144,7 +145,7 @@ int bpf_find_map(const char *test, struct bpf_object *obj, const char *name)
map = bpf_object__find_map_by_name(obj, name);
if (!map) {
printf("%s:FAIL:map '%s' not found\n", test, name);
- error_cnt++;
+ test__fail();
return -1;
}
return bpf_map__fd(map);
@@ -503,8 +504,6 @@ int main(int argc, char **argv)
stdio_hijack();
for (i = 0; i < prog_test_cnt; i++) {
struct prog_test_def *test = &prog_test_defs[i];
- int old_pass_cnt = pass_cnt;
- int old_error_cnt = error_cnt;
env.test = test;
test->test_num = i + 1;
@@ -519,8 +518,6 @@ int main(int argc, char **argv)
test__end_subtest();
test->tested = true;
- test->pass_cnt = pass_cnt - old_pass_cnt;
- test->error_cnt = error_cnt - old_error_cnt;
if (test->error_cnt)
env.fail_cnt++;
else
@@ -540,5 +537,5 @@ int main(int argc, char **argv)
free(env.test_selector.num_set);
free(env.subtest_selector.num_set);
- return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
+ return env.fail_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
}
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 9defd35cb6c0..a74c1ca66bba 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -38,8 +38,6 @@ typedef __u16 __sum16;
#include "trace_helpers.h"
#include "flow_dissector_load.h"
-struct prog_test_def;
-
struct test_selector {
const char *name;
bool *num_set;
@@ -67,13 +65,12 @@ struct test_env {
int skip_cnt; /* skipped tests */
};
-extern int error_cnt;
-extern int pass_cnt;
extern struct test_env env;
extern void test__force_log();
extern bool test__start_subtest(const char *name);
extern void test__skip(void);
+extern void test__fail(void);
#define MAGIC_BYTES 123
@@ -96,17 +93,25 @@ extern struct ipv6_packet pkt_v6;
#define _CHECK(condition, tag, duration, format...) ({ \
int __ret = !!(condition); \
if (__ret) { \
- error_cnt++; \
+ test__fail(); \
printf("%s:FAIL:%s ", __func__, tag); \
printf(format); \
} else { \
- pass_cnt++; \
printf("%s:PASS:%s %d nsec\n", \
__func__, tag, duration); \
} \
__ret; \
})
+#define QCHECK(condition) ({ \
+ int __ret = !!(condition); \
+ if (__ret) { \
+ test__fail(); \
+ printf("%s:FAIL:%d ", __func__, __LINE__); \
+ } \
+ __ret; \
+})
+
#define CHECK(condition, tag, format...) \
_CHECK(condition, tag, duration, format)
#define CHECK_ATTR(condition, tag, format...) \
--
2.23.0.rc1.153.gdeed80330f-goog
^ permalink raw reply related
* [PATCH bpf-next v2 3/4] selftests/bpf: test_progs: remove asserts from subtests
From: Stanislav Fomichev @ 2019-08-19 19:17 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Andrii Nakryiko
In-Reply-To: <20190819191752.241637-1-sdf@google.com>
Otherwise they can bring the whole process down.
Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
.../selftests/bpf/prog_tests/bpf_obj_id.c | 19 +++++++++++-------
.../selftests/bpf/prog_tests/map_lock.c | 20 +++++++++++--------
.../selftests/bpf/prog_tests/spinlock.c | 12 ++++++-----
.../bpf/prog_tests/stacktrace_build_id.c | 7 ++++---
.../bpf/prog_tests/stacktrace_build_id_nmi.c | 7 ++++---
5 files changed, 39 insertions(+), 26 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
index b9d0cd312839..acc2fc046b01 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
@@ -48,15 +48,17 @@ void test_bpf_obj_id(void)
/* test_obj_id.o is a dumb prog. It should never fail
* to load.
*/
- QCHECK(err);
- assert(!err);
+ if (QCHECK(err))
+ continue;
/* Insert a magic value to the map */
map_fds[i] = bpf_find_map(__func__, objs[i], "test_map_id");
- assert(map_fds[i] >= 0);
+ if (QCHECK(map_fds[i] < 0))
+ goto done;
err = bpf_map_update_elem(map_fds[i], &array_key,
&array_magic_value, 0);
- assert(!err);
+ if (QCHECK(err))
+ goto done;
/* Check getting map info */
info_len = sizeof(struct bpf_map_info) * 2;
@@ -95,9 +97,11 @@ void test_bpf_obj_id(void)
prog_infos[i].map_ids = ptr_to_u64(map_ids + i);
prog_infos[i].nr_map_ids = 2;
err = clock_gettime(CLOCK_REALTIME, &real_time_ts);
- assert(!err);
+ if (QCHECK(err))
+ goto done;
err = clock_gettime(CLOCK_BOOTTIME, &boot_time_ts);
- assert(!err);
+ if (QCHECK(err))
+ goto done;
err = bpf_obj_get_info_by_fd(prog_fds[i], &prog_infos[i],
&info_len);
load_time = (real_time_ts.tv_sec - boot_time_ts.tv_sec)
@@ -223,7 +227,8 @@ void test_bpf_obj_id(void)
nr_id_found++;
err = bpf_map_lookup_elem(map_fd, &array_key, &array_value);
- assert(!err);
+ if (QCHECK(err))
+ goto done;
err = bpf_obj_get_info_by_fd(map_fd, &map_info, &info_len);
CHECK(err || info_len != sizeof(struct bpf_map_info) ||
diff --git a/tools/testing/selftests/bpf/prog_tests/map_lock.c b/tools/testing/selftests/bpf/prog_tests/map_lock.c
index c1bddc433a5a..7a12129def9a 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_lock.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_lock.c
@@ -54,17 +54,21 @@ void test_map_lock(void)
bpf_map_update_elem(map_fd[0], &key, vars, BPF_F_LOCK);
for (i = 0; i < 4; i++)
- assert(pthread_create(&thread_id[i], NULL,
- &spin_lock_thread, &prog_fd) == 0);
+ if (QCHECK(pthread_create(&thread_id[i], NULL,
+ &spin_lock_thread, &prog_fd)))
+ goto close_prog;
for (i = 4; i < 6; i++)
- assert(pthread_create(&thread_id[i], NULL,
- ¶llel_map_access, &map_fd[i - 4]) == 0);
+ if (QCHECK(pthread_create(&thread_id[i], NULL,
+ ¶llel_map_access, &map_fd[i - 4])))
+ goto close_prog;
for (i = 0; i < 4; i++)
- assert(pthread_join(thread_id[i], &ret) == 0 &&
- ret == (void *)&prog_fd);
+ if (QCHECK(pthread_join(thread_id[i], &ret) ||
+ ret != (void *)&prog_fd))
+ goto close_prog;
for (i = 4; i < 6; i++)
- assert(pthread_join(thread_id[i], &ret) == 0 &&
- ret == (void *)&map_fd[i - 4]);
+ if (QCHECK(pthread_join(thread_id[i], &ret) ||
+ ret != (void *)&map_fd[i - 4]))
+ goto close_prog;
close_prog:
bpf_object__close(obj);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/spinlock.c b/tools/testing/selftests/bpf/prog_tests/spinlock.c
index e4294a7fdf1a..00b4ed1734e0 100644
--- a/tools/testing/selftests/bpf/prog_tests/spinlock.c
+++ b/tools/testing/selftests/bpf/prog_tests/spinlock.c
@@ -16,12 +16,14 @@ void test_spinlock(void)
goto close_prog;
}
for (i = 0; i < 4; i++)
- assert(pthread_create(&thread_id[i], NULL,
- &spin_lock_thread, &prog_fd) == 0);
- for (i = 0; i < 4; i++)
- assert(pthread_join(thread_id[i], &ret) == 0 &&
- ret == (void *)&prog_fd);
+ if (QCHECK(pthread_create(&thread_id[i], NULL,
+ &spin_lock_thread, &prog_fd)))
+ goto close_prog;
+ for (i = 0; i < 4; i++)
+ if (QCHECK(pthread_join(thread_id[i], &ret) ||
+ ret != (void *)&prog_fd))
+ goto close_prog;
close_prog:
bpf_object__close(obj);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
index ac44fda84833..552e07e7800c 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
@@ -51,9 +51,10 @@ void test_stacktrace_build_id(void)
"err %d errno %d\n", err, errno))
goto disable_pmu;
- assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")
- == 0);
- assert(system("./urandom_read") == 0);
+ if (QCHECK(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")))
+ goto disable_pmu;
+ if (QCHECK(system("./urandom_read")))
+ goto disable_pmu;
/* disable stack trace collection */
key = 0;
val = 1;
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
index 9557b7dfb782..1553c848edc5 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
@@ -82,9 +82,10 @@ void test_stacktrace_build_id_nmi(void)
"err %d errno %d\n", err, errno))
goto disable_pmu;
- assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")
- == 0);
- assert(system("taskset 0x1 ./urandom_read 100000") == 0);
+ if (QCHECK(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")))
+ goto disable_pmu;
+ if (QCHECK(system("taskset 0x1 ./urandom_read 100000")))
+ goto disable_pmu;
/* disable stack trace collection */
key = 0;
val = 1;
--
2.23.0.rc1.153.gdeed80330f-goog
^ permalink raw reply related
* [PATCH bpf-next v2 4/4] selftests/bpf: test_progs: remove unused ret
From: Stanislav Fomichev @ 2019-08-19 19:17 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev
In-Reply-To: <20190819191752.241637-1-sdf@google.com>
send_signal test returns static codes from the subtests which
nobody looks at, let's rely on the CHECK macros instead.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
.../selftests/bpf/prog_tests/send_signal.c | 42 +++++++++----------
1 file changed, 19 insertions(+), 23 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
index 40c2c5efdd3e..b607112c64e7 100644
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -8,7 +8,7 @@ static void sigusr1_handler(int signum)
sigusr1_received++;
}
-static int test_send_signal_common(struct perf_event_attr *attr,
+static void test_send_signal_common(struct perf_event_attr *attr,
int prog_type,
const char *test_name)
{
@@ -23,13 +23,13 @@ static int test_send_signal_common(struct perf_event_attr *attr,
if (CHECK(pipe(pipe_c2p), test_name,
"pipe pipe_c2p error: %s\n", strerror(errno)))
- goto no_fork_done;
+ return;
if (CHECK(pipe(pipe_p2c), test_name,
"pipe pipe_p2c error: %s\n", strerror(errno))) {
close(pipe_c2p[0]);
close(pipe_c2p[1]);
- goto no_fork_done;
+ return;
}
pid = fork();
@@ -38,7 +38,7 @@ static int test_send_signal_common(struct perf_event_attr *attr,
close(pipe_c2p[1]);
close(pipe_p2c[0]);
close(pipe_p2c[1]);
- goto no_fork_done;
+ return;
}
if (pid == 0) {
@@ -125,7 +125,7 @@ static int test_send_signal_common(struct perf_event_attr *attr,
goto disable_pmu;
}
- err = CHECK(buf[0] != '2', test_name, "incorrect result\n");
+ CHECK(buf[0] != '2', test_name, "incorrect result\n");
/* notify child safe to exit */
write(pipe_p2c[1], buf, 1);
@@ -138,11 +138,9 @@ static int test_send_signal_common(struct perf_event_attr *attr,
close(pipe_c2p[0]);
close(pipe_p2c[1]);
wait(NULL);
-no_fork_done:
- return err;
}
-static int test_send_signal_tracepoint(void)
+static void test_send_signal_tracepoint(void)
{
const char *id_path = "/sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id";
struct perf_event_attr attr = {
@@ -159,21 +157,21 @@ static int test_send_signal_tracepoint(void)
if (CHECK(efd < 0, "tracepoint",
"open syscalls/sys_enter_nanosleep/id failure: %s\n",
strerror(errno)))
- return -1;
+ return;
bytes = read(efd, buf, sizeof(buf));
close(efd);
if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "tracepoint",
"read syscalls/sys_enter_nanosleep/id failure: %s\n",
strerror(errno)))
- return -1;
+ return;
attr.config = strtol(buf, NULL, 0);
- return test_send_signal_common(&attr, BPF_PROG_TYPE_TRACEPOINT, "tracepoint");
+ test_send_signal_common(&attr, BPF_PROG_TYPE_TRACEPOINT, "tracepoint");
}
-static int test_send_signal_perf(void)
+static void test_send_signal_perf(void)
{
struct perf_event_attr attr = {
.sample_period = 1,
@@ -181,11 +179,11 @@ static int test_send_signal_perf(void)
.config = PERF_COUNT_SW_CPU_CLOCK,
};
- return test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT,
- "perf_sw_event");
+ test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT,
+ "perf_sw_event");
}
-static int test_send_signal_nmi(void)
+static void test_send_signal_nmi(void)
{
struct perf_event_attr attr = {
.sample_freq = 50,
@@ -205,25 +203,23 @@ static int test_send_signal_nmi(void)
printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n",
__func__);
test__skip();
- return 0;
+ return;
}
/* Let the test fail with a more informative message */
} else {
close(pmu_fd);
}
- return test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT,
- "perf_hw_event");
+ test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT,
+ "perf_hw_event");
}
void test_send_signal(void)
{
- int ret = 0;
-
if (test__start_subtest("send_signal_tracepoint"))
- ret |= test_send_signal_tracepoint();
+ test_send_signal_tracepoint();
if (test__start_subtest("send_signal_perf"))
- ret |= test_send_signal_perf();
+ test_send_signal_perf();
if (test__start_subtest("send_signal_nmi"))
- ret |= test_send_signal_nmi();
+ test_send_signal_nmi();
}
--
2.23.0.rc1.153.gdeed80330f-goog
^ permalink raw reply related
* [PATCH net-next,v2 0/6] Add software backchannel and mlx5e HV VHCA stats
From: Haiyang Zhang @ 2019-08-19 19:30 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org
This patch set adds paravirtual backchannel in software in pci_hyperv,
which is required by the mlx5e driver HV VHCA stats agent.
The stats agent is responsible on running a periodic rx/tx packets/bytes
stats update.
Dexuan Cui (1):
PCI: hv: Add a paravirtual backchannel in software
Haiyang Zhang (5):
PCI: hv: Add a Hyper-V PCI interface driver for software backchannel
interface
net/mlx5: Add wrappers for HyperV PCIe operations
net/mlx5: Add HV VHCA infrastructure
net/mlx5: Add HV VHCA control agent
net/mlx5e: Add mlx5e HV VHCA stats agent
MAINTAINERS | 1 +
drivers/net/ethernet/mellanox/mlx5/core/Makefile | 2 +
drivers/net/ethernet/mellanox/mlx5/core/en.h | 13 +
.../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c | 162 +++++++++
.../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h | 25 ++
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c | 64 ++++
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h | 22 ++
.../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c | 371 +++++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h | 104 ++++++
drivers/net/ethernet/mellanox/mlx5/core/main.c | 7 +
drivers/pci/Kconfig | 1 +
drivers/pci/controller/Kconfig | 7 +
drivers/pci/controller/Makefile | 1 +
drivers/pci/controller/pci-hyperv-intf.c | 70 ++++
drivers/pci/controller/pci-hyperv.c | 308 +++++++++++++++++
include/linux/hyperv.h | 29 ++
include/linux/mlx5/driver.h | 2 +
18 files changed, 1192 insertions(+)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
create mode 100644 drivers/pci/controller/pci-hyperv-intf.c
--
1.8.3.1
^ permalink raw reply
* [PATCH net-next,v2 1/6] PCI: hv: Add a paravirtual backchannel in software
From: Haiyang Zhang @ 2019-08-19 19:30 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org, Dexuan Cui, Jake Oshins
In-Reply-To: <1566242976-108801-1-git-send-email-haiyangz@microsoft.com>
From: Dexuan Cui <decui@microsoft.com>
Windows SR-IOV provides a backchannel mechanism in software for communication
between a VF driver and a PF driver. These "configuration blocks" are
similar in concept to PCI configuration space, but instead of doing reads and
writes in 32-bit chunks through a very slow path, packets of up to 128 bytes
can be sent or received asynchronously.
Nearly every SR-IOV device contains just such a communications channel in
hardware, so using this one in software is usually optional. Using the
software channel, however, allows driver implementers to leverage software
tools that fuzz the communications channel looking for vulnerabilities.
The usage model for these packets puts the responsibility for reading or
writing on the VF driver. The VF driver sends a read or a write packet,
indicating which "block" is being referred to by number.
If the PF driver wishes to initiate communication, it can "invalidate" one or
more of the first 64 blocks. This invalidation is delivered via a callback
supplied by the VF driver by this driver.
No protocol is implied, except that supplied by the PF and VF drivers.
Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/pci/controller/pci-hyperv.c | 302 ++++++++++++++++++++++++++++++++++++
include/linux/hyperv.h | 15 ++
2 files changed, 317 insertions(+)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 40b6254..57adeca 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -365,6 +365,39 @@ struct pci_delete_interrupt {
struct tran_int_desc int_desc;
} __packed;
+/*
+ * Note: the VM must pass a valid block id, wslot and bytes_requested.
+ */
+struct pci_read_block {
+ struct pci_message message_type;
+ u32 block_id;
+ union win_slot_encoding wslot;
+ u32 bytes_requested;
+} __packed;
+
+struct pci_read_block_response {
+ struct vmpacket_descriptor hdr;
+ u32 status;
+ u8 bytes[HV_CONFIG_BLOCK_SIZE_MAX];
+} __packed;
+
+/*
+ * Note: the VM must pass a valid block id, wslot and byte_count.
+ */
+struct pci_write_block {
+ struct pci_message message_type;
+ u32 block_id;
+ union win_slot_encoding wslot;
+ u32 byte_count;
+ u8 bytes[HV_CONFIG_BLOCK_SIZE_MAX];
+} __packed;
+
+struct pci_dev_inval_block {
+ struct pci_incoming_message incoming;
+ union win_slot_encoding wslot;
+ u64 block_mask;
+} __packed;
+
struct pci_dev_incoming {
struct pci_incoming_message incoming;
union win_slot_encoding wslot;
@@ -499,6 +532,9 @@ struct hv_pci_dev {
struct hv_pcibus_device *hbus;
struct work_struct wrk;
+ void (*block_invalidate)(void *context, u64 block_mask);
+ void *invalidate_context;
+
/*
* What would be observed if one wrote 0xFFFFFFFF to a BAR and then
* read it back, for each of the BAR offsets within config space.
@@ -817,6 +853,256 @@ static int hv_pcifront_write_config(struct pci_bus *bus, unsigned int devfn,
.write = hv_pcifront_write_config,
};
+/*
+ * Paravirtual backchannel
+ *
+ * Hyper-V SR-IOV provides a backchannel mechanism in software for
+ * communication between a VF driver and a PF driver. These
+ * "configuration blocks" are similar in concept to PCI configuration space,
+ * but instead of doing reads and writes in 32-bit chunks through a very slow
+ * path, packets of up to 128 bytes can be sent or received asynchronously.
+ *
+ * Nearly every SR-IOV device contains just such a communications channel in
+ * hardware, so using this one in software is usually optional. Using the
+ * software channel, however, allows driver implementers to leverage software
+ * tools that fuzz the communications channel looking for vulnerabilities.
+ *
+ * The usage model for these packets puts the responsibility for reading or
+ * writing on the VF driver. The VF driver sends a read or a write packet,
+ * indicating which "block" is being referred to by number.
+ *
+ * If the PF driver wishes to initiate communication, it can "invalidate" one or
+ * more of the first 64 blocks. This invalidation is delivered via a callback
+ * supplied by the VF driver by this driver.
+ *
+ * No protocol is implied, except that supplied by the PF and VF drivers.
+ */
+
+struct hv_read_config_compl {
+ struct hv_pci_compl comp_pkt;
+ void *buf;
+ unsigned int len;
+ unsigned int bytes_returned;
+};
+
+/**
+ * hv_pci_read_config_compl() - Invoked when a response packet
+ * for a read config block operation arrives.
+ * @context: Identifies the read config operation
+ * @resp: The response packet itself
+ * @resp_packet_size: Size in bytes of the response packet
+ */
+static void hv_pci_read_config_compl(void *context, struct pci_response *resp,
+ int resp_packet_size)
+{
+ struct hv_read_config_compl *comp = context;
+ struct pci_read_block_response *read_resp =
+ (struct pci_read_block_response *)resp;
+ unsigned int data_len, hdr_len;
+
+ hdr_len = offsetof(struct pci_read_block_response, bytes);
+ if (resp_packet_size < hdr_len) {
+ comp->comp_pkt.completion_status = -1;
+ goto out;
+ }
+
+ data_len = resp_packet_size - hdr_len;
+ if (data_len > 0 && read_resp->status == 0) {
+ comp->bytes_returned = min(comp->len, data_len);
+ memcpy(comp->buf, read_resp->bytes, comp->bytes_returned);
+ } else {
+ comp->bytes_returned = 0;
+ }
+
+ comp->comp_pkt.completion_status = read_resp->status;
+out:
+ complete(&comp->comp_pkt.host_event);
+}
+
+/**
+ * hv_read_config_block() - Sends a read config block request to
+ * the back-end driver running in the Hyper-V parent partition.
+ * @pdev: The PCI driver's representation for this device.
+ * @buf: Buffer into which the config block will be copied.
+ * @len: Size in bytes of buf.
+ * @block_id: Identifies the config block which has been requested.
+ * @bytes_returned: Size which came back from the back-end driver.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int hv_read_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
+ unsigned int block_id, unsigned int *bytes_returned)
+{
+ struct hv_pcibus_device *hbus =
+ container_of(pdev->bus->sysdata, struct hv_pcibus_device,
+ sysdata);
+ struct {
+ struct pci_packet pkt;
+ char buf[sizeof(struct pci_read_block)];
+ } pkt;
+ struct hv_read_config_compl comp_pkt;
+ struct pci_read_block *read_blk;
+ int ret;
+
+ if (len == 0 || len > HV_CONFIG_BLOCK_SIZE_MAX)
+ return -EINVAL;
+
+ init_completion(&comp_pkt.comp_pkt.host_event);
+ comp_pkt.buf = buf;
+ comp_pkt.len = len;
+
+ memset(&pkt, 0, sizeof(pkt));
+ pkt.pkt.completion_func = hv_pci_read_config_compl;
+ pkt.pkt.compl_ctxt = &comp_pkt;
+ read_blk = (struct pci_read_block *)&pkt.pkt.message;
+ read_blk->message_type.type = PCI_READ_BLOCK;
+ read_blk->wslot.slot = devfn_to_wslot(pdev->devfn);
+ read_blk->block_id = block_id;
+ read_blk->bytes_requested = len;
+
+ ret = vmbus_sendpacket(hbus->hdev->channel, read_blk,
+ sizeof(*read_blk), (unsigned long)&pkt.pkt,
+ VM_PKT_DATA_INBAND,
+ VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+ if (ret)
+ return ret;
+
+ ret = wait_for_response(hbus->hdev, &comp_pkt.comp_pkt.host_event);
+ if (ret)
+ return ret;
+
+ if (comp_pkt.comp_pkt.completion_status != 0 ||
+ comp_pkt.bytes_returned == 0) {
+ dev_err(&hbus->hdev->device,
+ "Read Config Block failed: 0x%x, bytes_returned=%d\n",
+ comp_pkt.comp_pkt.completion_status,
+ comp_pkt.bytes_returned);
+ return -EIO;
+ }
+
+ *bytes_returned = comp_pkt.bytes_returned;
+ return 0;
+}
+EXPORT_SYMBOL(hv_read_config_block);
+
+/**
+ * hv_pci_write_config_compl() - Invoked when a response packet for a write
+ * config block operation arrives.
+ * @context: Identifies the write config operation
+ * @resp: The response packet itself
+ * @resp_packet_size: Size in bytes of the response packet
+ */
+static void hv_pci_write_config_compl(void *context, struct pci_response *resp,
+ int resp_packet_size)
+{
+ struct hv_pci_compl *comp_pkt = context;
+
+ comp_pkt->completion_status = resp->status;
+ complete(&comp_pkt->host_event);
+}
+
+/**
+ * hv_write_config_block() - Sends a write config block request to the
+ * back-end driver running in the Hyper-V parent partition.
+ * @pdev: The PCI driver's representation for this device.
+ * @buf: Buffer from which the config block will be copied.
+ * @len: Size in bytes of buf.
+ * @block_id: Identifies the config block which is being written.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int hv_write_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
+ unsigned int block_id)
+{
+ struct hv_pcibus_device *hbus =
+ container_of(pdev->bus->sysdata, struct hv_pcibus_device,
+ sysdata);
+ struct {
+ struct pci_packet pkt;
+ char buf[sizeof(struct pci_write_block)];
+ u32 reserved;
+ } pkt;
+ struct hv_pci_compl comp_pkt;
+ struct pci_write_block *write_blk;
+ u32 pkt_size;
+ int ret;
+
+ if (len == 0 || len > HV_CONFIG_BLOCK_SIZE_MAX)
+ return -EINVAL;
+
+ init_completion(&comp_pkt.host_event);
+
+ memset(&pkt, 0, sizeof(pkt));
+ pkt.pkt.completion_func = hv_pci_write_config_compl;
+ pkt.pkt.compl_ctxt = &comp_pkt;
+ write_blk = (struct pci_write_block *)&pkt.pkt.message;
+ write_blk->message_type.type = PCI_WRITE_BLOCK;
+ write_blk->wslot.slot = devfn_to_wslot(pdev->devfn);
+ write_blk->block_id = block_id;
+ write_blk->byte_count = len;
+ memcpy(write_blk->bytes, buf, len);
+ pkt_size = offsetof(struct pci_write_block, bytes) + len;
+ /*
+ * This quirk is required on some hosts shipped around 2018, because
+ * these hosts don't check the pkt_size correctly (new hosts have been
+ * fixed since early 2019). The quirk is also safe on very old hosts
+ * and new hosts, because, on them, what really matters is the length
+ * specified in write_blk->byte_count.
+ */
+ pkt_size += sizeof(pkt.reserved);
+
+ ret = vmbus_sendpacket(hbus->hdev->channel, write_blk, pkt_size,
+ (unsigned long)&pkt.pkt, VM_PKT_DATA_INBAND,
+ VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+ if (ret)
+ return ret;
+
+ ret = wait_for_response(hbus->hdev, &comp_pkt.host_event);
+ if (ret)
+ return ret;
+
+ if (comp_pkt.completion_status != 0) {
+ dev_err(&hbus->hdev->device,
+ "Write Config Block failed: 0x%x\n",
+ comp_pkt.completion_status);
+ return -EIO;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(hv_write_config_block);
+
+/**
+ * hv_register_block_invalidate() - Invoked when a config block invalidation
+ * arrives from the back-end driver.
+ * @pdev: The PCI driver's representation for this device.
+ * @context: Identifies the device.
+ * @block_invalidate: Identifies all of the blocks being invalidated.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int hv_register_block_invalidate(struct pci_dev *pdev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask))
+{
+ struct hv_pcibus_device *hbus =
+ container_of(pdev->bus->sysdata, struct hv_pcibus_device,
+ sysdata);
+ struct hv_pci_dev *hpdev;
+
+ hpdev = get_pcichild_wslot(hbus, devfn_to_wslot(pdev->devfn));
+ if (!hpdev)
+ return -ENODEV;
+
+ hpdev->block_invalidate = block_invalidate;
+ hpdev->invalidate_context = context;
+
+ put_pcichild(hpdev);
+ return 0;
+
+}
+EXPORT_SYMBOL(hv_register_block_invalidate);
+
/* Interrupt management hooks */
static void hv_int_desc_free(struct hv_pci_dev *hpdev,
struct tran_int_desc *int_desc)
@@ -1968,6 +2254,7 @@ static void hv_pci_onchannelcallback(void *context)
struct pci_response *response;
struct pci_incoming_message *new_message;
struct pci_bus_relations *bus_rel;
+ struct pci_dev_inval_block *inval;
struct pci_dev_incoming *dev_message;
struct hv_pci_dev *hpdev;
@@ -2045,6 +2332,21 @@ static void hv_pci_onchannelcallback(void *context)
}
break;
+ case PCI_INVALIDATE_BLOCK:
+
+ inval = (struct pci_dev_inval_block *)buffer;
+ hpdev = get_pcichild_wslot(hbus,
+ inval->wslot.slot);
+ if (hpdev) {
+ if (hpdev->block_invalidate) {
+ hpdev->block_invalidate(
+ hpdev->invalidate_context,
+ inval->block_mask);
+ }
+ put_pcichild(hpdev);
+ }
+ break;
+
default:
dev_warn(&hbus->hdev->device,
"Unimplemented protocol message %x\n",
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 6256cc3..9d37f8c 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1578,4 +1578,19 @@ struct vmpacket_descriptor *
for (pkt = hv_pkt_iter_first(channel); pkt; \
pkt = hv_pkt_iter_next(channel, pkt))
+/*
+ * Functions for passing data between SR-IOV PF and VF drivers. The VF driver
+ * sends requests to read and write blocks. Each block must be 128 bytes or
+ * smaller. Optionally, the VF driver can register a callback function which
+ * will be invoked when the host says that one or more of the first 64 block
+ * IDs is "invalid" which means that the VF driver should reread them.
+ */
+#define HV_CONFIG_BLOCK_SIZE_MAX 128
+int hv_read_config_block(struct pci_dev *dev, void *buf, unsigned int buf_len,
+ unsigned int block_id, unsigned int *bytes_returned);
+int hv_write_config_block(struct pci_dev *dev, void *buf, unsigned int len,
+ unsigned int block_id);
+int hv_register_block_invalidate(struct pci_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask));
#endif /* _HYPERV_H */
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next,v2 2/6] PCI: hv: Add a Hyper-V PCI interface driver for software backchannel interface
From: Haiyang Zhang @ 2019-08-19 19:30 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org
In-Reply-To: <1566242976-108801-1-git-send-email-haiyangz@microsoft.com>
This interface driver is a helper driver allows other drivers to
have a common interface with the Hyper-V PCI frontend driver.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
MAINTAINERS | 1 +
drivers/pci/Kconfig | 1 +
drivers/pci/controller/Kconfig | 7 ++++
drivers/pci/controller/Makefile | 1 +
drivers/pci/controller/pci-hyperv-intf.c | 70 ++++++++++++++++++++++++++++++++
drivers/pci/controller/pci-hyperv.c | 12 ++++--
include/linux/hyperv.h | 30 ++++++++++----
7 files changed, 111 insertions(+), 11 deletions(-)
create mode 100644 drivers/pci/controller/pci-hyperv-intf.c
diff --git a/MAINTAINERS b/MAINTAINERS
index e352550..866ae88 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7453,6 +7453,7 @@ F: drivers/hid/hid-hyperv.c
F: drivers/hv/
F: drivers/input/serio/hyperv-keyboard.c
F: drivers/pci/controller/pci-hyperv.c
+F: drivers/pci/controller/pci-hyperv-intf.c
F: drivers/net/hyperv/
F: drivers/scsi/storvsc_drv.c
F: drivers/uio/uio_hv_generic.c
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 2ab9240..c313de9 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -182,6 +182,7 @@ config PCI_LABEL
config PCI_HYPERV
tristate "Hyper-V PCI Frontend"
depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
+ select PCI_HYPERV_INTERFACE
help
The PCI device frontend driver allows the kernel to import arbitrary
PCI devices from a PCI backend to support PCI driver domains.
diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
index fe9f9f1..70e0782 100644
--- a/drivers/pci/controller/Kconfig
+++ b/drivers/pci/controller/Kconfig
@@ -281,5 +281,12 @@ config VMD
To compile this driver as a module, choose M here: the
module will be called vmd.
+config PCI_HYPERV_INTERFACE
+ tristate "Hyper-V PCI Interface"
+ depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
+ help
+ The Hyper-V PCI Interface is a helper driver allows other drivers to
+ have a common interface with the Hyper-V PCI frontend driver.
+
source "drivers/pci/controller/dwc/Kconfig"
endmenu
diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
index d56a507..a2a22c9 100644
--- a/drivers/pci/controller/Makefile
+++ b/drivers/pci/controller/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_PCIE_CADENCE_HOST) += pcie-cadence-host.o
obj-$(CONFIG_PCIE_CADENCE_EP) += pcie-cadence-ep.o
obj-$(CONFIG_PCI_FTPCI100) += pci-ftpci100.o
obj-$(CONFIG_PCI_HYPERV) += pci-hyperv.o
+obj-$(CONFIG_PCI_HYPERV_INTERFACE) += pci-hyperv-intf.o
obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
obj-$(CONFIG_PCI_AARDVARK) += pci-aardvark.o
obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o
diff --git a/drivers/pci/controller/pci-hyperv-intf.c b/drivers/pci/controller/pci-hyperv-intf.c
new file mode 100644
index 0000000..087b7eb
--- /dev/null
+++ b/drivers/pci/controller/pci-hyperv-intf.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) Microsoft Corporation.
+ *
+ * Author:
+ * Haiyang Zhang <haiyangz@microsoft.com>
+ *
+ * This small module is a helper driver allows other drivers to
+ * have a common interface with the Hyper-V PCI frontend driver.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/hyperv.h>
+
+struct hyperv_pci_block_ops hvpci_block_ops;
+EXPORT_SYMBOL(hvpci_block_ops);
+
+int hyperv_read_cfg_blk(struct pci_dev *dev, void *buf, unsigned int buf_len,
+ unsigned int block_id, unsigned int *bytes_returned)
+{
+ if (!hvpci_block_ops.read_block)
+ return -EOPNOTSUPP;
+
+ return hvpci_block_ops.read_block(dev, buf, buf_len, block_id,
+ bytes_returned);
+}
+EXPORT_SYMBOL(hyperv_read_cfg_blk);
+
+int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
+ unsigned int block_id)
+{
+ if (!hvpci_block_ops.write_block)
+ return -EOPNOTSUPP;
+
+ return hvpci_block_ops.write_block(dev, buf, len, block_id);
+}
+EXPORT_SYMBOL(hyperv_write_cfg_blk);
+
+int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask))
+{
+ if (!hvpci_block_ops.reg_blk_invalidate)
+ return -EOPNOTSUPP;
+
+ return hvpci_block_ops.reg_blk_invalidate(dev, context,
+ block_invalidate);
+}
+EXPORT_SYMBOL(hyperv_reg_block_invalidate);
+
+static void __exit exit_hv_pci_intf(void)
+{
+ pr_info("unloaded\n");
+}
+
+static int __init init_hv_pci_intf(void)
+{
+ pr_info("loaded\n");
+
+ return 0;
+}
+
+module_init(init_hv_pci_intf);
+module_exit(exit_hv_pci_intf);
+
+MODULE_DESCRIPTION("Hyper-V PCI Interface");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 57adeca..9c93ac2 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -983,7 +983,6 @@ int hv_read_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
*bytes_returned = comp_pkt.bytes_returned;
return 0;
}
-EXPORT_SYMBOL(hv_read_config_block);
/**
* hv_pci_write_config_compl() - Invoked when a response packet for a write
@@ -1070,7 +1069,6 @@ int hv_write_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
return 0;
}
-EXPORT_SYMBOL(hv_write_config_block);
/**
* hv_register_block_invalidate() - Invoked when a config block invalidation
@@ -1101,7 +1099,6 @@ int hv_register_block_invalidate(struct pci_dev *pdev, void *context,
return 0;
}
-EXPORT_SYMBOL(hv_register_block_invalidate);
/* Interrupt management hooks */
static void hv_int_desc_free(struct hv_pci_dev *hpdev,
@@ -3045,10 +3042,19 @@ static int hv_pci_remove(struct hv_device *hdev)
static void __exit exit_hv_pci_drv(void)
{
vmbus_driver_unregister(&hv_pci_drv);
+
+ hvpci_block_ops.read_block = NULL;
+ hvpci_block_ops.write_block = NULL;
+ hvpci_block_ops.reg_blk_invalidate = NULL;
}
static int __init init_hv_pci_drv(void)
{
+ /* Initialize PCI block r/w interface */
+ hvpci_block_ops.read_block = hv_read_config_block;
+ hvpci_block_ops.write_block = hv_write_config_block;
+ hvpci_block_ops.reg_blk_invalidate = hv_register_block_invalidate;
+
return vmbus_driver_register(&hv_pci_drv);
}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 9d37f8c..2afe6fd 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1579,18 +1579,32 @@ struct vmpacket_descriptor *
pkt = hv_pkt_iter_next(channel, pkt))
/*
- * Functions for passing data between SR-IOV PF and VF drivers. The VF driver
+ * Interface for passing data between SR-IOV PF and VF drivers. The VF driver
* sends requests to read and write blocks. Each block must be 128 bytes or
* smaller. Optionally, the VF driver can register a callback function which
* will be invoked when the host says that one or more of the first 64 block
* IDs is "invalid" which means that the VF driver should reread them.
*/
#define HV_CONFIG_BLOCK_SIZE_MAX 128
-int hv_read_config_block(struct pci_dev *dev, void *buf, unsigned int buf_len,
- unsigned int block_id, unsigned int *bytes_returned);
-int hv_write_config_block(struct pci_dev *dev, void *buf, unsigned int len,
- unsigned int block_id);
-int hv_register_block_invalidate(struct pci_dev *dev, void *context,
- void (*block_invalidate)(void *context,
- u64 block_mask));
+
+int hyperv_read_cfg_blk(struct pci_dev *dev, void *buf, unsigned int buf_len,
+ unsigned int block_id, unsigned int *bytes_returned);
+int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
+ unsigned int block_id);
+int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask));
+
+struct hyperv_pci_block_ops {
+ int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
+ unsigned int block_id, unsigned int *bytes_returned);
+ int (*write_block)(struct pci_dev *dev, void *buf, unsigned int len,
+ unsigned int block_id);
+ int (*reg_blk_invalidate)(struct pci_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask));
+};
+
+extern struct hyperv_pci_block_ops hvpci_block_ops;
+
#endif /* _HYPERV_H */
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next,v2 3/6] net/mlx5: Add wrappers for HyperV PCIe operations
From: Haiyang Zhang @ 2019-08-19 19:30 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org
In-Reply-To: <1566242976-108801-1-git-send-email-haiyangz@microsoft.com>
From: Eran Ben Elisha <eranbe@mellanox.com>
Add wrapper functions for HyperV PCIe read / write /
block_invalidate_register operations. This will be used as an
infrastructure in the downstream patch for software communication.
This will be enabled by default if CONFIG_PCI_HYPERV_INTERFACE is set.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/Makefile | 1 +
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c | 64 ++++++++++++++++++++++++
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h | 22 ++++++++
3 files changed, 87 insertions(+)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 8b7edaa..247295b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -45,6 +45,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH) += eswitch.o eswitch_offloads.o eswitch_offlo
mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
mlx5_core-$(CONFIG_VXLAN) += lib/vxlan.o
mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
+mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o
#
# Ipoib netdev
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
new file mode 100644
index 0000000..cf08d02
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2018 Mellanox Technologies
+
+#include <linux/hyperv.h>
+#include "mlx5_core.h"
+#include "lib/hv.h"
+
+static int mlx5_hv_config_common(struct mlx5_core_dev *dev, void *buf, int len,
+ int offset, bool read)
+{
+ int rc = -EOPNOTSUPP;
+ int bytes_returned;
+ int block_id;
+
+ if (offset % HV_CONFIG_BLOCK_SIZE_MAX || len % HV_CONFIG_BLOCK_SIZE_MAX)
+ return -EINVAL;
+
+ block_id = offset / HV_CONFIG_BLOCK_SIZE_MAX;
+
+ rc = read ?
+ hyperv_read_cfg_blk(dev->pdev, buf,
+ HV_CONFIG_BLOCK_SIZE_MAX, block_id,
+ &bytes_returned) :
+ hyperv_write_cfg_blk(dev->pdev, buf,
+ HV_CONFIG_BLOCK_SIZE_MAX, block_id);
+
+ /* Make sure len bytes were read successfully */
+ if (read)
+ rc |= !(len == bytes_returned);
+
+ if (rc) {
+ mlx5_core_err(dev, "Failed to %s hv config, err = %d, len = %d, offset = %d\n",
+ read ? "read" : "write", rc, len,
+ offset);
+ return rc;
+ }
+
+ return 0;
+}
+
+int mlx5_hv_read_config(struct mlx5_core_dev *dev, void *buf, int len,
+ int offset)
+{
+ return mlx5_hv_config_common(dev, buf, len, offset, true);
+}
+
+int mlx5_hv_write_config(struct mlx5_core_dev *dev, void *buf, int len,
+ int offset)
+{
+ return mlx5_hv_config_common(dev, buf, len, offset, false);
+}
+
+int mlx5_hv_register_invalidate(struct mlx5_core_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask))
+{
+ return hyperv_reg_block_invalidate(dev->pdev, context,
+ block_invalidate);
+}
+
+void mlx5_hv_unregister_invalidate(struct mlx5_core_dev *dev)
+{
+ hyperv_reg_block_invalidate(dev->pdev, NULL, NULL);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
new file mode 100644
index 0000000..f9a4557
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef __LIB_HV_H__
+#define __LIB_HV_H__
+
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+
+#include <linux/hyperv.h>
+#include <linux/mlx5/driver.h>
+
+int mlx5_hv_read_config(struct mlx5_core_dev *dev, void *buf, int len,
+ int offset);
+int mlx5_hv_write_config(struct mlx5_core_dev *dev, void *buf, int len,
+ int offset);
+int mlx5_hv_register_invalidate(struct mlx5_core_dev *dev, void *context,
+ void (*block_invalidate)(void *context,
+ u64 block_mask));
+void mlx5_hv_unregister_invalidate(struct mlx5_core_dev *dev);
+#endif
+
+#endif /* __LIB_HV_H__ */
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next,v2 6/6] net/mlx5e: Add mlx5e HV VHCA stats agent
From: Haiyang Zhang @ 2019-08-19 19:30 UTC (permalink / raw)
To: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
linux-kernel@vger.kernel.org
In-Reply-To: <1566242976-108801-1-git-send-email-haiyangz@microsoft.com>
From: Eran Ben Elisha <eranbe@mellanox.com>
HV VHCA stats agent is responsible on running a preiodic rx/tx
packets/bytes stats update. Currently the supported format is version
MLX5_HV_VHCA_STATS_VERSION. Block ID 1 is dedicated for statistics data
transfer from the VF to the PF.
The reporter fetch the statistics data from all opened channels, fill it
in a buffer and send it to mlx5_hv_vhca_write_agent.
As the stats layer should include some metadata per block (sequence and
offset), the HV VHCA layer shall modify the buffer before actually send it
over block 1.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/Makefile | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en.h | 13 ++
.../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c | 162 +++++++++++++++++++++
.../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h | 25 ++++
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +
.../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h | 1 +
6 files changed, 205 insertions(+)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index fc59a40..a889a38 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -36,6 +36,7 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o en/port_buffer.o
mlx5_core-$(CONFIG_MLX5_ESWITCH) += en_rep.o en_tc.o en/tc_tun.o lib/port_tun.o lag_mp.o \
lib/geneve.o en/tc_tun_vxlan.o en/tc_tun_gre.o \
en/tc_tun_geneve.o
+mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += en/hv_vhca_stats.o
#
# Core extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 8fc5107..4a8008b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -54,6 +54,7 @@
#include "mlx5_core.h"
#include "en_stats.h"
#include "en/fs.h"
+#include "lib/hv_vhca.h"
extern const struct net_device_ops mlx5e_netdev_ops;
struct page_pool;
@@ -777,6 +778,15 @@ struct mlx5e_modify_sq_param {
int rl_index;
};
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+struct mlx5e_hv_vhca_stats_agent {
+ struct mlx5_hv_vhca_agent *agent;
+ struct delayed_work work;
+ u16 delay;
+ void *buf;
+};
+#endif
+
struct mlx5e_xsk {
/* UMEMs are stored separately from channels, because we don't want to
* lose them when channels are recreated. The kernel also stores UMEMs,
@@ -848,6 +858,9 @@ struct mlx5e_priv {
struct devlink_health_reporter *tx_reporter;
struct devlink_health_reporter *rx_reporter;
struct mlx5e_xsk xsk;
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+ struct mlx5e_hv_vhca_stats_agent stats_agent;
+#endif
};
struct mlx5e_profile {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
new file mode 100644
index 0000000..c37b4ac
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2018 Mellanox Technologies
+
+#include "en.h"
+#include "en/hv_vhca_stats.h"
+#include "lib/hv_vhca.h"
+#include "lib/hv.h"
+
+struct mlx5e_hv_vhca_per_ring_stats {
+ u64 rx_packets;
+ u64 rx_bytes;
+ u64 tx_packets;
+ u64 tx_bytes;
+};
+
+static void
+mlx5e_hv_vhca_fill_ring_stats(struct mlx5e_priv *priv, int ch,
+ struct mlx5e_hv_vhca_per_ring_stats *data)
+{
+ struct mlx5e_channel_stats *stats;
+ int tc;
+
+ stats = &priv->channel_stats[ch];
+ data->rx_packets = stats->rq.packets;
+ data->rx_bytes = stats->rq.bytes;
+
+ for (tc = 0; tc < priv->max_opened_tc; tc++) {
+ data->tx_packets += stats->sq[tc].packets;
+ data->tx_bytes += stats->sq[tc].bytes;
+ }
+}
+
+static void mlx5e_hv_vhca_fill_stats(struct mlx5e_priv *priv, u64 *data,
+ int buf_len)
+{
+ int ch, i = 0;
+
+ for (ch = 0; ch < priv->max_nch; ch++) {
+ u64 *buf = data + i;
+
+ if (WARN_ON_ONCE(buf +
+ sizeof(struct mlx5e_hv_vhca_per_ring_stats) >
+ data + buf_len))
+ return;
+
+ mlx5e_hv_vhca_fill_ring_stats(priv, ch,
+ (struct mlx5e_hv_vhca_per_ring_stats *)buf);
+ i += sizeof(struct mlx5e_hv_vhca_per_ring_stats) / sizeof(u64);
+ }
+}
+
+static int mlx5e_hv_vhca_stats_buf_size(struct mlx5e_priv *priv)
+{
+ return (sizeof(struct mlx5e_hv_vhca_per_ring_stats) *
+ priv->max_nch);
+}
+
+static void mlx5e_hv_vhca_stats_work(struct work_struct *work)
+{
+ struct mlx5e_hv_vhca_stats_agent *sagent;
+ struct mlx5_hv_vhca_agent *agent;
+ struct delayed_work *dwork;
+ struct mlx5e_priv *priv;
+ int buf_len, rc;
+ void *buf;
+
+ dwork = to_delayed_work(work);
+ sagent = container_of(dwork, struct mlx5e_hv_vhca_stats_agent, work);
+ priv = container_of(sagent, struct mlx5e_priv, stats_agent);
+ buf_len = mlx5e_hv_vhca_stats_buf_size(priv);
+ agent = sagent->agent;
+ buf = sagent->buf;
+
+ memset(buf, 0, buf_len);
+ mlx5e_hv_vhca_fill_stats(priv, buf, buf_len);
+
+ rc = mlx5_hv_vhca_agent_write(agent, buf, buf_len);
+ if (rc) {
+ mlx5_core_err(priv->mdev,
+ "%s: Failed to write stats, err = %d\n",
+ __func__, rc);
+ return;
+ }
+
+ if (sagent->delay)
+ queue_delayed_work(priv->wq, &sagent->work, sagent->delay);
+}
+
+enum {
+ MLX5_HV_VHCA_STATS_VERSION = 1,
+ MLX5_HV_VHCA_STATS_UPDATE_ONCE = 0xFFFF,
+};
+
+static void mlx5e_hv_vhca_stats_control(struct mlx5_hv_vhca_agent *agent,
+ struct mlx5_hv_vhca_control_block *block)
+{
+ struct mlx5e_hv_vhca_stats_agent *sagent;
+ struct mlx5e_priv *priv;
+
+ priv = mlx5_hv_vhca_agent_priv(agent);
+ sagent = &priv->stats_agent;
+
+ block->version = MLX5_HV_VHCA_STATS_VERSION;
+ block->rings = priv->max_nch;
+
+ if (!block->command) {
+ cancel_delayed_work_sync(&priv->stats_agent.work);
+ return;
+ }
+
+ sagent->delay = block->command == MLX5_HV_VHCA_STATS_UPDATE_ONCE ? 0 :
+ msecs_to_jiffies(block->command * 100);
+
+ queue_delayed_work(priv->wq, &sagent->work, sagent->delay);
+}
+
+static void mlx5e_hv_vhca_stats_cleanup(struct mlx5_hv_vhca_agent *agent)
+{
+ struct mlx5e_priv *priv = mlx5_hv_vhca_agent_priv(agent);
+
+ cancel_delayed_work_sync(&priv->stats_agent.work);
+}
+
+int mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv)
+{
+ int buf_len = mlx5e_hv_vhca_stats_buf_size(priv);
+ struct mlx5_hv_vhca_agent *agent;
+
+ priv->stats_agent.buf = kvzalloc(buf_len, GFP_KERNEL);
+ if (!priv->stats_agent.buf)
+ return -ENOMEM;
+
+ agent = mlx5_hv_vhca_agent_create(priv->mdev->hv_vhca,
+ MLX5_HV_VHCA_AGENT_STATS,
+ mlx5e_hv_vhca_stats_control, NULL,
+ mlx5e_hv_vhca_stats_cleanup,
+ priv);
+
+ if (IS_ERR_OR_NULL(agent)) {
+ if (IS_ERR(agent))
+ netdev_warn(priv->netdev,
+ "Failed to create hv vhca stats agent, err = %ld\n",
+ PTR_ERR(agent));
+
+ kfree(priv->stats_agent.buf);
+ return IS_ERR_OR_NULL(agent);
+ }
+
+ priv->stats_agent.agent = agent;
+ INIT_DELAYED_WORK(&priv->stats_agent.work, mlx5e_hv_vhca_stats_work);
+
+ return 0;
+}
+
+void mlx5e_hv_vhca_stats_destroy(struct mlx5e_priv *priv)
+{
+ if (IS_ERR_OR_NULL(priv->stats_agent.agent))
+ return;
+
+ mlx5_hv_vhca_agent_destroy(priv->stats_agent.agent);
+ kfree(priv->stats_agent.buf);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h
new file mode 100644
index 0000000..664463f
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef __MLX5_EN_STATS_VHCA_H__
+#define __MLX5_EN_STATS_VHCA_H__
+#include "en.h"
+
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+
+int mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv);
+void mlx5e_hv_vhca_stats_destroy(struct mlx5e_priv *priv);
+
+#else
+
+static inline int mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv)
+{
+ return 0;
+}
+
+static inline void mlx5e_hv_vhca_stats_destroy(struct mlx5e_priv *priv)
+{
+}
+#endif
+
+#endif /* __MLX5_EN_STATS_VHCA_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 5721d3d..fac8455 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -64,6 +64,7 @@
#include "en/xsk/setup.h"
#include "en/xsk/rx.h"
#include "en/xsk/tx.h"
+#include "en/hv_vhca_stats.h"
bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev)
@@ -5103,6 +5104,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
if (mlx5e_monitor_counter_supported(priv))
mlx5e_monitor_counter_init(priv);
+ mlx5e_hv_vhca_stats_create(priv);
if (netdev->reg_state != NETREG_REGISTERED)
return;
#ifdef CONFIG_MLX5_CORE_EN_DCB
@@ -5135,6 +5137,7 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
queue_work(priv->wq, &priv->set_rx_mode_work);
+ mlx5e_hv_vhca_stats_destroy(priv);
if (mlx5e_monitor_counter_supported(priv))
mlx5e_monitor_counter_cleanup(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
index 984e7ad..4bad6a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
@@ -13,6 +13,7 @@
enum mlx5_hv_vhca_agent_type {
MLX5_HV_VHCA_AGENT_CONTROL = 0,
+ MLX5_HV_VHCA_AGENT_STATS = 1,
MLX5_HV_VHCA_AGENT_MAX = 32,
};
--
1.8.3.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox