* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
From: Jakub Kicinski @ 2020-05-26 22:31 UTC (permalink / raw)
To: Emanuele Giuseppe Esposito
Cc: linux-s390, kvm, linux-doc, netdev, Emanuele Giuseppe Esposito,
linux-kernel, kvm-ppc, Jonathan Adams, Christian Borntraeger,
Alexander Viro, David Rientjes, linux-fsdevel, Paolo Bonzini,
linux-mips, linuxppc-dev, linux-arm-kernel, Jim Mattson
In-Reply-To: <20200526110318.69006-1-eesposit@redhat.com>
On Tue, 26 May 2020 13:03:10 +0200 Emanuele Giuseppe Esposito wrote:
> There is currently no common way for Linux kernel subsystems to expose
> statistics to userspace shared throughout the Linux kernel; subsystems have
> to take care of gathering and displaying statistics by themselves, for
> example in the form of files in debugfs. For example KVM has its own code
> section that takes care of this in virt/kvm/kvm_main.c, where it sets up
> debugfs handlers for displaying values and aggregating them from various
> subfolders to obtain information about the system state (i.e. displaying
> the total number of exits, calculated by summing all exits of all cpus of
> all running virtual machines).
>
> Allowing each section of the kernel to do so has two disadvantages. First,
> it will introduce redundant code. Second, debugfs is anyway not the right
> place for statistics (for example it is affected by lockdown)
>
> In this patch series I introduce statsfs, a synthetic ram-based virtual
> filesystem that takes care of gathering and displaying statistics for the
> Linux kernel subsystems.
>
> The file system is mounted on /sys/kernel/stats and would be already used
> by kvm. Statsfs was initially introduced by Paolo Bonzini [1].
What's the direct motivation for this work? Moving KVM stats out of
debugfs?
In my experience stats belong in the API used for creating/enumerating
objects, statsfs sounds like going in the exact opposite direction -
creating a parallel structure / hierarchy for exposing stats. I know
nothing about KVM but are you sure all the info that has to be exposed
will be stats?
In case of networking we have the basic stats in sysfs, under the
netdevice's kobject. But since we're not using sysfs much any more
for config, new stats are added in netlink APIs. Again - same APIs
used for enumeration and config.
^ permalink raw reply
* [PATCH v2] powerpc/wii: Fix declaration made after definition
From: Nathan Chancellor @ 2020-05-26 20:57 UTC (permalink / raw)
To: Michael Ellerman
Cc: kbuild test robot, Nick Desaulniers, linux-kernel,
clang-built-linux, Paul Mackerras, Nathan Chancellor,
linuxppc-dev
In-Reply-To: <20200413190644.16757-1-natechancellor@gmail.com>
A 0day randconfig uncovered an error with clang, trimmed for brevity:
arch/powerpc/platforms/embedded6xx/wii.c:195:7: error: attribute
declaration must precede definition [-Werror,-Wignored-attributes]
if (!machine_is(wii))
^
The macro machine_is declares mach_##name but define_machine actually
defines mach_##name, hence the warning.
To fix this, move define_machine after the machine_is usage.
Fixes: 5a7ee3198dfa ("powerpc: wii: platform support")
Reported-by: kbuild test robot <lkp@intel.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/989
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
v1 -> v2:
* s/is_machine/machine_is/ (Nick)
* Add Nick's reviewed-by tag.
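For readers unfamiliar with the two macros, here is a minimal user-space
sketch of the ordering that this patch establishes. The struct layout, the
simplified macro bodies and the helpers running_on_wii() and select_wii()
are assumptions made for illustration, not the kernel's definitions; the
statement expression is a GNU C extension, as used in the kernel.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct machdep_calls. */
struct machdep_calls {
	const char *name;
};

/* Points at the descriptor of the machine selected at boot. */
static struct machdep_calls *machine_id;

/* Simplified: *declares* mach_<name> with an attribute, then compares. */
#define machine_is(name) \
	({ \
		extern struct machdep_calls mach_##name __attribute__((weak)); \
		machine_id == &mach_##name; \
	})

/* Simplified: *defines* mach_<name>. */
#define define_machine(name) \
	struct machdep_calls mach_##name =

/* The use site comes first, so the attribute declaration precedes the definition. */
static int running_on_wii(void)
{
	return machine_is(wii);
}

/* The definition now follows every machine_is(wii) use in this file. */
define_machine(wii) {
	.name = "wii",
};

/* Hypothetical helper standing in for the boot-time machine probe. */
static void select_wii(void)
{
	machine_id = &mach_wii;
}
```

With the two blocks swapped (definition first, use second), clang is the
compiler that reports the attribute declaration as coming too late.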
arch/powerpc/platforms/embedded6xx/wii.c | 25 ++++++++++++------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/platforms/embedded6xx/wii.c b/arch/powerpc/platforms/embedded6xx/wii.c
index 67e48b0a164e..a802ef957d63 100644
--- a/arch/powerpc/platforms/embedded6xx/wii.c
+++ b/arch/powerpc/platforms/embedded6xx/wii.c
@@ -172,19 +172,6 @@ static void wii_shutdown(void)
flipper_quiesce();
}
-define_machine(wii) {
- .name = "wii",
- .probe = wii_probe,
- .setup_arch = wii_setup_arch,
- .restart = wii_restart,
- .halt = wii_halt,
- .init_IRQ = wii_pic_probe,
- .get_irq = flipper_pic_get_irq,
- .calibrate_decr = generic_calibrate_decr,
- .progress = udbg_progress,
- .machine_shutdown = wii_shutdown,
-};
-
static const struct of_device_id wii_of_bus[] = {
{ .compatible = "nintendo,hollywood", },
{ },
@@ -200,3 +187,15 @@ static int __init wii_device_probe(void)
}
device_initcall(wii_device_probe);
+define_machine(wii) {
+ .name = "wii",
+ .probe = wii_probe,
+ .setup_arch = wii_setup_arch,
+ .restart = wii_restart,
+ .halt = wii_halt,
+ .init_IRQ = wii_pic_probe,
+ .get_irq = flipper_pic_get_irq,
+ .calibrate_decr = generic_calibrate_decr,
+ .progress = udbg_progress,
+ .machine_shutdown = wii_shutdown,
+};
base-commit: b0523c7b1c9d0edcd6c0fe6d2cb558a9ad5c60a8
--
2.27.0.rc0
^ permalink raw reply related
* Re: [PATCH] soc: fsl: qe: Replace one-element array and use struct_size() helper
From: Li Yang @ 2020-05-26 19:56 UTC (permalink / raw)
To: Qiang Zhao
Cc: Kees Cook, Gustavo A. R. Silva, lkml, Gustavo A. R. Silva,
linuxppc-dev,
moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
In-Reply-To: <VE1PR04MB67682776BDC5682B3B330D6A91B30@VE1PR04MB6768.eurprd04.prod.outlook.com>
On Sun, May 24, 2020 at 9:49 PM Qiang Zhao <qiang.zhao@nxp.com> wrote:
>
> On Wed, May 23, 2020 at 5:22 PM Li Yang <leoyang.li@nxp.com>
> > -----Original Message-----
> > From: Li Yang <leoyang.li@nxp.com>
> > Sent: May 23, 2020 5:22
> > To: Kees Cook <keescook@chromium.org>
> > Cc: Gustavo A. R. Silva <gustavoars@kernel.org>; Qiang Zhao
> > <qiang.zhao@nxp.com>; linuxppc-dev <linuxppc-dev@lists.ozlabs.org>;
> > moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
> > <linux-arm-kernel@lists.infradead.org>; lkml <linux-kernel@vger.kernel.org>;
> > Gustavo A. R. Silva <gustavo@embeddedor.com>
> > Subject: Re: [PATCH] soc: fsl: qe: Replace one-element array and use
> > struct_size() helper
> >
> > On Wed, May 20, 2020 at 10:24 PM Kees Cook <keescook@chromium.org>
> > wrote:
> > >
> > > On Wed, May 20, 2020 at 06:52:21PM -0500, Li Yang wrote:
> > > > On Mon, May 18, 2020 at 5:57 PM Kees Cook <keescook@chromium.org>
> > wrote:
> > > > > Hm, looking at this code, I see a few other things that need to be
> > > > > fixed:
> > > > >
> > > > > 1) drivers/tty/serial/ucc_uart.c does not do a be32_to_cpu() conversion
> > > > > on the length test (understandably, a little-endian system has never
> > run
> > > > > this code since it's ppc specific), but it's still wrong:
> > > > >
> > > > > if (firmware->header.length != fw->size) {
> > > > >
> > > > > compare to the firmware loader:
> > > > >
> > > > > length = be32_to_cpu(hdr->length);
> > > > >
> > > > > 2) drivers/soc/fsl/qe/qe.c does not perform bounds checking on the
> > > > > per-microcode offsets, so the uploader might send data outside the
> > > > > firmware buffer. Perhaps:
> > > >
> > > > We do validate the CRC for each microcode, it is unlikely the CRC
> > > > check can pass if the offset or length is not correct. But you are
> > > > probably right that it will be safer to check the boundary and fail
> > >
> > > Right, but a malicious firmware file could still match CRC but trick
> > > the kernel code.
> > >
> > > > quicker before we actually start the CRC check. Will you come up
> > > > with a formal patch or you want us to deal with it?
> > >
> > > It sounds like Gustavo will be sending one, though I don't think
> > > either of us have the hardware to test it with, so if you could do
> > > that part, that would be great! :)
> >
> > That will be great. I think Zhao Qiang can help with the testing part.
> >
>
> Now the firmware is loaded in U-Boot, and the kernel does nothing with it.
> So testing this may need some extra code in both the driver and the dts.
> In the meantime, I am busy with some high-priority work, so the testing
> may not be done in time.
> Once I am free, I will do it.
Thanks. You are right that most of the QE drivers don't support
requesting firmware in the kernel, except ucc_uart. So it can probably
be tested with that driver without requiring code changes.
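For reference, the two checks Kees describes earlier in the thread can be
sketched in plain C as below. This is only an illustration under assumed
names: read_be32(), fw_length_matches() and fw_range_ok() are hypothetical
helpers, and the blob is assumed to start with its total length stored as a
big-endian 32-bit value. It is not the code from qe.c or ucc_uart.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value regardless of host byte order. */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assumed layout: the blob starts with its total length as a be32. */
static int fw_length_matches(const uint8_t *blob, size_t size)
{
	if (size < 4)
		return 0;
	/* Convert before comparing, instead of comparing the raw field. */
	return read_be32(blob) == size;
}

/*
 * Does [offset, offset + len) lie entirely inside a buffer of `size`
 * bytes? Written so that offset + len is never computed and cannot wrap.
 */
static int fw_range_ok(size_t size, uint32_t offset, uint32_t len)
{
	return offset <= size && len <= size - offset;
}
```

Running fw_range_ok() on each microcode's offset and length before the CRC
pass would reject out-of-bounds entries early, which is the "fail quicker"
behaviour discussed above.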
>
> Best Regards
> Qiang Zhao
^ permalink raw reply
* Re: [PATCH] powerpc/wii: Fix declaration made after definition
From: Nick Desaulniers @ 2020-05-26 18:19 UTC (permalink / raw)
To: Nathan Chancellor
Cc: kbuild test robot, LKML, clang-built-linux, Paul Mackerras,
linuxppc-dev
In-Reply-To: <20200413190644.16757-1-natechancellor@gmail.com>
On Mon, Apr 13, 2020 at 12:07 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> A 0day randconfig uncovered an error with clang, trimmed for brevity:
>
> arch/powerpc/platforms/embedded6xx/wii.c:195:7: error: attribute
> declaration must precede definition [-Werror,-Wignored-attributes]
> if (!machine_is(wii))
> ^
>
> The macro machine_is declares mach_##name but define_machine actually
> defines mach_##name, hence the warning.
>
> To fix this, move define_machine after the is_machine usage.
s/is_machine/machine_is/
With that fixup, you may add my reviewed by tag.
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
>
> Fixes: 5a7ee3198dfa ("powerpc: wii: platform support")
> Reported-by: kbuild test robot <lkp@intel.com>
> Link: https://github.com/ClangBuiltLinux/linux/issues/989
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> ---
> arch/powerpc/platforms/embedded6xx/wii.c | 25 ++++++++++++------------
^ I can run Linux on my Wii...think of the possibilities!
> 1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/platforms/embedded6xx/wii.c b/arch/powerpc/platforms/embedded6xx/wii.c
> index 67e48b0a164e..a802ef957d63 100644
> --- a/arch/powerpc/platforms/embedded6xx/wii.c
> +++ b/arch/powerpc/platforms/embedded6xx/wii.c
> @@ -172,19 +172,6 @@ static void wii_shutdown(void)
> flipper_quiesce();
> }
>
> -define_machine(wii) {
> - .name = "wii",
> - .probe = wii_probe,
> - .setup_arch = wii_setup_arch,
> - .restart = wii_restart,
> - .halt = wii_halt,
> - .init_IRQ = wii_pic_probe,
> - .get_irq = flipper_pic_get_irq,
> - .calibrate_decr = generic_calibrate_decr,
> - .progress = udbg_progress,
> - .machine_shutdown = wii_shutdown,
> -};
> -
> static const struct of_device_id wii_of_bus[] = {
> { .compatible = "nintendo,hollywood", },
> { },
> @@ -200,3 +187,15 @@ static int __init wii_device_probe(void)
> }
> device_initcall(wii_device_probe);
>
> +define_machine(wii) {
> + .name = "wii",
> + .probe = wii_probe,
> + .setup_arch = wii_setup_arch,
> + .restart = wii_restart,
> + .halt = wii_halt,
> + .init_IRQ = wii_pic_probe,
> + .get_irq = flipper_pic_get_irq,
> + .calibrate_decr = generic_calibrate_decr,
> + .progress = udbg_progress,
> + .machine_shutdown = wii_shutdown,
> +};
>
> base-commit: 8f3d9f354286745c751374f5f1fcafee6b3f3136
> --
--
Thanks,
~Nick Desaulniers
^ permalink raw reply
* Re: [PATCH] KVM: PPC: Book3S HV: read ibm,secure-memory nodes
From: Laurent Dufour @ 2020-05-26 17:51 UTC (permalink / raw)
To: kvm-ppc, linuxppc-dev, paulus; +Cc: Alexey Kardashevskiy, linux-kernel
In-Reply-To: <20200416162715.45846-1-ldufour@linux.ibm.com>
Paul, could you please take that patch?
On 16/04/2020 at 18:27, Laurent Dufour wrote:
> The newly introduced ibm,secure-memory nodes supersede the
> ibm,uv-firmware's property secure-memory-ranges.
>
> Firmware will no longer expose the secure-memory-ranges property, so first
> read the new one and, if it is not found, fall back to the older one.
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
> arch/powerpc/kvm/book3s_hv_uvmem.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 53b88cae3e73..ad950f8996e0 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -735,6 +735,20 @@ static u64 kvmppc_get_secmem_size(void)
> const __be32 *prop;
> u64 size = 0;
>
> + /*
> + * First try the new ibm,secure-memory nodes which supersede the
> + * secure-memory-ranges property.
> + * If we found somes, no need to read the deprecated one.
> + */
> + for_each_compatible_node(np, NULL, "ibm,secure-memory") {
> + prop = of_get_property(np, "reg", &len);
> + if (!prop)
> + continue;
> + size += of_read_number(prop + 2, 2);
> + }
> + if (size)
> + return size;
> +
> np = of_find_compatible_node(NULL, NULL, "ibm,uv-firmware");
> if (!np)
> goto out;
>
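As an illustration of what the new loop computes, here is a user-space
sketch of summing the size part of several "reg" properties. It assumes, as
the patch does, two address cells followed by two size cells per node.
read_cells() only mimics the behaviour of of_read_number(), and
total_secmem_size() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* One "reg" property: 2 address cells + 2 size cells, 4 bytes each. */
#define REG_BYTES 16

/* Combine `ncells` big-endian 32-bit cells into a single number. */
static uint64_t read_cells(const uint8_t *p, int ncells)
{
	uint64_t r = 0;

	while (ncells--) {
		uint32_t cell = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
				((uint32_t)p[2] << 8) | (uint32_t)p[3];

		r = (r << 32) | cell;
		p += 4;
	}
	return r;
}

/* Sum the size field of every node's "reg" property. */
static uint64_t total_secmem_size(const uint8_t (*regs)[REG_BYTES],
				  size_t nnodes)
{
	uint64_t size = 0;
	size_t i;

	for (i = 0; i < nnodes; i++)
		size += read_cells(regs[i] + 8, 2); /* skip the 2 address cells */
	return size;
}
```

A total of zero corresponds to the case where no ibm,secure-memory node was
found, and the code then falls through to the older property.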
^ permalink raw reply
* Re: [PATCH v2 08/20] arm64: simplify detection of memory zone boundaries for UMA configs
From: Catalin Marinas @ 2020-05-26 17:15 UTC (permalink / raw)
To: Mike Rapoport
Cc: Rich Felker, linux-ia64, linux-doc, Heiko Carstens, Michal Hocko,
James E.J. Bottomley, Max Filippov, Guo Ren, linux-csky,
linux-parisc, sparclinux, linux-hexagon, linux-riscv,
Mike Rapoport, Greg Ungerer, linux-arch, linux-s390,
linux-snps-arc, linux-c6x-dev, Baoquan He, Jonathan Corbet,
linux-sh, Helge Deller, x86, Russell King, Ley Foon Tan,
Yoshinori Sato, Geert Uytterhoeven, linux-arm-kernel, Mark Salter,
Matt Turner, linux-mips, uclinux-h8-devel, linux-xtensa,
linux-alpha, linux-um, linux-m68k, Tony Luck, Qian Cai,
Greentime Hu, Paul Walmsley, Stafford Horne, Guan Xuetao,
Hoan Tran, Michal Simek, Thomas Bogendoerfer, Brian Cain, Nick Hu,
linux-mm, Vineet Gupta, linux-kernel, openrisc,
Richard Weinberger, Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <20200429121126.17989-9-rppt@kernel.org>
On Wed, Apr 29, 2020 at 03:11:14PM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> The free_area_init() function only requires the definition of maximal PFN
> for each of the supported zones rather than calculation of actual zone sizes
> and the sizes of the holes between the zones.
>
> After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
> available to all architectures.
>
> Using this function instead of free_area_init_node() simplifies the zone
> detection.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
(BTW, none of my acks so far made it to the linux-arm-kernel list
because of the large number of people on cc)
^ permalink raw reply
* Re: [PATCH v2 05/20] mm: use free_area_init() instead of free_area_init_nodes()
From: Catalin Marinas @ 2020-05-26 17:13 UTC (permalink / raw)
To: Mike Rapoport
Cc: Rich Felker, linux-ia64, linux-doc, Heiko Carstens, Michal Hocko,
James E.J. Bottomley, Max Filippov, Guo Ren, linux-csky,
linux-parisc, sparclinux, linux-hexagon, linux-riscv,
Mike Rapoport, Greg Ungerer, linux-arch, linux-s390,
linux-snps-arc, linux-c6x-dev, Baoquan He, Jonathan Corbet,
linux-sh, Helge Deller, x86, Russell King, Ley Foon Tan,
Yoshinori Sato, Geert Uytterhoeven, linux-arm-kernel, Mark Salter,
Matt Turner, linux-mips, uclinux-h8-devel, linux-xtensa,
linux-alpha, linux-um, linux-m68k, Tony Luck, Qian Cai,
Greentime Hu, Paul Walmsley, Stafford Horne, Guan Xuetao,
Hoan Tran, Michal Simek, Thomas Bogendoerfer, Brian Cain, Nick Hu,
linux-mm, Vineet Gupta, linux-kernel, openrisc,
Richard Weinberger, Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <20200429121126.17989-6-rppt@kernel.org>
On Wed, Apr 29, 2020 at 03:11:11PM +0300, Mike Rapoport wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index e42727e3568e..a650adb358ee 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -206,7 +206,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
> #endif
> max_zone_pfns[ZONE_NORMAL] = max;
>
> - free_area_init_nodes(max_zone_pfns);
> + free_area_init(max_zone_pfns);
> }
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply
* Re: [PATCH v2 03/20] mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option
From: Catalin Marinas @ 2020-05-26 17:11 UTC (permalink / raw)
To: Mike Rapoport
Cc: Rich Felker, linux-ia64, linux-doc, Heiko Carstens, Michal Hocko,
James E.J. Bottomley, Max Filippov, Guo Ren, linux-csky,
linux-parisc, sparclinux, linux-hexagon, linux-riscv,
Mike Rapoport, Greg Ungerer, linux-arch, linux-s390,
linux-snps-arc, linux-c6x-dev, Baoquan He, Jonathan Corbet,
linux-sh, Helge Deller, x86, Russell King, Ley Foon Tan,
Yoshinori Sato, Geert Uytterhoeven, linux-arm-kernel, Mark Salter,
Matt Turner, linux-mips, uclinux-h8-devel, linux-xtensa,
linux-alpha, linux-um, linux-m68k, Tony Luck, Qian Cai,
Greentime Hu, Paul Walmsley, Stafford Horne, Guan Xuetao,
Hoan Tran, Michal Simek, Thomas Bogendoerfer, Brian Cain, Nick Hu,
linux-mm, Vineet Gupta, linux-kernel, openrisc,
Richard Weinberger, Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <20200429121126.17989-4-rppt@kernel.org>
On Wed, Apr 29, 2020 at 03:11:09PM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> The CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate initialization
> of nodes and zones structures between the systems that have region to node
> mapping in memblock and those that don't.
>
> Currently all the NUMA architectures enable this option and for the
> non-NUMA systems we can presume that all the memory belongs to node 0 and
> therefore the compile time configuration option is not required.
>
> The remaining few architectures that use DISCONTIGMEM without NUMA are
> easily updated to use memblock_add_node() instead of memblock_add() and
> thus have proper correspondence of memblock regions to NUMA nodes.
>
> Still, free_area_init_node() must have a backward compatible version
> because its semantics with and without CONFIG_HAVE_MEMBLOCK_NODE_MAP is
> different. Once all the architectures will use the new semantics, the
> entire compatibility layer can be dropped.
>
> To avoid addition of extra run time memory to store node id for
> architectures that keep memblock but have only a single node, the node id
> field of the memblock_region is guarded by CONFIG_NEED_MULTIPLE_NODES and
> the corresponding accessors presume that in those cases it is always 0.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
> .../vm/numa-memblock/arch-support.txt | 34 ------
> arch/alpha/mm/numa.c | 4 +-
> arch/arm64/Kconfig | 1 -
For arm64:
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply
* Re: [PATCH v3 1/3] riscv: Move kernel mapping to vmalloc zone
From: Alex Ghiti @ 2020-05-26 17:06 UTC (permalink / raw)
To: Zong Li
Cc: Albert Ou, Anup Patel, linux-kernel@vger.kernel.org List,
Atish Patra, Paul Mackerras, Paul Walmsley, Palmer Dabbelt,
linux-riscv, linuxppc-dev
In-Reply-To: <CANXhq0q_ba+zQrGUN0L+FQMPz6hWbYSQa2c8wLO3M-JTPWginA@mail.gmail.com>
Hi Zong,
On 5/26/20 at 5:43 AM, Zong Li wrote:
> On Sun, May 24, 2020 at 4:54 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>> This is a preparatory patch for relocatable kernel.
>>
>> The kernel used to be linked at PAGE_OFFSET address and used to be loaded
>> physically at the beginning of the main memory. Therefore, we could use
>> the linear mapping for the kernel mapping.
>>
>> But the relocated kernel base address will be different from PAGE_OFFSET
>> and since in the linear mapping, two different virtual addresses cannot
>> point to the same physical address, the kernel mapping needs to lie outside
>> the linear mapping.
>>
>> In addition, because modules and BPF must be close to the kernel (inside
>> +-2GB window), the kernel is placed at the end of the vmalloc zone minus
>> 2GB, which leaves room for modules and BPF. The kernel could not be
>> placed at the beginning of the vmalloc zone since other vmalloc
>> allocations from the kernel could get all the +-2GB window around the
>> kernel which would prevent new modules and BPF programs to be loaded.
>>
>> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
>> ---
>> arch/riscv/boot/loader.lds.S | 3 +-
>> arch/riscv/include/asm/page.h | 10 +++++-
>> arch/riscv/include/asm/pgtable.h | 37 +++++++++++++-------
>> arch/riscv/kernel/head.S | 3 +-
>> arch/riscv/kernel/module.c | 4 +--
>> arch/riscv/kernel/vmlinux.lds.S | 3 +-
>> arch/riscv/mm/init.c | 58 +++++++++++++++++++++++++-------
>> arch/riscv/mm/physaddr.c | 2 +-
>> 8 files changed, 87 insertions(+), 33 deletions(-)
>>
>> diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
>> index 47a5003c2e28..62d94696a19c 100644
>> --- a/arch/riscv/boot/loader.lds.S
>> +++ b/arch/riscv/boot/loader.lds.S
>> @@ -1,13 +1,14 @@
>> /* SPDX-License-Identifier: GPL-2.0 */
>>
>> #include <asm/page.h>
>> +#include <asm/pgtable.h>
>>
>> OUTPUT_ARCH(riscv)
>> ENTRY(_start)
>>
>> SECTIONS
>> {
>> - . = PAGE_OFFSET;
>> + . = KERNEL_LINK_ADDR;
>>
>> .payload : {
>> *(.payload)
>> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
>> index 2d50f76efe48..48bb09b6a9b7 100644
>> --- a/arch/riscv/include/asm/page.h
>> +++ b/arch/riscv/include/asm/page.h
>> @@ -90,18 +90,26 @@ typedef struct page *pgtable_t;
>>
>> #ifdef CONFIG_MMU
>> extern unsigned long va_pa_offset;
>> +extern unsigned long va_kernel_pa_offset;
>> extern unsigned long pfn_base;
>> #define ARCH_PFN_OFFSET (pfn_base)
>> #else
>> #define va_pa_offset 0
>> +#define va_kernel_pa_offset 0
>> #define ARCH_PFN_OFFSET (PAGE_OFFSET >> PAGE_SHIFT)
>> #endif /* CONFIG_MMU */
>>
>> extern unsigned long max_low_pfn;
>> extern unsigned long min_low_pfn;
>> +extern unsigned long kernel_virt_addr;
>>
>> #define __pa_to_va_nodebug(x) ((void *)((unsigned long) (x) + va_pa_offset))
>> -#define __va_to_pa_nodebug(x) ((unsigned long)(x) - va_pa_offset)
>> +#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)
>> +#define kernel_mapping_va_to_pa(x) \
>> + ((unsigned long)(x) - va_kernel_pa_offset)
>> +#define __va_to_pa_nodebug(x) \
>> + (((x) >= PAGE_OFFSET) ? \
>> + linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))
>>
>> #ifdef CONFIG_DEBUG_VIRTUAL
>> extern phys_addr_t __virt_to_phys(unsigned long x);
>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>> index 35b60035b6b0..25213cfaf680 100644
>> --- a/arch/riscv/include/asm/pgtable.h
>> +++ b/arch/riscv/include/asm/pgtable.h
>> @@ -11,23 +11,29 @@
>>
>> #include <asm/pgtable-bits.h>
>>
>> -#ifndef __ASSEMBLY__
>> -
>> -/* Page Upper Directory not used in RISC-V */
>> -#include <asm-generic/pgtable-nopud.h>
>> -#include <asm/page.h>
>> -#include <asm/tlbflush.h>
>> -#include <linux/mm_types.h>
>> -
>> -#ifdef CONFIG_MMU
>> +#ifndef CONFIG_MMU
>> +#define KERNEL_VIRT_ADDR PAGE_OFFSET
>> +#define KERNEL_LINK_ADDR PAGE_OFFSET
>> +#else
>> +/*
>> + * Leave 2GB for modules and BPF that must lie within a 2GB range around
>> + * the kernel.
>> + */
>> +#define KERNEL_VIRT_ADDR (VMALLOC_END - SZ_2G + 1)
>> +#define KERNEL_LINK_ADDR KERNEL_VIRT_ADDR
>>
>> #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
>> #define VMALLOC_END (PAGE_OFFSET - 1)
>> #define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
>>
>> #define BPF_JIT_REGION_SIZE (SZ_128M)
>> -#define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
>> -#define BPF_JIT_REGION_END (VMALLOC_END)
>> +#define BPF_JIT_REGION_START (kernel_virt_addr)
>> +#define BPF_JIT_REGION_END (kernel_virt_addr + BPF_JIT_REGION_SIZE)
> It seems to have a potential risk here, the region of bpf is
> overlapping with kernel mapping, so if kernel size is bigger than
> 128MB, bpf region would be occupied and run out by kernel mapping.
>
>> +
>> +#ifdef CONFIG_64BIT
>> +#define VMALLOC_MODULE_START BPF_JIT_REGION_END
>> +#define VMALLOC_MODULE_END VMALLOC_END
>> +#endif
>>
> Although kernel_virt_addr is a fixed address now, I think it could be
> changed for the purpose of relocatable or KASLR, so if
> kernel_virt_addr is moved farther than 2G from VMALLOC_END, the region
> of module would be too big.
Yes you're right, that's wrong to allow modules to lie outside
the 2G window, thanks for noticing.
> In addition, the region of module could be
> +-2G around the kernel, so we are not limited to one direction as
> before. It seems to me that the region of the module could be decided
> at runtime, for example, VMALLOC_MODULE_START is "&_end - 2G" and
> VMALLOC_MODULE_END is "&_start + 2G".
I had tried that, but as we need to make sure the BPF region is different
from the module's, that makes the macro definitions really cumbersome.
I'll give it another try anyway. I also tried to use _end and _start here,
but it failed; I have to debug this.
> I'm not sure whether the size of
> region of bpf has to be 128MB for some particular reason, if not,
> maybe the region of bpf could be the same with module to avoid being
> run out by module.
On the contrary, the BPF region must not be the same as the module's,
since in that case modules could take all the space and make BPF fail.
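To make the constraint concrete, here is a rough user-space sketch of the
runtime window suggested above ("&_end - 2G" to "&_start + 2G") and of a
check that two regions stay disjoint. struct range, module_window() and
ranges_overlap() are hypothetical names, the 2GB limit is treated as exact
for simplicity, and the kernel image is assumed to sit at least 2GB away
from both ends of the address space so the arithmetic does not wrap.

```c
#include <stdint.h>

#define SZ_2G 0x80000000ULL

/* Half-open address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Addresses from which every byte of the kernel image [kstart, kend)
 * is at most 2GB away, in either direction.
 */
static struct range module_window(uint64_t kstart, uint64_t kend)
{
	struct range r = { kend - SZ_2G, kstart + SZ_2G };

	return r;
}

/* True if the two half-open ranges share at least one address. */
static int ranges_overlap(struct range a, struct range b)
{
	return a.start < b.end && b.start < a.end;
}
```

Carving a fixed BPF range out of this window and giving modules the rest
keeps the two allocators from starving each other, at the cost of the more
involved macro definitions mentioned above.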
Thanks for your review Zong,
Alex
>
>> /*
>> * Roughly size the vmemmap space to be large enough to fit enough
>> @@ -57,9 +63,16 @@
>> #define FIXADDR_SIZE PGDIR_SIZE
>> #endif
>> #define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
>> -
>> #endif
>>
>> +#ifndef __ASSEMBLY__
>> +
>> +/* Page Upper Directory not used in RISC-V */
>> +#include <asm-generic/pgtable-nopud.h>
>> +#include <asm/page.h>
>> +#include <asm/tlbflush.h>
>> +#include <linux/mm_types.h>
>> +
>> #ifdef CONFIG_64BIT
>> #include <asm/pgtable-64.h>
>> #else
>> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
>> index 98a406474e7d..8f5bb7731327 100644
>> --- a/arch/riscv/kernel/head.S
>> +++ b/arch/riscv/kernel/head.S
>> @@ -49,7 +49,8 @@ ENTRY(_start)
>> #ifdef CONFIG_MMU
>> relocate:
>> /* Relocate return address */
>> - li a1, PAGE_OFFSET
>> + la a1, kernel_virt_addr
>> + REG_L a1, 0(a1)
>> la a2, _start
>> sub a1, a1, a2
>> add ra, ra, a1
>> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
>> index 8bbe5dbe1341..1a8fbe05accf 100644
>> --- a/arch/riscv/kernel/module.c
>> +++ b/arch/riscv/kernel/module.c
>> @@ -392,12 +392,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>> }
>>
>> #if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
>> -#define VMALLOC_MODULE_START \
>> - max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
>> void *module_alloc(unsigned long size)
>> {
>> return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
>> - VMALLOC_END, GFP_KERNEL,
>> + VMALLOC_MODULE_END, GFP_KERNEL,
>> PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
>> __builtin_return_address(0));
>> }
>> diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
>> index 0339b6bbe11a..a9abde62909f 100644
>> --- a/arch/riscv/kernel/vmlinux.lds.S
>> +++ b/arch/riscv/kernel/vmlinux.lds.S
>> @@ -4,7 +4,8 @@
>> * Copyright (C) 2017 SiFive
>> */
>>
>> -#define LOAD_OFFSET PAGE_OFFSET
>> +#include <asm/pgtable.h>
>> +#define LOAD_OFFSET KERNEL_LINK_ADDR
>> #include <asm/vmlinux.lds.h>
>> #include <asm/page.h>
>> #include <asm/cache.h>
>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>> index 27a334106708..17f108baec4f 100644
>> --- a/arch/riscv/mm/init.c
>> +++ b/arch/riscv/mm/init.c
>> @@ -22,6 +22,9 @@
>>
>> #include "../kernel/head.h"
>>
>> +unsigned long kernel_virt_addr = KERNEL_VIRT_ADDR;
>> +EXPORT_SYMBOL(kernel_virt_addr);
>> +
>> unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
>> __page_aligned_bss;
>> EXPORT_SYMBOL(empty_zero_page);
>> @@ -178,8 +181,12 @@ void __init setup_bootmem(void)
>> }
>>
>> #ifdef CONFIG_MMU
>> +/* Offset between linear mapping virtual address and kernel load address */
>> unsigned long va_pa_offset;
>> EXPORT_SYMBOL(va_pa_offset);
>> +/* Offset between kernel mapping virtual address and kernel load address */
>> +unsigned long va_kernel_pa_offset;
>> +EXPORT_SYMBOL(va_kernel_pa_offset);
>> unsigned long pfn_base;
>> EXPORT_SYMBOL(pfn_base);
>>
>> @@ -271,7 +278,7 @@ static phys_addr_t __init alloc_pmd(uintptr_t va)
>> if (mmu_enabled)
>> return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
>>
>> - pmd_num = (va - PAGE_OFFSET) >> PGDIR_SHIFT;
>> + pmd_num = (va - kernel_virt_addr) >> PGDIR_SHIFT;
>> BUG_ON(pmd_num >= NUM_EARLY_PMDS);
>> return (uintptr_t)&early_pmd[pmd_num * PTRS_PER_PMD];
>> }
>> @@ -372,14 +379,30 @@ static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
>> #error "setup_vm() is called from head.S before relocate so it should not use absolute addressing."
>> #endif
>>
>> +static uintptr_t load_pa, load_sz;
>> +
>> +void create_kernel_page_table(pgd_t *pgdir, uintptr_t map_size)
>> +{
>> + uintptr_t va, end_va;
>> +
>> + end_va = kernel_virt_addr + load_sz;
>> + for (va = kernel_virt_addr; va < end_va; va += map_size)
>> + create_pgd_mapping(pgdir, va,
>> + load_pa + (va - kernel_virt_addr),
>> + map_size, PAGE_KERNEL_EXEC);
>> +}
>> +
>> asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>> {
>> uintptr_t va, end_va;
>> - uintptr_t load_pa = (uintptr_t)(&_start);
>> - uintptr_t load_sz = (uintptr_t)(&_end) - load_pa;
>> uintptr_t map_size = best_map_size(load_pa, MAX_EARLY_MAPPING_SIZE);
>>
>> + load_pa = (uintptr_t)(&_start);
>> + load_sz = (uintptr_t)(&_end) - load_pa;
>> +
>> va_pa_offset = PAGE_OFFSET - load_pa;
>> + va_kernel_pa_offset = kernel_virt_addr - load_pa;
>> +
>> pfn_base = PFN_DOWN(load_pa);
>>
>> /*
>> @@ -402,26 +425,22 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>> create_pmd_mapping(fixmap_pmd, FIXADDR_START,
>> (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
>> /* Setup trampoline PGD and PMD */
>> - create_pgd_mapping(trampoline_pg_dir, PAGE_OFFSET,
>> + create_pgd_mapping(trampoline_pg_dir, kernel_virt_addr,
>> (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
>> - create_pmd_mapping(trampoline_pmd, PAGE_OFFSET,
>> + create_pmd_mapping(trampoline_pmd, kernel_virt_addr,
>> load_pa, PMD_SIZE, PAGE_KERNEL_EXEC);
>> #else
>> /* Setup trampoline PGD */
>> - create_pgd_mapping(trampoline_pg_dir, PAGE_OFFSET,
>> + create_pgd_mapping(trampoline_pg_dir, kernel_virt_addr,
>> load_pa, PGDIR_SIZE, PAGE_KERNEL_EXEC);
>> #endif
>>
>> /*
>> - * Setup early PGD covering entire kernel which will allows
>> + * Setup early PGD covering entire kernel which will allow
>> * us to reach paging_init(). We map all memory banks later
>> * in setup_vm_final() below.
>> */
>> - end_va = PAGE_OFFSET + load_sz;
>> - for (va = PAGE_OFFSET; va < end_va; va += map_size)
>> - create_pgd_mapping(early_pg_dir, va,
>> - load_pa + (va - PAGE_OFFSET),
>> - map_size, PAGE_KERNEL_EXEC);
>> + create_kernel_page_table(early_pg_dir, map_size);
>>
>> /* Create fixed mapping for early FDT parsing */
>> end_va = __fix_to_virt(FIX_FDT) + FIX_FDT_SIZE;
>> @@ -441,6 +460,7 @@ static void __init setup_vm_final(void)
>> uintptr_t va, map_size;
>> phys_addr_t pa, start, end;
>> struct memblock_region *reg;
>> + static struct vm_struct vm_kernel = { 0 };
>>
>> /* Set mmu_enabled flag */
>> mmu_enabled = true;
>> @@ -467,10 +487,22 @@ static void __init setup_vm_final(void)
>> for (pa = start; pa < end; pa += map_size) {
>> va = (uintptr_t)__va(pa);
>> create_pgd_mapping(swapper_pg_dir, va, pa,
>> - map_size, PAGE_KERNEL_EXEC);
>> + map_size, PAGE_KERNEL);
>> }
>> }
>>
>> + /* Map the kernel */
>> + create_kernel_page_table(swapper_pg_dir, PMD_SIZE);
>> +
>> + /* Reserve the vmalloc area occupied by the kernel */
>> + vm_kernel.addr = (void *)kernel_virt_addr;
>> + vm_kernel.phys_addr = load_pa;
>> + vm_kernel.size = (load_sz + PMD_SIZE) & ~(PMD_SIZE - 1);
>> + vm_kernel.flags = VM_MAP | VM_NO_GUARD;
>> + vm_kernel.caller = __builtin_return_address(0);
>> +
>> + vm_area_add_early(&vm_kernel);
>> +
>> /* Clear fixmap PTE and PMD mappings */
>> clear_fixmap(FIX_PTE);
>> clear_fixmap(FIX_PMD);
>> diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
>> index e8e4dcd39fed..35703d5ef5fd 100644
>> --- a/arch/riscv/mm/physaddr.c
>> +++ b/arch/riscv/mm/physaddr.c
>> @@ -23,7 +23,7 @@ EXPORT_SYMBOL(__virt_to_phys);
>>
>> phys_addr_t __phys_addr_symbol(unsigned long x)
>> {
>> - unsigned long kernel_start = (unsigned long)PAGE_OFFSET;
>> + unsigned long kernel_start = (unsigned long)kernel_virt_addr;
>> unsigned long kernel_end = (unsigned long)_end;
>>
>> /*
>> --
>> 2.20.1
>>
^ permalink raw reply
* Re: [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API
From: Emanuele Giuseppe Esposito @ 2020-05-26 15:45 UTC (permalink / raw)
To: Andrew Lunn
Cc: linux-s390, kvm, linux-doc, netdev, Emanuele Giuseppe Esposito,
linux-kernel, kvm-ppc, Jonathan Adams, Christian Borntraeger,
Alexander Viro, David Rientjes, linux-fsdevel, Paolo Bonzini,
linux-mips, linuxppc-dev, linux-arm-kernel, Jim Mattson
In-Reply-To: <20200526141605.GJ768009@lunn.ch>
Hi Andrew
> How do you atomically get and display a group of statistics?
>
> If you look at how the netlink socket works, you will see code like:
>
> do {
> start = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
> rx_packets = cpu_stats->rx_packets;
> rx_bytes = cpu_stats->rx_bytes;
> ....
> } while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, start));
>
> It will ensure that rx_packets and rx_bytes are consistent with each
> other. If the value of the sequence counter changes while inside the
> loop, the loop is repeated until it does not change.
>
> In general, hardware counters in NICs are the same. You tell it to
> take a snapshot of the statistics counters, and then read them all
> back, to give a consistent view across all the statistics.
>
> I've not looked at this new code in detail, but it looks like you have
> one file per statistic, and assume each statistic is independent of
> every other statistic. This independence can limit how you use the
> values, particularly when debugging. The netlink interface we use does
> not have this limitation.
You're right, statistics are treated independently, so what you describe
is currently not supported.
In KVM the statistics are used more qualitatively, so this is not a
problem there. As long as the interface is based on file access,
snapshotting might not be useful; however, it could still be added
later together with the binary access.
Jonathan, how does your metricfs handle this case?
Thank you,
Emanuele
* powerpc/pci: [PATCH 1/1 V3] PCIE PHB reset
From: wenxiong @ 2020-05-26 13:21 UTC (permalink / raw)
To: linuxppc-dev; +Cc: brking, Wen Xiong, oohall, wenxiong
From: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Several device drivers hit EEH (Extended Error Handling) when triggering
kdump on pSeries PowerVM. This patch implements a reset of the PHBs
in the PCI general code when triggering kdump. The PHB reset stops all
PCI transactions from the normal kernel. We have tested the patch in
several environments:
- direct slot adapters
- adapters under the switch
- a VF adapter in PowerVM
- a VF adapter/adapter in KVM guest.
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/pci.c | 152 +++++++++++++++++++++++++++
1 file changed, 152 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
index 911534b89c85..cb7e4276cf04 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -11,6 +11,8 @@
#include <linux/kernel.h>
#include <linux/pci.h>
#include <linux/string.h>
+#include <linux/crash_dump.h>
+#include <linux/delay.h>
#include <asm/eeh.h>
#include <asm/pci-bridge.h>
@@ -354,3 +356,153 @@ int pseries_root_bridge_prepare(struct pci_host_bridge *bridge)
return 0;
}
+
+/**
+ * pseries_get_pdn_addr - Retrieve PHB address
+ * @phb: PCI controller
+ *
+ * Retrieve the associated PHB address. There are two RTAS
+ * function calls dedicated to this purpose. Try the newer
+ * one first and fall back to the older one. The caller must
+ * make sure the config address has been figured out from the
+ * FDT node before calling the function.
+ *
+ */
+static int pseries_get_pdn_addr(struct pci_controller *phb)
+{
+ int ret = -1;
+ int rets[3];
+ int ibm_get_config_addr_info;
+ int ibm_get_config_addr_info2;
+ int config_addr = 0;
+ struct pci_dn *root_pdn, *pdn;
+
+ ibm_get_config_addr_info2 = rtas_token("ibm,get-config-addr-info2");
+ ibm_get_config_addr_info = rtas_token("ibm,get-config-addr-info");
+
+ root_pdn = PCI_DN(phb->dn);
+ pdn = list_first_entry(&root_pdn->child_list, struct pci_dn, list);
+ config_addr = (pdn->busno << 16) | (pdn->devfn << 8);
+
+ if (ibm_get_config_addr_info2 != RTAS_UNKNOWN_SERVICE) {
+ /*
+ * First of all, we need to make sure there is one PE
+ * associated with the device. If option is 1, it
+ * queries if config address is supported in a PE or not.
+ * If option is 0, it returns PE config address or config
+ * address for the PE primary bus.
+ */
+ ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
+ config_addr, BUID_HI(pdn->phb->buid),
+ BUID_LO(pdn->phb->buid), 1);
+ if (ret || (rets[0] == 0)) {
+ pr_warn("%s: Failed to get address for PHB#%x-PE# option=%d config_addr=%x\n",
+ __func__, pdn->phb->global_number, 1, rets[0]);
+ return -1;
+ }
+
+ /* Retrieve the associated PE config address */
+ ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
+ config_addr, BUID_HI(pdn->phb->buid),
+ BUID_LO(pdn->phb->buid), 0);
+ if (ret) {
+ pr_warn("%s: Failed to get address for PHB#%x-PE# option=%d config_addr=%x\n",
+ __func__, pdn->phb->global_number, 0, rets[0]);
+ return -1;
+ }
+ return rets[0];
+ }
+
+ if (ibm_get_config_addr_info != RTAS_UNKNOWN_SERVICE) {
+ ret = rtas_call(ibm_get_config_addr_info, 4, 2, rets,
+ config_addr, BUID_HI(pdn->phb->buid),
+ BUID_LO(pdn->phb->buid), 0);
+ if (ret || rets[0]) {
+ pr_warn("%s: Failed to get address for PHB#%x-PE# config_addr=%x\n",
+ __func__, pdn->phb->global_number, rets[0]);
+ return -1;
+ }
+ return rets[0];
+ }
+
+ return ret;
+}
+
+static int __init pseries_phb_reset(void)
+{
+ struct pci_controller *phb;
+ int config_addr;
+ int ibm_set_slot_reset;
+ int ibm_configure_pe;
+ int ret;
+
+ if (is_kdump_kernel() || reset_devices) {
+ pr_info("Issue PHB reset ...\n");
+ ibm_set_slot_reset = rtas_token("ibm,set-slot-reset");
+ ibm_configure_pe = rtas_token("ibm,configure-pe");
+
+ if (ibm_set_slot_reset == RTAS_UNKNOWN_SERVICE ||
+ ibm_configure_pe == RTAS_UNKNOWN_SERVICE) {
+ pr_info("%s: EEH functionality not supported\n",
+ __func__);
+ }
+
+ list_for_each_entry(phb, &hose_list, list_node) {
+ config_addr = pseries_get_pdn_addr(phb);
+ if (config_addr == -1)
+ continue;
+
+ ret = rtas_call(ibm_set_slot_reset, 4, 1, NULL,
+ config_addr, BUID_HI(phb->buid),
+ BUID_LO(phb->buid), EEH_RESET_FUNDAMENTAL);
+
+ /* If fundamental-reset not supported, try hot-reset */
+ if (ret == -8)
+ ret = rtas_call(ibm_set_slot_reset, 4, 1, NULL,
+ config_addr, BUID_HI(phb->buid),
+ BUID_LO(phb->buid), EEH_RESET_HOT);
+
+ if (ret) {
+ pr_err("%s: PHB#%x-PE# failed with rtas_call activate reset=%d\n",
+ __func__, phb->global_number, ret);
+ continue;
+ }
+ }
+ msleep(EEH_PE_RST_SETTLE_TIME);
+
+ list_for_each_entry(phb, &hose_list, list_node) {
+ config_addr = pseries_get_pdn_addr(phb);
+ if (config_addr == -1)
+ continue;
+
+ ret = rtas_call(ibm_set_slot_reset, 4, 1, NULL,
+ config_addr, BUID_HI(phb->buid),
+ BUID_LO(phb->buid), EEH_RESET_DEACTIVATE);
+ if (ret) {
+ pr_err("%s: PHB#%x-PE# failed with rtas_call deactive reset=%d\n",
+ __func__, phb->global_number, ret);
+ continue;
+ }
+ }
+ msleep(EEH_PE_RST_SETTLE_TIME);
+
+ list_for_each_entry(phb, &hose_list, list_node) {
+ config_addr = pseries_get_pdn_addr(phb);
+ if (config_addr == -1)
+ continue;
+
+ ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
+ config_addr, BUID_HI(phb->buid),
+ BUID_LO(phb->buid));
+ if (ret) {
+ pr_err("%s: PHB#%x-PE# failed with rtas_call configure_pe =%d\n",
+ __func__, phb->global_number, ret);
+ continue;
+ }
+ }
+ }
+
+ return 0;
+}
+machine_postcore_initcall(pseries, pseries_phb_reset);
+
--
2.18.1
* Re: [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API
From: Andrew Lunn @ 2020-05-26 14:16 UTC (permalink / raw)
To: Emanuele Giuseppe Esposito
Cc: linux-s390, kvm, linux-doc, netdev, Emanuele Giuseppe Esposito,
linux-kernel, kvm-ppc, Jonathan Adams, Christian Borntraeger,
Alexander Viro, David Rientjes, linux-fsdevel, Paolo Bonzini,
linux-mips, linuxppc-dev, linux-arm-kernel, Jim Mattson
In-Reply-To: <20200526110318.69006-8-eesposit@redhat.com>
On Tue, May 26, 2020 at 01:03:17PM +0200, Emanuele Giuseppe Esposito wrote:
> Apply stats_fs on the networking statistics subsystem.
>
> Currently it only works with disabled network namespace
> (CONFIG_NET_NS=n), because multiple namespaces will have the same
> device name under the same root source that will cause a conflict in
> stats_fs.
Hi Emanuele
How do you atomically get and display a group of statistics?
If you look at how the netlink socket works, you will see code like:
do {
start = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
rx_packets = cpu_stats->rx_packets;
rx_bytes = cpu_stats->rx_bytes;
....
} while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, start));
It will ensure that rx_packets and rx_bytes are consistent with each
other. If the value of the sequence counter changes while inside the
loop, the loop is repeated until it does not change.
In general, hardware counters in NICs are the same. You tell it to
take a snapshot of the statistics counters, and then read them all
back, to give a consistent view across all the statistics.
I've not looked at this new code in detail, but it looks like you have
one file per statistic, and assume each statistic is independent of
every other statistic. This independence can limit how you use the
values, particularly when debugging. The netlink interface we use does
not have this limitation.
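For readers who have not seen the retry loop before, here is a minimal
userspace sketch of the same idea using C11 atomics. It is not the kernel's
u64_stats_sync implementation; every demo_* name is made up for
illustration, and it assumes a single writer per counter block.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter block; all names here are made up for the sketch. */
struct demo_stats {
	atomic_uint seq;             /* even: stable, odd: update in flight */
	_Atomic uint64_t rx_packets;
	_Atomic uint64_t rx_bytes;
};

struct demo_snapshot {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

void demo_stats_init(struct demo_stats *s)
{
	atomic_init(&s->seq, 0u);
	atomic_init(&s->rx_packets, (uint64_t)0);
	atomic_init(&s->rx_bytes, (uint64_t)0);
}

/* Writer side; assumes a single writer per counter block. */
void demo_stats_add(struct demo_stats *s, uint64_t packets, uint64_t bytes)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint64_t p = atomic_load_explicit(&s->rx_packets, memory_order_relaxed);
	uint64_t b = atomic_load_explicit(&s->rx_bytes, memory_order_relaxed);

	assert((seq & 1u) == 0); /* no other update may be in flight */

	/* Make the sequence odd, update both counters, make it even again. */
	atomic_store_explicit(&s->seq, seq + 1u, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->rx_packets, p + packets, memory_order_relaxed);
	atomic_store_explicit(&s->rx_bytes, b + bytes, memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2u, memory_order_release);
}

unsigned int demo_fetch_begin(struct demo_stats *s)
{
	return atomic_load_explicit(&s->seq, memory_order_acquire);
}

/* True if the reader raced with a writer and has to read again. */
bool demo_fetch_retry(struct demo_stats *s, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return (start & 1u) ||
	       atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

/* Reader side: both counters come from the same stable sequence window. */
struct demo_snapshot demo_stats_read(struct demo_stats *s)
{
	struct demo_snapshot snap;
	unsigned int start;

	do {
		start = demo_fetch_begin(s);
		snap.rx_packets = atomic_load_explicit(&s->rx_packets,
						       memory_order_relaxed);
		snap.rx_bytes = atomic_load_explicit(&s->rx_bytes,
						     memory_order_relaxed);
	} while (demo_fetch_retry(s, start));

	return snap;
}
```

The point is that the group of values is read under one sequence window,
which a one-file-per-statistic read cannot express.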
Andrew
* Re: [PATCH] selftests: powerpc: Add test for execute-disabled pkeys
From: Sandipan Das @ 2020-05-26 14:03 UTC (permalink / raw)
To: Michael Ellerman
Cc: fweimer, aneesh.kumar, linuxram, linux-mm, linux-kselftest,
linuxppc-dev, bauerman
In-Reply-To: <87367mg9h4.fsf@mpe.ellerman.id.au>
Hi Michael,
On 26/05/20 6:05 pm, Michael Ellerman wrote:
> [...]
>> +
>> +/* Override definitions as they might be inconsistent */
>> +#undef PKEY_DISABLE_ACCESS
>> +#define PKEY_DISABLE_ACCESS 0x3
>
> Why would they be inconsistent?
>
The definition in sys/mman.h still uses the value specific to
Intel's implementation i.e. 1, when this should have been 3
for powerpc. I have seen this on Ubuntu 18.04 and 20.04.
>
>> +/* Older distros might not define this */
>> +#ifndef SEGV_PKUERR
>> +#define SEGV_PKUERR 4
>> +#endif
> ...
>> +
>> + /* Restore permissions in order to continue */
>> + switch (fcode) {
>> + case SEGV_ACCERR:
>> + if (mprotect(insns, pgsize, PROT_READ | PROT_WRITE)) {
>> + perror("mprotect");
>> + goto fail;
>> + }
>> + break;
>> + case SEGV_PKUERR:
>> + if (sinfo->si_pkey != fpkey)
>> + goto fail;
>
> This doesn't compile on older distros, eg Ubuntu 16.04:
>
> pkey_exec_prot.c: In function 'segv_handler':
> pkey_exec_prot.c:121:12: error: 'siginfo_t {aka struct <anonymous>}' has no member named 'si_pkey'
> if (sinfo->si_pkey != fpkey)
> ^
> pkey_exec_prot.c:151:24: error: 'siginfo_t {aka struct <anonymous>}' has no member named 'si_pkey'
> pkey_set_rights(sinfo->si_pkey, 0);
> ^
> ../../lib.mk:142: recipe for target '/output/kselftest/powerpc/mm/pkey_exec_prot' failed
>
Thanks for reporting this.
>
> I think a reasonable solution is to use the absence of SEGV_PKUERR to
> basically turn the whole test into a nop at build time, eg:
>
> diff --git a/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
> index b346ad205e68..218257b89fbb 100644
> --- a/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
> +++ b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
> @@ -30,9 +30,7 @@
> #define PKEY_DISABLE_EXECUTE 0x4
>
> /* Older distros might not define this */
> -#ifndef SEGV_PKUERR
> -#define SEGV_PKUERR 4
> -#endif
> +#ifdef SEGV_PKUERR
>
> #define SYS_pkey_mprotect 386
> #define SYS_pkey_alloc 384
> @@ -319,6 +317,13 @@ static int test(void)
>
> return 0;
> }
> +#else
> +static int test(void)
> +{
> + printf("Test built with old libc lacking pkey support.\n");
> + SKIP_IF(true);
> +}
> +#endif /* SEGV_PKUERR */
>
> int main(void)
> {
>
>
Or can I use this from the pkey tests under selftests/vm?
static inline u32 *siginfo_get_pkey_ptr(siginfo_t *si)
{
#ifdef si_pkey
return &si->si_pkey;
#else
return (u32 *)(((u8 *)si) + si_pkey_offset);
#endif
}
Where si_pkey_offset is 0x20 for powerpc.
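To make the fallback concrete, below is a small standalone sketch of reading
the key by byte offset. It uses a made-up struct demo_siginfo instead of the
real siginfo_t so that it builds with any libc, and the 0x20 offset is only
the powerpc value quoted above, not something the sketch verifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption from the mail above: si_pkey sits 0x20 bytes into siginfo_t. */
#define DEMO_SI_PKEY_OFFSET 0x20

/* Hypothetical stand-in for siginfo_t, only here to exercise the helper. */
struct demo_siginfo {
	unsigned char head[DEMO_SI_PKEY_OFFSET]; /* si_signo, si_addr, ... */
	uint32_t pkey;
};

static_assert(offsetof(struct demo_siginfo, pkey) == DEMO_SI_PKEY_OFFSET,
	      "mock layout must match the assumed offset");

/*
 * Read the key through the raw offset, as a libc without si_pkey would
 * have to. memcpy avoids alignment and aliasing problems.
 */
uint32_t demo_siginfo_get_pkey(const void *si)
{
	uint32_t pkey;

	memcpy(&pkey, (const unsigned char *)si + DEMO_SI_PKEY_OFFSET,
	       sizeof(pkey));
	return pkey;
}
```

This would build on older distros, but the test would then rely on a
hard-coded layout rather than on the libc definition.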
- Sandipan
* Re: [PATCH v2 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms
From: Michael Ellerman @ 2020-05-26 13:44 UTC (permalink / raw)
To: Michal Simek, Michal Simek, Takashi Iwai
Cc: Kate Stewart, Mark Rutland, Desnes A. Nunes do Rosario,
Geert Uytterhoeven, linux-doc, alsa-devel, dri-devel,
Jaroslav Kysela, Richard Fontana, Paul Mackerras, Miquel Raynal,
Mauro Carvalho Chehab, Fabio Estevam, Sasha Levin, sfr,
Jonathan Corbet, maz, Masahiro Yamada, Takashi Iwai, YueHaibing,
Krzysztof Kozlowski, Allison Randal, linux-arm-kernel, devicetree,
Andrew Donnellan, Arnd Bergmann, Bartlomiej Zolnierkiewicz,
Alistair Popple, linuxppc-dev, Nicholas Piggin, Alexios Zavras,
Mark Brown, git, linux-fbdev, Jonathan Cameron, Thomas Gleixner,
Andy Shevchenko, Dmitry Vyukov, Christophe Leroy, Wei Hu,
Greg Kroah-Hartman, Nick Desaulniers, linux-kernel, Rob Herring,
Enrico Weigelt, David S. Miller, Thiago Jung Bauermann
In-Reply-To: <c22540fb-1c54-b718-9045-3ee645c30322@monstr.eu>
Michal Simek <monstr@monstr.eu> writes:
> Hi Michael,
>
> On 01. 04. 20 13:30, Michal Simek wrote:
>> On 01. 04. 20 12:38, Takashi Iwai wrote:
>>> On Wed, 01 Apr 2020 12:35:16 +0200,
>>> Michael Ellerman wrote:
>>>>
>>>> Michal Simek <michal.simek@xilinx.com> writes:
>>>>> On 01. 04. 20 4:07, Michael Ellerman wrote:
>>>>>> Michal Simek <michal.simek@xilinx.com> writes:
>>>>>>> Hi,
>>>>>>>
>>>>>>> recently we wanted to update xilinx intc driver and we found that function
>>>>>>> which we wanted to remove is still wired by ancient Xilinx PowerPC
>>>>>>> platforms. Here is the thread about it.
>>>>>>> https://lore.kernel.org/linux-next/48d3232d-0f1d-42ea-3109-f44bbabfa2e8@xilinx.com/
>>>>>>>
>>>>>>> I have been talking about it internally and there is no interest in these
>>>>>>> platforms and it is also orphan for quite a long time. None is really
>>>>>>> running/testing these platforms regularly that's why I think it makes sense
>>>>>>> to remove them also with drivers which are specific to this platform.
>>>>>>>
>>>>>>> U-Boot support was removed in 2017 without anybody complain about it
>>>>>>> https://github.com/Xilinx/u-boot-xlnx/commit/98f705c9cefdfdba62c069821bbba10273a0a8ed
>>>>>>>
>>>>>>> Based on current ppc/next.
>>>>>>>
>>>>>>> If anyone has any objection about it, please let me know.
>>>>>>
>>>>>> Thanks for taking the time to find all this code and remove it.
>>>>>>
>>>>>> I'm not going to take this series for v5.7, it was posted too close to
>>>>>> the merge window, and doing so wouldn't give people much time to object,
>>>>>> especially given people are distracted at the moment.
>>>>>>
>>>>>> I'm happy to take it for v5.8, assuming there's no major objections.
>>>>>
>>>>> Sure. Just to let you know Christophe Leroy included this patch in his
>>>>> series about ppc405 removal. It should be the same.
>>>>>
>>>>> If you don't want to take that alsa patch I can send it separately and
>>>>> this patch can be taken from his series. I don't really mind but please
>>>>> let me know what way you prefer.
>>>>
>>>> It's better to keep it all together, so I'm happy take the alsa patch as
>>>> well, it's already been acked.
>
> Can you please take this series? I know that there is v5 from Christophe
> which has this 1/2 as 1/13. But I need this alsa patch too and I would
> like to close this because it is around for almost 2 months and none
> raised a concern about removing just these Xilinx platforms.
Sorry I meant to reply to your last mail.
I have Christophe's series in my testing branch, planning for it to be
in v5.8.
Even if the rest of his series doesn't make it for some reason, as you
say the Xilinx removal is uncontroversial so I'll keep that in.
I forgot about the sound patch, I'll pick that up as well.
cheers
* Re: [PATCH v4 07/45] powerpc/ptdump: Limit size of flags text to 1/2 chars on PPC32
From: Michael Ellerman @ 2020-05-26 12:53 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <e505c554-21b1-3d02-1ea5-c2a214b80ebb@csgroup.eu>
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 25/05/2020 à 07:15, Michael Ellerman a écrit :
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>> In order to have all flags fit on a 80 chars wide screen,
>>> reduce the flags to 1 char (2 where ambiguous).
>>
>> I don't love this, the output is less readable. Is fitting on an 80 char
>> screen a real issue for you? I just make my terminal window bigger.
>
> I don't have strong opinion about that, and the terminal can be made bigger.
> I just don't like how messy it is, some flags are so big that they hide
> other ones and getting it more ordered and more compact helped me during
> all the verifications I did with this series, but we can leave it as is
> if you prefer.
I think I do.
> Would you like a v5 without patches 7 and 8 ? Or I can just resend the
> patches that will be impacted, that is 9 and 38 ?
I dropped 7 and 8 and then fixed up 9 and 38, it was easy enough.
I used "coherent" and "huge".
> With the change I get.
>
> ---[ Start of kernel VM ]---
> 0xc0000000-0xc0ffffff 0x00000000 16M h r x p sh a
> 0xc1000000-0xc7ffffff 0x01000000 112M h rw p sh d a
> ---[ vmalloc() Area ]---
> 0xc9000000-0xc9003fff 0x050e4000 16K rw p sh d a
> 0xc9008000-0xc900bfff 0x050ec000 16K rw p sh d a
> 0xc9010000-0xc9013fff 0xd0000000 16K rw p i g sh d a
> 0xc9018000-0xc901bfff 0x050f0000 16K rw p sh d a
It's definitely more compact :)
But I worry no one other than you will be able to decipher it, without
constantly referring back to the source code.
cheers
* Re: [PATCH] selftests: powerpc: Add test for execute-disabled pkeys
From: Michael Ellerman @ 2020-05-26 12:35 UTC (permalink / raw)
To: Sandipan Das
Cc: fweimer, aneesh.kumar, linuxram, linux-mm, linux-kselftest,
linuxppc-dev, bauerman
In-Reply-To: <20200508162332.65316-1-sandipan@linux.ibm.com>
Hi Sandipan,
Sandipan Das <sandipan@linux.ibm.com> writes:
> diff --git a/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
> new file mode 100644
> index 000000000000..b346ad205e68
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
> @@ -0,0 +1,326 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +/*
> + * Copyright 2020, Sandipan Das, IBM Corp.
> + *
> + * Test if applying execute protection on pages using memory
> + * protection keys works as expected.
> + */
> +
> +#define _GNU_SOURCE
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <signal.h>
> +
> +#include <time.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +
> +#include "utils.h"
> +
> +/* Override definitions as they might be inconsistent */
> +#undef PKEY_DISABLE_ACCESS
> +#define PKEY_DISABLE_ACCESS 0x3
Why would they be inconsistent?
> +/* Older distros might not define this */
> +#ifndef SEGV_PKUERR
> +#define SEGV_PKUERR 4
> +#endif
...
> +
> + /* Restore permissions in order to continue */
> + switch (fcode) {
> + case SEGV_ACCERR:
> + if (mprotect(insns, pgsize, PROT_READ | PROT_WRITE)) {
> + perror("mprotect");
> + goto fail;
> + }
> + break;
> + case SEGV_PKUERR:
> + if (sinfo->si_pkey != fpkey)
> + goto fail;
This doesn't compile on older distros, eg Ubuntu 16.04:
pkey_exec_prot.c: In function 'segv_handler':
pkey_exec_prot.c:121:12: error: 'siginfo_t {aka struct <anonymous>}' has no member named 'si_pkey'
if (sinfo->si_pkey != fpkey)
^
pkey_exec_prot.c:151:24: error: 'siginfo_t {aka struct <anonymous>}' has no member named 'si_pkey'
pkey_set_rights(sinfo->si_pkey, 0);
^
../../lib.mk:142: recipe for target '/output/kselftest/powerpc/mm/pkey_exec_prot' failed
I think a reasonable solution is to use the absence of SEGV_PKUERR to
basically turn the whole test into a nop at build time, eg:
diff --git a/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
index b346ad205e68..218257b89fbb 100644
--- a/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
+++ b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
@@ -30,9 +30,7 @@
#define PKEY_DISABLE_EXECUTE 0x4
/* Older distros might not define this */
-#ifndef SEGV_PKUERR
-#define SEGV_PKUERR 4
-#endif
+#ifdef SEGV_PKUERR
#define SYS_pkey_mprotect 386
#define SYS_pkey_alloc 384
@@ -319,6 +317,13 @@ static int test(void)
return 0;
}
+#else
+static int test(void)
+{
+ printf("Test built with old libc lacking pkey support.\n");
+ SKIP_IF(true);
+}
+#endif /* SEGV_PKUERR */
int main(void)
{
cheers
* Re: [RESEND PATCH v7 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
From: Michael Ellerman @ 2020-05-26 12:14 UTC (permalink / raw)
To: Vaibhav Jain, Ira Weiny
Cc: Aneesh Kumar K . V, linuxppc-dev, linux-kernel, Steven Rostedt,
linux-nvdimm
In-Reply-To: <87a71ww7f9.fsf@linux.ibm.com>
Vaibhav Jain <vaibhav@linux.ibm.com> writes:
> Hi Ira, Mpe and Aneesh,
>
> Vaibhav Jain <vaibhav@linux.ibm.com> writes:
>
>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>
>>> Ira Weiny <ira.weiny@intel.com> writes:
>>>> On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote:
>>>>> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm
>>>>> modules and add the command family to the white list of NVDIMM command
>>>>> sets. Also advertise support for ND_CMD_CALL for the dimm
>>>>> command mask and implement necessary scaffolding in the module to
>>>>> handle ND_CMD_CALL ioctl and PDSM requests that we receive.
>>> ...
>>>>> + *
>>>>> + * Payload Version:
>>>>> + *
>>>>> + * A 'payload_version' field is present in PDSM header that indicates a specific
>>>>> + * version of the structure present in PDSM Payload for a given PDSM command.
>>>>> + * This provides backward compatibility in case the PDSM Payload structure
>>>>> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>>>>> + *
>>>>> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>>>>> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>>>>> + * module when servicing the PDSM envelope checks the 'payload_version' and then
>>>>> + * uses 'payload struct version' == MIN('payload_version field',
>>>>> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>>>>> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
>>>>> + * struct in returned 'payload_version' field.
>>>>
>>>> FWIW many people believe using a size rather than version is more sustainable.
>>>> It is expected that new payload structures are larger (more features) than the
>>>> previous payload structure.
>>>>
>>>> I can't find references at the moment though.
>>>
>>> I think clone_args is a good modern example:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/sched.h#n88
>>
>> Thanks Ira and Mpe for pointing this out. I looked into how the clone3
>> syscall handles clone_args and a few differences came out:
>>
>> * Unlike clone_args that are always transferred in one direction from
>> user-space to kernel, payload contents of pdsms are transferred in both
>> directions. Having a single version number makes it easier for
>> user-space and kernel to determine what data will be exchanged.
>>
>> * For PDSMs, the version number is negotiated between libndctl and
>> kernel. For example, in case the kernel only supports an older version of
>> a structure, it's free to send a lower version number back to
>> libndctl. Such negotiation doesn't happen with the clone3 syscall.
>
> If you are ok with the explanation above please let me know. I will
> quickly spin off a v8 addressing your review comments.
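For reference, the MIN() negotiation described in the quoted comment could
look roughly like the sketch below. The struct layout, the field names other
than payload_version, the version numbers, and the values filled in are all
hypothetical. This is not the papr_scm code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: highest payload struct version this "kernel" understands. */
#define DEMO_MAX_PAYLOAD_VERSION 2u

/* Hypothetical payload; only payload_version is named in the thread. */
struct demo_health_payload {
	uint32_t payload_version; /* in: requested, out: negotiated */
	uint32_t health_state;    /* present since version 1 */
	uint32_t life_remaining;  /* imagined addition in version 2 */
};

/* MIN(requested version, max version supported by the servicing side). */
uint32_t demo_negotiate_version(uint32_t requested, uint32_t max_supported)
{
	return requested < max_supported ? requested : max_supported;
}

/* Fill only the fields that exist in the negotiated version. */
void demo_service_health(struct demo_health_payload *p)
{
	uint32_t version;

	assert(p);
	version = demo_negotiate_version(p->payload_version,
					 DEMO_MAX_PAYLOAD_VERSION);

	p->health_state = 1;            /* placeholder value */
	if (version >= 2)
		p->life_remaining = 90; /* placeholder value */

	/* Report back the version that was actually used. */
	p->payload_version = version;
}
```

A caller built against a newer struct gets the lower version back and knows
which fields are valid. A size-based scheme would have to express that in
some other way.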
I don't have strong opinions about the user API, it's really up to the
nvdimm folks.
cheers
* [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
To: kvm
Cc: Emanuele Giuseppe Esposito, linux-s390, linux-doc, netdev,
Emanuele Giuseppe Esposito, linux-kernel, kvm-ppc, Jonathan Adams,
Christian Borntraeger, Alexander Viro, David Rientjes,
linux-fsdevel, Paolo Bonzini, linux-mips, linuxppc-dev,
linux-arm-kernel, Jim Mattson
In-Reply-To: <20200526110318.69006-1-eesposit@redhat.com>
Apply stats_fs on the networking statistics subsystem.
Currently it only works with network namespaces disabled
(CONFIG_NET_NS=n), because multiple namespaces can have the same
device name under the same root source, which would cause a conflict
in stats_fs.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
include/linux/netdevice.h | 2 ++
net/Kconfig | 1 +
net/core/dev.c | 66 +++++++++++++++++++++++++++++++++++++++
3 files changed, 69 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 130a668049ab..408c4e7b0e21 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -48,6 +48,7 @@
#include <uapi/linux/if_bonding.h>
#include <uapi/linux/pkt_cls.h>
#include <linux/hashtable.h>
+#include <linux/stats_fs.h>
struct netpoll_info;
struct device;
@@ -2117,6 +2118,7 @@ struct net_device {
unsigned wol_enabled:1;
struct list_head net_notifier_list;
+ struct stats_fs_source *stats_fs_src;
#if IS_ENABLED(CONFIG_MACSEC)
/* MACsec management functions */
diff --git a/net/Kconfig b/net/Kconfig
index df8d8c9bd021..3441d5bb6107 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -8,6 +8,7 @@ menuconfig NET
select NLATTR
select GENERIC_NET_UTILS
select BPF
+ select STATS_FS_API
---help---
Unless you really know what you are doing, you should say Y here.
The reason is that some programs need kernel networking support even
diff --git a/net/core/dev.c b/net/core/dev.c
index 522288177bbd..3db48cd1a097 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -142,6 +142,7 @@
#include <linux/net_namespace.h>
#include <linux/indirect_call_wrapper.h>
#include <net/devlink.h>
+#include <linux/stats_fs.h>
#include "net-sysfs.h"
@@ -150,6 +151,11 @@
/* This should be increased if a protocol with a bigger head is added. */
#define GRO_MAX_HEAD (MAX_HEADER + 128)
+#define NETDEV_STAT(str, m, ...) \
+ { str, offsetof(struct rtnl_link_stats64, m), \
+ &stats_fs_type_netdev_u64, \
+ STATS_FS_SUM, ## __VA_ARGS__ }
+
static DEFINE_SPINLOCK(ptype_lock);
static DEFINE_SPINLOCK(offload_lock);
struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
@@ -196,6 +202,53 @@ static DEFINE_READ_MOSTLY_HASHTABLE(napi_hash, 8);
static seqcount_t devnet_rename_seq;
+static uint64_t stats_fs_get_netdev_u64(struct stats_fs_value *val,
+ void *base)
+{
+ struct net_device *netdev = (struct net_device *)base;
+ struct rtnl_link_stats64 net_stats;
+
+ dev_get_stats(netdev, &net_stats);
+
+ return stats_fs_get_u64(val, &net_stats);
+}
+
+static struct stats_fs_type stats_fs_type_netdev_u64 = {
+ .get = stats_fs_get_netdev_u64,
+ .clear = NULL,
+ .sign = false
+};
+
+static struct stats_fs_source *netdev_root;
+
+static struct stats_fs_value stats_fs_netdev_entries[] = {
+ NETDEV_STAT("rx_packets", rx_packets),
+ NETDEV_STAT("tx_packets", tx_packets),
+ NETDEV_STAT("rx_bytes", rx_bytes),
+ NETDEV_STAT("tx_bytes", tx_bytes),
+ NETDEV_STAT("rx_errors", rx_errors),
+ NETDEV_STAT("tx_errors", tx_errors),
+ NETDEV_STAT("rx_dropped", rx_dropped),
+ NETDEV_STAT("tx_dropped", tx_dropped),
+ NETDEV_STAT("multicast", multicast),
+ NETDEV_STAT("collisions", collisions),
+ NETDEV_STAT("rx_length_errors", rx_length_errors),
+ NETDEV_STAT("rx_over_errors", rx_over_errors),
+ NETDEV_STAT("rx_crc_errors", rx_crc_errors),
+ NETDEV_STAT("rx_frame_errors", rx_frame_errors),
+ NETDEV_STAT("rx_fifo_errors", rx_fifo_errors),
+ NETDEV_STAT("rx_missed_errors", rx_missed_errors),
+ NETDEV_STAT("tx_aborted_errors", tx_aborted_errors),
+ NETDEV_STAT("tx_carrier_errors", tx_carrier_errors),
+ NETDEV_STAT("tx_fifo_errors", tx_fifo_errors),
+ NETDEV_STAT("tx_heartbeat_errors", tx_heartbeat_errors),
+ NETDEV_STAT("tx_window_errors", tx_window_errors),
+ NETDEV_STAT("rx_compressed", rx_compressed),
+ NETDEV_STAT("tx_compressed", tx_compressed),
+ NETDEV_STAT("rx_nohandler", rx_nohandler),
+ { NULL }
+};
+
static inline void dev_base_seq_inc(struct net *net)
{
while (++net->dev_base_seq == 0)
@@ -8783,6 +8836,11 @@ static void rollback_registered_many(struct list_head *head)
ASSERT_RTNL();
list_for_each_entry_safe(dev, tmp, head, unreg_list) {
+ stats_fs_source_remove_subordinate(netdev_root,
+ dev->stats_fs_src);
+ stats_fs_source_revoke(dev->stats_fs_src);
+ stats_fs_source_put(dev->stats_fs_src);
+
/* Some devices call without registering
* for initialization unwind. Remove those
* devices and proceed with the remaining.
@@ -9436,6 +9494,11 @@ int register_netdevice(struct net_device *dev)
dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
+ dev->stats_fs_src = stats_fs_source_create(0, dev->name);
+ stats_fs_source_add_subordinate(netdev_root, dev->stats_fs_src);
+ stats_fs_source_add_values(dev->stats_fs_src, stats_fs_netdev_entries,
+ dev, 0);
+
out:
return ret;
@@ -10500,6 +10563,9 @@ static int __init net_dev_init(void)
if (netdev_kobject_init())
goto out;
+ netdev_root = stats_fs_source_create(0, "net");
+ stats_fs_source_register(netdev_root);
+
INIT_LIST_HEAD(&ptype_all);
for (i = 0; i < PTYPE_HASH_SIZE; i++)
INIT_LIST_HEAD(&ptype_base[i]);
--
2.25.4
* [PATCH v3 6/7] [not for merge] kvm: example of stats_fs_value show function
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
To: kvm
Cc: Emanuele Giuseppe Esposito, linux-s390, linux-doc, netdev,
Emanuele Giuseppe Esposito, linux-kernel, kvm-ppc, Jonathan Adams,
Christian Borntraeger, Alexander Viro, David Rientjes,
linux-fsdevel, Paolo Bonzini, linux-mips, linuxppc-dev,
linux-arm-kernel, Jim Mattson
In-Reply-To: <20200526110318.69006-1-eesposit@redhat.com>
Add an example of the show function using the mp_state value.
mp_state is an enum that represents the VCPU state,
so instead of displaying its integer representation,
the show function takes care of translating the integer into a
more meaningful string representation.
The VCPU status is shown in the kvm/<vmid>/vcpu<cpuid>/mp_state file.
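As a standalone illustration of the integer-to-string translation, here is a
userspace sketch that uses a lookup table and a caller-provided buffer instead
of an allocation. This is a different contract from the show callback in the
patch, which returns an allocated string. The DEMO_* enum is a stand-in for
the KVM_MP_STATE_* constants and its numeric values are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for KVM_MP_STATE_*; the numeric values are an assumption. */
enum demo_mp_state {
	DEMO_MP_STATE_RUNNABLE,
	DEMO_MP_STATE_UNINITIALIZED,
	DEMO_MP_STATE_INIT_RECEIVED,
	DEMO_MP_STATE_HALTED,
	DEMO_MP_STATE_SIPI_RECEIVED,
	DEMO_MP_STATE_STOPPED,
	DEMO_MP_STATE_CHECK_STOP,
	DEMO_MP_STATE_OPERATING,
	DEMO_MP_STATE_LOAD,
	DEMO_MP_STATE_COUNT
};

static const char *const demo_mp_state_names[] = {
	[DEMO_MP_STATE_RUNNABLE]      = "RUNNABLE",
	[DEMO_MP_STATE_UNINITIALIZED] = "UNINITIALIZED",
	[DEMO_MP_STATE_INIT_RECEIVED] = "INIT_RECEIVED",
	[DEMO_MP_STATE_HALTED]        = "HALTED",
	[DEMO_MP_STATE_SIPI_RECEIVED] = "SIPI_RECEIVED",
	[DEMO_MP_STATE_STOPPED]       = "STOPPED",
	[DEMO_MP_STATE_CHECK_STOP]    = "CHECK_STOP",
	[DEMO_MP_STATE_OPERATING]     = "OPERATING",
	[DEMO_MP_STATE_LOAD]          = "LOAD",
};

static_assert(sizeof(demo_mp_state_names) / sizeof(demo_mp_state_names[0]) ==
	      DEMO_MP_STATE_COUNT, "one name per state");

/* Map a raw state value to its name; unknown values get a fixed string. */
const char *demo_mp_state_name(uint64_t state)
{
	if (state >= DEMO_MP_STATE_COUNT || !demo_mp_state_names[state])
		return "UNRECOGNIZED";
	return demo_mp_state_names[state];
}

/* Copy the name into buf, truncating if needed; returns bytes written. */
size_t demo_show_mp_state(uint64_t state, char *buf, size_t len)
{
	const char *name = demo_mp_state_name(state);
	size_t n = strlen(name);

	if (len == 0)
		return 0;
	if (n >= len)
		n = len - 1;
	memcpy(buf, name, n);
	buf[n] = '\0';
	return n;
}
```

With a table, adding a state is a one-line change and no allocation can fail
in the show path.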
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
arch/x86/kvm/stats_fs.c | 54 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/arch/x86/kvm/stats_fs.c b/arch/x86/kvm/stats_fs.c
index f6edebb9c559..902be18562da 100644
--- a/arch/x86/kvm/stats_fs.c
+++ b/arch/x86/kvm/stats_fs.c
@@ -39,11 +39,65 @@ struct stats_fs_value stats_fs_vcpu_arch_tsc_frac[] = {
{ NULL } /* base is &kvm_tsc_scaling_ratio_frac_bits */
};
+char *stats_fs_vcpu_get_mpstate(uint64_t state)
+{
+ char *state_str;
+
+ state_str = kzalloc(20, GFP_KERNEL);
+ if (!state_str)
+ return ERR_PTR(-ENOMEM);
+
+ switch (state) {
+ case KVM_MP_STATE_RUNNABLE:
+ strcpy(state_str, "RUNNABLE");
+ break;
+ case KVM_MP_STATE_UNINITIALIZED:
+ strcpy(state_str, "UNINITIALIZED");
+ break;
+ case KVM_MP_STATE_INIT_RECEIVED:
+ strcpy(state_str, "INIT_RECEIVED");
+ break;
+ case KVM_MP_STATE_HALTED:
+ strcpy(state_str, "HALTED");
+ break;
+ case KVM_MP_STATE_SIPI_RECEIVED:
+ strcpy(state_str, "SIPI_RECEIVED");
+ break;
+ case KVM_MP_STATE_STOPPED:
+ strcpy(state_str, "STOPPED");
+ break;
+ case KVM_MP_STATE_CHECK_STOP:
+ strcpy(state_str, "CHECK_STOP");
+ break;
+ case KVM_MP_STATE_OPERATING:
+ strcpy(state_str, "OPERATING");
+ break;
+ case KVM_MP_STATE_LOAD:
+ strcpy(state_str, "LOAD");
+ break;
+ default:
+ strcpy(state_str, "UNRECOGNIZED");
+ break;
+ }
+
+ return state_str;
+}
+
+struct stats_fs_value stats_fs_vcpu_mp_state[] = {
+ VCPU_ARCH_STATS_FS("mp_state", kvm_vcpu_arch, mp_state,
+ .type = &stats_fs_type_u32,
+ .show = stats_fs_vcpu_get_mpstate),
+ { NULL }
+};
+
void kvm_arch_create_vcpu_stats_fs(struct kvm_vcpu *vcpu)
{
stats_fs_source_add_values(vcpu->stats_fs_src, stats_fs_vcpu_tsc_offset,
&vcpu->arch, 0);
+ stats_fs_source_add_values(vcpu->stats_fs_src, stats_fs_vcpu_mp_state,
+ &vcpu->arch, 0);
+
if (lapic_in_kernel(vcpu))
stats_fs_source_add_values(vcpu->stats_fs_src,
stats_fs_vcpu_arch_lapic_timer,
--
2.25.4
* [PATCH v3 5/7] kvm_main: replace debugfs with stats_fs
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
To: kvm
Cc: Emanuele Giuseppe Esposito, linux-s390, linux-doc, netdev,
Emanuele Giuseppe Esposito, linux-kernel, kvm-ppc, Jonathan Adams,
Christian Borntraeger, Alexander Viro, David Rientjes,
linux-fsdevel, Paolo Bonzini, linux-mips, linuxppc-dev,
linux-arm-kernel, Jim Mattson
In-Reply-To: <20200526110318.69006-1-eesposit@redhat.com>
Use stats_fs API instead of debugfs to create sources and add values.
This also requires changing all architecture files to replace the old
debugfs_entries with stats_fs_vcpu_entries and stats_fs_vm_entries.
The names and organization of the files and folders are kept unchanged,
and a symlink at /sys/kernel/debug/kvm is left for backward compatibility.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
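Note: the tables converted in this patch all follow one pattern: each
statistic is described by a name and the offset of a field, and a
per-VM file reports the sum of that field over all vcpus. A minimal
userspace sketch of that pattern, assuming simplified hypothetical types
(struct vcpu, struct stat_desc, VCPU_STAT_DESC, stat_read, stat_sum)
rather than the real stats_fs API, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct kvm_vcpu and its stats. */
struct vcpu_stat {
	uint64_t exits;
	uint64_t halt_wakeup;
};

struct vcpu {
	int id;
	struct vcpu_stat stat;
};

/* One descriptor per statistic: a file name plus a field offset. */
struct stat_desc {
	const char *name;
	size_t offset;
};

#define VCPU_STAT_DESC(n, x) { n, offsetof(struct vcpu, stat.x) }

/* NULL-terminated table, in the style of the arrays in the diff. */
const struct stat_desc vcpu_descs[] = {
	VCPU_STAT_DESC("exits", exits),
	VCPU_STAT_DESC("halt_wakeup", halt_wakeup),
	{ NULL, 0 }
};

/* Read one u64 statistic from a base object through its descriptor. */
uint64_t stat_read(const void *base, const struct stat_desc *d)
{
	return *(const uint64_t *)((const char *)base + d->offset);
}

/* Aggregate by summation over an array of vcpus. */
uint64_t stat_sum(const struct vcpu *vcpus, size_t n,
		  const struct stat_desc *d)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += stat_read(&vcpus[i], d);
	return sum;
}

/* Small self-check: two vcpus with 3 and 4 exits sum to 7. */
void stat_sum_selfcheck(void)
{
	struct vcpu v[2] = { { 0, { 3, 1 } }, { 1, { 4, 2 } } };

	assert(stat_sum(v, 2, &vcpu_descs[0]) == 7);
}
```

Keeping only a name and an offset in the table is what lets the same
array be attached to many base objects, which is how the diff reuses
stats_fs_vcpu_entries for every vcpu.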
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/guest.c | 2 +-
arch/mips/kvm/Kconfig | 1 +
arch/mips/kvm/mips.c | 2 +-
arch/powerpc/kvm/Kconfig | 1 +
arch/powerpc/kvm/book3s.c | 12 +-
arch/powerpc/kvm/booke.c | 8 +-
arch/s390/kvm/Kconfig | 1 +
arch/s390/kvm/kvm-s390.c | 16 +-
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/debugfs.c | 64 -------
arch/x86/kvm/stats_fs.c | 60 ++++++
arch/x86/kvm/x86.c | 11 +-
include/linux/kvm_host.h | 45 ++---
virt/kvm/arm/arm.c | 2 +-
virt/kvm/kvm_main.c | 318 +++++---------------------------
18 files changed, 161 insertions(+), 388 deletions(-)
delete mode 100644 arch/x86/kvm/debugfs.c
create mode 100644 arch/x86/kvm/stats_fs.c
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 449386d76441..f95f6d1c3610 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -23,6 +23,7 @@ config KVM
depends on OF
# for TASKSTATS/TASK_DELAY_ACCT:
depends on NET && MULTIUSER
+ select STATS_FS_API
select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select HAVE_KVM_CPU_RELAX_INTERCEPT
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 8417b200bec9..235ed44e4353 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -29,7 +29,7 @@
#include "trace.h"
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("halt_successful_poll", halt_successful_poll),
VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
index b91d145aa2d5..b19fbc5297b4 100644
--- a/arch/mips/kvm/Kconfig
+++ b/arch/mips/kvm/Kconfig
@@ -19,6 +19,7 @@ config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM
depends on MIPS_FP_SUPPORT
+ select STATS_FS_API
select EXPORT_UASM
select PREEMPT_NOTIFIERS
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index fdf1c14d9205..a47d21f35444 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -39,7 +39,7 @@
#define VECTORSPACING 0x100 /* for EI/VI mode */
#endif
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("wait", wait_exits),
VCPU_STAT("cache", cache_exits),
VCPU_STAT("signal", signal_exits),
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 12885eda324e..6f0675edfe7c 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -19,6 +19,7 @@ if VIRTUALIZATION
config KVM
bool
+ select STATS_FS_API
select PREEMPT_NOTIFIERS
select HAVE_KVM_EVENTFD
select HAVE_KVM_VCPU_ASYNC_IOCTL
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 37508a356f28..e3346b3087d0 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -38,7 +38,7 @@
/* #define EXIT_DEBUG */
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("exits", sum_exits),
VCPU_STAT("mmio", mmio_exits),
VCPU_STAT("sig", signal_exits),
@@ -66,8 +66,14 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
VCPU_STAT("pthru_all", pthru_all),
VCPU_STAT("pthru_host", pthru_host),
VCPU_STAT("pthru_bad_aff", pthru_bad_aff),
- VM_STAT("largepages_2M", num_2M_pages, .mode = 0444),
- VM_STAT("largepages_1G", num_1G_pages, .mode = 0444),
+ { NULL }
+};
+
+struct stats_fs_value stats_fs_vm_entries[] = {
+ VM_STAT("largepages_2M", num_2M_pages,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ VM_STAT("largepages_1G", num_1G_pages,
+ .value_flag = STATS_FS_FLOATING_VALUE),
{ NULL }
};
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index c2984cb6dfa7..b14c07786cc8 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -35,7 +35,12 @@
unsigned long kvmppc_booke_handlers;
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vm_entries[] = {
+ VM_STAT("remote_tlb_flush", remote_tlb_flush),
+ { NULL }
+};
+
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("mmio", mmio_exits),
VCPU_STAT("sig", signal_exits),
VCPU_STAT("itlb_r", itlb_real_miss_exits),
@@ -54,7 +59,6 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
VCPU_STAT("halt_wakeup", halt_wakeup),
VCPU_STAT("doorbell", dbell_exits),
VCPU_STAT("guest doorbell", gdbell_exits),
- VM_STAT("remote_tlb_flush", remote_tlb_flush),
{ NULL }
};
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index def3b60f1fe8..ec8b2e04d698 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
def_tristate y
prompt "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM
+ select STATS_FS_API
select PREEMPT_NOTIFIERS
select HAVE_KVM_CPU_RELAX_INTERCEPT
select HAVE_KVM_VCPU_ASYNC_IOCTL
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index dbeb7da07f18..f2f090b78529 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -57,7 +57,16 @@
#define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
(KVM_MAX_VCPUS + LOCAL_IRQS))
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vm_entries[] = {
+ VM_STAT("inject_float_mchk", inject_float_mchk),
+ VM_STAT("inject_io", inject_io),
+ VM_STAT("inject_pfault_done", inject_pfault_done),
+ VM_STAT("inject_service_signal", inject_service_signal),
+ VM_STAT("inject_virtio", inject_virtio),
+ { NULL }
+};
+
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("userspace_handled", exit_userspace),
VCPU_STAT("exit_null", exit_null),
VCPU_STAT("exit_validity", exit_validity),
@@ -95,18 +104,13 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
VCPU_STAT("inject_ckc", inject_ckc),
VCPU_STAT("inject_cputm", inject_cputm),
VCPU_STAT("inject_external_call", inject_external_call),
- VM_STAT("inject_float_mchk", inject_float_mchk),
VCPU_STAT("inject_emergency_signal", inject_emergency_signal),
- VM_STAT("inject_io", inject_io),
VCPU_STAT("inject_mchk", inject_mchk),
- VM_STAT("inject_pfault_done", inject_pfault_done),
VCPU_STAT("inject_program", inject_program),
VCPU_STAT("inject_restart", inject_restart),
- VM_STAT("inject_service_signal", inject_service_signal),
VCPU_STAT("inject_set_prefix", inject_set_prefix),
VCPU_STAT("inject_stop_signal", inject_stop_signal),
VCPU_STAT("inject_pfault_init", inject_pfault_init),
- VM_STAT("inject_virtio", inject_virtio),
VCPU_STAT("instruction_epsw", instruction_epsw),
VCPU_STAT("instruction_gs", instruction_gs),
VCPU_STAT("instruction_io_other", instruction_io_other),
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 42a2d0d3984a..6a04f590963f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -35,7 +35,7 @@
#include <asm/kvm_vcpu_regs.h>
#include <asm/hyperv-tlfs.h>
-#define __KVM_HAVE_ARCH_VCPU_DEBUGFS
+#define __KVM_HAVE_ARCH_VCPU_STATS_FS
#define KVM_MAX_VCPUS 288
#define KVM_SOFT_MAX_VCPUS 240
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index d8154e0684b6..0b53bb14c97e 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -25,6 +25,7 @@ config KVM
# for TASKSTATS/TASK_DELAY_ACCT:
depends on NET && MULTIUSER
depends on X86_LOCAL_APIC
+ select STATS_FS_API
select PREEMPT_NOTIFIERS
select MMU_NOTIFIER
select HAVE_KVM_IRQCHIP
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index a789759b7261..18285a382eba 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -11,7 +11,7 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \
i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
- hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o
+ hyperv.o stats_fs.o mmu/mmu.o mmu/page_track.o
kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o
kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
deleted file mode 100644
index 018aebce33ff..000000000000
--- a/arch/x86/kvm/debugfs.c
+++ /dev/null
@@ -1,64 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Kernel-based Virtual Machine driver for Linux
- *
- * Copyright 2016 Red Hat, Inc. and/or its affiliates.
- */
-#include <linux/kvm_host.h>
-#include <linux/debugfs.h>
-#include "lapic.h"
-
-static int vcpu_get_timer_advance_ns(void *data, u64 *val)
-{
- struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
- *val = vcpu->arch.apic->lapic_timer.timer_advance_ns;
- return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_timer_advance_ns_fops, vcpu_get_timer_advance_ns, NULL, "%llu\n");
-
-static int vcpu_get_tsc_offset(void *data, u64 *val)
-{
- struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
- *val = vcpu->arch.tsc_offset;
- return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_offset_fops, vcpu_get_tsc_offset, NULL, "%lld\n");
-
-static int vcpu_get_tsc_scaling_ratio(void *data, u64 *val)
-{
- struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
- *val = vcpu->arch.tsc_scaling_ratio;
- return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_fops, vcpu_get_tsc_scaling_ratio, NULL, "%llu\n");
-
-static int vcpu_get_tsc_scaling_frac_bits(void *data, u64 *val)
-{
- *val = kvm_tsc_scaling_ratio_frac_bits;
- return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_frac_fops, vcpu_get_tsc_scaling_frac_bits, NULL, "%llu\n");
-
-void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
-{
- debugfs_create_file("tsc-offset", 0444, vcpu->debugfs_dentry, vcpu,
- &vcpu_tsc_offset_fops);
-
- if (lapic_in_kernel(vcpu))
- debugfs_create_file("lapic_timer_advance_ns", 0444,
- vcpu->debugfs_dentry, vcpu,
- &vcpu_timer_advance_ns_fops);
-
- if (kvm_has_tsc_control) {
- debugfs_create_file("tsc-scaling-ratio", 0444,
- vcpu->debugfs_dentry, vcpu,
- &vcpu_tsc_scaling_fops);
- debugfs_create_file("tsc-scaling-ratio-frac-bits", 0444,
- vcpu->debugfs_dentry, vcpu,
- &vcpu_tsc_scaling_frac_fops);
- }
-}
diff --git a/arch/x86/kvm/stats_fs.c b/arch/x86/kvm/stats_fs.c
new file mode 100644
index 000000000000..f6edebb9c559
--- /dev/null
+++ b/arch/x86/kvm/stats_fs.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Kernel-based Virtual Machine driver for Linux
+ *
+ * Copyright 2016 Red Hat, Inc. and/or its affiliates.
+ */
+#include <linux/kvm_host.h>
+#include <linux/stats_fs.h>
+#include "lapic.h"
+
+#define VCPU_ARCH_STATS_FS(n, s, x, ...) \
+ { n, offsetof(struct s, x), .aggr_kind = STATS_FS_SUM, \
+ ##__VA_ARGS__ }
+
+struct stats_fs_value stats_fs_vcpu_tsc_offset[] = {
+ VCPU_ARCH_STATS_FS("tsc-offset", kvm_vcpu_arch, tsc_offset,
+ .type = &stats_fs_type_s64,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ { NULL }
+};
+
+struct stats_fs_value stats_fs_vcpu_arch_lapic_timer[] = {
+ VCPU_ARCH_STATS_FS("lapic_timer_advance_ns", kvm_timer, timer_advance_ns,
+ .type = &stats_fs_type_u64,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ { NULL }
+};
+
+struct stats_fs_value stats_fs_vcpu_arch_tsc_ratio[] = {
+ VCPU_ARCH_STATS_FS("tsc-scaling-ratio", kvm_vcpu_arch, tsc_scaling_ratio,
+ .type = &stats_fs_type_u64,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ { NULL }
+};
+
+struct stats_fs_value stats_fs_vcpu_arch_tsc_frac[] = {
+ { "tsc-scaling-ratio-frac-bits", 0, .type = &stats_fs_type_u64,
+ .value_flag = STATS_FS_FLOATING_VALUE },
+ { NULL } /* base is &kvm_tsc_scaling_ratio_frac_bits */
+};
+
+void kvm_arch_create_vcpu_stats_fs(struct kvm_vcpu *vcpu)
+{
+ stats_fs_source_add_values(vcpu->stats_fs_src, stats_fs_vcpu_tsc_offset,
+ &vcpu->arch, 0);
+
+ if (lapic_in_kernel(vcpu))
+ stats_fs_source_add_values(vcpu->stats_fs_src,
+ stats_fs_vcpu_arch_lapic_timer,
+ &vcpu->arch.apic->lapic_timer, 0);
+
+ if (kvm_has_tsc_control) {
+ stats_fs_source_add_values(vcpu->stats_fs_src,
+ stats_fs_vcpu_arch_tsc_ratio,
+ &vcpu->arch, 0);
+ stats_fs_source_add_values(vcpu->stats_fs_src,
+ stats_fs_vcpu_arch_tsc_frac,
+ &kvm_tsc_scaling_ratio_frac_bits, 0);
+ }
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 35723dafedeb..e441fbc00c03 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -190,7 +190,7 @@ static u64 __read_mostly host_xss;
u64 __read_mostly supported_xss;
EXPORT_SYMBOL_GPL(supported_xss);
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("pf_fixed", pf_fixed),
VCPU_STAT("pf_guest", pf_guest),
VCPU_STAT("tlb_flush", tlb_flush),
@@ -217,6 +217,10 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
VCPU_STAT("nmi_injections", nmi_injections),
VCPU_STAT("req_event", req_event),
VCPU_STAT("l1d_flush", l1d_flush),
+ { NULL }
+};
+
+struct stats_fs_value stats_fs_vm_entries[] = {
VM_STAT("mmu_shadow_zapped", mmu_shadow_zapped),
VM_STAT("mmu_pte_write", mmu_pte_write),
VM_STAT("mmu_pte_updated", mmu_pte_updated),
@@ -226,8 +230,9 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
VM_STAT("mmu_cache_miss", mmu_cache_miss),
VM_STAT("mmu_unsync", mmu_unsync),
VM_STAT("remote_tlb_flush", remote_tlb_flush),
- VM_STAT("largepages", lpages, .mode = 0444),
- VM_STAT("nx_largepages_splitted", nx_lpage_splits, .mode = 0444),
+ VM_STAT("largepages", lpages, .value_flag = STATS_FS_FLOATING_VALUE),
+ VM_STAT("nx_largepages_splitted", nx_lpage_splits,
+ .value_flag = STATS_FS_FLOATING_VALUE),
VM_STAT("max_mmu_page_hash_collisions", max_mmu_page_hash_collisions),
{ NULL }
};
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3845f857ef7b..f7b6a48bac8f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -27,6 +27,7 @@
#include <linux/refcount.h>
#include <linux/nospec.h>
#include <asm/signal.h>
+#include <linux/stats_fs.h>
#include <linux/kvm.h>
#include <linux/kvm_para.h>
@@ -318,7 +319,7 @@ struct kvm_vcpu {
bool preempted;
bool ready;
struct kvm_vcpu_arch arch;
- struct dentry *debugfs_dentry;
+ struct stats_fs_source *stats_fs_src;
};
static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
@@ -498,8 +499,7 @@ struct kvm {
long tlbs_dirty;
struct list_head devices;
u64 manual_dirty_log_protect;
- struct dentry *debugfs_dentry;
- struct kvm_stat_data **debugfs_stat_data;
+ struct stats_fs_source *stats_fs_src;
struct srcu_struct srcu;
struct srcu_struct irq_srcu;
pid_t userspace_pid;
@@ -880,8 +880,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu);
-#ifdef __KVM_HAVE_ARCH_VCPU_DEBUGFS
-void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu);
+#ifdef __KVM_HAVE_ARCH_VCPU_STATS_FS
+void kvm_arch_create_vcpu_stats_fs(struct kvm_vcpu *vcpu);
#endif
int kvm_arch_hardware_enable(void);
@@ -1110,33 +1110,16 @@ static inline bool kvm_is_error_gpa(struct kvm *kvm, gpa_t gpa)
return kvm_is_error_hva(hva);
}
-enum kvm_stat_kind {
- KVM_STAT_VM,
- KVM_STAT_VCPU,
-};
-
-struct kvm_stat_data {
- struct kvm *kvm;
- struct kvm_stats_debugfs_item *dbgfs_item;
-};
-
-struct kvm_stats_debugfs_item {
- const char *name;
- int offset;
- enum kvm_stat_kind kind;
- int mode;
-};
-
-#define KVM_DBGFS_GET_MODE(dbgfs_item) \
- ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
-
-#define VM_STAT(n, x, ...) \
- { n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ }
-#define VCPU_STAT(n, x, ...) \
- { n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ }
+#define VM_STAT(n, x, ...) \
+ { n, offsetof(struct kvm, stat.x), &stats_fs_type_u64, \
+ STATS_FS_SUM, ## __VA_ARGS__ }
+#define VCPU_STAT(n, x, ...) \
+ { n, offsetof(struct kvm_vcpu, stat.x), &stats_fs_type_u64, \
+ STATS_FS_SUM, ## __VA_ARGS__ }
-extern struct kvm_stats_debugfs_item debugfs_entries[];
-extern struct dentry *kvm_debugfs_dir;
+extern struct stats_fs_value stats_fs_vcpu_entries[];
+extern struct stats_fs_value stats_fs_vm_entries[];
+extern struct stats_fs_source *kvm_stats_fs_dir;
#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..4171f92fa473 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -140,7 +140,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return ret;
}
-int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
+int kvm_arch_create_vcpu_stats_fs(struct kvm_vcpu *vcpu)
{
return 0;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 74bdb7bf3295..3d2dccb5234e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -25,6 +25,7 @@
#include <linux/vmalloc.h>
#include <linux/reboot.h>
#include <linux/debugfs.h>
+#include <linux/stats_fs.h>
#include <linux/highmem.h>
#include <linux/file.h>
#include <linux/syscore_ops.h>
@@ -109,11 +110,8 @@ static struct kmem_cache *kvm_vcpu_cache;
static __read_mostly struct preempt_ops kvm_preempt_ops;
static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
-struct dentry *kvm_debugfs_dir;
-EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
-
-static int kvm_debugfs_num_entries;
-static const struct file_operations stat_fops_per_vm;
+struct stats_fs_source *kvm_stats_fs_dir;
+EXPORT_SYMBOL_GPL(kvm_stats_fs_dir);
static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
unsigned long arg);
@@ -356,6 +354,8 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
{
+ stats_fs_source_revoke(vcpu->stats_fs_src);
+ stats_fs_source_put(vcpu->stats_fs_src);
kvm_arch_vcpu_destroy(vcpu);
/*
@@ -601,52 +601,29 @@ static void kvm_free_memslots(struct kvm *kvm, struct kvm_memslots *slots)
kvfree(slots);
}
-static void kvm_destroy_vm_debugfs(struct kvm *kvm)
+static void kvm_destroy_vm_stats_fs(struct kvm *kvm)
{
- int i;
-
- if (!kvm->debugfs_dentry)
- return;
-
- debugfs_remove_recursive(kvm->debugfs_dentry);
-
- if (kvm->debugfs_stat_data) {
- for (i = 0; i < kvm_debugfs_num_entries; i++)
- kfree(kvm->debugfs_stat_data[i]);
- kfree(kvm->debugfs_stat_data);
- }
+ stats_fs_source_remove_subordinate(kvm_stats_fs_dir, kvm->stats_fs_src);
+ stats_fs_source_revoke(kvm->stats_fs_src);
+ stats_fs_source_put(kvm->stats_fs_src);
}
-static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
+static int kvm_create_vm_stats_fs(struct kvm *kvm, int fd)
{
char dir_name[ITOA_MAX_LEN * 2];
- struct kvm_stat_data *stat_data;
- struct kvm_stats_debugfs_item *p;
- if (!debugfs_initialized())
+ if (!stats_fs_initialized())
return 0;
snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
- kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
+ kvm->stats_fs_src = stats_fs_source_create(0, dir_name);
+ stats_fs_source_add_subordinate(kvm_stats_fs_dir, kvm->stats_fs_src);
- kvm->debugfs_stat_data = kcalloc(kvm_debugfs_num_entries,
- sizeof(*kvm->debugfs_stat_data),
- GFP_KERNEL_ACCOUNT);
- if (!kvm->debugfs_stat_data)
- return -ENOMEM;
+ stats_fs_source_add_values(kvm->stats_fs_src, stats_fs_vm_entries,
+ kvm, 0);
- for (p = debugfs_entries; p->name; p++) {
- stat_data = kzalloc(sizeof(*stat_data), GFP_KERNEL_ACCOUNT);
- if (!stat_data)
- return -ENOMEM;
-
- stat_data->kvm = kvm;
- stat_data->dbgfs_item = p;
- kvm->debugfs_stat_data[p - debugfs_entries] = stat_data;
- debugfs_create_file(p->name, KVM_DBGFS_GET_MODE(p),
- kvm->debugfs_dentry, stat_data,
- &stat_fops_per_vm);
- }
+ stats_fs_source_add_values(kvm->stats_fs_src, stats_fs_vcpu_entries,
+ NULL, 0);
return 0;
}
@@ -783,7 +760,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
struct mm_struct *mm = kvm->mm;
kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm);
- kvm_destroy_vm_debugfs(kvm);
+ kvm_destroy_vm_stats_fs(kvm);
kvm_arch_sync_events(kvm);
mutex_lock(&kvm_lock);
list_del(&kvm->vm_list);
@@ -2946,7 +2923,6 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
{
struct kvm_vcpu *vcpu = filp->private_data;
- debugfs_remove_recursive(vcpu->debugfs_dentry);
kvm_put_kvm(vcpu->kvm);
return 0;
}
@@ -2970,19 +2946,23 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
return anon_inode_getfd(name, &kvm_vcpu_fops, vcpu, O_RDWR | O_CLOEXEC);
}
-static void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
+static void kvm_create_vcpu_stats_fs(struct kvm_vcpu *vcpu)
{
-#ifdef __KVM_HAVE_ARCH_VCPU_DEBUGFS
char dir_name[ITOA_MAX_LEN * 2];
- if (!debugfs_initialized())
+ if (!stats_fs_initialized())
return;
snprintf(dir_name, sizeof(dir_name), "vcpu%d", vcpu->vcpu_id);
- vcpu->debugfs_dentry = debugfs_create_dir(dir_name,
- vcpu->kvm->debugfs_dentry);
- kvm_arch_create_vcpu_debugfs(vcpu);
+ vcpu->stats_fs_src = stats_fs_source_create(0, dir_name);
+ stats_fs_source_add_subordinate(vcpu->kvm->stats_fs_src, vcpu->stats_fs_src);
+
+ stats_fs_source_add_values(vcpu->stats_fs_src, stats_fs_vcpu_entries, vcpu,
+ STATS_FS_HIDDEN);
+
+#ifdef __KVM_HAVE_ARCH_VCPU_STATS_FS
+ kvm_arch_create_vcpu_stats_fs(vcpu);
#endif
}
@@ -3031,8 +3011,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
if (r)
goto vcpu_free_run_page;
- kvm_create_vcpu_debugfs(vcpu);
-
mutex_lock(&kvm->lock);
if (kvm_get_vcpu_by_id(kvm, id)) {
r = -EEXIST;
@@ -3061,11 +3039,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
mutex_unlock(&kvm->lock);
kvm_arch_vcpu_postcreate(vcpu);
+ kvm_create_vcpu_stats_fs(vcpu);
return r;
unlock_vcpu_destroy:
mutex_unlock(&kvm->lock);
- debugfs_remove_recursive(vcpu->debugfs_dentry);
kvm_arch_vcpu_destroy(vcpu);
vcpu_free_run_page:
free_page((unsigned long)vcpu->run);
@@ -3839,7 +3817,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
* cases it will be called by the final fput(file) and will take
* care of doing kvm_put_kvm(kvm).
*/
- if (kvm_create_vm_debugfs(kvm, r) < 0) {
+ if (kvm_create_vm_stats_fs(kvm, r) < 0) {
put_unused_fd(r);
fput(file);
return -ENOMEM;
@@ -4295,214 +4273,6 @@ struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
EXPORT_SYMBOL_GPL(kvm_io_bus_get_dev);
-static int kvm_debugfs_open(struct inode *inode, struct file *file,
- int (*get)(void *, u64 *), int (*set)(void *, u64),
- const char *fmt)
-{
- struct kvm_stat_data *stat_data = (struct kvm_stat_data *)
- inode->i_private;
-
- /* The debugfs files are a reference to the kvm struct which
- * is still valid when kvm_destroy_vm is called.
- * To avoid the race between open and the removal of the debugfs
- * directory we test against the users count.
- */
- if (!refcount_inc_not_zero(&stat_data->kvm->users_count))
- return -ENOENT;
-
- if (simple_attr_open(inode, file, get,
- KVM_DBGFS_GET_MODE(stat_data->dbgfs_item) & 0222
- ? set : NULL,
- fmt)) {
- kvm_put_kvm(stat_data->kvm);
- return -ENOMEM;
- }
-
- return 0;
-}
-
-static int kvm_debugfs_release(struct inode *inode, struct file *file)
-{
- struct kvm_stat_data *stat_data = (struct kvm_stat_data *)
- inode->i_private;
-
- simple_attr_release(inode, file);
- kvm_put_kvm(stat_data->kvm);
-
- return 0;
-}
-
-static int kvm_get_stat_per_vm(struct kvm *kvm, size_t offset, u64 *val)
-{
- *val = *(ulong *)((void *)kvm + offset);
-
- return 0;
-}
-
-static int kvm_clear_stat_per_vm(struct kvm *kvm, size_t offset)
-{
- *(ulong *)((void *)kvm + offset) = 0;
-
- return 0;
-}
-
-static int kvm_get_stat_per_vcpu(struct kvm *kvm, size_t offset, u64 *val)
-{
- int i;
- struct kvm_vcpu *vcpu;
-
- *val = 0;
-
- kvm_for_each_vcpu(i, vcpu, kvm)
- *val += *(u64 *)((void *)vcpu + offset);
-
- return 0;
-}
-
-static int kvm_clear_stat_per_vcpu(struct kvm *kvm, size_t offset)
-{
- int i;
- struct kvm_vcpu *vcpu;
-
- kvm_for_each_vcpu(i, vcpu, kvm)
- *(u64 *)((void *)vcpu + offset) = 0;
-
- return 0;
-}
-
-static int kvm_stat_data_get(void *data, u64 *val)
-{
- int r = -EFAULT;
- struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
-
- switch (stat_data->dbgfs_item->kind) {
- case KVM_STAT_VM:
- r = kvm_get_stat_per_vm(stat_data->kvm,
- stat_data->dbgfs_item->offset, val);
- break;
- case KVM_STAT_VCPU:
- r = kvm_get_stat_per_vcpu(stat_data->kvm,
- stat_data->dbgfs_item->offset, val);
- break;
- }
-
- return r;
-}
-
-static int kvm_stat_data_clear(void *data, u64 val)
-{
- int r = -EFAULT;
- struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
-
- if (val)
- return -EINVAL;
-
- switch (stat_data->dbgfs_item->kind) {
- case KVM_STAT_VM:
- r = kvm_clear_stat_per_vm(stat_data->kvm,
- stat_data->dbgfs_item->offset);
- break;
- case KVM_STAT_VCPU:
- r = kvm_clear_stat_per_vcpu(stat_data->kvm,
- stat_data->dbgfs_item->offset);
- break;
- }
-
- return r;
-}
-
-static int kvm_stat_data_open(struct inode *inode, struct file *file)
-{
- __simple_attr_check_format("%llu\n", 0ull);
- return kvm_debugfs_open(inode, file, kvm_stat_data_get,
- kvm_stat_data_clear, "%llu\n");
-}
-
-static const struct file_operations stat_fops_per_vm = {
- .owner = THIS_MODULE,
- .open = kvm_stat_data_open,
- .release = kvm_debugfs_release,
- .read = simple_attr_read,
- .write = simple_attr_write,
- .llseek = no_llseek,
-};
-
-static int vm_stat_get(void *_offset, u64 *val)
-{
- unsigned offset = (long)_offset;
- struct kvm *kvm;
- u64 tmp_val;
-
- *val = 0;
- mutex_lock(&kvm_lock);
- list_for_each_entry(kvm, &vm_list, vm_list) {
- kvm_get_stat_per_vm(kvm, offset, &tmp_val);
- *val += tmp_val;
- }
- mutex_unlock(&kvm_lock);
- return 0;
-}
-
-static int vm_stat_clear(void *_offset, u64 val)
-{
- unsigned offset = (long)_offset;
- struct kvm *kvm;
-
- if (val)
- return -EINVAL;
-
- mutex_lock(&kvm_lock);
- list_for_each_entry(kvm, &vm_list, vm_list) {
- kvm_clear_stat_per_vm(kvm, offset);
- }
- mutex_unlock(&kvm_lock);
-
- return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vm_stat_fops, vm_stat_get, vm_stat_clear, "%llu\n");
-
-static int vcpu_stat_get(void *_offset, u64 *val)
-{
- unsigned offset = (long)_offset;
- struct kvm *kvm;
- u64 tmp_val;
-
- *val = 0;
- mutex_lock(&kvm_lock);
- list_for_each_entry(kvm, &vm_list, vm_list) {
- kvm_get_stat_per_vcpu(kvm, offset, &tmp_val);
- *val += tmp_val;
- }
- mutex_unlock(&kvm_lock);
- return 0;
-}
-
-static int vcpu_stat_clear(void *_offset, u64 val)
-{
- unsigned offset = (long)_offset;
- struct kvm *kvm;
-
- if (val)
- return -EINVAL;
-
- mutex_lock(&kvm_lock);
- list_for_each_entry(kvm, &vm_list, vm_list) {
- kvm_clear_stat_per_vcpu(kvm, offset);
- }
- mutex_unlock(&kvm_lock);
-
- return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, vcpu_stat_clear,
- "%llu\n");
-
-static const struct file_operations *stat_fops[] = {
- [KVM_STAT_VCPU] = &vcpu_stat_fops,
- [KVM_STAT_VM] = &vm_stat_fops,
-};
-
static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
{
struct kobj_uevent_env *env;
@@ -4537,34 +4307,33 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
}
add_uevent_var(env, "PID=%d", kvm->userspace_pid);
- if (!IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
+ if (!IS_ERR_OR_NULL(kvm->stats_fs_src->source_dentry)) {
char *tmp, *p = kmalloc(PATH_MAX, GFP_KERNEL_ACCOUNT);
if (p) {
- tmp = dentry_path_raw(kvm->debugfs_dentry, p, PATH_MAX);
+ tmp = dentry_path_raw(kvm->stats_fs_src->source_dentry,
+ p, PATH_MAX);
if (!IS_ERR(tmp))
add_uevent_var(env, "STATS_PATH=%s", tmp);
kfree(p);
}
}
+
/* no need for checks, since we are adding at most only 5 keys */
env->envp[env->envp_idx++] = NULL;
kobject_uevent_env(&kvm_dev.this_device->kobj, KOBJ_CHANGE, env->envp);
kfree(env);
}
-static void kvm_init_debug(void)
+static void kvm_init_stats_fs(void)
{
- struct kvm_stats_debugfs_item *p;
+ kvm_stats_fs_dir = stats_fs_source_create(0, "kvm");
+ /* symlink to debugfs */
+ debugfs_create_symlink("kvm", NULL, "/sys/kernel/stats/kvm");
+ stats_fs_source_register(kvm_stats_fs_dir);
- kvm_debugfs_dir = debugfs_create_dir("kvm", NULL);
-
- kvm_debugfs_num_entries = 0;
- for (p = debugfs_entries; p->name; ++p, kvm_debugfs_num_entries++) {
- debugfs_create_file(p->name, KVM_DBGFS_GET_MODE(p),
- kvm_debugfs_dir, (void *)(long)p->offset,
- stat_fops[p->kind]);
- }
+ stats_fs_source_add_values(kvm_stats_fs_dir, stats_fs_vcpu_entries, NULL, 0);
+ stats_fs_source_add_values(kvm_stats_fs_dir, stats_fs_vm_entries, NULL, 0);
}
static int kvm_suspend(void)
@@ -4738,7 +4507,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
kvm_preempt_ops.sched_in = kvm_sched_in;
kvm_preempt_ops.sched_out = kvm_sched_out;
- kvm_init_debug();
+ kvm_init_stats_fs();
r = kvm_vfio_ops_init();
WARN_ON(r);
@@ -4767,7 +4536,8 @@ EXPORT_SYMBOL_GPL(kvm_init);
void kvm_exit(void)
{
- debugfs_remove_recursive(kvm_debugfs_dir);
+ stats_fs_source_revoke(kvm_stats_fs_dir);
+ stats_fs_source_put(kvm_stats_fs_dir);
misc_deregister(&kvm_dev);
kmem_cache_destroy(kvm_vcpu_cache);
kvm_async_pf_deinit();
--
2.25.4
* [PATCH v3 4/7] stats_fs fs: virtual fs to show stats to the end-user
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
To: kvm
Cc: Emanuele Giuseppe Esposito, linux-s390, linux-doc, netdev,
Emanuele Giuseppe Esposito, linux-kernel, kvm-ppc, Jonathan Adams,
Christian Borntraeger, Alexander Viro, David Rientjes,
linux-fsdevel, Paolo Bonzini, linux-mips, linuxppc-dev,
linux-arm-kernel, Jim Mattson
In-Reply-To: <20200526110318.69006-1-eesposit@redhat.com>
Add a virtual fs that maps stats_fs sources to directories, and values
(simple or aggregate) to files.
Every time a file is read/cleared, the fs internally invokes the stats_fs
API to get/set the requested value.
Also introduce the optional show function in each value, which allows
customizing how the value is displayed inside a file. This could be
especially useful with enums.
fs/stats_fs/inode.c is very similar to fs/debugfs/inode.c, except that
the API consists only of stats_fs_create_file, stats_fs_create_dir and
stats_fs_remove.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
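Note: the optional show function amounts to a per-value formatting hook
with a numeric fallback. A minimal userspace sketch of that dispatch,
assuming a hypothetical descriptor (struct value_desc) and helpers
(show_on_off, value_format) that are not part of the stats_fs API, and
assuming the hook returns a constant string:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical value descriptor: show is optional and may be NULL. */
struct value_desc {
	const char *name;
	const char *(*show)(uint64_t val);
};

/* Example hook: display a flag as text instead of 0/1. */
const char *show_on_off(uint64_t val)
{
	return val ? "on" : "off";
}

/*
 * Format a value the way a read of its file might: use the hook when one
 * is set, otherwise print the raw number. Returns snprintf's result.
 */
int value_format(const struct value_desc *d, uint64_t val,
		 char *buf, size_t len)
{
	if (d->show)
		return snprintf(buf, len, "%s\n", d->show(val));
	return snprintf(buf, len, "%" PRIu64 "\n", val);
}

/* Small self-check covering both paths. */
void value_format_selfcheck(void)
{
	struct value_desc plain = { "exits", NULL };
	struct value_desc flag = { "enabled", show_on_off };
	char buf[32];

	value_format(&plain, 42, buf, sizeof(buf));
	assert(strcmp(buf, "42\n") == 0);
	value_format(&flag, 1, buf, sizeof(buf));
	assert(strcmp(buf, "on\n") == 0);
}
```

The show hook in this series returns char * and may allocate, so buffer
ownership is handled differently there than in this sketch.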
fs/stats_fs/Makefile | 2 +-
fs/stats_fs/inode.c | 461 +++++++++++++++++++++++++++++++++++++
fs/stats_fs/internal.h | 15 ++
fs/stats_fs/stats_fs.c | 92 +++++++-
include/linux/stats_fs.h | 18 ++
include/uapi/linux/magic.h | 1 +
tools/lib/api/fs/fs.c | 21 ++
7 files changed, 608 insertions(+), 2 deletions(-)
create mode 100644 fs/stats_fs/inode.c
diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
index bc59a54d5721..19b7e13f6c3d 100644
--- a/fs/stats_fs/Makefile
+++ b/fs/stats_fs/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
-stats_fs-objs := stats_fs.o
+stats_fs-objs := inode.o stats_fs.o
stats_fs-tests-objs := stats_fs-tests.o
obj-$(CONFIG_STATS_FS) += stats_fs.o
diff --git a/fs/stats_fs/inode.c b/fs/stats_fs/inode.c
new file mode 100644
index 000000000000..eaa0a8bc7466
--- /dev/null
+++ b/fs/stats_fs/inode.c
@@ -0,0 +1,461 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * inode.c - part of stats_fs, a tiny little stats_fs file system
+ *
+ * Copyright (C) 2020 Emanuele Giuseppe Esposito <eesposit@redhat.com>
+ * Copyright (C) 2020 Redhat
+ */
+#define pr_fmt(fmt) "stats_fs: " fmt
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/init.h>
+#include <linux/stats_fs.h>
+#include <linux/string.h>
+#include <linux/seq_file.h>
+#include <linux/parser.h>
+#include <linux/magic.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+#define STATS_FS_DEFAULT_MODE 0700
+
+static struct simple_fs stats_fs;
+static bool stats_fs_registered;
+
+struct stats_fs_mount_opts {
+ kuid_t uid;
+ kgid_t gid;
+ umode_t mode;
+};
+
+enum {
+ Opt_uid,
+ Opt_gid,
+ Opt_mode,
+ Opt_err
+};
+
+static const match_table_t tokens = {
+ {Opt_uid, "uid=%u"},
+ {Opt_gid, "gid=%u"},
+ {Opt_mode, "mode=%o"},
+ {Opt_err, NULL}
+};
+
+struct stats_fs_fs_info {
+ struct stats_fs_mount_opts mount_opts;
+};
+
+static int stats_fs_parse_options(char *data, struct stats_fs_mount_opts *opts)
+{
+ substring_t args[MAX_OPT_ARGS];
+ int option;
+ int token;
+ kuid_t uid;
+ kgid_t gid;
+ char *p;
+
+ opts->mode = STATS_FS_DEFAULT_MODE;
+
+ while ((p = strsep(&data, ",")) != NULL) {
+ if (!*p)
+ continue;
+
+ token = match_token(p, tokens, args);
+ switch (token) {
+ case Opt_uid:
+ if (match_int(&args[0], &option))
+ return -EINVAL;
+ uid = make_kuid(current_user_ns(), option);
+ if (!uid_valid(uid))
+ return -EINVAL;
+ opts->uid = uid;
+ break;
+ case Opt_gid:
+ if (match_int(&args[0], &option))
+ return -EINVAL;
+ gid = make_kgid(current_user_ns(), option);
+ if (!gid_valid(gid))
+ return -EINVAL;
+ opts->gid = gid;
+ break;
+ case Opt_mode:
+ if (match_octal(&args[0], &option))
+ return -EINVAL;
+ opts->mode = option & S_IALLUGO;
+ break;
+ /*
+ * We might like to report bad mount options here;
+ * but traditionally stats_fs has ignored all mount options
+ */
+ }
+ }
+
+ return 0;
+}
+
+static int stats_fs_apply_options(struct super_block *sb)
+{
+ struct stats_fs_fs_info *fsi = sb->s_fs_info;
+ struct inode *inode = d_inode(sb->s_root);
+ struct stats_fs_mount_opts *opts = &fsi->mount_opts;
+
+ inode->i_mode &= ~S_IALLUGO;
+ inode->i_mode |= opts->mode;
+
+ inode->i_uid = opts->uid;
+ inode->i_gid = opts->gid;
+
+ return 0;
+}
+
+static int stats_fs_remount(struct super_block *sb, int *flags, char *data)
+{
+ int err;
+ struct stats_fs_fs_info *fsi = sb->s_fs_info;
+
+ sync_filesystem(sb);
+ err = stats_fs_parse_options(data, &fsi->mount_opts);
+ if (err)
+ goto fail;
+
+ stats_fs_apply_options(sb);
+
+fail:
+ return err;
+}
+
+static int stats_fs_show_options(struct seq_file *m, struct dentry *root)
+{
+ struct stats_fs_fs_info *fsi = root->d_sb->s_fs_info;
+ struct stats_fs_mount_opts *opts = &fsi->mount_opts;
+
+ if (!uid_eq(opts->uid, GLOBAL_ROOT_UID))
+ seq_printf(m, ",uid=%u",
+ from_kuid_munged(&init_user_ns, opts->uid));
+ if (!gid_eq(opts->gid, GLOBAL_ROOT_GID))
+ seq_printf(m, ",gid=%u",
+ from_kgid_munged(&init_user_ns, opts->gid));
+ if (opts->mode != STATS_FS_DEFAULT_MODE)
+ seq_printf(m, ",mode=%o", opts->mode);
+
+ return 0;
+}
+
+
+static void stats_fs_free_inode(struct inode *inode)
+{
+ kfree(inode->i_private);
+ free_inode_nonrcu(inode);
+}
+
+static const struct super_operations stats_fs_super_operations = {
+ .statfs = simple_statfs,
+ .remount_fs = stats_fs_remount,
+ .show_options = stats_fs_show_options,
+ .free_inode = stats_fs_free_inode,
+};
+
+static int stats_fs_fill_super(struct super_block *sb, void *data, int silent)
+{
+ static const struct tree_descr stats_fs_files[] = {{""}};
+ struct stats_fs_fs_info *fsi;
+ int err;
+
+ fsi = kzalloc(sizeof(struct stats_fs_fs_info), GFP_KERNEL);
+ sb->s_fs_info = fsi;
+ if (!fsi) {
+ err = -ENOMEM;
+ goto fail;
+ }
+
+ err = stats_fs_parse_options(data, &fsi->mount_opts);
+ if (err)
+ goto fail;
+
+ err = simple_fill_super(sb, STATSFS_MAGIC, stats_fs_files);
+ if (err)
+ goto fail;
+
+ sb->s_op = &stats_fs_super_operations;
+
+ stats_fs_apply_options(sb);
+
+ return 0;
+
+fail:
+ kfree(fsi);
+ sb->s_fs_info = NULL;
+ return err;
+}
+
+static struct dentry *stats_fs_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name,
+ void *data)
+{
+ return mount_single(fs_type, flags, data, stats_fs_fill_super);
+}
+
+static struct file_system_type stats_fs_fs_type = {
+ .owner = THIS_MODULE,
+ .name = "statsfs",
+ .mount = stats_fs_mount,
+ .kill_sb = kill_litter_super,
+};
+MODULE_ALIAS_FS("statsfs");
+
+static int stats_fs_u64_attr_get(void *data, u64 *val)
+{
+ int r = -EFAULT;
+ struct stats_fs_data_inode *val_inode =
+ (struct stats_fs_data_inode *)data;
+
+ r = stats_fs_source_get_value(val_inode->src, val_inode->val, val);
+ return r;
+}
+
+static int stats_fs_u64_attr_clear(void *data, u64 val)
+{
+ int r = -EFAULT;
+ struct stats_fs_data_inode *val_inode =
+ (struct stats_fs_data_inode *)data;
+
+ if (val)
+ return -EINVAL;
+
+ r = stats_fs_source_clear(val_inode->src, val_inode->val);
+ return r;
+}
+
+static int stats_fs_u64_attr_open(struct inode *inode, struct file *file)
+{
+ struct stats_fs_data_inode *val_inode;
+ char *fmt;
+
+ val_inode = (struct stats_fs_data_inode *)inode->i_private;
+
+ /* Inodes hold a pointer to the source which is not included in the
+ * refcount, so files can be opened while destroy is running, but
+ * values are removed (base_addr = NULL) before the source is destroyed.
+ */
+ if (!kref_get_unless_zero(&val_inode->src->refcount))
+ return -ENOENT;
+
+ if (is_val_signed(val_inode->val))
+ fmt = "%lld\n";
+ else
+ fmt = "%llu\n";
+
+ if (simple_attr_open(inode, file, stats_fs_u64_attr_get,
+ stats_fs_val_get_mode(val_inode->val) & 0222 ?
+ stats_fs_u64_attr_clear :
+ NULL,
+ fmt)) {
+ stats_fs_source_put(val_inode->src);
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+static ssize_t stats_fs_string_attr_read(struct file *file, char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ const char *str = file->private_data;
+ size_t size = strlen(str);
+ return simple_read_from_buffer(buf, len, ppos, str, size);
+}
+
+static int file_string_attr_open(struct inode *inode, struct file *file,
+ char *str)
+{
+ file->private_data = str;
+ return nonseekable_open(inode, file);
+}
+
+static int stats_fs_string_attr_open(struct inode *inode, struct file *file)
+{
+ struct stats_fs_data_inode *val_inode;
+ char *str;
+ u64 val;
+
+ val_inode = (struct stats_fs_data_inode *)inode->i_private;
+
+ WARN_ON(val_inode->val->value_flag & STATS_FS_FLOATING_VALUE);
+
+ /* Inodes hold a pointer to the source which is not included in the
+ * refcount, so files can be opened while destroy is running, but
+ * values are removed (base_addr = NULL) before the source is destroyed.
+ */
+ if (!kref_get_unless_zero(&val_inode->src->refcount))
+ return -ENOENT;
+
+ stats_fs_source_get_value(val_inode->src, val_inode->val, &val);
+ str = val_inode->val->show(val);
+
+ if (file_string_attr_open(inode, file, str)) {
+ stats_fs_source_put(val_inode->src);
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+static int stats_fs_attr_release(struct inode *inode, struct file *file)
+{
+ struct stats_fs_data_inode *val_inode;
+
+ val_inode = (struct stats_fs_data_inode *)inode->i_private;
+
+ simple_attr_release(inode, file);
+ stats_fs_source_put(val_inode->src);
+
+ return 0;
+}
+
+static const struct file_operations stats_fs_u64_ops = {
+ .owner = THIS_MODULE,
+ .open = stats_fs_u64_attr_open,
+ .release = stats_fs_attr_release,
+ .read = simple_attr_read,
+ .write = simple_attr_write,
+ .llseek = no_llseek,
+};
+
+static const struct file_operations stats_fs_string_ops = {
+ .owner = THIS_MODULE,
+ .open = stats_fs_string_attr_open,
+ .release = stats_fs_attr_release,
+ .read = stats_fs_string_attr_read,
+ .write = simple_attr_write,
+ .llseek = no_llseek,
+};
+
+/**
+ * stats_fs_create_file - create a file in the stats_fs filesystem
+ * @val: a pointer to a stats_fs_value containing all the infos of
+ * the file to create (name, permission)
+ * @src: a pointer to a stats_fs_source containing the dentry of where
+ * to add this file
+ *
+ * This function will return a pointer to a dentry if it succeeds. This
+ * pointer must be passed to the stats_fs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.) If an error occurs, ERR_PTR(-ERROR) will be
+ * returned.
+ *
+ * Val and src are also wrapped in a stats_fs_data_inode struct
+ * that is stored internally as inode->i_private and used by the
+ * get/clear attribute functions (see the file_operations defined above).
+ */
+struct dentry *stats_fs_create_file(struct stats_fs_value *val, struct stats_fs_source *src)
+{
+ struct dentry *dentry;
+ struct inode *inode;
+ struct stats_fs_data_inode *val_inode;
+
+ val_inode = kzalloc(sizeof(struct stats_fs_data_inode), GFP_KERNEL);
+ if (!val_inode) {
+ printk(KERN_ERR
+ "Kzalloc failure in stats_fs_create_file (ENOMEM)\n");
+ return ERR_PTR(-ENOMEM);
+ }
+
+ val_inode->src = src;
+ val_inode->val = val;
+
+
+ dentry = simplefs_create_file(&stats_fs, &stats_fs_fs_type,
+ val->name, stats_fs_val_get_mode(val),
+ src->source_dentry, val_inode, &inode);
+ if (IS_ERR(dentry))
+ return dentry;
+
+ inode->i_fop = val->show ? &stats_fs_string_ops : &stats_fs_u64_ops;
+
+ return simplefs_finish_dentry(dentry, inode);
+}
+/**
+ * stats_fs_create_dir - create a directory in the stats_fs filesystem
+ * @name: a pointer to a string containing the name of the directory to
+ * create.
+ * @parent: a pointer to the parent dentry for this file. This should be a
+ * directory dentry if set. If this parameter is NULL, then the
+ * directory will be created in the root of the stats_fs filesystem.
+ *
+ * This function creates a directory in stats_fs with the given name.
+ *
+ * This function will return a pointer to a dentry if it succeeds. This
+ * pointer must be passed to the stats_fs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.) If an error occurs, ERR_PTR(-ERROR) will be
+ * returned.
+ */
+struct dentry *stats_fs_create_dir(const char *name, struct dentry *parent)
+{
+ struct dentry *dentry;
+ struct inode *inode;
+
+ dentry = simplefs_create_dir(&stats_fs, &stats_fs_fs_type,
+ name, 0755, parent, &inode);
+ if (IS_ERR(dentry))
+ return dentry;
+
+ inode->i_op = &simple_dir_inode_operations;
+ return simplefs_finish_dentry(dentry, inode);
+}
+
+static void remove_one(struct dentry *victim)
+{
+ simple_release_fs(&stats_fs);
+}
+
+/**
+ * stats_fs_remove - recursively removes a directory
+ * @dentry: a pointer to the dentry of the directory to be removed. If this
+ * parameter is NULL or an error value, nothing will be done.
+ *
+ * This function recursively removes a directory tree in stats_fs that
+ * was previously created with a call to another stats_fs function
+ * (like stats_fs_create_file() or variants thereof.)
+ *
+ * This function is required to be called in order for the file to be
+ * removed, no automatic cleanup of files will happen when a module is
+ * removed, you are responsible here.
+ */
+void stats_fs_remove(struct dentry *dentry)
+{
+ if (IS_ERR_OR_NULL(dentry))
+ return;
+
+ simple_pin_fs(&stats_fs, &stats_fs_fs_type);
+ simple_recursive_removal(dentry, remove_one);
+ simple_release_fs(&stats_fs);
+}
+/**
+ * stats_fs_initialized - Tells whether stats_fs has been registered
+ */
+bool stats_fs_initialized(void)
+{
+ return stats_fs_registered;
+}
+EXPORT_SYMBOL_GPL(stats_fs_initialized);
+
+static int __init stats_fs_init(void)
+{
+ int retval;
+
+ retval = sysfs_create_mount_point(kernel_kobj, "stats");
+ if (retval)
+ return retval;
+
+ retval = register_filesystem(&stats_fs_fs_type);
+ if (retval)
+ sysfs_remove_mount_point(kernel_kobj, "stats");
+ else
+ stats_fs_registered = true;
+
+ return retval;
+}
+core_initcall(stats_fs_init);
diff --git a/fs/stats_fs/internal.h b/fs/stats_fs/internal.h
index 4993afbb1e45..50977f332feb 100644
--- a/fs/stats_fs/internal.h
+++ b/fs/stats_fs/internal.h
@@ -16,4 +16,19 @@ struct stats_fs_value_source {
struct list_head list_element;
};
+struct stats_fs_data_inode {
+ struct stats_fs_source *src;
+ struct stats_fs_value *val;
+};
+
+struct dentry *stats_fs_create_file(struct stats_fs_value *val,
+ struct stats_fs_source *src);
+
+struct dentry *stats_fs_create_dir(const char *name, struct dentry *parent);
+
+void stats_fs_remove(struct dentry *dentry);
+#define stats_fs_remove_recursive stats_fs_remove
+
+int is_val_signed(struct stats_fs_value *val);
+
#endif /* _STATS_FS_INTERNAL_H_ */
diff --git a/fs/stats_fs/stats_fs.c b/fs/stats_fs/stats_fs.c
index b76ee44f6dac..f99c3ae6ce25 100644
--- a/fs/stats_fs/stats_fs.c
+++ b/fs/stats_fs/stats_fs.c
@@ -39,11 +39,35 @@ STATS_FS_TYPE_STRUCT_US(32)
STATS_FS_TYPE_STRUCT_US(64)
STATS_FS_TYPE_STRUCT(bool)
-static int is_val_signed(struct stats_fs_value *val)
+static void stats_fs_source_remove_files(struct stats_fs_source *src);
+
+int is_val_signed(struct stats_fs_value *val)
{
return val->type->sign;
}
+/* Called with rwsem held for writing */
+static void stats_fs_source_remove_files_locked(struct stats_fs_source *src)
+{
+ struct stats_fs_source *child;
+
+ if (src->source_dentry == NULL)
+ return;
+
+ list_for_each_entry (child, &src->subordinates_head, list_element)
+ stats_fs_source_remove_files(child);
+
+ stats_fs_remove_recursive(src->source_dentry);
+ src->source_dentry = NULL;
+}
+
+static void stats_fs_source_remove_files(struct stats_fs_source *src)
+{
+ down_write(&src->rwsem);
+ stats_fs_source_remove_files_locked(src);
+ up_write(&src->rwsem);
+}
+
static struct stats_fs_value *find_value(struct stats_fs_value_source *src,
struct stats_fs_value *val)
{
@@ -74,6 +98,63 @@ search_value_in_source(struct stats_fs_source *src, struct stats_fs_value *arg,
return NULL;
}
+/* Called with rwsem held for writing */
+static void stats_fs_create_files_locked(struct stats_fs_source *source)
+{
+ struct stats_fs_value_source *val_src;
+ struct stats_fs_value *val;
+
+ if (!source->source_dentry)
+ return;
+
+ list_for_each_entry (val_src, &source->values_head, list_element) {
+ if (val_src->files_created ||
+ (val_src->common_flags & STATS_FS_HIDDEN))
+ continue;
+
+ for (val = val_src->values; val->name; val++)
+ stats_fs_create_file(val, source);
+
+ val_src->files_created = true;
+ }
+}
+
+/* Called with rwsem held for writing */
+static void
+stats_fs_create_files_recursive_locked(struct stats_fs_source *source,
+ struct dentry *parent_dentry)
+{
+ struct stats_fs_source *child;
+
+ /* first check values in this folder, since it might be new */
+ if (!source->source_dentry && !(source->common_flags & STATS_FS_HIDDEN)) {
+ source->source_dentry =
+ stats_fs_create_dir(source->name, parent_dentry);
+ }
+
+ stats_fs_create_files_locked(source);
+
+ list_for_each_entry (child, &source->subordinates_head, list_element) {
+ if (child->source_dentry == NULL) {
+ /* assume that if a child has a folder,
+ * its sub-children have one too.
+ */
+ down_write(&child->rwsem);
+ stats_fs_create_files_recursive_locked(
+ child, source->source_dentry);
+ up_write(&child->rwsem);
+ }
+ }
+}
+
+void stats_fs_source_register(struct stats_fs_source *source)
+{
+ down_write(&source->rwsem);
+ stats_fs_create_files_recursive_locked(source, NULL);
+ up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_register);
+
/* Called with rwsem held for writing */
static struct stats_fs_value_source *create_value_source(void *base, uint32_t flags)
{
@@ -112,6 +193,9 @@ int stats_fs_source_add_values(struct stats_fs_source *source,
/* add the val_src to the source list */
list_add(&val_src->list_element, &source->values_head);
+ /* create the files if the source already has a dentry */
+ stats_fs_create_files_locked(source);
+
up_write(&source->rwsem);
return 0;
@@ -125,6 +209,9 @@ void stats_fs_source_add_subordinate(struct stats_fs_source *source,
stats_fs_source_get(sub);
list_add(&sub->list_element, &source->subordinates_head);
+ if (source->source_dentry)
+ stats_fs_create_files_recursive_locked(sub,
+ source->source_dentry);
up_write(&source->rwsem);
}
@@ -141,6 +228,7 @@ stats_fs_source_remove_subordinate_locked(struct stats_fs_source *source,
list_element) {
if (src_entry == sub) {
list_del_init(&src_entry->list_element);
+ stats_fs_source_remove_files(src_entry);
stats_fs_source_put(src_entry);
return;
}
@@ -505,6 +593,8 @@ static void stats_fs_source_destroy(struct kref *kref_source)
stats_fs_source_remove_subordinate_locked(source, child);
}
+ stats_fs_source_remove_files_locked(source);
+
up_write(&source->rwsem);
kfree(source->name);
kfree(source);
diff --git a/include/linux/stats_fs.h b/include/linux/stats_fs.h
index 93847383f597..db1c3ae9ff8b 100644
--- a/include/linux/stats_fs.h
+++ b/include/linux/stats_fs.h
@@ -52,6 +52,9 @@ struct stats_fs_value {
enum stat_aggr aggr_kind;
uint32_t value_flag;
+
+ /* optional show function */
+ char *(*show)(uint64_t);
};
struct stats_fs_source {
@@ -144,6 +147,18 @@ extern const struct stats_fs_type stats_fs_type_bool;
struct stats_fs_source *stats_fs_source_create(uint32_t flags, const char *fmt,
...);
+/**
+ * stats_fs_source_register - register a source in the stats_fs filesystem
+ * @source: a pointer to the source that will be registered
+ *
+ * Add the given source as a direct child of /sys/kernel/stats.
+ * Then recursively walk its subordinates and create any folders and
+ * files that do not exist yet. All subsequent add_subordinate calls
+ * on a source registered with this function will create the
+ * corresponding files and directories.
+ */
+void stats_fs_source_register(struct stats_fs_source *source);
+
/**
* stats_fs_source_add_values - adds values to the given source
* @source: a pointer to the source that will receive the values
@@ -307,6 +322,9 @@ static inline struct stats_fs_source *stats_fs_source_create(uint32_t flags,
return ERR_PTR(-ENODEV);
}
+static inline void stats_fs_source_register(struct stats_fs_source *source)
+{ }
+
static inline int stats_fs_source_add_values(struct stats_fs_source *source,
struct stats_fs_value *val,
void *base_ptr, uint32_t flags)
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index d78064007b17..46c66ea3fc9e 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -10,6 +10,7 @@
#define CRAMFS_MAGIC 0x28cd3d45 /* some random number */
#define CRAMFS_MAGIC_WEND 0x453dcd28 /* magic number with the wrong endianess */
#define DEBUGFS_MAGIC 0x64626720
+#define STATSFS_MAGIC 0x73746174
#define SECURITYFS_MAGIC 0x73636673
#define SELINUX_MAGIC 0xf97cff8c
#define SMACK_MAGIC 0x43415d53 /* "SMAC" */
diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 027b18f7ed8c..c512e69b3cfe 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -35,6 +35,10 @@
#define TRACEFS_MAGIC 0x74726163
#endif
+#ifndef STATSFS_MAGIC
+#define STATSFS_MAGIC 0x73746174
+#endif
+
#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6
#endif
@@ -76,6 +80,16 @@ static const char * const tracefs__known_mountpoints[] = {
0,
};
+#ifndef STATSFS_DEFAULT_PATH
+#define STATSFS_DEFAULT_PATH "/sys/kernel/stats"
+#endif
+
+static const char * const statsfs__known_mountpoints[] = {
+ STATSFS_DEFAULT_PATH,
+ "/stats",
+ 0,
+};
+
static const char * const hugetlbfs__known_mountpoints[] = {
0,
};
@@ -100,6 +114,7 @@ enum {
FS__TRACEFS = 3,
FS__HUGETLBFS = 4,
FS__BPF_FS = 5,
+ FS__STATSFS = 6,
};
#ifndef TRACEFS_MAGIC
@@ -127,6 +142,11 @@ static struct fs fs__entries[] = {
.mounts = tracefs__known_mountpoints,
.magic = TRACEFS_MAGIC,
},
+ [FS__STATSFS] = {
+ .name = "statsfs",
+ .mounts = statsfs__known_mountpoints,
+ .magic = STATSFS_MAGIC,
+ },
[FS__HUGETLBFS] = {
.name = "hugetlbfs",
.mounts = hugetlbfs__known_mountpoints,
@@ -297,6 +317,7 @@ FS(sysfs, FS__SYSFS);
FS(procfs, FS__PROCFS);
FS(debugfs, FS__DEBUGFS);
FS(tracefs, FS__TRACEFS);
+FS(statsfs, FS__STATSFS);
FS(hugetlbfs, FS__HUGETLBFS);
FS(bpf_fs, FS__BPF_FS);
--
2.25.4
* [PATCH v3 3/7] kunit: tests for stats_fs API
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
To: kvm
Cc: Emanuele Giuseppe Esposito, linux-s390, linux-doc, netdev,
Emanuele Giuseppe Esposito, linux-kernel, kvm-ppc, Jonathan Adams,
Christian Borntraeger, Alexander Viro, David Rientjes,
linux-fsdevel, Paolo Bonzini, linux-mips, linuxppc-dev,
linux-arm-kernel, Jim Mattson
In-Reply-To: <20200526110318.69006-1-eesposit@redhat.com>
Add kunit tests to extensively test the stats_fs API functionality.
In order to run them, the kernel .config must set CONFIG_KUNIT=y
and a new .kunitconfig file must be created with CONFIG_STATS_FS=y
and CONFIG_STATS_FS_TEST=y.
The tests can then be started by running the following command from the
root directory of the Linux kernel source tree:
./tools/testing/kunit/kunit.py run --timeout=30 --jobs=`nproc --all`
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
fs/Kconfig | 6 +
fs/stats_fs/Makefile | 2 +
fs/stats_fs/stats_fs-tests.c | 1097 ++++++++++++++++++++++++++++++++++
3 files changed, 1105 insertions(+)
create mode 100644 fs/stats_fs/stats_fs-tests.c
diff --git a/fs/Kconfig b/fs/Kconfig
index 684ad61129ab..02bbb0e4cdf7 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -227,6 +227,12 @@ config STATS_FS
stats_fs is a virtual file system that provides counters and
other statistics about the running kernel.
+config STATS_FS_TEST
+ bool "Tests for stats_fs"
+ depends on STATS_FS && KUNIT
+ help
+ tests for the stats_fs API.
+
config STATS_FS_API
bool
imply STATS_FS
diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
index bd988daa4c39..bc59a54d5721 100644
--- a/fs/stats_fs/Makefile
+++ b/fs/stats_fs/Makefile
@@ -1,5 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
stats_fs-objs := stats_fs.o
+stats_fs-tests-objs := stats_fs-tests.o
obj-$(CONFIG_STATS_FS) += stats_fs.o
obj-$(CONFIG_STATS_FS_STUB) += stub.o
+obj-$(CONFIG_STATS_FS_TEST) += stats_fs-tests.o
diff --git a/fs/stats_fs/stats_fs-tests.c b/fs/stats_fs/stats_fs-tests.c
new file mode 100644
index 000000000000..bbac133d7fe7
--- /dev/null
+++ b/fs/stats_fs/stats_fs-tests.c
@@ -0,0 +1,1097 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/anon_inodes.h>
+#include <linux/spinlock.h>
+#include <linux/uaccess.h>
+#include <linux/rwsem.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+
+#include <linux/limits.h>
+#include <linux/stats_fs.h>
+#include <kunit/test.h>
+#include "internal.h"
+
+#define STATS_FS_STAT(el, x, ...) \
+ { \
+ .name = #x, .offset = offsetof(struct container, el.x), \
+ ##__VA_ARGS__ \
+ }
+
+#define ARR_SIZE(el) ((int)(sizeof(el) / sizeof(struct stats_fs_value) - 1))
+
+struct test_values_struct {
+ uint64_t u64;
+ int32_t s32;
+ bool bo;
+ uint8_t u8;
+ int16_t s16;
+};
+
+struct container {
+ struct test_values_struct vals;
+};
+
+struct stats_fs_value test_values[6] = {
+ STATS_FS_STAT(vals, u64, .type = &stats_fs_type_u64,
+ .aggr_kind = STATS_FS_NONE,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+ .aggr_kind = STATS_FS_NONE),
+ STATS_FS_STAT(vals, bo, .type = &stats_fs_type_bool,
+ .aggr_kind = STATS_FS_NONE,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ STATS_FS_STAT(vals, u8, .type = &stats_fs_type_u8,
+ .aggr_kind = STATS_FS_NONE),
+ STATS_FS_STAT(vals, s16, .type = &stats_fs_type_s16,
+ .aggr_kind = STATS_FS_NONE,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ { NULL },
+};
+
+struct stats_fs_value test_aggr[4] = {
+ STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+ .aggr_kind = STATS_FS_MIN,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ STATS_FS_STAT(vals, bo, .type = &stats_fs_type_bool,
+ .aggr_kind = STATS_FS_MAX,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ STATS_FS_STAT(vals, u64, .type = &stats_fs_type_u64,
+ .aggr_kind = STATS_FS_SUM,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ { NULL },
+};
+
+struct stats_fs_value test_same_name[3] = {
+ STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+ .aggr_kind = STATS_FS_NONE),
+ STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+ .aggr_kind = STATS_FS_MIN),
+ { NULL },
+};
+
+struct stats_fs_value test_all_aggr[6] = {
+ STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+ .aggr_kind = STATS_FS_MIN),
+ STATS_FS_STAT(vals, bo, .type = &stats_fs_type_bool,
+ .aggr_kind = STATS_FS_COUNT_ZERO,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ STATS_FS_STAT(vals, u64, .type = &stats_fs_type_u64,
+ .aggr_kind = STATS_FS_SUM),
+ STATS_FS_STAT(vals, u8, .type = &stats_fs_type_u8,
+ .aggr_kind = STATS_FS_AVG,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ STATS_FS_STAT(vals, s16, .type = &stats_fs_type_s16,
+ .aggr_kind = STATS_FS_MAX,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+ { NULL },
+};
+
+#define def_u64 ((uint64_t)64)
+
+#define def_val_s32 ((int32_t)S32_MIN)
+#define def_val_bool ((bool)true)
+#define def_val_u8 ((uint8_t)127)
+#define def_val_s16 ((int16_t)10000)
+
+#define def_val2_s32 ((int32_t)S16_MAX)
+#define def_val2_bool ((bool)false)
+#define def_val2_u8 ((uint8_t)255)
+#define def_val2_s16 ((int16_t)-20000)
+
+struct container cont = {
+ .vals = {
+ .u64 = def_u64,
+ .s32 = def_val_s32,
+ .bo = def_val_bool,
+ .u8 = def_val_u8,
+ .s16 = def_val_s16,
+ },
+};
+
+struct container cont2 = {
+ .vals = {
+ .u64 = def_u64,
+ .s32 = def_val2_s32,
+ .bo = def_val2_bool,
+ .u8 = def_val2_u8,
+ .s16 = def_val2_s16,
+ },
+};
+
+static void get_stats_at_addr(struct stats_fs_source *src, void *addr,
+ int *aggr, int *val, int use_addr)
+{
+ struct stats_fs_value *entry;
+ struct stats_fs_value_source *src_entry;
+ int counter_val = 0, counter_aggr = 0;
+
+ list_for_each_entry (src_entry, &src->values_head, list_element) {
+ if (use_addr && src_entry->base_addr != addr)
+ continue;
+
+ for (entry = src_entry->values; entry->name; entry++) {
+ if (entry->aggr_kind == STATS_FS_NONE)
+ counter_val++;
+ else
+ counter_aggr++;
+ }
+ }
+
+ if (aggr)
+ *aggr = counter_aggr;
+
+ if (val)
+ *val = counter_val;
+}
+
+int source_has_subsource(struct stats_fs_source *src,
+ struct stats_fs_source *sub)
+{
+ struct stats_fs_source *entry;
+
+ list_for_each_entry (entry, &src->subordinates_head, list_element) {
+ if (entry == sub)
+ return 1;
+ }
+ return 0;
+}
+
+int get_number_subsources(struct stats_fs_source *src)
+{
+ struct stats_fs_source *entry;
+ int counter = 0;
+
+ list_for_each_entry (entry, &src->subordinates_head, list_element) {
+ counter++;
+ }
+ return counter;
+}
+
+int get_number_values(struct stats_fs_source *src)
+{
+ int counter = 0;
+
+ get_stats_at_addr(src, NULL, NULL, &counter, 0);
+ return counter;
+}
+
+int get_total_number_values(struct stats_fs_source *src)
+{
+ struct stats_fs_source *sub_entry;
+ int counter = 0;
+
+ get_stats_at_addr(src, NULL, NULL, &counter, 0);
+
+ list_for_each_entry (sub_entry, &src->subordinates_head, list_element) {
+ counter += get_total_number_values(sub_entry);
+ }
+
+ return counter;
+}
+
+int get_number_aggregates(struct stats_fs_source *src)
+{
+ int counter = 0;
+
+ get_stats_at_addr(src, NULL, &counter, NULL, 1);
+ return counter;
+}
+
+int get_number_values_with_base(struct stats_fs_source *src, void *addr)
+{
+ int counter = 0;
+
+ get_stats_at_addr(src, addr, NULL, &counter, 1);
+ return counter;
+}
+
+int get_number_aggr_with_base(struct stats_fs_source *src, void *addr)
+{
+ int counter = 0;
+
+ get_stats_at_addr(src, addr, &counter, NULL, 1);
+ return counter;
+}
+
+static void test_empty_folder(struct kunit *test)
+{
+ struct stats_fs_source *src;
+
+ src = stats_fs_source_create(0, "kvm_%d", 123);
+ KUNIT_EXPECT_EQ(test, strcmp(src->name, "kvm_123"), 0);
+ KUNIT_EXPECT_EQ(test, get_number_subsources(src), 0);
+ KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+ stats_fs_source_put(src);
+}
+
+static void test_add_subfolder(struct kunit *test)
+{
+ struct stats_fs_source *src, *sub;
+
+ src = stats_fs_source_create(0, "parent");
+ sub = stats_fs_source_create(0, "child");
+ stats_fs_source_add_subordinate(src, sub);
+ KUNIT_EXPECT_EQ(test, source_has_subsource(src, sub), true);
+ KUNIT_EXPECT_EQ(test, get_number_subsources(src), 1);
+ KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+ KUNIT_EXPECT_EQ(test, get_number_values(sub), 0);
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(sub), 0);
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+ stats_fs_source_put(sub);
+ sub = stats_fs_source_create(0, "not a child");
+ KUNIT_EXPECT_EQ(test, source_has_subsource(src, sub), false);
+ KUNIT_EXPECT_EQ(test, get_number_subsources(src), 1);
+
+ stats_fs_source_put(sub);
+ stats_fs_source_put(src);
+}
+
+static void test_add_value(struct kunit *test)
+{
+ struct stats_fs_source *src;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+
+ // add values
+ n = stats_fs_source_add_values(src, test_values, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_values_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+ // add same values, nothing happens
+ n = stats_fs_source_add_values(src, test_values, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, -EEXIST);
+ n = get_number_values_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+ // size is unchanged
+ KUNIT_EXPECT_EQ(test, get_number_values(src), ARR_SIZE(test_values));
+
+ // no aggregates
+ n = get_number_aggr_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, get_number_values(src), ARR_SIZE(test_values));
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+
+ stats_fs_source_put(src);
+}
+
+static void test_add_value_in_subfolder(struct kunit *test)
+{
+ struct stats_fs_source *src, *sub, *sub_not;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ sub = stats_fs_source_create(0, "child");
+
+ // src -> sub
+ stats_fs_source_add_subordinate(src, sub);
+
+ // add values
+ n = stats_fs_source_add_values(sub, test_values, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_values_with_base(sub, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+ KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+ ARR_SIZE(test_values));
+
+ KUNIT_EXPECT_EQ(test, get_number_values(sub), ARR_SIZE(test_values));
+ // no aggregates in sub
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(sub), 0);
+
+ // different folder
+ sub_not = stats_fs_source_create(0, "not a child");
+
+ // add values
+ n = stats_fs_source_add_values(sub_not, test_values, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_values_with_base(sub_not, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+ KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+ ARR_SIZE(test_values));
+
+ // remove sub, check that the number of values is 0
+ stats_fs_source_remove_subordinate(src, sub);
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+ // re-add sub, check that the values are added back
+ stats_fs_source_add_subordinate(src, sub);
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+ ARR_SIZE(test_values));
+
+ // add sub_not, check that there are twice as many values
+ stats_fs_source_add_subordinate(src, sub_not);
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+ ARR_SIZE(test_values) * 2);
+
+ KUNIT_EXPECT_EQ(test, get_number_values(sub_not),
+ ARR_SIZE(test_values));
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(sub_not), 0);
+
+ stats_fs_source_put(sub);
+ stats_fs_source_put(sub_not);
+ stats_fs_source_put(src);
+}
+
+static void test_search_value(struct kunit *test)
+{
+ struct stats_fs_source *src;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+
+ // add values
+ n = stats_fs_source_add_values(src, test_values, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_values_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+ // get u64
+ n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, ret, def_u64);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, ((bool)ret), def_val_bool);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ // get a non-added value
+ n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ stats_fs_source_put(src);
+}
+
+static void test_search_value_in_subfolder(struct kunit *test)
+{
+ struct stats_fs_source *src, *sub;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ sub = stats_fs_source_create(0, "child");
+
+ // src -> sub
+ stats_fs_source_add_subordinate(src, sub);
+
+ // add values to sub
+ n = stats_fs_source_add_values(sub, test_values, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_values_with_base(sub, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+ n = stats_fs_source_get_value_by_name(sub, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, ret, def_u64);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ n = stats_fs_source_get_value_by_name(sub, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ n = stats_fs_source_get_value_by_name(sub, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, ((bool)ret), def_val_bool);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ n = stats_fs_source_get_value_by_name(sub, "does not exist", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+ n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ stats_fs_source_put(sub);
+ stats_fs_source_put(src);
+}
+
+static void test_search_value_in_empty_folder(struct kunit *test)
+{
+ struct stats_fs_source *src;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "empty folder");
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+ KUNIT_EXPECT_EQ(test, get_number_subsources(src), 0);
+ KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+
+ n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ stats_fs_source_put(src);
+}
+
+static void test_add_aggregate(struct kunit *test)
+{
+ struct stats_fs_source *src;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+
+ // add aggr to src, no values
+ n = stats_fs_source_add_values(src, test_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_values_with_base(src, NULL);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ // count aggregates
+ n = get_number_aggr_with_base(src, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+ // add same array again, should not be added
+ n = stats_fs_source_add_values(src, test_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, -EEXIST);
+ n = get_number_aggr_with_base(src, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+ KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(src), ARR_SIZE(test_aggr));
+
+ stats_fs_source_put(src);
+}
+
+static void test_add_aggregate_in_subfolder(struct kunit *test)
+{
+ struct stats_fs_source *src, *sub, *sub_not;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ sub = stats_fs_source_create(0, "child");
+ // src->sub
+ stats_fs_source_add_subordinate(src, sub);
+
+ // add aggr to sub
+ n = stats_fs_source_add_values(sub, test_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+ KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+ KUNIT_EXPECT_EQ(test, get_number_values(sub), 0);
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(sub), ARR_SIZE(test_aggr));
+
+ // not a child
+ sub_not = stats_fs_source_create(0, "not a child");
+
+ // add aggr to "not a child"
+ n = stats_fs_source_add_values(sub_not, test_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub_not, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+ KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+ // remove sub
+ stats_fs_source_remove_subordinate(src, sub);
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+ // re-add both
+ stats_fs_source_add_subordinate(src, sub);
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+ stats_fs_source_add_subordinate(src, sub_not);
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+ KUNIT_EXPECT_EQ(test, get_number_values(sub_not), 0);
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(sub_not),
+ ARR_SIZE(test_aggr));
+
+ stats_fs_source_put(sub);
+ stats_fs_source_put(sub_not);
+ stats_fs_source_put(src);
+}
+
+static void test_search_aggregate(struct kunit *test)
+{
+ struct stats_fs_source *src;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ n = stats_fs_source_add_values(src, test_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(src, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+ n = get_number_aggr_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MAX);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+ stats_fs_source_put(src);
+}
+
+static void test_search_aggregate_in_subfolder(struct kunit *test)
+{
+ struct stats_fs_source *src, *sub;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ sub = stats_fs_source_create(0, "child");
+
+ stats_fs_source_add_subordinate(src, sub);
+
+ n = stats_fs_source_add_values(sub, test_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+ n = get_number_aggr_with_base(sub, &cont);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ // u64 aggregate exists in sub, but there are no values to sum
+ n = stats_fs_source_get_value_by_name(sub, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ n = stats_fs_source_get_value_by_name(sub, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MAX);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ n = stats_fs_source_get_value_by_name(sub, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ n = stats_fs_source_get_value_by_name(sub, "does not exist", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+ n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ stats_fs_source_put(sub);
+ stats_fs_source_put(src);
+}
+
+static void test_search_same(struct kunit *test)
+{
+ struct stats_fs_source *src;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ n = stats_fs_source_add_values(src, test_same_name, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_values_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, 1);
+ n = get_number_aggr_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, 1);
+
+ n = stats_fs_source_add_values(src, test_same_name, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, -EEXIST);
+ n = get_number_values_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, 1);
+ n = get_number_aggr_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, 1);
+
+ // the simple value is returned first
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ stats_fs_source_put(src);
+}
+
+static void test_add_mixed(struct kunit *test)
+{
+ struct stats_fs_source *src;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+
+ n = stats_fs_source_add_values(src, test_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_values_with_base(src, NULL);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = stats_fs_source_add_values(src, test_values, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(src, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+ n = stats_fs_source_add_values(src, test_values, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, -EEXIST);
+ n = get_number_values_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+ n = stats_fs_source_add_values(src, test_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, -EEXIST);
+ n = get_number_aggr_with_base(src, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+ KUNIT_EXPECT_EQ(test, get_number_values(src), ARR_SIZE(test_values));
+ KUNIT_EXPECT_EQ(test, get_number_aggregates(src), ARR_SIZE(test_aggr));
+ stats_fs_source_put(src);
+}
+
+static void test_search_mixed(struct kunit *test)
+{
+ struct stats_fs_source *src, *sub;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ sub = stats_fs_source_create(0, "child");
+ stats_fs_source_add_subordinate(src, sub);
+
+ // src has the aggregates, sub the values. Just search
+ n = stats_fs_source_add_values(sub, test_values, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_values_with_base(sub, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+ n = stats_fs_source_add_values(src, test_aggr, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+ // u64 is sum so again same value
+ n = stats_fs_source_get_value_by_name(sub, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, ret, def_u64);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, ret, def_u64);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ // s32 is min so return the value also in the aggregate
+ n = stats_fs_source_get_value_by_name(sub, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ // bo is max
+ n = stats_fs_source_get_value_by_name(sub, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, (bool)ret, def_val_bool);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, (bool)ret, def_val_bool);
+ KUNIT_EXPECT_EQ(test, n, 0);
+
+ n = stats_fs_source_get_value_by_name(sub, "does not exist", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+ n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+ stats_fs_source_put(sub);
+ stats_fs_source_put(src);
+}
+
+static void test_all_aggregations_agg_val_val(struct kunit *test)
+{
+ struct stats_fs_source *src, *sub1, *sub2;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ sub1 = stats_fs_source_create(0, "child1");
+ sub2 = stats_fs_source_create(0, "child2");
+ stats_fs_source_add_subordinate(src, sub1);
+ stats_fs_source_add_subordinate(src, sub2);
+
+ n = stats_fs_source_add_values(sub1, test_all_aggr, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub1, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+ n = stats_fs_source_add_values(sub2, test_all_aggr, &cont2, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub2, &cont2);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ n = stats_fs_source_add_values(src, test_all_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(src, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ // sum
+ n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, def_u64 * 2);
+
+ // min
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+
+ // count_0
+ n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+ // avg
+ n = stats_fs_source_get_value_by_name(src, "u8", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, 191ull);
+
+ // max
+ n = stats_fs_source_get_value_by_name(src, "s16", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+
+ stats_fs_source_put(sub1);
+ stats_fs_source_put(sub2);
+ stats_fs_source_put(src);
+}
+
+static void test_all_aggregations_val_agg_val(struct kunit *test)
+{
+ struct stats_fs_source *src, *sub1, *sub2;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ sub1 = stats_fs_source_create(0, "child1");
+ sub2 = stats_fs_source_create(0, "child2");
+ stats_fs_source_add_subordinate(src, sub1);
+ stats_fs_source_add_subordinate(src, sub2);
+
+ n = stats_fs_source_add_values(src, test_all_aggr, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(src, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+ n = stats_fs_source_add_values(sub2, test_all_aggr, &cont2, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub2, &cont2);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ n = stats_fs_source_add_values(sub1, test_all_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub1, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, def_u64);
+ n = stats_fs_source_get_value_by_name(sub1, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ n = stats_fs_source_get_value_by_name(sub2, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, def_u64);
+
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+ n = stats_fs_source_get_value_by_name(sub1, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MAX); // MIN
+ n = stats_fs_source_get_value_by_name(sub2, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val2_s32);
+
+ n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (bool)ret, def_val_bool);
+ n = stats_fs_source_get_value_by_name(sub1, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ n = stats_fs_source_get_value_by_name(sub2, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (bool)ret, def_val2_bool);
+
+ n = stats_fs_source_get_value_by_name(src, "u8", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (uint8_t)ret, def_val_u8);
+ n = stats_fs_source_get_value_by_name(sub1, "u8", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, 0ull);
+ n = stats_fs_source_get_value_by_name(sub2, "u8", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (uint8_t)ret, def_val2_u8);
+
+ n = stats_fs_source_get_value_by_name(src, "s16", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+ n = stats_fs_source_get_value_by_name(sub1, "s16", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MIN); // MAX
+ n = stats_fs_source_get_value_by_name(sub2, "s16", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val2_s16);
+
+ stats_fs_source_put(sub1);
+ stats_fs_source_put(sub2);
+ stats_fs_source_put(src);
+}
+
+static void test_all_aggregations_agg_val_val_sub(struct kunit *test)
+{
+ struct stats_fs_source *src, *sub1, *sub11;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ sub1 = stats_fs_source_create(0, "child1");
+ sub11 = stats_fs_source_create(0, "child11");
+ stats_fs_source_add_subordinate(src, sub1);
+ stats_fs_source_add_subordinate(sub1, sub11); // changes here!
+
+ n = stats_fs_source_add_values(sub1, test_values, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_values_with_base(sub1, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+ n = stats_fs_source_add_values(sub11, test_values, &cont2, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_values_with_base(sub11, &cont2);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+ ARR_SIZE(test_values) * 2);
+
+ n = stats_fs_source_add_values(sub1, test_all_aggr, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub1, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+ n = stats_fs_source_add_values(sub11, test_all_aggr, &cont2, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub11, &cont2);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ n = stats_fs_source_add_values(src, test_all_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(src, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ // sum
+ n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, def_u64 * 2);
+
+ // min
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+
+ // count_0
+ n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+ // avg
+ n = stats_fs_source_get_value_by_name(src, "u8", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, 191ull);
+
+ // max
+ n = stats_fs_source_get_value_by_name(src, "s16", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+
+ stats_fs_source_put(sub1);
+ stats_fs_source_put(sub11);
+ stats_fs_source_put(src);
+}
+
+static void test_all_aggregations_agg_no_val_sub(struct kunit *test)
+{
+ struct stats_fs_source *src, *sub1, *sub11;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ sub1 = stats_fs_source_create(0, "child1");
+ sub11 = stats_fs_source_create(0, "child11");
+ stats_fs_source_add_subordinate(src, sub1);
+ stats_fs_source_add_subordinate(sub1, sub11);
+
+ n = stats_fs_source_add_values(sub11, test_all_aggr, &cont2, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub11, &cont2);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+ n = stats_fs_source_add_values(src, test_all_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(src, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ // sum
+ n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, def_u64);
+
+ // min
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val2_s32);
+
+ // count_0
+ n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+ // avg
+ n = stats_fs_source_get_value_by_name(src, "u8", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (uint8_t)ret, def_val2_u8);
+
+ // max
+ n = stats_fs_source_get_value_by_name(src, "s16", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val2_s16);
+
+ stats_fs_source_put(sub1);
+ stats_fs_source_put(sub11);
+ stats_fs_source_put(src);
+}
+
+static void test_all_aggregations_agg_agg_val_sub(struct kunit *test)
+{
+ struct stats_fs_source *src, *sub1, *sub11, *sub12;
+ uint64_t ret;
+ int n;
+
+ src = stats_fs_source_create(0, "parent");
+ sub1 = stats_fs_source_create(0, "child1");
+ sub11 = stats_fs_source_create(0, "child11");
+ sub12 = stats_fs_source_create(0, "child12");
+ stats_fs_source_add_subordinate(src, sub1);
+ stats_fs_source_add_subordinate(sub1, sub11);
+ stats_fs_source_add_subordinate(sub1, sub12);
+
+ n = stats_fs_source_add_values(sub11, test_all_aggr, &cont2, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub11, &cont2);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ n = stats_fs_source_add_values(sub12, test_all_aggr, &cont, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub12, &cont);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+ n = stats_fs_source_add_values(src, test_all_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(src, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ n = stats_fs_source_add_values(sub1, test_all_aggr, NULL, 0);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ n = get_number_aggr_with_base(sub1, NULL);
+ KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+ // sum
+ n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, def_u64 * 2);
+
+ // min
+ n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+
+ // count_0
+ n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+ // avg
+ n = stats_fs_source_get_value_by_name(src, "u8", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (uint8_t)ret,
+ (uint8_t)((def_val2_u8 + def_val_u8) / 2));
+
+ // max
+ n = stats_fs_source_get_value_by_name(src, "s16", &ret);
+ KUNIT_EXPECT_EQ(test, n, 0);
+ KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+
+ stats_fs_source_put(sub1);
+ stats_fs_source_put(sub11);
+ stats_fs_source_put(sub12);
+ stats_fs_source_put(src);
+}
+
+static struct kunit_case stats_fs_test_cases[] = {
+ KUNIT_CASE(test_empty_folder),
+ KUNIT_CASE(test_add_subfolder),
+ KUNIT_CASE(test_add_value),
+ KUNIT_CASE(test_add_value_in_subfolder),
+ KUNIT_CASE(test_search_value),
+ KUNIT_CASE(test_search_value_in_subfolder),
+ KUNIT_CASE(test_search_value_in_empty_folder),
+ KUNIT_CASE(test_add_aggregate),
+ KUNIT_CASE(test_add_aggregate_in_subfolder),
+ KUNIT_CASE(test_search_aggregate),
+ KUNIT_CASE(test_search_aggregate_in_subfolder),
+ KUNIT_CASE(test_search_same),
+ KUNIT_CASE(test_add_mixed),
+ KUNIT_CASE(test_search_mixed),
+ KUNIT_CASE(test_all_aggregations_agg_val_val),
+ KUNIT_CASE(test_all_aggregations_val_agg_val),
+ KUNIT_CASE(test_all_aggregations_agg_val_val_sub),
+ KUNIT_CASE(test_all_aggregations_agg_no_val_sub),
+ KUNIT_CASE(test_all_aggregations_agg_agg_val_sub),
+ {}
+};
+
+static struct kunit_suite stats_fs_test_suite = {
+ .name = "stats_fs",
+ .test_cases = stats_fs_test_cases,
+};
+
+kunit_test_suite(stats_fs_test_suite);
--
2.25.4
^ permalink raw reply related
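The KUnit tests in the patch above rely on two ideas: a simple value is read from a base pointer plus a byte offset, and an aggregate folds the same value descriptor over every subordinate source. They also expect an aggregate with nothing underneath it to report 0 for a sum, S64_MAX for a min and S64_MIN for a max. A minimal userspace sketch of that behaviour follows. It is only an illustration under stated assumptions: every sketch_* name is hypothetical, all fields are int64_t for simplicity, and this is not the stats_fs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
enum sketch_aggr {
	SKETCH_SUM, SKETCH_MIN, SKETCH_MAX, SKETCH_COUNT_ZERO, SKETCH_AVG
};

struct sketch_stats {
	int64_t exits;
	int64_t level;
};

struct sketch_value {
	const char *name;
	size_t offset;		/* byte offset of the field in the base object */
	enum sketch_aggr aggr;	/* only used when aggregating */
};

/* Simple value: read the field that lives 'offset' bytes past 'base'. */
static int64_t sketch_read(const void *base, const struct sketch_value *val)
{
	assert(base != NULL);
	return *(const int64_t *)((const char *)base + val->offset);
}

/* Aggregate: fold the same descriptor over every child's base object. */
static int64_t sketch_aggregate(const void *const bases[], size_t n,
				const struct sketch_value *val)
{
	int64_t acc, v;
	size_t i;

	/* Starting points chosen so that an empty fold matches the tests. */
	switch (val->aggr) {
	case SKETCH_MIN:
		acc = INT64_MAX;
		break;
	case SKETCH_MAX:
		acc = INT64_MIN;
		break;
	default:
		acc = 0;
		break;
	}

	for (i = 0; i < n; i++) {
		v = sketch_read(bases[i], val);
		switch (val->aggr) {
		case SKETCH_SUM:
		case SKETCH_AVG:
			acc += v;
			break;
		case SKETCH_MIN:
			if (v < acc)
				acc = v;
			break;
		case SKETCH_MAX:
			if (v > acc)
				acc = v;
			break;
		case SKETCH_COUNT_ZERO:
			if (v == 0)
				acc++;
			break;
		}
	}

	if (val->aggr == SKETCH_AVG && n > 0)
		acc /= (int64_t)n;
	return acc;
}
```

With two sketch_stats objects acting as the children's base pointers, sketch_read() plays the role of a lookup on a child, and sketch_aggregate() the role of the same lookup on the parent. Note the cast to a byte pointer before adding the offset: offsetof() yields bytes, so the addition must not be scaled by the field size.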
* [PATCH v3 2/7] documentation for stats_fs
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
To: kvm
Cc: Emanuele Giuseppe Esposito, linux-s390, linux-doc, netdev,
Emanuele Giuseppe Esposito, linux-kernel, kvm-ppc, Jonathan Adams,
Christian Borntraeger, Alexander Viro, David Rientjes,
linux-fsdevel, Paolo Bonzini, linux-mips, linuxppc-dev,
linux-arm-kernel, Jim Mattson
In-Reply-To: <20200526110318.69006-1-eesposit@redhat.com>
Add reStructuredText documentation covering the stats_fs API,
filesystem and usage.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
Documentation/filesystems/index.rst | 1 +
Documentation/filesystems/stats_fs.rst | 222 +++++++++++++++++++++++++
2 files changed, 223 insertions(+)
create mode 100644 Documentation/filesystems/stats_fs.rst
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index e7b46dac7079..9a46fd851c6e 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -89,6 +89,7 @@ Documentation for filesystem implementations.
relay
romfs
squashfs
+ stats_fs
sysfs
sysv-fs
tmpfs
diff --git a/Documentation/filesystems/stats_fs.rst b/Documentation/filesystems/stats_fs.rst
new file mode 100644
index 000000000000..292c689ffb98
--- /dev/null
+++ b/Documentation/filesystems/stats_fs.rst
@@ -0,0 +1,222 @@
+========
+Stats_FS
+========
+
+Stats_fs is a synthetic ram-based virtual filesystem that takes care of
+gathering and displaying statistics for the Linux kernel subsystems.
+
+The motivation for stats_fs is that the Linux kernel has no common,
+kernel-wide way for subsystems to expose statistics to userspace;
+each subsystem has to take care of gathering and displaying its own
+statistics, for example in the form of files in
+debugfs.
+
+Allowing each subsystem of the kernel to do so has two disadvantages.
+First, it introduces redundant code. Second, debugfs is not the right
+place for statistics anyway (for example, it is affected by lockdown).
+
+Stats_fs offers a generic and stable API, allowing any kind of
+directory/file organization and supporting multiple kinds of aggregation
+(not only sum, but also average, max, min and count_zero) and data types
+(boolean, all unsigned/signed and custom types). The implementation takes
+care of gathering and displaying information at run time; users only need
+to specify the values to be included in each source. Optionally, users can
+also provide a display function for each value, which will take care of
+displaying the provided value in a custom format.
+
+Its main function is to display each statistic as a file in the desired
+folder hierarchy defined through the API. Stats_fs files can be read, and
+possibly cleared if their file mode allows it.
+
+Stats_fs is typically mounted with a command like::
+
+ mount -t stats_fs stats_fs /sys/kernel/stats
+
+(Or an equivalent /etc/fstab line).
+
+Stats_fs has two main components: the public API defined by
+include/linux/stats_fs.h, and the virtual file system in
+/sys/kernel/stats.
+
+The API has two main elements: values and sources. Kernel
+subsystems will create a source, add child
+sources/values/aggregates and register it to the root source (which on
+the virtual fs is /sys/kernel/stats).
+
+The stats_fs API is defined in ``<linux/stats_fs.h>``.
+
+ Sources
+ Sources are created via ``stats_fs_source_create()``, and each
+ source becomes a directory in the file system. Sources form a
+ parent-child relationship; root sources are added to the file
+ system via ``stats_fs_source_register()``. Therefore each Linux
+ subsystem will add its own entry to the filesystem root, similar
+ to what is done in debugfs. Every other source is added to or
+ removed from a parent through the
+ ``stats_fs_source_add_subordinate()`` and
+ ``stats_fs_source_remove_subordinate()`` APIs. Once a source is
+ created and added to the tree (via add_subordinate), it will be
+ used to compute aggregate values in the parent source. A source
+ can optionally be hidden from the filesystem but still considered
+ in the aggregation operations if the corresponding flag is set
+ during initialization.
+
+ Values
+ Values represent quantities that are gathered by the stats_fs user.
+ Examples of values include the number of vm exits of a given kind,
+ the amount of memory used by some data structure, the length of
+ the longest hash table chain, or anything like that. Values are
+ defined with ``stats_fs_source_add_values()``. Each value
+ is defined by a ``struct stats_fs_value``; the same
+ ``stats_fs_value`` can be added to many different sources. A value
+ can be considered "simple" if it fetches data from a user-provided
+ location, or "aggregate" if it groups all values in the
+ subordinate sources that include the same ``stats_fs_value``.
+ Values by default are considered to be cumulative, meaning the
+ value they represent never decreases, but can also be defined as
+ floating if they exhibit a different behavior. The main difference
+ between these two is reflected in the file permissions, since a
+ floating value file does not allow the user to clear it. Each
+ value has a ``stats_fs_type`` pointer in order to allow the user
+ to provide custom get and clear functions. The library, however,
+ also exports default ``stats_fs_type`` structs for the standard
+ types (all unsigned and signed types plus boolean). A value can
+ also provide a show function that takes care of displaying the
+ value in a custom string format. This can be especially useful
+ when displaying enums.
+
+Because stats_fs is a separate filesystem from debugfs, it is not affected
+by security lockdown.
+
+Using Stats_fs
+================
+
+Define a value::
+
+ struct statistics {
+ uint64_t exit;
+ ...
+ };
+
+ struct kvm {
+ ...
+ struct statistics stat;
+ };
+
+ struct stats_fs_value kvm_stats[] = {
+ { "exit_vm", offsetof(struct kvm, stat.exit), &stats_fs_type_u64,
+ STATS_FS_SUM },
+ { NULL }
+ };
+
+The same ``struct stats_fs_value`` is used for both simple and aggregate
+values, though the type and offset are only used for simple values.
+Aggregates merge all values that use the same ``struct stats_fs_value``.
+
+Create the parent source::
+
+ struct stats_fs_source *parent_source = stats_fs_source_create(0, "parent");
+
+Register it (files and folders
+will only be visible after this function is called)::
+
+ stats_fs_source_register(parent_source);
+
+Create and add a child::
+
+ struct stats_fs_source *child_source = stats_fs_source_create(STATS_FS_HIDDEN, "child");
+
+ stats_fs_source_add_subordinate(parent_source, child_source);
+
+The STATS_FS_HIDDEN attribute won't affect the aggregation; it will only
+block the creation of the files.
+
+Add values to parent and child (here too, the order doesn't matter)::
+
+ struct kvm *base_ptr = kmalloc(sizeof(struct kvm), ...);
+ ...
+ stats_fs_source_add_values(child_source, kvm_stats, base_ptr, 0);
+ stats_fs_source_add_values(parent_source, kvm_stats, NULL, STATS_FS_HIDDEN);
+
+The values added to ``child_source`` are simple values, since they have a
+non-NULL base pointer, while those added to ``parent_source`` are
+aggregates. Values can also be marked as hidden when they are added, so
+that the folder and its other values are still shown.
+
+The same ``struct stats_fs_value`` array can also be passed with a
+different base pointer, to represent the same value in another instance
+of the kvm struct.
+
+Search:
+
+Fetch a value from the child source, returning the ``uint64_t`` stored
+at ``(char *) base_ptr + kvm_stats[0].offset``::
+
+ uint64_t ret_child, ret_parent;
+
+ stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
+
+Fetch an aggregate value, searching all subsources of ``parent_source`` for
+the specified ``struct stats_fs_value``::
+
+ stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
+
+ assert(ret_child == ret_parent); // check expected result
+
+To make it more interesting, add another child::
+
+ struct stats_fs_source *child_source2 = stats_fs_source_create(0, "child2");
+
+ stats_fs_source_add_subordinate(parent_source, child_source2);
+ // now the structure is parent -> child1
+ // -> child2
+
+ struct kvm *other_base_ptr = kmalloc(sizeof(struct kvm), ...);
+ ...
+ stats_fs_source_add_values(child_source2, kvm_stats, other_base_ptr, 0);
+
+Note that ``other_base_ptr`` points to another instance of kvm, so the
+``struct stats_fs_value`` is the same but the address it refers to is not.
+
+Now get the aggregate value::
+
+ uint64_t ret_child, ret_child2, ret_parent;
+
+ stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
+ stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
+ stats_fs_source_get_value(child_source2, &kvm_stats[0], &ret_child2);
+
+ assert((ret_child + ret_child2) == ret_parent);
+
+Cleanup::
+
+ stats_fs_source_remove_subordinate(parent_source, child_source);
+ stats_fs_source_revoke(child_source);
+ stats_fs_source_put(child_source);
+
+ stats_fs_source_remove_subordinate(parent_source, child_source2);
+ stats_fs_source_revoke(child_source2);
+ stats_fs_source_put(child_source2);
+
+ stats_fs_source_put(parent_source);
+ kfree(other_base_ptr);
+ kfree(base_ptr);
+
+Calling stats_fs_source_revoke is very important, because it ensures
+that stats_fs will not access the data that were passed to
+stats_fs_source_add_values for this source.
+
+Because open files increase the reference count for a stats_fs_source, the
+source can end up living longer than the data that provides the values for
+the source. Calling stats_fs_source_revoke just before the backing data
+is freed avoids accesses to freed data structures. After that, the source
+returns 0 for its values.
+
+This is not needed for the parent_source, since it just contains
+aggregates that would be 0 anyway if no matching child value exists.
+
+API Documentation
+=================
+
+.. kernel-doc:: include/linux/stats_fs.h
+ :export: fs/stats_fs/*.c
\ No newline at end of file
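The offset-based lookup described in the documentation above can be
modelled in user space. What follows is only a sketch under stated
assumptions, not the stats_fs implementation: struct vm, struct value_desc,
exit_desc, read_u64() and sum_u64() are hypothetical names made up for
illustration. It shows that the offset is a byte offset, so the base
pointer is advanced as a char pointer before the cast, and it models a
revoked source as a NULL base that reads as 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the structures used in the documentation. */
struct statistics {
	uint64_t exit;
};

struct vm {
	int id;
	struct statistics stat;
};

/* Describes where a statistic lives relative to a base pointer. */
struct value_desc {
	const char *name;
	size_t offset;
};

static const struct value_desc exit_desc = {
	"exit_vm", offsetof(struct vm, stat.exit)
};

/*
 * Read a u64 statistic. The offset is in bytes, so the base is advanced
 * as a char pointer before the cast. A NULL base models a revoked source
 * (an assumption of this sketch) and reads as 0.
 */
static uint64_t read_u64(const void *base, const struct value_desc *desc)
{
	if (!base)
		return 0;
	return *(const uint64_t *)((const char *)base + desc->offset);
}

/* Sum the same statistic over several instances, like a SUM aggregate. */
static uint64_t sum_u64(const void *const *bases, size_t n,
			const struct value_desc *desc)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += read_u64(bases[i], desc);
	return sum;
}
```

The same descriptor is reused for every instance; only the base pointer
changes, which is what lets one value array serve both the per-instance
files and the aggregate.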
--
2.25.4
* [PATCH v3 1/7] stats_fs API: create, add and remove stats_fs sources and values
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
To: kvm
Cc: Emanuele Giuseppe Esposito, linux-s390, linux-doc, netdev,
Emanuele Giuseppe Esposito, linux-kernel, kvm-ppc, Jonathan Adams,
Christian Borntraeger, Alexander Viro, David Rientjes,
linux-fsdevel, Paolo Bonzini, linux-mips, linuxppc-dev,
linux-arm-kernel, Jim Mattson
In-Reply-To: <20200526110318.69006-1-eesposit@redhat.com>
Introduce the stats_fs API, which makes it easy to create, add and remove
stats_fs sources and values. The API builds the statistics directory tree
used to gather statistics for the Linux kernel. Its main operations are:
create a source, add child sources/values/aggregates, register the source
to the root source (which on the virtual fs would be /sys/kernel/stats),
and search for a value/aggregate.
Each source and value has an optional flag parameter: in a value, it
represents whether the statistic is cumulative or floating; in a source,
whether it should be visible from the filesystem or not.
The defaults are cumulative and visible, respectively.
Both flag fields are represented as a uint32_t to leave room for
future flags.
Each value also takes a struct stats_fs_type pointer that defines the
get and clear functions for that stat, allowing custom type handling.
The API also provides default get and clear types for the supported
standard types (stats_fs_type_*).
The API representation is only logical and will be backed
by a virtual file system in patch 4.
Its usage will be shared between the stats_fs file system
and end users like kvm, the former calling it when it needs to
display and clear statistics, the latter to add values and sources.
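To illustrate how an aggregate is computed from its simple values, here is
a minimal user-space sketch. It is not the kernel code: enum aggr_kind,
struct aggregate and the aggregate_*() helpers are hypothetical names, and
only the unsigned case is modelled. It follows the same pattern as the
patch: start from identity elements, fold in each simple value, then pick
the result for the requested aggregation kind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of the aggregation kinds in the patch. */
enum aggr_kind { AGGR_SUM, AGGR_MIN, AGGR_MAX, AGGR_COUNT_ZERO, AGGR_AVG };

struct aggregate {
	uint64_t sum, min, max;
	uint32_t count, count_zero;
};

/* Start from identity elements (unsigned values only in this sketch). */
static void aggregate_init(struct aggregate *agg)
{
	agg->sum = 0;
	agg->count = 0;
	agg->count_zero = 0;
	agg->max = 0;
	agg->min = UINT64_MAX;
}

/* Fold one simple value into the running aggregate. */
static void aggregate_add(struct aggregate *agg, uint64_t v)
{
	agg->sum += v;
	agg->count++;
	agg->count_zero += (v == 0);
	if (v > agg->max)
		agg->max = v;
	if (v < agg->min)
		agg->min = v;
}

/* Pick the final result according to the requested kind. */
static uint64_t aggregate_result(const struct aggregate *agg,
				 enum aggr_kind kind)
{
	switch (kind) {
	case AGGR_SUM:
		return agg->sum;
	case AGGR_MIN:
		return agg->min;
	case AGGR_MAX:
		return agg->max;
	case AGGR_COUNT_ZERO:
		return agg->count_zero;
	case AGGR_AVG:
		/* Avoid dividing by zero when nothing was aggregated. */
		return agg->count ? agg->sum / agg->count : 0;
	}
	return 0;
}

/* Aggregate an array of unsigned values in one call. */
static uint64_t aggregate_values(const uint64_t *vals, size_t n,
				 enum aggr_kind kind)
{
	struct aggregate agg;
	size_t i;

	aggregate_init(&agg);
	for (i = 0; i < n; i++)
		aggregate_add(&agg, vals[i]);
	return aggregate_result(&agg, kind);
}
```

Note that with no values at all, MIN in this sketch yields UINT64_MAX,
because nothing replaced the identity element.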
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
MAINTAINERS | 7 +
fs/Kconfig | 14 +
fs/Makefile | 1 +
fs/stats_fs/Makefile | 5 +
fs/stats_fs/internal.h | 19 ++
fs/stats_fs/stats_fs.c | 552 +++++++++++++++++++++++++++++++++++++++
fs/stats_fs/stub.c | 13 +
include/linux/stats_fs.h | 363 +++++++++++++++++++++++++
8 files changed, 974 insertions(+)
create mode 100644 fs/stats_fs/Makefile
create mode 100644 fs/stats_fs/internal.h
create mode 100644 fs/stats_fs/stats_fs.c
create mode 100644 fs/stats_fs/stub.c
create mode 100644 include/linux/stats_fs.h
diff --git a/MAINTAINERS b/MAINTAINERS
index b816a453b10e..a8403d07cee5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5229,6 +5229,13 @@ F: include/linux/debugfs.h
F: include/linux/kobj*
F: lib/kobj*
+STATS_FS
+M: Paolo Bonzini <pbonzini@redhat.com>
+R: Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>
+S: Supported
+F: include/linux/stats_fs.h
+F: fs/stats_fs
+
DRIVERS FOR ADAPTIVE VOLTAGE SCALING (AVS)
M: Kevin Hilman <khilman@kernel.org>
M: Nishanth Menon <nm@ti.com>
diff --git a/fs/Kconfig b/fs/Kconfig
index f08fbbfafd9a..684ad61129ab 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -221,6 +221,20 @@ config MEMFD_CREATE
config ARCH_HAS_GIGANTIC_PAGE
bool
+config STATS_FS
+ bool "Statistics Filesystem"
+ help
+ stats_fs is a virtual file system that provides counters and
+ other statistics about the running kernel.
+
+config STATS_FS_API
+ bool
+ imply STATS_FS
+
+config STATS_FS_STUB
+ bool
+ default y if STATS_FS_API && !STATS_FS
+
source "fs/configfs/Kconfig"
source "fs/efivarfs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index 2ce5112b02c8..91558eca0cf7 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -125,6 +125,7 @@ obj-$(CONFIG_BEFS_FS) += befs/
obj-$(CONFIG_HOSTFS) += hostfs/
obj-$(CONFIG_CACHEFILES) += cachefiles/
obj-$(CONFIG_DEBUG_FS) += debugfs/
+obj-$(CONFIG_STATS_FS) += stats_fs/
obj-$(CONFIG_TRACING) += tracefs/
obj-$(CONFIG_OCFS2_FS) += ocfs2/
obj-$(CONFIG_BTRFS_FS) += btrfs/
diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
new file mode 100644
index 000000000000..bd988daa4c39
--- /dev/null
+++ b/fs/stats_fs/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+stats_fs-objs := stats_fs.o
+
+obj-$(CONFIG_STATS_FS) += stats_fs.o
+obj-$(CONFIG_STATS_FS_STUB) += stub.o
diff --git a/fs/stats_fs/internal.h b/fs/stats_fs/internal.h
new file mode 100644
index 000000000000..4993afbb1e45
--- /dev/null
+++ b/fs/stats_fs/internal.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _STATS_FS_INTERNAL_H_
+#define _STATS_FS_INTERNAL_H_
+
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/rwsem.h>
+#include <linux/stats_fs.h>
+
+/* values, grouped by base */
+struct stats_fs_value_source {
+ void *base_addr;
+ bool files_created;
+ uint32_t common_flags;
+ struct stats_fs_value *values;
+ struct list_head list_element;
+};
+
+#endif /* _STATS_FS_INTERNAL_H_ */
diff --git a/fs/stats_fs/stats_fs.c b/fs/stats_fs/stats_fs.c
new file mode 100644
index 000000000000..b76ee44f6dac
--- /dev/null
+++ b/fs/stats_fs/stats_fs.c
@@ -0,0 +1,552 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/rwsem.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/limits.h>
+#include <linux/stats_fs.h>
+
+#include "internal.h"
+
+struct stats_fs_aggregate_value {
+ uint64_t sum, min, max;
+ uint32_t count, count_zero;
+};
+
+#define STATS_FS_DEFINE_TYPE_STRUCT(gtype, stype, si) \
+ const struct stats_fs_type stats_fs_type_##gtype = \
+ { \
+ .get = stats_fs_get_##gtype, \
+ .clear = stats_fs_clear_##stype, \
+ .sign = si, \
+ }; \
+ EXPORT_SYMBOL_GPL(stats_fs_type_##gtype);
+
+#define STATS_FS_TYPE_STRUCT_US(len) \
+ STATS_FS_DEFINE_TYPE_STRUCT(u##len, len, false) \
+ STATS_FS_DEFINE_TYPE_STRUCT(s##len, len, true)
+
+#define STATS_FS_TYPE_STRUCT(type) \
+ STATS_FS_DEFINE_TYPE_STRUCT(type, type, false)
+
+STATS_FS_TYPE_STRUCT_US(8)
+STATS_FS_TYPE_STRUCT_US(16)
+STATS_FS_TYPE_STRUCT_US(32)
+STATS_FS_TYPE_STRUCT_US(64)
+STATS_FS_TYPE_STRUCT(bool)
+
+static int is_val_signed(struct stats_fs_value *val)
+{
+ return val->type->sign;
+}
+
+static struct stats_fs_value *find_value(struct stats_fs_value_source *src,
+ struct stats_fs_value *val)
+{
+ struct stats_fs_value *entry;
+
+ for (entry = src->values; entry->name; entry++) {
+ if (entry == val)
+ return entry;
+ }
+ return NULL;
+}
+
+static struct stats_fs_value *
+search_value_in_source(struct stats_fs_source *src, struct stats_fs_value *arg,
+ struct stats_fs_value_source **val_src)
+{
+ struct stats_fs_value *entry;
+ struct stats_fs_value_source *src_entry;
+
+ list_for_each_entry (src_entry, &src->values_head, list_element) {
+ entry = find_value(src_entry, arg);
+ if (entry) {
+ *val_src = src_entry;
+ return entry;
+ }
+ }
+
+ return NULL;
+}
+
+/* Called with rwsem held for writing */
+static struct stats_fs_value_source *create_value_source(void *base, uint32_t flags)
+{
+ struct stats_fs_value_source *val_src;
+
+ val_src = kzalloc(sizeof(struct stats_fs_value_source), GFP_KERNEL);
+ if (!val_src)
+ return ERR_PTR(-ENOMEM);
+
+ val_src->base_addr = base;
+ val_src->common_flags = flags;
+ INIT_LIST_HEAD(&val_src->list_element);
+
+ return val_src;
+}
+
+int stats_fs_source_add_values(struct stats_fs_source *source,
+ struct stats_fs_value *stat, void *ptr,
+ uint32_t flags)
+{
+ struct stats_fs_value_source *val_src;
+ struct stats_fs_value_source *entry;
+
+ down_write(&source->rwsem);
+
+ list_for_each_entry (entry, &source->values_head, list_element) {
+ if (entry->base_addr == ptr && entry->values == stat) {
+ up_write(&source->rwsem);
+ return -EEXIST;
+ }
+ }
+
+ val_src = create_value_source(ptr, flags);
+ val_src->values = (struct stats_fs_value *)stat;
+
+ /* add the val_src to the source list */
+ list_add(&val_src->list_element, &source->values_head);
+
+ up_write(&source->rwsem);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_add_values);
+
+void stats_fs_source_add_subordinate(struct stats_fs_source *source,
+ struct stats_fs_source *sub)
+{
+ down_write(&source->rwsem);
+
+ stats_fs_source_get(sub);
+ list_add(&sub->list_element, &source->subordinates_head);
+
+ up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_add_subordinate);
+
+/* Called with rwsem held for writing */
+static void
+stats_fs_source_remove_subordinate_locked(struct stats_fs_source *source,
+ struct stats_fs_source *sub)
+{
+ struct stats_fs_source *src_entry;
+
+ list_for_each_entry (src_entry, &source->subordinates_head,
+ list_element) {
+ if (src_entry == sub) {
+ list_del_init(&src_entry->list_element);
+ stats_fs_source_put(src_entry);
+ return;
+ }
+ }
+}
+
+void stats_fs_source_remove_subordinate(struct stats_fs_source *source,
+ struct stats_fs_source *sub)
+{
+ down_write(&source->rwsem);
+ stats_fs_source_remove_subordinate_locked(source, sub);
+ up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_remove_subordinate);
+
+/* Called with rwsem held for reading */
+static uint64_t get_simple_value(struct stats_fs_value_source *src,
+ struct stats_fs_value *val)
+{
+ if (val->type->get)
+ return val->type->get(val, src->base_addr);
+ return 0;
+}
+
+/* Called with rwsem held for reading */
+static void
+search_all_simple_values(struct stats_fs_source *src,
+ struct stats_fs_value_source *ref_src_entry,
+ struct stats_fs_value *val,
+ struct stats_fs_aggregate_value *agg)
+{
+ struct stats_fs_value_source *src_entry;
+ uint64_t value_found;
+
+ list_for_each_entry (src_entry, &src->values_head, list_element) {
+ /* skip aggregates */
+ if (src_entry->base_addr == NULL)
+ continue;
+
+ /* useless to search here */
+ if (src_entry->values != ref_src_entry->values)
+ continue;
+
+ /* must be here */
+ value_found = get_simple_value(src_entry, val);
+
+ agg->sum += value_found;
+ agg->count++;
+ agg->count_zero += (value_found == 0);
+
+ if (is_val_signed(val)) {
+ agg->max = (((int64_t)value_found) >=
+ ((int64_t)agg->max)) ?
+ value_found :
+ agg->max;
+ agg->min = (((int64_t)value_found) <=
+ ((int64_t)agg->min)) ?
+ value_found :
+ agg->min;
+ } else {
+ agg->max = (value_found >= agg->max) ? value_found :
+ agg->max;
+ agg->min = (value_found <= agg->min) ? value_found :
+ agg->min;
+ }
+ }
+}
+
+/* Called with rwsem held for reading */
+static void
+do_recursive_aggregation(struct stats_fs_source *root,
+ struct stats_fs_value_source *ref_src_entry,
+ struct stats_fs_value *val,
+ struct stats_fs_aggregate_value *agg)
+{
+ struct stats_fs_source *subordinate;
+
+ /* search all simple values in this folder */
+ search_all_simple_values(root, ref_src_entry, val, agg);
+
+ /* recursively search in all subfolders */
+ list_for_each_entry (subordinate, &root->subordinates_head,
+ list_element) {
+ down_read(&subordinate->rwsem);
+ do_recursive_aggregation(subordinate, ref_src_entry, val, agg);
+ up_read(&subordinate->rwsem);
+ }
+}
+
+/* Called with rwsem held for reading */
+static void init_aggregate_value(struct stats_fs_aggregate_value *agg,
+ struct stats_fs_value *val)
+{
+ agg->count = agg->count_zero = agg->sum = 0;
+ if (is_val_signed(val)) {
+ agg->max = S64_MIN;
+ agg->min = S64_MAX;
+ } else {
+ agg->max = 0;
+ agg->min = U64_MAX;
+ }
+}
+
+/* Called with rwsem held for reading */
+static void store_final_value(struct stats_fs_aggregate_value *agg,
+ struct stats_fs_value *val, uint64_t *ret)
+{
+ switch (val->aggr_kind) {
+ case STATS_FS_AVG:{
+ if (is_val_signed(val))
+ *ret = agg->count ? ((int64_t)agg->sum) / agg->count : 0;
+ else
+ *ret = agg->count ? agg->sum / agg->count : 0;
+ break;
+ }
+ case STATS_FS_SUM:
+ *ret = agg->sum;
+ break;
+ case STATS_FS_MIN:
+ *ret = agg->min;
+ break;
+ case STATS_FS_MAX:
+ *ret = agg->max;
+ break;
+ case STATS_FS_COUNT_ZERO:
+ *ret = agg->count_zero;
+ break;
+ default:
+ break;
+ }
+}
+
+/* Called with rwsem held for reading */
+static int stats_fs_source_get_value_locked(struct stats_fs_source *source,
+ struct stats_fs_value *arg,
+ uint64_t *ret)
+{
+ struct stats_fs_value_source *src_entry;
+ struct stats_fs_value *found;
+ struct stats_fs_aggregate_value aggr;
+
+ *ret = 0;
+
+ if (!arg)
+ return -ENOENT;
+
+ /* look in simple values */
+ found = search_value_in_source(source, arg, &src_entry);
+
+ if (!found) {
+ printk(KERN_ERR "Stats_fs: Value in source \"%s\" not found!\n",
+ source->name);
+ return -ENOENT;
+ }
+
+ if (src_entry->base_addr != NULL) {
+ *ret = get_simple_value(src_entry, found);
+ return 0;
+ }
+
+ /* look in aggregates */
+ init_aggregate_value(&aggr, found);
+ do_recursive_aggregation(source, src_entry, found, &aggr);
+ store_final_value(&aggr, found, ret);
+
+ return 0;
+}
+
+int stats_fs_source_get_value(struct stats_fs_source *source,
+ struct stats_fs_value *arg, uint64_t *ret)
+{
+ int retval;
+
+ down_read(&source->rwsem);
+ retval = stats_fs_source_get_value_locked(source, arg, ret);
+ up_read(&source->rwsem);
+
+ return retval;
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_get_value);
+
+/* Called with rwsem held for reading */
+static void set_all_simple_values(struct stats_fs_source *src,
+ struct stats_fs_value_source *ref_src_entry,
+ struct stats_fs_value *val)
+{
+ struct stats_fs_value_source *src_entry;
+
+ list_for_each_entry (src_entry, &src->values_head, list_element) {
+ /* skip aggregates */
+ if (src_entry->base_addr == NULL)
+ continue;
+
+ /* wrong to search here */
+ if (src_entry->values != ref_src_entry->values)
+ continue;
+
+ if (src_entry->base_addr &&
+ src_entry->values == ref_src_entry->values &&
+ val->type->clear)
+ val->type->clear(val, src_entry->base_addr);
+ }
+}
+
+/* Called with rwsem held for reading */
+static void do_recursive_clean(struct stats_fs_source *root,
+ struct stats_fs_value_source *ref_src_entry,
+ struct stats_fs_value *val)
+{
+ struct stats_fs_source *subordinate;
+
+ /* search all simple values in this folder */
+ set_all_simple_values(root, ref_src_entry, val);
+
+ /* recursively search in all subfolders */
+ list_for_each_entry (subordinate, &root->subordinates_head,
+ list_element) {
+ down_read(&subordinate->rwsem);
+ do_recursive_clean(subordinate, ref_src_entry, val);
+ up_read(&subordinate->rwsem);
+ }
+}
+
+/* Called with rwsem held for reading */
+static int stats_fs_source_clear_locked(struct stats_fs_source *source,
+ struct stats_fs_value *val)
+{
+ struct stats_fs_value_source *src_entry;
+ struct stats_fs_value *found;
+
+ if (!val)
+ return -ENOENT;
+
+ /* look in simple values */
+ found = search_value_in_source(source, val, &src_entry);
+
+ if (!found) {
+ printk(KERN_ERR "Stats_fs: Value in source \"%s\" not found!\n",
+ source->name);
+ return -ENOENT;
+ }
+
+ if (!(stats_fs_val_get_mode(val) & 0222))
+ return -EPERM;
+
+ if (src_entry->base_addr != NULL && found->type->clear) {
+ found->type->clear(found, src_entry->base_addr);
+ return 0;
+ }
+
+ /* look in aggregates */
+ do_recursive_clean(source, src_entry, found);
+
+ return 0;
+}
+
+int stats_fs_source_clear(struct stats_fs_source *source,
+ struct stats_fs_value *val)
+{
+ int retval;
+
+ down_read(&source->rwsem);
+ retval = stats_fs_source_clear_locked(source, val);
+ up_read(&source->rwsem);
+
+ return retval;
+}
+
+/* Called with rwsem held for reading */
+static struct stats_fs_value *
+find_value_by_name(struct stats_fs_value_source *src, char *val)
+{
+ struct stats_fs_value *entry;
+
+ for (entry = src->values; entry->name; entry++)
+ if (!strcmp(entry->name, val))
+ return entry;
+
+ return NULL;
+}
+
+/* Called with rwsem held for reading */
+static struct stats_fs_value *
+search_in_source_by_name(struct stats_fs_source *src, char *name)
+{
+ struct stats_fs_value *entry;
+ struct stats_fs_value_source *src_entry;
+
+ list_for_each_entry (src_entry, &src->values_head, list_element) {
+ entry = find_value_by_name(src_entry, name);
+ if (entry)
+ return entry;
+ }
+
+ return NULL;
+}
+
+int stats_fs_source_get_value_by_name(struct stats_fs_source *source,
+ char *name, uint64_t *ret)
+{
+ struct stats_fs_value *val;
+ int retval;
+
+ down_read(&source->rwsem);
+ val = search_in_source_by_name(source, name);
+
+ if (!val) {
+ *ret = 0;
+ up_read(&source->rwsem);
+ return -ENOENT;
+ }
+
+ retval = stats_fs_source_get_value_locked(source, val, ret);
+ up_read(&source->rwsem);
+
+ return retval;
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_get_value_by_name);
+
+void stats_fs_source_get(struct stats_fs_source *source)
+{
+ kref_get(&source->refcount);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_get);
+
+void stats_fs_source_revoke(struct stats_fs_source *source)
+{
+ struct stats_fs_value_source *val_src_entry;
+
+ down_write(&source->rwsem);
+
+ list_for_each_entry (val_src_entry, &source->values_head, list_element)
+ val_src_entry->base_addr = NULL;
+
+ up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_revoke);
+
+/* Called with rwsem held for writing
+ *
+ * The refcount is 0 and the lock was taken before refcount
+ * went from 1 to 0
+ */
+static void stats_fs_source_destroy(struct kref *kref_source)
+{
+ struct stats_fs_value_source *val_src_entry;
+ struct list_head *it, *safe;
+ struct stats_fs_source *child, *source;
+
+ source = container_of(kref_source, struct stats_fs_source, refcount);
+
+ /* iterate through the values and delete them */
+ list_for_each_safe (it, safe, &source->values_head) {
+ val_src_entry = list_entry(it, struct stats_fs_value_source,
+ list_element);
+ kfree(val_src_entry);
+ }
+
+ /* iterate through the subordinates and delete them */
+ list_for_each_safe (it, safe, &source->subordinates_head) {
+ child = list_entry(it, struct stats_fs_source, list_element);
+ stats_fs_source_remove_subordinate_locked(source, child);
+ }
+
+ up_write(&source->rwsem);
+ kfree(source->name);
+ kfree(source);
+}
+
+void stats_fs_source_put(struct stats_fs_source *source)
+{
+ kref_put_rwsem(&source->refcount, stats_fs_source_destroy,
+ &source->rwsem);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_put);
+
+struct stats_fs_source *stats_fs_source_create(uint32_t flags, const char *fmt, ...)
+{
+ va_list ap;
+ char buf[100];
+ struct stats_fs_source *ret;
+ int char_needed;
+
+ va_start(ap, fmt);
+ char_needed = vsnprintf(buf, 100, fmt, ap);
+ va_end(ap);
+
+ ret = kzalloc(sizeof(struct stats_fs_source), GFP_KERNEL);
+ if (!ret)
+ return ERR_PTR(-ENOMEM);
+
+ ret->name = kstrdup(buf, GFP_KERNEL);
+ if (!ret->name) {
+ kfree(ret);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ ret->common_flags = flags;
+
+ kref_init(&ret->refcount);
+ init_rwsem(&ret->rwsem);
+
+ INIT_LIST_HEAD(&ret->values_head);
+ INIT_LIST_HEAD(&ret->subordinates_head);
+ INIT_LIST_HEAD(&ret->list_element);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_create);
diff --git a/fs/stats_fs/stub.c b/fs/stats_fs/stub.c
new file mode 100644
index 000000000000..0843b58ad3be
--- /dev/null
+++ b/fs/stats_fs/stub.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/rwsem.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/limits.h>
+#include <linux/stats_fs.h>
+
+const struct stats_fs_type stats_fs_type_stub;
diff --git a/include/linux/stats_fs.h b/include/linux/stats_fs.h
new file mode 100644
index 000000000000..93847383f597
--- /dev/null
+++ b/include/linux/stats_fs.h
@@ -0,0 +1,363 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * stats_fs.h - a tiny little statistics file system
+ *
+ * Copyright (C) 2020 Emanuele Giuseppe Esposito
+ * Copyright (C) 2020 Redhat.
+ *
+ */
+
+#ifndef _STATS_FS_H_
+#define _STATS_FS_H_
+
+#include <linux/list.h>
+
+enum stat_aggr {
+ STATS_FS_NONE = 0,
+ STATS_FS_SUM,
+ STATS_FS_MIN,
+ STATS_FS_MAX,
+ STATS_FS_COUNT_ZERO,
+ STATS_FS_AVG,
+};
+
+/* flags used by source and internal source_value structure */
+enum source_value_common_flags {
+ STATS_FS_HIDDEN = 0x1,
+};
+
+enum stat_fs_value_flag {
+ STATS_FS_FLOATING_VALUE = 0x1,
+};
+
+struct stats_fs_value;
+
+struct stats_fs_type {
+ uint64_t (*get)(struct stats_fs_value *, void *);
+ void (*clear)(struct stats_fs_value *, void *);
+ bool sign;
+};
+
+struct stats_fs_value {
+ /* Name of the stat */
+ char *name;
+
+ /* Offset from base address to field containing the value */
+ int offset;
+
+ /* Type of the stat BOOL,U64,... */
+ const struct stats_fs_type *type;
+
+ /* Aggregate type: MIN, MAX, SUM,... */
+ enum stat_aggr aggr_kind;
+
+ uint32_t value_flag;
+};
+
+struct stats_fs_source {
+ struct kref refcount;
+
+ char *name;
+
+ uint32_t common_flags;
+
+ /* list of struct stats_fs_value_source for this source's values */
+ struct list_head values_head;
+
+ /* list of struct stats_fs_source for subordinate sources */
+ struct list_head subordinates_head;
+
+ struct list_head list_element;
+
+ struct rw_semaphore rwsem;
+
+ struct dentry *source_dentry;
+};
+
+static inline int stats_fs_val_get_mode(struct stats_fs_value *val)
+{
+ return (val->value_flag & STATS_FS_FLOATING_VALUE || !val->type->clear)
+ ? 0444 : 0644;
+}
+
+#define STATS_FS_DEFINE_GET(name, type) \
+ static inline uint64_t stats_fs_get_##name(struct stats_fs_value *val, \
+ void *base) \
+ { \
+ return *((type *)(base + (uintptr_t)val->offset)); \
+ }
+
+#define STATS_FS_DEFINE_CLEAR(name, type) \
+ static inline void stats_fs_clear_##name(struct stats_fs_value *val, \
+ void *base) \
+ { \
+ *((type *)(base + (uintptr_t)val->offset)) = 0; \
+ }
+
+#define STATS_FS_DEFINE_FUNCT_US(len) \
+ STATS_FS_DEFINE_GET(u##len, u##len) \
+ STATS_FS_DEFINE_GET(s##len, s##len) \
+ STATS_FS_DEFINE_CLEAR(len, u##len)
+
+#define STATS_FS_DEFINE_FUNCT(type) \
+ STATS_FS_DEFINE_GET(type, type) \
+ STATS_FS_DEFINE_CLEAR(type, type)
+
+STATS_FS_DEFINE_FUNCT_US(8)
+STATS_FS_DEFINE_FUNCT_US(16)
+STATS_FS_DEFINE_FUNCT_US(32)
+STATS_FS_DEFINE_FUNCT_US(64)
+STATS_FS_DEFINE_FUNCT(bool)
+
+#undef STATS_FS_DEFINE_FUNCT
+#undef STATS_FS_DEFINE_FUNCT_US
+#undef STATS_FS_DEFINE_GET
+#undef STATS_FS_DEFINE_CLEAR
+
+#if defined(CONFIG_STATS_FS)
+
+extern const struct stats_fs_type stats_fs_type_u8;
+extern const struct stats_fs_type stats_fs_type_s8;
+extern const struct stats_fs_type stats_fs_type_u16;
+extern const struct stats_fs_type stats_fs_type_s16;
+extern const struct stats_fs_type stats_fs_type_u32;
+extern const struct stats_fs_type stats_fs_type_s32;
+extern const struct stats_fs_type stats_fs_type_u64;
+extern const struct stats_fs_type stats_fs_type_s64;
+extern const struct stats_fs_type stats_fs_type_bool;
+
+/**
+ * stats_fs_source_create - create a stats_fs_source
+ * @flags: an integer containing all source flags (STATS_FS_HIDDEN, ...)
+ * @fmt: source name format
+ *
+ * Creates a stats_fs_source with the given name. This
+ * does not mean it will be backed by the filesystem yet; it will only
+ * be visible to the user once it or one of its parents is
+ * registered in stats_fs.
+ *
+ * Returns a pointer to a stats_fs_source if it succeeds.
+ * This pointer, or that of one of its parents, must be passed to
+ * stats_fs_source_put() when the file is to be removed. If an error
+ * occurs, ERR_PTR(-ERROR) will be returned.
+ */
+struct stats_fs_source *stats_fs_source_create(uint32_t flags, const char *fmt,
+ ...);
+
+/**
+ * stats_fs_source_add_values - adds values to the given source
+ * @source: a pointer to the source that will receive the values
+ * @val: a pointer to the NULL terminated stats_fs_value array to add
+ * @base_ptr: a pointer to the base pointer used by these values
+ * @flags: an integer containing common value flags (STATS_FS_HIDDEN, ...)
+ *
+ * In addition to adding values to the source, also create the
+ * files in the filesystem if the source is already backed by a directory.
+ *
+ * Returns 0 if it succeeds. If the values are already in the
+ * source and have the same base_ptr, -EEXIST is returned.
+ */
+int stats_fs_source_add_values(struct stats_fs_source *source,
+ struct stats_fs_value *val, void *base_ptr,
+ uint32_t flags);
+
+/**
+ * stats_fs_source_add_subordinate - adds a child to the given source
+ * @parent: a pointer to the parent source
+ * @child: a pointer to child source to add
+ *
+ * Recursively create all files in the stats_fs filesystem,
+ * but only if the parent already has a dentry (created with
+ * stats_fs_source_register).
+ * This covers the case where this function is called before register.
+ */
+void stats_fs_source_add_subordinate(struct stats_fs_source *parent,
+ struct stats_fs_source *child);
+
+/**
+ * stats_fs_source_remove_subordinate - removes a child from the given source
+ * @parent: a pointer to the parent source
+ * @child: a pointer to child source to remove
+ *
+ * Check whether the parent has such a child. If so,
+ * remove all its files and call stats_fs_source_put on the child.
+ */
+void stats_fs_source_remove_subordinate(struct stats_fs_source *parent,
+ struct stats_fs_source *child);
+
+/**
+ * stats_fs_source_get_value - search a value in the source (and
+ * subordinates)
+ * @source: a pointer to the source that will be searched
+ * @val: a pointer to the stats_fs_value to search
+ * @ret: a pointer to the uint64_t that will hold the found value
+ *
+ * Check whether the source contains a value with the same value
+ * pointer.
+ * If not, it will return -ENOENT. If it exists and it's a simple value
+ * (not an aggregate), the value that it points to will be returned.
+ * If it exists and it's an aggregate (aggr_kind != STATS_FS_NONE), all
+ * subordinates will be recursively searched and every simple value match
+ * will be used to aggregate the final result. For example if it's a sum,
+ * all subordinates having the same value will be summed together.
+ *
+ * This function will return 0 if it succeeds.
+ */
+int stats_fs_source_get_value(struct stats_fs_source *source,
+ struct stats_fs_value *val, uint64_t *ret);
+
+/**
+ * stats_fs_source_get_value_by_name - search a value in the source (and
+ * subordinates)
+ * @source: a pointer to the source that will be searched
+ * @name: a pointer to the string representing the value to search
+ * (for example "exits")
+ * @ret: a pointer to the uint64_t that will hold the found value
+ *
+ * Same as stats_fs_source_get_value, but initially the name is used
+ * to search in the given source if there is a value with a matching
+ * name. If so, stats_fs_source_get_value will be called with the found
+ * value, otherwise -ENOENT will be returned.
+ */
+int stats_fs_source_get_value_by_name(struct stats_fs_source *source,
+ char *name, uint64_t *ret);
+
+/**
+ * stats_fs_source_clear - search and clear a value in the source (and
+ * subordinates)
+ * @source: a pointer to the source that will be searched
+ * @val: a pointer to the stats_fs_value to search
+ *
+ * Check whether the source contains a value with the same value
+ * pointer.
+ * If not, it will return -ENOENT. If it exists and it's a simple value
+ * (not an aggregate), the value that it points to will be set to 0.
+ * If it exists and it's an aggregate (aggr_kind != STATS_FS_NONE), all
+ * subordinates will be recursively searched and every simple value match
+ * will be set to 0.
+ *
+ * This function will return 0 if it succeeds.
+ */
+int stats_fs_source_clear(struct stats_fs_source *source,
+ struct stats_fs_value *val);
+
+/**
+ * stats_fs_source_revoke - disconnect the source from its backing data
+ * @source: a pointer to the source that will be revoked
+ *
+ * Ensure that stats_fs will not access the data that were passed to
+ * stats_fs_source_add_values for this source.
+ *
+ * Because open files increase the reference count for a stats_fs_source,
+ * the source can end up living longer than the data that provides the
+ * values for the source. Calling stats_fs_source_revoke just before the
+ * backing data is freed avoids accesses to freed data structures. The
+ * sources will return 0.
+ */
+void stats_fs_source_revoke(struct stats_fs_source *source);
+
+/**
+ * stats_fs_source_get - increases refcount of source
+ * @source: a pointer to the source whose refcount will be increased
+ */
+void stats_fs_source_get(struct stats_fs_source *source);
+
+/**
+ * stats_fs_source_put - decreases refcount of source and deletes if needed
+ * @source: a pointer to the source whose refcount will be decreased
+ *
+ * If the refcount reaches zero, delete the source's files and free
+ * its resources, by first recursively calling
+ * stats_fs_source_remove_subordinate on each child and then deleting
+ * its own files and allocations.
+ */
+void stats_fs_source_put(struct stats_fs_source *source);
+
+/**
+ * stats_fs_initialized - returns true if stats_fs fs has been registered
+ */
+bool stats_fs_initialized(void);
+
+#else
+
+#include <linux/err.h>
+
+#define stats_fs_type_u8 stats_fs_type_stub
+#define stats_fs_type_s8 stats_fs_type_stub
+#define stats_fs_type_u16 stats_fs_type_stub
+#define stats_fs_type_s16 stats_fs_type_stub
+#define stats_fs_type_u32 stats_fs_type_stub
+#define stats_fs_type_s32 stats_fs_type_stub
+#define stats_fs_type_u64 stats_fs_type_stub
+#define stats_fs_type_s64 stats_fs_type_stub
+#define stats_fs_type_bool stats_fs_type_stub
+
+extern const struct stats_fs_type stats_fs_type_stub;
+
+/*
+ * We do not return NULL from these functions if CONFIG_STATS_FS is not enabled
+ * so users have a chance to detect if there was a real error or not. We don't
+ * want to duplicate the design decision mistakes of procfs and devfs again.
+ */
+
+static inline struct stats_fs_source *stats_fs_source_create(uint32_t flags,
+ const char *fmt,
+ ...)
+{
+ return ERR_PTR(-ENODEV);
+}
+
+static inline int stats_fs_source_add_values(struct stats_fs_source *source,
+ struct stats_fs_value *val,
+ void *base_ptr, uint32_t flags)
+{
+ return -ENODEV;
+}
+
+static inline void
+stats_fs_source_add_subordinate(struct stats_fs_source *parent,
+ struct stats_fs_source *child)
+{ }
+
+static inline void
+stats_fs_source_remove_subordinate(struct stats_fs_source *parent,
+ struct stats_fs_source *child)
+{ }
+
+static inline int stats_fs_source_get_value(struct stats_fs_source *source,
+ struct stats_fs_value *val,
+ uint64_t *ret)
+{
+ return -ENODEV;
+}
+
+static inline int
+stats_fs_source_get_value_by_name(struct stats_fs_source *source, char *name,
+ uint64_t *ret)
+{
+ return -ENODEV;
+}
+
+static inline int stats_fs_source_clear(struct stats_fs_source *source,
+ struct stats_fs_value *val)
+{
+ return -ENODEV;
+}
+
+static inline void stats_fs_source_revoke(struct stats_fs_source *source)
+{ }
+
+static inline void stats_fs_source_get(struct stats_fs_source *source)
+{ }
+
+static inline void stats_fs_source_put(struct stats_fs_source *source)
+{ }
+
+static inline bool stats_fs_initialized(void)
+{
+ return false;
+}
+
+#endif
+
+#endif
--
2.25.4
* [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
To: kvm
Cc: Emanuele Giuseppe Esposito, linux-s390, linux-doc, netdev,
Emanuele Giuseppe Esposito, linux-kernel, kvm-ppc, Jonathan Adams,
Christian Borntraeger, Alexander Viro, David Rientjes,
linux-fsdevel, Paolo Bonzini, linux-mips, linuxppc-dev,
linux-arm-kernel, Jim Mattson
There is currently no common mechanism, shared throughout the Linux kernel,
for subsystems to expose statistics to userspace; each subsystem has to
take care of gathering and displaying statistics by itself, for example in
the form of files in debugfs. KVM, for instance, has its own code section
for this in virt/kvm/kvm_main.c, where it sets up debugfs handlers for
displaying values and aggregating them from various subfolders to obtain
information about the system state (e.g. displaying the total number of
exits, calculated by summing all exits of all cpus of all running virtual
machines).
Allowing each section of the kernel to do so has two disadvantages. First,
it introduces redundant code. Second, debugfs is not the right place for
statistics anyway (for example, it is affected by lockdown).
In this patch series I introduce statsfs, a synthetic ram-based virtual
filesystem that takes care of gathering and displaying statistics for the
Linux kernel subsystems.
The file system is mounted on /sys/kernel/stats and, with this series, is
already used by KVM. Statsfs was initially introduced by Paolo Bonzini [1].
Statsfs offers a generic and stable API, allowing any kind of
directory/file organization and supporting multiple kinds of aggregation
(not only sum, but also average, max, min and count_zero) and data types
(boolean, unsigned/signed and custom types). The implementation, which is
a generalization of KVM’s debugfs statistics code, takes care of gathering
and displaying information at run time; users only need to specify the
values to be included in each source.
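As a rough illustration of the aggregation kinds listed above, here is a minimal userspace sketch. It is not the kernel implementation: the enum and function names are made up for this example, and the choices that count_zero counts the samples equal to zero and that the average is a truncating integer division are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the real aggregation kinds live in the stats_fs header. */
enum aggr_kind { AGGR_SUM, AGGR_AVG, AGGR_MIN, AGGR_MAX, AGGR_COUNT_ZERO };

/* Fold n unsigned samples into one aggregate. Returns 0 for an empty set. */
static uint64_t aggregate(enum aggr_kind kind, const uint64_t *vals, size_t n)
{
	uint64_t acc;
	size_t i;

	assert(vals != NULL || n == 0);
	if (n == 0)
		return 0;

	/* min/max start from the first sample, the other kinds from zero */
	acc = (kind == AGGR_MIN || kind == AGGR_MAX) ? vals[0] : 0;
	for (i = 0; i < n; i++) {
		switch (kind) {
		case AGGR_SUM:
		case AGGR_AVG:
			acc += vals[i];
			break;
		case AGGR_MIN:
			if (vals[i] < acc)
				acc = vals[i];
			break;
		case AGGR_MAX:
			if (vals[i] > acc)
				acc = vals[i];
			break;
		case AGGR_COUNT_ZERO:
			if (vals[i] == 0)
				acc++;
			break;
		}
	}
	/* integer average, truncating (an assumption of this sketch) */
	return kind == AGGR_AVG ? acc / n : acc;
}
```

Calling aggregate(AGGR_SUM, exits, nr_vcpus) over the per-vCPU exit counters would model the "total number of exits" file mentioned earlier.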
Statsfs would also be a different mountpoint from debugfs, and would not
suffer from limited access due to the security lockdown patches. Its main
function is to display each statistic as a file in the desired folder
hierarchy defined through the API. Statsfs files can be read, and possibly
cleared if their file mode allows it.
Statsfs has two main components: the public API defined by
include/linux/stats_fs.h, and the virtual file system which should end up in
/sys/kernel/stats.
The API has two main elements, values and sources. Kernel subsystems like
KVM can use the API to create a source, add child sources/values/aggregates
and register it to the root source (which on the virtual fs would be
/sys/kernel/stats).
Sources are created via stats_fs_source_create(), and each source becomes a
directory in the file system. Sources form a parent-child relationship;
root sources are added to the file system via stats_fs_source_register().
Every other source is added to or removed from a parent through the
stats_fs_source_add_subordinate() and stats_fs_source_remove_subordinate()
APIs.
Once a source is created and added to the tree (via add_subordinate), it
will be used to compute aggregate values in the parent source.
A source can optionally be hidden from the filesystem
but still considered in the aggregation operations if the corresponding
flag is set during initialization.
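The parent-child relationship and the hidden flag can be pictured with a small userspace model. This is only a sketch: struct model_source and the model_* helpers are hypothetical stand-ins, not struct stats_fs_source or stats_fs_source_add_subordinate, and the model assumes that each source owns a single counter and that parents aggregate by summing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 8

/* Userspace stand-in for a source; not the kernel's struct stats_fs_source. */
struct model_source {
	const char *name;
	bool hidden;		/* no directory, but still aggregated */
	uint64_t exits;		/* one simple value owned by this source */
	struct model_source *children[MAX_CHILDREN];
	size_t nr_children;
};

/* Attach child below parent; returns -1 when the parent is full. */
static int model_add_subordinate(struct model_source *parent,
				 struct model_source *child)
{
	assert(parent && child);
	if (parent->nr_children == MAX_CHILDREN)
		return -1;
	parent->children[parent->nr_children++] = child;
	return 0;
}

/*
 * Sum the counter over the source and, recursively, all its subordinates.
 * The hidden flag is deliberately ignored: hidden sources still count.
 */
static uint64_t model_sum_exits(const struct model_source *src)
{
	uint64_t sum = src->exits;
	size_t i;

	for (i = 0; i < src->nr_children; i++)
		sum += model_sum_exits(src->children[i]);
	return sum;
}
```

With a "kvm" root, two VM sources below it and vCPU sources below those, model_sum_exits() on the root yields the system-wide total while the same call on one VM yields that VM's total.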
Values represent quantities that are gathered by the statsfs user. Examples
of values include the number of vm exits of a given kind, the amount of
memory used by some data structure, the length of the longest hash table
chain, or anything like that. Values are defined with the
stats_fs_source_add_values function. Each value is defined by a struct
stats_fs_value; the same stats_fs_value can be added to many different
sources. A value can be considered "simple" if it fetches data from a
user-provided location, or "aggregate" if it groups all values in the
subordinate sources that include the same stats_fs_value.
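To make the simple/aggregate distinction concrete, here is a userspace sketch of one way a single value descriptor can be shared by many sources. It assumes a descriptor is a name plus a field offset that is applied to a per-source base pointer (the header's add_values call does take a base_ptr argument, but the layout below is hypothetical and all model_* names are invented for the example).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: a name plus the offset of a u64 field. */
struct model_value {
	const char *name;
	size_t offset;
};

/* Example backing data, loosely modelled on per-vCPU counters. */
struct vcpu_stat {
	uint64_t exits;
	uint64_t halt_wakeup;
};

/* Simple value: the descriptor's offset applied to one source's base. */
static uint64_t model_get(const struct model_value *val, const void *base)
{
	uint64_t out;

	assert(val && base);
	memcpy(&out, (const char *)base + val->offset, sizeof(out));
	return out;
}

/* Aggregate value: the same descriptor summed over several bases. */
static uint64_t model_get_sum(const struct model_value *val,
			      const void *const *bases, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += model_get(val, bases[i]);
	return sum;
}
```

The point of the design is that the descriptor carries no pointer to data, so one static table of descriptors can serve every vCPU, with only the base pointer changing per source.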
Each value has a stats_fs_type pointer in order to allow the user to
provide custom get and clear functions. The library, however, also
exports default stats_fs_type structs for the standard types
(all unsigned and signed types plus boolean).
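The idea of a per-value type with get and clear callbacks can be sketched in userspace as follows. The struct layout and callback signatures are assumptions made for this example and may differ from the real struct stats_fs_type; model_type_u32 only plays the role of a default type such as stats_fs_type_u32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table; the real struct stats_fs_type may differ. */
struct model_type {
	uint64_t (*get)(const void *data);
	void (*clear)(void *data);	/* NULL means the value is read-only */
};

static uint64_t get_u32(const void *data)
{
	return *(const uint32_t *)data;
}

static void clear_u32(void *data)
{
	*(uint32_t *)data = 0;
}

/* Stand-in for a library-provided default type. */
static const struct model_type model_type_u32 = { get_u32, clear_u32 };

/* Custom type: stored in nanoseconds, reported in whole milliseconds. */
static uint64_t get_ns_as_ms(const void *data)
{
	return *(const uint64_t *)data / 1000000u;
}

static const struct model_type model_type_ns_as_ms = { get_ns_as_ms, NULL };

static uint64_t model_read(const struct model_type *t, const void *data)
{
	assert(t && t->get);
	return t->get(data);
}

/* Returns 0 on success, -1 if the type does not support clearing. */
static int model_clear(const struct model_type *t, void *data)
{
	if (!t->clear)
		return -1;
	t->clear(data);
	return 0;
}
```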
A value can also provide a show function, that takes care
of displaying the value in a custom string format. This can be especially
useful when displaying enums.
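A show function for an enum might look like the userspace sketch below. The callback signature (raw value, buffer, length, snprintf-style return) is an assumption for illustration and not necessarily the one used in the series; enum link_state and the model_* names are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical enum exposed as a statistic. */
enum link_state { LINK_DOWN, LINK_UP, LINK_TESTING };

/*
 * Hypothetical show callback: format the raw value into buf and return
 * the number of characters written (snprintf semantics).
 */
typedef int (*model_show_fn)(uint64_t raw, char *buf, size_t len);

static int show_link_state(uint64_t raw, char *buf, size_t len)
{
	static const char *const names[] = { "down", "up", "testing" };

	if (raw < sizeof(names) / sizeof(names[0]))
		return snprintf(buf, len, "%s\n", names[raw]);
	/* fall back to the number for values the table does not know */
	return snprintf(buf, len, "%llu\n", (unsigned long long)raw);
}

/* Without a show callback the plain number is printed. */
static int model_format(model_show_fn show, uint64_t raw, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (show)
		return show(raw, buf, len);
	return snprintf(buf, len, "%llu\n", (unsigned long long)raw);
}
```

Reading the file for a value with this callback would then give "up" rather than "1", which is the readability gain the show function is meant to provide.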
For more information, please consult the kerneldoc documentation in patch 2
and the sample uses in the kunit tests, KVM and networking.
This series of patches is based on my previous series "libfs: group and
simplify linux fs code" and the single patch sent to kvm "kvm_host: unify
VM_STAT and VCPU_STAT definitions in a single place". The former simplifies
code duplicated in debugfs and tracefs (on which statsfs is based),
the latter groups all macro definitions for statistics in kvm in a single
common file shared by all architectures.
Patch 1 introduces the statsfs API and patch 2 adds its documentation.
Patch 3 provides extensive tests that can also be used as examples of how
to use the API, and patch 4 adds the file system support. Finally, patch 5
provides a real-life example of statsfs usage in KVM, with patch 6
providing a concrete example of the show function and patch 7 another
real-life example in the networking subsystem.
[1] https://lore.kernel.org/kvm/5d6cdcb1-d8ad-7ae6-7351-3544e2fa366d@redhat.com/?fbclid=IwAR18LHJ0PBcXcDaLzILFhHsl3qpT3z2vlG60RnqgbpGYhDv7L43n0ZXJY8M
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
v2 -> v3: move the Kconfig entry into the pseudo filesystem menu; add
documentation, get/clear functions for value types, the show function,
and the floating/cumulative and hidden flags. Also add the netstat
example.
Emanuele Giuseppe Esposito (7):
stats_fs API: create, add and remove stats_fs sources and values
documentation for stats_fs
kunit: tests for stats_fs API
stats_fs fs: virtual fs to show stats to the end-user
kvm_main: replace debugfs with stats_fs
[not for merge] kvm: example of stats_fs_value show function
[not for merge] netstats: example use of stats_fs API
Documentation/filesystems/index.rst | 1 +
Documentation/filesystems/stats_fs.rst | 222 +++++
MAINTAINERS | 7 +
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/guest.c | 2 +-
arch/mips/kvm/Kconfig | 1 +
arch/mips/kvm/mips.c | 2 +-
arch/powerpc/kvm/Kconfig | 1 +
arch/powerpc/kvm/book3s.c | 12 +-
arch/powerpc/kvm/booke.c | 8 +-
arch/s390/kvm/Kconfig | 1 +
arch/s390/kvm/kvm-s390.c | 16 +-
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/debugfs.c | 64 --
arch/x86/kvm/stats_fs.c | 114 +++
arch/x86/kvm/x86.c | 11 +-
fs/Kconfig | 20 +
fs/Makefile | 1 +
fs/stats_fs/Makefile | 7 +
fs/stats_fs/inode.c | 461 ++++++++++
fs/stats_fs/internal.h | 34 +
fs/stats_fs/stats_fs-tests.c | 1097 ++++++++++++++++++++++++
fs/stats_fs/stats_fs.c | 642 ++++++++++++++
fs/stats_fs/stub.c | 13 +
include/linux/kvm_host.h | 45 +-
include/linux/netdevice.h | 2 +
include/linux/stats_fs.h | 381 ++++++++
include/uapi/linux/magic.h | 1 +
net/Kconfig | 1 +
net/core/dev.c | 68 ++
tools/lib/api/fs/fs.c | 21 +
virt/kvm/arm/arm.c | 2 +-
virt/kvm/kvm_main.c | 317 +------
35 files changed, 3193 insertions(+), 388 deletions(-)
create mode 100644 Documentation/filesystems/stats_fs.rst
delete mode 100644 arch/x86/kvm/debugfs.c
create mode 100644 arch/x86/kvm/stats_fs.c
create mode 100644 fs/stats_fs/Makefile
create mode 100644 fs/stats_fs/inode.c
create mode 100644 fs/stats_fs/internal.h
create mode 100644 fs/stats_fs/stats_fs-tests.c
create mode 100644 fs/stats_fs/stats_fs.c
create mode 100644 fs/stats_fs/stub.c
create mode 100644 include/linux/stats_fs.h
--
2.25.4