* [PATCH v5 1/7] jailhouse: Provide detection for non-x86 systems
From: Jan Kiszka @ 2018-03-07 7:39 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas
Cc: jailhouse-dev, Mark Rutland, Juergen Gross, linux-pci, x86,
Linux Kernel Mailing List, virtualization, Andy Shevchenko,
Rob Herring
In-Reply-To: <cover.1520408357.git.jan.kiszka@siemens.com>
From: Jan Kiszka <jan.kiszka@siemens.com>
Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.
CC: Rob Herring <robh+dt@kernel.org>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Juergen Gross <jgross@suse.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
Documentation/devicetree/bindings/jailhouse.txt | 8 ++++++++
arch/x86/include/asm/jailhouse_para.h | 2 +-
include/linux/hypervisor.h | 17 +++++++++++++++--
3 files changed, 24 insertions(+), 3 deletions(-)
create mode 100644 Documentation/devicetree/bindings/jailhouse.txt
diff --git a/Documentation/devicetree/bindings/jailhouse.txt b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index 000000000000..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+--------------------------------------------
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
/* SPDX-License-Identifier: GPL2.0 */
/*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
*
* Copyright (c) Siemens AG, 2015-2017
*
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
*/
#ifdef CONFIG_X86
+
+#include <asm/jailhouse_para.h>
#include <asm/x86_init.h>
+
static inline void hypervisor_pin_vcpu(int cpu)
{
x86_platform.hyper.pin_vcpu(cpu);
}
-#else
+
+#else /* !CONFIG_X86 */
+
+#include <linux/of.h>
+
static inline void hypervisor_pin_vcpu(int cpu)
{
}
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+ return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */
#endif /* __LINUX_HYPEVISOR_H */
--
2.13.6
^ permalink raw reply related
* [PATCH v5 0/7] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI
From: Jan Kiszka @ 2018-03-07 7:39 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas
Cc: jailhouse-dev, Juergen Gross, Benedikt Spranger, linux-pci, x86,
Linux Kernel Mailing List, virtualization, Andy Shevchenko,
Rob Herring, Otavio Pontes, Mark Rutland
Basic x86 support [1] for running Linux as secondary Jailhouse [2] guest
is currently pending in the tip tree. This builds on top and enhances
the PCI support for x86 and also ARM guests (ARM[64] does not require
platform patches and works already).
Key elements of this series are:
- detection of Jailhouse via device tree hypervisor node
- function-level PCI scan if Jailhouse is detected
- MMCONFIG support for x86 guests
As most changes affect x86, I suggest routing the series via tip as
well once the necessary acks are collected.
Changes in v5:
- fix build breakage of patch 6 on i386
Changes in v4:
- split up Kconfig changes
- respect pcibios_last_bus during mmconfig setup
- cosmetic changes requested by Andy
Changes in v3:
- avoided duplicate scans of PCI functions under Jailhouse
- reformatted PCI_MMCONFIG condition and rephrased related commit log
Changes in v2:
- adjusted commit log and include ordering in patch 2
- rebased over Linus master
Jan
[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org
CC: Benedikt Spranger <b.spranger@linutronix.de>
CC: Juergen Gross <jgross@suse.com>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Otavio Pontes <otavio.pontes@intel.com>
CC: Rob Herring <robh+dt@kernel.org>
Jan Kiszka (6):
jailhouse: Provide detection for non-x86 systems
PCI: Scan all functions when running over Jailhouse
x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
x86: Consolidate PCI_MMCONFIG configs
x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
MAINTAINERS: Add entry for Jailhouse
Otavio Pontes (1):
x86/jailhouse: Enable PCI mmconfig access in inmates
Documentation/devicetree/bindings/jailhouse.txt | 8 ++++++++
MAINTAINERS | 7 +++++++
arch/x86/Kconfig | 12 +++++++-----
arch/x86/include/asm/jailhouse_para.h | 2 +-
arch/x86/include/asm/pci_x86.h | 2 ++
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/cpu/amd.c | 2 +-
arch/x86/kernel/jailhouse.c | 8 ++++++++
arch/x86/pci/legacy.c | 4 +++-
arch/x86/pci/mmconfig-shared.c | 4 ++--
drivers/pci/probe.c | 22 +++++++++++++++++++---
include/linux/hypervisor.h | 17 +++++++++++++++--
12 files changed, 74 insertions(+), 16 deletions(-)
create mode 100644 Documentation/devicetree/bindings/jailhouse.txt
--
2.13.6
^ permalink raw reply
* Re: [PATCH v4 6/7] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
From: kbuild test robot @ 2018-03-06 0:41 UTC (permalink / raw)
To: Jan Kiszka
Cc: jailhouse-dev, linux-pci, x86, Linux Kernel Mailing List,
virtualization, Andy Shevchenko, Ingo Molnar, kbuild-all,
H . Peter Anvin, Bjorn Helgaas, Thomas Gleixner
In-Reply-To: <b4504c33ee7edb6e4f2220f7828f3a27586f4741.1520188299.git.jan.kiszka@siemens.com>
Hi Jan,
I love your patch! Yet something to improve:
[auto build test ERROR on pci/next]
[also build test ERROR on v4.16-rc4 next-20180305]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Jan-Kiszka/jailhouse-Enhance-secondary-Jailhouse-guest-support-wrt-PCI/20180306-070138
base: https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: i386-randconfig-x079-201809 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
arch/x86/kernel/cpu/amd.c: In function 'init_amd_gh':
>> arch/x86/kernel/cpu/amd.c:722:3: error: implicit declaration of function 'check_enable_amd_mmconf_dmi' [-Werror=implicit-function-declaration]
check_enable_amd_mmconf_dmi();
^~~~~~~~~~~~~~~~~~~~~~~~~~~
>> arch/x86/kernel/cpu/amd.c:724:2: error: implicit declaration of function 'fam10h_check_enable_mmcfg' [-Werror=implicit-function-declaration]
fam10h_check_enable_mmcfg();
^~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +/check_enable_amd_mmconf_dmi +722 arch/x86/kernel/cpu/amd.c
0d96b9ff7 Yinghai Lu 2009-08-29 716
26bfa5f89 Borislav Petkov 2014-06-24 717 static void init_amd_gh(struct cpuinfo_x86 *c)
26bfa5f89 Borislav Petkov 2014-06-24 718 {
377b0048c Jan Kiszka 2018-03-04 719 #ifdef CONFIG_MMCONF_FAM10H
26bfa5f89 Borislav Petkov 2014-06-24 720 /* do this for boot cpu */
26bfa5f89 Borislav Petkov 2014-06-24 721 if (c == &boot_cpu_data)
26bfa5f89 Borislav Petkov 2014-06-24 @722 check_enable_amd_mmconf_dmi();
26bfa5f89 Borislav Petkov 2014-06-24 723
26bfa5f89 Borislav Petkov 2014-06-24 @724 fam10h_check_enable_mmcfg();
26bfa5f89 Borislav Petkov 2014-06-24 725 #endif
6c62aa4a3 Yinghai Lu 2008-09-07 726
6c62aa4a3 Yinghai Lu 2008-09-07 727 /*
26bfa5f89 Borislav Petkov 2014-06-24 728 * Disable GART TLB Walk Errors on Fam10h. We do this here because this
26bfa5f89 Borislav Petkov 2014-06-24 729 * is always needed when GART is enabled, even in a kernel which has no
26bfa5f89 Borislav Petkov 2014-06-24 730 * MCE support built in. BIOS should disable GartTlbWlk Errors already.
26bfa5f89 Borislav Petkov 2014-06-24 731 * If it doesn't, we do it here as suggested by the BKDG.
26bfa5f89 Borislav Petkov 2014-06-24 732 *
26bfa5f89 Borislav Petkov 2014-06-24 733 * Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=33012
6c62aa4a3 Yinghai Lu 2008-09-07 734 */
26bfa5f89 Borislav Petkov 2014-06-24 735 msr_set_bit(MSR_AMD64_MCx_MASK(4), 10);
6c62aa4a3 Yinghai Lu 2008-09-07 736
26bfa5f89 Borislav Petkov 2014-06-24 737 /*
26bfa5f89 Borislav Petkov 2014-06-24 738 * On family 10h BIOS may not have properly enabled WC+ support, causing
26bfa5f89 Borislav Petkov 2014-06-24 739 * it to be converted to CD memtype. This may result in performance
26bfa5f89 Borislav Petkov 2014-06-24 740 * degradation for certain nested-paging guests. Prevent this conversion
26bfa5f89 Borislav Petkov 2014-06-24 741 * by clearing bit 24 in MSR_AMD64_BU_CFG2.
26bfa5f89 Borislav Petkov 2014-06-24 742 *
26bfa5f89 Borislav Petkov 2014-06-24 743 * NOTE: we want to use the _safe accessors so as not to #GP kvm
26bfa5f89 Borislav Petkov 2014-06-24 744 * guests on older kvm hosts.
26bfa5f89 Borislav Petkov 2014-06-24 745 */
26bfa5f89 Borislav Petkov 2014-06-24 746 msr_clear_bit(MSR_AMD64_BU_CFG2, 24);
11fdd252b Yinghai Lu 2008-09-07 747
26bfa5f89 Borislav Petkov 2014-06-24 748 if (cpu_has_amd_erratum(c, amd_erratum_383))
26bfa5f89 Borislav Petkov 2014-06-24 749 set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
11fdd252b Yinghai Lu 2008-09-07 750 }
11fdd252b Yinghai Lu 2008-09-07 751
:::::: The code at line 722 was first introduced by commit
:::::: 26bfa5f89486a8926cd4d4ca81a04d3f0f174934 x86, amd: Cleanup init_amd
:::::: TO: Borislav Petkov <bp@suse.de>
:::::: CC: H. Peter Anvin <hpa@linux.intel.com>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v4 3/7] x86/jailhouse: Enable PCI mmconfig access in inmates
From: Andy Shevchenko @ 2018-03-05 15:36 UTC (permalink / raw)
To: Jan Kiszka
Cc: jailhouse-dev, linux-pci, x86, Linux Kernel Mailing List,
virtualization, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas,
Thomas Gleixner
In-Reply-To: <d780e72d7a0992c891f6787aa517f8f090cb54d9.1520188299.git.jan.kiszka@siemens.com>
On Sun, Mar 4, 2018 at 8:31 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> From: Otavio Pontes <otavio.pontes@intel.com>
>
> Use the PCI mmconfig base address exported by jailhouse in boot
> parameters in order to access the memory mapped PCI configuration space.
>
FWIW,
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> Signed-off-by: Otavio Pontes <otavio.pontes@intel.com>
> [Jan: rebased, fixed !CONFIG_PCI_MMCONFIG, used pcibios_last_bus]
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> arch/x86/include/asm/pci_x86.h | 2 ++
> arch/x86/kernel/jailhouse.c | 8 ++++++++
> arch/x86/pci/mmconfig-shared.c | 4 ++--
> 3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index eb66fa9cd0fc..959d618dbb17 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
> phys_addr_t addr);
> extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
> extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
> +extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
> + int end, u64 addr);
>
> extern struct list_head pci_mmcfg_list;
>
> diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
> index b68fd895235a..fa183a131edc 100644
> --- a/arch/x86/kernel/jailhouse.c
> +++ b/arch/x86/kernel/jailhouse.c
> @@ -124,6 +124,14 @@ static int __init jailhouse_pci_arch_init(void)
> if (pcibios_last_bus < 0)
> pcibios_last_bus = 0xff;
>
> +#ifdef CONFIG_PCI_MMCONFIG
> + if (setup_data.pci_mmconfig_base) {
> + pci_mmconfig_add(0, 0, pcibios_last_bus,
> + setup_data.pci_mmconfig_base);
> + pci_mmcfg_arch_init();
> + }
> +#endif
> +
> return 0;
> }
>
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index 96684d0adcf9..0e590272366b 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
> return new;
> }
>
> -static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
> - int end, u64 addr)
> +struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
> + int end, u64 addr)
> {
> struct pci_mmcfg_region *new;
>
> --
> 2.13.6
>
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: David Miller @ 2018-03-05 3:16 UTC (permalink / raw)
To: jasowang
Cc: mst, netdev, john.fastabend, linux-kernel, virtualization, brouer
In-Reply-To: <986bd816-2452-9eca-ec8a-ce267ef49a2d@redhat.com>
From: Jason Wang <jasowang@redhat.com>
Date: Mon, 5 Mar 2018 10:43:41 +0800
>
>
> On 2018年03月05日 07:38, David Miller wrote:
>> From: Jason Wang <jasowang@redhat.com>
>> Date: Fri, 2 Mar 2018 17:29:14 +0800
>>
>>> XDP_REDIRECT support for mergeable buffer was removed since commit
>>> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
>>> case"). This is because we don't reserve enough tailroom for struct
>>> skb_shared_info which breaks XDP assumption. So this patch fixes this
>>> by reserving enough tailroom and using fixed size of rx buffer.
>>>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>> Changes from V1:
>>> - do not add duplicated tracepoint when redirection fails
>> Applied to net-next, thanks Jason.
>
> Hi David,
>
> Considering the change is not large, is there any chance to get it
> into -net to keep XDP redirection working?
Ok, I'll apply this to 'net' too.
^ permalink raw reply
* Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: Jason Wang @ 2018-03-05 2:43 UTC (permalink / raw)
To: David Miller
Cc: mst, netdev, john.fastabend, linux-kernel, virtualization, brouer
In-Reply-To: <20180304.183811.344767417258525977.davem@davemloft.net>
On 2018年03月05日 07:38, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Fri, 2 Mar 2018 17:29:14 +0800
>
>> XDP_REDIRECT support for mergeable buffer was removed since commit
>> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
>> case"). This is because we don't reserve enough tailroom for struct
>> skb_shared_info which breaks XDP assumption. So this patch fixes this
>> by reserving enough tailroom and using fixed size of rx buffer.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> Changes from V1:
>> - do not add duplicated tracepoint when redirection fails
> Applied to net-next, thanks Jason.
Hi David,
Considering the change is not large, is there any chance to get it into
-net to keep XDP redirection working?
Thanks
^ permalink raw reply
* Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: Jason Wang @ 2018-03-05 2:41 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, brouer, john.fastabend, linux-kernel, virtualization
In-Reply-To: <20180302193104-mutt-send-email-mst@kernel.org>
On 2018年03月03日 01:36, Michael S. Tsirkin wrote:
> On Fri, Mar 02, 2018 at 05:29:14PM +0800, Jason Wang wrote:
>> XDP_REDIRECT support for mergeable buffer was removed since commit
>> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
>> case"). This is because we don't reserve enough tailroom for struct
>> skb_shared_info which breaks XDP assumption. So this patch fixes this
>> by reserving enough tailroom and using fixed size of rx buffer.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
> I think the next incremental step is to look at splitting
> out fast path XDP processing to a separate set of functions.
>
Let me try (probably after the 1.1 stuff).
Thanks
^ permalink raw reply
* Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: Jason Wang @ 2018-03-05 2:39 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: netdev, virtualization, john.fastabend, linux-kernel, mst
In-Reply-To: <20180302170758.332600d6@redhat.com>
On 2018年03月03日 00:07, Jesper Dangaard Brouer wrote:
> On Fri, 2 Mar 2018 17:29:14 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> XDP_REDIRECT support for mergeable buffer was removed since commit
>> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
>> case"). This is because we don't reserve enough tailroom for struct
>> skb_shared_info which breaks XDP assumption. So this patch fixes this
>> by reserving enough tailroom and using fixed size of rx buffer.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> Changes from V1:
>> - do not add duplicated tracepoint when redirection fails
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
>
> I gave it a quick spin on my testlab, and cpumap seems to
> work/not-crash now (if I managed to turn back config to
> receive_mergeable() correctly ;-)).
>
Thanks for the testing and reviewing.
^ permalink raw reply
* Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: David Miller @ 2018-03-04 23:38 UTC (permalink / raw)
To: jasowang
Cc: mst, netdev, john.fastabend, linux-kernel, virtualization, brouer
In-Reply-To: <1519982954-14360-1-git-send-email-jasowang@redhat.com>
From: Jason Wang <jasowang@redhat.com>
Date: Fri, 2 Mar 2018 17:29:14 +0800
> XDP_REDIRECT support for mergeable buffer was removed since commit
> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
> case"). This is because we don't reserve enough tailroom for struct
> skb_shared_info which breaks XDP assumption. So this patch fixes this
> by reserving enough tailroom and using fixed size of rx buffer.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> Changes from V1:
> - do not add duplicated tracepoint when redirection fails
Applied to net-next, thanks Jason.
^ permalink raw reply
* [PATCH v4 7/7] MAINTAINERS: Add entry for Jailhouse
From: Jan Kiszka @ 2018-03-04 18:31 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas
Cc: jailhouse-dev, linux-pci, x86, Linux Kernel Mailing List,
virtualization, Andy Shevchenko
In-Reply-To: <cover.1520188299.git.jan.kiszka@siemens.com>
From: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
MAINTAINERS | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 4623caf8d72d..6dc0b8f3ae0e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7523,6 +7523,13 @@ Q: http://patchwork.linuxtv.org/project/linux-media/list/
S: Maintained
F: drivers/media/dvb-frontends/ix2505v*
+JAILHOUSE HYPERVISOR INTERFACE
+M: Jan Kiszka <jan.kiszka@siemens.com>
+L: jailhouse-dev@googlegroups.com
+S: Maintained
+F: arch/x86/kernel/jailhouse.c
+F: arch/x86/include/asm/jailhouse_para.h
+
JC42.4 TEMPERATURE SENSOR DRIVER
M: Guenter Roeck <linux@roeck-us.net>
L: linux-hwmon@vger.kernel.org
--
2.13.6
^ permalink raw reply related
* [PATCH v4 6/7] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
From: Jan Kiszka @ 2018-03-04 18:31 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas
Cc: jailhouse-dev, linux-pci, x86, Linux Kernel Mailing List,
virtualization, Andy Shevchenko
In-Reply-To: <cover.1520188299.git.jan.kiszka@siemens.com>
From: Jan Kiszka <jan.kiszka@siemens.com>
Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the
latter can be built without having to enable ACPI as well. Primarily, we
need to make the AMD mmconf-fam10h_64 depend upon MMCONFIG and ACPI,
instead of just the former.
Saves some bytes in the Jailhouse non-root kernel.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/Kconfig | 6 +++++-
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/cpu/amd.c | 2 +-
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8986a6b6e3df..08a3236cb6f2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2643,7 +2643,7 @@ config PCI_DIRECT
config PCI_MMCONFIG
bool "Support mmconfig PCI config space access" if X86_64
default y
- depends on PCI && (ACPI || SFI)
+ depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST)
depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)
config PCI_OLPC
@@ -2659,6 +2659,10 @@ config PCI_DOMAINS
def_bool y
depends on PCI
+config MMCONF_FAM10H
+ def_bool y
+ depends on PCI_MMCONFIG && ACPI
+
config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 29786c87e864..73ccf80c09a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -146,6 +146,6 @@ ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_GART_IOMMU) += amd_gart_64.o aperture_64.o
obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o
- obj-$(CONFIG_PCI_MMCONFIG) += mmconf-fam10h_64.o
+ obj-$(CONFIG_MMCONF_FAM10H) += mmconf-fam10h_64.o
obj-y += vsmp_64.o
endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f0e6456ca7d3..12bc0a1139da 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -716,7 +716,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
static void init_amd_gh(struct cpuinfo_x86 *c)
{
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
/* do this for boot cpu */
if (c == &boot_cpu_data)
check_enable_amd_mmconf_dmi();
--
2.13.6
^ permalink raw reply related
* [PATCH v4 5/7] x86: Consolidate PCI_MMCONFIG configs
From: Jan Kiszka @ 2018-03-04 18:31 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas
Cc: jailhouse-dev, linux-pci, x86, Linux Kernel Mailing List,
virtualization, Andy Shevchenko
In-Reply-To: <cover.1520188299.git.jan.kiszka@siemens.com>
From: Jan Kiszka <jan.kiszka@siemens.com>
Since e279b6c1d329 ("x86: start unification of arch/x86/Kconfig.*"), we
have two PCI_MMCONFIG entries, one from the original i386 and another
from x86_64. This consolidates both entries into a single one.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/Kconfig | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c19f5342ec2b..8986a6b6e3df 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2641,8 +2641,10 @@ config PCI_DIRECT
depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
config PCI_MMCONFIG
- def_bool y
- depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+ bool "Support mmconfig PCI config space access" if X86_64
+ default y
+ depends on PCI && (ACPI || SFI)
+ depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)
config PCI_OLPC
def_bool y
@@ -2657,11 +2659,6 @@ config PCI_DOMAINS
def_bool y
depends on PCI
-config PCI_MMCONFIG
- bool "Support mmconfig PCI config space access"
- default y
- depends on X86_64 && PCI && (ACPI || SFI)
-
config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
--
2.13.6
^ permalink raw reply related
* [PATCH v4 4/7] x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
From: Jan Kiszka @ 2018-03-04 18:31 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas
Cc: jailhouse-dev, linux-pci, x86, Linux Kernel Mailing List,
virtualization, Andy Shevchenko
In-Reply-To: <cover.1520188299.git.jan.kiszka@siemens.com>
From: Jan Kiszka <jan.kiszka@siemens.com>
Allow enabling PCI_MMCONFIG when only SFI is present and make this
option default to on. This will help consolidate both variants into one
Kconfig statement.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb7f43f23521..c19f5342ec2b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2659,7 +2659,8 @@ config PCI_DOMAINS
config PCI_MMCONFIG
bool "Support mmconfig PCI config space access"
- depends on X86_64 && PCI && ACPI
+ default y
+ depends on X86_64 && PCI && (ACPI || SFI)
config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
--
2.13.6
^ permalink raw reply related
* [PATCH v4 3/7] x86/jailhouse: Enable PCI mmconfig access in inmates
From: Jan Kiszka @ 2018-03-04 18:31 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas
Cc: jailhouse-dev, linux-pci, x86, Linux Kernel Mailing List,
virtualization, Andy Shevchenko
In-Reply-To: <cover.1520188299.git.jan.kiszka@siemens.com>
From: Otavio Pontes <otavio.pontes@intel.com>
Use the PCI mmconfig base address exported by jailhouse in boot
parameters in order to access the memory mapped PCI configuration space.
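For orientation, the address arithmetic behind memory-mapped config space can be sketched in userspace C. This is only an illustration of the standard ECAM layout (1 MiB per bus, 4 KiB per function), not code from this patch; the function names ecam_offset, ecam_window_size and ecam_address are made up here, and any base address used with them is a hypothetical value standing in for setup_data.pci_mmconfig_base.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Byte offset of a config register inside an MMCONFIG (ECAM) window.
 * Layout: bits 20+ select the bus, bits 12..19 the devfn, bits 0..11
 * the register within the 4 KiB config space of that function.
 */
uint64_t ecam_offset(uint8_t bus, uint8_t devfn, uint16_t reg)
{
    assert(reg < 4096); /* extended config space is 4 KiB per function */
    return ((uint64_t)bus << 20) | ((uint64_t)devfn << 12) | reg;
}

/* Size in bytes of a window covering buses start_bus..end_bus inclusive. */
uint64_t ecam_window_size(uint8_t start_bus, uint8_t end_bus)
{
    assert(start_bus <= end_bus);
    return ((uint64_t)(end_bus - start_bus) + 1) << 20;
}

/* Address of a register, given the window base for bus 0 (illustrative). */
uint64_t ecam_address(uint64_t base, uint8_t bus, uint8_t devfn, uint16_t reg)
{
    return base + ecam_offset(bus, devfn, reg);
}
```

With pcibios_last_bus at 0xff, as in the hunk below, such a window spans 256 MiB; a smaller last bus shrinks it accordingly.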
Signed-off-by: Otavio Pontes <otavio.pontes@intel.com>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG, used pcibios_last_bus]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/include/asm/pci_x86.h | 2 ++
arch/x86/kernel/jailhouse.c | 8 ++++++++
arch/x86/pci/mmconfig-shared.c | 4 ++--
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
phys_addr_t addr);
extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
extern struct list_head pci_mmcfg_list;
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..fa183a131edc 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,14 @@ static int __init jailhouse_pci_arch_init(void)
if (pcibios_last_bus < 0)
pcibios_last_bus = 0xff;
+#ifdef CONFIG_PCI_MMCONFIG
+ if (setup_data.pci_mmconfig_base) {
+ pci_mmconfig_add(0, 0, pcibios_last_bus,
+ setup_data.pci_mmconfig_base);
+ pci_mmcfg_arch_init();
+ }
+#endif
+
return 0;
}
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
return new;
}
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
- int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+ int end, u64 addr)
{
struct pci_mmcfg_region *new;
--
2.13.6
^ permalink raw reply related
* [PATCH v4 2/7] PCI: Scan all functions when running over Jailhouse
From: Jan Kiszka @ 2018-03-04 18:31 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas
Cc: jailhouse-dev, Benedikt Spranger, linux-pci, x86,
Linux Kernel Mailing List, virtualization, Andy Shevchenko
In-Reply-To: <cover.1520188299.git.jan.kiszka@siemens.com>
From: Jan Kiszka <jan.kiszka@siemens.com>
Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
have a function 0. Therefore, Linux scans for devices at function 0
(devfn 0/8/16/...) and only scans for other functions if function 0
has its Multi-Function Device bit set or ARI or SR-IOV indicate
there are more functions.
The Jailhouse hypervisor may pass individual functions of a
multi-function device to a guest without passing function 0, which
means a Linux guest won't find them.
Change Linux PCI probing so it scans all function numbers when
running as a guest over Jailhouse.
This is technically prohibited by the spec, so it is possible that
PCI devices without the Multi-Function Device bit set may have
unexpected behavior in response to this probe.
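The difference between the two scan strategies can be modelled in userspace as a rough sketch. Everything here (struct fake_bus, DEVFN, scan_bus) is a hypothetical stand-in and not a kernel API; ARI and SR-IOV are ignored, and a "device" is just a flag saying whether a function answers the probe.

```c
#include <assert.h>
#include <stdbool.h>

/* devfn encoding: 5 bits of slot (device) number, 3 bits of function */
#define DEVFN(slot, fn) ((((slot) & 0x1f) << 3) | ((fn) & 0x07))

/* Hypothetical model of one bus as seen by the guest. */
struct fake_bus {
    bool present[256];      /* function responds to a vendor ID read */
    bool multifunction[32]; /* Multi-Function Device bit of function 0 */
};

/*
 * Count the functions a scan would find.
 * Regular rule: probe function 0 of each slot and look at functions 1..7
 * only if function 0 exists and reports itself as multi-function.
 * Jailhouse rule: if that found nothing in a slot, probe functions 1..7
 * individually, since function 0 may not have been passed to the guest.
 */
unsigned int scan_bus(const struct fake_bus *bus, bool jailhouse)
{
    unsigned int found = 0;

    for (unsigned int slot = 0; slot < 32; slot++) {
        unsigned int nr = 0;

        if (bus->present[DEVFN(slot, 0)]) {
            nr = 1;
            if (bus->multifunction[slot]) {
                for (unsigned int fn = 1; fn < 8; fn++)
                    if (bus->present[DEVFN(slot, fn)])
                        nr++;
            }
        }

        if (jailhouse && nr == 0) {
            for (unsigned int fn = 1; fn < 8; fn++)
                if (bus->present[DEVFN(slot, fn)])
                    nr++;
        }

        found += nr;
    }

    return found;
}
```

In this model a lone function 2 in some slot is invisible to the regular scan and is found only when the Jailhouse rule is active; slots where function 0 is present are scanned exactly as before, so no function is probed twice.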
Derived from original patch by Benedikt Spranger.
CC: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
---
arch/x86/pci/legacy.c | 4 +++-
drivers/pci/probe.c | 22 +++++++++++++++++++---
2 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..dfbe6ac38830 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -4,6 +4,7 @@
#include <linux/init.h>
#include <linux/export.h>
#include <linux/pci.h>
+#include <asm/jailhouse_para.h>
#include <asm/pci_x86.h>
/*
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)
void pcibios_scan_specific_bus(int busn)
{
+ int stride = jailhouse_paravirt() ? 1 : 8;
int devfn;
u32 l;
if (pci_find_bus(0, busn))
return;
- for (devfn = 0; devfn < 256; devfn += 8) {
+ for (devfn = 0; devfn < 256; devfn += stride) {
if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
l != 0x0000 && l != 0xffff) {
DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ef5377438a1e..3c365dc996e7 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -16,6 +16,7 @@
#include <linux/pci-aspm.h>
#include <linux/aer.h>
#include <linux/acpi.h>
+#include <linux/hypervisor.h>
#include <linux/irqdomain.h>
#include <linux/pm_runtime.h>
#include "pci.h"
@@ -2518,14 +2519,29 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
{
unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
unsigned int start = bus->busn_res.start;
- unsigned int devfn, cmax, max = start;
+ unsigned int devfn, fn, cmax, max = start;
struct pci_dev *dev;
+ int nr_devs;
dev_dbg(&bus->dev, "scanning bus\n");
/* Go find them, Rover! */
- for (devfn = 0; devfn < 0x100; devfn += 8)
- pci_scan_slot(bus, devfn);
+ for (devfn = 0; devfn < 256; devfn += 8) {
+ nr_devs = pci_scan_slot(bus, devfn);
+
+ /*
+ * The Jailhouse hypervisor may pass individual functions of a
+ * multi-function device to a guest without passing function 0.
+ * Look for them as well.
+ */
+ if (jailhouse_paravirt() && nr_devs == 0) {
+ for (fn = 1; fn < 8; fn++) {
+ dev = pci_scan_single_device(bus, devfn + fn);
+ if (dev)
+ dev->multifunction = 1;
+ }
+ }
+ }
/* Reserve buses for SR-IOV capability */
used_buses = pci_iov_bus_range(bus);
--
2.13.6
^ permalink raw reply related
* [PATCH v4 1/7] jailhouse: Provide detection for non-x86 systems
From: Jan Kiszka @ 2018-03-04 18:31 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas
Cc: jailhouse-dev, Mark Rutland, Juergen Gross, linux-pci, x86,
Linux Kernel Mailing List, virtualization, Andy Shevchenko,
Rob Herring
In-Reply-To: <cover.1520188299.git.jan.kiszka@siemens.com>
From: Jan Kiszka <jan.kiszka@siemens.com>
Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.
CC: Rob Herring <robh+dt@kernel.org>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Juergen Gross <jgross@suse.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
Documentation/devicetree/bindings/jailhouse.txt | 8 ++++++++
arch/x86/include/asm/jailhouse_para.h | 2 +-
include/linux/hypervisor.h | 17 +++++++++++++++--
3 files changed, 24 insertions(+), 3 deletions(-)
create mode 100644 Documentation/devicetree/bindings/jailhouse.txt
diff --git a/Documentation/devicetree/bindings/jailhouse.txt b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index 000000000000..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+--------------------------------------------
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
/* SPDX-License-Identifier: GPL2.0 */
/*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
*
* Copyright (c) Siemens AG, 2015-2017
*
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
*/
#ifdef CONFIG_X86
+
+#include <asm/jailhouse_para.h>
#include <asm/x86_init.h>
+
static inline void hypervisor_pin_vcpu(int cpu)
{
x86_platform.hyper.pin_vcpu(cpu);
}
-#else
+
+#else /* !CONFIG_X86 */
+
+#include <linux/of.h>
+
static inline void hypervisor_pin_vcpu(int cpu)
{
}
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+ return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */
#endif /* __LINUX_HYPEVISOR_H */
--
2.13.6
^ permalink raw reply related
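For readers unfamiliar with device tree matching, here is a userspace sketch of what a "compatible" lookup boils down to. It assumes only the standard encoding of the property (a sequence of NUL-terminated strings); compatible_has() is a hypothetical helper, not a kernel function. In the kernel the patch uses of_find_compatible_node(), which returns the node with its reference count raised, so callers normally pair it with of_node_put(); the patch as posted converts the pointer straight to bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the "compatible" property value (len bytes holding a
 * sequence of NUL-terminated strings) has an entry equal to want. */
static bool compatible_has(const char *prop, size_t len, const char *want)
{
	size_t off = 0;

	assert(prop && want);
	while (off < len) {
		const char *end = memchr(prop + off, '\0', len - off);

		if (!end)	/* unterminated trailing entry: malformed */
			return false;
		if (strcmp(prop + off, want) == 0)
			return true;
		off = (size_t)(end - prop) + 1;
	}
	return false;
}
```

A property value such as "jailhouse,cell\0vendor,other" (the second entry is made up for illustration) matches "jailhouse,cell" but not the bare prefix "jailhouse", since entries are compared as whole strings.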
* [PATCH v4 0/7] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI
From: Jan Kiszka @ 2018-03-04 18:31 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas
Cc: jailhouse-dev, Juergen Gross, Benedikt Spranger, linux-pci, x86,
Linux Kernel Mailing List, virtualization, Andy Shevchenko,
Rob Herring, Otavio Pontes, Mark Rutland
Basic x86 support [1] for running Linux as secondary Jailhouse [2] guest
is currently pending in the tip tree. This builds on top and enhances
the PCI support for x86 and also ARM guests (ARM[64] does not require
platform patches and works already).
Key elements of this series are:
- detection of Jailhouse via device tree hypervisor node
- function-level PCI scan if Jailhouse is detected
- MMCONFIG support for x86 guests
As most changes affect x86, I would suggest routing the series via tip
as well, once the necessary acks are collected.
Changes in v4:
- split up Kconfig changes
- respect pcibios_last_bus during mmconfig setup
- cosmetic changes requested by Andy
Changes in v3:
- avoided duplicate scans of PCI functions under Jailhouse
- reformatted PCI_MMCONFIG condition and rephrased related commit log
Changes in v2:
- adjusted commit log and include ordering in patch 2
- rebased over Linus master
Jan
[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org
CC: Benedikt Spranger <b.spranger@linutronix.de>
CC: Juergen Gross <jgross@suse.com>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Otavio Pontes <otavio.pontes@intel.com>
CC: Rob Herring <robh+dt@kernel.org>
Jan Kiszka (6):
jailhouse: Provide detection for non-x86 systems
PCI: Scan all functions when running over Jailhouse
x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
x86: Consolidate PCI_MMCONFIG configs
x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
MAINTAINERS: Add entry for Jailhouse
Otavio Pontes (1):
x86/jailhouse: Enable PCI mmconfig access in inmates
Documentation/devicetree/bindings/jailhouse.txt | 8 ++++++++
MAINTAINERS | 7 +++++++
arch/x86/Kconfig | 12 +++++++-----
arch/x86/include/asm/jailhouse_para.h | 2 +-
arch/x86/include/asm/pci_x86.h | 2 ++
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/cpu/amd.c | 2 +-
arch/x86/kernel/jailhouse.c | 8 ++++++++
arch/x86/pci/legacy.c | 4 +++-
arch/x86/pci/mmconfig-shared.c | 4 ++--
drivers/pci/probe.c | 22 +++++++++++++++++++---
include/linux/hypervisor.h | 17 +++++++++++++++--
12 files changed, 74 insertions(+), 16 deletions(-)
create mode 100644 Documentation/devicetree/bindings/jailhouse.txt
--
2.13.6
^ permalink raw reply
* Re: [PATCH] virtio_balloon: export huge page allocation statistics
From: Jonathan Helman @ 2018-03-02 22:16 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: linux-kernel, virtualization
In-Reply-To: <20180227170250-mutt-send-email-mst@kernel.org>
On 02/27/2018 07:20 AM, Michael S. Tsirkin wrote:
> On Fri, Feb 16, 2018 at 09:44:32PM -0800, Jonathan Helman wrote:
>> Export statistics for successful and failed huge page allocations
>> from the virtio balloon driver. These 2 stats come directly from
>> the vm_events HTLB_BUDDY_PGALLOC and HTLB_BUDDY_PGALLOC_FAIL.
>> Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com>
>
> Any host/guest interface changes need to be copied to
> the virtio TC mailing list (subscriber-only, sorry about that):
> virtio-dev@lists.oasis-open.org
>
Sorry, will do on v2.
>> ---
>> drivers/virtio/virtio_balloon.c | 6 ++++++
>> include/uapi/linux/virtio_balloon.h | 4 +++-
>> 2 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
>> index dfe5684..6b237e3 100644
>> --- a/drivers/virtio/virtio_balloon.c
>> +++ b/drivers/virtio/virtio_balloon.c
>> @@ -272,6 +272,12 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
>> pages_to_bytes(events[PSWPOUT]));
>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
>> +#ifdef CONFIG_HUGETLB_PAGE
>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
>> + events[HTLB_BUDDY_PGALLOC]);
>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGFAIL,
>> + events[HTLB_BUDDY_PGALLOC_FAIL]);
>> +#endif
>> #endif
>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE,
>> pages_to_bytes(i.freeram));
>> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
>> index 4e8b830..e3e8071 100644
>> --- a/include/uapi/linux/virtio_balloon.h
>> +++ b/include/uapi/linux/virtio_balloon.h
>> @@ -53,7 +53,9 @@ struct virtio_balloon_config {
>> #define VIRTIO_BALLOON_S_MEMTOT 5 /* Total amount of memory */
>> #define VIRTIO_BALLOON_S_AVAIL 6 /* Available memory as in /proc */
>> #define VIRTIO_BALLOON_S_CACHES 7 /* Disk caches */
>> -#define VIRTIO_BALLOON_S_NR 8
>> +#define VIRTIO_BALLOON_S_HTLB_PGALLOC 8 /* Number of htlb pgalloc successes */
>> +#define VIRTIO_BALLOON_S_HTLB_PGFAIL 9 /* Number of htlb pgalloc failures */
>> +#define VIRTIO_BALLOON_S_NR 10
>
> Can you clarify the comments pls? Eschew abbreviation.
Sure. How about the following instead? Hoping it makes things clearer,
without going over the 80 character limit ;)
#define VIRTIO_BALLOON_S_HTLB_PGALLOC 8 /* Hugetlb page allocations */
#define VIRTIO_BALLOON_S_HTLB_PGFAIL 9 /* Hugetlb page allocation failures */
If this is better, I'll send out a v2.
Thanks,
Jon
>
>>
>> /*
>> * Memory statistics structure.
>> --
>> 1.8.3.1
^ permalink raw reply
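As a rough illustration of why the new tags count as a host/guest interface change: each statistic travels as a (tag, value) pair, and the tag number, not the array position, is what the other side keys on. The sketch below assumes that layout; struct stat_entry, push_stat() and the shortened S_* names are hypothetical stand-ins for the driver's own structure, update_stat() and the VIRTIO_BALLOON_S_* constants.

```c
#include <assert.h>
#include <stdint.h>

/* Tag values as proposed in the patch under discussion. */
#define S_HTLB_PGALLOC	8	/* Hugetlb page allocations */
#define S_HTLB_PGFAIL	9	/* Hugetlb page allocation failures */
#define S_NR		10	/* number of defined tags */

/* Hypothetical (tag, value) pair; the real structure is packed and uses
 * explicit-endian field types. */
struct stat_entry {
	uint16_t tag;
	uint64_t val;
};

/* Stores one statistic at position idx and returns the next free index,
 * mirroring the idx++ pattern in update_balloon_stats(). */
static unsigned int push_stat(struct stat_entry *stats, unsigned int idx,
			      uint16_t tag, uint64_t val)
{
	assert(idx < S_NR);
	stats[idx].tag = tag;
	stats[idx].val = val;
	return idx + 1;
}
```

Because entries are identified by tag, statistics compiled out by an #ifdef simply do not appear in the array; the positions of the remaining ones shift without changing their meaning.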
* Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: Michael S. Tsirkin @ 2018-03-02 17:36 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, brouer, john.fastabend, linux-kernel, virtualization
In-Reply-To: <1519982954-14360-1-git-send-email-jasowang@redhat.com>
On Fri, Mar 02, 2018 at 05:29:14PM +0800, Jason Wang wrote:
> XDP_REDIRECT support for mergeable buffers was removed by commit
> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
> case") because we did not reserve enough tailroom for struct
> skb_shared_info, which breaks an assumption XDP makes. This patch fixes
> that by reserving enough tailroom and using a fixed rx buffer size.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
I think the next incremental step is to look at splitting
out fast path XDP processing to a separate set of functions.
> ---
> Changes from V1:
> - do not add duplicated tracepoint when redirection fails
> ---
> drivers/net/virtio_net.c | 54 +++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 42 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9bb9e56..426dcf7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -504,6 +504,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> page_off += *len;
>
> while (--*num_buf) {
> + int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> unsigned int buflen;
> void *buf;
> int off;
> @@ -518,7 +519,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> /* guard against a misconfigured or uncooperative backend that
> * is sending packet larger than the MTU.
> */
> - if ((page_off + buflen) > PAGE_SIZE) {
> + if ((page_off + buflen + tailroom) > PAGE_SIZE) {
> put_page(p);
> goto err_buf;
> }
> @@ -690,6 +691,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> unsigned int truesize;
> unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> bool sent;
> + int err;
>
> head_skb = NULL;
>
> @@ -701,7 +703,12 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> void *data;
> u32 act;
>
> - /* This happens when rx buffer size is underestimated */
> + /* This happens when rx buffer size is underestimated
> + * or headroom is not enough because of the buffer
> + * was refilled before XDP is set. This should only
> + * happen for the first several packets, so we don't
> + * care much about its performance.
> + */
> if (unlikely(num_buf > 1 ||
> headroom < virtnet_get_headroom(vi))) {
> /* linearize data for XDP */
> @@ -736,9 +743,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>
> act = bpf_prog_run_xdp(xdp_prog, &xdp);
>
> - if (act != XDP_PASS)
> - ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
> -
> switch (act) {
> case XDP_PASS:
> /* recalculate offset to account for any header
> @@ -770,6 +774,18 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> goto err_xdp;
> rcu_read_unlock();
> goto xdp_xmit;
> + case XDP_REDIRECT:
> + err = xdp_do_redirect(dev, &xdp, xdp_prog);
> + if (err) {
> + if (unlikely(xdp_page != page))
> + put_page(xdp_page);
> + goto err_xdp;
> + }
> + *xdp_xmit = true;
> + if (unlikely(xdp_page != page))
> + goto err_xdp;
> + rcu_read_unlock();
> + goto xdp_xmit;
> default:
> bpf_warn_invalid_xdp_action(act);
> case XDP_ABORTED:
> @@ -1013,13 +1029,18 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> }
>
> static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
> - struct ewma_pkt_len *avg_pkt_len)
> + struct ewma_pkt_len *avg_pkt_len,
> + unsigned int room)
> {
> const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> unsigned int len;
>
> - len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> + if (room)
> + return PAGE_SIZE - room;
> +
> + len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> rq->min_buf_len, PAGE_SIZE - hdr_len);
> +
> return ALIGN(len, L1_CACHE_BYTES);
> }
>
> @@ -1028,21 +1049,27 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> {
> struct page_frag *alloc_frag = &rq->alloc_frag;
> unsigned int headroom = virtnet_get_headroom(vi);
> + unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> + unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> char *buf;
> void *ctx;
> int err;
> unsigned int len, hole;
>
> - len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len);
> - if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
> + /* Extra tailroom is needed to satisfy XDP's assumption. This
> + * means rx frags coalescing won't work, but consider we've
> + * disabled GSO for XDP, it won't be a big issue.
> + */
> + len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> return -ENOMEM;
>
> buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> buf += headroom; /* advance address leaving hole at front of pkt */
> get_page(alloc_frag->page);
> - alloc_frag->offset += len + headroom;
> + alloc_frag->offset += len + room;
> hole = alloc_frag->size - alloc_frag->offset;
> - if (hole < len + headroom) {
> + if (hole < len + room) {
> /* To avoid internal fragmentation, if there is very likely not
> * enough space for another buffer, add the remaining space to
> * the current buffer.
> @@ -2576,12 +2603,15 @@ static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
> {
> struct virtnet_info *vi = netdev_priv(queue->dev);
> unsigned int queue_index = get_netdev_rx_queue_index(queue);
> + unsigned int headroom = virtnet_get_headroom(vi);
> + unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> struct ewma_pkt_len *avg;
>
> BUG_ON(queue_index >= vi->max_queue_pairs);
> avg = &vi->rq[queue_index].mrg_avg_pkt_len;
> return sprintf(buf, "%u\n",
> - get_mergeable_buf_len(&vi->rq[queue_index], avg));
> + get_mergeable_buf_len(&vi->rq[queue_index], avg,
> + SKB_DATA_ALIGN(headroom + tailroom)));
> }
>
> static struct rx_queue_attribute mergeable_rx_buffer_size_attribute =
> --
> 2.7.4
^ permalink raw reply
* Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: Jesper Dangaard Brouer @ 2018-03-02 16:07 UTC (permalink / raw)
To: Jason Wang
Cc: mst, netdev, john.fastabend, linux-kernel, virtualization, brouer
In-Reply-To: <1519982954-14360-1-git-send-email-jasowang@redhat.com>
On Fri, 2 Mar 2018 17:29:14 +0800
Jason Wang <jasowang@redhat.com> wrote:
> XDP_REDIRECT support for mergeable buffers was removed by commit
> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
> case") because we did not reserve enough tailroom for struct
> skb_shared_info, which breaks an assumption XDP makes. This patch fixes
> that by reserving enough tailroom and using a fixed rx buffer size.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> Changes from V1:
> - do not add duplicated tracepoint when redirection fails
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
I gave it a quick spin on my testlab, and cpumap seems to
work/not-crash now (if I managed to turn back config to
receive_mergeable() correctly ;-)).
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* Re: [PATCH v3 3/6] x86/jailhouse: Enable PCI mmconfig access in inmates
From: Jan Kiszka @ 2018-03-02 10:27 UTC (permalink / raw)
To: Andy Shevchenko
Cc: jailhouse-dev, linux-pci, x86, Linux Kernel Mailing List,
virtualization, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas,
Thomas Gleixner
In-Reply-To: <CAHp75VeH5tBBY4D-1_VVaYiCdRyBG+2vA8ZGg_QBfVujawh=iA@mail.gmail.com>
On 2018-03-01 11:31, Andy Shevchenko wrote:
> On Thu, Mar 1, 2018 at 7:40 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>
>> Use the PCI mmconfig base address exported by jailhouse in boot
>> parameters in order to access the memory mapped PCI configuration space.
>
>
>> --- a/arch/x86/kernel/jailhouse.c
>> +++ b/arch/x86/kernel/jailhouse.c
>> @@ -124,6 +124,13 @@ static int __init jailhouse_pci_arch_init(void)
>> if (pcibios_last_bus < 0)
>> pcibios_last_bus = 0xff;
>>
>> +#ifdef CONFIG_PCI_MMCONFIG
>> + if (setup_data.pci_mmconfig_base) {
>
>> + pci_mmconfig_add(0, 0, 0xff, setup_data.pci_mmconfig_base);
>
> Hmm... Shouldn't this be pcibios_last_bus instead of 0xff?
>
Indeed.
Thanks,
Jan
>> + pci_mmcfg_arch_init();
>> + }
>> +#endif
>
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply
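The review comment above is easier to follow with the MMCONFIG (ECAM) layout in mind: each bus occupies 1 MiB of the window and each function 4 KiB, so the end bus number decides how much address space is claimed. The helpers below are a sketch under that standard layout; ecam_addr() and ecam_size() are hypothetical names, and the base address in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Address of config register reg of function devfn on bus, given the
 * window base (the address that corresponds to bus 0). */
static uint64_t ecam_addr(uint64_t base, unsigned int bus,
			  unsigned int devfn, unsigned int reg)
{
	assert(bus <= 0xff && devfn <= 0xff && reg < 4096);
	return base + ((uint64_t)bus << 20) + ((uint64_t)devfn << 12) + reg;
}

/* Size in bytes of the window covering buses start_bus..end_bus. */
static uint64_t ecam_size(unsigned int start_bus, unsigned int end_bus)
{
	assert(start_bus <= end_bus && end_bus <= 0xff);
	return (uint64_t)(end_bus - start_bus + 1) << 20;
}
```

With an end bus of 0xff the window spans 256 MiB; if pcibios_last_bus were, say, 0x0f, only 16 MiB would be claimed. That is the practical difference between hard-coding 0xff and passing pcibios_last_bus to pci_mmconfig_add().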
* [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: Jason Wang @ 2018-03-02 9:29 UTC (permalink / raw)
To: mst, virtualization, netdev, linux-kernel; +Cc: john.fastabend, brouer
XDP_REDIRECT support for mergeable buffers was removed by commit
7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case") because we did not reserve enough tailroom for struct
skb_shared_info, which breaks an assumption XDP makes. This patch fixes
that by reserving enough tailroom and using a fixed rx buffer size.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
Changes from V1:
- do not add duplicated tracepoint when redirection fails
---
drivers/net/virtio_net.c | 54 +++++++++++++++++++++++++++++++++++++-----------
1 file changed, 42 insertions(+), 12 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9bb9e56..426dcf7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -504,6 +504,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
page_off += *len;
while (--*num_buf) {
+ int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
unsigned int buflen;
void *buf;
int off;
@@ -518,7 +519,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
/* guard against a misconfigured or uncooperative backend that
* is sending packet larger than the MTU.
*/
- if ((page_off + buflen) > PAGE_SIZE) {
+ if ((page_off + buflen + tailroom) > PAGE_SIZE) {
put_page(p);
goto err_buf;
}
@@ -690,6 +691,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
unsigned int truesize;
unsigned int headroom = mergeable_ctx_to_headroom(ctx);
bool sent;
+ int err;
head_skb = NULL;
@@ -701,7 +703,12 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
void *data;
u32 act;
- /* This happens when rx buffer size is underestimated */
+ /* This happens when rx buffer size is underestimated
+ * or headroom is not enough because of the buffer
+ * was refilled before XDP is set. This should only
+ * happen for the first several packets, so we don't
+ * care much about its performance.
+ */
if (unlikely(num_buf > 1 ||
headroom < virtnet_get_headroom(vi))) {
/* linearize data for XDP */
@@ -736,9 +743,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
act = bpf_prog_run_xdp(xdp_prog, &xdp);
- if (act != XDP_PASS)
- ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
-
switch (act) {
case XDP_PASS:
/* recalculate offset to account for any header
@@ -770,6 +774,18 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
goto err_xdp;
rcu_read_unlock();
goto xdp_xmit;
+ case XDP_REDIRECT:
+ err = xdp_do_redirect(dev, &xdp, xdp_prog);
+ if (err) {
+ if (unlikely(xdp_page != page))
+ put_page(xdp_page);
+ goto err_xdp;
+ }
+ *xdp_xmit = true;
+ if (unlikely(xdp_page != page))
+ goto err_xdp;
+ rcu_read_unlock();
+ goto xdp_xmit;
default:
bpf_warn_invalid_xdp_action(act);
case XDP_ABORTED:
@@ -1013,13 +1029,18 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
}
static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
- struct ewma_pkt_len *avg_pkt_len)
+ struct ewma_pkt_len *avg_pkt_len,
+ unsigned int room)
{
const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
unsigned int len;
- len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
+ if (room)
+ return PAGE_SIZE - room;
+
+ len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
rq->min_buf_len, PAGE_SIZE - hdr_len);
+
return ALIGN(len, L1_CACHE_BYTES);
}
@@ -1028,21 +1049,27 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
{
struct page_frag *alloc_frag = &rq->alloc_frag;
unsigned int headroom = virtnet_get_headroom(vi);
+ unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
+ unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
char *buf;
void *ctx;
int err;
unsigned int len, hole;
- len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len);
- if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
+ /* Extra tailroom is needed to satisfy XDP's assumption. This
+ * means rx frags coalescing won't work, but consider we've
+ * disabled GSO for XDP, it won't be a big issue.
+ */
+ len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
+ if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
return -ENOMEM;
buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
buf += headroom; /* advance address leaving hole at front of pkt */
get_page(alloc_frag->page);
- alloc_frag->offset += len + headroom;
+ alloc_frag->offset += len + room;
hole = alloc_frag->size - alloc_frag->offset;
- if (hole < len + headroom) {
+ if (hole < len + room) {
/* To avoid internal fragmentation, if there is very likely not
* enough space for another buffer, add the remaining space to
* the current buffer.
@@ -2576,12 +2603,15 @@ static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
{
struct virtnet_info *vi = netdev_priv(queue->dev);
unsigned int queue_index = get_netdev_rx_queue_index(queue);
+ unsigned int headroom = virtnet_get_headroom(vi);
+ unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
struct ewma_pkt_len *avg;
BUG_ON(queue_index >= vi->max_queue_pairs);
avg = &vi->rq[queue_index].mrg_avg_pkt_len;
return sprintf(buf, "%u\n",
- get_mergeable_buf_len(&vi->rq[queue_index], avg));
+ get_mergeable_buf_len(&vi->rq[queue_index], avg,
+ SKB_DATA_ALIGN(headroom + tailroom)));
}
static struct rx_queue_attribute mergeable_rx_buffer_size_attribute =
--
2.7.4
^ permalink raw reply related
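The buffer arithmetic in the patch above can be checked with a small standalone sketch. The constants are assumptions for illustration only: a 4096-byte page, 64-byte cache lines, 256 bytes of XDP headroom, and 320 bytes for struct skb_shared_info (its real size depends on architecture and configuration). data_align(), xdp_room(), buf_len() and linearize_fits() are hypothetical names that mirror SKB_DATA_ALIGN(), the room computation in add_recvbuf_mergeable(), get_mergeable_buf_len() and the guard in xdp_linearize_page().

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SZ		4096u	/* assumed page size */
#define CACHE_LINE	64u	/* assumed cache line size */
#define SHINFO_SZ	320u	/* assumed sizeof(struct skb_shared_info) */

/* Round x up to a multiple of the cache line size. */
static unsigned int data_align(unsigned int x)
{
	return (x + CACHE_LINE - 1) & ~(CACHE_LINE - 1);
}

/* Space reserved around the packet: headroom plus, when XDP is active
 * (headroom != 0), tailroom for the shared info. */
static unsigned int xdp_room(unsigned int headroom)
{
	unsigned int tailroom = headroom ? SHINFO_SZ : 0;

	return data_align(headroom + tailroom);
}

/* Fixed buffer length when room is reserved, otherwise the length
 * estimated from the running average (passed in as avg_len). */
static unsigned int buf_len(unsigned int room, unsigned int avg_len)
{
	return room ? PAGE_SZ - room : avg_len;
}

/* Guard used while linearizing: data plus tailroom must fit in a page. */
static bool linearize_fits(unsigned int page_off, unsigned int buflen)
{
	return page_off + buflen + data_align(SHINFO_SZ) <= PAGE_SZ;
}
```

Under these assumptions the reserved room is 576 bytes, so every rx buffer is 3520 bytes long and buffer plus room fills exactly one page. That is why the commit message speaks of a fixed rx buffer size and why the code comment notes that rx frag coalescing no longer applies with XDP enabled.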
* Re: [PATCH net] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: Jason Wang @ 2018-03-02 9:27 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: netdev, virtualization, john.fastabend, linux-kernel, mst
In-Reply-To: <20180302080517.4526f384@redhat.com>
On 2018-03-02 15:05, Jesper Dangaard Brouer wrote:
> On Fri, 2 Mar 2018 14:25:29 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> @@ -770,6 +774,19 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>> goto err_xdp;
>> rcu_read_unlock();
>> goto xdp_xmit;
>> + case XDP_REDIRECT:
>> + err = xdp_do_redirect(dev, &xdp, xdp_prog);
>> + if (err) {
>> + trace_xdp_exception(vi->dev, xdp_prog, act);
> Do not add a trace_xdp_exception here... this is handled inside
> xdp_do_redirect() invocation.
>
>
Right, let me post V2.
Thanks
^ permalink raw reply
* Re: [PATCH net] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: Jesper Dangaard Brouer @ 2018-03-02 7:05 UTC (permalink / raw)
To: Jason Wang
Cc: mst, netdev, john.fastabend, linux-kernel, virtualization, brouer
In-Reply-To: <1519971929-15148-1-git-send-email-jasowang@redhat.com>
On Fri, 2 Mar 2018 14:25:29 +0800
Jason Wang <jasowang@redhat.com> wrote:
> @@ -770,6 +774,19 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> goto err_xdp;
> rcu_read_unlock();
> goto xdp_xmit;
> + case XDP_REDIRECT:
> + err = xdp_do_redirect(dev, &xdp, xdp_prog);
> + if (err) {
> + trace_xdp_exception(vi->dev, xdp_prog, act);
Do not add a trace_xdp_exception here... this is handled inside
xdp_do_redirect() invocation.
> + if (unlikely(xdp_page != page))
> + put_page(xdp_page);
> + goto err_xdp;
> + }
> + *xdp_xmit = true;
> + if (unlikely(xdp_page != page))
> + goto err_xdp;
> + rcu_read_unlock();
> + goto xdp_xmit;
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* [PATCH net] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: Jason Wang @ 2018-03-02 6:25 UTC (permalink / raw)
To: mst, virtualization, netdev, linux-kernel; +Cc: john.fastabend, brouer
XDP_REDIRECT support for mergeable buffers was removed by commit
7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case") because we did not reserve enough tailroom for struct
skb_shared_info, which breaks an assumption XDP makes. This patch fixes
that by reserving enough tailroom and using a fixed rx buffer size.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 55 +++++++++++++++++++++++++++++++++++++-----------
1 file changed, 43 insertions(+), 12 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9bb9e56..11e48c5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -504,6 +504,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
page_off += *len;
while (--*num_buf) {
+ int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
unsigned int buflen;
void *buf;
int off;
@@ -518,7 +519,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
/* guard against a misconfigured or uncooperative backend that
* is sending packet larger than the MTU.
*/
- if ((page_off + buflen) > PAGE_SIZE) {
+ if ((page_off + buflen + tailroom) > PAGE_SIZE) {
put_page(p);
goto err_buf;
}
@@ -690,6 +691,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
unsigned int truesize;
unsigned int headroom = mergeable_ctx_to_headroom(ctx);
bool sent;
+ int err;
head_skb = NULL;
@@ -701,7 +703,12 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
void *data;
u32 act;
- /* This happens when rx buffer size is underestimated */
+ /* This happens when rx buffer size is underestimated
+ * or headroom is not enough because of the buffer
+ * was refilled before XDP is set. This should only
+ * happen for the first several packets, so we don't
+ * care much about its performance.
+ */
if (unlikely(num_buf > 1 ||
headroom < virtnet_get_headroom(vi))) {
/* linearize data for XDP */
@@ -736,9 +743,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
act = bpf_prog_run_xdp(xdp_prog, &xdp);
- if (act != XDP_PASS)
- ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
-
switch (act) {
case XDP_PASS:
/* recalculate offset to account for any header
@@ -770,6 +774,19 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
goto err_xdp;
rcu_read_unlock();
goto xdp_xmit;
+ case XDP_REDIRECT:
+ err = xdp_do_redirect(dev, &xdp, xdp_prog);
+ if (err) {
+ trace_xdp_exception(vi->dev, xdp_prog, act);
+ if (unlikely(xdp_page != page))
+ put_page(xdp_page);
+ goto err_xdp;
+ }
+ *xdp_xmit = true;
+ if (unlikely(xdp_page != page))
+ goto err_xdp;
+ rcu_read_unlock();
+ goto xdp_xmit;
default:
bpf_warn_invalid_xdp_action(act);
case XDP_ABORTED:
@@ -1013,13 +1030,18 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
}
static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
- struct ewma_pkt_len *avg_pkt_len)
+ struct ewma_pkt_len *avg_pkt_len,
+ unsigned int room)
{
const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
unsigned int len;
- len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
+ if (room)
+ return PAGE_SIZE - room;
+
+ len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
rq->min_buf_len, PAGE_SIZE - hdr_len);
+
return ALIGN(len, L1_CACHE_BYTES);
}
@@ -1028,21 +1050,27 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
{
struct page_frag *alloc_frag = &rq->alloc_frag;
unsigned int headroom = virtnet_get_headroom(vi);
+ unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
+ unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
char *buf;
void *ctx;
int err;
unsigned int len, hole;
- len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len);
- if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
+ /* Extra tailroom is needed to satisfy XDP's assumption. This
+ * means rx frags coalescing won't work, but consider we've
+ * disabled GSO for XDP, it won't be a big issue.
+ */
+ len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
+ if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
return -ENOMEM;
buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
buf += headroom; /* advance address leaving hole at front of pkt */
get_page(alloc_frag->page);
- alloc_frag->offset += len + headroom;
+ alloc_frag->offset += len + room;
hole = alloc_frag->size - alloc_frag->offset;
- if (hole < len + headroom) {
+ if (hole < len + room) {
/* To avoid internal fragmentation, if there is very likely not
* enough space for another buffer, add the remaining space to
* the current buffer.
@@ -2576,12 +2604,15 @@ static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
{
struct virtnet_info *vi = netdev_priv(queue->dev);
unsigned int queue_index = get_netdev_rx_queue_index(queue);
+ unsigned int headroom = virtnet_get_headroom(vi);
+ unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
struct ewma_pkt_len *avg;
BUG_ON(queue_index >= vi->max_queue_pairs);
avg = &vi->rq[queue_index].mrg_avg_pkt_len;
return sprintf(buf, "%u\n",
- get_mergeable_buf_len(&vi->rq[queue_index], avg));
+ get_mergeable_buf_len(&vi->rq[queue_index], avg,
+ SKB_DATA_ALIGN(headroom + tailroom)));
}
static struct rx_queue_attribute mergeable_rx_buffer_size_attribute =
--
2.7.4
^ permalink raw reply related