* Re: [PATCH v2 2/2] x86: paravirt: make native_save_fl extern inline
From: H. Peter Anvin @ 2018-06-05 17:31 UTC
To: Nick Desaulniers, akpm, ard.biesheuvel, aryabinin, akataria,
boris.ostrovsky, brijesh.singh, caoj.fnst, gregkh, jan.kiszka,
jarkko.sakkinen, jgross, jpoimboe, kirill.shutemov, mingo, mjg59,
mka, pombredanne, rostedt, tglx, thomas.lendacky, tweek
Cc: linux-efi, mawilcox, arnd, tstellar, x86, will.deacon, astrachan,
ghackmann, linux-kernel, yamada.masahiro, michal.lkml, geert,
manojgupta, keescook, sedat.dilek, rientjes, virtualization,
linux-kbuild
In-Reply-To: <26c7fd1c-ab40-368f-c2e3-43a6b45157fd@zytor.com>
[-- Attachment #1: Type: text/plain, Size: 528 bytes --]
On 06/05/18 10:28, H. Peter Anvin wrote:
> On 06/05/18 10:05, Nick Desaulniers wrote:
>> +
>> +/*
>> + * void native_restore_fl(unsigned long flags)
>> + * %rdi: flags
>> + */
>> +ENTRY(native_restore_fl)
>> + push %_ASM_DI
>> + popf
>> + ret
>> +ENDPROC(native_restore_fl)
>> +EXPORT_SYMBOL(native_restore_fl)
>>
>
> To work on i386, this would have to be %_ASM_AX in that case.
>
> Something like this added to <asm/asm.h> might be useful; then you can
> simply:
>
> push %_ASM_ARG1
>
Version with fixed typo...
-hpa
[-- Attachment #2: 0001-x86-asm-add-_ASM_ARG-constants-for-argument-registes.patch --]
[-- Type: text/x-patch, Size: 2091 bytes --]
From 9946c03bc6648ea65e6f8e2576c390dca9555288 Mon Sep 17 00:00:00 2001
From: "H. Peter Anvin" <hpa@linux.intel.com>
Date: Tue, 5 Jun 2018 10:21:35 -0700
Subject: [PATCH] x86/asm: add _ASM_ARG* constants for argument registers to
<asm/asm.h>
i386 and x86-64 use different registers for arguments; make them
available so we don't have to #ifdef in the actual code.
Native size and specified size (q, l, w, b) versions are provided.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
arch/x86/include/asm/asm.h | 59 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 219faaec51df..990770f9e76b 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -46,6 +46,65 @@
#define _ASM_SI __ASM_REG(si)
#define _ASM_DI __ASM_REG(di)
+#ifndef __x86_64__
+/* 32 bit */
+
+#define _ASM_ARG1 _ASM_AX
+#define _ASM_ARG2 _ASM_DX
+#define _ASM_ARG3 _ASM_CX
+
+#define _ASM_ARG1L eax
+#define _ASM_ARG2L edx
+#define _ASM_ARG3L ecx
+
+#define _ASM_ARG1W ax
+#define _ASM_ARG2W dx
+#define _ASM_ARG3W cx
+
+#define _ASM_ARG1B al
+#define _ASM_ARG2B dl
+#define _ASM_ARG3B cl
+
+#else
+/* 64 bit */
+
+#define _ASM_ARG1 _ASM_DI
+#define _ASM_ARG2 _ASM_SI
+#define _ASM_ARG3 _ASM_DX
+#define _ASM_ARG4 _ASM_CX
+#define _ASM_ARG5 r8
+#define _ASM_ARG6 r9
+
+#define _ASM_ARG1Q rdi
+#define _ASM_ARG2Q rsi
+#define _ASM_ARG3Q rdx
+#define _ASM_ARG4Q rcx
+#define _ASM_ARG5Q r8
+#define _ASM_ARG6Q r9
+
+#define _ASM_ARG1L edi
+#define _ASM_ARG2L esi
+#define _ASM_ARG3L edx
+#define _ASM_ARG4L ecx
+#define _ASM_ARG5L r8d
+#define _ASM_ARG6L r9d
+
+#define _ASM_ARG1W di
+#define _ASM_ARG2W si
+#define _ASM_ARG3W dx
+#define _ASM_ARG4W cx
+#define _ASM_ARG5W r8w
+#define _ASM_ARG6W r9w
+
+#define _ASM_ARG1B dil
+#define _ASM_ARG2B sil
+#define _ASM_ARG3B dl
+#define _ASM_ARG4B cl
+#define _ASM_ARG5B r8b
+#define _ASM_ARG6B r9b
+
+#endif
+
/*
* Macros to generate condition code outputs from inline assembly,
* The output operand must be type "bool".
--
2.14.4
[-- Attachment #3: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
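The idea behind the _ASM_ARG* constants can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code: the SKETCH_* macros and sketch_push_arg1() are hypothetical names, and the 32-bit choice of eax assumes the kernel's register-based calling convention on i386. It picks the first-argument register at preprocessing time, so the code that emits the instruction needs no #ifdef of its own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Two-level stringification so the register macro is expanded first. */
#define SKETCH_STR_(x) #x
#define SKETCH_STR(x)  SKETCH_STR_(x)

/* Hypothetical stand-in for _ASM_ARG1: first-argument register name. */
#ifndef __x86_64__
#define SKETCH_ARG1 eax   /* assumed 32-bit kernel convention */
#else
#define SKETCH_ARG1 rdi   /* x86-64 convention */
#endif

/* Writes "push %<reg>" into buf and returns buf. */
static const char *sketch_push_arg1(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "push %%%s", SKETCH_STR(SKETCH_ARG1));
	return buf;
}
```

Calling sketch_push_arg1() with a 32-byte buffer yields the instruction text for whichever target the file was compiled for; the caller never names a register directly.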
* Re: [PATCH v2 2/2] x86: paravirt: make native_save_fl extern inline
From: H. Peter Anvin @ 2018-06-05 17:28 UTC
To: Nick Desaulniers, akpm, ard.biesheuvel, aryabinin, akataria,
boris.ostrovsky, brijesh.singh, caoj.fnst, gregkh, jan.kiszka,
jarkko.sakkinen, jgross, jpoimboe, kirill.shutemov, mingo, mjg59,
mka, pombredanne, rostedt, tglx, thomas.lendacky, tweek
Cc: linux-efi, mawilcox, arnd, tstellar, x86, will.deacon, astrachan,
ghackmann, linux-kernel, yamada.masahiro, michal.lkml, geert,
manojgupta, keescook, sedat.dilek, rientjes, virtualization,
linux-kbuild
In-Reply-To: <20180605170532.170361-3-ndesaulniers@google.com>
[-- Attachment #1: Type: text/plain, Size: 421 bytes --]
On 06/05/18 10:05, Nick Desaulniers wrote:
> +
> +/*
> + * void native_restore_fl(unsigned long flags)
> + * %rdi: flags
> + */
> +ENTRY(native_restore_fl)
> + push %_ASM_DI
> + popf
> + ret
> +ENDPROC(native_restore_fl)
> +EXPORT_SYMBOL(native_restore_fl)
>
To work on i386, this would have to be %_ASM_AX in that case.
Something like this added to <asm/asm.h> might be useful; then you can
simply:
push %_ASM_ARG1
[-- Attachment #2: 0001-x86-asm-add-_ASM_ARG-constants-for-argument-registes.patch --]
[-- Type: text/x-patch, Size: 2091 bytes --]
From 83a7c1b7167dceee0eb731cf4ae3af7f9b2c05ea Mon Sep 17 00:00:00 2001
From: "H. Peter Anvin" <hpa@linux.intel.com>
Date: Tue, 5 Jun 2018 10:21:35 -0700
Subject: [PATCH] x86/asm: add _ASM_ARG* constants for argument registers to
<asm/asm.h>
i386 and x86-64 use different registers for arguments; make them
available so we don't have to #ifdef in the actual code.
Native size and specified size (q, l, w, b) versions are provided.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
arch/x86/include/asm/asm.h | 59 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 219faaec51df..3cab54bce51d 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -46,6 +46,65 @@
#define _ASM_SI __ASM_REG(si)
#define _ASM_DI __ASM_REG(di)
+#ifndef __x86_64__
+/* 32 bit */
+
+#define _ASM_ARG1 _ASM_AX
+#define _ASM_ARG2 _ASM_DX
+#define _ASM_ARG3 _ASM_CX
+
+#define _ASM_ARG1L eax
+#define _ASM_ARG2L edx
+#define _ASM_ARG3L ecx
+
+#define _ASM_ARG1W ax
+#define _ASM_ARG2W dx
+#define _ASM_ARG3W cx
+
+#define _ASM_ARG1B al
+#define _ASM_ARG2B dl
+#define _ASM_ARG3B cl
+
+#else
+/* 64 bit */
+
+#define _ASM_ARG1 _ASM_DI
+#define _ASM_ARG2 _ASM_SI
+#define _ASM_ARG3 _ASM_DX
+#define _ASM_ARG4 _ASM_CX
+#define _ASM_ARG5 r8
+#define _ASM_ARG6 r9
+
+#define _ASM_ARG1Q rdi
+#define _ASM_ARG2Q rsi
+#define _ASM_ARG3Q rdx
+#define _ASM_ARG4Q rcx
+#define _ASM_ARG5Q r8
+#define _ASM_ARG6Q r9
+
+#define _ASM_ARG1L edi
+#define _ASM_ARG2L esi
+#define _ASM_ARG3L edx
+#define _ASM_ARG4L ecx
+#define _ASM_ARG5L r8d
+#define _ASM_ARG6L r9d
+
+#define _ASM_ARG1W di
+#define _ASM_ARG2W di
+#define _ASM_ARG3W dx
+#define _ASM_ARG4W cx
+#define _ASM_ARG5W r8w
+#define _ASM_ARG6W r9w
+
+#define _ASM_ARG1B dil
+#define _ASM_ARG2B sil
+#define _ASM_ARG3B dl
+#define _ASM_ARG4B cl
+#define _ASM_ARG5B r8b
+#define _ASM_ARG6B r9b
+
+#endif
+
/*
* Macros to generate condition code outputs from inline assembly,
* The output operand must be type "bool".
--
2.14.4
* Re: [PATCH v2 1/2] compiler-gcc.h: add gnu_inline to all inline declarations
From: Joe Perches @ 2018-06-05 17:23 UTC
To: Nick Desaulniers, akpm, ard.biesheuvel, aryabinin, akataria,
boris.ostrovsky, brijesh.singh, caoj.fnst, gregkh, hpa,
jan.kiszka, jarkko.sakkinen, jgross, jpoimboe, kirill.shutemov,
mingo, mjg59, mka, pombredanne, rostedt, tglx, thomas.lendacky,
tweek
Cc: linux-efi, mawilcox, arnd, tstellar, x86, will.deacon, astrachan,
ghackmann, linux-kernel, yamada.masahiro, michal.lkml, geert,
manojgupta, keescook, sedat.dilek, rientjes, virtualization,
linux-kbuild
In-Reply-To: <20180605170532.170361-2-ndesaulniers@google.com>
On Tue, 2018-06-05 at 10:05 -0700, Nick Desaulniers wrote:
> Functions marked extern inline do not emit an externally visible
> function when the gnu89 C standard is used. Some KBUILD Makefiles
> overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as without
> an explicit C standard specified, the default is gnu11. Since c99, the
> semantics of extern inline have changed such that an externally visible
> function is always emitted. This can lead to multiple definition errors
> of extern inline functions at link time of compilation units whose build
> files have removed an explicit C standard compiler flag for users of GCC
> 5.1+ or Clang.
[]
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
[]
> @@ -72,17 +72,24 @@
> * -Wunused-function. This turns out to avoid the need for complex #ifdef
> * directives. Suppress the warning in clang as well by using "unused"
> * function attribute, which is redundant but not harmful for gcc.
> + * Prefer gnu_inline, so that extern inline functions do not emit an
> + * externally visible function. This makes extern inline behave as per gnu89
> + * semantics rather than c99. This prevents multiple symbol definition errors
> + * of extern inline functions at link time.
> */
> #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \
> !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
> -#define inline inline __attribute__((always_inline,unused)) notrace
> -#define __inline__ __inline__ __attribute__((always_inline,unused)) notrace
> -#define __inline __inline __attribute__((always_inline,unused)) notrace
> +#define inline \
> + inline __attribute__((always_inline, unused, gnu_inline)) notrace
> +#define __inline__ \
> + __inline__ __attribute__((always_inline, unused, gnu_inline)) notrace
> +#define __inline \
> + __inline __attribute__((always_inline, unused, gnu_inline)) notrace
Perhaps these are simpler as
#define __inline__ inline
#define __inline inline
> #else
> /* A lot of inline functions can cause havoc with function tracing */
> -#define inline inline __attribute__((unused)) notrace
> -#define __inline__ __inline__ __attribute__((unused)) notrace
> -#define __inline __inline __attribute__((unused)) notrace
> +#define inline inline __attribute__((unused, gnu_inline)) notrace
> +#define __inline__ __inline__ __attribute__((unused, gnu_inline)) notrace
> +#define __inline __inline __attribute__((unused, gnu_inline)) notrace
> #endif
And only set once along with:
> #define __always_inline inline __attribute__((always_inline))
And perhaps this __always_inline should be updated
with gnu_inline as well
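One possible shape of the simplification suggested above, as a compilable user-space sketch rather than the kernel header itself: the attribute list is written once and the two alternate spellings simply alias "inline". SKETCH_INLINE_ATTRS and the add_* functions are hypothetical, and the kernel's "notrace" annotation is left out because it has no meaning outside the kernel tree.

```c
#include <assert.h>

/* Hypothetical helper: the shared attribute list, written exactly once. */
#define SKETCH_INLINE_ATTRS __attribute__((unused, gnu_inline))

/*
 * "inline" carries the attributes; the alternate spellings alias it.
 * The preprocessor does not re-expand "inline" inside its own expansion,
 * so there is no infinite recursion.
 */
#define inline     inline SKETCH_INLINE_ATTRS
#define __inline__ inline
#define __inline   inline

/* All three spellings now expand to the same attribute set. */
static inline int add_one(int x)       { return x + 1; }
static __inline__ int add_two(int x)   { return x + 2; }
static __inline int add_three(int x)   { return x + 3; }
```

With this layout, adding or removing an attribute touches a single line, which is the maintenance benefit the suggestion is after.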
* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-05 12:33 UTC
To: Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, qemu-devel, virtualization
In-Reply-To: <fcefe501-b410-f1ec-829c-68199edcdbef@intel.com>
I don't think this is sufficient.
If both primary and standby devices are present, a legacy guest without
support for the feature might see two devices with the same MAC and get
confused.
I think that we should only make the primary visible after the guest has
acked the backup feature bit.
And on reset or when backup is cleared in some other way, unplug the
primary.
Something like the below will do the job:
Primary device is added with a special "primary-failover" flag.
A virtual machine is then initialized with just a standby virtio
device. Primary is not yet added.
Later, QEMU detects that the guest driver has set DRIVER_OK on the device.
It then exposes the primary device to the guest, and triggers
a device addition event (hot-plug event) for it.
If QEMU detects guest driver removal, it initiates a hot-unplug sequence
to remove the primary device. In particular, if QEMU detects guest
re-initialization (e.g. by detecting guest reset) it immediately removes
the primary device.
We can move some of this code to management as well. Architecturally that
does not make too much sense, but it might be easier implementation-wise.
HTH
On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
> Ping on this patch now that the kernel patches are accepted into davem's net-next tree.
> https://patchwork.ozlabs.org/cover/920005/
>
>
> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
> > This feature bit can be used by hypervisor to indicate virtio_net device to
> > act as a standby for another device with the same MAC address.
> >
> > I tested this with a small change to the patch to mark the STANDBY feature 'true'
> > by default as i am using libvirt to start the VMs.
> > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
> > XML file?
> >
> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > ---
> > hw/net/virtio-net.c | 2 ++
> > include/standard-headers/linux/virtio_net.h | 3 +++
> > 2 files changed, 5 insertions(+)
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 90502fca7c..38b3140670 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> > true),
> > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
> > + false),
> > DEFINE_PROP_END_OF_LIST(),
> > };
> > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
> > index e9f255ea3f..01ec09684c 100644
> > --- a/include/standard-headers/linux/virtio_net.h
> > +++ b/include/standard-headers/linux/virtio_net.h
> > @@ -57,6 +57,9 @@
> > * Steering */
> > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
> > + * with the same MAC.
> > + */
> > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
> > #ifndef VIRTIO_NET_NO_LEGACY
* Re: [virtio-dev] Re: [PATCH v3] virtio_pci: support enabling VFs
From: Michael S. Tsirkin @ 2018-06-05 12:23 UTC
To: Tiwei Bie
Cc: alexander.h.duyck, virtio-dev, linux-pci, linux-kernel,
virtualization, stefanha, zhihong.wang, bhelgaas, mark.d.rustad
In-Reply-To: <20180605013653.GA1045@debian>
On Tue, Jun 05, 2018 at 09:36:53AM +0800, Tiwei Bie wrote:
> On Mon, Jun 04, 2018 at 07:32:25PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Jun 01, 2018 at 12:02:39PM +0800, Tiwei Bie wrote:
> > > There is a new feature bit allocated in virtio spec to
> > > support SR-IOV (Single Root I/O Virtualization):
> > >
> > > https://github.com/oasis-tcs/virtio-spec/issues/11
> > >
> > > This patch enables the support for this feature bit in
> > > virtio driver.
> > >
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > ---
> >
> > OK but what about freeze/restore functions?
So for restore, don't you need to restore the
sriov capability?
> > I also wonder about kexec - virtio.c currently does:
> >
> > /* We always start by resetting the device, in case a previous
> > * driver messed it up. This also tests that code path a little. */
> > dev->config->reset(dev);
> >
> > Do we need to do something like this for sriov?
>
> I think VFs are managed by PCI core. Once they are
> allocated, virtio driver doesn't have to care too
> much about how to manage them. The proposal for the
> spec is just to provide a feature bit based virtio
> way for virtio drivers to know whether a virtio
> device is SR-IOV capable (and virtio drivers can
> support configuring SR-IOV based on the feature
> bit negotiation result).
>
> >
> > I also wonder whether PCI core should disable sriov for us.
> >
> >
> > I wish there was a patch emulating this without vDPA for QEMU,
> > would make it easy to test your patches. Do you happen
> > to have something like this?
>
> Sorry, currently I don't have anything like this..
>
> Best regards,
> Tiwei Bie
>
> >
> > Thanks,
> >
> >
> > > v3:
> > > - Drop the acks;
> > >
> > > v2:
> > > - Disable VFs when unbinding the driver (Alex, MST);
> > > - Don't use pci_sriov_configure_simple (Alex);
> > >
> > > drivers/virtio/virtio_pci_common.c | 30 ++++++++++++++++++++++++++++++
> > > drivers/virtio/virtio_pci_modern.c | 14 ++++++++++++++
> > > include/uapi/linux/virtio_config.h | 7 ++++++-
> > > 3 files changed, 50 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > index 48d4d1cf1cb6..1d4467b2dc31 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -577,6 +577,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > struct device *dev = get_device(&vp_dev->vdev.dev);
> > >
> > > + pci_disable_sriov(pci_dev);
> > > +
> > > unregister_virtio_device(&vp_dev->vdev);
> > >
> > > if (vp_dev->ioaddr)
> > > @@ -588,6 +590,33 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > put_device(dev);
> > > }
> > >
> > > +static int virtio_pci_sriov_configure(struct pci_dev *pci_dev, int num_vfs)
> > > +{
> > > + struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > + struct virtio_device *vdev = &vp_dev->vdev;
> > > + int ret;
> > > +
> > > + if (!(vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_DRIVER_OK))
> > > + return -EBUSY;
> > > +
> > > + if (!__virtio_test_bit(vdev, VIRTIO_F_SR_IOV))
> > > + return -EINVAL;
> > > +
> > > + if (pci_vfs_assigned(pci_dev))
> > > + return -EPERM;
> > > +
> > > + if (num_vfs == 0) {
> > > + pci_disable_sriov(pci_dev);
> > > + return 0;
> > > + }
> > > +
> > > + ret = pci_enable_sriov(pci_dev, num_vfs);
> > > + if (ret < 0)
> > > + return ret;
> > > +
> > > + return num_vfs;
> > > +}
> > > +
> > > static struct pci_driver virtio_pci_driver = {
> > > .name = "virtio-pci",
> > > .id_table = virtio_pci_id_table,
> > > @@ -596,6 +625,7 @@ static struct pci_driver virtio_pci_driver = {
> > > #ifdef CONFIG_PM_SLEEP
> > > .driver.pm = &virtio_pci_pm_ops,
> > > #endif
> > > + .sriov_configure = virtio_pci_sriov_configure,
> > > };
> > >
> > > module_pci_driver(virtio_pci_driver);
> > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > index 2555d80f6eec..07571daccfec 100644
> > > --- a/drivers/virtio/virtio_pci_modern.c
> > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > @@ -153,14 +153,28 @@ static u64 vp_get_features(struct virtio_device *vdev)
> > > return features;
> > > }
> > >
> > > +static void vp_transport_features(struct virtio_device *vdev, u64 features)
> > > +{
> > > + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > + struct pci_dev *pci_dev = vp_dev->pci_dev;
> > > +
> > > + if ((features & BIT_ULL(VIRTIO_F_SR_IOV)) &&
> > > + pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV))
> > > + __virtio_set_bit(vdev, VIRTIO_F_SR_IOV);
> > > +}
> > > +
> > > /* virtio config->finalize_features() implementation */
> > > static int vp_finalize_features(struct virtio_device *vdev)
> > > {
> > > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > + u64 features = vdev->features;
> > >
> > > /* Give virtio_ring a chance to accept features. */
> > > vring_transport_features(vdev);
> > >
> > > + /* Give virtio_pci a chance to accept features. */
> > > + vp_transport_features(vdev, features);
> > > +
> > > if (!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> > > dev_err(&vdev->dev, "virtio: device uses modern interface "
> > > "but does not have VIRTIO_F_VERSION_1\n");
> > > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > > index 308e2096291f..b7c1f4e7d59e 100644
> > > --- a/include/uapi/linux/virtio_config.h
> > > +++ b/include/uapi/linux/virtio_config.h
> > > @@ -49,7 +49,7 @@
> > > * transport being used (eg. virtio_ring), the rest are per-device feature
> > > * bits. */
> > > #define VIRTIO_TRANSPORT_F_START 28
> > > -#define VIRTIO_TRANSPORT_F_END 34
> > > +#define VIRTIO_TRANSPORT_F_END 38
> > >
> > > #ifndef VIRTIO_CONFIG_NO_LEGACY
> > > /* Do we get callbacks when the ring is completely used, even if we've
> > > @@ -71,4 +71,9 @@
> > > * this is for compatibility with legacy systems.
> > > */
> > > #define VIRTIO_F_IOMMU_PLATFORM 33
> > > +
> > > +/*
> > > + * Does the device support Single Root I/O Virtualization?
> > > + */
> > > +#define VIRTIO_F_SR_IOV 37
> > > #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> > > --
> > > 2.17.0
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
^ permalink raw reply
* [RFC v6 5/5] virtio_ring: enable packed ring
From: Tiwei Bie @ 2018-06-05 7:40 UTC
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180605074046.20709-1-tiwei.bie@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
drivers/s390/virtio/virtio_ccw.c | 8 ++++++++
drivers/virtio/virtio_ring.c | 2 ++
2 files changed, 10 insertions(+)
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 8f5c1d7f751a..ff5b85736d8d 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -765,6 +765,11 @@ static u64 virtio_ccw_get_features(struct virtio_device *vdev)
return rc;
}
+static void ccw_transport_features(struct virtio_device *vdev)
+{
+ __virtio_clear_bit(vdev, VIRTIO_F_RING_PACKED);
+}
+
static int virtio_ccw_finalize_features(struct virtio_device *vdev)
{
struct virtio_ccw_device *vcdev = to_vc_device(vdev);
@@ -791,6 +796,9 @@ static int virtio_ccw_finalize_features(struct virtio_device *vdev)
/* Give virtio_ring a chance to accept features. */
vring_transport_features(vdev);
+ /* Give virtio_ccw a chance to accept features. */
+ ccw_transport_features(vdev);
+
features->index = 0;
features->features = cpu_to_le32((u32)vdev->features);
/* Write the first half of the feature bits to the host. */
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index aefd7ac40928..fe849fd8733b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1965,6 +1965,8 @@ void vring_transport_features(struct virtio_device *vdev)
break;
case VIRTIO_F_IOMMU_PLATFORM:
break;
+ case VIRTIO_F_RING_PACKED:
+ break;
default:
/* We don't understand this bit. */
__virtio_clear_bit(vdev, i);
--
2.17.0
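The effect of this patch can be modelled as two filters over a 64-bit feature word: the ring code keeps only the transport bits it knows, and a transport that cannot handle the packed ring then clears that bit itself. The sketch below is an illustration, not the driver code. The function names are hypothetical, the bit numbers 28, 29, 32, 33 and 34 are taken from the virtio specification rather than from this patch, and the upper bound of 38 is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TRANSPORT_F_START 28
#define SKETCH_TRANSPORT_F_END   38  /* assumed exclusive upper bound */

#define SKETCH_RING_F_INDIRECT_DESC 28
#define SKETCH_RING_F_EVENT_IDX     29
#define SKETCH_F_VERSION_1          32
#define SKETCH_F_IOMMU_PLATFORM     33
#define SKETCH_F_RING_PACKED        34

/* Ring layer: clear every transport bit that is not on the known list. */
static uint64_t sketch_vring_transport_features(uint64_t features)
{
	unsigned int bit;

	for (bit = SKETCH_TRANSPORT_F_START; bit < SKETCH_TRANSPORT_F_END; bit++) {
		switch (bit) {
		case SKETCH_RING_F_INDIRECT_DESC:
		case SKETCH_RING_F_EVENT_IDX:
		case SKETCH_F_VERSION_1:
		case SKETCH_F_IOMMU_PLATFORM:
		case SKETCH_F_RING_PACKED:   /* newly accepted by this patch */
			break;
		default:
			features &= ~(1ULL << bit);
		}
	}
	return features;
}

/* Transport layer (ccw in the patch): opt out of the packed ring. */
static uint64_t sketch_ccw_transport_features(uint64_t features)
{
	return features & ~(1ULL << SKETCH_F_RING_PACKED);
}
```

The order matters: the transport-specific filter runs after the ring filter, so it can veto a bit the ring layer has just accepted.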
* [RFC v6 4/5] virtio_ring: add event idx support in packed ring
From: Tiwei Bie @ 2018-06-05 7:40 UTC
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180605074046.20709-1-tiwei.bie@intel.com>
This commit introduces the EVENT_IDX support in packed ring.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
drivers/virtio/virtio_ring.c | 74 ++++++++++++++++++++++++++++++++----
1 file changed, 67 insertions(+), 7 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 983ce5ffda1b..aefd7ac40928 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1048,7 +1048,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- u16 flags;
+ u16 new, old, off_wrap, flags, wrap_counter, event_idx;
bool needs_kick;
u32 snapshot;
@@ -1057,9 +1057,19 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
* suppressions. */
virtio_mb(vq->weak_barriers);
+ old = vq->next_avail_idx - vq->num_added;
+ new = vq->next_avail_idx;
+ vq->num_added = 0;
+
snapshot = *(u32 *)vq->vring_packed.device;
+ off_wrap = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot & 0xffff));
flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
+ wrap_counter = off_wrap >> 15;
+ event_idx = off_wrap & ~(1<<15);
+ if (wrap_counter != vq->avail_wrap_counter)
+ event_idx -= vq->vring_packed.num;
+
#ifdef DEBUG
if (vq->last_add_time_valid) {
WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
@@ -1068,7 +1078,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
vq->last_add_time_valid = false;
#endif
- needs_kick = (flags != VRING_EVENT_F_DISABLE);
+ if (flags == VRING_EVENT_F_DESC)
+ needs_kick = vring_need_event(event_idx, new, old);
+ else
+ needs_kick = (flags != VRING_EVENT_F_DISABLE);
END_USE(vq);
return needs_kick;
}
@@ -1177,6 +1190,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
ret = vq->desc_state_packed[id].data;
detach_buf_packed(vq, id, ctx);
+ /* If we expect an interrupt for the next entry, tell host
+ * by writing event index and flush out the write before
+ * the read in the next get_buf call. */
+ if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
+ virtio_store_mb(vq->weak_barriers,
+ &vq->vring_packed.driver->off_wrap,
+ cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
+ ((u16)vq->used_wrap_counter << 15)));
+
#ifdef DEBUG
vq->last_add_time_valid = false;
#endif
@@ -1204,9 +1226,20 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
/* We optimistically turn back on interrupts, then check if there was
* more to do. */
+ /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
+ * either clear the flags bit or point the event index at the next
+ * entry. Always update the event index to keep code simple. */
+
+ vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+ vq->last_used_idx |
+ ((u16)vq->used_wrap_counter << 15));
if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
- vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+ /* We need to update event offset and event wrap
+ * counter first before updating event flags. */
+ virtio_wmb(vq->weak_barriers);
+ vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
+ VRING_EVENT_F_ENABLE;
vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
vq->event_flags_shadow);
}
@@ -1232,21 +1265,48 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 bufs, used_idx, wrap_counter;
START_USE(vq);
/* We optimistically turn back on interrupts, then check if there was
* more to do. */
+ /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
+ * either clear the flags bit or point the event index at the next
+ * entry. Always update the event index to keep code simple. */
+
+ /* TODO: tune this threshold */
+ if (vq->next_avail_idx < vq->last_used_idx)
+ bufs = (vq->vring_packed.num + vq->next_avail_idx -
+ vq->last_used_idx) * 3 / 4;
+ else
+ bufs = (vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
+
+ wrap_counter = vq->used_wrap_counter;
+
+ used_idx = vq->last_used_idx + bufs;
+ if (used_idx >= vq->vring_packed.num) {
+ used_idx -= vq->vring_packed.num;
+ wrap_counter ^= 1;
+ }
+
+ vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+ used_idx | (wrap_counter << 15));
if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
- vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+ /* We need to update event offset and event wrap
+ * counter first before updating event flags. */
+ virtio_wmb(vq->weak_barriers);
+ vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
+ VRING_EVENT_F_ENABLE;
vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
vq->event_flags_shadow);
- /* We need to enable interrupts first before re-checking
- * for more used buffers. */
- virtio_mb(vq->weak_barriers);
}
+ /* We need to update event suppression structure first
+ * before re-checking for more used buffers. */
+ virtio_mb(vq->weak_barriers);
+
if (more_used_packed(vq)) {
END_USE(vq);
return false;
--
2.17.0
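Two pieces of arithmetic in this patch are easy to check in isolation: the off_wrap encoding (a 15-bit index with the wrap counter in bit 15) and the 3/4 threshold used when callbacks are re-enabled with a delay. The following user-space sketch mirrors that arithmetic under stated assumptions: all sketch_* names are hypothetical, the ring size is assumed to be at most 32768 so the index fits in 15 bits, and sketch_need_event() follows the usual event-index comparison used by vring_need_event().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a ring index and a wrap counter into one 16-bit off_wrap value. */
static uint16_t sketch_pack_off_wrap(uint16_t idx, bool wrap)
{
	assert(idx < 0x8000);
	return (uint16_t)(idx | ((uint16_t)wrap << 15));
}

static uint16_t sketch_off(uint16_t off_wrap)  { return off_wrap & 0x7fff; }
static bool     sketch_wrap(uint16_t off_wrap) { return (off_wrap >> 15) != 0; }

/* True if moving from old_idx to new_idx stepped past event_idx. */
static bool sketch_need_event(uint16_t event_idx, uint16_t new_idx,
			      uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

struct sketch_event {
	uint16_t idx;   /* ring slot at which an interrupt is requested */
	bool     wrap;  /* wrap counter that goes with that slot */
};

/* Ask for an interrupt after 3/4 of the outstanding buffers are used. */
static struct sketch_event sketch_delayed_event(uint16_t ring_size,
						uint16_t next_avail,
						uint16_t last_used,
						bool used_wrap)
{
	uint16_t pending, idx;

	if (next_avail < last_used)
		pending = ring_size + next_avail - last_used;
	else
		pending = next_avail - last_used;

	idx = last_used + pending * 3 / 4;
	if (idx >= ring_size) {       /* crossed the end of the ring */
		idx -= ring_size;
		used_wrap = !used_wrap;
	}
	return (struct sketch_event){ .idx = idx, .wrap = used_wrap };
}
```

For a 256-entry ring with next_avail = 10 and last_used = 250, 16 buffers are outstanding, so the event is placed 12 slots ahead at index 6 with the wrap counter flipped.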
* [RFC v6 3/5] virtio_ring: add packed ring support
From: Tiwei Bie @ 2018-06-05 7:40 UTC
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180605074046.20709-1-tiwei.bie@intel.com>
This commit introduces the support (without EVENT_IDX) for
packed ring.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
drivers/virtio/virtio_ring.c | 486 ++++++++++++++++++++++++++++++++++-
1 file changed, 479 insertions(+), 7 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 416c33143125..983ce5ffda1b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -62,6 +62,12 @@ struct vring_desc_state {
};
struct vring_desc_state_packed {
+ void *data; /* Data for callback. */
+ struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
+ int num; /* Descriptor list length. */
+ dma_addr_t addr; /* Buffer DMA addr. */
+ u32 len; /* Buffer length. */
+ u16 flags; /* Descriptor flags. */
int next; /* The next desc state. */
};
@@ -661,7 +667,6 @@ static bool virtqueue_poll_split(struct virtqueue *_vq, unsigned last_used_idx)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- virtio_mb(vq->weak_barriers);
return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
}
@@ -758,6 +763,72 @@ static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
}
+static void vring_unmap_state_packed(const struct vring_virtqueue *vq,
+ struct vring_desc_state_packed *state)
+{
+ u16 flags;
+
+ if (!vring_use_dma_api(vq->vq.vdev))
+ return;
+
+ flags = state->flags;
+
+ if (flags & VRING_DESC_F_INDIRECT) {
+ dma_unmap_single(vring_dma_dev(vq),
+ state->addr, state->len,
+ (flags & VRING_DESC_F_WRITE) ?
+ DMA_FROM_DEVICE : DMA_TO_DEVICE);
+ } else {
+ dma_unmap_page(vring_dma_dev(vq),
+ state->addr, state->len,
+ (flags & VRING_DESC_F_WRITE) ?
+ DMA_FROM_DEVICE : DMA_TO_DEVICE);
+ }
+}
+
+static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
+ struct vring_packed_desc *desc)
+{
+ u16 flags;
+
+ if (!vring_use_dma_api(vq->vq.vdev))
+ return;
+
+ flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+
+ if (flags & VRING_DESC_F_INDIRECT) {
+ dma_unmap_single(vring_dma_dev(vq),
+ virtio64_to_cpu(vq->vq.vdev, desc->addr),
+ virtio32_to_cpu(vq->vq.vdev, desc->len),
+ (flags & VRING_DESC_F_WRITE) ?
+ DMA_FROM_DEVICE : DMA_TO_DEVICE);
+ } else {
+ dma_unmap_page(vring_dma_dev(vq),
+ virtio64_to_cpu(vq->vq.vdev, desc->addr),
+ virtio32_to_cpu(vq->vq.vdev, desc->len),
+ (flags & VRING_DESC_F_WRITE) ?
+ DMA_FROM_DEVICE : DMA_TO_DEVICE);
+ }
+}
+
+static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
+ unsigned int total_sg,
+ gfp_t gfp)
+{
+ struct vring_packed_desc *desc;
+
+ /*
+ * We require lowmem mappings for the descriptors because
+ * otherwise virt_to_phys will give us bogus addresses in the
+ * virtqueue.
+ */
+ gfp &= ~__GFP_HIGHMEM;
+
+ desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
+
+ return desc;
+}
+
static inline int virtqueue_add_packed(struct virtqueue *_vq,
struct scatterlist *sgs[],
unsigned int total_sg,
@@ -767,47 +838,445 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
void *ctx,
gfp_t gfp)
{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ struct vring_packed_desc *desc;
+ struct scatterlist *sg;
+ unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
+ __virtio16 uninitialized_var(head_flags), flags;
+ u16 head, avail_wrap_counter, id, curr;
+ bool indirect;
+
+ START_USE(vq);
+
+ BUG_ON(data == NULL);
+ BUG_ON(ctx && vq->indirect);
+
+ if (unlikely(vq->broken)) {
+ END_USE(vq);
+ return -EIO;
+ }
+
+#ifdef DEBUG
+ {
+ ktime_t now = ktime_get();
+
+ /* No kick or get, with .1 second between? Warn. */
+ if (vq->last_add_time_valid)
+ WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
+ > 100);
+ vq->last_add_time = now;
+ vq->last_add_time_valid = true;
+ }
+#endif
+
+ BUG_ON(total_sg == 0);
+
+ head = vq->next_avail_idx;
+ avail_wrap_counter = vq->avail_wrap_counter;
+
+ if (virtqueue_use_indirect(_vq, total_sg))
+ desc = alloc_indirect_packed(_vq, total_sg, gfp);
+ else {
+ desc = NULL;
+ WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
+ }
+
+ if (desc) {
+ /* Use a single buffer which doesn't continue */
+ indirect = true;
+ /* Set up rest to use this indirect table. */
+ i = 0;
+ descs_used = 1;
+ } else {
+ indirect = false;
+ desc = vq->vring_packed.desc;
+ i = head;
+ descs_used = total_sg;
+ }
+
+ if (vq->vq.num_free < descs_used) {
+ pr_debug("Can't add buf len %i - avail = %i\n",
+ descs_used, vq->vq.num_free);
+ /* FIXME: for historical reasons, we force a notify here if
+ * there are outgoing parts to the buffer. Presumably the
+ * host should service the ring ASAP. */
+ if (out_sgs)
+ vq->notify(&vq->vq);
+ if (indirect)
+ kfree(desc);
+ END_USE(vq);
+ return -ENOSPC;
+ }
+
+ id = vq->free_head;
+ BUG_ON(id == vq->vring_packed.num);
+
+ curr = id;
+ for (n = 0; n < out_sgs + in_sgs; n++) {
+ for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+ dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
+ DMA_TO_DEVICE : DMA_FROM_DEVICE);
+ if (vring_mapping_error(vq, addr))
+ goto unmap_release;
+
+ flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
+ (n < out_sgs ? 0 : VRING_DESC_F_WRITE) |
+ VRING_DESC_F_AVAIL(vq->avail_wrap_counter) |
+ VRING_DESC_F_USED(!vq->avail_wrap_counter));
+ if (!indirect && i == head)
+ head_flags = flags;
+ else
+ desc[i].flags = flags;
+
+ desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
+ desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
+ i++;
+ if (!indirect) {
+ if (vring_use_dma_api(_vq->vdev)) {
+ vq->desc_state_packed[curr].addr = addr;
+ vq->desc_state_packed[curr].len =
+ sg->length;
+ vq->desc_state_packed[curr].flags =
+ virtio16_to_cpu(_vq->vdev,
+ flags);
+ }
+ curr = vq->desc_state_packed[curr].next;
+
+ if (i >= vq->vring_packed.num) {
+ i = 0;
+ vq->avail_wrap_counter ^= 1;
+ }
+ }
+ }
+ }
+
+ prev = (i > 0 ? i : vq->vring_packed.num) - 1;
+ desc[prev].id = cpu_to_virtio16(_vq->vdev, id);
+
+ /* Last one doesn't continue. */
+ if (total_sg == 1)
+ head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
+ else
+ desc[prev].flags &= cpu_to_virtio16(_vq->vdev,
+ ~VRING_DESC_F_NEXT);
+
+ if (indirect) {
+ /* Now that the indirect table is filled in, map it. */
+ dma_addr_t addr = vring_map_single(
+ vq, desc, total_sg * sizeof(struct vring_packed_desc),
+ DMA_TO_DEVICE);
+ if (vring_mapping_error(vq, addr))
+ goto unmap_release;
+
+ head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
+ VRING_DESC_F_AVAIL(avail_wrap_counter) |
+ VRING_DESC_F_USED(!avail_wrap_counter));
+ vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev,
+ addr);
+ vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
+ total_sg * sizeof(struct vring_packed_desc));
+ vq->vring_packed.desc[head].id = cpu_to_virtio16(_vq->vdev, id);
+
+ if (vring_use_dma_api(_vq->vdev)) {
+ vq->desc_state_packed[id].addr = addr;
+ vq->desc_state_packed[id].len = total_sg *
+ sizeof(struct vring_packed_desc);
+ vq->desc_state_packed[id].flags =
+ virtio16_to_cpu(_vq->vdev, head_flags);
+ }
+ }
+
+ /* We're using some buffers from the free list. */
+ vq->vq.num_free -= descs_used;
+
+ /* Update free pointer */
+ if (indirect) {
+ n = head + 1;
+ if (n >= vq->vring_packed.num) {
+ n = 0;
+ vq->avail_wrap_counter ^= 1;
+ }
+ vq->next_avail_idx = n;
+ vq->free_head = vq->desc_state_packed[id].next;
+ } else {
+ vq->next_avail_idx = i;
+ vq->free_head = curr;
+ }
+
+ /* Store token and indirect buffer state. */
+ vq->desc_state_packed[id].num = descs_used;
+ vq->desc_state_packed[id].data = data;
+ if (indirect)
+ vq->desc_state_packed[id].indir_desc = desc;
+ else
+ vq->desc_state_packed[id].indir_desc = ctx;
+
+ /* A driver MUST NOT make the first descriptor in the list
+ * available before all subsequent descriptors comprising
+ * the list are made available. */
+ virtio_wmb(vq->weak_barriers);
+ vq->vring_packed.desc[head].flags = head_flags;
+ vq->num_added += descs_used;
+
+ pr_debug("Added buffer head %i to %p\n", head, vq);
+ END_USE(vq);
+
+ return 0;
+
+unmap_release:
+ err_idx = i;
+ i = head;
+
+ for (n = 0; n < total_sg; n++) {
+ if (i == err_idx)
+ break;
+ vring_unmap_desc_packed(vq, &desc[i]);
+ i++;
+ if (!indirect && i >= vq->vring_packed.num)
+ i = 0;
+ }
+
+ vq->avail_wrap_counter = avail_wrap_counter;
+
+ if (indirect)
+ kfree(desc);
+
+ END_USE(vq);
return -EIO;
}
static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
{
- return false;
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 flags;
+ bool needs_kick;
+ u32 snapshot;
+
+ START_USE(vq);
+ /* We need to expose the new flags value before checking notification
+ * suppressions. */
+ virtio_mb(vq->weak_barriers);
+
+ snapshot = *(u32 *)vq->vring_packed.device;
+ flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
+
+#ifdef DEBUG
+ if (vq->last_add_time_valid) {
+ WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
+ vq->last_add_time)) > 100);
+ }
+ vq->last_add_time_valid = false;
+#endif
+
+ needs_kick = (flags != VRING_EVENT_F_DISABLE);
+ END_USE(vq);
+ return needs_kick;
+}
+
+static void detach_buf_packed(struct vring_virtqueue *vq,
+ unsigned int id, void **ctx)
+{
+ struct vring_desc_state_packed *state = NULL;
+ struct vring_packed_desc *desc;
+ unsigned int curr, i;
+
+ /* Clear data ptr. */
+ vq->desc_state_packed[id].data = NULL;
+
+ curr = id;
+ for (i = 0; i < vq->desc_state_packed[id].num; i++) {
+ state = &vq->desc_state_packed[curr];
+ vring_unmap_state_packed(vq, state);
+ curr = state->next;
+ }
+
+ BUG_ON(state == NULL);
+ vq->vq.num_free += vq->desc_state_packed[id].num;
+ state->next = vq->free_head;
+ vq->free_head = id;
+
+ if (vq->indirect) {
+ u32 len;
+
+ /* Free the indirect table, if any, now that it's unmapped. */
+ desc = vq->desc_state_packed[id].indir_desc;
+ if (!desc)
+ return;
+
+ if (vring_use_dma_api(vq->vq.vdev)) {
+ len = vq->desc_state_packed[id].len;
+ for (i = 0; i < len / sizeof(struct vring_packed_desc);
+ i++)
+ vring_unmap_desc_packed(vq, &desc[i]);
+ }
+ kfree(desc);
+ vq->desc_state_packed[id].indir_desc = NULL;
+ } else if (ctx) {
+ *ctx = vq->desc_state_packed[id].indir_desc;
+ }
}
static inline bool more_used_packed(const struct vring_virtqueue *vq)
{
- return false;
+ u16 last_used, flags;
+ u8 avail, used;
+
+ last_used = vq->last_used_idx;
+ flags = virtio16_to_cpu(vq->vq.vdev,
+ vq->vring_packed.desc[last_used].flags);
+ avail = !!(flags & VRING_DESC_F_AVAIL(1));
+ used = !!(flags & VRING_DESC_F_USED(1));
+
+ return avail == used && used == vq->used_wrap_counter;
}
static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
unsigned int *len,
void **ctx)
{
- return NULL;
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 last_used, id;
+ void *ret;
+
+ START_USE(vq);
+
+ if (unlikely(vq->broken)) {
+ END_USE(vq);
+ return NULL;
+ }
+
+ if (!more_used_packed(vq)) {
+ pr_debug("No more buffers in queue\n");
+ END_USE(vq);
+ return NULL;
+ }
+
+ /* Only get used elements after they have been exposed by host. */
+ virtio_rmb(vq->weak_barriers);
+
+ last_used = vq->last_used_idx;
+ id = virtio16_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
+ *len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
+
+ if (unlikely(id >= vq->vring_packed.num)) {
+ BAD_RING(vq, "id %u out of range\n", id);
+ return NULL;
+ }
+ if (unlikely(!vq->desc_state_packed[id].data)) {
+ BAD_RING(vq, "id %u is not a head!\n", id);
+ return NULL;
+ }
+
+ vq->last_used_idx += vq->desc_state_packed[id].num;
+ if (vq->last_used_idx >= vq->vring_packed.num) {
+ vq->last_used_idx -= vq->vring_packed.num;
+ vq->used_wrap_counter ^= 1;
+ }
+
+ /* detach_buf_packed clears data, so grab it now. */
+ ret = vq->desc_state_packed[id].data;
+ detach_buf_packed(vq, id, ctx);
+
+#ifdef DEBUG
+ vq->last_add_time_valid = false;
+#endif
+
+ END_USE(vq);
+ return ret;
}
static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ if (vq->event_flags_shadow != VRING_EVENT_F_DISABLE) {
+ vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
+ vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
+ vq->event_flags_shadow);
+ }
}
static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
{
- return 0;
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ START_USE(vq);
+
+ /* We optimistically turn back on interrupts, then check if there was
+ * more to do. */
+
+ if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
+ vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+ vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
+ vq->event_flags_shadow);
+ }
+
+ END_USE(vq);
+ return vq->last_used_idx;
}
static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
{
- return false;
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u8 avail, used;
+ u16 flags;
+
+ flags = virtio16_to_cpu(vq->vq.vdev,
+ vq->vring_packed.desc[last_used_idx].flags);
+ avail = !!(flags & VRING_DESC_F_AVAIL(1));
+ used = !!(flags & VRING_DESC_F_USED(1));
+
+ return avail == used && used == vq->used_wrap_counter;
}
static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
{
- return false;
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ START_USE(vq);
+
+ /* We optimistically turn back on interrupts, then check if there was
+ * more to do. */
+
+ if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
+ vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+ vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
+ vq->event_flags_shadow);
+ /* We need to enable interrupts first before re-checking
+ * for more used buffers. */
+ virtio_mb(vq->weak_barriers);
+ }
+
+ if (more_used_packed(vq)) {
+ END_USE(vq);
+ return false;
+ }
+
+ END_USE(vq);
+ return true;
}
static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ unsigned int i;
+ void *buf;
+
+ START_USE(vq);
+
+ for (i = 0; i < vq->vring_packed.num; i++) {
+ if (!vq->desc_state_packed[i].data)
+ continue;
+ /* detach_buf clears data, so grab it now. */
+ buf = vq->desc_state_packed[i].data;
+ detach_buf_packed(vq, i, NULL);
+ END_USE(vq);
+ return buf;
+ }
+ /* That should have freed everything. */
+ BUG_ON(vq->vq.num_free != vq->vring_packed.num);
+
+ END_USE(vq);
return NULL;
}
@@ -1084,6 +1553,9 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
{
struct vring_virtqueue *vq = to_vvq(_vq);
+ /* We need to enable interrupts first before re-checking
+ * for more used buffers. */
+ virtio_mb(vq->weak_barriers);
return vq->packed ? virtqueue_poll_packed(_vq, last_used_idx) :
virtqueue_poll_split(_vq, last_used_idx);
}
--
2.17.0
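
For readers following the packed ring logic above, the used-descriptor test in more_used_packed() and the index wrap in virtqueue_get_buf_ctx_packed() can be sketched in isolation. This is a minimal illustration, not the patch's code: the demo_* names are hypothetical, and the bit positions (AVAIL at bit 7, USED at bit 15) are assumed from the virtio 1.1 packed ring layout rather than taken from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag bit positions: AVAIL is bit 7, USED is bit 15. */
#define DEMO_DESC_F_AVAIL(b) ((uint16_t)((b) << 7))
#define DEMO_DESC_F_USED(b)  ((uint16_t)((b) << 15))

/*
 * A descriptor counts as used when its AVAIL and USED bits are equal
 * and both match the driver's current used wrap counter.
 */
static bool demo_desc_is_used(uint16_t flags, bool used_wrap_counter)
{
	bool avail = !!(flags & DEMO_DESC_F_AVAIL(1));
	bool used = !!(flags & DEMO_DESC_F_USED(1));

	return avail == used && used == used_wrap_counter;
}

/*
 * Advance a ring index by count descriptors. When the index passes the
 * end of the ring it wraps around and the wrap counter is toggled.
 */
static uint16_t demo_advance(uint16_t idx, uint16_t count, uint16_t num,
			     bool *wrap_counter)
{
	unsigned int next;

	assert(num != 0 && idx < num && count <= num);

	next = (unsigned int)idx + count;
	if (next >= num) {
		next -= num;
		*wrap_counter = !*wrap_counter;
	}
	return (uint16_t)next;
}
```

Toggling the counter on each wrap is what lets the same two flag bits distinguish a freshly used descriptor from a stale one left over from the previous pass through the ring.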
* [RFC v6 2/5] virtio_ring: support creating packed ring
From: Tiwei Bie @ 2018-06-05 7:40 UTC (permalink / raw)
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180605074046.20709-1-tiwei.bie@intel.com>
This commit introduces support for creating a packed ring.
All split ring specific functions gain a _split suffix.
Some necessary stubs for the packed ring are also added.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
drivers/virtio/virtio_ring.c | 801 +++++++++++++++++++++++------------
include/linux/virtio_ring.h | 8 +-
2 files changed, 546 insertions(+), 263 deletions(-)
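
The refactoring described in the commit message follows one pattern throughout: each public entry point keeps its name and selects a _split or _packed implementation from a per-queue flag. A minimal sketch of that pattern, assuming a cut-down hypothetical struct (demo_vq) in place of struct vring_virtqueue and a placeholder condition in place of the real split ring logic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vring_virtqueue. */
struct demo_vq {
	bool packed;			/* Is this a packed ring? */
	unsigned int last_used_idx;	/* Placeholder state only. */
};

/* Split ring variant: in the patch this holds the existing logic. */
static bool demo_more_used_split(const struct demo_vq *vq)
{
	return vq->last_used_idx != 0;	/* Placeholder condition. */
}

/* Packed ring stub: reports "nothing to do" until it is implemented. */
static bool demo_more_used_packed(const struct demo_vq *vq)
{
	(void)vq;
	return false;
}

/* Entry point: callers never need to know which layout is in use. */
static bool demo_more_used(const struct demo_vq *vq)
{
	assert(vq != NULL);
	return vq->packed ? demo_more_used_packed(vq) :
			    demo_more_used_split(vq);
}
```

With this shape the stubs can be filled in by a later patch without touching any caller, which is how the series replaces bodies such as "return false;" afterwards.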
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71458f493cf8..416c33143125 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -61,11 +61,15 @@ struct vring_desc_state {
struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
};
+struct vring_desc_state_packed {
+ int next; /* The next desc state. */
+};
+
struct vring_virtqueue {
struct virtqueue vq;
- /* Actual memory layout for this queue */
- struct vring vring;
+ /* Is this a packed ring? */
+ bool packed;
/* Can we use weak barriers? */
bool weak_barriers;
@@ -87,11 +91,39 @@ struct vring_virtqueue {
/* Last used index we've seen. */
u16 last_used_idx;
- /* Last written value to avail->flags */
- u16 avail_flags_shadow;
+ union {
+ /* Available for split ring */
+ struct {
+ /* Actual memory layout for this queue. */
+ struct vring vring;
- /* Last written value to avail->idx in guest byte order */
- u16 avail_idx_shadow;
+ /* Last written value to avail->flags */
+ u16 avail_flags_shadow;
+
+ /* Last written value to avail->idx in
+ * guest byte order. */
+ u16 avail_idx_shadow;
+ };
+
+ /* Available for packed ring */
+ struct {
+ /* Actual memory layout for this queue. */
+ struct vring_packed vring_packed;
+
+ /* Driver ring wrap counter. */
+ bool avail_wrap_counter;
+
+ /* Device ring wrap counter. */
+ bool used_wrap_counter;
+
+ /* Index of the next avail descriptor. */
+ u16 next_avail_idx;
+
+ /* Last written value to driver->flags in
+ * guest byte order. */
+ u16 event_flags_shadow;
+ };
+ };
/* How to notify other side. FIXME: commonalize hcalls! */
bool (*notify)(struct virtqueue *vq);
@@ -111,11 +143,24 @@ struct vring_virtqueue {
#endif
/* Per-descriptor state. */
- struct vring_desc_state desc_state[];
+ union {
+ struct vring_desc_state desc_state[1];
+ struct vring_desc_state_packed desc_state_packed[1];
+ };
};
#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
+static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
+ unsigned int total_sg)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ /* If the host supports indirect descriptor tables, and we have multiple
+ * buffers, then go indirect. FIXME: tune this threshold */
+ return (vq->indirect && total_sg > 1 && vq->vq.num_free);
+}
+
/*
* Modern virtio devices have feature bits to specify whether they need a
* quirk and bypass the IOMMU. If not there, just use the DMA API.
@@ -201,8 +246,17 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
cpu_addr, size, direction);
}
-static void vring_unmap_one(const struct vring_virtqueue *vq,
- struct vring_desc *desc)
+static int vring_mapping_error(const struct vring_virtqueue *vq,
+ dma_addr_t addr)
+{
+ if (!vring_use_dma_api(vq->vq.vdev))
+ return 0;
+
+ return dma_mapping_error(vring_dma_dev(vq), addr);
+}
+
+static void vring_unmap_one_split(const struct vring_virtqueue *vq,
+ struct vring_desc *desc)
{
u16 flags;
@@ -226,17 +280,9 @@ static void vring_unmap_one(const struct vring_virtqueue *vq,
}
}
-static int vring_mapping_error(const struct vring_virtqueue *vq,
- dma_addr_t addr)
-{
- if (!vring_use_dma_api(vq->vq.vdev))
- return 0;
-
- return dma_mapping_error(vring_dma_dev(vq), addr);
-}
-
-static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
- unsigned int total_sg, gfp_t gfp)
+static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+ unsigned int total_sg,
+ gfp_t gfp)
{
struct vring_desc *desc;
unsigned int i;
@@ -257,14 +303,14 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
return desc;
}
-static inline int virtqueue_add(struct virtqueue *_vq,
- struct scatterlist *sgs[],
- unsigned int total_sg,
- unsigned int out_sgs,
- unsigned int in_sgs,
- void *data,
- void *ctx,
- gfp_t gfp)
+static inline int virtqueue_add_split(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int total_sg,
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+ void *data,
+ void *ctx,
+ gfp_t gfp)
{
struct vring_virtqueue *vq = to_vvq(_vq);
struct scatterlist *sg;
@@ -300,10 +346,8 @@ static inline int virtqueue_add(struct virtqueue *_vq,
head = vq->free_head;
- /* If the host supports indirect descriptor tables, and we have multiple
- * buffers, then go indirect. FIXME: tune this threshold */
- if (vq->indirect && total_sg > 1 && vq->vq.num_free)
- desc = alloc_indirect(_vq, total_sg, gfp);
+ if (virtqueue_use_indirect(_vq, total_sg))
+ desc = alloc_indirect_split(_vq, total_sg, gfp);
else {
desc = NULL;
WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
@@ -424,7 +468,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
for (n = 0; n < total_sg; n++) {
if (i == err_idx)
break;
- vring_unmap_one(vq, &desc[i]);
+ vring_unmap_one_split(vq, &desc[i]);
i = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
}
@@ -435,6 +479,355 @@ static inline int virtqueue_add(struct virtqueue *_vq,
return -EIO;
}
+static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 new, old;
+ bool needs_kick;
+
+ START_USE(vq);
+ /* We need to expose available array entries before checking avail
+ * event. */
+ virtio_mb(vq->weak_barriers);
+
+ old = vq->avail_idx_shadow - vq->num_added;
+ new = vq->avail_idx_shadow;
+ vq->num_added = 0;
+
+#ifdef DEBUG
+ if (vq->last_add_time_valid) {
+ WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
+ vq->last_add_time)) > 100);
+ }
+ vq->last_add_time_valid = false;
+#endif
+
+ if (vq->event) {
+ needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev, vring_avail_event(&vq->vring)),
+ new, old);
+ } else {
+ needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
+ }
+ END_USE(vq);
+ return needs_kick;
+}
+
+static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
+ void **ctx)
+{
+ unsigned int i, j;
+ __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+
+ /* Clear data ptr. */
+ vq->desc_state[head].data = NULL;
+
+ /* Put back on free list: unmap first-level descriptors and find end */
+ i = head;
+
+ while (vq->vring.desc[i].flags & nextflag) {
+ vring_unmap_one_split(vq, &vq->vring.desc[i]);
+ i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
+ vq->vq.num_free++;
+ }
+
+ vring_unmap_one_split(vq, &vq->vring.desc[i]);
+ vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
+ vq->free_head = head;
+
+ /* Plus final descriptor */
+ vq->vq.num_free++;
+
+ if (vq->indirect) {
+ struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
+ u32 len;
+
+ /* Free the indirect table, if any, now that it's unmapped. */
+ if (!indir_desc)
+ return;
+
+ len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);
+
+ BUG_ON(!(vq->vring.desc[head].flags &
+ cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
+ BUG_ON(len == 0 || len % sizeof(struct vring_desc));
+
+ for (j = 0; j < len / sizeof(struct vring_desc); j++)
+ vring_unmap_one_split(vq, &indir_desc[j]);
+
+ kfree(indir_desc);
+ vq->desc_state[head].indir_desc = NULL;
+ } else if (ctx) {
+ *ctx = vq->desc_state[head].indir_desc;
+ }
+}
+
+static inline bool more_used_split(const struct vring_virtqueue *vq)
+{
+ return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
+}
+
+static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
+ unsigned int *len,
+ void **ctx)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ void *ret;
+ unsigned int i;
+ u16 last_used;
+
+ START_USE(vq);
+
+ if (unlikely(vq->broken)) {
+ END_USE(vq);
+ return NULL;
+ }
+
+ if (!more_used_split(vq)) {
+ pr_debug("No more buffers in queue\n");
+ END_USE(vq);
+ return NULL;
+ }
+
+ /* Only get used array entries after they have been exposed by host. */
+ virtio_rmb(vq->weak_barriers);
+
+ last_used = (vq->last_used_idx & (vq->vring.num - 1));
+ i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
+ *len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
+
+ if (unlikely(i >= vq->vring.num)) {
+ BAD_RING(vq, "id %u out of range\n", i);
+ return NULL;
+ }
+ if (unlikely(!vq->desc_state[i].data)) {
+ BAD_RING(vq, "id %u is not a head!\n", i);
+ return NULL;
+ }
+
+ /* detach_buf_split clears data, so grab it now. */
+ ret = vq->desc_state[i].data;
+ detach_buf_split(vq, i, ctx);
+ vq->last_used_idx++;
+ /* If we expect an interrupt for the next entry, tell host
+ * by writing event index and flush out the write before
+ * the read in the next get_buf call. */
+ if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
+ virtio_store_mb(vq->weak_barriers,
+ &vring_used_event(&vq->vring),
+ cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
+
+#ifdef DEBUG
+ vq->last_add_time_valid = false;
+#endif
+
+ END_USE(vq);
+ return ret;
+}
+
+static void virtqueue_disable_cb_split(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
+ vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
+ if (!vq->event)
+ vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
+ }
+}
+
+static unsigned virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 last_used_idx;
+
+ START_USE(vq);
+
+ /* We optimistically turn back on interrupts, then check if there was
+ * more to do. */
+ /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
+ * either clear the flags bit or point the event index at the next
+ * entry. Always do both to keep code simple. */
+ if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
+ vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
+ if (!vq->event)
+ vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
+ }
+ vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
+ END_USE(vq);
+ return last_used_idx;
+}
+
+static bool virtqueue_poll_split(struct virtqueue *_vq, unsigned last_used_idx)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ virtio_mb(vq->weak_barriers);
+ return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
+}
+
+static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 bufs;
+
+ START_USE(vq);
+
+ /* We optimistically turn back on interrupts, then check if there was
+ * more to do. */
+ /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+ * either clear the flags bit or point the event index at the next
+ * entry. Always update the event index to keep code simple. */
+ if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
+ vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
+ if (!vq->event)
+ vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
+ }
+ /* TODO: tune this threshold */
+ bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
+
+ virtio_store_mb(vq->weak_barriers,
+ &vring_used_event(&vq->vring),
+ cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
+
+ if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
+ END_USE(vq);
+ return false;
+ }
+
+ END_USE(vq);
+ return true;
+}
+
+static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ unsigned int i;
+ void *buf;
+
+ START_USE(vq);
+
+ for (i = 0; i < vq->vring.num; i++) {
+ if (!vq->desc_state[i].data)
+ continue;
+ /* detach_buf clears data, so grab it now. */
+ buf = vq->desc_state[i].data;
+ detach_buf_split(vq, i, NULL);
+ vq->avail_idx_shadow--;
+ vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
+ END_USE(vq);
+ return buf;
+ }
+ /* That should have freed everything. */
+ BUG_ON(vq->vq.num_free != vq->vring.num);
+
+ END_USE(vq);
+ return NULL;
+}
+
+/*
+ * The layout for the packed ring is a continuous chunk of memory
+ * which looks like this.
+ *
+ * struct vring_packed {
+ * // The actual descriptors (16 bytes each)
+ * struct vring_packed_desc desc[num];
+ *
+ * // Padding to the next align boundary.
+ * char pad[];
+ *
+ * // Driver Event Suppression
+ * struct vring_packed_desc_event driver;
+ *
+ * // Device Event Suppression
+ * struct vring_packed_desc_event device;
+ * };
+ */
+static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
+ void *p, unsigned long align)
+{
+ vr->num = num;
+ vr->desc = p;
+ vr->driver = (void *)ALIGN(((uintptr_t)p +
+ sizeof(struct vring_packed_desc) * num), align);
+ vr->device = vr->driver + 1;
+}
+
+static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
+{
+ return ((sizeof(struct vring_packed_desc) * num + align - 1)
+ & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
+}
+
+static inline int virtqueue_add_packed(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int total_sg,
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+ void *data,
+ void *ctx,
+ gfp_t gfp)
+{
+ return -EIO;
+}
+
+static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
+{
+ return false;
+}
+
+static inline bool more_used_packed(const struct vring_virtqueue *vq)
+{
+ return false;
+}
+
+static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
+ unsigned int *len,
+ void **ctx)
+{
+ return NULL;
+}
+
+static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
+{
+}
+
+static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
+{
+ return 0;
+}
+
+static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
+{
+ return false;
+}
+
+static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
+{
+ return false;
+}
+
+static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
+{
+ return NULL;
+}
+
+static inline int virtqueue_add(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int total_sg,
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+ void *data,
+ void *ctx,
+ gfp_t gfp)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
+ in_sgs, data, ctx, gfp) :
+ virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
+ in_sgs, data, ctx, gfp);
+}
+
/**
* virtqueue_add_sgs - expose buffers to other end
* @vq: the struct virtqueue we're talking about.
@@ -551,34 +944,9 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
bool virtqueue_kick_prepare(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- u16 new, old;
- bool needs_kick;
- START_USE(vq);
- /* We need to expose available array entries before checking avail
- * event. */
- virtio_mb(vq->weak_barriers);
-
- old = vq->avail_idx_shadow - vq->num_added;
- new = vq->avail_idx_shadow;
- vq->num_added = 0;
-
-#ifdef DEBUG
- if (vq->last_add_time_valid) {
- WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
- vq->last_add_time)) > 100);
- }
- vq->last_add_time_valid = false;
-#endif
-
- if (vq->event) {
- needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev, vring_avail_event(&vq->vring)),
- new, old);
- } else {
- needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
- }
- END_USE(vq);
- return needs_kick;
+ return vq->packed ? virtqueue_kick_prepare_packed(_vq) :
+ virtqueue_kick_prepare_split(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_kick_prepare);
@@ -626,58 +994,9 @@ bool virtqueue_kick(struct virtqueue *vq)
}
EXPORT_SYMBOL_GPL(virtqueue_kick);
-static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
- void **ctx)
-{
- unsigned int i, j;
- __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
-
- /* Clear data ptr. */
- vq->desc_state[head].data = NULL;
-
- /* Put back on free list: unmap first-level descriptors and find end */
- i = head;
-
- while (vq->vring.desc[i].flags & nextflag) {
- vring_unmap_one(vq, &vq->vring.desc[i]);
- i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
- vq->vq.num_free++;
- }
-
- vring_unmap_one(vq, &vq->vring.desc[i]);
- vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
- vq->free_head = head;
-
- /* Plus final descriptor */
- vq->vq.num_free++;
-
- if (vq->indirect) {
- struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
- u32 len;
-
- /* Free the indirect table, if any, now that it's unmapped. */
- if (!indir_desc)
- return;
-
- len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);
-
- BUG_ON(!(vq->vring.desc[head].flags &
- cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
- BUG_ON(len == 0 || len % sizeof(struct vring_desc));
-
- for (j = 0; j < len / sizeof(struct vring_desc); j++)
- vring_unmap_one(vq, &indir_desc[j]);
-
- kfree(indir_desc);
- vq->desc_state[head].indir_desc = NULL;
- } else if (ctx) {
- *ctx = vq->desc_state[head].indir_desc;
- }
-}
-
static inline bool more_used(const struct vring_virtqueue *vq)
{
- return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
+ return vq->packed ? more_used_packed(vq) : more_used_split(vq);
}
/**
@@ -700,57 +1019,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
void **ctx)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- void *ret;
- unsigned int i;
- u16 last_used;
- START_USE(vq);
-
- if (unlikely(vq->broken)) {
- END_USE(vq);
- return NULL;
- }
-
- if (!more_used(vq)) {
- pr_debug("No more buffers in queue\n");
- END_USE(vq);
- return NULL;
- }
-
- /* Only get used array entries after they have been exposed by host. */
- virtio_rmb(vq->weak_barriers);
-
- last_used = (vq->last_used_idx & (vq->vring.num - 1));
- i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
- *len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
-
- if (unlikely(i >= vq->vring.num)) {
- BAD_RING(vq, "id %u out of range\n", i);
- return NULL;
- }
- if (unlikely(!vq->desc_state[i].data)) {
- BAD_RING(vq, "id %u is not a head!\n", i);
- return NULL;
- }
-
- /* detach_buf clears data, so grab it now. */
- ret = vq->desc_state[i].data;
- detach_buf(vq, i, ctx);
- vq->last_used_idx++;
- /* If we expect an interrupt for the next entry, tell host
- * by writing event index and flush out the write before
- * the read in the next get_buf call. */
- if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
- virtio_store_mb(vq->weak_barriers,
- &vring_used_event(&vq->vring),
- cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
-
-#ifdef DEBUG
- vq->last_add_time_valid = false;
-#endif
-
- END_USE(vq);
- return ret;
+ return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
+ virtqueue_get_buf_ctx_split(_vq, len, ctx);
}
EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
@@ -772,12 +1043,10 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
- vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
- if (!vq->event)
- vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
- }
-
+ if (vq->packed)
+ virtqueue_disable_cb_packed(_vq);
+ else
+ virtqueue_disable_cb_split(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
@@ -796,23 +1065,9 @@ EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- u16 last_used_idx;
- START_USE(vq);
-
- /* We optimistically turn back on interrupts, then check if there was
- * more to do. */
- /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
- * either clear the flags bit or point the event index at the next
- * entry. Always do both to keep code simple. */
- if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
- vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
- if (!vq->event)
- vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
- }
- vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
- END_USE(vq);
- return last_used_idx;
+ return vq->packed ? virtqueue_enable_cb_prepare_packed(_vq) :
+ virtqueue_enable_cb_prepare_split(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
@@ -829,8 +1084,8 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- virtio_mb(vq->weak_barriers);
- return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
+ return vq->packed ? virtqueue_poll_packed(_vq, last_used_idx) :
+ virtqueue_poll_split(_vq, last_used_idx);
}
EXPORT_SYMBOL_GPL(virtqueue_poll);
@@ -868,34 +1123,9 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb);
bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- u16 bufs;
- START_USE(vq);
-
- /* We optimistically turn back on interrupts, then check if there was
- * more to do. */
- /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
- * either clear the flags bit or point the event index at the next
- * entry. Always update the event index to keep code simple. */
- if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
- vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
- if (!vq->event)
- vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
- }
- /* TODO: tune this threshold */
- bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
-
- virtio_store_mb(vq->weak_barriers,
- &vring_used_event(&vq->vring),
- cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
-
- if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
- END_USE(vq);
- return false;
- }
-
- END_USE(vq);
- return true;
+ return vq->packed ? virtqueue_enable_cb_delayed_packed(_vq) :
+ virtqueue_enable_cb_delayed_split(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
@@ -910,27 +1140,9 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- unsigned int i;
- void *buf;
- START_USE(vq);
-
- for (i = 0; i < vq->vring.num; i++) {
- if (!vq->desc_state[i].data)
- continue;
- /* detach_buf clears data, so grab it now. */
- buf = vq->desc_state[i].data;
- detach_buf(vq, i, NULL);
- vq->avail_idx_shadow--;
- vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
- END_USE(vq);
- return buf;
- }
- /* That should have freed everything. */
- BUG_ON(vq->vq.num_free != vq->vring.num);
-
- END_USE(vq);
- return NULL;
+ return vq->packed ? virtqueue_detach_unused_buf_packed(_vq) :
+ virtqueue_detach_unused_buf_split(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
@@ -955,7 +1167,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
EXPORT_SYMBOL_GPL(vring_interrupt);
struct virtqueue *__vring_new_virtqueue(unsigned int index,
- struct vring vring,
+ union vring_union vring,
+ bool packed,
struct virtio_device *vdev,
bool weak_barriers,
bool context,
@@ -963,19 +1176,22 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
void (*callback)(struct virtqueue *),
const char *name)
{
- unsigned int i;
struct vring_virtqueue *vq;
+ unsigned int num, i;
+ size_t size;
- vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
- GFP_KERNEL);
+ num = packed ? vring.vring_packed.num : vring.vring_split.num;
+ size = packed ? num * sizeof(struct vring_desc_state_packed) :
+ num * sizeof(struct vring_desc_state);
+
+ vq = kmalloc(sizeof(*vq) + size, GFP_KERNEL);
if (!vq)
return NULL;
- vq->vring = vring;
vq->vq.callback = callback;
vq->vq.vdev = vdev;
vq->vq.name = name;
- vq->vq.num_free = vring.num;
+ vq->vq.num_free = num;
vq->vq.index = index;
vq->we_own_ring = false;
vq->queue_dma_addr = 0;
@@ -984,9 +1200,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
vq->weak_barriers = weak_barriers;
vq->broken = false;
vq->last_used_idx = 0;
- vq->avail_flags_shadow = 0;
- vq->avail_idx_shadow = 0;
vq->num_added = 0;
+ vq->packed = packed;
list_add_tail(&vq->vq.list, &vdev->vqs);
#ifdef DEBUG
vq->in_use = false;
@@ -997,19 +1212,48 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
!context;
vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+ if (vq->packed) {
+ vq->vring_packed = vring.vring_packed;
+ vq->next_avail_idx = 0;
+ vq->avail_wrap_counter = 1;
+ vq->used_wrap_counter = 1;
+ vq->event_flags_shadow = 0;
+
+ memset(vq->desc_state_packed, 0,
+ num * sizeof(struct vring_desc_state_packed));
+
+ /* Put everything in free lists. */
+ vq->free_head = 0;
+ for (i = 0; i < num-1; i++)
+ vq->desc_state_packed[i].next = i + 1;
+ } else {
+ vq->vring = vring.vring_split;
+ vq->avail_flags_shadow = 0;
+ vq->avail_idx_shadow = 0;
+
+ /* Put everything in free lists. */
+ vq->free_head = 0;
+ for (i = 0; i < num-1; i++)
+ vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
+
+ memset(vq->desc_state, 0,
+ num * sizeof(struct vring_desc_state));
+ }
+
/* No callback? Tell other side not to bother us. */
if (!callback) {
- vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
- if (!vq->event)
- vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
+ if (packed) {
+ vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
+ vq->vring_packed.driver->flags = cpu_to_virtio16(vdev,
+ vq->event_flags_shadow);
+ } else {
+ vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
+ if (!vq->event)
+ vq->vring.avail->flags = cpu_to_virtio16(vdev,
+ vq->avail_flags_shadow);
+ }
}
- /* Put everything in free lists. */
- vq->free_head = 0;
- for (i = 0; i < vring.num-1; i++)
- vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
- memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
-
return &vq->vq;
}
EXPORT_SYMBOL_GPL(__vring_new_virtqueue);
@@ -1056,6 +1300,12 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
}
}
+static inline int
+__vring_size(unsigned int num, unsigned long align, bool packed)
+{
+ return packed ? vring_size_packed(num, align) : vring_size(num, align);
+}
+
struct virtqueue *vring_create_virtqueue(
unsigned int index,
unsigned int num,
@@ -1072,7 +1322,8 @@ struct virtqueue *vring_create_virtqueue(
void *queue = NULL;
dma_addr_t dma_addr;
size_t queue_size_in_bytes;
- struct vring vring;
+ union vring_union vring;
+ bool packed;
/* We assume num is a power of 2. */
if (num & (num - 1)) {
@@ -1080,9 +1331,13 @@ struct virtqueue *vring_create_virtqueue(
return NULL;
}
+ packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
+
/* TODO: allocate each queue chunk individually */
- for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
- queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+ for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
+ num /= 2) {
+ queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
+ packed),
&dma_addr,
GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
if (queue)
@@ -1094,17 +1349,21 @@ struct virtqueue *vring_create_virtqueue(
if (!queue) {
/* Try to get a single page. You are my only hope! */
- queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+ queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
+ packed),
&dma_addr, GFP_KERNEL|__GFP_ZERO);
}
if (!queue)
return NULL;
- queue_size_in_bytes = vring_size(num, vring_align);
- vring_init(&vring, num, queue, vring_align);
+ queue_size_in_bytes = __vring_size(num, vring_align, packed);
+ if (packed)
+ vring_init_packed(&vring.vring_packed, num, queue, vring_align);
+ else
+ vring_init(&vring.vring_split, num, queue, vring_align);
- vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
- notify, callback, name);
+ vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
+ context, notify, callback, name);
if (!vq) {
vring_free_queue(vdev, queue_size_in_bytes, queue,
dma_addr);
@@ -1130,10 +1389,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
void (*callback)(struct virtqueue *vq),
const char *name)
{
- struct vring vring;
- vring_init(&vring, num, pages, vring_align);
- return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
- notify, callback, name);
+ union vring_union vring;
+ bool packed;
+
+ packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
+ if (packed)
+ vring_init_packed(&vring.vring_packed, num, pages, vring_align);
+ else
+ vring_init(&vring.vring_split, num, pages, vring_align);
+
+ return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
+ context, notify, callback, name);
}
EXPORT_SYMBOL_GPL(vring_new_virtqueue);
@@ -1143,7 +1409,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
if (vq->we_own_ring) {
vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
- vq->vring.desc, vq->queue_dma_addr);
+ vq->packed ? (void *)vq->vring_packed.desc :
+ (void *)vq->vring.desc,
+ vq->queue_dma_addr);
}
list_del(&_vq->list);
kfree(vq);
@@ -1185,7 +1453,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->vring.num;
+ return vq->packed ? vq->vring_packed.num : vq->vring.num;
}
EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
@@ -1228,6 +1496,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
BUG_ON(!vq->we_own_ring);
+ if (vq->packed)
+ return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
+ (char *)vq->vring_packed.desc);
+
return vq->queue_dma_addr +
((char *)vq->vring.avail - (char *)vq->vring.desc);
}
@@ -1239,11 +1511,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
BUG_ON(!vq->we_own_ring);
+ if (vq->packed)
+ return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
+ (char *)vq->vring_packed.desc);
+
return vq->queue_dma_addr +
((char *)vq->vring.used - (char *)vq->vring.desc);
}
EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
+/* Only available for split ring */
const struct vring *virtqueue_get_vring(struct virtqueue *vq)
{
return &to_vvq(vq)->vring;
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index bbf32524ab27..a0075894ad16 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
struct virtio_device;
struct virtqueue;
+union vring_union {
+ struct vring vring_split;
+ struct vring_packed vring_packed;
+};
+
/*
* Creates a virtqueue and allocates the descriptor ring. If
* may_reduce_num is set, then this may allocate a smaller ring than
@@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
/* Creates a virtqueue with a custom layout. */
struct virtqueue *__vring_new_virtqueue(unsigned int index,
- struct vring vring,
+ union vring_union vring,
+ bool packed,
struct virtio_device *vdev,
bool weak_barriers,
bool ctx,
--
2.17.0
^ permalink raw reply related
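The "Put everything in free lists" step in __vring_new_virtqueue() above
chains the per-descriptor state entries by index. The following is a
minimal standalone sketch of that idea, not the driver code: struct
desc_state, free_list_init() and free_list_pop() are hypothetical names
used only for illustration, and the caller is assumed to track the number
of free entries separately (the patch uses vq->vq.num_free for that).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vring_desc_state_packed; only the
 * `next` link matters for this sketch. */
struct desc_state {
	void *data;
	unsigned int next;
};

/* Zero the state array, then link entry i to entry i + 1, as the loop
 * in the patch does. Returns the head of the free list (always 0). */
static inline unsigned int free_list_init(struct desc_state *state,
					  unsigned int num)
{
	unsigned int i;

	assert(num > 0);
	for (i = 0; i < num; i++) {
		state[i].data = NULL;
		state[i].next = 0;
	}
	for (i = 0; i < num - 1; i++)
		state[i].next = i + 1;
	return 0;
}

/* Take the first free entry and advance the head to its successor.
 * The caller must not pop more entries than are free. */
static inline unsigned int free_list_pop(struct desc_state *state,
					 unsigned int *free_head)
{
	unsigned int id = *free_head;

	*free_head = state[id].next;
	return id;
}
```

Keeping the links as indices rather than pointers means the array can be
allocated together with the virtqueue structure, which is what the single
kmalloc() in the patch relies on.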
* [RFC v6 1/5] virtio: add packed ring definitions
From: Tiwei Bie @ 2018-06-05 7:40 UTC (permalink / raw)
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180605074046.20709-1-tiwei.bie@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
include/uapi/linux/virtio_config.h | 5 ++++-
include/uapi/linux/virtio_ring.h | 36 ++++++++++++++++++++++++++++++
2 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 308e2096291f..932a6ecc8e46 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -49,7 +49,7 @@
* transport being used (eg. virtio_ring), the rest are per-device feature
* bits. */
#define VIRTIO_TRANSPORT_F_START 28
-#define VIRTIO_TRANSPORT_F_END 34
+#define VIRTIO_TRANSPORT_F_END 35
#ifndef VIRTIO_CONFIG_NO_LEGACY
/* Do we get callbacks when the ring is completely used, even if we've
@@ -71,4 +71,7 @@
* this is for compatibility with legacy systems.
*/
#define VIRTIO_F_IOMMU_PLATFORM 33
+
+/* This feature indicates support for the packed virtqueue layout. */
+#define VIRTIO_F_RING_PACKED 34
#endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5faa989b..7b378da788a7 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -44,6 +44,9 @@
/* This means the buffer contains a list of buffer descriptors. */
#define VRING_DESC_F_INDIRECT 4
+#define VRING_DESC_F_AVAIL(b) ((__u16)(b) << 7)
+#define VRING_DESC_F_USED(b) ((__u16)(b) << 15)
+
/* The Host uses this in used->flags to advise the Guest: don't kick me when
* you add a buffer. It's unreliable, so it's simply an optimization. Guest
* will still kick if it's out of buffers. */
@@ -53,6 +56,10 @@
* optimization. */
#define VRING_AVAIL_F_NO_INTERRUPT 1
+#define VRING_EVENT_F_ENABLE 0x0
+#define VRING_EVENT_F_DISABLE 0x1
+#define VRING_EVENT_F_DESC 0x2
+
/* We support indirect buffer descriptors */
#define VIRTIO_RING_F_INDIRECT_DESC 28
@@ -171,4 +178,33 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
}
+struct vring_packed_desc_event {
+ /* __virtio16 off : 15; // Descriptor Event Offset
+ * __virtio16 wrap : 1; // Descriptor Event Wrap Counter */
+ __virtio16 off_wrap;
+ /* __virtio16 flags : 2; // Descriptor Event Flags */
+ __virtio16 flags;
+};
+
+struct vring_packed_desc {
+ /* Buffer Address. */
+ __virtio64 addr;
+ /* Buffer Length. */
+ __virtio32 len;
+ /* Buffer ID. */
+ __virtio16 id;
+ /* The flags depending on descriptor type. */
+ __virtio16 flags;
+};
+
+struct vring_packed {
+ unsigned int num;
+
+ struct vring_packed_desc *desc;
+
+ struct vring_packed_desc_event *driver;
+
+ struct vring_packed_desc_event *device;
+};
+
#endif /* _UAPI_LINUX_VIRTIO_RING_H */
--
2.17.0
^ permalink raw reply related
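The VRING_DESC_F_AVAIL and VRING_DESC_F_USED macros above place a wrap
counter value into bit 7 and bit 15 of the descriptor flags. A minimal
sketch of how they might be combined follows. It assumes the semantics
described in the packed ring proposal of the virtio specification (a
descriptor is made available with AVAIL equal to the driver's wrap counter
and USED equal to its inverse; it is used when both bits equal the wrap
counter the driver expects). The helper names desc_avail_flags() and
desc_is_used() are hypothetical and do not appear in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as the macros added by the patch. */
#define VRING_DESC_F_AVAIL(b)	((uint16_t)(b) << 7)
#define VRING_DESC_F_USED(b)	((uint16_t)(b) << 15)

/* Flags a driver would write to make a descriptor available:
 * AVAIL follows the wrap counter, USED is its inverse. */
static inline uint16_t desc_avail_flags(bool avail_wrap_counter)
{
	return VRING_DESC_F_AVAIL(avail_wrap_counter) |
	       VRING_DESC_F_USED(!avail_wrap_counter);
}

/* A descriptor counts as used when AVAIL and USED are equal and both
 * match the wrap counter the driver expects for used descriptors. */
static inline bool desc_is_used(uint16_t flags, bool used_wrap_counter)
{
	bool avail = flags & VRING_DESC_F_AVAIL(1);
	bool used = flags & VRING_DESC_F_USED(1);

	assert(VRING_DESC_F_AVAIL(1) == 0x0080);
	return avail == used && used == used_wrap_counter;
}
```

With both wrap counters starting at 1, a zero-initialised ring reads as
neither available nor used, so no separate "empty" marker is needed.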
* [RFC v6 0/5] virtio: support packed ring
From: Tiwei Bie @ 2018-06-05 7:40 UTC (permalink / raw)
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
Hello everyone,
This RFC implements packed ring support in the virtio driver.
Some functional tests have been done with Jason's
packed ring implementation in vhost (RFC v5):
https://lwn.net/Articles/755862/
Both ping and netperf worked as expected.
TODO:
- Refinements (for code and commit log);
- More tests and bug fixes if any;
- Send the formal patch set;
RFC v5 -> RFC v6:
- Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
- Define wrap counter as bool (Jason);
- Use ALIGN() in vring_init_packed() (Jason);
- Avoid using pointer to track `next` in detach_buf_packed() (Jason);
- Add comments for barriers (Jason);
- Don't enable RING_PACKED on ccw for now (noticed by Jason);
- Refine the memory barrier in virtqueue_poll();
- Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
- Remove the hacks in virtqueue_enable_cb_prepare_packed();
RFC v4 -> RFC v5:
- Save DMA addr, etc in desc state (Jason);
- Track used wrap counter;
RFC v3 -> RFC v4:
- Make ID allocation support out-of-order (Jason);
- Various fixes for EVENT_IDX support;
RFC v2 -> RFC v3:
- Split into small patches (Jason);
- Add helper virtqueue_use_indirect() (Jason);
- Just set id for the last descriptor of a list (Jason);
- Calculate the prev in virtqueue_add_packed() (Jason);
- Fix/improve desc suppression code (Jason/MST);
- Refine the code layout for XXX_split/packed and wrappers (MST);
- Fix the comments and API in uapi (MST);
- Remove the BUG_ON() for indirect (Jason);
- Some other refinements and bug fixes;
RFC v1 -> RFC v2:
- Add indirect descriptor support - compile test only;
- Add event suppression support - compile test only;
- Move vring_packed_init() out of uapi (Jason, MST);
- Merge two loops into one in virtqueue_add_packed() (Jason);
- Split vring_unmap_one() for packed ring and split ring (Jason);
- Avoid using '%' operator (Jason);
- Rename free_head -> next_avail_idx (Jason);
- Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
- Some other refinements and bug fixes;
Thanks!
Tiwei Bie (5):
virtio: add packed ring definitions
virtio_ring: support creating packed ring
virtio_ring: add packed ring support
virtio_ring: add event idx support in packed ring
virtio_ring: enable packed ring
drivers/s390/virtio/virtio_ccw.c | 8 +
drivers/virtio/virtio_ring.c | 1361 ++++++++++++++++++++++------
include/linux/virtio_ring.h | 8 +-
include/uapi/linux/virtio_config.h | 5 +-
include/uapi/linux/virtio_ring.h | 36 +
5 files changed, 1141 insertions(+), 277 deletions(-)
--
2.17.0
^ permalink raw reply
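Two changelog items above ("Avoid using '%' operator" and "Define wrap
counter as bool") can be illustrated together. The following is a sketch
under stated assumptions, not code from the series: struct ring_pos and
ring_pos_advance() are hypothetical names, and the only property taken
from the patches is that the ring index is paired with a boolean wrap
counter that starts at 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pairing of a ring index with its wrap counter. */
struct ring_pos {
	unsigned int idx;
	bool wrap;
};

/* Advance the position by n slots in a ring of num slots. A compare
 * and subtract replaces the modulo, and the wrap counter flips each
 * time the index passes the end of the ring. */
static inline void ring_pos_advance(struct ring_pos *pos, unsigned int n,
				    unsigned int num)
{
	assert(pos->idx < num);
	assert(n <= num);

	pos->idx += n;
	if (pos->idx >= num) {
		pos->idx -= num;
		pos->wrap = !pos->wrap;
	}
}
```

Because n never exceeds num, a single subtraction is enough to bring the
index back into range.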
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Christoph Hellwig @ 2018-06-05 4:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, Michael S. Tsirkin, mpe, linux-kernel, virtualization, hch,
luto, joe, david, linuxppc-dev, elfring, Anshuman Khandual
In-Reply-To: <6164442a718645a754879eac5c4c5fad9283c211.camel@kernel.crashing.org>
On Tue, Jun 05, 2018 at 09:26:56AM +1000, Benjamin Herrenschmidt wrote:
> Sorry Michael, that doesn't click. Yes of course virtio is implemented
> in qemu, but the problem we are trying to solve is *not* a qemu problem
> (the fact that the Linux drivers bypass the DMA API is wrong, needs
> fixing, and isn't a qemu problem). The fact that the secure guests need
> bounce buffering is not a qemu problem either.
>
> Whether qemu chose to use an iommu or not is, and should remain an
> orthogonal problem.
Agreed. We have a problem with qemu (old qemu only?) punching a hole
into the VM abstraction by deciding that even if firmware tables
claim use of an IOMMU for a PCI bus it expects virtio to use physical
addresses. So far so bad. The answer to that should have been to
quirk the affected qemu versions and move on. Instead we now have
virtio not use the DMA API by default and are creating a worse problem.
Let's fix this issue ASAP and quirk the buggy implementations instead
of letting everyone else suffer.
> The DMA API itself isn't the one that needs to learn "per-device
> quirks", it's just plumbing into arch backends. The "quirk" is at the
> point of establishing the backend for a given device.
>
> We can go a good way down that path simply by having virtio in Linux
> start with putting *itself* its own direct ops in there when
> VIRTIO_F_IOMMU_PLATFORM is not set, and removing all the special casing
> in the rest of the driver.
Yes. And we have all the infrastructure for that now. A few RDMA
drivers quirk to virt_dma_ops, and virtio could quirk to dma_direct_ops
anytime now. In fact given how much time we are spending arguing here
I'm going to give it a spin today.
> Once that's done, we have a single point of establishing the dma ops,
> we can quirk in there if needed, that's rather nicely contained, or put
> an arch hook, or whatever is necessary.
Yes.
^ permalink raw reply
* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Jason Wang @ 2018-06-05 2:06 UTC (permalink / raw)
To: Samudrala, Sridhar, mst, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, qemu-devel
In-Reply-To: <fcefe501-b410-f1ec-829c-68199edcdbef@intel.com>
On 2018年06月05日 09:41, Samudrala, Sridhar wrote:
> Ping on this patch now that the kernel patches are accepted into
> davem's net-next tree.
> https://patchwork.ozlabs.org/cover/920005/
>
>
> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
>> This feature bit can be used by hypervisor to indicate virtio_net
>> device to
>> act as a standby for another device with the same MAC address.
>>
>> I tested this with a small change to the patch to mark the STANDBY
>> feature 'true'
>> by default as i am using libvirt to start the VMs.
>> Is there a way to pass the newly added feature bit 'standby' to qemu
>> via libvirt
>> XML file?
>>
Maybe you can try qemu command line passthrough:
https://libvirt.org/drvqemu.html#qemucommand
The patch looks good to me. Let's see if Michael has any comments.
Thanks
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> ---
>> hw/net/virtio-net.c | 2 ++
>> include/standard-headers/linux/virtio_net.h | 3 +++
>> 2 files changed, 5 insertions(+)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 90502fca7c..38b3140670 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>> true),
>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed,
>> SPEED_UNKNOWN),
>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features,
>> VIRTIO_NET_F_STANDBY,
>> + false),
>> DEFINE_PROP_END_OF_LIST(),
>> };
>> diff --git a/include/standard-headers/linux/virtio_net.h
>> b/include/standard-headers/linux/virtio_net.h
>> index e9f255ea3f..01ec09684c 100644
>> --- a/include/standard-headers/linux/virtio_net.h
>> +++ b/include/standard-headers/linux/virtio_net.h
>> @@ -57,6 +57,9 @@
>> * Steering */
>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for
>> another device
>> + * with the same MAC.
>> + */
>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and
>> duplex */
>> #ifndef VIRTIO_NET_NO_LEGACY
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: David Gibson @ 2018-06-05 1:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, Michael S. Tsirkin, mpe, linux-kernel, virtualization, hch,
joe, linuxppc-dev, elfring, Anshuman Khandual
In-Reply-To: <d5df613d6347fe2f9bb6ea65bc6f6be05650ca6f.camel@kernel.crashing.org>
[-- Attachment #1.1: Type: text/plain, Size: 1720 bytes --]
On Mon, Jun 04, 2018 at 07:48:54PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-06-04 at 18:57 +1000, David Gibson wrote:
> >
> > > - First qemu doesn't know that the guest will switch to "secure mode"
> > > in advance. There is no difference between a normal and a secure
> > > partition until the partition does the magic UV call to "enter secure
> > > mode" and qemu doesn't see any of it. So who can set the flag here ?
> >
> > This seems weird to me. As a rule HV calls should go through qemu -
> > or be allowed to go directly to KVM *by* qemu.
>
> It's not an HV call, it's a UV call, qemu won't see it, qemu isn't
> trusted. Now the UV *will* reflect that to the HV via some synthesized
> HV calls, and we *could* have those do a pass by qemu, however, so far,
> our entire design doesn't rely on *any* qemu knowledge whatsoever and
> it would be sad to add it just for that purpose.
>
> Additionally, this is rather orthogonal, see my other email, the
> problem we are trying to solve is *not* a qemu problem and it doesn't
> make sense to leak that into qemu.
>
> > We generally reserve
> > the latter for hot path things. Since this isn't a hot path, having
> > the call handled directly by the kernel seems wrong.
> >
> > Unless a "UV call" is something different I don't know about.
>
> Yes, a UV call goes to the Ultravisor, not the Hypervisor. The
> Hypervisor isn't trusted.
Ah, right. Is that implemented in the host kernel, or in something
further above?
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-05 1:41 UTC (permalink / raw)
To: mst, virtualization, virtio-dev, jesse.brandeburg,
alexander.h.duyck, jasowang, qemu-devel
In-Reply-To: <1525734594-11134-1-git-send-email-sridhar.samudrala@intel.com>
Ping on this patch now that the kernel patches are accepted into davem's net-next tree.
https://patchwork.ozlabs.org/cover/920005/
On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
> This feature bit can be used by hypervisor to indicate virtio_net device to
> act as a standby for another device with the same MAC address.
>
> I tested this with a small change to the patch to mark the STANDBY feature 'true'
> by default as i am using libvirt to start the VMs.
> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
> XML file?
>
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> ---
> hw/net/virtio-net.c | 2 ++
> include/standard-headers/linux/virtio_net.h | 3 +++
> 2 files changed, 5 insertions(+)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 90502fca7c..38b3140670 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> true),
> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
> + false),
> DEFINE_PROP_END_OF_LIST(),
> };
>
> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
> index e9f255ea3f..01ec09684c 100644
> --- a/include/standard-headers/linux/virtio_net.h
> +++ b/include/standard-headers/linux/virtio_net.h
> @@ -57,6 +57,9 @@
> * Steering */
> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>
> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
> + * with the same MAC.
> + */
> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
>
> #ifndef VIRTIO_NET_NO_LEGACY
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH v3] virtio_pci: support enabling VFs
From: Tiwei Bie @ 2018-06-05 1:36 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, linux-pci, linux-kernel,
virtualization, stefanha, zhihong.wang, bhelgaas, mark.d.rustad
In-Reply-To: <20180604192222-mutt-send-email-mst@kernel.org>
On Mon, Jun 04, 2018 at 07:32:25PM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 01, 2018 at 12:02:39PM +0800, Tiwei Bie wrote:
> > There is a new feature bit allocated in virtio spec to
> > support SR-IOV (Single Root I/O Virtualization):
> >
> > https://github.com/oasis-tcs/virtio-spec/issues/11
> >
> > This patch enables the support for this feature bit in
> > virtio driver.
> >
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
>
> OK but what about freeze/restore functions?
>
> I also wonder about kexec - virtio.c currently does:
>
> /* We always start by resetting the device, in case a previous
> * driver messed it up. This also tests that code path a little. */
> dev->config->reset(dev);
>
> Do we need to do something like this for sriov?
I think VFs are managed by PCI core. Once they are
allocated, virtio driver doesn't have to care too
much about how to manage them. The proposal for the
spec is just to provide a feature bit based virtio
way for virtio drivers to know whether a virtio
device is SR-IOV capable (and virtio drivers can
support configuring SR-IOV based on the feature
bit negotiation result).
>
> I also wonder whether PCI core should disable sriov for us.
>
>
> I wish there was a patch emulating this without vDPA for QEMU,
> would make it easy to test your patches. Do you happen
> to have something like this?
Sorry, currently I don't have anything like this.
Best regards,
Tiwei Bie
>
> Thanks,
>
>
> > v3:
> > - Drop the acks;
> >
> > v2:
> > - Disable VFs when unbinding the driver (Alex, MST);
> > - Don't use pci_sriov_configure_simple (Alex);
> >
> > drivers/virtio/virtio_pci_common.c | 30 ++++++++++++++++++++++++++++++
> > drivers/virtio/virtio_pci_modern.c | 14 ++++++++++++++
> > include/uapi/linux/virtio_config.h | 7 ++++++-
> > 3 files changed, 50 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > index 48d4d1cf1cb6..1d4467b2dc31 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -577,6 +577,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > struct device *dev = get_device(&vp_dev->vdev.dev);
> >
> > + pci_disable_sriov(pci_dev);
> > +
> > unregister_virtio_device(&vp_dev->vdev);
> >
> > if (vp_dev->ioaddr)
> > @@ -588,6 +590,33 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > put_device(dev);
> > }
> >
> > +static int virtio_pci_sriov_configure(struct pci_dev *pci_dev, int num_vfs)
> > +{
> > + struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > + struct virtio_device *vdev = &vp_dev->vdev;
> > + int ret;
> > +
> > + if (!(vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_DRIVER_OK))
> > + return -EBUSY;
> > +
> > + if (!__virtio_test_bit(vdev, VIRTIO_F_SR_IOV))
> > + return -EINVAL;
> > +
> > + if (pci_vfs_assigned(pci_dev))
> > + return -EPERM;
> > +
> > + if (num_vfs == 0) {
> > + pci_disable_sriov(pci_dev);
> > + return 0;
> > + }
> > +
> > + ret = pci_enable_sriov(pci_dev, num_vfs);
> > + if (ret < 0)
> > + return ret;
> > +
> > + return num_vfs;
> > +}
> > +
> > static struct pci_driver virtio_pci_driver = {
> > .name = "virtio-pci",
> > .id_table = virtio_pci_id_table,
> > @@ -596,6 +625,7 @@ static struct pci_driver virtio_pci_driver = {
> > #ifdef CONFIG_PM_SLEEP
> > .driver.pm = &virtio_pci_pm_ops,
> > #endif
> > + .sriov_configure = virtio_pci_sriov_configure,
> > };
> >
> > module_pci_driver(virtio_pci_driver);
> > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > index 2555d80f6eec..07571daccfec 100644
> > --- a/drivers/virtio/virtio_pci_modern.c
> > +++ b/drivers/virtio/virtio_pci_modern.c
> > @@ -153,14 +153,28 @@ static u64 vp_get_features(struct virtio_device *vdev)
> > return features;
> > }
> >
> > +static void vp_transport_features(struct virtio_device *vdev, u64 features)
> > +{
> > + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > + struct pci_dev *pci_dev = vp_dev->pci_dev;
> > +
> > + if ((features & BIT_ULL(VIRTIO_F_SR_IOV)) &&
> > + pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV))
> > + __virtio_set_bit(vdev, VIRTIO_F_SR_IOV);
> > +}
> > +
> > /* virtio config->finalize_features() implementation */
> > static int vp_finalize_features(struct virtio_device *vdev)
> > {
> > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > + u64 features = vdev->features;
> >
> > /* Give virtio_ring a chance to accept features. */
> > vring_transport_features(vdev);
> >
> > + /* Give virtio_pci a chance to accept features. */
> > + vp_transport_features(vdev, features);
> > +
> > if (!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> > dev_err(&vdev->dev, "virtio: device uses modern interface "
> > "but does not have VIRTIO_F_VERSION_1\n");
> > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > index 308e2096291f..b7c1f4e7d59e 100644
> > --- a/include/uapi/linux/virtio_config.h
> > +++ b/include/uapi/linux/virtio_config.h
> > @@ -49,7 +49,7 @@
> > * transport being used (eg. virtio_ring), the rest are per-device feature
> > * bits. */
> > #define VIRTIO_TRANSPORT_F_START 28
> > -#define VIRTIO_TRANSPORT_F_END 34
> > +#define VIRTIO_TRANSPORT_F_END 38
> >
> > #ifndef VIRTIO_CONFIG_NO_LEGACY
> > /* Do we get callbacks when the ring is completely used, even if we've
> > @@ -71,4 +71,9 @@
> > * this is for compatibility with legacy systems.
> > */
> > #define VIRTIO_F_IOMMU_PLATFORM 33
> > +
> > +/*
> > + * Does the device support Single Root I/O Virtualization?
> > + */
> > +#define VIRTIO_F_SR_IOV 37
> > #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> > --
> > 2.17.0
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>
^ permalink raw reply
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Michael S. Tsirkin @ 2018-06-05 1:25 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, mpe, linux-kernel, virtualization, hch, joe, david,
linuxppc-dev, elfring, Anshuman Khandual
In-Reply-To: <6164442a718645a754879eac5c4c5fad9283c211.camel@kernel.crashing.org>
On Tue, Jun 05, 2018 at 09:26:56AM +1000, Benjamin Herrenschmidt wrote:
> I would like to keep however the ability to bypass the iommu for
> performance reasons
So that's easy: clear the IOMMU flag, and this means "bypass the IOMMU".
--
MST
* RE: [PATCH v4] virtio_blk: add DISCARD and WRITE ZEROES commands support
From: Liu, Changpeng @ 2018-06-05 0:55 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Hajnoczi
Cc: cavery@redhat.com, virtualization@lists.linux-foundation.org
In-Reply-To: <102ec75b-68c9-a488-d0c3-223ab09254bc@redhat.com>
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Monday, June 4, 2018 6:03 PM
> To: Liu, Changpeng <changpeng.liu@intel.com>; Stefan Hajnoczi
> <stefanha@redhat.com>
> Cc: virtualization@lists.linux-foundation.org; cavery@redhat.com;
> jasowang@redhat.com; Wang, Wei W <wei.w.wang@intel.com>
> Subject: Re: [PATCH v4] virtio_blk: add DISCARD and WRIET ZEROES commands
> support
>
> On 04/06/2018 06:14, Liu, Changpeng wrote:
> >>> But I believe the specification says VIRTIO_BLK_T_OUT means direction, so
> >>> OR the two bits together should compliance with the specification.
> >> I cannot find that in the specification:
> >>
> >> http://docs.oasis-open.org/virtio/virtio/v1.0/cs04/virtio-v1.0-cs04.html#x1-2020002
> >>
> >> and it would contradict the "The type of the request is either ... or
> >> ..." wording that I quoted from the spec above.
> >>
> >> If you do find something in the spec, please let me know and we can
> >> figure out how to make the spec consistent.
> >
> > I saw comments from file linux/usr/include/uapi/linux/virtio_blk.h, which says
> > VIRTIO_BLK_T_OUT may be combined with other commands and means direction,
> > the specification does not have such description.
>
> I don't think it is in the specification indeed (however, 11 and 13 were
> chosen so that VIRTIO_BLK_T_OUT can still indicate direction).
Correct, we don't need to OR VIRTIO_BLK_T_OUT again for the DISCARD and
WRITE ZEROES commands. I'll remove it in the next patch set.
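A minimal sketch of why the extra OR is redundant, assuming the request
type values 11 and 13 mentioned above and the usual IN/OUT values of 0
and 1. The helper blk_type_is_out() is hypothetical and only here for
illustration; it is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Request type values; 11 and 13 are the ones discussed in this thread. */
#define VIRTIO_BLK_T_IN            0
#define VIRTIO_BLK_T_OUT           1
#define VIRTIO_BLK_T_DISCARD       11
#define VIRTIO_BLK_T_WRITE_ZEROES  13

/* Both new types already have bit 0 set, so ORing in T_OUT changes nothing. */
static_assert((VIRTIO_BLK_T_DISCARD | VIRTIO_BLK_T_OUT) == VIRTIO_BLK_T_DISCARD,
	      "DISCARD already carries the OUT bit");
static_assert((VIRTIO_BLK_T_WRITE_ZEROES | VIRTIO_BLK_T_OUT) ==
	      VIRTIO_BLK_T_WRITE_ZEROES,
	      "WRITE ZEROES already carries the OUT bit");

/* Hypothetical helper: true when bit 0 (the OUT bit) is set in the type. */
static bool blk_type_is_out(uint32_t type)
{
	return (type & VIRTIO_BLK_T_OUT) != 0;
}
```

With these values blk_type_is_out() is true for both new commands and
false for VIRTIO_BLK_T_IN, so the driver can pass the plain type value.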
>
> Paolo
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virtio devices
From: Benjamin Herrenschmidt @ 2018-06-04 23:26 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: robh, mpe, linux-kernel, virtualization, hch, joe, david,
linuxppc-dev, elfring, Anshuman Khandual
In-Reply-To: <20180604184035-mutt-send-email-mst@kernel.org>
On Mon, 2018-06-04 at 19:21 +0300, Michael S. Tsirkin wrote:
>
> > > > - First qemu doesn't know that the guest will switch to "secure mode"
> > > > in advance. There is no difference between a normal and a secure
> > > > partition until the partition does the magic UV call to "enter secure
> > > > mode" and qemu doesn't see any of it. So who can set the flag here ?
> > >
> > > The user should set it. You just tell user "to be able to use with
> > > feature X, enable IOMMU".
> >
> > That's completely backwards. The user has no idea what that stuff is.
> > And it would have to percolate all the way up the management stack,
> > libvirt, kimchi, whatever else ... that's just nonsense.
> >
> > Especially since, as I explained in my other email, this is *not* a
> > qemu problem and thus the solution shouldn't be messing around with
> > qemu.
>
> virtio is implemented in qemu though. If you prefer to stick
> all your code in either guest or the UV that's your decision
> but it looks like qemu could be helpful here.
Sorry Michael, that doesn't click. Yes of course virtio is implemented
in qemu, but the problem we are trying to solve is *not* a qemu problem
(the fact that the Linux drivers bypass the DMA API is wrong, needs
fixing, and isn't a qemu problem). The fact that the secure guests need
bounce buffering is not a qemu problem either.
Whether qemu chooses to use an iommu or not is, and should remain, an
orthogonal problem.
Forcing qemu to use the iommu to work around a Linux-side lack of
proper use of the DMA API is not only papering over the problem,
it's also forcing changes up 3 or 4 levels of the SW stack to create
a new option that no user will understand the meaning of and that
would otherwise be unnecessary.
> For example what if you have a guest that passes physical addresses
> to qemu bypassing swiotlb? Don't you want to detect
> that and fail gracefully rather than crash the guest?
A guest bug then ? Well it wouldn't so much crash as force the pages to
become encrypted and cause horrible ping/pong between qemu and the
guest (the secure pages aren't accessible to qemu directly).
> That's what VIRTIO_F_IOMMU_PLATFORM will do for you.
Again this is orthogonal. Using an iommu will indeed provide a modicum
of protection against buggy drivers, like it does on HW PCI platforms,
whether those guests are secure or not.
Note however that in practice, we tend to disable the iommu even on
real HW whenever we want performance (of course we can't for guests but
for bare metal systems we do, the added RAS isn't worth the performance
lost for very fast networking for example).
> Still that's hypervisor's decision. What isn't up to the hypervisor is
> the way we structure code. We made an early decision to merge a hack
> with xen, among discussion about how with time DMA API will learn to
> support per-device quirks and we'll be able to switch to that.
> So let's do that now?
The DMA API itself isn't the one that needs to learn "per-device
quirks", it's just plumbing into arch backends. The "quirk" is at the
point of establishing the backend for a given device.
We can go a good way down that path simply by having virtio in Linux
start by installing its own direct ops *itself* when
VIRTIO_F_IOMMU_PLATFORM is not set, and removing all the special casing
in the rest of the driver.
Once that's done, we have a single point of establishing the dma ops,
we can quirk in there if needed, that's rather nicely contained, or put
an arch hook, or whatever is necessary.
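A rough sketch of that shape, written as a small userspace model rather
than kernel code. Everything in it (struct dma_ops, vdev_model,
arch_virtio_dma_ops(), the three op tables and their fake translations)
is a hypothetical stand-in chosen for illustration; the only point is
that the choice happens once at init and the data path makes one
indirect call with no special casing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-device set of DMA operations. */
struct dma_ops {
	const char *name;
	uint64_t (*map)(uint64_t guest_phys);
};

/* Identity mapping: models "bypass the iommu". */
static uint64_t direct_map(uint64_t gpa) { return gpa; }
/* Made-up translation: models going through the platform iommu. */
static uint64_t platform_map(uint64_t gpa) { return gpa + 0x80000000ULL; }
/* Made-up bounce buffer: models what a secure guest would need. */
static uint64_t bounce_map(uint64_t gpa) { return 0x10000ULL + (gpa & 0xfff); }

static const struct dma_ops direct_ops   = { "direct",   direct_map };
static const struct dma_ops platform_ops = { "platform", platform_map };
static const struct dma_ops bounce_ops   = { "bounce",   bounce_map };

struct vdev_model {
	bool iommu_platform;	/* models VIRTIO_F_IOMMU_PLATFORM negotiated */
	const struct dma_ops *ops;
};

/* Hypothetical arch hook: NULL means the platform has no opinion. */
static const struct dma_ops *arch_virtio_dma_ops(bool secure_guest)
{
	return secure_guest ? &bounce_ops : NULL;
}

/* Single point where the ops are established, at init time only. */
static void vdev_init_dma_ops(struct vdev_model *vdev, bool secure_guest)
{
	const struct dma_ops *quirk = arch_virtio_dma_ops(secure_guest);

	if (quirk)
		vdev->ops = quirk;
	else if (vdev->iommu_platform)
		vdev->ops = &platform_ops;
	else
		vdev->ops = &direct_ops;
}

/* Data path: always goes through the ops, no per-call checks. */
static uint64_t vdev_map(const struct vdev_model *vdev, uint64_t gpa)
{
	assert(vdev->ops != NULL);
	return vdev->ops->map(gpa);
}
```

Giving the arch hook priority over the feature bit is an assumption of
this sketch, not something settled in the discussion.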
I would, however, like to keep the ability to bypass the iommu for
performance reasons, and also because it's qemu's default mode of
operation and my secure guest has no clean way to force qemu to turn
the iommu on. The hypervisor *could* return something to qemu when the
guest switches to secure mode, as we do know that, and qemu could walk
all of its virtio devices as a result and "switch" them over, but that's
almost grosser from a qemu perspective.
.../...
> > The point is that requiring specific qemu command line arguments isn't
> > going to fly. We have additional problems due to the fact that our
> > firmware (SLOF) inside qemu doesn't currently deal with iommu's etc...
> > though those can be fixed.
> >
> > Overall, however, this seems to be the most convoluted way of achieving
> > things, require user interventions where none should be needed etc...
> >
> > Again, what's wrong with a 2 lines hook instead that solves it all and
> > completely avoids involving qemu ?
> >
> > Ben.
>
> That each platform wants to add hacks in this data path function.
Sure, then add a single platform hook and the platforms can do what
they want here.
But as I said, it should all be done at initialization time rather than
in the data path; on this we absolutely agree. We should just choose the
right set of dma_ops, and have the data path always use the DMA API.
Cheers,
Ben.
> > >
> > > > >
> > > > >
> > > > > > arch/powerpc/include/asm/dma-mapping.h | 6 ++++++
> > > > > > arch/powerpc/platforms/pseries/iommu.c | 11 +++++++++++
> > > > > > drivers/virtio/virtio_ring.c | 10 ++++++++++
> > > > > > 3 files changed, 27 insertions(+)
> > > > > >
> > > > > > diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> > > > > > index 8fa3945..056e578 100644
> > > > > > --- a/arch/powerpc/include/asm/dma-mapping.h
> > > > > > +++ b/arch/powerpc/include/asm/dma-mapping.h
> > > > > > @@ -115,4 +115,10 @@ extern u64 __dma_get_required_mask(struct device *dev);
> > > > > > #define ARCH_HAS_DMA_MMAP_COHERENT
> > > > > >
> > > > > > #endif /* __KERNEL__ */
> > > > > > +
> > > > > > +#define platform_forces_virtio_dma platform_forces_virtio_dma
> > > > > > +
> > > > > > +struct virtio_device;
> > > > > > +
> > > > > > +extern bool platform_forces_virtio_dma(struct virtio_device *vdev);
> > > > > > #endif /* _ASM_DMA_MAPPING_H */
> > > > > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > > > > > index 06f0296..a2ec15a 100644
> > > > > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > > > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > > > > @@ -38,6 +38,7 @@
> > > > > > #include <linux/of.h>
> > > > > > #include <linux/iommu.h>
> > > > > > #include <linux/rculist.h>
> > > > > > +#include <linux/virtio.h>
> > > > > > #include <asm/io.h>
> > > > > > #include <asm/prom.h>
> > > > > > #include <asm/rtas.h>
> > > > > > @@ -1396,3 +1397,13 @@ static int __init disable_multitce(char *str)
> > > > > > __setup("multitce=", disable_multitce);
> > > > > >
> > > > > > machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> > > > > > +
> > > > > > +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > > > > +{
> > > > > > + /*
> > > > > > + * On protected guest platforms, force virtio core to use DMA
> > > > > > + * MAP API for all virtio devices. But there can also be some
> > > > > > + * exceptions for individual devices like virtio balloon.
> > > > > > + */
> > > > > > + return (of_find_compatible_node(NULL, NULL, "ibm,ultravisor") != NULL);
> > > > > > +}
> > > > >
> > > > > Isn't this kind of slow? vring_use_dma_api is on
> > > > > data path and supposed to be very fast.
> > > > >
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index 21d464a..47ea6c3 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -141,8 +141,18 @@ struct vring_virtqueue {
> > > > > > * unconditionally on data path.
> > > > > > */
> > > > > >
> > > > > > +#ifndef platform_forces_virtio_dma
> > > > > > +static inline bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > > > > +{
> > > > > > + return false;
> > > > > > +}
> > > > > > +#endif
> > > > > > +
> > > > > > static bool vring_use_dma_api(struct virtio_device *vdev)
> > > > > > {
> > > > > > + if (platform_forces_virtio_dma(vdev))
> > > > > > + return true;
> > > > > > +
> > > > > > if (!virtio_has_iommu_quirk(vdev))
> > > > > > return true;
> > > > > >
> > > > > > --
> > > > > > 2.9.3
* Re: [PATCH] virtio_ring: switch to dma_XX barriers for rpmsg
From: Bjorn Andersson @ 2018-06-04 20:16 UTC (permalink / raw)
To: Paolo Bonzini, Suman Anna
Cc: Ohad Ben-Cohen, Michael S. Tsirkin, linux-remoteproc,
linux-kernel, virtualization
In-Reply-To: <524fb400-2325-f60b-7e03-15be01888afc@redhat.com>
On Thu 19 Apr 10:48 PDT 2018, Paolo Bonzini wrote:
> On 19/04/2018 19:46, Michael S. Tsirkin wrote:
> >> This should be okay, but I wonder if there should be a virtio_wmb(...)
> >> or an "if (weak_barriers) wmb()" before the "writel" in vm_notify
> >> (drivers/virtio/virtio_mmio.c).
> >>
> >> Thanks,
> >>
> >> Paolo
> > That one uses weak barriers AFAIK.
> >
> > IIUC you mean rproc_virtio_notify.
> >
> > I suspect it works because specific kick callbacks have a barrier internally.
>
> Yes, that one. At least keystone_rproc_kick doesn't seem to have a barrier.
>
Afaict you're correct. My expectation is that the kick ensures write
ordering internally, and if I read this correctly keystone_rproc_kick()
results in a writel_relaxed() in the gpio driver.
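To illustrate the ordering concern, here is a userspace analogy using
C11 atomics. It is not kernel code: the release fence only plays the
role a write barrier would play before a relaxed doorbell write, and
struct ring_model, producer_kick() and consumer_poll() are hypothetical
names made up for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct ring_model {
	uint32_t payload;	/* plain data the producer publishes */
	atomic_uint doorbell;	/* stands in for the kick register */
};

/* Publish the payload, then ring the doorbell with a relaxed store. */
static void producer_kick(struct ring_model *r, uint32_t value)
{
	r->payload = value;
	/* Without this fence the relaxed store below could be observed
	 * before the payload write. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->doorbell, 1, memory_order_relaxed);
}

/* Returns 1 and fills *out if the doorbell was rung, 0 otherwise. */
static int consumer_poll(struct ring_model *r, uint32_t *out)
{
	assert(out != NULL);
	if (atomic_load_explicit(&r->doorbell, memory_order_relaxed) == 0)
		return 0;
	/* Pairs with the release fence in producer_kick(). */
	atomic_thread_fence(memory_order_acquire);
	*out = r->payload;
	return 1;
}
```

If the kick path itself has no barrier, as seems to be the case here,
then something equivalent to the fence has to come from the caller.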
@Suman, can you please have a look at this.
Thanks Paolo,
Bjorn
* Re: [PATCH] virtio_ring: switch to dma_XX barriers for rpmsg
From: Bjorn Andersson @ 2018-06-04 20:02 UTC (permalink / raw)
To: Michael S. Tsirkin, Suman Anna, Loic Pallardy
Cc: Ohad Ben-Cohen, linux-remoteproc, linux-kernel, virtualization,
Paolo Bonzini
In-Reply-To: <1524159181-351878-1-git-send-email-mst@redhat.com>
On Thu 19 Apr 10:35 PDT 2018, Michael S. Tsirkin wrote:
> virtio is using barriers to order memory accesses, thus
> dma_wmb/rmb is a good match.
>
> Build-tested on x86: Before
>
> [mst@tuck linux]$ size drivers/virtio/virtio_ring.o
> text data bss dec hex filename
> 11392 820 0 12212 2fb4 drivers/virtio/virtio_ring.o
>
> After
> [mst@tuck linux]$ size drivers/virtio/virtio_ring.o
> text data bss dec hex filename
> 11284 820 0 12104 2f48 drivers/virtio/virtio_ring.o
>
> Cc: Ohad Ben-Cohen <ohad@wizery.com>
> Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
> Cc: linux-remoteproc@vger.kernel.org
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Regards,
Bjorn
> ---
>
> It's good in theory, but could one of RPMSG maintainers please review
> and ack this patch? Or even better test it?
>
> All these barriers are useless on Intel anyway ...
>
> include/linux/virtio_ring.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> index bbf3252..fab0213 100644
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -35,7 +35,7 @@ static inline void virtio_rmb(bool weak_barriers)
> if (weak_barriers)
> virt_rmb();
> else
> - rmb();
> + dma_rmb();
> }
>
> static inline void virtio_wmb(bool weak_barriers)
> @@ -43,7 +43,7 @@ static inline void virtio_wmb(bool weak_barriers)
> if (weak_barriers)
> virt_wmb();
> else
> - wmb();
> + dma_wmb();
> }
>
> static inline void virtio_store_mb(bool weak_barriers,
> --
> MST
* Re: [PATCH v2 0/9] x86: macrofying inline asm for better compilation
From: Josh Poimboeuf @ 2018-06-04 19:05 UTC (permalink / raw)
To: Nadav Amit
Cc: Juergen Gross, Kate Stewart, Kees Cook, Peter Zijlstra,
Greg Kroah-Hartman, Christopher Li, x86, linux-kernel,
Philippe Ombredanne, virtualization, linux-sparse, Ingo Molnar,
Jan Beulich, H. Peter Anvin, Alok Kataria, Linus Torvalds,
Thomas Gleixner
In-Reply-To: <20180604112131.59100-1-namit@vmware.com>
On Mon, Jun 04, 2018 at 04:21:22AM -0700, Nadav Amit wrote:
> This patch-set deals with an interesting yet stupid problem: kernel code
> that does not get inlined despite its simplicity. There are several
> causes for this behavior: "cold" attribute on __init, different function
> optimization levels; conditional constant computations based on
> __builtin_constant_p(); and finally large inline assembly blocks.
>
> This patch-set deals with the inline assembly problem. I separated these
> patches from the others (that were sent in the RFC) for easier
> inclusion. I also separated the removal of unnecessary new-lines which
> would be sent separately.
>
> The problem with inline assembly is that inline assembly is often used
> by the kernel for things that are other than code - for example,
> assembly directives and data. GCC however is oblivious to the content of
> the blocks and assumes their cost in space and time is proportional to
> the number of the perceived assembly "instruction", according to the
> number of newlines and semicolons. Alternatives, paravirt and other
> mechanisms are affected, causing code not to be inlined, and degrading
> compilation quality in general.
>
> The solution that this patch-set carries for this problem is to create
> an assembly macro, and then call it from the inline assembly block. As
> a result, the compiler sees a single "instruction" and assigns the more
> appropriate cost to the code.
>
> To avoid uglification of the code, as many noted, the macros are first
> precompiled into an assembly file, which is later assembled together
> with the C files. This also enables to avoid duplicate
> implementation that was set before for the asm and C code. This can be
> seen in the exception table changes.
>
> Overall this patch-set slightly increases the kernel size (my build was
> done using my Ubuntu 18.04 config + localyesconfig for the record):
>
> text data bss dec hex filename
> 18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
> 18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+0.1%)
>
> The number of static functions in the image is reduced by 379, but
> actual inlining is even better, which does not always show in these
> numbers: a function may be inlined, causing the calling function not to
> be inlined.
>
> The Makefile stuff may not be too clean. Ideas for improvements are
> welcome.
>
> v1->v2: * Compiling the macros into a separate .s file, improving
> readability (Linus)
> * Improving assembly formatting, applying most of the comments
> according to my judgment (Jan)
> * Adding exception-table, cpufeature and jump-labels
> * Removing new-line cleanup; to be submitted separately
How did you find these issues? Is there some way to find them
automatically in the future? Perhaps with a GCC plugin?
--
Josh
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virtio devices
From: Michael S. Tsirkin @ 2018-06-04 16:34 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, mpe, linux-kernel, virtualization, Christoph Hellwig, joe,
david, linuxppc-dev, elfring, Anshuman Khandual
In-Reply-To: <acdfef1327f73f6ac67645d9f1a8e9204a0f22fb.camel@kernel.crashing.org>
On Mon, Jun 04, 2018 at 11:14:36PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-06-04 at 05:55 -0700, Christoph Hellwig wrote:
> > On Mon, Jun 04, 2018 at 03:43:09PM +0300, Michael S. Tsirkin wrote:
> > > Another is that given the basic functionality is in there, optimizations
> > > can possibly wait until per-device quirks in DMA API are supported.
> >
> > We have had per-device dma_ops for quite a while.
>
> I've asked Anshuman to start with a patch that converts virtio to use
> DMA ops always, along with an init quirk to hookup "direct" ops when
> the IOMMU flag isn't set.
>
> This will at least remove that horrid duplication of code path we have
> in there.
>
> Then we can just involve the arch in that init quirk so we can choose an
> alternate set of ops when running a secure VM.
>
> This is completely orthogonal to whether an iommu exists on the qemu
> side or not, and should be entirely solved on the Linux side.
>
> Cheers,
> Ben.
Sounds good to me.
--
MST
* Re: [PATCH v3] virtio_pci: support enabling VFs
From: Michael S. Tsirkin @ 2018-06-04 16:32 UTC (permalink / raw)
To: Tiwei Bie
Cc: alexander.h.duyck, virtio-dev, linux-pci, linux-kernel,
virtualization, stefanha, zhihong.wang, bhelgaas, mark.d.rustad
In-Reply-To: <20180601040239.1151-1-tiwei.bie@intel.com>
On Fri, Jun 01, 2018 at 12:02:39PM +0800, Tiwei Bie wrote:
> There is a new feature bit allocated in virtio spec to
> support SR-IOV (Single Root I/O Virtualization):
>
> https://github.com/oasis-tcs/virtio-spec/issues/11
>
> This patch enables the support for this feature bit in
> virtio driver.
>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
OK but what about freeze/restore functions?
I also wonder about kexec - virtio.c currently does:
/* We always start by resetting the device, in case a previous
* driver messed it up. This also tests that code path a little. */
dev->config->reset(dev);
Do we need to do something like this for sriov?
I also wonder whether PCI core should disable sriov for us.
I wish there was a patch emulating this without vDPA for QEMU; it
would make it easy to test your patches. Do you happen
to have something like this?
Thanks,
> v3:
> - Drop the acks;
>
> v2:
> - Disable VFs when unbinding the driver (Alex, MST);
> - Don't use pci_sriov_configure_simple (Alex);
>
> drivers/virtio/virtio_pci_common.c | 30 ++++++++++++++++++++++++++++++
> drivers/virtio/virtio_pci_modern.c | 14 ++++++++++++++
> include/uapi/linux/virtio_config.h | 7 ++++++-
> 3 files changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index 48d4d1cf1cb6..1d4467b2dc31 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -577,6 +577,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> struct device *dev = get_device(&vp_dev->vdev.dev);
>
> + pci_disable_sriov(pci_dev);
> +
> unregister_virtio_device(&vp_dev->vdev);
>
> if (vp_dev->ioaddr)
> @@ -588,6 +590,33 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> put_device(dev);
> }
>
> +static int virtio_pci_sriov_configure(struct pci_dev *pci_dev, int num_vfs)
> +{
> + struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> + struct virtio_device *vdev = &vp_dev->vdev;
> + int ret;
> +
> + if (!(vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_DRIVER_OK))
> + return -EBUSY;
> +
> + if (!__virtio_test_bit(vdev, VIRTIO_F_SR_IOV))
> + return -EINVAL;
> +
> + if (pci_vfs_assigned(pci_dev))
> + return -EPERM;
> +
> + if (num_vfs == 0) {
> + pci_disable_sriov(pci_dev);
> + return 0;
> + }
> +
> + ret = pci_enable_sriov(pci_dev, num_vfs);
> + if (ret < 0)
> + return ret;
> +
> + return num_vfs;
> +}
> +
> static struct pci_driver virtio_pci_driver = {
> .name = "virtio-pci",
> .id_table = virtio_pci_id_table,
> @@ -596,6 +625,7 @@ static struct pci_driver virtio_pci_driver = {
> #ifdef CONFIG_PM_SLEEP
> .driver.pm = &virtio_pci_pm_ops,
> #endif
> + .sriov_configure = virtio_pci_sriov_configure,
> };
>
> module_pci_driver(virtio_pci_driver);
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index 2555d80f6eec..07571daccfec 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -153,14 +153,28 @@ static u64 vp_get_features(struct virtio_device *vdev)
> return features;
> }
>
> +static void vp_transport_features(struct virtio_device *vdev, u64 features)
> +{
> + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> + struct pci_dev *pci_dev = vp_dev->pci_dev;
> +
> + if ((features & BIT_ULL(VIRTIO_F_SR_IOV)) &&
> + pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV))
> + __virtio_set_bit(vdev, VIRTIO_F_SR_IOV);
> +}
> +
> /* virtio config->finalize_features() implementation */
> static int vp_finalize_features(struct virtio_device *vdev)
> {
> struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> + u64 features = vdev->features;
>
> /* Give virtio_ring a chance to accept features. */
> vring_transport_features(vdev);
>
> + /* Give virtio_pci a chance to accept features. */
> + vp_transport_features(vdev, features);
> +
> if (!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> dev_err(&vdev->dev, "virtio: device uses modern interface "
> "but does not have VIRTIO_F_VERSION_1\n");
> diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> index 308e2096291f..b7c1f4e7d59e 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -49,7 +49,7 @@
> * transport being used (eg. virtio_ring), the rest are per-device feature
> * bits. */
> #define VIRTIO_TRANSPORT_F_START 28
> -#define VIRTIO_TRANSPORT_F_END 34
> +#define VIRTIO_TRANSPORT_F_END 38
>
> #ifndef VIRTIO_CONFIG_NO_LEGACY
> /* Do we get callbacks when the ring is completely used, even if we've
> @@ -71,4 +71,9 @@
> * this is for compatibility with legacy systems.
> */
> #define VIRTIO_F_IOMMU_PLATFORM 33
> +
> +/*
> + * Does the device support Single Root I/O Virtualization?
> + */
> +#define VIRTIO_F_SR_IOV 37
> #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> --
> 2.17.0
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virtio devices
From: Michael S. Tsirkin @ 2018-06-04 16:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, mpe, linux-kernel, virtualization, hch, joe, david,
linuxppc-dev, elfring, Anshuman Khandual
In-Reply-To: <e7ceddbec11711a89282e9b70b7fd3c8af10b030.camel@kernel.crashing.org>
On Mon, Jun 04, 2018 at 11:11:52PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-06-04 at 15:43 +0300, Michael S. Tsirkin wrote:
> > On Thu, May 24, 2018 at 08:27:04AM +1000, Benjamin Herrenschmidt wrote:
> > > On Wed, 2018-05-23 at 21:50 +0300, Michael S. Tsirkin wrote:
> > >
> > > > I re-read that discussion and I'm still unclear on the
> > > > original question, since I got several apparently
> > > > conflicting answers.
> > > >
> > > > I asked:
> > > >
> > > > Why isn't setting VIRTIO_F_IOMMU_PLATFORM on the
> > > > hypervisor side sufficient?
> > >
> > > I thought I had replied to this...
> > >
> > > There are a couple of reasons:
> > >
> > > - First qemu doesn't know that the guest will switch to "secure mode"
> > > in advance. There is no difference between a normal and a secure
> > > partition until the partition does the magic UV call to "enter secure
> > > mode" and qemu doesn't see any of it. So who can set the flag here ?
> >
> > The user should set it. You just tell user "to be able to use with
> > feature X, enable IOMMU".
>
> That's completely backwards. The user has no idea what that stuff is.
> And it would have to percolate all the way up the management stack,
> libvirt, kimchi, whatever else ... that's just nonsense.
>
> Especially since, as I explained in my other email, this is *not* a
> qemu problem and thus the solution shouldn't be messing around with
> qemu.
virtio is implemented in qemu though. If you prefer to stick
all your code in either guest or the UV that's your decision
but it looks like qemu could be helpful here.
For example what if you have a guest that passes physical addresses
to qemu bypassing swiotlb? Don't you want to detect
that and fail gracefully rather than crash the guest?
That's what VIRTIO_F_IOMMU_PLATFORM will do for you.
Still, that's the hypervisor's decision. What isn't up to the hypervisor is
the way we structure code. We made an early decision to merge a hack
with xen, amid discussion about how with time the DMA API would learn to
support per-device quirks and we'd be able to switch to that.
So let's do that now?
> >
> > > - Second, when using VIRTIO_F_IOMMU_PLATFORM, we also make qemu (or
> > > vhost) go through the emulated MMIO for every access to the guest,
> > > which adds additional overhead.
> > >
> > > Cheers,
> > > Ben.
> >
> > There are several answers to this. One is that we are working hard to
> > make overhead small when the mappings are static (which they would be if
> > there's no actual IOMMU). So maybe especially given you are using
> > a bounce buffer on top it's not so bad - did you try to
> > benchmark?
> >
> > Another is that given the basic functionality is in there, optimizations
> > can possibly wait until per-device quirks in DMA API are supported.
>
> The point is that requiring specific qemu command line arguments isn't
> going to fly. We have additional problems due to the fact that our
> firmware (SLOF) inside qemu doesn't currently deal with iommu's etc...
> though those can be fixed.
>
> Overall, however, this seems to be the most convoluted way of achieving
> things, require user interventions where none should be needed etc...
>
> Again, what's wrong with a 2 lines hook instead that solves it all and
> completely avoids involving qemu ?
>
> Ben.
That each platform wants to add hacks in this data path function.
> >
> > > >
> > > >
> > > > > arch/powerpc/include/asm/dma-mapping.h | 6 ++++++
> > > > > arch/powerpc/platforms/pseries/iommu.c | 11 +++++++++++
> > > > > drivers/virtio/virtio_ring.c | 10 ++++++++++
> > > > > 3 files changed, 27 insertions(+)
> > > > >
> > > > > diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> > > > > index 8fa3945..056e578 100644
> > > > > --- a/arch/powerpc/include/asm/dma-mapping.h
> > > > > +++ b/arch/powerpc/include/asm/dma-mapping.h
> > > > > @@ -115,4 +115,10 @@ extern u64 __dma_get_required_mask(struct device *dev);
> > > > > #define ARCH_HAS_DMA_MMAP_COHERENT
> > > > >
> > > > > #endif /* __KERNEL__ */
> > > > > +
> > > > > +#define platform_forces_virtio_dma platform_forces_virtio_dma
> > > > > +
> > > > > +struct virtio_device;
> > > > > +
> > > > > +extern bool platform_forces_virtio_dma(struct virtio_device *vdev);
> > > > > #endif /* _ASM_DMA_MAPPING_H */
> > > > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > > > > index 06f0296..a2ec15a 100644
> > > > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > > > @@ -38,6 +38,7 @@
> > > > > #include <linux/of.h>
> > > > > #include <linux/iommu.h>
> > > > > #include <linux/rculist.h>
> > > > > +#include <linux/virtio.h>
> > > > > #include <asm/io.h>
> > > > > #include <asm/prom.h>
> > > > > #include <asm/rtas.h>
> > > > > @@ -1396,3 +1397,13 @@ static int __init disable_multitce(char *str)
> > > > > __setup("multitce=", disable_multitce);
> > > > >
> > > > > machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> > > > > +
> > > > > +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > > > +{
> > > > > + /*
> > > > > + * On protected guest platforms, force virtio core to use DMA
> > > > > + * MAP API for all virtio devices. But there can also be some
> > > > > + * exceptions for individual devices like virtio balloon.
> > > > > + */
> > > > > + return (of_find_compatible_node(NULL, NULL, "ibm,ultravisor") != NULL);
> > > > > +}
> > > >
> > > > Isn't this kind of slow? vring_use_dma_api is on
> > > > data path and supposed to be very fast.
> > > >
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 21d464a..47ea6c3 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -141,8 +141,18 @@ struct vring_virtqueue {
> > > > > * unconditionally on data path.
> > > > > */
> > > > >
> > > > > +#ifndef platform_forces_virtio_dma
> > > > > +static inline bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > > > +{
> > > > > + return false;
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > > static bool vring_use_dma_api(struct virtio_device *vdev)
> > > > > {
> > > > > + if (platform_forces_virtio_dma(vdev))
> > > > > + return true;
> > > > > +
> > > > > if (!virtio_has_iommu_quirk(vdev))
> > > > > return true;
> > > > >
> > > > > --
> > > > > 2.9.3