* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Andrew Morton @ 2018-04-19 19:47 UTC (permalink / raw)
To: Mikulas Patocka
Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, virtualization,
linux-mm, edumazet, bhutchings, David Miller, Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804191207380.31175@file01.intranet.prod.int.rdu2.redhat.com>
On Thu, 19 Apr 2018 12:12:38 -0400 (EDT) Mikulas Patocka <mpatocka@redhat.com> wrote:
> The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> kmalloc fails.
>
> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.
>
> These bugs are hard to reproduce because vmalloc falls back to kmalloc
> only if memory is fragmented.
Yes, that's nasty.
> In order to detect these bugs reliably I submit this patch that changes
> kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
>
> ...
>
> --- linux-2.6.orig/mm/util.c 2018-04-18 15:46:23.000000000 +0200
> +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.000000000 +0200
> @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
> */
> void *kvmalloc_node(size_t size, gfp_t flags, int node)
> {
> +#ifndef CONFIG_DEBUG_VM
> gfp_t kmalloc_flags = flags;
> void *ret;
>
> @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
> */
> if (ret || size <= PAGE_SIZE)
> return ret;
> +#endif
>
> return __vmalloc_node_flags_caller(size, node, flags,
> __builtin_return_address(0));
Well, it doesn't have to be done at compile-time, does it? We could
add a knob (in debugfs, presumably) which enables this at runtime.
That's far more user-friendly.
^ permalink raw reply
* Re: [RFC] vhost: introduce mdev based hardware vhost backend
From: Michael S. Tsirkin @ 2018-04-19 18:40 UTC (permalink / raw)
To: Jason Wang
Cc: alexander.h.duyck, virtio-dev, kvm, netdev, linux-kernel,
virtualization, xiao.w.wang, ddutile, jianfeng.tan, zhihong.wang
In-Reply-To: <30a63fff-7599-640a-361f-a27e5783012a@redhat.com>
On Tue, Apr 10, 2018 at 03:25:45PM +0800, Jason Wang wrote:
> > > > One problem is that, different virtio ring compatible devices
> > > > may have different device interfaces. That is to say, we will
> > > > need different drivers in QEMU. It could be troublesome. And
> > > > that's what this patch trying to fix. The idea behind this
> > > > patch is very simple: mdev is a standard way to emulate device
> > > > in kernel.
> > > So you just move the abstraction layer from qemu to kernel, and you still
> > > need different drivers in kernel for different device interfaces of
> > > accelerators. This looks even more complex than leaving it in qemu. As you
> > > said, another idea is to implement userspace vhost backend for accelerators
> > > which seems easier and could co-work with other parts of qemu without
> > > inventing new type of messages.
> > I'm not quite sure. Do you think it's acceptable to
> > add various vendor specific hardware drivers in QEMU?
> >
>
> I don't object but we need to figure out the advantages of doing it in qemu
> too.
>
> Thanks
To be frank kernel is exactly where device drivers belong. DPDK did
move them to userspace but that's merely a requirement for data path.
*If* you can have them in kernel that is best:
- update kernel and there's no need to rebuild userspace
- apps can be written in any language no need to maintain multiple
libraries or add wrappers
- security concerns are much smaller (ok people are trying to
raise the bar with IOMMUs and such, but it's already pretty
good even without)
The biggest issue is that you let userspace poke at the
device which is also allowed by the IOMMU to poke at
kernel memory (needed for kernel driver to work).
Yes, maybe if device is not buggy it's all fine, but
it's better if we do not have to trust the device
otherwise the security picture becomes more murky.
I suggested attaching a PASID to (some) queues - see my old post "using
PASIDs to enable a safe variant of direct ring access".
Then using IOMMU with VFIO to limit access through queue to corrent
ranges of memory.
--
MST
^ permalink raw reply
* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Vlastimil Babka @ 2018-04-19 18:28 UTC (permalink / raw)
To: Mikulas Patocka, David Miller, Andrew Morton, linux-mm
Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, virtualization,
edumazet, bhutchings, Laura Abbott
In-Reply-To: <alpine.LRH.2.02.1804191207380.31175@file01.intranet.prod.int.rdu2.redhat.com>
On 04/19/2018 06:12 PM, Mikulas Patocka wrote:
> From: Mikulas Patocka <mpatocka@redhat.com>
> Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
>
> The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> kmalloc fails.
>
> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.
>
> These bugs are hard to reproduce because vmalloc falls back to kmalloc
> only if memory is fragmented.
>
> In order to detect these bugs reliably I submit this patch that changes
> kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Hmm AFAIK Fedora uses CONFIG_DEBUG_VM in their kernels. Sure you want to
impose this on all users? Seems too much for DEBUG_VM to me. Maybe it
should be hidden under some error injection config?
> ---
> mm/util.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> Index: linux-2.6/mm/util.c
> ===================================================================
> --- linux-2.6.orig/mm/util.c 2018-04-18 15:46:23.000000000 +0200
> +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.000000000 +0200
> @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
> */
> void *kvmalloc_node(size_t size, gfp_t flags, int node)
> {
> +#ifndef CONFIG_DEBUG_VM
> gfp_t kmalloc_flags = flags;
> void *ret;
>
> @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
> */
> if (ret || size <= PAGE_SIZE)
> return ret;
> +#endif
Did you verify that vmalloc does the right thing for sub-page sizes?
Shouldn't those be exempted?
> return __vmalloc_node_flags_caller(size, node, flags,
> __builtin_return_address(0));
>
^ permalink raw reply
* Re: [PATCH] virtio_ring: switch to dma_XX barriers for rpmsg
From: Michael S. Tsirkin @ 2018-04-19 18:10 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Ohad Ben-Cohen, linux-remoteproc, linux-kernel, virtualization,
Bjorn Andersson
In-Reply-To: <524fb400-2325-f60b-7e03-15be01888afc@redhat.com>
On Thu, Apr 19, 2018 at 07:48:24PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 19:46, Michael S. Tsirkin wrote:
> >> This should be okay, but I wonder if there should be a virtio_wmb(...)
> >> or an "if (weak_barriers) wmb()" before the "writel" in vm_notify
> >> (drivers/virtio/virtio_mmio.c).
> >>
> >> Thanks,
> >>
> >> Paolo
> > That one uses weak barriers AFAIK.
> >
> > IIUC you mean rproc_virtio_notify.
> >
> > I suspect it works because specific kick callbacks have a barrier internally.
>
> Yes, that one. At least keystone_rproc_kick doesn't seem to have a barrier.
>
> Paolo
Maybe it's wrong or not used with virtio then.
This patch won't change anything with respect to that though.
--
MST
^ permalink raw reply
* Re: [PATCH] virtio_ring: switch to dma_XX barriers for rpmsg
From: Paolo Bonzini @ 2018-04-19 17:48 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Ohad Ben-Cohen, linux-remoteproc, linux-kernel, virtualization,
Bjorn Andersson
In-Reply-To: <20180419204017-mutt-send-email-mst@kernel.org>
On 19/04/2018 19:46, Michael S. Tsirkin wrote:
>> This should be okay, but I wonder if there should be a virtio_wmb(...)
>> or an "if (weak_barriers) wmb()" before the "writel" in vm_notify
>> (drivers/virtio/virtio_mmio.c).
>>
>> Thanks,
>>
>> Paolo
> That one uses weak barriers AFAIK.
>
> IIUC you mean rproc_virtio_notify.
>
> I suspect it works because specific kick callbacks have a barrier internally.
Yes, that one. At least keystone_rproc_kick doesn't seem to have a barrier.
Paolo
^ permalink raw reply
* Re: [PATCH] virtio_ring: switch to dma_XX barriers for rpmsg
From: Michael S. Tsirkin @ 2018-04-19 17:46 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Ohad Ben-Cohen, linux-remoteproc, linux-kernel, virtualization,
Bjorn Andersson
In-Reply-To: <5047d08c-9f9e-2902-d425-1fac849ce5b4@redhat.com>
On Thu, Apr 19, 2018 at 07:39:21PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 19:35, Michael S. Tsirkin wrote:
> > virtio is using barriers to order memory accesses, thus
> > dma_wmb/rmb is a good match.
> >
> > Build-tested on x86: Before
> >
> > [mst@tuck linux]$ size drivers/virtio/virtio_ring.o
> > text data bss dec hex filename
> > 11392 820 0 12212 2fb4 drivers/virtio/virtio_ring.o
> >
> > After
> > mst@tuck linux]$ size drivers/virtio/virtio_ring.o
> > text data bss dec hex filename
> > 11284 820 0 12104 2f48 drivers/virtio/virtio_ring.o
> >
> > Cc: Ohad Ben-Cohen <ohad@wizery.com>
> > Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
> > Cc: linux-remoteproc@vger.kernel.org
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >
> > It's good in theory, but could one of RPMSG maintainers please review
> > and ack this patch? Or even better test it?
> >
> > All these barriers are useless on Intel anyway ...
>
> This should be okay, but I wonder if there should be a virtio_wmb(...)
> or an "if (weak_barriers) wmb()" before the "writel" in vm_notify
> (drivers/virtio/virtio_mmio.c).
>
> Thanks,
>
> Paolo
That one uses weak barriers AFAIK.
IIUC you mean rproc_virtio_notify.
I suspect it works because specific kick callbacks have a barrier internally.
> >
> > include/linux/virtio_ring.h | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> > index bbf3252..fab0213 100644
> > --- a/include/linux/virtio_ring.h
> > +++ b/include/linux/virtio_ring.h
> > @@ -35,7 +35,7 @@ static inline void virtio_rmb(bool weak_barriers)
> > if (weak_barriers)
> > virt_rmb();
> > else
> > - rmb();
> > + dma_rmb();
> > }
> >
> > static inline void virtio_wmb(bool weak_barriers)
> > @@ -43,7 +43,7 @@ static inline void virtio_wmb(bool weak_barriers)
> > if (weak_barriers)
> > virt_wmb();
> > else
> > - wmb();
> > + dma_wmb();
> > }
> >
> > static inline void virtio_store_mb(bool weak_barriers,
> >
^ permalink raw reply
* Re: [PATCH] virtio_ring: switch to dma_XX barriers for rpmsg
From: Paolo Bonzini @ 2018-04-19 17:39 UTC (permalink / raw)
To: Michael S. Tsirkin, linux-kernel
Cc: Ohad Ben-Cohen, virtualization, linux-remoteproc, Bjorn Andersson
In-Reply-To: <1524159181-351878-1-git-send-email-mst@redhat.com>
On 19/04/2018 19:35, Michael S. Tsirkin wrote:
> virtio is using barriers to order memory accesses, thus
> dma_wmb/rmb is a good match.
>
> Build-tested on x86: Before
>
> [mst@tuck linux]$ size drivers/virtio/virtio_ring.o
> text data bss dec hex filename
> 11392 820 0 12212 2fb4 drivers/virtio/virtio_ring.o
>
> After
> mst@tuck linux]$ size drivers/virtio/virtio_ring.o
> text data bss dec hex filename
> 11284 820 0 12104 2f48 drivers/virtio/virtio_ring.o
>
> Cc: Ohad Ben-Cohen <ohad@wizery.com>
> Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
> Cc: linux-remoteproc@vger.kernel.org
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> It's good in theory, but could one of RPMSG maintainers please review
> and ack this patch? Or even better test it?
>
> All these barriers are useless on Intel anyway ...
This should be okay, but I wonder if there should be a virtio_wmb(...)
or an "if (weak_barriers) wmb()" before the "writel" in vm_notify
(drivers/virtio/virtio_mmio.c).
Thanks,
Paolo
>
> include/linux/virtio_ring.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> index bbf3252..fab0213 100644
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -35,7 +35,7 @@ static inline void virtio_rmb(bool weak_barriers)
> if (weak_barriers)
> virt_rmb();
> else
> - rmb();
> + dma_rmb();
> }
>
> static inline void virtio_wmb(bool weak_barriers)
> @@ -43,7 +43,7 @@ static inline void virtio_wmb(bool weak_barriers)
> if (weak_barriers)
> virt_wmb();
> else
> - wmb();
> + dma_wmb();
> }
>
> static inline void virtio_store_mb(bool weak_barriers,
>
^ permalink raw reply
* [PATCH] virtio_ring: switch to dma_XX barriers for rpmsg
From: Michael S. Tsirkin @ 2018-04-19 17:35 UTC (permalink / raw)
To: linux-kernel
Cc: Ohad Ben-Cohen, linux-remoteproc, virtualization, Paolo Bonzini,
Bjorn Andersson
virtio is using barriers to order memory accesses, thus
dma_wmb/rmb is a good match.
Build-tested on x86: Before
[mst@tuck linux]$ size drivers/virtio/virtio_ring.o
text data bss dec hex filename
11392 820 0 12212 2fb4 drivers/virtio/virtio_ring.o
After
mst@tuck linux]$ size drivers/virtio/virtio_ring.o
text data bss dec hex filename
11284 820 0 12104 2f48 drivers/virtio/virtio_ring.o
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: linux-remoteproc@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
It's good in theory, but could one of RPMSG maintainers please review
and ack this patch? Or even better test it?
All these barriers are useless on Intel anyway ...
include/linux/virtio_ring.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index bbf3252..fab0213 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -35,7 +35,7 @@ static inline void virtio_rmb(bool weak_barriers)
if (weak_barriers)
virt_rmb();
else
- rmb();
+ dma_rmb();
}
static inline void virtio_wmb(bool weak_barriers)
@@ -43,7 +43,7 @@ static inline void virtio_wmb(bool weak_barriers)
if (weak_barriers)
virt_wmb();
else
- wmb();
+ dma_wmb();
}
static inline void virtio_store_mb(bool weak_barriers,
--
MST
^ permalink raw reply related
* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Michael S. Tsirkin @ 2018-04-19 16:43 UTC (permalink / raw)
To: Mikulas Patocka
Cc: dm-devel, eric.dumazet, netdev, linux-kernel, virtualization,
linux-mm, edumazet, bhutchings, Andrew Morton, David Miller,
Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804191207380.31175@file01.intranet.prod.int.rdu2.redhat.com>
On Thu, Apr 19, 2018 at 12:12:38PM -0400, Mikulas Patocka wrote:
>
>
> On Wed, 18 Apr 2018, Mikulas Patocka wrote:
>
> >
> >
> > On Wed, 18 Apr 2018, David Miller wrote:
> >
> > > From: Mikulas Patocka <mpatocka@redhat.com>
> > > Date: Wed, 18 Apr 2018 12:44:25 -0400 (EDT)
> > >
> > > > The structure net_device is followed by arbitrary driver-specific data
> > > > (accessible with the function netdev_priv). And for virtio-net, these
> > > > driver-specific data must be in DMA memory.
> > >
> > > And we are saying that this assumption is wrong and needs to be
> > > corrected.
> >
> > So, try to find all the networking drivers that to DMA to the private
> > area.
> >
> > The problem here is that kvzalloc usually returns DMA-able area, but it
> > may return non-DMA area rarely, if the memory is too fragmented. So, we
> > are in a situation, where some networking drivers will randomly fail. Go
> > and find them.
> >
> > Mikulas
>
> Her I submit a patch that makes kvmalloc always use vmalloc if
> CONFIG_DEBUG_VM is defined.
>
>
>
>
> From: Mikulas Patocka <mpatocka@redhat.com>
> Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
>
> The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> kmalloc fails.
>
> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.
>
> These bugs are hard to reproduce because vmalloc falls back to kmalloc
> only if memory is fragmented.
>
> In order to detect these bugs reliably I submit this patch that changes
> kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Maybe make it conditional on CONFIG_DEBUG_SG too?
Otherwise I think you just trigger a hard to debug memory corruption.
> ---
> mm/util.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> Index: linux-2.6/mm/util.c
> ===================================================================
> --- linux-2.6.orig/mm/util.c 2018-04-18 15:46:23.000000000 +0200
> +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.000000000 +0200
> @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
> */
> void *kvmalloc_node(size_t size, gfp_t flags, int node)
> {
> +#ifndef CONFIG_DEBUG_VM
> gfp_t kmalloc_flags = flags;
> void *ret;
>
> @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
> */
> if (ret || size <= PAGE_SIZE)
> return ret;
> +#endif
>
> return __vmalloc_node_flags_caller(size, node, flags,
> __builtin_return_address(0));
^ permalink raw reply
* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Mikulas Patocka @ 2018-04-19 16:28 UTC (permalink / raw)
To: Eric Dumazet
Cc: dm-devel, mst, netdev, linux-kernel, virtualization, linux-mm,
edumazet, bhutchings, Andrew Morton, David Miller,
Vlastimil Babka
In-Reply-To: <6260a8af-3166-94d3-9441-104d342ab7a1@gmail.com>
On Thu, 19 Apr 2018, Eric Dumazet wrote:
>
>
> On 04/19/2018 09:12 AM, Mikulas Patocka wrote:
> >
> >
> > These bugs are hard to reproduce because vmalloc falls back to kmalloc
> > only if memory is fragmented.
> >
>
> This sentence is wrong.
>
> .... because kvmalloc() falls back to vmalloc() ...
Yes. There should be "falls back to vmalloc()".
Mikulas
^ permalink raw reply
* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Eric Dumazet @ 2018-04-19 16:25 UTC (permalink / raw)
To: Mikulas Patocka, David Miller, Andrew Morton, linux-mm
Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, virtualization,
edumazet, bhutchings, Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804191207380.31175@file01.intranet.prod.int.rdu2.redhat.com>
On 04/19/2018 09:12 AM, Mikulas Patocka wrote:
>
>
> These bugs are hard to reproduce because vmalloc falls back to kmalloc
> only if memory is fragmented.
>
This sentence is wrong.
.... because kvmalloc() falls back to vmalloc() ...
^ permalink raw reply
* [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Mikulas Patocka @ 2018-04-19 16:12 UTC (permalink / raw)
To: David Miller, Andrew Morton, linux-mm
Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, virtualization,
edumazet, bhutchings, Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804181350050.17942@file01.intranet.prod.int.rdu2.redhat.com>
On Wed, 18 Apr 2018, Mikulas Patocka wrote:
>
>
> On Wed, 18 Apr 2018, David Miller wrote:
>
> > From: Mikulas Patocka <mpatocka@redhat.com>
> > Date: Wed, 18 Apr 2018 12:44:25 -0400 (EDT)
> >
> > > The structure net_device is followed by arbitrary driver-specific data
> > > (accessible with the function netdev_priv). And for virtio-net, these
> > > driver-specific data must be in DMA memory.
> >
> > And we are saying that this assumption is wrong and needs to be
> > corrected.
>
> So, try to find all the networking drivers that to DMA to the private
> area.
>
> The problem here is that kvzalloc usually returns DMA-able area, but it
> may return non-DMA area rarely, if the memory is too fragmented. So, we
> are in a situation, where some networking drivers will randomly fail. Go
> and find them.
>
> Mikulas
Her I submit a patch that makes kvmalloc always use vmalloc if
CONFIG_DEBUG_VM is defined.
From: Mikulas Patocka <mpatocka@redhat.com>
Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
The kvmalloc function tries to use kmalloc and falls back to vmalloc if
kmalloc fails.
Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
uses DMA-API on the returned memory or frees it with kfree. Such bugs were
found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
code.
These bugs are hard to reproduce because vmalloc falls back to kmalloc
only if memory is fragmented.
In order to detect these bugs reliably I submit this patch that changes
kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
mm/util.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux-2.6/mm/util.c
===================================================================
--- linux-2.6.orig/mm/util.c 2018-04-18 15:46:23.000000000 +0200
+++ linux-2.6/mm/util.c 2018-04-18 16:00:43.000000000 +0200
@@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
*/
void *kvmalloc_node(size_t size, gfp_t flags, int node)
{
+#ifndef CONFIG_DEBUG_VM
gfp_t kmalloc_flags = flags;
void *ret;
@@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
*/
if (ret || size <= PAGE_SIZE)
return ret;
+#endif
return __vmalloc_node_flags_caller(size, node, flags,
__builtin_return_address(0));
^ permalink raw reply
* Re: [PATCH v2 net 2/3] virtio_net: fix adding vids on big-endian
From: Michael S. Tsirkin @ 2018-04-19 14:56 UTC (permalink / raw)
To: Cornelia Huck
Cc: Thomas Huth, Eric Dumazet, netdev, linux-kernel, virtualization,
Mikulas Patocka, David Miller
In-Reply-To: <20180419152641.092865f3.cohuck@redhat.com>
On Thu, Apr 19, 2018 at 03:26:41PM +0200, Cornelia Huck wrote:
> On Thu, 19 Apr 2018 08:30:49 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > Programming vids (adding or removing them) still passes
> > guest-endian values in the DMA buffer. That's wrong
> > if guest is big-endian and when virtio 1 is enabled.
> >
> > Note: this is on top of a previous patch:
> > virtio_net: split out ctrl buffer
> >
> > Fixes: 9465a7a6f ("virtio_net: enable v1.0 support")
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > drivers/net/virtio_net.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
>
> Ouch. Have you seen any bug reports for that?
No, but then vlans within VMs aren't used too often
(as opposed to attaching vlans by the HV).
--
MST
^ permalink raw reply
* Re: [PATCH v2 net 2/3] virtio_net: fix adding vids on big-endian
From: Cornelia Huck @ 2018-04-19 13:26 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Thomas Huth, Eric Dumazet, netdev, linux-kernel, virtualization,
Mikulas Patocka, David Miller
In-Reply-To: <1524115776-334953-3-git-send-email-mst@redhat.com>
On Thu, 19 Apr 2018 08:30:49 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> Programming vids (adding or removing them) still passes
> guest-endian values in the DMA buffer. That's wrong
> if guest is big-endian and when virtio 1 is enabled.
>
> Note: this is on top of a previous patch:
> virtio_net: split out ctrl buffer
>
> Fixes: 9465a7a6f ("virtio_net: enable v1.0 support")
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/net/virtio_net.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
Ouch. Have you seen any bug reports for that?
^ permalink raw reply
* Re: [PATCH v2 net 3/3] virtio_net: sparse annotation fix
From: Jason Wang @ 2018-04-19 12:27 UTC (permalink / raw)
To: Michael S. Tsirkin, linux-kernel
Cc: Thomas Huth, Eric Dumazet, netdev, Cornelia Huck, virtualization,
Mikulas Patocka, David Miller
In-Reply-To: <1524115776-334953-4-git-send-email-mst@redhat.com>
On 2018年04月19日 13:30, Michael S. Tsirkin wrote:
> offloads is a buffer in virtio format, should use
> the __virtio64 tag.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/net/virtio_net.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f84fe04..c5b11f2 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -155,7 +155,7 @@ struct control_buf {
> u8 promisc;
> u8 allmulti;
> __virtio16 vid;
> - u64 offloads;
> + __virtio64 offloads;
> };
>
> struct virtnet_info {
Acked-by: Jason Wang <jasowang@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v2 net 2/3] virtio_net: fix adding vids on big-endian
From: Jason Wang @ 2018-04-19 12:26 UTC (permalink / raw)
To: Michael S. Tsirkin, linux-kernel
Cc: Thomas Huth, Eric Dumazet, netdev, Cornelia Huck, virtualization,
Mikulas Patocka, David Miller
In-Reply-To: <1524115776-334953-3-git-send-email-mst@redhat.com>
On 2018年04月19日 13:30, Michael S. Tsirkin wrote:
> Programming vids (adding or removing them) still passes
> guest-endian values in the DMA buffer. That's wrong
> if guest is big-endian and when virtio 1 is enabled.
>
> Note: this is on top of a previous patch:
> virtio_net: split out ctrl buffer
>
> Fixes: 9465a7a6f ("virtio_net: enable v1.0 support")
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/net/virtio_net.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3d0eff53..f84fe04 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -154,7 +154,7 @@ struct control_buf {
> struct virtio_net_ctrl_mq mq;
> u8 promisc;
> u8 allmulti;
> - u16 vid;
> + __virtio16 vid;
> u64 offloads;
> };
>
> @@ -1718,7 +1718,7 @@ static int virtnet_vlan_rx_add_vid(struct net_device *dev,
> struct virtnet_info *vi = netdev_priv(dev);
> struct scatterlist sg;
>
> - vi->ctrl->vid = vid;
> + vi->ctrl->vid = cpu_to_virtio16(vi->vdev, vid);
> sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
>
> if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
> @@ -1733,7 +1733,7 @@ static int virtnet_vlan_rx_kill_vid(struct net_device *dev,
> struct virtnet_info *vi = netdev_priv(dev);
> struct scatterlist sg;
>
> - vi->ctrl->vid = vid;
> + vi->ctrl->vid = cpu_to_virtio16(vi->vdev, vid);
> sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
>
> if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
Acked-by: Jason Wang <jasowang@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v2 net 1/3] virtio_net: split out ctrl buffer
From: Jason Wang @ 2018-04-19 12:26 UTC (permalink / raw)
To: Michael S. Tsirkin, linux-kernel
Cc: Thomas Huth, Eric Dumazet, netdev, Cornelia Huck, virtualization,
Mikulas Patocka, David Miller
In-Reply-To: <1524115776-334953-2-git-send-email-mst@redhat.com>
On 2018年04月19日 13:30, Michael S. Tsirkin wrote:
> When sending control commands, virtio net sets up several buffers for
> DMA. The buffers are all part of the net device which means it's
> actually allocated by kvmalloc so it's in theory (on extreme memory
> pressure) possible to get a vmalloc'ed buffer which on some platforms
> means we can't DMA there.
>
> Fix up by moving the DMA buffers into a separate structure.
>
> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> Changes from v1:
> build fix
>
> drivers/net/virtio_net.c | 68 +++++++++++++++++++++++++++---------------------
> 1 file changed, 39 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 7b187ec..3d0eff53 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -147,6 +147,17 @@ struct receive_queue {
> struct xdp_rxq_info xdp_rxq;
> };
>
> +/* Control VQ buffers: protected by the rtnl lock */
> +struct control_buf {
> + struct virtio_net_ctrl_hdr hdr;
> + virtio_net_ctrl_ack status;
> + struct virtio_net_ctrl_mq mq;
> + u8 promisc;
> + u8 allmulti;
> + u16 vid;
> + u64 offloads;
> +};
> +
> struct virtnet_info {
> struct virtio_device *vdev;
> struct virtqueue *cvq;
> @@ -192,14 +203,7 @@ struct virtnet_info {
> struct hlist_node node;
> struct hlist_node node_dead;
>
> - /* Control VQ buffers: protected by the rtnl lock */
> - struct virtio_net_ctrl_hdr ctrl_hdr;
> - virtio_net_ctrl_ack ctrl_status;
> - struct virtio_net_ctrl_mq ctrl_mq;
> - u8 ctrl_promisc;
> - u8 ctrl_allmulti;
> - u16 ctrl_vid;
> - u64 ctrl_offloads;
> + struct control_buf *ctrl;
>
> /* Ethtool settings */
> u8 duplex;
> @@ -1454,25 +1458,25 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> /* Caller should know better */
> BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
>
> - vi->ctrl_status = ~0;
> - vi->ctrl_hdr.class = class;
> - vi->ctrl_hdr.cmd = cmd;
> + vi->ctrl->status = ~0;
> + vi->ctrl->hdr.class = class;
> + vi->ctrl->hdr.cmd = cmd;
> /* Add header */
> - sg_init_one(&hdr, &vi->ctrl_hdr, sizeof(vi->ctrl_hdr));
> + sg_init_one(&hdr, &vi->ctrl->hdr, sizeof(vi->ctrl->hdr));
> sgs[out_num++] = &hdr;
>
> if (out)
> sgs[out_num++] = out;
>
> /* Add return status. */
> - sg_init_one(&stat, &vi->ctrl_status, sizeof(vi->ctrl_status));
> + sg_init_one(&stat, &vi->ctrl->status, sizeof(vi->ctrl->status));
> sgs[out_num] = &stat;
>
> BUG_ON(out_num + 1 > ARRAY_SIZE(sgs));
> virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC);
>
> if (unlikely(!virtqueue_kick(vi->cvq)))
> - return vi->ctrl_status == VIRTIO_NET_OK;
> + return vi->ctrl->status == VIRTIO_NET_OK;
>
> /* Spin for a response, the kick causes an ioport write, trapping
> * into the hypervisor, so the request should be handled immediately.
> @@ -1481,7 +1485,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> !virtqueue_is_broken(vi->cvq))
> cpu_relax();
>
> - return vi->ctrl_status == VIRTIO_NET_OK;
> + return vi->ctrl->status == VIRTIO_NET_OK;
> }
>
> static int virtnet_set_mac_address(struct net_device *dev, void *p)
> @@ -1593,8 +1597,8 @@ static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
> if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
> return 0;
>
> - vi->ctrl_mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
> - sg_init_one(&sg, &vi->ctrl_mq, sizeof(vi->ctrl_mq));
> + vi->ctrl->mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
> + sg_init_one(&sg, &vi->ctrl->mq, sizeof(vi->ctrl->mq));
>
> if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
> VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg)) {
> @@ -1653,22 +1657,22 @@ static void virtnet_set_rx_mode(struct net_device *dev)
> if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
> return;
>
> - vi->ctrl_promisc = ((dev->flags & IFF_PROMISC) != 0);
> - vi->ctrl_allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
> + vi->ctrl->promisc = ((dev->flags & IFF_PROMISC) != 0);
> + vi->ctrl->allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
>
> - sg_init_one(sg, &vi->ctrl_promisc, sizeof(vi->ctrl_promisc));
> + sg_init_one(sg, &vi->ctrl->promisc, sizeof(vi->ctrl->promisc));
>
> if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
> VIRTIO_NET_CTRL_RX_PROMISC, sg))
> dev_warn(&dev->dev, "Failed to %sable promisc mode.\n",
> - vi->ctrl_promisc ? "en" : "dis");
> + vi->ctrl->promisc ? "en" : "dis");
>
> - sg_init_one(sg, &vi->ctrl_allmulti, sizeof(vi->ctrl_allmulti));
> + sg_init_one(sg, &vi->ctrl->allmulti, sizeof(vi->ctrl->allmulti));
>
> if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX,
> VIRTIO_NET_CTRL_RX_ALLMULTI, sg))
> dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
> - vi->ctrl_allmulti ? "en" : "dis");
> + vi->ctrl->allmulti ? "en" : "dis");
>
> uc_count = netdev_uc_count(dev);
> mc_count = netdev_mc_count(dev);
> @@ -1714,8 +1718,8 @@ static int virtnet_vlan_rx_add_vid(struct net_device *dev,
> struct virtnet_info *vi = netdev_priv(dev);
> struct scatterlist sg;
>
> - vi->ctrl_vid = vid;
> - sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid));
> + vi->ctrl->vid = vid;
> + sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
>
> if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
> VIRTIO_NET_CTRL_VLAN_ADD, &sg))
> @@ -1729,8 +1733,8 @@ static int virtnet_vlan_rx_kill_vid(struct net_device *dev,
> struct virtnet_info *vi = netdev_priv(dev);
> struct scatterlist sg;
>
> - vi->ctrl_vid = vid;
> - sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid));
> + vi->ctrl->vid = vid;
> + sg_init_one(&sg, &vi->ctrl->vid, sizeof(vi->ctrl->vid));
>
> if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
> VIRTIO_NET_CTRL_VLAN_DEL, &sg))
> @@ -2126,9 +2130,9 @@ static int virtnet_restore_up(struct virtio_device *vdev)
> static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
> {
> struct scatterlist sg;
> - vi->ctrl_offloads = cpu_to_virtio64(vi->vdev, offloads);
> + vi->ctrl->offloads = cpu_to_virtio64(vi->vdev, offloads);
>
> - sg_init_one(&sg, &vi->ctrl_offloads, sizeof(vi->ctrl_offloads));
> + sg_init_one(&sg, &vi->ctrl->offloads, sizeof(vi->ctrl->offloads));
>
> if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
> @@ -2351,6 +2355,7 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>
> kfree(vi->rq);
> kfree(vi->sq);
> + kfree(vi->ctrl);
> }
>
> static void _free_receive_bufs(struct virtnet_info *vi)
> @@ -2543,6 +2548,9 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
> {
> int i;
>
> + vi->ctrl = kzalloc(sizeof(*vi->ctrl), GFP_KERNEL);
> + if (!vi->ctrl)
> + goto err_ctrl;
> vi->sq = kzalloc(sizeof(*vi->sq) * vi->max_queue_pairs, GFP_KERNEL);
> if (!vi->sq)
> goto err_sq;
> @@ -2571,6 +2579,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
> err_rq:
> kfree(vi->sq);
> err_sq:
> + kfree(vi->ctrl);
> +err_ctrl:
> return -ENOMEM;
> }
>
Acked-by: Jason Wang <jasowang@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net] virtio_net: split out ctrl buffer
From: kbuild test robot @ 2018-04-19 8:05 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eric Dumazet, netdev, linux-kernel, virtualization,
Mikulas Patocka, kbuild-all, David Miller
In-Reply-To: <1524113437-308621-1-git-send-email-mst@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 1725 bytes --]
Hi Michael,
I love your patch! Yet something to improve:
[auto build test ERROR on net/master]
url: https://github.com/0day-ci/linux/commits/Michael-S-Tsirkin/virtio_net-split-out-ctrl-buffer/20180419-145754
config: i386-randconfig-a0-201815 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
drivers/net/virtio_net.c: In function 'virtnet_free_queues':
>> drivers/net/virtio_net.c:2365:10: error: 'struct virtnet_info' has no member named 'err_ctrl'
kfree(vi->err_ctrl);
^
drivers/net/virtio_net.c: In function 'virtnet_alloc_queues':
>> drivers/net/virtio_net.c:2589:8: error: 'vq' undeclared (first use in this function)
kfree(vq->ctrl);
^
drivers/net/virtio_net.c:2589:8: note: each undeclared identifier is reported only once for each function it appears in
vim +2365 drivers/net/virtio_net.c
2347
2348 static void virtnet_free_queues(struct virtnet_info *vi)
2349 {
2350 int i;
2351
2352 for (i = 0; i < vi->max_queue_pairs; i++) {
2353 napi_hash_del(&vi->rq[i].napi);
2354 netif_napi_del(&vi->rq[i].napi);
2355 netif_napi_del(&vi->sq[i].napi);
2356 }
2357
2358 /* We called napi_hash_del() before netif_napi_del(),
2359 * we need to respect an RCU grace period before freeing vi->rq
2360 */
2361 synchronize_net();
2362
2363 kfree(vi->rq);
2364 kfree(vi->sq);
> 2365 kfree(vi->err_ctrl);
2366 }
2367
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31332 bytes --]
[-- Attachment #3: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net] virtio_net: split out ctrl buffer
From: kbuild test robot @ 2018-04-19 7:40 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eric Dumazet, netdev, linux-kernel, virtualization,
Mikulas Patocka, kbuild-all, David Miller
In-Reply-To: <1524113437-308621-1-git-send-email-mst@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 1822 bytes --]
Hi Michael,
I love your patch! Yet something to improve:
[auto build test ERROR on net/master]
url: https://github.com/0day-ci/linux/commits/Michael-S-Tsirkin/virtio_net-split-out-ctrl-buffer/20180419-145754
config: x86_64-randconfig-x006-201815 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All errors (new ones prefixed by >>):
drivers//net/virtio_net.c: In function 'virtnet_free_queues':
>> drivers//net/virtio_net.c:2365:12: error: 'struct virtnet_info' has no member named 'err_ctrl'; did you mean 'ctrl'?
kfree(vi->err_ctrl);
^~~~~~~~
ctrl
drivers//net/virtio_net.c: In function 'virtnet_alloc_queues':
>> drivers//net/virtio_net.c:2589:8: error: 'vq' undeclared (first use in this function); did you mean 'vi'?
kfree(vq->ctrl);
^~
vi
drivers//net/virtio_net.c:2589:8: note: each undeclared identifier is reported only once for each function it appears in
vim +2365 drivers//net/virtio_net.c
2347
2348 static void virtnet_free_queues(struct virtnet_info *vi)
2349 {
2350 int i;
2351
2352 for (i = 0; i < vi->max_queue_pairs; i++) {
2353 napi_hash_del(&vi->rq[i].napi);
2354 netif_napi_del(&vi->rq[i].napi);
2355 netif_napi_del(&vi->sq[i].napi);
2356 }
2357
2358 /* We called napi_hash_del() before netif_napi_del(),
2359 * we need to respect an RCU grace period before freeing vi->rq
2360 */
2361 synchronize_net();
2362
2363 kfree(vi->rq);
2364 kfree(vi->sq);
> 2365 kfree(vi->err_ctrl);
2366 }
2367
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27230 bytes --]
[-- Attachment #3: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC PATCH net-next v6 2/4] net: Introduce generic bypass module
From: Jiri Pirko @ 2018-04-19 7:22 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, kubakici, Samudrala, Sridhar,
virtualization, loseweigh, netdev, davem
In-Reply-To: <20180419070752-mutt-send-email-mst@kernel.org>
Thu, Apr 19, 2018 at 06:08:58AM CEST, mst@redhat.com wrote:
>On Wed, Apr 18, 2018 at 10:32:06PM +0200, Jiri Pirko wrote:
>> >> >> > With regards to alternate names for 'active', you suggested 'stolen', but i
>> >> >> > am not too happy with it.
>> >> >> > netvsc uses vf_netdev, are you OK with this? Or another option is 'passthru'
>> >> >> No. The netdev could be any netdevice. It does not have to be a "VF".
>> >> >> I think "stolen" is quite appropriate since it describes the modus
>> >> >> operandi. The bypass master steals some netdevice according to some
>> >> >> match.
>> >> >>
>> >> >> But I don't insist on "stolen". Just sounds right.
>> >> >
>> >> >We are adding VIRTIO_NET_F_BACKUP as a new feature bit to enable this feature, So i think
>> >> >'backup' name is consistent.
>> >>
>> >> It perhaps makes sense from the view of virtio device. However, as I
>> >> described couple of times, for master/slave device the name "backup" is
>> >> highly misleading.
>> >
>> >virtio is the backup. You are supposed to use another
>> >(typically passthrough) device, if that fails use virtio.
>> >It does seem appropriate to me. If you like, we can
>> >change that to "standby". Active I don't like either. "main"?
>>
>> Sounds much better, yes.
>
>Excuse me, which of the versions are better in your eyes?
standby is okay. main/primary is fine too.
>
>
>>
>> >
>> >In fact would failover be better than bypass?
>>
>> Also, much better.
>>
^ permalink raw reply
* Re: [virtio-dev] Re: [RFC PATCH 2/3] netdev: kernel-only IFF_HIDDEN netdevice
From: Siwei Liu @ 2018-04-19 6:43 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Michael S. Tsirkin,
Jakub Kicinski, Netdev, virtualization, David Ahern, si-wei liu,
David Miller
In-Reply-To: <5d5f1e52-bddf-5551-bdea-8e7b408b39b9@intel.com>
On Wed, Apr 18, 2018 at 11:10 PM, Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
>
> On 4/18/2018 10:07 PM, Michael S. Tsirkin wrote:
>>
>> On Wed, Apr 18, 2018 at 10:00:51PM -0700, Samudrala, Sridhar wrote:
>>>
>>> On 4/18/2018 9:41 PM, Michael S. Tsirkin wrote:
>>>>
>>>> On Wed, Apr 18, 2018 at 04:33:34PM -0700, Samudrala, Sridhar wrote:
>>>>>
>>>>> On 4/17/2018 5:26 PM, Siwei Liu wrote:
>>>>>>
>>>>>> I ran this with a few folks offline and gathered some good feedbacks
>>>>>> that I'd like to share thus revive the discussion.
>>>>>>
>>>>>> First of all, as illustrated in the reply below, cloud service
>>>>>> providers require transparent live migration. Specifically, the main
>>>>>> target of our case is to support SR-IOV live migration via kernel
>>>>>> upgrade while keeping the userspace of old distros unmodified. If it's
>>>>>> because this use case is not appealing enough for the mainline to
>>>>>> adopt, I will shut up and not continue discussing, although
>>>>>> technically it's entirely possible (and there's precedent in other
>>>>>> implementation) to do so to benefit any cloud service providers.
>>>>>>
>>>>>> If it's just the implementation of hiding netdev itself needs to be
>>>>>> improved, such as implementing it as attribute flag or adding linkdump
>>>>>> API, that's completely fine and we can look into that. However, the
>>>>>> specific issue needs to be undestood beforehand is to make transparent
>>>>>> SR-IOV to be able to take over the name (so inherit all the configs)
>>>>>> from the lower netdev, which needs some games with uevents and name
>>>>>> space reservation. So far I don't think it's been well discussed.
>>>>>>
>>>>>> One thing in particular I'd like to point out is that the 3-netdev
>>>>>> model currently missed to address the core problem of live migration:
>>>>>> migration of hardware specific feature/state, for e.g. ethtool configs
>>>>>> and hardware offloading states. Only general network state (IP
>>>>>> address, gateway, for eg.) associated with the bypass interface can be
>>>>>> migrated. As a follow-up work, bypass driver can/should be enhanced to
>>>>>> save and apply those hardware specific configs before or after
>>>>>> migration as needed. The transparent 1-netdev model being proposed as
>>>>>> part of this patch series will be able to solve that problem naturally
>>>>>> by making all hardware specific configurations go through the central
>>>>>> bypass driver, such that hardware configurations can be replayed when
>>>>>> new VF or passthrough gets plugged back in. Although that
>>>>>> corresponding function hasn't been implemented today, I'd like to
>>>>>> refresh everyone's mind that is the core problem any live migration
>>>>>> proposal should have addressed.
>>>>>>
>>>>>> If it would make things more clear to defer netdev hiding until all
>>>>>> functionalities regarding centralizing and replay are implemented,
>>>>>> we'd take advices like that and move on to implementing those features
>>>>>> as follow-up patches. Once all needed features get done, we'd resume
>>>>>> the work for hiding lower netdev at that point. Think it would be the
>>>>>> best to make everyone understand the big picture in advance before
>>>>>> going too far.
>>>>>
>>>>> I think we should get the 3-netdev model integrated and add any
>>>>> additional
>>>>> ndo_ops/ethool ops that we would like to support/migrate before looking
>>>>> into
>>>>> hiding the lower netdevs.
>>>>
>>>> Once they are exposed, I don't think we'll be able to hide them -
>>>> they will be a kernel ABI.
>>>>
>>>> Do you think everyone needs to hide the SRIOV device?
>>>> Or that only some users need this?
>>>
>>> Hyper-V is currently supporting live migration without hiding the SR-IOV
>>> device. So i don't
>>> think it is a hard requirement.
>>
>> OK, fine.
>>
>>> And also, as we don't yet have a consensus on how to hide
>>> the lower netdevs, we could make it as another feature bit to hide lower
>>> netdevs once
>>> we have an acceptable solution.
>>
>> Guest/host interface isn't more flexible than the userspace/kernel
>> interface. The feature bit you propose would say what exactly?
>> Hypervisor has no idea what guest kernel shows guest userspace.
>> Note that the backup flag doesn't tell guest kernel what to do,
>> it just tells guest that there is or will be a faster main device
>> connected to the same backend, so the backup should only be used
>> when main device is not present.
>
>
> The current bypass module supports 3-netdev and 2-netdev models via 2 sets
> of interfaces
> bypass_master_create/destroy and bypass_master_register/unregister. So
> theoretically
> we can support the 2 models via 2 different feature bits. BACKUP and
> BACKUP_2_NETDEV.
I'm still trying to understand the value of so many models to support.
If we all agree eventually the transparent 1-netdev model can address
the more general case while 2-netdev or 3-netdev is unable to, what's
the point for supporting these many features?
-Siwei
>
> Similarly if we can figure out a way to hide both the netdevs, we can add
> BACKUP_1_NETDEV
> feature bit and update the bypass module to provide another set of
> interfaces that can
> be used by virtio_net to support this model.
>
> Now that we are leaning towards 'standby' as the name for the lower
> virtio-net, should we
> change the feature bit name also to VIRTIO_NET_F_STANDBY?
>
>>
>
^ permalink raw reply
* Re: [RFC PATCH net-next v6 2/4] net: Introduce generic bypass module
From: Jiri Pirko @ 2018-04-19 6:35 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, Michael S. Tsirkin, kubakici,
netdev, virtualization, loseweigh, davem
In-Reply-To: <ff0c5ea1-16d4-00a7-9952-9049efa818eb@intel.com>
Thu, Apr 19, 2018 at 12:46:11AM CEST, sridhar.samudrala@intel.com wrote:
>On 4/18/2018 1:32 PM, Jiri Pirko wrote:
>> > > > > > > You still use "active"/"backup" names which is highly misleading as
>> > > > > > > it has completely different meaning that in bond for example.
>> > > > > > > I noted that in my previous review already. Please change it.
>> > > > > > I guess the issue is with only the 'active' name. 'backup' should be fine as it also
>> > > > > > matches with the BACKUP feature bit we are adding to virtio_net.
>> > > > > I think that "backup" is also misleading. Both "active" and "backup"
>> > > > > mean a *state* of slaves. This should be named differently.
>> > > > >
>> > > > >
>> > > > >
>> > > > > > With regards to alternate names for 'active', you suggested 'stolen', but i
>> > > > > > am not too happy with it.
>> > > > > > netvsc uses vf_netdev, are you OK with this? Or another option is 'passthru'
>> > > > > No. The netdev could be any netdevice. It does not have to be a "VF".
>> > > > > I think "stolen" is quite appropriate since it describes the modus
>> > > > > operandi. The bypass master steals some netdevice according to some
>> > > > > match.
>> > > > >
>> > > > > But I don't insist on "stolen". Just sounds right.
>> > > > We are adding VIRTIO_NET_F_BACKUP as a new feature bit to enable this feature, So i think
>> > > > 'backup' name is consistent.
>> > > It perhaps makes sense from the view of virtio device. However, as I
>> > > described couple of times, for master/slave device the name "backup" is
>> > > highly misleading.
>> > virtio is the backup. You are supposed to use another
>> > (typically passthrough) device, if that fails use virtio.
>> > It does seem appropriate to me. If you like, we can
>> > change that to "standby". Active I don't like either. "main"?
>> Sounds much better, yes.
>
>OK. Will change backup to 'standby'.
>'main' is fine, what about 'primary'?
Primary is also bonding terminology. But in this case, I think it would
fit. The primary slave is used as the active one whenever the link is
up.
>
>
>>
>>
>> > In fact would failover be better than bypass?
>> Also, much better.
>
>So do we want to change all 'bypass' references to 'failover' including
>the filenames.(net/core/failover.c and include/net/failover.h)
>
><snip>
>
>
>
>>
>>
>> >
>> > > > The intent is to restrict the 'active' netdev to be a VF. If there is a way to check that
>> > > > a PCI device is a VF in the guest kernel, we could restrict 'active' netdev to be a VF.
>> > > >
>> > > > Will look for any suggestions in the next day or two. If i don't get any, i will go
>> > > > with 'stolen'
>> > > >
>> > > > <snip>
>> > > >
>> > > >
>> > > > > +
>> > > > > +static struct net_device *bypass_master_get_bymac(u8 *mac,
>> > > > > + struct bypass_ops **ops)
>> > > > > +{
>> > > > > + struct bypass_master *bypass_master;
>> > > > > + struct net_device *bypass_netdev;
>> > > > > +
>> > > > > + spin_lock(&bypass_lock);
>> > > > > + list_for_each_entry(bypass_master, &bypass_master_list, list) {
>> > > > > > > As I wrote the last time, you don't need this list, spinlock.
>> > > > > > > You can do just something like:
>> > > > > > > for_each_net(net) {
>> > > > > > > for_each_netdev(net, dev) {
>> > > > > > > if (netif_is_bypass_master(dev)) {
>> > > > > > This function returns the upper netdev as well as the ops associated
>> > > > > > with that netdev.
>> > > > > > bypass_master_list is a list of 'struct bypass_master' that associates
>> > > > > Well, can't you have it in netdev priv?
>> > > > We cannot do this for 2-netdev model as there is no bypass_netdev created.
>> > > Howcome? You have no master? I don't understand..
>
>For 2-netdev model, the master netdev is not a new one created by the bypass module.
>It is created by netvsc internally and passed via bypass_master_register()
But virtio_net alho has to create the master and pass it down to the
bypass module. Howcome it is different?
>
><snip>
>
>
>
>> > >
>> > > > > > > > +
>> > > > > > > > + /* Avoid Bonding master dev with same MAC registering as slave dev */
>> > > > > > > > + if ((dev->priv_flags & IFF_BONDING) && (dev->flags & IFF_MASTER))
>> > > > > > > Yeah, this is certainly incorrect. One thing is, you should be using the
>> > > > > > > helpers netif_is_bond_master().
>> > > > > > > But what about the rest? macsec, macvlan, team, bridge, ovs and others?
>> > > > > > >
>> > > > > > > You need to do it not by blacklisting, but with whitelisting. You need
>> > > > > > > to whitelist VF devices. My port flavours patchset might help with this.
>> > > > > > May be i can use netdev_has_lower_dev() helper to make sure that the slave
>> > > > > I don't see such function in the code.
>> > > > It is netdev_has_any_lower_dev(). I need to export it.
>> > > Come on, you cannot use that. That would allow bonding without slaves,
>> > > but the slaves could be added later on.
>> > >
>> > > What exactly you are trying to achieve by this?
>
>I think i can remove this check. In pre-register,
>for backup device, i check that its parent matches bypass device &
>for vf device, we make sure that it is a pci device.
Okay. That is a start.
>
>
^ permalink raw reply
* Re: [virtio-dev] Re: [RFC PATCH 2/3] netdev: kernel-only IFF_HIDDEN netdevice
From: Siwei Liu @ 2018-04-19 6:31 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Michael S. Tsirkin,
Jakub Kicinski, Netdev, virtualization, David Ahern, si-wei liu,
David Miller
In-Reply-To: <4e029524-d542-0f12-bfdb-7c8a2f198e28@intel.com>
On Wed, Apr 18, 2018 at 10:00 PM, Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
> On 4/18/2018 9:41 PM, Michael S. Tsirkin wrote:
>>
>> On Wed, Apr 18, 2018 at 04:33:34PM -0700, Samudrala, Sridhar wrote:
>>>
>>> On 4/17/2018 5:26 PM, Siwei Liu wrote:
>>>>
>>>> I ran this with a few folks offline and gathered some good feedbacks
>>>> that I'd like to share thus revive the discussion.
>>>>
>>>> First of all, as illustrated in the reply below, cloud service
>>>> providers require transparent live migration. Specifically, the main
>>>> target of our case is to support SR-IOV live migration via kernel
>>>> upgrade while keeping the userspace of old distros unmodified. If it's
>>>> because this use case is not appealing enough for the mainline to
>>>> adopt, I will shut up and not continue discussing, although
>>>> technically it's entirely possible (and there's precedent in other
>>>> implementation) to do so to benefit any cloud service providers.
>>>>
>>>> If it's just the implementation of hiding netdev itself needs to be
>>>> improved, such as implementing it as attribute flag or adding linkdump
>>>> API, that's completely fine and we can look into that. However, the
>>>> specific issue needs to be undestood beforehand is to make transparent
>>>> SR-IOV to be able to take over the name (so inherit all the configs)
>>>> from the lower netdev, which needs some games with uevents and name
>>>> space reservation. So far I don't think it's been well discussed.
>>>>
>>>> One thing in particular I'd like to point out is that the 3-netdev
>>>> model currently missed to address the core problem of live migration:
>>>> migration of hardware specific feature/state, for e.g. ethtool configs
>>>> and hardware offloading states. Only general network state (IP
>>>> address, gateway, for eg.) associated with the bypass interface can be
>>>> migrated. As a follow-up work, bypass driver can/should be enhanced to
>>>> save and apply those hardware specific configs before or after
>>>> migration as needed. The transparent 1-netdev model being proposed as
>>>> part of this patch series will be able to solve that problem naturally
>>>> by making all hardware specific configurations go through the central
>>>> bypass driver, such that hardware configurations can be replayed when
>>>> new VF or passthrough gets plugged back in. Although that
>>>> corresponding function hasn't been implemented today, I'd like to
>>>> refresh everyone's mind that is the core problem any live migration
>>>> proposal should have addressed.
>>>>
>>>> If it would make things more clear to defer netdev hiding until all
>>>> functionalities regarding centralizing and replay are implemented,
>>>> we'd take advices like that and move on to implementing those features
>>>> as follow-up patches. Once all needed features get done, we'd resume
>>>> the work for hiding lower netdev at that point. Think it would be the
>>>> best to make everyone understand the big picture in advance before
>>>> going too far.
>>>
>>> I think we should get the 3-netdev model integrated and add any
>>> additional
>>> ndo_ops/ethool ops that we would like to support/migrate before looking
>>> into
>>> hiding the lower netdevs.
>>
>> Once they are exposed, I don't think we'll be able to hide them -
>> they will be a kernel ABI.
>>
>> Do you think everyone needs to hide the SRIOV device?
>> Or that only some users need this?
>
>
> Hyper-V is currently supporting live migration without hiding the SR-IOV
> device. So i don't
> think it is a hard requirement. And also, as we don't yet have a consensus
This is a vague point as Hyper-V is mostly Windows oriented: the
target users don't change adapter settings in device manager much as
it's hidden too deep already. Actually it does not address the general
case for SR-IOV live migration but just a subset, why are we making
such comparison?
Note it's always the hard requirement for live migration that *all
states* should be migrated no matter what the implementation it is
going to be. The current 3-netdev model is remote to be useful for
real world scenario and it has no advantage compared to using
userspace generic bonding.
-Siwei
> on how to hide
> the lower netdevs, we could make it as another feature bit to hide lower
> netdevs once
> we have an acceptable solution.
>
^ permalink raw reply
* Re: [virtio-dev] Re: [RFC PATCH 2/3] netdev: kernel-only IFF_HIDDEN netdevice
From: Samudrala, Sridhar @ 2018-04-19 6:10 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Jakub Kicinski, Netdev,
virtualization, Siwei Liu, David Ahern, si-wei liu, David Miller
In-Reply-To: <20180419080215-mutt-send-email-mst@kernel.org>
On 4/18/2018 10:07 PM, Michael S. Tsirkin wrote:
> On Wed, Apr 18, 2018 at 10:00:51PM -0700, Samudrala, Sridhar wrote:
>> On 4/18/2018 9:41 PM, Michael S. Tsirkin wrote:
>>> On Wed, Apr 18, 2018 at 04:33:34PM -0700, Samudrala, Sridhar wrote:
>>>> On 4/17/2018 5:26 PM, Siwei Liu wrote:
>>>>> I ran this with a few folks offline and gathered some good feedbacks
>>>>> that I'd like to share thus revive the discussion.
>>>>>
>>>>> First of all, as illustrated in the reply below, cloud service
>>>>> providers require transparent live migration. Specifically, the main
>>>>> target of our case is to support SR-IOV live migration via kernel
>>>>> upgrade while keeping the userspace of old distros unmodified. If it's
>>>>> because this use case is not appealing enough for the mainline to
>>>>> adopt, I will shut up and not continue discussing, although
>>>>> technically it's entirely possible (and there's precedent in other
>>>>> implementation) to do so to benefit any cloud service providers.
>>>>>
>>>>> If it's just the implementation of hiding netdev itself needs to be
>>>>> improved, such as implementing it as attribute flag or adding linkdump
>>>>> API, that's completely fine and we can look into that. However, the
>>>>> specific issue needs to be undestood beforehand is to make transparent
>>>>> SR-IOV to be able to take over the name (so inherit all the configs)
>>>>> from the lower netdev, which needs some games with uevents and name
>>>>> space reservation. So far I don't think it's been well discussed.
>>>>>
>>>>> One thing in particular I'd like to point out is that the 3-netdev
>>>>> model currently missed to address the core problem of live migration:
>>>>> migration of hardware specific feature/state, for e.g. ethtool configs
>>>>> and hardware offloading states. Only general network state (IP
>>>>> address, gateway, for eg.) associated with the bypass interface can be
>>>>> migrated. As a follow-up work, bypass driver can/should be enhanced to
>>>>> save and apply those hardware specific configs before or after
>>>>> migration as needed. The transparent 1-netdev model being proposed as
>>>>> part of this patch series will be able to solve that problem naturally
>>>>> by making all hardware specific configurations go through the central
>>>>> bypass driver, such that hardware configurations can be replayed when
>>>>> new VF or passthrough gets plugged back in. Although that
>>>>> corresponding function hasn't been implemented today, I'd like to
>>>>> refresh everyone's mind that is the core problem any live migration
>>>>> proposal should have addressed.
>>>>>
>>>>> If it would make things more clear to defer netdev hiding until all
>>>>> functionalities regarding centralizing and replay are implemented,
>>>>> we'd take advices like that and move on to implementing those features
>>>>> as follow-up patches. Once all needed features get done, we'd resume
>>>>> the work for hiding lower netdev at that point. Think it would be the
>>>>> best to make everyone understand the big picture in advance before
>>>>> going too far.
>>>> I think we should get the 3-netdev model integrated and add any additional
>>>> ndo_ops/ethool ops that we would like to support/migrate before looking into
>>>> hiding the lower netdevs.
>>> Once they are exposed, I don't think we'll be able to hide them -
>>> they will be a kernel ABI.
>>>
>>> Do you think everyone needs to hide the SRIOV device?
>>> Or that only some users need this?
>> Hyper-V is currently supporting live migration without hiding the SR-IOV device. So i don't
>> think it is a hard requirement.
> OK, fine.
>
>> And also, as we don't yet have a consensus on how to hide
>> the lower netdevs, we could make it as another feature bit to hide lower netdevs once
>> we have an acceptable solution.
> Guest/host interface isn't more flexible than the userspace/kernel
> interface. The feature bit you propose would say what exactly?
> Hypervisor has no idea what guest kernel shows guest userspace.
> Note that the backup flag doesn't tell guest kernel what to do,
> it just tells guest that there is or will be a faster main device
> connected to the same backend, so the backup should only be used
> when main device is not present.
The current bypass module supports 3-netdev and 2-netdev models via 2 sets of interfaces
bypass_master_create/destroy and bypass_master_register/unregister. So theoretically
we can support the 2 models via 2 different feature bits. BACKUP and BACKUP_2_NETDEV.
Similarly if we can figure out a way to hide both the netdevs, we can add BACKUP_1_NETDEV
feature bit and update the bypass module to provide another set of interfaces that can
be used by virtio_net to support this model.
Now that we are leaning towards 'standby' as the name for the lower virtio-net, should we
change the feature bit name also to VIRTIO_NET_F_STANDBY?
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH v2 net 3/3] virtio_net: sparse annotation fix
From: Michael S. Tsirkin @ 2018-04-19 5:30 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Huth, Eric Dumazet, netdev, Cornelia Huck, virtualization,
Mikulas Patocka, David Miller
In-Reply-To: <1524115776-334953-1-git-send-email-mst@redhat.com>
offloads is a buffer in virtio format, should use
the __virtio64 tag.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/net/virtio_net.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f84fe04..c5b11f2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -155,7 +155,7 @@ struct control_buf {
u8 promisc;
u8 allmulti;
__virtio16 vid;
- u64 offloads;
+ __virtio64 offloads;
};
struct virtnet_info {
--
MST
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox