Linux virtualization list

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Michael S. Tsirkin @ 2011-11-30 14:59 UTC
  To: Ohad Ben-Cohen; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <CAK=WgbZK+k4uuyMPHUfr_PMyrR5ceRTvuDRpmhXfgwpSUOPw0g@mail.gmail.com>

On Wed, Nov 30, 2011 at 01:45:05PM +0200, Ohad Ben-Cohen wrote:
> > So you put virtio rings in MMIO memory?
> 
> I'll be precise: the vrings are created in non-cacheable memory, which
> both processors have access to.
> 
> > Could you please give a couple of examples of breakage?
> 
> Sure. Basically, the order of the vring memory operations appears
> different to the observing processor. For example, avail->idx gets
> updated before the new entry is put in the available array...

I see. And this happens because the ARM processor reorders
memory writes to this uncacheable memory?
And in an SMP configuration, writes are somehow not reordered?

For example, if we had such an AMP configuration with an x86
processor, wmb() (sfence) would be wrong and smp_wmb() would be sufficient.

Just checking that this is not a bug in the smp_wmb implementation
for the specific platform.

-- 
MST

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Ohad Ben-Cohen @ 2011-11-30 16:04 UTC
  To: Michael S. Tsirkin; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <20111130145958.GE21413@redhat.com>

On Wed, Nov 30, 2011 at 4:59 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> I see. And this happens because the ARM processor reorders
> memory writes

Yes.

> And in an SMP configuration, writes are somehow not reordered?

They are, but then the smp memory barriers are enough to control these
effects. It's not enough to control reordering as seen by a device
(which is what our AMP processors are) though.

(btw, the difference between an SMP processor and a device here lies
in how the memory is mapped: normal memory vs. device memory
attributes. it's an ARM thingy).

> Just checking that this is not a bug in the smp_wmb implementation
> for the specific platform.

No, it's not.

ARM's smp memory barriers use ARM's DMB instruction, which is enough
to control SMP effects, whereas ARM's mandatory memory barriers use
ARM's DSB instruction, which is required to ensure the ordering
between Device and Normal memory accesses.
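
The producer-side ordering being discussed can be sketched in user space
as follows. This is an illustration only, not the vring code: toy_avail
and toy_publish are hypothetical names, and the C11 release fence merely
stands in for whichever write barrier (smp_wmb() or wmb()) ends up being
used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u	/* hypothetical; must be a power of two */

/* Simplified stand-in for the avail ring, not the real struct vring_avail. */
struct toy_avail {
	_Atomic uint16_t idx;		/* free-running producer index */
	uint16_t ring[TOY_RING_SIZE];	/* descriptor heads */
};

/* Publish one descriptor head: entry first, barrier, then the index. */
void toy_publish(struct toy_avail *avail, uint16_t head)
{
	uint16_t idx;

	assert((TOY_RING_SIZE & (TOY_RING_SIZE - 1u)) == 0u);

	idx = atomic_load_explicit(&avail->idx, memory_order_relaxed);
	avail->ring[idx % TOY_RING_SIZE] = head;

	/*
	 * Stand-in for the write barrier: the ring entry must be
	 * observable before the new index is.
	 */
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&avail->idx, (uint16_t)(idx + 1),
			      memory_order_relaxed);
}
```

Without the fence, an observer could see the new idx while the ring slot
still holds stale data, which is the first failure described earlier in
the thread.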

Thanks,
Ohad.

* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
From: Sasha Levin @ 2011-11-30 16:11 UTC
  To: Avi Kivity; +Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization
In-Reply-To: <4ED4F30F.8000603@redhat.com>

On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > 
> > > Which is actually strange, weren't indirect buffers introduced to make
> > > the performance *better*? From what I see it's pretty much the
> > > same/worse for virtio-blk.
> >
> > I know they were introduced to allow adding very large bufs.
> > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > Mark, you wrote the patch, could you tell us which workloads
> > benefit the most from indirect bufs?
> >
> 
> Indirects are really for block devices with many spindles, since there
> the limiting factor is the number of requests in flight.  Network
> interfaces are limited by bandwidth, it's better to increase the ring
> size and use direct buffers there (so the ring size more or less
> corresponds to the buffer size).
> 

I did some testing of indirect descriptors under different workloads.

All tests were on a 2 vcpu guest with vhost on. Simple TCP_STREAM using
netperf.

Indirect desc off:
guest -> host, 1 stream: ~4600mb/s
host -> guest, 1 stream: ~5900mb/s
guest -> host, 8 streams: ~620mb/s (on average)
host -> guest, 8 streams: ~600mb/s (on average)

Indirect desc on:
guest -> host, 1 stream: ~4900mb/s
host -> guest, 1 stream: ~5400mb/s
guest -> host, 8 streams: ~620mb/s (on average)
host -> guest, 8 streams: ~600mb/s (on average)

Which means that for one stream, guest to host gets faster while host to
guest gets slower when indirect descriptors are on.
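
For scale, the single-stream deltas can be expressed as relative changes.
A small sketch, where pct_change is a hypothetical helper and the inputs
are the approximate figures listed above:

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent. */
double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

pct_change(4600, 4900) comes to roughly +6.5 (guest -> host), and
pct_change(5900, 5400) to roughly -8.5 (host -> guest).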

-- 

Sasha.

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Michael S. Tsirkin @ 2011-11-30 16:15 UTC
  To: Ohad Ben-Cohen; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <CAK=WgbYyFdLftE_kp2_JOThVhn-FzGsDqVqKn0Jwm2teQyZBNA@mail.gmail.com>

On Wed, Nov 30, 2011 at 06:04:56PM +0200, Ohad Ben-Cohen wrote:
> On Wed, Nov 30, 2011 at 4:59 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > I see. And this happens because the ARM processor reorders
> > memory writes
> 
> Yes.
> 
> > And in an SMP configuration, writes are somehow not reordered?
> 
> They are, but then the smp memory barriers are enough to control these
> effects. It's not enough to control reordering as seen by a device
> (which is what our AMP processors are) though.
> 
> (btw, the difference between an SMP processor and a device here lies
> in how the memory is mapped: normal memory vs. device memory
> attributes. it's an ARM thingy).

How are the rings mapped? normal memory, right?
We allocate them with plain alloc_pages_exact in virtio_pci.c ...

> > Just checking that this is not a bug in the smp_wmb implementation
> > for the specific platform.
> 
> No, it's not.
> 
> ARM's smp memory barriers use ARM's DMB instruction, which is enough
> to control SMP effects, whereas ARM's mandatory memory barriers use
> ARM's DSB instruction, which is required to ensure the ordering
> between Device and Normal memory accesses.
> 
> Thanks,
> Ohad.

Yes, wmb() is required to ensure ordering for MMIO.
But here both accesses, index and ring, are to
memory, not MMIO.

I could understand ring kick bypassing index write, maybe ...
But you described an index write bypassing descriptor write.
Is this something you see in practice?

-- 
MST

* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
From: Sasha Levin @ 2011-11-30 16:17 UTC
  To: Avi Kivity; +Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization
In-Reply-To: <1322669511.3985.8.camel@lappy>

Sorry, I forgot to copy-paste one of the results :)

On Wed, 2011-11-30 at 18:11 +0200, Sasha Levin wrote:
> I did some testing of indirect descriptors under different workloads.
> 
> All tests were on a 2 vcpu guest with vhost on. Simple TCP_STREAM using
> netperf.
> 
> Indirect desc off:
> guest -> host, 1 stream: ~4600mb/s
> host -> guest, 1 stream: ~5900mb/s
> guest -> host, 8 streams: ~620mb/s (on average)
> host -> guest, 8 streams: ~600mb/s (on average)
> 
> Indirect desc on:
> guest -> host, 1 stream: ~4900mb/s
> host -> guest, 1 stream: ~5400mb/s
> guest -> host, 8 streams: ~620mb/s (on average)
> host -> guest, 8 streams: ~600mb/s (on average)
Should be:
host -> guest, 8 streams: ~515mb/s (on average)

> 
> Which means that for one stream, guest to host gets faster while host to
> guest gets slower when indirect descriptors are on.
> 

-- 

Sasha.

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Ohad Ben-Cohen @ 2011-11-30 16:24 UTC
  To: Michael S. Tsirkin; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <20111130161555.GB25812@redhat.com>

On Wed, Nov 30, 2011 at 6:15 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> How are the rings mapped? normal memory, right?

No, device memory.

> We allocate them with plain alloc_pages_exact in virtio_pci.c ...

I'm not using virtio_pci.c; remoteproc is allocating the rings using
the DMA API.

> Yes, wmb() is required to ensure ordering for MMIO.
> But here both accesses, index and ring, are to
> memory, not MMIO.

I'm doing IO with a device over shared memory. It does require
mandatory barriers as I explained.

> Is this something you see in practice?

Yes. These bugs are very real.

* Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Ian Campbell @ 2011-11-30 16:27 UTC
  To: Arnd Bergmann
  Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
	Pawel Moll, kvm@vger.kernel.org, Stefano Stabellini,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	android-virt@lists.cs.columbia.edu,
	embeddedxen-devel@lists.sourceforge.net,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <201111301432.54463.arnd@arndb.de>

On Wed, 2011-11-30 at 14:32 +0000, Arnd Bergmann wrote:
> On Wednesday 30 November 2011, Ian Campbell wrote:
> > On Wed, 2011-11-30 at 13:03 +0000, Arnd Bergmann wrote:
> > For domU the DT would presumably be constructed by the toolstack (in
> > dom0 userspace) as appropriate for the guest configuration. I guess this
> > needn't correspond to any particular "real" hardware platform.
> 
> Correct, but it needs to correspond to some platform that is supported
> by the guest OS, which leaves the choice between emulating a real
> hardware platform, adding a completely new platform specifically for
> virtual machines, or something in between the two.
> 
> What I suggested to the KVM developers is to start out with the
> vexpress platform, but then generalize it to the point where it fits
> your needs. All hardware that one expects a guest to have (GIC, timer,
> ...) will still show up in the same location as on a real vexpress,
> while anything that makes no sense or is better paravirtualized (LCD,
> storage, ...) just becomes optional and has to be described in the
> device tree if it's actually there.

That's along the lines of what I was thinking as well.

The DT contains the address of GIC, timer etc as well right? So at least
in principle we needn't provide e.g. the GIC at the same address as any
real platform but in practice I expect we will.

In principle we could also offer the user options as to which particular
platform a guest looks like.

> > > This would also be the place where you tell the guest that it should
> > > look for PV devices. I'm not familiar with how Xen announces PV
> > > devices to the guest on other architectures, but you have the
> > > choice between providing a full "binding", i.e. a formal specification
> > > in device tree format for the guest to detect PV devices in the
> > > same way as physical or emulated devices, or just providing a single
> > > place in the device tree in which the guest detects the presence
> > > of a xen device bus and then uses hcalls to find the devices on that
> > > bus.
> > 
> > On x86 there is an emulated PCI device which serves as the hooking point
> > for the PV drivers. For ARM I don't think it would be unreasonable to
> > have a DT entry instead. I think it would be fine just represent the
> > root of the "xenbus" and further discovery would occur using the normal
> > xenbus mechanisms (so not a full binding). AIUI for buses which are
> > enumerable this is the preferred DT scheme to use.
> 
> In general that is the case, yes. One could argue that any software
> protocol between Xen and the guest is as good as any other, so it
> makes sense to use the device tree to describe all devices here.
> The counterargument to that is that Linux and other OSs already
> support Xenbus, so there is no need to come up with a new binding.

Right.

> I don't care much either way, but I think it would be good to
> use similar solutions across all hypervisors. The two options
> that I've seen discussed for KVM were to use either a virtual PCI
> bus with individual virtio-pci devices as on the PC, or to
> use the new virtio-mmio driver and individually put virtio devices
> into the device tree.
> 
> > > Another topic is the question whether there are any hcalls that
> > > we should try to standardize before we get another architecture
> > > with multiple conflicting hcall APIs as we have on x86 and powerpc.
> > 
> > The hcall API we are currently targeting is the existing Xen API (at
> > least the generic parts of it). These generally deal with fairly Xen
> > specific concepts like grant tables etc.
> 
> Ok. It would of course still be possible to agree on an argument passing
> convention so that we can share the macros used to issue the hcalls,
> even if the individual commands are all different.

I think it likely that we can all agree on a common calling convention
for N-argument hypercalls. I doubt there are that many useful choices
with conflicting requirements yet strongly compelling advantages.

>  I think I also
> remember talk about the need for a set of hypervisor independent calls
> that everyone should implement, but I can't remember what those were.

I'd not heard of this, maybe I just wasn't looking the right way though.

> Maybe we can split the number space into a range of some generic and
> some vendor specific hcalls?

Ian.

* Re: virtio-scsi spec (was Re: [PATCH] Add virtio-scsi to the virtio spec)
From: Paolo Bonzini @ 2011-11-30 16:36 UTC
  To: Hannes Reinecke
  Cc: LKML, linux-scsi, virtualization, Stefan Hajnoczi,
	Michael S. Tsirkin
In-Reply-To: <4ED63AE1.8090105@suse.de>

On 11/30/2011 03:17 PM, Hannes Reinecke wrote:
>>   seg_max is the maximum number of segments that can be in a
>>     command. A bidirectional command can include seg_max input
>>     segments and seg_max output segments.
>>
> I would like to have the other request_queue limitations exposed
> here, too.
> Most notably we're missing the maximum size of an individual segment
> and the maximum size of the overall I/O request.

The virtio transport does not put any limit, as far as I know.

> As this is the host specification I really would like to see an host
> identifier somewhere in there.
> Otherwise we won't be able to reliably identify a virtio SCSI host.

I thought about it, but I couldn't figure out exactly how to use it. If 
it's just allocating 64 bits in the configuration space (with the 
stipulation that they could be zero), let's do it now. Otherwise a 
controlq command is indeed better, and it can come later.

But even if it's just a 64-bit value, then: 1) where would you place it 
in sysfs for userspace?  I can make up a random name, but existing user 
tools won't find it and that's against the design of virtio-scsi.  2) 
How would it be encoded as a transport ID?  Is it FC, or firewire, or 
SAS, or what?

> Plus you can't calculate the ITL nexus information, making
> Persistent Reservations impossible.

They are not impossible, only some features such as SPEC_I_PT.  If you 
use NPIV or iSCSI in the host, then the persistent reservations will 
already get the correct initiator port.  If not, much more work is needed.

> We should be adding
>
> VIRTIO_SCSI_S_BUSY
>
> for a temporary failure, indicating that a command retry
> might be sufficient to clear this situation.
> Equivalent to VIRTIO_SCSI_S_NEXUS_FAILURE, but issuing a retry on
> the same path.

... and equivalent to DID_BUS_BUSY.  Assuming no other major objections, 
I will add and resubmit in a few days.

Paolo

* [PATCH 1/1] Staging: hv: mousevsc: Properly add the hid device
From: K. Y. Srinivasan @ 2011-11-30 16:52 UTC
  To: gregkh, linux-kernel, devel, virtualization, ohering, joe,
	dmitry.torokhov, jkosina
  Cc: K. Y. Srinivasan, Haiyang Zhang

We need to properly add the hid device to correctly initialize the
sysfs state. While this patch is against the staging tree, Jiri,
please pick up this patch as you merge the Hyper-V mouse driver.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reported-by: Fuzhou Chen <fuzhouch@microsoft.com>
---
 drivers/staging/hv/hv_mouse.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/hv/hv_mouse.c b/drivers/staging/hv/hv_mouse.c
index a28c549..66da8e3 100644
--- a/drivers/staging/hv/hv_mouse.c
+++ b/drivers/staging/hv/hv_mouse.c
@@ -519,6 +519,10 @@ static int mousevsc_probe(struct hv_device *device,
 
 	sprintf(hid_dev->name, "%s", "Microsoft Vmbus HID-compliant Mouse");
 
+	ret = hid_add_device(hid_dev);
+	if (ret)
+		goto probe_err1;
+
 	ret = hid_parse_report(hid_dev, input_dev->report_desc,
 				input_dev->report_desc_size);
 
-- 
1.7.4.1

* Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Arnd Bergmann @ 2011-11-30 18:15 UTC
  To: Ian Campbell
  Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
	Pawel Moll, kvm@vger.kernel.org, Stefano Stabellini,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	android-virt@lists.cs.columbia.edu,
	embeddedxen-devel@lists.sourceforge.net,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <1322670473.31810.129.camel@zakaz.uk.xensource.com>

On Wednesday 30 November 2011, Ian Campbell wrote:
> On Wed, 2011-11-30 at 14:32 +0000, Arnd Bergmann wrote:
> > On Wednesday 30 November 2011, Ian Campbell wrote:
> > What I suggested to the KVM developers is to start out with the
> > vexpress platform, but then generalize it to the point where it fits
> > your needs. All hardware that one expects a guest to have (GIC, timer,
> > ...) will still show up in the same location as on a real vexpress,
> > while anything that makes no sense or is better paravirtualized (LCD,
> > storage, ...) just becomes optional and has to be described in the
> > device tree if it's actually there.
> 
> That's along the lines of what I was thinking as well.
> 
> The DT contains the address of GIC, timer etc as well right? So at least
> in principle we needn't provide e.g. the GIC at the same address as any
> real platform but in practice I expect we will.

Yes.

> In principle we could also offer the user options as to which particular
> platform a guest looks like.

At least when using a qemu based simulation. Most platforms have some
characteristics that are not meaningful in a classic virtualization
scenario, but it would certainly be helpful to use the virtualization
extensions to run a kernel that was built for a particular platform
faster than with pure qemu, when you want to test that kernel image.

It has been suggested in the past that it would be nice to run the
guest kernel built for the same platform as the host kernel by
default, but I think it would be much better to have just one
platform that we end up using for guests on any host platform,
unless there is a strong reason to do otherwise.

There is also ongoing restructuring in the ARM Linux kernel to
allow running the same kernel binary on multiple platforms. While
there is still a lot of work to be done, you should assume that
we will finish it before you see lots of users in production, there
is no need to plan for the current one-kernel-per-board case.

> > Ok. It would of course still be possible to agree on an argument passing
> > convention so that we can share the macros used to issue the hcalls,
> > even if the individual commands are all different.
> 
> I think it likely that we can all agree on a common calling convention
> for N-argument hypercalls. I doubt there are that many useful choices
> with conflicting requirements yet strongly compelling advantages.

Exactly. I think it's only lack of communication that has resulted in
different interfaces for each hypervisor on the other architectures.

KVM and Xen at least both fall into the single-return-value category,
so we should be able to agree on a calling convention. KVM does not
have an hcall API on ARM yet, and I see no reason not to use the
same implementation that you have in the Xen guest.

Stefano, can you split out the generic parts of your asm/xen/hypercall.h
file into a common asm/hypercall.h and submit it for review to the
arm kernel list?

	Arnd

* Re: [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Stefano Stabellini @ 2011-11-30 18:32 UTC
  To: Arnd Bergmann
  Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
	Ian Campbell, Pawel Moll, Stefano Stabellini,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	android-virt@lists.cs.columbia.edu, kvm@vger.kernel.org,
	embeddedxen-devel@lists.sourceforge.net,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <201111301815.01297.arnd@arndb.de>

On Wed, 30 Nov 2011, Arnd Bergmann wrote:
> > In principle we could also offer the user options as to which particular
> > platform a guest looks like.
> 
> At least when using a qemu based simulation. Most platforms have some
> characteristics that are not meaningful in a classic virtualization
> scenario, but it would certainly be helpful to use the virtualization
> extensions to run a kernel that was built for a particular platform
> faster than with pure qemu, when you want to test that kernel image.
> 
> It has been suggested in the past that it would be nice to run the
> guest kernel built for the same platform as the host kernel by
> default, but I think it would be much better to have just one
> platform that we end up using for guests on any host platform,
> unless there is a strong reason to do otherwise.
> 
> There is also ongoing restructuring in the ARM Linux kernel to
> allow running the same kernel binary on multiple platforms. While
> there is still a lot of work to be done, you should assume that
> we will finish it before you see lots of users in production, there
> is no need to plan for the current one-kernel-per-board case.

It is very good to hear, I am counting on it.


> > > Ok. It would of course still be possible to agree on an argument passing
> > > convention so that we can share the macros used to issue the hcalls,
> > > even if the individual commands are all different.
> > 
> > I think it likely that we can all agree on a common calling convention
> > for N-argument hypercalls. I doubt there are that many useful choices
> > with conflicting requirements yet strongly compelling advantages.
> 
> Exactly. I think it's only lack of communication that has resulted in
> different interfaces for each hypervisor on the other architectures.

It is also due to history: on X86 it was possible to issue hypercalls to
Xen before VMCALL (the X86 version of HVC) was available.


> KVM and Xen at least both fall into the single-return-value category,
> so we should be able to agree on a calling convention. KVM does not
> have an hcall API on ARM yet, and I see no reason not to use the
> same implementation that you have in the Xen guest.
> 
> Stefano, can you split out the generic parts of your asm/xen/hypercall.h
> file into a common asm/hypercall.h and submit it for review to the
> arm kernel list?

Sure, I can do that.
Usually the hypercall calling convention is very hypervisor specific,
but if it turns out that we have the same requirements I am happy to design
a common interface.

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Ohad Ben-Cohen @ 2011-11-30 22:43 UTC
  To: Michael S. Tsirkin; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <20111130145004.GD21413@redhat.com>

On Wed, Nov 30, 2011 at 4:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> make headers_install
> make -C tools/virtio/
> (you'll need an empty stub for tools/virtio/linux/module.h,
>  I just sent a patch to add that)
> sudo insmod tools/virtio/vhost_test/vhost_test.ko
> ./tools/virtio/virtio_test

Ok, I gave this a spin.

I've tried to see if reverting d57ed95 has any measurable effect on
the execution time of virtio_test's run_test(), but I couldn't see any
(several attempts with and without d57ed95 yielded very similar range
of execution times).

YMMV though, especially with real workloads.

> Real virtualization/x86 can keep using current smp_XX barriers, right?

Yes, sure. ARM virtualization can too, since smp_XX barriers are
enough for that scenario.

> We can have some config for your kind of setup.

Please note that it can't be a compile-time decision though (unless
we're willing to effectively revert d57ed95 when this config kicks
in): it's not unlikely that one would want to have both use cases
running at the same time.

Thanks,
Ohad.

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Michael S. Tsirkin @ 2011-11-30 23:13 UTC
  To: Ohad Ben-Cohen; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <CAK=WgbY59n9N_-2_TZ9wNBNcTMV1DCrz65e-Ae=xamcg6A0dOA@mail.gmail.com>

On Thu, Dec 01, 2011 at 12:43:08AM +0200, Ohad Ben-Cohen wrote:
> On Wed, Nov 30, 2011 at 4:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > make headers_install
> > make -C tools/virtio/
> > (you'll need an empty stub for tools/virtio/linux/module.h,
> >  I just sent a patch to add that)
> > sudo insmod tools/virtio/vhost_test/vhost_test.ko
> > ./tools/virtio/virtio_test
> 
> Ok, I gave this a spin.
> 
> I've tried to see if reverting d57ed95 has any measurable effect on
> the execution time of virtio_test's run_test(), but I couldn't see any
> (several attempts with and without d57ed95 yielded very similar range
> of execution times).
> 
> YMMV though, especially with real workloads.
> 
> > Real virtualization/x86 can keep using current smp_XX barriers, right?
> 
> Yes, sure. ARM virtualization can too, since smp_XX barriers are
> enough for that scenario.
> 
> > We can have some config for your kind of setup.
> 
> Please note that it can't be a compile-time decision though (unless
> we're willing to effectively revert d57ed95 when this config kicks
> in): it's not unlikely that one would want to have both use cases
> running at the same time.
> 
> Thanks,
> Ohad.

For x86, stores into memory are ordered. So I think that yes, smp_XXX
can be selected at compile time.

So let's forget the virtio strangeness for a minute,

To me it starts looking like we need some new kind of barrier
that handles accesses to DMA coherent memory. dma_Xmb()?
dma_coherent_Xmb()?  For example, on x86, dma_wmb() can be barrier(),
but on your system it needs to do DSB.

We can set the rule that dma barriers are guaranteed stronger
than smp ones, and we can just use dma_ everywhere.
So the strength will be:

smp < dma < mandatory

And now virtio can use DMA barriers and instead of adding
overhead for x86, x86 will actually gain from this,
as we'll drop mandatory barriers on UP systems.
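
The proposed hierarchy could look roughly like this. A sketch only:
toy_barrier() and toy_dma_wmb() are hypothetical names, no such dma_
barrier exists in the tree under discussion, and the per-architecture
choices (compiler barrier on x86, DSB on ARMv7 or later, a full fence
elsewhere) are assumptions taken from the discussion above.

```c
#include <assert.h>

/* Compiler-only barrier, in the style of the kernel's barrier(). */
#define toy_barrier()	__asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical toy_dma_wmb(): orders writes to DMA-coherent memory.
 * Intended strength: smp < dma < mandatory.
 */
#if defined(__x86_64__) || defined(__i386__)
#define toy_dma_wmb()	toy_barrier()	/* stores are already ordered */
#elif defined(__arm__) || defined(__aarch64__)
#define toy_dma_wmb()	__asm__ __volatile__("dsb st" ::: "memory")
#else
#define toy_dma_wmb()	__sync_synchronize()	/* conservative fallback */
#endif

/* Toy ring in what would be DMA-coherent memory. */
struct toy_ring {
	volatile unsigned int entry;
	volatile unsigned int idx;
};

void toy_add(struct toy_ring *r, unsigned int value)
{
	assert(r);
	r->entry = value;
	toy_dma_wmb();	/* entry before index, as seen by the device */
	r->idx++;
}
```

The point of the sketch is that the same call site compiles down to
nothing but a compiler barrier on x86, while still emitting the stronger
instruction where the platform needs it.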

Hmm?

-- 
MST

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Ohad Ben-Cohen @ 2011-11-30 23:27 UTC
  To: Michael S. Tsirkin; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <CAK=WgbazBq4GVo16sZsrZirgMYbwMcWtPaqn90c6d9Phr0gh3A@mail.gmail.com>

On Wed, Nov 30, 2011 at 6:24 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Wed, Nov 30, 2011 at 6:15 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> How are the rings mapped? normal memory, right?
>
> No, device memory.

Ok, I have more info.

Originally remoteproc was mapping the rings using ioremap, and that
meant ARM Device memory.

Recently, though, we moved to CMA (allocating memory for the rings via
dma_alloc_coherent), and that isn't Device memory anymore: it's
uncacheable Normal memory (on ARM v6+).

We still require mandatory barriers though: one very reproducible
problem I personally face is that the avail index doesn't get updated
before the kick. As a result, the remote processor misses a buffer
that was just added (the kick wakes it up only to find that the avail
index wasn't changed yet). In this case, it probably happens because
the mailbox, used to kick the remote processor, is mapped as Device
memory, and therefore the kick can be reordered before the updates to
the ring  can be observed.
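
The sequence that goes wrong here can be sketched as below. This is a
user-space illustration under stated assumptions: toy_vq, its mailbox
pointer and toy_add_and_kick are hypothetical, and the C11 seq_cst fence
only marks where a mandatory barrier (wmb()) would sit. A C11 fence by
itself does not order accesses to Device memory on real hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct toy_vq {
	_Atomic uint16_t avail_idx;	/* lives in shared ring memory */
	volatile uint32_t *mailbox;	/* hypothetical doorbell register */
};

/* Make a new buffer visible, then notify the remote processor. */
void toy_add_and_kick(struct toy_vq *vq)
{
	assert(vq && vq->mailbox);

	atomic_fetch_add_explicit(&vq->avail_idx, 1, memory_order_relaxed);

	/*
	 * Placeholder for the mandatory barrier: the index update has to
	 * be observable before the doorbell write reaches the device.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	*vq->mailbox = 1;	/* the kick */
}
```

If the doorbell write overtakes the index update, the remote side wakes
up, reads the old avail index, and goes back to sleep with a buffer
pending.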

I did get two additional reports about reordering issues, on different
setups than my own, and which I can't personally reproduce: the one
I've described earlier (avail index gets updated before the avail
array) and one in the receive path (reading a used buffer which we
already read). I couldn't personally verify those, but both issues
were reported to be gone when mandatory barriers were used.

I expect those reports only to increase: the diversity of platforms
that are now looking into adopting virtio for this kind of
inter-process communication is quite huge, with several different
architectures and even more hardware implementations on the way (not
only ARM).

Thanks,
Ohad.

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Michael S. Tsirkin @ 2011-11-30 23:43 UTC
  To: Ohad Ben-Cohen; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <CAK=WgbbZH3ZFzbwwyVBGEvwHPbogDW6+Kb_1pRam2jj5oJHuDA@mail.gmail.com>

On Thu, Dec 01, 2011 at 01:27:10AM +0200, Ohad Ben-Cohen wrote:
> On Wed, Nov 30, 2011 at 6:24 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> > On Wed, Nov 30, 2011 at 6:15 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> How are the rings mapped? normal memory, right?
> >
> > No, device memory.
> 
> Ok, I have more info.
> 
> Originally remoteproc was mapping the rings using ioremap, and that
> meant ARM Device memory.
> 
> Recently, though, we moved to CMA (allocating memory for the rings via
> dma_alloc_coherent), and that isn't Device memory anymore: it's
> uncacheable Normal memory (on ARM v6+).

And these accesses need to be ordered with DSB? Or DMB?

> We still require mandatory barriers though: one very reproducible
> problem I personally face is that the avail index doesn't get updated
> before the kick.

Aha! The *kick* really is MMIO. So I think we do need a mandatory barrier
before the kick.  Maybe we need it for virtio-pci as well
(not on kvm, naturally :) Off-hand this seems to belong in the transport
layer but need to think about it.

> As a result, the remote processor misses a buffer
> that was just added (the kick wakes it up only to find that the avail
> index wasn't changed yet). In this case, it probably happens because
> the mailbox, used to kick the remote processor, is mapped as Device
> memory, and therefore the kick can be reordered before the updates to
> the ring  can be observed.
> 
> I did get two additional reports about reordering issues, on different
> setups than my own, and which I can't personally reproduce: the one
> I've described earlier (avail index gets updated before the avail
> array) and one in the receive path (reading a used buffer which we
> already read). I couldn't personally verify those, but both issues
> were reported to be gone when mandatory barriers were used.

Hmm. So it's a hint that something is wrong with memory
but not what's wrong exactly.

> I expect those reports only to increase: the diversity of platforms
> that are now looking into adopting virtio for this kind of
> inter-process communication is quite huge, with several different
> architectures and even more hardware implementations on the way (not
> only ARM).
> 
> Thanks,
> Ohad.

Right. We need to be very careful with memory,
it's a tricky field. One known problem with virtio
is its insistence on using native endian-ness
for some fields. If power is used, we'll have to fix this.

-- 
MST

* Re: [PATCH] virtio-mmio: Devices parameter parsing
From: Rusty Russell @ 2011-12-01  2:06 UTC
  To: Pawel Moll
  Cc: linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <1322588190.3164.129.camel@hornet.cambridge.arm.com>

On Tue, 29 Nov 2011 17:36:30 +0000, Pawel Moll <pawel.moll@arm.com> wrote:
> On Mon, 2011-11-28 at 00:31 +0000, Rusty Russell wrote:
> > Off the top of my head, this makes me think of the way initcalls are
> > ordered.  We could put a parameter parsing initcall at the start of each
> > initcall level in include/asm-generic/vmlinux.lds.h's INITCALLS macro.
> > 
> > Then we steal four bits from struct kernel_param's flags to indicate the
> > level of the initcall (-1 == existing ones, otherwise N == before level
> > N initcalls).
> 
> Yes, this was my initial idea as well. The only problem I faced is the
> fact that there is no "between levels"... It's easy to add parameters
> parsing _at_ any particular level, but hard to do this _after_ level A
> and _before_ level B. The initcalls section simply contains all the
> calls, ordered by the level - the only "separated" level is the pre-SMP
> early one. And order within one level is determined by the link order,
> so I can't guarantee parsing the parameters as the first call of a level
> (nor as the last call of the previous level).

Yeah, that's why I suggested changing the linker script.

>  /* This is the fundamental function for registering boot/module
>     parameters. */
> -#define __module_param_call(prefix, name, ops, arg, isbool, perm)	\
> +#define __module_param_call(prefix, name, ops, arg, isbool, late, perm)	\
>  	/* Default value instead of permissions? */			\
>  	static int __param_perm_check_##name __attribute__((unused)) =	\
>  	BUILD_BUG_ON_ZERO((perm) < 0 || (perm) > 0777 || ((perm) & 2))	\

Might as well change isbool to "flags", since we have to fix callers
anyway.
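
As a rough sketch of what "steal four bits from the flags" could look
like, assuming a 16-bit flags word: the TOY_* names, the bit positions
and the level+1 encoding below are all hypothetical, not taken from the
patch. The level is stored biased by one so that a zeroed field means
-1, i.e. the existing behaviour.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PARAM_ISBOOL      0x0001u   /* hypothetical existing flag */
#define TOY_PARAM_LEVEL_SHIFT 4         /* hypothetical position of the field */
#define TOY_PARAM_LEVEL_MASK  (0xFu << TOY_PARAM_LEVEL_SHIFT)

/*
 * level == -1: parse with the existing mechanism;
 * level == N : parse just before the level-N initcalls (0..14 fit here).
 */
static uint16_t toy_set_level(uint16_t flags, int level)
{
    assert(level >= -1 && level <= 14);
    flags &= (uint16_t)~TOY_PARAM_LEVEL_MASK;
    return (uint16_t)(flags | ((unsigned int)(level + 1) << TOY_PARAM_LEVEL_SHIFT));
}

static int toy_get_level(uint16_t flags)
{
    return (int)((flags & TOY_PARAM_LEVEL_MASK) >> TOY_PARAM_LEVEL_SHIFT) - 1;
}
```

With this encoding, parameters that never set a level keep reporting -1
without their definitions being touched.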

> diff --git a/init/main.c b/init/main.c
> index 217ed23..ce89a53 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -407,7 +407,7 @@ static int __init do_early_param(char *param, char *val)
>  
>  void __init parse_early_options(char *cmdline)
>  {
> -	parse_args("early options", cmdline, NULL, 0, do_early_param);
> +	parse_args("early options", cmdline, NULL, 0, 0, 0, do_early_param);

It'd be nice to replace the early param stuff too, but that's probably a
separate patch.  As is getting rid of the old __setup() calls everywhere
;)

But so far, it looks good!

Thanks,
Rusty.

^ permalink raw reply

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Rusty Russell @ 2011-12-01  2:28 UTC (permalink / raw)
  To: Michael S. Tsirkin, Ohad Ben-Cohen
  Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <20111130231306.GD30031@redhat.com>

On Thu, 1 Dec 2011 01:13:07 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> For x86, stores into memory are ordered. So I think that yes, smp_XXX
> can be selected at compile time.
> 
> So let's forget the virtio strangeness for a minute,

Hmm, we got away with light barriers because we knew we were not
*really* talking to a device.  But now with virtio-mmio, turns out we
are :)

I'm really tempted to revert d57ed95 for 3.2, and we can revisit this
optimization later if it proves worthwhile.

Thoughts?
Rusty. 

^ permalink raw reply

* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
From: Rusty Russell @ 2011-12-01  2:42 UTC (permalink / raw)
  To: Sasha Levin, Avi Kivity
  Cc: markmc, virtualization, linux-kernel, kvm, Michael S. Tsirkin
In-Reply-To: <1322669511.3985.8.camel@lappy>

On Wed, 30 Nov 2011 18:11:51 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> > On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > > 
> > > > Which is actually strange, weren't indirect buffers introduced to make
> > > > the performance *better*? From what I see it's pretty much the
> > > > same/worse for virtio-blk.
> > >
> > > I know they were introduced to allow adding very large bufs.
> > > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > > Mark, you wrote the patch, could you tell us which workloads
> > > benefit the most from indirect bufs?
> > >
> > 
> > Indirects are really for block devices with many spindles, since there
> > the limiting factor is the number of requests in flight.  Network
> > interfaces are limited by bandwidth, it's better to increase the ring
> > size and use direct buffers there (so the ring size more or less
> > corresponds to the buffer size).
> > 
> 
> I did some testing of indirect descriptors under different workloads.

MST and I discussed getting clever with dynamic limits ages ago, but it
was down low on the TODO list.  Thanks for diving into this...

AFAICT, if the ring never fills, direct is optimal.  When the ring
fills, indirect is optimal (we're better to queue now than later).

Why not something simple, like a threshold which drops every time we
fill the ring?

struct vring_virtqueue
{
...
        int indirect_thresh;
...
}

virtqueue_add_buf_gfp()
{
...

        if (vq->indirect &&
            (vq->vring.num - vq->num_free) + out + in > vq->indirect_thresh)
                return indirect()
...

	if (vq->num_free < out + in) {
                if (vq->indirect && vq->indirect_thresh > 0)
                        vq->indirect_thresh--;

...
}

Too dumb?

Cheers,
Rusty.

^ permalink raw reply

* Re: [PATCHv3 RFC] virtio-pci: flexible configuration layout
From: Rusty Russell @ 2011-12-01  2:42 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Krishna Kumar, kvm, Pawel Moll, Michael S. Tsirkin,
	Alexey Kardashevskiy, Wang Sheng-Hui, lkml - Kernel Mailing List,
	virtualization, Christian Borntraeger, Amit Shah
In-Reply-To: <1322658763.3985.4.camel@lappy>

On Wed, 30 Nov 2011 15:12:43 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> On Wed, 2011-11-30 at 10:10 +1030, Rusty Russell wrote:
> > On Mon, 28 Nov 2011 11:15:31 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > On Mon, 2011-11-28 at 11:25 +1030, Rusty Russell wrote:
> > > > I'd like to see kvmtools remove support for legacy mode altogether,
> > > > but they probably have existing users.
> > > 
> > > While we can't simply remove it right away, instead of mixing our
> > > implementation for both legacy and new spec in the same code we can
> > > split the virtio-pci implementation into two:
> > > 
> > > 	- virtio/virtio-pci-legacy.c
> > > 	- virtio/virtio-pci.c
> > > 
> > > At that point we can #ifdef the entire virtio-pci-legacy.c for now and
> > > remove it at the same time legacy virtio-pci is removed from the kernel.
> > 
> > Hmm, that might be neat, but we can't tell the driver core to try
> > virtio-pci before virtio-pci-legacy, so we need detection code in both
> > modules (and add a "force" flag to virtio-pci-legacy to tell it to
> > accept the device even if it's not a legacy-only one).
> 
> I was thinking more in the direction of fallback code in virtio-pci.c to
> virtio-pci-legacy.c.
> 
> Something like:
> #ifdef VIRTIO_PCI_LEGACY
> [Create BAR0 and map it to virtio-pci-legacy.c]
> #endif
> 
> So BAR0 isn't defined as long as legacy code is there, which makes
> falling back to legacy pretty simple.

But it's nicer to see the driver actually labelled "virtio-pci-legacy",
and such a module.

I'll code something up, see what it looks like.

Cheers,
Rusty.

^ permalink raw reply

* Re: [PATCH] Add virtio-scsi to the virtio spec
From: Rusty Russell @ 2011-12-01  3:14 UTC (permalink / raw)
  To: Paolo Bonzini, virtualization
  Cc: linux-scsi, LKML, Stefan Hajnoczi, Michael S. Tsirkin
In-Reply-To: <1322661042-28191-1-git-send-email-pbonzini@redhat.com>

On Wed, 30 Nov 2011 14:50:41 +0100, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Hi all,
> 
> here is the specification for a virtio-based SCSI host (controller, HBA,
> you name it).  The virtio SCSI host is the basis of an alternative
> storage stack for KVM. This stack would overcome several limitations of
> the current solution, virtio-blk:

OK, I like the idea, but I'd prefer to see the spec only cover things
which are implemented and tested, otherwise the risk of a flaw in the
spec is really high in my experience.

Comments below:

>   num_queues is the total number of virtqueues exposed by the
>     device. The driver is free to use only one request queue, or
>     it can use more to achieve better performance.

s/total number of virtqueues/total number of request virtqueues/ ?

>   max_channel, max_target and max_lun can be used by the driver
>     as hints for scanning the logical units on the host. In the
>     current version of the spec, they will always be respectively
>     0, 255 and 16383.

s/hints for scanning/hints to constrain scanning/ ? (I assume).  But why
mention the current values?  That doesn't help someone implementing a
driver or a device.  If you want to, you could mention that as an
implementation detail of your current implementation, but it seems out of
place in the spec.

> If the driver uses the eventq, it should then place at least a
> buffer in the eventq.

s/at least a/at least one/

> The driver queues requests to an arbitrary request queue, and they are
> used by the device on that same queue. In this version of the spec,
> if a driver uses more than one queue it is the responsibility of the
> driver to ensure strict request ordering; commands placed on different
> queue will be consumed with no order constraints.

Suggest simplification of second sentence:

It is the responsibility of the driver to ensure strict request
ordering; commands placed on different queues will be consumed with no
order constraints.

> The lun field addresses a target and logical unit in the
> virtio-scsi device's SCSI domain. In this version of the spec,
> the only supported format for the LUN field is: first byte set to
> 1, second byte set to target, third and fourth byte representing
> a single level LUN structure, followed by four zero bytes. With
> this representation, a virtio-scsi device can serve up to 256
> targets and 16384 LUNs per target.

You keep saying "In this version of the spec".  I would delete that
phrase everywhere.
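
For implementers, the quoted LUN layout could be encoded roughly as
below. This is a sketch under one assumption that the quoted text does
not spell out: that "single level LUN structure" means the SAM flat
space addressing method, i.e. 0x40 in the top two bits of the third
byte followed by a 14-bit LUN (which matches the 16384 limit). The
function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill the 8-byte lun field for a given target and LUN. */
static void toy_encode_lun(uint8_t out[8], uint8_t target, uint16_t lun)
{
    assert(lun < 16384);                    /* 14-bit LUN */

    memset(out, 0, 8);                      /* trailing four bytes stay zero */
    out[0] = 1;                             /* first byte set to 1 */
    out[1] = target;                        /* second byte: target, 0..255 */
    out[2] = (uint8_t)(0x40 | (lun >> 8));  /* assumed: flat space addressing */
    out[3] = (uint8_t)(lun & 0xff);
}
```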

> Task_attr, prio and crn should be left to zero: command priority
> is explicitly not supported by this version of the device;
> task_attr defines the task attribute as in the table above, but
> all task attributes may be mapped to SIMPLE by the device; crn
> may also be provided by clients, but is generally expected to be
> 0. The maximum CRN value defined by the protocol is 255, since
> CRN is stored in an 8-bit integer.

Be braver in your language please.  It helps poor implementers who are
already confused by learning SCSI and virtio:

 Task_attr, and prio must be zero.[1] task_attr defines the task
 attribute as in the table above, but all task attributes may be mapped
 to SIMPLE by the device; crn may also be provided by clients, but is
 generally expected to be 0.

 [1] Future extensions may use these fields.

Is it useful for a driver to specify ordered (or other) modes, knowing
it could be reduced to SIMPLE without it being aware?  Or should we use
feature bits to indicate what the device supports?

>   Note that since ACA is not supported by this version of the
>   spec, VIRTIO_SCSI_T_TMF_CLEAR_ACA is always a no-operation.

I think if you don't support ACA in the spec, don't define this.  How
will a driver author use this information?

>     struct virtio_scsi_ctrl_an {
>	 u32 type;
>	 u8  lun[8];
>	 u32 event_requested;
>	 u32 event_actual;
>	 u8  response;
>     }

With all these structures, you might want a comment indicating the
read-only and write-only (from the device POV) parts of the struct, eg:

     struct virtio_scsi_ctrl_an {
	 // Read-only part
	 u32 type;
	 u8  lun[8];
	 u32 event_requested;
	 // Write-only part
	 u32 event_actual;
	 u8  response;
     }

But basically, though I know nothing about SCSI, I like both the content
and style of this addition!

Thanks,
Rusty.

^ permalink raw reply

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Ohad Ben-Cohen @ 2011-12-01  6:14 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <20111130231306.GD30031@redhat.com>

On Thu, Dec 1, 2011 at 1:13 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> For x86, stores into memory are ordered. So I think that yes, smp_XXX
> can be selected at compile time.

But then you can't use the same kernel image for both scenarios.

It won't take long until people will use virtio on ARM for both
virtualization and for talking to devices, and having to rebuild the
kernel for different use cases is nasty.

^ permalink raw reply

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Ohad Ben-Cohen @ 2011-12-01  6:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <20111130234318.GC31069@redhat.com>

On Thu, Dec 1, 2011 at 1:43 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> And these accesses need to be ordered with DSB? Or DMB?

DMB (i.e. smp barriers) should be enough within Normal memory
accesses, though the other issues that were reported to me are a bit
concerning. I'm still trying to get more information about them, in
the hopes that I can eventually reproduce them myself.

^ permalink raw reply

* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Ohad Ben-Cohen @ 2011-12-01  7:15 UTC (permalink / raw)
  To: Rusty Russell
  Cc: linux-arm-kernel, virtualization, linux-kernel, kvm,
	Michael S. Tsirkin
In-Reply-To: <87zkfdrpn8.fsf@rustcorp.com.au>

On Thu, Dec 1, 2011 at 4:28 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> Hmm, we got away with light barriers because we knew we were not
> *really* talking to a device.  But now with virtio-mmio, turns out we
> are :)
>
> I'm really tempted to revert d57ed95 for 3.2, and we can revisit this
> optimization later if it proves worthwhile.

+1

^ permalink raw reply

* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
From: Michael S. Tsirkin @ 2011-12-01  7:58 UTC (permalink / raw)
  To: Rusty Russell
  Cc: markmc, kvm, linux-kernel, virtualization, Sasha Levin,
	Avi Kivity
In-Reply-To: <87wrahrp0u.fsf@rustcorp.com.au>

On Thu, Dec 01, 2011 at 01:12:25PM +1030, Rusty Russell wrote:
> On Wed, 30 Nov 2011 18:11:51 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> > > On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > > > 
> > > > > Which is actually strange, weren't indirect buffers introduced to make
> > > > > the performance *better*? From what I see it's pretty much the
> > > > > same/worse for virtio-blk.
> > > >
> > > > I know they were introduced to allow adding very large bufs.
> > > > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > > > Mark, you wrote the patch, could you tell us which workloads
> > > > benefit the most from indirect bufs?
> > > >
> > > 
> > > Indirects are really for block devices with many spindles, since there
> > > the limiting factor is the number of requests in flight.  Network
> > > interfaces are limited by bandwidth, it's better to increase the ring
> > > size and use direct buffers there (so the ring size more or less
> > > corresponds to the buffer size).
> > > 
> > 
> > I did some testing of indirect descriptors under different workloads.
> 
> MST and I discussed getting clever with dynamic limits ages ago, but it
> was down low on the TODO list.  Thanks for diving into this...
> 
> AFAICT, if the ring never fills, direct is optimal.  When the ring
> fills, indirect is optimal (we're better to queue now than later).
> 
> Why not something simple, like a threshold which drops every time we
> fill the ring?
> 
> struct vring_virtqueue
> {
> ...
>         int indirect_thresh;
> ...
> }
> 
> virtqueue_add_buf_gfp()
> {
> ...
> 
>         if (vq->indirect &&
>             (vq->vring.num - vq->num_free) + out + in > vq->indirect_thresh)
>                 return indirect()
> ...
> 
> 	if (vq->num_free < out + in) {
>                 if (vq->indirect && vq->indirect_thresh > 0)
>                         vq->indirect_thresh--;
>         
> ...
> }
> 
> Too dumb?
> 
> Cheers,
> Rusty.

We'll presumably need some logic to increment it back,
to account for random workload changes.
Something like slow start?

-- 
MST

^ permalink raw reply

* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
From: Sasha Levin @ 2011-12-01  8:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: markmc, kvm, linux-kernel, virtualization, Avi Kivity
In-Reply-To: <20111201075847.GA5479@redhat.com>

On Thu, 2011-12-01 at 09:58 +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 01, 2011 at 01:12:25PM +1030, Rusty Russell wrote:
> > On Wed, 30 Nov 2011 18:11:51 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> > > > On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > > > > 
> > > > > > Which is actually strange, weren't indirect buffers introduced to make
> > > > > > the performance *better*? From what I see it's pretty much the
> > > > > > same/worse for virtio-blk.
> > > > >
> > > > > I know they were introduced to allow adding very large bufs.
> > > > > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > > > > Mark, you wrote the patch, could you tell us which workloads
> > > > > benefit the most from indirect bufs?
> > > > >
> > > > 
> > > > Indirects are really for block devices with many spindles, since there
> > > > the limiting factor is the number of requests in flight.  Network
> > > > interfaces are limited by bandwidth, it's better to increase the ring
> > > > size and use direct buffers there (so the ring size more or less
> > > > corresponds to the buffer size).
> > > > 
> > > 
> > > I did some testing of indirect descriptors under different workloads.
> > 
> > MST and I discussed getting clever with dynamic limits ages ago, but it
> > was down low on the TODO list.  Thanks for diving into this...
> > 
> > AFAICT, if the ring never fills, direct is optimal.  When the ring
> > fills, indirect is optimal (we're better to queue now than later).
> > 
> > Why not something simple, like a threshold which drops every time we
> > fill the ring?
> > 
> > struct vring_virtqueue
> > {
> > ...
> >         int indirect_thresh;
> > ...
> > }
> > 
> > virtqueue_add_buf_gfp()
> > {
> > ...
> > 
> >         if (vq->indirect &&
> >             (vq->vring.num - vq->num_free) + out + in > vq->indirect_thresh)
> >                 return indirect()
> > ...
> > 
> > 	if (vq->num_free < out + in) {
> >                 if (vq->indirect && vq->indirect_thresh > 0)
> >                         vq->indirect_thresh--;
> >         
> > ...
> > }
> > 
> > Too dumb?
> > 
> > Cheers,
> > Rusty.
> 
> We'll presumably need some logic to increment it back,
> to account for random workload changes.
> Something like slow start?

We can increment it each time the queue was less than 10% full, it
should act like slow start, no?

-- 

Sasha.
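
Putting the two suggestions together, a standalone sketch of the
heuristic might look like the following. It is not the virtio_ring
code: struct toy_vq and toy_use_indirect() are made-up names, the
threshold is assumed to start at the ring size, and adjusting the
threshold on every call (rather than only on the direct path) is a
simplification of the pseudocode quoted above.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vq {
    unsigned int num;       /* ring size */
    unsigned int num_free;  /* free descriptors */
    int indirect_thresh;    /* assumed to start at num */
    bool indirect;          /* indirect descriptors negotiated */
};

/* Decide whether a request needing out + in descriptors goes indirect. */
static bool toy_use_indirect(struct toy_vq *vq, unsigned int out, unsigned int in)
{
    unsigned int need = out + in;
    unsigned int used = vq->num - vq->num_free;
    bool go_indirect;

    assert(need > 0);

    /* Go indirect once ring usage would cross the threshold. */
    go_indirect = vq->indirect &&
                  used + need > (unsigned int)vq->indirect_thresh;

    if (vq->indirect) {
        if (vq->num_free < need) {
            /* The ring filled: lower the threshold. */
            if (vq->indirect_thresh > 0)
                vq->indirect_thresh--;
        } else if (used * 10 < vq->num) {
            /* Ring under 10% full: creep back up, slow-start style. */
            if (vq->indirect_thresh < (int)vq->num)
                vq->indirect_thresh++;
        }
    }

    return go_indirect;
}
```

With a 16-entry ring and the threshold at 16, a two-descriptor request
arriving when one slot is free goes indirect and drops the threshold to
15; a later request on an empty ring goes direct and raises it back to
16.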

^ permalink raw reply
