From: Rusty Russell <rusty@rustcorp.com.au>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Sasha Levin <levinsasha928@gmail.com>,
Pawel Moll <pawel.moll@arm.com>,
virtualization <virtualization@lists.linux-foundation.org>
Subject: Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
Date: Thu, 12 Jan 2012 12:31:09 +1030 [thread overview]
Message-ID: <87vcoh4r2y.fsf@rustcorp.com.au> (raw)
In-Reply-To: <20120111102129.GC20988@redhat.com>
On Wed, 11 Jan 2012 12:21:30 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Jan 11, 2012 at 10:55:52AM +1030, Rusty Russell wrote:
> > On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
> > > > Yes. The idea that we can alter fields in the device-specific config
> > > > area is flawed. There may be cases where it doesn't matter, but as an
> > > > idea it was holed to begin with.
> > > >
> > > > We can reduce probability by doing a double read to check, but there are
> > > > still cases where it will fail.
> > >
> > > Okay - want me to propose an interface for that?
> >
> > Had a brief chat with BenH (CC'd).
> >
> > I think we should deprecate writing to the config space. Only balloon
> > does it AFAICT, and I can't quite figure out *why* it has an 'active'
> > field.
>
> Are you sure? I think net writes a mac address.
True. We'll need to disable that, and come up with another mechanism if
we want it back (a new feature and a VIRTIO_NET_HDR_F_SET_MAC flag in
the virtio_net header? Or would that mess up vhost_net?).
> > This solves half the problem, of sync guest writes. For the
> > other half, I suggest a generation counter; odd means inconsistent. The
> > guest can poll.
>
> So we get the counter until it's even, get the config, if it's changed
> repeat? Yes it works. However, I would like to have a way to detect
> config change just by looking at memory. ATM we need to read ISR to
> know. If we used a VQ, the advantage would be that the device can work
> with a single MSIX vector shared by all VQs.
If we use a 32-bit counter, we also get this though, right?
If counter has changed, it's a config interrupt...
> If we do require config VQ anyway, why not use it to notify
> guest of config changes? Guest could pre-post an in buffer
> and host uses that.
We could, but it's an additional burden on each device. vqs are cheap,
but not free. And the config area is so damn convenient...
> > BenH also convinced me we should finally make the config space LE if
> > we're going to change things. Since PCI is the most common transport,
> > guest-endian confuses people. And it sucks for really weird machines.
>
> Are we going to keep guest endian for e.g. virtio net header?
> If yes the benefit of switching config space is not that big.
> And changes in devices would affect non-PCI transports.
Yep. It would only make sense if we do it for everything. And yes,
it'll mess up everyone who is BE, so it needs to be a feature bit for
them.
> > We should also change the ring (to a single ring, I think). Descriptors
> > to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags).
> > We might be able to squeeze it into 20 bytes but that means packing. We
> > should support inline, chained or indirect. Let the other side ack by
> > setting flag, cookie and len (if written).
>
> Quite possibly all or some of these things help performance
> but do we have to change the spec before we have experimental
> proof?
We change the spec last, once we know what we're doing, ideally.
> I did experiment with a single ring using tools/virtio and
> I didn't see a measureable performance gain.
Interesting. It is simpler and more standard than our current design,
but that's not sufficient unless there are other reasons. Needs further
discussion and testing.
> Two rings do have the advantage of not requiring host side copy, which
> copy would surely add to cache pressure.
Well, a simple host could process in-order and leave stuff in the ring I
guess. A smarter host would copy and queue, maybe leave one queue entry
in so it doesn't get flooded?
> Since
> host doesn't change desriptors, we could also
> preformat some descriptors in the current design.
>
> There is a fragmentation problem in theory but it can be alleviated with
> a smart allocator.
Yeah, the complexity scares me...
> About inline - it can only help very small buffers.
> Which workloads do you have in mind exactly?
It was suggested by others, but I think TCP Acks are the classic one.
12 + 14 + 20 + 40 = 86 bytes with virtio_net_hdr_mrg_rxbuf at the front.
> BTW this seems to be the reverse from what you have in Mar 2001,
> see 87mxkjls61.fsf@rustcorp.com.au :)
(s/2001/2011). Indeed. Noone shared my optimism that having an open
process for a virtio2 would bring more players on board (my original
motivation). But technical requirements are mounting up, which means
we're going to get there anyway.
> I am much less concerned with what we do for configuration,
> but I do not believe we have learned all performance lessons
> from virtio ring1. Is there any reason why we shouldn't be
> able to experiment with inline within virtio1 and see
> whether that gets us anything?
Inline in the used ring is possible, but those descriptors 8 bytes, vs
24/32.
> If we do a bunch of changes to the ring at once, we can't
> figure out what's right, what's wrong, or back out of
> mistakes later.
>
> Since there are non PCI transports that use the ring,
> we really shouldn't make both the configuration and
> the ring changes depend on the same feature bit.
Yes, I'm thinking #define VIRTIO_F_VIRTIO2 (-1). For PCI, this gets
mapped into a "are we using the new config layout?". For others, it
gets mapped into a transport-specific feature.
(I'm sure you get it, but for the others) This is because I want to be
draw a clear line between all the legacy stuff at the same time, not
have to support part of it later because someone might not flip the
feature bit.
Cheers,
Rusty.
next prev parent reply other threads:[~2012-01-12 2:01 UTC|newest]
Thread overview: 106+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-12-08 10:22 [PATCH 0/11] RFC: PCI using capabilitities Rusty Russell
2011-12-08 10:30 ` [RFC 1/11] virtio: use u32, not bitmap for struct virtio_device's features Rusty Russell
2011-12-08 10:31 ` [RFC 2/11] virtio: add support for 64 bit features Rusty Russell
2011-12-08 10:32 ` [PATCH 0/11] RFC: PCI using capabilitities Sasha Levin
2011-12-08 10:32 ` [RFC 3/11] pci: add pci_iomap_range Rusty Russell
2011-12-15 8:30 ` Michael S. Tsirkin
2011-12-16 1:56 ` Rusty Russell
2011-12-26 14:05 ` [PATCHv2 RFC] " Michael S Tsirkin
2011-12-26 14:05 ` Michael S Tsirkin
2011-12-08 10:34 ` [RFC 4/11] virtio-pci: define layout for virtio vendor-specific capabilities Rusty Russell
2011-12-10 21:14 ` Sasha Levin
2011-12-08 10:35 ` [RFC 6/11] virtio_pci: move old defines to legacy, introduce new structure Rusty Russell
2011-12-08 10:38 ` [RFC 6/11] virtio_pci: don't use the legacy driver if we find the new PCI capabilities Rusty Russell
2011-12-10 21:18 ` Sasha Levin
2011-12-11 5:15 ` Rusty Russell
2011-12-11 9:37 ` Michael S. Tsirkin
2011-12-08 10:39 ` [RFC 7/11] virtio_pci: new, capability-aware driver Rusty Russell
2011-12-11 9:42 ` Michael S. Tsirkin
2011-12-11 22:45 ` Rusty Russell
2011-12-12 11:49 ` Michael S. Tsirkin
2011-12-12 18:10 ` Don Dutile
2011-12-16 1:58 ` Rusty Russell
2013-05-28 7:56 ` Michael S. Tsirkin
2013-05-29 1:17 ` Rusty Russell
2013-05-29 10:21 ` Michael S. Tsirkin
2013-05-30 2:17 ` Rusty Russell
2011-12-12 18:25 ` Michael S. Tsirkin
2011-12-13 2:21 ` Rusty Russell
2011-12-15 8:27 ` Michael S. Tsirkin
2011-12-16 1:50 ` Rusty Russell
2011-12-18 10:18 ` Michael S. Tsirkin
2011-12-19 6:06 ` Rusty Russell
2011-12-19 9:13 ` Michael S. Tsirkin
2011-12-19 11:08 ` Rusty Russell
2011-12-20 11:37 ` Michael S. Tsirkin
2011-12-21 0:33 ` Rusty Russell
2011-12-21 9:19 ` Michael S. Tsirkin
2012-01-10 17:03 ` Michael S. Tsirkin
2012-01-11 0:25 ` Rusty Russell
2012-01-11 1:48 ` Benjamin Herrenschmidt
2012-01-11 8:47 ` Stefan Hajnoczi
2012-01-11 9:10 ` Benjamin Herrenschmidt
2012-01-11 14:28 ` Stefan Hajnoczi
2012-01-11 15:39 ` Michael S. Tsirkin
2012-01-11 17:21 ` Stefan Hajnoczi
2012-01-11 18:34 ` Michael S. Tsirkin
2012-01-11 20:56 ` Benjamin Herrenschmidt
2012-01-11 20:46 ` Benjamin Herrenschmidt
2012-01-12 10:37 ` Stefan Hajnoczi
2012-01-11 10:21 ` Michael S. Tsirkin
2012-01-11 21:13 ` Benjamin Herrenschmidt
2012-01-11 22:13 ` Michael S. Tsirkin
2012-01-11 22:19 ` Benjamin Herrenschmidt
2012-01-11 22:56 ` Benjamin Herrenschmidt
2012-01-12 5:29 ` Michael S. Tsirkin
2012-01-12 6:13 ` Benjamin Herrenschmidt
2012-01-12 6:37 ` Michael S. Tsirkin
2012-01-12 2:01 ` Rusty Russell [this message]
2012-01-12 4:31 ` Benjamin Herrenschmidt
2012-01-12 6:09 ` Michael S. Tsirkin
2012-01-12 6:28 ` Benjamin Herrenschmidt
2012-01-12 6:51 ` Michael S. Tsirkin
2012-01-12 7:03 ` Benjamin Herrenschmidt
2012-01-12 6:00 ` Michael S. Tsirkin
2012-01-13 3:22 ` Rusty Russell
2012-01-11 13:30 ` Anthony Liguori
2012-01-11 15:12 ` Michael S. Tsirkin
2012-01-11 15:15 ` Anthony Liguori
2012-01-11 15:21 ` Michael S. Tsirkin
2012-01-11 15:28 ` Anthony Liguori
2012-01-11 15:45 ` Michael S. Tsirkin
2012-01-11 16:02 ` Anthony Liguori
2012-01-11 17:08 ` Michael S. Tsirkin
2012-01-11 19:42 ` Anthony Liguori
2012-01-11 20:14 ` Michael S. Tsirkin
2012-01-11 20:26 ` Anthony Liguori
2012-01-11 21:02 ` Benjamin Herrenschmidt
2012-01-11 22:02 ` Michael S. Tsirkin
2012-01-11 22:16 ` Benjamin Herrenschmidt
2012-01-12 1:42 ` Rusty Russell
2012-01-13 2:19 ` Michael S. Tsirkin
2012-01-13 2:32 ` Benjamin Herrenschmidt
2012-01-18 15:29 ` Michael S. Tsirkin
2012-01-22 22:53 ` Rusty Russell
2012-01-18 4:52 ` Rusty Russell
2012-01-11 21:58 ` Michael S. Tsirkin
2012-01-11 20:59 ` Benjamin Herrenschmidt
2012-01-11 20:51 ` Benjamin Herrenschmidt
2012-01-11 20:50 ` Benjamin Herrenschmidt
2012-01-12 1:35 ` Rusty Russell
2011-12-08 10:40 ` [RFC 8/11] virtio_pci: share structure between legacy and modern Rusty Russell
2011-12-08 10:41 ` [RFC 9/11] virtio_pci: share interrupt/notify handlers " Rusty Russell
2011-12-08 10:42 ` [RFC 10/11] virtio_pci: share virtqueue setup/teardown between modern and legacy driver Rusty Russell
2011-12-08 10:44 ` [RFC 11/11] virtio_pci: simplify common helpers Rusty Russell
2011-12-08 15:37 ` [PATCH 0/11] RFC: PCI using capabilitities Sasha Levin
2011-12-08 15:37 ` Sasha Levin
2011-12-09 6:17 ` Rusty Russell
2011-12-10 21:32 ` Sasha Levin
2011-12-11 9:05 ` Avi Kivity
2011-12-11 10:03 ` Sasha Levin
2011-12-11 12:30 ` Michael S. Tsirkin
2011-12-11 12:48 ` Sasha Levin
2011-12-11 12:48 ` Sasha Levin
2011-12-11 12:47 ` Michael S. Tsirkin
2011-12-11 12:53 ` Sasha Levin
2011-12-11 12:53 ` Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87vcoh4r2y.fsf@rustcorp.com.au \
--to=rusty@rustcorp.com.au \
--cc=benh@kernel.crashing.org \
--cc=borntraeger@de.ibm.com \
--cc=levinsasha928@gmail.com \
--cc=mst@redhat.com \
--cc=pawel.moll@arm.com \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.