All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rusty Russell <rusty@rustcorp.com.au>
To: Anthony Liguori <aliguori@us.ibm.com>, Gerd Hoffmann <kraxel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org
Subject: Re: [Qemu-devel] Using PCI config space to indicate config location
Date: Tue, 09 Oct 2012 13:46:15 +1030	[thread overview]
Message-ID: <87haq48hds.fsf@rustcorp.com.au> (raw)
In-Reply-To: <87sj9oh0pm.fsf@codemonkey.ws>

Anthony Liguori <aliguori@us.ibm.com> writes:
> We'll never remove legacy so we shouldn't plan on it.  There are
> literally hundreds of thousands of VMs out there with the current virtio
> drivers installed in them.  We'll be supporting them for a very, very
> long time :-)

You will be supporting this for qemu on x86, sure.  As I think we're
still in the growth phase for virtio, I prioritize future spec
cleanliness pretty high.

But I think you'll be surprised how fast this is deprecated:
1) Bigger queues for block devices (guest-specified ringsize)
2) Smaller rings for openbios (guest-specified alignment)
3) All-mmio mode (powerpc)
4) Whatever network features get numbers > 31.

> I don't think we gain a lot by moving the ISR into a separate BAR.
> Splitting up registers like that seems weird to me too.

Confused.  I proposed the same split as you have, just ISR by itself.

> It's very normal to have a mirrored set of registers that are PIO in one
> bar and MMIO in a different BAR.
>
> If we added an additional constraints that BAR1 was mirrored except for
> the config space and the MSI section was always there, I think the end
> result would be nice.  IOW:

But it won't be the same, because we want all that extra stuff, like
more feature bits and queue size alignment.  (Admittedly queues past
16TB aren't a killer feature).

To make it concrete:

Current:
struct {
        __le32 host_features;   /* read-only */
        __le32 guest_features;  /* read/write */
        __le32 queue_pfn;       /* read/write */
        __le16 queue_size;      /* read-only */
        __le16 queue_sel;       /* read/write */
        __le16 queue_notify;    /* read/write */
        u8 status;              /* read/write */
        u8 isr;                 /* read-only, clear on read */
        /* Optional */
        __le16 msi_config_vector;       /* read/write */
        __le16 msi_queue_vector;        /* read/write */
        /* ... device features */
};

Proposed:
struct virtio_pci_cfg {
	/* About the whole device. */
	__le32 device_feature_select;	/* read-write */
	__le32 device_feature;		/* read-only */
	__le32 guest_feature_select;	/* read-write */
	__le32 guest_feature;		/* read-only */
	__le16 msix_config;		/* read-write */
	__u8 device_status;		/* read-write */
	__u8 unused;

	/* About a specific virtqueue. */
	__le16 queue_select;	/* read-write */
	__le16 queue_align;	/* read-write, power of 2. */
	__le16 queue_size;	/* read-write, power of 2. */
	__le16 queue_msix_vector;/* read-write */
	__le64 queue_address;	/* read-write: 0xFFFFFFFFFFFFFFFF == DNE. */
};

struct virtio_pci_isr {
        __u8 isr;                 /* read-only, clear on read */
};

We could also enforce LE in the per-device config space in this case,
another nice cleanup for PCI people.

> BAR0[pio]: virtio-pci registers + optional MSI section + virtio-config
> BAR1[mmio]: virtio-pci registers + MSI section + future extensions
> BAR2[mmio]: virtio-config
>
> We can continue to do ISR access via BAR0 for performance reasons.

But powerpc explicitly *doesnt* want a pio bar.  So let it be its own
bar, which can be either.

>> As to MMIO vs PIO, the BARs are self-describing, so we should explicitly
>> endorse that and leave it to the devices.
>>
>> The detection is simple: if BAR1 has non-zero length, it's new-style,
>> otherwise legacy.
>
> I agree that this is the best way to extend, but I think we should still
> use a transport feature bit.  We want to be able to detect within QEMU
> whether a guest is using these new features because we need to adjust
> migration state accordingly.
>
> Otherwise we would have to detect reads/writes to the new BARs to
> maintain whether the extended register state needs to be saved.  This
> gets nasty dealing with things like reset.

I don't think it'll be that bad; reset clears the device to unknown,
bar0 moves it from unknown->legacy mode, bar1/2/3 changes it from
unknown->modern mode, and anything else is bad (I prefer being strict so
we catch bad implementations from the beginning).

But I'm happy to implement it and see what it's like.

> A feature bit simplifies this all pretty well.

I suspect it will be quite ugly, actually.  The guest has to use BAR0 to
get the features to see if they can use BAR1.  Do they ack the feature
(via BAR0) before accessing BAR1?  If not, qemu can't rely on the
feature bit.

Cheers,
Rusty.

WARNING: multiple messages have this Message-ID (diff)
From: Rusty Russell <rusty@rustcorp.com.au>
To: Anthony Liguori <aliguori@us.ibm.com>, Gerd Hoffmann <kraxel@redhat.com>
Cc: virtualization@lists.linux-foundation.org,
	qemu-devel <qemu-devel@nongnu.org>,
	kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] Using PCI config space to indicate config location
Date: Tue, 09 Oct 2012 13:46:15 +1030	[thread overview]
Message-ID: <87haq48hds.fsf@rustcorp.com.au> (raw)
In-Reply-To: <87sj9oh0pm.fsf@codemonkey.ws>

Anthony Liguori <aliguori@us.ibm.com> writes:
> We'll never remove legacy so we shouldn't plan on it.  There are
> literally hundreds of thousands of VMs out there with the current virtio
> drivers installed in them.  We'll be supporting them for a very, very
> long time :-)

You will be supporting this for qemu on x86, sure.  As I think we're
still in the growth phase for virtio, I prioritize future spec
cleanliness pretty high.

But I think you'll be surprised how fast this is deprecated:
1) Bigger queues for block devices (guest-specified ringsize)
2) Smaller rings for openbios (guest-specified alignment)
3) All-mmio mode (powerpc)
4) Whatever network features get numbers > 31.

> I don't think we gain a lot by moving the ISR into a separate BAR.
> Splitting up registers like that seems weird to me too.

Confused.  I proposed the same split as you have, just ISR by itself.

> It's very normal to have a mirrored set of registers that are PIO in one
> bar and MMIO in a different BAR.
>
> If we added an additional constraints that BAR1 was mirrored except for
> the config space and the MSI section was always there, I think the end
> result would be nice.  IOW:

But it won't be the same, because we want all that extra stuff, like
more feature bits and queue size alignment.  (Admittedly queues past
16TB aren't a killer feature).

To make it concrete:

Current:
struct {
        __le32 host_features;   /* read-only */
        __le32 guest_features;  /* read/write */
        __le32 queue_pfn;       /* read/write */
        __le16 queue_size;      /* read-only */
        __le16 queue_sel;       /* read/write */
        __le16 queue_notify;    /* read/write */
        u8 status;              /* read/write */
        u8 isr;                 /* read-only, clear on read */
        /* Optional */
        __le16 msi_config_vector;       /* read/write */
        __le16 msi_queue_vector;        /* read/write */
        /* ... device features */
};

Proposed:
struct virtio_pci_cfg {
	/* About the whole device. */
	__le32 device_feature_select;	/* read-write */
	__le32 device_feature;		/* read-only */
	__le32 guest_feature_select;	/* read-write */
	__le32 guest_feature;		/* read-only */
	__le16 msix_config;		/* read-write */
	__u8 device_status;		/* read-write */
	__u8 unused;

	/* About a specific virtqueue. */
	__le16 queue_select;	/* read-write */
	__le16 queue_align;	/* read-write, power of 2. */
	__le16 queue_size;	/* read-write, power of 2. */
	__le16 queue_msix_vector;/* read-write */
	__le64 queue_address;	/* read-write: 0xFFFFFFFFFFFFFFFF == DNE. */
};

struct virtio_pci_isr {
        __u8 isr;                 /* read-only, clear on read */
};

We could also enforce LE in the per-device config space in this case,
another nice cleanup for PCI people.

> BAR0[pio]: virtio-pci registers + optional MSI section + virtio-config
> BAR1[mmio]: virtio-pci registers + MSI section + future extensions
> BAR2[mmio]: virtio-config
>
> We can continue to do ISR access via BAR0 for performance reasons.

But powerpc explicitly *doesnt* want a pio bar.  So let it be its own
bar, which can be either.

>> As to MMIO vs PIO, the BARs are self-describing, so we should explicitly
>> endorse that and leave it to the devices.
>>
>> The detection is simple: if BAR1 has non-zero length, it's new-style,
>> otherwise legacy.
>
> I agree that this is the best way to extend, but I think we should still
> use a transport feature bit.  We want to be able to detect within QEMU
> whether a guest is using these new features because we need to adjust
> migration state accordingly.
>
> Otherwise we would have to detect reads/writes to the new BARs to
> maintain whether the extended register state needs to be saved.  This
> gets nasty dealing with things like reset.

I don't think it'll be that bad; reset clears the device to unknown,
bar0 moves it from unknown->legacy mode, bar1/2/3 changes it from
unknown->modern mode, and anything else is bad (I prefer being strict so
we catch bad implementations from the beginning).

But I'm happy to implement it and see what it's like.

> A feature bit simplifies this all pretty well.

I suspect it will be quite ugly, actually.  The guest has to use BAR0 to
get the features to see if they can use BAR1.  Do they ack the feature
(via BAR0) before accessing BAR1?  If not, qemu can't rely on the
feature bit.

Cheers,
Rusty.

  parent reply	other threads:[~2012-10-09  3:22 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-27  0:29 Proposal for virtio standardization Rusty Russell
2012-09-27  0:29 ` [Qemu-devel] " Rusty Russell
2012-09-27  0:29 ` Rusty Russell
2012-10-04 18:49 ` [Qemu-devel] " Anthony Liguori
2012-10-04 18:49   ` Anthony Liguori
2012-10-08  2:21   ` Using PCI config space to indicate config location Rusty Russell
2012-10-08  2:21   ` Rusty Russell
2012-10-08  2:21     ` [Qemu-devel] " Rusty Russell
2012-10-08 13:58     ` Anthony Liguori
2012-10-08 13:58     ` Anthony Liguori
2012-10-08 13:58       ` Anthony Liguori
2012-10-08 14:58       ` Gerd Hoffmann
2012-10-08 14:58         ` Gerd Hoffmann
2012-10-08 15:09         ` Anthony Liguori
2012-10-08 15:09           ` Anthony Liguori
2012-10-08 20:13           ` Gerd Hoffmann
2012-10-08 20:13             ` Gerd Hoffmann
2012-10-08 20:55             ` Anthony Liguori
2012-10-08 20:55               ` Anthony Liguori
2012-10-08 23:56               ` Rusty Russell
2012-10-09  1:51                 ` Anthony Liguori
2012-10-09  3:16                   ` Rusty Russell
2012-10-09  3:16                   ` Rusty Russell [this message]
2012-10-09  3:16                     ` Rusty Russell
2012-10-09 10:17                     ` Avi Kivity
2012-10-09 10:17                       ` Avi Kivity
2012-10-09 14:03                       ` Anthony Liguori
2012-10-09 14:03                         ` Anthony Liguori
2012-10-09 13:56                     ` Anthony Liguori
2012-10-09 13:56                       ` Anthony Liguori
2012-10-10  3:44                       ` Rusty Russell
2012-10-10  3:44                         ` Rusty Russell
2012-10-10 11:37                         ` Michael S. Tsirkin
2012-10-10 11:37                           ` Michael S. Tsirkin
2012-10-09 13:56                     ` Anthony Liguori
2012-10-09 21:09                     ` Jamie Lokier
2012-10-09 21:09                     ` Jamie Lokier
2012-10-09 21:09                       ` [Qemu-devel] " Jamie Lokier
2012-10-10  3:44                       ` Rusty Russell
2012-10-10  3:44                         ` Rusty Russell
2012-10-11  0:08                         ` Rusty Russell
2012-10-11  0:08                           ` Rusty Russell
2012-10-11  0:08                         ` Rusty Russell
2012-10-09  6:33                   ` Gerd Hoffmann
2012-10-09  6:33                     ` Gerd Hoffmann
2012-10-09 15:26                     ` Anthony Liguori
2012-10-09 15:26                     ` Anthony Liguori
2012-10-09 15:26                       ` Anthony Liguori
2012-10-09 20:24                       ` Gerd Hoffmann
2012-10-09 20:24                         ` Gerd Hoffmann
2012-10-10  2:54                         ` Rusty Russell
2012-10-10  2:54                           ` Rusty Russell
2012-10-10 13:36                           ` Anthony Liguori
2012-10-10 13:41                             ` Michael S. Tsirkin
2012-10-10 13:41                               ` Michael S. Tsirkin
2012-10-11  0:43                               ` Rusty Russell
2012-10-11  0:43                               ` Rusty Russell
2012-10-11  0:43                                 ` [Qemu-devel] " Rusty Russell
2012-10-10  2:54                         ` Rusty Russell
2012-10-10  8:34                 ` Michael S. Tsirkin
2012-10-10  8:34                   ` Michael S. Tsirkin
2012-10-10  8:30     ` Michael S. Tsirkin
2012-10-10  8:30       ` [Qemu-devel] " Michael S. Tsirkin
2012-10-11  1:18       ` Rusty Russell
2012-10-11  1:18         ` [Qemu-devel] " Rusty Russell
2012-10-11 10:23         ` Michael S. Tsirkin
2012-10-11 10:23           ` [Qemu-devel] " Michael S. Tsirkin
2012-10-11 22:29           ` Rusty Russell
2012-10-11 22:29           ` Rusty Russell
2012-10-11 22:29             ` [Qemu-devel] " Rusty Russell
2012-10-12  9:33             ` Michael S. Tsirkin
2012-10-12  9:33               ` [Qemu-devel] " Michael S. Tsirkin
2012-10-12  9:51               ` Rusty Russell
2012-10-12  9:51                 ` [Qemu-devel] " Rusty Russell
2012-10-12 10:02                 ` Michael S. Tsirkin
2012-10-12 10:02                   ` [Qemu-devel] " Michael S. Tsirkin
2012-10-16 13:15                   ` Rusty Russell
2012-10-16 13:15                   ` Rusty Russell
2012-10-16 13:15                     ` [Qemu-devel] " Rusty Russell
2012-10-16 13:30                     ` Michael S. Tsirkin
2012-10-16 13:30                       ` [Qemu-devel] " Michael S. Tsirkin
2012-10-16 13:52                       ` Rusty Russell
2012-10-16 13:52                         ` [Qemu-devel] " Rusty Russell
2012-10-16 13:52                       ` Rusty Russell
2012-10-04 18:49 ` [Qemu-devel] Proposal for virtio standardization Anthony Liguori
2012-10-09 14:02 ` Cornelia Huck
2012-10-09 14:02   ` [Qemu-devel] " Cornelia Huck
2012-10-10  3:46   ` Rusty Russell
2012-10-10  3:46     ` [Qemu-devel] " Rusty Russell
2012-10-10  3:46   ` Rusty Russell
2012-10-09 14:02 ` Cornelia Huck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87haq48hds.fsf@rustcorp.com.au \
    --to=rusty@rustcorp.com.au \
    --cc=aliguori@us.ibm.com \
    --cc=kraxel@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.