Linux virtualization list
 help / color / mirror / Atom feed
* Re: [PATCH] virtio-balloon: Trivial cleanups
From: Michael S. Tsirkin @ 2011-11-03 11:09 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-kernel, virtualization
In-Reply-To: <1320308404-28859-1-git-send-email-levinsasha928@gmail.com>

On Thu, Nov 03, 2011 at 10:20:04AM +0200, Sasha Levin wrote:
> Trivial changes to remove forgotten junk, format comments, and correct names.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: virtualization@lists.linux-foundation.org
> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

3.3 material.
Rusty, you want to take it? Want me to?

> ---
>  drivers/virtio/virtio_balloon.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index e058ace..289e998 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -1,4 +1,5 @@
> -/* Virtio balloon implementation, inspired by Dor Loar and Marcelo
> +/*
> + * Virtio balloon implementation, inspired by Dor Laor and Marcelo
>   * Tosatti's implementations.
>   *
>   *  Copyright 2008 Rusty Russell IBM Corporation
> @@ -17,7 +18,7 @@
>   *  along with this program; if not, write to the Free Software
>   *  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>   */
> -//#define DEBUG
> +
>  #include <linux/virtio.h>
>  #include <linux/virtio_balloon.h>
>  #include <linux/swap.h>
> @@ -148,7 +149,6 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
>  		vb->num_pages--;
>  	}
>  
> -
>  	/*
>  	 * Note that if
>  	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
> -- 
> 1.7.7.1

^ permalink raw reply

* Re: [PATCH RFC] virtio-pci: flexible configuration layout
From: Michael S. Tsirkin @ 2011-11-03 11:36 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Krishna Kumar, kvm, Pawel Moll, Wang Sheng-Hui,
	Alexey Kardashevskiy, lkml - Kernel Mailing List, Jesse Barnes,
	virtualization, Christian Borntraeger, Amit Shah, Linus Torvalds
In-Reply-To: <1320318591.29407.17.camel@lappy>

On Thu, Nov 03, 2011 at 01:09:51PM +0200, Sasha Levin wrote:
> On Thu, 2011-11-03 at 12:33 +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 03, 2011 at 02:19:03AM +0200, Sasha Levin wrote:
> > > Hi Michael,
> > > 
> > > On Thu, 2011-11-03 at 01:31 +0200, Michael S. Tsirkin wrote:
> > > > Add a flexible mechanism to specify virtio configuration layout, using
> > > > pci vendor-specific capability.  A separate capability is used for each
> > > > of common, device specific and data-path accesses.
> > > > 
> > > > Warning: compiled only.
> > > > This patch also needs to be split up, pci_iomap changes
> > > > also need arch updates for non-x86.
> > > > 
> > > > We also will need to update the spec.
> > > > 
> > > > See the first chunk for layout documentation.
> > > > 
> > > > Posting here for early feedback.
> > > > 
> > > > In particular:
> > > > 
> > > > Do we need to require offset to be aligned?
> > > > Does iowrite16 work with unaligned accesses on all architectures?
> > > > Does using ioread/write as we do add overhead as compared to
> > > > plain PIO accesses?
> > > > 
> > > > Jesse - are you OK with the pci_iomap_range API proposed here
> > > > (see last chunks)?
> > > > I noticed lots of architectures duplicate the implementation
> > > > of pci_iomap - makes sense to clean that up?
> > > > 
> > > > Thanks,
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > 
> > > > ---
> > > > 
> > > > diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
> > > > index ea66f3f..b8a9de2 100644
> > > > --- a/include/linux/virtio_pci.h
> > > > +++ b/include/linux/virtio_pci.h
> > > > @@ -92,4 +92,38 @@
> > > >  /* The alignment to use between consumer and producer parts of vring.
> > > >   * x86 pagesize again. */
> > > >  #define VIRTIO_PCI_VRING_ALIGN		4096
> > > > +
> > > > +/*
> > > > + * Layout for Virtio PCI vendor specific capability (little-endian):
> > > > + * 5 bit virtio capability id.
> > > > + * 3 bit BAR index register, specifying which BAR to use.
> > > > + * 4 byte cfg offset within the BAR.
> > > 
> > > Why not make the 3 bits of the BIR as the lower 3 bits of the offset
> > > like its done in the PCI spec?
> > > 
> > > This way you also get 'free' 32bit alignment and simpler masking.
> > 
> > 3 bit gets us 64 bit alignment, no? Seems a bit more than we need:
> > we need at most 16 bit alignment.
> 
> Yup, it's actually 64 bit, but whats the problem with that?

Wastes space. Remember we can put stuff in PIO space
which MSI-X doesn't allow, but we need to allow as long
as MMIO is slower.

> It will also
> look more similar to the PCI spec.

You mean MSI-X in the PCI spec, right?
MSI-X needs alignment to function

> > 
> > > > + * 4 byte cfg size.
> > > > + */
> > > > +
> > > > +/* A single virtio device has multiple vendor specific capabilities, we use the
> > > > + * 5 bit ID field to distinguish between these. */
> > > > +#define VIRTIO_PCI_CAP_ID		3
> > > > +#define VIRTIO_PCI_CAP_ID_MASK		0x1f
> > > > +#define VIRTIO_PCI_CAP_ID_SHIFT		0
> > > > +
> > > > +/* IDs for different capabilities. If a specific configuration
> > > > + * is missing, legacy PIO path is used. */
> > > > +/* Common configuration */
> > > > +#define VIRTIO_PCI_CAP_COMMON_CFG	0
> > > > +/* Device specific confiuration */
> > > > +#define VIRTIO_PCI_CAP_DEVICE_CFG	1
> > > > +/* Notifications and ISR access */
> > > > +#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
> > > 
> > > I think that separating notification here is very kvm specific.
> > 
> > I don't see how it's kvm specific. This just makes it possible to split data path and
> > configuration. Good for security as well. I was actually debating a
> > separate configuration for ISR and NOTIFICATION.
> 
> It's not really a clean split between data and config, you still have
> device status stuck between notification and ISR, which isn't really in
> the data path.

No, it's in a separate capability. Look at the patch: device status
access used common_map, ISR and notification notify_map.
They can be in separate BAR or whereever we like.

> A split as you suggested sounds like a good idea, but we just need to
> move some fields around to make it clean.

OK, I'll add an ISR capability. This let us move it around where
we want.

> > > Maybe it
> > > could be replaced by allowing virtio to signal eventfds or something
> > > similar?
> > 
> > No idea what you mean - it's a pci device, how should it signal eventfds?
> 
> I'm not really sure what I was thinking about, ignore that :)
> 
> > 
> > > > +
> > > > +/* Index of the BAR including this configuration */
> > > > +#define VIRTIO_PCI_CAP_CFG_BIR		3
> > > > +#define VIRTIO_PCI_CAP_CFG_BIR_MASK	(0x7)
> > > > +#define VIRTIO_PCI_CAP_CFG_BIR_SHIFT	5
> > > > +
> > > > +/* Offset within the BAR */
> > > > +#define VIRTIO_PCI_CAP_CFG_OFF		4
> > > > +/* Size of the configuration space */
> > > > +#define VIRTIO_PCI_CAP_CFG_SIZE		8
> > > > +
> > > >  #endif
> > > > diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> > > > index 4bcc8b8..d4ed130 100644
> > > > --- a/drivers/virtio/virtio_pci.c
> > > > +++ b/drivers/virtio/virtio_pci.c
> > > > @@ -37,8 +37,12 @@ struct virtio_pci_device
> > > >  	struct virtio_device vdev;
> > > >  	struct pci_dev *pci_dev;
> > > >  
> > > > -	/* the IO mapping for the PCI config space */
> > > > +	/* the IO address for the common PCI config space */
> > > >  	void __iomem *ioaddr;
> > > > +	/* the IO address for device specific config */
> > > > +	void __iomem *ioaddr_device;
> > > > +	/* the IO address to use for notifications operations */
> > > > +	void __iomem *ioaddr_notify;
> > > >  
> > > >  	/* a list of queues so we can dispatch IRQs */
> > > >  	spinlock_t lock;
> > > > @@ -57,8 +61,142 @@ struct virtio_pci_device
> > > >  	unsigned msix_used_vectors;
> > > >  	/* Whether we have vector per vq */
> > > >  	bool per_vq_vectors;
> > > > +
> > > > +	/* Various IO mappings: used for resource tracking only. */
> > > > +
> > > > +	/* Legacy BAR0: typically PIO. */
> > > > +	void __iomem *legacy_map;
> > > > +
> > > > +	/* Mappings specified by device capabilities: typically in MMIO */
> > > > +	void __iomem *notify_map;
> > > > +	void __iomem *common_map;
> > > > +	void __iomem *device_map;
> > > >  };
> > > >  
> > > > +/*
> > > > + * With PIO, device-specific config moves as MSI-X is enabled/disabled.
> > > > + * Use this accessor to keep pointer to that config in sync.
> > > > + */
> > > > +static void virtio_pci_set_msix_enabled(struct virtio_pci_device *vp_dev, int enabled)
> > > > +{
> > > > +	vp_dev->msix_enabled = enabled;
> > > > +	if (vp_dev->device_map)
> > > > +		vp_dev->ioaddr_device = vp_dev->device_map;
> > > > +	else
> > > > +		vp_dev->ioaddr_device = vp_dev->legacy_map +
> > > > +			VIRTIO_PCI_CONFIG(vp_dev);
> > > > +}
> > > > +
> > > > +static void __iomem *virtio_pci_map_cfg(struct virtio_pci_device *vp_dev, u8 cap_id)
> > > > +{
> > > > +        u32 size;
> > > > +        u32 offset;
> > > > +        u8 bir;
> > > > +        u8 cap_len;
> > > > +	int pos;
> > > > +	struct pci_dev *dev = vp_dev->pci_dev;
> > > > +	void __iomem *p;
> > > > +
> > > > +	for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> > > > +	     pos > 0;
> > > > +	     pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> > > > +		u8 id;
> > > > +		pci_read_config_byte(dev, pos + PCI_VNDR_CAP_LEN, &cap_len);
> > > > +		if (cap_len < VIRTIO_PCI_CAP_ID + 1)
> > > > +			continue;
> > > > +		pci_read_config_byte(dev, pos + VIRTIO_PCI_CAP_ID, &id);
> > > > +		id >>= VIRTIO_PCI_CAP_ID_SHIFT;
> > > > +		id &= VIRTIO_PCI_CAP_ID_MASK;
> > > > +		if (id == cap_id)
> > > > +			break;
> > > > +	}
> > > > +
> > > > +	if (pos <= 0)
> > > > +		return NULL;
> > > > +
> > > > +	if (cap_len < VIRTIO_PCI_CAP_CFG_BIR + 1)
> > > > +		goto err;
> > > > +        pci_read_config_byte(dev, pos + VIRTIO_PCI_CAP_CFG_BIR, &bir);
> > > > +	if (cap_len < VIRTIO_PCI_CAP_CFG_OFF + 4)
> > > > +		goto err;
> > > > +        pci_read_config_dword(dev, pos + VIRTIO_PCI_CAP_CFG_OFF, &offset);
> > > > +	if (cap_len < VIRTIO_PCI_CAP_CFG_SIZE + 4)
> > > > +		goto err;
> > > > +        pci_read_config_dword(dev, pos + VIRTIO_PCI_CAP_CFG_SIZE, &size);
> > > > +        bir >>= VIRTIO_PCI_CAP_CFG_BIR_SHIFT;
> > > > +        bir &= VIRTIO_PCI_CAP_CFG_BIR_MASK;
> > > > +
> > > > +	/* It's possible for a device to supply a huge config space,
> > > > +	 * but we'll never need to map more than a page ATM. */
> > > > +	p = pci_iomap_range(dev, bir, offset, size, PAGE_SIZE);
> > > > +	if (!p)
> > > > +		dev_err(&vp_dev->vdev.dev, "Unable to map virtio pci memory");
> > > > +	return p;
> > > > +err:
> > > > +	dev_err(&vp_dev->vdev.dev, "Unable to parse virtio pci capability");
> > > > +	return NULL;
> > > > +}
> > > > +
> > > > +static void virtio_pci_iounmap(struct virtio_pci_device *vp_dev)
> > > > +{
> > > > +	if (vp_dev->legacy_map)
> > > > +		pci_iounmap(vp_dev->pci_dev, vp_dev->legacy_map);
> > > > +	if (vp_dev->notify_map)
> > > > +		pci_iounmap(vp_dev->pci_dev, vp_dev->notify_map);
> > > > +	if (vp_dev->common_map)
> > > > +		pci_iounmap(vp_dev->pci_dev, vp_dev->common_map);
> > > > +	if (vp_dev->device_map)
> > > > +		pci_iounmap(vp_dev->pci_dev, vp_dev->device_map);
> > > > +}
> > > > +
> > > > +static int virtio_pci_iomap(struct virtio_pci_device *vp_dev)
> > > > +{
> > > > +	vp_dev->notify_map = virtio_pci_map_cfg(vp_dev,
> > > > +						     VIRTIO_PCI_CAP_NOTIFY_CFG);
> > > > +	vp_dev->common_map = virtio_pci_map_cfg(vp_dev,
> > > > +						     VIRTIO_PCI_CAP_COMMON_CFG);
> > > > +	vp_dev->device_map = virtio_pci_map_cfg(vp_dev,
> > > > +						     VIRTIO_PCI_CAP_DEVICE_CFG);
> > > > +
> > > > +	if (!vp_dev->notify_map || !vp_dev->common_map ||
> > > > +	    !vp_dev->device_map) {
> > > > +		/*
> > > > +		 * If not all capabilities present, map legacy PIO.
> > > > +		 * Legacy access is at BAR 0. We never need to map
> > > > +		 * more than 256 bytes there, since legacy config space
> > > > +		 * used PIO which has this size limit.
> > > > +		 * */
> > > 
> > > Whats the point of putting each of those new configs into a separate cap
> > > structure if you must have all 3 of them for the new config layout to
> > > work anyway?
> > 
> > Pls look closer, we don't require all 3 of them. For example, we can have BAR0
> > DEVICE and COMMON, and notifications will use BAR0.
> > 
> > By using an identical structure we make is much easier to reuse
> > and extend it. For example we might add a separate structure for ISR.
> 
> Right.
> 
> > > Also, why not use this change to simplify the config space? The attempts
> > > to squeeze it into the limited PIO space made it quite messy, and this
> > > can be a good chance to fix some things like the MSI-X and offsets
> > > issue.
> > 
> > For MSI-X this is exactly what this change does, by splitting
> > device-specific parts out. So if DEVICE capability is present,
> > this tells us where the device specific data starts and
> > enabling MSI-X does not relocate it.
> 
> You still have relocations within the shared config space, once 64bit
> features or other future extensions are implemented.
> 
> > Are there other fields we want to move?
> 
> We can do several things:
> 
> 1. Since we're no longer limited in space - remove queue selector and
> just have a full queue table.

I'm not sure it's a good idea.
MMIO space is cheaper than PIO but it's not like it is completely free.
What does this get us?

> 2. Move device specific features into the device specific region.
> Currently the features field is a mix between virtio-pci and device
> specific features.
> 
> I did more similar things in the spec suggestion I've sent, could you
> look at the second part for more ideas please?

Will do.

> -- 
> 
> Sasha.

^ permalink raw reply

* Re: [PATCH (repost) RFC 2/2] virtio-pci: recall and return msix notifications on ISR read
From: Jan Kiszka @ 2011-11-03 11:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Anthony Liguori, qemu-devel@nongnu.org,
	virtualization@lists.linux-foundation.org, Blue Swirl,
	Stefan Weil, Avi Kivity, Richard Henderson
In-Reply-To: <6dc9aa9764b1cfddf557a98f269e0f7d31ce03ac.1320259840.git.mst@redhat.com>

On 2011-11-02 21:11, Michael S. Tsirkin wrote:
> MSIX spec requires that device can be operated with
> all vectors masked, by polling pending bits.
> Add APIs to recall an msix notification, and make polling
> mode possible in virtio-pci by clearing the
> pending bits and setting ISR appropriately on ISR read.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/msix.c       |   26 ++++++++++++++++++++++++++
>  hw/msix.h       |    3 +++
>  hw/virtio-pci.c |   11 ++++++++++-
>  3 files changed, 39 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/msix.c b/hw/msix.c
> index 63b41b9..fe967c9 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -349,6 +349,32 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>      stl_le_phys(address, data);
>  }
>  
> +/* Recall outstanding MSI-X notifications for a vector, if possible.
> + * Return true if any were outstanding. */
> +bool msix_recall(PCIDevice *dev, unsigned vector)
> +{
> +    bool ret;
> +    if (vector >= dev->msix_entries_nr)
> +        return false;
> +    ret = msix_is_pending(dev, vector);
> +    msix_clr_pending(dev, vector);
> +    return ret;
> +}

I would prefer to have a single API instead to clarify the tight relation:

bool msi[x]_set_notify(PCIDevice *dev, unsigned vector, unsigned level)

Would return true for level=1 if the message was either sent directly or
queued (we could deliver false if it was already queued, but I see no
use case for this yet).

Also, I don't see the generic value of some msix_recall_all. I think
it's better handled in a single loop over all vectors at caller site,
clearing the individual interrupt reason bits on a per-vector basis
there. msix_recall_all is only useful in the virtio case where you have
one vector of reason A and all the rest of B. Once you had multiple
reason C vectors as well, it would not help anymore.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply

* Re: [PATCH (repost) RFC 2/2] virtio-pci: recall and return msix notifications on ISR read
From: Michael S. Tsirkin @ 2011-11-03 12:07 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, qemu-devel@nongnu.org,
	virtualization@lists.linux-foundation.org, Blue Swirl,
	Stefan Weil, Avi Kivity, Richard Henderson
In-Reply-To: <4EB27E3F.2030503@siemens.com>

On Thu, Nov 03, 2011 at 12:42:55PM +0100, Jan Kiszka wrote:
> On 2011-11-02 21:11, Michael S. Tsirkin wrote:
> > MSIX spec requires that device can be operated with
> > all vectors masked, by polling pending bits.
> > Add APIs to recall an msix notification, and make polling
> > mode possible in virtio-pci by clearing the
> > pending bits and setting ISR appropriately on ISR read.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/msix.c       |   26 ++++++++++++++++++++++++++
> >  hw/msix.h       |    3 +++
> >  hw/virtio-pci.c |   11 ++++++++++-
> >  3 files changed, 39 insertions(+), 1 deletions(-)
> > 
> > diff --git a/hw/msix.c b/hw/msix.c
> > index 63b41b9..fe967c9 100644
> > --- a/hw/msix.c
> > +++ b/hw/msix.c
> > @@ -349,6 +349,32 @@ void msix_notify(PCIDevice *dev, unsigned vector)
> >      stl_le_phys(address, data);
> >  }
> >  
> > +/* Recall outstanding MSI-X notifications for a vector, if possible.
> > + * Return true if any were outstanding. */
> > +bool msix_recall(PCIDevice *dev, unsigned vector)
> > +{
> > +    bool ret;
> > +    if (vector >= dev->msix_entries_nr)
> > +        return false;
> > +    ret = msix_is_pending(dev, vector);
> > +    msix_clr_pending(dev, vector);
> > +    return ret;
> > +}
> 
> I would prefer to have a single API instead to clarify the tight relation:
> 
> bool msi[x]_set_notify(PCIDevice *dev, unsigned vector, unsigned level)
> 
> Would return true for level=1 if the message was either sent directly or
> queued (we could deliver false if it was already queued, but I see no
> use case for this yet).

It's a matter of taste: some people like functions with flags, some
prefer separate functions.  I really prefer two functions.

But I agree it woulkd be better to have a name that makes it clear that
what we recall is a notification.
msix_notify_queue/msix_notify_dequeue?


> Also, I don't see the generic value of some msix_recall_all. I think
> it's better handled in a single loop over all vectors at caller site,
> clearing the individual interrupt reason bits on a per-vector basis
> there. msix_recall_all is only useful in the virtio case where you have
> one vector of reason A and all the rest of B. Once you had multiple
> reason C vectors as well, it would not help anymore.
> 
> Jan

The reason I wanted to have it is to reduce the overhead this adds:
since PBA is packed, it's much faster to check whether any bits are set
than by going through them all, one by one. Typically all PBA
bits are clear ...

I agree it might not help non-virtio devices, but to me it looks like a
harmless little helper - what's the issue with it?

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply

* Re: [PATCH RFC] virtio-pci: flexible configuration layout
From: Michael S. Tsirkin @ 2011-11-03 12:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Krishna Kumar, kvm, Pawel Moll, Wang Sheng-Hui,
	Alexey Kardashevskiy, lkml - Kernel Mailing List, Jesse Barnes,
	virtualization, Christian Borntraeger, Sasha Levin, Amit Shah,
	Linus Torvalds
In-Reply-To: <4EB26ED0.3080409@redhat.com>

On Thu, Nov 03, 2011 at 12:37:04PM +0200, Avi Kivity wrote:
> On 11/03/2011 01:31 AM, Michael S. Tsirkin wrote:
> > Add a flexible mechanism to specify virtio configuration layout, using
> > pci vendor-specific capability.  A separate capability is used for each
> > of common, device specific and data-path accesses.
> >
> >
> 
> How about posting the spec change instead of patches?

We need both of course but I typically start with patches.
Easier to see whether it's even possible to implement the
spec without overhauling all of the code.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH] VirtioConsole support.
From: Christian Borntraeger @ 2011-11-03 12:38 UTC (permalink / raw)
  To: Rusty Russell
  Cc: xen-devel, Konrad Rzeszutek Wilk, Stephen Boyd,
	Greg Kroah-Hartman, Miche Baker-Harvey, linux-kernel,
	virtualization, Amit Shah, Hendrik Brueckner, Anton Blanchard,
	Benjamin Herrenschmidt, Joe Perches
In-Reply-To: <87k47itc7u.fsf@rustcorp.com.au>

On 02/11/11 23:02, Rusty Russell wrote:
> Ben, Christian, can you figure out what's going on?  Is it time for a
> complete rewrite?

You are talking about hvc console? Yes it has so many users that it is now
something like a console/tty layer extension, but it was never designed to be
one. 

I think the hardest part would be to get all users together (for testing and
requirements) and then to find someone that is actually doing that rework. Dont
know if the pain level is already that high?

Christian

^ permalink raw reply

* Re: virtio-pci new configuration proposal
From: Michael S. Tsirkin @ 2011-11-03 12:46 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Anthony Liguori, linux-kernel, kvm, virtualization
In-Reply-To: <1320309203.29407.1.camel@lappy>

On Thu, Nov 03, 2011 at 10:33:23AM +0200, Sasha Levin wrote:
> On Thu, 2011-11-03 at 12:28 +1030, Rusty Russell wrote: 
> > On Wed, 02 Nov 2011 20:49:27 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > This is a proposal for a new layout of the virtio-pci config space.
> > > 
> > > We will separate the current configuration into two: A virtio-pci common
> > > configuration and a device specific configuration. This allows more flexibility
> > > with adding features and makes usage easier, specifically in cases like the
> > > ones in virtio-net where device specific configurations depend on device
> > > specific features.
> > 
> > Thanks for this Sasha.  Several general comments:
> > 
> > 1) How to we distinguish the two layouts?  In theory, we need to do this
> >    forever.  In practice we can deprecate the old layout in several
> >    years' time.
> 
> Old layouts won't have the new virtio-pci cap structure in their PCI
> config space.

What happens next time we want to change something?

> > 2) I don't think we want to turn the device-specific config into a
> >    linked list.  We haven't needed variable-length config (yet!), and
> >    it's (slightly) more complex.  That's also the part of the spec which
> >    is shared with non-PCI virtio implementations.
> 
> Variable length config wasn't used yet because space in the device
> specific space was reserved for a feature even if that feature wasn't
> used.

Not only that, also because it is messy to debug.  With fixed offsets
you just print the address/data and you know what it's doing.

> For example, the MAC feature reserved 6 bytes in the config space for
> the MAC even if VIRTIO_NET_F_MAC wasn't enabled. Here we can just avoid
> having it pollute the config space until it's enabled.
> 

This looks like overdesign to me. The only reason PCI spec
used capability list is
1. to save space
2. standard mechanism to discover features
You say explicitly space is not an issue, and you keep
feature bits around.

> I don't think it'll have any impact on non-PCI implementations since the
> "pointers" are simply offsets from the beginning of the config space,
> and are not PCI specific in any way.

The APIs use a single offset cookie to pass configuration.
If you want capabilities, that will have to be changed.

> > 3) If we're changing the queue layout, it's a chance to fix a
> >    longstanding bug: let the guest notify the host of preferred
> >    queue size and alignment.
> 
> Yup, we can do that.
> 
> -- 
> 
> Sasha.
> 

^ permalink raw reply

* Re: virtio-pci new configuration proposal
From: Sasha Levin @ 2011-11-03 13:19 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Anthony Liguori, linux-kernel, kvm, virtualization
In-Reply-To: <20111103124651.GK18296@redhat.com>

On Thu, 2011-11-03 at 14:46 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 03, 2011 at 10:33:23AM +0200, Sasha Levin wrote:
> > On Thu, 2011-11-03 at 12:28 +1030, Rusty Russell wrote: 
> > > On Wed, 02 Nov 2011 20:49:27 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > > This is a proposal for a new layout of the virtio-pci config space.
> > > > 
> > > > We will separate the current configuration into two: A virtio-pci common
> > > > configuration and a device specific configuration. This allows more flexibility
> > > > with adding features and makes usage easier, specifically in cases like the
> > > > ones in virtio-net where device specific configurations depend on device
> > > > specific features.
> > > 
> > > Thanks for this Sasha.  Several general comments:
> > > 
> > > 1) How to we distinguish the two layouts?  In theory, we need to do this
> > >    forever.  In practice we can deprecate the old layout in several
> > >    years' time.
> > 
> > Old layouts won't have the new virtio-pci cap structure in their PCI
> > config space.
> 
> What happens next time we want to change something?

Thats why there are virtio-pci cap values. This layout cap was defined
as VIRTIO_PCI_C_LAYOUT. If for example we want to provide a whole
different layout, we can define something similar to
VIRTIO_PCI_C_LAYOUT_V2 and use it in newer drivers (we can also add a
version field to the original layout ofcourse).

This also allows an easy way to provide backwards compatibility by
specifying many layout definitions and letting the driver choose the
latest layout he can. This would also allow to make larger changes
easier as it allows you to have several different layout configs in the
same code.

> 
> > > 2) I don't think we want to turn the device-specific config into a
> > >    linked list.  We haven't needed variable-length config (yet!), and
> > >    it's (slightly) more complex.  That's also the part of the spec which
> > >    is shared with non-PCI virtio implementations.
> > 
> > Variable length config wasn't used yet because space in the device
> > specific space was reserved for a feature even if that feature wasn't
> > used.
> 
> Not only that, also because it is messy to debug.  With fixed offsets
> you just print the address/data and you know what it's doing.
> 
> > For example, the MAC feature reserved 6 bytes in the config space for
> > the MAC even if VIRTIO_NET_F_MAC wasn't enabled. Here we can just avoid
> > having it pollute the config space until it's enabled.
> > 
> 
> This looks like overdesign to me. The only reason PCI spec
> used capability list is
> 1. to save space
> 2. standard mechanism to discover features
> You say explicitly space is not an issue, and you keep
> feature bits around.

Okay, I agree with not doing linked lists for device specific features.

I do think we need them for virtio-pci itself to handle features such as
MSI-X.

> > I don't think it'll have any impact on non-PCI implementations since the
> > "pointers" are simply offsets from the beginning of the config space,
> > and are not PCI specific in any way.
> 
> The APIs use a single offset cookie to pass configuration.
> If you want capabilities, that will have to be changed.
> 
> > > 3) If we're changing the queue layout, it's a chance to fix a
> > >    longstanding bug: let the guest notify the host of preferred
> > >    queue size and alignment.
> > 
> > Yup, we can do that.
> > 
> > -- 
> > 
> > Sasha.
> > 

-- 

Sasha.

^ permalink raw reply

* Re: [PATCH RFC] virtio-pci: flexible configuration layout
From: Michael S. Tsirkin @ 2011-11-03 13:30 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Krishna Kumar, kvm, Pawel Moll, Wang Sheng-Hui,
	Alexey Kardashevskiy, lkml - Kernel Mailing List, Jesse Barnes,
	virtualization, Christian Borntraeger, Amit Shah, Linus Torvalds
In-Reply-To: <1320318591.29407.17.camel@lappy>

> 2. Move device specific features into the device specific region.
> Currently the features field is a mix between virtio-pci and device
> specific features.

A single feature field with bits partitioned to transport
specific and device specific fields is a generic virtio
thing. So there's relatively little to be gained from
moving the device bits out.

> I did more similar things in the spec suggestion I've sent,

What lacked there is the motivation.

I am aware of several issues that need to be addressed,
as they block extending virtio going forward:
- There's no place to put more transport registers.
  For MSIX I hacked around that with moving device config when a feature
  bit is set, but in hindsight this setup is fragile
  and is panful to extend any further.
  Splitting out device and transport helps.
- PIO space is limited. We are wasting it on non data path
  configuration. There might also be a wish to have largish data
  in configuration space, like a 1024 byte ID field.
- PIO is faster than MMIO on x86 KVM so we need to keep
  using it for data path on that architecture
- Support architectures which have host/guest at different
  endian-ness.

I addressed the first 3 and added capability infrastructure
which we can use in the future to support the last one
(by nature of using little endian format for capabilities).

> could you
> look at the second part for more ideas please?

I saw the per-queue features - but I don't see how it will work really
or what is the issue they solve.
Any more ideas I missed?

> -- 
> 
> Sasha.

^ permalink raw reply

* Re: [PATCH RFC] virtio-pci: flexible configuration layout
From: Avi Kivity @ 2011-11-03 13:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar, kvm, Pawel Moll, Wang Sheng-Hui,
	Alexey Kardashevskiy, lkml - Kernel Mailing List, Jesse Barnes,
	virtualization, Christian Borntraeger, Sasha Levin, Amit Shah,
	Linus Torvalds
In-Reply-To: <20111103121139.GF18296@redhat.com>

On 11/03/2011 02:11 PM, Michael S. Tsirkin wrote:
> On Thu, Nov 03, 2011 at 12:37:04PM +0200, Avi Kivity wrote:
> > On 11/03/2011 01:31 AM, Michael S. Tsirkin wrote:
> > > Add a flexible mechanism to specify virtio configuration layout, using
> > > pci vendor-specific capability.  A separate capability is used for each
> > > of common, device specific and data-path accesses.
> > >
> > >
> > 
> > How about posting the spec change instead of patches?
>
> We need both of course but I typically start with patches.
> Easier to see whether it's even possible to implement the
> spec without overhauling all of the code.

Patches are okay for someone familiar with the code.  For others, they
require more effort.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: virtio-pci new configuration proposal
From: Michael S. Tsirkin @ 2011-11-03 13:48 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Anthony Liguori, linux-kernel, kvm, virtualization
In-Reply-To: <1320326341.29407.25.camel@lappy>

On Thu, Nov 03, 2011 at 03:19:01PM +0200, Sasha Levin wrote:
> On Thu, 2011-11-03 at 14:46 +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 03, 2011 at 10:33:23AM +0200, Sasha Levin wrote:
> > > On Thu, 2011-11-03 at 12:28 +1030, Rusty Russell wrote: 
> > > > On Wed, 02 Nov 2011 20:49:27 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > > > This is a proposal for a new layout of the virtio-pci config space.
> > > > > 
> > > > > We will separate the current configuration into two: A virtio-pci common
> > > > > configuration and a device specific configuration. This allows more flexibility
> > > > > with adding features and makes usage easier, specifically in cases like the
> > > > > ones in virtio-net where device specific configurations depend on device
> > > > > specific features.
> > > > 
> > > > Thanks for this Sasha.  Several general comments:
> > > > 
> > > > 1) How to we distinguish the two layouts?  In theory, we need to do this
> > > >    forever.  In practice we can deprecate the old layout in several
> > > >    years' time.
> > > 
> > > Old layouts won't have the new virtio-pci cap structure in their PCI
> > > config space.
> > 
> > What happens next time we want to change something?
> 
> Thats why there are virtio-pci cap values. This layout cap was defined
> as VIRTIO_PCI_C_LAYOUT. If for example we want to provide a whole
> different layout, we can define something similar to
> VIRTIO_PCI_C_LAYOUT_V2 and use it in newer drivers (we can also add a
> version field to the original layout ofcourse).
> 
> This also allows an easy way to provide backwards compatibility by
> specifying many layout definitions and letting the driver choose the
> latest layout he can. This would also allow to make larger changes
> easier as it allows you to have several different layout configs in the
> same code.

This plan does not make me happy. Versioning is much messier than
features to support, and much harder for downstreams to
cherry-pick.

> > 
> > > > 2) I don't think we want to turn the device-specific config into a
> > > >    linked list.  We haven't needed variable-length config (yet!), and
> > > >    it's (slightly) more complex.  That's also the part of the spec which
> > > >    is shared with non-PCI virtio implementations.
> > > 
> > > Variable length config wasn't used yet because space in the device
> > > specific space was reserved for a feature even if that feature wasn't
> > > used.
> > 
> > Not only that, also because it is messy to debug.  With fixed offsets
> > you just print the address/data and you know what it's doing.
> > 
> > > For example, the MAC feature reserved 6 bytes in the config space for
> > > the MAC even if VIRTIO_NET_F_MAC wasn't enabled. Here we can just avoid
> > > having it pollute the config space until it's enabled.
> > > 
> > 
> > This looks like overdesign to me. The only reason PCI spec
> > used capability list is
> > 1. to save space
> > 2. standard mechanism to discover features
> > You say explicitly space is not an issue, and you keep
> > feature bits around.
> 
> Okay, I agree with not doing linked lists for device specific features.
> 
> I do think we need them for virtio-pci itself to handle features such as
> MSI-X.

It's definitely easier for virtio-pci than for devices.
But let's address the motivation point.
Do you expect a need to have a huge structure there, like
megabytes in size, so space will be an issue again?

> > > I don't think it'll have any impact on non-PCI implementations since the
> > > "pointers" are simply offsets from the beginning of the config space,
> > > and are not PCI specific in any way.
> > 
> > The APIs use a single offset cookie to pass configuration.
> > If you want capabilities, that will have to be changed.
> > 
> > > > 3) If we're changing the queue layout, it's a chance to fix a
> > > >    longstanding bug: let the guest notify the host of preferred
> > > >    queue size and alignment.
> > > 
> > > Yup, we can do that.
> > > 
> > > -- 
> > > 
> > > Sasha.
> > > 
> 
> -- 
> 
> Sasha.

^ permalink raw reply

* Re: [PATCH RFC] virtio-pci: flexible configuration layout
From: Michael S. Tsirkin @ 2011-11-03 13:53 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Krishna Kumar, kvm, Pawel Moll, Wang Sheng-Hui,
	Alexey Kardashevskiy, lkml - Kernel Mailing List, Jesse Barnes,
	virtualization, Christian Borntraeger, Sasha Levin, Amit Shah,
	Linus Torvalds
In-Reply-To: <4EB29926.9030509@redhat.com>

On Thu, Nov 03, 2011 at 03:37:42PM +0200, Avi Kivity wrote:
> On 11/03/2011 02:11 PM, Michael S. Tsirkin wrote:
> > On Thu, Nov 03, 2011 at 12:37:04PM +0200, Avi Kivity wrote:
> > > On 11/03/2011 01:31 AM, Michael S. Tsirkin wrote:
> > > > Add a flexible mechanism to specify virtio configuration layout, using
> > > > pci vendor-specific capability.  A separate capability is used for each
> > > > of common, device specific and data-path accesses.
> > > >
> > > >
> > > 
> > > How about posting the spec change instead of patches?
> >
> > We need both of course but I typically start with patches.
> > Easier to see whether it's even possible to implement the
> > spec without overhauling all of the code.
> 
> Patches are okay for someone familiar with the code.  For others, they
> require more effort.

Yes. It's just that I needed to convince myself that the approach is good,
and that meant writing the supporting code. Since I had the code
I didn't see a reason not to send it :)

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH] VirtioConsole support.
From: Konrad Rzeszutek Wilk @ 2011-11-03 14:15 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: xen-devel, Stephen Boyd, Greg Kroah-Hartman, Miche Baker-Harvey,
	linux-kernel, Rusty Russell, virtualization, Amit Shah,
	Hendrik Brueckner, Anton Blanchard, Benjamin Herrenschmidt,
	Joe Perches
In-Reply-To: <4EB28B37.8030301@de.ibm.com>

On Thu, Nov 03, 2011 at 01:38:15PM +0100, Christian Borntraeger wrote:
> On 02/11/11 23:02, Rusty Russell wrote:
> > Ben, Christian, can you figure out what's going on?  Is it time for a
> > complete rewrite?
> 
> You are talking about hvc console? Yes it has so many users that it is now
> something like a console/tty layer extension, but it was never designed to be
> one. 
> 
> I think the hardest part would be to get all users together (for testing and
> requirements) and then to find someone that is actually doing that rework. Dont

I am volunteering myself to test the Xen user of it. Perhaps even add some patches,
albeit my knowledge of the hvc console is that "it works" (well, with this minor
speed bump).

> know if the pain level is already that high?
> 
> Christian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [Qemu-devel] [PATCH] virtio: Add PCI memory BAR in addition to PIO BAR
From: Michael S. Tsirkin @ 2011-11-03 14:31 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: kvm, qemu-devel, virtualization, Pekka Enberg, Avi Kivity,
	David Gibson
In-Reply-To: <4EB29BEB.4070603@codemonkey.ws>

On Thu, Nov 03, 2011 at 08:49:31AM -0500, Anthony Liguori wrote:
> On 11/03/2011 08:45 AM, Avi Kivity wrote:
> >On 11/03/2011 03:38 PM, Anthony Liguori wrote:
> >>>
> >>>>We could use a better agreement on the processor for making virtio
> >>>>changes. Should it go (1) virtio spec (2) kernel (3) qemu, or should
> >>>>it go (2), (1), (3)?
> >>>
> >>>1. Informal discussion
> >>
> >>
> >>Where?  Is this lkml?  There were a number of virtio changes recently
> >>that never involved qemu-devel.
> >
> >Theoretically, virtualization@lists.linux-foundation.org, if it still
> >exists.  Maybe we need a virtio list. qemu-devel@, kvm@, lkml could be
> >copied.
> 
> Perhaps it's time to create a virtio@vger?  Just have a simple
> process that all spec changes to there the appropriate kernel, QEMU,
> virtio-win, or NKT maintainers can require any virtio change to also
> have a committed spec change first.

Historically virtualization@lists.linux-foundation.org was used,
that is still low enough volume. It was down but seems to be
up now. It's easy enough for me to subscribe to yet
another list but my concern is that if we move we might loose some
people that don't notice the change.

Anything wrong with just using the old list?
It's on the MAINTAINERS file, I'm not sure why it was bypassed in this case.
Maybe linux-foundation.org was still down when the patch was posted?

> >The point is that we can't drive virtio from either qemu or the kernel
> >any more.  The spec represents the "virtual hardware manufacturer",
> >which qemu and linux/vhost (and others) emulate, and which linux (and
> >others) write drivers for.
> 
> Yup.  We need to be more rigorous about using the spec for that as
> we've not done a great job historically here.
> 
> Regards,
> 
> Anthony Liguori
> 
> >
> >>
> >>>2. Proposed spec patch, kernel change, qemu change
> >>>3. Buy-ins from spec maintainer, kernel driver maintainer, qemu device
> >>>maintainer (only regarding the ABI, not the code)
> >>
> >>I don't think this is how it's working today.  I would be happy with a
> >>flow like this.
> >
> >If Michael and Rusty agree, we can adopt it immediately.
> >

If I understand the proposal, what is suggested is that
all of spec, kvm and virtio patches are posted on list and
acked before merging any one of them?

Sure, this makes sense.

-- 
MST

^ permalink raw reply

* Re: [Qemu-devel] [PATCH] virtio: Add PCI memory BAR in addition to PIO BAR
From: Anthony Liguori @ 2011-11-03 14:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, qemu-devel, virtualization, Pekka Enberg, Avi Kivity,
	David Gibson
In-Reply-To: <20111103143129.GA20860@redhat.com>

On 11/03/2011 09:31 AM, Michael S. Tsirkin wrote:
> On Thu, Nov 03, 2011 at 08:49:31AM -0500, Anthony Liguori wrote:
>> On 11/03/2011 08:45 AM, Avi Kivity wrote:
>>> On 11/03/2011 03:38 PM, Anthony Liguori wrote:
>>>>>
>>>>>> We could use a better agreement on the processor for making virtio
>>>>>> changes. Should it go (1) virtio spec (2) kernel (3) qemu, or should
>>>>>> it go (2), (1), (3)?
>>>>>
>>>>> 1. Informal discussion
>>>>
>>>>
>>>> Where?  Is this lkml?  There were a number of virtio changes recently
>>>> that never involved qemu-devel.
>>>
>>> Theoretically, virtualization@lists.linux-foundation.org, if it still
>>> exists.  Maybe we need a virtio list. qemu-devel@, kvm@, lkml could be
>>> copied.
>>
>> Perhaps it's time to create a virtio@vger?  Just have a simple
>> process that all spec changes to there the appropriate kernel, QEMU,
>> virtio-win, or NKT maintainers can require any virtio change to also
>> have a committed spec change first.
>
> Historically virtualization@lists.linux-foundation.org was used,
> that is still low enough volume. It was down but seems to be
> up now. It's easy enough for me to subscribe to yet
> another list but my concern is that if we move we might loose some
> people that don't notice the change.
>
> Anything wrong with just using the old list?

No, but it's so old, I have no idea who admin's it these days.

> It's on the MAINTAINERS file, I'm not sure why it was bypassed in this case.
> Maybe linux-foundation.org was still down when the patch was posted?

The list has been down for a while.

>>>>> 2. Proposed spec patch, kernel change, qemu change
>>>>> 3. Buy-ins from spec maintainer, kernel driver maintainer, qemu device
>>>>> maintainer (only regarding the ABI, not the code)
>>>>
>>>> I don't think this is how it's working today.  I would be happy with a
>>>> flow like this.
>>>
>>> If Michael and Rusty agree, we can adopt it immediately.
>>>
>
> If I understand the proposal, what is suggested is that
> all of spec, kvm and virtio patches are posted on list and
> acked before merging any one of them?
>
> Sure, this makes sense.

Well, what's needed before the spec is changed is an interesting question, but I 
think the main thing is, don't commit any virtio ABI changes to vhost, QEMU, 
NKT, or the kernel until the spec for the change has been committed.

It would be nice to have a working implementation before committing a spec 
change.  Even nicer would be to have Acked-by's a maintainer in each area affected.

Regards,

Anthony Liguori

^ permalink raw reply

* Re: [Qemu-devel] [PATCH] virtio: Add PCI memory BAR in addition to PIO BAR
From: Avi Kivity @ 2011-11-03 14:53 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: kvm, Michael S. Tsirkin, qemu-devel, virtualization, Pekka Enberg,
	David Gibson
In-Reply-To: <4EB2A747.9040909@codemonkey.ws>

On 11/03/2011 04:37 PM, Anthony Liguori wrote:
>
>>>>>> 2. Proposed spec patch, kernel change, qemu change
>>>>>> 3. Buy-ins from spec maintainer, kernel driver maintainer, qemu
>>>>>> device
>>>>>> maintainer (only regarding the ABI, not the code)
>>>>>
>>>>> I don't think this is how it's working today.  I would be happy
>>>>> with a
>>>>> flow like this.
>>>>
>>>> If Michael and Rusty agree, we can adopt it immediately.
>>>>
>>
>> If I understand the proposal, what is suggested is that
>> all of spec, kvm and virtio patches are posted on list and
>> acked before merging any one of them?
>>
>> Sure, this makes sense.
>
> Well, what's needed before the spec is changed is an interesting
> question, but I think the main thing is, don't commit any virtio ABI
> changes to vhost, QEMU, NKT, or the kernel until the spec for the
> change has been committed.
>
> It would be nice to have a working implementation before committing a
> spec change.  Even nicer would be to have Acked-by's a maintainer in
> each area affected.
>

Those are steps 2 and 3 of the proposed process.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH RFC] virtio-pci: flexible configuration layout
From: Jesse Barnes @ 2011-11-03 14:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar, kvm, Pawel Moll, Wang Sheng-Hui,
	Alexey Kardashevskiy, lkml - Kernel Mailing List, virtualization,
	Christian Borntraeger, Sasha Levin, Amit Shah, Linus Torvalds
In-Reply-To: <20111102233110.GA20289@redhat.com>


[-- Attachment #1.1: Type: text/plain, Size: 1429 bytes --]

On Thu, 3 Nov 2011 01:31:11 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> Add a flexible mechanism to specify virtio configuration layout, using
> pci vendor-specific capability.  A separate capability is used for each
> of common, device specific and data-path accesses.
> 
> Warning: compiled only.
> This patch also needs to be split up, pci_iomap changes
> also need arch updates for non-x86.
> 
> We also will need to update the spec.
> 
> See the first chunk for layout documentation.
> 
> Posting here for early feedback.
> 
> In particular:
> 
> Do we need to require offset to be aligned?
> Does iowrite16 work with unaligned accesses on all architectures?
> Does using ioread/write as we do add overhead as compared to
> plain PIO accesses?
> 
> Jesse - are you OK with the pci_iomap_range API proposed here
> (see last chunks)?
> I noticed lots of architectures duplicate the implementation
> of pci_iomap - makes sense to clean that up?

Probably makes sense to have a weak version at least for arches that
just do the same thing.  But some arches really do need separate
versions because they wrap read/write as well...

Given the arch constraints, it's probably not possible to get rid of
the min/max args in favor of a simple offset/len pair like we have for
ioremap.  But if we could that would be even better.

-- 
Jesse Barnes, Intel Open Source Technology Center

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH (repost) RFC 2/2] virtio-pci: recall and return msix notifications on ISR read
From: Jan Kiszka @ 2011-11-03 16:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Anthony Liguori, qemu-devel@nongnu.org,
	virtualization@lists.linux-foundation.org, Blue Swirl,
	Stefan Weil, Avi Kivity, Richard Henderson
In-Reply-To: <20111103120712.GE18296@redhat.com>

On 2011-11-03 13:07, Michael S. Tsirkin wrote:
> On Thu, Nov 03, 2011 at 12:42:55PM +0100, Jan Kiszka wrote:
>> On 2011-11-02 21:11, Michael S. Tsirkin wrote:
>>> MSIX spec requires that device can be operated with
>>> all vectors masked, by polling pending bits.
>>> Add APIs to recall an msix notification, and make polling
>>> mode possible in virtio-pci by clearing the
>>> pending bits and setting ISR appropriately on ISR read.
>>>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>>  hw/msix.c       |   26 ++++++++++++++++++++++++++
>>>  hw/msix.h       |    3 +++
>>>  hw/virtio-pci.c |   11 ++++++++++-
>>>  3 files changed, 39 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/hw/msix.c b/hw/msix.c
>>> index 63b41b9..fe967c9 100644
>>> --- a/hw/msix.c
>>> +++ b/hw/msix.c
>>> @@ -349,6 +349,32 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>>>      stl_le_phys(address, data);
>>>  }
>>>  
>>> +/* Recall outstanding MSI-X notifications for a vector, if possible.
>>> + * Return true if any were outstanding. */
>>> +bool msix_recall(PCIDevice *dev, unsigned vector)
>>> +{
>>> +    bool ret;
>>> +    if (vector >= dev->msix_entries_nr)
>>> +        return false;
>>> +    ret = msix_is_pending(dev, vector);
>>> +    msix_clr_pending(dev, vector);
>>> +    return ret;
>>> +}
>>
>> I would prefer to have a single API instead to clarify the tight relation:
>>
>> bool msi[x]_set_notify(PCIDevice *dev, unsigned vector, unsigned level)
>>
>> Would return true for level=1 if the message was either sent directly or
>> queued (we could deliver false if it was already queued, but I see no
>> use case for this yet).
> 
> It's a matter of taste: some people like functions with flags, some
> prefer separate functions.  I really prefer two functions.
> 
> But I agree it woulkd be better to have a name that makes it clear that
> what we recall is a notification.
> msix_notify_queue/msix_notify_dequeue?

OK, that doesn't sound bad.

> 
> 
>> Also, I don't see the generic value of some msix_recall_all. I think
>> it's better handled in a single loop over all vectors at caller site,
>> clearing the individual interrupt reason bits on a per-vector basis
>> there. msix_recall_all is only useful in the virtio case where you have
>> one vector of reason A and all the rest of B. Once you had multiple
>> reason C vectors as well, it would not help anymore.
>>
>> Jan
> 
> The reason I wanted to have it is to reduce the overhead this adds:
> since PBA is packed, it's much faster to check whether any bits are set
> than by going through them all, one by one. Typically all PBA
> bits are clear ...
> 
> I agree it might not help non-virtio devices, but to me it looks like a
> harmless little helper - what's the issue with it?

*If* there is a noticeable performance gain, I'm fine with
msix_notify_dequeue_all (about how may vectors are we talking in the
vitio case?). But the code would be more regular the other way around.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply

* Re: [PATCH] virtio-balloon: Trivial cleanups
From: Rusty Russell @ 2011-11-04  9:36 UTC (permalink / raw)
  To: Michael S. Tsirkin, Sasha Levin; +Cc: linux-kernel, virtualization
In-Reply-To: <20111103110954.GC18296@redhat.com>

On Thu, 3 Nov 2011 13:09:55 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Nov 03, 2011 at 10:20:04AM +0200, Sasha Levin wrote:
> > Trivial changes to remove forgotten junk, format comments, and correct names.
> > 
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: virtualization@lists.linux-foundation.org
> > Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> 3.3 material.
> Rusty, you want to take it? Want me to?

Hmm, this is trivial patch monkey territory, but I've taken it.

Thanks,
Rusty.

^ permalink raw reply

* Re: virtio-pci new configuration proposal
From: Rusty Russell @ 2011-11-04  9:44 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Anthony Liguori, virtualization, linux-kernel, kvm,
	Michael S. Tsirkin
In-Reply-To: <1320309203.29407.1.camel@lappy>

On Thu, 03 Nov 2011 10:33:23 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> On Thu, 2011-11-03 at 12:28 +1030, Rusty Russell wrote: 
> > 2) I don't think we want to turn the device-specific config into a
> >    linked list.  We haven't needed variable-length config (yet!), and
> >    it's (slightly) more complex.  That's also the part of the spec which
> >    is shared with non-PCI virtio implementations.
> 
> Variable length config wasn't used yet because space in the device
> specific space was reserved for a feature even if that feature wasn't
> used.
> 
> For example, the MAC feature reserved 6 bytes in the config space for
> the MAC even if VIRTIO_NET_F_MAC wasn't enabled. Here we can just avoid
> having it pollute the config space until it's enabled.

Exactly.  But we haven't had a problem so far; but we don't put
arbitrarily large fields in there.

> I don't think it'll have any impact on non-PCI implementations since the
> "pointers" are simply offsets from the beginning of the config space,
> and are not PCI specific in any way.

But the drivers currently just use offsetof() to access it, eg:

	if (virtio_config_val_len(vdev, VIRTIO_NET_F_MAC,
				  offsetof(struct virtio_net_config, mac),
				  dev->dev_addr, dev->addr_len) < 0)

That would have to change, and that means a change for drivers and for
the non-PCI implementations.

Hence I think this is a step too far.

> > 3) If we're changing the queue layout, it's a chance to fix a
> >    longstanding bug: let the guest notify the host of preferred
> >    queue size and alignment.
> 
> Yup, we can do that.

The seabios guys will definitely thank you!

Cheers,
Rusty.

^ permalink raw reply

* Re: [PATCH 3 of 5] virtio: support unlocked queue kick
From: Stefan Hajnoczi @ 2011-11-04 10:09 UTC (permalink / raw)
  To: Rusty Russell
  Cc: MichaelS.Tsirkist, Christoph Hellwig, linux-kernel, kvm,
	virtualization
In-Reply-To: <20111103075211.GD6993@infradead.org>

On Thu, Nov 3, 2011 at 7:52 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Nov 03, 2011 at 06:12:51PM +1030, Rusty Russell wrote:
>> Based on patch by Christoph for virtio_blk speedup:
>
> Please credit it to Stefan - he also sent a pointer to his original
> version in reply to the previous thread.

Thank you Christoph.  Here the pointer to the original mailing list thread:

"Here is the patch:
https://github.com/stefanha/linux/commit/a6d06644e3a58e57a774e77d7dc34c4a5a2e7496

Or as an email if you want to track it down in your inbox:
http://www.spinics.net/lists/linux-virtualization/msg14616.html"

Stefan

^ permalink raw reply

* Re: [PATCH 3 of 5] virtio: support unlocked queue kick
From: Rusty Russell @ 2011-11-04 10:36 UTC (permalink / raw)
  Cc: Christoph Hellwig, virtualization, linux-kernel, kvm, mst
In-Reply-To: <20111103075211.GD6993@infradead.org>

On Thu, 3 Nov 2011 03:52:11 -0400, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Nov 03, 2011 at 06:12:51PM +1030, Rusty Russell wrote:
> > Based on patch by Christoph for virtio_blk speedup:
> 
> Please credit it to Stefan - he also sent a pointer to his original
> version in reply to the previous thread.
> 
> Also shouldn't virtqueue_kick have kerneldoc comments?

Yep, and I've credited it properly.

> I also notices that you documented the functions bother here and in
> the first patch in the headers.  At least historically the kerneldoc
> tools didn't parse comments at declarations, but only at the function
> defintions.  Did you check these actually get picked up?

Gah, I'd forgotten that the kernel tendency is to put the interface
documentation next to the implementation.

Personally, I think extracting it is insane, but I've moved them.

Thanks,
Rusty.

^ permalink raw reply

* Re: virtio-pci new configuration proposal
From: Michael S. Tsirkin @ 2011-11-04 11:40 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Anthony Liguori, virtualization, Sasha Levin, kvm, linux-kernel
In-Reply-To: <87lirwrzlg.fsf@rustcorp.com.au>

On Fri, Nov 04, 2011 at 08:14:43PM +1030, Rusty Russell wrote:
> > > 3) If we're changing the queue layout, it's a chance to fix a
> > >    longstanding bug: let the guest notify the host of preferred
> > >    queue size and alignment.
> > 
> > Yup, we can do that.

We don't need to change all of layout for that - just add another field
in the common config structure to supply the alignment.

> The seabios guys will definitely thank you!

-- 
MST

^ permalink raw reply

* Re: virtio-pci new configuration proposal
From: Sasha Levin @ 2011-11-04 12:32 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Anthony Liguori, linux-kernel, kvm, virtualization
In-Reply-To: <20111104114033.GA21308@redhat.com>

On Fri, 2011-11-04 at 13:40 +0200, Michael S. Tsirkin wrote:
> On Fri, Nov 04, 2011 at 08:14:43PM +1030, Rusty Russell wrote:
> > > > 3) If we're changing the queue layout, it's a chance to fix a
> > > >    longstanding bug: let the guest notify the host of preferred
> > > >    queue size and alignment.
> > > 
> > > Yup, we can do that.
> 
> We don't need to change all of layout for that - just add another field
> in the common config structure to supply the alignment.

How would you do it without changing the layout? Add another optional
field at the end which will shift offsets based on whether the host and
guest support this new feature or not?

This leads to 3 different things which now shift config offsets around.

As you said, the PCI cap list was introduced both to save space (which
is not the motivation here), and because it's a very efficient and easy
way to manage optional features without requiring tricks which move
offsets around like we do now.

-- 

Sasha.

^ permalink raw reply

* Re: virtio-pci new configuration proposal
From: Michael S. Tsirkin @ 2011-11-04 13:51 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Anthony Liguori, linux-kernel, kvm, virtualization
In-Reply-To: <1320409939.3334.6.camel@lappy>

On Fri, Nov 04, 2011 at 02:32:19PM +0200, Sasha Levin wrote:
> On Fri, 2011-11-04 at 13:40 +0200, Michael S. Tsirkin wrote:
> > On Fri, Nov 04, 2011 at 08:14:43PM +1030, Rusty Russell wrote:
> > > > > 3) If we're changing the queue layout, it's a chance to fix a
> > > > >    longstanding bug: let the guest notify the host of preferred
> > > > >    queue size and alignment.
> > > > 
> > > > Yup, we can do that.
> > 
> > We don't need to change all of layout for that - just add another field
> > in the common config structure to supply the alignment.
> 
> How would you do it without changing the layout? Add another optional
> field at the end which will shift offsets based on whether the host and
> guest support this new feature or not?
> 
> This leads to 3 different things which now shift config offsets around.

No. Just put the field at offset 24 from the offset specified
by VIRTIO_PCI_CAP_COMMON_CFG.

> As you said, the PCI cap list was introduced both to save space (which
> is not the motivation here), and because it's a very efficient

It's actually pretty inefficient - there's an overhead of 3 bytes for
each vendor specific option.

> and easy way to manage optional features without requiring tricks
> which move offsets around like we do now.

Tricks with offsets only appeared because we had datapath, device
specific and common config in the same place.
feature list isn't needed to fix that.

> -- 
> 
> Sasha.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox