Linux virtualization list
 help / color / mirror / Atom feed
* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
From: Michael S. Tsirkin @ 2011-11-23  8:30 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization
In-Reply-To: <87ty5v1vqy.fsf@rustcorp.com.au>

On Wed, Nov 23, 2011 at 11:49:01AM +1030, Rusty Russell wrote:
> On Tue, 22 Nov 2011 08:29:08 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Tue, Nov 22, 2011 at 11:03:04AM +1030, Rusty Russell wrote:
> > > -	/* If you haven't kicked in this long, you're probably doing something
> > > -	 * wrong. */
> > > -	WARN_ON(vq->num_added > vq->vring.num);
> > > +	/* This is very unlikely, but theoretically possible.  Kick
> > > +	 * just in case. */
> > > +	if (unlikely(vq->num_added == 65535))
> > 
> > This is 0xffff but why use the decimal notation?
> 
> Interesting.  Why use hex?  Feels more like binary?

Just easier to see it's the largest 16 bit number.

> But I've changed it to "(1 << 16) - 1" to be clear.

That's even better.

> > > +		virtqueue_kick(_vq);
> > >  
> > >  	pr_debug("Added buffer head %i to %p\n", head, vq);
> > >  	END_USE(vq);
> > 
> > We also still need to reset vq->num_added, right?
> 
> virtqueue_kick does that for us.
> 
> Cheers,
> Rusty.

Right.

^ permalink raw reply

* Re: [PATCHv3 RFC] virtio-pci: flexible configuration layout
From: Rusty Russell @ 2011-11-23  2:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar, kvm, Pawel Moll, Wang Sheng-Hui,
	Alexey Kardashevskiy, lkml - Kernel Mailing List, virtualization,
	Christian Borntraeger, Sasha Levin, Amit Shah
In-Reply-To: <20111122183621.GA5235@redhat.com>

On Tue, 22 Nov 2011 20:36:22 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> Here's an updated vesion.
> I'm alternating between updating the spec and the driver,
> spec update to follow.

Don't touch the spec yet, we have a long way to go :(

I want the ability for driver to set the ring size, and the device to
set the alignment.  That's a bigger change than you have here.  I
imagine it almost rips the driver into two completely different drivers.

This is the kind of thing I had in mind, for the header.  Want me to
code up the rest?

(I've avoided adding the constants for the new layout: a struct is more
compact and more descriptive).

Cheers,
Rusty.

diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
--- a/include/linux/virtio_pci.h
+++ b/include/linux/virtio_pci.h
@@ -42,56 +42,74 @@
 #include <linux/virtio_config.h>
 
 /* A 32-bit r/o bitmask of the features supported by the host */
-#define VIRTIO_PCI_HOST_FEATURES	0
+#define VIRTIO_PCI_LEGACY_HOST_FEATURES		0
 
 /* A 32-bit r/w bitmask of features activated by the guest */
-#define VIRTIO_PCI_GUEST_FEATURES	4
+#define VIRTIO_PCI_LEGACY_GUEST_FEATURES	4
 
 /* A 32-bit r/w PFN for the currently selected queue */
-#define VIRTIO_PCI_QUEUE_PFN		8
+#define VIRTIO_PCI_LEGACY_QUEUE_PFN		8
 
 /* A 16-bit r/o queue size for the currently selected queue */
-#define VIRTIO_PCI_QUEUE_NUM		12
+#define VIRTIO_PCI_LEGACY_QUEUE_NUM		12
 
 /* A 16-bit r/w queue selector */
-#define VIRTIO_PCI_QUEUE_SEL		14
+#define VIRTIO_PCI_LEGACY_QUEUE_SEL		14
 
 /* A 16-bit r/w queue notifier */
-#define VIRTIO_PCI_QUEUE_NOTIFY		16
+#define VIRTIO_PCI_LEGACY_QUEUE_NOTIFY		16
 
 /* An 8-bit device status register.  */
-#define VIRTIO_PCI_STATUS		18
+#define VIRTIO_PCI_LEGACY_STATUS		18
 
 /* An 8-bit r/o interrupt status register.  Reading the value will return the
  * current contents of the ISR and will also clear it.  This is effectively
  * a read-and-acknowledge. */
-#define VIRTIO_PCI_ISR			19
-
-/* The bit of the ISR which indicates a device configuration change. */
-#define VIRTIO_PCI_ISR_CONFIG		0x2
+#define VIRTIO_PCI_LEGACY_ISR			19
 
 /* MSI-X registers: only enabled if MSI-X is enabled. */
 /* A 16-bit vector for configuration changes. */
-#define VIRTIO_MSI_CONFIG_VECTOR        20
+#define VIRTIO_MSI_LEGACY_CONFIG_VECTOR        20
 /* A 16-bit vector for selected queue notifications. */
-#define VIRTIO_MSI_QUEUE_VECTOR         22
-/* Vector value used to disable MSI for queue */
-#define VIRTIO_MSI_NO_VECTOR            0xffff
+#define VIRTIO_MSI_LEGACY_QUEUE_VECTOR         22
 
 /* The remaining space is defined by each driver as the per-driver
  * configuration space */
-#define VIRTIO_PCI_CONFIG(dev)		((dev)->msix_enabled ? 24 : 20)
+#define VIRTIO_PCI_LEGACY_CONFIG(dev)		((dev)->msix_enabled ? 24 : 20)
+
+/* How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size. */
+#define VIRTIO_PCI_LEGACY_QUEUE_ADDR_SHIFT	12
+
+/* The alignment to use between consumer and producer parts of vring.
+ * x86 pagesize again. */
+#define VIRTIO_PCI_LEGACY_VRING_ALIGN		4096
+
+#ifndef __KERNEL__
+/* Don't break compile of old userspace code.  These will go away. */
+#define VIRTIO_PCI_HOST_FEATURES VIRTIO_PCI_LEGACY_HOST_FEATURES
+#define VIRTIO_PCI_GUEST_FEATURES VIRTIO_PCI_LEGACY_GUEST_FEATURES
+#define VIRTIO_PCI_LEGACY_QUEUE_PFN VIRTIO_PCI_QUEUE_PFN
+#define VIRTIO_PCI_LEGACY_QUEUE_NUM VIRTIO_PCI_QUEUE_NUM
+#define VIRTIO_PCI_LEGACY_QUEUE_SEL VIRTIO_PCI_QUEUE_SEL
+#define VIRTIO_PCI_LEGACY_QUEUE_NOTIFY VIRTIO_PCI_QUEUE_NOTIFY
+#define VIRTIO_PCI_LEGACY_STATUS VIRTIO_PCI_STATUS
+#define VIRTIO_PCI_LEGACY_ISR VIRTIO_PCI_ISR
+#define VIRTIO_MSI_LEGACY_CONFIG_VECTOR VIRTIO_MSI_CONFIG_VECTOR
+#define VIRTIO_MSI_LEGACY_QUEUE_VECTOR VIRTIO_MSI_QUEUE_VECTOR
+#define VIRTIO_PCI_LEGACY_CONFIG(dev) VIRTIO_PCI_CONFIG(dev)
+#define VIRTIO_PCI_LEGACY_QUEUE_ADDR_SHIFT VIRTIO_PCI_QUEUE_ADDR_SHIFT
+#define VIRTIO_PCI_LEGACY_VRING_ALIGN VIRTIO_PCI_VRING_ALIGN
+#endif /* ...!KERNEL */
 
 /* Virtio ABI version, this must match exactly */
 #define VIRTIO_PCI_ABI_VERSION		0
 
-/* How many bits to shift physical queue address written to QUEUE_PFN.
- * 12 is historical, and due to x86 page size. */
-#define VIRTIO_PCI_QUEUE_ADDR_SHIFT	12
+/* Vector value used to disable MSI for queue */
+#define VIRTIO_MSI_NO_VECTOR            0xffff
 
-/* The alignment to use between consumer and producer parts of vring.
- * x86 pagesize again. */
-#define VIRTIO_PCI_VRING_ALIGN		4096
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG		0x2
 
 /*
  * Layout for Virtio PCI vendor specific capability (little-endian):
@@ -133,4 +151,20 @@
 #define VIRTIO_PCI_CAP_CFG_OFF_MASK	0xffffffff
 #define VIRTIO_PCI_CAP_CFG_OFF_SHIFT	0
 
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	__u64 device_features;	/* read-only */
+	__u64 guest_features;	/* read-write */
+	__u64 queue_address;	/* read-write */
+	__u16 msix_config;	/* read-write */
+	__u8 device_status;	/* read-write */
+	__u8 unused;
+
+	/* About a specific virtqueue. */
+	__u16 queue_select;	/* read-write */
+	__u16 queue_align;	/* read-write, power of 2. */
+	__u16 queue_size;	/* read-write, power of 2. */
+	__u16 queue_msix_vector;/* read-write */
+};
 #endif

^ permalink raw reply

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
From: Rusty Russell @ 2011-11-23  1:19 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization
In-Reply-To: <20111122062907.GB11012@redhat.com>

On Tue, 22 Nov 2011 08:29:08 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Nov 22, 2011 at 11:03:04AM +1030, Rusty Russell wrote:
> > -	/* If you haven't kicked in this long, you're probably doing something
> > -	 * wrong. */
> > -	WARN_ON(vq->num_added > vq->vring.num);
> > +	/* This is very unlikely, but theoretically possible.  Kick
> > +	 * just in case. */
> > +	if (unlikely(vq->num_added == 65535))
> 
> This is 0xffff but why use the decimal notation?

Interesting.  Why use hex?  Feels more like binary?

But I've changed it to "(1 << 16) - 1" to be clear.

> > +		virtqueue_kick(_vq);
> >  
> >  	pr_debug("Added buffer head %i to %p\n", head, vq);
> >  	END_USE(vq);
> 
> We also still need to reset vq->num_added, right?

virtqueue_kick does that for us.

Cheers,
Rusty.

^ permalink raw reply

* [PATCHv3 RFC] virtio-pci: flexible configuration layout
From: Michael S. Tsirkin @ 2011-11-22 18:36 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Krishna Kumar, kvm, Pawel Moll, Wang Sheng-Hui,
	Alexey Kardashevskiy, lkml - Kernel Mailing List, virtualization,
	Christian Borntraeger, Sasha Levin, Amit Shah

Here's an updated vesion.
I'm alternating between updating the spec and the driver,
spec update to follow.

Compiled only.  Posting here for early feedback, and to allow Sasha to
proceed with his "kvm tool" work.

Changes from v2:
	address comments by Rusty
	bugfixes by Sasha
Changes from v1:
	Updated to match v3 of the spec, see:

todo:
	split core changes out

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/virtio/Kconfig      |   22 +++++
 drivers/virtio/virtio_pci.c |  203 ++++++++++++++++++++++++++++++++++++++++---
 include/asm-generic/io.h    |    4 +
 include/asm-generic/iomap.h |   11 +++
 include/linux/pci_regs.h    |    4 +
 include/linux/virtio_pci.h  |   41 +++++++++
 lib/iomap.c                 |   41 ++++++++-
 7 files changed, 307 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 816ed08..465245e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -25,6 +25,28 @@ config VIRTIO_PCI
 
 	  If unsure, say M.
 
+config VIRTIO_PCI_LEGACY
+	bool "Compatibility with Legacy PIO"
+	default y
+	depends on VIRTIO_PCI
+	---help---
+	  Look out into your driveway.  Do you have a flying car?  If
+	  so, you can happily disable this option and virtio will not
+	  break.  Otherwise, leave it set.  Unless you're testing what
+	  life will be like in The Future.
+
+	  In other words:
+
+	  Support compatibility with legacy PIO mapping in hypervisors.
+	  As of Nov 2011, this is required by all hypervisors without
+	  exception, so for now, disabling this option is only useful for
+	  testing.  In future hypervisors, it will be possible to disable
+	  this option and get a slightly smaller kernel.
+	  You know that your hypervisor is new enough if you disable this
+	  option and the device initialization passes.
+
+	  If unsure, say Y.
+
 config VIRTIO_BALLOON
 	tristate "Virtio balloon driver (EXPERIMENTAL)"
 	select VIRTIO
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 3d1bf41..681347b 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -37,8 +37,14 @@ struct virtio_pci_device
 	struct virtio_device vdev;
 	struct pci_dev *pci_dev;
 
-	/* the IO mapping for the PCI config space */
+	/* the IO address for the common PCI config space */
 	void __iomem *ioaddr;
+	/* the IO address for device specific config */
+	void __iomem *ioaddr_device;
+	/* the IO address to use for notifications operations */
+	void __iomem *ioaddr_notify;
+	/* the IO address to use for reading ISR */
+	void __iomem *ioaddr_isr;
 
 	/* a list of queues so we can dispatch IRQs */
 	spinlock_t lock;
@@ -57,8 +63,175 @@ struct virtio_pci_device
 	unsigned msix_used_vectors;
 	/* Whether we have vector per vq */
 	bool per_vq_vectors;
+
+	/* Various IO mappings: used for resource tracking only. */
+
+#ifdef CONFIG_VIRTIO_PCI_LEGACY
+	/* Legacy BAR0: typically PIO. */
+	void __iomem *legacy_map;
+#endif
+
+	/* Mappings specified by device capabilities: typically in MMIO */
+	void __iomem *isr_map;
+	void __iomem *notify_map;
+	void __iomem *common_map;
+	void __iomem *device_map;
 };
 
+#ifdef CONFIG_VIRTIO_PCI_LEGACY
+static void __iomem *virtio_pci_set_legacy_map(struct virtio_pci_device *vp_dev)
+{
+	return vp_dev->legacy_map = pci_iomap(vp_dev->pci_dev, 0, 256);
+}
+
+static void __iomem *virtio_pci_legacy_map(struct virtio_pci_device *vp_dev)
+{
+	return vp_dev->legacy_map;
+}
+#else
+static void __iomem *virtio_pci_set_legacy_map(struct virtio_pci_device *vp_dev)
+{
+	return NULL;
+}
+
+static void __iomem *virtio_pci_legacy_map(struct virtio_pci_device *vp_dev)
+{
+	return NULL;
+}
+#endif
+
+/*
+ * With PIO, device-specific config moves as MSI-X is enabled/disabled.
+ * Use this accessor to keep pointer to that config in sync.
+ */
+static void virtio_pci_set_msix_enabled(struct virtio_pci_device *vp_dev, int enabled)
+{
+	void __iomem* m;
+	vp_dev->msix_enabled = enabled;
+	m = virtio_pci_legacy_map(vp_dev);
+	if (m)
+		vp_dev->ioaddr_device = m + VIRTIO_PCI_CONFIG(vp_dev);
+	else
+		vp_dev->ioaddr_device = vp_dev->device_map;
+}
+
+static void __iomem *virtio_pci_map_cfg(struct virtio_pci_device *vp_dev, u8 cap_id,
+					u32 align)
+{
+        u32 size;
+        u32 offset;
+        u8 bir;
+        u8 cap_len;
+	int pos;
+	struct pci_dev *dev = vp_dev->pci_dev;
+	void __iomem *p;
+
+	for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
+	     pos > 0;
+	     pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
+		u8 id;
+		pci_read_config_byte(dev, pos + PCI_VNDR_CAP_LEN, &cap_len);
+		if (cap_len < VIRTIO_PCI_CAP_ID + 1)
+			continue;
+		pci_read_config_byte(dev, pos + VIRTIO_PCI_CAP_ID, &id);
+		id >>= VIRTIO_PCI_CAP_ID_SHIFT;
+		id &= VIRTIO_PCI_CAP_ID_MASK;
+		if (id == cap_id)
+			break;
+	}
+
+	if (pos <= 0)
+		return NULL;
+
+	if (cap_len < VIRTIO_PCI_CAP_CFG_BIR + 1)
+		goto err;
+        pci_read_config_byte(dev, pos + VIRTIO_PCI_CAP_CFG_BIR, &bir);
+	if (cap_len < VIRTIO_PCI_CAP_CFG_OFF + 4)
+		goto err;
+        pci_read_config_dword(dev, pos + VIRTIO_PCI_CAP_CFG_OFF, &offset);
+	if (cap_len < VIRTIO_PCI_CAP_CFG_SIZE + 4)
+		goto err;
+        pci_read_config_dword(dev, pos + VIRTIO_PCI_CAP_CFG_SIZE, &size);
+        bir >>= VIRTIO_PCI_CAP_CFG_BIR_SHIFT;
+        bir &= VIRTIO_PCI_CAP_CFG_BIR_MASK;
+        size >>= VIRTIO_PCI_CAP_CFG_SIZE_SHIFT;
+        size &= VIRTIO_PCI_CAP_CFG_SIZE_MASK;
+        offset >>= VIRTIO_PCI_CAP_CFG_OFF_SHIFT;
+        offset &= VIRTIO_PCI_CAP_CFG_OFF_MASK;
+	/* Align offset appropriately */
+	offset &= ~(align - 1);
+
+	/* It's possible for a device to supply a huge config space,
+	 * but we'll never need to map more than a page ATM. */
+	p = pci_iomap_range(dev, bir, offset, size, PAGE_SIZE);
+	if (!p)
+		dev_err(&vp_dev->vdev.dev, "Unable to map virtio pci memory");
+	return p;
+err:
+	dev_err(&vp_dev->vdev.dev, "Unable to parse virtio pci capability");
+	return NULL;
+}
+
+static void virtio_pci_iounmap(struct virtio_pci_device *vp_dev)
+{
+	if (virtio_pci_legacy_map(vp_dev))
+		pci_iounmap(vp_dev->pci_dev, virtio_pci_legacy_map(vp_dev));
+	if (vp_dev->isr_map)
+		pci_iounmap(vp_dev->pci_dev, vp_dev->isr_map);
+	if (vp_dev->notify_map)
+		pci_iounmap(vp_dev->pci_dev, vp_dev->notify_map);
+	if (vp_dev->common_map)
+		pci_iounmap(vp_dev->pci_dev, vp_dev->common_map);
+	if (vp_dev->device_map)
+		pci_iounmap(vp_dev->pci_dev, vp_dev->device_map);
+}
+
+static int virtio_pci_iomap(struct virtio_pci_device *vp_dev)
+{
+	vp_dev->isr_map = virtio_pci_map_cfg(vp_dev,
+					     VIRTIO_PCI_CAP_ISR_CFG,
+					     sizeof(u8));
+	vp_dev->notify_map = virtio_pci_map_cfg(vp_dev,
+						VIRTIO_PCI_CAP_NOTIFY_CFG,
+						sizeof(u16));
+	vp_dev->common_map = virtio_pci_map_cfg(vp_dev,
+						VIRTIO_PCI_CAP_COMMON_CFG,
+						sizeof(u32));
+	vp_dev->device_map = virtio_pci_map_cfg(vp_dev,
+						VIRTIO_PCI_CAP_DEVICE_CFG,
+						sizeof(u8));
+
+	if (vp_dev->notify_map && vp_dev->isr_map &&
+	    vp_dev->common_map && vp_dev->device_map) {
+		vp_dev->ioaddr = vp_dev->common_map;
+		vp_dev->ioaddr_isr = vp_dev->isr_map;
+		vp_dev->ioaddr_notify = vp_dev->notify_map;
+		vp_dev->ioaddr_device = vp_dev->device_map;
+	} else {
+		void __iomem* m;
+		/*
+		 * If not all capabilities present, map legacy PIO.
+		 * Legacy access is at BAR 0. We never need to map
+		 * more than 256 bytes there, since legacy config space
+		 * used PIO which has this size limit.
+		 * */
+		m = virtio_pci_set_legacy_map(vp_dev);
+		if (!m) {
+			dev_err(&vp_dev->vdev.dev, "Unable to map legacy PIO");
+			goto err;
+		}
+		vp_dev->ioaddr = m;
+		vp_dev->ioaddr_isr = m + VIRTIO_PCI_ISR;
+		vp_dev->ioaddr_notify = m + VIRTIO_PCI_QUEUE_NOTIFY;
+		vp_dev->ioaddr_device = m + VIRTIO_PCI_CONFIG(vp_dev);
+	}
+
+	return 0;
+err:
+	virtio_pci_iounmap(vp_dev);
+	return -EINVAL;
+}
+
 /* Constants for MSI-X */
 /* Use first vector for configuration changes, second and the rest for
  * virtqueues Thus, we need at least 2 vectors for MSI. */
@@ -130,8 +303,7 @@ static void vp_get(struct virtio_device *vdev, unsigned offset,
 		   void *buf, unsigned len)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	void __iomem *ioaddr = vp_dev->ioaddr +
-				VIRTIO_PCI_CONFIG(vp_dev) + offset;
+	void __iomem *ioaddr = vp_dev->ioaddr_device + offset;
 	u8 *ptr = buf;
 	int i;
 
@@ -145,8 +317,7 @@ static void vp_set(struct virtio_device *vdev, unsigned offset,
 		   const void *buf, unsigned len)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	void __iomem *ioaddr = vp_dev->ioaddr +
-				VIRTIO_PCI_CONFIG(vp_dev) + offset;
+	void __iomem *ioaddr = vp_dev->ioaddr_device + offset;
 	const u8 *ptr = buf;
 	int i;
 
@@ -184,7 +355,7 @@ static void vp_notify(struct virtqueue *vq)
 
 	/* we write the queue's selector into the notification register to
 	 * signal the other end */
-	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
+	iowrite16(info->queue_index, vp_dev->ioaddr_notify);
 }
 
 /* Handle a configuration change: Tell driver if it wants to know. */
@@ -231,7 +402,8 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
 
 	/* reading the ISR has the effect of also clearing it so it's very
 	 * important to save off the value. */
-	isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
+	isr = ioread8(vp_dev->ioaddr_notify +
+		      VIRTIO_PCI_ISR - VIRTIO_PCI_QUEUE_NOTIFY);
 
 	/* It's definitely not us if the ISR was not high */
 	if (!isr)
@@ -265,7 +437,7 @@ static void vp_free_vectors(struct virtio_device *vdev)
 		ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
 
 		pci_disable_msix(vp_dev->pci_dev);
-		vp_dev->msix_enabled = 0;
+                virtio_pci_set_msix_enabled(vp_dev, 0);
 		vp_dev->msix_vectors = 0;
 	}
 
@@ -303,7 +475,7 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 	if (err)
 		goto error;
 	vp_dev->msix_vectors = nvectors;
-	vp_dev->msix_enabled = 1;
+        virtio_pci_set_msix_enabled(vp_dev, 1);
 
 	/* Set the vector used for configuration */
 	v = vp_dev->msix_used_vectors;
@@ -451,7 +623,10 @@ static void vp_del_vq(struct virtqueue *vq)
 		iowrite16(VIRTIO_MSI_NO_VECTOR,
 			  vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
 		/* Flush the write out to device */
-		ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
+		ioread8(vp_dev->ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
+		/* And clear ISR: TODO: really needed? */
+		ioread8(vp_dev->ioaddr_notify +
+		      VIRTIO_PCI_ISR - VIRTIO_PCI_QUEUE_NOTIFY);
 	}
 
 	vring_del_virtqueue(vq);
@@ -642,8 +817,8 @@ static int __devinit virtio_pci_probe(struct pci_dev *pci_dev,
 	if (err)
 		goto out_enable_device;
 
-	vp_dev->ioaddr = pci_iomap(pci_dev, 0, 0);
-	if (vp_dev->ioaddr == NULL)
+	err = virtio_pci_iomap(vp_dev);
+	if (err)
 		goto out_req_regions;
 
 	pci_set_drvdata(pci_dev, vp_dev);
@@ -665,7 +840,7 @@ static int __devinit virtio_pci_probe(struct pci_dev *pci_dev,
 
 out_set_drvdata:
 	pci_set_drvdata(pci_dev, NULL);
-	pci_iounmap(pci_dev, vp_dev->ioaddr);
+	virtio_pci_iounmap(vp_dev);
 out_req_regions:
 	pci_release_regions(pci_dev);
 out_enable_device:
@@ -683,7 +858,7 @@ static void __devexit virtio_pci_remove(struct pci_dev *pci_dev)
 
 	vp_del_vqs(&vp_dev->vdev);
 	pci_set_drvdata(pci_dev, NULL);
-	pci_iounmap(pci_dev, vp_dev->ioaddr);
+	virtio_pci_iounmap(vp_dev);
 	pci_release_regions(pci_dev);
 	pci_disable_device(pci_dev);
 	kfree(vp_dev);
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 9120887..3cf1787 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -286,6 +286,10 @@ static inline void writesb(const void __iomem *addr, const void *buf, int len)
 /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
 struct pci_dev;
 extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
+extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
+                                     unsigned offset,
+                                     unsigned long minlen,
+                                     unsigned long maxlen);
 static inline void pci_iounmap(struct pci_dev *dev, void __iomem *p)
 {
 }
diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
index 98dcd76..6f192d4 100644
--- a/include/asm-generic/iomap.h
+++ b/include/asm-generic/iomap.h
@@ -70,8 +70,19 @@ extern void ioport_unmap(void __iomem *);
 /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
 struct pci_dev;
 extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
+extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
+                                     unsigned offset,
+                                     unsigned long minlen,
+                                     unsigned long maxlen);
 extern void pci_iounmap(struct pci_dev *dev, void __iomem *);
 #else
+static inline void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
+                                            unsigned offset,
+                                            unsigned long minlen,
+                                            unsigned long maxlen)
+{
+	return NULL;
+}
 struct pci_dev;
 static inline void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max)
 {
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index b5d9657..3e7905c 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -375,6 +375,10 @@
 #define  PCI_X_STATUS_266MHZ	0x40000000	/* 266 MHz capable */
 #define  PCI_X_STATUS_533MHZ	0x80000000	/* 533 MHz capable */
 
+/* Vendor specific capability */
+#define PCI_VNDR_CAP_LEN	2	/* Capability length (8 bits), including
+					   bytes: ID, NEXT and LEN itself. */
+
 /* PCI Bridge Subsystem ID registers */
 
 #define PCI_SSVID_VENDOR_ID     4	/* PCI-Bridge subsystem vendor id register */
diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
index ea66f3f..d6568e7 100644
--- a/include/linux/virtio_pci.h
+++ b/include/linux/virtio_pci.h
@@ -92,4 +92,45 @@
 /* The alignment to use between consumer and producer parts of vring.
  * x86 pagesize again. */
 #define VIRTIO_PCI_VRING_ALIGN		4096
+
+/*
+ * Layout for Virtio PCI vendor specific capability (little-endian):
+ * 5 bit virtio capability id.
+ * 3 bit BAR index register, specifying which BAR to use.
+ * 4 byte cfg offset within the BAR.
+ * 4 byte cfg size.
+ */
+
+/* A single virtio device has multiple vendor specific capabilities, we use the
+ * 5 bit ID field to distinguish between these. */
+#define VIRTIO_PCI_CAP_ID		3
+#define VIRTIO_PCI_CAP_ID_MASK		0xff
+#define VIRTIO_PCI_CAP_ID_SHIFT		0
+
+/* IDs for different capabilities. If a specific configuration
+ * is missing, legacy PIO path is used. */
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR access */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific confiuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+
+/* Index of the BAR including this configuration */
+#define VIRTIO_PCI_CAP_CFG_BIR		4
+#define VIRTIO_PCI_CAP_CFG_BIR_MASK	(0x7)
+#define VIRTIO_PCI_CAP_CFG_BIR_SHIFT	0
+
+/* Size of the configuration space */
+#define VIRTIO_PCI_CAP_CFG_SIZE		4
+#define VIRTIO_PCI_CAP_CFG_SIZE_MASK	0xffffff
+#define VIRTIO_PCI_CAP_CFG_SIZE_SHIFT	8
+
+/* Offset within the BAR */
+#define VIRTIO_PCI_CAP_CFG_OFF		8
+#define VIRTIO_PCI_CAP_CFG_OFF_MASK	0xffffffff
+#define VIRTIO_PCI_CAP_CFG_OFF_SHIFT	0
+
 #endif
diff --git a/lib/iomap.c b/lib/iomap.c
index 5dbcb4b..93ae915 100644
--- a/lib/iomap.c
+++ b/lib/iomap.c
@@ -243,26 +243,37 @@ EXPORT_SYMBOL(ioport_unmap);
 
 #ifdef CONFIG_PCI
 /**
- * pci_iomap - create a virtual mapping cookie for a PCI BAR
+ * pci_iomap_range - create a virtual mapping cookie for a PCI BAR
  * @dev: PCI device that owns the BAR
  * @bar: BAR number
- * @maxlen: length of the memory to map
+ * @offset: map memory at the given offset in BAR
+ * @minlen: min length of the memory to map
+ * @maxlen: max length of the memory to map
  *
  * Using this function you will get a __iomem address to your device BAR.
  * You can access it using ioread*() and iowrite*(). These functions hide
  * the details if this is a MMIO or PIO address space and will just do what
  * you expect from them in the correct way.
  *
+ * @minlen specifies the minimum length to map. We check that BAR is
+ * large enough.
  * @maxlen specifies the maximum length to map. If you want to get access to
- * the complete BAR without checking for its length first, pass %0 here.
+ * the complete BAR from offset to the end, pass %0 here.
  * */
-void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
+void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
+                              unsigned offset,
+                              unsigned long minlen,
+                              unsigned long maxlen)
 {
 	resource_size_t start = pci_resource_start(dev, bar);
 	resource_size_t len = pci_resource_len(dev, bar);
 	unsigned long flags = pci_resource_flags(dev, bar);
 
-	if (!len || !start)
+	if (len <= offset || !start)
+		return NULL;
+        len -= offset;
+	start += offset;
+        if (len < minlen)
 		return NULL;
 	if (maxlen && len > maxlen)
 		len = maxlen;
@@ -277,10 +288,30 @@ void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
 	return NULL;
 }
 
+/**
+ * pci_iomap - create a virtual mapping cookie for a PCI BAR
+ * @dev: PCI device that owns the BAR
+ * @bar: BAR number
+ * @maxlen: length of the memory to map
+ *
+ * Using this function you will get a __iomem address to your device BAR.
+ * You can access it using ioread*() and iowrite*(). These functions hide
+ * the details if this is a MMIO or PIO address space and will just do what
+ * you expect from them in the correct way.
+ *
+ * @maxlen specifies the maximum length to map. If you want to get access to
+ * the complete BAR without checking for its length first, pass %0 here.
+ * */
+void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
+{
+    return pci_iomap_range(dev, bar, 0, 0, maxlen);
+}
+
 void pci_iounmap(struct pci_dev *dev, void __iomem * addr)
 {
 	IO_COND(addr, /* nothing */, iounmap(addr));
 }
 EXPORT_SYMBOL(pci_iomap);
+EXPORT_SYMBOL(pci_iomap_range);
 EXPORT_SYMBOL(pci_iounmap);
 #endif /* CONFIG_PCI */
-- 
1.7.5.53.gc233e

^ permalink raw reply related

* Re: [RFC] kvm tools: Implement multiple VQ for virtio-net
From: Stephen Hemminger @ 2011-11-22 18:14 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Krishna Kumar, gorcunov, kvm, Michael S. Tsirkin, Asias He,
	virtualization, Pekka Enberg, Sasha Levin, netdev, mingo
In-Reply-To: <87bos6b936.fsf@rustcorp.com.au>

I have been playing with userspace-rcu which has a number of neat
lockless routines for queuing and hashing. But there aren't kernel versions
and several of them may require cmpxchg to work.

^ permalink raw reply

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
From: Michael S. Tsirkin @ 2011-11-22  6:29 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization
In-Reply-To: <87lir9xagv.fsf@rustcorp.com.au>

On Tue, Nov 22, 2011 at 11:03:04AM +1030, Rusty Russell wrote:
> On Mon, 21 Nov 2011 13:57:04 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Nov 21, 2011 at 12:18:45PM +1030, Rusty Russell wrote:
> > > On Wed, 16 Nov 2011 09:18:38 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > My unlocked kick patches will trip this warning: they make
> > > > virtio-net do add + get without kick.
> > > 
> > > Heh, it's a good sign if they do, since that means you're running really
> > > well :)
> > 
> > They don't in fact, in my testing :(. But I think they can with luck.
> > 
> > > > I think block with unlocked kick can trip it too:
> > > > add, lock is dropped and then an interrupt can get.
> > > > 
> > > > We also don't need a kick each num - each 2^15 is enough.
> > > > Why don't we do this at start of add_buf:
> > > > if (vq->num_added >= 0x7fff)
> > > > 	return -ENOSPC;
> > > 
> > > The warning was there in case a driver is never doing a kick, and
> > > getting away with it (mostly) because the device is polling.  Let's not
> > > penalize good drivers to catch bad ones.
> > > 
> > > How about we do this properly, like so:
> > 
> > Absolutely. But I think we also need to handle num_added
> > overflow of a 15 bit counter, no? Otherwise the
> > vring_need_event logic might give us false negatives ....
> > I'm guessing we can just assume we need a kick in that case.
> 
> You're right.  Thankyou.  My immediate reaction of "make it an unsigned
> long" doesn't work.
> 
> Here's the diff to what I posted before:
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -254,9 +254,10 @@ add_head:
>  	vq->vring.avail->idx++;
>  	vq->num_added++;
>  
> -	/* If you haven't kicked in this long, you're probably doing something
> -	 * wrong. */
> -	WARN_ON(vq->num_added > vq->vring.num);
> +	/* This is very unlikely, but theoretically possible.  Kick
> +	 * just in case. */
> +	if (unlikely(vq->num_added == 65535))

This is 0xffff but why use the decimal notation?

> +		virtqueue_kick(_vq);
>  
>  	pr_debug("Added buffer head %i to %p\n", head, vq);
>  	END_USE(vq);

We also still need to reset vq->num_added, right?

-- 
MST

^ permalink raw reply

* Re: [PATCH v3 2/3] hvc_init(): Enforce one-time initialization.
From: Rusty Russell @ 2011-11-22  0:58 UTC (permalink / raw)
  To: Miche Baker-Harvey
  Cc: Stephen Rothwell, xen-devel, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, linux-kernel,
	virtualization, Anton Blanchard, Amit Shah, Mike Waychison,
	ppc-dev, Eric Northrup
In-Reply-To: <CAB8Rdaqno9kVHB8wsbH5=zs_ppUkd1G=siywFFfH3Ud21qK3FA@mail.gmail.com>

On Mon, 21 Nov 2011 14:16:38 -0800, Miche Baker-Harvey <miche@google.com> wrote:
> Thanks, Rusty.  I'm not using QEMU though, just KVM.   I create the device, wait
> for the message from the guest that the device is ready, and then add ports.
> 
> Miche

OK, since Amit was the one who implemented multi-port console, I'm going
to hand this to him.

I'm sure he's been looking for excuses to dive back into the console
code!

Cheers,
Rusty.

^ permalink raw reply

* Re: [PATCH] virtio-mmio: Devices parameter parsing
From: Rusty Russell @ 2011-11-22  0:53 UTC (permalink / raw)
  To: Pawel Moll
  Cc: linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <1321898210.3093.263.camel@hornet.cambridge.arm.com>

On Mon, 21 Nov 2011 17:56:50 +0000, Pawel Moll <pawel.moll@arm.com> wrote:
> On Mon, 2011-11-21 at 14:44 +0000, Pawel Moll wrote:
> > I'll post v3 tonight or tomorrow.
> 
> Ok, it won't be v3 then...
> 
> I tried again, using your suggestions - see below (just the crucial bit,
> however I have the kernel-parameters.txt bits ready as well). Parsing
> works like charm, however when it gets to device_register() and
> platform_device_register_resndata() I hit the same problem as
> previously. Both are using slab allocator...
> 
> device_register()->device_add()->device_private_init()->kzalloc()
> 
> and
> 
> platform_device_register_resndata()->platform_device_register_full()->
> platform_device_alloc()->kzalloc().
> 
> Essentially it seems that the parameter callbacks are just executed
> waaay to early to do more than just set few integers here and there...
> 
> Any ideas?

Rewrite the kernel and every arch so allocators work as soon as we hit
start_kernel() and we can get rid of hundreds of ugly workarounds?

Perhaps not today.  So, we will need a linked list of devices and
resources.  Yeah, that means allocating, which means YA
slab_is_available()/alloc_bootmem() hack.

Think of yourself as a pioneer...

Thanks :)
Rusty.

^ permalink raw reply

* Re: [PATCH] virtio-mmio: Devices parameter parsing
From: Rusty Russell @ 2011-11-22  0:44 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Peter Maydell, linux-kernel@vger.kernel.org, Sasha Levin,
	virtualization@lists.linux-foundation.org
In-Reply-To: <1321886688.3093.248.camel@hornet.cambridge.arm.com>

On Mon, 21 Nov 2011 14:44:48 +0000, Pawel Moll <pawel.moll@arm.com> wrote:
> On Mon, 2011-11-21 at 03:32 +0000, Rusty Russell wrote:
> > 2) Doesn't leak mem on error paths.
> 
> Em, I don't think my code was leaking memory in any error case? (no
> offence meant or taken ;-)

You're right, I'm wrong.  In your original version, your innovative use
of free() then strdup() worked as intended.

> > 3) Handles error from platform_device_register_resndata().
> 
> As my code was ignoring wrong descriptions (all the continues) I ignored
> this result as well (the implementation complains on its own anyway),
> but I get your point.

I was referring to this:

	  return platform_device_register_resndata(&virtio_mmio_cmdline_parent,
			  "virtio-mmio", virtio_mmio_cmdline_id++,
			  resources, ARRAY_SIZE(resources), NULL, 0) != NULL;

The set function returns 0 or a -ve errno, not true/false.

> > 4) Uses shorter names for static functions/variables.
> 
> Ah, I get the hint :-) I'm trying to keep the naming convention in
> static symbols as well, as it makes cscope/ctags/grep usage easier...
> I'll just use the abbreviated "vm_" prefixes then.
> 
> > See what you think...
> 
> Funnily enough when I proposed some string parser few years ago (totally
> different story) I was flamed for using strchr() instead of strsep() ;-)
> But ok, I don't mind getting back to basics.
> 
> > +static int set_cmdline_device(const char *device, const struct kernel_param *kp)
> [...]
> > +       delim = strchr(device, '@');
> [...]
> > +	*delim = '\0';
> 
> Ah. I forgot that strchr() takes const char * but returns "non-const"
> char *... Cheating, that's what it is ;-), but will work. Probably what
> we really want is something like
> 	kstrtoull_delim(device, 0, &val, "@\0")
> I'll have a look and may try to propose something of that sort, but
> that's another story. Maybe I should just use simple_strtol() for the
> time being?

Or would it be simpler to enhance sscanf() with some weird format option
for suffixing?  I haven't looked for similar cases, but I'd suspect a
big win in general.

This would be neater than anything else we've got:
        if (sscanf(device, "%llu@%llu[KMG]:%u", ...) != 3
            && sscanf(device, "%llu@%llu[KMG]:%u:%u", ...) != 4)
                return -EINVAL;

Cheers,
Rusty.

^ permalink raw reply

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
From: Rusty Russell @ 2011-11-22  0:33 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization
In-Reply-To: <20111121115702.GA27580@redhat.com>

On Mon, 21 Nov 2011 13:57:04 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Nov 21, 2011 at 12:18:45PM +1030, Rusty Russell wrote:
> > On Wed, 16 Nov 2011 09:18:38 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > My unlocked kick patches will trip this warning: they make
> > > virtio-net do add + get without kick.
> > 
> > Heh, it's a good sign if they do, since that means you're running really
> > well :)
> 
> They don't in fact, in my testing :(. But I think they can with luck.
> 
> > > I think block with unlocked kick can trip it too:
> > > add, lock is dropped and then an interrupt can get.
> > > 
> > > We also don't need a kick each num - each 2^15 is enough.
> > > Why don't we do this at start of add_buf:
> > > if (vq->num_added >= 0x7fff)
> > > 	return -ENOSPC;
> > 
> > The warning was there in case a driver is never doing a kick, and
> > getting away with it (mostly) because the device is polling.  Let's not
> > penalize good drivers to catch bad ones.
> > 
> > How about we do this properly, like so:
> 
> Absolutely. But I think we also need to handle num_added
> overflow of a 15 bit counter, no? Otherwise the
> vring_need_event logic might give us false negatives ....
> I'm guessing we can just assume we need a kick in that case.

You're right.  Thankyou.  My immediate reaction of "make it an unsigned
long" doesn't work.

Here's the diff to what I posted before:

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -254,9 +254,10 @@ add_head:
 	vq->vring.avail->idx++;
 	vq->num_added++;
 
-	/* If you haven't kicked in this long, you're probably doing something
-	 * wrong. */
-	WARN_ON(vq->num_added > vq->vring.num);
+	/* This is very unlikely, but theoretically possible.  Kick
+	 * just in case. */
+	if (unlikely(vq->num_added == 65535))
+		virtqueue_kick(_vq);
 
 	pr_debug("Added buffer head %i to %p\n", head, vq);
 	END_USE(vq);

^ permalink raw reply

* Re: [PATCH v3 2/3] hvc_init(): Enforce one-time initialization.
From: Miche Baker-Harvey @ 2011-11-21 22:16 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Stephen Rothwell, xen-devel, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, linux-kernel,
	virtualization, Anton Blanchard, Amit Shah, Mike Waychison,
	ppc-dev, Eric Northrup
In-Reply-To: <87lira9ii7.fsf@rustcorp.com.au>

Thanks, Rusty.  I'm not using QEMU though, just KVM.   I create the device, wait
for the message from the guest that the device is ready, and then add ports.

Miche

On Sun, Nov 20, 2011 at 9:01 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> On Thu, 17 Nov 2011 10:57:37 -0800, Miche Baker-Harvey <miche@google.com> wrote:
>> Rusty, Michael, Stephen, et al,
>>
>> Thanks for your comments on these patches.
>>
>> For what I'm trying to do, all three patches are necessary, but maybe
>> I'm going about it the wrong way. Your input would be appreciated.
>> I'm in no way claiming that these patches are "right", just that it's
>> working for me, and that what's in the current pool is not.
>
> We have to *understand* the code.  If we don't, we need to rewrite the
> code so we *do* understand it, or make way for someone who does.
>
> I'm looking at the kvm man page to try to figure out how to have virtio
> console, and it's deeply unclear.  What kvm commandline are you using?
> I'll try to debug it here, and see what I learn about hvc_console.
>
> Cheers,
> Rusty.
>

^ permalink raw reply

* Re: [PATCH] virtio-mmio: Devices parameter parsing
From: Pawel Moll @ 2011-11-21 17:56 UTC (permalink / raw)
  To: Rusty Russell
  Cc: linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <1321886688.3093.248.camel@hornet.cambridge.arm.com>

On Mon, 2011-11-21 at 14:44 +0000, Pawel Moll wrote:
> I'll post v3 tonight or tomorrow.

Ok, it won't be v3 then...

I tried again, using your suggestions - see below (just the crucial bit,
however I have the kernel-parameters.txt bits ready as well). Parsing
works like charm, however when it gets to device_register() and
platform_device_register_resndata() I hit the same problem as
previously. Both are using slab allocator...

device_register()->device_add()->device_private_init()->kzalloc()

and

platform_device_register_resndata()->platform_device_register_full()->
platform_device_alloc()->kzalloc().

Essentially it seems that the parameter callbacks are just executed
waaay to early to do more than just set few integers here and there...

Any ideas?

Cheers!

Paweł

8<-------------------------------------------------------------------

@@ -443,6 +489,145 @@ static int __devexit virtio_mmio_remove(struct platform_device *pdev)
 
 
 
+/* Devices list parameter */
+
+#if defined(CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES)
+
+static struct device vm_cmdline_parent = {
+	.init_name = "virtio-mmio-cmdline",
+};
+
+static int vm_cmdline_parent_registered;
+static int vm_cmdline_id;
+
+static int vm_cmdline_set(const char *str,
+		const struct kernel_param *kp)
+{
+	int err;
+	struct resource resources[2];
+	unsigned long long val;
+	char *delim;
+	struct platform_device *pdev;
+
+	/* Size */
+	resources[0].flags = IORESOURCE_MEM;
+	resources[0].end = memparse(str, &delim) - 1;
+	if (*delim != '@')
+		return -EINVAL;
+	str = delim + 1;
+
+	/* Base address */
+	delim = strchr(str, ':');
+	if (!delim)
+		return -EINVAL;
+	/* kstrtoull is strict, so we have to temporarily truncate */
+	*delim = '\0';
+	err = kstrtoull(str, 0, &val);
+	*delim = ':';
+	if (err)
+		return err;
+	resources[0].start = val;
+	resources[0].end += val;
+	str = delim + 1;
+	
+	/* Interrupt */
+	delim = strchr(str, ':');
+	if (delim) /* Optional device id delimiter */
+		*delim = '\0';
+	err = kstrtoull(str, 0, &val);
+	if (delim)
+		*delim = ':';
+	resources[1].flags = IORESOURCE_IRQ;
+	resources[1].start = resources[1].end = val;
+	str = delim + 1;
+
+	/* Optional device id */
+	if (delim) {
+		err = kstrtoull(str, 0, &val);
+		if (err)
+			return err;
+		vm_cmdline_id = val;
+	}
+
+	if (!vm_cmdline_parent_registered) {
+		err = device_register(&vm_cmdline_parent);
+		if (err) {
+			pr_err("Failed to register parent device!\n");
+			return err;
+		}
+		vm_cmdline_parent_registered = 1;
+	}
+
+       pr_info("Registering device virtio-mmio.%d at 0x%llx-0x%llx, IRQ %d.\n",
+		       vm_cmdline_id++,
+		       (unsigned long long)resources[0].start,
+		       (unsigned long long)resources[0].end,
+		       (int)resources[1].start);
+
+	pdev = platform_device_register_resndata(&vm_cmdline_parent,
+			"virtio-mmio", vm_cmdline_id++,
+			resources, ARRAY_SIZE(resources), NULL, 0);
+	if (IS_ERR(pdev))
+		return PTR_ERR(pdev);
+
+	return 0;
+}
+
+static int vm_cmdline_get_device(struct device *dev, void *data)
+{
+	char *buffer = data;
+	unsigned int len = strlen(buffer);
+	struct platform_device *pdev = to_platform_device(dev);
+
+	snprintf(buffer + len, PAGE_SIZE - len, "0x%llx@0x%llx:%llu:%d\n",
+			pdev->resource[0].end - pdev->resource[0].start + 1ULL,
+			(unsigned long long)pdev->resource[0].start,
+			(unsigned long long)pdev->resource[1].start,
+			pdev->id);
+	return 0;
+}
+
+static int vm_cmdline_get(char *buffer, const struct kernel_param *kp)
+{
+	buffer[0] = '\0';
+	device_for_each_child(&vm_cmdline_parent, buffer,
+			vm_cmdline_get_device);
+	return strlen(buffer) + 1;
+}
+
+static struct kernel_param_ops vm_cmdline_param_ops = {
+	.set = vm_cmdline_set,
+	.get = vm_cmdline_get,
+};
+
+module_param_cb(device, &vm_cmdline_param_ops, NULL, S_IRUSR);
+
+static int vm_unregister_cmdline_device(struct device *dev,
+		void *data)
+{
+	platform_device_unregister(to_platform_device(dev));
+
+	return 0;
+}
+
+static void vm_unregister_cmdline_devices(void)
+{
+	if (vm_cmdline_parent_registered) {
+		device_for_each_child(&vm_cmdline_parent, NULL,
+				vm_unregister_cmdline_device);
+		device_unregister(&vm_cmdline_parent);
+		vm_cmdline_parent_registered = 0;
+	}
+}
+
+#else
+
+static void virtio_mmio_unregister_cmdline_devices(void)
+{
+}
+
+#endif
+
 /* Platform driver */
 
 static struct of_device_id virtio_mmio_match[] = {
@@ -469,6 +654,7 @@ static int __init virtio_mmio_init(void)
 static void __exit virtio_mmio_exit(void)
 {
 	platform_driver_unregister(&virtio_mmio_driver);
+	vm_unregister_cmdline_devices();
 }
 
 module_init(virtio_mmio_init);



_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH] virtio-mmio: Devices parameter parsing
From: Pawel Moll @ 2011-11-21 14:44 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Peter Maydell, linux-kernel@vger.kernel.org, Sasha Levin,
	virtualization@lists.linux-foundation.org
In-Reply-To: <87vcqe9ml9.fsf@rustcorp.com.au>

On Mon, 2011-11-21 at 03:32 +0000, Rusty Russell wrote:
> No it's not; I didn't bother when I converted things across, and it
> shows.  

546970bc (Rusty Russell   2010-08-11 23:04:20 -0600 133) #define module_param_cb

Now I understand... :-)

> But we expect virtio hackers to be smarter than average :)

Hope I'll stand up to the challenge ;-)

> > there would be a clash. So I wanted to add a "first_id" parameter, but
<...>
> Well, tell them to get the cmdline order right, or allow an explicit
> device id in the commandline.

Yeah, I've came up with the same idea last night:

	<size>[KMG]@<base>:<irq>[:<id>]

<id>, if specified, sets the id for the current device and a base for
the next one.

> Since I hope we're going to be coding together more often, I've written
> this how I would have done it (based loosely on your version) to
> compare.

Thanks, your efforts are truly appreciated!

> Main changes other than the obvious:
> 1) Documentation in kernel-parameters.txt

I was considering this previously but for some reason I thought
kernel-parameters.txt wasn't a place for module parameters at all. Now I
had another look and see I was wrong.

> 2) Doesn't leak mem on error paths.

Em, I don't think my code was leaking memory in any error case? (no
offence meant or taken ;-)

> 3) Handles error from platform_device_register_resndata().

As my code was ignoring wrong descriptions (all the continues) I ignored
this result as well (the implementation complains on its own anyway),
but I get your point.

> 4) Uses shorter names for static functions/variables.

Ah, I get the hint :-) I'm trying to keep the naming convention in
static symbols as well, as it makes cscope/ctags/grep usage easier...
I'll just use the abbreviated "vm_" prefixes then.

> See what you think...

Funnily enough when I proposed some string parser few years ago (totally
different story) I was flamed for using strchr() instead of strsep() ;-)
But ok, I don't mind getting back to basics.

> +static int set_cmdline_device(const char *device, const struct kernel_param *kp)
[...]
> +       delim = strchr(device, '@');
[...]
> +	*delim = '\0';

Ah. I forgot that strchr() takes const char * but returns "non-const"
char *... Cheating, that's what it is ;-), but will work. Probably what
we really want is something like
	kstrtoull_delim(device, 0, &val, "@\0")
I'll have a look and may try to propose something of that sort, but
that's another story. Maybe I should just use simple_strtol() for the
time being?

I'll post v3 tonight or tomorrow.

Cheers!

Paweł


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
From: Michael S. Tsirkin @ 2011-11-21 11:57 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization
In-Reply-To: <8739dib5z6.fsf@rustcorp.com.au>

On Mon, Nov 21, 2011 at 12:18:45PM +1030, Rusty Russell wrote:
> On Wed, 16 Nov 2011 09:18:38 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > My unlocked kick patches will trip this warning: they make
> > virtio-net do add + get without kick.
> 
> Heh, it's a good sign if they do, since that means you're running really
> well :)

They don't in fact, in my testing :(. But I think they can with luck.

> > I think block with unlocked kick can trip it too:
> > add, lock is dropped and then an interrupt can get.
> > 
> > We also don't need a kick each num - each 2^15 is enough.
> > Why don't we do this at start of add_buf:
> > if (vq->num_added >= 0x7fff)
> > 	return -ENOSPC;
> 
> The warning was there in case a driver is never doing a kick, and
> getting away with it (mostly) because the device is polling.  Let's not
> penalize good drivers to catch bad ones.
> 
> How about we do this properly, like so:

Absolutely. But I think we also need to handle num_added
overflow of a 15 bit counter, no? Otherwise the
vring_need_event logic might give us false negatives ....
I'm guessing we can just assume we need a kick in that case.

> From: Rusty Russell <rusty@rustcorp.com.au>
> Subject: virtio: add debugging if driver doesn't kick.
> 
> Under the existing #ifdef DEBUG, check that they don't have more than
> 1/10 of a second between an add_buf() and a
> virtqueue_notify()/virtqueue_kick_prepare() call.
> 
> We could get false positives on a really busy system, but good for
> development.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  drivers/virtio/virtio_ring.c |   31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -22,6 +22,7 @@
>  #include <linux/device.h>
>  #include <linux/slab.h>
>  #include <linux/module.h>
> +#include <linux/hrtimer.h>
>  
>  /* virtio guest is communicating with a virtual "device" that actually runs on
>   * a host processor.  Memory barriers are used to control SMP effects. */
> @@ -102,6 +103,10 @@ struct vring_virtqueue
>  #ifdef DEBUG
>  	/* They're supposed to lock for us. */
>  	unsigned int in_use;
> +
> +	/* Figure out if their kicks are too delayed. */
> +	bool last_add_time_valid;
> +	ktime_t last_add_time;
>  #endif
>  
>  	/* Tokens for callbacks. */
> @@ -192,6 +197,19 @@ int virtqueue_add_buf(struct virtqueue *
>  
>  	BUG_ON(data == NULL);
>  
> +#ifdef DEBUG
> +	{
> +		ktime_t now = ktime_get();
> +
> +		/* No kick or get, with .1 second between?  Warn. */
> +		if (vq->last_add_time_valid)
> +			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> +					    > 100);
> +		vq->last_add_time = now;
> +		vq->last_add_time_valid = true;
> +	}
> +#endif
> +
>  	/* If the host supports indirect descriptor tables, and we have multiple
>  	 * buffers, then go indirect. FIXME: tune this threshold */
>  	if (vq->indirect && (out + in) > 1 && vq->num_free) {
> @@ -291,6 +309,14 @@ bool virtqueue_kick_prepare(struct virtq
>  	new = vq->vring.avail->idx;
>  	vq->num_added = 0;
>  
> +#ifdef DEBUG
> +	if (vq->last_add_time_valid) {
> +		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> +					      vq->last_add_time)) > 100);
> +	}
> +	vq->last_add_time_valid = false;
> +#endif
> +
>  	if (vq->event) {
>  		needs_kick = vring_need_event(vring_avail_event(&vq->vring),
>  					      new, old);
> @@ -428,6 +454,10 @@ void *virtqueue_get_buf(struct virtqueue
>  		virtio_mb();
>  	}
>  
> +#ifdef DEBUG
> +	vq->last_add_time_valid = false;
> +#endif
> +
>  	END_USE(vq);
>  	return ret;
>  }
> @@ -611,6 +641,7 @@ struct virtqueue *vring_new_virtqueue(un
>  	list_add_tail(&vq->vq.list, &vdev->vqs);
>  #ifdef DEBUG
>  	vq->in_use = false;
> +	vq->last_add_time_valid = false;
>  #endif
>  
>  	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);

^ permalink raw reply

* Re: [PATCH v3 2/3] hvc_init(): Enforce one-time initialization.
From: Rusty Russell @ 2011-11-21  5:01 UTC (permalink / raw)
  To: Miche Baker-Harvey
  Cc: Stephen Rothwell, xen-devel, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, linux-kernel,
	virtualization, Anton Blanchard, Amit Shah, Mike Waychison,
	ppc-dev, Eric Northrup
In-Reply-To: <CAB8RdapbaueyWLJuJDE_ZdvLkRNM4sQHTxgmgO=jrnXZ6YPSmA@mail.gmail.com>

On Thu, 17 Nov 2011 10:57:37 -0800, Miche Baker-Harvey <miche@google.com> wrote:
> Rusty, Michael, Stephen, et al,
> 
> Thanks for your comments on these patches.
> 
> For what I'm trying to do, all three patches are necessary, but maybe
> I'm going about it the wrong way. Your input would be appreciated.
> I'm in no way claiming that these patches are "right", just that it's
> working for me, and that what's in the current pool is not.

We have to *understand* the code.  If we don't, we need to rewrite the
code so we *do* understand it, or make way for someone who does.

I'm looking at the kvm man page to try to figure out how to have virtio
console, and it's deeply unclear.  What kvm commandline are you using?
I'll try to debug it here, and see what I learn about hvc_console.

Cheers,
Rusty.

^ permalink raw reply

* Re: [PATCH] virtio-pci: make reset operation safer
From: Rusty Russell @ 2011-11-21  4:45 UTC (permalink / raw)
  To: Amit Shah; +Cc: virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <20111117154114.GA31281@redhat.com>

On Thu, 17 Nov 2011 17:41:15 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> virtio pci device reset actually just does an I/O
> write, which in PCI is really posted, that is it
> can complete on CPU before the device has received it.

Thanks, applied.  Because it's a theoretical issue, I've not cc'd
stable.

Thanks,
Rusty.

^ permalink raw reply

* Re: [PATCH] virtio-mmio: Devices parameter parsing
From: Rusty Russell @ 2011-11-21  3:32 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Peter Maydell, linux-kernel@vger.kernel.org, Sasha Levin,
	virtualization@lists.linux-foundation.org
In-Reply-To: <1321467222.3137.417.camel@hornet.cambridge.arm.com>

On Wed, 16 Nov 2011 18:13:42 +0000, Pawel Moll <pawel.moll@arm.com> wrote:
> On Wed, 2011-11-16 at 00:42 +0000, Rusty Russell wrote:
> > On Tue, 15 Nov 2011 13:53:05 +0000, Pawel Moll <pawel.moll@arm.com> wrote:
> > > +static char *virtio_mmio_cmdline_devices;
> > > +module_param_named(devices, virtio_mmio_cmdline_devices, charp, 0);
> > 
> > This is the wrong way to do this.
> > 
> > Don't put things in a charp and process it later.  It's lazy.  
> 
> Definitely not lazy - string parsing is very absorbing, really! ;-)
> 
> > You
> > should write parsers for it and call it straight from module_param.
> > 
> > And if you do it that way, multiple devices are simply multiple
> > arguments.
> >
> > module_param_cb(device, &param_ops_virtio_mmio, NULL, 0400);
> 
> Hm. Honestly, first time I hear someone suggesting using the param_cb
> variant... It doesn't seem to be too popular ;-)

No it's not; I didn't bother when I converted things across, and it
shows.  But we expect virtio hackers to be smarter than average :)

> But anyway, I tried to do use your suggestion (see below), even if I'm
> not convinced it's winning anything. But, in order to use the strsep()
> and kstrtoull() I need a non-const version of the string. And as the
> slab is not available at the time, I can't simply do kstrdup(), I'd have
> to abuse the "const char *val" params_ops.set's argument...
> Interestingly charp operations have the same problem:
> 
> int param_set_charp(const char *val, const struct kernel_param *kp)
> <...>
>         /* This is a hack.  We can't kmalloc in early boot, and we
>          * don't need to; this mangled commandline is preserved. */
>         if (slab_is_available()) {
> 
> Also, regarding the fact that one parameter define more than one
> "entity" - this is how mtd partitions are defined (all similarities
> intended ;-), see "drivers/mtd/cmdlinepart.c". And I quite like this
> syntax...

Yes, that's the traditional method.  I don't really hate it, but it
seems unnecessary and it's less useful when reporting parse errors.

> There's one more thing I realize I missed. The platform devices are
> registered starting from id 0 (so as "virtio-mmio.0"). Now, if you
> happened to have a statically defined virtio-mmio with the same id,
> there would be a clash. So I wanted to add a "first_id" parameter, but
> with the _cb parameter I can't guarantee ordering (I mean, to have the
> "first_id" available _before_ first device is being instantiated). So
> I'd have to cache the devices and then create them in one go. Sounds
> like the charp parameter for me :-)

Well, tell them to get the cmdline order right, or allow an explicit
device id in the commandline.

Since I hope we're going to be coding together more often, I've written
this how I would have done it (based loosely on your version) to
compare.

Main changes other than the obvious:
1) Documentation in kernel-parameters.txt
2) Doesn't leak mem on error paths.
3) Handles error from platform_device_register_resndata().
4) Uses shorter names for static functions/variables.
5) Allows (read) access to kernel parameters via sysfs.
5) Completely untested.

See what you think...
Rusty.

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -110,6 +110,7 @@ parameter is applicable:
 	USB	USB support is enabled.
 	USBHID	USB Human Interface Device support is enabled.
 	V4L	Video For Linux support is enabled.
+	VMMIO	CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES is enabled.
 	VGA	The VGA console has been enabled.
 	VT	Virtual terminal support is enabled.
 	WDT	Watchdog support is enabled.
@@ -2720,6 +2721,12 @@ bytes respectively. Such letter suffixes
 	video=		[FB] Frame buffer configuration
 			See Documentation/fb/modedb.txt.
 
+	virtio_mmio.device=<size>[KMG]@<baseaddr>:<irq>
+			Can be used multiple times for multiple devices.
+			Example would be:
+				virtio_mmio.device=0x100@0x100b0000:48
+
+
 	vga=		[BOOT,X86-32] Select a particular video mode
 			See Documentation/x86/boot.txt and
 			Documentation/svga.txt.
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -46,4 +46,15 @@ config VIRTIO_BALLOON
 
  	 If unsure, say N.
 
+config VIRTIO_MMIO_CMDLINE_DEVICES
+	bool "Memory mapped virtio devices parameter parsing"
+	depends on VIRTIO_MMIO
+	---help---
+	 Allow virtio-mmio devices instantiation via the kernel command line
+	 or module parameters. Be aware that using incorrect parameters (base
+	 address in particular) can crash your system - you have been warned.
+	 See Documentation/kernel-parameters.txt for details.
+
+	 If unsure, say 'N'.
+
 endmenu
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -6,6 +6,45 @@
  * This module allows virtio devices to be used over a virtual, memory mapped
  * platform device.
  *
+ * The guest device(s) may be instantiated in one of three equivalent ways:
+ *
+ * 1. Static platform device in board's code, eg.:
+ *
+ *	static struct platform_device v2m_virtio_device = {
+ *		.name = "virtio-mmio",
+ *		.id = -1,
+ *		.num_resources = 2,
+ *		.resource = (struct resource []) {
+ *			{
+ *				.start = 0x1001e000,
+ *				.end = 0x1001e0ff,
+ *				.flags = IORESOURCE_MEM,
+ *			}, {
+ *				.start = 42 + 32,
+ *				.end = 42 + 32,
+ *				.flags = IORESOURCE_IRQ,
+ *			},
+ *		}
+ *	};
+ *
+ * 2. Device Tree node, eg.:
+ *
+ *		virtio_block@1e000 {
+ *			compatible = "virtio,mmio";
+ *			reg = <0x1e000 0x100>;
+ *			interrupts = <42>;
+ *		}
+ *
+ * 3. Kernel module (or command line) parameter (can be used more than once!)
+ *		[virtio_mmio.]device=<size>@<baseaddr>:<irq>
+ *    where:
+ *		<size>     := size (can use standard suffixes like K or M)
+ *		<baseaddr> := physical base address
+ *		<irq>      := interrupt number (as passed to request_irq())
+ *    eg.:
+ *		virtio_mmio.device=0x100@0x100b0000:48 virtio_mmio.device=1K@0x1001e000:74
+ *
+ *
  * Registers layout (all 32-bit wide):
  *
  * offset d. name             description
@@ -42,6 +81,8 @@
  * See the COPYING file in the top-level directory.
  */
 
+#define pr_fmt(fmt) "virtio-mmio: " fmt
+
 #include <linux/highmem.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
@@ -443,6 +484,119 @@ static int __devexit virtio_mmio_remove(
 
 
 
+/* Devices list parameter */
+
+#if defined(CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES)
+static struct device cmdline_parent;
+static bool cmdline_parent_registered;
+static int cmdline_id __initdata;
+
+/* <size>@<baseaddr>:<irq> */
+static int set_cmdline_device(const char *device, const struct kernel_param *kp)
+{
+	int err;
+	struct resource resources[2];
+	unsigned long long val;
+	char *delim, *p;
+	struct platform_device *pdev;
+
+	delim = strchr(device, '@');
+	if (!delim)
+		return -EINVAL;
+
+	/* kstrtoull is strict, so we have to temporarily truncate. */
+	*delim = '\0';
+	err = kstrtoull(device, 0, &val);
+	*delim = '@';
+	if (err)
+		return err;
+
+	resources[0].flags = IORESOURCE_MEM;
+	resources[0].start = val;
+	resources[0].end = val + memparse(delim + 1, &p) - 1;
+
+	if (*p != ':')
+		return -EINVAL;
+
+	err = kstrtoull(p+1, 0, &val);
+	if (err)
+		return err;
+
+	resources[1].flags = IORESOURCE_IRQ;
+	resources[1].start = resources[1].end = val;
+
+	pr_info("Registering device virtio-mmio.%d at 0x%lx-0x%lx, IRQ %u.\n",
+		cmdline_id,
+		(long)resources[0].start, (long)resources[0].end,
+		(int)resources[1].start);
+
+	if (!cmdline_parent_registered) {
+		cmdline_parent.init_name = "virtio-mmio-cmdline";
+		err = device_register(&cmdline_parent);
+		if (err) {
+			pr_err("Failed to register parent device (%u)!\n", err);
+			return err;
+		}
+		cmdline_parent_registered = true;
+	}
+
+	pdev = platform_device_register_resndata(&cmdline_parent,
+						 "virtio-mmio",
+						 cmdline_id,
+						 resources,
+						 ARRAY_SIZE(resources),
+						 NULL, 0);
+	if (IS_ERR(pdev))
+		return PTR_ERR(pdev);
+	cmdline_id++;
+	return 0;
+}
+
+static int get_dev(struct device *dev, void *_buf)
+{
+	char *buf = _buf;
+	unsigned int len = strlen(buf);
+	struct platform_device *pdev = to_platform_device(dev);
+
+	snprintf(buf + len, PAGE_SIZE - len, "%llu@%llu:%llu\n",
+		 pdev->resource[0].end - pdev->resource[0].start + 1ULL,
+		 (unsigned long long)pdev->resource[0].start,
+		 (unsigned long long)pdev->resource[1].start);
+	return 0;
+}
+
+static int get_cmdline_device(char *buffer, const struct kernel_param *kp)
+{
+	buffer[0] = '\0';
+	device_for_each_child(&cmdline_parent, buffer, get_dev);
+	return strlen(buffer) + 1;
+}
+
+static struct kernel_param_ops cmdline_param_ops = {
+	.set = set_cmdline_device,
+	.get = get_cmdline_device,
+};
+
+module_param_cb(device, &cmdline_param_ops, NULL, 0400);
+
+static int unregister_cmdline_device(struct device *dev, void *data)
+{
+	platform_device_unregister(to_platform_device(dev));
+	return 0;
+}
+
+static void unregister_cmdline_devices(void)
+{
+	device_for_each_child(&cmdline_parent, NULL, unregister_cmdline_device);
+	if (cmdline_parent_registered)
+		device_unregister(&cmdline_parent);
+}
+#else /* ! CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES */
+static void unregister_cmdline_devices(void)
+{
+}
+#endif
+
 /* Platform driver */
 
 static struct of_device_id virtio_mmio_match[] = {
@@ -468,6 +622,7 @@ static int __init virtio_mmio_init(void)
 
 static void __exit virtio_mmio_exit(void)
 {
+	unregister_cmdline_devices();
 	platform_driver_unregister(&virtio_mmio_driver);
 }

^ permalink raw reply

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
From: Rusty Russell @ 2011-11-21  1:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization
In-Reply-To: <20111116071838.GE5433@redhat.com>

On Wed, 16 Nov 2011 09:18:38 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> My unlocked kick patches will trip this warning: they make
> virtio-net do add + get without kick.

Heh, it's a good sign if they do, since that means you're running really
well :)

> I think block with unlocked kick can trip it too:
> add, lock is dropped and then an interrupt can get.
> 
> We also don't need a kick each num - each 2^15 is enough.
> Why don't we do this at start of add_buf:
> if (vq->num_added >= 0x7fff)
> 	return -ENOSPC;

The warning was there in case a driver is never doing a kick, and
getting away with it (mostly) because the device is polling.  Let's not
penalize good drivers to catch bad ones.

How about we do this properly, like so:

From: Rusty Russell <rusty@rustcorp.com.au>
Subject: virtio: add debugging if driver doesn't kick.

Under the existing #ifdef DEBUG, check that they don't have more than
1/10 of a second between an add_buf() and a
virtqueue_notify()/virtqueue_kick_prepare() call.

We could get false positives on a really busy system, but good for
development.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_ring.c |   31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -22,6 +22,7 @@
 #include <linux/device.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/hrtimer.h>
 
 /* virtio guest is communicating with a virtual "device" that actually runs on
  * a host processor.  Memory barriers are used to control SMP effects. */
@@ -102,6 +103,10 @@ struct vring_virtqueue
 #ifdef DEBUG
 	/* They're supposed to lock for us. */
 	unsigned int in_use;
+
+	/* Figure out if their kicks are too delayed. */
+	bool last_add_time_valid;
+	ktime_t last_add_time;
 #endif
 
 	/* Tokens for callbacks. */
@@ -192,6 +197,19 @@ int virtqueue_add_buf(struct virtqueue *
 
 	BUG_ON(data == NULL);
 
+#ifdef DEBUG
+	{
+		ktime_t now = ktime_get();
+
+		/* No kick or get, with .1 second between?  Warn. */
+		if (vq->last_add_time_valid)
+			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
+					    > 100);
+		vq->last_add_time = now;
+		vq->last_add_time_valid = true;
+	}
+#endif
+
 	/* If the host supports indirect descriptor tables, and we have multiple
 	 * buffers, then go indirect. FIXME: tune this threshold */
 	if (vq->indirect && (out + in) > 1 && vq->num_free) {
@@ -291,6 +309,14 @@ bool virtqueue_kick_prepare(struct virtq
 	new = vq->vring.avail->idx;
 	vq->num_added = 0;
 
+#ifdef DEBUG
+	if (vq->last_add_time_valid) {
+		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
+					      vq->last_add_time)) > 100);
+	}
+	vq->last_add_time_valid = false;
+#endif
+
 	if (vq->event) {
 		needs_kick = vring_need_event(vring_avail_event(&vq->vring),
 					      new, old);
@@ -428,6 +454,10 @@ void *virtqueue_get_buf(struct virtqueue
 		virtio_mb();
 	}
 
+#ifdef DEBUG
+	vq->last_add_time_valid = false;
+#endif
+
 	END_USE(vq);
 	return ret;
 }
@@ -611,6 +641,7 @@ struct virtqueue *vring_new_virtqueue(un
 	list_add_tail(&vq->vq.list, &vdev->vqs);
 #ifdef DEBUG
 	vq->in_use = false;
+	vq->last_add_time_valid = false;
 #endif
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);

^ permalink raw reply

* Re: [RFC 1/5] virtio-pci: flexible configuration layout
From: Rusty Russell @ 2011-11-21  1:01 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel; +Cc: penberg, mingo, virtualization, kvm, mst
In-Reply-To: <1321314197-5265-2-git-send-email-levinsasha928@gmail.com>

On Tue, 15 Nov 2011 01:43:13 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> 
> Add a flexible mechanism to specify virtio configuration layout, using
> pci vendor-specific capability.  A separate capability is used for each
> of common, device specific and data-path accesses.

OK, a couple of minor suggestions:

> +static int virtio_pci_iomap(struct virtio_pci_device *vp_dev)
> +{
> +	vp_dev->isr_map = virtio_pci_map_cfg(vp_dev,
> +					     VIRTIO_PCI_CAP_ISR_CFG,
> +					     sizeof(u8));
> +	vp_dev->notify_map = virtio_pci_map_cfg(vp_dev,
> +						VIRTIO_PCI_CAP_NOTIFY_CFG,
> +						sizeof(u16));
> +	vp_dev->common_map = virtio_pci_map_cfg(vp_dev,
> +						VIRTIO_PCI_CAP_COMMON_CFG,
> +						sizeof(u32));
> +	vp_dev->device_map = virtio_pci_map_cfg(vp_dev,
> +						VIRTIO_PCI_CAP_DEVICE_CFG,
> +						sizeof(u8));
> +
> +	if (!vp_dev->notify_map || !vp_dev->common_map ||
> +	    !vp_dev->device_map) {
> +		/*
> +		 * If not all capabilities present, map legacy PIO.
> +		 * Legacy access is at BAR 0. We never need to map
> +		 * more than 256 bytes there, since legacy config space
> +		 * used PIO which has this size limit.
> +		 * */
> +		vp_dev->legacy_map = pci_iomap(vp_dev->pci_dev, 0, 256);
> +		if (!vp_dev->legacy_map) {
> +			dev_err(&vp_dev->vdev.dev, "Unable to map legacy PIO");
> +			goto err;
> +		}
> +	}

Please don't mix the two.  Ever, under any circumstances.  If it helps,
put CONFIG_VIRTIO_PCI_LEGACY #ifdefs in there:

	vp_dev->common_map = virtio_pci_map_cfg(vp_dev,
						VIRTIO_PCI_CAP_COMMON_CFG,
						sizeof(u32));
        /* Something horribly wrong? */
        if (IS_ERR(vp_dev->common_map)) {
                err = PTR_ERR(vp_dev->common_map);
                goto fail;
        }

        /* Not found? */
        if (!vp_dev->common_map)
                return legacy_setup(vp_dev, ...);

        ...

Practice has shown that if we allow something, it'll be done.  We should
break horribly on malformed devices, and really really only fall back
when we have a well-formed, but old, device.

And various other legacy paths can happily use:

#ifdef CONFIG_VIRTIO_PCI_LEGACY
#define is_legacy(vp_dev) ((vp_dev)->legacy_map != NULL)
#else
#define is_legacy(vp_dev) false
#endif

Here's suggested Kconfig entry:

config VIRTIO_PCI_LEGACY
	bool
        default y
	depends on VIRTIO_PCI
	---help---
	  Look out into your driveway.  Do you have a flying car?  If
          so, you can happily disable this option and virtio will not
          break.  Otherwise, leave it set.  Unless you're testing what
          life will be like in The Future.

Cheers,
Rusty.

^ permalink raw reply

* Re: [RFC] kvm tools: Implement multiple VQ for virtio-net
From: Rusty Russell @ 2011-11-21  0:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar, gorcunov, kvm, Asias He, virtualization,
	Pekka Enberg, Sasha Levin, netdev, mingo, Stephen Hemminger
In-Reply-To: <20111116072317.GG5433@redhat.com>

On Wed, 16 Nov 2011 09:23:17 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Nov 16, 2011 at 10:34:42AM +1030, Rusty Russell wrote:
> > On Mon, 14 Nov 2011 15:05:07 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Mon, Nov 14, 2011 at 02:25:17PM +0200, Pekka Enberg wrote:
> > > > On Mon, Nov 14, 2011 at 4:04 AM, Asias He <asias.hejun@gmail.com> wrote:
> > > > > Why both the bandwidth and latency performance are dropping so dramatically
> > > > > with multiple VQ?
> > > > 
> > > > What's the expected benefit from multiple VQs
> > > 
> > > Heh, the original patchset didn't mention this :) It really should.
> > > They are supposed to speed up networking for high smp guests.
> > 
> > If we have one queue per guest CPU, does this allow us to run lockless?
> > 
> > Thanks,
> > Rusty.
> 
> LLTX? It's supposed to be deprecated, isn't it?

I was referring back to "Subject: virtio net lockless ring" which
Stephen sent back in June, nothing more specific.

I assumed from his query that this was still an active area of
exploration...

Stephen?

Thanks,
Rusty.

^ permalink raw reply

* CFP: CCGrid 2012 deadline extension
From: Ioan Raicu @ 2011-11-20 14:52 UTC (permalink / raw)
  To: virtualization

****************

DEADLINE EXTENSION (CCGrid 2012)
Must submit paper by November 25, but can update submission until 
December 2, 2011.

****************

12th IEEE/ACM International Symposium on Cluster, Cloud and Grid

Computing (CCGrid 2012) Ottawa, Canada

May 13-16, 2012

http://www.cloudbus.org/ccgrid2012


CALL FOR PAPERS

Rapid advances in processing, communication and systems/middleware
technologies are leading to new paradigms and platforms for computing,
ranging from computing Clusters to widely distributed Grid and
emerging Clouds. CCGrid is a series of very successful conferences,
sponsored by the IEEE Computer Society Technical Committee on Scalable
Computing (TCSC) and ACM, with the overarching goal of bringing
together international researchers, developers, and users and to
provide an international forum to present leading research activities
and results on a broad range of topics related to these platforms and
paradigms and their applications. The conference features keynotes,
technical presentations, posters and research demos, workshops,
tutorials, as well as the SCALE challenges featuring live
demonstrations.

In 2012, CCGrid will come to Canada for the first time and will be
held in Ottawa, the capital city. CCGrid 2012 will have a focus on
important and immediate issues that are significantly influencing all
aspects of cluster, cloud and grid computing. Topics of interest
include, but are not limited to:

  * Applications and Experiences: Applications to real and complex
    problems in science, engineering, business and society; User
    studies; Experiences with large-scale deployments systems or
    applications.

  * Architecture: System architectures, Design and deployment.

  * Autonomic Computing and Cyberinfrastructure: Self managed behavior,
    models and technologies; Autonomic paradigms and approaches
    (control-based, bio-inspired, emergent, etc.); Bio-inspired
    approaches to management; SLA definition and enforcement.

  * Performance Modeling and Evaluation: Performance models; Monitoring
    and evaluation tools, Analysis of system/application performance;
    Benchmarks and testbeds.

  * Programming Models, Systems, and Fault-Tolerant Computing:
    Programming models for cluster, clouds and grid computing; fault
    tolerant infrastructure and algorithms; systems software to enable
    efficient computing.

  * Multicore and Accelerator-based Computing: Software and application
    techniques to utilize multicore architectures and
    accelerators/heterogeneous computing systems.

  * Scheduling and Resource Management: Techniques to schedule jobs and
    resources on clusters, clouds and grid computing platforms.

  * Cloud Computing: Cloud architectures; Software tools and techniques
    for clouds.


PAPER SUBMISSION

Authors are invited to submit papers electronically. Submitted
manuscripts should be structured as technical papers and may not
exceed 8 letter size (8.5 x 11) pages including figures, tables and
references using the IEEE format for conference proceedings:

http://www.computer.org/portal/web/cscps/formatting

Submissions not conforming to these guidelines may be
returned without review. Authors should submit the manuscript in PDF
format and make sure that the file will print on a printer that uses
letter size (8.5 x 11) paper. The official language of the meeting is
English. All manuscripts will be reviewed and will be judged on
correctness, originality, technical strength, significance, quality of
presentation, and interest and relevance to the conference attendees.

Submitted papers must represent original unpublished research that is
not currently under review for any other conference or journal. Papers
not following these guidelines will be rejected without review and
further action may be taken, including (but not limited to)
notifications sent to the heads of the institutions of the authors and
sponsors of the conference. Submissions received after the due date,
exceeding the page limit, or not appropriately structured may not be
considered. Authors may contact the conference chairs for more
information. The proceedings will be published through the IEEE
Computer Society Press, USA and will be made available online through
the IEEE Digital Library.


Submission Link:

https://www.easychair.org/account/signin.cgi?conf=ccgrid2012


JOURNAL SPECIAL ISSUE

Highly rated Top 6 papers from the CCGrid 2012 conference will be
invited to extend for publication in a special issue of the "Future
Generation Computer Systems (FGCS)" Journal published by
Elsevier Press.


CHAIRS

General Chair

  * Shikharesh Majumdar, Carleton University, Canada

Honorary Chair

  * Geoffrey Fox, Indiana University, USA

Program Committee Co-Chairs

  * Rajkumar Buyya, University of Melbourne, Australia
  * Pavan Balaji, Argonne National Laboratory, USA

Program Committee Vice-chairs

  * Daniel S. Katz (Applications and Experiences)
  * Dhabaleswar K. Panda (Architecture)
  * Manish Parashar (Middleware, Autonomic Computing, and 
Cyberinfrastructure)
  * Ahmad Afsahi (Performance Modeling and Analysis)
  * Xian-He Sun (Performance Measurement and Evaluation)
  * William Gropp (Programming Models, Systems, and Fault-Tolerant 
computing)
  * David Bader (Multicore and Accelerator-based Computing)
  * Thomas Fahringer (Scheduling and Resource Management)
  * Ignacio Martin Llorente and Madhusudhan Govindaraju (Cloud Computing)

Cyber Co-Chairs

  * Anton Beloglazov, The University of Melbourne, Australia
  * Suraj Pandey, CSIRO, Australia
  * Trevor Gelowsky, Carleton University, Canada

Workshops Co-Chairs

  * Marin Litiou, York University, Canada
  * Mukaddim Pathan, Telstra Corporation Limited, Australia

Publicity Chairs

  * Helen Karatza, Aristotle University of Thessaloniki, Greece
  * Ioan Raicu, Illinois Institute of Technology&  Argonne National 
Labs, USA
  * Bruno Schulze, National Laboratory for Scientific Computing, Brazil
  * G Subrahmanya VRK Rao: Cognizant technology Solutions, India

Tutorials Co-Chairs

  * Sushil K. Prasad, Georgia State University, USA
  * Rob Simmonds, Westgrid, Canada

Doctoral Symposium Co-Chairs

  * Carlos Varela, Rensselaer Polytechnic Institute, USA
  * Yogesh Simmhan, University of Southern California

Poster and Research Demo Co-Chairs

  * Suraj Pandey, CSIRO, Australia

SCALE Challenge Coordinator

  * Shantenu Jha, Rutgers and Loisiana State University

Steering Committee

  * Henri Bal, Vrije University, The Netherlands
  * Pavan Balaji, Argonne National Laboratory, USA
  * Rajkumar Buyya, University of Melbourne, Australia (Chair)
  * Franck Capello, University of Paris-Sud, France
  * Jack Dongarra, University of Tennessee&  ORNL, USA
  * Dick Epema, Technical University of Delft, The Netherlands
  * Thomas Fahringer, University of Innsbruck, Austria
  * Ian Foster, University of Chicago, USA
  * Wolfgang Gentzsch, DEISA, Germany
  * Hai Jin, Huazhong University of Science&  Technology, China
  * Craig Lee, The Aerospace Corporation, USA (Co-Chair)
  * Laurent Lefevre, INRIA, France
  * Geng Lin, Dell Inc., USA
  * Manish Parashar, Rutgers: The State University of New Jersey, USA
  * Shikharesh Majumdar, Carleton University, Canada
  * Satoshi Matsuoaka, Tokyo Institute of Technology, Japan
  * Omer Rana, Cardiff University, UK
  * Paul Roe, Queensland University of Technology, Australia
  * Bruno Schulze, LNCC, Brazil
  * Nalini Venkatasubramanian, University of California, USA
  * Carlos Varela, Rensselaer Polytechnic Institute, USA


IMPORTANT DATES

   Papers Due:             25 November 2011
   Papers Final Revisions:     02 December 2011
   Notification of Acceptance:     30 January 2012
   Camera Ready Papers Due:     27 February 2012
   Sponsors: IEEE Computer Society (TCSE)&  ACM SIGARCH (approval pending)

-- 
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel:    1-847-722-0876
Office: 1-312-567-5704
Email:  iraicu@cs.iit.edu
Web:    http://www.cs.iit.edu/~iraicu/
Web:    http://datasys.cs.iit.edu/
=================================================================
=================================================================

^ permalink raw reply

* RE: [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the staging area
From: KY Srinivasan @ 2011-11-18 23:28 UTC (permalink / raw)
  To: James Bottomley
  Cc: gregkh@suse.de, linux-kernel@vger.kernel.org,
	devel@linuxdriverproject.org, virtualization@lists.osdl.org,
	linux-scsi@vger.kernel.org, ohering@suse.com, hch@infradead.org
In-Reply-To: <1321635631.3022.44.camel@dabdike.int.hansenpartnership.com>



> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com]
> Sent: Friday, November 18, 2011 12:01 PM
> To: KY Srinivasan
> Cc: gregkh@suse.de; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; virtualization@lists.osdl.org; linux-
> scsi@vger.kernel.org; ohering@suse.com; hch@infradead.org
> Subject: RE: [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the
> staging area
> 
> On Fri, 2011-11-18 at 04:53 +0000, KY Srinivasan wrote:
> > > I still think you need to disable clustering and junk the bvec merge
> > > function.  Your object seems to be to accumulate in page size multiples
> > > (and not aggregate over this) ... that's what clustering is designed to
> > > do.
> >
> > As part of addressing your first round of comments, I experimented with your
> > suggestions and I could not get rid of the code that does the bounce buffer
> handling.
> > I could generate I/O patterns that would require bounce buffer handling with
> your
> > suggestions in place.
> 
> There are two separate things here.  One is the scatter gather list
> bounce buffer processing, which is a bit nasty, but if you want to use <
> 4kb block size filesystems, I suppose I have to accept that you need it
> (although since it's a global fault of the hypervisor, perhaps it should
> sit in the hypervisor interface layer, so that every SG based transport
> can use it).  The other is the biovec merge function addition.  The
> purpose of this function seems to be to ensure you get one SG entry for
> every page (if I read it right)?  In that case, you don't need any of
> this special casing.  The flag DISABLE_CLUSTERING achieves the same
> thing.

Thanks James. I will keep you posted on my progress on this issue.

Regards,

K. Y


^ permalink raw reply

* Re: [RFC] [ver3 PATCH 3/6] virtio_net: virtio_net driver changes
From: Ben Hutchings @ 2011-11-18 17:14 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Krishna Kumar, kvm, mst, netdev, virtualization, davem
In-Reply-To: <1321633081.2974.1.camel@sasha>

On Fri, 2011-11-18 at 18:18 +0200, Sasha Levin wrote:
> On Fri, 2011-11-18 at 15:40 +0000, Ben Hutchings wrote:
> > On Fri, 2011-11-18 at 08:24 +0200, Sasha Levin wrote:
> > > On Fri, 2011-11-18 at 01:08 +0000, Ben Hutchings wrote:
> > > > On Fri, 2011-11-11 at 18:34 +0530, Krishna Kumar wrote:
> > > > > Changes for multiqueue virtio_net driver.
> > > > [...]
> > > > > @@ -677,25 +730,35 @@ static struct rtnl_link_stats64 *virtnet
> > > > >  {
> > > > >  	struct virtnet_info *vi = netdev_priv(dev);
> > > > >  	int cpu;
> > > > > -	unsigned int start;
> > > > >  
> > > > >  	for_each_possible_cpu(cpu) {
> > > > > -		struct virtnet_stats __percpu *stats
> > > > > -			= per_cpu_ptr(vi->stats, cpu);
> > > > > -		u64 tpackets, tbytes, rpackets, rbytes;
> > > > > -
> > > > > -		do {
> > > > > -			start = u64_stats_fetch_begin(&stats->syncp);
> > > > > -			tpackets = stats->tx_packets;
> > > > > -			tbytes   = stats->tx_bytes;
> > > > > -			rpackets = stats->rx_packets;
> > > > > -			rbytes   = stats->rx_bytes;
> > > > > -		} while (u64_stats_fetch_retry(&stats->syncp, start));
> > > > > -
> > > > > -		tot->rx_packets += rpackets;
> > > > > -		tot->tx_packets += tpackets;
> > > > > -		tot->rx_bytes   += rbytes;
> > > > > -		tot->tx_bytes   += tbytes;
> > > > > +		int qpair;
> > > > > +
> > > > > +		for (qpair = 0; qpair < vi->num_queue_pairs; qpair++) {
> > > > > +			struct virtnet_send_stats __percpu *tx_stat;
> > > > > +			struct virtnet_recv_stats __percpu *rx_stat;
> > > > 
> > > > While you're at it, you can drop the per-CPU stats and make them only
> > > > per-queue.  There is unlikely to be any benefit in maintaining them
> > > > per-CPU while receive and transmit processing is serialised per-queue.
> > > 
> > > It allows you to update stats without a lock.
> > 
> > But you'll already be holding a lock related to the queue.
> 
> Right, but now you're holding a queue lock just when playing with the
> queue, we don't hold it when we process the data - which is when we
> usually need to update stats.
[...]

The *stack* is holding the appropriate lock when calling the NAPI poll
function or ndo_start_xmit function.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* RE: [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the staging area
From: James Bottomley @ 2011-11-18 17:00 UTC (permalink / raw)
  To: KY Srinivasan
  Cc: gregkh@suse.de, linux-kernel@vger.kernel.org,
	devel@linuxdriverproject.org, virtualization@lists.osdl.org,
	linux-scsi@vger.kernel.org, ohering@suse.com, hch@infradead.org
In-Reply-To: <6E21E5352C11B742B20C142EB499E0481AA6D232@TK5EX14MBXC128.redmond.corp.microsoft.com>

On Fri, 2011-11-18 at 04:53 +0000, KY Srinivasan wrote:
> > I still think you need to disable clustering and junk the bvec merge
> > function.  Your object seems to be to accumulate in page size multiples
> > (and not aggregate over this) ... that's what clustering is designed to
> > do.
> 
> As part of addressing your first round of comments, I experimented with your
> suggestions and I could not get rid of the code that does the bounce buffer handling.
> I could generate I/O patterns that would require bounce buffer handling with your
> suggestions in place.

There are two separate things here.  One is the scatter gather list
bounce buffer processing, which is a bit nasty, but if you want to use <
4kb block size filesystems, I suppose I have to accept that you need it
(although since it's a global fault of the hypervisor, perhaps it should
sit in the hypervisor interface layer, so that every SG based transport
can use it).  The other is the biovec merge function addition.  The
purpose of this function seems to be to ensure you get one SG entry for
every page (if I read it right)?  In that case, you don't need any of
this special casing.  The flag DISABLE_CLUSTERING achieves the same
thing.

James

^ permalink raw reply

* Re: [RFC] [ver3 PATCH 3/6] virtio_net: virtio_net driver changes
From: Sasha Levin @ 2011-11-18 16:18 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Krishna Kumar, kvm, mst, netdev, virtualization, davem
In-Reply-To: <1321630839.2885.117.camel@deadeye>

On Fri, 2011-11-18 at 15:40 +0000, Ben Hutchings wrote:
> On Fri, 2011-11-18 at 08:24 +0200, Sasha Levin wrote:
> > On Fri, 2011-11-18 at 01:08 +0000, Ben Hutchings wrote:
> > > On Fri, 2011-11-11 at 18:34 +0530, Krishna Kumar wrote:
> > > > Changes for multiqueue virtio_net driver.
> > > [...]
> > > > @@ -677,25 +730,35 @@ static struct rtnl_link_stats64 *virtnet
> > > >  {
> > > >  	struct virtnet_info *vi = netdev_priv(dev);
> > > >  	int cpu;
> > > > -	unsigned int start;
> > > >  
> > > >  	for_each_possible_cpu(cpu) {
> > > > -		struct virtnet_stats __percpu *stats
> > > > -			= per_cpu_ptr(vi->stats, cpu);
> > > > -		u64 tpackets, tbytes, rpackets, rbytes;
> > > > -
> > > > -		do {
> > > > -			start = u64_stats_fetch_begin(&stats->syncp);
> > > > -			tpackets = stats->tx_packets;
> > > > -			tbytes   = stats->tx_bytes;
> > > > -			rpackets = stats->rx_packets;
> > > > -			rbytes   = stats->rx_bytes;
> > > > -		} while (u64_stats_fetch_retry(&stats->syncp, start));
> > > > -
> > > > -		tot->rx_packets += rpackets;
> > > > -		tot->tx_packets += tpackets;
> > > > -		tot->rx_bytes   += rbytes;
> > > > -		tot->tx_bytes   += tbytes;
> > > > +		int qpair;
> > > > +
> > > > +		for (qpair = 0; qpair < vi->num_queue_pairs; qpair++) {
> > > > +			struct virtnet_send_stats __percpu *tx_stat;
> > > > +			struct virtnet_recv_stats __percpu *rx_stat;
> > > 
> > > While you're at it, you can drop the per-CPU stats and make them only
> > > per-queue.  There is unlikely to be any benefit in maintaining them
> > > per-CPU while receive and transmit processing is serialised per-queue.
> > 
> > It allows you to update stats without a lock.
> 
> But you'll already be holding a lock related to the queue.

Right, but now you're holding a queue lock just when playing with the
queue, we don't hold it when we process the data - which is when we
usually need to update stats.

> > Whats the benefit of having them per queue?
> 
> It should save some memory (and a little time when summing stats, though
> that's unlikely to matter much).
> 
> The important thing is that splitting up stats per-CPU *and* per-queue
> is a waste.
> 
> Ben.
> 


-- 

Sasha.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox