From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1Nk1x7-0007Sr-L8
	for qemu-devel@nongnu.org; Tue, 23 Feb 2010 16:05:57 -0500
Received: from [199.232.76.173] (port=38732 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Nk1x6-0007S3-SQ
	for qemu-devel@nongnu.org; Tue, 23 Feb 2010 16:05:57 -0500
Received: from Debian-exim by monty-python.gnu.org with spam-scanned
	(Exim 4.60) (envelope-from ) id 1Nk1x2-00029o-Fv
	for qemu-devel@nongnu.org; Tue, 23 Feb 2010 16:05:56 -0500
Received: from mail-vw0-f45.google.com ([209.85.212.45]:43191)
	by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from )
	id 1Nk1x1-00029b-V3 for qemu-devel@nongnu.org;
	Tue, 23 Feb 2010 16:05:52 -0500
Received: by vws11 with SMTP id 11so575379vws.4
	for ; Tue, 23 Feb 2010 13:05:51 -0800 (PST)
Message-ID: <4B84432C.6090104@codemonkey.ws>
Date: Tue, 23 Feb 2010 15:05:48 -0600
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH] VirtIO Support for Memory Regions
References: <20100223205225.14860.50990.stgit@st-brides.cs.ualberta.ca>
In-Reply-To: <20100223205225.14860.50990.stgit@st-brides.cs.ualberta.ca>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Cam Macdonell
Cc: qemu-devel@nongnu.org

Hi Cam,

On 02/23/2010 02:52 PM, Cam Macdonell wrote:
> Support for passing memory regions via VirtIO to remove need for PCI
> support in the guest.
>
> Adds new vectors to VirtIO config space to pass memory regions similar
> to how virtqueues are passed.
>
> Kernel patch is forthcoming that add device_ops to access the memory regions.
>
> I have used this mechanism to implement my host shared memory implementation
> and modified Alex's Virtio FB to use it as well.
>

Virtio is really a DMA engine.
One of the nice things about its design is that you can do things like
transparent bounce buffering if needed.  Adding a mechanism like this
breaks this abstraction and turns virtio into something that's more
than I think it should be.

For guest shared memory, what makes sense to me is to make use of
uio_pci within a guest to map BARs in userspace.

The device can have the first BAR map a small MMIO region that is tied
to an ioeventfd with KVM.  With just qemu, it can be tied to a normal
eventfd.

You can use irqfd with KVM to tie an eventfd to the PCI irq.  Likewise,
you can use a normal eventfd to emulate that.

I'd suggest using a /dev/shm file to act as the shared memory segment.
You'll need a little magic within qemu_ram_alloc() to use this (maybe
qemu_ram_alloc() should take a context parameter so this can be set up).

You'll need to set up the backend with the name of the shared memory
segment and the two eventfds.  Also, a user needs to specify a PCI
vendor/device id.  It's probably worth having some kind of identifier
tag in the PCI device's first MMIO region.

I think there are a couple of important characteristics of an
implementation like this.  By using uio_pci, you can implement shared
memory in userspace, but you can also implement a proper kernel driver
if you so desire for a specific device.

Regards,

Anthony Liguori

> Comments are welcome,
>
> Thanks,
> Cam
> ---
>  hw/virtio-pci.c |   61 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>  hw/virtio.c     |   38 ++++++++++++++++++++++++++++++++++
>  hw/virtio.h     |   22 ++++++++++++++++++++
>  3 files changed, 117 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index f3373ae..e71bf64 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -55,13 +55,13 @@
>
>  /* MSI-X registers: only enabled if MSI-X is enabled. */
>  /* A 16-bit vector for configuration changes.
>  */
> -#define VIRTIO_MSI_CONFIG_VECTOR 20
> +#define VIRTIO_MSI_CONFIG_VECTOR 32
>  /* A 16-bit vector for selected queue notifications. */
> -#define VIRTIO_MSI_QUEUE_VECTOR 22
> +#define VIRTIO_MSI_QUEUE_VECTOR 34
>
>  /* Config space size */
> -#define VIRTIO_PCI_CONFIG_NOMSI 20
> -#define VIRTIO_PCI_CONFIG_MSI 24
> +#define VIRTIO_PCI_CONFIG_NOMSI 32
> +#define VIRTIO_PCI_CONFIG_MSI 36
>  #define VIRTIO_PCI_REGION_SIZE(dev) (msix_present(dev) ? \
>                                       VIRTIO_PCI_CONFIG_MSI : \
>                                       VIRTIO_PCI_CONFIG_NOMSI)
> @@ -72,6 +72,12 @@
>                                       VIRTIO_PCI_CONFIG_MSI : \
>                                       VIRTIO_PCI_CONFIG_NOMSI)
>
> +#define VIRTIO_PCI_MEM_SEL 20
> +
> +#define VIRTIO_PCI_MEM_SIZE 24
> +
> +#define VIRTIO_PCI_MEM_PFN 28
> +
>  /* Virtio ABI version, if we increment this, we break the guest driver. */
>  #define VIRTIO_PCI_ABI_VERSION 0
>
> @@ -227,6 +233,11 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>              val = VIRTIO_NO_VECTOR;
>          virtio_queue_set_vector(vdev, vdev->queue_sel, val);
>          break;
> +    case VIRTIO_PCI_MEM_SEL:
> +        fprintf(stderr, "%s: VIRTIO_PCI_MEM_SEL %d\n", __func__, val);
> +        if (val < VIRTIO_PCI_MEM_MAX)
> +            vdev->mem_sel = val;
> +        break;
>      default:
>          fprintf(stderr, "%s: unexpected address 0x%x value 0x%x\n",
>                  __func__, addr, val);
> @@ -271,6 +282,14 @@ static uint32_t virtio_ioport_read(VirtIOPCIProxy *proxy, uint32_t addr)
>      case VIRTIO_MSI_QUEUE_VECTOR:
>          ret = virtio_queue_vector(vdev, vdev->queue_sel);
>          break;
> +    case VIRTIO_PCI_MEM_SIZE:
> +        ret = virtio_mem_get_size(vdev, vdev->mem_sel);
> +        break;
> +    case VIRTIO_PCI_MEM_PFN:
> +        ret = virtio_mem_get_addr(vdev, vdev->mem_sel)
> +              >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
> +        break;
> +
>      default:
>          break;
>      }
> @@ -370,6 +389,30 @@ static void virtio_map(PCIDevice *pci_dev, int region_num,
>          vdev->get_config(vdev, vdev->config);
>  }
>
> +
> +static void virtio_setup_mem(PCIDevice *d, int region_num,
> +                             pcibus_t addr, pcibus_t size, int type)
> +{
> +    VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy,
> +                                      pci_dev, d);
> +    VirtIODevice *vdev;
> +
> +    printf("XXX setmem to %#x - %#x\n", (uint32_t)addr, (uint32_t)(addr + size));
> +    vdev = proxy->vdev;
> +
> +    cpu_register_physical_memory(addr, size, IO_MEM_UNASSIGNED);
> +    cpu_register_physical_memory(addr, size, vdev->vmem[region_num-2].ram_addr | IO_MEM_RAM);
> +
> +    /* store the address of the region and get a pointer for it
> +     * that can access the mem region */
> +    vdev->vmem[region_num-2].guest_phys_addr = (target_phys_addr_t)addr;
> +
> +    if (vdev->vmem[region_num-2].on_map) {
> +        vdev->vmem[region_num-2].on_map(vdev, addr, size);
> +    }
> +    return;
> +
> +}
> +
>  static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
>                                  uint32_t val, int len)
>  {
> @@ -406,6 +449,7 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
>  {
>      uint8_t *config;
>      uint32_t size;
> +    int i;
>
>      proxy->vdev = vdev;
>
> @@ -426,6 +470,15 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
>
>      config[0x3d] = 1;
>
> +    // only BARs that were added with virtio_add_memory will be allocated here
> +    for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
> +        if (vdev->vmem[i].size > 0) {
> +            pci_register_bar(&proxy->pci_dev, 2 + i, vdev->vmem[i].size,
> +                             PCI_BASE_ADDRESS_SPACE_MEMORY, virtio_setup_mem);
> +        }
> +    }
> +
> +
>      if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors, 1, 0)) {
>          pci_register_bar(&proxy->pci_dev, 1,
>                           msix_bar_size(&proxy->pci_dev),
> diff --git a/hw/virtio.c b/hw/virtio.c
> index 7c020a3..1f7924e 100644
> --- a/hw/virtio.c
> +++ b/hw/virtio.c
> @@ -549,6 +549,16 @@ target_phys_addr_t virtio_queue_get_addr(VirtIODevice *vdev, int n)
>      return vdev->vq[n].pa;
>  }
>
> +target_phys_addr_t virtio_mem_get_addr(VirtIODevice *vdev, int n)
> +{
> +    return vdev->vmem[n].guest_phys_addr;
> +}
> +
> +int virtio_mem_get_size(VirtIODevice *vdev, int n)
> +{
> +    return vdev->vmem[n].size;
> +}
> +
>  int virtio_queue_get_num(VirtIODevice *vdev, int n)
>  {
>      return
>      vdev->vq[n].vring.num;
> @@ -592,6 +602,33 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>      return &vdev->vq[i];
>  }
>
> +VirtMem *virtio_add_memory(VirtIODevice *vdev, int mem_size,
> +                           void (*handle_access)(VirtIODevice *,
> +                                                 target_phys_addr_t, uint32_t))
> +{
> +    int i;
> +
> +    /* trace what virtio_fb_setmem does */
> +
> +    for (i = 0; i < VIRTIO_PCI_MEM_MAX; i++) {
> +        if (vdev->vmem[i].size == 0) {
> +            /* XXX: check that memory is a power of 2 */
> +            vdev->vmem[i].size = mem_size * 1024 * 1024; /* presume size is in MBs */
> +            vdev->vmem[i].ram_addr = qemu_ram_alloc(vdev->vmem[i].size);
> +            vdev->vmem[i].host_ptr = qemu_get_ram_ptr(vdev->vmem[i].ram_addr);
> +            vdev->vmem[i].on_map = handle_access;
> +            break;
> +        }
> +    }
> +
> +    return &vdev->vmem[i];
> +}
> +
> +void * virtio_get_memory_ptr(VirtMem * vmem)
> +{
> +    return vmem->host_ptr;
> +}
> +
>  void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
>  {
>      /* Always notify when queue is empty (when feature acknowledge) */
> @@ -714,6 +751,7 @@ VirtIODevice *virtio_common_init(const char *name, uint16_t device_id,
>      vdev->queue_sel = 0;
>      vdev->config_vector = VIRTIO_NO_VECTOR;
>      vdev->vq = qemu_mallocz(sizeof(VirtQueue) * VIRTIO_PCI_QUEUE_MAX);
> +    vdev->vmem = qemu_mallocz(sizeof(VirtMem) * VIRTIO_PCI_MEM_MAX);
>      for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++)
>          vdev->vq[i].vector = VIRTIO_NO_VECTOR;
>
> diff --git a/hw/virtio.h b/hw/virtio.h
> index 3baa2a3..02beb27 100644
> --- a/hw/virtio.h
> +++ b/hw/virtio.h
> @@ -82,6 +82,15 @@ typedef struct VirtQueueElement
>      struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
>  } VirtQueueElement;
>
> +typedef struct VirtMem
> +{
> +    target_phys_addr_t guest_phys_addr;
> +    ram_addr_t ram_addr;
> +    int size;
> +    uint8_t * host_ptr;
> +    void (*on_map)(VirtIODevice *vdev, target_phys_addr_t, uint32_t);
> +} VirtMem;
> +
>  typedef struct {
>      void (*notify)(void * opaque, uint16_t vector);
>      void (*save_config)(void * opaque, QEMUFile *f);
> @@ -93,6 +102,8 @@ typedef
>  struct {
>
>  #define VIRTIO_PCI_QUEUE_MAX 64
>
> +#define VIRTIO_PCI_MEM_MAX 16
> +
>  #define VIRTIO_NO_VECTOR 0xffff
>
>  struct VirtIODevice
> @@ -101,6 +112,7 @@ struct VirtIODevice
>      uint8_t status;
>      uint8_t isr;
>      uint16_t queue_sel;
> +    uint16_t mem_sel;
>      uint32_t guest_features;
>      size_t config_len;
>      void *config;
> @@ -113,6 +125,7 @@ struct VirtIODevice
>      void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
>      void (*reset)(VirtIODevice *vdev);
>      VirtQueue *vq;
> +    VirtMem * vmem;
>      const VirtIOBindings *binding;
>      void *binding_opaque;
>      uint16_t device_id;
> @@ -122,6 +135,12 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>                              void (*handle_output)(VirtIODevice *,
>                                                    VirtQueue *));
>
> +void * virtio_get_memory_ptr(VirtMem * vmem);
> +VirtMem *virtio_add_memory(VirtIODevice *vdev, int mem_size,
> +                           void (*handle_access)(VirtIODevice *,
> +                                                 target_phys_addr_t,
> +                                                 uint32_t));
> +
>  void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>                      unsigned int len);
>  void virtqueue_flush(VirtQueue *vq, unsigned int count);
> @@ -169,6 +188,9 @@ void virtio_update_irq(VirtIODevice *vdev);
>  void virtio_bind_device(VirtIODevice *vdev, const VirtIOBindings *binding,
>                          void *opaque);
>
> +target_phys_addr_t virtio_mem_get_addr(VirtIODevice *vdev, int n);
> +int virtio_mem_get_size(VirtIODevice *vdev, int n);
> +
>  /* Base devices. */
>  VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf);
>  VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf);
>
>