From: Cam Macdonell
Subject: Re: [PATCH v5 4/5] Inter-VM shared memory PCI device
Date: Mon, 10 May 2010 09:22:53 -0600
To: Avi Kivity
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org

On Mon, May 10, 2010 at 5:59 AM, Avi Kivity wrote:
> On 04/21/2010 08:53 PM, Cam Macdonell wrote:
>>
>> Support an inter-vm shared memory device that maps a shared-memory object
>> as a PCI device in the guest.  This patch also supports interrupts between
>> guests by communicating over a unix domain socket.  This patch applies to
>> the qemu-kvm repository.
>>
>>     -device ivshmem,size=[,shm=]
>>
>> Interrupts are supported between multiple VMs by using a shared memory
>> server over a chardev socket.
>>
>>     -device ivshmem,size=[,shm=]
>>                     [,chardev=][,msi=on][,irqfd=on][,vectors=n]
>>     -chardev socket,path=,id=
>>
>> (shared memory server is qemu.git/contrib/ivshmem-server)
>>
>> Sample programs and init scripts are in a git repo here:
>>
>>
>> +typedef struct EventfdEntry {
>> +    PCIDevice *pdev;
>> +    int vector;
>> +} EventfdEntry;
>> +
>> +typedef struct IVShmemState {
>> +    PCIDevice dev;
>> +    uint32_t intrmask;
>> +    uint32_t intrstatus;
>> +    uint32_t doorbell;
>> +
>> +    CharDriverState * chr;
>> +    CharDriverState ** eventfd_chr;
>> +    int ivshmem_mmio_io_addr;
>> +
>> +    pcibus_t mmio_addr;
>> +    unsigned long ivshmem_offset;
>> +    uint64_t ivshmem_size; /* size of shared memory region */
>> +    int shm_fd; /* shared memory file descriptor */
>> +
>> +    int nr_allocated_vms;
>> +    /* array of eventfds for each guest */
>> +    int ** eventfds;
>> +    /* keep track of # of eventfds for each guest*/
>> +    int * eventfds_posn_count;
>>
>
> More readable:
>
>   typedef struct Peer {
>       int nb_eventfds;
>       int *eventfds;
>   } Peer;
>   int nb_peers;
>   Peer *peers;
>
> Does eventfd_chr need to be there as well?
>
>> +
>> +    int nr_alloc_guests;
>> +    int vm_id;
>> +    int num_eventfds;
>> +    uint32_t vectors;
>> +    uint32_t features;
>> +    EventfdEntry *eventfd_table;
>> +
>> +    char * shmobj;
>> +    char * sizearg;
>>
>
> Does this need to be part of the state?
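A minimal sketch of the Peer layout suggested above, written as standalone C outside QEMU. Plain realloc()/calloc() stand in for qemu_realloc()/qemu_mallocz(), and PeerTable, peer_table_reserve() and peer_add_eventfd() are hypothetical names, not part of the patch. Keeping each peer's count next to its array means one zeroing pass initialises both when the table grows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Per-peer eventfd bookkeeping, following the layout suggested in review. */
typedef struct Peer {
    int nb_eventfds;   /* number of valid entries in eventfds */
    int *eventfds;     /* one slot per vector; NULL until the first fd arrives */
} Peer;

/* Hypothetical container standing in for the relevant IVShmemState fields. */
typedef struct PeerTable {
    int nb_peers;      /* number of allocated Peer slots */
    Peer *peers;
} PeerTable;

/* Make sure peers[posn] exists.  New slots are zeroed, so a fresh peer has
 * nb_eventfds == 0 and eventfds == NULL.  Returns 0, or -1 on OOM. */
static int peer_table_reserve(PeerTable *t, int posn)
{
    int new_size;
    Peer *p;

    assert(posn >= 0);
    if (posn < t->nb_peers) {
        return 0;
    }
    new_size = t->nb_peers ? t->nb_peers : 1;
    while (new_size <= posn) {          /* need posn + 1 slots */
        new_size *= 2;
    }
    p = realloc(t->peers, (size_t)new_size * sizeof(Peer));
    if (p == NULL) {
        return -1;
    }
    memset(p + t->nb_peers, 0, (size_t)(new_size - t->nb_peers) * sizeof(Peer));
    t->peers = p;
    t->nb_peers = new_size;
    return 0;
}

/* Append an eventfd for peer posn, allowing at most 'vectors' per peer. */
static int peer_add_eventfd(PeerTable *t, int posn, int fd, int vectors)
{
    Peer *peer;

    if (peer_table_reserve(t, posn) < 0) {
        return -1;
    }
    peer = &t->peers[posn];
    if (peer->eventfds == NULL) {
        peer->eventfds = calloc((size_t)vectors, sizeof(int));
        if (peer->eventfds == NULL) {
            return -1;
        }
    }
    if (peer->nb_eventfds >= vectors) {
        return -1;                      /* more fds than MSI vectors */
    }
    peer->eventfds[peer->nb_eventfds++] = fd;
    return 0;
}
```

The growth loop runs until the table holds posn + 1 entries, so a position equal to the current table size is covered as well.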
>
>> +} IVShmemState;
>> +
>> +/* registers for the Inter-VM shared memory device */
>> +enum ivshmem_registers {
>> +    IntrMask = 0,
>> +    IntrStatus = 4,
>> +    IVPosition = 8,
>> +    Doorbell = 12,
>> +};
>> +
>> +static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, int feature) {
>> +    return (ivs->features & (1 << feature));
>> +}
>> +
>> +static inline int is_power_of_two(int x) {
>> +    return (x & (x-1)) == 0;
>> +}
>>
>
> The argument needs to be uint64_t to avoid overflow with large BARs.  The
> return type can be bool.
>
>> +static void ivshmem_io_writel(void *opaque, uint8_t addr, uint32_t val)
>> +{
>> +    IVShmemState *s = opaque;
>> +
>> +    u_int64_t write_one = 1;
>> +    u_int16_t dest = val >> 16;
>> +    u_int16_t vector = val & 0xff;
>> +
>> +    addr &= 0xfe;
>>
>
> Why 0xfe?  I can understand 0xfc or 0xff.
>
>> +
>> +    switch (addr)
>> +    {
>> +        case IntrMask:
>> +            ivshmem_IntrMask_write(s, val);
>> +            break;
>> +
>> +        case IntrStatus:
>> +            ivshmem_IntrStatus_write(s, val);
>> +            break;
>> +
>> +        case Doorbell:
>> +            /* check doorbell range */
>> +            if ((vector >= 0) && (vector < s->eventfds_posn_count[dest])) {
>>
>
> What if dest is too big?  We overflow s->eventfds_posn_count.
>>
>> +
>> +static void close_guest_eventfds(IVShmemState *s, int posn)
>> +{
>> +    int i, guest_curr_max;
>> +
>> +    guest_curr_max = s->eventfds_posn_count[posn];
>> +
>> +    for (i = 0; i < guest_curr_max; i++)
>> +        close(s->eventfds[posn][i]);
>> +
>> +    free(s->eventfds[posn]);
>>
>
> qemu_free().
>
>> +/* this function increase the dynamic storage need to store data about other
>> + * guests */
>> +static void increase_dynamic_storage(IVShmemState *s, int new_min_size) {
>> +
>> +    int j, old_nr_alloc;
>> +
>> +    old_nr_alloc = s->nr_alloc_guests;
>> +
>> +    while (s->nr_alloc_guests < new_min_size)
>> +        s->nr_alloc_guests = s->nr_alloc_guests * 2;
>> +
>> +    IVSHMEM_DPRINTF("bumping storage to %d guests\n", s->nr_alloc_guests);
>> +    s->eventfds = qemu_realloc(s->eventfds, s->nr_alloc_guests *
>> +                                                        sizeof(int *));
>> +    s->eventfds_posn_count = qemu_realloc(s->eventfds_posn_count,
>> +                                                    s->nr_alloc_guests *
>> +                                                    sizeof(int));
>> +    s->eventfd_table = qemu_realloc(s->eventfd_table, s->nr_alloc_guests *
>> +                                                    sizeof(EventfdEntry));
>> +
>> +    if ((s->eventfds == NULL) || (s->eventfds_posn_count == NULL) ||
>> +            (s->eventfd_table == NULL)) {
>> +        fprintf(stderr, "Allocation error - exiting\n");
>> +        exit(1);
>> +    }
>> +
>> +    if (!ivshmem_has_feature(s, IVSHMEM_IRQFD)) {
>> +        s->eventfd_chr = (CharDriverState **)qemu_realloc(s->eventfd_chr,
>> +                                    s->nr_alloc_guests * sizeof(void *));
>> +        if (s->eventfd_chr == NULL) {
>> +            fprintf(stderr, "Allocation error - exiting\n");
>> +            exit(1);
>> +        }
>> +    }
>> +
>> +    /* zero out new pointers */
>> +    for (j = old_nr_alloc; j < s->nr_alloc_guests; j++) {
>> +        s->eventfds[j] = NULL;
>>
>
> eventfds_posn_count and eventfd_table want zeroing as well.
>
>> +    }
>> +}
>> +
>> +static void ivshmem_read(void *opaque, const uint8_t * buf, int flags)
>> +{
>> +    IVShmemState *s = opaque;
>> +    int incoming_fd, tmp_fd;
>> +    int guest_curr_max;
>> +    long incoming_posn;
>> +
>> +    memcpy(&incoming_posn, buf, sizeof(long));
>> +    /* pick off s->chr->msgfd and store it, posn should accompany msg */
>> +    tmp_fd = qemu_chr_get_msgfd(s->chr);
>> +    IVSHMEM_DPRINTF("posn is %ld, fd is %d\n", incoming_posn, tmp_fd);
>> +
>> +    /* make sure we have enough space for this guest */
>> +    if (incoming_posn >= s->nr_alloc_guests) {
>> +        increase_dynamic_storage(s, incoming_posn);
>> +    }
>> +
>> +    if (tmp_fd == -1) {
>> +        /* if posn is positive and unseen before then this is our posn*/
>> +        if ((incoming_posn >= 0) && (s->eventfds[incoming_posn] == NULL)) {
>> +            /* receive our posn */
>> +            s->vm_id = incoming_posn;
>> +            return;
>> +        } else {
>> +            /* otherwise an fd == -1 means an existing guest has gone away */
>> +            IVSHMEM_DPRINTF("posn %ld has gone away\n", incoming_posn);
>> +            close_guest_eventfds(s, incoming_posn);
>> +            return;
>> +        }
>> +    }
>> +
>> +    /* because of the implementation of get_msgfd, we need a dup */
>> +    incoming_fd = dup(tmp_fd);
>>
>
> Error check.
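For the two points above (zeroing the grown arrays and checking dup()), a sketch under the same assumptions: plain libc calls replace QEMU's wrappers, and grow_zeroed() and dup_incoming_fd() are hypothetical helper names. realloc() leaves the newly added tail uninitialised, so eventfds_posn_count needs an explicit memset after growing; eventfd_table would need the same treatment with sizeof(EventfdEntry).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Grow an int array from old_n to new_n elements and zero the new tail,
 * which realloc() leaves uninitialised.  Returns NULL on failure, in which
 * case the original array is still valid. */
static int *grow_zeroed(int *arr, size_t old_n, size_t new_n)
{
    int *p;

    assert(arr != NULL || old_n == 0);
    p = realloc(arr, new_n * sizeof(*p));
    if (p != NULL && new_n > old_n) {
        memset(p + old_n, 0, (new_n - old_n) * sizeof(*p));
    }
    return p;
}

/* Duplicate the fd handed over by the chardev, reporting failure instead
 * of silently storing -1 in the eventfd table.  Returns the new fd, or -1
 * with errno set. */
static int dup_incoming_fd(int tmp_fd)
{
    int fd;

    if (tmp_fd < 0) {
        errno = EBADF;
        return -1;
    }
    fd = dup(tmp_fd);
    if (fd == -1) {
        fprintf(stderr, "ivshmem: dup(%d) failed: %s\n",
                tmp_fd, strerror(errno));
    }
    return fd;
}
```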
>
>> +
>> +    /* if the position is -1, then it's shared memory region fd */
>> +    if (incoming_posn == -1) {
>> +
>> +        s->num_eventfds = 0;
>> +
>> +        if (check_shm_size(s, incoming_fd) == -1) {
>> +            exit(-1);
>> +        }
>> +
>> +        /* creating a BAR in qemu_chr callback may be crazy */
>> +        create_shared_memory_BAR(s, incoming_fd);
>>
>
> It probably is... why can't you create it during initialization?

This is for the shared memory server implementation, so the fd for the
shared memory has to be received (over the qemu char device) from the
server before the BAR can be created via qemu_ram_mmap(), which adds the
necessary memory.

Otherwise, if the BAR is allocated during initialization, I would have to
use MAP_FIXED to mmap the memory.  This is what I did before the
qemu_ram_mmap() function was added.

>
>
>> +
>> +       return;
>> +    }
>> +
>> +    /* each guest has an array of eventfds, and we keep track of how many
>> +     * guests for each VM */
>> +    guest_curr_max = s->eventfds_posn_count[incoming_posn];
>> +    if (guest_curr_max == 0) {
>> +        /* one eventfd per MSI vector */
>> +        s->eventfds[incoming_posn] = (int *) qemu_malloc(s->vectors *
>> +                                                         sizeof(int));
>> +    }
>> +
>> +    /* this is an eventfd for a particular guest VM */
>> +    IVSHMEM_DPRINTF("eventfds[%ld][%d] = %d\n", incoming_posn, guest_curr_max,
>> +                                                            incoming_fd);
>> +    s->eventfds[incoming_posn][guest_curr_max] = incoming_fd;
>> +
>> +    /* increment count for particular guest */
>> +    s->eventfds_posn_count[incoming_posn]++;
>>
>
> Not sure I follow exactly, but perhaps this needs to be
>
>     s->eventfds_posn_count[incoming_posn] = guest_curr_max + 1;
>
> Oh, it is.
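To illustrate the ordering described above, here is a self-contained sketch of the fd hand-off. The sender writes a long position with the descriptor attached as SCM_RIGHTS ancillary data, and the receiver maps the fd only after it arrives, letting the kernel pick the address instead of using MAP_FIXED. The message layout (one long, an optional fd, position -1 for the shared-memory fd) is inferred from ivshmem_read() above rather than taken from ivshmem-server, and send_posn_fd(), recv_posn_fd() and map_shm_fd() are hypothetical names.

```c
#define _GNU_SOURCE   /* exposes CMSG_* and POSIX helpers under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one fd. */
typedef union {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} FdCtrl;

/* Send a long position with fd attached as SCM_RIGHTS ancillary data. */
static int send_posn_fd(int sock, long posn, int fd)
{
    struct iovec iov = { .iov_base = &posn, .iov_len = sizeof(posn) };
    FdCtrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(posn) ? 0 : -1;
}

/* Receive the position and the accompanying fd (-1 if none was sent). */
static int recv_posn_fd(int sock, long *posn, int *fd)
{
    struct iovec iov = { .iov_base = posn, .iov_len = sizeof(*posn) };
    FdCtrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*posn)) {
        return -1;
    }
    *fd = -1;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
        }
    }
    return 0;
}

/* Map the received shared-memory fd; the kernel chooses the address. */
static void *map_shm_fd(int fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

In the device itself the mapping would then be handed to qemu_ram_mmap() rather than used directly as in this sketch.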
>
>> +
>> +        /* allocate/initialize space for interrupt handling */
>> +        s->eventfds = qemu_mallocz(s->nr_alloc_guests * sizeof(int *));
>> +        s->eventfd_table = qemu_mallocz(s->vectors * sizeof(EventfdEntry));
>> +        s->eventfds_posn_count = qemu_mallocz(s->nr_alloc_guests * sizeof(int));
>> +
>> +        pci_conf[PCI_INTERRUPT_PIN] = 1; /* we are going to support interrupts */
>>
>
> This is done by the guest BIOS.
>
>
> --
> error compiling committee.c: too many arguments to function