From mboxrd@z Thu Jan 1 00:00:00 1970 From: Thomas Monjalon Subject: Re: [PATCH v6 4/8] eal/linux: add per rx queue interrupt handling based on VFIO Date: Fri, 27 Feb 2015 15:13:21 +0100 Message-ID: <7961783.gxFFSSThyb@xps13> References: <1424710542-14637-1-git-send-email-danny.zhou@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Cc: dev-VfR2kkLFssw@public.gmane.org To: "Liang, Cunming" Return-path: In-Reply-To: List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org Sender: "dev" Hi Cunming, First, sorry to have to say that, but it is not easy to read discussion= s where quote marks are not used. I re-insert them for clarity. Comments below. 2015-02-27 12:22, Liang, Cunming: > From: David Marchand [mailto:david.marchand-pdR9zngts4EAvxtiuMwx3w@public.gmane.org] > Sent: Friday, February 27, 2015 6:34 PM >=20 > > I am not really comfortable with this api. > >=20 > > This is just creating something on top of the standard epoll api wi= th > > limitations. In the end, we could just use an external lib that doe= s this > > already. >=20 > [Liang, Cunming] Not really, I think. We try to protect the data insi= de > =E2=80=98rte_intr_handle=E2=80=99, it doesn=E2=80=99t expect user to = understand the things defined > inside =E2=80=98rte_intr_handle=E2=80=99. > It=E2=80=99s better typedef =E2=80=98rte_intr_handle=E2=80=99 as a ra= w integer ID, having a function > to get it from a ethdev. Then all the interrupt api is around it. > It provides the common pci NIC devices rxtx interrupt processing appr= oach. > For the limitations, we can fix it step by step. >=20 > > So ok, this will work for your limited use case, but this will not = be > > really useful for anything else. > > Not sure it has its place in eal, this is more an example to me. >=20 > [Liang, Cunming] =E2=80=98limited use case=E2=80=99 do you means only= for rxtx ? > It don=E2=80=99t expect to provide a generic event mechanism (like li= bev/libevent > does), but a simple way to allow PMD work with DMA interrupt. It main= ly > abstract for rx interrupt purpose. I appreciate if you could help to = list > more useful cases. You don't expect to provide a generic event mechanism but application developpers could need to wait for many events at once, not only Rx one= s. That's why it's better to provide only the needed parts to use somethin= g generic like libevent. And we should avoid reinventing the wheel. > > > +static void > > > +eal_intr_process_rxtx_interrupts(struct rte_intr_handle *intr_ha= ndle, > > > + struct epoll_event *events, > > > + uint32_t *vec, int nfds) > > > +{ > > > + int i, bytes_read; > > > + union rte_intr_read_buffer buf; > > > + int fd; > > > + > > > + for (i =3D 0; i < nfds; i++) { > > > + /* set the length to be read for different handle= type */ > > > + switch (intr_handle->type) { > > > + case RTE_INTR_HANDLE_UIO: > > > + bytes_read =3D sizeof(buf.uio_intr_count)= ; > > > + break; > > > + case RTE_INTR_HANDLE_ALARM: > > > + bytes_read =3D sizeof(buf.timerfd_num); > > > + break; > > > +#ifdef VFIO_PRESENT > > > + case RTE_INTR_HANDLE_VFIO_MSIX: > > > + case RTE_INTR_HANDLE_VFIO_MSI: > > > + case RTE_INTR_HANDLE_VFIO_LEGACY: > > > + bytes_read =3D sizeof(buf.vfio_intr_count= ); > > > + break; > > > +#endif > > > + default: > > > + bytes_read =3D 1; > > > + break; > > > + } > > > + > > > + /** > > > + * read out to clear the ready-to-be-read flag > > > + * for epoll_wait. > > > + */ > > > + vec[i] =3D events[i].data.u32; > > > + assert(vec[i] < VFIO_MAX_RXTX_INTR_ID); > > > + > > > + fd =3D intr_handle->efds[vec[i]]; > > > + bytes_read =3D read(fd, &buf, bytes_read); > > > + if (bytes_read < 0) > > > + RTE_LOG(ERR, EAL, "Error reading from fil= e " > > > + "descriptor %d: %s\n", fd, strerr= or(errno)); > > > + else if (bytes_read =3D=3D 0) > > > + RTE_LOG(ERR, EAL, "Read nothing from file= " > > > + "descriptor %d\n", fd); > > > + } > > > +} > >=20 > > Why unconditionnally read ? > > You are absorbing events from the application if the application ga= ve you > > an external epfd and populated it with its own fds. >=20 > [Liang, Cunming] The vector number was checked. If an external epfd > populated some event carry fd rather than a data.u32 but the value > inside the valid range, it considers as a valid vector number. No mat= ter > the read success or not, it always notify the event. Do you have any > suggestion used here to check the condition ? >=20 > > > +static int init_tls_epfd(void) > > > +{ > > > + int pfd =3D epoll_create(1); > > > + if (pfd < 0) { > > > + RTE_LOG(ERR, EAL, > > > + "Cannot create epoll instance\n"); > > > + return -1; > > > + } > > > + return pfd; > > > +} > > > + > > > +int > > > +rte_intr_rx_wait(struct rte_intr_handle *intr_handle, int epfd, > > > + uint32_t *vec, uint16_t num) > > > +{ > >=20 > > In the end, this "rx" does not mean anything to eal. >=20 > [Liang, Cunming] That=E2=80=99s a good point. I tried to remove =E2=80= =98rx=E2=80=99 and use a > generic word here. =E2=80=98rte_intr_wait=E2=80=99 looks like too gen= eric, > =E2=80=98rte_intr_epfd_wait=E2=80=99 looks not abstract with bsd. > As the function only serves for rxtx vector, so using the rx prefix. > Which name do you prefer ? You should understand that you are trying to wrongly replace a generic = lib. The best name is probably /dev/null. > > > +#define MAX_EVENTS 8 > > > + struct epoll_event events[MAX_EVENTS]; > > > + int ret, nfds =3D 0; > > > + > > > + if (!intr_handle || !vec) { > > > + RTE_LOG(ERR, EAL, "invalid input parameter\n"); > > > + return -1; > > > + } > > > + > > > + if (intr_handle->type !=3D RTE_INTR_HANDLE_VFIO_MSIX) { > > > + RTE_LOG(ERR, EAL, "intr type should be VFIO_MSIX\= n"); > > > + return -1; > > > + } > > > + > > > + if (epfd =3D=3D RTE_EPOLL_FD_ANY) { > > > + /* using per thread epoll fd */ > > > + if (unlikely(RTE_PER_LCORE(_epfd) =3D=3D -1)) > > > + RTE_PER_LCORE(_epfd) =3D init_tls_epfd();= > > > + epfd =3D RTE_PER_LCORE(_epfd); > > > + } > >=20 > > Rather than testing every time, this should be set by the caller, > > i.e. epfd is always valid. > > If application does not want to create a epfd, then it calls > > rte_intr_rx_wait with RTE_EPOLL_FD_ANY (this name is not well chose= n) > > that is a macro wrapped to RTE_PER_LCORE(_epfd). >=20 > [Liang, Cunming] It sounds good to me. As we don=E2=80=99t expect to = expose > *rte_per_lcore__epfd* as an public symbol, so will define rte_epfd() > instread. > Within rte_epfd(), if RTE_PER_LCORE(_epfd) not assigned, then > init_tls_epfd() once. >=20 > > init_tls_epfd() should be called only once at init time. > > No need to check every time. >=20 > [Liang, Cunming] As it probably not need per thread epfd at all. > So I prefer to create it when it real needed as above I mentioned. > > > + do { > > > + ret =3D epoll_wait(epfd, events, > > > + RTE_MIN(num, MAX_EVENTS), > > > + EAL_INTR_EPOLL_WAIT_FOREVER); > > > + if (unlikely(ret < 0)) { > > > + /* epoll_wait fail */ > > > + RTE_LOG(ERR, EAL, "epoll_wait returns wit= h fail\n"); > > > + return -1; > > > + } else if (ret > 0) { > > > + /* epoll_wait has at least one fd ready t= o read */ > > > + eal_intr_process_rxtx_interrupts(intr_han= dle, events, > > > + vec, ret= ); > > > + num -=3D ret; > > > + vec +=3D ret; > > > + nfds +=3D ret; > > > + } else if (nfds > 0) > > > + break; > > > + } while (num > 0); > > > + > > > + return nfds; > > > +} > > > > You are blocking unless all fds have been set, so you are serialisi= ng > > all events. >=20 > [Liang, Cunming] I=E2=80=99m not sure fully got your point. If any ev= ent arrives, > it gets back. Do you means if no fds added in, it=E2=80=99s always bl= ocking. > You expect to have a timeout return ? > > > RTE_LOG(ERR, EAL, " cannot get IRQ info,= " > > >=20 > > > - "error %i (%s)\n", errno,= strerror(errno)); > > > + "error %i (%s)\n", errno, strerro= r(errno)); > >=20 > > Garbage, this has nothing to do with the patch. >=20 > [Liang, Cunming] It=E2=80=99s for line number exceed 80 margin compla= in. The title of the patch is "add per rx queue interrupt handling based on= VFIO". So this kind of modification is a garbage. Sorry, I won't play the game "idem / the same" below ;) > > > - if (internal_config.vfio_intr_mode !=3D R= TE_INTR_MODE_NONE) { > > > + if (internal_config.vfio_intr_mode !=3D > > > + RTE_INTR_MODE_NONE) { > > > RTE_LOG(ERR, EAL, > > > - " interrupt vect= or does not support eventfd!\n"); > > > + " interrupt vector " > > > + "does not support eventfd= !\n");> >=20 > >=20 > > Idem. >=20 > [Liang, Cunming] The same. > > > - "error %i (%s)\n", errno,= strerror(errno)); > > > + "error %i (%s)\n", errno, strerro= r(errno)); > >=20 > > Idem. >=20 > [Liang, Cunming] The same. > > > dev->intr_handle.vfio_dev_fd =3D vfio_dev_fd; > > > - > >=20 > > Idem. >=20 > [Liang, Cunming] Accept.