From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [take23 0/5] kevent: Generic event handling mechanism.
Date: Thu, 09 Nov 2006 00:07:37 +0100
Message-ID: <45526339.3040506@cosmosbay.com>
References: <1154985aa0591036@2ka.mipt.ru> <20061107141718.f7414b31.akpm@osdl.org> <20061108082147.GA2447@2ka.mipt.ru> <200611081551.14671.dada1@cosmosbay.com> <20061108140307.da7d815f.akpm@osdl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Andrew Morton, Evgeniy Polyakov, David Miller, Ulrich Drepper, netdev, Zach Brown, Christoph Hellwig, Chase Venters, Johann Borck, Linux Kernel Mailing List, Jeff Garzik
Return-path:
Received: from sp604005mt.neufgp.fr ([84.96.92.11]:65246 "EHLO smtp.Neuf.fr") by vger.kernel.org with ESMTP id S1423926AbWKHXHa (ORCPT ); Wed, 8 Nov 2006 18:07:30 -0500
In-reply-to:
To: Davide Libenzi
Sender: netdev-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

Davide Libenzi wrote:
> On Wed, 8 Nov 2006, Andrew Morton wrote:
>
>> On Wed, 8 Nov 2006 15:51:13 +0100
>> Eric Dumazet wrote:
>>
>>> [PATCH] eventpoll : In case a fault occurs during copy_to_user(), we should
>>> report the count of events that were successfully copied into user space,
>>> instead of EFAULT. That would be consistent with behavior of read/write()
>>> syscalls for example.
>>>
>>> Signed-off-by: Eric Dumazet
>>>
>>>
>>>
>>> [eventpoll.patch text/plain (424B)]
>>> --- linux/fs/eventpoll.c 2006-11-08 15:37:36.000000000 +0100
>>> +++ linux/fs/eventpoll.c 2006-11-08 15:38:31.000000000 +0100
>>> @@ -1447,7 +1447,7 @@
>>>  				   &events[eventcnt].events) ||
>>>  			__put_user(epi->event.data,
>>>  				   &events[eventcnt].data))
>>> -			return -EFAULT;
>>> +			return eventcnt ? eventcnt : -EFAULT;
>>>  		if (epi->event.events & EPOLLONESHOT)
>>>  			epi->event.events &= EP_PRIVATE_BITS;
>>>  		eventcnt++;
>>>
>> Definitely a better interface, but I wonder if it's too late to change it.
>>
>> An app which does
>>
>> 	if (epoll_wait(...) == -1)
>> 		barf(errno);
>> 	else
>> 		assume_all_events_were_received();
>>
>> will now do the wrong thing.
>>
>> otoh, such an application basically _has_ to use the epoll_wait()
>> return value to work out how many events it received, so maybe it's OK...
>
> I don't care about both ways, but sys_poll() does the same thing epoll
> does right now, so I would not change epoll behaviour.
>

Sure, poll() cannot return a partial count, since its return value is
documented as:

    On success, a positive number is returned, where the number
    returned is the number of structures which have non-zero revents
    fields (in other words, those descriptors with events or errors
    reported).

poll() is non-destructive (it doesn't change any state in the kernel).
Returning EFAULT when a fault hits even the very last bit of the user
area is mandatory.

On the contrary, epoll_wait() does return a count of transferred events,
and it updates some state in the kernel (it consumes edge-triggered
events: they can be lost forever if they are not reported to the user).

So epoll_wait() is much more like read(), which also updates file state
in the kernel (the current file position).
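
To make the read()-like return convention concrete, here is a minimal
user-space sketch of the copy loop, not the actual fs/eventpoll.c code.
The names sketch_event, sketch_put() and sketch_send_events(), and the
fault_at parameter that simulates an unmapped tail of the user buffer,
are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct epoll_event; not the kernel layout. */
struct sketch_event {
	unsigned int events;
	unsigned long long data;
};

/*
 * Hypothetical stand-in for __put_user(): returns 0 on success, nonzero
 * when slot idx is at or past fault_at, which models a user buffer whose
 * tail is not writable.
 */
static int sketch_put(struct sketch_event *dst, const struct sketch_event *src,
		      size_t idx, size_t fault_at)
{
	if (idx >= fault_at)
		return 1;
	*dst = *src;
	return 0;
}

/*
 * Copy up to n ready events into the user array. On a fault, report the
 * events already delivered (as read() would report bytes already copied)
 * and return -EFAULT only when nothing was delivered at all.
 */
static int sketch_send_events(struct sketch_event *user,
			      const struct sketch_event *ready,
			      size_t n, size_t fault_at)
{
	int eventcnt = 0;
	size_t i;

	assert(user != NULL && ready != NULL);

	for (i = 0; i < n; i++) {
		if (sketch_put(&user[eventcnt], &ready[i],
			       (size_t)eventcnt, fault_at))
			return eventcnt ? eventcnt : -EFAULT;
		eventcnt++;
	}
	return eventcnt;
}
```

With three ready events and a fault at slot 2, this sketch returns 2, so
the caller still learns about the two events that were delivered. With a
fault at slot 0 it returns -EFAULT, the same as the current behaviour.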