From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Tue, 29 Oct 2002 21:16:16 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Tue, 29 Oct 2002 21:16:16 -0500 Received: from mtao-m02.ehs.aol.com ([64.12.52.8]:5765 "EHLO mtao-m02.ehs.aol.com") by vger.kernel.org with ESMTP id ; Tue, 29 Oct 2002 21:16:15 -0500 Date: Tue, 29 Oct 2002 18:22:35 -0800 From: John Gardiner Myers Subject: Re: and nicer too - Re: [PATCH] epoll more scalable than poll In-reply-to: To: Linux Kernel Mailing List Cc: linux-aio@kvack.org, lse-tech@lists.sourceforge.net Message-id: <3DBF426B.6050208@netscape.com> MIME-version: 1.0 Content-type: text/plain; charset=ISO-8859-1; format=flowed Content-transfer-encoding: 7BIT X-Accept-Language: en-us, en User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2b) Gecko/20021016 References: Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Davide Libenzi wrote: >You're in the computer science by many many years and still you're not able to >understand how edge triggered events works. > Failure to agree does not imply failure to understand. I understand the model you want to apply to this problem, I do not agree that it is the best model to apply to this problem. >Why the heck ( and this for the 100th time ) do you want to go to wait for >an event on the newly born fd if : > >1) On connect() you have the _full_ write I/O space available >2) On accept() it's very likely the you'll find something more than a SYN > in the first packet > >Besides, the first code is even more cleaner and simmetric, while adopting >your half *ss solution might suggest the user that he can go waiting for >events any time he wants. > The first code is hardly cleaner and is definitely not symmetric--the way the accept code has to set up to fall through the do_use_fd() code is subtle. In the first code, the accept segment cannot be cleanly pulled into a callback: for (;;) { nfds = sys_epoll_wait(kdpfd, &pfds, -1); for(n = 0; n < nfds; ++n) { (cb[pfds[n].fd])(pfds[n].fd); } } Also, your first code does not fit your "edge triggered" model--the code for handling 's' does not drain its input. By the time you call accept(), there could be multiple connections ready to be accepted. Your connect() argument is not applicable to "server sends first" protocols. I suspect you are being overly optimistic about the likelihood of getting data with SYN, but whatever. The argument is basically that not delivering an event upon registration (and thus having the event be implicit) improves performance because the socket is going to be ready with sufficiently high probability. I would counter that the cost of explicitly delivering such an event is miniscule compared to the rest of the cost of connection setup and teardown--the optimization is not worthwhile. >Like going to sleep the the wait queue of IDE >disk w/out having issued any command. > The key difference between this interface and wait queues is that with wait queues it is not technically feasible to both register interest and test the condition in a single, atomic operation. epoll does not have this technical limitation, so it can provide a better interface. >PS: since my time is not infinite, and since I'm working on the changes we >agreed with Andrew I would suggest you either to take another look at the >code suggesting us new changes ( like you did yesterday ) or to go >shopping for books. > I am uncomfortable with the way the epoll code adds its own set of notification hooks into the socket and pipe code. Much better would be to extend the existing set of notification hooks, like the aio poll code does. That would reduce the risk of kernel bugs where some subsystem fails to deliver an event to one but not all types of poll notification hooks and it would minimize the cost of the epoll patch when epoll is not being used. > >