From: Eric Dumazet
Subject: Re: [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced
Date: Thu, 05 Nov 2009 15:11:45 +0100
Message-ID: <4AF2DD21.8060604@gmail.com>
References: <20091105095947.32131.99768.stgit@rabbit.intern.cm-ag> <4AF2A929.3000201@gmail.com> <20091105105749.GA4901@rabbit.intern.cm-ag> <4AF2B551.6010302@gmail.com> <20091105132352.GA14453@rabbit.intern.cm-ag>
In-Reply-To: <20091105132352.GA14453@rabbit.intern.cm-ag>
To: Max Kellermann
Cc: linux-kernel@vger.kernel.org, jens.axboe@oracle.com, Linux Netdev List

Max Kellermann wrote:
> On 2009/11/05 12:21, Eric Dumazet wrote:
>> Max Kellermann wrote:
>>> Do you think that a splice() should block if the socket is readable
>>> and the pipe is writable according to select()?
>>>
>> Yes, this is perfectly legal
>>
>> select() can return "OK to write on fd",
>> and still, write(fd, buffer, 10000000) is supposed/allowed to block if fd is not O_NDELAY
>
> From the select() manpage: "those in writefds will be watched to see
> if a write will not block"
>
> From the poll() manpage: "Writing now will not block."
>
> This looks unambiguous to me, and contradicts your thesis.  Can
> you provide sources?
>
> What is your interpretation of the guarantees provided by select() and
> poll()?  Which byte count is "ok" to write after POLLOUT, and how much
> is "too much"?  How does the application know?

It cannot, which is why an application uses O_NDELAY to avoid blocking.

Try the following program if you are not convinced:

#include <stdio.h>
#include <unistd.h>
#include <poll.h>

char buffer[1000000];

int main(int argc, char *argv[])
{
	int fds[2];
	struct pollfd pfd;
	int res;

	pipe(fds);

	/* Ask poll() whether the (empty) pipe is writable. */
	pfd.fd = fds[1];
	pfd.events = POLLOUT;
	res = poll(&pfd, 1, -1);
	if (res > 0 && pfd.revents & POLLOUT)
		printf("OK to write on pipe\n");

	/* Nobody reads fds[0], so this blocking write never completes. */
	write(fds[1], buffer, sizeof(buffer)); // why does it block, did poll() lie ???
	return 0;
}

> I understand your patch, but I don't understand the conflict with my
> patch.  Can you describe a breakage caused by my patch?

I only pointed out that using splice(tcp -> pipe) and blocking on pipe
_can_ block, even on the _first_ frame received from tcp, as you discovered.

Your only choices to avoid a deadlock are:

1) Use SPLICE_F_NONBLOCK.

2) Use a second thread to read the pipe and empty it.
   The first thread will happily transfer 1000000 bytes in one syscall...

3) Limit your splice(... len, flags) length to 16
   (16 buffers of one byte in pathological cases).

Your patch basically makes the SPLICE_F_NONBLOCK option always set
(choice 1) above), so users wanting option 3) are stuck.

You force them to use a poll()/select() thing while they don't want to
poll: they have producer thread(s) and consumer thread(s).

producer()
{
	while (1)
		splice(tcp, &offset, pfds[1], NULL, 10000000,
		       SPLICE_F_MORE | SPLICE_F_MOVE);
}

Why have an option in the first place if it is always set?
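
For comparison, here is a minimal sketch of the O_NDELAY variant of the test program above. It is an illustration, not part of the patch under discussion. The helper name nonblocking_pipe_write() is hypothetical, and the sketch assumes Linux, where O_NDELAY is defined as O_NONBLOCK. With the write end of the pipe in non-blocking mode, write() returns a short count instead of blocking, and that short count is how the application learns how much was "ok" to write.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

static char buffer[1000000];

/*
 * Hypothetical helper (not from the original mail): create a pipe, put its
 * write end in non-blocking mode, wait for POLLOUT, then attempt a single
 * write of up to len bytes. Returns write()'s result, or -1 if setup fails.
 */
static ssize_t nonblocking_pipe_write(size_t len)
{
	int fds[2];
	struct pollfd pfd;
	ssize_t n = -1;
	int flags;

	if (len > sizeof(buffer))
		len = sizeof(buffer);
	if (pipe(fds) < 0)
		return -1;

	/* On Linux, O_NDELAY is defined as O_NONBLOCK. */
	flags = fcntl(fds[1], F_GETFL);
	if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0)
		goto out;

	pfd.fd = fds[1];
	pfd.events = POLLOUT;
	pfd.revents = 0;
	if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLOUT)) {
		/* Nobody reads the pipe, so only what fits is accepted. */
		n = write(fds[1], buffer, len);
		if (n >= 0)
			printf("write() accepted %zd of %zu bytes\n", n, len);
	}
out:
	close(fds[0]);
	close(fds[1]);
	return n;
}
```

Calling nonblocking_pipe_write(sizeof(buffer)) from main() should return promptly with a count well below 1000000. The exact value depends on the pipe capacity, commonly 64 KiB on Linux with default settings.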