From: Eric Dumazet <eric.dumazet@gmail.com>
To: Max Kellermann <mk@cm4all.com>
Cc: linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
Linux Netdev List <netdev@vger.kernel.org>
Subject: Re: [PATCH] tcp: set SPLICE_F_NONBLOCK after first buffer has been spliced
Date: Thu, 05 Nov 2009 15:11:45 +0100
Message-ID: <4AF2DD21.8060604@gmail.com>
In-Reply-To: <20091105132352.GA14453@rabbit.intern.cm-ag>
Max Kellermann wrote:
> On 2009/11/05 12:21, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Max Kellermann wrote:
>>> Do you think that a splice() should block if the socket is readable
>>> and the pipe is writable according to select()?
>>>
>> Yes, this is perfectly legal
>>
>> select() can return "OK to write on fd",
>> and still, write(fd, buffer, 10000000) is supposed/allowed to block if fd is not O_NDELAY
>
> From the select() manpage: "those in writefds will be watched to see
> if a write will not block"
>
> From the poll() manpage: "Writing now will not block."
>
> This looks unambiguous to me, and contradicts your thesis. Can
> you provide sources?
>
> What is your interpretation of the guarantees provided by select() and
> poll()? Which byte count is "ok" to write after POLLOUT, and how much
> is "too much"? How does the application know?
It cannot. That is why an application uses O_NDELAY to avoid blocking.
Try the following program if you are not convinced:
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

char buffer[1000000];

int main(int argc, char *argv[])
{
	int fds[2];
	struct pollfd pfd;
	int res;

	if (pipe(fds) < 0)
		return 1;
	pfd.fd = fds[1];
	pfd.events = POLLOUT;
	res = poll(&pfd, 1, -1);
	if (res > 0 && (pfd.revents & POLLOUT))
		printf("OK to write on pipe\n");
	/* The pipe fills up and nobody reads fds[0], so this write()
	 * blocks forever. Why does it block? Did poll() lie? */
	write(fds[1], buffer, sizeof(buffer));
	return 0;
}
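For contrast, here is a minimal sketch of the non-blocking variant. It is only an illustration, not part of any patch: write_nonblocking() is a hypothetical helper name, and it uses O_NONBLOCK, which on Linux has the same value as O_NDELAY. With the flag set, write() takes what fits in the pipe and then fails with EAGAIN instead of sleeping.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): put fd in non-blocking mode,
 * then write as much of buf as the kernel accepts right now.
 * Returns the number of bytes written, or -1 on a real error. */
ssize_t write_nonblocking(int fd, const char *buf, size_t len)
{
	size_t done = 0;
	int flags;

	assert(buf != NULL);

	flags = fcntl(fd, F_GETFL);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;

	while (done < len) {
		ssize_t n = write(fd, buf + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				break;		/* pipe is full, stop here */
			return -1;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

Calling write_nonblocking(fds[1], buffer, sizeof(buffer)) on the fresh pipe from the program above should return a short count (the pipe capacity, which depends on the system) instead of hanging. The caller then has to poll() again before writing the rest.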
> I understand your patch, but I don't understand the conflict with my
> patch. Can you describe a breakage caused by my patch?
I only pointed out that a blocking splice(tcp -> pipe) _can_ block on the
pipe, even on the _first_ frame received from tcp, as you discovered.
Your only choices to avoid a deadlock are:
1) Use SPLICE_F_NONBLOCK.
2) Use a second thread to read the pipe and empty it. The first thread will
then happily transfer 1000000 bytes in one syscall...
3) Limit your splice(... len, flags) length to 16 (16 buffers of one byte
in pathological cases).
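Choice 1) could look roughly like the sketch below. The helper name splice_to_pipe_nonblock() is hypothetical and the code is only an illustration under that assumption. Note that, per the splice(2) man page, SPLICE_F_NONBLOCK covers the pipe side; the call may still wait on the input descriptor unless that descriptor has O_NONBLOCK set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): move up to len bytes from
 * in_fd (for example a TCP socket) into the write end of a pipe without
 * sleeping on a full pipe.
 * Returns bytes moved, 0 on end of input, or -1 with errno set;
 * errno == EAGAIN means nothing could be moved right now. */
ssize_t splice_to_pipe_nonblock(int in_fd, int pipe_wr, size_t len)
{
	ssize_t n;

	assert(len > 0);

	do {
		n = splice(in_fd, NULL, pipe_wr, NULL, len,
			   SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
	} while (n < 0 && errno == EINTR);	/* retry if interrupted */

	return n;
}
```

On EAGAIN the caller has to go back to poll()/select() and wait for the pipe to drain, which is exactly the event-loop style that choice 2) avoids.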
Your patch basically makes the SPLICE_F_NONBLOCK option always set (choice 1) above),
so users wanting option 3) are stuck. You force them to use poll()/select()
while they don't want to poll: they have producer thread(s) and consumer
thread(s).
producer()
{
	while (1)
		/* off_in must be NULL here: a socket is not seekable */
		splice(tcp, NULL, pfds[1], NULL, 10000000,
		       SPLICE_F_MORE | SPLICE_F_MOVE);
}
Why have an option in the first place if it is always set?