public inbox for linux-kernel@vger.kernel.org
From: lkml@pengaru.com
To: linux-kernel@vger.kernel.org
Subject: Re: Honoring SO_RCVLOWAT in proto_ops.poll methods
Date: Sun, 21 Sep 2008 04:24:42 -0500
Message-ID: <20080921092442.GS2761@fc6222126.aspadmin.net>
In-Reply-To: <20080920230046.GQ2761@fc6222126.aspadmin.net>

On Sat, Sep 20, 2008 at 06:00:46PM -0500, lkml@pengaru.com wrote:
> On Sat, Sep 20, 2008 at 03:21:40PM -0700, David Miller wrote:
> > From: lkml@pengaru.com
> > Date: Sat, 20 Sep 2008 16:42:29 -0500
> > 
> > > I have a need for select/poll/epoll_wait to block on sockets which have
> > > unread data sitting in the receive buffer with a quantity less than
> > > specified via setsockopt() w/SO_RCVLOWAT, not less than one like the
> > > current implementation.
> > 
> > If BSD never provided this behavior, such a change is likely
> > to break applications.
> 
> I did a quick look through FreeBSD source on fxr and found this macro:
> http://fxr.watson.org/fxr/source/sys/socketvar.h#L197
> 
> Which is used by the generic socket poll here:
> http://fxr.watson.org/fxr/source/kern/uipc_socket.c#L2731
> 
> You can look throughout that listing and so_rcv.sb_lowat is always what
> is compared against for determining rcv buf readability.
> 
> You might also want to look at the socket(7) man page which implies that
> what Linux currently does is exceptional & incorrect:
> 
>        SO_RCVLOWAT and SO_SNDLOWAT
>               Specify the minimum number of bytes in  the  buffer  until
>               the  socket  layer  will  pass  the  data  to the protocol
>               (SO_SNDLOWAT) or  the  user  on  receiving  (SO_RCVLOWAT).
>               These two values are initialised to 1.  SO_SNDLOWAT is not
>               changeable on Linux (setsockopt fails with the error  ENO-
>               PROTOOPT).   SO_RCVLOWAT  is  changeable  only since Linux
>               2.4.  The select(2) and poll(2) system calls currently  do
>               not  respect  the SO_RCVLOWAT setting on Linux, and mark a
>               socket readable when even a single byte of data is  avail-
>               able.   A subsequent read from the socket will block until
>               SO_RCVLOWAT bytes are available.
> 

I've been working on my application further and finally got around to
testing it with the assumption that poll won't block with regard to
SO_RCVLOWAT, and to my surprise even my recv() calls with MSG_PEEK flags
set are not blocking.  They block without MSG_PEEK, but not with.
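
For reference, here is a minimal user-space sketch of the calls involved,
assuming Linux and a connected stream socket.  The helper names
set_rcvlowat(), get_rcvlowat() and peek_bytes() are made up for illustration;
only setsockopt(), getsockopt() and recv() are real interfaces.  Going by the
observation above, peek_bytes() on a TCP socket may return fewer than the
low-water mark's worth of bytes instead of blocking.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: set the receive low-water mark on fd.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_rcvlowat(int fd, int lowat)
{
    return setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat));
}

/* Hypothetical helper: read back the current low-water mark, or -1 on error. */
int get_rcvlowat(int fd)
{
    int lowat = 0;
    socklen_t len = sizeof(lowat);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, &len) < 0)
        return -1;
    return lowat;
}

/* Hypothetical helper: look at up to len queued bytes without consuming
 * them.  Returns the number of bytes copied into buf, or -1 on error. */
ssize_t peek_bytes(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_PEEK);
}
```

A caller would typically do set_rcvlowat(fd, record_size) once after
connecting and then peek_bytes() before committing to a full read.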

Upon further investigation I find this in tcp_recvmsg() in tcp.c (2.6.26.5):

1306         target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);

...snip...

1371                 if (copied >= target && !sk->sk_backlog.tail)
1372                         break;
1373 
1374                 if (copied) {
1375                         if (sk->sk_err ||
1376                             sk->sk_state == TCP_CLOSE ||
1377                             (sk->sk_shutdown & RCV_SHUTDOWN) ||
1378                             !timeo ||
1379                             signal_pending(current) ||
1380                             (flags & MSG_PEEK))
1381                                 break;
1382                 } else {


So line #1380 breaks out of the loop without satisfying copied >= target if
MSG_PEEK is set, and if you look at the remainder of the function, it assumes
that it needs to clean up buffers before waiting for more data.  So fixing
this is likely not as trivial as fixing poll, since the rest of the function
has to be massaged to not try to free things when in MSG_PEEK mode.
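
To make the exit logic easier to follow, here is a rough user-space model of
the two checks in the excerpt.  This is not kernel code: struct loop_state and
loop_exits() are hypothetical names, the kernel's individual error, shutdown
and timeout tests are collapsed into single flags, and the copied == 0 branch
(not shown in the excerpt) is simply treated as "keep waiting".

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the excerpt's checks look at. */
struct loop_state {
    int  copied;           /* bytes copied to the caller so far */
    int  target;           /* value returned by sock_rcvlowat() */
    bool backlog_pending;  /* stands in for sk->sk_backlog.tail != NULL */
    bool err_or_shutdown;  /* sk_err, TCP_CLOSE or RCV_SHUTDOWN */
    bool no_time_left;     /* stands in for !timeo */
    bool signal_pending;   /* stands in for signal_pending(current) */
    bool peek;             /* flags & MSG_PEEK */
};

/* Returns true when the modelled loop would break out. */
bool loop_exits(const struct loop_state *s)
{
    /* First check: low-water mark reached and nothing in the backlog. */
    if (s->copied >= s->target && !s->backlog_pending)
        return true;

    /* Second check: some data copied, and any one of these ends the loop.
     * MSG_PEEK is in this list, so a peek stops short of the target. */
    if (s->copied)
        return s->err_or_shutdown || s->no_time_left ||
               s->signal_pending || s->peek;

    /* Simplification: the copied == 0 branch is elided in the excerpt. */
    return false;
}
```

With copied = 4 and target = 16, the model exits when peek is set and keeps
waiting when it is not, which matches what I see from recv().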

Once again, this deviates from FreeBSD behavior.

At this point, for my application to work on Linux without burning CPU like
mad... I basically have to sleep and regularly check the socket with the TCP
socket ioctl SIOCINQ to see if more data has arrived. :(

Regards,
Vito Caputo
