From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Jeff Layton <jlayton@redhat.com>
Cc: Andy Adamson <andros@netapp.com>, linux-nfs@vger.kernel.org
Subject: Re: [[RFC] 1/1] SUNRPC: dynamic rpc_slot allocator for TCP
Date: Thu, 05 May 2011 08:19:56 -0400 [thread overview]
Message-ID: <1304597996.22197.9.camel@lade.trondhjem.org> (raw)
In-Reply-To: <20110505074741.17f1698d@tlielax.poochiereds.net>
On Thu, 2011-05-05 at 07:47 -0400, Jeff Layton wrote:
> On Wed, 04 May 2011 11:35:34 -0400
> Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
>
> > On Wed, 2011-05-04 at 11:20 -0400, Andy Adamson wrote:
> > > On May 4, 2011, at 11:08 AM, Jeff Layton wrote:
> > >
> > > > On Mon, 2 May 2011 21:40:08 -0400
> > > > andros@netapp.com wrote:
> > > >
> > > >> + if (!test_and_clear_bit(XPRT_WRITE_SPACE, &xprt->state))
> > > >> + return;
> > > >
> > > > Also, I'm not sure that a single bit really conveys enough information
> > > > for this.
> > > >
> > > > IIUC, sk_write_space gets called when a packet is TCP ACK'ed. It seems
> > > > possible that we would sometimes have buffer space available to queue
> > > > the packet without sk_write_space being called. With this, we'll
> > > > basically be serializing all dynamic slot allocations behind the
> > > > sk_write_space callbacks.
> > >
> > > Which I thought was OK given that the TCP window takes a while to stabilize.
> > >
> > > >
> > > > Consider the case of many small TCP frames being sent after a large one
> > > > just got ACK'ed. Only one would be allowed to be sent, even though
> > > > there might be enough send buffer space to allow for more.
> > > >
> > > > Would it instead make more sense to base this on the amount of space
> > > > available in the actual socket rather than this bit?
> > >
> > > So at each write_space, potentially allocate more than one rpc_slot as opposed
> > > to allocating one rpc_slot and waiting for the next write_space? I could look at this
> > > with the 10G testiing.
> >
> > Why? You can't send that data. Once you hit the write space limit, then
> > the socket remains blocked until you get the callback. It doesn't matter
> > how small the frame, you will not be allowed to send more data.
> >
> > On the other hand, we do set the SOCK_NOSPACE bit, which means that the
> > socket layer will attempt to grow the TCP window even though we're not
> > actually putting more data into the socket.
> >
>
> I'm not sure I understand what you're suggesting here.
>
> I guess my main point is that a single bit that we flip on in
> write_space and flip off when a slot is allocated doesn't carry enough
> info. That scheme will also be subject to subtle differences in timing.
> For instance...
>
> Suppose a large number of TCP ACKs come in all at around the same time.
> write_space gets called a bunch of times in succession, so the bit gets
> "set" several times. Several queued tasks get woken up but only one can
> clear the bit so only one gets a slot.
>
> However, if those acks come in with enough of a delay between them, then
> you can potentially get one slot allocated per write_space callback.
The write space callback doesn't wake anyone up until 1/2 the total
socket buffer is free: that's what the sock_writeable() test does.
> I think we ought to consider a heuristic that doesn't rely on the
> frequency and timing of write_space callbacks.
What we're doing now is basically what is being done by the socket layer
when a user process tries to write too much data to the socket: we tell
the TCP layer to grow the window, and we wait for the write space
callback to tell us that we have enough free socket buffer space to be
able to make progress. We're not waking up and retrying on every ACK as
you suggest.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
next prev parent reply other threads:[~2011-05-05 12:19 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-05-03 1:40 [RFC 0/1] SUNRPC: dynamic rpc_slot allocator for TCP andros
2011-05-03 1:40 ` [[RFC] 1/1] " andros
2011-05-04 0:20 ` Jeff Layton
2011-05-04 0:44 ` Trond Myklebust
2011-05-04 1:18 ` NeilBrown
2011-05-04 1:46 ` Trond Myklebust
2011-05-04 2:07 ` NeilBrown
2011-05-04 11:54 ` Jeff Layton
2011-05-04 14:54 ` Andy Adamson
2011-05-04 15:18 ` Jeff Layton
2011-05-04 15:30 ` Trond Myklebust
2011-05-04 15:52 ` Andy Adamson
2011-05-04 16:01 ` Chuck Lever
2011-05-04 17:22 ` Andy Adamson
2011-05-05 12:05 ` Jeff Layton
2011-05-04 1:33 ` Jeff Layton
2011-05-04 14:59 ` Jeff Layton
[not found] ` <20110504105918.422f7609-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2011-05-04 15:10 ` Andy Adamson
2011-05-04 15:08 ` Jeff Layton
2011-05-04 15:20 ` Andy Adamson
2011-05-04 15:31 ` Jeff Layton
2011-05-04 15:35 ` Trond Myklebust
2011-05-05 11:47 ` Jeff Layton
2011-05-05 12:19 ` Trond Myklebust [this message]
2011-05-03 20:06 ` [RFC 0/1] " Chuck Lever
2011-05-03 20:13 ` Andy Adamson
2011-05-03 20:20 ` Chuck Lever
2011-05-03 20:34 ` Andy Adamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1304597996.22197.9.camel@lade.trondhjem.org \
--to=trond.myklebust@netapp.com \
--cc=andros@netapp.com \
--cc=jlayton@redhat.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).