linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@redhat.com>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>, linux-nfs@vger.kernel.org
Subject: Re: [[RFC] 1/1] SUNRPC: dynamic rpc_slot allocator for TCP
Date: Thu, 5 May 2011 07:47:41 -0400	[thread overview]
Message-ID: <20110505074741.17f1698d@tlielax.poochiereds.net> (raw)
In-Reply-To: <1304523334.4476.12.camel@lade.trondhjem.org>

On Wed, 04 May 2011 11:35:34 -0400
Trond Myklebust <Trond.Myklebust@netapp.com> wrote:

> On Wed, 2011-05-04 at 11:20 -0400, Andy Adamson wrote:
> > On May 4, 2011, at 11:08 AM, Jeff Layton wrote:
> > 
> > > On Mon,  2 May 2011 21:40:08 -0400
> > > andros@netapp.com wrote:
> > > 
> > >> +	if (!test_and_clear_bit(XPRT_WRITE_SPACE, &xprt->state))
> > >> +		return;
> > > 
> > > Also, I'm not sure that a single bit really conveys enough information
> > > for this.
> > > 
> > > IIUC, sk_write_space gets called when a packet is TCP ACK'ed. It seems
> > > possible that we would sometimes have buffer space available to queue
> > > the packet without sk_write_space being called. With this, we'll
> > > basically be serializing all dynamic slot allocations behind the
> > > sk_write_space callbacks.
> > 
> > Which I thought was OK given that the TCP window takes a while to stabilize.
> > 
> > > 
> > > Consider the case of many small TCP frames being sent after a large one
> > > just got ACK'ed. Only one would be allowed to be sent, even though
> > > there might be enough send buffer space to allow for more.
> > > 
> > > Would it instead make more sense to base this on the amount of space
> > > available in the actual socket rather than this bit?
> > 
> > So at each write_space, potentially allocate more than one rpc_slot as opposed
> > to allocating one rpc_slot and waiting for the next write_space? I could look at this
> > with the 10G testiing.
> 
> Why? You can't send that data. Once you hit the write space limit, then
> the socket remains blocked until you get the callback. It doesn't matter
> how small the frame, you will not be allowed to send more data.
> 
> On the other hand, we do set the SOCK_NOSPACE bit, which means that the
> socket layer will attempt to grow the TCP window even though we're not
> actually putting more data into the socket.
> 

I'm not sure I understand what you're suggesting here.

I guess my main point is that a single bit that we flip on in
write_space and flip off when a slot is allocated doesn't carry enough
info. That scheme will also be subject to subtle differences in timing.
For instance...

Suppose a large number of TCP ACKs come in all at around the same time.
write_space gets called a bunch of times in succession, so the bit gets
"set" several times. Several queued tasks get woken up but only one can
clear the bit so only one gets a slot.

However, if those acks come in with enough of a delay between them, then
you can potentially get one slot allocated per write_space callback.

I think we ought to consider a heuristic that doesn't rely on the
frequency and timing of write_space callbacks.

-- 
Jeff Layton <jlayton@redhat.com>

  reply	other threads:[~2011-05-05 11:47 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-03  1:40 [RFC 0/1] SUNRPC: dynamic rpc_slot allocator for TCP andros
2011-05-03  1:40 ` [[RFC] 1/1] " andros
2011-05-04  0:20   ` Jeff Layton
2011-05-04  0:44     ` Trond Myklebust
2011-05-04  1:18       ` NeilBrown
2011-05-04  1:46         ` Trond Myklebust
2011-05-04  2:07           ` NeilBrown
2011-05-04 11:54             ` Jeff Layton
2011-05-04 14:54               ` Andy Adamson
2011-05-04 15:18                 ` Jeff Layton
2011-05-04 15:30                   ` Trond Myklebust
2011-05-04 15:52                   ` Andy Adamson
2011-05-04 16:01                     ` Chuck Lever
2011-05-04 17:22                       ` Andy Adamson
2011-05-05 12:05                         ` Jeff Layton
2011-05-04  1:33       ` Jeff Layton
2011-05-04 14:59   ` Jeff Layton
     [not found]     ` <20110504105918.422f7609-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2011-05-04 15:10       ` Andy Adamson
2011-05-04 15:08   ` Jeff Layton
2011-05-04 15:20     ` Andy Adamson
2011-05-04 15:31       ` Jeff Layton
2011-05-04 15:35       ` Trond Myklebust
2011-05-05 11:47         ` Jeff Layton [this message]
2011-05-05 12:19           ` Trond Myklebust
2011-05-03 20:06 ` [RFC 0/1] " Chuck Lever
2011-05-03 20:13   ` Andy Adamson
2011-05-03 20:20     ` Chuck Lever
2011-05-03 20:34       ` Andy Adamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110505074741.17f1698d@tlielax.poochiereds.net \
    --to=jlayton@redhat.com \
    --cc=Trond.Myklebust@netapp.com \
    --cc=andros@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).