From: NeilBrown <neilb@suse.de>
To: Trond Myklebust <Trond.Myklebust@netapp.com>, andros@netapp.com
Cc: Jeff Layton <jlayton@redhat.com>, linux-nfs@vger.kernel.org
Subject: Re: [[RFC] 1/1] SUNRPC: dynamic rpc_slot allocator for TCP
Date: Wed, 4 May 2011 11:18:25 +1000
Message-ID: <20110504111825.6f26f75a@notabene.brown>
In-Reply-To: <1304469890.26184.16.camel@lade.trondhjem.org>

On Tue, 03 May 2011 20:44:50 -0400 Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:

> On Tue, 2011-05-03 at 20:20 -0400, Jeff Layton wrote:
> > On Mon,  2 May 2011 21:40:08 -0400
> > andros@netapp.com wrote:
> > 
> > > From: Andy Adamson <andros@netapp.com>
> > > 
> > > Hookup TCP congestion feedback into rpc_slot allocation so that the RPC layer
> > > can fully utilize the negotiated TCP window.
> > > 
> > > Use a slab cache for rpc_slots. At rpc_xprt allocation, statically allocate
> > > RPC_DEF_SLOT_TABLE slots from the rpc_slot slab cache using GFP_KERNEL.
> > > 
> > > Add a dynamic rpc slot allocator to rpc_xprt_ops which is set only for TCP.
> > > For TCP, trigger a dynamic slot allocation in response to a write_space
> > > callback which is in turn called when the TCP layer is waiting for buffer space.
> > > 
> > > Dynamically add a slot at the beginning of the RPC call_transmit state. The slot
> > > allocator uses GFP_NOWAIT and will return without allocating a slot if
> > > GFP_NOWAIT allocation fails. This is OK because the write_space callback will
> > > be called again, and the dynamic slot allocator can retry.
> > > 
> > > Signed-off-by: Andy Adamson <andros@netapp.com>
> > > ---
> > >  include/linux/sunrpc/sched.h |    2 +
> > >  include/linux/sunrpc/xprt.h  |    6 +++-
> > >  net/sunrpc/clnt.c            |    4 ++
> > >  net/sunrpc/sched.c           |   39 ++++++++++++++++++++++
> > >  net/sunrpc/xprt.c            |   75 +++++++++++++++++++++++++++++++++++++-----
> > >  net/sunrpc/xprtsock.c        |    1 +
> > >  6 files changed, 117 insertions(+), 10 deletions(-)
> > > 
> > 
> > Nice work, comments inline below...
> > 
> > 
> > [...]
> > 
> > > +
> > > +/*
> > > + * Static transport rpc_slot allocation called only at rpc_xprt allocation.
> > > + * No need to take the xprt->reserve_lock.
> > > + */
> > > +int
> > > +xprt_alloc_slot_entries(struct rpc_xprt *xprt, int num_req)
> > > +{
> > > +	struct rpc_rqst *req;
> > > +	int i;
> > > +
> > > +	for (i = 0; i < num_req; i++) {
> > > +		req = rpc_alloc_slot(GFP_KERNEL);
> > > +		if (!req)
> > > +			return -ENOMEM;
> > > +		memset(req, 0, sizeof(*req));
> > > +		list_add(&req->rq_list, &xprt->free);
> > > +	}
> > > +	dprintk("<-- %s mempool_alloc %d reqs\n", __func__, num_req);
> > > +	return 0;
> > > +}
> > > +
> > 
> > So, I don't quite get this...
> > 
> > You declare a global mempool early on, and then allocate from that
> > mempool for a list of static entries. Doesn't that sort of rob you of
> > any benefit of using a mempool here? IOW, won't the static allocations
> > potentially rob the mempool of "guaranteed" entries such that the
> > dynamic ones eventually all turn into slab allocations anyway?
> 
> The other thing is that we really should _never_ be using GFP_KERNEL
> allocations in any paths that might be used for writebacks, since that
> might recurse back into the filesystem due to direct memory reclaims. In
> fact, since this code path will commonly be used by rpciod, then we
> should rather be using GFP_ATOMIC and/or GFP_NOWAIT (see rpc_malloc).
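
For reference, rpc_malloc picks its flags along roughly these lines - a
paraphrased sketch, not the verbatim kernel code; rpc_buffer_mempool and
RPC_BUFFER_MAXSIZE are file-local names in net/sunrpc/sched.c:

#include <linux/mempool.h>
#include <linux/slab.h>
#include <linux/sunrpc/sched.h>

/* Never GFP_KERNEL from rpciod: a direct reclaim could recurse into
 * the filesystem whose writeback this task is servicing. */
static void *buf_alloc_sketch(struct rpc_task *task, size_t size)
{
	gfp_t gfp = RPC_IS_SWAPPER(task) ? GFP_ATOMIC : GFP_NOWAIT;

	if (size <= RPC_BUFFER_MAXSIZE)
		return mempool_alloc(rpc_buffer_mempool, gfp);
	return kmalloc(size, gfp);
}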
> 
> > What I think would make more sense would be to have multiple mempools
> > -- one per xprt and simply set the mempool size to the number of
> > "static" entries that you want for the mempool. Then you could get rid
> > of the free list, and just do allocations out of the mempool directly.
> > You'll be guaranteed to be able to allocate up to the number in the
> > mempool and everything above that would just become a slab allocation.
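
A per-transport pool along those lines would only be a few lines, with the
slab allocator doing the real work.  A sketch only: 'slot_pool' and
'rpc_slot_slabp' are hypothetical names, not fields rpc_xprt has today:

#include <linux/mempool.h>
#include <linux/sunrpc/xprt.h>

/* One pool per transport, sized to the number of "static" slots.
 * Allocations up to min_nr are guaranteed; anything beyond that
 * falls through to an ordinary slab allocation. */
static int xprt_create_slot_pool(struct rpc_xprt *xprt, int min_slots)
{
	xprt->slot_pool = mempool_create_slab_pool(min_slots, rpc_slot_slabp);
	return xprt->slot_pool ? 0 : -ENOMEM;
}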
> 
> I'm not sure that we need multiple mempools: all we really require is
> that memory reclaims to be able to make some form of progress.
> I'd therefore rather advocate a global mempool (which matches our policy
> on the rpc buffer pool), but that we allow the allocations to fail if
> we're really low on memory.
> 
> The problem with multiple mempools is that if there is no activity on
> one of those filesystems, then we're basically locking up memory for no
> good reason.
> 

It might be worth pointing out that mempools only use the reserved memory as
a last resort.  They try a normal allocation first, and only when that fails
do they dip into the pool.  This is because a normal allocation avoids taking
global locks in the common case - it is CPU-local.  The pool is a global
resource (shared by all CPUs), so a spinlock is needed to access it.

As a GFP_KERNEL allocation essentially never fails, it will never dip into
the pool, so there is not much point in using GFP_KERNEL for a mempool
allocation (though I guess it could make the code cleaner/neater, and that
would be a good thing).
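
To make that concrete, mempool_alloc() does roughly the following -
simplified from mm/mempool.c, with the waiting/retry logic omitted:

#include <linux/mempool.h>
#include <linux/spinlock.h>

void *mempool_alloc_sketch(mempool_t *pool, gfp_t gfp_mask)
{
	void *element;

	/* Fast path: a normal, CPU-local slab allocation, with the
	 * "can wait" bits masked off so it fails fast instead. */
	element = pool->alloc(gfp_mask & ~(__GFP_WAIT|__GFP_IO),
			      pool->pool_data);
	if (element)
		return element;

	/* Slow path: dip into the reserved elements, which means
	 * taking the global pool spinlock. */
	spin_lock(&pool->lock);
	if (pool->curr_nr > 0)
		element = pool->elements[--pool->curr_nr];
	spin_unlock(&pool->lock);
	return element;
}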

In general, mempools should be sized to the minimum number of entries
necessary to make forward progress when memory pressure is high.  This is
typically 1.  Making a mempool larger than this might allow throughput to be
a bit higher during periods of high memory pressure, but wastes memory the
rest of the time.

You only need multiple mempools when they might both be used for the same
request - i.e. if two separate allocations are needed to handle one request,
then they must come from two separate mempools.
If two different allocations will always be for two independent requests,
then they can safely share a mempool.
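
For example, if completing a single request needs both a header and a data
buffer, the rule above means two pools, each of which only ever hands out
one kind of object (the names and sizes here are made up):

#include <linux/mempool.h>

/* Two allocations per request => two pools.  Each reserves one
 * element, so one request can always make progress. */
static mempool_t *hdr_pool;	/* 64-byte request headers only */
static mempool_t *data_pool;	/* one-page payloads only */

static int init_pools(void)
{
	hdr_pool = mempool_create_kmalloc_pool(1, 64);
	data_pool = mempool_create_kmalloc_pool(1, PAGE_SIZE);
	if (!hdr_pool || !data_pool)
		return -ENOMEM;
	return 0;
}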

For rpc slots, I doubt you need mempools at all.
Mempools are only needed if failure is not an option and you would rather
wait, but you cannot wait for regular writeback because you are on the
writeback path.  So you use a mempool which waits for a previous request to
complete.  I don't think that describes rpc slots at all.
For rpc slots, you can afford to wait when setting up the transport, as you
are not on the writeout path yet, and later you can always cope with failure.
So just use kmalloc.
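
i.e. something like the following - just a sketch, and the function names
are made up:

#include <linux/slab.h>
#include <linux/sunrpc/xprt.h>

/* Transport setup: not on the writeout path yet, so sleeping in the
 * allocator is fine and a failure can simply be returned. */
static struct rpc_rqst *slot_alloc_setup(void)
{
	return kzalloc(sizeof(struct rpc_rqst), GFP_KERNEL);
}

/* Growing the slot table later, from rpciod: must not sleep, and
 * failure is harmless - the write_space callback will fire again
 * and the allocation can be retried then. */
static struct rpc_rqst *slot_alloc_dynamic(void)
{
	return kzalloc(sizeof(struct rpc_rqst), GFP_NOWAIT | __GFP_NOWARN);
}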

NeilBrown
