From: "J. Bruce Fields" <bfields@fieldses.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH RFC] NFSD: Optimize DRC bucket pruning
Date: Tue, 21 Sep 2021 18:21:22 -0400 [thread overview]
Message-ID: <20210921222122.GE21704@fieldses.org> (raw)
In-Reply-To: <163216587593.1058.15663218635528093628.stgit@klimt.1015granger.net>
On Mon, Sep 20, 2021 at 03:25:21PM -0400, Chuck Lever wrote:
> DRC bucket pruning is done by nfsd_cache_lookup(), which is part of
> every NFSv2 and NFSv3 dispatch (ie, it's done while the client is
> waiting).
>
> I added a trace_printk() in prune_bucket() to see just how long
> it takes to prune. Here are two ends of the spectrum:
>
> prune_bucket: Scanned 1 and freed 0 in 90 ns, 62 entries remaining
> prune_bucket: Scanned 2 and freed 1 in 716 ns, 63 entries remaining
> ...
> prune_bucket: Scanned 75 and freed 74 in 34149 ns, 1 entries remaining
So how do we end up with so many to prune at once if we're pruning every
time we add a new item?
Oh, I guess they must all have about the same expiry time, so they all
became eligible for pruning at the same time.
> Pruning latency is noticeable on fast transports with fast storage.
> By noticeable, I mean that the latency measured here in the worst
> case is the same order of magnitude as the round trip time for
> cached server operations.
>
> We could do something like moving expired entries to an expired list
> and then free them later instead of freeing them right in
> prune_bucket(). But simply limiting the number of entries that can
> be pruned by a lookup is simple and retains more entries in the
> cache, making the DRC somewhat more effective.
And when there's a bunch of expired entries to prune I guess it makes
sense that stopping at pruning 3 for each one you add will still keep
the DRC size under control.
Anyway, looks simple and makes sense to me; applying.
--b.
>
> Comparison with a 70/30 fio 8KB 12 thread direct I/O test:
>
>
> Before:
>
> write: IOPS=61.6k, BW=481MiB/s (505MB/s)(14.1GiB/30001msec); 0 zone resets
>
> WRITE:
> 1848726 ops (30%)
> avg bytes sent per op: 8340 avg bytes received per op: 136
> backlog wait: 0.635158 RTT: 0.128525 total execute time: 0.827242 (milliseconds)
>
>
> After:
>
> write: IOPS=63.0k, BW=492MiB/s (516MB/s)(14.4GiB/30001msec); 0 zone resets
>
> WRITE:
> 1891144 ops (30%)
> avg bytes sent per op: 8340 avg bytes received per op: 136
> backlog wait: 0.616114 RTT: 0.126842 total execute time: 0.805348 (milliseconds)
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfscache.c | 17 +++++++++++------
> 1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> index 96cdf77925f3..6e0b6f3148dc 100644
> --- a/fs/nfsd/nfscache.c
> +++ b/fs/nfsd/nfscache.c
> @@ -241,8 +241,8 @@ lru_put_end(struct nfsd_drc_bucket *b, struct svc_cacherep *rp)
> list_move_tail(&rp->c_lru, &b->lru_head);
> }
>
> -static long
> -prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn)
> +static long prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn,
> + unsigned int max)
> {
> struct svc_cacherep *rp, *tmp;
> long freed = 0;
> @@ -258,11 +258,17 @@ prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn)
> time_before(jiffies, rp->c_timestamp + RC_EXPIRE))
> break;
> nfsd_reply_cache_free_locked(b, rp, nn);
> - freed++;
> + if (max && freed++ > max)
> + break;
> }
> return freed;
> }
>
> +static long nfsd_prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn)
> +{
> + return prune_bucket(b, nn, 3);
> +}
> +
> /*
> * Walk the LRU list and prune off entries that are older than RC_EXPIRE.
> * Also prune the oldest ones when the total exceeds the max number of entries.
> @@ -279,7 +285,7 @@ prune_cache_entries(struct nfsd_net *nn)
> if (list_empty(&b->lru_head))
> continue;
> spin_lock(&b->cache_lock);
> - freed += prune_bucket(b, nn);
> + freed += prune_bucket(b, nn, 0);
> spin_unlock(&b->cache_lock);
> }
> return freed;
> @@ -453,8 +459,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp)
> atomic_inc(&nn->num_drc_entries);
> nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
>
> - /* go ahead and prune the cache */
> - prune_bucket(b, nn);
> + nfsd_prune_bucket(b, nn);
>
> out_unlock:
> spin_unlock(&b->cache_lock);
>
prev parent reply other threads:[~2021-09-21 22:21 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-20 19:25 [PATCH RFC] NFSD: Optimize DRC bucket pruning Chuck Lever
2021-09-21 22:21 ` J. Bruce Fields [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210921222122.GE21704@fieldses.org \
--to=bfields@fieldses.org \
--cc=chuck.lever@oracle.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).