public inbox for linux-nfs@vger.kernel.org
From: Chuck Lever <chuck.lever@oracle.com>
To: Jeff Layton <jlayton@kernel.org>, NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: Trond Myklebust <trondmy@kernel.org>,
	Anna Schumaker <anna@kernel.org>,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 06/14] sunrpc: add helpers to count and snapshot pending cache requests
Date: Thu, 19 Mar 2026 14:47:20 -0400	[thread overview]
Message-ID: <163efbf2-5e1a-4dde-b6c8-787c9b68e878@oracle.com> (raw)
In-Reply-To: <579b0239abbbb0b95d619e6b400bb919fedab60d.camel@kernel.org>

On 3/19/26 2:22 PM, Jeff Layton wrote:
> On Thu, 2026-03-19 at 14:07 -0400, Chuck Lever wrote:
>> On 3/16/26 11:14 AM, Jeff Layton wrote:
>>> Add sunrpc_cache_requests_count() and sunrpc_cache_requests_snapshot()
>>> to allow callers to count and snapshot the pending upcall request list
>>> without exposing struct cache_request outside of cache.c.
>>>
>>> Both functions skip entries that no longer have CACHE_PENDING set.
>>>
>>> The snapshot function takes a cache_get() reference on each item so the
>>> caller can safely use them after the queue_lock is released.
>>>
>>> These will be used by the nfsd generic netlink dumpit handler for
>>> svc_export upcall requests.
>>>
>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>>  include/linux/sunrpc/cache.h |  5 ++++
>>>  net/sunrpc/cache.c           | 57 ++++++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 62 insertions(+)
>>>
>>> diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
>>> index c358151c23950ab48e83991c6138bb7d0e049ace..343f0fb675d761e019989e47e03645484e0aa30f 100644
>>> --- a/include/linux/sunrpc/cache.h
>>> +++ b/include/linux/sunrpc/cache.h
>>> @@ -251,6 +251,11 @@ extern int sunrpc_cache_register_pipefs(struct dentry *parent, const char *,
>>>  extern void sunrpc_cache_unregister_pipefs(struct cache_detail *);
>>>  extern void sunrpc_cache_unhash(struct cache_detail *, struct cache_head *);
>>>  
>>> +int sunrpc_cache_requests_count(struct cache_detail *cd);
>>> +int sunrpc_cache_requests_snapshot(struct cache_detail *cd,
>>> +				   struct cache_head **items,
>>> +				   u64 *seqnos, int max);
>>> +
>>>  /* Must store cache_detail in seq_file->private if using next three functions */
>>>  extern void *cache_seq_start_rcu(struct seq_file *file, loff_t *pos);
>>>  extern void *cache_seq_next_rcu(struct seq_file *file, void *p, loff_t *pos);
>>> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
>>> index 819f12add8f26562fdc6aaa200f55dec0180bfbc..2a78560edee5ca07be0fce90f87ce43a19df245f 100644
>>> --- a/net/sunrpc/cache.c
>>> +++ b/net/sunrpc/cache.c
>>> @@ -1906,3 +1906,60 @@ void sunrpc_cache_unhash(struct cache_detail *cd, struct cache_head *h)
>>>  		spin_unlock(&cd->hash_lock);
>>>  }
>>>  EXPORT_SYMBOL_GPL(sunrpc_cache_unhash);
>>> +
>>> +/**
>>> + * sunrpc_cache_requests_count - count pending upcall requests
>>> + * @cd: cache_detail to query
>>> + *
>>> + * Returns the number of requests on the cache's request list that
>>> + * still have CACHE_PENDING set.
>>> + */
>>> +int sunrpc_cache_requests_count(struct cache_detail *cd)
>>> +{
>>> +	struct cache_request *crq;
>>> +	int cnt = 0;
>>> +
>>> +	spin_lock(&cd->queue_lock);
>>> +	list_for_each_entry(crq, &cd->requests, list) {
>>> +		if (test_bit(CACHE_PENDING, &crq->item->flags))
>>> +			cnt++;
>>> +	}
>>> +	spin_unlock(&cd->queue_lock);
>>> +	return cnt;
>>> +}
>>> +EXPORT_SYMBOL_GPL(sunrpc_cache_requests_count);
>>> +
>>> +/**
>>> + * sunrpc_cache_requests_snapshot - snapshot pending upcall requests
>>> + * @cd: cache_detail to query
>>> + * @items: array to fill with cache_head pointers (caller-allocated)
>>> + * @seqnos: array to fill with sequence numbers (caller-allocated)
>>> + * @max: size of the arrays
>>> + *
>>> + * Only entries with CACHE_PENDING set are included. Takes a reference
>>> + * on each cache_head via cache_get(). Caller must call cache_put()
>>> + * on each returned item when done.
>>> + *
>>> + * Returns the number of entries filled.
>>> + */
>>> +int sunrpc_cache_requests_snapshot(struct cache_detail *cd,
>>> +				   struct cache_head **items,
>>> +				   u64 *seqnos, int max)
>>> +{
>>> +	struct cache_request *crq;
>>> +	int i = 0;
>>> +
>>> +	spin_lock(&cd->queue_lock);
>>> +	list_for_each_entry(crq, &cd->requests, list) {
>>> +		if (i >= max)
>>> +			break;
>>> +		if (!test_bit(CACHE_PENDING, &crq->item->flags))
>>> +			continue;
>>> +		items[i] = cache_get(crq->item);
>>> +		seqnos[i] = crq->seqno;
>>> +		i++;
>>> +	}
>>> +	spin_unlock(&cd->queue_lock);
>>> +	return i;
>>> +}
>>> +EXPORT_SYMBOL_GPL(sunrpc_cache_requests_snapshot);
>>>
>>
>> This API architecture introduces a TOCTOU race: as soon as the
>> queue_lock is dropped, the count becomes stale.
>>
>> The count returned by sunrpc_cache_requests_count() is used as the array
>> bound. To wit:
>>
>>   cnt = sunrpc_cache_requests_count(cd);
>>
>>   items = kcalloc(cnt, sizeof(*items), GFP_KERNEL);
>>
>>   seqnos = kcalloc(cnt, sizeof(*seqnos), GFP_KERNEL);
>>
>>   cnt = sunrpc_cache_requests_snapshot(cd, items, seqnos, cnt);
>>
>> This appears in all four dumpit handlers (ip_map, unix_gid, svc_export,
>> expkey).
>>
>> This isn't a memory safety issue; _snapshot() caps its output at max, so
>> if the list grows between the two calls, entries are silently truncated
>> rather than overflowing the buffer. If the list shrinks, the arrays are
>> merely oversized.
>>
>> However, the practical risk is incomplete data returned to userspace. If
>> the caller is guaranteed to be re-invoked (e.g., the netlink dumpit
>> callback gets called again until it returns 0), then missing items due
>> to list growth between count() and snapshot() is harmless: they'll be
>> picked up on the next pass.
>>
>> But looking at the callers, they all do this:
>>
>>   if (cb->args[0])
>>       return 0;
>>
>> and then set cb->args[0] after the single snapshot dump.
>>
>> The dumpit is a one-shot: it snapshots once and signals completion. If
>> the list grows between count() and snapshot(), the truncated entries are
>> silently lost and there's no subsequent pass to pick them up, IIUC.
>>
> 
> Userland will receive a separate notify request whenever a
> cache_request is queued, and that notification is only sent after the
> cache_request is queued.
> 
> So while it might not receive all of the queued requests from the
> kernel in the race you describe, it won't matter because select() will
> return the notify descriptor again on the next pass and mountd will
> pick up the remaining entries at that point.

Makes sense. But perhaps this should be documented somewhere; the
"retry" loop is not obvious from the source code introduced here.

"Caller is expected to ..." or some such language.
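[Editorial note: one possible wording for the comment Chuck is asking for.
This is suggested text only, not part of the patch.]

```c
/* Possible addition to the sunrpc_cache_requests_snapshot()
 * kernel-doc (suggested wording, not from the patch):
 *
 * Entries queued after the caller sized its arrays may be
 * truncated from the snapshot. Callers are expected to re-invoke
 * this function when the upcall notification fires again, at
 * which point any entries missed by an earlier snapshot are
 * picked up.
 */
```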


-- 
Chuck Lever


Thread overview: 28+ messages
2026-03-16 15:14 [PATCH 00/14] nfsd/sunrpc: add support for netlink upcalls for mountd/exportd Jeff Layton
2026-03-16 15:14 ` [PATCH 01/14] nfsd: move struct nfsd_genl_rqstp to nfsctl.c Jeff Layton
2026-03-16 15:14 ` [PATCH 02/14] sunrpc: rename sunrpc_cache_pipe_upcall() to sunrpc_cache_upcall() Jeff Layton
2026-03-16 15:14 ` [PATCH 03/14] sunrpc: rename sunrpc_cache_pipe_upcall_timeout() Jeff Layton
2026-03-16 15:14 ` [PATCH 04/14] sunrpc: rename cache_pipe_upcall() to cache_do_upcall() Jeff Layton
2026-03-19 13:54   ` Chuck Lever
2026-03-16 15:14 ` [PATCH 05/14] sunrpc: add a cache_notify callback Jeff Layton
2026-03-19 15:13   ` Chuck Lever
2026-03-16 15:14 ` [PATCH 06/14] sunrpc: add helpers to count and snapshot pending cache requests Jeff Layton
2026-03-19 18:07   ` Chuck Lever
2026-03-19 18:22     ` Jeff Layton
2026-03-19 18:47       ` Chuck Lever [this message]
2026-03-16 15:14 ` [PATCH 07/14] sunrpc: add a generic netlink family for cache upcalls Jeff Layton
2026-03-19 18:44   ` Chuck Lever
2026-03-19 19:14   ` Chuck Lever
2026-03-19 19:19     ` Jeff Layton
2026-03-19 19:20       ` Chuck Lever
2026-03-19 19:31         ` Chuck Lever
2026-03-16 15:14 ` [PATCH 08/14] sunrpc: add netlink upcall for the auth.unix.ip cache Jeff Layton
2026-03-16 15:14 ` [PATCH 09/14] sunrpc: add netlink upcall for the auth.unix.gid cache Jeff Layton
2026-03-20 14:32   ` Chuck Lever
2026-03-16 15:14 ` [PATCH 10/14] nfsd: add new netlink spec for svc_export upcall Jeff Layton
2026-03-20 15:17   ` Chuck Lever
2026-03-23 20:00     ` Jeff Layton
2026-03-16 15:14 ` [PATCH 11/14] nfsd: add netlink upcall for the svc_export cache Jeff Layton
2026-03-16 15:14 ` [PATCH 12/14] nfsd: add netlink upcall for the nfsd.fh cache Jeff Layton
2026-03-16 15:14 ` [PATCH 13/14] sunrpc: add SUNRPC_CMD_CACHE_FLUSH netlink command Jeff Layton
2026-03-16 15:14 ` [PATCH 14/14] nfsd: add NFSD_CMD_CACHE_FLUSH " Jeff Layton
