[PATCH 0/3] NFSD: Prevent dupplicate SCSI fencing operation.

Linux NFS development
 help / color / mirror / Atom feed

* [PATCH 0/3] NFSD: Prevent dupplicate SCSI fencing operation.
@ 2025-12-15 18:13 Dai Ngo
  2025-12-15 18:13 ` [PATCH 1/3] NFSD: Move clientid_hashval and same_clid to header files Dai Ngo
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Dai Ngo @ 2025-12-15 18:13 UTC (permalink / raw)
  To: chuck.lever, jlayton, neilb, okorniev, tom, hch; +Cc: linux-nfs

This patchset implements a mechanism to avoid issuing redundant
SCSI client fencing operations for the same client and SCSI device:

Introduce a hash table per net namespace to store records of clients
that has been assigned persistent registration key to use for
accessing the SCSI block devices.

Each record contains the clientID, dev_t of the SCSI block device
and a flag to indicate whether the client was fenced.

When the server hands out the client registration key, it creates
a record with the fence flag set to False and inserts this record
into the hash table.

When a client needs to be fenced, the server looks up the client's
registration record and checks the fenced flag to see if the
fencing op needs to be issued.

When the client sends the GETDEVINFO for a new layout, the server
resets the fenced flag back to False.

When the client unmounts the share, all records belong to this
client are removed by __destroy_client.

All client records and the hash table are freed when the net
namespace is shutdown.

Dai Ngo (3):
NFSD: Move clientid_hashval and same_clid to header files
NFSD: Add infrastructure for tracking persistent SCSI registration keys
NFSD: Prevent redundant SCSI fencing operations

 fs/nfsd/blocklayout.c | 145 ++++++++++++++++++++++++++++++++++++++++++++-
 fs/nfsd/netns.h       |   2 +
 fs/nfsd/nfs4state.c   |  16 +----
 fs/nfsd/nfsd.h        |  19 ++++++
 fs/nfsd/nfssvc.c      |   9 ++-
 fs/nfsd/pnfs.h        |  11 ++++
 fs/nfsd/state.h       |   5 ++
 7 files changed, 189 insertions(+), 18 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] NFSD: Move clientid_hashval and same_clid to header files
  2025-12-15 18:13 [PATCH 0/3] NFSD: Prevent dupplicate SCSI fencing operation Dai Ngo
@ 2025-12-15 18:13 ` Dai Ngo
  2025-12-15 18:58   ` Chuck Lever
  2025-12-15 18:13 ` [PATCH 2/3] NFSD: Add infrastructure for tracking persistent SCSI registration keys Dai Ngo
  2025-12-15 18:13 ` [PATCH 3/3] NFSD: Prevent redundant SCSI fencing operations Dai Ngo
  2 siblings, 1 reply; 16+ messages in thread
From: Dai Ngo @ 2025-12-15 18:13 UTC (permalink / raw)
  To: chuck.lever, jlayton, neilb, okorniev, tom, hch; +Cc: linux-nfs

As preparation for introducing infrastructure to track SCSI fencing events,
relocate the clientid_hashval function from nfs4state.c to nfsd.h and the
same_clid function from nfs4state.c to state.h.

No functional changes are introduced in this commit—only movement of code
to improve accessibility for forthcoming enhancements.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfsd/nfs4state.c | 15 ---------------
 fs/nfsd/nfsd.h      |  5 +++++
 fs/nfsd/state.h     |  5 +++++
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 35004568d43e..13b4dc89b1e8 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1423,15 +1423,6 @@ static void revoke_delegation(struct nfs4_delegation *dp)
 	destroy_unhashed_deleg(dp);
 }
 
-/*
- * SETCLIENTID state
- */
-
-static unsigned int clientid_hashval(u32 id)
-{
-	return id & CLIENT_HASH_MASK;
-}
-
 static unsigned int clientstr_hashval(struct xdr_netobj name)
 {
 	return opaque_hashval(name.data, 8) & CLIENT_HASH_MASK;
@@ -2621,12 +2612,6 @@ same_verf(nfs4_verifier *v1, nfs4_verifier *v2)
 	return 0 == memcmp(v1->data, v2->data, sizeof(v1->data));
 }
 
-static int
-same_clid(clientid_t *cl1, clientid_t *cl2)
-{
-	return (cl1->cl_boot == cl2->cl_boot) && (cl1->cl_id == cl2->cl_id);
-}
-
 static bool groups_equal(struct group_info *g1, struct group_info *g2)
 {
 	int i;
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 50be785f1d2c..369efe6ed665 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -510,6 +510,11 @@ static inline bool nfsd_attrs_supported(u32 minorversion, const u32 *bmval)
 	return bmval_is_subset(bmval, nfsd_suppattrs[minorversion]);
 }
 
+static inline unsigned int clientid_hashval(u32 id)
+{
+	return id & CLIENT_HASH_MASK;
+}
+
 /* These will return ERR_INVAL if specified in GETATTR or READDIR. */
 #define NFSD_WRITEONLY_ATTRS_WORD1 \
 	(FATTR4_WORD1_TIME_ACCESS_SET   | FATTR4_WORD1_TIME_MODIFY_SET)
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 1e736f402426..b737e8cfe6d5 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -865,6 +865,11 @@ static inline bool try_to_expire_client(struct nfs4_client *clp)
 	return clp->cl_state == NFSD4_EXPIRABLE;
 }
 
+static inline int same_clid(clientid_t *cl1, clientid_t *cl2)
+{
+	return (cl1->cl_boot == cl2->cl_boot) && (cl1->cl_id == cl2->cl_id);
+}
+
 extern __be32 nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp,
 		struct dentry *dentry, struct nfs4_delegation **pdp);
 #endif   /* NFSD4_STATE_H */
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] NFSD: Move clientid_hashval and same_clid to header files
  2025-12-15 18:13 ` [PATCH 1/3] NFSD: Move clientid_hashval and same_clid to header files Dai Ngo
@ 2025-12-15 18:58   ` Chuck Lever
  2025-12-15 20:50     ` Dai Ngo
  2025-12-18  9:25     ` Christoph Hellwig
  0 siblings, 2 replies; 16+ messages in thread
From: Chuck Lever @ 2025-12-15 18:58 UTC (permalink / raw)
  To: Dai Ngo, Chuck Lever, Jeff Layton, neilb, Olga Kornievskaia,
	Tom Talpey, Christoph Hellwig
  Cc: linux-nfs



On Mon, Dec 15, 2025, at 1:13 PM, Dai Ngo wrote:
> As preparation for introducing infrastructure to track SCSI fencing events,
> relocate the clientid_hashval function from nfs4state.c to nfsd.h and the
> same_clid function from nfs4state.c to state.h.
>
> No functional changes are introduced in this commit—only movement of code
> to improve accessibility for forthcoming enhancements.
>
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>  fs/nfsd/nfs4state.c | 15 ---------------
>  fs/nfsd/nfsd.h      |  5 +++++
>  fs/nfsd/state.h     |  5 +++++
>  3 files changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 35004568d43e..13b4dc89b1e8 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1423,15 +1423,6 @@ static void revoke_delegation(struct nfs4_delegation *dp)
>  	destroy_unhashed_deleg(dp);
>  }
> 
> -/*
> - * SETCLIENTID state
> - */
> -
> -static unsigned int clientid_hashval(u32 id)
> -{
> -	return id & CLIENT_HASH_MASK;
> -}
> -
>  static unsigned int clientstr_hashval(struct xdr_netobj name)
>  {
>  	return opaque_hashval(name.data, 8) & CLIENT_HASH_MASK;
> @@ -2621,12 +2612,6 @@ same_verf(nfs4_verifier *v1, nfs4_verifier *v2)
>  	return 0 == memcmp(v1->data, v2->data, sizeof(v1->data));
>  }
> 
> -static int
> -same_clid(clientid_t *cl1, clientid_t *cl2)
> -{
> -	return (cl1->cl_boot == cl2->cl_boot) && (cl1->cl_id == cl2->cl_id);
> -}
> -
>  static bool groups_equal(struct group_info *g1, struct group_info *g2)
>  {
>  	int i;
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index 50be785f1d2c..369efe6ed665 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -510,6 +510,11 @@ static inline bool nfsd_attrs_supported(u32 
> minorversion, const u32 *bmval)
>  	return bmval_is_subset(bmval, nfsd_suppattrs[minorversion]);
>  }
> 
> +static inline unsigned int clientid_hashval(u32 id)
> +{
> +	return id & CLIENT_HASH_MASK;
> +}
> +

I can't comment on the overall purpose of this series yet, but
there are one or two mechanical issues that I see already.

Let's not add NFSv4- or pNFS-specific functions to fs/nfsd/nfsd.h.
Same comment applies to the function declarations this series moves
in a subsequent patch.

Why can't clientid_hashval() go into fs/nfsd/state.h instead?

Also, when a function becomes accessible outside of one source
file (like a "static inline" function or a callback function),
it needs to get a kdoc comment that documents its API contract.


>  /* These will return ERR_INVAL if specified in GETATTR or READDIR. */
>  #define NFSD_WRITEONLY_ATTRS_WORD1 \
>  	(FATTR4_WORD1_TIME_ACCESS_SET   | FATTR4_WORD1_TIME_MODIFY_SET)
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 1e736f402426..b737e8cfe6d5 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -865,6 +865,11 @@ static inline bool try_to_expire_client(struct 
> nfs4_client *clp)
>  	return clp->cl_state == NFSD4_EXPIRABLE;
>  }
> 
> +static inline int same_clid(clientid_t *cl1, clientid_t *cl2)
> +{
> +	return (cl1->cl_boot == cl2->cl_boot) && (cl1->cl_id == cl2->cl_id);
> +}
> +
>  extern __be32 nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp,
>  		struct dentry *dentry, struct nfs4_delegation **pdp);
>  #endif   /* NFSD4_STATE_H */
> -- 
> 2.47.3

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] NFSD: Move clientid_hashval and same_clid to header files
  2025-12-15 18:58   ` Chuck Lever
@ 2025-12-15 20:50     ` Dai Ngo
  2025-12-18  9:25     ` Christoph Hellwig
  1 sibling, 0 replies; 16+ messages in thread
From: Dai Ngo @ 2025-12-15 20:50 UTC (permalink / raw)
  To: Chuck Lever, Chuck Lever, Jeff Layton, neilb, Olga Kornievskaia,
	Tom Talpey, Christoph Hellwig
  Cc: linux-nfs


On 12/15/25 10:58 AM, Chuck Lever wrote:
>
> On Mon, Dec 15, 2025, at 1:13 PM, Dai Ngo wrote:
>> As preparation for introducing infrastructure to track SCSI fencing events,
>> relocate the clientid_hashval function from nfs4state.c to nfsd.h and the
>> same_clid function from nfs4state.c to state.h.
>>
>> No functional changes are introduced in this commit—only movement of code
>> to improve accessibility for forthcoming enhancements.
>>
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>>   fs/nfsd/nfs4state.c | 15 ---------------
>>   fs/nfsd/nfsd.h      |  5 +++++
>>   fs/nfsd/state.h     |  5 +++++
>>   3 files changed, 10 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 35004568d43e..13b4dc89b1e8 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -1423,15 +1423,6 @@ static void revoke_delegation(struct nfs4_delegation *dp)
>>   	destroy_unhashed_deleg(dp);
>>   }
>>
>> -/*
>> - * SETCLIENTID state
>> - */
>> -
>> -static unsigned int clientid_hashval(u32 id)
>> -{
>> -	return id & CLIENT_HASH_MASK;
>> -}
>> -
>>   static unsigned int clientstr_hashval(struct xdr_netobj name)
>>   {
>>   	return opaque_hashval(name.data, 8) & CLIENT_HASH_MASK;
>> @@ -2621,12 +2612,6 @@ same_verf(nfs4_verifier *v1, nfs4_verifier *v2)
>>   	return 0 == memcmp(v1->data, v2->data, sizeof(v1->data));
>>   }
>>
>> -static int
>> -same_clid(clientid_t *cl1, clientid_t *cl2)
>> -{
>> -	return (cl1->cl_boot == cl2->cl_boot) && (cl1->cl_id == cl2->cl_id);
>> -}
>> -
>>   static bool groups_equal(struct group_info *g1, struct group_info *g2)
>>   {
>>   	int i;
>> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
>> index 50be785f1d2c..369efe6ed665 100644
>> --- a/fs/nfsd/nfsd.h
>> +++ b/fs/nfsd/nfsd.h
>> @@ -510,6 +510,11 @@ static inline bool nfsd_attrs_supported(u32
>> minorversion, const u32 *bmval)
>>   	return bmval_is_subset(bmval, nfsd_suppattrs[minorversion]);
>>   }
>>
>> +static inline unsigned int clientid_hashval(u32 id)
>> +{
>> +	return id & CLIENT_HASH_MASK;
>> +}
>> +
> I can't comment on the overall purpose of this series yet, but
> there are one or two mechanical issues that I see already.
>
> Let's not add NFSv4- or pNFS-specific functions to fs/nfsd/nfsd.h.
> Same comment applies to the function declarations this series moves
> in a subsequent patch.

Fix in v2.

>
> Why can't clientid_hashval() go into fs/nfsd/state.h instead?

Fix in v2.

>
> Also, when a function becomes accessible outside of one source
> file (like a "static inline" function or a callback function),
> it needs to get a kdoc comment that documents its API contract.

Fix in v2.

Thanks,
-Dai

>
>
>>   /* These will return ERR_INVAL if specified in GETATTR or READDIR. */
>>   #define NFSD_WRITEONLY_ATTRS_WORD1 \
>>   	(FATTR4_WORD1_TIME_ACCESS_SET   | FATTR4_WORD1_TIME_MODIFY_SET)
>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>> index 1e736f402426..b737e8cfe6d5 100644
>> --- a/fs/nfsd/state.h
>> +++ b/fs/nfsd/state.h
>> @@ -865,6 +865,11 @@ static inline bool try_to_expire_client(struct
>> nfs4_client *clp)
>>   	return clp->cl_state == NFSD4_EXPIRABLE;
>>   }
>>
>> +static inline int same_clid(clientid_t *cl1, clientid_t *cl2)
>> +{
>> +	return (cl1->cl_boot == cl2->cl_boot) && (cl1->cl_id == cl2->cl_id);
>> +}
>> +
>>   extern __be32 nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp,
>>   		struct dentry *dentry, struct nfs4_delegation **pdp);
>>   #endif   /* NFSD4_STATE_H */
>> -- 
>> 2.47.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] NFSD: Move clientid_hashval and same_clid to header files
  2025-12-15 18:58   ` Chuck Lever
  2025-12-15 20:50     ` Dai Ngo
@ 2025-12-18  9:25     ` Christoph Hellwig
  2025-12-18 19:40       ` Dai Ngo
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2025-12-18  9:25 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Dai Ngo, Chuck Lever, Jeff Layton, neilb, Olga Kornievskaia,
	Tom Talpey, Christoph Hellwig, linux-nfs

On Mon, Dec 15, 2025 at 01:58:49PM -0500, Chuck Lever wrote:
> >  	return bmval_is_subset(bmval, nfsd_suppattrs[minorversion]);
> >  }
> > 
> > +static inline unsigned int clientid_hashval(u32 id)
> > +{
> > +	return id & CLIENT_HASH_MASK;
> > +}
> > +
> 
> I can't comment on the overall purpose of this series yet, but
> there are one or two mechanical issues that I see already.
> 
> Let's not add NFSv4- or pNFS-specific functions to fs/nfsd/nfsd.h.
> Same comment applies to the function declarations this series moves
> in a subsequent patch.
> 
> Why can't clientid_hashval() go into fs/nfsd/state.h instead?

Or why do we need it at all?  It's completly trivial, and there is
no inherent advantage in sharing an arbiteary masking for hasheѕ.

> Also, when a function becomes accessible outside of one source
> file (like a "static inline" function or a callback function),
> it needs to get a kdoc comment that documents its API contract.

And a nfsd_ prefix would also be useful.  All good arguments for
just duplicating this bit.

> > +static inline int same_clid(clientid_t *cl1, clientid_t *cl2)
> > +{
> > +	return (cl1->cl_boot == cl2->cl_boot) && (cl1->cl_id == cl2->cl_id);
> > +}

Now this might have some more reason to be shared as we need two
values to establish the client identify.  Ńo need for the braces
here even wen they are copied from the existing code.

Also I wonder why clientid_t is a typedef instead of a struct?  And
if we want to treat it as opaque, why we're comparing the fields instead
of a memcmp?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] NFSD: Move clientid_hashval and same_clid to header files
  2025-12-18  9:25     ` Christoph Hellwig
@ 2025-12-18 19:40       ` Dai Ngo
  0 siblings, 0 replies; 16+ messages in thread
From: Dai Ngo @ 2025-12-18 19:40 UTC (permalink / raw)
  To: Christoph Hellwig, Chuck Lever
  Cc: Chuck Lever, Jeff Layton, neilb, Olga Kornievskaia, Tom Talpey,
	linux-nfs


On 12/18/25 1:25 AM, Christoph Hellwig wrote:
> On Mon, Dec 15, 2025 at 01:58:49PM -0500, Chuck Lever wrote:
>>>   	return bmval_is_subset(bmval, nfsd_suppattrs[minorversion]);
>>>   }
>>>
>>> +static inline unsigned int clientid_hashval(u32 id)
>>> +{
>>> +	return id & CLIENT_HASH_MASK;
>>> +}
>>> +
>> I can't comment on the overall purpose of this series yet, but
>> there are one or two mechanical issues that I see already.
>>
>> Let's not add NFSv4- or pNFS-specific functions to fs/nfsd/nfsd.h.
>> Same comment applies to the function declarations this series moves
>> in a subsequent patch.
>>
>> Why can't clientid_hashval() go into fs/nfsd/state.h instead?
> Or why do we need it at all?  It's completly trivial, and there is
> no inherent advantage in sharing an arbiteary masking for hasheѕ.
>
>> Also, when a function becomes accessible outside of one source
>> file (like a "static inline" function or a callback function),
>> it needs to get a kdoc comment that documents its API contract.
> And a nfsd_ prefix would also be useful.  All good arguments for
> just duplicating this bit.
>
>>> +static inline int same_clid(clientid_t *cl1, clientid_t *cl2)
>>> +{
>>> +	return (cl1->cl_boot == cl2->cl_boot) && (cl1->cl_id == cl2->cl_id);
>>> +}
> Now this might have some more reason to be shared as we need two
> values to establish the client identify.  Ńo need for the braces
> here even wen they are copied from the existing code.
>
> Also I wonder why clientid_t is a typedef instead of a struct?  And
> if we want to treat it as opaque, why we're comparing the fields instead
> of a memcmp?

I'll drop this patch since I will use a list that hangs off the nfs4_client
structure to keep track of the block devices, as you suggested.

-Dai


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/3] NFSD: Add infrastructure for tracking persistent SCSI registration keys
  2025-12-15 18:13 [PATCH 0/3] NFSD: Prevent dupplicate SCSI fencing operation Dai Ngo
  2025-12-15 18:13 ` [PATCH 1/3] NFSD: Move clientid_hashval and same_clid to header files Dai Ngo
@ 2025-12-15 18:13 ` Dai Ngo
  2025-12-18  9:34   ` Christoph Hellwig
  2025-12-15 18:13 ` [PATCH 3/3] NFSD: Prevent redundant SCSI fencing operations Dai Ngo
  2 siblings, 1 reply; 16+ messages in thread
From: Dai Ngo @ 2025-12-15 18:13 UTC (permalink / raw)
  To: chuck.lever, jlayton, neilb, okorniev, tom, hch; +Cc: linux-nfs

Introduce a per-net namespace hash table to maintain records of
NFSv4 clients that have been assigned persistent registration
keys for accessing SCSI block devices. Each entry in this table
consists of the client’s ID, the associated SCSI block device
(dev_t), and a fenced status flag.

This infrastructure is used to prevent duplicate fencing operations.

Implementation details:

When a client receives a persistent registration key for a SCSI
device the server creates a new record (with the ‘fenced’ flag set
to false) and inserts it into the hash table for the corresponding
network namespace.

When the server needs to perform a fencing operation, it consults
the client’s record to check the existing fenced status and avoids
issuing redundant fencing operations if the client is already
fenced.

Whenever a client issues a GETDEVINFO call to obtain a new layout,
the server resets the client’s fenced flag to false, allowing for
new fencing actions as appropriate.

Upon client unmount or session teardown, all records belonging to
that client are cleaned up in __destroy_client. The entire hash table
and all contained client records are freed when the corresponding
network namespace is shut down.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfsd/blocklayout.c | 129 ++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/netns.h       |   2 +
 fs/nfsd/nfs4state.c   |   1 +
 fs/nfsd/nfsd.h        |  16 ++++++
 fs/nfsd/nfssvc.c      |   9 ++-
 fs/nfsd/pnfs.h        |  11 ++++
 6 files changed, 166 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
index afa16d7a8013..f2ab264af88e 100644
--- a/fs/nfsd/blocklayout.c
+++ b/fs/nfsd/blocklayout.c
@@ -426,4 +426,133 @@ const struct nfsd4_layout_ops scsi_layout_ops = {
 	.proc_layoutcommit	= nfsd4_scsi_proc_layoutcommit,
 	.fence_client		= nfsd4_scsi_fence_client,
 };
+
+int
+nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *nn)
+{
+	int ix;
+
+	nn->client_pr_record_hashtbl = kmalloc_array(CLIENT_HASH_SIZE,
+					sizeof(struct list_head),
+					GFP_KERNEL);
+	if (!nn->client_pr_record_hashtbl)
+		return -ENOMEM;
+	spin_lock_init(&nn->client_pr_record_hashtbl_lock);
+	for (ix = 0; ix < CLIENT_HASH_SIZE; ix++)
+		INIT_LIST_HEAD(&nn->client_pr_record_hashtbl[ix]);
+	return 0;
+}
+
+static struct scsi_pr_record *
+nfsd4_pr_find_client(struct nfs4_client *clp, struct block_device *blkdev)
+{
+	struct scsi_pr_record *pr_rec = NULL;
+	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+	unsigned int idhashval;
+	dev_t bdev = blkdev->bd_dev;
+
+	assert_spin_locked(&nn->client_pr_record_hashtbl_lock);
+	idhashval = clientid_hashval(clp->cl_clientid.cl_id);
+	list_for_each_entry(pr_rec, &nn->client_pr_record_hashtbl[idhashval],
+					spr_hash) {
+		if (same_clid(&pr_rec->spr_clid, &clp->cl_clientid) &&
+				pr_rec->spr_bdev == bdev) {
+			return pr_rec;
+		}
+	}
+	return NULL;
+}
+
+static int
+nfsd4_scsi_pr_add_client(struct nfs4_client *clp, struct block_device *blkdev)
+{
+	unsigned int idhashval;
+	struct scsi_pr_record *pr_rec = NULL;
+	struct scsi_pr_record *rec;
+	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+	dev_t bdev = blkdev->bd_dev;
+
+	spin_lock(&nn->client_pr_record_hashtbl_lock);
+	rec = nfsd4_pr_find_client(clp, blkdev);
+	if (rec) {
+		rec->spr_fenced = false;
+		spin_unlock(&nn->client_pr_record_hashtbl_lock);
+		return 0;
+	}
+	spin_unlock(&nn->client_pr_record_hashtbl_lock);
+
+	pr_rec = kzalloc(sizeof(struct scsi_pr_record), GFP_KERNEL);
+	if (!pr_rec)
+		return -ENOMEM;
+	pr_rec->spr_clid = clp->cl_clientid;
+	pr_rec->spr_bdev = bdev;
+	pr_rec->spr_fenced = false;
+
+	idhashval = clientid_hashval(clp->cl_clientid.cl_id);
+	spin_lock(&nn->client_pr_record_hashtbl_lock);
+	rec = nfsd4_pr_find_client(clp, blkdev);
+	if (rec) {
+		rec->spr_fenced = false;
+		spin_unlock(&nn->client_pr_record_hashtbl_lock);
+		kfree(pr_rec);
+		return 0;
+	}
+	list_add(&pr_rec->spr_hash, &nn->client_pr_record_hashtbl[idhashval]);
+	spin_unlock(&nn->client_pr_record_hashtbl_lock);
+	return 0;
+
+}
+
+bool
+nfsd4_scsi_pr_fence(struct nfs4_client *clp, struct block_device *blkdev)
+{
+	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+	struct scsi_pr_record *rec;
+
+	assert_spin_locked(&nn->client_pr_record_hashtbl_lock);
+	rec = nfsd4_pr_find_client(clp, blkdev);
+	if (rec && !rec->spr_fenced) {
+		rec->spr_fenced = true;
+		return true;
+	}
+	return false;
+}
+
+void
+nfsd4_scsi_pr_del_client(struct nfs4_client *clp)
+{
+	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+	struct scsi_pr_record *entry, *tmp;
+	unsigned int idhashval;
+
+	idhashval = clientid_hashval(clp->cl_clientid.cl_id);
+	spin_lock(&nn->client_pr_record_hashtbl_lock);
+	list_for_each_entry_safe(entry, tmp,
+			&nn->client_pr_record_hashtbl[idhashval], spr_hash) {
+		if (same_clid(&entry->spr_clid, &clp->cl_clientid)) {
+			list_del(&entry->spr_hash);
+			kfree(entry);
+		}
+	}
+	spin_unlock(&nn->client_pr_record_hashtbl_lock);
+}
+
+void
+nfsd4_scsi_pr_shutdown(struct nfsd_net *nn)
+{
+	int ix;
+	struct scsi_pr_record *entry;
+
+	spin_lock(&nn->client_pr_record_hashtbl_lock);
+	for (ix = 0; ix < CLIENT_HASH_SIZE; ix++) {
+		while (!list_empty(&nn->client_pr_record_hashtbl[ix])) {
+			entry = list_entry(nn->client_pr_record_hashtbl[ix].next,
+					struct scsi_pr_record, spr_hash);
+			list_del(&entry->spr_hash);
+			kfree(entry);
+		}
+	}
+	spin_unlock(&nn->client_pr_record_hashtbl_lock);
+	kfree(nn->client_pr_record_hashtbl);
+}
 #endif /* CONFIG_NFSD_SCSILAYOUT */
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 3e2d0fde80a7..466f8478d8f3 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -217,6 +217,8 @@ struct nfsd_net {
 	spinlock_t              local_clients_lock;
 	struct list_head	local_clients;
 #endif
+	spinlock_t		client_pr_record_hashtbl_lock;
+	struct list_head	*client_pr_record_hashtbl;
 };
 
 /* Simple check to find out if a given net was properly initialized */
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 13b4dc89b1e8..70009e882a4a 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2528,6 +2528,7 @@ __destroy_client(struct nfs4_client *clp)
 		svc_xprt_put(clp->cl_cb_conn.cb_xprt);
 	atomic_add_unless(&nn->nfs4_client_count, -1, 0);
 	nfsd4_dec_courtesy_client_count(nn, clp);
+	nfsd4_scsi_pr_del_client(clp);
 	free_client(clp);
 	wake_up_all(&expiry_wq);
 }
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 369efe6ed665..4572daa94a79 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -570,6 +570,15 @@ extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn);
 
 extern void nfsd4_init_leases_net(struct nfsd_net *nn);
 
+extern int nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *net);
+extern void nfsd4_scsi_pr_shutdown(struct nfsd_net *net);
+struct nfs4_client;
+extern void nfsd4_scsi_pr_del_client(struct nfs4_client *clp);
+extern int nfsd4_scsi_pr_add_client(struct nfs4_client *clp,
+		struct block_device *blkdev);
+extern bool nfsd4_scsi_pr_fence(struct nfs4_client *clp,
+		struct block_device *blkdev);
+
 #else /* CONFIG_NFSD_V4 */
 static inline int nfsd4_is_junction(struct dentry *dentry)
 {
@@ -578,6 +587,13 @@ static inline int nfsd4_is_junction(struct dentry *dentry)
 
 static inline void nfsd4_init_leases_net(struct nfsd_net *nn) { };
 
+extern inline int nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *nn)
+{
+	return 0;
+}
+
+extern inline void nfsd4_scsi_pr_shutdown(struct nfsd_net *nn) {};
+
 #define register_cld_notifier() 0
 #define unregister_cld_notifier() do { } while(0)
 
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index f6cae4430ba4..d4f96291b4b5 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -381,13 +381,17 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
 #ifdef CONFIG_NFSD_V4_2_INTER_SSC
 	nfsd4_ssc_init_umount_work(nn);
 #endif
-	ret = nfs4_state_start_net(net);
+	ret = nfsd4_scsi_pr_init_hashtbl(nn);
 	if (ret)
 		goto out_reply_cache;
-
+	ret = nfs4_state_start_net(net);
+	if (ret)
+		goto out_pr_client;
 	nn->nfsd_net_up = true;
 	return 0;
 
+out_pr_client:
+	nfsd4_scsi_pr_shutdown(nn);
 out_reply_cache:
 	nfsd_reply_cache_shutdown(nn);
 out_filecache:
@@ -423,6 +427,7 @@ static void nfsd_shutdown_net(struct net *net)
 
 	wait_for_completion(&nn->nfsd_net_free_done);
 	percpu_ref_exit(&nn->nfsd_net_ref);
+	nfsd4_scsi_pr_shutdown(nn);
 
 	nn->nfsd_net_up = false;
 	nfsd_shutdown_generic();
diff --git a/fs/nfsd/pnfs.h b/fs/nfsd/pnfs.h
index db9af780438b..9ae02b6d922d 100644
--- a/fs/nfsd/pnfs.h
+++ b/fs/nfsd/pnfs.h
@@ -67,6 +67,17 @@ __be32 nfsd4_return_client_layouts(struct svc_rqst *rqstp,
 int nfsd4_set_deviceid(struct nfsd4_deviceid *id, const struct svc_fh *fhp,
 		u32 device_generation);
 struct nfsd4_deviceid_map *nfsd4_find_devid_map(int idx);
+
+int nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *nn);
+void nfsd4_scsi_pr_shutdown(struct nfsd_net *nn);
+void nfsd4_scsi_pr_del_client(struct nfs4_client *clp);
+
+struct scsi_pr_record {
+	struct list_head spr_hash;
+	clientid_t spr_clid;
+	dev_t spr_bdev;
+	bool spr_fenced;
+};
 #endif /* CONFIG_NFSD_V4 */
 
 #ifdef CONFIG_NFSD_PNFS
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] NFSD: Add infrastructure for tracking persistent SCSI registration keys
  2025-12-15 18:13 ` [PATCH 2/3] NFSD: Add infrastructure for tracking persistent SCSI registration keys Dai Ngo
@ 2025-12-18  9:34   ` Christoph Hellwig
  2025-12-18 16:00     ` Chuck Lever
  2025-12-18 19:40     ` Dai Ngo
  0 siblings, 2 replies; 16+ messages in thread
From: Christoph Hellwig @ 2025-12-18  9:34 UTC (permalink / raw)
  To: Dai Ngo; +Cc: chuck.lever, jlayton, neilb, okorniev, tom, hch, linux-nfs

On Mon, Dec 15, 2025 at 10:13:34AM -0800, Dai Ngo wrote:
> +
> +int
> +nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *nn)
> +{
> +	int ix;
> +
> +	nn->client_pr_record_hashtbl = kmalloc_array(CLIENT_HASH_SIZE,
> +					sizeof(struct list_head),
> +					GFP_KERNEL);
> +	if (!nn->client_pr_record_hashtbl)
> +		return -ENOMEM;
> +	spin_lock_init(&nn->client_pr_record_hashtbl_lock);
> +	for (ix = 0; ix < CLIENT_HASH_SIZE; ix++)
> +		INIT_LIST_HEAD(&nn->client_pr_record_hashtbl[ix]);
> +	return 0;

I guess there is precendent in the nfsd code in using this fixed size
hash table, but they are not very scalable.  And using the rhastable
API is actually relatively simple, so it might be easier to use that
than rolling your own hash.

If you stick to the fixes size open code hash, you should use a
hlist_head here.  There is no advantage in having a pointer to the tail
entry for hashes, and the hlist saves half of the memory usage and
improves cache efficiency.

But taking a step back:  why do we even need a new hash table here?
Can't we jut hang off a list of block device for which a layout
was granted off the nfs4_client structure given that we already
have it available?

> +static struct scsi_pr_record *
> +nfsd4_pr_find_client(struct nfs4_client *clp, struct block_device *blkdev)
> +{
> +	struct scsi_pr_record *pr_rec = NULL;
> +	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
> +	unsigned int idhashval;
> +	dev_t bdev = blkdev->bd_dev;
> +
> +	assert_spin_locked(&nn->client_pr_record_hashtbl_lock);
> +	idhashval = clientid_hashval(clp->cl_clientid.cl_id);
> +	list_for_each_entry(pr_rec, &nn->client_pr_record_hashtbl[idhashval],
> +					spr_hash) {
> +		if (same_clid(&pr_rec->spr_clid, &clp->cl_clientid) &&
> +				pr_rec->spr_bdev == bdev) {
> +			return pr_rec;
> +		}

This ensures that you always have collisions for multiple bdevs of the
same client.  Why not use a hash the mixes entropy from the client id
and the bdev?

> +bool
> +nfsd4_scsi_pr_fence(struct nfs4_client *clp, struct block_device *blkdev)
> +{
> +	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
> +	struct scsi_pr_record *rec;
> +
> +	assert_spin_locked(&nn->client_pr_record_hashtbl_lock);
> +	rec = nfsd4_pr_find_client(clp, blkdev);
> +	if (rec && !rec->spr_fenced) {
> +		rec->spr_fenced = true;
> +		return true;
> +	}
> +	return false;
> +}

This function seems misnamed.  It doesn't do any actualy fencing.

> +extern int nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *net);
> +extern void nfsd4_scsi_pr_shutdown(struct nfsd_net *net);
> +struct nfs4_client;
> +extern void nfsd4_scsi_pr_del_client(struct nfs4_client *clp);
> +extern int nfsd4_scsi_pr_add_client(struct nfs4_client *clp,
> +		struct block_device *blkdev);
> +extern bool nfsd4_scsi_pr_fence(struct nfs4_client *clp,
> +		struct block_device *blkdev);

Some of these are only used inside of blocklayout, so mark them
static.  For the others drop the extern.  Also please keep
struct forward declarations at the top of the file.

> +
>  #else /* CONFIG_NFSD_V4 */
>  static inline int nfsd4_is_junction(struct dentry *dentry)
>  {
> @@ -578,6 +587,13 @@ static inline int nfsd4_is_junction(struct dentry *dentry)
>  
>  static inline void nfsd4_init_leases_net(struct nfsd_net *nn) { };
>  
> +extern inline int nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *nn)

This should be static inline.

> index db9af780438b..9ae02b6d922d 100644
> --- a/fs/nfsd/pnfs.h
> +++ b/fs/nfsd/pnfs.h
> @@ -67,6 +67,17 @@ __be32 nfsd4_return_client_layouts(struct svc_rqst *rqstp,
>  int nfsd4_set_deviceid(struct nfsd4_deviceid *id, const struct svc_fh *fhp,
>  		u32 device_generation);
>  struct nfsd4_deviceid_map *nfsd4_find_devid_map(int idx);
> +
> +int nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *nn);
> +void nfsd4_scsi_pr_shutdown(struct nfsd_net *nn);
> +void nfsd4_scsi_pr_del_client(struct nfs4_client *clp);

These duplicate the prototypes in nfsd.h.

> +struct scsi_pr_record {
> +	struct list_head spr_hash;
> +	clientid_t spr_clid;
> +	dev_t spr_bdev;
> +	bool spr_fenced;
> +};

This can be kept static in blocklayout.c


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] NFSD: Add infrastructure for tracking persistent SCSI registration keys
  2025-12-18  9:34   ` Christoph Hellwig
@ 2025-12-18 16:00     ` Chuck Lever
  2025-12-18 19:44       ` Dai Ngo
  2025-12-19  5:24       ` Christoph Hellwig
  2025-12-18 19:40     ` Dai Ngo
  1 sibling, 2 replies; 16+ messages in thread
From: Chuck Lever @ 2025-12-18 16:00 UTC (permalink / raw)
  To: Christoph Hellwig, Dai Ngo; +Cc: jlayton, neilb, okorniev, tom, linux-nfs

On 12/18/25 4:34 AM, Christoph Hellwig wrote:
> On Mon, Dec 15, 2025 at 10:13:34AM -0800, Dai Ngo wrote:
>> +
>> +int
>> +nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *nn)
>> +{
>> +	int ix;
>> +
>> +	nn->client_pr_record_hashtbl = kmalloc_array(CLIENT_HASH_SIZE,
>> +					sizeof(struct list_head),
>> +					GFP_KERNEL);
>> +	if (!nn->client_pr_record_hashtbl)
>> +		return -ENOMEM;
>> +	spin_lock_init(&nn->client_pr_record_hashtbl_lock);
>> +	for (ix = 0; ix < CLIENT_HASH_SIZE; ix++)
>> +		INIT_LIST_HEAD(&nn->client_pr_record_hashtbl[ix]);
>> +	return 0;
> I guess there is precendent in the nfsd code in using this fixed size
> hash table, but they are not very scalable.  And using the rhastable
> API is actually relatively simple, so it might be easier to use that
> than rolling your own hash.
> 
> If you stick to the fixes size open code hash, you should use a
> hlist_head here.  There is no advantage in having a pointer to the tail
> entry for hashes, and the hlist saves half of the memory usage and
> improves cache efficiency.
> 
> But taking a step back:  why do we even need a new hash table here?
> Can't we jut hang off a list of block device for which a layout
> was granted off the nfs4_client structure given that we already
> have it available?

My question is: how many items will this table need to track,
on average? at maximum?


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] NFSD: Add infrastructure for tracking persistent SCSI registration keys
  2025-12-18 16:00     ` Chuck Lever
@ 2025-12-18 19:44       ` Dai Ngo
  2025-12-19  5:24       ` Christoph Hellwig
  1 sibling, 0 replies; 16+ messages in thread
From: Dai Ngo @ 2025-12-18 19:44 UTC (permalink / raw)
  To: Chuck Lever, Christoph Hellwig; +Cc: jlayton, neilb, okorniev, tom, linux-nfs

On 12/18/25 8:00 AM, Chuck Lever wrote:
> On 12/18/25 4:34 AM, Christoph Hellwig wrote:
>> On Mon, Dec 15, 2025 at 10:13:34AM -0800, Dai Ngo wrote:
>>> +
>>> +int
>>> +nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *nn)
>>> +{
>>> +	int ix;
>>> +
>>> +	nn->client_pr_record_hashtbl = kmalloc_array(CLIENT_HASH_SIZE,
>>> +					sizeof(struct list_head),
>>> +					GFP_KERNEL);
>>> +	if (!nn->client_pr_record_hashtbl)
>>> +		return -ENOMEM;
>>> +	spin_lock_init(&nn->client_pr_record_hashtbl_lock);
>>> +	for (ix = 0; ix < CLIENT_HASH_SIZE; ix++)
>>> +		INIT_LIST_HEAD(&nn->client_pr_record_hashtbl[ix]);
>>> +	return 0;
>> I guess there is precendent in the nfsd code in using this fixed size
>> hash table, but they are not very scalable.  And using the rhastable
>> API is actually relatively simple, so it might be easier to use that
>> than rolling your own hash.
>>
>> If you stick to the fixes size open code hash, you should use a
>> hlist_head here.  There is no advantage in having a pointer to the tail
>> entry for hashes, and the hlist saves half of the memory usage and
>> improves cache efficiency.
>>
>> But taking a step back:  why do we even need a new hash table here?
>> Can't we jut hang off a list of block device for which a layout
>> was granted off the nfs4_client structure given that we already
>> have it available?
> My question is: how many items will this table need to track,
> on average? at maximum?

I don't have the exact answer but my guess is a NFS client would not be
configured to access too many shares with SCSI layout type. The maximum
maybe less then 100 and the average is less than 10?

-Dai


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] NFSD: Add infrastructure for tracking persistent SCSI registration keys
  2025-12-18 16:00     ` Chuck Lever
  2025-12-18 19:44       ` Dai Ngo
@ 2025-12-19  5:24       ` Christoph Hellwig
  2025-12-19 13:40         ` Chuck Lever
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2025-12-19  5:24 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Christoph Hellwig, Dai Ngo, jlayton, neilb, okorniev, tom,
	linux-nfs

On Thu, Dec 18, 2025 at 11:00:52AM -0500, Chuck Lever wrote:
> > But taking a step back:  why do we even need a new hash table here?
> > Can't we jut hang off a list of block device for which a layout
> > was granted off the nfs4_client structure given that we already
> > have it available?
> 
> My question is: how many items will this table need to track,
> on average? at maximum?

A good question that I do not have an answer to.  In my experience most
NFS deployments actually use exactly one export per client, and I don't
think I've seen a production deployment with more than a dozen exports
mounted on a single client.  Then again I'm usually pretty well shielded
from the worst enterprise deployments, so who knows especially in the
days of containers.  Multiply that by the number of clients.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] NFSD: Add infrastructure for tracking persistent SCSI registration keys
  2025-12-19  5:24       ` Christoph Hellwig
@ 2025-12-19 13:40         ` Chuck Lever
  0 siblings, 0 replies; 16+ messages in thread
From: Chuck Lever @ 2025-12-19 13:40 UTC (permalink / raw)
  To: Christoph Hellwig, Dai Ngo
  Cc: Jeff Layton, neilb, Olga Kornievskaia, Tom Talpey, linux-nfs,
	Chuck Lever



On Fri, Dec 19, 2025, at 12:24 AM, Christoph Hellwig wrote:
> On Thu, Dec 18, 2025 at 11:00:52AM -0500, Chuck Lever wrote:
>> > But taking a step back:  why do we even need a new hash table here?
>> > Can't we jut hang off a list of block device for which a layout
>> > was granted off the nfs4_client structure given that we already
>> > have it available?
>> 
>> My question is: how many items will this table need to track,
>> on average? at maximum?
>
> A good question that I do not have an answer to.  In my experience most
> NFS deployments actually use exactly one export per client, and I don't
> think I've seen a production deployment with more than a dozen exports
> mounted on a single client.  Then again I'm usually pretty well shielded
> from the worst enterprise deployments, so who knows especially in the
> days of containers.  Multiply that by the number of clients.

In my head that's getting up into the hundreds, potentially. I'd
say a linked list is right out. NFSD has utilized rhashtables for
several other applications to great success, so there is existing
code that can easily be copied for this use case.

We might also consider an xarray, which has an ultra-simple API
for basic uses, is reasonably scalable, and has somewhat better
memory behavior than hash tables.

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] NFSD: Add infrastructure for tracking persistent SCSI registration keys
  2025-12-18  9:34   ` Christoph Hellwig
  2025-12-18 16:00     ` Chuck Lever
@ 2025-12-18 19:40     ` Dai Ngo
  1 sibling, 0 replies; 16+ messages in thread
From: Dai Ngo @ 2025-12-18 19:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: chuck.lever, jlayton, neilb, okorniev, tom, linux-nfs


On 12/18/25 1:34 AM, Christoph Hellwig wrote:
> On Mon, Dec 15, 2025 at 10:13:34AM -0800, Dai Ngo wrote:
>> +
>> +int
>> +nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *nn)
>> +{
>> +	int ix;
>> +
>> +	nn->client_pr_record_hashtbl = kmalloc_array(CLIENT_HASH_SIZE,
>> +					sizeof(struct list_head),
>> +					GFP_KERNEL);
>> +	if (!nn->client_pr_record_hashtbl)
>> +		return -ENOMEM;
>> +	spin_lock_init(&nn->client_pr_record_hashtbl_lock);
>> +	for (ix = 0; ix < CLIENT_HASH_SIZE; ix++)
>> +		INIT_LIST_HEAD(&nn->client_pr_record_hashtbl[ix]);
>> +	return 0;
> I guess there is precendent in the nfsd code in using this fixed size
> hash table, but they are not very scalable.  And using the rhastable
> API is actually relatively simple, so it might be easier to use that
> than rolling your own hash.
>
> If you stick to the fixes size open code hash, you should use a
> hlist_head here.  There is no advantage in having a pointer to the tail
> entry for hashes, and the hlist saves half of the memory usage and
> improves cache efficiency.
>
> But taking a step back:  why do we even need a new hash table here?
> Can't we jut hang off a list of block device for which a layout
> was granted off the nfs4_client structure given that we already
> have it available?

Yes, a simple list hang of the nfs4_client structure makes sense.

I don't think a NFS client would be configured to access too many
pNFS shares with SCSI layout type. Also, fencing is not expected
to be a frequent operation so traversing the list should not effect
the overall performance of the NFS server.

>
>> +static struct scsi_pr_record *
>> +nfsd4_pr_find_client(struct nfs4_client *clp, struct block_device *blkdev)
>> +{
>> +	struct scsi_pr_record *pr_rec = NULL;
>> +	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
>> +	unsigned int idhashval;
>> +	dev_t bdev = blkdev->bd_dev;
>> +
>> +	assert_spin_locked(&nn->client_pr_record_hashtbl_lock);
>> +	idhashval = clientid_hashval(clp->cl_clientid.cl_id);
>> +	list_for_each_entry(pr_rec, &nn->client_pr_record_hashtbl[idhashval],
>> +					spr_hash) {
>> +		if (same_clid(&pr_rec->spr_clid, &clp->cl_clientid) &&
>> +				pr_rec->spr_bdev == bdev) {
>> +			return pr_rec;
>> +		}
> This ensures that you always have collisions for multiple bdevs of the
> same client.  Why not use a hash the mixes entropy from the client id
> and the bdev?

This is no longer needed when switching to use a simple list.

>
>> +bool
>> +nfsd4_scsi_pr_fence(struct nfs4_client *clp, struct block_device *blkdev)
>> +{
>> +	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
>> +	struct scsi_pr_record *rec;
>> +
>> +	assert_spin_locked(&nn->client_pr_record_hashtbl_lock);
>> +	rec = nfsd4_pr_find_client(clp, blkdev);
>> +	if (rec && !rec->spr_fenced) {
>> +		rec->spr_fenced = true;
>> +		return true;
>> +	}
>> +	return false;
>> +}
> This function seems misnamed.  It doesn't do any actualy fencing.

will fix in v2.

>
>> +extern int nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *net);
>> +extern void nfsd4_scsi_pr_shutdown(struct nfsd_net *net);
>> +struct nfs4_client;
>> +extern void nfsd4_scsi_pr_del_client(struct nfs4_client *clp);
>> +extern int nfsd4_scsi_pr_add_client(struct nfs4_client *clp,
>> +		struct block_device *blkdev);
>> +extern bool nfsd4_scsi_pr_fence(struct nfs4_client *clp,
>> +		struct block_device *blkdev);
> Some of these are only used inside of blocklayout, so mark them
> static.  For the others drop the extern.  Also please keep
> struct forward declarations at the top of the file.

will fix in v2.

>
>> +
>>   #else /* CONFIG_NFSD_V4 */
>>   static inline int nfsd4_is_junction(struct dentry *dentry)
>>   {
>> @@ -578,6 +587,13 @@ static inline int nfsd4_is_junction(struct dentry *dentry)
>>   
>>   static inline void nfsd4_init_leases_net(struct nfsd_net *nn) { };
>>   
>> +extern inline int nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *nn)
> This should be static inline.

no longer need this with a simple list.

>
>> index db9af780438b..9ae02b6d922d 100644
>> --- a/fs/nfsd/pnfs.h
>> +++ b/fs/nfsd/pnfs.h
>> @@ -67,6 +67,17 @@ __be32 nfsd4_return_client_layouts(struct svc_rqst *rqstp,
>>   int nfsd4_set_deviceid(struct nfsd4_deviceid *id, const struct svc_fh *fhp,
>>   		u32 device_generation);
>>   struct nfsd4_deviceid_map *nfsd4_find_devid_map(int idx);
>> +
>> +int nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *nn);
>> +void nfsd4_scsi_pr_shutdown(struct nfsd_net *nn);
>> +void nfsd4_scsi_pr_del_client(struct nfs4_client *clp);
> These duplicate the prototypes in nfsd.h.

will fix in v2.

>
>> +struct scsi_pr_record {
>> +	struct list_head spr_hash;
>> +	clientid_t spr_clid;
>> +	dev_t spr_bdev;
>> +	bool spr_fenced;
>> +};
> This can be kept static in blocklayout.c

will fix in v2.

Thanks,
-Dai


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/3] NFSD: Prevent redundant SCSI fencing operations
  2025-12-15 18:13 [PATCH 0/3] NFSD: Prevent dupplicate SCSI fencing operation Dai Ngo
  2025-12-15 18:13 ` [PATCH 1/3] NFSD: Move clientid_hashval and same_clid to header files Dai Ngo
  2025-12-15 18:13 ` [PATCH 2/3] NFSD: Add infrastructure for tracking persistent SCSI registration keys Dai Ngo
@ 2025-12-15 18:13 ` Dai Ngo
  2025-12-18  9:34   ` Christoph Hellwig
  2 siblings, 1 reply; 16+ messages in thread
From: Dai Ngo @ 2025-12-15 18:13 UTC (permalink / raw)
  To: chuck.lever, jlayton, neilb, okorniev, tom, hch; +Cc: linux-nfs

Update the server’s handling of SCSI persistent registration fencing
by leveraging the per-namespace SCSI client registration hash table
to record and check fencing state for each client-device pair.

Implementation details:

When the server issues a persistent registration key to a client, it
creates a new record containing the client ID, associated SCSI block
device (dev_t), and initializes the fenced flag to false. This record
is inserted into the hash table for fast lookups.

When the server needs to fence a client via nfsd4_scsi_fence_client,
it first looks up the corresponding record using the client ID and
device identifier. If the fenced flag in the record is already set,
the server skips issuing another fence operation and simply returns.
If the flag is not set, it is updated to true and the fencing operation
proceeds.

By tracking fencing state in the hash table, the server avoids issuing
redundant fencing operations for the same client and device, improving
efficiency and ensuring proper SCSI reservation management.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfsd/blocklayout.c | 16 +++++++++++++++-
 fs/nfsd/nfsd.h        |  2 --
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
index f2ab264af88e..73961e394311 100644
--- a/fs/nfsd/blocklayout.c
+++ b/fs/nfsd/blocklayout.c
@@ -17,7 +17,8 @@
 
 #define NFSDDBG_FACILITY	NFSDDBG_PNFS
 
-
+static int nfsd4_scsi_pr_add_client(struct nfs4_client *clp,
+		struct block_device *blkdev);
 /*
  * Get an extent from the file system that starts at offset or below
  * and may be shorter than the requested length.
@@ -357,6 +358,10 @@ nfsd4_block_get_device_info_scsi(struct super_block *sb,
 		goto out_free_dev;
 	}
 
+	ret = nfsd4_scsi_pr_add_client(clp, sb->s_bdev);
+	if (ret)
+		goto out_free_dev;
+
 	return 0;
 
 out_free_dev:
@@ -399,11 +404,20 @@ nfsd4_scsi_fence_client(struct nfs4_layout_stateid *ls, struct nfsd_file *file)
 {
 	struct nfs4_client *clp = ls->ls_stid.sc_client;
 	struct block_device *bdev = file->nf_file->f_path.mnt->mnt_sb->s_bdev;
+	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
 	int status;
 
+	spin_lock(&nn->client_pr_record_hashtbl_lock);
+	if (!nfsd4_scsi_pr_fence(clp, bdev)) {
+		spin_unlock(&nn->client_pr_record_hashtbl_lock);
+		return;
+	}
+	spin_unlock(&nn->client_pr_record_hashtbl_lock);
 	status = bdev->bd_disk->fops->pr_ops->pr_preempt(bdev, NFSD_MDS_PR_KEY,
 			nfsd4_scsi_pr_key(clp),
 			PR_EXCLUSIVE_ACCESS_REG_ONLY, true);
+	if (status)
+		nfsd4_scsi_pr_add_client(clp, bdev);
 	trace_nfsd_pnfs_fence(clp, bdev->bd_disk->disk_name, status);
 }
 
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 4572daa94a79..5a60419395b2 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -574,8 +574,6 @@ extern int nfsd4_scsi_pr_init_hashtbl(struct nfsd_net *net);
 extern void nfsd4_scsi_pr_shutdown(struct nfsd_net *net);
 struct nfs4_client;
 extern void nfsd4_scsi_pr_del_client(struct nfs4_client *clp);
-extern int nfsd4_scsi_pr_add_client(struct nfs4_client *clp,
-		struct block_device *blkdev);
 extern bool nfsd4_scsi_pr_fence(struct nfs4_client *clp,
 		struct block_device *blkdev);
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] NFSD: Prevent redundant SCSI fencing operations
  2025-12-15 18:13 ` [PATCH 3/3] NFSD: Prevent redundant SCSI fencing operations Dai Ngo
@ 2025-12-18  9:34   ` Christoph Hellwig
  2025-12-18 19:41     ` Dai Ngo
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2025-12-18  9:34 UTC (permalink / raw)
  To: Dai Ngo; +Cc: chuck.lever, jlayton, neilb, okorniev, tom, hch, linux-nfs

This should probably be kept with the previous patch it is closely
related.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] NFSD: Prevent redundant SCSI fencing operations
  2025-12-18  9:34   ` Christoph Hellwig
@ 2025-12-18 19:41     ` Dai Ngo
  0 siblings, 0 replies; 16+ messages in thread
From: Dai Ngo @ 2025-12-18 19:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: chuck.lever, jlayton, neilb, okorniev, tom, linux-nfs


On 12/18/25 1:34 AM, Christoph Hellwig wrote:
> This should probably be kept with the previous patch it is closely
> related.

will do.

-Dai


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-12-19 13:41 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-15 18:13 [PATCH 0/3] NFSD: Prevent dupplicate SCSI fencing operation Dai Ngo
2025-12-15 18:13 ` [PATCH 1/3] NFSD: Move clientid_hashval and same_clid to header files Dai Ngo
2025-12-15 18:58   ` Chuck Lever
2025-12-15 20:50     ` Dai Ngo
2025-12-18  9:25     ` Christoph Hellwig
2025-12-18 19:40       ` Dai Ngo
2025-12-15 18:13 ` [PATCH 2/3] NFSD: Add infrastructure for tracking persistent SCSI registration keys Dai Ngo
2025-12-18  9:34   ` Christoph Hellwig
2025-12-18 16:00     ` Chuck Lever
2025-12-18 19:44       ` Dai Ngo
2025-12-19  5:24       ` Christoph Hellwig
2025-12-19 13:40         ` Chuck Lever
2025-12-18 19:40     ` Dai Ngo
2025-12-15 18:13 ` [PATCH 3/3] NFSD: Prevent redundant SCSI fencing operations Dai Ngo
2025-12-18  9:34   ` Christoph Hellwig
2025-12-18 19:41     ` Dai Ngo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox