* [PATCH v3 1/1] pNFS: Serialize SCSI PR registration to avoid reservation conflicts
From: Dai Ngo @ 2026-03-06 16:29 UTC
To: trondmy, anna; +Cc: linux-nfs
With SCSI layouts, the NFS client must not submit I/O to the data server
until Persistent Reservation (PR) registration has completed.
Currently, bl_register_scsi() sets PNFS_BDEV_REGISTERED before performing
the PR operation. If multiple threads concurrently start I/O to the same
SCSI device, the first thread sets the flag and begins registration,
while the other threads observe the flag, skip registration, and issue
I/O before registration has finished. Those I/Os can fail with
RESERVATION CONFLICT, forcing the client to fall back to I/O through
the MDS.
Protect the registration path with a per-device mutex,
pbd_registration_mutex, held across the PR register call, so only one
thread performs PR registration at a time. Other threads block on the
mutex until registration finishes and only then check
blkdev_registered, so no I/O is issued until PR registration is
complete. Unregistration takes the same mutex.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfs/blocklayout/blocklayout.h |  8 +++-----
 fs/nfs/blocklayout/dev.c         | 15 ++++++++++++---
 2 files changed, 15 insertions(+), 8 deletions(-)
v2:
. remove fio test from commit message.
. rename pbd_mutex to pbd_registration_mutex and add a description
of its usage.
. move declaration of pbd_registration_mutex before the (*map)().
. protect unregistration op with pbd_registration_mutex.
v3:
. replace PNFS_BDEV_REGISTERED flag in pnfs_block_dev with
blkdev_registered boolean.
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index 6da40ca19570..311b14334902 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -111,17 +111,15 @@ struct pnfs_block_dev {
 
 	struct file			*bdev_file;
 	u64				disk_offset;
-	unsigned long			flags;
 
+	bool				blkdev_registered;
 	u64				pr_key;
+	/* Mutex to serialize SCSI PR register/unregister operations. */
+	struct mutex			pbd_registration_mutex;
 
 	bool (*map)(struct pnfs_block_dev *dev, u64 offset,
 			struct pnfs_block_dev_map *map);
-};
 
-/* pnfs_block_dev flag bits */
-enum {
-	PNFS_BDEV_REGISTERED = 0,
 };
 
 /* sector_t fields are all in 512-byte sectors */
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index cc6327d97a91..f1e77c4290ae 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -33,10 +33,15 @@ static bool bl_register_scsi(struct pnfs_block_dev *dev)
 	const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
 	int status;
 
-	if (test_and_set_bit(PNFS_BDEV_REGISTERED, &dev->flags))
+	mutex_lock(&dev->pbd_registration_mutex);
+	if (dev->blkdev_registered) {
+		mutex_unlock(&dev->pbd_registration_mutex);
 		return true;
+	}
+	dev->blkdev_registered = true;
 
 	status = ops->pr_register(bdev, 0, dev->pr_key, true);
+	mutex_unlock(&dev->pbd_registration_mutex);
 	if (status) {
 		trace_bl_pr_key_reg_err(bdev, dev->pr_key, status);
 		return false;
@@ -55,9 +60,12 @@ static void bl_unregister_dev(struct pnfs_block_dev *dev)
 		return;
 	}
 
-	if (dev->type == PNFS_BLOCK_VOLUME_SCSI &&
-	    test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags))
+	mutex_lock(&dev->pbd_registration_mutex);
+	if (dev->type == PNFS_BLOCK_VOLUME_SCSI && dev->blkdev_registered) {
+		dev->blkdev_registered = false;
 		bl_unregister_scsi(dev);
+	}
+	mutex_unlock(&dev->pbd_registration_mutex);
 }
 
 bool bl_register_dev(struct pnfs_block_dev *dev)
@@ -572,6 +580,7 @@ bl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
 	top = kzalloc_obj(*top, gfp_mask);
 	if (!top)
 		goto out_free_volumes;
+	mutex_init(&top->pbd_registration_mutex);
 
 	ret = bl_parse_deviceid(server, top, volumes, nr_volumes - 1, gfp_mask);
 
--
2.47.3
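The race described in the commit message can be reproduced outside the kernel with a small POSIX threads model. This is a sketch under stated assumptions, not the blocklayout code: struct model_dev, model_pr_register(), register_flag_first(), register_serialized() and run_model() are hypothetical names, and a 20 ms nanosleep() stands in for the latency of the PR register command. register_flag_first() mirrors the old test_and_set_bit() ordering; register_serialized() mirrors the patched bl_register_scsi().

```c
/*
 * Userspace model of the bl_register_scsi() race; not kernel code.
 * All names are hypothetical. Build: cc -std=c11 -pthread model.c
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define MAX_THREADS 64

/* Hypothetical stand-in for struct pnfs_block_dev. */
struct model_dev {
	pthread_mutex_t registration_mutex;	/* models pbd_registration_mutex */
	atomic_bool registered;			/* models blkdev_registered */
	atomic_bool pr_done;			/* set once the PR call has completed */
	atomic_int pr_register_calls;		/* number of PR calls issued */
	atomic_int early_io;			/* "I/Os" issued before pr_done */
};

/* Hypothetical stand-in for ops->pr_register(): slow, then marks completion. */
static int model_pr_register(struct model_dev *dev)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 20000000L }; /* 20 ms */

	atomic_fetch_add(&dev->pr_register_calls, 1);
	nanosleep(&delay, NULL);
	atomic_store(&dev->pr_done, true);
	return 0;
}

/* Old ordering: set the flag, then register; other callers return at once. */
static bool register_flag_first(struct model_dev *dev)
{
	if (atomic_exchange(&dev->registered, true))
		return true;
	return model_pr_register(dev) == 0;
}

/* Patched ordering: the mutex is held across the PR call. */
static bool register_serialized(struct model_dev *dev)
{
	int status;

	pthread_mutex_lock(&dev->registration_mutex);
	if (atomic_load(&dev->registered)) {
		pthread_mutex_unlock(&dev->registration_mutex);
		return true;
	}
	atomic_store(&dev->registered, true);
	status = model_pr_register(dev);
	pthread_mutex_unlock(&dev->registration_mutex);
	return status == 0;
}

struct worker_arg {
	struct model_dev *dev;
	bool (*reg)(struct model_dev *dev);
};

/* Register, then "issue I/O"; count it as early if PR has not completed. */
static void *worker(void *p)
{
	struct worker_arg *arg = p;

	if (arg->reg(arg->dev) && !atomic_load(&arg->dev->pr_done))
		atomic_fetch_add(&arg->dev->early_io, 1);
	return NULL;
}

/* Start nthreads concurrent callers of reg() and report the counters. */
static void run_model(bool (*reg)(struct model_dev *dev), int nthreads,
		      int *calls, int *early_io)
{
	struct model_dev dev;
	struct worker_arg arg = { .dev = &dev, .reg = reg };
	pthread_t tids[MAX_THREADS];
	int i, created = 0;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	pthread_mutex_init(&dev.registration_mutex, NULL);
	atomic_init(&dev.registered, false);
	atomic_init(&dev.pr_done, false);
	atomic_init(&dev.pr_register_calls, 0);
	atomic_init(&dev.early_io, 0);

	for (i = 0; i < nthreads; i++)
		if (pthread_create(&tids[created], NULL, worker, &arg) == 0)
			created++;
	for (i = 0; i < created; i++)
		pthread_join(tids[i], NULL);

	*calls = atomic_load(&dev.pr_register_calls);
	*early_io = atomic_load(&dev.early_io);
	pthread_mutex_destroy(&dev.registration_mutex);
}
```

With register_serialized, run_model() should report one PR call and zero early I/Os on every run, because a waiting thread can only read the registered flag after the registering thread has released the mutex, and it releases it only after the PR call returns. With register_flag_first there is still exactly one PR call, but early_io is typically nonzero: the other threads see the flag during the sleep window and proceed immediately.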
* Re: [PATCH v3 1/1] pNFS: Serialize SCSI PR registration to avoid reservation conflicts
From: Dai Ngo @ 2026-03-06 18:46 UTC
To: Christoph Hellwig; +Cc: linux-nfs, Trond Myklebust, anna
Christoph,
The new mutex, pbd_registration_mutex, is initialized only for the
top-level pnfs_block_dev in this patch. The mutexes for child pnfs_block_dev
instances allocated by bl_parse_concat() and bl_parse_stripe() are not
initialized.
Should we initialize these child mutexes as well, in case we later want
to support concatenated and striped SCSI devices for pNFS?
Thanks,
-Dai
* Re: [PATCH v3 1/1] pNFS: Serialize SCSI PR registration to avoid reservation conflicts
From: Christoph Hellwig @ 2026-03-09 15:34 UTC
To: Dai Ngo; +Cc: Christoph Hellwig, linux-nfs, Trond Myklebust, anna
On Fri, Mar 06, 2026 at 10:46:00AM -0800, Dai Ngo wrote:
> Christoph,
>
> The new mutex, pbd_registration_mutex, is initialized only for the
> top-level pnfs_block_dev in this patch. The mutexes for child pnfs_block_dev
> instances allocated by bl_parse_concat() and bl_parse_stripe() are not
> initialized.
>
> Should we initialize these child mutexes as well, in case we later want
> to support concatenated and striped SCSI devices for pNFS?
I don't think we need to, as the registration would happen on the
actual SCSI devices, not the meta devices.
* Re: [PATCH v3 1/1] pNFS: Serialize SCSI PR registration to avoid reservation conflicts
From: Dai Ngo @ 2026-03-09 16:38 UTC
To: Christoph Hellwig; +Cc: linux-nfs, Trond Myklebust, anna
On 3/9/26 8:34 AM, Christoph Hellwig wrote:
> I don't think we need to, as the registration would happen on the
> actual SCSI devices, not the meta devices.
ok, thanks!
-Dai