public inbox for linux-nfs@vger.kernel.org
From: Dai Ngo <dai.ngo@oracle.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-nfs@vger.kernel.org, Trond Myklebust <trondmy@kernel.org>,
	anna@kernel.org
Subject: Re: [PATCH v3 1/1] pNFS: Serialize SCSI PR registration to avoid reservation conflicts
Date: Fri, 6 Mar 2026 10:46:00 -0800	[thread overview]
Message-ID: <fc0a566a-9cba-4fd0-99a0-d7fb7043a77b@oracle.com> (raw)
In-Reply-To: <20260306162927.3276695-1-dai.ngo@oracle.com>

Christoph,

The new mutex, pbd_registration_mutex, is initialized only for the
top-level pnfs_block_dev in this patch. The mutexes for child pnfs_block_dev
instances allocated by bl_parse_concat() and bl_parse_stripe() are not
initialized.

Should we initialize these child mutexes as well, in case we later want
to support concatenated and striped SCSI devices for pNFS?
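If we did go that route, the initialization might look like the following hypothetical sketch (not part of the patch; it assumes the child devices remain reachable through the parent's children[] array as today, and the exact spot in bl_parse_concat()/bl_parse_stripe() may differ):

```c
	/* Hypothetical sketch only: after the children of a concat/stripe
	 * volume have been allocated, initialize each child's mutex the
	 * same way bl_alloc_deviceid_node() does for the top-level device.
	 */
	for (i = 0; i < d->nr_children; i++)
		mutex_init(&d->children[i].pbd_registration_mutex);
```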

Thanks,
-Dai

On 3/6/26 8:29 AM, Dai Ngo wrote:
> With SCSI layouts, the NFS client must not submit I/O to the data server
> until the Persistent Reservation (PR) registration has completed.
>
> Currently, bl_register_scsi() sets PNFS_BDEV_REGISTERED before performing
> the PR operation. If multiple threads concurrently start I/O to the same
> SCSI device, the first thread sets the flag and begins registration,
> while other threads observe the flag, skip registration, and proceed to
> issue I/O. Those I/Os can hit RESERVATION CONFLICT, forcing them to
> fall back to the MDS.
>
> Protect the registration path with a mutex so only one thread performs
> PR registration at a time. Other threads wait for registration to finish
> and only then re-check PNFS_BDEV_REGISTERED, ensuring no I/O is issued
> until PR registration is complete.
>
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>   fs/nfs/blocklayout/blocklayout.h |  8 +++-----
>   fs/nfs/blocklayout/dev.c         | 15 ++++++++++++---
>   2 files changed, 15 insertions(+), 8 deletions(-)
>
> v2:
>      . remove fio test from commit message.
>      . rename pbd_mutex to pbd_registration_mutex and add a description
>        of its usage.
>      . move declaration of pbd_registration_mutex before the (*map)().
>      . protect unregistration op with pbd_registration_mutex.
> v3:
>      . replace PNFS_BDEV_REGISTERED flag in pnfs_block_dev with
>        blkdev_registered boolean.
>
> diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
> index 6da40ca19570..311b14334902 100644
> --- a/fs/nfs/blocklayout/blocklayout.h
> +++ b/fs/nfs/blocklayout/blocklayout.h
> @@ -111,17 +111,15 @@ struct pnfs_block_dev {
>   
>   	struct file			*bdev_file;
>   	u64				disk_offset;
> -	unsigned long			flags;
>   
> +	bool				blkdev_registered;
>   	u64				pr_key;
> +	/* Mutex to serialize SCSI PR register/unregister operations. */
> +	struct mutex			pbd_registration_mutex;
>   
>   	bool (*map)(struct pnfs_block_dev *dev, u64 offset,
>   			struct pnfs_block_dev_map *map);
> -};
>   
> -/* pnfs_block_dev flag bits */
> -enum {
> -	PNFS_BDEV_REGISTERED = 0,
>   };
>   
>   /* sector_t fields are all in 512-byte sectors */
> diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
> index cc6327d97a91..f1e77c4290ae 100644
> --- a/fs/nfs/blocklayout/dev.c
> +++ b/fs/nfs/blocklayout/dev.c
> @@ -33,10 +33,15 @@ static bool bl_register_scsi(struct pnfs_block_dev *dev)
>   	const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops;
>   	int status;
>   
> -	if (test_and_set_bit(PNFS_BDEV_REGISTERED, &dev->flags))
> +	mutex_lock(&dev->pbd_registration_mutex);
> +	if (dev->blkdev_registered) {
> +		mutex_unlock(&dev->pbd_registration_mutex);
>   		return true;
> +	}
> +	dev->blkdev_registered = true;
>   
>   	status = ops->pr_register(bdev, 0, dev->pr_key, true);
> +	mutex_unlock(&dev->pbd_registration_mutex);
>   	if (status) {
>   		trace_bl_pr_key_reg_err(bdev, dev->pr_key, status);
>   		return false;
> @@ -55,9 +60,12 @@ static void bl_unregister_dev(struct pnfs_block_dev *dev)
>   		return;
>   	}
>   
> -	if (dev->type == PNFS_BLOCK_VOLUME_SCSI &&
> -		test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags))
> +	mutex_lock(&dev->pbd_registration_mutex);
> +	if (dev->type == PNFS_BLOCK_VOLUME_SCSI && dev->blkdev_registered) {
> +		dev->blkdev_registered = false;
>   		bl_unregister_scsi(dev);
> +	}
> +	mutex_unlock(&dev->pbd_registration_mutex);
>   }
>   
>   bool bl_register_dev(struct pnfs_block_dev *dev)
> @@ -572,6 +580,7 @@ bl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
>   	top = kzalloc_obj(*top, gfp_mask);
>   	if (!top)
>   		goto out_free_volumes;
> +	mutex_init(&top->pbd_registration_mutex);
>   
>   	ret = bl_parse_deviceid(server, top, volumes, nr_volumes - 1, gfp_mask);
>   

Thread overview: 4+ messages
2026-03-06 16:29 [PATCH v3 1/1] pNFS: Serialize SCSI PR registration to avoid reservation conflicts Dai Ngo
2026-03-06 18:46 ` Dai Ngo [this message]
2026-03-09 15:34   ` Christoph Hellwig
2026-03-09 16:38     ` Dai Ngo
