From: Nilay Shroff <nilay@linux.ibm.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Keith Busch <kbusch@kernel.org>,
axboe@fb.com, linux-block@vger.kernel.org,
linux-nvme@lists.infradead.org, Gregory Joyce <gjoyce@ibm.com>
Subject: Re: [Bug Report] nvme-cli fails re-formatting NVMe namespace
Date: Wed, 20 Mar 2024 11:23:27 +0530
Message-ID: <239228ec-6c8d-432c-905d-b477014deee3@linux.ibm.com>
In-Reply-To: <ZfpHvyjT6kbQKrPF@infradead.org>
On 3/20/24 07:49, Christoph Hellwig wrote:
> Can you try this patch instead?
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 00864a63447099..4bac54d4e0015b 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2204,6 +2204,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
>  	}
>
>  	if (!ret && nvme_ns_head_multipath(ns->head)) {
> +		struct queue_limits *ns_lim = &ns->disk->queue->limits;
>  		struct queue_limits lim;
>
>  		blk_mq_freeze_queue(ns->head->disk->queue);
> @@ -2215,7 +2216,26 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
>  		set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
>  		nvme_mpath_revalidate_paths(ns);
>
> +		/*
> +		 * queue_limits mixes values that are the hardware limitations
> +		 * for bio splitting with what is the device configuration.
> +		 *
> +		 * For NVMe the device configuration can change after e.g. a
> +		 * Format command, and we really want to pick up the new format
> +		 * value here.  But we must still stack the queue limits to the
> +		 * least common denominator for multipathing to split the bios
> +		 * properly.
> +		 *
> +		 * To work around this, we explicitly set the device
> +		 * configuration to those that we just queried, but only stack
> +		 * the splitting limits in to make sure we still obey possibly
> +		 * lower limitations of other controllers.
> +		 */
>  		lim = queue_limits_start_update(ns->head->disk->queue);
> +		lim.logical_block_size = ns_lim->logical_block_size;
> +		lim.physical_block_size = ns_lim->physical_block_size;
> +		lim.io_min = ns_lim->io_min;
> +		lim.io_opt = ns_lim->io_opt;
>  		queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
>  					ns->head->disk->disk_name);
>  		ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
>
I have just tested the above patch and it works as expected. With the patch applied,
I don't see any issue formatting the NVMe disk with a block size of 512. Looks good to me.
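For anyone following along, here is a rough userspace sketch of what the comment in the patch describes. It is not the kernel code: struct limits, stack_limits() and update_head() are made-up names, the numbers in the usage note are illustrative, and the sketch assumes that stacking keeps the larger block size and the smaller transfer limit (io_opt is left out of stacking for brevity).

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for the kernel's struct queue_limits. */
struct limits {
	unsigned int logical_block_size;
	unsigned int physical_block_size;
	unsigned int io_min;
	unsigned int io_opt;
	unsigned int max_sectors;
};

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Assumed stacking rule for this sketch: block sizes can only grow,
 * the splitting limit can only shrink.
 */
static void stack_limits(struct limits *top, const struct limits *bottom)
{
	top->logical_block_size = max_u(top->logical_block_size,
					bottom->logical_block_size);
	top->physical_block_size = max_u(top->physical_block_size,
					 bottom->physical_block_size);
	top->io_min = max_u(top->io_min, bottom->io_min);
	top->max_sectors = min_u(top->max_sectors, bottom->max_sectors);
}

/*
 * What the patch comment describes: first take the device configuration
 * that was just queried from the path, then stack so that the splitting
 * limit still obeys the lowest value.
 */
static void update_head(struct limits *head, const struct limits *ns)
{
	head->logical_block_size = ns->logical_block_size;
	head->physical_block_size = ns->physical_block_size;
	head->io_min = ns->io_min;
	head->io_opt = ns->io_opt;
	stack_limits(head, ns);
}
```

With a head that still reports 4096 and a path that now reports 512, stack_limits() alone leaves the head at 4096, while update_head() brings it down to 512 and still keeps the smaller max_sectors.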
Thanks,
--Nilay
PS: For reference, the test results obtained with the above patch are below.
--------------------------------------------------------------------------------
# lspci
0018:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            S6EUNA0R500358       1.6TB NVMe Gen4 U.2 SSD                  0x1        1.60 TB / 1.60 TB          4 KiB + 0 B      REV.SN49
# nvme id-ns /dev/nvme0n1 -H
NVME Identify Namespace 1:
nsze : 0xba4d4ab0
ncap : 0xba4d4ab0
nuse : 0xba4d4ab0
nsfeat : 0
[4:4] : 0 NPWG, NPWA, NPDG, NPDA, and NOWS are Not Supported
[3:3] : 0 NGUID and EUI64 fields if non-zero, Reused
[2:2] : 0 Deallocated or Unwritten Logical Block error Not Supported
[1:1] : 0 Namespace uses AWUN, AWUPF, and ACWU
[0:0] : 0 Thin Provisioning Not Supported
<snip>
<snip>
nlbaf : 4
flbas : 0
[6:5] : 0 Most significant 2 bits of Current LBA Format Selected
[4:4] : 0 Metadata Transferred in Separate Contiguous Buffer
[3:0] : 0 Least significant 4 bits of Current LBA Format Selected
<snip>
<snip>
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)
LBA Format 1 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
LBA Format 2 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better
LBA Format 3 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
LBA Format 4 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
# lsblk -t /dev/nvme0n1
NAME    ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
nvme0n1         0   4096      0    4096    4096    0               128    0B
                                    ^^^     ^^^
<< The nvme disk has block size of 4096; now format it with block size of 512
# nvme format /dev/nvme0n1 --lbaf=2 --pil=0 --ms=0 --pi=0 -f
Success formatting namespace:1
>> Success formatting; no error seen
# lsblk -t /dev/nvme0n1
NAME    ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
nvme0n1         0    512      0     512     512    0               128    0B
                                    ^^^     ^^^
# cat /sys/block/nvme0n1/queue/logical_block_size
512
# cat /sys/block/nvme0n1/queue/physical_block_size
512
# cat /sys/block/nvme0n1/queue/optimal_io_size
0
# cat /sys/block/nvme0n1/queue/minimum_io_size
512
# cat /sys/block/nvme0c0n1/queue/logical_block_size
512
# cat /sys/block/nvme0c0n1/queue/physical_block_size
512
# cat /sys/block/nvme0c0n1/queue/optimal_io_size
0
# cat /sys/block/nvme0c0n1/queue/minimum_io_size
512
# nvme id-ns /dev/nvme0n1 -H
NVME Identify Namespace 1:
nsze : 0xba4d4ab0
ncap : 0xba4d4ab0
nuse : 0xba4d4ab0
nsfeat : 0
[4:4] : 0 NPWG, NPWA, NPDG, NPDA, and NOWS are Not Supported
[3:3] : 0 NGUID and EUI64 fields if non-zero, Reused
[2:2] : 0 Deallocated or Unwritten Logical Block error Not Supported
[1:1] : 0 Namespace uses AWUN, AWUPF, and ACWU
[0:0] : 0 Thin Provisioning Not Supported
<snip>
<snip>
nlbaf : 4
flbas : 0x2
[6:5] : 0 Most significant 2 bits of Current LBA Format Selected
[4:4] : 0 Metadata Transferred in Separate Contiguous Buffer
[3:0] : 0x2 Least significant 4 bits of Current LBA Format Selected
<snip>
<snip>
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format 1 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
LBA Format 2 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better (in use)
LBA Format 3 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
LBA Format 4 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
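As a side note on reading the flbas lines above: the id-ns output labels bits [3:0] as the least significant 4 bits and bits [6:5] as the most significant 2 bits of the selected LBA format. A small sketch of that decoding follows; flbas_to_lbaf_index() is a made-up helper name, not an nvme-cli function.

```c
#include <stdint.h>

/*
 * Combine the two flbas bit fields shown in the id-ns output into one
 * LBA format index: bits [3:0] are the low 4 bits, bits [6:5] the high 2.
 */
static unsigned int flbas_to_lbaf_index(uint8_t flbas)
{
	unsigned int low = flbas & 0x0f;
	unsigned int high = (flbas >> 5) & 0x03;

	return (high << 4) | low;
}
```

For the two values in this test, flbas 0 selects LBA Format 0 (4096 bytes) and flbas 0x2 selects LBA Format 2 (512 bytes), which matches the "(in use)" markers.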