From: Nilay Shroff <nilay@linux.ibm.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Keith Busch <kbusch@kernel.org>,
axboe@fb.com, linux-block@vger.kernel.org,
linux-nvme@lists.infradead.org, Gregory Joyce <gjoyce@ibm.com>
Subject: [Bug Report] nvme-cli fails re-formatting NVMe namespace
Date: Fri, 15 Mar 2024 20:01:33 +0530
Message-ID: <7a3b35dd-7365-4427-95a0-929b28c64e73@linux.ibm.com>
Hi,
We found that the "nvme format ..." command fails to re-format an NVMe disk with the block size set to 512.
Notes and observations:
======================
This is observed on the latest Linus kernel tree. It worked correctly on kernel v6.8.
Test details:
=============
At system boot, or when the NVMe disk is hot-plugged, the block size is 4096. If we later try to
format it with a block size of 512 (lbaf=2), the command fails. Interestingly, if we start with a
block size of 512 and later format it with a block size of 4096 (lbaf=0), it does not fail.
Please note that CONFIG_NVME_MULTIPATH is enabled.
Please find below further details:
# lspci
0018:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            S6EUNA0R500358       1.6TB NVMe Gen4 U.2 SSD                  0x1        1.60 TB / 1.60 TB          512 B + 0 B      REV.SN49
# nvme id-ns /dev/nvme0n1 -H
NVME Identify Namespace 1:
nsze : 0xba4d4ab0
ncap : 0xba4d4ab0
nuse : 0xba4d4ab0
<snip>
<snip>
nlbaf : 4
flbas : 0
[6:5] : 0 Most significant 2 bits of Current LBA Format Selected
[4:4] : 0 Metadata Transferred in Separate Contiguous Buffer
[3:0] : 0 Least significant 4 bits of Current LBA Format Selected
<snip>
<snip>
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)
LBA Format 1 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
LBA Format 2 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better
LBA Format 3 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
LBA Format 4 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
# lsblk -t /dev/nvme0n1
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
nvme0n1 0 4096 0 4096 4096 0 128 0B
^^^ ^^^
!!!! FAILING TO FORMAT with 512 bytes of block size !!!!
# nvme format /dev/nvme0n1 --lbaf=2 --pil=0 --ms=0 --pi=0 -f
Success formatting namespace:1
failed to set block size to 512
^^^
# lsblk -t /dev/nvme0n1
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
nvme0n1 0 4096 0 4096 4096 0 128 0B
^^^ ^^^
# cat /sys/block/nvme0n1/queue/logical_block_size
4096
# cat /sys/block/nvme0n1/queue/physical_block_size
4096
# cat /sys/block/nvme0c0n1/queue/logical_block_size
512
# cat /sys/block/nvme0c0n1/queue/physical_block_size
512
# nvme id-ns /dev/nvme0n1 -H
NVME Identify Namespace 1:
nsze : 0xba4d4ab0
ncap : 0xba4d4ab0
nuse : 0xba4d4ab0
<snip>
<snip>
nlbaf : 4
flbas : 0x2
[6:5] : 0 Most significant 2 bits of Current LBA Format Selected
[4:4] : 0 Metadata Transferred in Separate Contiguous Buffer
[3:0] : 0x2 Least significant 4 bits of Current LBA Format Selected
<snip>
<snip>
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format 1 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
LBA Format 2 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better (in use)
LBA Format 3 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
LBA Format 4 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
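Note: As the output above shows, the namespace is indeed formatted with lbaf 2 (block size 512), and
the per-path device (nvme0c0n1) reports 512. However, the block queue limits of the multipath
device (nvme0n1) are not updated correctly and still report 4096.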
Git bisect:
==========
Git bisect identifies the following commit as the first bad commit:
8f03cfa117e06bd2d3ba7ed8bba70a3dda310cae is the first bad commit
commit 8f03cfa117e06bd2d3ba7ed8bba70a3dda310cae
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Mar 4 07:04:51 2024 -0700

    nvme: don't use nvme_update_disk_info for the multipath disk

    Currently nvme_update_ns_info_block calls nvme_update_disk_info both for
    the namespace attached disk, and the multipath one (if it exists). This
    is very different from how other stacking drivers work, and leads to
    a lot of complexity.

    Switch to setting the disk capacity and initializing the integrity
    profile, and let blk_stack_limits which already is called just below
    deal with updating the other limits.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <kbusch@kernel.org>

 drivers/nvme/host/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

The above commit is part of the new atomic queue limit updates patch series. For an
NVMe device with multipath enabled, we now rely on blk_stack_limits() to update the
queue limits of the stacked device. When updating the logical/physical block size
limits of the top (nvme%dn%d) device, blk_stack_limits() takes the max of the top
and bottom limits:
    t->logical_block_size = max(t->logical_block_size,
                                b->logical_block_size);
    t->physical_block_size = max(t->physical_block_size,
                                 b->physical_block_size);
When we format the nvme disk with a block size of 512, t->logical_block_size is still
4096 (the initial block size), while b->logical_block_size is already 512 (the block
size of the bottom device is updated first in nvme_update_ns_info_block()). Since
max(4096, 512) is 4096, the top device keeps its old block size. In the opposite
direction (512 to 4096), max() picks the new, larger value, which is why that case works.
I think we may want to update the queue limits of both the top and bottom devices in
nvme_update_ns_info_block(). Or is there some other way to handle this?
Let me know if you need any further information.
Thanks,
--Nilay