* Saw I/O errors while delete/create/attach a namespace on nvme device.
@ 2023-11-07  4:10 Wen Xiong
  2023-11-07  4:36 ` Chaitanya Kulkarni
  2023-11-07  8:56 ` Christoph Hellwig
  0 siblings, 2 replies; 10+ messages in thread
From: Wen Xiong @ 2023-11-07  4:10 UTC
To: linux-nvme; +Cc: Wenxiong

Hi All,

I am working on new nvme device and found this: nguid is changed while
delete/create/attach a namespace, we saw some error messages in linux log.

Should we see these errors messages since recreating namespaces or are there
some issues in error path?

# dmesg
[10524.330156] nvme nvme0: rescanning namespaces.
[10524.338810] nvme nvme0: identifiers changed for nsid 1
[10524.343448] block nvme0n1: no usable path - requeuing I/O
[10524.480468] block nvme0n1: no available path - failing I/O
[10524.480497] block nvme0n1: no available path - failing I/O
[10524.480501] Buffer I/O error on dev nvme0n1, logical block 781402912, async page read
[10524.480508] block nvme0n1: no available path - failing I/O
[10524.480510] Buffer I/O error on dev nvme0n1, logical block 781402913, async page read
[10524.480515] block nvme0n1: no available path - failing I/O
[10524.480517] Buffer I/O error on dev nvme0n1, logical block 781402914, async page read
[10524.480524] block nvme0n1: no available path - failing I/O
[10524.480525] Buffer I/O error on dev nvme0n1, logical block 781402915, async page read
[10524.480528] block nvme0n1: no available path - failing I/O
[10524.480529] Buffer I/O error on dev nvme0n1, logical block 781402916, async page read
[10524.480531] block nvme0n1: no available path - failing I/O
[10524.480535] Buffer I/O error on dev nvme0n1, logical block 781402917, async page read
[10524.480539] block nvme0n1: no available path - failing I/O
[10524.480541] Buffer I/O error on dev nvme0n1, logical block 781402918, async page read
[10524.480546] block nvme0n1: no available path - failing I/O
[10524.480548] Buffer I/O error on dev nvme0n1, logical block 781402919, async page read

Thanks for your help
Wendy
* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07  4:10 Saw I/O errors while delete/create/attach a namespace on nvme device Wen Xiong
@ 2023-11-07  4:36 ` Chaitanya Kulkarni
  2023-11-07  8:56 ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2023-11-07  4:36 UTC
To: Wen Xiong; +Cc: linux-nvme@lists.infradead.org

On 11/6/23 20:10, Wen Xiong wrote:
> Hi All,
>
> I am working on new nvme device and found this: nguid is changed while
> delete/create/attach a namespace, we saw some error messages in linux log.

Please provide the detailed steps, from the start, that led to this
behavior. Also please provide the git repo, branch, and branch head that
you are using.

-ck
* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07  4:10 Saw I/O errors while delete/create/attach a namespace on nvme device Wen Xiong
  2023-11-07  4:36 ` Chaitanya Kulkarni
@ 2023-11-07  8:56 ` Christoph Hellwig
  2023-11-07 10:25   ` Chaitanya Kulkarni
  2023-11-07 13:26   ` Wen Xiong
  1 sibling, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2023-11-07  8:56 UTC
To: Wen Xiong; +Cc: linux-nvme, Wenxiong

On Mon, Nov 06, 2023 at 10:10:33PM -0600, Wen Xiong wrote:
> Hi All,
>
> I am working on new nvme device and found this: nguid is changed while
> delete/create/attach a namespace, we saw some error messages in linux log.
>
> Should we see these errors messages since recreating namespaces or are there
> some issues in error path?

What exactly are you doing? To the Linux host code this looks like the
NGUID changed for an existing namespace. Are you deleting and
recreating nsid1 rapidly and the controller is assigning a different
nguid?
* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07  8:56 ` Christoph Hellwig
@ 2023-11-07 10:25   ` Chaitanya Kulkarni
  2023-11-07 14:31     ` Wen Xiong
  2023-11-07 13:26   ` Wen Xiong
  1 sibling, 1 reply; 10+ messages in thread
From: Chaitanya Kulkarni @ 2023-11-07 10:25 UTC
To: Christoph Hellwig, Wen Xiong; +Cc: linux-nvme@lists.infradead.org, Wenxiong

On 11/7/23 00:56, Christoph Hellwig wrote:
> On Mon, Nov 06, 2023 at 10:10:33PM -0600, Wen Xiong wrote:
>> Hi All,
>>
>> I am working on new nvme device and found this: nguid is changed while
>> delete/create/attach a namespace, we saw some error messages in linux log.
>>
>> Should we see these errors messages since recreating namespaces or are there
>> some issues in error path?
> What exactly are you doing? To the Linux host code this looks like the
> NGUID changed for an existing namespace. Are you deleting and
> recreating nsid1 rapidly and the controller is assigning a different
> nguid?
>
>

Exactly. "identifiers changed for nsid XXX" comes from nvme_validate_ns(),
and its only caller is nvme_scan_ns() when the ns is already present. So
the scan found the namespace and nvme_ns_ids_equal() returned false, hence
the second message. But as said earlier, we really need to see the exact
steps ...

To check exactly which id is problematic, something like [1] can be
used, totally untested ...

-ck

[1]

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 75a1b58a7a43..84651c922548 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1749,10 +1749,23 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns)
 
 static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
 {
-	return uuid_equal(&a->uuid, &b->uuid) &&
-		memcmp(&a->nguid, &b->nguid, sizeof(a->nguid)) == 0 &&
-		memcmp(&a->eui64, &b->eui64, sizeof(a->eui64)) == 0 &&
-		a->csi == b->csi;
+	if (uuid_equal(&a->uuid, &b->uuid) == false) {
+		pr_info("uuid mismatch\n");
+		return false;
+	}
+	if (memcmp(&a->nguid, &b->nguid, sizeof(a->nguid)) != 0) {
+		pr_info("nguid mismatch\n");
+		return false;
+	}
+	if (memcmp(&a->eui64, &b->eui64, sizeof(a->eui64)) != 0) {
+		pr_info("eui64 mismatch\n");
+		return false;
+	}
+	if (a->csi != b->csi) {
+		pr_info("csi mismatch\n");
+		return false;
+	}
+	return true;
 }
 
 static int nvme_init_ms(struct nvme_ns *ns, struct nvme_id_ns *id)
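Editor's sketch (not part of the thread): the idea behind the debugging patch in the message above can be tried outside the kernel. The block below is a minimal userspace C illustration, assuming a simplified `struct ns_ids` that only loosely mirrors the kernel's `struct nvme_ns_ids`. The names `ns_ids`, `ns_id_diff`, and `ns_ids_diff()` are hypothetical and do not exist in the driver. Unlike the patch, which returns at the first mismatch, this variant returns a bitmask so that every differing identifier is reported in one pass.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct nvme_ns_ids (assumed layout). */
struct ns_ids {
	uint8_t eui64[8];
	uint8_t nguid[16];
	uint8_t uuid[16];
	uint8_t csi;
};

/* One bit per identifier that can differ between two scans. */
enum ns_id_diff {
	NS_DIFF_NONE  = 0,
	NS_DIFF_UUID  = 1 << 0,
	NS_DIFF_NGUID = 1 << 1,
	NS_DIFF_EUI64 = 1 << 2,
	NS_DIFF_CSI   = 1 << 3,
};

/* Return a mask of all identifiers that differ between a and b. */
unsigned int ns_ids_diff(const struct ns_ids *a, const struct ns_ids *b)
{
	unsigned int diff = NS_DIFF_NONE;

	if (memcmp(a->uuid, b->uuid, sizeof(a->uuid)) != 0)
		diff |= NS_DIFF_UUID;
	if (memcmp(a->nguid, b->nguid, sizeof(a->nguid)) != 0)
		diff |= NS_DIFF_NGUID;
	if (memcmp(a->eui64, b->eui64, sizeof(a->eui64)) != 0)
		diff |= NS_DIFF_EUI64;
	if (a->csi != b->csi)
		diff |= NS_DIFF_CSI;

	return diff;
}
```

A caller would treat a zero result as "identifiers equal" and otherwise log one message per set bit.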
* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07 10:25   ` Chaitanya Kulkarni
@ 2023-11-07 14:31     ` Wen Xiong
  2023-11-07 15:18       ` Keith Busch
  0 siblings, 1 reply; 10+ messages in thread
From: Wen Xiong @ 2023-11-07 14:31 UTC
To: Chaitanya Kulkarni; +Cc: Christoph Hellwig, linux-nvme, Wenxiong

> To check exactly which id is problematic something like in [1] can be
> used, totally untested ...
>

Steps:
# nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0x81
# nvme delete-ns /dev/nvme0 --namespace-id=1
# nvme create-ns /dev/nvme0 --nsze=562805846 --ncap=562805846 --flbas=0 -dps=0 -nmic=1
# nvme attach-ns /dev/nvme0 -n 1 --controller=0x81

Below is the Linux log with your patch:

[  149.570987] nvme nvme0: rescanning namespaces.
[  149.578714] nguid mismatch
[  149.578719] nvme nvme0: identifiers changed for nsid 1
[  149.582291] block nvme0n1: no usable path - requeuing I/O
[  149.722140] block nvme0n1: no available path - failing I/O
[  149.722157] block nvme0n1: no available path - failing I/O
[  149.722165] Buffer I/O error on dev nvme0n1, logical block 281402912, async page read
[  149.722171] block nvme0n1: no available path - failing I/O
[  149.722175] Buffer I/O error on dev nvme0n1, logical block 281402913, async page read
[  149.722181] block nvme0n1: no available path - failing I/O
[  149.722185] Buffer I/O error on dev nvme0n1, logical block 281402914, async page read
[  149.722191] block nvme0n1: no available path - failing I/O
[  149.722195] Buffer I/O error on dev nvme0n1, logical block 281402915, async page read
[  149.722203] block nvme0n1: no available path - failing I/O
[  149.722208] Buffer I/O error on dev nvme0n1, logical block 281402916, async page read
[  149.722217] block nvme0n1: no available path - failing I/O
[  149.722226] Buffer I/O error on dev nvme0n1, logical block 281402917, async page read
[  149.722231] block nvme0n1: no available path - failing I/O
[  149.722233] Buffer I/O error on dev nvme0n1, logical block 281402918, async page read
[  149.722237] block nvme0n1: no available path - failing I/O
[  149.722239] Buffer I/O error on dev nvme0n1, logical block 281402919, async page read
[root@ltcrain119-lp4 ~]#

Below are the nguid values before and after:

# nvme id-ns /dev/nvme0n1|grep nguid
nguid   : 37444630577000630025384700000245
# nvme id-ns /dev/nvme0n1|grep nguid
nguid   : 37444630577000630025384700000246

Thanks a lot!
Wendy
* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07 14:31     ` Wen Xiong
@ 2023-11-07 15:18       ` Keith Busch
  2023-11-07 15:53         ` Wen Xiong
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Keith Busch @ 2023-11-07 15:18 UTC
To: Wen Xiong; +Cc: Chaitanya Kulkarni, Christoph Hellwig, linux-nvme, Wenxiong

On Tue, Nov 07, 2023 at 08:31:37AM -0600, Wen Xiong wrote:
>
> > To check exactly which id is problematic something like in [1] can be
> > used, totally untested ...
> >
> Steps:
> # nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0x81
> # nvme delete-ns /dev/nvme0 --namespace-id=1
> # nvme create-ns /dev/nvme0 --nsze=562805846 --ncap=562805846 --flbas=0
> -dps=0 -nmic=1
> # nvme attach-ns /dev/nvme0 -n 1 --controller=0x81
>
> Below is linux log with your patch:
>
> [ 149.570987] nvme nvme0: rescanning namespaces.

Are you running these commands in quick succession? There should be a
"rescanning namespaces" message right after the 'detach-ns' command, and
before the subsequent 'attach-ns' command. It looks here that the rescan
didn't run until after the 'attach-ns' occurred. Instead of tearing down
the original, the driver just sees that the namespace it previously knew
about has changed unexpectedly; the processing for the namespace removal
didn't happen prior to the attach-ns command.

> [ 149.578714] nguid mismatch
> [ 149.578719] nvme nvme0: identifiers changed for nsid 1
> [ 149.582291] block nvme0n1: no usable path - requeuing I/O
> [ 149.722140] block nvme0n1: no available path - failing I/O
> [ 149.722157] block nvme0n1: no available path - failing I/O

If you drop all open references to /dev/nvme0n1, then the handle should
get deleted, and a manual rescan after that should get your new
namespace visible.
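Editor's sketch (not part of the thread): the ordering problem described in the message above can be modeled with a toy state machine. Everything here is an assumption made for illustration: `ctrl_ns`, `host_ns`, `scan_result`, and `rescan()` are hypothetical names, only the NGUID is tracked, and the logic is a deliberate simplification rather than the kernel's actual scan code. The point it tries to show is that a rescan between detach and attach lets the host forget the old namespace, while a single rescan after both steps sees a known NSID with a new identifier.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NGUID_LEN 16

/* What the (toy) controller currently reports for one NSID. */
struct ctrl_ns {
	bool attached;
	uint8_t nguid[NGUID_LEN];
};

/* What the (toy) host remembers from its previous scan of that NSID. */
struct host_ns {
	bool known;
	uint8_t nguid[NGUID_LEN];
};

enum scan_result {
	SCAN_NOTHING,      /* NSID absent on both sides */
	SCAN_ADDED,        /* host learned a new namespace */
	SCAN_REMOVED,      /* host tore down a namespace that went away */
	SCAN_UNCHANGED,    /* same namespace as last time */
	SCAN_IDS_CHANGED,  /* known NSID now reports a different NGUID */
};

/* Compare controller state with the host's cached state and update the host. */
enum scan_result rescan(struct host_ns *host, const struct ctrl_ns *ctrl)
{
	if (!ctrl->attached) {
		if (!host->known)
			return SCAN_NOTHING;
		host->known = false;
		return SCAN_REMOVED;
	}
	if (!host->known) {
		host->known = true;
		memcpy(host->nguid, ctrl->nguid, NGUID_LEN);
		return SCAN_ADDED;
	}
	if (memcmp(host->nguid, ctrl->nguid, NGUID_LEN) != 0)
		return SCAN_IDS_CHANGED; /* stale state is kept; caller must recover */
	return SCAN_UNCHANGED;
}
```

In this model, detach followed by `rescan()` yields `SCAN_REMOVED` and the later attach yields `SCAN_ADDED`; skipping the intermediate rescan yields `SCAN_IDS_CHANGED`, which corresponds to the "identifiers changed for nsid 1" message in the log.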
* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07 15:18       ` Keith Busch
@ 2023-11-07 15:53         ` Wen Xiong
  2023-11-07 19:22         ` Wen Xiong
  2023-11-08  7:15         ` Christoph Hellwig
  2 siblings, 0 replies; 10+ messages in thread
From: Wen Xiong @ 2023-11-07 15:53 UTC
To: Keith Busch; +Cc: Chaitanya Kulkarni, Christoph Hellwig, linux-nvme, Wenxiong

On 2023-11-07 09:18, Keith Busch wrote:

Hi Keith,

> "rescanning namespaces" message right after the 'detach-ns' command, and
> before subsequent 'attach-ns' command. It looks here that the rescan
> didn't run until after the 'attach-ns' occurred. Instead of tearing down
> the original, the driver just sees the namespace it previously knew
> about has changed unexpectedly; the processing for the namespace removal
> didn't happen prior to the attach-ns command.

Re-did:

# nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0x81
detach-ns: Success, nsid:1
# dmesg
[ 4804.431303] nvme nvme0: rescanning namespaces

# nvme delete-ns /dev/nvme0 --namespace-id=1
delete-ns: Success, deleted nsid:1
# dmesg
[ 4804.431303] nvme nvme0: rescanning namespaces.

# nvme create-ns /dev/nvme0 --nsze=562805846 --ncap=562805846 --flbas=0 -dps=0 -nmic=1
create-ns: Success, created nsid:1
[root@ltcrain119-lp4 ~]# dmesg
[ 4804.431303] nvme nvme0: rescanning namespaces.

> If you drop all open references to /dev/nvme0n1, then the handle should
> get deleted, and a manual rescan after that should get your new
> namespace visible.

# nvme attach-ns /dev/nvme0 -n 1 --controller=0x81
attach-ns: Success, nsid:1
# dmesg
[ 4804.431303] nvme nvme0: rescanning namespaces.
[ 5219.493625] nvme nvme0: rescanning namespaces.
[ 5219.502136] nguid mismatch
[ 5219.502146] nvme nvme0: identifiers changed for nsid 1
[ 5219.506668] block nvme0n1: no usable path - requeuing I/O
[ 5219.662788] block nvme0n1: no available path - failing I/O
[ 5219.662824] block nvme0n1: no available path - failing I/O
[ 5219.662841] Buffer I/O error on dev nvme0n1, logical block 281402912, async page read
[ 5219.662859] block nvme0n1: no available path - failing I/O
[ 5219.662875] Buffer I/O error on dev nvme0n1, logical block 281402913, async page read
[ 5219.662887] block nvme0n1: no available path - failing I/O
[ 5219.662894] Buffer I/O error on dev nvme0n1, logical block 281402914, async page read
[ 5219.662913] block nvme0n1: no available path - failing I/O
[ 5219.662926] Buffer I/O error on dev nvme0n1, logical block 281402915, async page read
[ 5219.662956] block nvme0n1: no available path - failing I/O
[ 5219.662970] Buffer I/O error on dev nvme0n1, logical block 281402916, async page read
[ 5219.662985] bio_check_eod: 7 callbacks suppressed
[ 5219.662988] systemd-udevd: attempt to access beyond end of device nvme0n1: rw=0, sector=4502446672, nr_sectors = 16 limit=0
[ 5219.663022] Buffer I/O error on dev nvme0n1, logical block 281402917, async page read
[ 5219.663035] systemd-udevd: attempt to access beyond end of device nvme0n1: rw=0, sector=4502446688, nr_sectors = 16 limit=0
[ 5219.663052] Buffer I/O error on dev nvme0n1, logical block 281402918, async page read
[ 5219.663065] systemd-udevd: attempt to access beyond end of device nvme0n1: rw=0, sector=4502446704, nr_sectors = 16 limit=0
[ 5219.663099] Buffer I/O error on dev nvme0n1, logical block 281402919, async page read

# nvme ns-rescan /dev/nvme0n1
/dev/nvme0n1: No such file or directory
Usage: nvme ns-rescan <device> [OPTIONS]

Rescans the NVMe namespaces

# nvme ns-rescan /dev/nvme0n1
/dev/nvme0n1: No such file or directory
Usage: nvme ns-rescan <device> [OPTIONS]

Rescans the NVMe namespaces

# ls -l /dev/nvme*
crw-------. 1 root root 240, 0 Nov  7 08:26 /dev/nvme0
crw-------. 1 root root 240, 1 Nov  7 08:13 /dev/nvme1
brw-rw----. 1 root disk 259, 1 Nov  7 08:13 /dev/nvme1n1
[root@ltcrain119-lp4 ~]# nvme attach-ns /dev/nvme0 -n 1 --controller=0x81
NVMe status: Namespace Already Attached: The controller is already attached to the namespace specified(0x2118)
[root@ltcrain119-lp4 ~]# ls -l /dev/nvme*
crw-------. 1 root root 240, 0 Nov  7 08:26 /dev/nvme0
brw-rw----. 1 root disk 259, 3 Nov  7 09:48 /dev/nvme0n1
crw-------. 1 root root 240, 1 Nov  7 08:13 /dev/nvme1
brw-rw----. 1 root disk 259, 1 Nov  7 08:13 /dev/nvme1n1

After the attach-ns command, /dev/nvme0n1 does not show up in /dev/*.
Somehow I have to run attach-ns a second time; nvme ns-rescan works after
the second attach-ns. Is this a firmware issue on the nvme device?

Thanks,
Wendy
* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07 15:18       ` Keith Busch
  2023-11-07 15:53         ` Wen Xiong
@ 2023-11-07 19:22         ` Wen Xiong
  2023-11-08  7:15         ` Christoph Hellwig
  2 siblings, 0 replies; 10+ messages in thread
From: Wen Xiong @ 2023-11-07 19:22 UTC
To: Keith Busch; +Cc: Chaitanya Kulkarni, Christoph Hellwig, linux-nvme, Wenxiong

On 2023-11-07 09:18, Keith Busch wrote:

Hi Keith,

> If you drop all open references to /dev/nvme0n1, then the handle should
> get deleted, and a manual rescan after that should get your new
> namespace visible.

Should we call nvme_scan_ns(ctrl, nsid) again if the nguid/uuid/eui64/csi
changed when a namespace is recreated? Then customers would not need to
run attach-ns/ns-rescan manually from user space.

Thanks!
Wendy
* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07 15:18       ` Keith Busch
  2023-11-07 15:53         ` Wen Xiong
  2023-11-07 19:22         ` Wen Xiong
@ 2023-11-08  7:15         ` Christoph Hellwig
  2 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2023-11-08  7:15 UTC
To: Keith Busch
Cc: Wen Xiong, Chaitanya Kulkarni, Christoph Hellwig, linux-nvme, Wenxiong

On Tue, Nov 07, 2023 at 08:18:17AM -0700, Keith Busch wrote:
> Are you running these commands in quick succession? There should be a
> "rescanning namespaces" message right after the 'detach-ns' command, and
> before subsequent 'attach-ns' command. It looks here that the rescan
> didn't run until after the 'attach-ns' occurred. Instead of tearing down
> the original, the driver just sees the namespace it previously knew
> about has changed unexpectedly; the processing for the namespace removal
> didn't happen prior to the attach-ns command.

Yep.

>
> > [ 149.578714] nguid mismatch
> > [ 149.578719] nvme nvme0: identifiers changed for nsid 1
> > [ 149.582291] block nvme0n1: no usable path - requeuing I/O
> > [ 149.722140] block nvme0n1: no available path - failing I/O
> > [ 149.722157] block nvme0n1: no available path - failing I/O
>
> If you drop all open references to /dev/nvme0n1, then the handle should
> get deleted, and a manual rescan after that should get your new
> namespace visible.

I fear we still need to handle this somehow. For actual per-spec
namespace management we'll just need to snoop the namespace management
commands and update the ns_head membership. For out of band management
there's not much we can do as-is. A good addition to the spec would be
to add the concept of a namespace generation that is incremented every
time the nsid is reused.
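Editor's sketch (not part of the thread): the "namespace generation" idea proposed in the message above is described there as a possible future addition to the spec, so nothing below reflects an existing NVMe field or kernel interface. This is a toy C model under that assumption; `toy_ctrl`, `toy_create_ns()`, `toy_delete_ns()`, and `toy_ns_is_same()` are hypothetical names. It illustrates how a per-NSID counter that increments on every reuse would let a host tell "same namespace" from "same NSID, new namespace" without comparing identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NSID 8 /* arbitrary limit for the toy model */

/* Hypothetical controller-side state: one generation counter per NSID. */
struct toy_ctrl {
	bool in_use[MAX_NSID + 1];
	uint32_t generation[MAX_NSID + 1];
};

static bool nsid_valid(uint32_t nsid)
{
	return nsid != 0 && nsid <= MAX_NSID;
}

/* Create a namespace on nsid; the generation is bumped on every (re)use. */
bool toy_create_ns(struct toy_ctrl *c, uint32_t nsid)
{
	if (!nsid_valid(nsid) || c->in_use[nsid])
		return false;
	c->in_use[nsid] = true;
	c->generation[nsid]++;
	return true;
}

/* Delete the namespace on nsid; the counter is kept for the next reuse. */
bool toy_delete_ns(struct toy_ctrl *c, uint32_t nsid)
{
	if (!nsid_valid(nsid) || !c->in_use[nsid])
		return false;
	c->in_use[nsid] = false;
	return true;
}

/* Host-side check: is the namespace the one the host cached earlier? */
bool toy_ns_is_same(const struct toy_ctrl *c, uint32_t nsid, uint32_t cached_gen)
{
	return nsid_valid(nsid) && c->in_use[nsid] &&
	       c->generation[nsid] == cached_gen;
}
```

A host that cached the generation at scan time would see the check fail after a delete/create cycle on the same NSID, and could then tear down the old namespace and scan the new one.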
* Re: Saw I/O errors while delete/create/attach a namespace on nvme device.
  2023-11-07  8:56 ` Christoph Hellwig
  2023-11-07 10:25   ` Chaitanya Kulkarni
@ 2023-11-07 13:26   ` Wen Xiong
  1 sibling, 0 replies; 10+ messages in thread
From: Wen Xiong @ 2023-11-07 13:26 UTC
To: Christoph Hellwig; +Cc: linux-nvme, Wenxiong

> What exactly are you doing? To the Linux host code this looks like the
> NGUID changed for an existing namespace. Are you deleting and
> recreating nsid1 rapidly and the controller is assigning a different
> nguid?

Hi Christoph,

Good morning!

Yes, the controller assigned a different nguid to the namespace.

Steps:
Executed nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0x81
Executed nvme delete-ns /dev/nvme0 --namespace-id=1
Executed nvme create-ns /dev/nvme0 --nsze=1562805846 --ncap=1562805846 --flbas=0 -dps=0 -nmic=1
Executed nvme attach-ns /dev/nvme --namespace-id=1 --controllers=0x81

We saw the NGUID change and the I/O errors when executing the attach-ns
command.

Thanks a lot!
Wendy