* Re: [PATCH 0/2] nvme: handle partially unique NID value
       [not found] <20250414090959.2015-1-hare@kernel.org>
@ 2025-04-14 11:19 ` Christoph Hellwig
  2025-04-14 11:31   ` Hannes Reinecke
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2025-04-14 11:19 UTC (permalink / raw)
  To: hare; +Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg, wagi, linux-nvme

On Mon, Apr 14, 2025 at 11:09:57AM +0200, hare@kernel.org wrote:
> From: Hannes Reinecke <hare@kernel.org>
>
> Hi all,
>
> we have encountered a customer issue where the NID values for additional
> namespaces on the same device are not unique in all cases; the NGUID is,
> but the EUI64 is not. Problem is that prior to commit e2724cb9f0c4 these
> devices worked without a problem, but after that all NIDs are blanked out.
> This results in udev not creating persistent device links anymore and the
> system failing to boot.

These devices are so broken that we absolutely should not support them.
You've also received that feedback both in person from me, from Daniel
and from the nvme technical working group.  I'm not sure why you insist
on resending it instead of telling the OEM that specifically requested this
spec-violating behavior from their SSD vendor to stop doing those
broken things in the many months you have known of this gravely incorrect,
indefensible behavior.

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] nvme: handle partially unique NID value
  2025-04-14 11:19 ` [PATCH 0/2] nvme: handle partially unique NID value Christoph Hellwig
@ 2025-04-14 11:31   ` Hannes Reinecke
  2025-04-14 11:41     ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2025-04-14 11:31 UTC
  To: Christoph Hellwig, hare
  Cc: Keith Busch, Sagi Grimberg, wagi, linux-nvme,
      Ballard, Curtis C (HPE Storage), Javier Gonzalez

On 4/14/25 13:19, Christoph Hellwig wrote:
> On Mon, Apr 14, 2025 at 11:09:57AM +0200, hare@kernel.org wrote:
>> From: Hannes Reinecke <hare@kernel.org>
>>
>> Hi all,
>>
>> we have encountered a customer issue where the NID values for additional
>> namespaces on the same device are not unique in all cases; the NGUID is,
>> but the EUI64 is not. Problem is that prior to commit e2724cb9f0c4 these
>> devices worked without a problem, but after that all NIDs are blanked out.
>> This results in udev not creating persistent device links anymore and the
>> system failing to boot.
>
> These devices are so broken that we absolutely should not support them.
> You've also received that feedback both in person from me, from Daniel
> and from the nvme technical working group.  I'm not sure why you insist
> on resending it instead of telling the OEM that specifically requested this
> spec-violating behavior from their SSD vendor to stop doing those
> broken things in the many months you have known of this gravely incorrect,
> indefensible behavior.
>
Thank you for your kind words.

We have discussed this at LSF, and the involved parties (i.e.
Samsung as the vendor, HPE as the IHV, and us as the OS provider)
are happy with this approach.
And we have paying customers for which the cited patch caused a regression,
so ignoring it is not an option for us.

I hoped this patchset would be acceptable for upstream; as it is not,
we will have to include this patchset as a SUSE-specific modification.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 0/2] nvme: handle partially unique NID value
  2025-04-14 11:31   ` Hannes Reinecke
@ 2025-04-14 11:41     ` Christoph Hellwig
  2025-04-14 11:55       ` Hannes Reinecke
  2025-04-17 16:56       ` Ballard, Curtis C (HPE Storage)
  0 siblings, 2 replies; 13+ messages in thread
From: Christoph Hellwig @ 2025-04-14 11:41 UTC
  To: Hannes Reinecke
  Cc: Christoph Hellwig, hare, Keith Busch, Sagi Grimberg, wagi,
      linux-nvme, Ballard, Curtis C (HPE Storage), Javier Gonzalez

On Mon, Apr 14, 2025 at 01:31:29PM +0200, Hannes Reinecke wrote:
> We have discussed this at LSF, and the involved parties (i.e.
> Samsung as the vendor, HPE as the IHV, and us as the OS provider)
> are happy with this approach.
> And we have paying customers for which the cited patch caused a regression,
> so ignoring it is not an option for us.

Tell them to fix their broken systems instead of shifting this broken
crap upstream.  Really, we bend over backwards for consumer hardware
that doesn't know better.  We don't add crap for vendors that absolutely
should know better, participate in the working group, and only provide
expensive enterprise hardware, just because they pay you.  If you have
so little spine that you want to accommodate this intentionally broken
behavior, do it in your tree, but don't force the burden on others.
* Re: [PATCH 0/2] nvme: handle partially unique NID value
  2025-04-14 11:41     ` Christoph Hellwig
@ 2025-04-14 11:55       ` Hannes Reinecke
  2025-04-14 11:59         ` Christoph Hellwig
  2025-04-17 16:56       ` Ballard, Curtis C (HPE Storage)
  1 sibling, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2025-04-14 11:55 UTC
  To: Christoph Hellwig
  Cc: hare, Keith Busch, Sagi Grimberg, linux-nvme,
      Ballard, Curtis C (HPE Storage), Javier Gonzalez, Daniel Wagner

On 4/14/25 13:41, Christoph Hellwig wrote:
> On Mon, Apr 14, 2025 at 01:31:29PM +0200, Hannes Reinecke wrote:
>> We have discussed this at LSF, and the involved parties (i.e.
>> Samsung as the vendor, HPE as the IHV, and us as the OS provider)
>> are happy with this approach.
>> And we have paying customers for which the cited patch caused a regression,
>> so ignoring it is not an option for us.
>
> Tell them to fix their broken systems instead of shifting this broken
> crap upstream.  Really, we bend over backwards for consumer hardware
> that doesn't know better.  We don't add crap for vendors that absolutely
> should know better, participate in the working group, and only provide
> expensive enterprise hardware, just because they pay you.  If you have
> so little spine that you want to accommodate this intentionally broken
> behavior, do it in your tree, but don't force the burden on others.

A simple NACK would have been sufficient.

Cheers,

Hannes 'spineless' Reinecke
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 0/2] nvme: handle partially unique NID value
  2025-04-14 11:55       ` Hannes Reinecke
@ 2025-04-14 11:59         ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2025-04-14 11:59 UTC
  To: Hannes Reinecke
  Cc: Christoph Hellwig, hare, Keith Busch, Sagi Grimberg, linux-nvme,
      Ballard, Curtis C (HPE Storage), Javier Gonzalez, Daniel Wagner

On Mon, Apr 14, 2025 at 01:55:26PM +0200, Hannes Reinecke wrote:
> A simple NACK would have been sufficient.

No, it won't.  Your behavior here, where you keep pushing for something
really stupid after repeated NAKs, is infuriating.  As is the OEM's
behavior to even ask for this behavior to start with.
* RE: [PATCH 0/2] nvme: handle partially unique NID value
  2025-04-14 11:41     ` Christoph Hellwig
  2025-04-14 11:55       ` Hannes Reinecke
@ 2025-04-17 16:56       ` Ballard, Curtis C (HPE Storage)
       [not found]         ` <CGME20250502082359uscas1p1e2a9858dcc9200ab1d1d863c4495fc0a@uscas1p1.samsung.com>
  1 sibling, 1 reply; 13+ messages in thread
From: Ballard, Curtis C (HPE Storage) @ 2025-04-17 16:56 UTC
  To: Christoph Hellwig, Hannes Reinecke
  Cc: hare@kernel.org, Keith Busch, Sagi Grimberg, wagi@lst.de,
      linux-nvme@lists.infradead.org, Javier Gonzalez

Christoph,

There is no debate about whether the NID reporting behavior is incorrect
and has to be fixed. It definitely has to be fixed and is getting fixed
for new drives. That behavior was a defect, not a request, and I have
theories on how people that probably knew better missed realizing that.

Unfortunately the incorrect implementation was missed for quite a while
and there are drives in the field that have a correct NGUID and an
invalid EUI64 in some specific configurations. There is no simple fix for
the drives in the field. I've seen some reflector traffic that suggests
that similar behavior has been seen in other drives.

Since the NGUID is valid, and is the value used as the unique namespace
ID (when present), the issue didn't create problems in the environment
where the drives were being used until a uniqueness check was performed
on the EUI64. It is a very serious error that the EUI64 is not unique and
it is completely appropriate for that to be flagged.

Having a quirk of some kind that allows the drives to be used, when they
worked perfectly previously, seems like the right thing to do. A
discussion on how to appropriately flag this serious error seems to be in
order if the method proposed by Hannes isn't acceptable.

Curtis

-----Original Message-----
From: Christoph Hellwig <hch@lst.de>
Sent: Monday, April 14, 2025 5:41 AM
To: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>; hare@kernel.org; Keith Busch <kbusch@kernel.org>; Sagi Grimberg <sagi@grimberg.me>; wagi@lst.de; linux-nvme@lists.infradead.org; Ballard, Curtis C (HPE Storage) <curtis.ballard@hpe.com>; Javier Gonzalez <javier.gonz@samsung.com>
Subject: Re: [PATCH 0/2] nvme: handle partially unique NID value

On Mon, Apr 14, 2025 at 01:31:29PM +0200, Hannes Reinecke wrote:
> We have discussed this at LSF, and the involved parties (i.e.
> Samsung as the vendor, HPE as the IHV, and us as the OS provider)
> are happy with this approach.
> And we have paying customers for which the cited patch caused a regression,
> so ignoring it is not an option for us.

Tell them to fix their broken systems instead of shifting this broken
crap upstream.  Really, we bend over backwards for consumer hardware
that doesn't know better.  We don't add crap for vendors that absolutely
should know better, participate in the working group, and only provide
expensive enterprise hardware, just because they pay you.  If you have
so little spine that you want to accommodate this intentionally broken
behavior, do it in your tree, but don't force the burden on others.
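
The situation described in the message above (every namespace has a unique NGUID, but several share one EUI64) can be illustrated with a small, self-contained Python sketch. This is not the kernel's code and not the proposed patch; the `Namespace` type, the `duplicated_fields` helper, and the identifier values are all hypothetical and exist only to show what a per-field uniqueness check would report, assuming an all-zero identifier means "not reported".

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Namespace:
    """Hypothetical stand-in for the identifiers one namespace reports."""
    nsid: int
    nguid: bytes  # 16-byte NGUID
    eui64: bytes  # 8-byte EUI64


def duplicated_fields(namespaces):
    """Return the names of identifier fields whose non-zero values repeat."""
    dups = set()
    for field in ("nguid", "eui64"):
        values = [getattr(ns, field) for ns in namespaces]
        # Assumption: an all-zero identifier means "not reported",
        # so it is not counted as a clash.
        counts = Counter(v for v in values if any(v))
        if any(n > 1 for n in counts.values()):
            dups.add(field)
    return dups


# Made-up values: two namespaces with distinct NGUIDs but the same EUI64.
namespaces = [
    Namespace(1, bytes.fromhex("00112233445566778899aabbccddee01"),
              bytes.fromhex("0011223344556677")),
    Namespace(2, bytes.fromhex("00112233445566778899aabbccddee02"),
              bytes.fromhex("0011223344556677")),
]

print(sorted(duplicated_fields(namespaces)))  # prints ['eui64']
```

In this sketch only `eui64` is reported as clashing. Per the cover letter quoted earlier in the thread, the behavior since commit e2724cb9f0c4 is that all NIDs are blanked out in this situation, which is why the still-unique NGUID no longer yields persistent device links.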
[parent not found: <CGME20250502082359uscas1p1e2a9858dcc9200ab1d1d863c4495fc0a@uscas1p1.samsung.com>]
[parent not found: <eee8de0a44074dd3bfb0fc6ec425b647@samsung.com>]
* Re: [PATCH 0/2] nvme: handle partially unique NID value
       [not found]           ` <eee8de0a44074dd3bfb0fc6ec425b647@samsung.com>
@ 2025-05-02 10:25             ` Christoph Hellwig
       [not found]               ` <27a99b458f0144fba094726e4f470552@samsung.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2025-05-02 10:25 UTC
  To: Judy Brock
  Cc: Ballard, Curtis C (HPE Storage), Christoph Hellwig,
      Hannes Reinecke, hare@kernel.org, Keith Busch, Sagi Grimberg,
      wagi@lst.de, linux-nvme@lists.infradead.org, Javier Gonzalez

Judy,

stop it.  HP could have trivially asked Samsung for a firmware update
and gotten it in the time they used all their commercial channels to
fight actually having to fix their intentional stupidity.

If you are a supposedly legit enterprise storage vendor and ask your
SSD vendor for a non-standard, data-corrupting feature, you have to admit
your failure and fix it.  And I'm amazed how HP is trying to flex their
commercial muscle to get around having to admit their failure and
fix it, and I'm also really surprised how little spine you folks have
to play along with this.  This is very disappointing and does not make
you a trustworthy actor.
[parent not found: <27a99b458f0144fba094726e4f470552@samsung.com>]
* Re: [PATCH 0/2] nvme: handle partially unique NID value
       [not found]               ` <27a99b458f0144fba094726e4f470552@samsung.com>
@ 2025-05-03  3:46                 ` Keith Busch
  2025-05-05  9:51                   ` Javier Gonzalez
  0 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2025-05-03 3:46 UTC
  To: Judy Brock
  Cc: Christoph Hellwig, Ballard, Curtis C (HPE Storage),
      Hannes Reinecke, hare@kernel.org, Sagi Grimberg, wagi@lst.de,
      linux-nvme@lists.infradead.org, Javier Gonzalez

On Fri, May 02, 2025 at 11:26:47PM +0000, Judy Brock wrote:
> For example, both companies have "admitted failure" but you haven't
> heard it: the FW in question definitely has a defect. Neither company
> is holding it out as compliant. Both companies have indicated going
> forward, the defective behavior has been corrected.
>
> Not sure why you keep saying that neither company is willing to fix it.

I'm a little confused. If the conflicting behavior has been corrected,
why is this being discussed here? A device-side fix is surely the best
possible outcome for everyone here. Requiring a kernel upgrade to work
around undesirable firmware behavior is a bit unpleasant for end users
when you already have a solution that works with any nvme-capable OS.
* Re: [PATCH 0/2] nvme: handle partially unique NID value
  2025-05-03  3:46                 ` Keith Busch
@ 2025-05-05  9:51                   ` Javier Gonzalez
  2025-05-05 11:11                     ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Javier Gonzalez @ 2025-05-05 9:51 UTC
  To: Keith Busch
  Cc: Judy Brock, Christoph Hellwig, Ballard, Curtis C (HPE Storage),
      Hannes Reinecke, hare@kernel.org, Sagi Grimberg, wagi@lst.de,
      linux-nvme@lists.infradead.org

On 02.05.2025 21:46, Keith Busch wrote:
>On Fri, May 02, 2025 at 11:26:47PM +0000, Judy Brock wrote:
>> For example, both companies have "admitted failure" but you haven't
>> heard it: the FW in question definitely has a defect. Neither company
>> is holding it out as compliant. Both companies have indicated going
>> forward, the defective behavior has been corrected.
>>
>> Not sure why you keep saying that neither company is willing to fix it.
>
>I'm a little confused. If the conflicting behavior has been corrected,
>why is this being discussed here? A device-side fix is surely the best
>possible outcome for everyone here. Requiring a kernel upgrade to work
>around undesirable firmware behavior is a bit unpleasant for end users
>when you already have a solution that works with any nvme-capable OS.

Agree.

I think Hannes' approach to add dynamic quirks was the closest to an
upstreamable solution, as a general quirk for the PM177xx is not
acceptable. But I completely understand Christoph's NAK.

I think we should let HPE distros carry this quirk for drives where they
would not want to roll a FW update.
* Re: [PATCH 0/2] nvme: handle partially unique NID value
  2025-05-05  9:51                   ` Javier Gonzalez
@ 2025-05-05 11:11                     ` Christoph Hellwig
  2025-05-05 13:08                       ` Javier Gonzalez
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2025-05-05 11:11 UTC
  To: Javier Gonzalez
  Cc: Keith Busch, Judy Brock, Christoph Hellwig,
      Ballard, Curtis C (HPE Storage), Hannes Reinecke, hare@kernel.org,
      Sagi Grimberg, wagi@lst.de, linux-nvme@lists.infradead.org

On Mon, May 05, 2025 at 11:51:39AM +0200, Javier Gonzalez wrote:
> I think we should let HPE distros carry this quirk for drives where they
> would not want to roll a FW update.

Or just goddamn tell people to upgrade the broken firmware.  Without it
their data is at risk, so they'd better do it.

Also maybe this is a lesson to SSD vendors (and I really mean all of
them) that if they can't push back on broken "features" due to market
dynamics they should at least OEM brand the devices in the identify
data so that the blame gets deflected to the right party.
* Re: [PATCH 0/2] nvme: handle partially unique NID value
  2025-05-05 11:11                     ` Christoph Hellwig
@ 2025-05-05 13:08                       ` Javier Gonzalez
  2025-05-05 13:49                         ` Laurence Oberman
  0 siblings, 1 reply; 13+ messages in thread
From: Javier Gonzalez @ 2025-05-05 13:08 UTC
  To: Christoph Hellwig
  Cc: Keith Busch, Judy Brock, Ballard, Curtis C (HPE Storage),
      Hannes Reinecke, hare@kernel.org, Sagi Grimberg, wagi@lst.de,
      linux-nvme@lists.infradead.org

On 05.05.2025 13:11, Christoph Hellwig wrote:
>On Mon, May 05, 2025 at 11:51:39AM +0200, Javier Gonzalez wrote:
>> I think we should let HPE distros carry this quirk for drives where they
>> would not want to roll a FW update.
>
>Or just goddamn tell people to upgrade the broken firmware.  Without it
>their data is at risk, so they'd better do it.
>
>Also maybe this is a lesson to SSD vendors (and I really mean all of
>them) that if they can't push back on broken "features" due to market
>dynamics they should at least OEM brand the devices in the identify
>data so that the blame gets deflected to the right party.

Agree. The dynamics of how OEMs want to apply FW updates is up to them,
but there is no doubt this has been a mess. Hope we have learned a
lesson...
* Re: [PATCH 0/2] nvme: handle partially unique NID value
  2025-05-05 13:08                       ` Javier Gonzalez
@ 2025-05-05 13:49                         ` Laurence Oberman
  2025-05-06  7:07                           ` Javier Gonzalez
  0 siblings, 1 reply; 13+ messages in thread
From: Laurence Oberman @ 2025-05-05 13:49 UTC
  To: Javier Gonzalez, Christoph Hellwig
  Cc: Keith Busch, Judy Brock, Ballard, Curtis C (HPE Storage),
      Hannes Reinecke, hare@kernel.org, Sagi Grimberg, wagi@lst.de,
      linux-nvme@lists.infradead.org

On Mon, 2025-05-05 at 15:08 +0200, Javier Gonzalez wrote:
> On 05.05.2025 13:11, Christoph Hellwig wrote:
> > On Mon, May 05, 2025 at 11:51:39AM +0200, Javier Gonzalez wrote:
> > > I think we should let HPE distros carry this quirk for drives
> > > where they would not want to roll a FW update.
> >
> > Or just goddamn tell people to upgrade the broken firmware.  Without
> > it their data is at risk, so they'd better do it.
> >
> > Also maybe this is a lesson to SSD vendors (and I really mean all of
> > them) that if they can't push back on broken "features" due to market
> > dynamics they should at least OEM brand the devices in the identify
> > data so that the blame gets deflected to the right party.
>
> Agree. The dynamics of how OEMs want to apply FW updates is up to them,
> but there is no doubt this has been a mess. Hope we have learned a
> lesson...
>

Seems what I sent last week is the same issue.
For now we will fix this in a RHEL-only kernel until the vendor gets
F/W fixes out.
There are a lot of devices out in the wild already, I guess, that have
this issue.

Thanks
Laurence
* Re: [PATCH 0/2] nvme: handle partially unique NID value
  2025-05-05 13:49                         ` Laurence Oberman
@ 2025-05-06  7:07                           ` Javier Gonzalez
  0 siblings, 0 replies; 13+ messages in thread
From: Javier Gonzalez @ 2025-05-06 7:07 UTC
  To: Laurence Oberman
  Cc: Christoph Hellwig, Keith Busch, Judy Brock,
      Ballard, Curtis C (HPE Storage), Hannes Reinecke, hare@kernel.org,
      Sagi Grimberg, wagi@lst.de, linux-nvme@lists.infradead.org

On 05.05.2025 09:49, Laurence Oberman wrote:
>On Mon, 2025-05-05 at 15:08 +0200, Javier Gonzalez wrote:
>> On 05.05.2025 13:11, Christoph Hellwig wrote:
>> > On Mon, May 05, 2025 at 11:51:39AM +0200, Javier Gonzalez wrote:
>> > > I think we should let HPE distros carry this quirk for drives
>> > > where they would not want to roll a FW update.
>> >
>> > Or just goddamn tell people to upgrade the broken firmware.  Without
>> > it their data is at risk, so they'd better do it.
>> >
>> > Also maybe this is a lesson to SSD vendors (and I really mean all of
>> > them) that if they can't push back on broken "features" due to market
>> > dynamics they should at least OEM brand the devices in the identify
>> > data so that the blame gets deflected to the right party.
>>
>> Agree. The dynamics of how OEMs want to apply FW updates is up to them,
>> but there is no doubt this has been a mess. Hope we have learned a
>> lesson...
>>
>
>Seems what I sent last week is the same issue.
>For now we will fix this in a RHEL-only kernel until the vendor gets
>F/W fixes out.

This is great. Thanks for the support, Laurence!

Curtis,

With SUSE and Red Hat picking up this quirk, is it enough on your end?