linux-nvme.lists.infradead.org archive mirror
* [BUG report] kernel warnings with Samsung 970 EVO 2TB SSD
@ 2025-08-16 11:40 Diederik de Haas
  2025-08-16 13:20 ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Diederik de Haas @ 2025-08-16 11:40 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-kernel, Diederik de Haas

Hi,

I have a Samsung 970 EVO 2TB SSD and I see these kernel warnings:

root@nanopi-r5s:~# uname -a
Linux nanopi-r5s 6.16-arm64-cknow #1 SMP PREEMPT Debian 6.16-1 (2025-07-28) aarch64 GNU/Linux
root@nanopi-r5s:~# dmesg --level 0,1,2
root@nanopi-r5s:~# dmesg --level 3
root@nanopi-r5s:~# dmesg --level 4
[    2.410231] dw-apb-uart fe660000.serial: forbid DMA for kernel console
[    5.234812] gpio gpiochip0: Static allocation of GPIO base is deprecated, use dynamic allocation.
[    5.242112] gpio gpiochip1: Static allocation of GPIO base is deprecated, use dynamic allocation.
[    5.246222] gpio gpiochip2: Static allocation of GPIO base is deprecated, use dynamic allocation.
[    5.252811] gpio gpiochip3: Static allocation of GPIO base is deprecated, use dynamic allocation.
[    5.265791] gpio gpiochip4: Static allocation of GPIO base is deprecated, use dynamic allocation.
[    5.741901] r8169 0000:01:00.0: can't read MAC address, setting random one
[    5.806644] pci 0001:10:00.0: Primary bus is hard wired to 0
[    5.849952] r8169 0001:11:00.0: can't read MAC address, setting random one
[    6.017270] pci 0002:20:00.0: Primary bus is hard wired to 0
[    6.393688] nvme nvme0: missing or invalid SUBNQN field.
[   21.484306] nvme nvme0: using unchecked data buffer
root@nanopi-r5s:~# dmesg | grep nvme
[    6.386187] nvme nvme0: pci function 0002:21:00.0
[    6.386697] nvme 0002:21:00.0: enabling device (0000 -> 0002)
[    6.393688] nvme nvme0: missing or invalid SUBNQN field.
[    6.397901] nvme nvme0: D3 entry latency set to 8 seconds
[    6.428168] nvme nvme0: 4/0/0 default/read/poll queues
[    6.465173]  nvme0n1: p1
[   12.522314] systemd[1]: Starting modprobe@nvme_fabrics.service - Load Kernel Module nvme_fabrics...
[   12.973871] systemd[1]: modprobe@nvme_fabrics.service: Deactivated successfully.
[   12.977051] systemd[1]: Finished modprobe@nvme_fabrics.service - Load Kernel Module nvme_fabrics.
[   21.484306] nvme nvme0: using unchecked data buffer

Before I put this SSD into my FriendlyELEC NanoPi R5S (rk3568; arm64)
I had it in my main PC (AMD Ryzen 1800X; amd64) where I had these
warnings as well, so it seems directly connected to the drive, not the
device it's plugged into.

I wonder if something can be done to fix those warnings.
I'm not aware of these warnings causing actual problems, but I haven't
'really' used it thus far (mostly to store some media files), but I want
to use my NanoPi R5S as my server (with e.g. my git repos), so I want to
be extra sure my data won't be at risk. And I don't like ignoring kernel
warnings; I assume they're warnings for a reason.

Some more data about the drive:

root@nanopi-r5s:~# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            xxxxxxxxxxxxxxx      Samsung SSD 970 EVO Plus 2TB             0x1        534.51  GB /   2.00  TB    512   B +  0 B   2B2QEXM7

root@nanopi-r5s:~# nvme get-feature /dev/nvme0 -f 3
get-feature:0x03 (LBA Range Type): NVMe status: Invalid Namespace or Format: The namespace or the format of that namespace is invalid(0x200b)

I don't know if it would/can be risky to share the Serial Number, so I
blanked that out, but I can provide that if that would be helpful.

root@nanopi-r5s:~# lspci -v -s 0002:21:00.0
0002:21:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 (prog-if 02 [NVM Express])
        Subsystem: Samsung Electronics Co Ltd SSD 970 EVO/PRO
        Flags: bus master, fast devsel, latency 0, IRQ 75
        Memory at f0200000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, IntMsgNum 0
        Capabilities: [b0] MSI-X: Enable+ Count=33 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [158] Power Budgeting <?>
        Capabilities: [168] Secondary PCI Express
        Capabilities: [188] Latency Tolerance Reporting
        Capabilities: [190] L1 PM Substates
        Kernel driver in use: nvme
        Kernel modules: nvme

But I did not change the Device Serial Number from lspci.
AFAIK I have the latest firmware (checked with fwupd).

Happy to provide additional data, but as I don't know what would be
useful, I figured I'll leave it up to the experts to ask for it.

Cheers,
  Diederik


* Re: [BUG report] kernel warnings with Samsung 970 EVO 2TB SSD
  2025-08-16 11:40 [BUG report] kernel warnings with Samsung 970 EVO 2TB SSD Diederik de Haas
@ 2025-08-16 13:20 ` Keith Busch
  2025-08-16 14:11   ` Diederik de Haas
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2025-08-16 13:20 UTC (permalink / raw)
  To: Diederik de Haas; +Cc: linux-nvme, linux-kernel

On Sat, Aug 16, 2025 at 01:40:44PM +0200, Diederik de Haas wrote:
> Hi,
> 
> I have a Samsung 970 EVO 2TB SSD and I see these kernel warnings:

... 
 
> I wonder if something can be done to fix those warnings.

Are you talking about this message?

   nvme nvme0: missing or invalid SUBNQN field

You can't do anything about it, but I wouldn't worry about it either.

If you want to see what the driver is reacting to, you can check the
subnqn from command line:

  # nvme id-ctrl /dev/nvme0 | grep subnqn

It'll probably be all zeros. The field has been required by spec, but
the driver tolerates ones that don't implement it. It's just a message
that the device isn't spec compliant, but otherwise perfectly
operational.



* Re: [BUG report] kernel warnings with Samsung 970 EVO 2TB SSD
  2025-08-16 13:20 ` Keith Busch
@ 2025-08-16 14:11   ` Diederik de Haas
  2025-08-18 18:58     ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Diederik de Haas @ 2025-08-16 14:11 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme, linux-kernel, Diederik de Haas

Hi,

On Sat Aug 16, 2025 at 3:20 PM CEST, Keith Busch wrote:
> On Sat, Aug 16, 2025 at 01:40:44PM +0200, Diederik de Haas wrote:
>> I have a Samsung 970 EVO 2TB SSD and I see these kernel warnings:
>
> ... 
>  
>> I wonder if something can be done to fix those warnings.
>
> Are you talking about this message?
>
>    nvme nvme0: missing or invalid SUBNQN field
>
> You can't do anything about it, but I wouldn't worry about it either.

That's indeed one of them; good to know it's nothing to worry about.

> If you want to see what the driver is reacting to, you can check the
> subnqn from command line:
>
>   # nvme id-ctrl /dev/nvme0 | grep subnqn
>
> It'll probably be all zeros. The field has been required by spec, but
> the driver tolerates ones that don't implement it.

root@nanopi-r5s:~# nvme id-ctrl /dev/nvme0 | grep subnqn
subnqn    :

So it seems to be just empty?

> It's just a message that the device isn't spec compliant, but
> otherwise perfectly operational.

But still worthy of a warning (instead of info) msg?

The other kernel warning is this:

  nvme nvme0: using unchecked data buffer

The SUBNQN message appears every time, this one appears often, but not
always.

When researching this/these issues, I discovered the nvme-cli package
(with the nvme command) and via its manpage I found this command:

  nvme get-feature /dev/nvme0 -f 3

I didn't even know NVMe's had namespaces, but this didn't look good:

  The namespace or the format of that namespace is invalid(0x200b)

... without actually understanding what it means and/or what its
consequences are. It could be harmless and/or normal though.

Cheers,
  Diederik


* Re: [BUG report] kernel warnings with Samsung 970 EVO 2TB SSD
  2025-08-16 14:11   ` Diederik de Haas
@ 2025-08-18 18:58     ` Keith Busch
  2025-08-18 20:48       ` Diederik de Haas
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2025-08-18 18:58 UTC (permalink / raw)
  To: Diederik de Haas; +Cc: linux-nvme, linux-kernel

On Sat, Aug 16, 2025 at 04:11:00PM +0200, Diederik de Haas wrote:
> On Sat Aug 16, 2025 at 3:20 PM CEST, Keith Busch wrote:
> 
> > If you want to see what the driver is reacting to, you can check the
> > subnqn from command line:
> >
> >   # nvme id-ctrl /dev/nvme0 | grep subnqn
> >
> > It'll probably be all zeros. The field has been required by spec, but
> > the driver tolerates ones that don't implement it.
> 
> root@nanopi-r5s:~# nvme id-ctrl /dev/nvme0 | grep subnqn
> subnqn    :
> 
> So it seems to be just empty?

Yes, it's interpreted as a string. All 0's would be an empty string.
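
For scripting that check, a minimal sketch in shell follows. The helper
names `subnqn_value` and `subnqn_is_empty` are made up for illustration;
they only parse the text that `nvme id-ctrl` prints and assume the
`subnqn    : <value>` line layout shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `nvme id-ctrl` text output on stdin and print
# the value of the subnqn field, with surrounding whitespace trimmed.
subnqn_value() {
    sed -n 's/^subnqn[[:space:]]*:[[:space:]]*//p' | sed 's/[[:space:]]*$//'
}

# Hypothetical helper: succeed (exit 0) when the field is empty, which is
# how an all-zero SUBNQN looks once it is interpreted as a string.
subnqn_is_empty() {
    [ -z "$(subnqn_value)" ]
}

# Usage on a real system (needs root and nvme-cli):
#   nvme id-ctrl /dev/nvme0 | subnqn_is_empty && echo "SUBNQN is empty"
```

Both helpers read stdin only, so they can be tried on saved output
without touching the device.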
 
> > It's just a message that the device isn't spec compliant, but
> > otherwise perfectly operational.
> 
> But still worthy of a warning (instead of info) msg?
> 
> The other kernel warning is this:
> 
>   nvme nvme0: using unchecked data buffer
> 
> The SUBNQN message appears every time, this one appears often, but not
> always.

That one means you've sent a user space passthrough command to a device
that doesn't support SGL DMA. Without that, the nvme protocol uses
implicitly sized DMA that the driver can't be sure is accurate. The user
could theoretically provide a short buffer that can corrupt memory if
done by accident, or be used as an attack vector if done by malicious
software.

This is also not something to worry about unless you run malicious or
buggy software.
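
Whether a controller advertises SGL support can be read from the SGLS
field of the Identify Controller data, which nvme-cli prints as `sgls`.
In the NVMe specification, bits 1:0 of SGLS are zero when SGLs are not
supported. A rough sketch of that check follows; the helper name
`sgl_supported` is invented here, and it assumes the `sgls      : <number>`
line layout of `nvme id-ctrl`. The value for this particular drive was
not posted in the thread, so no result is claimed for it.

```shell
#!/bin/sh
# Hypothetical helper: read `nvme id-ctrl` text output on stdin.
# Exit 0 if SGLS bits 1:0 are non-zero (SGLs supported),
# exit 1 if they are zero, exit 2 if no sgls line was found.
sgl_supported() {
    # Take the numeric value (decimal or 0x-prefixed hex) of the sgls line.
    val=$(sed -n 's/^sgls[[:space:]]*:[[:space:]]*\([0-9a-fA-Fx]*\).*/\1/p' | head -n 1)
    [ -n "$val" ] || return 2
    # Mask the two low bits that encode SGL support.
    [ $(( $val & 3 )) -ne 0 ]
}

# Usage on a real system (needs root and nvme-cli):
#   nvme id-ctrl /dev/nvme0 | sgl_supported || echo "no SGL support reported"
```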
 
> When researching this/these issues, I discovered the nvme-cli package
> (with the nvme command) and via its manpage I found this command:
> 
>   nvme get-feature /dev/nvme0 -f 3
> 
> I didn't even know NVMe's had namespaces, but this didn't look good:
> 
>   The namespace or the format of that namespace is invalid(0x200b)
> 
> ... without actually understanding what it means and/or what its
> consequences are. It could be harmless and/or normal though.

The feature you're requesting is the LBA range, which is namespace
scoped. You need to specify a namespace id, either by opening the
namespace's block device (/dev/nvme0n1) instead of the admin handle
(/dev/nvme0), or you can manually specify the namespace with parameters
"--namespace-id=1" or just "-n1".
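
The two forms described above can be wrapped as a small sketch. The
function name `get_lba_range` and the `NVME` override variable are
assumptions made for this example, as is the default of namespace 1;
only the `get-feature`, `-f` and `-n` usage comes from the thread.

```shell
#!/bin/sh
# NVME lets the example be dry-run (NVME=echo); it defaults to the real CLI.
NVME=${NVME:-nvme}

# Hypothetical helper: query feature 0x03 (LBA Range Type) through the
# controller's admin handle, passing the namespace id explicitly.
get_lba_range() {
    ctrl=$1          # e.g. /dev/nvme0
    nsid=${2:-1}     # assumption: namespace 1 unless told otherwise
    "$NVME" get-feature "$ctrl" -f 3 -n "$nsid"
}

# Usage on a real system (needs root and nvme-cli):
#   get_lba_range /dev/nvme0 1
# Alternative without -n, opening the namespace's block device instead:
#   nvme get-feature /dev/nvme0n1 -f 3
```

Setting NVME=echo prints the command line instead of running it, which
is handy for checking what would be sent.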



* Re: [BUG report] kernel warnings with Samsung 970 EVO 2TB SSD
  2025-08-18 18:58     ` Keith Busch
@ 2025-08-18 20:48       ` Diederik de Haas
  2025-08-18 21:20         ` Diederik de Haas
  0 siblings, 1 reply; 6+ messages in thread
From: Diederik de Haas @ 2025-08-18 20:48 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme, linux-kernel, Diederik de Haas

Hi,

First of all: thanks for taking the time to answer my questions :)

On Mon Aug 18, 2025 at 8:58 PM CEST, Keith Busch wrote:
> On Sat, Aug 16, 2025 at 04:11:00PM +0200, Diederik de Haas wrote:
>> On Sat Aug 16, 2025 at 3:20 PM CEST, Keith Busch wrote:
>> 
>> > If you want to see what the driver is reacting to, you can check the
>> > subnqn from command line:
>> >
>> >   # nvme id-ctrl /dev/nvme0 | grep subnqn
>> >
>> > It'll probably be all zeros. The field has been required by spec, but
>> > the driver tolerates ones that don't implement it.
>> 
>> root@nanopi-r5s:~# nvme id-ctrl /dev/nvme0 | grep subnqn
>> subnqn    :
>> 
>> So it seems to be just empty?
>
> Yes, it's interpreted as a string. All 0's would be an empty string.

Ah yes, makes sense.

>> The other kernel warning is this:
>> 
>>   nvme nvme0: using unchecked data buffer
>> 
>> The SUBNQN message appears every time, this one appears often, but not
>> always.
>
> That one means you've sent a user space passthrough command to a device
> that doesn't support SGL DMA. Without that, the nvme protocol uses
> implicitly sized DMA that the driver can't be sure is accurate. The user
> could theoretically provide a short buffer that can corrupt memory if
> done by accident, or be used as an attack vector if done by malicious
> software.
>
> This is also not something to worry about unless you run malicious or
> buggy software.

I would be surprised if I was running malicious software, but pretty
much all software has bugs, so that's ofc possible.
(I run Debian Testing or Unstable on pretty much all my devices)

I thought it was a HW problem as the problem seemed to disappear from my
PC when I removed the NVMe drive from it. And when put in my NanoPi R5S
it appeared again on that device.
I say 'seemed' because I just found out it also happened on my PC (with
a Samsung 960 PRO 1TB) this boot, though not in the 20 boots prior.

Uninstalled the 3 programs from R5S that showed up the most around the
warning message and it's still there. 
Would 'dyndbg' be helpful to determine what program is buggy?
 
>> When researching this/these issues, I discovered the nvme-cli package
>> (with the nvme command) and via its manpage I found this command:
>> 
>>   nvme get-feature /dev/nvme0 -f 3
>> 
>> I didn't even know NVMe's had namespaces, but this didn't look good:
>> 
>>   The namespace or the format of that namespace is invalid(0x200b)
>> 
>> ... without actually understanding what it means and/or what its
>> consequences are. It could be harmless and/or normal though.
>
> The feature you're requesting is the LBA range, which is namespace
> scoped. You need to specify a namespace id, either by opening the
> namespace's block device (/dev/nvme0n1) instead of the admin handle
> (/dev/nvme0), or you can manually specify the namespace with parameters
> "--namespace-id=1" or just "-n1".

Adding "-n1" does show normal (AFAICT) output. It's all zeros though.
And now the error message makes sense too :-)
The nvme-cli man page could/should have a better (ie working) example,
but that's not a kernel problem.

Thanks for your help and reassurances :-)

Cheers,
  Diederik


* Re: [BUG report] kernel warnings with Samsung 970 EVO 2TB SSD
  2025-08-18 20:48       ` Diederik de Haas
@ 2025-08-18 21:20         ` Diederik de Haas
  0 siblings, 0 replies; 6+ messages in thread
From: Diederik de Haas @ 2025-08-18 21:20 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme, linux-kernel, Diederik de Haas

On Mon Aug 18, 2025 at 10:48 PM CEST, Diederik de Haas wrote:
> On Mon Aug 18, 2025 at 8:58 PM CEST, Keith Busch wrote:
>> On Sat, Aug 16, 2025 at 04:11:00PM +0200, Diederik de Haas wrote:
>>> On Sat Aug 16, 2025 at 3:20 PM CEST, Keith Busch wrote:
>>> 
>>> The other kernel warning is this:
>>> 
>>>   nvme nvme0: using unchecked data buffer
>>> 
>>> The SUBNQN message appears every time, this one appears often, but not
>>> always.
>>
>> That one means you've sent a user space passthrough command to a device
>> that doesn't support SGL DMA. Without that, the nvme protocol uses
>> implicitly sized DMA that the driver can't be sure is accurate. The user
>> could theoretically provide a short buffer that can corrupt memory if
>> done by accident, or be used as an attack vector if done by malicious
>> software.
>>
>> This is also not something to worry about unless you run malicious or
>> buggy software.
>
> I would be surprised if I was running malicious software, but pretty
> much all software has bugs, so that's ofc possible.
> ...
>
> Uninstalled the 3 programs from R5S that showed up the most around the
> warning message and it's still there. 
> Would 'dyndbg' be helpful to determine what program is buggy?
  
Looks like I found the 'winner': udisks2 (package)

I uninstalled that and in the 10 boots after that, I did not see the
message. Installed it again (without Recommends) and it was back on the
first (re)boot.
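
A per-boot check like the one described above could be scripted roughly
as follows. This is only a sketch, not the procedure actually used in
this thread; the helper name `has_unchecked_warning` is made up, and the
`journalctl` usage line assumes a persistent journal.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and succeed (exit 0)
# when the "using unchecked data buffer" warning is present.
has_unchecked_warning() {
    grep -q 'using unchecked data buffer'
}

# Usage on a real system:
#   dmesg | has_unchecked_warning && echo "warning seen this boot"
#   journalctl -k -b -1 | has_unchecked_warning   # previous boot
```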

Thanks!

Cheers,
  Diederik

