public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* Patch [0/3] Support eager_unmap for non ldpme sd devs
@ 2025-07-27 18:06 Kurt Garloff
  2025-07-28 15:35 ` Bart Van Assche
  2025-07-29  5:09 ` Martin K. Petersen
  0 siblings, 2 replies; 7+ messages in thread
From: Kurt Garloff @ 2025-07-27 18:06 UTC (permalink / raw)
  To: linux-scsi

Hi,

(Resent, the original message was not text/plain only. It's been too
long since I last interacted with Linux MLs, sorry.)

I had to do some rearrangement of NVMe drives over several of my
machines recently. I was using a few USB NVMe enclosures for this.
One of the things that was annoying was that I had no way to discard
free space using fstrim (or with blkdiscard or mkfs / mkswap options).

The SCSI disk driver (sd_mod) would just claim that discard is not
supported. It's not true. The NVMes support it as do the enclosures.
I have a few, though all have variants of RTL9210 chips.

But sd was too conservative to let me do it.

The attached series of patches addresses this. (Patches are against 6.16.0-rc7.)

1/3: Introduces an eager_unmap module parameter which lets the sd driver
      query unmap/ws16/ws10 capabilities and issue these commands even for
      devices without lbpme support (thin provisioning). The VPD pages are
      queried for all devices advertising SBC >= 3 to avoid breaking
      old hardware that might barf at being asked for VPDs.
2/3: Makes the approach more conservative by also gating the VPD page
      queries on eager_unmap being set. With this patch, the default
      behavior is the same as before the eager_unmap patches unless the
      parameter is set to non-zero.
3/3: This changes the default of eager_unmap to 1. It should be safe based
      on the SBC >= 3 protection, but it's hard to know for sure with all
      the possible broken hardware out there. So leave the parameter for
      admins that need to set it to 0. In case we have significant fallout,
      this is the one patch that would need to get reverted.

I have been running kernels with variants of these patches for over a year
now.

PS: Please copy me on responses; I stopped being subscribed to
LKML and linux-scsi a long time ago. Sorry!

-- 
Kurt Garloff <kurt@garloff.de>, Cologne


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs
  2025-07-27 18:06 Patch [0/3] Support eager_unmap for non ldpme sd devs Kurt Garloff
@ 2025-07-28 15:35 ` Bart Van Assche
  2025-07-29  5:09 ` Martin K. Petersen
  1 sibling, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2025-07-28 15:35 UTC (permalink / raw)
  To: Kurt Garloff, linux-scsi

On 7/27/25 11:06 AM, Kurt Garloff wrote:
> Hi,
> 
> (Resent, the original message was not text/plain only. It's been too
> long since I last interacted with Linux MLs, sorry.)
> 
> I had to do some rearrangement of NVMe drives over several of my
> machines recently. I was using a few USB NVMe enclosures for this.
> One of the things that was annoying was that I had no way to discard
> free space using fstrim (or with blkdiscard or mkfs / mkswap options).
> 
> The SCSI disk driver (sd_mod) would just claim that discard is not
> supported. It's not true. The NVMes support it as do the enclosures.
> I have a few, though all have variants of RTL9210 chips.
> 
> But sd was too conservative to let me do it.
> 
> The attached series of patches addresses this. (Patches are against 6.16.0-rc7.)
> 
> 1/3: Introduces an eager_unmap module parameter which lets the sd driver
>       query unmap/ws16/ws10 capabilities and issue these commands even for
>       devices without lbpme support (thin provisioning). The VPD pages are
>       queried for all devices advertising SBC >= 3 to avoid breaking
>       old hardware that might barf at being asked for VPDs.
> 2/3: Makes the approach more conservative by also gating the VPD page
>       queries on eager_unmap being set. With this patch, the default
>       behavior is the same as before the eager_unmap patches unless the
>       parameter is set to non-zero.
> 3/3: This changes the default of eager_unmap to 1. It should be safe based
>       on the SBC >= 3 protection, but it's hard to know for sure with all
>       the possible broken hardware out there. So leave the parameter for
>       admins that need to set it to 0. In case we have significant fallout,
>       this is the one patch that would need to get reverted.
> 
> I have been running kernels with variants of these patches for over a year
> now.
> 
> PS: Please copy me on responses; I stopped being subscribed to
> LKML and linux-scsi a long time ago. Sorry!

Please use git format-patch --cover-letter ... and git send-email ... to
publish a patch series. Nobody looks at patches included in email
attachments. Please also make sure that your ~/.gitconfig settings match
what is expected for posting patches on Linux kernel mailing lists.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs
  2025-07-27 18:06 Patch [0/3] Support eager_unmap for non ldpme sd devs Kurt Garloff
  2025-07-28 15:35 ` Bart Van Assche
@ 2025-07-29  5:09 ` Martin K. Petersen
  2025-07-29  7:32   ` Kurt Garloff
  1 sibling, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2025-07-29  5:09 UTC (permalink / raw)
  To: Kurt Garloff; +Cc: linux-scsi


Hi Kurt!

> The SCSI disk driver (sd_mod) would just claim that discard is not
> supported. It's not true. The NVMes support it as do the enclosures.

For some definition of support at least. If they can't figure out how to
set a single bit flag to enable the feature, then maybe their "support"
is not entirely trustworthy. But great that it works...

What are your devices actually reporting in:

  # sg_readcap -l /dev/sdN

  # sg_vpd -p bl /dev/sdN

  # sg_vpd -p lbpv /dev/sdN

?

> I have a few, though all have variants of RTL9210 chips.
>
> But sd was too conservative to let me do it.

Why not just have a udev rule which does:

  echo unmap > /sys/class/scsi_disk/H:C:T:L/provisioning_mode

?

Substitute "writesame_10" or "writesame_16" depending on what is
actually implemented.
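Martin's substitution rule can be sketched as a small helper. This is
only a sketch: the mode_for name and the unmap-first preference order
are assumptions of this example, not anything sd_mod mandates.

```shell
# Pick a provisioning_mode value from the LBPU / LBPWS / LBPWS10 flags
# that "sg_vpd -p lbpv" reports (1 = supported, 0 = not supported).
# Preference order (unmap first) is this sketch's own choice.
mode_for() {
    lbpu=$1 lbpws=$2 lbpws10=$3
    if [ "$lbpu" = 1 ]; then
        echo unmap
    elif [ "$lbpws" = 1 ]; then
        echo writesame_16
    elif [ "$lbpws10" = 1 ]; then
        echo writesame_10
    else
        echo disabled
    fi
}
```

The result would then be written to the sysfs file Martin names, e.g.
echo "$(mode_for 1 0 0)" > /sys/class/scsi_disk/H:C:T:L/provisioning_mode.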

-- 
Martin K. Petersen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs
  2025-07-29  5:09 ` Martin K. Petersen
@ 2025-07-29  7:32   ` Kurt Garloff
  2025-07-29  9:15     ` Kurt Garloff
  2025-07-31  4:42     ` Martin K. Petersen
  0 siblings, 2 replies; 7+ messages in thread
From: Kurt Garloff @ 2025-07-29  7:32 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi

Hi Martin,

On 29.07.25 07:09, Martin K. Petersen wrote:
>> The SCSI disk driver (sd_mod) would just claim that discard is not
>> supported. It's not true. The NVMes support it as do the enclosures.
> For some definition of support at least. If they can't figure out how to
> set a single bit flag to enable the feature, then maybe their "support"
> is not entirely trustworthy. But great that it works...
>
> What are your devices actually reporting in:
>
>    # sg_readcap -l /dev/sdN
Read Capacity results:
    Protection: prot_en=0, p_type=0, p_i_exponent=0
    Logical block provisioning: lbpme=0, lbprz=0
    Last LBA=3907029167 (0xe8e088af), Number of logical blocks=3907029168
    Logical block length=512 bytes
    Logical blocks per physical block exponent=0
    Lowest aligned LBA=0
Hence:
    Device size: 2000398934016 bytes, 1907729.1 MiB, 2000.40 GB, 2.00 TB

Yes, lbpme is 0.

>    # sg_vpd -p bl /dev/sdN
Block limits VPD page (SBC):
   Write same non-zero (WSNZ): 0
   Maximum compare and write length: 0 blocks [Command not implemented]
   Optimal transfer length granularity: 1 blocks
   Maximum transfer length: 65535 blocks
   Optimal transfer length: 65535 blocks
   Maximum prefetch transfer length: 0 blocks [ignored]
   Maximum unmap LBA count: 20971520
   Maximum unmap block descriptor count: 1
   Optimal unmap granularity: 1 blocks
   Unmap granularity alignment valid: false
   Unmap granularity alignment: 0 [invalid]
   Maximum write same length: 0 blocks [not reported]
   Maximum atomic transfer length: 0 blocks [not reported]
   Atomic alignment: 0 [unaligned atomic writes permitted]
   Atomic transfer length granularity: 0 [no granularity requirement]
   Maximum atomic transfer length with atomic boundary: 0 blocks [not reported]
   Maximum atomic boundary size: 0 blocks [can only write atomic 1 block]

Max 10GiB unmap at a time.
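That figure checks out against the reported limits:

```shell
# Maximum unmap LBA count from the Block Limits VPD page above,
# times the 512-byte logical block length from READ CAPACITY.
blocks=20971520
bytes=$((blocks * 512))
gib=$((bytes / 1024 / 1024 / 1024))
echo "${bytes} bytes = ${gib} GiB per UNMAP"   # 10737418240 bytes = 10 GiB
```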

>    # sg_vpd -p lbpv /dev/sdN

Logical block provisioning VPD page (SBC):
   Unmap command supported (LBPU): 1
   Write same (16) with unmap bit supported (LBPWS): 0
   Write same (10) with unmap bit supported (LBPWS10): 0
   Logical block provisioning read zeros (LBPRZ): 0
   Anchored LBAs supported (ANC_SUP): 0
   Threshold exponent: 0 [threshold sets not supported]
   Descriptor present (DP): 0
   Minimum percentage: 0 [not reported]
   Provisioning type: 0 (not known or fully provisioned)
   Threshold percentage: 0 [percentages not supported]

LBPU is 1.

>> I have a few, though all have variants of RTL9210 chips.
>>
>> But sd was too conservative to let me do it.
> Why not just have a udev rule which does:
>
>    echo unmap > /sys/class/scsi_disk/H:C:T:L/provisioning_mode
>
> ?
>
> Substitute "writesame_10" or "writesame_16" depending on what is
> actually implemented.

It does the job.

The question, as you say, is whether supporting unmap (LBPU=1) is
something to be trusted when the device says that Logical Block
Provisioning Management Enable is off.

Some docs mention LBPME in the context of thin provisioning, which
is not a good description of what SSDs/NVMes do.

Using sg_readcap on a PCIe attached NVMe indeed does not result
in LBPME being set either.

root@framekurt(//):~ [0]# sg_readcap -l /dev/nvme0n1
Read Capacity results:
    Protection: prot_en=0, p_type=0, p_i_exponent=0
    Logical block provisioning: lbpme=0, lbprz=0
    Last LBA=8001573551 (0x1dcee52af), Number of logical blocks=8001573552
    Logical block length=512 bytes
    Logical blocks per physical block exponent=0
    Lowest aligned LBA=0
Hence:
    Device size: 4096805658624 bytes, 3907018.3 MiB, 4096.81 GB, 4.10 TB

Actually, here are the fun parts:
- The enclosures do not seem to set the LBPME bit when doing the
   translation to SCSI's READ_CAPACITY_16 command, as far as I
   can see. The Linux nvme driver does not do it either, nor does the
   NVMe SCSI translation guide[*] even mention it.
- The nvme driver just relies on the dataset management bit (which the
   SCSI translation guide relates 1:1 to LBPU from VPD 0xb2 lbpv).
- Attaching the same PCIe device in a USB enclosure of course results in
   the sd driver being used, which does not see LBPME being set.
- An old SATA SSD in the same enclosure actually does report LBPME=1,
   and the (unpatched) sd driver happily uses the advertised unmap.

[*] 
https://www.nvmexpress.org/wp-content/uploads/NVM-Express-SCSI-Translation-Reference-1_1-Gold.pdf

My conclusion:
- Neither the Linux nvme driver nor USB enclosure firmware nor the
   NVMe-SCSI translation guide seems to emulate the LBPME bit for
   devices with unmap/ws16/ws10 support.
- SATA SSDs do seem to set LBPME=1, and the sd driver requires it.

One could argue that the NVMe-SCSI translation is incomplete and
should be fixed, but that discussion is many years too late.
I guess that other OS drivers do not require LBPME=1, which is
why the translation guide, and subsequently firmwares and even the
Linux nvme driver, don't care about LBPME. The sd driver probably
should be told to not care either ...

The remaining question is whether we'd recommend that users use a set
of udev rules to fix things up, or whether we add an sd_mod parameter
to ignore LBPME (if LBPU/LBPWS/LBPWS10 are set), or even ignore it by
default (and only provide an opt-out parameter).

Best,

-- 
Kurt Garloff <kurt@garloff.de>, Cologne


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs
  2025-07-29  7:32   ` Kurt Garloff
@ 2025-07-29  9:15     ` Kurt Garloff
  2025-07-29  9:37       ` Kurt Garloff
  2025-07-31  4:42     ` Martin K. Petersen
  1 sibling, 1 reply; 7+ messages in thread
From: Kurt Garloff @ 2025-07-29  9:15 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 972 bytes --]

Hi,

On 29.07.25 09:32, Kurt Garloff wrote:
> The remaining question is whether we'd recommend that users
> use a set of udev rules to fix things up, or whether we add an
> sd_mod parameter to ignore LBPME (if LBPU/LBPWS/LBPWS10 are
> set), or even ignore it by default (and only provide an opt-out
> parameter).

I should probably mention that doing it via udev is complex.

ACTION!="add", GOTO="usb_nvme_end"
SUBSYSTEM=="usb_device", GOTO="usb_nvme_start"
SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", GOTO="usb_nvme_start"
GOTO="usb_nvme_end"

LABEL="usb_nvme_start"
# Realtek 9210 USB-NVMe bridge
ATTR{idVendor}=="0bda", ATTR{idProduct}=="9210", 
RUN="/usr/local/sbin/set-unmap.sh"
# More USB IDs to follow here

LABEL="usb_nvme_end"

with the attached script in /usr/local/sbin/set-unmap.sh.
It's somewhat error-proof, but probably would not have hit the mark
yet for inclusion into a Linux distro.

Patching sd would seem easier ...
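The set-unmap.sh attachment itself is not reproduced in the archive,
so the following is only a hypothetical minimal sketch of such a
helper; the function names and the assumption that sg_vpd is installed
are this example's, not the original script's.

```shell
#!/bin/sh
# Hypothetical sketch of a set-unmap.sh style helper.

# Extract "1"/"0" from an "Unmap command supported (LBPU): 1" line
# in "sg_vpd -p lbpv" output, read from stdin.
lbpu_from_vpd() {
    sed -n 's/.*(LBPU): *\([01]\).*/\1/p'
}

# Enable unmap for one block device, e.g. set_unmap /dev/sdb.
set_unmap() {
    disk=$(basename "$1")
    # The H:C:T:L name appears as a directory under the block
    # device's scsi_disk class node in sysfs.
    hctl=$(ls "/sys/block/$disk/device/scsi_disk" 2>/dev/null) || return 1
    if [ "$(sg_vpd -p lbpv "$1" | lbpu_from_vpd)" = 1 ]; then
        echo unmap > "/sys/class/scsi_disk/$hctl/provisioning_mode"
    fi
}
```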

-- 
Kurt Garloff <kurt@garloff.de>, Cologne

[-- Attachment #2: set-unmap.sh --]
[-- Type: application/x-shellscript, Size: 2095 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs
  2025-07-29  9:15     ` Kurt Garloff
@ 2025-07-29  9:37       ` Kurt Garloff
  0 siblings, 0 replies; 7+ messages in thread
From: Kurt Garloff @ 2025-07-29  9:37 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi

Hi,

On 29.07.25 11:15, Kurt Garloff wrote:
> with the attached script in /usr/local/sbin/set-unmap.sh.
> It's somewhat error-proof, but probably would not have hit the mark
> yet for inclusion into a Linux distro.

Well, you also need to add a "true" to the debug function.
Apparently bash considers an empty function body illegal.

Cheers,

-- 
Kurt Garloff <kurt@garloff.de>, Cologne


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs
  2025-07-29  7:32   ` Kurt Garloff
  2025-07-29  9:15     ` Kurt Garloff
@ 2025-07-31  4:42     ` Martin K. Petersen
  1 sibling, 0 replies; 7+ messages in thread
From: Martin K. Petersen @ 2025-07-31  4:42 UTC (permalink / raw)
  To: Kurt Garloff; +Cc: Martin K. Petersen, linux-scsi


Hi Kurt!

> Some docs mention LBPME in context of thin provisioning, which
> is not a good description of what SSDs/NVMes do.

Logical Block Provisioning is how discard and other capacity
provisioning primitives are defined in SCSI. SCSI devices which support
unmapping logical blocks are required to set LBPME. This includes SCSI
SSDs which support "discards" as well as storage arrays which implement
thin provisioning.

LBPME was originally called TPE since it concerned itself with thin
provisioning only. The flag was later renamed to accommodate devices
using other provisioning models.

> One could argue that the NVMe-SCSI translation is incomplete and
> should be fixed, but that discussion is now many years late.

Unfortunately SCSI-NVMe translation isn't standardized the same way as
SCSI-ATA translation is.

Implementing SCSI-NVMe translation requires an appropriate understanding
of both protocols. There is no formal translation specification to rely
on.

> Linux nvme driver don't care about LBPME.

The Linux NVMe driver doesn't know about SCSI protocol flags. The
protocol translation is being done by your enclosure's bridge device.

The flag corresponding to LBPME in NVMe is the DSM flag in ONCS in
IDENTIFY CONTROLLER. That is what your bridge firmware should have
translated to LBPME=1.
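For reference, the NVMe side of that translation can be inspected with
nvme-cli; ONCS bit 2 is the Dataset Management (DSM) bit. The sed
extraction assumes nvme id-ctrl's usual "oncs : 0x..." output line.

```shell
# Read ONCS from IDENTIFY CONTROLLER (requires nvme-cli and a device):
#   oncs=$(nvme id-ctrl /dev/nvme0 | sed -n 's/^oncs *: *//p')
# Given an ONCS value, test the DSM bit (bit 2):
has_dsm() {
    [ $(( ($1 >> 2) & 1 )) -eq 1 ]
}
has_dsm 0x5f && echo "DSM supported"   # prints "DSM supported"
```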

The flag corresponding to LBPME in ATA is TRIM SUPPORTED in IDENTIFY
DEVICE. It appears your device handled that translation correctly.

> Remaining question is whether we'd recommend users to use a set of
> udev rules to fix up or whether we have a sd_mod parameter to ignore
> LBPME (if LBPU/LBPWS/LBPWS10 are set) or even ignore it by default
> (and only provide an opt-out parameter).

The SCSI spec requires devices to set LBPME=1 to enable Logical Block
Provisioning. This is not optional. It is perfectly reasonable for a
device to report LBPWS to indicate that the WRITE SAME(16) command is
implemented. That does not imply that Logical Block Provisioning is
actually enabled. You could plug a hard disk into your enclosure, for
instance, and therefore LBPME would be 0 despite the bridge indicating
support for WRITE SAME(16).
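A quick way to see which side is at fault on a given setup is to pull
the flag out of sg_readcap output; lbpme_of is a hypothetical helper
name for this example.

```shell
# Extract the lbpme flag from "sg_readcap -l" output read from stdin
# (the relevant line looks like
#  "Logical block provisioning: lbpme=0, lbprz=0").
lbpme_of() {
    sed -n 's/.*lbpme=\([01]\).*/\1/p'
}
# Usage: sg_readcap -l /dev/sdN | lbpme_of
```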

We are not going to add an option to ignore LBPME. It is one of the most
reliable "does this device know what it is doing?" heuristics we have.
And we really can not trust a device which fails to handle even the
most basic bits of the SCSI protocol.

Twiddling the sysfs file via a udev rule is the appropriate way for
users to enable discards on devices which fail to implement the spec
correctly.

-- 
Martin K. Petersen

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-07-31  4:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-27 18:06 Patch [0/3] Support eager_unmap for non ldpme sd devs Kurt Garloff
2025-07-28 15:35 ` Bart Van Assche
2025-07-29  5:09 ` Martin K. Petersen
2025-07-29  7:32   ` Kurt Garloff
2025-07-29  9:15     ` Kurt Garloff
2025-07-29  9:37       ` Kurt Garloff
2025-07-31  4:42     ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox