* Patch [0/3] Support eager_unmap for non ldpme sd devs @ 2025-07-27 18:06 Kurt Garloff 2025-07-28 15:35 ` Bart Van Assche 2025-07-29 5:09 ` Martin K. Petersen 0 siblings, 2 replies; 7+ messages in thread From: Kurt Garloff @ 2025-07-27 18:06 UTC (permalink / raw) To: linux-scsi Hi, (Resent, the original message was not text/plain only. It's too long that I have interacted with linux MLs, sorry.) I had to do some rearrangement of NVMe drives over several of my machines recently. I was using a few USB NVMe enclosures for this. One of the things that was annoying was that I had no way to discard free space using fstrim (or with blkdiscard or mkfs / mkswap options). The SCSI disk driver (sd_mod) would just claim that discard is not supported. It's not true. The NVMes support it as do the enclosures. I have a few, though all have variants of RTL9210 chips. But sd was too conservative to let me do it. Attached series of patches address this. (Patches are against 6.16.0-rc7.) 1/3: Introduces an eager_unmap module parameter which lets the sd driver unmap/ws16/ws10 capabilities and issue these commands even for devices without ldpme support (thin provisioning). The VPD pages are being queried for all devices advertising SBC >= 3 to avoid breaking old hardware that might barf of being asked for VPDs. 2/3: Makes the approach more conservative by guarding the requesting of VPD pages also on eager_unmap being set. With this patch, the default behavior is the same as before the eager_unmap patches unless the parameter is set to non-zero. 3/3: This changes the default of eager_unmap to 1. It should be safe based on the SBC >= 3 protection, but it's hard to know for sure with all the possible broken hardware out there. So leave the parameter for admins that need to set it to 0. In case we have significant fallout, this is the one patch that would need to get reverted. I have been running kernels with variants of these patches for over a year now. PS: Please copy me one responses, I have stopped being subscribed to LKML and linux-scsi a long time ago. Sorry! -- Kurt Garloff <kurt@garloff.de>, Cologne ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs 2025-07-27 18:06 Patch [0/3] Support eager_unmap for non ldpme sd devs Kurt Garloff @ 2025-07-28 15:35 ` Bart Van Assche 2025-07-29 5:09 ` Martin K. Petersen 1 sibling, 0 replies; 7+ messages in thread From: Bart Van Assche @ 2025-07-28 15:35 UTC (permalink / raw) To: Kurt Garloff, linux-scsi On 7/27/25 11:06 AM, Kurt Garloff wrote: > Hi, > > (Resent, the original message was not text/plain only. It's too long > that I have interacted with linux MLs, sorry.) > > I had to do some rearrangement of NVMe drives over several of my > machines recently. I was using a few USB NVMe enclosures for this. > One of the things that was annoying was that I had no way to discard > free space using fstrim (or with blkdiscard or mkfs / mkswap options). > > The SCSI disk driver (sd_mod) would just claim that discard is not > supported. It's not true. The NVMes support it as do the enclosures. > I have a few, though all have variants of RTL9210 chips. > > But sd was too conservative to let me do it. > > Attached series of patches address this. (Patches are against 6.16.0-rc7.) > > 1/3: Introduces an eager_unmap module parameter which lets the sd driver > unmap/ws16/ws10 capabilities and issue these commands even for > devices without ldpme support (thin provisioning). The VPD pages are > being queried for all devices advertising SBC >= 3 to avoid breaking > old hardware that might barf of being asked for VPDs. > 2/3: Makes the approach more conservative by guarding the requesting of > VPD pages also on eager_unmap being set. With this patch, the default > behavior is the same as before the eager_unmap patches unless the > parameter is set to non-zero. > 3/3: This changes the default of eager_unmap to 1. It should be safe based > on the SBC >= 3 protection, but it's hard to know for sure with all > the possible broken hardware out there. So leave the parameter for > admins that need to set it to 0. In case we have significant fallout, > this is the one patch that would need to get reverted. > > I have been running kernels with variants of these patches for over a year > now. > > PS: Please copy me one responses, I have stopped being subscribed to > LKML and linux-scsi a long time ago. Sorry! Please use git format-patch --cover-letter ... and git send-email ... to publish a patch series. Nobody looks at patches included in email attachments. Please also make sure that your ~/.gitconfig settings match what is expected for posting patches on Linux kernel mailing lists. Thanks, Bart. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs 2025-07-27 18:06 Patch [0/3] Support eager_unmap for non ldpme sd devs Kurt Garloff 2025-07-28 15:35 ` Bart Van Assche @ 2025-07-29 5:09 ` Martin K. Petersen 2025-07-29 7:32 ` Kurt Garloff 1 sibling, 1 reply; 7+ messages in thread From: Martin K. Petersen @ 2025-07-29 5:09 UTC (permalink / raw) To: Kurt Garloff; +Cc: linux-scsi Hi Kurt! > The SCSI disk driver (sd_mod) would just claim that discard is not > supported. It's not true. The NVMes support it as do the enclosures. For some definition of support at least. If they can't figure out how to set a single bit flag to enable the feature, then maybe their "support" is not entirely trustworthy. But great that it works... What are your devices actually reporting in: # sg_readcap -l /dev/sdN # sg_vpd -p bl /dev/sdN # sg_vpd -p lbpv /dev/sdN ? > I have a few, though all have variants of RTL9210 chips. > > But sd was too conservative to let me do it. Why not just have a udev rule which does: echo unmap > /sys/class/scsi_disk/H:C:T:L/provisioning_mode ? Substitute "writesame_10" or "writesame_16" depending on what is actually implemented. -- Martin K. Petersen ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs 2025-07-29 5:09 ` Martin K. Petersen @ 2025-07-29 7:32 ` Kurt Garloff 2025-07-29 9:15 ` Kurt Garloff 2025-07-31 4:42 ` Martin K. Petersen 0 siblings, 2 replies; 7+ messages in thread From: Kurt Garloff @ 2025-07-29 7:32 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi Hi Martin, On 29.07.25 07:09, Martin K. Petersen wrote: >> The SCSI disk driver (sd_mod) would just claim that discard is not >> supported. It's not true. The NVMes support it as do the enclosures. > For some definition of support at least. If they can't figure out how to > set a single bit flag to enable the feature, then maybe their "support" > is not entirely trustworthy. But great that it works... > > What are your devices actually reporting in: > > # sg_readcap -l /dev/sdN Read Capacity results: Protection: prot_en=0, p_type=0, p_i_exponent=0 Logical block provisioning: lbpme=0, lbprz=0 Last LBA=3907029167 (0xe8e088af), Number of logical blocks=3907029168 Logical block length=512 bytes Logical blocks per physical block exponent=0 Lowest aligned LBA=0 Hence: Device size: 2000398934016 bytes, 1907729.1 MiB, 2000.40 GB, 2.00 TB Yes, lbpme is 0. > # sg_vpd -p bl /dev/sdN Block limits VPD page (SBC): Write same non-zero (WSNZ): 0 Maximum compare and write length: 0 blocks [Command not implemented] Optimal transfer length granularity: 1 blocks Maximum transfer length: 65535 blocks Optimal transfer length: 65535 blocks Maximum prefetch transfer length: 0 blocks [ignored] Maximum unmap LBA count: 20971520 Maximum unmap block descriptor count: 1 Optimal unmap granularity: 1 blocks Unmap granularity alignment valid: false Unmap granularity alignment: 0 [invalid] Maximum write same length: 0 blocks [not reported] Maximum atomic transfer length: 0 blocks [not reported] Atomic alignment: 0 [unaligned atomic writes permitted] Atomic transfer length granularity: 0 [no granularity requirement Maximum atomic transfer length with atomic boundary: 0 blocks [not reported] Maximum atomic boundary size: 0 blocks [can only write atomic 1 block] Max 10GiB unmap at a time. > # sg_vpd -p lbpv /dev/sdN Logical block provisioning VPD page (SBC): Unmap command supported (LBPU): 1 Write same (16) with unmap bit supported (LBPWS): 0 Write same (10) with unmap bit supported (LBPWS10): 0 Logical block provisioning read zeros (LBPRZ): 0 Anchored LBAs supported (ANC_SUP): 0 Threshold exponent: 0 [threshold sets not supported] Descriptor present (DP): 0 Minimum percentage: 0 [not reported] Provisioning type: 0 (not known or fully provisioned) Threshold percentage: 0 [percentages not supported] LBPU is 1. >> I have a few, though all have variants of RTL9210 chips. >> >> But sd was too conservative to let me do it. > Why not just have a udev rule which does: > > echo unmap > /sys/class/scsi_disk/H:C:T:L/provisioning_mode > > ? > > Substitute "writesame_10" or "writesame_16" depending on what is > actually implemented. It does the job. Question, as you say, is whether supporting unmap (LBPU=1) is something to be trusted when the device says that Logical Block Provisioning Management Enable is off. Some docs mention LBPME in context of thin provisioning, which is not a good description of what SSDs/NVMes do. Using sg_readcap on a PCIe attached NVMe indeed does not result in LBPME being set either. root@framekurt(//):~ [0]# sg_readcap -l /dev/nvme0n1 Read Capacity results: Protection: prot_en=0, p_type=0, p_i_exponent=0 Logical block provisioning: lbpme=0, lbprz=0 Last LBA=8001573551 (0x1dcee52af), Number of logical blocks=8001573552 Logical block length=512 bytes Logical blocks per physical block exponent=0 Lowest aligned LBA=0 Hence: Device size: 4096805658624 bytes, 3907018.3 MiB, 4096.81 GB, 4.10 TB Actually, here's the fun parts: - The enclosures do not seem to set the LBPME bit when doing the translation to SCSI's READ_CAPACITY_16 command, as far as I can see. The Linux nvme driver does not do it either nor does the NVMe SCSI translation guide[*] even mention it. - The nvme driver just relies on the dataset management bit (which the SCSI translation guide 1:1 relates to LBPU from VPD 0x82 lbpv). - Attaching the same PCIe device in a USB enclosure of course results in the SD driver being used who does not see LBPME being set. - An old SATA NVMe in the same enclosure actually does report LBPME=1 and the (unpatched) SD driver happily uses the advertised unmap. [*] https://www.nvmexpress.org/wp-content/uploads/NVM-Express-SCSI-Translation-Reference-1_1-Gold.pdf My conclusion: - Neither the Linux nvme driver nor USB enclosure firmware nor the NVMe-SCSI translation guide seem to emulate the LBPME bit for devices with unmap/ws/ws10 support. - SATA NVMEs do seem to set LBPME=1 and the SD driver requires it. One could argue that the NVMe-SCSI translation is incomplete and should be fixed, but that discussion is now many years late. I guess that other OS drivers do not require LBPME=1, which is why the translation guide and subsequently firmwares and even the Linux nvme driver don't care about LBPME. The SD driver probably should be told to not care either ... Remaining question is whether we'd recommend users to use a set of udev rules to fix up or whether we have a sd_mod parameter to ignore LBPME (if LBPU/LBPWS/LBPWS10 are set) or even ignore it by default (and only provide an opt-out parameter). Best, -- Kurt Garloff <kurt@garloff.de>, Cologne ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs 2025-07-29 7:32 ` Kurt Garloff @ 2025-07-29 9:15 ` Kurt Garloff 2025-07-29 9:37 ` Kurt Garloff 2025-07-31 4:42 ` Martin K. Petersen 1 sibling, 1 reply; 7+ messages in thread From: Kurt Garloff @ 2025-07-29 9:15 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi [-- Attachment #1: Type: text/plain, Size: 972 bytes --] Hi, On 29.07.25 09:32, Kurt Garloff wrote: > Remaining question is whether we'd recommend users to use > a set of udev rules to fix up or whether we have a sd_mod > parameter to ignore LBPME (if LBPU/LBPWS/LBPWS10 are set) > or even ignore it by default (and only provide an opt-out > parameter). I should probably mention that doing it via udev is complex. ACTION!="add", GOTO="usb_nvme_end" SUBSYSTEM=="usb_device", GOTO="usb_nvme_start" SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", GOTO="usb_nvme_start" GOTO="usb_nvme_end" LABEL="usb_nvme_start" # Realtek 9210 USB-NVMe bridge ATTR{idVendor}=="0bda", ATTR{idProduct}=="9210", RUN="/usr/local/sbin/set-unmap.sh" # More USB IDs to follow here LABEL="usb_nvme_end" with attached script in /usr/local/sbin/set-unmap.sh* *It's somewhat error proof, but probably would not have hit the mark yet for inclusion into a Linux distro.* * Patching sd would seem easier ...* * -- Kurt Garloff <kurt@garloff.de>, Cologne [-- Attachment #2: set-unmap.sh --] [-- Type: application/x-shellscript, Size: 2095 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs 2025-07-29 9:15 ` Kurt Garloff @ 2025-07-29 9:37 ` Kurt Garloff 0 siblings, 0 replies; 7+ messages in thread From: Kurt Garloff @ 2025-07-29 9:37 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi Hi, On 29.07.25 11:15, Kurt Garloff wrote: > with attached script in /usr/local/sbin/set-unmap.sh* > *It's somewhat error proof, but probably would not have hit the mark > yet for inclusion into a Linux distro.* Well, and add a true into the debug function. Apparently bash thinks that an empty function is illegal. Cheers, -- Kurt Garloff <kurt@garloff.de>, Cologne ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Patch [0/3] Support eager_unmap for non ldpme sd devs 2025-07-29 7:32 ` Kurt Garloff 2025-07-29 9:15 ` Kurt Garloff @ 2025-07-31 4:42 ` Martin K. Petersen 1 sibling, 0 replies; 7+ messages in thread From: Martin K. Petersen @ 2025-07-31 4:42 UTC (permalink / raw) To: Kurt Garloff; +Cc: Martin K. Petersen, linux-scsi Hi Kurt! > Some docs mention LBPME in context of thin provisioning, which > is not a good description of what SSDs/NVMes do. Logical Block Provisioning is how discard and other capacity provisioning primitives are defined in SCSI. SCSI devices which support unmapping logical blocks are required to set LBPME. This includes SCSI SSDs which support "discards" as well as storage arrays which implement thin provisioning. LBPME was originally called TPE since it concerned itself with thin provisioning only. The flag was later renamed to accommodate devices using other provisioning models. > One could argue that the NVMe-SCSI translation is incomplete and > should be fixed, but that discussion is now many years late. Unfortunately SCSI-NVMe translation isn't standardized the same way as SCSI-ATA translation is. Implementing SCSI-NVMe translation requires an appropriate understanding of both protocols. There is no formal translation specification to rely on. > Linux nvme driver don't care about LBPME. The Linux NVMe driver doesn't know about SCSI protocol flags. The protocol translation is being done by your enclosure's bridge device. The flag corresponding to LBPME in NVMe is the DSM flag in ONCS in IDENTIFY CONTROLLER. That is what your bridge firmware should have translated to LBPME=1. The flag corresponding to LBPME in ATA is TRIM SUPPORTED in IDENTIFY DEVICE. It appears your device handled that translation correctly. > Remaining question is whether we'd recommend users to use a set of > udev rules to fix up or whether we have a sd_mod parameter to ignore > LBPME (if LBPU/LBPWS/LBPWS10 are set) or even ignore it by default > (and only provide an opt-out parameter). The SCSI spec requires devices to set LBPME=1 to enable Logical Block Provisioning. This is not optional. It is perfectly reasonable for a device to report LBPWS to indicate that the WRITE SAME(16) command is implemented. That does not imply that Logical Block Provisioning is actually enabled. You could plug a hard disk into your enclosure, for instance, and therefore LBPME would be 0 despite the bridge indicating support for WRITE SAME(16). We are not going to add an option to ignore LBPME. It is one of the most reliable "does this device know what it is doing?" heuristics we have. And we really can not trust a device which fails to handle even the most basic bits of the SCSI protocol. Twiddling the sysfs file via a udev rule is the appropriate way for users to enable discards on devices which fail to implement the spec correctly. -- Martin K. Petersen ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-07-31 4:43 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-07-27 18:06 Patch [0/3] Support eager_unmap for non ldpme sd devs Kurt Garloff 2025-07-28 15:35 ` Bart Van Assche 2025-07-29 5:09 ` Martin K. Petersen 2025-07-29 7:32 ` Kurt Garloff 2025-07-29 9:15 ` Kurt Garloff 2025-07-29 9:37 ` Kurt Garloff 2025-07-31 4:42 ` Martin K. Petersen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox