Linux block layer


* [PATCH v7 3/9] dt-bindings: bluetooth: qcom: Add NVMEM BD address cell
From: Loic Poulain @ 2026-07-01 16:00 UTC (permalink / raw)
  To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
	Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
	Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
	Russell King, Saravana Kannan, Christian Marangi
  Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
	linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
	Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260701-block-as-nvmem-v7-0-3fe8205ef0a8@oss.qualcomm.com>

Add support for an NVMEM cell provider for "local-bd-address",
allowing the Bluetooth stack to retrieve the controller's BD address
from non-volatile storage such as an EEPROM or an eMMC partition.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
 .../devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml b/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml
index c8e9c55c1afb4c8e05ba2dae41ce2db4194b4a0f..7cb28f30c9af032082f23311f2fc89a32f266f17 100644
--- a/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml
+++ b/Documentation/devicetree/bindings/net/bluetooth/qcom,bluetooth-common.yaml
@@ -22,4 +22,13 @@ properties:
     description:
       boot firmware is incorrectly passing the address in big-endian order
 
+  nvmem-cells:
+    maxItems: 1
+    description:
+      Nvmem data cell that contains a 6 byte BD address with the most
+      significant byte first (big-endian).
+
+  nvmem-cell-names:
+    const: local-bd-address
+
 additionalProperties: true

-- 
2.34.1


^ permalink raw reply related

* [PATCH v7 7/9] Bluetooth: hci_sync: Add NVMEM-backed BD address retrieval
From: Loic Poulain @ 2026-07-01 16:00 UTC (permalink / raw)
  To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
	Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
	Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
	Russell King, Saravana Kannan, Christian Marangi
  Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
	linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
	Loic Poulain, Bartosz Golaszewski, Piotr Kwapulinski
In-Reply-To: <20260701-block-as-nvmem-v7-0-3fe8205ef0a8@oss.qualcomm.com>

Some devices store the Bluetooth BD address in non-volatile
memory, which can be accessed through the NVMEM framework.
Similar to Ethernet or WiFi MAC addresses, add support for
reading the BD address from a 'local-bd-address' NVMEM cell.

As with the device-tree provided BD address, add a quirk to
indicate whether a device or platform should attempt to read
the address from NVMEM when no valid in-chip address is present.
Also add a quirk to indicate if the address is stored in
big-endian byte order.

Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
 include/net/bluetooth/hci.h | 18 ++++++++++++++++++
 net/bluetooth/hci_sync.c    | 41 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 572b1c620c5d653a1fe10b26c1b0ba33e8f4968f..7686466d1109253b0d75edeb5f6a99fb98ce4cc6 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -164,6 +164,24 @@ enum {
 	 */
 	HCI_QUIRK_BDADDR_PROPERTY_BROKEN,
 
+	/* When this quirk is set, the public Bluetooth address
+	 * initially reported by HCI Read BD Address command
+	 * is considered invalid. The public BD Address can be
+	 * retrieved via a 'local-bd-address' NVMEM cell.
+	 *
+	 * This quirk can be set before hci_register_dev is called or
+	 * during the hdev->setup vendor callback.
+	 */
+	HCI_QUIRK_USE_BDADDR_NVMEM,
+
+	/* When this quirk is set, the Bluetooth Device Address provided by
+	 * the 'local-bd-address' NVMEM is stored in big-endian order.
+	 *
+	 * This quirk can be set before hci_register_dev is called or
+	 * during the hdev->setup vendor callback.
+	 */
+	HCI_QUIRK_BDADDR_NVMEM_BE,
+
 	/* When this quirk is set, the duplicate filtering during
 	 * scanning is based on Bluetooth devices addresses. To allow
 	 * RSSI based updates, restart scanning if needed.
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index fd3aacdea512a37c22b9a2be90c89ddca4b4d99f..56248d4abcb5b1d9993962a9f6bf60bf865b8d7b 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -7,6 +7,7 @@
  */
 
 #include <linux/property.h>
+#include <linux/of_net.h>
 
 #include <net/bluetooth/bluetooth.h>
 #include <net/bluetooth/hci_core.h>
@@ -3588,6 +3589,39 @@ int hci_powered_update_sync(struct hci_dev *hdev)
 	return 0;
 }
 
+/**
+ * hci_dev_get_bd_addr_from_nvmem - Get the Bluetooth Device Address
+ *				    (BD_ADDR) for a HCI device from
+ *				    an NVMEM cell.
+ * @hdev:	The HCI device
+ *
+ * Search for 'local-bd-address' NVMEM cell in the device firmware node.
+ *
+ * All-zero BD addresses are rejected (unprovisioned).
+ *
+ * Return: 0 on success, or a negative error code on failure.
+ */
+static int hci_dev_get_bd_addr_from_nvmem(struct hci_dev *hdev)
+{
+	struct device_node *np = dev_of_node(hdev->dev.parent);
+	u8 ba[sizeof(bdaddr_t)];
+	int err;
+
+	if (!np)
+		return -ENODEV;
+
+	err = of_get_nvmem_eui48(np, "local-bd-address", ba);
+	if (err)
+		return err;
+
+	if (hci_test_quirk(hdev, HCI_QUIRK_BDADDR_NVMEM_BE))
+		baswap(&hdev->public_addr, (bdaddr_t *)ba);
+	else
+		bacpy(&hdev->public_addr, (bdaddr_t *)ba);
+
+	return 0;
+}
+
 /**
  * hci_dev_get_bd_addr_from_property - Get the Bluetooth Device Address
  *				       (BD_ADDR) for a HCI device from
@@ -5042,12 +5076,17 @@ static int hci_dev_setup_sync(struct hci_dev *hdev)
 	 * its setup callback.
 	 */
 	invalid_bdaddr = hci_test_quirk(hdev, HCI_QUIRK_INVALID_BDADDR) ||
-			 hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY);
+			 hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY) ||
+			 hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_NVMEM);
 	if (!ret) {
 		if (hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY) &&
 		    !bacmp(&hdev->public_addr, BDADDR_ANY))
 			hci_dev_get_bd_addr_from_property(hdev);
 
+		if (hci_test_quirk(hdev, HCI_QUIRK_USE_BDADDR_NVMEM) &&
+		    !bacmp(&hdev->public_addr, BDADDR_ANY))
+			hci_dev_get_bd_addr_from_nvmem(hdev);
+
 		if (invalid_bdaddr && bacmp(&hdev->public_addr, BDADDR_ANY) &&
 		    hdev->set_bdaddr) {
 			ret = hdev->set_bdaddr(hdev, &hdev->public_addr);

-- 
2.34.1


^ permalink raw reply related
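
The byte-order handling described in the patch above can be sketched in
userspace C. This is only an illustration under stated assumptions, not
the kernel code: bd_addr_swap() mirrors what baswap() does (reverse the
six bytes), while bd_addr_from_cell() and BD_ADDR_LEN are hypothetical
names introduced here. It assumes the NVMEM cell may hold the address
most significant byte first, and that the consumer wants it least
significant byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BD_ADDR_LEN 6 /* a BD address is 6 bytes, like an EUI-48 */

/* Reverse the byte order: dst[i] = src[5 - i]. Buffers must not alias. */
static void bd_addr_swap(uint8_t *dst, const uint8_t *src)
{
	assert(dst != src);
	for (size_t i = 0; i < BD_ADDR_LEN; i++)
		dst[i] = src[BD_ADDR_LEN - 1 - i];
}

/*
 * Hypothetical helper: convert raw cell contents to LSB-first order.
 * Returns 0 on success, -1 if the cell is all zeroes (treated as
 * unprovisioned, as the patch does for all-zero addresses).
 */
static int bd_addr_from_cell(uint8_t *out, const uint8_t *cell, int big_endian)
{
	static const uint8_t zero[BD_ADDR_LEN];

	if (!memcmp(cell, zero, BD_ADDR_LEN))
		return -1;

	if (big_endian)
		bd_addr_swap(out, cell);	/* stored MSB first: reverse */
	else
		memcpy(out, cell, BD_ADDR_LEN);	/* already LSB first: copy */
	return 0;
}
```

With a cell containing 00 11 22 33 44 55 and the big-endian flag set,
the output buffer holds 55 44 33 22 11 00; without the flag the bytes
are copied unchanged.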

* [PATCH v7 8/9] Bluetooth: qca: Set NVMEM BD address quirks when address is invalid
From: Loic Poulain @ 2026-07-01 16:00 UTC (permalink / raw)
  To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
	Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
	Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
	Russell King, Saravana Kannan, Christian Marangi
  Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
	linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
	Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260701-block-as-nvmem-v7-0-3fe8205ef0a8@oss.qualcomm.com>

When the controller BD address is invalid (zero or default),
set the NVMEM quirks to allow retrieving the address from a
'local-bd-address' NVMEM cell. The BD address is often stored
alongside the WiFi MAC address in big-endian format, so also
set the big-endian quirk.

Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
 drivers/bluetooth/btqca.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index dda76365726f0bfe0e80e05fe04859fa4f0592e1..df33eacfd29fa680f393f90215150743e6001d5b 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -721,8 +721,11 @@ static int qca_check_bdaddr(struct hci_dev *hdev, const struct qca_fw_config *co
 	}
 
 	bda = (struct hci_rp_read_bd_addr *)skb->data;
-	if (!bacmp(&bda->bdaddr, &config->bdaddr))
+	if (!bacmp(&bda->bdaddr, &config->bdaddr)) {
 		hci_set_quirk(hdev, HCI_QUIRK_USE_BDADDR_PROPERTY);
+		hci_set_quirk(hdev, HCI_QUIRK_USE_BDADDR_NVMEM);
+		hci_set_quirk(hdev, HCI_QUIRK_BDADDR_NVMEM_BE);
+	}
 
 	kfree_skb(skb);
 

-- 
2.34.1


^ permalink raw reply related

* [PATCH v7 6/9] net: of_net: Add of_get_nvmem_eui48() helper for EUI-48 lookup
From: Loic Poulain @ 2026-07-01 16:00 UTC (permalink / raw)
  To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
	Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
	Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
	Russell King, Saravana Kannan, Christian Marangi
  Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
	linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
	Loic Poulain, Bartosz Golaszewski
In-Reply-To: <20260701-block-as-nvmem-v7-0-3fe8205ef0a8@oss.qualcomm.com>

Factor out the common NVMEM EUI-48 retrieval logic from
of_get_mac_address_nvmem() into a new of_get_nvmem_eui48() helper that
accepts the NVMEM cell name as a parameter. This allows other subsystems
(e.g. Bluetooth) to reuse the same lookup-validate-copy pattern with a
different cell name, without duplicating code.

of_get_mac_address_nvmem() is updated to call of_get_nvmem_eui48() with
"mac-address", preserving its existing behavior.

Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
 include/linux/of_net.h |  7 +++++++
 net/core/of_net.c      | 49 +++++++++++++++++++++++++++++++++++++------------
 2 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/include/linux/of_net.h b/include/linux/of_net.h
index d88715a0b3a52f87af23d47791bea3baf5be5200..7854ba555d9a55f3d020a37fe00a27ae52e0e5dc 100644
--- a/include/linux/of_net.h
+++ b/include/linux/of_net.h
@@ -15,6 +15,7 @@ struct net_device;
 extern int of_get_phy_mode(struct device_node *np, phy_interface_t *interface);
 extern int of_get_mac_address(struct device_node *np, u8 *mac);
 extern int of_get_mac_address_nvmem(struct device_node *np, u8 *mac);
+int of_get_nvmem_eui48(struct device_node *np, const char *cell_name, u8 *addr);
 int of_get_ethdev_address(struct device_node *np, struct net_device *dev);
 extern struct net_device *of_find_net_device_by_node(struct device_node *np);
 #else
@@ -34,6 +35,12 @@ static inline int of_get_mac_address_nvmem(struct device_node *np, u8 *mac)
 	return -ENODEV;
 }
 
+static inline int of_get_nvmem_eui48(struct device_node *np,
+				      const char *cell_name, u8 *addr)
+{
+	return -ENODEV;
+}
+
 static inline int of_get_ethdev_address(struct device_node *np, struct net_device *dev)
 {
 	return -ENODEV;
diff --git a/net/core/of_net.c b/net/core/of_net.c
index 93ea425b9248a23f4f95a336e9cdbf0053248e32..11c1acca151266ac9287457b4050a54b08e2b5f5 100644
--- a/net/core/of_net.c
+++ b/net/core/of_net.c
@@ -61,9 +61,7 @@ static int of_get_mac_addr(struct device_node *np, const char *name, u8 *addr)
 int of_get_mac_address_nvmem(struct device_node *np, u8 *addr)
 {
 	struct platform_device *pdev = of_find_device_by_node(np);
-	struct nvmem_cell *cell;
-	const void *mac;
-	size_t len;
+	u8 mac[ETH_ALEN] __aligned(sizeof(u16));
 	int ret;
 
 	/* Try lookup by device first, there might be a nvmem_cell_lookup
@@ -75,27 +73,54 @@ int of_get_mac_address_nvmem(struct device_node *np, u8 *addr)
 		return ret;
 	}
 
-	cell = of_nvmem_cell_get(np, "mac-address");
+	ret = of_get_nvmem_eui48(np, "mac-address", mac);
+	if (ret)
+		return ret;
+
+	if (!is_valid_ether_addr(mac))
+		return -EINVAL;
+
+	ether_addr_copy(addr, mac);
+	return 0;
+}
+EXPORT_SYMBOL(of_get_mac_address_nvmem);
+
+/**
+ * of_get_nvmem_eui48 - Read a 6-byte EUI-48 address from a named NVMEM cell.
+ * @np:		Device node to look up the NVMEM cell from.
+ * @cell_name:	Name of the NVMEM cell (e.g. "mac-address", "local-bd-address").
+ * @addr:	Output buffer for the 6-byte address.
+ *
+ * Reads the named NVMEM cell and validates that it contains a non-zero 6-byte
+ * address. Returns 0 on success, negative errno on failure.
+ */
+int of_get_nvmem_eui48(struct device_node *np, const char *cell_name, u8 *addr)
+{
+	struct nvmem_cell *cell;
+	const void *eui48;
+	size_t len;
+
+	cell = of_nvmem_cell_get(np, cell_name);
 	if (IS_ERR(cell))
 		return PTR_ERR(cell);
 
-	mac = nvmem_cell_read(cell, &len);
+	eui48 = nvmem_cell_read(cell, &len);
 	nvmem_cell_put(cell);
 
-	if (IS_ERR(mac))
-		return PTR_ERR(mac);
+	if (IS_ERR(eui48))
+		return PTR_ERR(eui48);
 
-	if (len != ETH_ALEN || !is_valid_ether_addr(mac)) {
-		kfree(mac);
+	if (len != ETH_ALEN || !memchr_inv(eui48, 0, ETH_ALEN)) {
+		kfree(eui48);
 		return -EINVAL;
 	}
 
-	memcpy(addr, mac, ETH_ALEN);
-	kfree(mac);
+	memcpy(addr, eui48, ETH_ALEN);
+	kfree(eui48);
 
 	return 0;
 }
-EXPORT_SYMBOL(of_get_mac_address_nvmem);
+EXPORT_SYMBOL_GPL(of_get_nvmem_eui48);
 
 /**
  * of_get_mac_address()

-- 
2.34.1


^ permalink raw reply related

* [PATCH v7 9/9] arm64: dts: qcom: arduino-imola: Describe NVMEM layout for WiFi/BT addresses
From: Loic Poulain @ 2026-07-01 16:00 UTC (permalink / raw)
  To: Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
	Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
	Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
	Russell King, Saravana Kannan, Christian Marangi
  Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
	linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
	Loic Poulain, Konrad Dybcio, Bartosz Golaszewski
In-Reply-To: <20260701-block-as-nvmem-v7-0-3fe8205ef0a8@oss.qualcomm.com>

On Arduino Uno-Q, the eMMC boot1 partition is factory provisioned
with device-specific information such as the WiFi MAC address
and the Bluetooth BD address. This partition can serve as an
alternative to additional non-volatile memory, such as a
dedicated EEPROM.

The eMMC boot partitions are typically good candidates, as they
are relatively small, read-only by default (and can be enforced
as hardware read-only), and are not affected by board reflashing
procedures, which generally target the eMMC user or GP partitions.

Describe the corresponding nvmem-layout for the WiFi and Bluetooth
addresses, and point the WiFi and Bluetooth nodes to the appropriate
NVMEM cells to retrieve them.

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
---
 arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts | 32 ++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts b/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts
index bf088fa9807f040f0c8f405f9111b01790b09377..38839b8a361e76f6c1989924b16095b9d8815f66 100644
--- a/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts
+++ b/arch/arm64/boot/dts/qcom/qrb2210-arduino-imola.dts
@@ -409,7 +409,33 @@ &sdhc_1 {
 	no-sdio;
 	no-sd;
 
+	#address-cells = <1>;
+	#size-cells = <0>;
+
 	status = "okay";
+
+	card@0 {
+		compatible = "mmc-card";
+		reg = <0>;
+
+		partitions-boot1 {
+			compatible = "fixed-layout";
+			#address-cells = <1>;
+			#size-cells = <1>;
+
+			wifi_mac_addr: mac-addr@4400 {
+				compatible = "mac-base";
+				reg = <0x4400 0x6>;
+				#nvmem-cell-cells = <1>;
+			};
+
+			bd_addr: bd-addr@5400 {
+				compatible = "mac-base";
+				reg = <0x5400 0x6>;
+				#nvmem-cell-cells = <1>;
+			};
+		};
+	};
 };
 
 &spi5 {
@@ -512,6 +538,9 @@ bluetooth {
 		vddch0-supply = <&pm4125_l22>;
 		enable-gpios = <&tlmm 87 GPIO_ACTIVE_HIGH>;
 		max-speed = <3000000>;
+
+		nvmem-cells = <&bd_addr 0>;
+		nvmem-cell-names = "local-bd-address";
 	};
 };
 
@@ -557,6 +586,9 @@ &wifi {
 	qcom,ath10k-calibration-variant = "ArduinoImola";
 	firmware-name = "qcm2290";
 
+	nvmem-cells = <&wifi_mac_addr 0>;
+	nvmem-cell-names = "mac-address";
+
 	status = "okay";
 };
 

-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH v4 1/5] block: use integrity interval instead of sector as seed
From: Caleb Sander Mateos @ 2026-07-01 19:56 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Anuj Gupta, linux-block, linux-nvme, linux-scsi, target-devel,
	linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni
In-Reply-To: <20260627054220.2174166-2-csander@purestorage.com>

On Fri, Jun 26, 2026 at 10:42 PM Caleb Sander Mateos
<csander@purestorage.com> wrote:
>
> bio_integrity_setup_default() and blk_integrity_iterate() set the
> integrity seed (initial reference tag) to the absolute address in the
> block device in units of 512-byte sectors. However, Type 1 and Type 2
> ref tags are actually the least significant bits of the integrity
> interval number. On devices with integrity interval size > 512 bytes,
> the ref tag seed thus isn't the correct initial ref tag. The ref tag
> seed is correctly incremented/decremented in units of integrity
> intervals in bio_integrity_map_iter(), bio_integrity_advance(), and
> blk_integrity_interval().
>
> For REQ_OP_{WRITE,READ}, blk_integrity_{prepare,complete}() covers up
> this ref tag seed discrepancy by adding/subtracting the difference
> between the initial integrity interval and ref tag values to/from each
> ref tag in the protection information. However, REQ_OP_ZONE_APPEND can
> also carry PI but doesn't go through blk_integrity_prepare() because the
> final data location on the zoned block device isn't known until the
> operation completes. As a result, the REQ_OP_ZONE_APPEND PI ref tags
> start from the ref tag seed, which isn't in integrity interval units.
> Subsequent reads of the appended blocks will fail to remap the ref tags
> from the expected integrity interval numbers to sector numbers.
>
> Additionally, NVMe and many SCSI transports support offloading ref tag
> remapping to the device by specifying the expected initial ref tag in
> the command. The kernel doesn't currently take advantage of this, always
> remapping ref tags in software for reads and writes and setting the
> expected initial ref tag to the integrity interval. Setting the ref tag
> seed in units of integrity intervals would be a prerequisite to allowing
> the kernel to skip the software remapping and pass the ref tag seed as
> the expected initial ref tag in the command.
>
> So compute the ref tag seed in units of integrity intervals instead of
> sectors to avoid relying on ref tag remapping for the conversion.

Martin, are you okay with this updated commit message? Would be nice
to get this fix in so auto-integrity works correctly for zone appends
on 4KB-integrity-interval devices.

Thanks,
Caleb


>
> Fixes: 0512a75b98f8 ("block: Introduce REQ_OP_ZONE_APPEND")
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/bio-integrity.c | 3 ++-
>  block/t10-pi.c        | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index b23e2434d80c..d20f9002c7c9 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -102,12 +102,13 @@ void bio_integrity_free_buf(struct bio_integrity_payload *bip)
>
>  void bio_integrity_setup_default(struct bio *bio)
>  {
>         struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
>         struct bio_integrity_payload *bip = bio_integrity(bio);
> +       u64 seed = bio->bi_iter.bi_sector >> (bi->interval_exp - SECTOR_SHIFT);
>
> -       bip_set_seed(bip, bio->bi_iter.bi_sector);
> +       bip_set_seed(bip, seed);
>
>         if (bi->csum_type) {
>                 bip->bip_flags |= BIP_CHECK_GUARD;
>                 if (bi->csum_type == BLK_INTEGRITY_CSUM_IP)
>                         bip->bip_flags |= BIP_IP_CHECKSUM;
> diff --git a/block/t10-pi.c b/block/t10-pi.c
> index a19b4e102a83..e58d5eb6cefb 100644
> --- a/block/t10-pi.c
> +++ b/block/t10-pi.c
> @@ -308,18 +308,19 @@ static blk_status_t blk_integrity_iterate(struct bio *bio,
>                                           struct bvec_iter *data_iter,
>                                           bool verify)
>  {
>         struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
>         struct bio_integrity_payload *bip = bio_integrity(bio);
> +       u64 seed = data_iter->bi_sector >> (bi->interval_exp - SECTOR_SHIFT);
>         struct blk_integrity_iter iter = {
>                 .bio = bio,
>                 .bip = bip,
>                 .bi = bi,
>                 .data_iter = *data_iter,
>                 .prot_iter = bip->bip_iter,
>                 .interval_remaining = 1 << bi->interval_exp,
> -               .seed = data_iter->bi_sector,
> +               .seed = seed,
>                 .csum = 0,
>         };
>         blk_status_t ret = BLK_STS_OK;
>
>         while (iter.data_iter.bi_size && ret == BLK_STS_OK) {
> --
> 2.54.0
>

^ permalink raw reply
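
The unit conversion at the heart of the quoted patch can be shown with a
small userspace C sketch. It assumes SECTOR_SHIFT is 9 (512-byte
sectors) and that interval_exp is log2 of the integrity interval size in
bytes, as in the quoted diff; integrity_seed() is a hypothetical name
used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/*
 * Convert a 512-byte sector number into an integrity interval number.
 * interval_exp is log2(interval size in bytes), e.g. 9 for 512 bytes
 * or 12 for 4096 bytes.
 */
static uint64_t integrity_seed(uint64_t sector, unsigned int interval_exp)
{
	assert(interval_exp >= SECTOR_SHIFT);
	return sector >> (interval_exp - SECTOR_SHIFT);
}
```

For sector 80, a 512-byte interval gives seed 80 (sector and interval
numbers coincide), while a 4096-byte interval gives seed 10, because
each interval spans eight sectors. The second case is where a
sector-based seed and the interval number disagree.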

* Re: [PATCH] block: Make WBT latency writes honor enable state
From: guzebing @ 2026-07-01 23:27 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel
In-Reply-To: <20260621014030.1625306-1-guzebing1612@gmail.com>

Gentle ping.

This patch addresses a case where writing the current default WBT
latency to queue/wbt_lat_usec returns success but does not enable WBT
when WBT was disabled by default, for example after selecting BFQ.

If this is intended behavior, then perhaps no fix is needed. Please let
me know if I missed the intended semantics here.

Thanks,
Guzebing

On 6/21/26 9:40 AM, guzebing wrote:
> From: Guzebing <guzebing1612@gmail.com>
> 
> queue/wbt_lat_usec controls both the stored WBT latency target and the
> effective WBT enable state.
> 
> The old no-op check skipped updates whenever the converted latency
> matched the stored min_lat_nsec. That check ignored whether the current
> WBT state already matched the state requested by the write. For a queue
> disabled by default, attempting to enable WBT by writing the default
> value through sysfs could return success while the enable state was left
> unchanged.
> 
> Treat a write as a no-op only when both the stored latency and the
> effective WBT enabled state already match the converted value.
> 
> Signed-off-by: Guzebing <guzebing1612@gmail.com>
> ---
> Background:
> 
> The issue can be reproduced on an NVMe namespace when BFQ is available:
> 
>   echo bfq > /sys/block/nvme0n1/queue/scheduler
>   cat /sys/block/nvme0n1/queue/wbt_lat_usec
>   echo 2000 > /sys/block/nvme0n1/queue/wbt_lat_usec
>   cat /sys/block/nvme0n1/queue/wbt_lat_usec
> 
> After BFQ selects the queue, WBT is disabled by default.  On a
> non-rotational NVMe namespace the stored default latency remains
> 2000000 nsec, while the sysfs file reports 0 because the effective WBT
> state is disabled:
> 
>   queue/wbt_lat_usec = 0
>   debugfs enabled = 3
>   debugfs min_lat_nsec = 2000000
> 
> Writing the default value succeeds, but the old no-op check skips the
> state transition because min_lat_nsec already matches the converted
> value:
> 
>   echo 2000 > /sys/block/nvme0n1/queue/wbt_lat_usec
>   # echo returns success, but:
>   queue/wbt_lat_usec = 0
>   debugfs enabled = 3
>   debugfs min_lat_nsec = 2000000
> 
> As a control, writing a non-default value first does work:
> 
>   echo 5000 > /sys/block/nvme0n1/queue/wbt_lat_usec
>   queue/wbt_lat_usec = 5000
>   debugfs enabled = 2
>   debugfs min_lat_nsec = 5000000
> 
> Writing the default value after that also works, because the stored
> latency changes from 5000000 nsec back to 2000000 nsec:
> 
>   echo 2000 > /sys/block/nvme0n1/queue/wbt_lat_usec
>   queue/wbt_lat_usec = 2000
>   debugfs enabled = 2
>   debugfs min_lat_nsec = 2000000
> 
> With this patch, writing the default value after BFQ default-disables
> WBT also re-enables WBT as expected:
> 
>   queue/wbt_lat_usec = 2000
>   debugfs enabled = 2
>   debugfs min_lat_nsec = 2000000
> 
>  block/blk-wbt.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index dcc2438ca16dc..953d400fd0137 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -813,6 +813,21 @@ static void wbt_queue_depth_changed(struct rq_qos *rqos)
>  	wbt_update_limits(RQWB(rqos));
>  }
>  
> +static bool wbt_set_lat_changed(struct request_queue *q, u64 val)
> +{
> +	struct rq_qos *rqos = wbt_rq_qos(q);
> +	struct rq_wb *rwb;
> +
> +	if (!rqos)
> +		return true;
> +
> +	rwb = RQWB(rqos);
> +	if (rwb->min_lat_nsec != val)
> +		return true;
> +
> +	return rwb_enabled(rwb) != !!val;
> +}
> +
>  static void wbt_exit(struct rq_qos *rqos)
>  {
>  	struct rq_wb *rwb = RQWB(rqos);
> @@ -1005,8 +1020,12 @@ int wbt_set_lat(struct gendisk *disk, s64 val)
>  	else if (val >= 0)
>  		val *= 1000ULL;
>  
> -	if (wbt_get_min_lat(q) == val)
> +	mutex_lock(&disk->rqos_state_mutex);
> +	if (!wbt_set_lat_changed(q, val)) {
> +		mutex_unlock(&disk->rqos_state_mutex);
>  		goto out;
> +	}
> +	mutex_unlock(&disk->rqos_state_mutex);
>  
>  	blk_mq_quiesce_queue(q);
>  


^ permalink raw reply
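
The difference between the old and new no-op checks can be modelled in
userspace C. This is a simplified sketch, not the kernel code: struct
wbt_model, changed_old(), changed_new() and wbt_write() are hypothetical
names, the enable state is reduced to a single boolean, and locking,
queue quiescing and the missing-rqos case are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: stored latency target plus effective enable state. */
struct wbt_model {
	uint64_t min_lat_nsec;
	bool enabled;
};

/* Old check: only the stored latency is compared. */
static bool changed_old(const struct wbt_model *w, uint64_t val_nsec)
{
	return w->min_lat_nsec != val_nsec;
}

/* New check: the write also counts as a change if it flips the state. */
static bool changed_new(const struct wbt_model *w, uint64_t val_nsec)
{
	if (w->min_lat_nsec != val_nsec)
		return true;
	return w->enabled != (val_nsec != 0);
}

/* Apply a write of val_nsec, using either the old or the new check. */
static void wbt_write(struct wbt_model *w, uint64_t val_nsec, bool use_new)
{
	assert(w);
	if (!(use_new ? changed_new(w, val_nsec) : changed_old(w, val_nsec)))
		return;	/* treated as a no-op */
	w->min_lat_nsec = val_nsec;
	w->enabled = val_nsec != 0;
}
```

Starting from the state in the reproduction (min_lat_nsec = 2000000,
disabled), writing 2000000 leaves the model disabled with the old check
and enables it with the new one.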

* Re: [PATCH] block: Make WBT latency writes honor enable state
From: Jens Axboe @ 2026-07-02  1:07 UTC (permalink / raw)
  To: guzebing; +Cc: linux-block, linux-kernel
In-Reply-To: <20260621014030.1625306-1-guzebing1612@gmail.com>


On Sun, 21 Jun 2026 09:40:30 +0800, guzebing wrote:
> queue/wbt_lat_usec controls both the stored WBT latency target and the
> effective WBT enable state.
> 
> The old no-op check skipped updates whenever the converted latency
> matched the stored min_lat_nsec. That check ignored whether the current
> WBT state already matched the state requested by the write. For a queue
> disabled by default, attempting to enable WBT by writing the default
> value through sysfs could return success while the enable state was left
> unchanged.
> 
> [...]

Applied, thanks!

[1/1] block: Make WBT latency writes honor enable state
      commit: 1e56f30a73f304fe26a272742c398aedd88a1a6c

Best regards,
-- 
Jens Axboe




^ permalink raw reply

* Re: [PATCH v5 0/5] crypto: skcipher - multi-data-unit dispatch as a template
From: Leonid Ravich @ 2026-07-02  8:45 UTC (permalink / raw)
  To: linux-crypto, dm-devel
  Cc: Eric Biggers, linux-block, linux-kernel, herbert, davem, snitzer,
	mpatocka, axboe
In-Reply-To: <20260701071919.GA111652@sol>

On Wed, Jul 01, 2026 at 12:19:19AM -0700, Eric Biggers wrote:
> No, this didn't address my feedback.  It moved things around but still
> adds additional overhead for everyone to support an out-of-tree driver,
> which also hasn't been shown to be any better than just using the CPU.

Eric, thanks for the fast reply.

Overhead: for a non-user the only cost is the data_unit_size field plus
one zeroing store in set_tfm()/ON_STACK; the en/decrypt paths are
untouched.  A dun() user pays one indirect dispatch into the template per
request plus a scatterwalk step and IV copy per unit -- the same per-DU
bookkeeping the consumer already open-codes today.

On the driver: I agree pushing code optimized for an out-of-tree driver
is wrong, but I don't think that's the case here -- this helps any async
crypto engine, and there are in-tree async xts(aes) ones dm-crypt is
eligible to use today: HiSilicon SEC2, TI DTHEv2, Atmel (I don't have any
to test on).  To bound the win, I used cryptd as a pure async carrier and
moved the per-DU split inside it, then ran dm-crypt + fio: batching cut
CPU ~30% on 128k I/O (large batch) and had zero impact on 4k -- so the
saving is dispatch, not crypto.  A real engine that submits a whole
multi-DU request in one descriptor avoids that per-DU dispatch entirely,
so it saves at least that.

So the question for me is what the bar is: does landing the API and dun()
template now (with the in-tree consolidation it already buys dm-crypt and
blk-crypto-fallback), with a throughput demonstration deferred to a real
async provider, work for you?

Thanks,
Leonid

^ permalink raw reply

* Re: [PATCH] ublk: snapshot batch commands before preparing I/O
From: Ming Lei @ 2026-07-02 11:53 UTC (permalink / raw)
  To: Yousef Alhouseen
  Cc: Jens Axboe, Caleb Sander Mateos, linux-block, linux-kernel,
	syzbot+1a67ee1aa79484801ec6
In-Reply-To: <20260630211827.50475-1-alhouseenyousef@gmail.com>

On Tue, Jun 30, 2026 at 4:18 PM Yousef Alhouseen
<alhouseenyousef@gmail.com> wrote:
>
> The batch prepare path rereads its userspace element array when rolling
> back a partially prepared batch. Userspace can change an already
> processed tag before the second read, causing rollback to reject the
> replacement tag and leave earlier I/O slots prepared. The
> WARN_ON_ONCE() in the rollback path then fires.
>
> Copy the bounded batch into kernel memory before changing any I/O state
> and use the same snapshot for preparation and rollback. Commit and fetch
> batches retain the existing chunked userspace walk.
>
> Fixes: b256795b3606 ("ublk: handle UBLK_U_IO_PREP_IO_CMDS")
> Reported-by: syzbot+1a67ee1aa79484801ec6@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=1a67ee1aa79484801ec6
> Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>

The fix looks fine, given that the copy is only added in the prep stage,
which isn't part of the fast I/O path:

Reviewed-by: Ming Lei <tom.leiming@gmail.com>



Thanks,
Ming Lei

^ permalink raw reply

* Re: [PATCH] ublk: snapshot batch commands before preparing I/O
From: Jens Axboe @ 2026-07-02 12:28 UTC (permalink / raw)
  To: Ming Lei, Yousef Alhouseen
  Cc: Caleb Sander Mateos, linux-block, linux-kernel,
	syzbot+1a67ee1aa79484801ec6
In-Reply-To: <20260630211827.50475-1-alhouseenyousef@gmail.com>


On Tue, 30 Jun 2026 23:18:27 +0200, Yousef Alhouseen wrote:
> The batch prepare path rereads its userspace element array when rolling
> back a partially prepared batch. Userspace can change an already
> processed tag before the second read, causing rollback to reject the
> replacement tag and leave earlier I/O slots prepared. The
> WARN_ON_ONCE() in the rollback path then fires.
> 
> Copy the bounded batch into kernel memory before changing any I/O state
> and use the same snapshot for preparation and rollback. Commit and fetch
> batches retain the existing chunked userspace walk.
> 
> [...]

Applied, thanks!

[1/1] ublk: snapshot batch commands before preparing I/O
      commit: f01f5275feb77bac9fefbbf7cc584fe0b3850a92

Best regards,
-- 
Jens Axboe




^ permalink raw reply

* [RFC 0/1] block: export I/O latency histograms
From: Diangang Li @ 2026-07-02 13:27 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, linux-block, Diangang Li

From: Diangang Li <lidiangang@bytedance.com>

Hi,

The existing block I/O statistics count completed I/Os and accumulate the
time spent in each operation group. That works for average latency, but
not for the tail. Once the time is folded into a single total, userspace
cannot tell whether a device saw a steady stream of moderate I/Os or a
small number of very slow ones.

This RFC adds cumulative latency histograms for block devices and
partitions. The new accounting is in the same completion paths as the
existing I/O statistics and uses the same operation groups: read, write,
discard, and flush.

Two proc files are added:

  /proc/disk_lat_buckets
        bucket upper bounds, in microseconds

  /proc/disk_lat_hists
        cumulative histogram counters

/proc/disk_lat_hists follows the shape of /proc/diskstats. Each
reported device or partition has four consecutive lines, in read,
write, discard, flush order. Each line starts with the major number,
minor number, and device name, followed by the bucket counters.
Userspace can sample the file twice and compute interval histograms and
percentiles from the deltas.
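
As a rough illustration of the delta computation, the sketch below takes two
samples of one histogram line and returns the upper bound of the bucket that
contains a given percentile. It assumes each counter counts only its own
bucket and only ever grows, which is how the record path in this RFC appears
to behave. hist_percentile_us() and its parameters are hypothetical names for
this example, and parsing the proc files is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * before/after: two samples of the nbuckets counters of one histogram line.
 * bounds_us:    the nbuckets - 1 bucket upper bounds, in microseconds.
 * pct:          requested percentile, 1..100.
 *
 * Returns the upper bound of the bucket holding that percentile of the I/Os
 * completed between the samples, 0 if none completed, or UINT64_MAX if the
 * percentile lands in the unbounded last bucket.
 */
static uint64_t hist_percentile_us(const uint64_t *before,
				   const uint64_t *after,
				   const uint64_t *bounds_us,
				   size_t nbuckets, unsigned int pct)
{
	uint64_t total = 0, seen = 0, target;
	size_t i;

	assert(nbuckets > 0 && pct >= 1 && pct <= 100);

	for (i = 0; i < nbuckets; i++)
		total += after[i] - before[i];
	if (!total)
		return 0;

	/* Nearest-rank method: rank of the percentile, rounded up. */
	target = (total * pct + 99) / 100;

	for (i = 0; i < nbuckets; i++) {
		seen += after[i] - before[i];
		if (seen >= target)
			return i < nbuckets - 1 ? bounds_us[i] : UINT64_MAX;
	}
	return UINT64_MAX;
}
```

The result is only as precise as the bucket width: a P99 reported as 20 means
"between 10 and 20 us", not an exact latency.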

eBPF is useful for targeted debugging, but it is not a good match for
this interface. These counters are block accounting data, tied to the
same accounting points as diskstats and readable without a resident
userspace collector.

The histogram storage is per block_device and optional. If allocation
fails, bd_lat_hist remains NULL and regular I/O statistics keep working.
The record side uses per-cpu counters.

The current bucket table has 24 upper bounds, from 10 us to 8 seconds,
which gives 25 counters. That covers both fast NVMe devices and slow
disks without making the per-device state too large.
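
To make the 24-bounds-to-25-counters mapping concrete, here is a small
userspace sketch of the lookup. The bounds are the values from the patch;
bucket_index() is a hypothetical helper that works in microseconds, whereas
the kernel side compares nanoseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 24 bucket upper bounds from the RFC patch, in microseconds. */
static const uint64_t bounds_us[24] = {
	10, 20, 40, 80,
	100, 200, 400, 800,
	1000, 2000, 4000, 8000,
	10000, 20000, 40000, 80000,
	100000, 200000, 400000, 800000,
	1000000, 2000000, 4000000, 8000000,
};

/*
 * Binary search for the first bound that is >= lat_us. A latency above the
 * last bound returns nbounds, which is the index of the overflow counter.
 */
static size_t bucket_index(uint64_t lat_us, const uint64_t *bounds,
			   size_t nbounds)
{
	size_t low = 0, high = nbounds;

	assert(bounds != NULL);

	while (low < high) {
		size_t mid = low + (high - low) / 2;

		if (lat_us <= bounds[mid])
			high = mid;
		else
			low = mid + 1;
	}
	return low;
}
```

With this mapping a 150 us I/O lands in the counter whose upper bound is
200 us, and anything slower than 8 seconds lands in the 25th counter.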

Fio tests on NVMe and HDD devices did not show a consistent performance
regression, and confirmed that histogram deltas match the corresponding
diskstats completion counters.

Diangang Li (1):
  block: export I/O latency histograms

 Documentation/ABI/testing/procfs-diskstats |  25 ++++
 block/Makefile                             |   2 +-
 block/bdev.c                               |   2 +
 block/blk-core.c                           |   4 +-
 block/blk-flush.c                          |   5 +-
 block/blk-mq.c                             |   4 +-
 block/blk.h                                |   7 +
 block/disk-lat-hist.c                      | 158 +++++++++++++++++++++
 block/genhd.c                              |  10 ++
 include/linux/blk_types.h                  |   1 +
 10 files changed, 213 insertions(+), 5 deletions(-)
 create mode 100644 block/disk-lat-hist.c

-- 
2.39.5


* [RFC 1/1] block: export I/O latency histograms
From: Diangang Li @ 2026-07-02 13:27 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, linux-block, Diangang Li
In-Reply-To: <20260702132712.2255703-1-diangangli@gmail.com>

From: Diangang Li <lidiangang@bytedance.com>

The existing block I/O statistics expose completed I/O counts and total
elapsed time for each operation group. Userspace can derive average
latency from those counters, but it cannot recover tail latency
information such as P99 from cumulative totals.

Add optional per-block-device latency histogram accounting for read,
write, discard and flush statistics groups. The counters follow the
existing I/O statistics accounting paths and are exported through
/proc/disk_lat_hists.

Add /proc/disk_lat_buckets to expose the bucket upper bounds in
microseconds so userspace can interpret each histogram counter.

Histogram storage is allocated per block_device and treated as optional.
If allocation fails, regular I/O statistics continue to work and the
histogram output skips that device.

Signed-off-by: Diangang Li <lidiangang@bytedance.com>
---
 Documentation/ABI/testing/procfs-diskstats |  25 ++++
 block/Makefile                             |   2 +-
 block/bdev.c                               |   2 +
 block/blk-core.c                           |   4 +-
 block/blk-flush.c                          |   5 +-
 block/blk-mq.c                             |   4 +-
 block/blk.h                                |   7 +
 block/disk-lat-hist.c                      | 158 +++++++++++++++++++++
 block/genhd.c                              |  10 ++
 include/linux/blk_types.h                  |   1 +
 10 files changed, 213 insertions(+), 5 deletions(-)
 create mode 100644 block/disk-lat-hist.c

diff --git a/Documentation/ABI/testing/procfs-diskstats b/Documentation/ABI/testing/procfs-diskstats
index 6a719cf2075cd..015c33f5c150b 100644
--- a/Documentation/ABI/testing/procfs-diskstats
+++ b/Documentation/ABI/testing/procfs-diskstats
@@ -41,3 +41,28 @@ Description:
 		==  =====================================
 
 		For more details refer to Documentation/admin-guide/iostats.rst
+
+What:		/proc/disk_lat_buckets
+Date:		July 2026
+Contact:	Linux block layer mailing list <linux-block@vger.kernel.org>
+Description:
+		Contains the latency histogram bucket upper bounds, in
+		microseconds. The 24 bounds define the 25 counters in
+		/proc/disk_lat_hists. The first counter covers latencies up to
+		10 us, and the last counter covers latencies above 8 seconds.
+
+What:		/proc/disk_lat_hists
+Date:		July 2026
+Contact:	Linux block layer mailing list <linux-block@vger.kernel.org>
+Description:
+		Contains cumulative I/O latency histogram counters for block
+		devices and partitions. Each reported device or partition has
+		four consecutive lines, in read, write, discard, flush order.
+		Each line has 28 fields:
+
+		=====  ===================================
+		    1  major number
+		    2  minor number
+		    3  device name
+		 4-28  cumulative latency bucket counters
+		=====  ===================================
diff --git a/block/Makefile b/block/Makefile
index e7bd320e3d697..a24850cf1d51f 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -11,7 +11,7 @@ obj-y		:= bdev.o fops.o bio.o elevator.o blk-core.o blk-sysfs.o \
 			blk-mq-tag.o blk-mq-dma.o blk-stat.o \
 			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
 			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
-			disk-events.o blk-ia-ranges.o early-lookup.o
+			disk-events.o blk-ia-ranges.o early-lookup.o disk-lat-hist.o
 
 obj-$(CONFIG_BLK_ERROR_INJECTION) += error-injection.o
 obj-$(CONFIG_BLK_DEV_BSG_COMMON) += bsg.o
diff --git a/block/bdev.c b/block/bdev.c
index 85ce57bd2ae4f..d389772515e4c 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -394,6 +394,7 @@ static void bdev_free_inode(struct inode *inode)
 {
 	struct block_device *bdev = I_BDEV(inode);
 
+	disk_lat_hist_free(bdev);
 	free_percpu(bdev->bd_stats);
 	kfree(bdev->bd_meta_info);
 	security_bdev_free(bdev);
@@ -483,6 +484,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 		iput(inode);
 		return NULL;
 	}
+	disk_lat_hist_alloc(bdev);
 	bdev->bd_disk = disk;
 	return bdev;
 }
diff --git a/block/blk-core.c b/block/blk-core.c
index 365641266c9e8..8d9c4eb850465 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1103,12 +1103,14 @@ void bdev_end_io_acct(struct block_device *bdev, enum req_op op,
 	const int sgrp = op_stat_group(op);
 	unsigned long now = READ_ONCE(jiffies);
 	unsigned long duration = now - start_time;
+	u64 duration_ns = jiffies_to_nsecs(duration);
 
 	part_stat_lock();
 	update_io_ticks(bdev, now, true);
 	part_stat_inc(bdev, ios[sgrp]);
 	part_stat_add(bdev, sectors[sgrp], sectors);
-	part_stat_add(bdev, nsecs[sgrp], jiffies_to_nsecs(duration));
+	part_stat_add(bdev, nsecs[sgrp], duration_ns);
+	disk_lat_hist_record_part(bdev, sgrp, duration_ns);
 	bdev_dec_in_flight(bdev, op);
 	part_stat_unlock();
 }
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 403a46c864117..a1fbd749b6607 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -124,11 +124,12 @@ static void blk_flush_restore_request(struct request *rq)
 static void blk_account_io_flush(struct request *rq)
 {
 	struct block_device *part = rq->q->disk->part0;
+	u64 nsecs = blk_time_get_ns() - rq->start_time_ns;
 
 	part_stat_lock();
 	part_stat_inc(part, ios[STAT_FLUSH]);
-	part_stat_add(part, nsecs[STAT_FLUSH],
-		      blk_time_get_ns() - rq->start_time_ns);
+	part_stat_add(part, nsecs[STAT_FLUSH], nsecs);
+	disk_lat_hist_record_part(part, STAT_FLUSH, nsecs);
 	part_stat_unlock();
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 88cb5acc4f39e..231bd531803b4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1077,11 +1077,13 @@ static inline void blk_account_io_done(struct request *req, u64 now)
 	 */
 	if ((req->rq_flags & (RQF_IO_STAT|RQF_FLUSH_SEQ)) == RQF_IO_STAT) {
 		const int sgrp = op_stat_group(req_op(req));
+		u64 nsecs = now - req->start_time_ns;
 
 		part_stat_lock();
 		update_io_ticks(req->part, jiffies, true);
 		part_stat_inc(req->part, ios[sgrp]);
-		part_stat_add(req->part, nsecs[sgrp], now - req->start_time_ns);
+		part_stat_add(req->part, nsecs[sgrp], nsecs);
+		disk_lat_hist_record_part(req->part, sgrp, nsecs);
 		bdev_dec_in_flight(req->part, req_op(req));
 		part_stat_unlock();
 	}
diff --git a/block/blk.h b/block/blk.h
index 25af8ac5ef0f7..c79222bd13194 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -345,6 +345,13 @@ bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
 bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
 			struct bio *bio, unsigned int nr_segs);
 
+void disk_lat_hist_alloc(struct block_device *bdev);
+void disk_lat_hist_free(struct block_device *bdev);
+void disk_lat_hist_set_all(struct block_device *bdev, int value);
+void disk_lat_hist_record_part(struct block_device *part, int sgrp, u64 nsec);
+int disk_lat_buckets_show(struct seq_file *seqf, void *v);
+int disk_lat_hists_show(struct seq_file *seqf, void *v);
+
 /*
  * Plug flush limits
  */
diff --git a/block/disk-lat-hist.c b/block/disk-lat-hist.c
new file mode 100644
index 0000000000000..1ef2b33cb68c3
--- /dev/null
+++ b/block/disk-lat-hist.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/blkdev.h>
+#include <linux/part_stat.h>
+#include <linux/percpu.h>
+#include <linux/seq_file.h>
+
+#include "blk.h"
+
+#define DISK_LAT_HIST_BOUNDS	24
+#define DISK_LAT_HIST_BUCKETS	(DISK_LAT_HIST_BOUNDS + 1)
+
+struct disk_lat_hist {
+	u64 buckets[NR_STAT_GROUPS][DISK_LAT_HIST_BUCKETS];
+};
+
+static const u64 disk_lat_hist_bounds_us[DISK_LAT_HIST_BOUNDS] = {
+	10, 20, 40, 80,
+	100, 200, 400, 800,
+	1000, 2000, 4000, 8000,
+	10000, 20000, 40000, 80000,
+	100000, 200000, 400000, 800000,
+	1000000, 2000000, 4000000, 8000000,
+};
+
+static const int disk_lat_hist_order[NR_STAT_GROUPS] = {
+	STAT_READ,
+	STAT_WRITE,
+	STAT_DISCARD,
+	STAT_FLUSH,
+};
+
+void disk_lat_hist_alloc(struct block_device *bdev)
+{
+	/*
+	 * Latency histograms are optional. If allocation fails,
+	 * bd_lat_hist stays NULL; the record path skips histogram
+	 * accounting and regular I/O statistics are unaffected.
+	 */
+	bdev->bd_lat_hist = alloc_percpu(struct disk_lat_hist);
+	if (!bdev->bd_lat_hist)
+		pr_warn_once("block: failed to allocate latency histograms\n");
+}
+
+void disk_lat_hist_free(struct block_device *bdev)
+{
+	if (!bdev->bd_lat_hist)
+		return;
+	free_percpu(bdev->bd_lat_hist);
+	bdev->bd_lat_hist = NULL;
+}
+
+void disk_lat_hist_set_all(struct block_device *bdev, int value)
+{
+	int cpu;
+
+	if (!bdev->bd_lat_hist)
+		return;
+
+	for_each_possible_cpu(cpu)
+		memset(per_cpu_ptr(bdev->bd_lat_hist, cpu), value,
+		       sizeof(struct disk_lat_hist));
+}
+
+static void disk_lat_hist_record(struct block_device *bdev, int sgrp,
+				 int bucket)
+{
+	if (!bdev || !bdev->bd_lat_hist)
+		return;
+	__this_cpu_inc(bdev->bd_lat_hist->buckets[sgrp][bucket]);
+}
+
+static int disk_lat_hist_bucket(u64 nsec)
+{
+	int low = 0, high = DISK_LAT_HIST_BOUNDS;
+
+	while (low < high) {
+		int mid = low + (high - low) / 2;
+
+		if (nsec <= disk_lat_hist_bounds_us[mid] * NSEC_PER_USEC)
+			high = mid;
+		else
+			low = mid + 1;
+	}
+
+	return low;
+}
+
+void disk_lat_hist_record_part(struct block_device *part, int sgrp, u64 nsec)
+{
+	struct block_device *whole;
+	int bucket;
+
+	if (sgrp < 0 || sgrp >= NR_STAT_GROUPS || !part || !part->bd_disk)
+		return;
+
+	bucket = disk_lat_hist_bucket(nsec);
+	disk_lat_hist_record(part, sgrp, bucket);
+
+	whole = bdev_whole(part);
+	if (whole != part)
+		disk_lat_hist_record(whole, sgrp, bucket);
+}
+
+static void disk_lat_hist_seq_show(struct seq_file *seqf,
+				   struct block_device *bdev)
+{
+	u64 buckets[NR_STAT_GROUPS][DISK_LAT_HIST_BUCKETS] = { };
+	int cpu, sgrp, i, bucket;
+
+	if (!bdev->bd_lat_hist)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		struct disk_lat_hist *hist = per_cpu_ptr(bdev->bd_lat_hist, cpu);
+
+		for (sgrp = 0; sgrp < NR_STAT_GROUPS; sgrp++)
+			for (i = 0; i < DISK_LAT_HIST_BUCKETS; i++)
+				buckets[sgrp][i] += hist->buckets[sgrp][i];
+	}
+
+	for (i = 0; i < NR_STAT_GROUPS; i++) {
+		sgrp = disk_lat_hist_order[i];
+		seq_printf(seqf, "%4d %7d %pg",
+			   MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev), bdev);
+		for (bucket = 0; bucket < DISK_LAT_HIST_BUCKETS; bucket++)
+			seq_printf(seqf, " %llu", buckets[sgrp][bucket]);
+		seq_putc(seqf, '\n');
+	}
+}
+
+int disk_lat_buckets_show(struct seq_file *seqf, void *v)
+{
+	int i;
+
+	for (i = 0; i < DISK_LAT_HIST_BOUNDS; i++)
+		seq_printf(seqf, "%s%llu", i ? " " : "",
+			   disk_lat_hist_bounds_us[i]);
+	seq_putc(seqf, '\n');
+
+	return 0;
+}
+
+int disk_lat_hists_show(struct seq_file *seqf, void *v)
+{
+	struct gendisk *disk = v;
+	struct block_device *part;
+	unsigned long idx;
+
+	rcu_read_lock();
+	xa_for_each(&disk->part_tbl, idx, part) {
+		if (bdev_is_partition(part) && !bdev_nr_sectors(part))
+			continue;
+		disk_lat_hist_seq_show(seqf, part);
+	}
+	rcu_read_unlock();
+
+	return 0;
+}
diff --git a/block/genhd.c b/block/genhd.c
index f84b6a355b574..b4ee18f5c4ea2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -747,6 +747,7 @@ static void __del_gendisk(struct gendisk *disk)
 	disk->slave_dir = NULL;
 
 	part_stat_set_all(disk->part0, 0);
+	disk_lat_hist_set_all(disk->part0, 0);
 	disk->part0->bd_stamp = 0;
 	sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
 	pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
@@ -1420,9 +1421,18 @@ static const struct seq_operations diskstats_op = {
 	.show	= diskstats_show
 };
 
+static const struct seq_operations disk_lat_hists_op = {
+	.start	= disk_seqf_start,
+	.next	= disk_seqf_next,
+	.stop	= disk_seqf_stop,
+	.show	= disk_lat_hists_show
+};
+
 static int __init proc_genhd_init(void)
 {
 	proc_create_seq("diskstats", 0, NULL, &diskstats_op);
+	proc_create_single("disk_lat_buckets", 0, NULL, disk_lat_buckets_show);
+	proc_create_seq("disk_lat_hists", 0, NULL, &disk_lat_hists_op);
 	proc_create_seq("partitions", 0, NULL, &partitions_op);
 	return 0;
 }
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 8808ee76e73c0..be2d31aea5d44 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -44,6 +44,7 @@ struct block_device {
 	struct gendisk *	bd_disk;
 	struct request_queue *	bd_queue;
 	struct disk_stats __percpu *bd_stats;
+	struct disk_lat_hist __percpu *bd_lat_hist;
 	unsigned long		bd_stamp;
 	atomic_t		__bd_flags;	// partition number + flags
 #define BD_PARTNO		255	// lower 8 bits; assign-once
-- 
2.39.5



* Re: [PATCH v2 10/18] block: convert iomap ops to ->iomap_next()
From: Christoph Hellwig @ 2026-07-02 14:03 UTC (permalink / raw)
  To: Keith Busch
  Cc: Joanne Koong, brauner, hch, djwong, willy, hsiangkao,
	linux-fsdevel, linux-xfs, Jens Axboe, open list:BLOCK LAYER,
	open list
In-Reply-To: <akRlehPHzy-F_105@kbusch-mbp>

On Tue, Jun 30, 2026 at 06:55:22PM -0600, Keith Busch wrote:
> On Tue, Jun 30, 2026 at 05:09:25PM -0700, Joanne Koong wrote:
> >  static const struct iomap_ops blkdev_iomap_ops = {
> > -	.iomap_begin		= blkdev_iomap_begin,
> > +	.iomap_next		= blkdev_iomap_next,
> >  };
> 
> I think it's generally safe to use the same mailing list for the entire
> series. There's no context here on what "iomap_next" is because I'm
> subscribed only to linux-block. I found the rest here:

Yeah, without that some people always get screwed over reviewing the
series.

> FWIW, everything looks good to me.
> 
> Reviewed-by: Keith Busch <kbusch@kernel.org>

Same here:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v2 17/18] iomap: pass iomap_next_fn directly instead of struct iomap_ops
From: Christoph Hellwig @ 2026-07-02 14:07 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, djwong, willy, hsiangkao, linux-fsdevel, linux-xfs,
	Jens Axboe, Chris Mason, David Sterba, Alexander Viro, Jan Kara,
	Dan Williams, Gao Xiang, Chao Yu, Yue Hu, Jeffle Xu,
	Sandeep Dhavale, Hongbo Li, Chunhai Guo, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Theodore Ts'o, Andreas Dilger,
	Baokun Li, Ojaswin Mujoo, Ritesh Harjani (IBM), Zhang Yi,
	Jaegeuk Kim, Miklos Szeredi, Andreas Gruenbacher, Mikulas Patocka,
	Hyunchul Lee, Konstantin Komarov, Carlos Maiolino, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, open list:BLOCK LAYER,
	open list, open list:BTRFS FILE SYSTEM,
	open list:FILESYSTEM DIRECT ACCESS (DAX),
	open list:EROFS FILE SYSTEM, open list:EXT2 FILE SYSTEM,
	open list:F2FS FILE SYSTEM, open list:FUSE FILESYSTEM [CORE],
	open list:GFS2 FILE SYSTEM, open list:NTFS3 FILESYSTEM
In-Reply-To: <20260701000949.1666714-18-joannelkoong@gmail.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

In terms of merge logistics, I wonder if we should delay this and
the previous patch to the next merge window so that we can minimize the
cross-subsystem merge pain with more file system iomap conversion.
If none of them actually happen until rc6 or so, or if the merges aren't
painful we could still pick them up late in the merge window.



* Re: [PATCH v5 0/5] crypto: skcipher - multi-data-unit dispatch as a template
From: Eric Biggers @ 2026-07-02 16:45 UTC (permalink / raw)
  To: Leonid Ravich
  Cc: linux-crypto, dm-devel, linux-block, linux-kernel, herbert, davem,
	snitzer, mpatocka, axboe
In-Reply-To: <20260702084534.22846-1-lravich@amazon.com>

On Thu, Jul 02, 2026 at 08:45:34AM +0000, Leonid Ravich wrote:
> On Wed, Jul 01, 2026 at 12:19:19AM -0700, Eric Biggers wrote:
> > No, this didn't address my feedback.  It moved things around but still
> > adds additional overhead for everyone to support an out-of-tree driver,
> > which also hasn't been shown to be any better than just using the CPU.
> 
> Eric, thanks for the fast reply.
> 
> Overhead: for a non-user the only cost is the data_unit_size field plus
> one zeroing store in set_tfm()/ON_STACK; the en/decrypt paths are
> untouched.

Sure, which is still a cost for everyone.

> A dun() user pays one indirect dispatch into the template per
> request plus a scatterwalk step and IV copy per unit -- the same per-DU
> bookkeeping the consumer already open-codes today.

It's not the same at all.  There's now an extra indirect call, more
per-request memory used, additional overhead to create a scatterlist and
then break it up into multiple ones using the fully generalized
scatterlist walker instead of just creating the correct ones in the
first place from the bio_vec, etc.  We need to be simplifying the crypto
APIs, not making them even more complex and adding more overhead.

> On the driver: I agree pushing code optimized for an out-of-tree driver
> is wrong

So don't do it.

> but I don't think that's the case here -- this helps any async
> crypto engine, and there are in-tree async xts(aes) ones dm-crypt is
> eligible to use today: HiSilicon SEC2, TI DTHEv2, Atmel (I don't have any
> to test on).

It helps nothing, as there is no patch and no benchmarks.

> To bound the win, I used cryptd as a pure async carrier and
> moved the per-DU split inside it, then ran dm-crypt + fio: batching cut
> CPU ~30% on 128k I/O (large batch) and had zero impact on 4k -- so the
> saving is dispatch, not crypto.
>
> A real engine that submits a whole
> multi-DU request in one descriptor avoids that per-DU dispatch entirely,
> so it saves at least that.

That is not an equivalent benchmark at all.

> So the question for me is what the bar is: does landing the API and dun()
> template now (with the in-tree consolidation it already buys dm-crypt and
> blk-crypto-fallback), with a throughput demonstration deferred to a real
> async provider, work for you?

I definitely don't want this for blk-crypto-fallback.
blk-crypto-fallback already only supports CPU-based crypto acceleration.
(As it should, because nothing else is actually worthwhile these days,
other than hardware inline encryption which is separately supported.)
And I'm also planning to switch it from crypto_skcipher to lib/crypto/
to eliminate the remaining API overhead, which will actually accomplish
that, unlike the dun() thing which just makes it worse.

The bar for adding things to the upstream kernel is that it has to
actually be used *and* beneficial in the upstream kernel.

- Eric


* Re: [PATCH v2 17/18] iomap: pass iomap_next_fn directly instead of struct iomap_ops
From: Darrick J. Wong @ 2026-07-02 16:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Joanne Koong, brauner, willy, hsiangkao, linux-fsdevel, linux-xfs,
	Jens Axboe, Chris Mason, David Sterba, Alexander Viro, Jan Kara,
	Dan Williams, Gao Xiang, Chao Yu, Yue Hu, Jeffle Xu,
	Sandeep Dhavale, Hongbo Li, Chunhai Guo, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Theodore Ts'o, Andreas Dilger,
	Baokun Li, Ojaswin Mujoo, Ritesh Harjani (IBM), Zhang Yi,
	Jaegeuk Kim, Miklos Szeredi, Andreas Gruenbacher, Mikulas Patocka,
	Hyunchul Lee, Konstantin Komarov, Carlos Maiolino, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, open list:BLOCK LAYER,
	open list, open list:BTRFS FILE SYSTEM,
	open list:FILESYSTEM DIRECT ACCESS (DAX),
	open list:EROFS FILE SYSTEM, open list:EXT2 FILE SYSTEM,
	open list:F2FS FILE SYSTEM, open list:FUSE FILESYSTEM [CORE],
	open list:GFS2 FILE SYSTEM, open list:NTFS3 FILESYSTEM
In-Reply-To: <20260702140705.GE21339@lst.de>

On Thu, Jul 02, 2026 at 04:07:05PM +0200, Christoph Hellwig wrote:
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> In terms of merge logistics, I wonder if we should delay this and
> the previous patch to the next merge window so that we can minimize the
> cross-subsystem merge pain with more file system iomap conversion.
> If none of them actually happen until rc6 or so, or if the merges aren't
> painful we could still pick them up late in the merge window.

I'd say everything but this patch should go in during the merge window
for 7.3, along with clear instructions to brauner/torvalds to expect
this patch to appear right before 7.3-rc1 gets tagged, to clean up all
the other changes that come in.

--D


* Re: [PATCH v2 17/18] iomap: pass iomap_next_fn directly instead of struct iomap_ops
From: Darrick J. Wong @ 2026-07-02 16:58 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, willy, hsiangkao, linux-fsdevel, linux-xfs,
	Jens Axboe, Chris Mason, David Sterba, Alexander Viro, Jan Kara,
	Dan Williams, Gao Xiang, Chao Yu, Yue Hu, Jeffle Xu,
	Sandeep Dhavale, Hongbo Li, Chunhai Guo, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Theodore Ts'o, Andreas Dilger,
	Baokun Li, Ojaswin Mujoo, Ritesh Harjani (IBM), Zhang Yi,
	Jaegeuk Kim, Miklos Szeredi, Andreas Gruenbacher, Mikulas Patocka,
	Hyunchul Lee, Konstantin Komarov, Carlos Maiolino, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, open list:BLOCK LAYER,
	open list, open list:BTRFS FILE SYSTEM,
	open list:FILESYSTEM DIRECT ACCESS (DAX),
	open list:EROFS FILE SYSTEM, open list:EXT2 FILE SYSTEM,
	open list:F2FS FILE SYSTEM, open list:FUSE FILESYSTEM [CORE],
	open list:GFS2 FILE SYSTEM, open list:NTFS3 FILESYSTEM
In-Reply-To: <20260701000949.1666714-18-joannelkoong@gmail.com>

On Tue, Jun 30, 2026 at 05:09:32PM -0700, Joanne Koong wrote:
> Now that all filesystems implement ->iomap_next() and the legacy
> ->iomap_begin()/->iomap_end() fallback is gone, struct iomap_ops only
> wraps a single iomap_next function pointer. Drop the struct entirely and
> pass the iomap_next_fn directly to iomap_iter() and all the iomap/dax
> entry points; filesystems pass their ->iomap_next function instead of an
> ops struct.
> 
> No functional change intended.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

I'm gonna cut out quite a bit of this patch to reduce the reply size.

> ---
<snip>
>  fs/iomap/buffered-io.c | 38 ++++++++++++++---------------
>  fs/iomap/direct-io.c   |  8 +++---
>  fs/iomap/fiemap.c      |  8 +++---
>  fs/iomap/iter.c        |  8 +++---
>  fs/iomap/seek.c        |  8 +++---
>  fs/iomap/swapfile.c    |  4 +--
<snip>
>  fs/remap_range.c       |  6 ++---
>  fs/xfs/xfs_aops.c      |  8 +++---
>  fs/xfs/xfs_file.c      | 40 +++++++++++++++---------------
>  fs/xfs/xfs_iomap.c     | 55 +++++++++---------------------------------
>  fs/xfs/xfs_iomap.h     | 24 ++++++++++++------
>  fs/xfs/xfs_iops.c      |  4 +--
>  fs/xfs/xfs_reflink.c   |  6 ++---
>  fs/zonefs/file.c       | 22 ++++++-----------
>  include/linux/dax.h    | 18 ++++++--------
>  include/linux/fs.h     |  7 ++++--
>  include/linux/iomap.h  | 46 +++++++++++++++--------------------
>  54 files changed, 302 insertions(+), 411 deletions(-)
> 
<snip>

> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 3f0932e46fd6..0aa8abc438c1 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -626,7 +626,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
>  	return 0;
>  }
>  
> -void iomap_read_folio(const struct iomap_ops *ops,
> +void iomap_read_folio(iomap_next_fn iomap_next,

If you took my earlier suggestion to rename the typedef to
iomap_iter_fn, then this parameter ought to be named iter_fn.
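
For anyone reading without the rest of the series, the shape of the change
is roughly the following. This is a toy userspace sketch, not the iomap API:
struct toy_iter, toy_iter_fn, toy_walk() and toy_next() are made-up names,
and the callback contract (positive to continue, zero when done, negative on
error) is an assumption chosen to mirror the loops in the quoted diff.

```c
#include <assert.h>
#include <stddef.h>

struct toy_iter {		/* hypothetical stand-in for the iterator */
	long pos;
	long remaining;
};

/* One callback type replaces a struct that only wrapped one pointer. */
typedef int (*toy_iter_fn)(struct toy_iter *iter);

/*
 * Hypothetical driver loop: call iter_fn until it reports completion (0) or
 * an error (< 0). Returns the number of steps taken, or the error.
 */
static long toy_walk(struct toy_iter *iter, toy_iter_fn iter_fn)
{
	long steps = 0;
	int ret;

	assert(iter != NULL && iter_fn != NULL);

	while ((ret = iter_fn(iter)) > 0)
		steps++;
	return ret < 0 ? ret : steps;
}

/* Example callback: consume up to 4 units per step. */
static int toy_next(struct toy_iter *iter)
{
	long chunk;

	if (iter->remaining <= 0)
		return 0;
	chunk = iter->remaining < 4 ? iter->remaining : 4;
	iter->pos += chunk;
	iter->remaining -= chunk;
	return 1;
}
```

A caller passes the function itself, e.g. toy_walk(&it, toy_next), instead of
the address of a one-member ops struct.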

>  		struct iomap_read_folio_ctx *ctx, void *private)
>  {
>  	struct folio *folio = ctx->cur_folio;
> @@ -650,7 +650,7 @@ void iomap_read_folio(const struct iomap_ops *ops,
>  		fsverity_readahead(ctx->vi, folio->index,
>  				   folio_nr_pages(folio));
>  
> -	while ((ret = iomap_iter(&iter, ops)) > 0) {
> +	while ((ret = iomap_iter(&iter, iomap_next)) > 0) {
>  		iter.status = iomap_read_folio_iter(&iter, ctx,
>  				&bytes_submitted);
>  		iomap_read_submit(&iter, ctx);
> @@ -688,22 +688,22 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
>  
>  /**
>   * iomap_readahead - Attempt to read pages from a file.
> - * @ops: The operations vector for the filesystem.
> + * @iomap_next: The iomap_next callback for the filesystem.

"The iomap iteration function for the filesystem" ?

Using the term "iomap_next" in the definition for iomap_next isn't that
helpful.

>   * @ctx: The ctx used for issuing readahead.
>   * @private: The filesystem-specific information for issuing iomap_iter.
>   *
>   * This function is for filesystems to call to implement their readahead
>   * address_space operation.
>   *
> - * Context: The @ops callbacks may submit I/O (eg to read the addresses of
> + * Context: The @iomap_next callback may submit I/O (eg to read the addresses of
>   * blocks from disc), and may wait for it.  The caller may be trying to
>   * access a different page, and so sleeping excessively should be avoided.
>   * It may allocate memory, but should avoid costly allocations.  This
>   * function is called with memalloc_nofs set, so allocations will not cause
>   * the filesystem to be reentered.
>   */
> -void iomap_readahead(const struct iomap_ops *ops,
> -		struct iomap_read_folio_ctx *ctx, void *private)
> +void iomap_readahead(iomap_next_fn iomap_next, struct iomap_read_folio_ctx *ctx,
> +		void *private)
>  {
>  	struct readahead_control *rac = ctx->rac;
>  	struct iomap_iter iter = {
> @@ -725,7 +725,7 @@ void iomap_readahead(const struct iomap_ops *ops,
>  		fsverity_readahead(ctx->vi, readahead_index(rac),
>  				readahead_count(rac));
>  
> -	while (iomap_iter(&iter, ops) > 0) {
> +	while (iomap_iter(&iter, iomap_next) > 0) {
>  		iter.status = iomap_readahead_iter(&iter, ctx,
>  					&cur_bytes_submitted);
>  		iomap_read_submit(&iter, ctx);
> @@ -1268,7 +1268,7 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
>  
>  ssize_t
>  iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
> -		const struct iomap_ops *ops,
> +		iomap_next_fn iomap_next,
>  		const struct iomap_write_ops *write_ops, void *private)
>  {
>  	struct iomap_iter iter = {
> @@ -1285,7 +1285,7 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
>  	if (iocb->ki_flags & IOCB_DONTCACHE)
>  		iter.flags |= IOMAP_DONTCACHE;
>  
> -	while ((ret = iomap_iter(&iter, ops)) > 0)
> +	while ((ret = iomap_iter(&iter, iomap_next)) > 0)
>  		iter.status = iomap_write_iter(&iter, i, write_ops);
>  
>  	if (unlikely(iter.pos == iocb->ki_pos))
> @@ -1297,7 +1297,7 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  
>  int iomap_fsverity_write(struct file *file, loff_t pos, size_t length,
> -		const void *buf, const struct iomap_ops *ops,
> +		const void *buf, iomap_next_fn iomap_next,
>  		const struct iomap_write_ops *write_ops)
>  {
>  	int			ret;
> @@ -1314,7 +1314,7 @@ int iomap_fsverity_write(struct file *file, loff_t pos, size_t length,
>  
>  	iov_iter_kvec(&iiter, WRITE, &kvec, 1, length);
>  
> -	ret = iomap_file_buffered_write(&iocb, &iiter, ops, write_ops, NULL);
> +	ret = iomap_file_buffered_write(&iocb, &iiter, iomap_next, write_ops, NULL);
>  	if (ret < 0)
>  		return ret;
>  	return ret == length ? 0 : -EIO;
> @@ -1586,7 +1586,7 @@ static int iomap_unshare_iter(struct iomap_iter *iter,
>  
>  int
>  iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
> -		const struct iomap_ops *ops,
> +		iomap_next_fn iomap_next,
>  		const struct iomap_write_ops *write_ops)
>  {
>  	struct iomap_iter iter = {
> @@ -1601,7 +1601,7 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  		return 0;
>  
>  	iter.len = min(len, size - pos);
> -	while ((ret = iomap_iter(&iter, ops)) > 0)
> +	while ((ret = iomap_iter(&iter, iomap_next)) > 0)
>  		iter.status = iomap_unshare_iter(&iter, write_ops);
>  	return ret;
>  }
> @@ -1710,7 +1710,7 @@ EXPORT_SYMBOL_GPL(iomap_fill_dirty_folios);
>  
>  int
>  iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> -		const struct iomap_ops *ops,
> +		iomap_next_fn iomap_next,
>  		const struct iomap_write_ops *write_ops, void *private)
>  {
>  	struct folio_batch fbatch;
> @@ -1735,7 +1735,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  	 */
>  	range_dirty = filemap_range_needs_writeback(mapping, iter.pos,
>  					iter.pos + iter.len - 1);
> -	while ((ret = iomap_iter(&iter, ops)) > 0) {
> +	while ((ret = iomap_iter(&iter, iomap_next)) > 0) {
>  		const struct iomap *srcmap = iomap_iter_srcmap(&iter);
>  
>  		if (!(iter.iomap.flags & IOMAP_F_FOLIO_BATCH) &&
> @@ -1761,7 +1761,7 @@ EXPORT_SYMBOL_GPL(iomap_zero_range);
>  
>  int
>  iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
> -		const struct iomap_ops *ops,
> +		iomap_next_fn iomap_next,
>  		const struct iomap_write_ops *write_ops, void *private)
>  {
>  	unsigned int blocksize = i_blocksize(inode);
> @@ -1770,7 +1770,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
>  	/* Block boundary? Nothing to do */
>  	if (!off)
>  		return 0;
> -	return iomap_zero_range(inode, pos, blocksize - off, did_zero, ops,
> +	return iomap_zero_range(inode, pos, blocksize - off, did_zero, iomap_next,
>  			write_ops, private);
>  }
>  EXPORT_SYMBOL_GPL(iomap_truncate_page);
> @@ -1795,7 +1795,7 @@ static int iomap_folio_mkwrite_iter(struct iomap_iter *iter,
>  	return iomap_iter_advance(iter, length);
>  }
>  
> -vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops,
> +vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, iomap_next_fn iomap_next,
>  		void *private)
>  {
>  	struct iomap_iter iter = {
> @@ -1812,7 +1812,7 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops,
>  		goto out_unlock;
>  	iter.pos = folio_pos(folio);
>  	iter.len = ret;
> -	while ((ret = iomap_iter(&iter, ops)) > 0)
> +	while ((ret = iomap_iter(&iter, iomap_next)) > 0)
>  		iter.status = iomap_folio_mkwrite_iter(&iter, folio);
>  
>  	if (ret < 0)
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index b485e3b191da..e299d186f743 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -676,7 +676,7 @@ static int iomap_dio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
>   */
>  struct iomap_dio *
>  __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> -		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
> +		iomap_next_fn iomap_next, const struct iomap_dio_ops *dops,
>  		unsigned int dio_flags, void *private, size_t done_before)
>  {
>  	struct inode *inode = file_inode(iocb->ki_filp);
> @@ -800,7 +800,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	inode_dio_begin(inode);
>  
>  	blk_start_plug(&plug);
> -	while ((ret = iomap_iter(&iomi, ops)) > 0) {
> +	while ((ret = iomap_iter(&iomi, iomap_next)) > 0) {
>  		iomi.status = iomap_dio_iter(&iomi, dio);
>  
>  		/*
> @@ -890,12 +890,12 @@ EXPORT_SYMBOL_GPL(__iomap_dio_rw);
>  
>  ssize_t
>  iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> -		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
> +		iomap_next_fn iomap_next, const struct iomap_dio_ops *dops,
>  		unsigned int dio_flags, void *private, size_t done_before)
>  {
>  	struct iomap_dio *dio;
>  
> -	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, private,
> +	dio = __iomap_dio_rw(iocb, iter, iomap_next, dops, dio_flags, private,
>  			     done_before);
>  	if (IS_ERR_OR_NULL(dio))
>  		return PTR_ERR_OR_ZERO(dio);
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index d11dadff8286..fc488f05d8ce 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -56,7 +56,7 @@ static int iomap_fiemap_iter(struct iomap_iter *iter,
>  }
>  
>  int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
> -		u64 start, u64 len, const struct iomap_ops *ops)
> +		u64 start, u64 len, iomap_next_fn iomap_next)
>  {
>  	struct iomap_iter iter = {
>  		.inode		= inode,
> @@ -73,7 +73,7 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
>  	if (ret)
>  		return ret;
>  
> -	while ((ret = iomap_iter(&iter, ops)) > 0)
> +	while ((ret = iomap_iter(&iter, iomap_next)) > 0)
>  		iter.status = iomap_fiemap_iter(&iter, fi, &prev);
>  
>  	if (prev.type != IOMAP_HOLE) {
> @@ -92,7 +92,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
>  /* legacy ->bmap interface.  0 is the error return (!) */
>  sector_t
>  iomap_bmap(struct address_space *mapping, sector_t bno,
> -		const struct iomap_ops *ops)
> +		iomap_next_fn iomap_next)
>  {
>  	struct iomap_iter iter = {
>  		.inode	= mapping->host,
> @@ -107,7 +107,7 @@ iomap_bmap(struct address_space *mapping, sector_t bno,
>  		return 0;
>  
>  	bno = 0;
> -	while ((ret = iomap_iter(&iter, ops)) > 0) {
> +	while ((ret = iomap_iter(&iter, iomap_next)) > 0) {
>  		if (iter.iomap.type == IOMAP_MAPPED)
>  			bno = iomap_sector(&iter.iomap, iter.pos) >> blkshift;
>  		/* leave iter.status unset to abort loop */
> diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
> index 466c491bdef6..984045af310a 100644
> --- a/fs/iomap/iter.c
> +++ b/fs/iomap/iter.c
> @@ -42,7 +42,7 @@ static inline void iomap_iter_done(struct iomap_iter *iter)
>  /**
>   * iomap_iter - iterate over a ranges in a file
>   * @iter: iteration structue
> - * @ops: iomap ops provided by the file system
> + * @iomap_next: iomap_next callback provided by the file system
>   *
>   * Iterate over filesystem-provided space mappings for the provided file range.
>   *
> @@ -54,13 +54,13 @@ static inline void iomap_iter_done(struct iomap_iter *iter)
>   * of the loop body:  leave @iter.status unchanged, or set it to a negative
>   * errno.
>   */
> -int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
> +int iomap_iter(struct iomap_iter *iter, iomap_next_fn iomap_next)
>  {
>  	int ret;
>  
> -	trace_iomap_iter(iter, ops, _RET_IP_);
> +	trace_iomap_iter(iter, iomap_next, _RET_IP_);
>  
> -	ret = ops->iomap_next(iter, &iter->iomap, &iter->srcmap);
> +	ret = iomap_next(iter, &iter->iomap, &iter->srcmap);
>  	iter->status = 0;
>  	if (ret > 0)
>  		iomap_iter_done(iter);
> diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> index 6cbc587c93da..1bc5053d3fc1 100644
> --- a/fs/iomap/seek.c
> +++ b/fs/iomap/seek.c
> @@ -27,7 +27,7 @@ static int iomap_seek_hole_iter(struct iomap_iter *iter,
>  }
>  
>  loff_t
> -iomap_seek_hole(struct inode *inode, loff_t pos, const struct iomap_ops *ops)
> +iomap_seek_hole(struct inode *inode, loff_t pos, iomap_next_fn iomap_next)
>  {
>  	loff_t size = i_size_read(inode);
>  	struct iomap_iter iter = {
> @@ -42,7 +42,7 @@ iomap_seek_hole(struct inode *inode, loff_t pos, const struct iomap_ops *ops)
>  		return -ENXIO;
>  
>  	iter.len = size - pos;
> -	while ((ret = iomap_iter(&iter, ops)) > 0)
> +	while ((ret = iomap_iter(&iter, iomap_next)) > 0)
>  		iter.status = iomap_seek_hole_iter(&iter, &pos);
>  	if (ret < 0)
>  		return ret;
> @@ -73,7 +73,7 @@ static int iomap_seek_data_iter(struct iomap_iter *iter,
>  }
>  
>  loff_t
> -iomap_seek_data(struct inode *inode, loff_t pos, const struct iomap_ops *ops)
> +iomap_seek_data(struct inode *inode, loff_t pos, iomap_next_fn iomap_next)
>  {
>  	loff_t size = i_size_read(inode);
>  	struct iomap_iter iter = {
> @@ -88,7 +88,7 @@ iomap_seek_data(struct inode *inode, loff_t pos, const struct iomap_ops *ops)
>  		return -ENXIO;
>  
>  	iter.len = size - pos;
> -	while ((ret = iomap_iter(&iter, ops)) > 0)
> +	while ((ret = iomap_iter(&iter, iomap_next)) > 0)
>  		iter.status = iomap_seek_data_iter(&iter, &pos);
>  	if (ret < 0)
>  		return ret;
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index 0db77c449467..b8bb34deddfc 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -139,7 +139,7 @@ static int iomap_swapfile_iter(struct iomap_iter *iter,
>   */
>  int iomap_swapfile_activate(struct swap_info_struct *sis,
>  		struct file *swap_file, sector_t *pagespan,
> -		const struct iomap_ops *ops)
> +		iomap_next_fn iomap_next)
>  {
>  	struct inode *inode = swap_file->f_mapping->host;
>  	struct iomap_iter iter = {
> @@ -163,7 +163,7 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
>  	if (ret)
>  		return ret;
>  
> -	while ((ret = iomap_iter(&iter, ops)) > 0)
> +	while ((ret = iomap_iter(&iter, iomap_next)) > 0)
>  		iter.status = iomap_swapfile_iter(&iter, &iter.iomap, &isi);
>  	if (ret < 0)
>  		return ret;

<snip>

> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 2a0c54256e93..91480cb6a4d8 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -752,7 +752,7 @@ xfs_vm_bmap(
>  	 */
>  	if (xfs_is_cow_inode(ip) || XFS_IS_REALTIME_INODE(ip))
>  		return 0;
> -	return iomap_bmap(mapping, block, &xfs_read_iomap_ops);
> +	return iomap_bmap(mapping, block, xfs_read_iomap_next);
>  }
>  
>  static void
> @@ -793,7 +793,7 @@ xfs_vm_read_folio(
>  	struct iomap_read_folio_ctx	ctx = { .cur_folio = folio };
>  
>  	ctx.ops = xfs_get_iomap_read_ops(folio->mapping);
> -	iomap_read_folio(&xfs_read_iomap_ops, &ctx, NULL);
> +	iomap_read_folio(xfs_read_iomap_next, &ctx, NULL);
>  	return 0;
>  }
>  
> @@ -804,7 +804,7 @@ xfs_vm_readahead(
>  	struct iomap_read_folio_ctx	ctx = { .rac = rac };
>  
>  	ctx.ops = xfs_get_iomap_read_ops(rac->mapping),
> -	iomap_readahead(&xfs_read_iomap_ops, &ctx, NULL);
> +	iomap_readahead(xfs_read_iomap_next, &ctx, NULL);
>  }
>  
>  static int
> @@ -850,7 +850,7 @@ xfs_vm_swap_activate(
>  	sis->bdev = xfs_inode_buftarg(ip)->bt_bdev;
>  
>  	return iomap_swapfile_activate(sis, swap_file, span,
> -			&xfs_read_iomap_ops);
> +			xfs_read_iomap_next);
>  }
>  
>  const struct address_space_operations xfs_address_space_operations = {
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 7f8bef1a9954..a987ffbf3c02 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -269,7 +269,7 @@ xfs_file_dio_read(
>  		dio_ops = &xfs_dio_read_bounce_ops;
>  		dio_flags |= IOMAP_DIO_BOUNCE;
>  	}
> -	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, dio_ops, dio_flags,
> +	ret = iomap_dio_rw(iocb, to, xfs_read_iomap_next, dio_ops, dio_flags,
>  			NULL, 0);
>  	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>  
> @@ -292,7 +292,7 @@ xfs_file_dax_read(
>  	ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED);
>  	if (ret)
>  		return ret;
> -	ret = dax_iomap_rw(iocb, to, &xfs_read_iomap_ops);
> +	ret = dax_iomap_rw(iocb, to, xfs_read_iomap_next);
>  	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>  
>  	file_accessed(iocb->ki_filp);
> @@ -742,7 +742,7 @@ xfs_file_dio_write_aligned(
>  	struct xfs_inode	*ip,
>  	struct kiocb		*iocb,
>  	struct iov_iter		*from,
> -	const struct iomap_ops	*ops,
> +	iomap_next_fn		iomap_next,
>  	const struct iomap_dio_ops *dops,
>  	struct xfs_zone_alloc_ctx *ac)
>  {
> @@ -777,7 +777,7 @@ xfs_file_dio_write_aligned(
>  	if (mapping_stable_writes(iocb->ki_filp->f_mapping))
>  		dio_flags |= IOMAP_DIO_BOUNCE;
>  	trace_xfs_file_direct_write(iocb, from);
> -	ret = iomap_dio_rw(iocb, from, ops, dops, dio_flags, ac, 0);
> +	ret = iomap_dio_rw(iocb, from, iomap_next, dops, dio_flags, ac, 0);
>  out_unlock:
>  	xfs_iunlock(ip, iolock);
>  	return ret;
> @@ -799,7 +799,7 @@ xfs_file_dio_write_zoned(
>  	if (ret < 0)
>  		return ret;
>  	ret = xfs_file_dio_write_aligned(ip, iocb, from,
> -			&xfs_zoned_direct_write_iomap_ops,
> +			xfs_zoned_direct_write_iomap_next,
>  			&xfs_dio_zoned_write_ops, &ac);
>  	xfs_zoned_space_unreserve(ip->i_mount, &ac);
>  	return ret;
> @@ -824,16 +824,16 @@ xfs_file_dio_write_atomic(
>  	unsigned int		iolock = XFS_IOLOCK_SHARED;
>  	ssize_t			ret, ocount = iov_iter_count(from);
>  	unsigned int		dio_flags = 0;
> -	const struct iomap_ops	*dops;
> +	iomap_next_fn		dops;
>  
>  	/*
>  	 * HW offload should be faster, so try that first if it is already
>  	 * known that the write length is not too large.
>  	 */
>  	if (ocount > xfs_inode_buftarg(ip)->bt_awu_max)
> -		dops = &xfs_atomic_write_cow_iomap_ops;
> +		dops = xfs_atomic_write_cow_iomap_next;
>  	else
> -		dops = &xfs_direct_write_iomap_ops;
> +		dops = xfs_direct_write_iomap_next;

Probably ought to be called iter_fn, or at least something that isn't
"dops".

>  
>  retry:
>  	ret = xfs_ilock_iocb_for_write(iocb, &iolock);
> @@ -862,9 +862,9 @@ xfs_file_dio_write_atomic(
>  	 * possible. The REQ_ATOMIC-based method is typically not possible if
>  	 * the write spans multiple extents or the disk blocks are misaligned.
>  	 */
> -	if (ret == -ENOPROTOOPT && dops == &xfs_direct_write_iomap_ops) {
> +	if (ret == -ENOPROTOOPT && dops == xfs_direct_write_iomap_next) {
>  		xfs_iunlock(ip, iolock);
> -		dops = &xfs_atomic_write_cow_iomap_ops;
> +		dops = xfs_atomic_write_cow_iomap_next;
>  		goto retry;
>  	}
>  
> @@ -947,7 +947,7 @@ xfs_file_dio_write_unaligned(
>  		flags |= IOMAP_DIO_BOUNCE;
>  
>  	trace_xfs_file_direct_write(iocb, from);
> -	ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
> +	ret = iomap_dio_rw(iocb, from, xfs_direct_write_iomap_next,
>  			   &xfs_dio_write_ops, flags, NULL, 0);
>  
>  	/*
> @@ -987,7 +987,7 @@ xfs_file_dio_write(
>  	if (iocb->ki_flags & IOCB_ATOMIC)
>  		return xfs_file_dio_write_atomic(ip, iocb, from);
>  	return xfs_file_dio_write_aligned(ip, iocb, from,
> -			&xfs_direct_write_iomap_ops, &xfs_dio_write_ops, NULL);
> +			xfs_direct_write_iomap_next, &xfs_dio_write_ops, NULL);
>  }
>  
>  static noinline ssize_t
> @@ -1011,7 +1011,7 @@ xfs_file_dax_write(
>  	pos = iocb->ki_pos;
>  
>  	trace_xfs_file_dax_write(iocb, from);
> -	ret = dax_iomap_rw(iocb, from, &xfs_dax_write_iomap_ops);
> +	ret = dax_iomap_rw(iocb, from, xfs_dax_write_iomap_next);
>  	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
>  		i_size_write(inode, iocb->ki_pos);
>  		error = xfs_setfilesize(ip, pos, ret);
> @@ -1054,7 +1054,7 @@ xfs_file_buffered_write(
>  
>  	trace_xfs_file_buffered_write(iocb, from);
>  	ret = iomap_file_buffered_write(iocb, from,
> -			&xfs_buffered_write_iomap_ops, &xfs_iomap_write_ops,
> +			xfs_buffered_write_iomap_next, &xfs_iomap_write_ops,
>  			NULL);
>  
>  	/*
> @@ -1135,7 +1135,7 @@ xfs_file_buffered_write_zoned(
>  retry:
>  	trace_xfs_file_buffered_write(iocb, from);
>  	ret = iomap_file_buffered_write(iocb, from,
> -			&xfs_buffered_write_iomap_ops, &xfs_iomap_write_ops,
> +			xfs_buffered_write_iomap_next, &xfs_iomap_write_ops,
>  			&ac);
>  	if (ret == -ENOSPC && !cleared_space) {
>  		/*
> @@ -1856,10 +1856,10 @@ xfs_file_llseek(
>  	default:
>  		return generic_file_llseek(file, offset, whence);
>  	case SEEK_HOLE:
> -		offset = iomap_seek_hole(inode, offset, &xfs_seek_iomap_ops);
> +		offset = iomap_seek_hole(inode, offset, xfs_seek_iomap_next);
>  		break;
>  	case SEEK_DATA:
> -		offset = iomap_seek_data(inode, offset, &xfs_seek_iomap_ops);
> +		offset = iomap_seek_data(inode, offset, xfs_seek_iomap_next);
>  		break;
>  	}
>  
> @@ -1883,8 +1883,8 @@ xfs_dax_fault_locked(
>  	}
>  	ret = dax_iomap_fault(vmf, order, &pfn, NULL,
>  			(write_fault && !vmf->cow_page) ?
> -				&xfs_dax_write_iomap_ops :
> -				&xfs_read_iomap_ops);
> +				xfs_dax_write_iomap_next :
> +				xfs_read_iomap_next);
>  	if (ret & VM_FAULT_NEEDDSYNC)
>  		ret = dax_finish_sync_fault(vmf, order, pfn);
>  	return ret;
> @@ -1948,7 +1948,7 @@ __xfs_write_fault(
>  	if (IS_DAX(inode))
>  		ret = xfs_dax_fault_locked(vmf, order, true);
>  	else
> -		ret = iomap_page_mkwrite(vmf, &xfs_buffered_write_iomap_ops,
> +		ret = iomap_page_mkwrite(vmf, xfs_buffered_write_iomap_next,
>  				ac);
>  	xfs_iunlock(ip, lock_mode);
>  
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 4fa1a5c985db..71c4bb024f04 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1037,7 +1037,7 @@ xfs_direct_write_iomap_begin(
>  	return error;
>  }
>  
> -static int
> +int
>  xfs_direct_write_iomap_next(
>  	const struct iomap_iter *iter,
>  	struct iomap		*iomap,
> @@ -1047,10 +1047,6 @@ xfs_direct_write_iomap_next(
>  			NULL);
>  }
>  
> -const struct iomap_ops xfs_direct_write_iomap_ops = {
> -	.iomap_next		= xfs_direct_write_iomap_next,
> -};
> -
>  #ifdef CONFIG_XFS_RT
>  /*
>   * This is really simple.  The space has already been reserved before taking the
> @@ -1099,7 +1095,7 @@ xfs_zoned_direct_write_iomap_begin(
>  	return 0;
>  }
>  
> -static int
> +int
>  xfs_zoned_direct_write_iomap_next(
>  	const struct iomap_iter *iter,
>  	struct iomap		*iomap,
> @@ -1109,9 +1105,6 @@ xfs_zoned_direct_write_iomap_next(
>  			xfs_zoned_direct_write_iomap_begin, NULL);
>  }
>  
> -const struct iomap_ops xfs_zoned_direct_write_iomap_ops = {
> -	.iomap_next		= xfs_zoned_direct_write_iomap_next,
> -};
>  #endif /* CONFIG_XFS_RT */
>  
>  #ifdef DEBUG
> @@ -1294,7 +1287,7 @@ xfs_atomic_write_cow_iomap_begin(
>  	return error;
>  }
>  
> -static int
> +int
>  xfs_atomic_write_cow_iomap_next(
>  	const struct iomap_iter *iter,
>  	struct iomap		*iomap,
> @@ -1304,10 +1297,6 @@ xfs_atomic_write_cow_iomap_next(
>  			xfs_atomic_write_cow_iomap_begin, NULL);
>  }
>  
> -const struct iomap_ops xfs_atomic_write_cow_iomap_ops = {
> -	.iomap_next		= xfs_atomic_write_cow_iomap_next,
> -};
> -
>  static int
>  xfs_dax_write_iomap_end(
>  	struct inode		*inode,
> @@ -1328,7 +1317,7 @@ xfs_dax_write_iomap_end(
>  	return xfs_reflink_end_cow(ip, pos, written);
>  }
>  
> -static int
> +int
>  xfs_dax_write_iomap_next(
>  	const struct iomap_iter *iter,
>  	struct iomap		*iomap,
> @@ -1338,10 +1327,6 @@ xfs_dax_write_iomap_next(
>  			xfs_dax_write_iomap_end);
>  }
>  
> -const struct iomap_ops xfs_dax_write_iomap_ops = {
> -	.iomap_next	= xfs_dax_write_iomap_next,
> -};
> -
>  /*
>   * Convert a hole to a delayed allocation.
>   */
> @@ -2207,7 +2192,7 @@ xfs_buffered_write_iomap_end(
>  	return 0;
>  }
>  
> -static int
> +int
>  xfs_buffered_write_iomap_next(
>  	const struct iomap_iter *iter,
>  	struct iomap		*iomap,
> @@ -2218,10 +2203,6 @@ xfs_buffered_write_iomap_next(
>  			xfs_buffered_write_iomap_end);
>  }
>  
> -const struct iomap_ops xfs_buffered_write_iomap_ops = {
> -	.iomap_next		= xfs_buffered_write_iomap_next,
> -};
> -
>  static int
>  xfs_read_iomap_begin(
>  	struct inode		*inode,
> @@ -2263,7 +2244,7 @@ xfs_read_iomap_begin(
>  				 shared ? IOMAP_F_SHARED : 0, seq);
>  }
>  
> -static int
> +int
>  xfs_read_iomap_next(
>  	const struct iomap_iter *iter,
>  	struct iomap		*iomap,
> @@ -2272,10 +2253,6 @@ xfs_read_iomap_next(
>  	return iomap_process(iter, iomap, srcmap, xfs_read_iomap_begin, NULL);
>  }
>  
> -const struct iomap_ops xfs_read_iomap_ops = {
> -	.iomap_next		= xfs_read_iomap_next,
> -};
> -
>  static int
>  xfs_seek_iomap_begin(
>  	struct inode		*inode,
> @@ -2360,7 +2337,7 @@ xfs_seek_iomap_begin(
>  	return error;
>  }
>  
> -static int
> +int
>  xfs_seek_iomap_next(
>  	const struct iomap_iter *iter,
>  	struct iomap		*iomap,
> @@ -2369,10 +2346,6 @@ xfs_seek_iomap_next(
>  	return iomap_process(iter, iomap, srcmap, xfs_seek_iomap_begin, NULL);
>  }
>  
> -const struct iomap_ops xfs_seek_iomap_ops = {
> -	.iomap_next		= xfs_seek_iomap_next,
> -};
> -
>  static int
>  xfs_xattr_iomap_begin(
>  	struct inode		*inode,
> @@ -2416,7 +2389,7 @@ xfs_xattr_iomap_begin(
>  	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_XATTR, seq);
>  }
>  
> -static int
> +int
>  xfs_xattr_iomap_next(
>  	const struct iomap_iter *iter,
>  	struct iomap		*iomap,
> @@ -2425,10 +2398,6 @@ xfs_xattr_iomap_next(
>  	return iomap_process(iter, iomap, srcmap, xfs_xattr_iomap_begin, NULL);
>  }
>  
> -const struct iomap_ops xfs_xattr_iomap_ops = {
> -	.iomap_next		= xfs_xattr_iomap_next,
> -};
> -
>  int
>  xfs_zero_range(
>  	struct xfs_inode	*ip,
> @@ -2443,9 +2412,9 @@ xfs_zero_range(
>  
>  	if (IS_DAX(inode))
>  		return dax_zero_range(inode, pos, len, did_zero,
> -				      &xfs_dax_write_iomap_ops);
> +				      xfs_dax_write_iomap_next);
>  	return iomap_zero_range(inode, pos, len, did_zero,
> -			&xfs_buffered_write_iomap_ops, &xfs_iomap_write_ops,
> +			xfs_buffered_write_iomap_next, &xfs_iomap_write_ops,
>  			ac);
>  }
>  
> @@ -2460,8 +2429,8 @@ xfs_truncate_page(
>  
>  	if (IS_DAX(inode))
>  		return dax_truncate_page(inode, pos, did_zero,
> -					&xfs_dax_write_iomap_ops);
> +					xfs_dax_write_iomap_next);
>  	return iomap_truncate_page(inode, pos, did_zero,
> -			&xfs_buffered_write_iomap_ops, &xfs_iomap_write_ops,
> +			xfs_buffered_write_iomap_next, &xfs_iomap_write_ops,
>  			ac);
>  }
> diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
> index ebcce7d49446..01875d20fb66 100644
> --- a/fs/xfs/xfs_iomap.h
> +++ b/fs/xfs/xfs_iomap.h
> @@ -49,14 +49,22 @@ xfs_aligned_fsb_count(
>  	return count_fsb;
>  }
>  
> -extern const struct iomap_ops xfs_buffered_write_iomap_ops;
> -extern const struct iomap_ops xfs_direct_write_iomap_ops;
> -extern const struct iomap_ops xfs_zoned_direct_write_iomap_ops;
> -extern const struct iomap_ops xfs_read_iomap_ops;
> -extern const struct iomap_ops xfs_seek_iomap_ops;
> -extern const struct iomap_ops xfs_xattr_iomap_ops;
> -extern const struct iomap_ops xfs_dax_write_iomap_ops;
> -extern const struct iomap_ops xfs_atomic_write_cow_iomap_ops;
> +int xfs_buffered_write_iomap_next(const struct iomap_iter *iter,
> +		struct iomap *iomap, struct iomap *srcmap);
> +int xfs_direct_write_iomap_next(const struct iomap_iter *iter,
> +		struct iomap *iomap, struct iomap *srcmap);
> +int xfs_zoned_direct_write_iomap_next(const struct iomap_iter *iter,
> +		struct iomap *iomap, struct iomap *srcmap);
> +int xfs_read_iomap_next(const struct iomap_iter *iter, struct iomap *iomap,
> +		struct iomap *srcmap);
> +int xfs_seek_iomap_next(const struct iomap_iter *iter, struct iomap *iomap,
> +		struct iomap *srcmap);
> +int xfs_xattr_iomap_next(const struct iomap_iter *iter, struct iomap *iomap,
> +		struct iomap *srcmap);
> +int xfs_dax_write_iomap_next(const struct iomap_iter *iter, struct iomap *iomap,
> +		struct iomap *srcmap);
> +int xfs_atomic_write_cow_iomap_next(const struct iomap_iter *iter,
> +		struct iomap *iomap, struct iomap *srcmap);
>  extern const struct iomap_write_ops xfs_iomap_write_ops;
>  
>  #endif /* __XFS_IOMAP_H__*/
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 6339f4956ecb..5c3d9a365f93 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1239,10 +1239,10 @@ xfs_vn_fiemap(
>  	if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) {
>  		fieinfo->fi_flags &= ~FIEMAP_FLAG_XATTR;
>  		error = iomap_fiemap(inode, fieinfo, start, length,
> -				&xfs_xattr_iomap_ops);
> +				xfs_xattr_iomap_next);
>  	} else {
>  		error = iomap_fiemap(inode, fieinfo, start, length,
> -				&xfs_read_iomap_ops);
> +				xfs_read_iomap_next);
>  	}
>  	xfs_iunlock(XFS_I(inode), XFS_IOLOCK_SHARED);
>  
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index a5c188b78138..2b9792626bab 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1683,7 +1683,7 @@ xfs_reflink_remap_prep(
>  				pos_out, len, remap_flags);
>  	else
>  		ret = dax_remap_file_range_prep(file_in, pos_in, file_out,
> -				pos_out, len, remap_flags, &xfs_read_iomap_ops);
> +				pos_out, len, remap_flags, xfs_read_iomap_next);
>  	if (ret || *len == 0)
>  		goto out_unlock;
>  
> @@ -1878,10 +1878,10 @@ xfs_reflink_unshare(
>  
>  	if (IS_DAX(inode))
>  		error = dax_file_unshare(inode, offset, len,
> -				&xfs_dax_write_iomap_ops);
> +				xfs_dax_write_iomap_next);
>  	else
>  		error = iomap_file_unshare(inode, offset, len,
> -				&xfs_buffered_write_iomap_ops,
> +				xfs_buffered_write_iomap_next,
>  				&xfs_iomap_write_ops);
>  	if (error)
>  		goto out;

Aside from the name bikeshedding, the logic looks solid. :)
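
For readers skimming the diff, the shape of the conversion can be sketched in
plain userspace C. This is only an illustration under stated assumptions: the
demo_* types and functions below are hypothetical stand-ins, not the kernel's
struct iomap_iter, struct iomap_ops, or iomap_next_fn. It shows a one-member
ops table being replaced by passing the callback itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the iteration state; not struct iomap_iter. */
struct demo_iter {
	int calls;	/* how many times the callback has run */
};

/* Assumed callback shape, loosely modelled on iomap_next_fn. */
typedef int (*demo_next_fn)(struct demo_iter *iter);

/* Old style: an ops table whose only member is the callback. */
struct demo_ops {
	demo_next_fn iomap_next;
};

/* Hypothetical filesystem callback: just counts invocations. */
static int demo_fs_next(struct demo_iter *iter)
{
	return ++iter->calls;
}

static const struct demo_ops demo_fs_ops = {
	.iomap_next = demo_fs_next,
};

/* Old style: the callback is reached through the ops table. */
static int demo_iter_via_ops(struct demo_iter *iter, const struct demo_ops *ops)
{
	assert(ops != NULL && ops->iomap_next != NULL);
	return ops->iomap_next(iter);
}

/* New style: the callback is passed and called directly. */
static int demo_iter_direct(struct demo_iter *iter, demo_next_fn next)
{
	assert(next != NULL);
	return next(iter);
}
```

Both helpers end up calling the same function; the direct form drops the
wrapper struct and its extern declaration, which is why the xfs_iomap.h hunk
replaces the extern ops tables with plain function prototypes.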

--D


* Re: [PATCH v4 1/5] block: use integrity interval instead of sector as seed
From: Martin K. Petersen @ 2026-07-02 23:41 UTC (permalink / raw)
  To: Caleb Sander Mateos
  Cc: Martin K. Petersen, Anuj Gupta, linux-block, linux-nvme,
	linux-scsi, target-devel, linux-kernel, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni
In-Reply-To: <CADUfDZqE6JLLbugK+qYe=XFSLV5zUwFEQ6qRqvJ-=uVs8OtfVg@mail.gmail.com>


Caleb,

> Martin, are you okay with this updated commit message? Would be nice
> to get this fix in so auto-integrity works correctly for zone appends
> on 4KB-integrity-interval devices.

I'm on vacation.

Will take a look when I'm back next week.

-- 
Martin K. Petersen


* Re: [PATCH v2 10/18] block: convert iomap ops to ->iomap_next()
From: Joanne Koong @ 2026-07-03  0:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: brauner, hch, djwong, willy, hsiangkao, linux-fsdevel, linux-xfs,
	Jens Axboe, open list:BLOCK LAYER, open list
In-Reply-To: <akRlehPHzy-F_105@kbusch-mbp>

On Tue, Jun 30, 2026 at 5:55 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Tue, Jun 30, 2026 at 05:09:25PM -0700, Joanne Koong wrote:
> >  static const struct iomap_ops blkdev_iomap_ops = {
> > -     .iomap_begin            = blkdev_iomap_begin,
> > +     .iomap_next             = blkdev_iomap_next,
> >  };
>
> I think it's generally safe to use the same mailing list for the entire
> series. There's no context here on what "iomap_next" is because I'm
> subscribed only to linux-block. I found the rest here:
>
>   https://lore.kernel.org/linux-fsdevel/20260701000949.1666714-1-joannelkoong@gmail.com/

Ahh I see, thanks for the heads up Keith. I'll do that from now on.

Thanks,
Joanne


* Re: [PATCH v2 17/18] iomap: pass iomap_next_fn directly instead of struct iomap_ops
From: Joanne Koong @ 2026-07-03  0:17 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: brauner, hch, willy, hsiangkao, linux-fsdevel, linux-xfs,
	Jens Axboe, Chris Mason, David Sterba, Alexander Viro, Jan Kara,
	Dan Williams, Gao Xiang, Chao Yu, Yue Hu, Jeffle Xu,
	Sandeep Dhavale, Hongbo Li, Chunhai Guo, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Theodore Ts'o, Andreas Dilger,
	Baokun Li, Ojaswin Mujoo, Ritesh Harjani (IBM), Zhang Yi,
	Jaegeuk Kim, Miklos Szeredi, Andreas Gruenbacher, Mikulas Patocka,
	Hyunchul Lee, Konstantin Komarov, Carlos Maiolino, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, open list:BLOCK LAYER,
	open list, open list:BTRFS FILE SYSTEM,
	open list:FILESYSTEM DIRECT ACCESS (DAX),
	open list:EROFS FILE SYSTEM, open list:EXT2 FILE SYSTEM,
	open list:F2FS FILE SYSTEM, open list:FUSE FILESYSTEM [CORE],
	open list:GFS2 FILE SYSTEM, open list:NTFS3 FILESYSTEM
In-Reply-To: <20260702165841.GM9392@frogsfrogsfrogs>

On Thu, Jul 2, 2026 at 9:58 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 3f0932e46fd6..0aa8abc438c1 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -626,7 +626,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> >       return 0;
> >  }
> >
> > -void iomap_read_folio(const struct iomap_ops *ops,
> > +void iomap_read_folio(iomap_next_fn iomap_next,
>
> If you took my earlier suggestion to rename the typedef to
> iomap_iter_fn, then this parameter ought to be named iter_fn.

Hmm... maybe at that point, it's self-explanatory enough that the arg
could just be called "iter" instead of "iter_fn"?

>
> >               struct iomap_read_folio_ctx *ctx, void *private)
> >  {
> >       struct folio *folio = ctx->cur_folio;
> > @@ -650,7 +650,7 @@ void iomap_read_folio(const struct iomap_ops *ops,
> >               fsverity_readahead(ctx->vi, folio->index,
> >                                  folio_nr_pages(folio));
> >
> > -     while ((ret = iomap_iter(&iter, ops)) > 0) {
> > +     while ((ret = iomap_iter(&iter, iomap_next)) > 0) {
> >               iter.status = iomap_read_folio_iter(&iter, ctx,
> >                               &bytes_submitted);
> >               iomap_read_submit(&iter, ctx);
> > @@ -688,22 +688,22 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> >
> >  /**
> >   * iomap_readahead - Attempt to read pages from a file.
> > - * @ops: The operations vector for the filesystem.
> > + * @iomap_next: The iomap_next callback for the filesystem.
>
> "The iomap iteration function for the filesystem" ?
>
> Using the term "iomap_next" in the definition for iomap_next isn't that
> helpful.

Agreed, I'll replace this with your suggestion.

>
> >       return ret;
> > @@ -824,16 +824,16 @@ xfs_file_dio_write_atomic(
> >       unsigned int            iolock = XFS_IOLOCK_SHARED;
> >       ssize_t                 ret, ocount = iov_iter_count(from);
> >       unsigned int            dio_flags = 0;
> > -     const struct iomap_ops  *dops;
> > +     iomap_next_fn           dops;
> >
> >       /*
> >        * HW offload should be faster, so try that first if it is already
> >        * known that the write length is not too large.
> >        */
> >       if (ocount > xfs_inode_buftarg(ip)->bt_awu_max)
> > -             dops = &xfs_atomic_write_cow_iomap_ops;
> > +             dops = xfs_atomic_write_cow_iomap_next;
> >       else
> > -             dops = &xfs_direct_write_iomap_ops;
> > +             dops = xfs_direct_write_iomap_next;
>
> Probably ought to be called iter_fn, or at least something that isn't
> "dops".

Nice spotting, I'll rename this in the next version.
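
As a rough illustration of why the name matters, here is a userspace sketch of
the selection-and-fallback pattern from xfs_file_dio_write_atomic(), with the
variable called iter_fn. Everything prefixed sketch_ is hypothetical, including
the 8-byte limit in the fast path; only the overall shape (pick a callback,
retry with the other one on -ENOPROTOOPT, compare function pointers directly)
is taken from the quoted hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified callback type for this sketch; not the kernel typedef. */
typedef int (*sketch_iter_fn)(long len);

/* Hypothetical fast path: refuses writes longer than 8 bytes. */
static int sketch_hw_atomic_next(long len)
{
	return len > 8 ? -ENOPROTOOPT : 0;
}

/* Hypothetical software fallback: always succeeds. */
static int sketch_cow_next(long len)
{
	(void)len;
	return 0;
}

/*
 * Pick a callback from the write length, then fall back to the COW
 * callback if the fast path reports -ENOPROTOOPT. *used records which
 * callback finally ran.
 */
static int sketch_atomic_write(long len, long hw_max, sketch_iter_fn *used)
{
	sketch_iter_fn iter_fn;
	int ret;

	assert(used != NULL);

	if (len > hw_max)
		iter_fn = sketch_cow_next;
	else
		iter_fn = sketch_hw_atomic_next;
retry:
	ret = iter_fn(len);
	/* Function pointers compare by address, so no '&' is needed here. */
	if (ret == -ENOPROTOOPT && iter_fn == sketch_hw_atomic_next) {
		iter_fn = sketch_cow_next;
		goto retry;
	}
	*used = iter_fn;
	return ret;
}
```

With a function pointer rather than an ops table, a name like "dops" reads as
if it were a struct iomap_dio_ops pointer, which is the confusion the review
comment points at.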

Thanks,
Joanne


* Re: [PATCH v2 17/18] iomap: pass iomap_next_fn directly instead of struct iomap_ops
From: Darrick J. Wong @ 2026-07-03  1:42 UTC (permalink / raw)
  To: Joanne Koong
  Cc: brauner, hch, willy, hsiangkao, linux-fsdevel, linux-xfs,
	Jens Axboe, Chris Mason, David Sterba, Alexander Viro, Jan Kara,
	Dan Williams, Gao Xiang, Chao Yu, Yue Hu, Jeffle Xu,
	Sandeep Dhavale, Hongbo Li, Chunhai Guo, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Theodore Ts'o, Andreas Dilger,
	Baokun Li, Ojaswin Mujoo, Ritesh Harjani (IBM), Zhang Yi,
	Jaegeuk Kim, Miklos Szeredi, Andreas Gruenbacher, Mikulas Patocka,
	Hyunchul Lee, Konstantin Komarov, Carlos Maiolino, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, open list:BLOCK LAYER,
	open list, open list:BTRFS FILE SYSTEM,
	open list:FILESYSTEM DIRECT ACCESS (DAX),
	open list:EROFS FILE SYSTEM, open list:EXT2 FILE SYSTEM,
	open list:F2FS FILE SYSTEM, open list:FUSE FILESYSTEM [CORE],
	open list:GFS2 FILE SYSTEM, open list:NTFS3 FILESYSTEM
In-Reply-To: <CAJnrk1YW0gKRVvHRC+WeKoV2vrquzaC6UkipZkQ34Z0RAQDjtg@mail.gmail.com>

On Thu, Jul 02, 2026 at 05:17:02PM -0700, Joanne Koong wrote:
> On Thu, Jul 2, 2026 at 9:58 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index 3f0932e46fd6..0aa8abc438c1 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -626,7 +626,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> > >       return 0;
> > >  }
> > >
> > > -void iomap_read_folio(const struct iomap_ops *ops,
> > > +void iomap_read_folio(iomap_next_fn iomap_next,
> >
> > If you took my earlier suggestion to rename the typedef to
> > iomap_iter_fn, then this parameter ought to be named iter_fn.
> 
> Hmm... maybe at that point, it's self-explanatory enough that the arg
> could just be called "iter" instead of "iter_fn"?

Dunno.  Seeing as we already have variables named "iter" that are the
actual iteration state object, I think it's clearer to leave the
iteration function as "iter_fn".
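
A small userspace sketch of the naming point, assuming simplified types: the
mock_* names are made up for illustration and are not the kernel API. The loop
keeps "iter" for the state object and "iter_fn" for the callback, so the two
never collide in one scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iteration state; a stand-in for struct iomap_iter. */
struct mock_iter {
	long pos;	/* current offset in the range */
	long len;	/* bytes left to walk */
	long mapped;	/* size of the mapping from the last callback */
	int status;	/* result of the last loop body */
};

/* Assumed callback shape; the thread calls it iomap_next_fn / iomap_iter_fn. */
typedef int (*mock_iter_fn)(struct mock_iter *iter);

/* Hypothetical filesystem callback: hands out at most 4 bytes per call. */
static int mock_fs_next(struct mock_iter *iter)
{
	if (iter->len == 0)
		return 0;		/* range exhausted */
	iter->mapped = iter->len < 4 ? iter->len : 4;
	return 1;			/* a mapping is available */
}

/* Core helper: propagates a loop-body error, else asks for the next mapping. */
static int mock_iterate(struct mock_iter *iter, mock_iter_fn iter_fn)
{
	assert(iter_fn != NULL);
	if (iter->status < 0)
		return iter->status;
	return iter_fn(iter);
}

/* Loop body: consume the current mapping and count the pass. */
static int mock_consume(struct mock_iter *iter, int *passes)
{
	iter->pos += iter->mapped;
	iter->len -= iter->mapped;
	(*passes)++;
	return 0;
}

/* Caller: "iter" is the state object, "iter_fn" is the callback. */
static int mock_walk(long pos, long len, mock_iter_fn iter_fn, int *passes)
{
	struct mock_iter iter = { .pos = pos, .len = len };
	int ret;

	*passes = 0;
	while ((ret = mock_iterate(&iter, iter_fn)) > 0)
		iter.status = mock_consume(&iter, passes);
	return ret;
}
```

If the parameter were also named "iter", the declaration in mock_walk() would
shadow it, which is the clash described above.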

> >
> > >               struct iomap_read_folio_ctx *ctx, void *private)
> > >  {
> > >       struct folio *folio = ctx->cur_folio;
> > > @@ -650,7 +650,7 @@ void iomap_read_folio(const struct iomap_ops *ops,
> > >               fsverity_readahead(ctx->vi, folio->index,
> > >                                  folio_nr_pages(folio));
> > >
> > > -     while ((ret = iomap_iter(&iter, ops)) > 0) {
> > > +     while ((ret = iomap_iter(&iter, iomap_next)) > 0) {
> > >               iter.status = iomap_read_folio_iter(&iter, ctx,
> > >                               &bytes_submitted);
> > >               iomap_read_submit(&iter, ctx);
> > > @@ -688,22 +688,22 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> > >
> > >  /**
> > >   * iomap_readahead - Attempt to read pages from a file.
> > > - * @ops: The operations vector for the filesystem.
> > > + * @iomap_next: The iomap_next callback for the filesystem.
> >
> > "The iomap iteration function for the filesystem" ?
> >
> > Using the term "iomap_next" in the definition for iomap_next isn't that
> > helpful.
> 
> Agreed, I'll replace this with your suggestion.

<nod>

> >
> > >       return ret;
> > > @@ -824,16 +824,16 @@ xfs_file_dio_write_atomic(
> > >       unsigned int            iolock = XFS_IOLOCK_SHARED;
> > >       ssize_t                 ret, ocount = iov_iter_count(from);
> > >       unsigned int            dio_flags = 0;
> > > -     const struct iomap_ops  *dops;
> > > +     iomap_next_fn           dops;
> > >
> > >       /*
> > >        * HW offload should be faster, so try that first if it is already
> > >        * known that the write length is not too large.
> > >        */
> > >       if (ocount > xfs_inode_buftarg(ip)->bt_awu_max)
> > > -             dops = &xfs_atomic_write_cow_iomap_ops;
> > > +             dops = xfs_atomic_write_cow_iomap_next;
> > >       else
> > > -             dops = &xfs_direct_write_iomap_ops;
> > > +             dops = xfs_direct_write_iomap_next;
> >
> > Probably ought to be called iter_fn, or at least something that isn't
> > "dops".
> 
> Nice spotting, I'll rename this in the next version.

<nod>

--D

> Thanks,
> Joanne
> 


* Re: [PATCH v2 17/18] iomap: pass iomap_next_fn directly instead of struct iomap_ops
From: Joanne Koong @ 2026-07-03  1:47 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, brauner, willy, hsiangkao, linux-fsdevel,
	linux-xfs, Jens Axboe, Chris Mason, David Sterba, Alexander Viro,
	Jan Kara, Dan Williams, Gao Xiang, Chao Yu, Yue Hu, Jeffle Xu,
	Sandeep Dhavale, Hongbo Li, Chunhai Guo, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Theodore Ts'o, Andreas Dilger,
	Baokun Li, Ojaswin Mujoo, Ritesh Harjani (IBM), Zhang Yi,
	Jaegeuk Kim, Miklos Szeredi, Andreas Gruenbacher, Mikulas Patocka,
	Hyunchul Lee, Konstantin Komarov, Carlos Maiolino, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, open list:BLOCK LAYER,
	open list, open list:BTRFS FILE SYSTEM,
	open list:FILESYSTEM DIRECT ACCESS (DAX),
	open list:EROFS FILE SYSTEM, open list:EXT2 FILE SYSTEM,
	open list:F2FS FILE SYSTEM, open list:FUSE FILESYSTEM [CORE],
	open list:GFS2 FILE SYSTEM, open list:NTFS3 FILESYSTEM
In-Reply-To: <20260702165117.GK9392@frogsfrogsfrogs>

On Thu, Jul 2, 2026 at 9:51 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Thu, Jul 02, 2026 at 04:07:05PM +0200, Christoph Hellwig wrote:
> > Looks good:
> >
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> >
> > In terms of merge logistics, I wonder if we should delay this and
> > the previous patch to the next merge window so that we can minimize the
> > cross-subsystem merge pain with more file system iomap conversion.
> > If none of them actually happen until rc6 or so, or if the merges aren't
> > painful we could still pick them up late in the merge window.
>
> I'd say everything but this patch should go in during the merge window
> for 7.3, along with clear instructions to brauner/torvalds to expect
> this patch to appear right before 7.3-rc1 gets tagged, to clean up all
> the other changes that come in.

Just to clarify, did you mean this patch and the previous one? If I'm
interpreting Christoph's concern correctly, I think he's worried about
other filesystems still converting to iomap using the ->iomap_begin() /
->iomap_end() functions? That sounds like a good plan to me. For v3
I'll submit everything but this patch and the last one, and then
submit these patches (and any cleanup ones that become necessary) to
Christian right before 7.3-rc1 gets tagged (which, as I understand it,
is when the merge window is about to close).

Thanks,
Joanne

^ permalink raw reply

* Re: [PATCH v2 17/18] iomap: pass iomap_next_fn directly instead of struct iomap_ops
From: Darrick J. Wong @ 2026-07-03  2:01 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Christoph Hellwig, brauner, willy, hsiangkao, linux-fsdevel,
	linux-xfs, Jens Axboe, Chris Mason, David Sterba, Alexander Viro,
	Jan Kara, Dan Williams, Gao Xiang, Chao Yu, Yue Hu, Jeffle Xu,
	Sandeep Dhavale, Hongbo Li, Chunhai Guo, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Theodore Ts'o, Andreas Dilger,
	Baokun Li, Ojaswin Mujoo, Ritesh Harjani (IBM), Zhang Yi,
	Jaegeuk Kim, Miklos Szeredi, Andreas Gruenbacher, Mikulas Patocka,
	Hyunchul Lee, Konstantin Komarov, Carlos Maiolino, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, open list:BLOCK LAYER,
	open list, open list:BTRFS FILE SYSTEM,
	open list:FILESYSTEM DIRECT ACCESS (DAX),
	open list:EROFS FILE SYSTEM, open list:EXT2 FILE SYSTEM,
	open list:F2FS FILE SYSTEM, open list:FUSE FILESYSTEM [CORE],
	open list:GFS2 FILE SYSTEM, open list:NTFS3 FILESYSTEM
In-Reply-To: <CAJnrk1b8j5WHtbHOWNXc4=QBFOxde1f2QxTOeui7Ta8O-xWcTA@mail.gmail.com>

On Thu, Jul 02, 2026 at 06:47:43PM -0700, Joanne Koong wrote:
> On Thu, Jul 2, 2026 at 9:51 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Thu, Jul 02, 2026 at 04:07:05PM +0200, Christoph Hellwig wrote:
> > > Looks good:
> > >
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > >
> > > In terms of merge logistics, I wonder if we should delay this and
> > > the previous patch to the next merge window so that we can minimize the
> > > cross-subsystem merge pain with more file system iomap conversion.
> > > If none of them actually happen until rc6 or so, or if the merges aren't
> > > painful we could still pick them up late in the merge window.
> >
> > I'd say everything but this patch should go in during the merge window
> > for 7.3, along with clear instructions to brauner/torvalds to expect
> > this patch to appear right before 7.3-rc1 gets tagged, to clean up all
> > the other changes that come in.
> 
> Just to clarify, did you mean this patch and the previous one? If I'm

Er, yes, patches 16-18 in this series.

> interpreting Christoph's concern correctly, I think he's worried about
> other filesystems still converting to iomap using the ->iomap_begin() /
> ->iomap_end() functions? That sounds like a good plan to me. For v3
> I'll submit everything but this patch and the last one, and then
> submit these patches (and any cleanup ones that become necessary) to
> Christian right before 7.3-rc1 gets tagged (which, as I understand it,
> is when the merge window is about to close).

Yes.  And be sure to ask both of them beforehand so there aren't any
youknowwho-style surprises/outrages.

--D

> Thanks,
> Joanne
> 

^ permalink raw reply

* [PATCH] block: reject block device inodes with i_rdev == 0 in lookup_bdev()
From: Yun Zhou @ 2026-07-03  6:55 UTC (permalink / raw)
  To: axboe, brauner; +Cc: linux-block, linux-kernel, yun.zhou

lookup_bdev() blindly returns inode->i_rdev without validating it.
When a FUSE filesystem exposes a root inode with S_IFBLK mode but
i_rdev == 0 (via rootmode=060000), any subsequent mount attempt using
that path as a block device source propagates dev_t 0 into the
superblock machinery.  After commit 9ee5f161a4db ("fs: maintain a
global device-to-superblock table") this triggers a WARNING in
super_dev_register().

Reject i_rdev == 0 early with -ENODEV since no real block device
driver registers major 0.

Reported-by: syzbot+72fe3ea5814121fbc76e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=72fe3ea5814121fbc76e
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
 block/bdev.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/block/bdev.c b/block/bdev.c
index 28b0d40c362f..797d7f0ef609 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1278,6 +1278,18 @@ int lookup_bdev(const char *pathname, dev_t *dev)
 	if (!may_open_dev(&path))
 		goto out_path_put;
 
+	/*
+	 * Reject a block device inode with i_rdev == 0.  A dev_t of 0 is
+	 * never valid for a block device: no real block device driver
+	 * registers major 0.  Fake block device inodes (e.g. fuse with
+	 * rootmode=S_IFBLK) can expose i_rdev == 0, and letting that
+	 * propagate would confuse superblock lookup and trigger warnings
+	 * in the device-to-superblock table (super_dev_register).
+	 */
+	error = -ENODEV;
+	if (!inode->i_rdev)
+		goto out_path_put;
+
 	*dev = inode->i_rdev;
 	error = 0;
 out_path_put:
-- 
2.43.0


^ permalink raw reply related
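
The check added by the patch above can be modeled in userspace as follows. This is only a sketch under stated assumptions: model_dev_t and model_validate_rdev() are hypothetical names, and the real lookup_bdev() also performs path lookup and permission checks that are omitted here. It shows only the ordering the patch relies on: reject a zero device number with -ENODEV before writing to the output parameter.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's dev_t. */
typedef uint32_t model_dev_t;

/*
 * Model of the validation step: a device number of 0 is treated as
 * "no device" and rejected before *dev is written, so the caller's
 * value is left untouched on failure.
 */
int model_validate_rdev(model_dev_t i_rdev, model_dev_t *dev)
{
	if (!i_rdev)
		return -ENODEV;

	*dev = i_rdev;
	return 0;
}
```

Because the failure path returns before the assignment, a caller that ignores the error code still never sees a zero device number through the output parameter.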
