* [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
@ 2025-10-14 14:47 Uladzislau Rezki (Sony)
2025-10-16 19:59 ` Andrew Morton
0 siblings, 1 reply; 33+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-10-14 14:47 UTC (permalink / raw)
To: Mikulas Patocka, Alasdair Kergon
Cc: Mike Snitzer, Christoph Hellwig, LKML, DMML, Uladzislau Rezki
When performing a read-modify-write (RMW) operation, any modification
to a buffered block must cause the entire buffer to be marked dirty.
Marking only a subrange as dirty is incorrect because the underlying
device block size (ubs) defines the minimum read/write granularity. A
lower device can perform I/O only on regions which are fully aligned
and sized to ubs.
This change ensures that write-back operations always occur in full
ubs-sized chunks, matching the intended emulation semantics of the
EBS target.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
drivers/md/dm-ebs-target.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/dm-ebs-target.c b/drivers/md/dm-ebs-target.c
index 6abb31ca9662..b354e74a670e 100644
--- a/drivers/md/dm-ebs-target.c
+++ b/drivers/md/dm-ebs-target.c
@@ -103,7 +103,7 @@ static int __ebs_rw_bvec(struct ebs_c *ec, enum req_op op, struct bio_vec *bv,
} else {
flush_dcache_page(bv->bv_page);
memcpy(ba, pa, cur_len);
- dm_bufio_mark_partial_buffer_dirty(b, buf_off, buf_off + cur_len);
+ dm_bufio_mark_buffer_dirty(b);
}
dm_bufio_release(b);
--
2.47.3
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-10-14 14:47 [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write Uladzislau Rezki (Sony)
@ 2025-10-16 19:59 ` Andrew Morton
2025-10-17 15:55 ` Uladzislau Rezki
0 siblings, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2025-10-16 19:59 UTC (permalink / raw)
To: Uladzislau Rezki (Sony)
Cc: Mikulas Patocka, Alasdair Kergon, Mike Snitzer, Christoph Hellwig,
LKML, DMML
On Tue, 14 Oct 2025 16:47:31 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:
> When performing a read-modify-write(RMW) operation, any modification
> to a buffered block must cause the entire buffer to be marked dirty.
>
> Marking only a subrange as dirty is incorrect because the underlying
> device block size(ubs) defines the minimum read/write granularity. A
> lower device can perform I/O only on regions which are fully aligned
> and sized to ubs.
>
> This change ensures that write-back operations always occur in full
> ubs-sized chunks, matching the intended emulation semantics of the
> EBS target.
It sounds like this can result in corruption under some circumstances?
It would be helpful if you could spell this out clearly, please. What
are the userspace-visible effects of this bug and how are those effects
demonstrated?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-10-16 19:59 ` Andrew Morton
@ 2025-10-17 15:55 ` Uladzislau Rezki
0 siblings, 0 replies; 33+ messages in thread
From: Uladzislau Rezki @ 2025-10-17 15:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Uladzislau Rezki (Sony), Mikulas Patocka, Alasdair Kergon,
Mike Snitzer, Christoph Hellwig, LKML, DMML
On Thu, Oct 16, 2025 at 12:59:51PM -0700, Andrew Morton wrote:
> On Tue, 14 Oct 2025 16:47:31 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:
>
> > When performing a read-modify-write(RMW) operation, any modification
> > to a buffered block must cause the entire buffer to be marked dirty.
> >
> > Marking only a subrange as dirty is incorrect because the underlying
> > device block size(ubs) defines the minimum read/write granularity. A
> > lower device can perform I/O only on regions which are fully aligned
> > and sized to ubs.
> >
> > This change ensures that write-back operations always occur in full
> > ubs-sized chunks, matching the intended emulation semantics of the
> > EBS target.
>
> It sounds like this can result in corruption under some circumstances?
>
> It would be helpful if you could spell this out clearly, please. What
> are the userspace-visible effects of this bug and how are those effects
> demonstrated?
See below:
<snip>
commit 333b5e9ff2ccb35c3040fa8b0fd7011dfd42aae2
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date: Wed Oct 8 19:49:50 2025 +0200
dm-ebs: Mark full buffer dirty even on partial write
When performing a read-modify-write (RMW) operation, any modification
to a buffered block must cause the entire buffer to be marked dirty.
Marking only a subrange as dirty is incorrect because the underlying
device block size (ubs) defines the minimum read/write granularity. A
lower device can perform I/O only on regions which are fully aligned
and sized to ubs.
This change ensures that write-back operations always occur in full
ubs-sized chunks, matching the intended emulation semantics of the
EBS target.
As for the user-space-visible impact: a device which accepts I/O only
in ubs-sized, ubs-aligned units will reject sub-ubs and misaligned
requests, which can lead to data loss. Example:
1) Create an 8K nvme device in qemu by adding
-device nvme,drive=drv0,serial=foo,logical_block_size=8192,physical_block_size=8192
2) Setup dm-ebs to emulate 512B to 8K mapping.
urezki@pc638:~/bin$ cat dmsetup.sh
lower=/dev/nvme0n1
len=$(blockdev --getsz "$lower")
echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k
urezki@pc638:~/bin$
offset 0, ebs=1 and ubs=16 (in sectors).
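For reference, here is the mapping arithmetic these parameters imply, as a
stand-alone user-space sketch (plain C, not the driver code; the example
sector is only illustrative):

#include <stdio.h>

int main(void)
{
	/* ebs = 1 and ubs = 16, both in 512-byte sectors, as in the table above. */
	unsigned int u_bs = 16;
	unsigned long long sector = 37;			/* example emulated sector  */

	unsigned long long block = sector / u_bs;	/* underlying 8 KiB block   */
	unsigned int buf_off = (sector % u_bs) * 512;	/* byte offset in buffer    */

	printf("sector %llu -> block %llu, offset %u, write-back unit %u bytes\n",
	       sector, block, buf_off, u_bs * 512);
	return 0;
}

Any write that lands inside such a block is served through an 8K buffer, and
write-back has to cover the whole buffer.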
3) Create an ext4 filesystem (default 4K block size)
urezki@pc638:~/bin$ sudo mkfs.ext4 -F /dev/dm-0
mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done
Creating filesystem with 2072576 4k blocks and 518144 inodes
Filesystem UUID: bd0b6ca6-0506-4e31-86da-8d22c9d50b63
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
urezki@pc638:~/bin$ dmesg
<snip>
[ 1618.875449] buffer_io_error: 1028 callbacks suppressed
[ 1618.875456] Buffer I/O error on dev dm-0, logical block 0, lost async page write
[ 1618.875527] Buffer I/O error on dev dm-0, logical block 1, lost async page write
[ 1618.875602] Buffer I/O error on dev dm-0, logical block 2, lost async page write
[ 1618.875620] Buffer I/O error on dev dm-0, logical block 3, lost async page write
[ 1618.875639] Buffer I/O error on dev dm-0, logical block 4, lost async page write
[ 1618.894316] Buffer I/O error on dev dm-0, logical block 5, lost async page write
[ 1618.894358] Buffer I/O error on dev dm-0, logical block 6, lost async page write
[ 1618.894380] Buffer I/O error on dev dm-0, logical block 7, lost async page write
[ 1618.894405] Buffer I/O error on dev dm-0, logical block 8, lost async page write
[ 1618.894427] Buffer I/O error on dev dm-0, logical block 9, lost async page write
<snip>
Many I/O errors occur because the lower 8K device rejects
sub-ubs/misaligned requests.
With the patch applied:
urezki@pc638:~/bin$ sudo mkfs.ext4 -F /dev/dm-0
mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done
Creating filesystem with 2072576 4k blocks and 518144 inodes
Filesystem UUID: 9b54f44f-ef55-4bd4-9e40-c8b775a616ac
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
urezki@pc638:~/bin$ sudo mount /dev/dm-0 /mnt/
urezki@pc638:~/bin$ ls -al /mnt/
total 24
drwxr-xr-x 3 root root 4096 Oct 17 15:13 .
drwxr-xr-x 19 root root 4096 Jul 10 19:42 ..
drwx------ 2 root root 16384 Oct 17 15:13 lost+found
urezki@pc638:~/bin$
After this change: mkfs completes; mount succeeds.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
diff --git a/drivers/md/dm-ebs-target.c b/drivers/md/dm-ebs-target.c
index 6abb31ca9662..b354e74a670e 100644
--- a/drivers/md/dm-ebs-target.c
+++ b/drivers/md/dm-ebs-target.c
@@ -103,7 +103,7 @@ static int __ebs_rw_bvec(struct ebs_c *ec, enum req_op op, struct bio_vec *bv,
} else {
flush_dcache_page(bv->bv_page);
memcpy(ba, pa, cur_len);
- dm_bufio_mark_partial_buffer_dirty(b, buf_off, buf_off + cur_len);
+ dm_bufio_mark_buffer_dirty(b);
}
dm_bufio_release(b);
<snip>
--
Uladzislau Rezki
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
@ 2025-11-17 10:59 Uladzislau Rezki (Sony)
2025-11-17 20:48 ` Mikulas Patocka
0 siblings, 1 reply; 33+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-11-17 10:59 UTC (permalink / raw)
To: Mikulas Patocka, Alasdair Kergon, DMML
Cc: Andrew Morton, Mike Snitzer, Christoph Hellwig, LKML,
Uladzislau Rezki
When performing a read-modify-write (RMW) operation, any modification
to a buffered block must cause the entire buffer to be marked dirty.
Marking only a subrange as dirty is incorrect because the underlying
device block size (ubs) defines the minimum read/write granularity. A
lower device can perform I/O only on regions which are fully aligned
and sized to ubs.
This change ensures that write-back operations always occur in full
ubs-sized chunks, matching the intended emulation semantics of the
EBS target.
As for the user-space-visible impact: a device which accepts I/O only
in ubs-sized, ubs-aligned units will reject sub-ubs and misaligned
requests, which can lead to data loss. Example:
1) Create an 8K nvme device in qemu by adding
-device nvme,drive=drv0,serial=foo,logical_block_size=8192,physical_block_size=8192
2) Setup dm-ebs to emulate 512B to 8K mapping
urezki@pc638:~/bin$ cat dmsetup.sh
lower=/dev/nvme0n1
len=$(blockdev --getsz "$lower")
echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k
urezki@pc638:~/bin$
offset 0, ebs=1 and ubs=16 (in sectors).
3) Create an ext4 filesystem (default 4K block size)
urezki@pc638:~/bin$ sudo mkfs.ext4 -F /dev/dm-0
mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done
Creating filesystem with 2072576 4k blocks and 518144 inodes
Filesystem UUID: bd0b6ca6-0506-4e31-86da-8d22c9d50b63
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
urezki@pc638:~/bin$ dmesg
<snip>
[ 1618.875449] buffer_io_error: 1028 callbacks suppressed
[ 1618.875456] Buffer I/O error on dev dm-0, logical block 0, lost async page write
[ 1618.875527] Buffer I/O error on dev dm-0, logical block 1, lost async page write
[ 1618.875602] Buffer I/O error on dev dm-0, logical block 2, lost async page write
[ 1618.875620] Buffer I/O error on dev dm-0, logical block 3, lost async page write
[ 1618.875639] Buffer I/O error on dev dm-0, logical block 4, lost async page write
[ 1618.894316] Buffer I/O error on dev dm-0, logical block 5, lost async page write
[ 1618.894358] Buffer I/O error on dev dm-0, logical block 6, lost async page write
[ 1618.894380] Buffer I/O error on dev dm-0, logical block 7, lost async page write
[ 1618.894405] Buffer I/O error on dev dm-0, logical block 8, lost async page write
[ 1618.894427] Buffer I/O error on dev dm-0, logical block 9, lost async page write
<snip>
Many I/O errors occur because the lower 8K device rejects
sub-ubs/misaligned requests.
With the patch applied:
urezki@pc638:~/bin$ sudo mkfs.ext4 -F /dev/dm-0
mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done
Creating filesystem with 2072576 4k blocks and 518144 inodes
Filesystem UUID: 9b54f44f-ef55-4bd4-9e40-c8b775a616ac
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
urezki@pc638:~/bin$ sudo mount /dev/dm-0 /mnt/
urezki@pc638:~/bin$ ls -al /mnt/
total 24
drwxr-xr-x 3 root root 4096 Oct 17 15:13 .
drwxr-xr-x 19 root root 4096 Jul 10 19:42 ..
drwx------ 2 root root 16384 Oct 17 15:13 lost+found
urezki@pc638:~/bin$
After this change: mkfs completes; mount succeeds.
v1 -> v2:
- describe the user-space-visible impact in the commit message.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
drivers/md/dm-ebs-target.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/dm-ebs-target.c b/drivers/md/dm-ebs-target.c
index 6abb31ca9662..b354e74a670e 100644
--- a/drivers/md/dm-ebs-target.c
+++ b/drivers/md/dm-ebs-target.c
@@ -103,7 +103,7 @@ static int __ebs_rw_bvec(struct ebs_c *ec, enum req_op op, struct bio_vec *bv,
} else {
flush_dcache_page(bv->bv_page);
memcpy(ba, pa, cur_len);
- dm_bufio_mark_partial_buffer_dirty(b, buf_off, buf_off + cur_len);
+ dm_bufio_mark_buffer_dirty(b);
}
dm_bufio_release(b);
--
2.47.3
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-17 10:59 Uladzislau Rezki (Sony)
@ 2025-11-17 20:48 ` Mikulas Patocka
2025-11-18 11:39 ` Uladzislau Rezki
0 siblings, 1 reply; 33+ messages in thread
From: Mikulas Patocka @ 2025-11-17 20:48 UTC (permalink / raw)
To: Uladzislau Rezki (Sony)
Cc: Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer,
Christoph Hellwig, LKML
Hi
What is the logical_block_size of the underlying nvme device? - i.e.
what's the content of this file
/sys/block/nvme0n1/queue/logical_block_size in the virtual machine?
Mikulas
On Mon, 17 Nov 2025, Uladzislau Rezki (Sony) wrote:
> When performing a read-modify-write(RMW) operation, any modification
> to a buffered block must cause the entire buffer to be marked dirty.
>
> Marking only a subrange as dirty is incorrect because the underlying
> device block size(ubs) defines the minimum read/write granularity. A
> lower device can perform I/O only on regions which are fully aligned
> and sized to ubs.
>
> This change ensures that write-back operations always occur in full
> ubs-sized chunks, matching the intended emulation semantics of the
> EBS target.
>
> As for user space visible impact, submitting sub-ubs and misaligned
> I/O for devices which are tuned to ubs sizes only, will reject such
> requests, therefore it can lead to losing data. Example:
>
> 1) Create a 8K nvme device in qemu by adding
>
> -device nvme,drive=drv0,serial=foo,logical_block_size=8192,physical_block_size=8192
>
> 2) Setup dm-ebs to emulate 512B to 8K mapping
>
> urezki@pc638:~/bin$ cat dmsetup.sh
>
> lower=/dev/nvme0n1
> len=$(blockdev --getsz "$lower")
>
> echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k
> urezki@pc638:~/bin$
>
> offset 0, ebs=1 and ubs=16(in sectors).
>
> 3) Create an ext4 filesystem(default 4K block size)
>
> urezki@pc638:~/bin$ sudo mkfs.ext4 -F /dev/dm-0
> mke2fs 1.47.0 (5-Feb-2023)
> Discarding device blocks: done
> Creating filesystem with 2072576 4k blocks and 518144 inodes
> Filesystem UUID: bd0b6ca6-0506-4e31-86da-8d22c9d50b63
> Superblock backups stored on blocks:
> 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
>
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (16384 blocks): done
> Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
> urezki@pc638:~/bin$ dmesg
>
> <snip>
> [ 1618.875449] buffer_io_error: 1028 callbacks suppressed
> [ 1618.875456] Buffer I/O error on dev dm-0, logical block 0, lost async page write
> [ 1618.875527] Buffer I/O error on dev dm-0, logical block 1, lost async page write
> [ 1618.875602] Buffer I/O error on dev dm-0, logical block 2, lost async page write
> [ 1618.875620] Buffer I/O error on dev dm-0, logical block 3, lost async page write
> [ 1618.875639] Buffer I/O error on dev dm-0, logical block 4, lost async page write
> [ 1618.894316] Buffer I/O error on dev dm-0, logical block 5, lost async page write
> [ 1618.894358] Buffer I/O error on dev dm-0, logical block 6, lost async page write
> [ 1618.894380] Buffer I/O error on dev dm-0, logical block 7, lost async page write
> [ 1618.894405] Buffer I/O error on dev dm-0, logical block 8, lost async page write
> [ 1618.894427] Buffer I/O error on dev dm-0, logical block 9, lost async page write
> <snip>
>
> Many I/O errors because the lower 8K device rejects sub-ubs/misaligned
> requests.
>
> with a patch:
>
> urezki@pc638:~/bin$ sudo mkfs.ext4 -F /dev/dm-0
> mke2fs 1.47.0 (5-Feb-2023)
> Discarding device blocks: done
> Creating filesystem with 2072576 4k blocks and 518144 inodes
> Filesystem UUID: 9b54f44f-ef55-4bd4-9e40-c8b775a616ac
> Superblock backups stored on blocks:
> 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
>
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (16384 blocks): done
> Writing superblocks and filesystem accounting information: done
>
> urezki@pc638:~/bin$ sudo mount /dev/dm-0 /mnt/
> urezki@pc638:~/bin$ ls -al /mnt/
> total 24
> drwxr-xr-x 3 root root 4096 Oct 17 15:13 .
> drwxr-xr-x 19 root root 4096 Jul 10 19:42 ..
> drwx------ 2 root root 16384 Oct 17 15:13 lost+found
> urezki@pc638:~/bin$
>
> After this change: mkfs completes; mount succeeds.
>
> v1 -> v2:
> - reflect a user space visible impact in the commit message.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
> drivers/md/dm-ebs-target.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/dm-ebs-target.c b/drivers/md/dm-ebs-target.c
> index 6abb31ca9662..b354e74a670e 100644
> --- a/drivers/md/dm-ebs-target.c
> +++ b/drivers/md/dm-ebs-target.c
> @@ -103,7 +103,7 @@ static int __ebs_rw_bvec(struct ebs_c *ec, enum req_op op, struct bio_vec *bv,
> } else {
> flush_dcache_page(bv->bv_page);
> memcpy(ba, pa, cur_len);
> - dm_bufio_mark_partial_buffer_dirty(b, buf_off, buf_off + cur_len);
> + dm_bufio_mark_buffer_dirty(b);
> }
>
> dm_bufio_release(b);
> --
> 2.47.3
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-17 20:48 ` Mikulas Patocka
@ 2025-11-18 11:39 ` Uladzislau Rezki
2025-11-18 12:00 ` Mikulas Patocka
0 siblings, 1 reply; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-18 11:39 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Uladzislau Rezki (Sony), Alasdair Kergon, DMML, Andrew Morton,
Mike Snitzer, Christoph Hellwig, LKML
Hello, Mikulas!
> Hi
>
> What is the logical_block_size of the underlying nvme device? - i.e.
> what's the content of this file
> /sys/block/nvme0n1/queue/logical_block_size in the virtual machine?
>
It is 512, whereas the physical block size is bigger, i.e. my device
cannot perform I/O at 512-byte granularity.
As for the virtual machine, I just simulated the problem so people can
set it up and check. The commit message describes how it can be reproduced.
The dm-ebs target I set up does ebs-to-ubs conversion, so the NVMe
driver gets BIOs that are sized and aligned to ubs. The ubs size
corresponds to the I/O size of the underlying physical device.
So your patch does not work if logical < physical, and therefore it
does not help my project.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-18 11:39 ` Uladzislau Rezki
@ 2025-11-18 12:00 ` Mikulas Patocka
2025-11-18 12:40 ` Uladzislau Rezki
2025-11-18 14:15 ` Benjamin Marzinski
0 siblings, 2 replies; 33+ messages in thread
From: Mikulas Patocka @ 2025-11-18 12:00 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer,
Christoph Hellwig, LKML
On Tue, 18 Nov 2025, Uladzislau Rezki wrote:
> Hello, Mikulas!
>
> > Hi
> >
> > What is the logical_block_size of the underlying nvme device? - i.e.
> > what's the content of this file
> > /sys/block/nvme0n1/queue/logical_block_size in the virtual machine?
> >
> It is 512. Whereas a physical is bigger, i.e. my device can not perform
> I/O by 512 granularity.
And what is physical block size? Is it 8192?
> As for virtual machine, i just simulated the problem so people can set
> it up and check. The commit message describes how it can be reproduced.
>
> The dm-ebs target which i setup does ebs to ubs conversion, so the NVME
> driver gets BIOs are in size and aligned to ubs size. The ubs size
> corresponds to the underlying physical device I/O size.
>
> So your patch does not work if logical < physical. Therefore it does
> not help my project.
Logical block size is the granularity at which the device can accept I/O.
Physical block size is the block size on the medium.
If logical < physical, then the device performs read-modify-write cycle
when writing blocks that are not aligned at physical block size.
So, your setup is broken, because it advertises logical block size 512,
but it is not able to perform I/O at this granularity.
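Roughly, that internal read-modify-write cycle looks like this (a simplified
sketch, not any particular firmware, assuming the write does not cross a
physical block boundary):

#include <string.h>

/*
 * Write 'len' bytes at byte offset 'pos' on a medium that can only be
 * read and written in whole 'phys'-sized blocks (phys <= 8192 here).
 */
void rmw_write(unsigned char *medium, unsigned int phys,
	       unsigned long long pos, const unsigned char *data,
	       unsigned int len)
{
	unsigned char buf[8192];
	unsigned long long blk = pos / phys;		/* physical block being hit */
	unsigned int off = (unsigned int)(pos % phys);	/* offset inside that block */

	memcpy(buf, medium + blk * phys, phys);		/* read the whole block     */
	memcpy(buf + off, data, len);			/* modify the sub-range     */
	memcpy(medium + blk * phys, buf, phys);		/* write the whole block    */
}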
There is this piece of code in include/linux/blkdev.h:
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/*
* We should strive for 1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER)
* however we constrain this to what we can validate and test.
*/
#define BLK_MAX_BLOCK_SIZE SZ_64K
#else
#define BLK_MAX_BLOCK_SIZE PAGE_SIZE
#endif
/* blk_validate_limits() validates bsize, so drivers don't usually need to */
static inline int blk_validate_block_size(unsigned long bsize)
{
if (bsize < 512 || bsize > BLK_MAX_BLOCK_SIZE || !is_power_of_2(bsize))
return -EINVAL;
return 0;
}
What happens when you define CONFIG_TRANSPARENT_HUGEPAGE in your .config?
Does it fix the problem with small logical block size for you?
Mikulas
> --
> Uladzislau Rezki
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-18 12:00 ` Mikulas Patocka
@ 2025-11-18 12:40 ` Uladzislau Rezki
2025-11-18 12:46 ` Christoph Hellwig
2025-11-18 14:15 ` Benjamin Marzinski
1 sibling, 1 reply; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-18 12:40 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Uladzislau Rezki, Alasdair Kergon, DMML, Andrew Morton,
Mike Snitzer, Christoph Hellwig, LKML
On Tue, Nov 18, 2025 at 01:00:36PM +0100, Mikulas Patocka wrote:
>
>
> On Tue, 18 Nov 2025, Uladzislau Rezki wrote:
>
> > Hello, Mikulas!
> >
> > > Hi
> > >
> > > What is the logical_block_size of the underlying nvme device? - i.e.
> > > what's the content of this file
> > > /sys/block/nvme0n1/queue/logical_block_size in the virtual machine?
> > >
> > It is 512. Whereas a physical is bigger, i.e. my device can not perform
> > I/O by 512 granularity.
>
> And what is physical block size? Is it 8192?
>
Bigger than logical.
> > As for virtual machine, i just simulated the problem so people can set
> > it up and check. The commit message describes how it can be reproduced.
> >
> > The dm-ebs target which i setup does ebs to ubs conversion, so the NVME
> > driver gets BIOs are in size and aligned to ubs size. The ubs size
> > corresponds to the underlying physical device I/O size.
> >
> > So your patch does not work if logical < physical. Therefore it does
> > not help my project.
>
> Logical block size is the granularity at which the device can accept I/O.
> Physical block size is the block size on the medium.
>
> If logical < physical, then the device performs read-modify-write cycle
> when writing blocks that are not aligned at physical block size.
>
This is not true; it depends on your device and its specification. If
the device can't do that itself, dm-ebs is there to do the job.
> So, your setup is broken, because it advertises logical block size 512,
> but it is not able to perform I/O at this granularity.
>
I posted the workflow for reproducing the problem; see the commit
messages. But, as I noted, that is just so people can simulate it.
In my real case, logical < physical.
> There is this piece of code in include/linux/blkdev.h:
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> /*
> * We should strive for 1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER)
> * however we constrain this to what we can validate and test.
> */
> #define BLK_MAX_BLOCK_SIZE SZ_64K
> #else
> #define BLK_MAX_BLOCK_SIZE PAGE_SIZE
> #endif
>
> /* blk_validate_limits() validates bsize, so drivers don't usually need to */
> static inline int blk_validate_block_size(unsigned long bsize)
> {
> if (bsize < 512 || bsize > BLK_MAX_BLOCK_SIZE || !is_power_of_2(bsize))
> return -EINVAL;
>
> return 0;
> }
>
> What happens when you define CONFIG_TRANSPARENT_HUGEPAGE in your .config?
> Does it fix the problem with small logical block size for you?
>
The TRANSPARENT_HUGEPAGE stuff allows you to work with PAGE_SIZE <
block size. I have it enabled in my case.
Just to repeat, the device cannot do I/O at the logical block size,
only at the physical one.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-18 12:40 ` Uladzislau Rezki
@ 2025-11-18 12:46 ` Christoph Hellwig
0 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2025-11-18 12:46 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Mikulas Patocka, Alasdair Kergon, DMML, Andrew Morton,
Mike Snitzer, Christoph Hellwig, LKML
On Tue, Nov 18, 2025 at 01:40:28PM +0100, Uladzislau Rezki wrote:
> > If logical < physical, then the device performs read-modify-write cycle
> > when writing blocks that are not aligned at physical block size.
> >
> This is not true. It depends on your device and specification. If it
> can't there is the dm-ebs that does the job.
Logical block size is the access unit. Physical block size is a hint
that the device might be doing RMW internally and thus be slow. What
driver/device are you using? Whatever it is doing is completely broken.
(That being said, aligning things to the physical block size as much
as possible is usually still a good optimization; that's why the
value exists.)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-18 12:00 ` Mikulas Patocka
2025-11-18 12:40 ` Uladzislau Rezki
@ 2025-11-18 14:15 ` Benjamin Marzinski
2025-11-18 17:21 ` Mikulas Patocka
1 sibling, 1 reply; 33+ messages in thread
From: Benjamin Marzinski @ 2025-11-18 14:15 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Uladzislau Rezki, Alasdair Kergon, DMML, Andrew Morton,
Mike Snitzer, Christoph Hellwig, LKML
On Tue, Nov 18, 2025 at 01:00:36PM +0100, Mikulas Patocka wrote:
>
>
> On Tue, 18 Nov 2025, Uladzislau Rezki wrote:
>
> > Hello, Mikulas!
> >
> > > Hi
> > >
> > > What is the logical_block_size of the underlying nvme device? - i.e.
> > > what's the content of this file
> > > /sys/block/nvme0n1/queue/logical_block_size in the virtual machine?
> > >
> > It is 512. Whereas a physical is bigger, i.e. my device can not perform
> > I/O by 512 granularity.
>
> And what is physical block size? Is it 8192?
>
> > As for virtual machine, i just simulated the problem so people can set
> > it up and check. The commit message describes how it can be reproduced.
> >
> > The dm-ebs target which i setup does ebs to ubs conversion, so the NVME
> > driver gets BIOs are in size and aligned to ubs size. The ubs size
> > corresponds to the underlying physical device I/O size.
> >
> > So your patch does not work if logical < physical. Therefore it does
> > not help my project.
>
> Logical block size is the granularity at which the device can accept I/O.
> Physical block size is the block size on the medium.
>
> If logical < physical, then the device performs read-modify-write cycle
> when writing blocks that are not aligned at physical block size.
>
> So, your setup is broken, because it advertises logical block size 512,
> but it is not able to perform I/O at this granularity.
This emulated nvme device is broken, but the question still remains:
"should dm-ebs enforce writing at the alignment that was specified in its
table line?" If you don't specify a ubs in the table line, it defaults
to the logical block size. So, if you do specify a ubs, it stands to
reason that you want I/O at that alignment instead of the logical block
size (perhaps because your device is broken and advertises the wrong
logical block size).
So, I think that Uladzislau's patch makes sense, in addition to your
own.
-Ben
> There is this piece of code in include/linux/blkdev.h:
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> /*
> * We should strive for 1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER)
> * however we constrain this to what we can validate and test.
> */
> #define BLK_MAX_BLOCK_SIZE SZ_64K
> #else
> #define BLK_MAX_BLOCK_SIZE PAGE_SIZE
> #endif
>
> /* blk_validate_limits() validates bsize, so drivers don't usually need to */
> static inline int blk_validate_block_size(unsigned long bsize)
> {
> if (bsize < 512 || bsize > BLK_MAX_BLOCK_SIZE || !is_power_of_2(bsize))
> return -EINVAL;
>
> return 0;
> }
>
> What happens when you define CONFIG_TRANSPARENT_HUGEPAGE in your .config?
> Does it fix the problem with small logical block size for you?
>
> Mikulas
>
> > --
> > Uladzislau Rezki
> >
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-18 14:15 ` Benjamin Marzinski
@ 2025-11-18 17:21 ` Mikulas Patocka
2025-11-19 5:46 ` Christoph Hellwig
0 siblings, 1 reply; 33+ messages in thread
From: Mikulas Patocka @ 2025-11-18 17:21 UTC (permalink / raw)
To: Benjamin Marzinski
Cc: Uladzislau Rezki, Alasdair Kergon, DMML, Andrew Morton,
Mike Snitzer, Christoph Hellwig, LKML
On Tue, 18 Nov 2025, Benjamin Marzinski wrote:
> On Tue, Nov 18, 2025 at 01:00:36PM +0100, Mikulas Patocka wrote:
> >
> >
> > On Tue, 18 Nov 2025, Uladzislau Rezki wrote:
> >
> > > Hello, Mikulas!
> > >
> > > > Hi
> > > >
> > > > What is the logical_block_size of the underlying nvme device? - i.e.
> > > > what's the content of this file
> > > > /sys/block/nvme0n1/queue/logical_block_size in the virtual machine?
> > > >
> > > It is 512. Whereas a physical is bigger, i.e. my device can not perform
> > > I/O by 512 granularity.
> >
> > And what is physical block size? Is it 8192?
> >
> > > As for virtual machine, i just simulated the problem so people can set
> > > it up and check. The commit message describes how it can be reproduced.
> > >
> > > The dm-ebs target which i setup does ebs to ubs conversion, so the NVME
> > > driver gets BIOs are in size and aligned to ubs size. The ubs size
> > > corresponds to the underlying physical device I/O size.
> > >
> > > So your patch does not work if logical < physical. Therefore it does
> > > not help my project.
> >
> > Logical block size is the granularity at which the device can accept I/O.
> > Physical block size is the block size on the medium.
> >
> > If logical < physical, then the device performs read-modify-write cycle
> > when writing blocks that are not aligned at physical block size.
> >
> > So, your setup is broken, because it advertises logical block size 512,
> > but it is not able to perform I/O at this granularity.
>
> This emulated nvme device is broken, but the question still remains,
> "should dm-ebs enfore writing at the alignment that was specified in its
> table line?" If you don't specify a ubs in the table line, it defaults
> to the logical block size. So, if you do specify a ubs, it stands to
> reason that you want IO at that alignment, instead of the logical-block
> size (perhaps because your device is broken, and advertises the wrong
> logical block size).
>
> So, I think that Uladzislau's patch makes sense, in addition to your
> own.
>
> -Ben
OK - I accepted Uladzislau's patch. As logical block size and physical
block size seem to be unreliable, it's better to set the size in dm-ebs.
Mikulas
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-18 17:21 ` Mikulas Patocka
@ 2025-11-19 5:46 ` Christoph Hellwig
2025-11-19 8:43 ` Uladzislau Rezki
2025-11-19 17:26 ` Mikulas Patocka
0 siblings, 2 replies; 33+ messages in thread
From: Christoph Hellwig @ 2025-11-19 5:46 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Benjamin Marzinski, Uladzislau Rezki, Alasdair Kergon, DMML,
Andrew Morton, Mike Snitzer, Christoph Hellwig, LKML
On Tue, Nov 18, 2025 at 06:21:56PM +0100, Mikulas Patocka wrote:
> OK - I accepted Uladzislau's patch. As logical block size and physical
> block size seem to be unreliable, it's better to set the size in dm-ebs.
logical and physical block size are reliable. Uladzislau just seems
to have a completely broken device that needs fixing, because it will
run into all kinds of other problems otherwise.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-19 5:46 ` Christoph Hellwig
@ 2025-11-19 8:43 ` Uladzislau Rezki
2025-11-19 8:53 ` Christoph Hellwig
2025-11-19 17:26 ` Mikulas Patocka
1 sibling, 1 reply; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-19 8:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mikulas Patocka, Benjamin Marzinski, Uladzislau Rezki,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Wed, Nov 19, 2025 at 06:46:35AM +0100, Christoph Hellwig wrote:
> On Tue, Nov 18, 2025 at 06:21:56PM +0100, Mikulas Patocka wrote:
> > OK - I accepted Uladzislau's patch. As logical block size and physical
> > block size seem to be unreliable, it's better to set the size in dm-ebs.
>
> logical and physical block size are reliable. Uladzislau just seems
> to have a completely broken device that needs fixing, because it will
> run into all kinds of other problems otherwise.
>
Well, the LBA size is fixed in my case; there is only one format,
which is 512B.
However, the I/O cannot be performed at the LBA size; the I/O size is
fixed and bigger.
I rely on this:
<snip>
/*
* Construct an emulated block size mapping: <dev_path> <offset> <ebs> [<ubs>]
*
* <dev_path>: path of the underlying device
* <offset>: offset in 512 bytes sectors into <dev_path>
* <ebs>: emulated block size in units of 512 bytes exposed to the upper layer
* [<ubs>]: underlying block size in units of 512 bytes imposed on the lower layer;
* optional, if not supplied, retrieve logical block size from underlying device
*/
static int ebs_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
...
<snip>
to do RMW. It says that if UBS is set, data has to be imposed on the
lower layer in UBS-sized, UBS-aligned units. I specify the UBS that my
device is capable of reading/writing.
The buffer is correctly updated, in RMW terms, within the UBS window,
but it is flushed partially, leading to I/O errors. I find that wrong,
because I set the desired underlying block size.
That is my concern.
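To make the failure mode concrete, here is a toy user-space illustration
(not dm-bufio code, just the acceptance rule the lower device effectively
enforces; the offsets and lengths are only illustrative):

#include <stdbool.h>
#include <stdio.h>

/*
 * Would the lower device accept a write-back of 'len' bytes starting at
 * buffer offset 'off', if it only takes I/O in whole 'ubs'-byte units?
 */
static bool writeback_ok(unsigned int off, unsigned int len, unsigned int ubs)
{
	return len && !(off % ubs) && !(len % ubs);
}

int main(void)
{
	unsigned int ubs = 8192;

	/* Partial dirty range: only the modified 4 KiB of the 8 KiB buffer. */
	printf("partial (off=4096, len=4096): %s\n",
	       writeback_ok(4096, 4096, ubs) ? "accepted" : "rejected");

	/* Whole buffer marked dirty: write-back covers the full ubs chunk. */
	printf("full    (off=0, len=8192):    %s\n",
	       writeback_ok(0, 8192, ubs) ? "accepted" : "rejected");
	return 0;
}

With the whole buffer marked dirty, write-back is always a full, aligned
ubs chunk, which is what the patch enforces.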
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-19 8:43 ` Uladzislau Rezki
@ 2025-11-19 8:53 ` Christoph Hellwig
2025-11-19 8:57 ` Uladzislau Rezki
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2025-11-19 8:53 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Christoph Hellwig, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Wed, Nov 19, 2025 at 09:43:13AM +0100, Uladzislau Rezki wrote:
> Well. LBA is fixed in my case. Just only one format which is 512B.
>
> Whereas the I/O can not be performed by using LBAs sizes. It is fixed
> and bigger.
Then it is not an LBA. The LBA is defined as the minimum block size
you can do I/O on. Aka your device is really gravely broken. What
device is this and who is selling it?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-19 8:53 ` Christoph Hellwig
@ 2025-11-19 8:57 ` Uladzislau Rezki
2025-11-19 9:00 ` Christoph Hellwig
0 siblings, 1 reply; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-19 8:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Uladzislau Rezki, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Wed, Nov 19, 2025 at 09:53:28AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 19, 2025 at 09:43:13AM +0100, Uladzislau Rezki wrote:
> > Well. LBA is fixed in my case. Just only one format which is 512B.
> >
> > Whereas the I/O can not be performed by using LBAs sizes. It is fixed
> > and bigger.
>
> Then it is not an LBA. The LBA is defined as the minimum block size
> you can do I/O on. Aka your device is really gravely broken. What
> device is this and who is selling it?
>
I define the UBS - the underlying block size. The lower layer expects
BIOs of that size, but that is not what it gets.
I am not allowed to disclose that, so I cannot answer your last question.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-19 8:57 ` Uladzislau Rezki
@ 2025-11-19 9:00 ` Christoph Hellwig
2025-11-19 9:01 ` Uladzislau Rezki
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2025-11-19 9:00 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Christoph Hellwig, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Wed, Nov 19, 2025 at 09:57:49AM +0100, Uladzislau Rezki wrote:
> On Wed, Nov 19, 2025 at 09:53:28AM +0100, Christoph Hellwig wrote:
> > On Wed, Nov 19, 2025 at 09:43:13AM +0100, Uladzislau Rezki wrote:
> > > Well. LBA is fixed in my case. Just only one format which is 512B.
> > >
> > > Whereas the I/O can not be performed by using LBAs sizes. It is fixed
> > > and bigger.
> >
> > Then it is not an LBA. The LBA is defined as the minimum block size
> > you can do I/O on. Aka your device is really gravely broken. What
> > device is this and who is selling it?
> >
> I define UBS - underlying block size. The lower layer expects BIOs in
> that sizes but this is not true.
>
> I am not allowed to disclose and answer your last question.
Well. Let's frame it differently: this device is just too broken
to be used with Linux. In theory we could quirk based on the vendor
ID to correctly report the logical block size, but for that we'd
need the information.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-19 9:00 ` Christoph Hellwig
@ 2025-11-19 9:01 ` Uladzislau Rezki
2025-11-19 9:05 ` Christoph Hellwig
0 siblings, 1 reply; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-19 9:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Uladzislau Rezki, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Wed, Nov 19, 2025 at 10:00:10AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 19, 2025 at 09:57:49AM +0100, Uladzislau Rezki wrote:
> > On Wed, Nov 19, 2025 at 09:53:28AM +0100, Christoph Hellwig wrote:
> > > On Wed, Nov 19, 2025 at 09:43:13AM +0100, Uladzislau Rezki wrote:
> > > > Well. LBA is fixed in my case. Just only one format which is 512B.
> > > >
> > > > Whereas the I/O can not be performed by using LBAs sizes. It is fixed
> > > > and bigger.
> > >
> > > Then it is not an LBA. The LBA is defined as the minimum block size
> > > you can do I/O on. Aka your device is really gravely broken. What
> > > device is this and who is selling it?
> > >
> > I define UBS - underlying block size. The lower layer expects BIOs in
> > that sizes but this is not true.
> >
> > I am not allowed to disclose and answer your last question.
>
> Well. Let's frame it differently: this device is just to broken
> to be used with Linux. In theory we could quirk based on the vendor
> ID to correctly report the logical block size, but for that we'd
> need the information.
>
/*
* Construct an emulated block size mapping: <dev_path> <offset> <ebs> [<ubs>]
*
* <dev_path>: path of the underlying device
* <offset>: offset in 512 bytes sectors into <dev_path>
* <ebs>: emulated block size in units of 512 bytes exposed to the upper layer
* [<ubs>]: underlying block size in units of 512 bytes imposed on the lower layer;
* optional, if not supplied, retrieve logical block size from underlying device
*/
static int ebs_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
Again, I rely on this.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-19 9:01 ` Uladzislau Rezki
@ 2025-11-19 9:05 ` Christoph Hellwig
2025-11-19 9:13 ` Uladzislau Rezki
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2025-11-19 9:05 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Christoph Hellwig, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Wed, Nov 19, 2025 at 10:01:25AM +0100, Uladzislau Rezki wrote:
> Again i rely on this.
I'm not sure what your random quoting of low-quality comments in
kernel code is trying to prove here.
Any device that does not support the LBA/sector size correctly is
not support by Linux. We could in theory quirk it in the driver,
but for that we'd need to known the vendor/model and a good argument
why we can't fix the device reporting, as this should just be firmware.
Secrecy is not going to help with that.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-19 9:05 ` Christoph Hellwig
@ 2025-11-19 9:13 ` Uladzislau Rezki
2025-11-19 9:17 ` Christoph Hellwig
0 siblings, 1 reply; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-19 9:13 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Uladzislau Rezki, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Wed, Nov 19, 2025 at 10:05:30AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 19, 2025 at 10:01:25AM +0100, Uladzislau Rezki wrote:
> > Again i rely on this.
>
> I'm not sure what your random quoting of low-quality comments in
> kernel code is trying to prove here.
>
> Any device that does not support the LBA/sector size correctly is
> not support by Linux. We could in theory quirk it in the driver,
> but for that we'd need to known the vendor/model and a good argument
> why we can't fix the device reporting, as this should just be firmware.
>
> Secrecy is not going to help with that.
>
I will not resist.
I tried my best to explain the problem, including a description and
steps to reproduce it, in the commit message, in case someone wants to
repeat it in qemu with a virtual nvme device.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-19 9:13 ` Uladzislau Rezki
@ 2025-11-19 9:17 ` Christoph Hellwig
0 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2025-11-19 9:17 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Christoph Hellwig, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Wed, Nov 19, 2025 at 10:13:51AM +0100, Uladzislau Rezki wrote:
> On Wed, Nov 19, 2025 at 10:05:30AM +0100, Christoph Hellwig wrote:
> > On Wed, Nov 19, 2025 at 10:01:25AM +0100, Uladzislau Rezki wrote:
> > > Again i rely on this.
> >
> > I'm not sure what your random quoting of low-quality comments in
> > kernel code is trying to prove here.
> >
> > Any device that does not support the LBA/sector size correctly is
> > not support by Linux. We could in theory quirk it in the driver,
> > but for that we'd need to known the vendor/model and a good argument
> > why we can't fix the device reporting, as this should just be firmware.
> >
> > Secrecy is not going to help with that.
> >
> I will not resist.
>
> I tried my best to explain the problem including description and steps
> how to reproduce it in the commit message if someone wants to repeat it
> in qemu with virtual nvme device.
Again, the problem is that your device reports an incorrect block size.
And you're being a complete dick ignoring this and not even telling
us anything about the device.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-19 5:46 ` Christoph Hellwig
2025-11-19 8:43 ` Uladzislau Rezki
@ 2025-11-19 17:26 ` Mikulas Patocka
2025-11-20 6:21 ` Christoph Hellwig
1 sibling, 1 reply; 33+ messages in thread
From: Mikulas Patocka @ 2025-11-19 17:26 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Benjamin Marzinski, Uladzislau Rezki, Alasdair Kergon, DMML,
Andrew Morton, Mike Snitzer, LKML
On Wed, 19 Nov 2025, Christoph Hellwig wrote:
> On Tue, Nov 18, 2025 at 06:21:56PM +0100, Mikulas Patocka wrote:
> > OK - I accepted Uladzislau's patch. As logical block size and physical
> > block size seem to be unreliable, it's better to set the size in dm-ebs.
>
> logical and physical block size are reliable. Uladzislau just seems
> to have a completely broken device that needs fixing, because it will
He created a qemu-emulated NVMe device with physical and logical block
size 8192 in a virtual machine. And logical block size was reported as 512
in the guest kernel - so it is either a qemu bug or a kernel bug.
> run into all kinds of other problems otherwise.
Hmm, Linux may get it wrong too. See this piece of code in
include/linux/blkdev.h:
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/*
* We should strive for 1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER)
* however we constrain this to what we can validate and test.
*/
#define BLK_MAX_BLOCK_SIZE SZ_64K
#else
#define BLK_MAX_BLOCK_SIZE PAGE_SIZE
#endif
/* blk_validate_limits() validates bsize, so drivers don't usually need to */
static inline int blk_validate_block_size(unsigned long bsize)
{
if (bsize < 512 || bsize > BLK_MAX_BLOCK_SIZE || !is_power_of_2(bsize))
return -EINVAL;
return 0;
}
in nvme_update_disk_info there is this piece of code:
if (blk_validate_block_size(bs)) {
bs = (1 << 9);
valid = false;
}
So, the valid block size depends on whether CONFIG_TRANSPARENT_HUGEPAGE is
defined, which is quite weird.
Mikulas
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-19 17:26 ` Mikulas Patocka
@ 2025-11-20 6:21 ` Christoph Hellwig
2025-11-20 12:08 ` Uladzislau Rezki
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2025-11-20 6:21 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Christoph Hellwig, Benjamin Marzinski, Uladzislau Rezki,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Wed, Nov 19, 2025 at 06:26:13PM +0100, Mikulas Patocka wrote:
>
>
> On Wed, 19 Nov 2025, Christoph Hellwig wrote:
>
> > On Tue, Nov 18, 2025 at 06:21:56PM +0100, Mikulas Patocka wrote:
> > > OK - I accepted Uladzislau's patch. As logical block size and physical
> > > block size seem to be unreliable, it's better to set the size in dm-ebs.
> >
> > logical and physical block size are reliable. Uladzislau just seems
> > to have a completely broken device that needs fixing, because it will
>
> He created a qemu-emulated NVMe device with physical and logical block
> size 8192 in a virtual machine. And logical block size was reported as 512
> in the guest kernel - so it is either a qemu bug or a kernel bug.
No, that's not the case. If you use his command line you'll see a qemu
device with 8192-byte logical blocks, assuming you have support for
large folios, or a completely unusable device that claims to have 512
byte blocks for compatibility, but also a capacity of zero so that no
one can use it for anything but passthrough.
> in nvme_update_disk_info there is this piece of code:
> if (blk_validate_block_size(bs)) {
> bs = (1 << 9);
> valid = false;
> }
Yes, that's what I mentioned above. The valid=false sets the capacity
to zero, so you're not actually going to be able to use this device.
> So, the valid block size depends on whether CONFIG_TRANSPARENT_HUGEPAGE is
> defined, which is quite weird.
And also the page size, and none of that is too weird. You need
support for efficiently allocating large-order folios to support a
block size > PAGE_SIZE, and currently CONFIG_TRANSPARENT_HUGEPAGE is
the guard for that. There was some talk of lifting that, but that
requires a bit of work.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-20 6:21 ` Christoph Hellwig
@ 2025-11-20 12:08 ` Uladzislau Rezki
2025-11-20 12:40 ` Uladzislau Rezki
2025-11-21 7:24 ` Christoph Hellwig
0 siblings, 2 replies; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-20 12:08 UTC (permalink / raw)
To: Christoph Hellwig, Mikulas Patocka
Cc: Mikulas Patocka, Benjamin Marzinski, Uladzislau Rezki,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Thu, Nov 20, 2025 at 07:21:46AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 19, 2025 at 06:26:13PM +0100, Mikulas Patocka wrote:
> >
> >
> > On Wed, 19 Nov 2025, Christoph Hellwig wrote:
> >
> > > On Tue, Nov 18, 2025 at 06:21:56PM +0100, Mikulas Patocka wrote:
> > > > OK - I accepted Uladzislau's patch. As logical block size and physical
> > > > block size seem to be unreliable, it's better to set the size in dm-ebs.
> > >
> > > logical and physical block size are reliable. Uladzislau just seems
> > > to have a completely broken device that needs fixing, because it will
> >
> > He created a qemu-emulated NVMe device with physical and logical block
> > size 8192 in a virtual machine. And logical block size was reported as 512
> > in the guest kernel - so it is either a qemu bug or a kernel bug.
>
> No, that's not the case. If you use his command line you'll see a qemu
> device with 8192 logical blocks assuming you have support for large
> folios, or a completely unusuable device that claims to have 512
> byte blocks for compatibility, but also a capacity of zero so that no
> one can use it for anything but passthrough.
>
> > in nvme_update_disk_info there is this piece of code:
> > if (blk_validate_block_size(bs)) {
> > bs = (1 << 9);
> > valid = false;
> > }
>
> Yes, that's what I mentioned above. The valid=false sets the capacity
> to zero, so you're not actually going to be able to use this device.
>
> > So, the valid block size depends on whether CONFIG_TRANSPARENT_HUGEPAGE is
> > defined, which is quite weird.
>
> And also the page size, and none of that is too weird. You need support
> efficiently allocating large order folios to support a
> block size > PAGE_SIZE and currently CONFIG_TRANSPARENT_HUGEPAGE is
> the guard for that. There was some talk of lifting that, but that
> requires a bit of work.
>
Could you please check below? Is the last one correctly reported?
I have enabled the CONFIG_TRANSPARENT_HUGEPAGE option. If I specify
8192/8192 (first case), the reported values are what I set. The second
variant, 512/8192, shows 512/512:
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
# CONFIG_TRANSPARENT_HUGEPAGE_NEVER is not set
-device nvme,drive=drv0,serial=foo,logical_block_size=8192,physical_block_size=8192
urezki@pc638:~$ sudo nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 /dev/ng0n1 foo QEMU NVMe Ctrl 1 8.49 GB / 8.49 GB 8 KiB + 0 B 10.0.6
urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/logical_block_size
8192
urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/physical_block_size
8192
urezki@pc638:~$
-device nvme,drive=drv0,serial=foo,logical_block_size=512,physical_block_size=8192
urezki@pc638:~$ sudo nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 /dev/ng0n1 foo QEMU NVMe Ctrl 1 8.49 GB / 8.49 GB 512 B + 0 B 10.0.6
urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/physical_block_size
512
urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/logical_block_size
512
urezki@pc638:~$
Is that what should be reported?
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-20 12:08 ` Uladzislau Rezki
@ 2025-11-20 12:40 ` Uladzislau Rezki
2025-11-21 7:25 ` Christoph Hellwig
2025-11-21 7:24 ` Christoph Hellwig
1 sibling, 1 reply; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-20 12:40 UTC (permalink / raw)
To: Christoph Hellwig, Mikulas Patocka
Cc: Christoph Hellwig, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Thu, Nov 20, 2025 at 01:08:57PM +0100, Uladzislau Rezki wrote:
> On Thu, Nov 20, 2025 at 07:21:46AM +0100, Christoph Hellwig wrote:
> > On Wed, Nov 19, 2025 at 06:26:13PM +0100, Mikulas Patocka wrote:
> > >
> > >
> > > On Wed, 19 Nov 2025, Christoph Hellwig wrote:
> > >
> > > > On Tue, Nov 18, 2025 at 06:21:56PM +0100, Mikulas Patocka wrote:
> > > > > OK - I accepted Uladzislau's patch. As logical block size and physical
> > > > > block size seem to be unreliable, it's better to set the size in dm-ebs.
> > > >
> > > > logical and physical block size are reliable. Uladzislau just seems
> > > > to have a completely broken device that needs fixing, because it will
> > >
> > > He created a qemu-emulated NVMe device with physical and logical block
> > > size 8192 in a virtual machine. And logical block size was reported as 512
> > > in the guest kernel - so it is either a qemu bug or a kernel bug.
> >
> > No, that's not the case. If you use his command line you'll see a qemu
> > device with 8192 logical blocks assuming you have support for large
> > folios, or a completely unusuable device that claims to have 512
> > byte blocks for compatibility, but also a capacity of zero so that no
> > one can use it for anything but passthrough.
> >
> > > in nvme_update_disk_info there is this piece of code:
> > > if (blk_validate_block_size(bs)) {
> > > bs = (1 << 9);
> > > valid = false;
> > > }
> >
> > Yes, that's what I mentioned above. The valid=false sets the capacity
> > to zero, so you're not actually going to be able to use this device.
> >
> > > So, the valid block size depends on whether CONFIG_TRANSPARENT_HUGEPAGE is
> > > defined, which is quite weird.
> >
> > And also the page size, and none of that is too weird. You need support
> > efficiently allocating large order folios to support a
> > block size > PAGE_SIZE and currently CONFIG_TRANSPARENT_HUGEPAGE is
> > the guard for that. There was some talk of lifting that, but that
> > requires a bit of work.
> >
> Could you please check below? Is the last one is correctly reported?
> I have enabled the CONFIG_TRANSPARENT_HUGEPAGE option. If i specify,
> 8192, 8192 first case, reports are what i set. Second variant 512, 8129
> shows 512, 512:
>
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
> # CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
> # CONFIG_TRANSPARENT_HUGEPAGE_NEVER is not set
>
> -device nvme,drive=drv0,serial=foo,logical_block_size=8192,physical_block_size=8192
> urezki@pc638:~$ sudo nvme list
> Node Generic SN Model Namespace Usage Format FW Rev
> --------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1 /dev/ng0n1 foo QEMU NVMe Ctrl 1 8.49 GB / 8.49 GB 8 KiB + 0 B 10.0.6
> urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/logical_block_size
> 8192
> urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/physical_block_size
> 8192
> urezki@pc638:~$
>
>
> -device nvme,drive=drv0,serial=foo,logical_block_size=512,physical_block_size=8192
> urezki@pc638:~$ sudo nvme list
> Node Generic SN Model Namespace Usage Format FW Rev
> --------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1 /dev/ng0n1 foo QEMU NVMe Ctrl 1 8.49 GB / 8.49 GB 512 B + 0 B 10.0.6
> urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/physical_block_size
> 512
> urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/logical_block_size
> 512
> urezki@pc638:~$
>
It might be that qemu changes this; I will check.
Christoph, I assume that logical=512B, physical=8192B is a valid case.
Could you please confirm?
Thanks!
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-20 12:08 ` Uladzislau Rezki
2025-11-20 12:40 ` Uladzislau Rezki
@ 2025-11-21 7:24 ` Christoph Hellwig
2025-11-21 13:21 ` Uladzislau Rezki
1 sibling, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2025-11-21 7:24 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Christoph Hellwig, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Thu, Nov 20, 2025 at 01:08:57PM +0100, Uladzislau Rezki wrote:
> Could you please check below? Is the last one is correctly reported?
The latter looks unexpected, but that is because qemu is not passing the
qemu physical_block_size attribute through to any of the NVMe fields that
Linux interprets as a physical block size (NVMe doesn't actually have the
concept of a physical block size, unlike SCSI/ATA):
root@testvm:~# nvme id-ns -H /dev/nvme0n1 | grep npw
npwg : 0
npwa : 0
root@testvm:~# nvme id-ns -H /dev/nvme0n1 | grep naw
nawun : 0
nawupf : 0
root@testvm:~# nvme id-ctrl -H /dev/nvme0 | grep awupf
awupf : 0
but as said multiple times, that should not really matter - the logical
block size is the granularity of I/O, the physical block size is just
a performance hint.
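(Not from any posted patch - just a minimal sketch of what that
distinction means for a submitter: every I/O has to be aligned to and
sized in multiples of the logical block size, while the physical block
size never enters the check.)
<snip>
#include <linux/bio.h>
#include <linux/blkdev.h>

/* Illustrative helper; name and placement are made up. */
static bool io_fits_logical_block_size(struct block_device *bdev,
				       struct bio *bio)
{
	unsigned int lbs = bdev_logical_block_size(bdev); /* bytes */

	/* Both the byte offset and the length must be lbs-granular. */
	return IS_ALIGNED(bio->bi_iter.bi_sector << SECTOR_SHIFT, lbs) &&
	       IS_ALIGNED(bio->bi_iter.bi_size, lbs);
}
<snip>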
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-20 12:40 ` Uladzislau Rezki
@ 2025-11-21 7:25 ` Christoph Hellwig
0 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2025-11-21 7:25 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Christoph Hellwig, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Thu, Nov 20, 2025 at 01:40:32PM +0100, Uladzislau Rezki wrote:
> Christoph, I assume it is a valid case: logical=512B, physical=8192B.
> Could you please confirm?
That is a valid configuration. One that must accept I/O at 512 byte
granularity.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-21 7:24 ` Christoph Hellwig
@ 2025-11-21 13:21 ` Uladzislau Rezki
2025-11-21 16:48 ` Benjamin Marzinski
2025-11-24 14:30 ` Christoph Hellwig
0 siblings, 2 replies; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-21 13:21 UTC (permalink / raw)
To: Christoph Hellwig, Mikulas Patocka
Cc: Uladzislau Rezki, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Fri, Nov 21, 2025 at 08:24:21AM +0100, Christoph Hellwig wrote:
> On Thu, Nov 20, 2025 at 01:08:57PM +0100, Uladzislau Rezki wrote:
> > Could you please check below? Is the last one is correctly reported?
>
> The latter looks unexpected, but that is because qemu is not passing the
> qemu physical_block_size attribute through to any of the NVMe fields that
> Linux interprets as a physical block size (NVMe doesn't actually have the
> concept of a physical block size, unlike SCSI/ATA):
>
OK, understood and thank you for checking this.
>
> root@testvm:~# nvme id-ns -H /dev/nvme0n1 | grep npw
> npwg : 0
> npwa : 0
> root@testvm:~# nvme id-ns -H /dev/nvme0n1 | grep naw
> nawun : 0
> nawupf : 0
> root@testvm:~# nvme id-ctrl -H /dev/nvme0 | grep awupf
> awupf : 0
>
> but as said multiple times, that should not really matter - the logical
> block size is the granularity of I/O, the physical block size is just
> a performance hint.
>
Right.
As stated in the commit message of the patch in question, here is the 8K
logical-block-size device emulated in qemu with CONFIG_TRANSPARENT_HUGEPAGE=y:
urezki@pc638:~$ sudo nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 /dev/ng0n1 foo QEMU NVMe Ctrl 1 8.49 GB / 8.49 GB 8 KiB + 0 B 10.0.6
urezki@pc638:~$ cat bin/dmsetup.sh
#!/bin/bash
lower=/dev/nvme0n1
len=$(blockdev --getsz "$lower")
echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k
urezki@pc638:~$ sudo bin/dmsetup.sh
urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/logical_block_size
8192
urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/physical_block_size
8192
urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/logical_block_size
512
urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/physical_block_size
8192
urezki@pc638:~$ sudo mkfs.ext4 -F /dev/dm-0
mke2fs 1.47.0 (5-Feb-2023)
/dev/dm-0 contains a ext4 file system
last mounted on Fri Nov 21 12:22:55 2025
Discarding device blocks: done
Creating filesystem with 2072576 4k blocks and 518144 inodes
Filesystem UUID: f71adb05-c020-4406-bc0d-bdb9e5c29af7
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
urezki@pc638:~$ sudo dmesg | grep -i "i/o"
[ 71.813322] Buffer I/O error on dev dm-0, logical block 10, lost async page write
[ 71.813373] Buffer I/O error on dev dm-0, logical block 11, lost async page write
[ 71.813395] Buffer I/O error on dev dm-0, logical block 12, lost async page write
[ 71.813415] Buffer I/O error on dev dm-0, logical block 13, lost async page write
[ 71.813433] Buffer I/O error on dev dm-0, logical block 14, lost async page write
[ 71.813451] Buffer I/O error on dev dm-0, logical block 15, lost async page write
[ 71.813475] Buffer I/O error on dev dm-0, logical block 16, lost async page write
[ 71.813493] Buffer I/O error on dev dm-0, logical block 17, lost async page write
[ 71.813516] Buffer I/O error on dev dm-0, logical block 18, lost async page write
[ 71.813537] Buffer I/O error on dev dm-0, logical block 19, lost async page write
urezki@pc638:~$
with the patch:
urezki@pc638:~$ sudo nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 /dev/ng0n1 foo QEMU NVMe Ctrl 1 8.49 GB / 8.49 GB 8 KiB + 0 B 10.0.6
urezki@pc638:~$ cat bin/dmsetup.sh
#!/bin/bash
lower=/dev/nvme0n1
len=$(blockdev --getsz "$lower")
echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k
urezki@pc638:~$ sudo bin/dmsetup.sh
urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/logical_block_size
8192
urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/physical_block_size
8192
urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/logical_block_size
512
urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/physical_block_size
8192
urezki@pc638:~$ sudo mkfs.ext4 -F /dev/dm-0
mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done
Creating filesystem with 2072576 4k blocks and 518144 inodes
Filesystem UUID: c7dff4c7-aa7e-4c94-98ee-f9ea2da92a06
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
urezki@pc638:~$ sudo mount /dev/dm-0 /mnt/
urezki@pc638:~$ ls -al /mnt/
total 24
drwxr-xr-x 3 root root 4096 Nov 21 12:22 .
drwxr-xr-x 19 root root 4096 Jul 10 19:42 ..
drwx------ 2 root root 16384 Nov 21 12:22 lost+found
urezki@pc638:~$
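For reference, if I read the ebs table arguments right, the line
"0 $len ebs $lower 0 1 16" means: start 0, length $len, lower device
$lower, offset 0, emulated block size of 1 sector (512B) and underlying
block size of 16 sectors (8K). A tiny user-space sketch, with a
hypothetical sector number, of how a 512B sector maps into such an 8K
underlying block:
<snip>
#include <stdio.h>

int main(void)
{
	unsigned long long sector = 19;	/* hypothetical 512B sector from above */
	unsigned int ubs = 16;		/* underlying block size in sectors (8K) */

	unsigned long long block = sector / ubs;  /* which 8K block gets the RMW */
	unsigned int off = (sector % ubs) * 512;  /* byte offset inside that block */

	printf("sector %llu -> block %llu, byte offset %u, write-back unit %u bytes\n",
	       sector, block, off, ubs * 512);
	return 0;
}
<snip>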
How do we solve this?
Mikulas proposed the patch below:
<snip>
Index: linux-2.6/drivers/md/dm-bufio.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-bufio.c 2025-10-13 21:42:47.000000000 +0200
+++ linux-2.6/drivers/md/dm-bufio.c 2025-10-20 14:40:32.000000000 +0200
@@ -1374,7 +1374,7 @@ static void submit_io(struct dm_buffer *
{
unsigned int n_sectors;
sector_t sector;
- unsigned int offset, end;
+ unsigned int offset, end, align;
b->end_io = end_io;
@@ -1388,9 +1388,10 @@ static void submit_io(struct dm_buffer *
b->c->write_callback(b);
offset = b->write_start;
end = b->write_end;
- offset &= -DM_BUFIO_WRITE_ALIGN;
- end += DM_BUFIO_WRITE_ALIGN - 1;
- end &= -DM_BUFIO_WRITE_ALIGN;
+ align = max(DM_BUFIO_WRITE_ALIGN, bdev_logical_block_size(b->c->bdev));
+ offset &= -align;
+ end += align - 1;
+ end &= -align;
if (unlikely(end > b->c->block_size))
end = b->c->block_size;
<snip>
and it fixes the setup which I described in the commit message, but I
have a question.
Why in dm-ebs we need to offload partial buffer < ubf size?
Thank you for answers!
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-21 13:21 ` Uladzislau Rezki
@ 2025-11-21 16:48 ` Benjamin Marzinski
2025-11-24 10:43 ` Uladzislau Rezki
2025-11-24 14:30 ` Christoph Hellwig
1 sibling, 1 reply; 33+ messages in thread
From: Benjamin Marzinski @ 2025-11-21 16:48 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Christoph Hellwig, Mikulas Patocka, Alasdair Kergon, DMML,
Andrew Morton, Mike Snitzer, LKML
On Fri, Nov 21, 2025 at 02:21:34PM +0100, Uladzislau Rezki wrote:
> On Fri, Nov 21, 2025 at 08:24:21AM +0100, Christoph Hellwig wrote:
> > On Thu, Nov 20, 2025 at 01:08:57PM +0100, Uladzislau Rezki wrote:
> > > Could you please check below? Is the last one is correctly reported?
> >
> > The latter looks unexpected, but is is becase qemu is not passing through
> > the qemu physical_block_size attribute to any of the nvme settings Linux
> > interprets as such for NVMe (NVMe doesn't actually have the concept of
> > a physical block size, unlike SCSI/ATA):
> >
> OK, understood and thank you for checking this.
>
> >
> > root@testvm:~# nvme id-ns -H /dev/nvme0n1 | grep npw
> > npwg : 0
> > npwa : 0
> > root@testvm:~# nvme id-ns -H /dev/nvme0n1 | grep naw
> > nawun : 0
> > nawupf : 0
> > root@testvm:~# nvme id-ctrl -H /dev/nvme0 | grep awupf
> > awupf : 0
> >
> > but as said multiple times, that should not really matter - the logical
> > block size is the granularity of I/O, the physical block size is just
> > a performance hint.
> >
> Right.
>
> As stated in commit message of the patch which is in question. 8K
> emulated in qemu device with CONFIG_TRANSPARENT_HUGEPAGE=y:
>
> urezki@pc638:~$ sudo nvme list
> Node Generic SN Model Namespace Usage Format FW Rev
> --------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1 /dev/ng0n1 foo QEMU NVMe Ctrl 1 8.49 GB / 8.49 GB 8 KiB + 0 B 10.0.6
> urezki@pc638:~$ cat bin/dmsetup.sh
> #!/bin/bash
>
> lower=/dev/nvme0n1
> len=$(blockdev --getsz "$lower")
>
> echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k
> urezki@pc638:~$ sudo bin/dmsetup.sh
> urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/logical_block_size
> 8192
> urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/physical_block_size
> 8192
> urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/logical_block_size
> 512
> urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/physical_block_size
> 8192
> urezki@pc638:~$ sudo mkfs.ext4 -F /dev/dm-0
> mke2fs 1.47.0 (5-Feb-2023)
> /dev/dm-0 contains a ext4 file system
> last mounted on Fri Nov 21 12:22:55 2025
> Discarding device blocks: done
> Creating filesystem with 2072576 4k blocks and 518144 inodes
> Filesystem UUID: f71adb05-c020-4406-bc0d-bdb9e5c29af7
> Superblock backups stored on blocks:
> 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
>
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (16384 blocks): done
> Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
> urezki@pc638:~$ sudo dmesg | grep -i "i/o"
> [ 71.813322] Buffer I/O error on dev dm-0, logical block 10, lost async page write
> [ 71.813373] Buffer I/O error on dev dm-0, logical block 11, lost async page write
> [ 71.813395] Buffer I/O error on dev dm-0, logical block 12, lost async page write
> [ 71.813415] Buffer I/O error on dev dm-0, logical block 13, lost async page write
> [ 71.813433] Buffer I/O error on dev dm-0, logical block 14, lost async page write
> [ 71.813451] Buffer I/O error on dev dm-0, logical block 15, lost async page write
> [ 71.813475] Buffer I/O error on dev dm-0, logical block 16, lost async page write
> [ 71.813493] Buffer I/O error on dev dm-0, logical block 17, lost async page write
> [ 71.813516] Buffer I/O error on dev dm-0, logical block 18, lost async page write
> [ 71.813537] Buffer I/O error on dev dm-0, logical block 19, lost async page write
> urezki@pc638:~$
>
> with the patch:
>
> urezki@pc638:~$ sudo nvme list
> Node Generic SN Model Namespace Usage Format FW Rev
> --------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1 /dev/ng0n1 foo QEMU NVMe Ctrl 1 8.49 GB / 8.49 GB 8 KiB + 0 B 10.0.6
> urezki@pc638:~$ cat bin/dmsetup.sh
> #!/bin/bash
>
> lower=/dev/nvme0n1
> len=$(blockdev --getsz "$lower")
>
> echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k
> urezki@pc638:~$ sudo bin/dmsetup.sh
> urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/logical_block_size
> 8192
> urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/physical_block_size
> 8192
> urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/logical_block_size
> 512
> urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/physical_block_size
> 8192
> urezki@pc638:~$ sudo mkfs.ext4 -F /dev/dm-0
> mke2fs 1.47.0 (5-Feb-2023)
> Discarding device blocks: done
> Creating filesystem with 2072576 4k blocks and 518144 inodes
> Filesystem UUID: c7dff4c7-aa7e-4c94-98ee-f9ea2da92a06
> Superblock backups stored on blocks:
> 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
>
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (16384 blocks): done
> Writing superblocks and filesystem accounting information: done
>
> urezki@pc638:~$ sudo mount /dev/dm-0 /mnt/
> urezki@pc638:~$ ls -al /mnt/
> total 24
> drwxr-xr-x 3 root root 4096 Nov 21 12:22 .
> drwxr-xr-x 19 root root 4096 Jul 10 19:42 ..
> drwx------ 2 root root 16384 Nov 21 12:22 lost+found
> urezki@pc638:~$
>
> How do we solve this?
>
> Mikulas proposed to use below patch:
>
> <snip>
> Index: linux-2.6/drivers/md/dm-bufio.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-bufio.c 2025-10-13 21:42:47.000000000 +0200
> +++ linux-2.6/drivers/md/dm-bufio.c 2025-10-20 14:40:32.000000000 +0200
> @@ -1374,7 +1374,7 @@ static void submit_io(struct dm_buffer *
> {
> unsigned int n_sectors;
> sector_t sector;
> - unsigned int offset, end;
> + unsigned int offset, end, align;
>
> b->end_io = end_io;
>
> @@ -1388,9 +1388,10 @@ static void submit_io(struct dm_buffer *
> b->c->write_callback(b);
> offset = b->write_start;
> end = b->write_end;
> - offset &= -DM_BUFIO_WRITE_ALIGN;
> - end += DM_BUFIO_WRITE_ALIGN - 1;
> - end &= -DM_BUFIO_WRITE_ALIGN;
> + align = max(DM_BUFIO_WRITE_ALIGN, bdev_logical_block_size(b->c->bdev));
> + offset &= -align;
> + end += align - 1;
> + end &= -align;
> if (unlikely(end > b->c->block_size))
> end = b->c->block_size;
> <snip>
>
> and it fixes the setup which i described in the commit message, but i
> have question.
>
> Why in dm-ebs we need to offload partial buffer < ubf size?
Um, did you notice that Mikulas accepted your patch?
>
> Thank you for answers!
>
> --
> Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-21 16:48 ` Benjamin Marzinski
@ 2025-11-24 10:43 ` Uladzislau Rezki
0 siblings, 0 replies; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-24 10:43 UTC (permalink / raw)
To: Benjamin Marzinski
Cc: Uladzislau Rezki, Christoph Hellwig, Mikulas Patocka,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
> On Fri, Nov 21, 2025 at 02:21:34PM +0100, Uladzislau Rezki wrote:
> > On Fri, Nov 21, 2025 at 08:24:21AM +0100, Christoph Hellwig wrote:
> > > On Thu, Nov 20, 2025 at 01:08:57PM +0100, Uladzislau Rezki wrote:
> > > > Could you please check below? Is the last one is correctly reported?
> > >
> > > The latter looks unexpected, but is is becase qemu is not passing through
> > > the qemu physical_block_size attribute to any of the nvme settings Linux
> > > interprets as such for NVMe (NVMe doesn't actually have the concept of
> > > a physical block size, unlike SCSI/ATA):
> > >
> > OK, understood and thank you for checking this.
> >
> > >
> > > root@testvm:~# nvme id-ns -H /dev/nvme0n1 | grep npw
> > > npwg : 0
> > > npwa : 0
> > > root@testvm:~# nvme id-ns -H /dev/nvme0n1 | grep naw
> > > nawun : 0
> > > nawupf : 0
> > > root@testvm:~# nvme id-ctrl -H /dev/nvme0 | grep awupf
> > > awupf : 0
> > >
> > > but as said multiple times, that should not really matter - the logical
> > > block size is the granularity of I/O, the physical block size is just
> > > a performance hint.
> > >
> > Right.
> >
> > As stated in commit message of the patch which is in question. 8K
> > emulated in qemu device with CONFIG_TRANSPARENT_HUGEPAGE=y:
> >
> > urezki@pc638:~$ sudo nvme list
> > Node Generic SN Model Namespace Usage Format FW Rev
> > --------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> > /dev/nvme0n1 /dev/ng0n1 foo QEMU NVMe Ctrl 1 8.49 GB / 8.49 GB 8 KiB + 0 B 10.0.6
> > urezki@pc638:~$ cat bin/dmsetup.sh
> > #!/bin/bash
> >
> > lower=/dev/nvme0n1
> > len=$(blockdev --getsz "$lower")
> >
> > echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k
> > urezki@pc638:~$ sudo bin/dmsetup.sh
> > urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/logical_block_size
> > 8192
> > urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/physical_block_size
> > 8192
> > urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/logical_block_size
> > 512
> > urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/physical_block_size
> > 8192
> > urezki@pc638:~$ sudo mkfs.ext4 -F /dev/dm-0
> > mke2fs 1.47.0 (5-Feb-2023)
> > /dev/dm-0 contains a ext4 file system
> > last mounted on Fri Nov 21 12:22:55 2025
> > Discarding device blocks: done
> > Creating filesystem with 2072576 4k blocks and 518144 inodes
> > Filesystem UUID: f71adb05-c020-4406-bc0d-bdb9e5c29af7
> > Superblock backups stored on blocks:
> > 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
> >
> > Allocating group tables: done
> > Writing inode tables: done
> > Creating journal (16384 blocks): done
> > Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
> > urezki@pc638:~$ sudo dmesg | grep -i "i/o"
> > [ 71.813322] Buffer I/O error on dev dm-0, logical block 10, lost async page write
> > [ 71.813373] Buffer I/O error on dev dm-0, logical block 11, lost async page write
> > [ 71.813395] Buffer I/O error on dev dm-0, logical block 12, lost async page write
> > [ 71.813415] Buffer I/O error on dev dm-0, logical block 13, lost async page write
> > [ 71.813433] Buffer I/O error on dev dm-0, logical block 14, lost async page write
> > [ 71.813451] Buffer I/O error on dev dm-0, logical block 15, lost async page write
> > [ 71.813475] Buffer I/O error on dev dm-0, logical block 16, lost async page write
> > [ 71.813493] Buffer I/O error on dev dm-0, logical block 17, lost async page write
> > [ 71.813516] Buffer I/O error on dev dm-0, logical block 18, lost async page write
> > [ 71.813537] Buffer I/O error on dev dm-0, logical block 19, lost async page write
> > urezki@pc638:~$
> >
> > with the patch:
> >
> > urezki@pc638:~$ sudo nvme list
> > Node Generic SN Model Namespace Usage Format FW Rev
> > --------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> > /dev/nvme0n1 /dev/ng0n1 foo QEMU NVMe Ctrl 1 8.49 GB / 8.49 GB 8 KiB + 0 B 10.0.6
> > urezki@pc638:~$ cat bin/dmsetup.sh
> > #!/bin/bash
> >
> > lower=/dev/nvme0n1
> > len=$(blockdev --getsz "$lower")
> >
> > echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k
> > urezki@pc638:~$ sudo bin/dmsetup.sh
> > urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/logical_block_size
> > 8192
> > urezki@pc638:~$ sudo cat /sys/block/nvme0n1/queue/physical_block_size
> > 8192
> > urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/logical_block_size
> > 512
> > urezki@pc638:~$ sudo cat /sys/block/dm-0/queue/physical_block_size
> > 8192
> > urezki@pc638:~$ sudo mkfs.ext4 -F /dev/dm-0
> > mke2fs 1.47.0 (5-Feb-2023)
> > Discarding device blocks: done
> > Creating filesystem with 2072576 4k blocks and 518144 inodes
> > Filesystem UUID: c7dff4c7-aa7e-4c94-98ee-f9ea2da92a06
> > Superblock backups stored on blocks:
> > 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
> >
> > Allocating group tables: done
> > Writing inode tables: done
> > Creating journal (16384 blocks): done
> > Writing superblocks and filesystem accounting information: done
> >
> > urezki@pc638:~$ sudo mount /dev/dm-0 /mnt/
> > urezki@pc638:~$ ls -al /mnt/
> > total 24
> > drwxr-xr-x 3 root root 4096 Nov 21 12:22 .
> > drwxr-xr-x 19 root root 4096 Jul 10 19:42 ..
> > drwx------ 2 root root 16384 Nov 21 12:22 lost+found
> > urezki@pc638:~$
> >
> > How do we solve this?
> >
> > Mikulas proposed to use below patch:
> >
> > <snip>
> > Index: linux-2.6/drivers/md/dm-bufio.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-bufio.c 2025-10-13 21:42:47.000000000 +0200
> > +++ linux-2.6/drivers/md/dm-bufio.c 2025-10-20 14:40:32.000000000 +0200
> > @@ -1374,7 +1374,7 @@ static void submit_io(struct dm_buffer *
> > {
> > unsigned int n_sectors;
> > sector_t sector;
> > - unsigned int offset, end;
> > + unsigned int offset, end, align;
> >
> > b->end_io = end_io;
> >
> > @@ -1388,9 +1388,10 @@ static void submit_io(struct dm_buffer *
> > b->c->write_callback(b);
> > offset = b->write_start;
> > end = b->write_end;
> > - offset &= -DM_BUFIO_WRITE_ALIGN;
> > - end += DM_BUFIO_WRITE_ALIGN - 1;
> > - end &= -DM_BUFIO_WRITE_ALIGN;
> > + align = max(DM_BUFIO_WRITE_ALIGN, bdev_logical_block_size(b->c->bdev));
> > + offset &= -align;
> > + end += align - 1;
> > + end &= -align;
> > if (unlikely(end > b->c->block_size))
> > end = b->c->block_size;
> > <snip>
> >
> > and it fixes the setup which i described in the commit message, but i
> > have question.
> >
> > Why in dm-ebs we need to offload partial buffer < ubf size?
>
> Um, did you notice that Mikulas accepted your patch?
>
I saw that he mentioned it, and I am glad that during this discussion
we arrived at one more extra patch. I got the feeling that there was a
misunderstanding between us, so I decided to spell it out clearly.
That is it.
Thank you.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-21 13:21 ` Uladzislau Rezki
2025-11-21 16:48 ` Benjamin Marzinski
@ 2025-11-24 14:30 ` Christoph Hellwig
2025-11-24 15:30 ` Uladzislau Rezki
1 sibling, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2025-11-24 14:30 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Christoph Hellwig, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Fri, Nov 21, 2025 at 02:21:34PM +0100, Uladzislau Rezki wrote:
> - offset &= -DM_BUFIO_WRITE_ALIGN;
> - end += DM_BUFIO_WRITE_ALIGN - 1;
> - end &= -DM_BUFIO_WRITE_ALIGN;
> + align = max(DM_BUFIO_WRITE_ALIGN, bdev_logical_block_size(b->c->bdev));
> + offset &= -align;
> + end += align - 1;
> + end &= -align;
> if (unlikely(end > b->c->block_size))
> end = b->c->block_size;
> <snip>
>
> and it fixes the setup which i described in the commit message, but i
> have question.
And this patch, using bdev_logical_block_size, looks correct.
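To make the effect concrete (hypothetical byte offsets, not taken from a
real trace): with a 512 byte dirty range inside an 8K bufio block, the
old DM_BUFIO_WRITE_ALIGN rounding emits a sub-block write, while rounding
to the logical block size emits the whole block:
<snip>
#include <stdio.h>

int main(void)
{
	unsigned int write_start = 4096, write_end = 4608; /* dirty sub-range */
	unsigned int old_align = 512;	/* DM_BUFIO_WRITE_ALIGN */
	unsigned int new_align = 8192;	/* bdev_logical_block_size() of the lower dev */

	unsigned int o1 = write_start & -old_align;
	unsigned int e1 = (write_end + old_align - 1) & -old_align;
	unsigned int o2 = write_start & -new_align;
	unsigned int e2 = (write_end + new_align - 1) & -new_align;

	printf("old: [%u, %u) = %u bytes, below the 8K I/O granularity\n",
	       o1, e1, e1 - o1);
	printf("new: [%u, %u) = %u bytes, one full logical block\n",
	       o2, e2, e2 - o2);
	return 0;
}
<snip>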
>
> Why in dm-ebs we need to offload partial buffer < ubf size?
I don't understand this question. What is ubf? What does partial
buffer mean in this context, and what does offload mean?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-24 14:30 ` Christoph Hellwig
@ 2025-11-24 15:30 ` Uladzislau Rezki
2025-11-24 17:00 ` Christoph Hellwig
0 siblings, 1 reply; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-24 15:30 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Uladzislau Rezki, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Mon, Nov 24, 2025 at 03:30:44PM +0100, Christoph Hellwig wrote:
> On Fri, Nov 21, 2025 at 02:21:34PM +0100, Uladzislau Rezki wrote:
> > - offset &= -DM_BUFIO_WRITE_ALIGN;
> > - end += DM_BUFIO_WRITE_ALIGN - 1;
> > - end &= -DM_BUFIO_WRITE_ALIGN;
> > + align = max(DM_BUFIO_WRITE_ALIGN, bdev_logical_block_size(b->c->bdev));
> > + offset &= -align;
> > + end += align - 1;
> > + end &= -align;
> > if (unlikely(end > b->c->block_size))
> > end = b->c->block_size;
> > <snip>
> >
> > and it fixes the setup which i described in the commit message, but i
> > have question.
>
> And this patch, using bdev_logical_block_size looks correct.
>
> >
> > Why in dm-ebs we need to offload partial buffer < ubf size?
>
> I don't understand this question. What is ubf? What does partial
> buffer mean in this context, and what does offload mean?
>
That was a typo :) I meant ubs - the underlying block size, i.e. the
number of sectors that make up the logical block size of the device.
In our case it is 8K, thus 16 sectors: 512 * 16 = 8K.
Partial buffer means, in the context of dm-ebs, that within an 8K buffer
only part of it may be modified. For example, since we emulate 512B on
top of an 8K device, a file system can write just the first 4K of the
8K buffer window, and only that part is marked as dirty.
Offloading means pushing the data down to the lower layer, i.e. writing
dirty buffers to the device by calling submit_io().
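Below is a condensed sketch of that read-modify-write (the function
name, arguments and error handling are illustrative, it is not the
driver code itself): read the full ubs-sized block through dm-bufio,
patch only the sub-range coming from the 512B upper layer, then dirty
the whole buffer so write-back goes out in full ubs-sized units:
<snip>
#include <linux/blkdev.h>
#include <linux/err.h>
#include <linux/string.h>
#include "dm-bufio.h"

static void rmw_subrange(struct dm_bufio_client *bufio, sector_t block,
			 unsigned int buf_off, const void *src,
			 unsigned int len)
{
	struct dm_buffer *b;
	void *ba = dm_bufio_read(bufio, block, &b); /* whole ubs-sized block */

	if (IS_ERR(ba))
		return;

	memcpy(ba + buf_off, src, len);	/* modify e.g. only the first 4K */
	dm_bufio_mark_buffer_dirty(b);	/* but dirty the full buffer */
	dm_bufio_release(b);
}
<snip>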
Is it clearer now? It might be that I missed something, feel free to correct me.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-24 15:30 ` Uladzislau Rezki
@ 2025-11-24 17:00 ` Christoph Hellwig
2025-11-24 18:05 ` Uladzislau Rezki
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2025-11-24 17:00 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Christoph Hellwig, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Mon, Nov 24, 2025 at 04:30:25PM +0100, Uladzislau Rezki wrote:
> >
> > >
> > > Why in dm-ebs we need to offload partial buffer < ubf size?
> >
> > I don't understand this question. What is ubf? What does partial
> > buffer mean in this context, and what does offload mean?
> >
> That was a typo :) i meant ubs - which is underlying block size or number
> of sectors which define the logical block size of the device. In our case
> it is 8K thus is 16 = 512 * 16 = 8K.
>
> Partial buffer means, in context of dm-ebs, that within 8K buffer only
> part of it can be modified. For example, since we emulate 512B to 8K
> from upper layer to the device, a file system can write for example
> just first 4K within 8K window buffer and only that part is marked as
> dirty.
>
> offloading or imposing the data to the lower layer. i.e. writing dirty
> buffers to the device calling submit_io().
>
> Is it better? It might be that i missed something, feel free to correct.
I'm still lost what the question is, sorry.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write
2025-11-24 17:00 ` Christoph Hellwig
@ 2025-11-24 18:05 ` Uladzislau Rezki
0 siblings, 0 replies; 33+ messages in thread
From: Uladzislau Rezki @ 2025-11-24 18:05 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Uladzislau Rezki, Mikulas Patocka, Benjamin Marzinski,
Alasdair Kergon, DMML, Andrew Morton, Mike Snitzer, LKML
On Mon, Nov 24, 2025 at 06:00:37PM +0100, Christoph Hellwig wrote:
> On Mon, Nov 24, 2025 at 04:30:25PM +0100, Uladzislau Rezki wrote:
> > >
> > > >
> > > > Why in dm-ebs we need to offload partial buffer < ubf size?
> > >
> > > I don't understand this question. What is ubf? What does partial
> > > buffer mean in this context, and what does offload mean?
> > >
> > That was a typo :) i meant ubs - which is underlying block size or number
> > of sectors which define the logical block size of the device. In our case
> > it is 8K thus is 16 = 512 * 16 = 8K.
> >
> > Partial buffer means, in context of dm-ebs, that within 8K buffer only
> > part of it can be modified. For example, since we emulate 512B to 8K
> > from upper layer to the device, a file system can write for example
> > just first 4K within 8K window buffer and only that part is marked as
> > dirty.
> >
> > offloading or imposing the data to the lower layer. i.e. writing dirty
> > buffers to the device calling submit_io().
> >
> > Is it better? It might be that i missed something, feel free to correct.
>
> I'm still lost what the question is, sorry.
>
No problem, I am fine with it.
Thank you for your input, especially for explaining the difference
between logical_block_size and physical_block_size for an NVMe device.
Appreciate it!
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2025-11-24 18:05 UTC | newest]
Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-14 14:47 [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial write Uladzislau Rezki (Sony)
2025-10-16 19:59 ` Andrew Morton
2025-10-17 15:55 ` Uladzislau Rezki
-- strict thread matches above, loose matches on Subject: below --
2025-11-17 10:59 Uladzislau Rezki (Sony)
2025-11-17 20:48 ` Mikulas Patocka
2025-11-18 11:39 ` Uladzislau Rezki
2025-11-18 12:00 ` Mikulas Patocka
2025-11-18 12:40 ` Uladzislau Rezki
2025-11-18 12:46 ` Christoph Hellwig
2025-11-18 14:15 ` Benjamin Marzinski
2025-11-18 17:21 ` Mikulas Patocka
2025-11-19 5:46 ` Christoph Hellwig
2025-11-19 8:43 ` Uladzislau Rezki
2025-11-19 8:53 ` Christoph Hellwig
2025-11-19 8:57 ` Uladzislau Rezki
2025-11-19 9:00 ` Christoph Hellwig
2025-11-19 9:01 ` Uladzislau Rezki
2025-11-19 9:05 ` Christoph Hellwig
2025-11-19 9:13 ` Uladzislau Rezki
2025-11-19 9:17 ` Christoph Hellwig
2025-11-19 17:26 ` Mikulas Patocka
2025-11-20 6:21 ` Christoph Hellwig
2025-11-20 12:08 ` Uladzislau Rezki
2025-11-20 12:40 ` Uladzislau Rezki
2025-11-21 7:25 ` Christoph Hellwig
2025-11-21 7:24 ` Christoph Hellwig
2025-11-21 13:21 ` Uladzislau Rezki
2025-11-21 16:48 ` Benjamin Marzinski
2025-11-24 10:43 ` Uladzislau Rezki
2025-11-24 14:30 ` Christoph Hellwig
2025-11-24 15:30 ` Uladzislau Rezki
2025-11-24 17:00 ` Christoph Hellwig
2025-11-24 18:05 ` Uladzislau Rezki