* [PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
@ 2025-12-21 13:24 Vitaliy Filippov
2025-12-21 23:17 ` Keith Busch
2026-01-28 6:08 ` Ojaswin Mujoo
0 siblings, 2 replies; 8+ messages in thread
From: Vitaliy Filippov @ 2025-12-21 13:24 UTC (permalink / raw)
To: linux-block, linux-nvme; +Cc: Vitaliy Filippov
It contradicts NVMe specification where alignment is only required when atomic
write boundary (NABSPF/NABO) is set and highly limits usage of NVMe atomic writes
Signed-off-by: Vitaliy Filippov <vitalifster@gmail.com>
---
fs/read_write.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 833bae068770..5467d710108d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1802,17 +1802,9 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
 
 int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
 {
-	size_t len = iov_iter_count(iter);
-
 	if (!iter_is_ubuf(iter))
 		return -EINVAL;
 
-	if (!is_power_of_2(len))
-		return -EINVAL;
-
-	if (!IS_ALIGNED(iocb->ki_pos, len))
-		return -EINVAL;
-
 	if (!(iocb->ki_flags & IOCB_DIRECT))
 		return -EOPNOTSUPP;
 
--
2.51.0
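
For reference, the two checks deleted by the hunk above can be restated as a small userspace C sketch. The helper names `is_pow2` and `atomic_len_pos_valid` are illustrative only and are not kernel API; they merely mirror how `is_power_of_2()` and `IS_ALIGNED()` are used in the removed lines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative userspace restatement; these names are hypothetical. */

/* True when n has exactly one bit set (0 is not a power of two). */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Mirrors the two removed checks: the length must be a power of two
 * and the file position must be a multiple of that length.
 */
static bool atomic_len_pos_valid(uint64_t pos, uint64_t len)
{
	if (!is_pow2(len))
		return false;
	/* For a power-of-two len, masking tests divisibility. */
	return (pos & (len - 1)) == 0;
}
```

Under this rule an 8 KiB write at offset 4 KiB is rejected, as is any 12 KiB write, regardless of what the device itself could do atomically.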
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
From: Keith Busch @ 2025-12-21 23:17 UTC (permalink / raw)
To: Vitaliy Filippov; +Cc: linux-block, linux-nvme
On Sun, Dec 21, 2025 at 04:24:02PM +0300, Vitaliy Filippov wrote:
> It contradicts NVMe specification where alignment is only required when atomic
> write boundary (NABSPF/NABO) is set and highly limits usage of NVMe atomic writes
Commit header is missing the "fs:" prefix, and the commit log should
wrap at 72 characters.

On the technical side, this is a generic function used by multiple
protocols, so you can't just appeal to NVMe to justify removing the
checks.

NVMe still has atomic boundaries, and a write that straddles one is not
an atomic operation. Instead of removing the checks, you'd have to
replace them with a more costly operation if you really want to support
more arbitrary write lengths and offsets. And if you do manage to remove
the power of two requirement, then the queue limit for nvme's
atomic_write_hw_unit_max isn't correct anymore.
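
The boundary concern can be sketched as follows. This is a minimal illustration and not the kernel's or the NVMe driver's code: `crosses_boundary` is a hypothetical helper, it assumes boundaries sit at multiples of `boundary` bytes counted from offset zero (so no NABO offset is modeled), and it treats `boundary == 0` as "no boundary".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does the byte range [pos, pos + len) span more
 * than one boundary-sized window? Assumes windows start at offset 0.
 */
static bool crosses_boundary(uint64_t pos, uint64_t len, uint64_t boundary)
{
	if (boundary == 0 || len == 0)
		return false;
	/* Compare the window index of the first and the last byte. */
	return pos / boundary != (pos + len - 1) / boundary;
}
```

With a 16 KiB boundary, an 8 KiB write at offset 12 KiB spans two windows and so would not be atomic, while the same write at offset 8 KiB stays inside one. For arbitrary offsets and lengths some per-write check of this kind is needed; a naturally aligned power-of-two write cannot cross a boundary that is itself a power of two no smaller than the write.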
* Re: [PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
From: Vitaliy Filippov @ 2025-12-22 9:54 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-block, linux-nvme
Hi! Thanks a lot for your reply! This is actually my first patch ever,
so please don't blame me for not following some standards. I'll try to
resubmit it correctly.

Regarding the rest:

1) NVMe atomic boundaries seem to already be checked in
nvme_valid_atomic_write().

2) What's atomic_write_hw_unit_max? As I understand it, Linux also
already checks it; at least
/sys/block/nvme**/queue/atomic_write_max_bytes is already limited by
max_hw_sectors_kb.

3) Yes, I've of course seen that this function is also used by ext4
and xfs, but I don't understand the motivation behind the 2^n
requirement. I suppose file systems may fragment the write according
to the currently allocated extents, for example, but I don't see how
issues coming from this can be fixed by requiring writes to be 2^n.

But I understand that just removing the checks may break something if
somebody relies on them. What do you think about removing the
requirement only for NVMe or only for block devices then? I see 3 ways
to do it:
a) split generic_atomic_write_valid() into two functions: one for
all types of inodes and another only for file systems.
b) remove generic_atomic_write_valid() from the block device checks
entirely.
c) change generic_atomic_write_valid() just like in my original patch
but copy the original checks into the other places where it's used
(ext4 and xfs).

Which way do you think would be best?
* Re: [PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
From: Vitaliy Filippov @ 2025-12-22 13:28 UTC (permalink / raw)
To: linux-fsdevel
Cc: linux-block, linux-nvme, linux-fsdevel+subscribe, Keith Busch
Hi linux-fsdevel,

I recently discovered that Linux incorrectly requires all atomic
writes to have a 2^N length and to be aligned on the length boundary.
This requirement contradicts the NVMe specification, which doesn't
require such alignment and length, and thus severely restricts the use
of atomic writes with NVMe disks which support them (Micron and Kioxia).

The NVMe specification has its own atomic write restrictions (AWUPF and
NABSPF/NABO), but both are already checked by the nvme subsystem.
The 2^N restriction comes from generic_atomic_write_valid().

I submitted a patch which removes this restriction to linux-block and
linux-nvme. Sorry if these mailing lists weren't the right place to
send it to, it's my first patch :).

But the function is currently used in 3 places: block/fops.c,
fs/ext4/file.c and fs/xfs/xfs_file.c.
Can you tell me if ext4 and xfs really want atomic writes to be 2^N
sized and length-aligned?
From looking at the code I'd say they don't really require it?
Can you approve my patch if I'm right? Please :-)
* Re: [PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
From: John Garry @ 2025-12-23 9:26 UTC (permalink / raw)
To: Vitaliy Filippov, linux-fsdevel
Cc: linux-block, linux-nvme, linux-fsdevel+subscribe, Keith Busch
On 22/12/2025 13:28, Vitaliy Filippov wrote:
> Hi linux-fsdevel,
> I recently discovered that Linux incorrectly requires all atomic
> writes to have 2^N length and to be aligned on the length boundary.
> This requirement contradicts NVMe specification which doesn't require
> such alignment and length and thus highly restricts usage of atomic
> writes with NVMe disks which support it (Micron and Kioxia).
All these alignment and size rules are specific to using RWF_ATOMIC. You
don't have to use RWF_ATOMIC if you don't want to - as you prob know,
atomic writes are implicit on NVMe.
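
As a rough illustration of where these rules apply, the sketch below issues a single write with RWF_ATOMIC through pwritev2(). It is a minimal example under stated assumptions: `atomic_pwrite` is a hypothetical wrapper, the fallback value 0x40 for RWF_ATOMIC is assumed to match the Linux UAPI headers for libc headers that do not yet define it, and the descriptor is expected to have been opened with O_DIRECT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for pwritev2() */
#endif
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Assumption: RWF_ATOMIC is 0x40 in the Linux UAPI headers; older
 * libc headers may not define it, hence this fallback.
 */
#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040
#endif

/*
 * Hypothetical wrapper: issue one write with RWF_ATOMIC set.
 * fd is expected to be opened with O_DIRECT. Returns the number of
 * bytes written, or -1 with errno set (for example EINVAL when the
 * length or position breaks the size and alignment rules).
 */
static ssize_t atomic_pwrite(int fd, const void *buf, size_t len, off_t pos)
{
	struct iovec iov = {
		.iov_base = (void *)buf,
		.iov_len = len,
	};

	return pwritev2(fd, &iov, 1, pos, RWF_ATOMIC);
}
```

Dropping the RWF_ATOMIC flag from the call gives an ordinary write, to which none of the size and alignment rules discussed here apply.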
> NVMe specification has its own atomic write restrictions - AWUPF and
> NABSPF/NABO, but both are already checked by the nvme subsystem.
> The 2^N restriction comes from generic_atomic_write_valid().
> I submitted a patch which removes this restriction to linux-block and
> linux-nvme. Sorry if these maillists weren't the right place to send
> it to, it's my first patch :).
> But the function is currently used in 3 places: block/fops.c,
> fs/ext4/file.c and fs/xfs/xfs_file.c.
> Can you tell me if ext4 and xfs really want atomic writes to be 2^N
> sized and length-aligned?
As above, this is just the kernel atomic write rules to support using
different storage technologies.
> From looking at the code I'd say they don't really require it?
> Can you approve my patch if I'm right? Please :-)
* Re: [PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
From: Vitaliy Filippov @ 2025-12-23 11:19 UTC (permalink / raw)
To: John Garry
Cc: linux-fsdevel, linux-block, linux-nvme, linux-fsdevel+subscribe,
Keith Busch
What does "just the kernel atomic write rules" mean?
What's the idea behind these restrictions?
I want to use atomic writes, but without this restriction.
And generally I don't think this restriction is needed by anyone at all.
That's why I ask: can it be removed? Can I remove it in my patch?
* Re: [PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
From: Vitaliy Filippov @ 2025-12-23 11:34 UTC (permalink / raw)
To: John Garry; +Cc: linux-fsdevel, linux-block, linux-nvme, Keith Busch
For example, in theory, there are also SAS disks which require a
separate WRITE ATOMIC command for writes to be atomic.
I'm not sure which actual disk models support it, though... :)
But as I understand it, Linux won't be able to send this command
without the RWF_ATOMIC flag.
And RWF_ATOMIC is limited to 2^N and length-aligned writes, so it would
block SAS/SCSI atomic write usage for at least some use cases.
* Re: [PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
From: Ojaswin Mujoo @ 2026-01-28 6:08 UTC (permalink / raw)
To: Vitaliy Filippov; +Cc: linux-block, linux-nvme
On Sun, Dec 21, 2025 at 04:24:02PM +0300, Vitaliy Filippov wrote:
> It contradicts NVMe specification where alignment is only required when atomic
> write boundary (NABSPF/NABO) is set and highly limits usage of NVMe atomic writes
>
> Signed-off-by: Vitaliy Filippov <vitalifster@gmail.com>
Hi Vitaliy,

There's some context to why this feature is designed this way. One of
the reasons to have powers of 2 is to abstract out device (SCSI, NVMe)
level spec details from the higher level implementation of atomic
writes. My memory on what the specs say is a bit fuzzy, but iirc SCSI
defines an optional alignment for the WRITE_ATOMIC command whereas NVMe
can have a boundary which shall not be crossed.
Which means, for a user to perform atomic writes, the physical blocks
allocated by the filesystem would need to adhere to these limitations,
which would need knowledge, at the FS level, of what the underlying device
is and what its limitations are. We wanted to avoid exposing these
details to the FS. The power of 2 length and alignment becomes a good
middle ground where if the FS can ensure that the allocated blocks
follow these limits, then it would satisfy both SCSI and NVMe, without
having to worry about the individual spec's details.
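
The "middle ground" argument can be checked with a small brute-force sketch. It is only an illustration under stated assumptions: the helper names are hypothetical, the device boundary is assumed to be a nonzero power of two with windows starting at offset zero, and SCSI alignment and granularity details are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does [pos, pos + len) span two boundary windows? */
static bool straddles(uint64_t pos, uint64_t len, uint64_t boundary)
{
	return pos / boundary != (pos + len - 1) / boundary;
}

/*
 * Try every power-of-two length from `unit` up to `boundary`, at every
 * length-aligned position below `span`. Returns true when none of them
 * straddles a boundary. Assumes unit and boundary are nonzero powers
 * of two.
 */
static bool aligned_pow2_never_straddles(uint64_t unit, uint64_t boundary,
					 uint64_t span)
{
	/* len != 0 guards against the shift wrapping around to zero. */
	for (uint64_t len = unit; len != 0 && len <= boundary; len <<= 1)
		for (uint64_t pos = 0; pos < span; pos += len)
			if (straddles(pos, len, boundary))
				return false;
	return true;
}
```

By contrast, `straddles(12288, 12288, 16384)` is true: a 12 KiB write aligned to its own length still crosses a 16 KiB boundary, so length alignment alone does not give the same guarantee once the power-of-two requirement is dropped.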
It also helps that power of 2 simplifies the calculations in a lot of
places, and the first users of the feature, i.e. DBs, are okay with this
limitation.
Yes it might be a bit restrictive and we might have use cases in the
future that need non power-of-2, but just removing it from the
generic helpers, like you did, is not the right way. It will be a more
involved change that might need modifications throughout the stack.
Regards,
ojaswin