* [PATCH 1/4] iomap: make sure iomap_adjust_read_range() are aligned with block_size
2025-09-13 3:37 [PATCH v4 0/4] allow partial folio write with iomap_folio_state alexjlzheng
@ 2025-09-13 3:37 ` alexjlzheng
2025-09-14 11:45 ` Pankaj Raghav (Samsung)
2025-09-13 3:37 ` [PATCH 2/4] iomap: move iter revert case out of the unwritten branch alexjlzheng
` (3 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: alexjlzheng @ 2025-09-13 3:37 UTC (permalink / raw)
To: hch, brauner
Cc: djwong, yi.zhang, linux-xfs, linux-fsdevel, linux-kernel,
Jinliang Zheng
From: Jinliang Zheng <alexjlzheng@tencent.com>
iomap_folio_state marks the uptodate state in units of block_size, so
it is better to check that pos and length are aligned with block_size.
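
These checks rely on block_size being a power of two, which holds here because it is derived from inode->i_blkbits. A minimal standalone sketch of the mask test (hypothetical values, not part of the patch):

```c
#include <stdio.h>

int main(void)
{
	unsigned block_size = 4096;            /* 1 << inode->i_blkbits */
	unsigned long long pos = 6144;         /* 1.5 blocks in: misaligned */

	/* non-zero low bits mean pos is not on a block boundary */
	printf("low bits: %llu\n", pos & (block_size - 1));   /* prints 2048 */
	return 0;
}
```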
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/iomap/buffered-io.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index fd827398afd2..0c38333933c6 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -234,6 +234,9 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	unsigned first = poff >> block_bits;
 	unsigned last = (poff + plen - 1) >> block_bits;
 
+	WARN_ON(*pos & (block_size - 1));
+	WARN_ON(length & (block_size - 1));
+
 	/*
 	 * If the block size is smaller than the page size, we need to check the
 	 * per-block uptodate status and adjust the offset and length if needed
--
2.49.0
* Re: [PATCH 1/4] iomap: make sure iomap_adjust_read_range() are aligned with block_size
2025-09-13 3:37 ` [PATCH 1/4] iomap: make sure iomap_adjust_read_range() are aligned with block_size alexjlzheng
@ 2025-09-14 11:45 ` Pankaj Raghav (Samsung)
2025-09-14 12:40 ` Jinliang Zheng
0 siblings, 1 reply; 14+ messages in thread
From: Pankaj Raghav (Samsung) @ 2025-09-14 11:45 UTC (permalink / raw)
To: alexjlzheng
Cc: hch, brauner, djwong, yi.zhang, linux-xfs, linux-fsdevel,
linux-kernel, Jinliang Zheng
On Sat, Sep 13, 2025 at 11:37:15AM +0800, alexjlzheng@gmail.com wrote:
> From: Jinliang Zheng <alexjlzheng@tencent.com>
>
> iomap_folio_state marks the uptodate state in units of block_size, so
> it is better to check that pos and length are aligned with block_size.
>
> Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
> ---
> fs/iomap/buffered-io.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index fd827398afd2..0c38333933c6 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -234,6 +234,9 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
> unsigned first = poff >> block_bits;
> unsigned last = (poff + plen - 1) >> block_bits;
>
> + WARN_ON(*pos & (block_size - 1));
> + WARN_ON(length & (block_size - 1));
Any reason you chose WARN_ON instead of WARN_ON_ONCE?
I don't see WARN_ON being used in iomap/buffered-io.c.
--
Pankaj
* Re: [PATCH 1/4] iomap: make sure iomap_adjust_read_range() are aligned with block_size
2025-09-14 11:45 ` Pankaj Raghav (Samsung)
@ 2025-09-14 12:40 ` Jinliang Zheng
2025-09-15 8:54 ` Pankaj Raghav (Samsung)
0 siblings, 1 reply; 14+ messages in thread
From: Jinliang Zheng @ 2025-09-14 12:40 UTC (permalink / raw)
To: kernel
Cc: alexjlzheng, alexjlzheng, brauner, djwong, hch, linux-fsdevel,
linux-kernel, linux-xfs, yi.zhang
On Sun, 14 Sep 2025 13:45:16 +0200, kernel@pankajraghav.com wrote:
> On Sat, Sep 13, 2025 at 11:37:15AM +0800, alexjlzheng@gmail.com wrote:
> > From: Jinliang Zheng <alexjlzheng@tencent.com>
> >
> > iomap_folio_state marks the uptodate state in units of block_size, so
> > it is better to check that pos and length are aligned with block_size.
> >
> > Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
> > ---
> > fs/iomap/buffered-io.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index fd827398afd2..0c38333933c6 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -234,6 +234,9 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
> > unsigned first = poff >> block_bits;
> > unsigned last = (poff + plen - 1) >> block_bits;
> >
> > + WARN_ON(*pos & (block_size - 1));
> > + WARN_ON(length & (block_size - 1));
> Any reason you chose WARN_ON instead of WARN_ON_ONCE?
I just think it's a fatal error that deserves attention every time
it's triggered.
>
> I don't see WARN_ON being used in iomap/buffered-io.c.
I'm not sure if there are any community guidelines for using these
two macros. If there are, please let me know and I'll be happy to
follow them as a guide.
thanks,
Jinliang Zheng. :)
> --
> Pankaj
* Re: [PATCH 1/4] iomap: make sure iomap_adjust_read_range() are aligned with block_size
2025-09-14 12:40 ` Jinliang Zheng
@ 2025-09-15 8:54 ` Pankaj Raghav (Samsung)
2025-09-15 9:12 ` Jinliang Zheng
0 siblings, 1 reply; 14+ messages in thread
From: Pankaj Raghav (Samsung) @ 2025-09-15 8:54 UTC (permalink / raw)
To: Jinliang Zheng
Cc: alexjlzheng, brauner, djwong, hch, linux-fsdevel, linux-kernel,
linux-xfs, yi.zhang
On Sun, Sep 14, 2025 at 08:40:06PM +0800, Jinliang Zheng wrote:
> On Sun, 14 Sep 2025 13:45:16 +0200, kernel@pankajraghav.com wrote:
> > On Sat, Sep 13, 2025 at 11:37:15AM +0800, alexjlzheng@gmail.com wrote:
> > > From: Jinliang Zheng <alexjlzheng@tencent.com>
> > >
> > > iomap_folio_state marks the uptodate state in units of block_size, so
> > > it is better to check that pos and length are aligned with block_size.
> > >
> > > Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
> > > ---
> > > fs/iomap/buffered-io.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index fd827398afd2..0c38333933c6 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -234,6 +234,9 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
> > > unsigned first = poff >> block_bits;
> > > unsigned last = (poff + plen - 1) >> block_bits;
> > >
> > > + WARN_ON(*pos & (block_size - 1));
> > > + WARN_ON(length & (block_size - 1));
> > Any reason you chose WARN_ON instead of WARN_ON_ONCE?
>
> I just think it's a fatal error that deserves attention every time
> it's triggered.
>
Is this a general change, or do your later changes depend on these
warnings to work correctly?
> >
> > I don't see WARN_ON being used in iomap/buffered-io.c.
>
> I'm not sure if there are any community guidelines for using these
> two macros. If there are, please let me know and I'll be happy to
> follow them as a guide.
We typically use WARN_ON_ONCE to prevent spamming.
--
Pankaj
* Re: [PATCH 1/4] iomap: make sure iomap_adjust_read_range() are aligned with block_size
2025-09-15 8:54 ` Pankaj Raghav (Samsung)
@ 2025-09-15 9:12 ` Jinliang Zheng
0 siblings, 0 replies; 14+ messages in thread
From: Jinliang Zheng @ 2025-09-15 9:12 UTC (permalink / raw)
To: kernel
Cc: alexjlzheng, alexjlzheng, brauner, djwong, hch, linux-fsdevel,
linux-kernel, linux-xfs, yi.zhang
On Mon, 15 Sep 2025 10:54:00 +0200, kernel@pankajraghav.com wrote:
> On Sun, Sep 14, 2025 at 08:40:06PM +0800, Jinliang Zheng wrote:
> > On Sun, 14 Sep 2025 13:45:16 +0200, kernel@pankajraghav.com wrote:
> > > On Sat, Sep 13, 2025 at 11:37:15AM +0800, alexjlzheng@gmail.com wrote:
> > > > From: Jinliang Zheng <alexjlzheng@tencent.com>
> > > >
> > > > iomap_folio_state marks the uptodate state in units of block_size, so
> > > > it is better to check that pos and length are aligned with block_size.
> > > >
> > > > Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
> > > > ---
> > > > fs/iomap/buffered-io.c | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > index fd827398afd2..0c38333933c6 100644
> > > > --- a/fs/iomap/buffered-io.c
> > > > +++ b/fs/iomap/buffered-io.c
> > > > @@ -234,6 +234,9 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
> > > > unsigned first = poff >> block_bits;
> > > > unsigned last = (poff + plen - 1) >> block_bits;
> > > >
> > > > + WARN_ON(*pos & (block_size - 1));
> > > > + WARN_ON(length & (block_size - 1));
> > > Any reason you chose WARN_ON instead of WARN_ON_ONCE?
> >
> > I just think it's a fatal error that deserves attention every time
> > it's triggered.
> >
>
> Is this a general change, or do your later changes depend on these
> warnings to work correctly?
No, there is no functional change.
I added it only because the correctness of iomap_adjust_read_range() depends on
this alignment, so it's better to highlight it now.
```
	/* move forward for each leading block marked uptodate */
	for (i = first; i <= last; i++) {
		if (!ifs_block_is_uptodate(ifs, i))
			break;
		*pos += block_size;    <-------------------- if not aligned, ...
		poff += block_size;
		plen -= block_size;
		first++;
	}
```
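
To make that dependency concrete, here is a userspace sketch (hypothetical values, not kernel code) of what the walk above does when *pos starts out unaligned: the advanced pos ends up inside a block rather than on its boundary, so the remaining range no longer lines up with the per-block uptodate bitmap.

```c
#include <stdio.h>

int main(void)
{
	unsigned block_size = 4096;
	unsigned long long pos = 100;      /* deliberately unaligned */
	unsigned poff = 100;               /* offset of pos within the folio */
	unsigned plen = 8192;
	int uptodate[3] = { 1, 0, 0 };     /* only the leading block is uptodate */

	for (int i = 0; i < 3 && uptodate[i]; i++) {
		pos += block_size;         /* 4196: inside block 1, not at its start */
		poff += block_size;
		plen -= block_size;
	}
	printf("pos=%llu (pos %% block_size = %llu), plen=%u\n",
	       pos, pos % block_size, plen);
	return 0;
}
```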
>
> > >
> > > I don't see WARN_ON being used in iomap/buffered-io.c.
> >
> > I'm not sure if there are any community guidelines for using these
> > two macros. If there are, please let me know and I'll be happy to
> > follow them as a guide.
>
> We typically use WARN_ON_ONCE to prevent spamming.
If you think it's better, I will send a new version.
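
For reference, the respin would only switch the two new lines to the _ONCE variant (a sketch, assuming nothing else in the hunk changes):

```c
	WARN_ON_ONCE(*pos & (block_size - 1));
	WARN_ON_ONCE(length & (block_size - 1));
```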
thanks,
Jinliang Zheng. :)
>
> --
> Pankaj
* [PATCH 2/4] iomap: move iter revert case out of the unwritten branch
2025-09-13 3:37 [PATCH v4 0/4] allow partial folio write with iomap_folio_state alexjlzheng
2025-09-13 3:37 ` [PATCH 1/4] iomap: make sure iomap_adjust_read_range() are aligned with block_size alexjlzheng
@ 2025-09-13 3:37 ` alexjlzheng
2025-09-13 3:37 ` [PATCH 3/4] iomap: make iomap_write_end() return the number of written length again alexjlzheng
` (2 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: alexjlzheng @ 2025-09-13 3:37 UTC (permalink / raw)
To: hch, brauner
Cc: djwong, yi.zhang, linux-xfs, linux-fsdevel, linux-kernel,
Jinliang Zheng
From: Jinliang Zheng <alexjlzheng@tencent.com>
Commit e1f453d4336d ("iomap: do some small logical cleanup in buffered
write") moved iomap_write_failed() and iov_iter_revert() into the
written == 0 branch, because at the time iomap_write_end() could never
return a partial write length.

A subsequent patch modifies iomap_write_end() so that it can return a
block-aligned partial write length (partial relative to the folio-sized
write), which breaks that assumption.

This patch moves the two calls back out of that branch in preparation
for the subsequent patches.
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/iomap/buffered-io.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0c38333933c6..109c3bad6ccf 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1019,6 +1019,11 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
if (old_size < pos)
pagecache_isize_extended(iter->inode, old_size, pos);
+ if (written < bytes)
+ iomap_write_failed(iter->inode, pos + written,
+ bytes - written);
+ if (unlikely(copied != written))
+ iov_iter_revert(i, copied - written);
cond_resched();
if (unlikely(written == 0)) {
@@ -1028,9 +1033,6 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
* halfway through, might be a race with munmap,
* might be severe memory pressure.
*/
- iomap_write_failed(iter->inode, pos, bytes);
- iov_iter_revert(i, copied);
-
if (chunk > PAGE_SIZE)
chunk /= 2;
if (copied) {
--
2.49.0
* [PATCH 3/4] iomap: make iomap_write_end() return the number of written length again
2025-09-13 3:37 [PATCH v4 0/4] allow partial folio write with iomap_folio_state alexjlzheng
2025-09-13 3:37 ` [PATCH 1/4] iomap: make sure iomap_adjust_read_range() are aligned with block_size alexjlzheng
2025-09-13 3:37 ` [PATCH 2/4] iomap: move iter revert case out of the unwritten branch alexjlzheng
@ 2025-09-13 3:37 ` alexjlzheng
2025-09-13 3:37 ` [PATCH 4/4] iomap: don't abandon the whole copy when we have iomap_folio_state alexjlzheng
2025-09-14 11:40 ` [PATCH v4 0/4] allow partial folio write with iomap_folio_state Pankaj Raghav (Samsung)
4 siblings, 0 replies; 14+ messages in thread
From: alexjlzheng @ 2025-09-13 3:37 UTC (permalink / raw)
To: hch, brauner
Cc: djwong, yi.zhang, linux-xfs, linux-fsdevel, linux-kernel,
Jinliang Zheng
From: Jinliang Zheng <alexjlzheng@tencent.com>
The next patch allows iomap_write_end() to conditionally accept partial
writes, so make iomap_write_end() return the number of accepted bytes in
preparation for that change.
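
A sketch of the caller-side contract after this change, assembled from the hunks in patches 2/4 and 3/4 (simplified, not a complete quote of the code):

```c
	written = iomap_write_end(iter, bytes, copied, folio);

	/* a short return means only part of the copy was accepted */
	if (written < bytes)
		iomap_write_failed(iter->inode, pos + written, bytes - written);
	if (unlikely(copied != written))
		iov_iter_revert(i, copied - written);
```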
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/iomap/buffered-io.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 109c3bad6ccf..7b9193f8243a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -873,7 +873,7 @@ static int iomap_write_begin(struct iomap_iter *iter,
return status;
}
-static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
+static int __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
size_t copied, struct folio *folio)
{
flush_dcache_folio(folio);
@@ -890,11 +890,11 @@ static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
* redo the whole thing.
*/
if (unlikely(copied < len && !folio_test_uptodate(folio)))
- return false;
+ return 0;
iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
filemap_dirty_folio(inode->i_mapping, folio);
- return true;
+ return copied;
}
static void iomap_write_end_inline(const struct iomap_iter *iter,
@@ -915,10 +915,10 @@ static void iomap_write_end_inline(const struct iomap_iter *iter,
}
/*
- * Returns true if all copied bytes have been written to the pagecache,
- * otherwise return false.
+ * Returns the number of copied bytes that have been written to the
+ * pagecache, or zero if the block was only partially updated.
*/
-static bool iomap_write_end(struct iomap_iter *iter, size_t len, size_t copied,
+static int iomap_write_end(struct iomap_iter *iter, size_t len, size_t copied,
struct folio *folio)
{
const struct iomap *srcmap = iomap_iter_srcmap(iter);
@@ -926,7 +926,7 @@ static bool iomap_write_end(struct iomap_iter *iter, size_t len, size_t copied,
if (srcmap->type == IOMAP_INLINE) {
iomap_write_end_inline(iter, folio, pos, copied);
- return true;
+ return copied;
}
if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
@@ -934,7 +934,7 @@ static bool iomap_write_end(struct iomap_iter *iter, size_t len, size_t copied,
bh_written = block_write_end(pos, len, copied, folio);
WARN_ON_ONCE(bh_written != copied && bh_written != 0);
- return bh_written == copied;
+ return bh_written;
}
return __iomap_write_end(iter->inode, pos, len, copied, folio);
@@ -1000,8 +1000,7 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
flush_dcache_folio(folio);
copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
- written = iomap_write_end(iter, bytes, copied, folio) ?
- copied : 0;
+ written = iomap_write_end(iter, bytes, copied, folio);
/*
* Update the in-memory inode size after copying the data into
@@ -1315,7 +1314,7 @@ static int iomap_unshare_iter(struct iomap_iter *iter,
do {
struct folio *folio;
size_t offset;
- bool ret;
+ int ret;
bytes = min_t(u64, SIZE_MAX, bytes);
status = iomap_write_begin(iter, write_ops, &folio, &offset,
@@ -1327,7 +1326,7 @@ static int iomap_unshare_iter(struct iomap_iter *iter,
ret = iomap_write_end(iter, bytes, bytes, folio);
__iomap_put_folio(iter, write_ops, bytes, folio);
- if (WARN_ON_ONCE(!ret))
+ if (WARN_ON_ONCE(ret != bytes))
return -EIO;
cond_resched();
@@ -1388,7 +1387,7 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
do {
struct folio *folio;
size_t offset;
- bool ret;
+ int ret;
bytes = min_t(u64, SIZE_MAX, bytes);
status = iomap_write_begin(iter, write_ops, &folio, &offset,
@@ -1406,7 +1405,7 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
ret = iomap_write_end(iter, bytes, bytes, folio);
__iomap_put_folio(iter, write_ops, bytes, folio);
- if (WARN_ON_ONCE(!ret))
+ if (WARN_ON_ONCE(ret != bytes))
return -EIO;
status = iomap_iter_advance(iter, &bytes);
--
2.49.0
* [PATCH 4/4] iomap: don't abandon the whole copy when we have iomap_folio_state
2025-09-13 3:37 [PATCH v4 0/4] allow partial folio write with iomap_folio_state alexjlzheng
` (2 preceding siblings ...)
2025-09-13 3:37 ` [PATCH 3/4] iomap: make iomap_write_end() return the number of written length again alexjlzheng
@ 2025-09-13 3:37 ` alexjlzheng
2025-09-15 10:50 ` Pankaj Raghav (Samsung)
2025-09-14 11:40 ` [PATCH v4 0/4] allow partial folio write with iomap_folio_state Pankaj Raghav (Samsung)
4 siblings, 1 reply; 14+ messages in thread
From: alexjlzheng @ 2025-09-13 3:37 UTC (permalink / raw)
To: hch, brauner
Cc: djwong, yi.zhang, linux-xfs, linux-fsdevel, linux-kernel,
Jinliang Zheng
From: Jinliang Zheng <alexjlzheng@tencent.com>
Currently, if a partial write occurs during a buffered write, the entire
write is discarded. While this is an uncommon case, it is still a bit
wasteful and we can do better.

With iomap_folio_state, we can track the uptodate state at the block
level, and a read_folio can correctly handle partially uptodate folios.

Therefore, when a partial write occurs, accept the block-aligned part of
the write instead of rejecting the entire write.

For example, suppose a folio is 2MB, the block size is 4kB, and the
copied bytes are 2MB-3kB.

Without this patchset, we'd need to recopy from the beginning of the
folio in the next iteration, which means 2MB-3kB of data is copied
twice.
|<-------------------- 2MB -------------------->|
+-------+-------+-------+-------+-------+-------+
| block |  ...  | block | block |  ...  | block |  folio
+-------+-------+-------+-------+-------+-------+
|<-4kB->|
|<--------------- copied 2MB-3kB --------->| first time copied
|<-------- 1MB -------->| next time we need copy (chunk /= 2)
|<-------- 1MB -------->| next next time we need copy.
|<------ 2MB-3kB bytes duplicate copy ---->|
With this patchset, we can accept 2MB-4kB of the bytes, which is block-aligned.
This means we only need to process the remaining 4kB in the next iteration,
so only 1kB has to be copied twice.
|<-------------------- 2MB -------------------->|
+-------+-------+-------+-------+-------+-------+
| block |  ...  | block | block |  ...  | block |  folio
+-------+-------+-------+-------+-------+-------+
|<-4kB->|
|<--------------- copied 2MB-3kB --------->| first time copied
                                        |<-4kB->|   next time we need copy
                                        |<>|
                                          only 1kB bytes duplicate copy
Although partial writes are inherently unusual and do not account for a
large share of performance testing, the optimization still makes sense
in large-scale data centers.
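
A standalone sketch of the tail trim on the example above (the names mirror iomap_trim_tail_partial() in the diff below; hypothetical userspace code, not part of the patch):

```c
#include <stdio.h>

int main(void)
{
	unsigned block_size = 4096;
	unsigned long long pos = 0;                         /* folio-aligned write */
	unsigned long copied = 2 * 1024 * 1024 - 3 * 1024;  /* 2MB-3kB copied */
	unsigned last_blk_bytes = (pos + copied) & (block_size - 1);   /* 1024 */

	/* the tail block is not uptodate, so drop its partial 1kB */
	copied -= last_blk_bytes;
	printf("accepted %lu bytes (2MB-4kB)\n", copied);   /* 2093056 */
	return 0;
}
```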
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/iomap/buffered-io.c | 44 +++++++++++++++++++++++++++++++++---------
1 file changed, 35 insertions(+), 9 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7b9193f8243a..0952a3debe11 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -873,6 +873,25 @@ static int iomap_write_begin(struct iomap_iter *iter,
return status;
}
+static int iomap_trim_tail_partial(struct inode *inode, loff_t pos,
+		size_t copied, struct folio *folio)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	unsigned block_size, last_blk, last_blk_bytes;
+
+	if (!ifs || !copied)
+		return 0;
+
+	block_size = 1 << inode->i_blkbits;
+	last_blk = offset_in_folio(folio, pos + copied - 1) >> inode->i_blkbits;
+	last_blk_bytes = (pos + copied) & (block_size - 1);
+
+	if (!ifs_block_is_uptodate(ifs, last_blk))
+		copied -= min(copied, last_blk_bytes);
+
+	return copied;
+}
+
static int __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
size_t copied, struct folio *folio)
{
@@ -881,17 +900,24 @@ static int __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
/*
* The blocks that were entirely written will now be uptodate, so we
* don't have to worry about a read_folio reading them and overwriting a
- * partial write. However, if we've encountered a short write and only
- * partially written into a block, it will not be marked uptodate, so a
- * read_folio might come in and destroy our partial write.
+ * partial write.
*
- * Do the simplest thing and just treat any short write to a
- * non-uptodate page as a zero-length write, and force the caller to
- * redo the whole thing.
+ * However, if we've encountered a short write and only partially
+ * written into a block, we must discard the short-written _tail_ block
+ * and not mark it uptodate in the ifs, to ensure a read_folio reading
+ * can handle it correctly via iomap_adjust_read_range(). It's safe to
+ * keep the non-tail block writes because we know that a non-tail
+ * block:
+ * - is either fully written, since copy_from_user() is sequential,
+ * - or is a partially written head block that has already been read in
+ *   and marked uptodate in the ifs by iomap_write_begin().
*/
- if (unlikely(copied < len && !folio_test_uptodate(folio)))
- return 0;
- iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
+ if (unlikely(copied < len && !folio_test_uptodate(folio))) {
+ copied = iomap_trim_tail_partial(inode, pos, copied, folio);
+ if (!copied)
+ return 0;
+ }
+ iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), copied);
iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
filemap_dirty_folio(inode->i_mapping, folio);
return copied;
--
2.49.0
* Re: [PATCH 4/4] iomap: don't abandon the whole copy when we have iomap_folio_state
2025-09-13 3:37 ` [PATCH 4/4] iomap: don't abandon the whole copy when we have iomap_folio_state alexjlzheng
@ 2025-09-15 10:50 ` Pankaj Raghav (Samsung)
2025-09-15 11:12 ` Jinliang Zheng
0 siblings, 1 reply; 14+ messages in thread
From: Pankaj Raghav (Samsung) @ 2025-09-15 10:50 UTC (permalink / raw)
To: alexjlzheng
Cc: hch, brauner, djwong, yi.zhang, linux-xfs, linux-fsdevel,
linux-kernel, Jinliang Zheng
> +static int iomap_trim_tail_partial(struct inode *inode, loff_t pos,
> + size_t copied, struct folio *folio)
> +{
> + struct iomap_folio_state *ifs = folio->private;
> + unsigned block_size, last_blk, last_blk_bytes;
> +
> + if (!ifs || !copied)
> + return 0;
> +
> + block_size = 1 << inode->i_blkbits;
> + last_blk = offset_in_folio(folio, pos + copied - 1) >> inode->i_blkbits;
> + last_blk_bytes = (pos + copied) & (block_size - 1);
> +
> + if (!ifs_block_is_uptodate(ifs, last_blk))
> + copied -= min(copied, last_blk_bytes);
If pos is aligned to block_size, is there a scenario where
copied < last_blk_bytes?
Trying to understand why you are using a min() here.
--
Pankaj
* Re: [PATCH 4/4] iomap: don't abandon the whole copy when we have iomap_folio_state
2025-09-15 10:50 ` Pankaj Raghav (Samsung)
@ 2025-09-15 11:12 ` Jinliang Zheng
2025-09-15 11:29 ` Pankaj Raghav (Samsung)
0 siblings, 1 reply; 14+ messages in thread
From: Jinliang Zheng @ 2025-09-15 11:12 UTC (permalink / raw)
To: kernel
Cc: alexjlzheng, alexjlzheng, brauner, djwong, hch, linux-fsdevel,
linux-kernel, linux-xfs, yi.zhang
On Mon, 15 Sep 2025 12:50:54 +0200, kernel@pankajraghav.com wrote:
> > +static int iomap_trim_tail_partial(struct inode *inode, loff_t pos,
> > + size_t copied, struct folio *folio)
> > +{
> > + struct iomap_folio_state *ifs = folio->private;
> > + unsigned block_size, last_blk, last_blk_bytes;
> > +
> > + if (!ifs || !copied)
> > + return 0;
> > +
> > + block_size = 1 << inode->i_blkbits;
> > + last_blk = offset_in_folio(folio, pos + copied - 1) >> inode->i_blkbits;
> > + last_blk_bytes = (pos + copied) & (block_size - 1);
> > +
> > + if (!ifs_block_is_uptodate(ifs, last_blk))
> > + copied -= min(copied, last_blk_bytes);
>
> If pos is aligned to block_size, is there a scenario where
> copied < last_blk_bytes?
I believe there is no other scenario. The min() here is specifically to handle cases where
pos is not aligned to block_size. But please note that the pos here is unrelated to the pos
in iomap_adjust_read_range().
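
For instance, with hypothetical numbers where pos is unaligned (a standalone sketch):

```c
#include <stdio.h>

int main(void)
{
	unsigned block_size = 4096;
	unsigned long long pos = 5120;   /* 1kB past a block boundary */
	unsigned long copied = 512;      /* short copy, smaller than the tail bytes */
	unsigned last_blk_bytes = (pos + copied) & (block_size - 1);   /* 1536 */

	/* copied < last_blk_bytes, so min() is what avoids an underflow */
	copied -= copied < last_blk_bytes ? copied : last_blk_bytes;
	printf("copied after trim: %lu\n", copied);   /* 0 */
	return 0;
}
```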
thanks,
Jinliang Zheng. :)
>
> Trying to understand why you are using a min() here.
> --
> Pankaj
* Re: [PATCH 4/4] iomap: don't abandon the whole copy when we have iomap_folio_state
2025-09-15 11:12 ` Jinliang Zheng
@ 2025-09-15 11:29 ` Pankaj Raghav (Samsung)
0 siblings, 0 replies; 14+ messages in thread
From: Pankaj Raghav (Samsung) @ 2025-09-15 11:29 UTC (permalink / raw)
To: Jinliang Zheng
Cc: alexjlzheng, brauner, djwong, hch, linux-fsdevel, linux-kernel,
linux-xfs, yi.zhang
On Mon, Sep 15, 2025 at 07:12:28PM +0800, Jinliang Zheng wrote:
> On Mon, 15 Sep 2025 12:50:54 +0200, kernel@pankajraghav.com wrote:
> > > +static int iomap_trim_tail_partial(struct inode *inode, loff_t pos,
> > > + size_t copied, struct folio *folio)
> > > +{
> > > + struct iomap_folio_state *ifs = folio->private;
> > > + unsigned block_size, last_blk, last_blk_bytes;
> > > +
> > > + if (!ifs || !copied)
> > > + return 0;
> > > +
> > > + block_size = 1 << inode->i_blkbits;
> > > + last_blk = offset_in_folio(folio, pos + copied - 1) >> inode->i_blkbits;
> > > + last_blk_bytes = (pos + copied) & (block_size - 1);
> > > +
> > > + if (!ifs_block_is_uptodate(ifs, last_blk))
> > > + copied -= min(copied, last_blk_bytes);
> >
> > If pos is aligned to block_size, is there a scenario where
> > copied < last_blk_bytes?
>
> I believe there is no other scenario. The min() here is specifically to handle cases where
> pos is not aligned to block_size. But please note that the pos here is unrelated to the pos
> in iomap_adjust_read_range().
Ah, you are right. This is about write and not read. I got a bit
confused after reading both the patches back to back.
--
Pankaj
* Re: [PATCH v4 0/4] allow partial folio write with iomap_folio_state
2025-09-13 3:37 [PATCH v4 0/4] allow partial folio write with iomap_folio_state alexjlzheng
` (3 preceding siblings ...)
2025-09-13 3:37 ` [PATCH 4/4] iomap: don't abandon the whole copy when we have iomap_folio_state alexjlzheng
@ 2025-09-14 11:40 ` Pankaj Raghav (Samsung)
2025-09-14 13:30 ` Jinliang Zheng
4 siblings, 1 reply; 14+ messages in thread
From: Pankaj Raghav (Samsung) @ 2025-09-14 11:40 UTC (permalink / raw)
To: alexjlzheng
Cc: hch, brauner, djwong, yi.zhang, linux-xfs, linux-fsdevel,
linux-kernel, Jinliang Zheng
On Sat, Sep 13, 2025 at 11:37:14AM +0800, alexjlzheng@gmail.com wrote:
> This patchset has been tested with xfstests' generic and xfs groups, and
> there are no new failures compared to the latest upstream kernel.
Do you know if there is a specific test from generic/ or xfs/ in
xfstests that is testing this path?
As this is slightly changing the behaviour of a partial write, it would
be nice to either add a test or highlight which test is hitting this
path in the cover letter.
--
Pankaj
* Re: [PATCH v4 0/4] allow partial folio write with iomap_folio_state
2025-09-14 11:40 ` [PATCH v4 0/4] allow partial folio write with iomap_folio_state Pankaj Raghav (Samsung)
@ 2025-09-14 13:30 ` Jinliang Zheng
0 siblings, 0 replies; 14+ messages in thread
From: Jinliang Zheng @ 2025-09-14 13:30 UTC (permalink / raw)
To: kernel
Cc: alexjlzheng, alexjlzheng, brauner, djwong, hch, linux-fsdevel,
linux-kernel, linux-xfs, yi.zhang
On Sun, 14 Sep 2025 13:40:30 +0200, kernel@pankajraghav.com wrote:
> On Sat, Sep 13, 2025 at 11:37:14AM +0800, alexjlzheng@gmail.com wrote:
> > This patchset has been tested with xfstests' generic and xfs groups, and
> > there are no new failures compared to the latest upstream kernel.
>
> Do you know if there is a specific test from generic/ or xfs/ in
> xfstests that is testing this path?
It seems not. But there is a chance that existing buffered write tests will hit this path.
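
One way to provoke a short copy in the buffered write path from userspace (a sketch, not an existing xfstest) is to let the source buffer run into a PROT_NONE page so the in-kernel copy faults partway through; error handling is omitted for brevity:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *buf = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	memset(buf, 'a', psz);
	mprotect(buf + psz, psz, PROT_NONE);   /* tail of the source now faults */

	int fd = open("testfile", O_CREAT | O_TRUNC | O_WRONLY, 0644);
	ssize_t ret = write(fd, buf, 2 * psz); /* expect a short write */
	printf("wrote %zd of %ld bytes\n", ret, 2 * psz);
	close(fd);
	return 0;
}
```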
thanks,
Jinliang Zheng. :)
>
> As this is slightly changing the behaviour of a partial write, it would
> be nice to either add a test or highlight which test is hitting this
> path in the cover letter.
>
> --
> Pankaj