* [PATCH] iomap: move prefaulting out of hot write path
@ 2025-07-26 9:09 alexjlzheng
2025-07-27 22:10 ` Matthew Wilcox
0 siblings, 1 reply; 6+ messages in thread
From: alexjlzheng @ 2025-07-26 9:09 UTC (permalink / raw)
To: brauner, djwong, dave.hansen
Cc: linux-xfs, linux-fsdevel, linux-kernel, Jinliang Zheng
From: Jinliang Zheng <alexjlzheng@tencent.com>
Similar to commit 665575cff098 ("filemap: move prefaulting out of hot
write path"), there is no need to fault in the source buffer
unconditionally. It is more reasonable to fault it in only when the
atomic copy fails to make progress.
In addition, copy_folio_from_iter_atomic() short-circuits page fault
handling via pagefault_disable(), which prevents deadlocks when the
source and destination buffers reside within the same folio. So it is
safe to move the prefault to after a failed copy.
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/iomap/buffered-io.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index fb4519158f3a..7ca3f3b9d57e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -964,21 +964,6 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
if (bytes > iomap_length(iter))
bytes = iomap_length(iter);
- /*
- * Bring in the user page that we'll copy from _first_.
- * Otherwise there's a nasty deadlock on copying from the
- * same page as we're writing to, without it being marked
- * up-to-date.
- *
- * For async buffered writes the assumption is that the user
- * page has already been faulted in. This can be optimized by
- * faulting the user page.
- */
- if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
- status = -EFAULT;
- break;
- }
-
status = iomap_write_begin(iter, &folio, &offset, &bytes);
if (unlikely(status)) {
iomap_write_failed(iter->inode, iter->pos, bytes);
@@ -992,6 +977,12 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
if (mapping_writably_mapped(mapping))
flush_dcache_folio(folio);
+ /*
+ * copy_folio_from_iter_atomic() short-circuits page fault handle
+ * logics via pagefault_disable(), to prevent deadlock scenarios
+ * when both source and destination buffers reside within the same
+ * folio (mmap, ...).
+ */
copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
written = iomap_write_end(iter, bytes, copied, folio) ?
copied : 0;
@@ -1030,6 +1021,10 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
bytes = copied;
goto retry;
}
+ if (fault_in_iov_iter_readable(i, bytes) == bytes) {
+ status = -EFAULT;
+ break;
+ }
} else {
total_written += written;
iomap_iter_advance(iter, &written);
--
2.49.0
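A rough way to see what the reordering buys is a userspace model of the write loop. This is a sketch under stated assumptions, not kernel code: copy_atomic() and fault_in() are hypothetical stand-ins for copy_folio_from_iter_atomic() and fault_in_iov_iter_readable() (which returns the number of bytes it could not fault in), the source is split into 4096-byte "pages" with a resident flag, and each copy is all-or-nothing per page. The touches counter tracks the extra userspace accesses made by fault-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PG 4096u     /* hypothetical page size for the model */
#define NPAGES 4u    /* source buffer length, in pages */

/* Hypothetical model of a user source buffer. */
struct src {
	const char *buf;
	bool resident[NPAGES];  /* is each page present without faulting? */
	unsigned int touches;   /* extra userspace accesses made by fault_in() */
};

/* Stand-in for copy_folio_from_iter_atomic(): it never sleeps to resolve a
 * fault, so a non-resident source page yields a zero-length copy. */
static size_t copy_atomic(const struct src *s, char *dst, size_t pos, size_t len)
{
	assert(pos / PG < NPAGES);
	if (!s->resident[pos / PG])
		return 0;
	memcpy(dst + pos, s->buf + pos, len);
	return len;
}

/* Stand-in for fault_in_iov_iter_readable(): returns bytes NOT faulted in. */
static size_t fault_in(struct src *s, size_t pos, size_t len)
{
	(void)len;
	s->touches++;
	s->resident[pos / PG] = true;
	return 0;
}

/* Length of the next chunk: never cross a page boundary. */
static size_t chunk_len(size_t pos, size_t total)
{
	size_t len = PG - pos % PG;
	return len < total - pos ? len : total - pos;
}

/* Old order: prefault every chunk before the atomic copy. */
static size_t write_prefault_first(struct src *s, char *dst, size_t total)
{
	size_t pos = 0;

	while (pos < total) {
		size_t len = chunk_len(pos, total);

		if (fault_in(s, pos, len) == len)
			break;                  /* -EFAULT in the kernel */
		pos += copy_atomic(s, dst, pos, len);
	}
	return pos;
}

/* New order: copy first, fault in only when the copy made no progress. */
static size_t write_fault_on_failure(struct src *s, char *dst, size_t total)
{
	size_t pos = 0;

	while (pos < total) {
		size_t len = chunk_len(pos, total);
		size_t copied = copy_atomic(s, dst, pos, len);

		if (copied == 0) {
			if (fault_in(s, pos, len) == len)
				break;          /* -EFAULT in the kernel */
			continue;               /* retry the same chunk */
		}
		pos += copied;
	}
	return pos;
}
```

With an already-resident source, the prefault-first loop touches userspace once per chunk while the copy-first loop never calls fault_in(); with a cold source, both end up faulting in each page once.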
* Re: [PATCH] iomap: move prefaulting out of hot write path
2025-07-26 9:09 alexjlzheng
@ 2025-07-27 22:10 ` Matthew Wilcox
0 siblings, 0 replies; 6+ messages in thread
From: Matthew Wilcox @ 2025-07-27 22:10 UTC (permalink / raw)
To: alexjlzheng
Cc: brauner, djwong, dave.hansen, linux-xfs, linux-fsdevel,
linux-kernel, Jinliang Zheng
On Sat, Jul 26, 2025 at 05:09:56PM +0800, alexjlzheng@gmail.com wrote:
> @@ -992,6 +977,12 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> if (mapping_writably_mapped(mapping))
> flush_dcache_folio(folio);
>
> + /*
> + * copy_folio_from_iter_atomic() short-circuits page fault handle
> + * logics via pagefault_disable(), to prevent deadlock scenarios
> + * when both source and destination buffers reside within the same
> + * folio (mmap, ...).
> + */
Why did you change this comment from the one in 665575cff098?
The comment in that commit is correct. This comment is so badly
mangled, it isn't even wrong.
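The deadlock that the atomic copy avoids can be illustrated with a loose userspace analogy. Everything here is hypothetical: a pthread mutex named folio_lock stands in for the folio lock held across the copy, handle_fault_atomic() stands in for a fault on an mmap()ed source backed by the same folio, and pthread_mutex_trylock() plays the role of pagefault_disable() by failing instead of waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the folio lock (and other filesystem locks)
 * held across the copy. Default mutexes are not recursive. */
static pthread_mutex_t folio_lock = PTHREAD_MUTEX_INITIALIZER;
static bool src_resident;   /* has the mmap()ed source page been read in? */

/* Stand-in for a fault on an mmap()ed source page that is backed by the
 * folio being written: resolving it needs folio_lock. Relocking a
 * non-recursive mutex from the thread that holds it deadlocks (formally
 * undefined for the default type), so this handler gives up instead of
 * waiting, much as a fault taken under pagefault_disable() fails at once. */
static bool handle_fault_atomic(void)
{
	if (pthread_mutex_trylock(&folio_lock) != 0)
		return false;              /* fault not handled: short copy */
	src_resident = true;
	pthread_mutex_unlock(&folio_lock);
	return true;
}

/* One chunk of the write: returns true once the source is readable. */
static bool write_one_chunk(void)
{
	bool ok;

	pthread_mutex_lock(&folio_lock);     /* roughly iomap_write_begin() */
	ok = src_resident || handle_fault_atomic();
	pthread_mutex_unlock(&folio_lock);   /* roughly iomap_write_end() */

	/* Lock dropped: the fault can now take the lock and be resolved,
	 * after which the chunk would be retried. */
	if (!ok)
		ok = handle_fault_atomic();
	return ok;
}
```

The point of the analogy is the ordering: the fault that fails while the lock is held succeeds once the lock has been dropped, which is exactly where the patch moves fault_in_iov_iter_readable().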
* [PATCH] iomap: move prefaulting out of hot write path
@ 2025-10-09 9:08 alexjlzheng
2025-10-09 15:01 ` Darrick J. Wong
0 siblings, 1 reply; 6+ messages in thread
From: alexjlzheng @ 2025-10-09 9:08 UTC (permalink / raw)
To: djwong, brauner
Cc: linux-xfs, linux-fsdevel, linux-kernel, dave.hansen,
Jinliang Zheng
From: Jinliang Zheng <alexjlzheng@tencent.com>
Prefaulting the write source buffer incurs an extra userspace access
in the common fast path. Make iomap_write_iter() consistent with
generic_perform_write(): only touch userspace an extra time when
copy_folio_from_iter_atomic() has failed to make progress.
This patch is inspired by commit 665575cff098 ("filemap: move
prefaulting out of hot write path").
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/iomap/buffered-io.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8b847a1e27f1..6e6573fce78a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -972,21 +972,6 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
if (bytes > iomap_length(iter))
bytes = iomap_length(iter);
- /*
- * Bring in the user page that we'll copy from _first_.
- * Otherwise there's a nasty deadlock on copying from the
- * same page as we're writing to, without it being marked
- * up-to-date.
- *
- * For async buffered writes the assumption is that the user
- * page has already been faulted in. This can be optimized by
- * faulting the user page.
- */
- if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
- status = -EFAULT;
- break;
- }
-
status = iomap_write_begin(iter, write_ops, &folio, &offset,
&bytes);
if (unlikely(status)) {
@@ -1001,6 +986,12 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
if (mapping_writably_mapped(mapping))
flush_dcache_folio(folio);
+ /*
+ * Faults here on mmap()s can recurse into arbitrary
+ * filesystem code. Lots of locks are held that can
+ * deadlock. Use an atomic copy to avoid deadlocking
+ * in page fault handling.
+ */
copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
written = iomap_write_end(iter, bytes, copied, folio) ?
copied : 0;
@@ -1039,6 +1030,16 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
bytes = copied;
goto retry;
}
+
+ /*
+ * 'folio' is now unlocked and faults on it can be
+ * handled. Ensure forward progress by trying to
+ * fault it in now.
+ */
+ if (fault_in_iov_iter_readable(i, bytes) == bytes) {
+ status = -EFAULT;
+ break;
+ }
} else {
total_written += written;
iomap_iter_advance(iter, &written);
--
2.49.0
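The forward-progress rule in this version (fault in only after the copy made no progress, and give up with -EFAULT when nothing at all can be faulted in) can be sketched with a similar hypothetical model. Assumptions: the per-page states RESIDENT, SWAPPED_OUT and UNMAPPED are invented for illustration, short copies and the iomap_write_end() retry path are not modeled, and write_loop() reports partial progress through *written rather than through the real function's return convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PG 4096u
#define NPAGES 3u

/* Hypothetical per-page state of the user source buffer. */
enum pstate { RESIDENT, SWAPPED_OUT, UNMAPPED };

struct src {
	enum pstate page[NPAGES];
	unsigned int fault_calls;
};

/* Atomic-copy stand-in: only a resident page can be copied. */
static size_t copy_atomic(const struct src *s, size_t pos, size_t len)
{
	assert(pos / PG < NPAGES);
	return s->page[pos / PG] == RESIDENT ? len : 0;
}

/* fault_in_iov_iter_readable() stand-in: returns bytes NOT faulted in.
 * An unmapped page can never be brought in. */
static size_t fault_in(struct src *s, size_t pos, size_t len)
{
	s->fault_calls++;
	if (s->page[pos / PG] == UNMAPPED)
		return len;
	s->page[pos / PG] = RESIDENT;
	return 0;
}

/* Returns 0 or -EFAULT; *written always reports the bytes copied so far. */
static int write_loop(struct src *s, size_t total, size_t *written)
{
	size_t pos = 0;
	int status = 0;

	while (pos < total) {
		size_t len = PG - pos % PG;
		size_t copied;

		if (len > total - pos)
			len = total - pos;
		copied = copy_atomic(s, pos, len);
		if (copied == 0) {
			/* In the kernel the folio is unlocked at this point, so
			 * a sleeping fault-in cannot deadlock on it. */
			if (fault_in(s, pos, len) == len) {
				status = -EFAULT;   /* no progress possible */
				break;
			}
			continue;                   /* retry the same chunk */
		}
		pos += copied;
	}
	*written = pos;
	return status;
}
```

With pages {RESIDENT, SWAPPED_OUT, UNMAPPED}, the loop writes the first two pages, calls fault_in() twice, and stops with -EFAULT instead of spinning on the unmapped page.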
* Re: [PATCH] iomap: move prefaulting out of hot write path
2025-10-09 9:08 [PATCH] iomap: move prefaulting out of hot write path alexjlzheng
@ 2025-10-09 15:01 ` Darrick J. Wong
2025-10-09 15:15 ` Dave Hansen
0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2025-10-09 15:01 UTC (permalink / raw)
To: alexjlzheng, dave.hansen
Cc: brauner, linux-xfs, linux-fsdevel, linux-kernel, dave.hansen,
Jinliang Zheng
On Thu, Oct 09, 2025 at 05:08:51PM +0800, alexjlzheng@gmail.com wrote:
> From: Jinliang Zheng <alexjlzheng@tencent.com>
>
> Prefaulting the write source buffer incurs an extra userspace access
> in the common fast path. Make iomap_write_iter() consistent with
> generic_perform_write(): only touch userspace an extra time when
> copy_folio_from_iter_atomic() has failed to make progress.
>
> This patch is inspired by commit 665575cff098 ("filemap: move
> prefaulting out of hot write path").
Seems fine to me, but I wonder if dhansen has any thoughts about this
patch ... which exactly mirrors one he sent eight months ago?
--D
> Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
> ---
> fs/iomap/buffered-io.c | 31 ++++++++++++++++---------------
> 1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 8b847a1e27f1..6e6573fce78a 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -972,21 +972,6 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
> if (bytes > iomap_length(iter))
> bytes = iomap_length(iter);
>
> - /*
> - * Bring in the user page that we'll copy from _first_.
> - * Otherwise there's a nasty deadlock on copying from the
> - * same page as we're writing to, without it being marked
> - * up-to-date.
> - *
> - * For async buffered writes the assumption is that the user
> - * page has already been faulted in. This can be optimized by
> - * faulting the user page.
> - */
> - if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
> - status = -EFAULT;
> - break;
> - }
> -
> status = iomap_write_begin(iter, write_ops, &folio, &offset,
> &bytes);
> if (unlikely(status)) {
> @@ -1001,6 +986,12 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
> if (mapping_writably_mapped(mapping))
> flush_dcache_folio(folio);
>
> + /*
> + * Faults here on mmap()s can recurse into arbitrary
> + * filesystem code. Lots of locks are held that can
> + * deadlock. Use an atomic copy to avoid deadlocking
> + * in page fault handling.
> + */
> copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
> written = iomap_write_end(iter, bytes, copied, folio) ?
> copied : 0;
> @@ -1039,6 +1030,16 @@ static int iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i,
> bytes = copied;
> goto retry;
> }
> +
> + /*
> + * 'folio' is now unlocked and faults on it can be
> + * handled. Ensure forward progress by trying to
> + * fault it in now.
> + */
> + if (fault_in_iov_iter_readable(i, bytes) == bytes) {
> + status = -EFAULT;
> + break;
> + }
> } else {
> total_written += written;
> iomap_iter_advance(iter, &written);
> --
> 2.49.0
>
>
* Re: [PATCH] iomap: move prefaulting out of hot write path
2025-10-09 15:01 ` Darrick J. Wong
@ 2025-10-09 15:15 ` Dave Hansen
2025-10-10 2:04 ` Jinliang Zheng
0 siblings, 1 reply; 6+ messages in thread
From: Dave Hansen @ 2025-10-09 15:15 UTC (permalink / raw)
To: Darrick J. Wong, alexjlzheng, dave.hansen
Cc: brauner, linux-xfs, linux-fsdevel, linux-kernel, Jinliang Zheng
On 10/9/25 08:01, Darrick J. Wong wrote:
> On Thu, Oct 09, 2025 at 05:08:51PM +0800, alexjlzheng@gmail.com wrote:
>> From: Jinliang Zheng <alexjlzheng@tencent.com>
>>
>> Prefaulting the write source buffer incurs an extra userspace access
>> in the common fast path. Make iomap_write_iter() consistent with
>> generic_perform_write(): only touch userspace an extra time when
>> copy_folio_from_iter_atomic() has failed to make progress.
>>
>> This patch is inspired by commit 665575cff098 ("filemap: move
>> prefaulting out of hot write path").
> Seems fine to me, but I wonder if dhansen has any thoughts about this
> patch ... which exactly mirrors one he sent eight months ago?
I don't _really_ care all that much. But, yeah, I would have expected
a little shout-out or something when someone copies the changelog and
code verbatim from another patch:
https://lore.kernel.org/lkml/20250129181753.3927F212@davehans-spike.ostc.intel.com/
and then copies a comment from a second patch I did.
But I guess I was cc'd at least. Also, if my name isn't on this one,
then I don't have to fix any of the bugs it causes. Right? ;)
Just one warning: be on the lookout for bugs in the area. The
prefaulting definitely does a good job of hiding bugs in other bits
of the code. The generic_perform_write() gunk seems to have uncovered
a bug or two.
Also, didn't Christoph ask you to make the comments wider the last
time Alex posted this? I don't think that got changed.
https://lore.kernel.org/lkml/aIt8BYa6Ti6SRh8C@infradead.org/
Overall, the change still seems as valid to me as it did when I wrote the
patch in the first place. Although it feels funny to ack my own
patch.
* Re: [PATCH] iomap: move prefaulting out of hot write path
2025-10-09 15:15 ` Dave Hansen
@ 2025-10-10 2:04 ` Jinliang Zheng
0 siblings, 0 replies; 6+ messages in thread
From: Jinliang Zheng @ 2025-10-10 2:04 UTC (permalink / raw)
To: dave.hansen
Cc: alexjlzheng, alexjlzheng, brauner, dave.hansen, djwong,
linux-fsdevel, linux-kernel, linux-xfs
> On 10/9/25 08:01, Darrick J. Wong wrote:
> > On Thu, Oct 09, 2025 at 05:08:51PM +0800, alexjlzheng@gmail.com wrote:
> >> From: Jinliang Zheng <alexjlzheng@tencent.com>
> >>
> >> Prefaulting the write source buffer incurs an extra userspace access
> >> in the common fast path. Make iomap_write_iter() consistent with
> >> generic_perform_write(): only touch userspace an extra time when
> >> copy_folio_from_iter_atomic() has failed to make progress.
> >>
> >> This patch is inspired by commit 665575cff098 ("filemap: move
> >> prefaulting out of hot write path").
> > Seems fine to me, but I wonder if dhansen has any thoughts about this
> > patch ... which exactly mirrors one he sent eight months ago?
>
> I don't _really_ care all that much. But, yeah, I would have expected
> a little shout-out or something when someone copies the changelog and
> code verbatim from another patch:
>
> https://lore.kernel.org/lkml/20250129181753.3927F212@davehans-spike.ostc.intel.com/
>
> and then copies a comment from a second patch I did.
Sorry for forgetting to CC you in my previous email.
When I sent V1[1], I hadn't come across this email (which was an oversight on my part):
- https://lore.kernel.org/lkml/20250129181753.3927F212@davehans-spike.ostc.intel.com/
At that time, I was quite puzzled about why generic_perform_write() had moved prefaulting
out of the hot write path, while iomap_write_iter() had not done the same.
It wasn't until I was preparing V2[2] that I found the email above. However, the surrounding code
had already changed by then, so I rebased the code in this email onto the
upstream version. My apologies for forgetting to CC you earlier.
[1] https://lore.kernel.org/linux-xfs/20250726090955.647131-2-alexjlzheng@tencent.com/
[2] https://lore.kernel.org/linux-xfs/20250730164408.4187624-2-alexjlzheng@tencent.com/
I hope you know I didn't mean any offense. Sorry about that.
>
> But I guess I was cc'd at least. Also, if my name isn't on this one,
> then I don't have to fix any of the bugs it causes. Right? ;)
>
> Just one warning: be on the lookout for bugs in the area. The
> prefaulting definitely does a good job of hiding bugs in other bits
> of the code. The generic_perform_write() gunk seems to have uncovered
> a bug or two.
Indeed, I sent this patch precisely because I was unsure why the change
for iomap_write_iter() hadn't been merged like the one for generic_perform_write(); I
wondered if there might be some underlying issue. I hoped to gather everyone's thoughts
through this patch. :)
>
> Also, didn't Christoph ask you to make the comments wider the last
> time Alex posted this? I don't think that got changed.
>
> https://lore.kernel.org/lkml/aIt8BYa6Ti6SRh8C@infradead.org/
>
> Overall, the change still seems as valid to me as it did when I wrote the
> patch in the first place. Although it feels funny to ack my own
> patch.
If moving prefaulting out of the hot write path in iomap_write_iter() is indeed
acceptable, would you mind rebasing your patch onto the latest upstream version and
submitting it? After all, you are the original author of the change. :)
Thank you very much,
Jinliang. :)
Thread overview: 6+ messages
2025-10-09 9:08 [PATCH] iomap: move prefaulting out of hot write path alexjlzheng
2025-10-09 15:01 ` Darrick J. Wong
2025-10-09 15:15 ` Dave Hansen
2025-10-10 2:04 ` Jinliang Zheng
-- strict thread matches above, loose matches on Subject: below --
2025-07-26 9:09 alexjlzheng
2025-07-27 22:10 ` Matthew Wilcox