* [RFC PATCH] mm/filemap.c: fix the timing of asignment of prev_pos
@ 2022-08-17 13:51 Guixin Liu
2022-08-17 15:16 ` Andrew Morton
2022-08-17 15:25 ` Matthew Wilcox
0 siblings, 2 replies; 5+ messages in thread
From: Guixin Liu @ 2022-08-17 13:51 UTC (permalink / raw)
To: willy, akpm; +Cc: linux-fsdevel, linux-mm
The prev_pos should be assigned before the iocb->ki_pos is incremented,
so that the prev_pos is the exact location of the last visit.
Fixes: 06c0444290cec ("mm/filemap.c: generic_file_buffered_read() now
uses find_get_pages_contig")
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
---
Hi guys,
When I run repeated 4k reads at the same offset, I find that
folio_mark_accessed() is called on every read. The reason is that
prev_pos is assigned after iocb->ki_pos is incremented, so prev_pos
never equals the position currently being read.
Is this a bug that needs fixing?
mm/filemap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 660490c..68fd987 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2703,8 +2703,8 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 			copied = copy_folio_to_iter(folio, offset, bytes, iter);
 
 			already_read += copied;
-			iocb->ki_pos += copied;
 			ra->prev_pos = iocb->ki_pos;
+			iocb->ki_pos += copied;
 
 			if (copied < bytes) {
 				error = -EFAULT;
--
1.8.3.1
* Re: [RFC PATCH] mm/filemap.c: fix the timing of asignment of prev_pos
2022-08-17 13:51 [RFC PATCH] mm/filemap.c: fix the timing of asignment of prev_pos Guixin Liu
@ 2022-08-17 15:16 ` Andrew Morton
2022-08-17 15:30 ` Matthew Wilcox
2022-08-17 15:25 ` Matthew Wilcox
1 sibling, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2022-08-17 15:16 UTC (permalink / raw)
To: Guixin Liu; +Cc: willy, linux-fsdevel, linux-mm, Kent Overstreet
On Wed, 17 Aug 2022 21:51:57 +0800 Guixin Liu <kanie@linux.alibaba.com> wrote:
> The prev_pos should be assigned before the iocb->ki_pos is incremented,
> so that the prev_pos is the exact location of the last visit.
>
> Fixes: 06c0444290cec ("mm/filemap.c: generic_file_buffered_read() now
> uses find_get_pages_contig")
> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
>
> ---
> Hi guys,
> When I`m running repetitive 4k read io which has same offset,
> I find that access to folio_mark_accessed is inevitable in the
> read process, the reason is that the prev_pos is assigned after the
> iocb->ki_pos is incremented, so that the prev_pos is always not equal
> to the position currently visited.
> Is this a bug that needs fixing?
It looks wrong to me and it does appear that 06c0444290cecf0 did this
unintentionally.
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2703,8 +2703,8 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
> copied = copy_folio_to_iter(folio, offset, bytes, iter);
>
> already_read += copied;
> - iocb->ki_pos += copied;
> ra->prev_pos = iocb->ki_pos;
> + iocb->ki_pos += copied;
>
> if (copied < bytes) {
> error = -EFAULT;
So we significantly messed up pagecache page aging and nobody noticed
for nearly two years. What does this tell us :(
I'd be interested if anyone can demonstrate runtime effects from this
change. If yes then I'll add cc:stable. If no then I'll ask why we
even bothered.
* Re: [RFC PATCH] mm/filemap.c: fix the timing of asignment of prev_pos
2022-08-17 13:51 [RFC PATCH] mm/filemap.c: fix the timing of asignment of prev_pos Guixin Liu
2022-08-17 15:16 ` Andrew Morton
@ 2022-08-17 15:25 ` Matthew Wilcox
2022-08-18 3:13 ` Guixin Liu
1 sibling, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2022-08-17 15:25 UTC (permalink / raw)
To: Guixin Liu; +Cc: akpm, linux-fsdevel, linux-mm
On Wed, Aug 17, 2022 at 09:51:57PM +0800, Guixin Liu wrote:
> The prev_pos should be assigned before the iocb->ki_pos is incremented,
> so that the prev_pos is the exact location of the last visit.
>
> Fixes: 06c0444290cec ("mm/filemap.c: generic_file_buffered_read() now
> uses find_get_pages_contig")
> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
>
> ---
> Hi guys,
> When I`m running repetitive 4k read io which has same offset,
> I find that access to folio_mark_accessed is inevitable in the
> read process, the reason is that the prev_pos is assigned after the
> iocb->ki_pos is incremented, so that the prev_pos is always not equal
> to the position currently visited.
> Is this a bug that needs fixing?
I think you've misunderstood the purpose of 'prev_pos'. But this has
been the source of bugs, so let's go through it in detail.
In general, we want to mark a folio as accessed each time we read from
it. So if we do this:
read(fd, buf, 1024 * 1024);
we want to mark each folio as having been accessed.
But if we're doing lots of short reads, we don't want to mark a folio as
being accessed multiple times (if you dive into the implementation,
you'll see the first time, the 'referenced' flag is set and the second
time, the folio is moved to the active list, so it matters how often
we call mark_accessed). IOW:
	for (i = 0; i < 1024 * 1024; i++)
		read(fd, buf, 1);
should do the same amount of accessed/referenced/activation as the single
read above.
So when we store ki_pos in prev_pos, we don't want to know "Where did
the previous read start?" We want to know "Where did the previous read
end". That's why when we test it, we check whether prev_pos - 1 is in
the same folio as the offset we're looking at:
	if (!pos_same_folio(iocb->ki_pos, ra->prev_pos - 1,
			    fbatch.folios[0]))
		folio_mark_accessed(fbatch.folios[0]);
I'm not super-proud of this code, and accept that it's confusing.
But I don't think the patch below is right. If you could share
your actual test and show what's going wrong, I'm interested.
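The "where did the previous read end" rule can be modelled in user space. The following is only a sketch under stated assumptions, not the kernel implementation: every folio is assumed to be a single 4 KiB page, prev_pos is assumed to start at -1, and the names folio_index(), struct model, model_read() and marks_for_chunked_read() are hypothetical helpers invented for this illustration. It counts how often folio_mark_accessed() would be called for a sequential read split into chunks.

```c
#include <assert.h>
#include <stdint.h>

#define FOLIO_SHIFT 12	/* assumption: every folio is a single 4 KiB page */

/* Folio index of a byte position; -1 stands for "before the file". */
static int64_t folio_index(int64_t pos)
{
	return pos < 0 ? -1 : pos >> FOLIO_SHIFT;
}

struct model {
	int64_t ki_pos;		/* current file position */
	int64_t prev_pos;	/* end of the previous read, -1 initially */
	long marks;		/* simulated folio_mark_accessed() calls */
};

/* Simulate one read of len bytes starting at m->ki_pos. */
static void model_read(struct model *m, int64_t len)
{
	assert(len > 0);

	int64_t end = m->ki_pos + len;
	int64_t first = folio_index(m->ki_pos);
	int64_t last = folio_index(end - 1);

	/* First folio: skip the mark if the previous read ended inside it. */
	if (first != folio_index(m->prev_pos - 1))
		m->marks++;
	/* Every further folio touched by this read is marked once. */
	m->marks += last - first;

	m->ki_pos = end;
	m->prev_pos = end;	/* remember where this read ended */
}

/*
 * Read total bytes from offset 0 in chunks of chunk bytes (total is
 * expected to be a multiple of chunk) and return the number of marks.
 */
static long marks_for_chunked_read(int64_t total, int64_t chunk)
{
	struct model m = { .ki_pos = 0, .prev_pos = -1, .marks = 0 };

	for (int64_t done = 0; done < total; done += chunk)
		model_read(&m, chunk);
	return m.marks;
}
```

In this model, marks_for_chunked_read(1 << 20, 1 << 20) and marks_for_chunked_read(1 << 20, 1) return the same count, one mark per 4 KiB folio, which is the property described above for one large read versus many one-byte reads.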
I think what you're saying is that this loop:
	for (i = 0; i < 1000; i++)
		pread(fd, buf, 4096, 1024 * 1024);
results in the folio at offset 1MB being marked as accessed more than
once. If so, then I think that's the algorithm behaving as designed.
Whether that's desirable is a different question; when I touched this
code last, I was trying to restore the previous behaviour which was
inadvertently broken. I'm not taking a position on what the right
behaviour is for such code.
> mm/filemap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 660490c..68fd987 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2703,8 +2703,8 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
> copied = copy_folio_to_iter(folio, offset, bytes, iter);
>
> already_read += copied;
> - iocb->ki_pos += copied;
> ra->prev_pos = iocb->ki_pos;
> + iocb->ki_pos += copied;
>
> if (copied < bytes) {
> error = -EFAULT;
> --
> 1.8.3.1
>
* Re: [RFC PATCH] mm/filemap.c: fix the timing of asignment of prev_pos
2022-08-17 15:16 ` Andrew Morton
@ 2022-08-17 15:30 ` Matthew Wilcox
0 siblings, 0 replies; 5+ messages in thread
From: Matthew Wilcox @ 2022-08-17 15:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: Guixin Liu, linux-fsdevel, linux-mm, Kent Overstreet
On Wed, Aug 17, 2022 at 08:16:57AM -0700, Andrew Morton wrote:
> On Wed, 17 Aug 2022 21:51:57 +0800 Guixin Liu <kanie@linux.alibaba.com> wrote:
>
> > The prev_pos should be assigned before the iocb->ki_pos is incremented,
> > so that the prev_pos is the exact location of the last visit.
> >
> > Fixes: 06c0444290cec ("mm/filemap.c: generic_file_buffered_read() now
> > uses find_get_pages_contig")
> > Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
> >
> > ---
> > Hi guys,
> > When I`m running repetitive 4k read io which has same offset,
> > I find that access to folio_mark_accessed is inevitable in the
> > read process, the reason is that the prev_pos is assigned after the
> > iocb->ki_pos is incremented, so that the prev_pos is always not equal
> > to the position currently visited.
> > Is this a bug that needs fixing?
>
> It looks wrong to me and it does appear that 06c0444290cecf0 did this
> unintentionally.
That commit was the start of a problem, but I think I restored the
original behaviour in 5ccc944dce3d. You were part of that discussion
back in June:
https://lore.kernel.org/linux-mm/20220602082129.2805890-1-yukuai3@huawei.com/
* Re: [RFC PATCH] mm/filemap.c: fix the timing of asignment of prev_pos
2022-08-17 15:25 ` Matthew Wilcox
@ 2022-08-18 3:13 ` Guixin Liu
0 siblings, 0 replies; 5+ messages in thread
From: Guixin Liu @ 2022-08-18 3:13 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: akpm, linux-fsdevel, linux-mm
On 2022/8/17 23:25, Matthew Wilcox wrote:
> On Wed, Aug 17, 2022 at 09:51:57PM +0800, Guixin Liu wrote:
>> The prev_pos should be assigned before the iocb->ki_pos is incremented,
>> so that the prev_pos is the exact location of the last visit.
>>
>> Fixes: 06c0444290cec ("mm/filemap.c: generic_file_buffered_read() now
>> uses find_get_pages_contig")
>> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
>>
>> ---
>> Hi guys,
>> When I`m running repetitive 4k read io which has same offset,
>> I find that access to folio_mark_accessed is inevitable in the
>> read process, the reason is that the prev_pos is assigned after the
>> iocb->ki_pos is incremented, so that the prev_pos is always not equal
>> to the position currently visited.
>> Is this a bug that needs fixing?
> I think you've misunderstood the purpose of 'prev_pos'. But this has
> been the source of bugs, so let's go through it in detail.
>
> In general, we want to mark a folio as accessed each time we read from
> it. So if we do this:
>
> read(fd, buf, 1024 * 1024);
>
> we want to mark each folio as having been accessed.
>
> But if we're doing lots of short reads, we don't want to mark a folio as
> being accessed multiple times (if you dive into the implementation,
> you'll see the first time, the 'referenced' flag is set and the second
> time, the folio is moved to the active list, so it matters how often
> we call mark_accessed). IOW:
>
> for (i = 0; i < 1024 * 1024; i++)
> read(fd, buf, 1);
>
> should do the same amount of accessed/referenced/activation as the single
> read above.
>
> So when we store ki_pos in prev_pos, we don't want to know "Where did
> the previous read start?" We want to know "Where did the previous read
> end". That's why when we test it, we check whether prev_pos - 1 is in
> the same folio as the offset we're looking at:
>
> if (!pos_same_folio(iocb->ki_pos, ra->prev_pos - 1,
> fbatch.folios[0]))
> folio_mark_accessed(fbatch.folios[0]);
>
> I'm not super-proud of this code, and accept that it's confusing.
> But I don't think the patch below is right. If you could share
> your actual test and show what's going wrong, I'm interested.
>
> I think what you're saying is that this loop:
>
> for (i = 0; i < 1000; i++)
> pread(fd, buf, 4096, 1024 * 1024);
>
> results in the folio at offset 1MB being marked as accessed more than
> once. If so, then I think that's the algorithm behaving as designed.
> Whether that's desirable is a different question; when I touched this
> code last, I was trying to restore the previous behaviour which was
> inadvertently broken. I'm not taking a position on what the right
> behaviour is for such code.
>
Thanks for the detailed explanation. I was wrong about this: I did not
test on the newest code, my fault.
Commit 5ccc944dce3d actually solved this problem.
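For illustration, the difference can be modelled in user space. This is a sketch under assumptions, not kernel code: folios are assumed to be single 4 KiB pages, prev_pos is assumed to start at -1, the older code is assumed to have compared the read position against prev_pos itself rather than prev_pos - 1, and folio_index() and marks_for_repeated_pread() are hypothetical names invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FOLIO_SHIFT 12	/* assumption: every folio is a single 4 KiB page */

/* Folio index of a byte position; -1 stands for "before the file". */
static int64_t folio_index(int64_t pos)
{
	return pos < 0 ? -1 : pos >> FOLIO_SHIFT;
}

/*
 * Count simulated folio_mark_accessed() calls for `reads` identical
 * pread(fd, buf, len, offset) calls that each stay within one folio.
 * minus_one selects the check: true compares the read position against
 * prev_pos - 1, false compares it against prev_pos itself (the assumed
 * older behaviour).
 */
static long marks_for_repeated_pread(int reads, int64_t offset, int64_t len,
				     bool minus_one)
{
	int64_t prev_pos = -1;
	long marks = 0;

	assert(len > 0 && folio_index(offset) == folio_index(offset + len - 1));

	for (int i = 0; i < reads; i++) {
		int64_t ref = minus_one ? prev_pos - 1 : prev_pos;

		/* Mark unless the reference position is in the same folio. */
		if (folio_index(offset) != folio_index(ref))
			marks++;
		prev_pos = offset + len;	/* end of this read */
	}
	return marks;
}
```

In this model, 1000 reads of 4096 bytes at offset 1MB are marked on every read when the comparison uses prev_pos directly, because the end of the previous read falls in the next folio, and only on the first read when it uses prev_pos - 1.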
Best regards,
Guixin Liu
>> mm/filemap.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 660490c..68fd987 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2703,8 +2703,8 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>> copied = copy_folio_to_iter(folio, offset, bytes, iter);
>>
>> already_read += copied;
>> - iocb->ki_pos += copied;
>> ra->prev_pos = iocb->ki_pos;
>> + iocb->ki_pos += copied;
>>
>> if (copied < bytes) {
>> error = -EFAULT;
>> --
>> 1.8.3.1
>>