Re: [RFC PATCH] mm/filemap.c: fix the timing of asignment of prev_pos

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Guixin Liu <kanie@linux.alibaba.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [RFC PATCH] mm/filemap.c: fix the timing of asignment of prev_pos
Date: Thu, 18 Aug 2022 11:13:24 +0800	[thread overview]
Message-ID: <ba2b9f11-be74-fa32-9ee6-e1794bdc09a0@linux.alibaba.com> (raw)
In-Reply-To: <Yv0IccKJ6Spk/zH4@casper.infradead.org>


在 2022/8/17 23:25, Matthew Wilcox 写道:
> On Wed, Aug 17, 2022 at 09:51:57PM +0800, Guixin Liu wrote:
>> The prev_pos should be assigned before the iocb->ki_pos is incremented,
>> so that the prev_pos is the exact location of the last visit.
>>
>> Fixes: 06c0444290cec ("mm/filemap.c: generic_file_buffered_read() now
>> uses find_get_pages_contig")
>> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
>>
>> ---
>> Hi guys,
>>      When I`m running repetitive 4k read io which has same offset,
>> I find that access to folio_mark_accessed is inevitable in the
>> read process, the reason is that the prev_pos is assigned after the
>> iocb->ki_pos is incremented, so that the prev_pos is always not equal
>> to the position currently visited.
>>      Is this a bug that needs fixing?
> I think you've misunderstood the purpose of 'prev_pos'.  But this has
> been the source of bugs, so let's go through it in detail.
>
> In general, we want to mark a folio as accessed each time we read from
> it.  So if we do this:
>
> 	read(fd, buf, 1024 * 1024);
>
> we want to mark each folio as having been accessed.
>
> But if we're doing lots of short reads, we don't want to mark a folio as
> being accessed multiple times (if you dive into the implementation,
> you'll see the first time, the 'referenced' flag is set and the second
> time, the folio is moved to the active list, so it matters how often
> we call mark_accessed).  IOW:
>
> 	for (i = 0; i < 1024 * 1024; i++)
> 		read(fd, buf, 1);
>
> should do the same amount of accessed/referenced/activation as the single
> read above.
>
> So when we store ki_pos in prev_pos, we don't want to know "Where did
> the previous read start?"  We want to know "Where did the previous read
> end".  That's why when we test it, we check whether prev_pos - 1 is in
> the same folio as the offset we're looking at:
>
>                  if (!pos_same_folio(iocb->ki_pos, ra->prev_pos - 1,
>                                                          fbatch.folios[0]))
>                          folio_mark_accessed(fbatch.folios[0]);
>
> I'm not super-proud of this code, and accept that it's confusing.
> But I don't think the patch below is right.  If you could share
> your actual test and show what's going wrong, I'm interested.
>
> I think what you're saying is that this loop:
>
> 	for (i = 0; i < 1000; i++)
> 		pread(fd, buf, 4096, 1024 * 1024);
>
> results in the folio at offset 1MB being marked as accessed more than
> once.  If so, then I think that's the algorithm behaving as designed.
> Whether that's desirable is a different question; when I touched this
> code last, I was trying to restore the previous behaviour which was
> inadvertently broken.  I'm not taking a position on what the right
> behaviour is for such code.
>
My thanks for your detailed description, I am wrong about this, I test 
not on the newest code, My fault.

The 5ccc944dce3d actually solved this problem.

Best regards,

Guixin Liu

>>   mm/filemap.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 660490c..68fd987 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2703,8 +2703,8 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
>>   			copied = copy_folio_to_iter(folio, offset, bytes, iter);
>>   
>>   			already_read += copied;
>> -			iocb->ki_pos += copied;
>>   			ra->prev_pos = iocb->ki_pos;
>> +			iocb->ki_pos += copied;
>>   
>>   			if (copied < bytes) {
>>   				error = -EFAULT;
>> -- 
>> 1.8.3.1
>>

     prev parent reply	other threads:[~2022-08-18  3:14 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17 13:51 [RFC PATCH] mm/filemap.c: fix the timing of asignment of prev_pos Guixin Liu
2022-08-17 15:16 ` Andrew Morton
2022-08-17 15:30   ` Matthew Wilcox
2022-08-17 15:25 ` Matthew Wilcox
2022-08-18  3:13   ` Guixin Liu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ba2b9f11-be74-fa32-9ee6-e1794bdc09a0@linux.alibaba.com \
    --to=kanie@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).