Re: [f2fs-dev] [PATCH] f2fs: get out of a repeat loop when getting a locked data page

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH] f2fs: get out of a repeat loop when getting a locked data page
Date: Wed, 5 Apr 2023 09:39:37 -0700	[thread overview]
Message-ID: <ZC2kSfNUXKK4PfpM@google.com> (raw)
In-Reply-To: <ZCHCykI/BLcfDzt7@casper.infradead.org>

On 03/27, Matthew Wilcox wrote:
> On Mon, Mar 27, 2023 at 08:30:33AM -0700, Jaegeuk Kim wrote:
> > On 03/26, Chao Yu wrote:
> > > On 2023/3/24 5:39, Jaegeuk Kim wrote:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=216050
> > > > 
> > > > Somehow we're getting a page which has a different mapping.
> > > > Let's avoid the infinite loop.
> > > > 
> > > > Cc: <stable@vger.kernel.org>
> > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > ---
> > > >   fs/f2fs/data.c | 8 ++------
> > > >   1 file changed, 2 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > > index bf51e6e4eb64..80702c93e885 100644
> > > > --- a/fs/f2fs/data.c
> > > > +++ b/fs/f2fs/data.c
> > > > @@ -1329,18 +1329,14 @@ struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index,
> > > >   {
> > > >   	struct address_space *mapping = inode->i_mapping;
> > > >   	struct page *page;
> > > > -repeat:
> > > > +
> > > >   	page = f2fs_get_read_data_page(inode, index, 0, for_write, NULL);
> > > >   	if (IS_ERR(page))
> > > >   		return page;
> > > >   	/* wait for read completion */
> > > >   	lock_page(page);
> > > > -	if (unlikely(page->mapping != mapping)) {
> > > 
> > > How about using such logic only for move_data_page() to limit affect for
> > > other paths?
> > 
> > Why move_data_page() only? If this happens, we'll fall into a loop in anywhere?
> > 
> > > 
> > > Jaegeuk, any thoughts about why mapping is mismatch in between page's one and
> > > inode->i_mapping?
> > 
> > > 
> > > After several times code review, I didn't get any clue about why f2fs always
> > > get the different mapping in a loop.
> > 
> > I couldn't find the path to happen this. So weird. Please check the history in the
> > bug.
> > 
> > > 
> > > Maybe we can loop MM guys to check whether below folio_file_page() may return
> > > page which has different mapping?
> > 
> > Matthew may have some idea on this?
> 
> There's a lot of comments in the bug ... hard to come into this one
> cold.
> 
> I did notice this one (#119):
> : Interestingly, ref count is 514, which looks suspiciously as a binary
> : flag 1000000010. Is it possible that during 5.17/5.18 implementation
> : of a "pin", somehow binary flag was written to ref count, or something
> : like '1 << ...' happens?
> 
> That indicates to me that somehow you've got hold of a THP that is in
> the page cache.  Probably shmem/tmpfs.  That indicate to me a refcount
> problem that looks something like this:
> 
> f2fs allocates a page
> f2fs adds the page to the page cache
> f2fs puts the reference to the page without removing it from the
> page cache (how?)

Is it somewhat related to setting a bit in private field?

When we migrate the blocks, we do:
1) get_lock_page()
2) submit read
3) lock_page()
3) set_page_dirty()
4) set_page_private_gcing(page)

--- in fs/f2fs/f2fs.h
1409 #define PAGE_PRIVATE_SET_FUNC(name, flagname) \
1410 static inline void set_page_private_##name(struct page *page) \
1411 { \
1412         if (!PagePrivate(page)) { \
1413                 get_page(page); \
1414                 SetPagePrivate(page); \
1415                 set_page_private(page, 0); \
1416         } \
1417         set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)); \
1418         set_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
1419 }


5) set_page_writebac()
6) submit write
7) unlock_page()
8) put_page(page)

Later, f2fs_invalidate_folio will do put_page again by:
clear_page_private_gcing(&folio->page);

--- in fs/f2fs/f2fs.h
1421 #define PAGE_PRIVATE_CLEAR_FUNC(name, flagname) \
1422 static inline void clear_page_private_##name(struct page *page) \
1423 { \
1424         clear_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
1425         if (page_private(page) == BIT(PAGE_PRIVATE_NOT_POINTER)) { \
1426                 set_page_private(page, 0); \
1427                 if (PagePrivate(page)) { \
1428                         ClearPagePrivate(page); \
1429                         put_page(page); \
1430                 }\
1431         } \
1432 }

> page is now free, gets reallocated into a THP
> lookup from the f2fs file finds the new THP
> things explode messily
> 
> Checking page->mapping is going to avoid the messy explosion, but
> you'll still have a page in the page cache which doesn't actually
> belong to you, and that's going to lead to subtle data corruption.
> 
> This should be caught by page_expected_state(), called from
> free_page_is_bad(), called from free_pages_prepare().  Do your testers
> have CONFIG_DEBUG_VM enabled?  That might give you a fighting chance at
> finding the last place which called put_page().  It won't necessarily be
> the _wrong_ place to call put_page() (that may have happened earlier),
> but it may give you a clue.
> 
> > > 
> > > struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
> > > 		int fgp_flags, gfp_t gfp)
> > > {
> > > 	struct folio *folio;
> > > 
> > > 	folio = __filemap_get_folio(mapping, index, fgp_flags, gfp);
> > > 	if (IS_ERR(folio))
> > > 		return NULL;
> > > 	return folio_file_page(folio, index);
> > > }
> > > 
> > > Thanks,
> > > 
> > > > -		f2fs_put_page(page, 1);
> > > > -		goto repeat;
> > > > -	}
> > > > -	if (unlikely(!PageUptodate(page))) {
> > > > +	if (unlikely(page->mapping != mapping || !PageUptodate(page))) {
> > > >   		f2fs_put_page(page, 1);
> > > >   		return ERR_PTR(-EIO);
> > > >   	}


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

WARNING: multiple messages have this Message-ID (diff)

From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Chao Yu <chao@kernel.org>,
	linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net, stable@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH] f2fs: get out of a repeat loop when getting a locked data page
Date: Wed, 5 Apr 2023 09:39:37 -0700	[thread overview]
Message-ID: <ZC2kSfNUXKK4PfpM@google.com> (raw)
In-Reply-To: <ZCHCykI/BLcfDzt7@casper.infradead.org>

On 03/27, Matthew Wilcox wrote:
> On Mon, Mar 27, 2023 at 08:30:33AM -0700, Jaegeuk Kim wrote:
> > On 03/26, Chao Yu wrote:
> > > On 2023/3/24 5:39, Jaegeuk Kim wrote:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=216050
> > > > 
> > > > Somehow we're getting a page which has a different mapping.
> > > > Let's avoid the infinite loop.
> > > > 
> > > > Cc: <stable@vger.kernel.org>
> > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > ---
> > > >   fs/f2fs/data.c | 8 ++------
> > > >   1 file changed, 2 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > > index bf51e6e4eb64..80702c93e885 100644
> > > > --- a/fs/f2fs/data.c
> > > > +++ b/fs/f2fs/data.c
> > > > @@ -1329,18 +1329,14 @@ struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index,
> > > >   {
> > > >   	struct address_space *mapping = inode->i_mapping;
> > > >   	struct page *page;
> > > > -repeat:
> > > > +
> > > >   	page = f2fs_get_read_data_page(inode, index, 0, for_write, NULL);
> > > >   	if (IS_ERR(page))
> > > >   		return page;
> > > >   	/* wait for read completion */
> > > >   	lock_page(page);
> > > > -	if (unlikely(page->mapping != mapping)) {
> > > 
> > > How about using such logic only for move_data_page() to limit affect for
> > > other paths?
> > 
> > Why move_data_page() only? If this happens, we'll fall into a loop in anywhere?
> > 
> > > 
> > > Jaegeuk, any thoughts about why mapping is mismatch in between page's one and
> > > inode->i_mapping?
> > 
> > > 
> > > After several times code review, I didn't get any clue about why f2fs always
> > > get the different mapping in a loop.
> > 
> > I couldn't find the path to happen this. So weird. Please check the history in the
> > bug.
> > 
> > > 
> > > Maybe we can loop MM guys to check whether below folio_file_page() may return
> > > page which has different mapping?
> > 
> > Matthew may have some idea on this?
> 
> There's a lot of comments in the bug ... hard to come into this one
> cold.
> 
> I did notice this one (#119):
> : Interestingly, ref count is 514, which looks suspiciously as a binary
> : flag 1000000010. Is it possible that during 5.17/5.18 implementation
> : of a "pin", somehow binary flag was written to ref count, or something
> : like '1 << ...' happens?
> 
> That indicates to me that somehow you've got hold of a THP that is in
> the page cache.  Probably shmem/tmpfs.  That indicate to me a refcount
> problem that looks something like this:
> 
> f2fs allocates a page
> f2fs adds the page to the page cache
> f2fs puts the reference to the page without removing it from the
> page cache (how?)

Is it somewhat related to setting a bit in private field?

When we migrate the blocks, we do:
1) get_lock_page()
2) submit read
3) lock_page()
3) set_page_dirty()
4) set_page_private_gcing(page)

--- in fs/f2fs/f2fs.h
1409 #define PAGE_PRIVATE_SET_FUNC(name, flagname) \
1410 static inline void set_page_private_##name(struct page *page) \
1411 { \
1412         if (!PagePrivate(page)) { \
1413                 get_page(page); \
1414                 SetPagePrivate(page); \
1415                 set_page_private(page, 0); \
1416         } \
1417         set_bit(PAGE_PRIVATE_NOT_POINTER, &page_private(page)); \
1418         set_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
1419 }


5) set_page_writebac()
6) submit write
7) unlock_page()
8) put_page(page)

Later, f2fs_invalidate_folio will do put_page again by:
clear_page_private_gcing(&folio->page);

--- in fs/f2fs/f2fs.h
1421 #define PAGE_PRIVATE_CLEAR_FUNC(name, flagname) \
1422 static inline void clear_page_private_##name(struct page *page) \
1423 { \
1424         clear_bit(PAGE_PRIVATE_##flagname, &page_private(page)); \
1425         if (page_private(page) == BIT(PAGE_PRIVATE_NOT_POINTER)) { \
1426                 set_page_private(page, 0); \
1427                 if (PagePrivate(page)) { \
1428                         ClearPagePrivate(page); \
1429                         put_page(page); \
1430                 }\
1431         } \
1432 }

> page is now free, gets reallocated into a THP
> lookup from the f2fs file finds the new THP
> things explode messily
> 
> Checking page->mapping is going to avoid the messy explosion, but
> you'll still have a page in the page cache which doesn't actually
> belong to you, and that's going to lead to subtle data corruption.
> 
> This should be caught by page_expected_state(), called from
> free_page_is_bad(), called from free_pages_prepare().  Do your testers
> have CONFIG_DEBUG_VM enabled?  That might give you a fighting chance at
> finding the last place which called put_page().  It won't necessarily be
> the _wrong_ place to call put_page() (that may have happened earlier),
> but it may give you a clue.
> 
> > > 
> > > struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
> > > 		int fgp_flags, gfp_t gfp)
> > > {
> > > 	struct folio *folio;
> > > 
> > > 	folio = __filemap_get_folio(mapping, index, fgp_flags, gfp);
> > > 	if (IS_ERR(folio))
> > > 		return NULL;
> > > 	return folio_file_page(folio, index);
> > > }
> > > 
> > > Thanks,
> > > 
> > > > -		f2fs_put_page(page, 1);
> > > > -		goto repeat;
> > > > -	}
> > > > -	if (unlikely(!PageUptodate(page))) {
> > > > +	if (unlikely(page->mapping != mapping || !PageUptodate(page))) {
> > > >   		f2fs_put_page(page, 1);
> > > >   		return ERR_PTR(-EIO);
> > > >   	}

next prev parent reply	other threads:[~2023-04-05 16:39 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-23 21:39 [f2fs-dev] [PATCH] f2fs: get out of a repeat loop when getting a locked data page Jaegeuk Kim
2023-03-23 21:39 ` Jaegeuk Kim
2023-03-26 13:47 ` [f2fs-dev] " Chao Yu
2023-03-26 13:47   ` Chao Yu
2023-03-27 15:30   ` Jaegeuk Kim
2023-03-27 15:30     ` Jaegeuk Kim
2023-03-27 16:22     ` Matthew Wilcox
2023-03-27 16:22       ` Matthew Wilcox
2023-04-05 16:39       ` Jaegeuk Kim [this message]
2023-04-05 16:39         ` Jaegeuk Kim
2023-04-05 20:47         ` Jaegeuk Kim
2023-04-05 20:47           ` Jaegeuk Kim
2023-04-06  1:50         ` Chao Yu
2023-04-06  1:50           ` Chao Yu
2023-04-06  3:18           ` Jaegeuk Kim
2023-04-06  3:18             ` Jaegeuk Kim
2023-04-10  9:57             ` Chao Yu
2023-04-10  9:57               ` Chao Yu
2023-04-10 23:24               ` Jaegeuk Kim
2023-04-10 23:24                 ` Jaegeuk Kim
2023-04-11  8:49                 ` Chao Yu
2023-04-11  8:49                   ` Chao Yu
2023-03-30 13:23     ` Chao Yu
2023-03-30 13:23       ` Chao Yu
2023-07-17 17:34 ` patchwork-bot+f2fs
2023-07-17 17:34   ` patchwork-bot+f2fs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZC2kSfNUXKK4PfpM@google.com \
    --to=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=stable@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.