From: Luis Chamberlain <mcgrof@kernel.org>
To: Zi Yan <ziy@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Sean Christopherson <seanjc@google.com>,
Matthew Wilcox <willy@infradead.org>,
"Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>,
djwong@kernel.org, brauner@kernel.org, david@fromorbit.com,
chandan.babu@oracle.com, akpm@linux-foundation.org,
linux-fsdevel@vger.kernel.org, hare@suse.de,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-xfs@vger.kernel.org, gost.dev@samsung.com,
p.raghav@samsung.com
Subject: Re: [PATCH v4 05/11] mm: do not split a folio if it has minimum folio order requirement
Date: Mon, 29 Apr 2024 17:31:04 -0700 [thread overview]
Message-ID: <ZjA7yBQjkh52TM_T@bombadil.infradead.org> (raw)
In-Reply-To: <6799F341-9E37-4F3E-B0D0-B5B2138A5F5F@nvidia.com>
On Mon, Apr 29, 2024 at 10:29:29AM -0400, Zi Yan wrote:
> On 28 Apr 2024, at 23:56, Luis Chamberlain wrote:
>
> > On Sat, Apr 27, 2024 at 05:57:17PM -0700, Luis Chamberlain wrote:
> >> On Fri, Apr 26, 2024 at 04:46:11PM -0700, Luis Chamberlain wrote:
> >>> On Thu, Apr 25, 2024 at 05:47:28PM -0700, Luis Chamberlain wrote:
> >>>> On Thu, Apr 25, 2024 at 09:10:16PM +0100, Matthew Wilcox wrote:
> >>>>> On Thu, Apr 25, 2024 at 01:37:40PM +0200, Pankaj Raghav (Samsung) wrote:
> >>>>>> From: Pankaj Raghav <p.raghav@samsung.com>
> >>>>>>
> >>>>>> using that API for LBS is resulting in an NULL ptr dereference
> >>>>>> error in the writeback path [1].
> >>>>>>
> >>>>>> [1] https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
> >>>>>
> >>>>> How would I go about reproducing this?
> >>
> >> Well so the below fixes this but I am not sure if this is correct.
> >> folio_mark_dirty() at least says that a folio should not be truncated
> >> while its running. I am not sure if we should try to split folios then
> >> even though we check for writeback once. truncate_inode_partial_folio()
> >> will folio_wait_writeback() but it will split_folio() before checking
> >> for claiming to fail to truncate with folio_test_dirty(). But since the
> >> folio is locked its not clear why this should be possible.
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 83955362d41c..90195506211a 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -3058,7 +3058,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >> if (new_order >= folio_order(folio))
> >> return -EINVAL;
> >>
> >> - if (folio_test_writeback(folio))
> >> + if (folio_test_dirty(folio) || folio_test_writeback(folio))
> >> return -EBUSY;
> >>
> >> if (!folio_test_anon(folio)) {
> >
> > I wondered what code path is causing this and triggering this null
> > pointer, so I just sprinkled a check here:
> >
> > VM_BUG_ON_FOLIO(folio_test_dirty(folio), folio);
> >
> > The answer was:
> >
> > kcompactd() --> migrate_pages_batch()
> > --> try_split_folio --> split_folio_to_list() -->
> > split_huge_page_to_list_to_order()
> >
>
> There are 3 try_split_folio() in migrate_pages_batch().
This is only true for linux-next, for v6.9-rc5 off of which this testing
is based on there are only two.
> First one is to split anonymous large folios that are on deferred
> split list, so not related;
This is in linux-next and not v6.9-rc5.
> second one is to split THPs when thp migration is not supported, but
> this is compaction, so not related; third one is to split large folios
> when there is no same size free page in the system, and this should be
> the one.
Agreed, the case where migrate_folio_unmap() failed with -ENOMEM. This
also helps us enhance the reproducer further, which I'll do next.
> > And I verified that moving the check only to the migrate_pages_batch()
> > path also fixes the crash:
> >
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 73a052a382f1..83b528eb7100 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1484,7 +1484,12 @@ static inline int try_split_folio(struct folio *folio, struct list_head *split_f
> > int rc;
> >
> > folio_lock(folio);
> > + if (folio_test_dirty(folio)) {
> > + rc = -EBUSY;
> > + goto out;
> > + }
> > rc = split_folio_to_list(folio, split_folios);
> > +out:
> > folio_unlock(folio);
> > if (!rc)
> > list_move_tail(&folio->lru, split_folios);
> >
> > However I'd like compaction folks to review this. I see some indications
> > in the code that migration can race with truncation but we feel fine by
> > it by taking the folio lock. However here we have a case where we see
> > the folio clearly locked and the folio is dirty. Other migraiton code
> > seems to write back the code and can wait, here we just move on. Further
> > reading on commit 0003e2a414687 ("mm: Add AS_UNMOVABLE to mark mapping
> > as completely unmovable") seems to hint that migration is safe if the
> > mapping either does not exist or the mapping does exist but has
> > mapping->a_ops->migrate_folio so I'd like further feedback on this.
>
> During migration, all page table entries pointing to this dirty folio
> are invalid, and accesses to this folio will cause page fault and
> wait on the migration entry. I am not sure we need to skip dirty folios.
I see.. thanks!
> > Another thing which requires review is if we we split a folio but not
> > down to order 0 but to the new min order, does the accounting on
> > migrate_pages_batch() require changing? And most puzzling, why do we
>
> What accounting are you referring to? split code should take care of it.
The folio order can change after split, and so I was concerned about the
nr_pages used in migrate_pages_batch(). But I see now that when
migrate_folio_unmap() first failed we try to split the folio, and if
successful I see now we the caller will again call migrate_pages_batch()
with a retry attempt of 1 only to the split folios. I also see the
nr_pages is just local to each list for each loop, first on the from
list to unmap and afte on the unmap list so we move the folios.
> > not see this with regular large folios, but we do see it with minorder ?
>
> I wonder if the split code handles folio->mapping->i_pages properly.
> Does the i_pages store just folio pointers or also need all tail page
> pointers? I am no expert in fs, thus need help.
mapping->i_pages stores folio pointers in the page cache or
swap/dax/shadow entries (xa_is_value(folio)). The folios however can be
special and we special-case them with shmem_mapping(mapping) checks.
split_huge_page_to_list_to_order() doens't get called with swap/dax/shadow
entries, and we also bail out on shmem_mapping(mapping) already.
Luis
next prev parent reply other threads:[~2024-04-30 0:31 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-25 11:37 [PATCH v4 00/11] enable bs > ps in XFS Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 01/11] readahead: rework loop in page_cache_ra_unbounded() Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 02/11] fs: Allow fine-grained control of folio sizes Pankaj Raghav (Samsung)
2024-04-25 18:07 ` Hannes Reinecke
2024-04-26 15:09 ` Darrick J. Wong
2024-04-25 11:37 ` [PATCH v4 03/11] filemap: allocate mapping_min_order folios in the page cache Pankaj Raghav (Samsung)
2024-04-25 19:04 ` Hannes Reinecke
2024-04-26 15:12 ` Darrick J. Wong
2024-04-28 20:59 ` Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 04/11] readahead: allocate folios with mapping_min_order in readahead Pankaj Raghav (Samsung)
2024-04-25 18:53 ` Matthew Wilcox
2024-04-25 11:37 ` [PATCH v4 05/11] mm: do not split a folio if it has minimum folio order requirement Pankaj Raghav (Samsung)
2024-04-25 20:10 ` Matthew Wilcox
2024-04-26 0:47 ` Luis Chamberlain
2024-04-26 23:46 ` Luis Chamberlain
2024-04-28 0:57 ` Luis Chamberlain
2024-04-29 3:56 ` Luis Chamberlain
2024-04-29 14:29 ` Zi Yan
2024-04-30 0:31 ` Luis Chamberlain [this message]
2024-04-30 0:49 ` Luis Chamberlain
2024-04-30 2:43 ` Zi Yan
2024-04-30 19:27 ` Luis Chamberlain
2024-05-01 4:13 ` Matthew Wilcox
2024-05-01 14:28 ` Matthew Wilcox
2024-04-26 15:49 ` Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 06/11] filemap: cap PTE range to be created to i_size in folio_map_range() Pankaj Raghav (Samsung)
2024-04-25 20:24 ` Matthew Wilcox
2024-04-26 12:54 ` Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 07/11] iomap: fix iomap_dio_zero() for fs bs > system page size Pankaj Raghav (Samsung)
2024-04-26 6:22 ` Christoph Hellwig
2024-04-26 11:43 ` Pankaj Raghav (Samsung)
2024-04-27 5:12 ` Christoph Hellwig
2024-04-29 21:02 ` Pankaj Raghav (Samsung)
2024-04-27 3:26 ` Matthew Wilcox
2024-04-27 4:52 ` Christoph Hellwig
2024-04-25 11:37 ` [PATCH v4 08/11] xfs: use kvmalloc for xattr buffers Pankaj Raghav (Samsung)
2024-04-26 15:18 ` Darrick J. Wong
2024-04-28 21:06 ` Pankaj Raghav (Samsung)
2024-04-25 11:37 ` [PATCH v4 09/11] xfs: expose block size in stat Pankaj Raghav (Samsung)
2024-04-26 15:15 ` Darrick J. Wong
2024-04-25 11:37 ` [PATCH v4 10/11] xfs: make the calculation generic in xfs_sb_validate_fsb_count() Pankaj Raghav (Samsung)
2024-04-26 15:16 ` Darrick J. Wong
2024-04-25 11:37 ` [PATCH v4 11/11] xfs: enable block size larger than page size support Pankaj Raghav (Samsung)
2024-04-26 15:18 ` Darrick J. Wong
[not found] ` <87y18zxvpd.fsf@gmail.com>
2024-04-27 5:05 ` [PATCH v4 00/11] enable bs > ps in XFS Darrick J. Wong
2024-04-29 20:39 ` Pankaj Raghav (Samsung)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZjA7yBQjkh52TM_T@bombadil.infradead.org \
--to=mcgrof@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=chandan.babu@oracle.com \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=gost.dev@samsung.com \
--cc=hare@suse.de \
--cc=kernel@pankajraghav.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=p.raghav@samsung.com \
--cc=seanjc@google.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).