From: Tarun Sahu <tsahu@linux.ibm.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
muchun.song@linux.dev, mike.kravetz@oracle.com,
aneesh.kumar@linux.ibm.com, sidhartha.kumar@oracle.com,
gerald.schaefer@linux.ibm.com, linux-kernel@vger.kernel.org,
jaypatel@linux.ibm.com
Subject: Re: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order
Date: Mon, 24 Apr 2023 21:01:21 +0530 [thread overview]
Message-ID: <877cu15mba.fsf@linux.ibm.com> (raw)
In-Reply-To: <ZDmzyag88pO1Kdk8@casper.infradead.org>
Hi, Mathew,
I am not sure If I was clear about my intention behind the patch.
Here, I attempt to answer it again. Please lemme know if you agree.
Matthew Wilcox <willy@infradead.org> writes:
> On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote:
>> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
>> folio does not have any tail page to set order.
>
> I think you're missing the point of how folio_set_order() is used.
> When splitting a large folio, we need to zero out the folio_nr_pages
> in the tail, so it does have a tail page, and that tail page needs to
> be zeroed. We even assert that there is a tail page:
>
> if (WARN_ON_ONCE(!folio_test_large(folio)))
> return;
>
> Or maybe you need to explain yourself better.
>
In the upstream, I don't see folio_set_order(folio, 0) being called in
splitting path. IIUC, we had to zero out _folio_nr_pages during freeing
gigantic folio as described by Commit ba9c1201beaa
("mm/hugetlb: clear compound_nr before freeing gigantic pages").
I agree that folio_set_order(folio, 0) is called with folio having
tail pages. But I meant only that, in general, it is just confusing to
have setting the folio order to 0.
With this patch, I would like to draw attention to the point that there
is no need to call folio_set_order(folio, 0) anymore to zero out
_folio_order and _folio_nr_pages.
In past, it was needed because page->mapping used to overlap with
_folio_nr_pages and _folio_order. So if these fields were left
uncleared during freeing gigantic hugepages, they were causing
"BUG: bad page state" due to non-zero page->mapping. Now, After
Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
CMA") page->mapping has explicitly been cleared out for tail pages.
Also, _folio_order and _folio_nr_pages no longer overlaps with page->mapping.
struct page {
...
struct address_space * mapping; /* 24 8 */
...
}
struct folio {
...
union {
struct {
long unsigned int _flags_1; /* 64 8 */
long unsigned int _head_1; /* 72 8 */
unsigned char _folio_dtor; /* 80 1 */
unsigned char _folio_order; /* 81 1 */
/* XXX 2 bytes hole, try to pack */
atomic_t _entire_mapcount; /* 84 4 */
atomic_t _nr_pages_mapped; /* 88 4 */
atomic_t _pincount; /* 92 4 */
unsigned int _folio_nr_pages; /* 96 4 */
}; /* 64 40 */
struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
}
...
}
So, folio_set_order(folio, 0) can be removed from freeing gigantic
folio path (__destroy_compound_gigantic_folio).
Another place, where folio_set_order(folio, 0) is called inside
__prep_compound_gigantic_folio during error path. Here,
folio_set_order(folio, 0) can also be removed if we move
folio_set_order(folio, order) after for loop. Also, Mike confirmed that
it is safe to move the call.
~Tarun
>> folio->_folio_nr_pages is
>> set to 0 for order 0 in folio_set_order. It is required because
>> _folio_nr_pages overlapped with page->mapping and leaving it non zero
>> caused "bad page" error while freeing gigantic hugepages. This was fixed in
>> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
>> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
>> pages to CMA") now explicitly clear page->mapping and hence we won't see
>> the bad page error even if _folio_nr_pages remains unset. Also the order 0
>> folios are not supposed to call folio_set_order, So now we can get rid of
>> folio_set_order(folio, 0) from hugetlb code path to clear the confusion.
>
> ... this is all very confusing.
>
>> The patch also moves _folio_set_head and folio_set_order calls in
>> __prep_compound_gigantic_folio() such that we avoid clearing them in the
>> error path.
>
> But don't we need those bits set while we operate on the folio to set it
> up? It makes me nervous if we don't have those bits set because we can
> end up with speculative references that point to a head page while that
> page is not marked as a head page. It may not be a problem, but I want
> to see some air-tight analysis of that.
>
>> Testing: I have run LTP tests, which all passes. and also I have written
>> the test in LTP which tests the bug caused by compound_nr and page->mapping
>> overlapping.
>>
>> https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/
>>
>> Running on older kernel ( < 5.10-rc7) with the above bug this fails while
>> on newer kernel and, also with this patch it passes.
>>
>> Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com>
>> ---
>> mm/hugetlb.c | 9 +++------
>> mm/internal.h | 8 ++------
>> 2 files changed, 5 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index f16b25b1a6b9..e2540269c1dc 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
>> set_page_refcounted(p);
>> }
>>
>> - folio_set_order(folio, 0);
>> __folio_clear_head(folio);
>> }
>>
>> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> struct page *p;
>>
>> __folio_clear_reserved(folio);
>> - __folio_set_head(folio);
>> - /* we rely on prep_new_hugetlb_folio to set the destructor */
>> - folio_set_order(folio, order);
>> for (i = 0; i < nr_pages; i++) {
>> p = folio_page(folio, i);
>>
>> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> if (i != 0)
>> set_compound_head(p, &folio->page);
>> }
>> + __folio_set_head(folio);
>> + /* we rely on prep_new_hugetlb_folio to set the destructor */
>> + folio_set_order(folio, order);
>> atomic_set(&folio->_entire_mapcount, -1);
>> atomic_set(&folio->_nr_pages_mapped, 0);
>> atomic_set(&folio->_pincount, 0);
>> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> p = folio_page(folio, j);
>> __ClearPageReserved(p);
>> }
>> - folio_set_order(folio, 0);
>> - __folio_clear_head(folio);
>> return false;
>> }
>>
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 18cda26b8a92..0d96a3bc1d58 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
>> */
>> static inline void folio_set_order(struct folio *folio, unsigned int order)
>> {
>> - if (WARN_ON_ONCE(!folio_test_large(folio)))
>> + if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
>> return;
>>
>> folio->_folio_order = order;
>> #ifdef CONFIG_64BIT
>> - /*
>> - * When hugetlb dissolves a folio, we need to clear the tail
>> - * page, rather than setting nr_pages to 1.
>> - */
>> - folio->_folio_nr_pages = order ? 1U << order : 0;
>> + folio->_folio_nr_pages = 1U << order;
>> #endif
>> }
>>
>> --
>> 2.31.1
>>
next prev parent reply other threads:[~2023-04-24 15:37 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-14 19:48 [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order Tarun Sahu
2023-04-14 20:12 ` Matthew Wilcox
2023-04-18 9:03 ` Tarun Sahu
2023-04-18 18:56 ` Mike Kravetz
2023-04-24 15:40 ` Tarun Sahu
2023-04-24 15:31 ` Tarun Sahu [this message]
2023-04-14 21:35 ` Sidhartha Kumar
2023-04-18 9:20 ` Tarun Sahu
2023-04-18 9:25 ` Tarun Sahu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=877cu15mba.fsf@linux.ibm.com \
--to=tsahu@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.ibm.com \
--cc=gerald.schaefer@linux.ibm.com \
--cc=jaypatel@linux.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=muchun.song@linux.dev \
--cc=sidhartha.kumar@oracle.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.