From: Tarun Sahu <tsahu@linux.ibm.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, muchun.song@linux.dev,
mike.kravetz@oracle.com, aneesh.kumar@linux.ibm.com,
willy@infradead.org, sidhartha.kumar@oracle.com,
gerald.schaefer@linux.ibm.com, linux-kernel@vger.kernel.org,
jaypatel@linux.ibm.com
Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
Date: Mon, 15 May 2023 22:45:30 +0530
Message-ID: <87pm71qzwt.fsf@linux.ibm.com>
In-Reply-To: <20230515170809.284680-1-tsahu@linux.ibm.com>
Changes from v1:
- Changed the patch description. Added comment from Mike.
~Tarun
Tarun Sahu <tsahu@linux.ibm.com> writes:
> folio_set_order(folio, 0) is used in the kernel in two places:
> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
> Currently, it is called to clear out folio->_folio_nr_pages and
> folio->_folio_order.
>
> For __destroy_compound_gigantic_folio:
> In the past, folio_set_order(folio, 0) was needed because page->mapping
> used to overlap with _folio_nr_pages and _folio_order. If these fields
> were left uncleared while freeing gigantic hugepages, they caused
> "BUG: bad page state" due to a non-zero page->mapping. Now, after
> commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
> CMA"), page->mapping is explicitly cleared for tail pages. Also,
> _folio_order and _folio_nr_pages no longer overlap with page->mapping.
>
> struct page {
>         ...
>         struct address_space * mapping;                  /*    24     8 */
>         ...
> }
>
> struct folio {
>         ...
>         union {
>                 struct {
>                         long unsigned int _flags_1;      /*    64     8 */
>                         long unsigned int _head_1;       /*    72     8 */
>                         unsigned char _folio_dtor;       /*    80     1 */
>                         unsigned char _folio_order;      /*    81     1 */
>
>                         /* XXX 2 bytes hole, try to pack */
>
>                         atomic_t _entire_mapcount;       /*    84     4 */
>                         atomic_t _nr_pages_mapped;       /*    88     4 */
>                         atomic_t _pincount;              /*    92     4 */
>                         unsigned int _folio_nr_pages;    /*    96     4 */
>                 };                                       /*    64    40 */
>                 struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
>         }
>         ...
> }
>
> So, folio_set_order(folio, 0) can be removed from the gigantic folio
> freeing path (__destroy_compound_gigantic_folio).
>
> The other place folio_set_order(folio, 0) is called is the error path
> of __prep_compound_gigantic_folio. Here too, folio_set_order(folio, 0)
> can be removed if we move folio_set_order(folio, order) after the for
> loop.
>
> The patch also moves the __folio_set_head call in
> __prep_compound_gigantic_folio() after the loop, so that the head flag
> and the order do not need to be cleared in the error path.
>
> Also, as Mike pointed out:
> "It would actually be better to move the calls _folio_set_head and
> folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why?
> In the current code, the ref count on the 'head page' is still 1 (or more)
> while those calls are made. So, someone could take a speculative ref on the
> page BEFORE the tail pages are set up."
>
> This way, folio_set_order(folio, 0) is no longer needed. It also helps
> remove the confusion of the folio order being set to 0 (as the
> _folio_order field is part of the first tail page).
>
> Testing: I have run the LTP tests, which all pass. I have also written
> an LTP test that exercises the bug caused by compound_nr and
> page->mapping overlapping.
>
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/hugetlb/hugemmap/hugemmap32.c
>
> On an older kernel (< 5.10-rc7) with the above bug this test fails,
> while on a newer kernel, and also with this patch, it passes.
>
> Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com>
> ---
> mm/hugetlb.c | 9 +++------
> mm/internal.h | 8 ++------
> 2 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f154019e6b84..607553445855 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
> set_page_refcounted(p);
> }
>
> - folio_set_order(folio, 0);
> __folio_clear_head(folio);
> }
>
> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> struct page *p;
>
> __folio_clear_reserved(folio);
> - __folio_set_head(folio);
> - /* we rely on prep_new_hugetlb_folio to set the destructor */
> - folio_set_order(folio, order);
> for (i = 0; i < nr_pages; i++) {
> p = folio_page(folio, i);
>
> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> if (i != 0)
> set_compound_head(p, &folio->page);
> }
> + __folio_set_head(folio);
> + /* we rely on prep_new_hugetlb_folio to set the destructor */
> + folio_set_order(folio, order);
> atomic_set(&folio->_entire_mapcount, -1);
> atomic_set(&folio->_nr_pages_mapped, 0);
> atomic_set(&folio->_pincount, 0);
> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> p = folio_page(folio, j);
> __ClearPageReserved(p);
> }
> - folio_set_order(folio, 0);
> - __folio_clear_head(folio);
> return false;
> }
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 68410c6d97ac..c59fe08c5b39 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
> */
> static inline void folio_set_order(struct folio *folio, unsigned int order)
> {
> - if (WARN_ON_ONCE(!folio_test_large(folio)))
> + if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
> return;
>
> folio->_folio_order = order;
> #ifdef CONFIG_64BIT
> - /*
> - * When hugetlb dissolves a folio, we need to clear the tail
> - * page, rather than setting nr_pages to 1.
> - */
> - folio->_folio_nr_pages = order ? 1U << order : 0;
> + folio->_folio_nr_pages = 1U << order;
> #endif
> }
>
> --
> 2.31.1