All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tarun Sahu <tsahu@linux.ibm.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	muchun.song@linux.dev, aneesh.kumar@linux.ibm.com,
	willy@infradead.org, sidhartha.kumar@oracle.com,
	gerald.schaefer@linux.ibm.com, linux-kernel@vger.kernel.org,
	jaypatel@linux.ibm.com
Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
Date: Thu, 08 Jun 2023 15:33:14 +0530	[thread overview]
Message-ID: <873532jmot.fsf@linux.ibm.com> (raw)
In-Reply-To: <20230606155853.GA4150@monkey>

Hi Mike,

Please find my comments inline.

Mike Kravetz <mike.kravetz@oracle.com> writes:

> On 06/06/23 10:32, Tarun Sahu wrote:
>>                                        
>> Hi Mike,              
>>     
>> Thanks for your inputs.                          
>> I wanted to know if you find it okay, Can I send it again adding your Reviewed-by?
>
> Hi Tarun,
>
> Just a few more comments/questions.
>
> On 05/15/23 22:38, Tarun Sahu wrote:
>> folio_set_order(folio, 0) is used in kernel at two places
>> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
>> Currently, It is called to clear out the folio->_folio_nr_pages and
>> folio->_folio_order.
>> 
>> For __destroy_compound_gigantic_folio:
>> In past, folio_set_order(folio, 0) was needed because page->mapping used
>> to overlap with _folio_nr_pages and _folio_order. So if these fields were
>> left uncleared during freeing gigantic hugepages, they were causing
>> "BUG: bad page state" due to non-zero page->mapping. Now, After
>> Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
>> CMA") page->mapping has explicitly been cleared out for tail pages. Also,
>> _folio_order and _folio_nr_pages no longer overlaps with page->mapping.
>
> I believe the same logic/reasoning as above also applies to
> __prep_compound_gigantic_folio.
> Why?
> In __prep_compound_gigantic_folio we only call folio_set_order(folio, 0)
> in the case of error.  If __prep_compound_gigantic_folio fails, the caller
> will then call free_gigantic_folio() on the "gigantic page".  However, it is
> not really a gigantic  at this point in time, and we are simply calling
> cma_release() or free_contig_range().
> The end result is that I do not believe the existing call to
> folio_set_order(folio, 0) in __prep_compound_gigantic_folio is actually
> required.  ???
No, there is a difference. IIUC, __destroy_compound_gigantic_folio
explicitly reset page->mapping for each page of compound page which
makes sure, even if in future some fields of struct page/folio overlaps
with page->mapping, it won't cause `BUG: bad page state` error. But If we
just remove folio_set_order(folio, 0) from __prep_compound_gigantic_folio
without moving folio_set_order(folio, order), this will cause extra
maintenance overhead to track if page->_folio_order overlaps with
page->mapping everytime struct page fields are changed. As in case of
overlapping page->mapping will be non-zero. IMHO, To avoid it,
moving the folio_set_order(folio, order) after all error checks are
done on tail pages. So, _folio_order is either set on success and not
set in case of error. (which is the original proposal). But for
folio_set_head, I agree the way you suggested below.

WDYT?

>
> If my reasoning above is correct, then we could just have one patch to
> remove the folio_set_order(folio, 0) calls and remove special casing for
> order 0 in folio_set_order.
>
> However, I still believe your restructuring of __prep_compound_gigantic_folio,
> is of value.  I do not believe there is an issue as questioned by Matthew.  My
> reasoning has been stated previously.  We could make changes like the following
> to retain the same order of operations in __prep_compound_gigantic_folio and
> totally avoid Matthew's question.  Totally untested.
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ea24718db4af..a54fee663cb1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1950,10 +1950,8 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  	int nr_pages = 1 << order;
>  	struct page *p;
>  
> -	__folio_clear_reserved(folio);
> -	__folio_set_head(folio);
>  	/* we rely on prep_new_hugetlb_folio to set the destructor */
> -	folio_set_order(folio, order);
> +
>  	for (i = 0; i < nr_pages; i++) {
>  		p = folio_page(folio, i);
>  
> @@ -1969,7 +1967,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  		 * on the head page when they need know if put_page() is needed
>  		 * after get_user_pages().
>  		 */
> -		if (i != 0)	/* head page cleared above */
> +		if (i != 0)	/* head page cleared below */
>  			__ClearPageReserved(p);
>  		/*
>  		 * Subtle and very unlikely
> @@ -1996,8 +1994,14 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  		} else {
>  			VM_BUG_ON_PAGE(page_count(p), p);
>  		}
> -		if (i != 0)
> +
> +		if (i == 0) {
> +			__folio_clear_reserved(folio);
> +			__folio_set_head(folio);
> +			folio_set_order(folio, order);
With folio_set_head, I agree to this, But does not feel good with
folio_set_order as per my above reasoning. WDYT?

> +		} else {
>  			set_compound_head(p, &folio->page);
> +		}
>  	}
>  	atomic_set(&folio->_entire_mapcount, -1);
>  	atomic_set(&folio->_nr_pages_mapped, 0);
> @@ -2017,7 +2021,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  		p = folio_page(folio, j);
>  		__ClearPageReserved(p);
>  	}
> -	folio_set_order(folio, 0);
>  	__folio_clear_head(folio);
>  	return false;
>  }
>
>
>> 
>> struct page {
>> ...
>>    struct address_space * mapping;  /* 24     8 */
>> ...
>> }
>> 
>> struct folio {
>> ...
>>     union {
>>         struct {
>>         	long unsigned int _flags_1;      /* 64    8 */
>>         	long unsigned int _head_1;       /* 72    8 */
>>         	unsigned char _folio_dtor;       /* 80    1 */
>>         	unsigned char _folio_order;      /* 81    1 */
>> 
>>         	/* XXX 2 bytes hole, try to pack */
>> 
>>         	atomic_t   _entire_mapcount;     /* 84    4 */
>>         	atomic_t   _nr_pages_mapped;     /* 88    4 */
>>         	atomic_t   _pincount;            /* 92    4 */
>>         	unsigned int _folio_nr_pages;    /* 96    4 */
>>         };                                       /* 64   40 */
>>         struct page __page_1 __attribute__((__aligned__(8))); /* 64   64 */
>>     }
>> ...
>> }
>
> I do not think the copy of page/folio definitions adds much value to the
> commit message.
Yeah, Will remove it.
>
> -- 
> Mike Kravetz


  reply	other threads:[~2023-06-08 10:03 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-15 17:08 [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order Tarun Sahu
2023-05-15 17:15 ` Tarun Sahu
2023-05-15 17:16 ` Matthew Wilcox
2023-05-15 17:45   ` Mike Kravetz
2023-06-03  0:08     ` Mike Kravetz
2023-05-16 13:09   ` Tarun Sahu
2023-05-22  5:49 ` Tarun Sahu
2023-06-06 15:58 ` Mike Kravetz
2023-06-08 10:03   ` Tarun Sahu [this message]
2023-06-08 23:52     ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=873532jmot.fsf@linux.ibm.com \
    --to=tsahu@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=jaypatel@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=sidhartha.kumar@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.