Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mike Kravetz <mike.kravetz@oracle.com>
To: Tarun Sahu <tsahu@linux.ibm.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	muchun.song@linux.dev, aneesh.kumar@linux.ibm.com,
	willy@infradead.org, sidhartha.kumar@oracle.com,
	gerald.schaefer@linux.ibm.com, linux-kernel@vger.kernel.org,
	jaypatel@linux.ibm.com
Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
Date: Thu, 8 Jun 2023 16:52:34 -0700	[thread overview]
Message-ID: <20230608235234.GC88798@monkey> (raw)
In-Reply-To: <873532jmot.fsf@linux.ibm.com>

On 06/08/23 15:33, Tarun Sahu wrote:
> Hi Mike,
> 
> Please find my comments inline.
> 
> Mike Kravetz <mike.kravetz@oracle.com> writes:
> 
> > On 06/06/23 10:32, Tarun Sahu wrote:
> >>                                        
> >> Hi Mike,              
> >>     
> >> Thanks for your inputs.                          
> >> I wanted to know if you find it okay, Can I send it again adding your Reviewed-by?
> >
> > Hi Tarun,
> >
> > Just a few more comments/questions.
> >
> > On 05/15/23 22:38, Tarun Sahu wrote:
> >> folio_set_order(folio, 0) is used in kernel at two places
> >> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
> >> Currently, It is called to clear out the folio->_folio_nr_pages and
> >> folio->_folio_order.
> >> 
> >> For __destroy_compound_gigantic_folio:
> >> In past, folio_set_order(folio, 0) was needed because page->mapping used
> >> to overlap with _folio_nr_pages and _folio_order. So if these fields were
> >> left uncleared during freeing gigantic hugepages, they were causing
> >> "BUG: bad page state" due to non-zero page->mapping. Now, After
> >> Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
> >> CMA") page->mapping has explicitly been cleared out for tail pages. Also,
> >> _folio_order and _folio_nr_pages no longer overlaps with page->mapping.
> >
> > I believe the same logic/reasoning as above also applies to
> > __prep_compound_gigantic_folio.
> > Why?
> > In __prep_compound_gigantic_folio we only call folio_set_order(folio, 0)
> > in the case of error.  If __prep_compound_gigantic_folio fails, the caller
> > will then call free_gigantic_folio() on the "gigantic page".  However, it is
> > not really a gigantic  at this point in time, and we are simply calling
> > cma_release() or free_contig_range().
> > The end result is that I do not believe the existing call to
> > folio_set_order(folio, 0) in __prep_compound_gigantic_folio is actually
> > required.  ???
> No, there is a difference. IIUC, __destroy_compound_gigantic_folio
> explicitly reset page->mapping for each page of compound page which
> makes sure, even if in future some fields of struct page/folio overlaps
> with page->mapping, it won't cause `BUG: bad page state` error. But If we
> just remove folio_set_order(folio, 0) from __prep_compound_gigantic_folio
> without moving folio_set_order(folio, order), this will cause extra
> maintenance overhead to track if page->_folio_order overlaps with
> page->mapping everytime struct page fields are changed. As in case of
> overlapping page->mapping will be non-zero. IMHO, To avoid it,
> moving the folio_set_order(folio, order) after all error checks are
> done on tail pages. So, _folio_order is either set on success and not
> set in case of error. (which is the original proposal). But for
> folio_set_head, I agree the way you suggested below.
> 
> WDYT?

Right.  It is more 'future proof' to only set folio order on success as
done in your original patch.

> >
> > If my reasoning above is correct, then we could just have one patch to
> > remove the folio_set_order(folio, 0) calls and remove special casing for
> > order 0 in folio_set_order.
> >
> > However, I still believe your restructuring of __prep_compound_gigantic_folio,
> > is of value.  I do not believe there is an issue as questioned by Matthew.  My
> > reasoning has been stated previously.  We could make changes like the following
> > to retain the same order of operations in __prep_compound_gigantic_folio and
> > totally avoid Matthew's question.  Totally untested.
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index ea24718db4af..a54fee663cb1 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1950,10 +1950,8 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> >  	int nr_pages = 1 << order;
> >  	struct page *p;
> >  
> > -	__folio_clear_reserved(folio);
> > -	__folio_set_head(folio);
> >  	/* we rely on prep_new_hugetlb_folio to set the destructor */
> > -	folio_set_order(folio, order);
> > +
> >  	for (i = 0; i < nr_pages; i++) {
> >  		p = folio_page(folio, i);
> >  
> > @@ -1969,7 +1967,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> >  		 * on the head page when they need know if put_page() is needed
> >  		 * after get_user_pages().
> >  		 */
> > -		if (i != 0)	/* head page cleared above */
> > +		if (i != 0)	/* head page cleared below */
> >  			__ClearPageReserved(p);
> >  		/*
> >  		 * Subtle and very unlikely
> > @@ -1996,8 +1994,14 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> >  		} else {
> >  			VM_BUG_ON_PAGE(page_count(p), p);
> >  		}
> > -		if (i != 0)
> > +
> > +		if (i == 0) {
> > +			__folio_clear_reserved(folio);
> > +			__folio_set_head(folio);
> > +			folio_set_order(folio, order);
> With folio_set_head, I agree to this, But does not feel good with
> folio_set_order as per my above reasoning. WDYT?

Agree with your reasoning.  We should just move __folio_set_head and
folio_set_order after the loop as you originally suggested.

> 
> > +		} else {
> >  			set_compound_head(p, &folio->page);
> > +		}
> >  	}
> >  	atomic_set(&folio->_entire_mapcount, -1);
> >  	atomic_set(&folio->_nr_pages_mapped, 0);
> > @@ -2017,7 +2021,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> >  		p = folio_page(folio, j);
> >  		__ClearPageReserved(p);
> >  	}
> > -	folio_set_order(folio, 0);
> >  	__folio_clear_head(folio);
> >  	return false;
> >  }
> >
> >
> >> 
> >> struct page {
> >> ...
> >>    struct address_space * mapping;  /* 24     8 */
> >> ...
> >> }
> >> 
> >> struct folio {
> >> ...
> >>     union {
> >>         struct {
> >>         	long unsigned int _flags_1;      /* 64    8 */
> >>         	long unsigned int _head_1;       /* 72    8 */
> >>         	unsigned char _folio_dtor;       /* 80    1 */
> >>         	unsigned char _folio_order;      /* 81    1 */
> >> 
> >>         	/* XXX 2 bytes hole, try to pack */
> >> 
> >>         	atomic_t   _entire_mapcount;     /* 84    4 */
> >>         	atomic_t   _nr_pages_mapped;     /* 88    4 */
> >>         	atomic_t   _pincount;            /* 92    4 */
> >>         	unsigned int _folio_nr_pages;    /* 96    4 */
> >>         };                                       /* 64   40 */
> >>         struct page __page_1 __attribute__((__aligned__(8))); /* 64   64 */
> >>     }
> >> ...
> >> }
> >
> > I do not think the copy of page/folio definitions adds much value to the
> > commit message.
> Yeah, Will remove it.
> >

I think we are finally on the same page.  I am good with this v2 patch.
Only change is to update commit message to remove the definitions.  
-- 
Mike Kravetz

     prev parent reply	other threads:[~2023-06-08 23:52 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-15 17:08 [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order Tarun Sahu
2023-05-15 17:15 ` Tarun Sahu
2023-05-15 17:16 ` Matthew Wilcox
2023-05-15 17:45   ` Mike Kravetz
2023-06-03  0:08     ` Mike Kravetz
2023-05-16 13:09   ` Tarun Sahu
2023-05-22  5:49 ` Tarun Sahu
2023-06-06 15:58 ` Mike Kravetz
2023-06-08 10:03   ` Tarun Sahu
2023-06-08 23:52     ` Mike Kravetz [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230608235234.GC88798@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=jaypatel@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=sidhartha.kumar@oracle.com \
    --cc=tsahu@linux.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.