All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <songmuchun@bytedance.com>,
	Linux-MM <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [bug report] mm/hugetlb: various bugs with avoid_reserve case in alloc_huge_page()
Date: Thu, 18 Aug 2022 15:43:33 -0700	[thread overview]
Message-ID: <Yv7AlZyNaAgpB4Qg@monkey> (raw)
In-Reply-To: <d449c6d1-314f-5b90-6d68-3773e2722d7f@huawei.com>

On 08/17/22 16:31, Miaohe Lin wrote:
> Hi all:
>     When I investigate the mm/hugetlb.c code again, I found there are a few possible issues
> with avoid_reserve case. (It's really hard to follow the relevant code for me.) Please take
> a look at the below analysis:

Thank you for taking a close look at this code!

I agree that the code is hard to follow.  I have spent many hours/days/weeks
chasing down the cause of incorrect reservation counts.  I imagine there could
be more issues, especially when you add the uncommon avoid_reserve and
MAP_NORESERVE processing.

> 1.avoid_reserve issue with h->resv_huge_pages in alloc_huge_page.

Did you actually see this issue, or is it just based on code inspection?
I tried to recreate, but could not.  When looking closer, this may not
even be possible.

>     Assume:
> 	h->free_huge_pages 60
> 	h->resv_huge_pages 30
> 	spool->rsv_hpages  30

OK.

> 
>     When avoid_reserve is true, after alloc_huge_page(), we will have:

Take a close look at the calling paths for alloc_huge_page when avoid_reserve
is true.  There are only two such call paths.
1) copy_hugetlb_page_range - We allocate pages in the 'early COW' processing.
   In such cases, the pages are private and not associated with a file, or
   filesystem or subpool (spool).  Therefore, there should be no spool
   modifications.
2) hugetlb_wp (formerly called hugetlb_cow) - Again, we are allocating a
   private page and should not be modifying spool.

If the above is correct, then we will not modify spool->rsv_hpages which
leads to the inconsistent results.

It is confusing that MAP_NORESERVE does not imply avoid_reserve will be
passed to alloc_huge_page.

> 	spool->rsv_hpages  29 /* hugepage_subpool_get_pages decreases it. */
> 	h->free_huge_pages 59
> 	h->resv_huge_pages 30 /* rsv_hpages is used, but *h->resv_huge_pages is not modified accordingly*. */
> 
>     If the hugetlb page is freed later, we will have:
> 	spool->rsv_hpages  30 /* hugepage_subpool_put_pages increases it. */
> 	h->free_huge_pages 60
> 	h->resv_huge_pages 31 /* *increased wrongly* due to hugepage_subpool_put_pages(spool, 1) == 0. */
> 			   ^^
> 

I'll take a closer look at 2 and 3 when we determine if 1 is a possible
issue or not.
-- 
Mike Kravetz

> 2.avoid_reserve issue with hugetlb rsvd cgroup charge for private mappings in alloc_huge_page.
> 
>     In general, if hugetlb pages are reserved, corresponding rsvd counters are charged in resv_maps
> for private mappings. Otherwise they're charged in individual hugetlb pages. When alloc_huge_page()
> is called with avoid_reserve == true, hugetlb_cgroup_charge_cgroup_rsvd() will be called to charge
> the newly allocated hugetlb page even if there has a reservation for this page in resv_maps. Then
> vma_commit_reservation() is called to indicate that the reservation is consumed. So the reservation
> *can not be used, thus leaking* from now on because vma_needs_reservation always return 1 for it.
> 
> 3.avoid_reserve issue with restore_reserve_on_error
> 
>     There's a assumption in restore_reserve_on_error(): If HPageRestoreReserve is not set, this indicates
> there is an entry in the reserve map added by alloc_huge_page or HPageRestoreReserve would be set on the
> page. But this assumption *does not hold for avoid_reserve*. HPageRestoreReserve won't be set even if there
> is already an entry in the reserve map for avoid_reserve case. So avoid_reserve should be considered in this
> function, i.e. we need *a reliable way* to determine whether the entry is added by the alloc_huge_page().
> 
> Are above issues possible? Or am I miss something? These possible issues seem not easy to fix for me.
> Any thoughts? Any response would be appreciated!
> 
> Thanks!
> Miaohe Lin


  reply	other threads:[~2022-08-19  0:29 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17  8:31 [bug report] mm/hugetlb: various bugs with avoid_reserve case in alloc_huge_page() Miaohe Lin
2022-08-18 22:43 ` Mike Kravetz [this message]
2022-08-19  7:20   ` Miaohe Lin
2022-08-19 19:11     ` Mike Kravetz
2022-08-22  3:13       ` Miaohe Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yv7AlZyNaAgpB4Qg@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.