From: Davidlohr Bueso <dave@stgolabs.net>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
David Rientjes <rientjes@google.com>,
Luiz Capitulino <lcapitulino@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 0/2] alloc_huge_page/hugetlb_reserve_pages race
Date: Mon, 25 May 2015 12:58:07 -0700 [thread overview]
Message-ID: <1432583887.2185.53.camel@stgolabs.net> (raw)
In-Reply-To: <1432353304-12767-1-git-send-email-mike.kravetz@oracle.com>
On Fri, 2015-05-22 at 20:55 -0700, Mike Kravetz wrote:
> This updated patch set includes new documentation for the region/
> reserve map routines. Since I am not the original author of this
> code, comments would be appreciated.
>
> While working on hugetlbfs fallocate support, I noticed the following
> race in the existing code. It is unlikely that this race is hit very
> often in the current code.
Have you actually run into this issue? Can you produce a testcase?
> However, if more functionality to add and
> remove pages to hugetlbfs mappings (such as fallocate) is added the
> likelihood of hitting this race will increase.
>
> alloc_huge_page and hugetlb_reserve_pages use information from the
> reserve map to determine if there are enough available huge pages to
> complete the operation, as well as adjust global reserve and subpool
> usage counts. The order of operations is as follows:
> - call region_chg() to determine the expected change based on reserve map
> - determine if enough resources are available for this operation
> - adjust global counts based on the expected change
> - call region_add() to update the reserve map
> The issue is that reserve map could change between the call to region_chg
> and region_add. In this case, the counters which were adjusted based on
> the output of region_chg will not be correct.
>
> In order to hit this race today, there must be an existing shared hugetlb
> mmap created with the MAP_NORESERVE flag. A page fault to allocate a huge
> page via this mapping must occur at the same another task is mapping the
> same region without the MAP_NORESERVE flag.
In the past file regions were serialized by either mmap_sem (exclusive)
or the hugetlb instantiation mutex (when mmap_sem was shared). With
finer grained locking, however, we now rely on the resv_map->lock. So I
guess you are referring to something like this, no?
CPU0 (via vma_[needs/commit]_reservation) CPU1
hugetlb_fault
mutex_lock(hash_A)
hugetlb_no_page
alloc_huge_page shm_get
region_chg hugetlb_file_setup
<accounting updates> hugetlb_reserve_pages
region_chg
region_add <accounting updates>
region_add
Couldn't this race also occur upon concurrent faults on two different
hashes backed by the same vma?
Anyway, it's memorial day, so I'll take a closer look during the week,
but you seem to be correct. An alternative could be to continue holding
the spinlock until the after region_add, but I like your "fixup"
approach.
> The patch set does not prevent the race from happening. Rather, it adds
> simple functionality to detect when the race has occurred. If a race is
> detected, then the incorrect counts are adjusted.
>
> v2:
> Added documentation for the region/reserve map routines
Thanks for doing this, as akpm mentioned, it is much needed. However,
this should be a new, separate patch.
> Created common routine for vma_commit_reservation and
> vma_commit_reservation to help prevent them from drifting
> apart in the future.
>
> Mike Kravetz (2):
> mm/hugetlb: compute/return the number of regions added by region_add()
> mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
Ah, so these two patches are duplicates from your fallocate series,
right? You should drop those from that patchset then, as bugfixes should
be separate.
Could you rename patch 2 to something more meaningful? ie:
mm/hugetlb: account for races between region_chg and region_add
Also, gosh those function names are nasty and unclear -- I would change
them to region_prepare and region_commit, or something like that where
the purpose is more obvious.
Thanks,
Davidlohr
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-05-25 19:58 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-23 3:55 [PATCH v2 0/2] alloc_huge_page/hugetlb_reserve_pages race Mike Kravetz
2015-05-23 3:55 ` [PATCH v2 1/2] mm/hugetlb: compute/return the number of regions added by region_add() Mike Kravetz
2015-05-25 6:19 ` Naoya Horiguchi
2015-05-26 16:30 ` Mike Kravetz
2015-05-25 20:29 ` Davidlohr Bueso
2015-05-26 22:59 ` Mike Kravetz
2015-05-23 3:55 ` [PATCH v2 2/2] mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages Mike Kravetz
2015-05-25 19:58 ` Davidlohr Bueso [this message]
2015-05-26 17:10 ` [PATCH v2 0/2] alloc_huge_page/hugetlb_reserve_pages race Mike Kravetz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1432583887.2185.53.camel@stgolabs.net \
--to=dave@stgolabs.net \
--cc=akpm@linux-foundation.org \
--cc=lcapitulino@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).