From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
David Rientjes <rientjes@google.com>,
Hugh Dickins <hughd@google.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
Hillf Danton <hillf.zj@alibaba-inc.com>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [RFC v5 PATCH 1/9] mm/hugetlb: add region_del() to delete a specific range of entries
Date: Mon, 29 Jun 2015 14:47:59 -0700
Message-ID: <5591BD0F.7050809@oracle.com>
In-Reply-To: <1435019919-29225-2-git-send-email-mike.kravetz@oracle.com>

On 06/22/2015 05:38 PM, Mike Kravetz wrote:
> fallocate hole punch will want to remove a specific range of pages.
> The existing region_truncate() routine deletes all region/reserve
> map entries after a specified offset. region_del() will provide
> this same functionality if the end of region is specified as -1.
> Hence, region_del() can replace region_truncate().
>
> Unlike region_truncate(), region_del() can return an error in the
> rare case where it can not allocate memory for a region descriptor.
> This ONLY happens in the case where an existing region must be split.
> Current callers passing -1 as end of range will never experience
> this error and do not need to deal with error handling. Future
> callers of region_del() (such as fallocate hole punch) will need to
> handle this error.

Unfortunately, this new region_del() functionality required for hole
punch conflicts with existing region_chg()/region_add() assumptions.

region_chg/region_add is something like a two-step commit process for
adding new region entries. region_chg is called first to determine
the changes required for the new entry. If the new entry can be
represented by expanding an existing region, no changes are made to
the map in region_chg. If the new entry is not adjacent to an
existing region, a placeholder is created during region_chg. Later,
when region_add is called, the assumption is that a region (real or
placeholder) can be expanded to represent the new entry. Since
all required entries already exist in the map, region_add cannot
fail.
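A rough userspace sketch of this two-step flow, for illustration only.
It is not the kernel code: the names region_chg_model(),
region_add_model() and find_touching() are made up here, the map is a
plain sorted singly linked list, and locking and the page-count
return values are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* One reserved range [from, to) in a sorted, non-overlapping list. */
struct region {
	long from, to;
	struct region *next;
};

struct resv_map {
	struct region *head;
};

/* First region that overlaps or touches [f, t), or NULL if none does. */
static struct region *find_touching(struct resv_map *m, long f, long t)
{
	for (struct region *rg = m->head; rg; rg = rg->next)
		if (rg->from <= t && f <= rg->to)
			return rg;
	return NULL;
}

/*
 * Step 1: make sure something in the map can later be expanded to
 * cover [f, t).  This is the only step that allocates, so it is the
 * only step that can fail.
 */
static int region_chg_model(struct resv_map *m, long f, long t)
{
	if (find_touching(m, f, t))
		return 0;		/* an existing region can be expanded */

	struct region *nrg = malloc(sizeof(*nrg));
	if (!nrg)
		return -1;		/* stands in for -ENOMEM */
	nrg->from = nrg->to = f;	/* zero-length placeholder */

	/* Link the placeholder in sorted position. */
	struct region **pp = &m->head;
	while (*pp && (*pp)->from < f)
		pp = &(*pp)->next;
	nrg->next = *pp;
	*pp = nrg;
	return 0;
}

/*
 * Step 2: expand the region found in step 1 and absorb any following
 * regions it now touches.  Allocates nothing.  Returns false only if
 * no expandable region exists, which cannot happen as long as the map
 * is unchanged since step 1.
 */
static bool region_add_model(struct resv_map *m, long f, long t)
{
	struct region *rg = find_touching(m, f, t);
	if (!rg)
		return false;

	if (f < rg->from)
		rg->from = f;
	if (t > rg->to)
		rg->to = t;

	while (rg->next && rg->next->from <= rg->to) {
		struct region *dead = rg->next;
		if (dead->to > rg->to)
			rg->to = dead->to;
		rg->next = dead->next;
		free(dead);
	}
	assert(rg->from <= f && t <= rg->to);	/* new entry is covered */
	return true;
}
```

In this model, adding [10, 20) to an empty map creates a placeholder
[10, 10) in step 1 and expands it in step 2, while a later add of
[20, 30) needs no placeholder because [10, 20) is adjacent.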

It is possible for the new region_del to modify the map between the
region_chg and region_add calls. It cannot modify the same map
entry being added by region_chg/region_add, as that is protected by
the fault mutex. However, it can modify an entry adjacent to the
new entry. The entry could be modified so that it is no longer
adjacent to the new entry. As a result, when region_add is called,
it will not find a region which can be expanded to represent the
new entry.

In this situation, region_add only needs to add a new region to
the map. However, doing so would require allocating a new region
descriptor. The allocation could fail, which would result in
region_add failing.
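The window can be illustrated with a very small model. This is only a
sketch under assumptions: can_expand() and trim_tail() are
hypothetical helpers, the map is a plain array, and trim_tail()
stands in loosely for what a hole punch through region_del might do
to a neighbouring entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct region {
	long from, to;		/* covers [from, to) */
};

/*
 * True if some region overlaps or touches [f, t), i.e. the new entry
 * could be represented by expanding an existing region.
 */
static bool can_expand(const struct region *map, size_t n, long f, long t)
{
	for (size_t i = 0; i < n; i++)
		if (map[i].from <= t && f <= map[i].to)
			return true;
	return false;
}

/* Cut the tail of a region back to new_to, never below its start. */
static void trim_tail(struct region *rg, long new_to)
{
	if (new_to < rg->to)
		rg->to = new_to < rg->from ? rg->from : new_to;
	assert(rg->from <= rg->to);
}
```

With a map holding only [0, 10), a fault on page 10 sees
can_expand(map, 1, 10, 11) return true at region_chg time, so no
placeholder is created. If a hole punch then trims the neighbour to
[0, 8) with trim_tail(&map[0], 8), the same check returns false by
the time region_add runs.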

I'm thinking about having a cache of region descriptors pre-allocated
to handle this (rare) situation. The number of descriptors needed
in the cache would correspond to the number of page faults in
progress (between region_chg and region_add). region_chg would make
sure there are sufficient descriptors and allocate one if needed.
Error handling for region_chg ENOMEM already exists. A sufficient
number of entries would be pre-allocated such that in the normal
case no allocation would be necessary.
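One possible shape for such a cache, sketched in userspace C. The
structure and function names (desc_cache, cache_reserve(),
cache_commit()) are assumptions made for illustration, not an
existing interface, and locking is omitted. The point is the
accounting: the number of cached descriptors never drops below the
number of region_chg calls still waiting for their region_add.

```c
#include <assert.h>
#include <stdlib.h>

/* A spare region descriptor; real ones would also carry from/to. */
struct desc {
	struct desc *next;
};

struct desc_cache {
	struct desc *free;	/* pre-allocated, unused descriptors */
	long count;		/* number of entries on the free list */
	long in_progress;	/* region_chg calls awaiting region_add */
};

/*
 * Would be called from the region_chg step.  Guarantees one cached
 * descriptor per add in progress, allocating only if the cache is
 * short.  This is the only place that can fail.
 */
static int cache_reserve(struct desc_cache *c)
{
	if (c->count < c->in_progress + 1) {
		struct desc *d = malloc(sizeof(*d));
		if (!d)
			return -1;	/* stands in for -ENOMEM */
		d->next = c->free;
		c->free = d;
		c->count++;
	}
	c->in_progress++;
	return 0;
}

/*
 * Would be called from the region_add step.  need_new is nonzero when
 * no existing region can be expanded, so a descriptor must be taken
 * from the cache.  Never allocates; returns NULL when need_new is 0.
 */
static struct desc *cache_commit(struct desc_cache *c, int need_new)
{
	struct desc *d = NULL;

	if (need_new) {
		d = c->free;
		assert(d != NULL);	/* count >= in_progress >= 1 here */
		c->free = d->next;
		c->count--;
	}
	c->in_progress--;
	return d;
}
```

In the common case cache_commit() is called with need_new == 0, the
descriptor stays in the cache, and the next cache_reserve() finds it
there and allocates nothing.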

Thoughts?
--
Mike Kravetz