From: Mike Kravetz
Subject: [PATCH V2 0/4] hugetlbfs: add min_size filesystem mount option
Date: Mon, 16 Mar 2015 16:53:25 -0700
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

hugetlbfs allocates huge pages from the global pool as needed. Even if
the global pool contains a sufficient number of pages for the filesystem
size at mount time, those global pages could be grabbed for some other
use. As a result, filesystem huge page allocations may fail due to lack
of pages.

Applications such as a database want to use huge pages for performance
reasons. hugetlbfs filesystem semantics with ownership and modes work
well to manage access to a pool of huge pages. However, the application
would like some reasonable assurance that allocations will not fail due
to a lack of huge pages. At application startup time, the application
would like to configure itself to use a specific number of huge pages.
Before starting, the application can check to make sure that enough
huge pages exist in the system global pools. However, there are no
guarantees that those pages will be available when needed by the
application. What the application wants is exclusive use of a subset of
huge pages.

Add a new hugetlbfs mount option 'min_size=' to indicate that the
specified number of pages will be available for use by the filesystem.
At mount time, this number of huge pages will be reserved for exclusive
use of the filesystem. If there is not a sufficient number of free
pages, the mount will fail. As pages are allocated to and freed from
the filesystem, the number of reserved pages is adjusted so that the
specified minimum is maintained.

V2: Added ability to specify minimum size. (David Rientjes)
V1: Comments from RFC addressed/incorporated

Mike Kravetz (4):
  hugetlbfs: add minimum size tracking fields to subpool structure
  hugetlbfs: add minimum size accounting to subpools
  hugetlbfs: accept subpool min_size mount option and setup accordingly
  hugetlbfs: document min_size mount option

 Documentation/vm/hugetlbpage.txt |  21 ++++--
 fs/hugetlbfs/inode.c             |  75 ++++++++++++++++-----
 include/linux/hugetlb.h          |   5 +-
 mm/hugetlb.c                     | 138 ++++++++++++++++++++++++++++++++-------
 4 files changed, 190 insertions(+), 49 deletions(-)

-- 
2.1.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From: Mike Kravetz
Subject: [PATCH V2 1/4] hugetlbfs: add minimum size tracking fields to subpool structure
Date: Mon, 16 Mar 2015 16:53:26 -0700
Message-Id: <1ef964ec5febb254dbee28604481c6768e018268.1426549010.git.mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

Add a field to the subpool structure to indicate the minimum number of
huge pages to always be used by this subpool. This minimum count
includes allocated pages as well as reserved pages. If the minimum
number of pages for the subpool has not been allocated, pages are
reserved up to this minimum. An additional field (rsv_hpages) is used
to track the number of pages reserved to meet this minimum size. The
hstate pointer in the subpool is convenient to have when reserving and
unreserving the pages.

Signed-off-by: Mike Kravetz
---
 include/linux/hugetlb.h | 2 ++
 mm/hugetlb.c            | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 431b7fc..cfe13fd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -23,6 +23,8 @@ struct hugepage_subpool {
 	spinlock_t lock;
 	long count;
 	long max_hpages, used_hpages;
+	struct hstate *hstate;
+	long min_hpages, rsv_hpages;
 };
 
 struct resv_map {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 85032de..07b7226 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -85,6 +85,9 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
 	spool->count = 1;
 	spool->max_hpages = nr_blocks;
 	spool->used_hpages = 0;
+	spool->hstate = NULL;
+	spool->min_hpages = 0;
+	spool->rsv_hpages = 0;
 
 	return spool;
 }

From: Mike Kravetz
Subject: [PATCH V2 2/4] hugetlbfs: add minimum size accounting to subpools
Date: Mon, 16 Mar 2015 16:53:27 -0700
Message-Id: <464e43df640c54408ed78d1397ad8148784e4ecc.1426549011.git.mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

The routines that perform subpool maximum size accounting,
hugepage_subpool_get/put_pages(), are modified to also perform minimum
size accounting. When a delta value is passed to these routines, they
calculate how global reservations must be adjusted to maintain the
subpool minimum size, and return this global reserve count adjustment.
The adjusted global reserve count is then passed to the global
accounting routine hugetlb_acct_memory().

Signed-off-by: Mike Kravetz
---
 mm/hugetlb.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 94 insertions(+), 21 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07b7226..ab2ea1e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -100,36 +100,85 @@ void hugepage_put_subpool(struct hugepage_subpool *spool)
 	unlock_or_release_subpool(spool);
 }
 
-static int hugepage_subpool_get_pages(struct hugepage_subpool *spool,
+/*
+ * subpool accounting for allocating and reserving pages
+ * return -ENOMEM if there are not enough resources to satisfy the
+ * request. Otherwise, return the number of pages by which the
+ * global pools must be adjusted (upward). The returned value may
+ * only be different than the passed value (delta) in the case where
+ * a subpool minimum size must be maintained.
+ */
+static long hugepage_subpool_get_pages(struct hugepage_subpool *spool,
 				      long delta)
 {
-	int ret = 0;
+	long ret = delta;
 
 	if (!spool)
-		return 0;
+		return ret;
 
 	spin_lock(&spool->lock);
-	if ((spool->used_hpages + delta) <= spool->max_hpages) {
-		spool->used_hpages += delta;
-	} else {
-		ret = -ENOMEM;
+
+	if (spool->max_hpages != -1) {		/* maximum size accounting */
+		if ((spool->used_hpages + delta) <= spool->max_hpages)
+			spool->used_hpages += delta;
+		else {
+			ret = -ENOMEM;
+			goto unlock_ret;
+		}
+	}
+
+	if (spool->min_hpages) {		/* minimum size accounting */
+		if (delta > spool->rsv_hpages) {
+			/* asking for more reserves than those already taken
+			 * on behalf of subpool. return difference */
+			ret = delta - spool->rsv_hpages;
+			spool->rsv_hpages = 0;
+		} else {
+			ret = 0;	/* reserves already accounted for */
+			spool->rsv_hpages -= delta;
+		}
 	}
-	spin_unlock(&spool->lock);
 
+unlock_ret:
+	spin_unlock(&spool->lock);
 	return ret;
 }
 
-static void hugepage_subpool_put_pages(struct hugepage_subpool *spool,
+/*
+ * subpool accounting for freeing and unreserving pages
+ * Return the number of global page reservations that must be dropped.
+ * The return value may only be different than the passed value (delta)
+ * in the case where a subpool minimum size must be maintained.
+ */
+static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
 				       long delta)
 {
+	long ret = delta;
+
 	if (!spool)
-		return;
+		return delta;
 
 	spin_lock(&spool->lock);
-	spool->used_hpages -= delta;
+
+	if (spool->max_hpages != -1)		/* maximum size accounting */
+		spool->used_hpages -= delta;
+
+	if (spool->min_hpages) {		/* minimum size accounting */
+		if (spool->rsv_hpages + delta <= spool->min_hpages)
+			ret = 0;
+		else
+			ret = spool->rsv_hpages + delta - spool->min_hpages;
+
+		spool->rsv_hpages += delta;
+		if (spool->rsv_hpages > spool->min_hpages)
+			spool->rsv_hpages = spool->min_hpages;
+	}
+
 	/* If hugetlbfs_put_super couldn't free spool due to
 	* an outstanding quota reference, free it now. */
 	unlock_or_release_subpool(spool);
+
+	return ret;
 }
 
 static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
@@ -877,6 +926,14 @@ void free_huge_page(struct page *page)
 	restore_reserve = PagePrivate(page);
 	ClearPagePrivate(page);
 
+	/*
+	 * A return code of zero implies that the subpool will be under
+	 * its minimum size if the reservation is not restored after
+	 * page is free. Therefore, force restore_reserve operation.
+	 */
+	if (hugepage_subpool_put_pages(spool, 1) == 0)
+		restore_reserve = true;
+
 	spin_lock(&hugetlb_lock);
 	hugetlb_cgroup_uncharge_page(hstate_index(h),
 				     pages_per_huge_page(h), page);
@@ -894,7 +951,6 @@ void free_huge_page(struct page *page)
 		enqueue_huge_page(h, page);
 	}
 	spin_unlock(&hugetlb_lock);
-	hugepage_subpool_put_pages(spool, 1);
 }
 
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
@@ -1387,7 +1443,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 	if (chg < 0)
 		return ERR_PTR(-ENOMEM);
 	if (chg || avoid_reserve)
-		if (hugepage_subpool_get_pages(spool, 1))
+		if (hugepage_subpool_get_pages(spool, 1) < 0)
 			return ERR_PTR(-ENOSPC);
 
 	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
@@ -2455,6 +2511,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 	struct resv_map *resv = vma_resv_map(vma);
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	unsigned long reserve, start, end;
+	long gbl_reserve;
 
 	if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		return;
@@ -2467,8 +2524,12 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 	kref_put(&resv->refs, resv_map_release);
 
 	if (reserve) {
-		hugetlb_acct_memory(h, -reserve);
-		hugepage_subpool_put_pages(spool, reserve);
+		/*
+		 * decrement reserve counts. The global reserve count
+		 * may be adjusted if the subpool has a minimum size.
+		 */
+		gbl_reserve = hugepage_subpool_put_pages(spool, reserve);
+		hugetlb_acct_memory(h, -gbl_reserve);
 	}
 }
 
@@ -3399,6 +3460,7 @@ int hugetlb_reserve_pages(struct inode *inode,
 	struct hstate *h = hstate_inode(inode);
 	struct hugepage_subpool *spool = subpool_inode(inode);
 	struct resv_map *resv_map;
+	long gbl_reserve;
 
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
@@ -3435,8 +3497,13 @@ int hugetlb_reserve_pages(struct inode *inode,
 		goto out_err;
 	}
 
-	/* There must be enough pages in the subpool for the mapping */
-	if (hugepage_subpool_get_pages(spool, chg)) {
+	/*
+	 * There must be enough pages in the subpool for the mapping. If
+	 * the subpool has a minimum size, there may be some global
+	 * reservations already in place (gbl_reserve).
+	 */
+	gbl_reserve = hugepage_subpool_get_pages(spool, chg);
+	if (gbl_reserve < 0) {
 		ret = -ENOSPC;
 		goto out_err;
 	}
@@ -3445,9 +3512,10 @@ int hugetlb_reserve_pages(struct inode *inode,
 	 * Check enough hugepages are available for the reservation.
 	 * Hand the pages back to the subpool if there are not
 	 */
-	ret = hugetlb_acct_memory(h, chg);
+	ret = hugetlb_acct_memory(h, gbl_reserve);
 	if (ret < 0) {
-		hugepage_subpool_put_pages(spool, chg);
+		/* put back original number of pages, chg */
+		(void)hugepage_subpool_put_pages(spool, chg);
 		goto out_err;
 	}
 
@@ -3477,6 +3545,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
 	struct resv_map *resv_map = inode_resv_map(inode);
 	long chg = 0;
 	struct hugepage_subpool *spool = subpool_inode(inode);
+	long gbl_reserve;
 
 	if (resv_map)
 		chg = region_truncate(resv_map, offset);
@@ -3484,8 +3553,12 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
 	inode->i_blocks -= (blocks_per_huge_page(h) * freed);
 	spin_unlock(&inode->i_lock);
 
-	hugepage_subpool_put_pages(spool, (chg - freed));
-	hugetlb_acct_memory(h, -(chg - freed));
+	/*
+	 * If the subpool has a minimum size, the number of global
+	 * reservations to be released may be adjusted.
+	 */
+	gbl_reserve = hugepage_subpool_put_pages(spool, (chg - freed));
+	hugetlb_acct_memory(h, -gbl_reserve);
 }
 
 #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
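
The min/max arithmetic in hugepage_subpool_get_pages() and hugepage_subpool_put_pages() above can be checked outside the kernel. What follows is a minimal user-space sketch, not kernel code: it assumes a single thread (the spinlock is omitted), returns -1 where the patch returns -ENOMEM, and uses hypothetical names (struct subpool_model, model_get_pages(), model_put_pages()). The return value plays the role of the adjustment handed to hugetlb_acct_memory().

```c
#include <assert.h>

/* Hypothetical user-space model of struct hugepage_subpool (no lock). */
struct subpool_model {
	long max_hpages;	/* -1: no maximum (no size= option) */
	long used_hpages;	/* pages charged against max_hpages */
	long min_hpages;	/* 0: no minimum (no min_size= option) */
	long rsv_hpages;	/* global reserves held to honour min_hpages */
};

/*
 * Charge delta pages to the subpool. Returns -1 (standing in for
 * -ENOMEM) if the maximum would be exceeded, otherwise the number of
 * pages by which global reservations must grow.
 */
long model_get_pages(struct subpool_model *sp, long delta)
{
	long ret = delta;

	assert(delta >= 0);
	if (sp->max_hpages != -1) {
		if (sp->used_hpages + delta > sp->max_hpages)
			return -1;	/* no state changed, as in the patch */
		sp->used_hpages += delta;
	}
	if (sp->min_hpages) {
		if (delta > sp->rsv_hpages) {
			/* only the part not covered by the reserve is new */
			ret = delta - sp->rsv_hpages;
			sp->rsv_hpages = 0;
		} else {
			/* fully covered by pages reserved at mount time */
			ret = 0;
			sp->rsv_hpages -= delta;
		}
	}
	return ret;
}

/*
 * Return delta pages to the subpool. Returns the number of global
 * reservations that may be dropped; pages needed to refill the
 * reserve up to min_hpages are kept.
 */
long model_put_pages(struct subpool_model *sp, long delta)
{
	long ret = delta;

	assert(delta >= 0);
	if (sp->max_hpages != -1)
		sp->used_hpages -= delta;
	if (sp->min_hpages) {
		if (sp->rsv_hpages + delta <= sp->min_hpages)
			ret = 0;
		else
			ret = sp->rsv_hpages + delta - sp->min_hpages;
		sp->rsv_hpages += delta;
		if (sp->rsv_hpages > sp->min_hpages)
			sp->rsv_hpages = sp->min_hpages;
	}
	return ret;
}
```

For example, starting from max_hpages = 10, used_hpages = 0, min_hpages = 4 and rsv_hpages = 4 (the 4 pages reserved at mount), model_get_pages(&sp, 3) returns 0 because the reserve covers the request, a second model_get_pages(&sp, 3) returns 2, and model_put_pages(&sp, 5) then returns 1. That leaves 1 page in use and 4 reserved, so in-use plus reserved pages stay at or above min_hpages.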

From: Mike Kravetz
Subject: [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly
Date: Mon, 16 Mar 2015 16:53:28 -0700
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

Make 'min_size=' be an option when mounting a hugetlbfs. This option
takes the same value as the 'size' option. min_size can be specified
without specifying size. If both are specified, min_size must be less
than or equal to size, else the mount will fail. If min_size is
specified, then at mount time an attempt is made to reserve min_size
pages. If the reservation fails, the mount fails. At umount time, the
reserved pages are released.

Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c    | 75 ++++++++++++++++++++++++++++++++++++++-----------
 include/linux/hugetlb.h |  3 +-
 mm/hugetlb.c            | 26 +++++++++++++----
 3 files changed, 80 insertions(+), 24 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 5eba47f..7a20a1b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -50,6 +50,7 @@ struct hugetlbfs_config {
 	long nr_blocks;
 	long nr_inodes;
 	struct hstate *hstate;
+	long min_size;
 };
 
 struct hugetlbfs_inode_info {
@@ -73,7 +74,7 @@ int sysctl_hugetlb_shm_group;
 enum {
 	Opt_size, Opt_nr_inodes,
 	Opt_mode, Opt_uid, Opt_gid,
-	Opt_pagesize,
+	Opt_pagesize, Opt_min_size,
 	Opt_err,
 };
 
@@ -84,6 +85,7 @@ static const match_table_t tokens = {
 	{Opt_uid, "uid=%u"},
 	{Opt_gid, "gid=%u"},
 	{Opt_pagesize, "pagesize=%s"},
+	{Opt_min_size, "min_size=%s"},
 	{Opt_err, NULL},
 };
 
@@ -761,14 +763,32 @@ static const struct super_operations hugetlbfs_ops = {
 	.show_options = generic_show_options,
 };
 
+enum { NO_SIZE, SIZE_STD, SIZE_PERCENT };
+
+static bool
+hugetlbfs_options_setsize(struct hstate *h, long long *size, int setsize)
+{
+	if (setsize == NO_SIZE)
+		return false;
+
+	if (setsize == SIZE_PERCENT) {
+		*size <<= huge_page_shift(h);
+		*size *= h->max_huge_pages;
+		do_div(*size, 100);
+	}
+
+	*size >>= huge_page_shift(h);
+	return true;
+}
+
 static int
 hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
 {
 	char *p, *rest;
 	substring_t args[MAX_OPT_ARGS];
 	int option;
-	unsigned long long size = 0;
-	enum { NO_SIZE, SIZE_STD, SIZE_PERCENT } setsize = NO_SIZE;
+	unsigned long long max_size = 0, min_size = 0;
+	int max_setsize = NO_SIZE, min_setsize = NO_SIZE;
 
 	if (!options)
 		return 0;
@@ -806,10 +826,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
 			/* memparse() will accept a K/M/G without a digit */
 			if (!isdigit(*args[0].from))
 				goto bad_val;
-			size = memparse(args[0].from, &rest);
-			setsize = SIZE_STD;
+			max_size = memparse(args[0].from, &rest);
+			max_setsize = SIZE_STD;
 			if (*rest == '%')
-				setsize = SIZE_PERCENT;
+				max_setsize = SIZE_PERCENT;
 			break;
 		}
 
@@ -832,6 +852,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
 			break;
 		}
 
+		case Opt_min_size: {
+			/* memparse() will accept a K/M/G without a digit */
+			if (!isdigit(*args[0].from))
+				goto bad_val;
+			min_size = memparse(args[0].from, &rest);
+			min_setsize = SIZE_STD;
+			if (*rest == '%')
+				min_setsize = SIZE_PERCENT;
+			break;
+		}
+
 		default:
 			pr_err("Bad mount option: \"%s\"\n", p);
 			return -EINVAL;
@@ -839,15 +870,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
 		}
 	}
 
-	/* Do size after hstate is set up */
-	if (setsize > NO_SIZE) {
-		struct hstate *h = pconfig->hstate;
-		if (setsize == SIZE_PERCENT) {
-			size <<= huge_page_shift(h);
-			size *= h->max_huge_pages;
-			do_div(size, 100);
-		}
-		pconfig->nr_blocks = (size >> huge_page_shift(h));
+	/* Calculate number of huge pages based on hstate */
+	if (hugetlbfs_options_setsize(pconfig->hstate, &max_size, max_setsize))
+		pconfig->nr_blocks = max_size;
+	if (hugetlbfs_options_setsize(pconfig->hstate, &min_size, min_setsize))
+		pconfig->min_size = min_size;
+
+	/* If max_size specified, then min_size must be smaller */
+	if (max_setsize > NO_SIZE && min_setsize > NO_SIZE &&
+	    pconfig->min_size > pconfig->nr_blocks) {
+		pr_err("minimum size can not be greater than maximum size\n");
+		return -EINVAL;
 	}
 
 	return 0;
@@ -872,6 +905,7 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
 	config.gid = current_fsgid();
 	config.mode = 0755;
 	config.hstate = &default_hstate;
+	config.min_size = 0; /* No default minimum size */
 	ret = hugetlbfs_parse_options(data, &config);
 	if (ret)
 		return ret;
@@ -885,8 +919,15 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
 	sbinfo->max_inodes = config.nr_inodes;
 	sbinfo->free_inodes = config.nr_inodes;
 	sbinfo->spool = NULL;
-	if (config.nr_blocks != -1) {
-		sbinfo->spool = hugepage_new_subpool(config.nr_blocks);
+	/*
+	 * Allocate and initialize subpool if maximum or minimum size is
+	 * specified. Any needed reservations (for minimum size) are
+	 * taken when the subpool is created.
+	 */
+	if (config.nr_blocks != -1 || config.min_size != 0) {
+		sbinfo->spool = hugepage_new_subpool(config.hstate,
+							config.nr_blocks,
+							config.min_size);
 		if (!sbinfo->spool)
 			goto out_free;
 	}
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cfe13fd..6883fca 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -40,7 +40,8 @@ extern int hugetlb_max_hstate __read_mostly;
 #define for_each_hstate(h) \
 	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
-struct hugepage_subpool *hugepage_new_subpool(long nr_blocks);
+struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long nr_blocks,
+						long min_size);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
 int PageHuge(struct page *page);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ab2ea1e..7d4be33 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -61,6 +61,9 @@ DEFINE_SPINLOCK(hugetlb_lock);
 static int num_fault_mutexes;
 static struct mutex *htlb_fault_mutex_table ____cacheline_aligned_in_smp;
 
+/* Forward declaration */
+static int hugetlb_acct_memory(struct hstate *h, long delta);
+
 static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
 {
 	bool free = (spool->count == 0) && (spool->used_hpages == 0);
@@ -68,12 +71,18 @@ static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
 	spin_unlock(&spool->lock);
 
 	/* If no pages are used, and no other handles to the subpool
-	 * remain, free the subpool the subpool remain */
-	if (free)
+	 * remain, give up any reservations based on minimum size and
+	 * free the subpool */
+	if (free) {
+		if (spool->min_hpages)
+			hugetlb_acct_memory(spool->hstate,
+						-spool->min_hpages);
 		kfree(spool);
+	}
 }
 
-struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
+struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long nr_blocks,
+						long min_size)
 {
 	struct hugepage_subpool *spool;
 
@@ -85,9 +94,14 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
 	spool->count = 1;
 	spool->max_hpages = nr_blocks;
 	spool->used_hpages = 0;
-	spool->hstate = NULL;
-	spool->min_hpages = 0;
-	spool->rsv_hpages = 0;
+	spool->hstate = h;
+	spool->min_hpages = min_size;
+
+	if (min_size && hugetlb_acct_memory(h, min_size)) {
+		kfree(spool);
+		return NULL;
+	}
+	spool->rsv_hpages = min_size;
 
 	return spool;
 }

From: Mike Kravetz
Subject: [PATCH V2 4/4] hugetlbfs: document min_size mount option
Date: Mon, 16 Mar 2015 16:53:29 -0700
Message-Id: <3c82f2203e5453ddf3b29431863034afc7699303.1426549011.git.mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim, Mike Kravetz

Update documentation for the hugetlbfs min_size mount option.

Signed-off-by: Mike Kravetz
---
 Documentation/vm/hugetlbpage.txt | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
index f2d3a10..83c0305 100644
--- a/Documentation/vm/hugetlbpage.txt
+++ b/Documentation/vm/hugetlbpage.txt
@@ -267,8 +267,8 @@ call, then it is required that system administrator mount a file
 system of type hugetlbfs:
 
   mount -t hugetlbfs \
-	-o uid=,gid=,mode=,size=,nr_inodes= \
-	none /mnt/huge
+	-o uid=,gid=,mode=,size=,min_size=, \
+	nr_inodes= none /mnt/huge
 
 This command mounts a (pseudo) filesystem of type hugetlbfs on the directory
 /mnt/huge. Any files created on /mnt/huge uses huge pages. The uid and gid
@@ -277,11 +277,18 @@ the uid and gid of the current process are taken. The mode option sets the
 mode of root of file system to value & 01777. This value is given in octal.
 By default the value 0755 is picked. The size option sets the maximum value of
 memory (huge pages) allowed for that filesystem (/mnt/huge). The size is
-rounded down to HPAGE_SIZE. The option nr_inodes sets the maximum number of
-inodes that /mnt/huge can use. If the size or nr_inodes option is not
-provided on command line then no limits are set. For size and nr_inodes
-options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For
-example, size=2K has the same meaning as size=2048.
+rounded down to HPAGE_SIZE. The min_size option sets the minimum value of
+memory (huge pages) allowed for the filesystem. Like the size option,
+min_size is rounded down to HPAGE_SIZE. At mount time, the number of huge
+pages specified by min_size are reserved for use by the filesystem. If
+there are not enough free huge pages available, the mount will fail. As
+huge pages are allocated to the filesystem and freed, the reserve count
+is adjusted so that the sum of allocated and reserved huge pages is always
+at least min_size. The option nr_inodes sets the maximum number of
+inodes that /mnt/huge can use. If the size, min_size or nr_inodes option
+is not provided on command line then no limits are set. For size, min_size
+and nr_inodes options, you can use [G|g]/[M|m]/[K|k] to represent
+giga/mega/kilo. For example, size=2K has the same meaning as size=2048.
 
 While read system calls are supported on files that reside on hugetlb file
 systems, write system calls are not.
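
A program can request the same mount through mount(2) instead of the mount command. The sketch below is a minimal illustration under stated assumptions: a Linux system with glibc's <sys/mount.h>, an existing mount point, and CAP_SYS_ADMIN. The helper names build_hugetlbfs_opts() and mount_hugetlbfs() are hypothetical. Per the documentation above, the mount is expected to fail when min_size huge pages cannot be reserved, so callers should check the result rather than assume the reservation exists.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: format "size=<size>,min_size=<min_size>" into buf.
 * Values are passed through verbatim ("1G", "512M", "50%"); the kernel
 * parses them. Returns 0 on success, or -1 if a value contains a comma
 * (it would be read as a separate option) or buf is too small.
 */
int build_hugetlbfs_opts(char *buf, size_t len,
			 const char *size, const char *min_size)
{
	int n;

	assert(buf != NULL && len > 0);
	if (strchr(size, ',') || strchr(min_size, ','))
		return -1;
	n = snprintf(buf, len, "size=%s,min_size=%s", size, min_size);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: mount hugetlbfs on target with the given options.
 * Requires CAP_SYS_ADMIN. Returns 0 on success or -errno on failure.
 */
int mount_hugetlbfs(const char *target, const char *opts)
{
	if (mount("none", target, "hugetlbfs", 0, opts) != 0)
		return -errno;
	return 0;
}
```

For example, build_hugetlbfs_opts(opts, sizeof(opts), "1G", "512M") produces "size=1G,min_size=512M"; passing that to mount_hugetlbfs("/mnt/huge", opts) on a system whose default huge page size is 2 MB would ask for 256 huge pages to be reserved at mount time.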

Date: Wed, 18 Mar 2015 14:25:58 -0700
From: Andrew Morton
Subject: Re: [PATCH V2 1/4] hugetlbfs: add minimum size tracking fields to subpool structure
Message-Id: <20150318142558.d2958fbb7f8b083c00c40c0d@linux-foundation.org>
In-Reply-To: <1ef964ec5febb254dbee28604481c6768e018268.1426549010.git.mike.kravetz@oracle.com>
To: Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim

On Mon, 16 Mar 2015 16:53:26 -0700 Mike Kravetz wrote:

> Add a field to the subpool structure to indicate the minimum
> number of huge pages to always be used by this subpool. This
> minimum count includes allocated pages as well as reserved pages.
> If the minimum number of pages for the subpool has not been
> allocated, pages are reserved up to this minimum. An additional
> field (rsv_hpages) is used to track the number of pages reserved
> to meet this minimum size. The hstate pointer in the subpool
> is convenient to have when reserving and unreserving the pages.
>
> ...
>
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -23,6 +23,8 @@ struct hugepage_subpool {
>  	spinlock_t lock;
>  	long count;
>  	long max_hpages, used_hpages;
> +	struct hstate *hstate;
> +	long min_hpages, rsv_hpages;
>  };

Let's leave room for the descriptive comments which aren't there.

--- a/include/linux/hugetlb.h~hugetlbfs-add-minimum-size-tracking-fields-to-subpool-structure-fix
+++ a/include/linux/hugetlb.h
@@ -22,9 +22,11 @@ struct mmu_gather;
 
 struct hugepage_subpool {
 	spinlock_t lock;
 	long count;
-	long max_hpages, used_hpages;
+	long max_hpages;
+	long used_hpages;
 	struct hstate *hstate;
-	long min_hpages, rsv_hpages;
+	long min_hpages;
+	long rsv_hpages;
 };
 

> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -85,6 +85,9 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks)
>  	spool->count = 1;
>  	spool->max_hpages = nr_blocks;
>  	spool->used_hpages = 0;
> +	spool->hstate = NULL;
> +	spool->min_hpages = 0;
> +	spool->rsv_hpages = 0;

Four strikes and you're out!

--- a/mm/hugetlb.c~hugetlbfs-add-minimum-size-tracking-fields-to-subpool-structure-fix
+++ a/mm/hugetlb.c
@@ -77,17 +77,13 @@ struct hugepage_subpool *hugepage_new_su
 {
 	struct hugepage_subpool *spool;
 
-	spool = kmalloc(sizeof(*spool), GFP_KERNEL);
+	spool = kzalloc(sizeof(*spool), GFP_KERNEL);
 	if (!spool)
 		return NULL;
 
 	spin_lock_init(&spool->lock);
 	spool->count = 1;
 	spool->max_hpages = nr_blocks;
-	spool->used_hpages = 0;
-	spool->hstate = NULL;
-	spool->min_hpages = 0;
-	spool->rsv_hpages = 0;
 
 	return spool;
 }
_

Date: Wed, 18 Mar 2015 14:40:54 -0700
From: Andrew Morton
Subject: Re: [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly
Message-Id: <20150318144054.c099e8a5e462303eea707252@linux-foundation.org>
To: Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim

On Mon, 16 Mar 2015 16:53:28 -0700 Mike Kravetz wrote:

> Make 'min_size=' be an option when mounting a hugetlbfs. This option
> takes the same value as the 'size' option. min_size can be specified
> without specifying size. If both are specified, min_size must be less
> than or equal to size, else the mount will fail. If min_size is
> specified, then at mount time an attempt is made to reserve min_size
> pages. If the reservation fails, the mount fails. At umount time,
> the reserved pages are released.
>
> ...
>
> @@ -761,14 +763,32 @@ static const struct super_operations hugetlbfs_ops = {
>  	.show_options = generic_show_options,
>  };
>  
> +enum { NO_SIZE, SIZE_STD, SIZE_PERCENT };
> +
> +static bool
> +hugetlbfs_options_setsize(struct hstate *h, long long *size, int setsize)
> +{
> +	if (setsize == NO_SIZE)
> +		return false;
> +
> +	if (setsize == SIZE_PERCENT) {
> +		*size <<= huge_page_shift(h);
> +		*size *= h->max_huge_pages;
> +		do_div(*size, 100);

I suppose do_div() takes a long long. u64 would be more conventional.
I don't *think* all this code needed to use signed types.
> + } > + > + *size >>= huge_page_shift(h); > + return true; > +} > + > static int > hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) > { > char *p, *rest; > substring_t args[MAX_OPT_ARGS]; > int option; > - unsigned long long size = 0; > - enum { NO_SIZE, SIZE_STD, SIZE_PERCENT } setsize = NO_SIZE; > + unsigned long long max_size = 0, min_size = 0; > + int max_setsize = NO_SIZE, min_setsize = NO_SIZE; > > if (!options) > return 0; > @@ -806,10 +826,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) > /* memparse() will accept a K/M/G without a digit */ > if (!isdigit(*args[0].from)) > goto bad_val; > - size = memparse(args[0].from, &rest); > - setsize = SIZE_STD; > + max_size = memparse(args[0].from, &rest); > + max_setsize = SIZE_STD; > if (*rest == '%') > - setsize = SIZE_PERCENT; > + max_setsize = SIZE_PERCENT; > break; > } > > @@ -832,6 +852,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) > break; > } > > + case Opt_min_size: { > + /* memparse() will accept a K/M/G without a digit */ > + if (!isdigit(*args[0].from)) > + goto bad_val; > + min_size = memparse(args[0].from, &rest); > + min_setsize = SIZE_STD; > + if (*rest == '%') > + min_setsize = SIZE_PERCENT; > + break; > + } > + > default: > pr_err("Bad mount option: \"%s\"\n", p); > return -EINVAL; > @@ -839,15 +870,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) > } > } > > - /* Do size after hstate is set up */ > - if (setsize > NO_SIZE) { > - struct hstate *h = pconfig->hstate; > - if (setsize == SIZE_PERCENT) { > - size <<= huge_page_shift(h); > - size *= h->max_huge_pages; > - do_div(size, 100); > - } > - pconfig->nr_blocks = (size >> huge_page_shift(h)); > + /* Calculate number of huge pages based on hstate */ > + if (hugetlbfs_options_setsize(pconfig->hstate, &max_size, max_setsize)) > + pconfig->nr_blocks = max_size; So hugetlbfs_options_setsize takes an arg which is in units of bytes,
modifies it in-place to be in units of pages and then copies it into something which is in units of nr_blocks. > + if (hugetlbfs_options_setsize(pconfig->hstate, &min_size, min_setsize)) > + pconfig->min_size = min_size; > + > + /* If max_size specified, then min_size must be smaller */ > + if (max_setsize > NO_SIZE && min_setsize > NO_SIZE && > + pconfig->min_size > pconfig->nr_blocks) { > + pr_err("minimum size can not be greater than maximum size\n"); > + return -EINVAL; > } > > return 0; > @@ -872,6 +905,7 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent) > config.gid = current_fsgid(); > config.mode = 0755; > config.hstate = &default_hstate; > + config.min_size = 0; /* No default minimum size */ > ret = hugetlbfs_parse_options(data, &config); > if (ret) > return ret; > @@ -885,8 +919,15 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent) > sbinfo->max_inodes = config.nr_inodes; > sbinfo->free_inodes = config.nr_inodes; > sbinfo->spool = NULL; > - if (config.nr_blocks != -1) { > - sbinfo->spool = hugepage_new_subpool(config.nr_blocks); > + /* > + * Allocate and initialize subpool if maximum or minimum size is > + * specified. Any needed reservations (for minimim size) are taken > + * taken when the subpool is created. > + */ > + if (config.nr_blocks != -1 || config.min_size != 0) { > + sbinfo->spool = hugepage_new_subpool(config.hstate, > + config.nr_blocks, > + config.min_size); And hugepage_new_subpool() takes something in units of nr_blocks and copies it into something which has units of nr-hugepages. And it takes an arg called "size" which is no longer number-of-bytes but is actually number-of-hpages. It's all rather confusing and unclear. A good philosophy would be never to use a variable called "size", because the reader doesn't know what units that size is measured in. Instead, make sure that the name reflects the variable's units. max_bytes, min_hpages, nr_blocks, etc.
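One way to follow the naming advice above is to put the units into each helper's name and parameters. The sketch below is hypothetical (size_opt_to_hpages() and check_min_max() are illustrative names, not functions from this patch set): it converts a parsed option value, given either in bytes or as a percentage of the pool, into huge pages, uses -1 to mean "not specified", and applies the same min-not-greater-than-max rule as the quoted hugetlbfs_parse_options().

```c
#include <assert.h>
#include <stdint.h>

/* How the option value was given: mirrors NO_SIZE/SIZE_STD/SIZE_PERCENT. */
enum size_type { NO_SIZE, SIZE_STD, SIZE_PERCENT };

/*
 * Hypothetical helper: convert a parsed size option into huge pages.
 * size_opt is in bytes (SIZE_STD) or in percent of the pool
 * (SIZE_PERCENT).  The result is always a count of huge pages, or -1
 * when the option was not given.
 */
static long size_opt_to_hpages(uint64_t size_opt, enum size_type type,
			       unsigned int page_shift,
			       unsigned long pool_hpages)
{
	assert(page_shift < 64);

	if (type == NO_SIZE)
		return -1;

	if (type == SIZE_PERCENT)	/* percentage of the pool, in bytes */
		size_opt = (size_opt << page_shift) * pool_hpages / 100;

	/* bytes -> huge pages, rounded down */
	return (long)(size_opt >> page_shift);
}

/*
 * Hypothetical check: a minimum may not exceed a maximum when both are
 * given.  Returns 0 if the pair is acceptable, -1 otherwise (the quoted
 * kernel code returns -EINVAL in that case).
 */
static int check_min_max(long max_hpages, long min_hpages)
{
	if (max_hpages != -1 && min_hpages != -1 && min_hpages > max_hpages)
		return -1;
	return 0;
}
```

With 2 MB pages (page_shift = 21), 1 GB maps to 512 huge pages and 3 MB rounds down to a single page, consistent with the documented "rounded down to HPAGE_SIZE" behavior.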
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f172.google.com (mail-pd0-f172.google.com [209.85.192.172]) by kanga.kvack.org (Postfix) with ESMTP id 3DE4E6B006C for ; Wed, 18 Mar 2015 17:41:10 -0400 (EDT) Received: by pdbop1 with SMTP id op1so54803494pdb.2 for ; Wed, 18 Mar 2015 14:41:10 -0700 (PDT) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org. [140.211.169.12]) by mx.google.com with ESMTPS id cn14si38468952pac.39.2015.03.18.14.41.09 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 18 Mar 2015 14:41:09 -0700 (PDT) Date: Wed, 18 Mar 2015 14:41:08 -0700 From: Andrew Morton Subject: Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option Message-Id: <20150318144108.e235862e0be30ff626e01820@linux-foundation.org> In-Reply-To: <3c82f2203e5453ddf3b29431863034afc7699303.1426549011.git.mike.kravetz@oracle.com> References: <3c82f2203e5453ddf3b29431863034afc7699303.1426549011.git.mike.kravetz@oracle.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mike Kravetz Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim On Mon, 16 Mar 2015 16:53:29 -0700 Mike Kravetz wrote: > Update documentation for the hugetlbfs min_size mount option. 
> > Signed-off-by: Mike Kravetz > --- > Documentation/vm/hugetlbpage.txt | 21 ++++++++++++++------- > 1 file changed, 14 insertions(+), 7 deletions(-) > > diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt > index f2d3a10..83c0305 100644 > --- a/Documentation/vm/hugetlbpage.txt > +++ b/Documentation/vm/hugetlbpage.txt > @@ -267,8 +267,8 @@ call, then it is required that system administrator mount a file system of > type hugetlbfs: > > mount -t hugetlbfs \ > - -o uid=,gid=,mode=,size=,nr_inodes= \ > - none /mnt/huge > + -o uid=,gid=,mode=,size=,min_size=, \ > + nr_inodes= none /mnt/huge > > This command mounts a (pseudo) filesystem of type hugetlbfs on the directory > /mnt/huge. Any files created on /mnt/huge uses huge pages. The uid and gid > @@ -277,11 +277,18 @@ the uid and gid of the current process are taken. The mode option sets the > mode of root of file system to value & 01777. This value is given in octal. > By default the value 0755 is picked. The size option sets the maximum value of > memory (huge pages) allowed for that filesystem (/mnt/huge). The size is > -rounded down to HPAGE_SIZE. The option nr_inodes sets the maximum number of > -inodes that /mnt/huge can use. If the size or nr_inodes option is not > -provided on command line then no limits are set. For size and nr_inodes > -options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For > -example, size=2K has the same meaning as size=2048. > +rounded down to HPAGE_SIZE. The min_size option sets the minimum value of > +memory (huge pages) allowed for the filesystem. Like the size option, > +min_size is rounded down to HPAGE_SIZE. At mount time, the number of huge > +pages specified by min_size are reserved for use by the filesystem. If > +there are not enough free huge pages available, the mount will fail. 
As > +huge pages are allocated to the filesystem and freed, the reserve count > +is adjusted so that the sum of allocated and reserved huge pages is always > +at least min_size. The option nr_inodes sets the maximum number of > +inodes that /mnt/huge can use. If the size, min_size or nr_inodes option > +is not provided on command line then no limits are set. For size, min_size > +and nr_inodes options, you can use [G|g]/[M|m]/[K|k] to represent > +giga/mega/kilo. For example, size=2K has the same meaning as size=2048. Nowhere here is the reader told the units of "size". We should at least describe that, and maybe even rename the thing to min_bytes. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f51.google.com (mail-pa0-f51.google.com [209.85.220.51]) by kanga.kvack.org (Postfix) with ESMTP id 161426B0038 for ; Wed, 18 Mar 2015 17:43:59 -0400 (EDT) Received: by pacwe9 with SMTP id we9so54385547pac.1 for ; Wed, 18 Mar 2015 14:43:58 -0700 (PDT) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org. 
[140.211.169.12]) by mx.google.com with ESMTPS id mj1si38513102pdb.40.2015.03.18.14.43.58 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 18 Mar 2015 14:43:58 -0700 (PDT) Date: Wed, 18 Mar 2015 14:43:57 -0700 From: Andrew Morton Subject: Re: [PATCH V2 2/4] hugetlbfs: add minimum size accounting to subpools Message-Id: <20150318144357.0e7e25cdca5066c39032bae6@linux-foundation.org> In-Reply-To: <464e43df640c54408ed78d1397ad8148784e4ecc.1426549011.git.mike.kravetz@oracle.com> References: <464e43df640c54408ed78d1397ad8148784e4ecc.1426549011.git.mike.kravetz@oracle.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mike Kravetz Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim On Mon, 16 Mar 2015 16:53:27 -0700 Mike Kravetz wrote: > The same routines that perform subpool maximum size accounting > hugepage_subpool_get/put_pages() are modified to also perform > minimum size accounting. When a delta value is passed to these > routines, calculate how global reservations must be adjusted > to maintain the subpool minimum size. The routines now return > this global reserve count adjustment. This global adjusted > reserve count is then passed to the global accounting routine > hugetlb_acct_memory(). > The comment layout is a bit chaotic. Also, sentences start with capital letters and end with little round things! It's a bit anal but heck, the kernel isn't written in linglish. --- a/mm/hugetlb.c~hugetlbfs-add-minimum-size-accounting-to-subpools-fix +++ a/mm/hugetlb.c @@ -125,8 +125,10 @@ static long hugepage_subpool_get_pages(s if (spool->min_hpages) { /* minimum size accounting */ if (delta > spool->rsv_hpages) { - /* asking for more reserves than those already taken - * on behalf of subpool. return difference */ + /* + * Asking for more reserves than those already taken on + * behalf of subpool. 
Return difference. + */ ret = delta - spool->rsv_hpages; spool->rsv_hpages = 0; } else { @@ -141,7 +143,7 @@ unlock_ret: } /* - * subpool accounting for freeing and unreserving pages + * Subpool accounting for freeing and unreserving pages. * Return the number of global page reservations that must be dropped. * The return value may only be different than the passed value (delta) * in the case where a subpool minimum size must be maintained. @@ -170,8 +172,10 @@ static long hugepage_subpool_put_pages(s spool->rsv_hpages = spool->min_hpages; } - /* If hugetlbfs_put_super couldn't free spool due to - * an outstanding quota reference, free it now. */ + /* + * If hugetlbfs_put_super couldn't free spool due to an outstanding + * quota reference, free it now. + */ unlock_or_release_subpool(spool); return ret; @@ -923,9 +927,9 @@ void free_huge_page(struct page *page) ClearPagePrivate(page); /* - * A return code of zero implies that the subpool will be under - * it's minimum size if the reservation is not restored after - * page is free. Therefore, force restore_reserve operation. + * A return code of zero implies that the subpool will be under its + * minimum size if the reservation is not restored after page is free. + * Therefore, force restore_reserve operation. */ if (hugepage_subpool_put_pages(spool, 1) == 0) restore_reserve = true; @@ -2523,8 +2527,8 @@ static void hugetlb_vm_op_close(struct v if (reserve) { /* - * decrement reserve counts. The global reserve count - * may be adjusted if the subpool has a minimum size. + * Decrement reserve counts. The global reserve count may be + * adjusted if the subpool has a minimum size. */ gbl_reserve = hugepage_subpool_put_pages(spool, reserve); hugetlb_acct_memory(h, -gbl_reserve); _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f176.google.com (mail-pd0-f176.google.com [209.85.192.176]) by kanga.kvack.org (Postfix) with ESMTP id 04B856B0038 for ; Wed, 18 Mar 2015 21:34:26 -0400 (EDT) Received: by pdbni2 with SMTP id ni2so59919747pdb.1 for ; Wed, 18 Mar 2015 18:34:25 -0700 (PDT) Received: from userp1040.oracle.com (userp1040.oracle.com. [156.151.31.81]) by mx.google.com with ESMTPS id ag9si39390433pad.217.2015.03.18.18.34.22 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 18 Mar 2015 18:34:24 -0700 (PDT) Message-ID: <550A2797.3000708@oracle.com> Date: Wed, 18 Mar 2015 18:34:15 -0700 From: Mike Kravetz MIME-Version: 1.0 Subject: Re: [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly References: <20150318144054.c099e8a5e462303eea707252@linux-foundation.org> In-Reply-To: <20150318144054.c099e8a5e462303eea707252@linux-foundation.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim On 03/18/2015 02:40 PM, Andrew Morton wrote: > On Mon, 16 Mar 2015 16:53:28 -0700 Mike Kravetz wrote: > >> Make 'min_size=' be an option when mounting a hugetlbfs. This option >> takes the same value as the 'size' option. min_size can be specified >> with specifying size. If both are specified, min_size must be less >> that or equal to size else the mount will fail. If min_size is >> specified, then at mount time an attempt is made to reserve min_size >> pages. If the reservation fails, the mount fails. At umount time, >> the reserved pages are released. >> >> ... 
>> >> @@ -761,14 +763,32 @@ static const struct super_operations hugetlbfs_ops = { >> .show_options = generic_show_options, >> }; >> >> +enum { NO_SIZE, SIZE_STD, SIZE_PERCENT }; >> + >> +static bool >> +hugetlbfs_options_setsize(struct hstate *h, long long *size, int setsize) >> +{ >> + if (setsize == NO_SIZE) >> + return false; >> + >> + if (setsize == SIZE_PERCENT) { >> + *size <<= huge_page_shift(h); >> + *size *= h->max_huge_pages; >> + do_div(*size, 100); > > I suppose do_div() takes a long long. u64 would be more conventional. > I don't *think* all this code needed to use signed types. > >> + } >> + >> + *size >>= huge_page_shift(h); >> + return true; >> +} >> + >> static int >> hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) >> { >> char *p, *rest; >> substring_t args[MAX_OPT_ARGS]; >> int option; >> - unsigned long long size = 0; >> - enum { NO_SIZE, SIZE_STD, SIZE_PERCENT } setsize = NO_SIZE; >> + unsigned long long max_size = 0, min_size = 0; >> + int max_setsize = NO_SIZE, min_setsize = NO_SIZE; >> >> if (!options) >> return 0; >> @@ -806,10 +826,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) >> /* memparse() will accept a K/M/G without a digit */ >> if (!isdigit(*args[0].from)) >> goto bad_val; >> - size = memparse(args[0].from, &rest); >> - setsize = SIZE_STD; >> + max_size = memparse(args[0].from, &rest); >> + max_setsize = SIZE_STD; >> if (*rest == '%') >> - setsize = SIZE_PERCENT; >> + max_setsize = SIZE_PERCENT; >> break; >> } >> >> @@ -832,6 +852,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) >> break; >> } >> >> + case Opt_min_size: { >> + /* memparse() will accept a K/M/G without a digit */ >> + if (!isdigit(*args[0].from)) >> + goto bad_val; >> + min_size = memparse(args[0].from, &rest); >> + min_setsize = SIZE_STD; >> + if (*rest == '%') >> + min_setsize = SIZE_PERCENT; >> + break; >> + } >> + >> default: >> pr_err("Bad mount option: \"%s\"\n", p); >> 
return -EINVAL; >> @@ -839,15 +870,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) >> } >> } >> >> - /* Do size after hstate is set up */ >> - if (setsize > NO_SIZE) { >> - struct hstate *h = pconfig->hstate; >> - if (setsize == SIZE_PERCENT) { >> - size <<= huge_page_shift(h); >> - size *= h->max_huge_pages; >> - do_div(size, 100); >> - } >> - pconfig->nr_blocks = (size >> huge_page_shift(h)); >> + /* Calculate number of huge pages based on hstate */ >> + if (hugetlbfs_options_setsize(pconfig->hstate, &max_size, max_setsize)) >> + pconfig->nr_blocks = max_size; > > So hugetlbfs_options_setsize takes an arg whichis in units of bytes, > modifies it in-place to b in units of pages and then copies it into > something which is in units of nr_blocks. > > >> + if (hugetlbfs_options_setsize(pconfig->hstate, &min_size, min_setsize)) >> + pconfig->min_size = min_size; >> + >> + /* If max_size specified, then min_size must be smaller */ >> + if (max_setsize > NO_SIZE && min_setsize > NO_SIZE && >> + pconfig->min_size > pconfig->nr_blocks) { >> + pr_err("minimum size can not be greater than maximum size\n"); >> + return -EINVAL; >> } >> >> return 0; >> @@ -872,6 +905,7 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent) >> config.gid = current_fsgid(); >> config.mode = 0755; >> config.hstate = &default_hstate; >> + config.min_size = 0; /* No default minimum size */ >> ret = hugetlbfs_parse_options(data, &config); >> if (ret) >> return ret; >> @@ -885,8 +919,15 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent) >> sbinfo->max_inodes = config.nr_inodes; >> sbinfo->free_inodes = config.nr_inodes; >> sbinfo->spool = NULL; >> - if (config.nr_blocks != -1) { >> - sbinfo->spool = hugepage_new_subpool(config.nr_blocks); >> + /* >> + * Allocate and initialize subpool if maximum or minimum size is >> + * specified. Any needed reservations (for minimim size) are taken >> + * taken when the subpool is created. 
>> + */ >> + if (config.nr_blocks != -1 || config.min_size != 0) { >> + sbinfo->spool = hugepage_new_subpool(config.hstate, >> + config.nr_blocks, >> + config.min_size); > > And hugepage_new_subpool() takes something in units of nr_blocks and > copies it into something whcih has units of nr-hugepages. > > And it takes an arg called "size" which is no longer number-of-bytes > but is actually number-of-hpages. > > > It's all rather confusing and unclear. A good philosophy would be > never to use a variable called "size", because the reader doesn't know > what units that size is measured in. Instead, make sure that the name > reflects the variable's units. max_bytes, min_hpages, nr_blocks, etc. > Thanks for the comments. I didn't want to cut/paste/duplicate the code used to parse the existing size option, but it looks like I made it harder to understand. I'll take a pass at cleaning this up and making it clearer. -- Mike Kravetz -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ob0-f179.google.com (mail-ob0-f179.google.com [209.85.214.179]) by kanga.kvack.org (Postfix) with ESMTP id ACB106B0038 for ; Wed, 18 Mar 2015 21:51:32 -0400 (EDT) Received: by obcxo2 with SMTP id xo2so44486418obc.0 for ; Wed, 18 Mar 2015 18:51:32 -0700 (PDT) Received: from aserp1040.oracle.com (aserp1040.oracle.com.
[141.146.126.69]) by mx.google.com with ESMTPS id f18si6317060oem.54.2015.03.18.18.51.31 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 18 Mar 2015 18:51:31 -0700 (PDT) Message-ID: <550A2B9A.3060905@oracle.com> Date: Wed, 18 Mar 2015 18:51:22 -0700 From: Mike Kravetz MIME-Version: 1.0 Subject: Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option References: <3c82f2203e5453ddf3b29431863034afc7699303.1426549011.git.mike.kravetz@oracle.com> <20150318144108.e235862e0be30ff626e01820@linux-foundation.org> In-Reply-To: <20150318144108.e235862e0be30ff626e01820@linux-foundation.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim On 03/18/2015 02:41 PM, Andrew Morton wrote: > On Mon, 16 Mar 2015 16:53:29 -0700 Mike Kravetz wrote: > >> Update documentation for the hugetlbfs min_size mount option. >> >> Signed-off-by: Mike Kravetz >> --- >> Documentation/vm/hugetlbpage.txt | 21 ++++++++++++++------- >> 1 file changed, 14 insertions(+), 7 deletions(-) >> >> diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt >> index f2d3a10..83c0305 100644 >> --- a/Documentation/vm/hugetlbpage.txt >> +++ b/Documentation/vm/hugetlbpage.txt >> @@ -267,8 +267,8 @@ call, then it is required that system administrator mount a file system of >> type hugetlbfs: >> >> mount -t hugetlbfs \ >> - -o uid=,gid=,mode=,size=,nr_inodes= \ >> - none /mnt/huge >> + -o uid=,gid=,mode=,size=,min_size=, \ >> + nr_inodes= none /mnt/huge >> >> This command mounts a (pseudo) filesystem of type hugetlbfs on the directory >> /mnt/huge. Any files created on /mnt/huge uses huge pages. The uid and gid >> @@ -277,11 +277,18 @@ the uid and gid of the current process are taken. The mode option sets the >> mode of root of file system to value & 01777. This value is given in octal. 
>> By default the value 0755 is picked. The size option sets the maximum value of >> memory (huge pages) allowed for that filesystem (/mnt/huge). The size is >> -rounded down to HPAGE_SIZE. The option nr_inodes sets the maximum number of >> -inodes that /mnt/huge can use. If the size or nr_inodes option is not >> -provided on command line then no limits are set. For size and nr_inodes >> -options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For >> -example, size=2K has the same meaning as size=2048. >> +rounded down to HPAGE_SIZE. The min_size option sets the minimum value of >> +memory (huge pages) allowed for the filesystem. Like the size option, >> +min_size is rounded down to HPAGE_SIZE. At mount time, the number of huge >> +pages specified by min_size are reserved for use by the filesystem. If >> +there are not enough free huge pages available, the mount will fail. As >> +huge pages are allocated to the filesystem and freed, the reserve count >> +is adjusted so that the sum of allocated and reserved huge pages is always >> +at least min_size. The option nr_inodes sets the maximum number of >> +inodes that /mnt/huge can use. If the size, min_size or nr_inodes option >> +is not provided on command line then no limits are set. For size, min_size >> +and nr_inodes options, you can use [G|g]/[M|m]/[K|k] to represent >> +giga/mega/kilo. For example, size=2K has the same meaning as size=2048. > > Nowhere here is the reader told the units of "size". We should at > least describe that, and maybe even rename the thing to min_bytes. > Ok, I will add that the size is in units of bytes. My choice of 'min_size' as a name for the new mount option was influenced by the existing 'size' mount option. I'm open to any suggestions for the name of this new mount option. -- Mike Kravetz -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f51.google.com (mail-pa0-f51.google.com [209.85.220.51]) by kanga.kvack.org (Postfix) with ESMTP id D10E86B0038 for ; Wed, 18 Mar 2015 22:24:02 -0400 (EDT) Received: by pabxg6 with SMTP id xg6so47644634pab.0 for ; Wed, 18 Mar 2015 19:24:02 -0700 (PDT) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org. [140.211.169.12]) by mx.google.com with ESMTPS id kx15si39610214pab.228.2015.03.18.19.24.01 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 18 Mar 2015 19:24:01 -0700 (PDT) Date: Wed, 18 Mar 2015 19:23:24 -0700 From: Andrew Morton Subject: Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option Message-Id: <20150318192324.e0386907.akpm@linux-foundation.org> In-Reply-To: <550A2B9A.3060905@oracle.com> References: <3c82f2203e5453ddf3b29431863034afc7699303.1426549011.git.mike.kravetz@oracle.com> <20150318144108.e235862e0be30ff626e01820@linux-foundation.org> <550A2B9A.3060905@oracle.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mike Kravetz Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim On Wed, 18 Mar 2015 18:51:22 -0700 Mike Kravetz wrote: > > Nowhere here is the reader told the units of "size". We should at > > least describe that, and maybe even rename the thing to min_bytes. > > > > Ok, I will add that the size is in unit of bytes. My choice of > 'min_size' as a name for the new mount option was influenced by > the existing 'size' mount option. I'm open to any suggestions > for the name of this new mount option. Yes, due to the preexisting "size" I think we're stuck with "min_size". We could use min_size_bytes I guess, but the operator needs to go look up the units of "size" anyway. 
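On the operator's side, the value accepted by size= (and by the proposed min_size=) is a byte count with an optional K/M/G suffix, or a number followed by '%' for a percentage of the huge page pool. The following userspace approximation is a sketch only: parse_size_opt() is a hypothetical name, strtoull() plus a K/M/G switch stands in for the kernel's memparse(), and the leading-digit check mirrors the quoted hugetlbfs_parse_options() code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

enum size_type { NO_SIZE, SIZE_STD, SIZE_PERCENT };

/*
 * Hypothetical approximation of how a "size=" or "min_size=" value is
 * read: a number with an optional K/M/G suffix (binary multiples), or
 * a number followed by '%' meaning a percentage of the huge page pool.
 * Returns 0 on success, -1 if the value does not start with a digit.
 */
static int parse_size_opt(const char *s, uint64_t *val, enum size_type *type)
{
	char *rest;
	uint64_t v;

	assert(s != NULL && val != NULL && type != NULL);

	/* Reject a bare suffix such as "M", as the quoted code does. */
	if (!isdigit((unsigned char)*s))
		return -1;

	v = strtoull(s, &rest, 0);	/* base 0: decimal, 0x hex or 0 octal */
	switch (*rest) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		rest++;
		break;
	default:
		break;
	}

	/* A trailing '%' turns the number into a percentage. */
	*type = (*rest == '%') ? SIZE_PERCENT : SIZE_STD;
	*val = v;
	return 0;
}
```

As the documentation says, size=2K means 2048 bytes; "50%" is reported as a percentage rather than a byte count, and a bare "M" is rejected.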
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f179.google.com (mail-pd0-f179.google.com [209.85.192.179]) by kanga.kvack.org (Postfix) with ESMTP id 9D7536B0038 for ; Fri, 20 Mar 2015 12:24:27 -0400 (EDT) Received: by pdbcz9 with SMTP id cz9so112894109pdb.3 for ; Fri, 20 Mar 2015 09:24:27 -0700 (PDT) Received: from userp1040.oracle.com (userp1040.oracle.com. [156.151.31.81]) by mx.google.com with ESMTPS id a4si10002308pdm.207.2015.03.20.09.24.25 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 20 Mar 2015 09:24:26 -0700 (PDT) Message-ID: <550C49B0.6070600@oracle.com> Date: Fri, 20 Mar 2015 09:24:16 -0700 From: Mike Kravetz MIME-Version: 1.0 Subject: Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option References: <3c82f2203e5453ddf3b29431863034afc7699303.1426549011.git.mike.kravetz@oracle.com> <20150318144108.e235862e0be30ff626e01820@linux-foundation.org> <550A2B9A.3060905@oracle.com> <20150318192324.e0386907.akpm@linux-foundation.org> In-Reply-To: <20150318192324.e0386907.akpm@linux-foundation.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim On 03/18/2015 07:23 PM, Andrew Morton wrote: > On Wed, 18 Mar 2015 18:51:22 -0700 Mike Kravetz wrote: > >>> Nowhere here is the reader told the units of "size". We should at >>> least describe that, and maybe even rename the thing to min_bytes. >>> >> >> Ok, I will add that the size is in unit of bytes. My choice of >> 'min_size' as a name for the new mount option was influenced by >> the existing 'size' mount option. 
I'm open to any suggestions >> for the name of this new mount option. > > Yes, due to the preexisting "size" I think we're stuck with "min_size". > We could use min_size_bytes I guess, but the operator needs to go look > up the units of "size" anyway. > Well, the existing size option can also be specified as a percentage of the huge page pool size. This is in the current code. There is a mount option 'pagesize=' that allows one to select which huge page (size) pool should be used. If none is specified, the default huge page pool is used. There is no documentation for this pagesize option or for using size to specify a percentage of the huge page pool size. I'll add this to the hugetlbpage.txt documentation. -- Mike Kravetz -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933411AbbCPXyr (ORCPT ); Mon, 16 Mar 2015 19:54:47 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:38353 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932768AbbCPXxr (ORCPT ); Mon, 16 Mar 2015 19:53:47 -0400 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim , Mike Kravetz Subject: [PATCH V2 1/4] hugetlbfs: add minimum size tracking fields to subpool structure Date: Mon, 16 Mar 2015 16:53:26 -0700 Message-Id: <1ef964ec5febb254dbee28604481c6768e018268.1426549010.git.mike.kravetz@oracle.com> X-Mailer: git-send-email 2.1.0 In-Reply-To: References: In-Reply-To: References: X-Source-IP: aserv0022.oracle.com [141.146.126.234] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add a field to the subpool structure to indicate the minimum number of huge pages to always be used by this subpool. This minimum count includes allocated pages as well as reserved pages. If the minimum number of pages for the subpool has not been allocated, pages are reserved up to this minimum. An additional field (rsv_hpages) is used to track the number of pages reserved to meet this minimum size.
The hstate pointer in the subpool is convenient to have when reserving and unreserving the pages. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 2 ++ mm/hugetlb.c | 3 +++ 2 files changed, 5 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 431b7fc..cfe13fd 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -23,6 +23,8 @@ struct hugepage_subpool { spinlock_t lock; long count; long max_hpages, used_hpages; + struct hstate *hstate; + long min_hpages, rsv_hpages; }; struct resv_map { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 85032de..07b7226 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -85,6 +85,9 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks) spool->count = 1; spool->max_hpages = nr_blocks; spool->used_hpages = 0; + spool->hstate = NULL; + spool->min_hpages = 0; + spool->rsv_hpages = 0; return spool; } -- 2.1.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932854AbbCPXxx (ORCPT ); Mon, 16 Mar 2015 19:53:53 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:38374 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932076AbbCPXxu (ORCPT ); Mon, 16 Mar 2015 19:53:50 -0400 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim , Mike Kravetz Subject: [PATCH V2 2/4] hugetlbfs: add minimum size accounting to subpools Date: Mon, 16 Mar 2015 16:53:27 -0700 Message-Id: <464e43df640c54408ed78d1397ad8148784e4ecc.1426549011.git.mike.kravetz@oracle.com> X-Mailer: git-send-email 2.1.0 In-Reply-To: References: In-Reply-To: References: X-Source-IP: acsinet21.oracle.com [141.146.126.237] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The same routines that perform subpool maximum size accounting hugepage_subpool_get/put_pages() are modified to 
also perform minimum size accounting. When a delta value is passed to these routines, calculate how global reservations must be adjusted to maintain the subpool minimum size. The routines now return this global reserve count adjustment. This global adjusted reserve count is then passed to the global accounting routine hugetlb_acct_memory(). Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 94 insertions(+), 21 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 07b7226..ab2ea1e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -100,36 +100,85 @@ void hugepage_put_subpool(struct hugepage_subpool *spool) unlock_or_release_subpool(spool); } -static int hugepage_subpool_get_pages(struct hugepage_subpool *spool, +/* + * subpool accounting for allocating and reserving pages + * return -ENOMEM if there are not enough resources to satisfy the + * the request. Otherwise, return the number of pages by which the + * global pools must be adjusted (upward). The returned value may + * only be different than the passed value (delta) in the case where + * a subpool minimum size must be manitained. + */ +static long hugepage_subpool_get_pages(struct hugepage_subpool *spool, long delta) { - int ret = 0; + long ret = delta; if (!spool) - return 0; + return ret; spin_lock(&spool->lock); - if ((spool->used_hpages + delta) <= spool->max_hpages) { - spool->used_hpages += delta; - } else { - ret = -ENOMEM; + + if (spool->max_hpages != -1) { /* maximum size accounting */ + if ((spool->used_hpages + delta) <= spool->max_hpages) + spool->used_hpages += delta; + else { + ret = -ENOMEM; + goto unlock_ret; + } + } + + if (spool->min_hpages) { /* minimum size accounting */ + if (delta > spool->rsv_hpages) { + /* asking for more reserves than those already taken + * on behalf of subpool. 
return difference */ + ret = delta - spool->rsv_hpages; + spool->rsv_hpages = 0; + } else { + ret = 0; /* reserves already accounted for */ + spool->rsv_hpages -= delta; + } } - spin_unlock(&spool->lock); +unlock_ret: + spin_unlock(&spool->lock); return ret; } -static void hugepage_subpool_put_pages(struct hugepage_subpool *spool, +/* + * subpool accounting for freeing and unreserving pages + * Return the number of global page reservations that must be dropped. + * The return value may only be different than the passed value (delta) + * in the case where a subpool minimum size must be maintained. + */ +static long hugepage_subpool_put_pages(struct hugepage_subpool *spool, long delta) { + long ret = delta; + if (!spool) - return; + return delta; spin_lock(&spool->lock); - spool->used_hpages -= delta; + + if (spool->max_hpages != -1) /* maximum size accounting */ + spool->used_hpages -= delta; + + if (spool->min_hpages) { /* minimum size accounting */ + if (spool->rsv_hpages + delta <= spool->min_hpages) + ret = 0; + else + ret = spool->rsv_hpages + delta - spool->min_hpages; + + spool->rsv_hpages += delta; + if (spool->rsv_hpages > spool->min_hpages) + spool->rsv_hpages = spool->min_hpages; + } + /* If hugetlbfs_put_super couldn't free spool due to * an outstanding quota reference, free it now. */ unlock_or_release_subpool(spool); + + return ret; } static inline struct hugepage_subpool *subpool_inode(struct inode *inode) @@ -877,6 +926,14 @@ void free_huge_page(struct page *page) restore_reserve = PagePrivate(page); ClearPagePrivate(page); + /* + * A return code of zero implies that the subpool will be under + * it's minimum size if the reservation is not restored after + * page is free. Therefore, force restore_reserve operation. 
+ */ + if (hugepage_subpool_put_pages(spool, 1) == 0) + restore_reserve = true; + spin_lock(&hugetlb_lock); hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h), page); @@ -894,7 +951,6 @@ void free_huge_page(struct page *page) enqueue_huge_page(h, page); } spin_unlock(&hugetlb_lock); - hugepage_subpool_put_pages(spool, 1); } static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) @@ -1387,7 +1443,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, if (chg < 0) return ERR_PTR(-ENOMEM); if (chg || avoid_reserve) - if (hugepage_subpool_get_pages(spool, 1)) + if (hugepage_subpool_get_pages(spool, 1) < 0) return ERR_PTR(-ENOSPC); ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); @@ -2455,6 +2511,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) struct resv_map *resv = vma_resv_map(vma); struct hugepage_subpool *spool = subpool_vma(vma); unsigned long reserve, start, end; + long gbl_reserve; if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER)) return; @@ -2467,8 +2524,12 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) kref_put(&resv->refs, resv_map_release); if (reserve) { - hugetlb_acct_memory(h, -reserve); - hugepage_subpool_put_pages(spool, reserve); + /* + * decrement reserve counts. The global reserve count + * may be adjusted if the subpool has a minimum size. + */ + gbl_reserve = hugepage_subpool_put_pages(spool, reserve); + hugetlb_acct_memory(h, -gbl_reserve); } } @@ -3399,6 +3460,7 @@ int hugetlb_reserve_pages(struct inode *inode, struct hstate *h = hstate_inode(inode); struct hugepage_subpool *spool = subpool_inode(inode); struct resv_map *resv_map; + long gbl_reserve; /* * Only apply hugepage reservation if asked. 
At fault time, an @@ -3435,8 +3497,13 @@ int hugetlb_reserve_pages(struct inode *inode, goto out_err; } - /* There must be enough pages in the subpool for the mapping */ - if (hugepage_subpool_get_pages(spool, chg)) { + /* + * There must be enough pages in the subpool for the mapping. If + * the subpool has a minimum size, there may be some global + * reservations already in place (gbl_reserve). + */ + gbl_reserve = hugepage_subpool_get_pages(spool, chg); + if (gbl_reserve < 0) { ret = -ENOSPC; goto out_err; } @@ -3445,9 +3512,10 @@ int hugetlb_reserve_pages(struct inode *inode, * Check enough hugepages are available for the reservation. * Hand the pages back to the subpool if there are not */ - ret = hugetlb_acct_memory(h, chg); + ret = hugetlb_acct_memory(h, gbl_reserve); if (ret < 0) { - hugepage_subpool_put_pages(spool, chg); + /* put back original number of pages, chg */ + (void)hugepage_subpool_put_pages(spool, chg); goto out_err; } @@ -3477,6 +3545,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed) struct resv_map *resv_map = inode_resv_map(inode); long chg = 0; struct hugepage_subpool *spool = subpool_inode(inode); + long gbl_reserve; if (resv_map) chg = region_truncate(resv_map, offset); @@ -3484,8 +3553,12 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed) inode->i_blocks -= (blocks_per_huge_page(h) * freed); spin_unlock(&inode->i_lock); - hugepage_subpool_put_pages(spool, (chg - freed)); - hugetlb_acct_memory(h, -(chg - freed)); + /* + * If the subpool has a minimum size, the number of global + * reservations to be released may be adjusted. 
+	 */
+	gbl_reserve = hugepage_subpool_put_pages(spool, (chg - freed));
+	hugetlb_acct_memory(h, -gbl_reserve);
 }
 
 #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
-- 
2.1.0

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton , Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim , Mike Kravetz
Subject: [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly
Date: Mon, 16 Mar 2015 16:53:28 -0700

Make 'min_size=' be an option when mounting a hugetlbfs. This option
takes the same value as the 'size' option. min_size can be specified
without specifying size. If both are specified, min_size must be less
than or equal to size, else the mount will fail. If min_size is
specified, then at mount time an attempt is made to reserve min_size
pages. If the reservation fails, the mount fails. At umount time,
the reserved pages are released.
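To make the unit conversions in this patch concrete (a byte value with an optional K/M/G suffix, or a trailing %, ending up as a count of huge pages), here is a minimal userspace sketch of what hugetlbfs_parse_options() and hugetlbfs_options_setsize() compute. It is an illustration, not the kernel code: parse_bytes(), option_to_hpages() and min_le_max() are hypothetical helpers, parse_bytes() is a simplified stand-in for memparse(), and the huge page shift and pool size are passed in directly rather than read from a struct hstate.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Same three states as the patch's enum { NO_SIZE, SIZE_STD, SIZE_PERCENT }. */
enum size_kind { NO_SIZE, SIZE_STD, SIZE_PERCENT };

/*
 * Hypothetical, simplified stand-in for the kernel's memparse():
 * a number with an optional K/M/G suffix (memparse() accepts more).
 */
static unsigned long long parse_bytes(const char *s, const char **rest)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		end++;
		break;
	default:
		break;
	}
	*rest = end;
	return v;
}

/* Same arithmetic as hugetlbfs_options_setsize(), minus struct hstate. */
static bool options_setsize(unsigned int hpage_shift,
			    unsigned long max_huge_pages,
			    unsigned long long *size, enum size_kind kind)
{
	if (kind == NO_SIZE)
		return false;
	if (kind == SIZE_PERCENT) {
		/* Percentage of the configured pool, converted to bytes. */
		*size <<= hpage_shift;
		*size *= max_huge_pages;
		*size /= 100;
	}
	/* Bytes to huge pages, rounding down. */
	*size >>= hpage_shift;
	return true;
}

/* Hypothetical helper: one option value -> huge pages, or -1 if malformed. */
static long option_to_hpages(const char *val, unsigned int hpage_shift,
			     unsigned long max_huge_pages)
{
	const char *rest;
	unsigned long long size;
	enum size_kind kind;

	/* Like the patch, reject a bare suffix such as "M". */
	if (!isdigit((unsigned char)*val))
		return -1;
	size = parse_bytes(val, &rest);
	kind = (*rest == '%') ? SIZE_PERCENT : SIZE_STD;
	options_setsize(hpage_shift, max_huge_pages, &size, kind);
	return (long)size;
}

/* When both options are given, the patch fails the mount unless min <= max. */
static bool min_le_max(long min_hpages, long max_hpages)
{
	return min_hpages <= max_hpages;
}
```

Assuming 2 MB huge pages (shift 21) and an hstate with max_huge_pages = 512, "1G" maps to 512 pages, "3M" rounds down to 1 page, and "50%" maps to 256 pages. Combining min_size=1G with size=512M fails the min_le_max() check, which corresponds to the patch returning -EINVAL from hugetlbfs_parse_options().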
Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 75 ++++++++++++++++++++++++++++++++++++++----------- include/linux/hugetlb.h | 3 +- mm/hugetlb.c | 26 +++++++++++++---- 3 files changed, 80 insertions(+), 24 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 5eba47f..7a20a1b 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -50,6 +50,7 @@ struct hugetlbfs_config { long nr_blocks; long nr_inodes; struct hstate *hstate; + long min_size; }; struct hugetlbfs_inode_info { @@ -73,7 +74,7 @@ int sysctl_hugetlb_shm_group; enum { Opt_size, Opt_nr_inodes, Opt_mode, Opt_uid, Opt_gid, - Opt_pagesize, + Opt_pagesize, Opt_min_size, Opt_err, }; @@ -84,6 +85,7 @@ static const match_table_t tokens = { {Opt_uid, "uid=%u"}, {Opt_gid, "gid=%u"}, {Opt_pagesize, "pagesize=%s"}, + {Opt_min_size, "min_size=%s"}, {Opt_err, NULL}, }; @@ -761,14 +763,32 @@ static const struct super_operations hugetlbfs_ops = { .show_options = generic_show_options, }; +enum { NO_SIZE, SIZE_STD, SIZE_PERCENT }; + +static bool +hugetlbfs_options_setsize(struct hstate *h, long long *size, int setsize) +{ + if (setsize == NO_SIZE) + return false; + + if (setsize == SIZE_PERCENT) { + *size <<= huge_page_shift(h); + *size *= h->max_huge_pages; + do_div(*size, 100); + } + + *size >>= huge_page_shift(h); + return true; +} + static int hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) { char *p, *rest; substring_t args[MAX_OPT_ARGS]; int option; - unsigned long long size = 0; - enum { NO_SIZE, SIZE_STD, SIZE_PERCENT } setsize = NO_SIZE; + unsigned long long max_size = 0, min_size = 0; + int max_setsize = NO_SIZE, min_setsize = NO_SIZE; if (!options) return 0; @@ -806,10 +826,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) /* memparse() will accept a K/M/G without a digit */ if (!isdigit(*args[0].from)) goto bad_val; - size = memparse(args[0].from, &rest); - setsize = SIZE_STD; + max_size = memparse(args[0].from, &rest); + 
max_setsize = SIZE_STD; if (*rest == '%') - setsize = SIZE_PERCENT; + max_setsize = SIZE_PERCENT; break; } @@ -832,6 +852,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) break; } + case Opt_min_size: { + /* memparse() will accept a K/M/G without a digit */ + if (!isdigit(*args[0].from)) + goto bad_val; + min_size = memparse(args[0].from, &rest); + min_setsize = SIZE_STD; + if (*rest == '%') + min_setsize = SIZE_PERCENT; + break; + } + default: pr_err("Bad mount option: \"%s\"\n", p); return -EINVAL; @@ -839,15 +870,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) } } - /* Do size after hstate is set up */ - if (setsize > NO_SIZE) { - struct hstate *h = pconfig->hstate; - if (setsize == SIZE_PERCENT) { - size <<= huge_page_shift(h); - size *= h->max_huge_pages; - do_div(size, 100); - } - pconfig->nr_blocks = (size >> huge_page_shift(h)); + /* Calculate number of huge pages based on hstate */ + if (hugetlbfs_options_setsize(pconfig->hstate, &max_size, max_setsize)) + pconfig->nr_blocks = max_size; + if (hugetlbfs_options_setsize(pconfig->hstate, &min_size, min_setsize)) + pconfig->min_size = min_size; + + /* If max_size specified, then min_size must be smaller */ + if (max_setsize > NO_SIZE && min_setsize > NO_SIZE && + pconfig->min_size > pconfig->nr_blocks) { + pr_err("minimum size can not be greater than maximum size\n"); + return -EINVAL; } return 0; @@ -872,6 +905,7 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent) config.gid = current_fsgid(); config.mode = 0755; config.hstate = &default_hstate; + config.min_size = 0; /* No default minimum size */ ret = hugetlbfs_parse_options(data, &config); if (ret) return ret; @@ -885,8 +919,15 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent) sbinfo->max_inodes = config.nr_inodes; sbinfo->free_inodes = config.nr_inodes; sbinfo->spool = NULL; - if (config.nr_blocks != -1) { - sbinfo->spool = 
hugepage_new_subpool(config.nr_blocks); + /* + * Allocate and initialize subpool if maximum or minimum size is + * specified. Any needed reservations (for minimim size) are taken + * taken when the subpool is created. + */ + if (config.nr_blocks != -1 || config.min_size != 0) { + sbinfo->spool = hugepage_new_subpool(config.hstate, + config.nr_blocks, + config.min_size); if (!sbinfo->spool) goto out_free; } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index cfe13fd..6883fca 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -40,7 +40,8 @@ extern int hugetlb_max_hstate __read_mostly; #define for_each_hstate(h) \ for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++) -struct hugepage_subpool *hugepage_new_subpool(long nr_blocks); +struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long nr_blocks, + long min_size); void hugepage_put_subpool(struct hugepage_subpool *spool); int PageHuge(struct page *page); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ab2ea1e..7d4be33 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -61,6 +61,9 @@ DEFINE_SPINLOCK(hugetlb_lock); static int num_fault_mutexes; static struct mutex *htlb_fault_mutex_table ____cacheline_aligned_in_smp; +/* Forward declaration */ +static int hugetlb_acct_memory(struct hstate *h, long delta); + static inline void unlock_or_release_subpool(struct hugepage_subpool *spool) { bool free = (spool->count == 0) && (spool->used_hpages == 0); @@ -68,12 +71,18 @@ static inline void unlock_or_release_subpool(struct hugepage_subpool *spool) spin_unlock(&spool->lock); /* If no pages are used, and no other handles to the subpool - * remain, free the subpool the subpool remain */ - if (free) + * remain, give up any reservations mased on minimum size and + * free the subpool */ + if (free) { + if (spool->min_hpages) + hugetlb_acct_memory(spool->hstate, + -spool->min_hpages); kfree(spool); + } } -struct hugepage_subpool *hugepage_new_subpool(long nr_blocks) +struct 
hugepage_subpool *hugepage_new_subpool(struct hstate *h, long nr_blocks, + long min_size) { struct hugepage_subpool *spool; @@ -85,9 +94,14 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks) spool->count = 1; spool->max_hpages = nr_blocks; spool->used_hpages = 0; - spool->hstate = NULL; - spool->min_hpages = 0; - spool->rsv_hpages = 0; + spool->hstate = h; + spool->min_hpages = min_size; + + if (min_size && hugetlb_acct_memory(h, min_size)) { + kfree(spool); + return NULL; + } + spool->rsv_hpages = min_size; return spool; } -- 2.1.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933024AbbCPXyH (ORCPT ); Mon, 16 Mar 2015 19:54:07 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:34856 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932877AbbCPXyC (ORCPT ); Mon, 16 Mar 2015 19:54:02 -0400 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim , Mike Kravetz Subject: [PATCH V2 4/4] hugetlbfs: document min_size mount option Date: Mon, 16 Mar 2015 16:53:29 -0700 Message-Id: <3c82f2203e5453ddf3b29431863034afc7699303.1426549011.git.mike.kravetz@oracle.com> X-Mailer: git-send-email 2.1.0 In-Reply-To: References: In-Reply-To: References: X-Source-IP: acsinet22.oracle.com [141.146.126.238] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Update documentation for the hugetlbfs min_size mount option. 
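The documentation text in this patch says the reserve count is adjusted so that allocated plus reserved huge pages never drop below min_size. That behavior comes down to the arithmetic in hugepage_subpool_get_pages() and hugepage_subpool_put_pages() from patch 2/4. Below is a minimal single-threaded userspace sketch of that arithmetic, for illustration only: struct subpool_model, model_get_pages() and model_put_pages() are hypothetical names, the spinlock, refcount and hstate pointer are omitted, and -1 stands in for -ENOMEM.

```c
#include <assert.h>

/*
 * Hypothetical userspace mirror of struct hugepage_subpool, without the
 * spinlock, refcount or hstate pointer.
 */
struct subpool_model {
	long max_hpages;	/* -1 means no maximum */
	long used_hpages;	/* pages charged against the maximum */
	long min_hpages;	/* 0 means no minimum */
	long rsv_hpages;	/* global reservations held to honor min_hpages */
};

/*
 * Follows hugepage_subpool_get_pages(): returns -1 (standing in for
 * -ENOMEM) or how many pages must additionally be reserved globally.
 */
static long model_get_pages(struct subpool_model *sp, long delta)
{
	long ret = delta;

	if (sp->max_hpages != -1) {
		if (sp->used_hpages + delta > sp->max_hpages)
			return -1;
		sp->used_hpages += delta;
	}
	if (sp->min_hpages) {
		if (delta > sp->rsv_hpages) {
			/* Only the part not covered by the subpool's reserve. */
			ret = delta - sp->rsv_hpages;
			sp->rsv_hpages = 0;
		} else {
			/* Fully covered by pages reserved at mount time. */
			ret = 0;
			sp->rsv_hpages -= delta;
		}
	}
	return ret;
}

/*
 * Follows hugepage_subpool_put_pages(): returns how many global
 * reservations may be dropped; pages needed to refill the subpool's
 * reserve up to min_hpages are kept.
 */
static long model_put_pages(struct subpool_model *sp, long delta)
{
	long ret = delta;

	if (sp->max_hpages != -1)
		sp->used_hpages -= delta;
	if (sp->min_hpages) {
		if (sp->rsv_hpages + delta <= sp->min_hpages)
			ret = 0;
		else
			ret = sp->rsv_hpages + delta - sp->min_hpages;
		sp->rsv_hpages += delta;
		if (sp->rsv_hpages > sp->min_hpages)
			sp->rsv_hpages = sp->min_hpages;
	}
	return ret;
}
```

For example, start from a subpool with a maximum of 8 pages and a minimum of 4, so hugepage_new_subpool() has already reserved 4 pages globally and rsv_hpages is 4. Getting 3 pages returns 0, getting 3 more returns 2, a further 3 fails because it would exceed the maximum, putting 3 back returns 0 (they refill the reserve), and putting the last 3 back returns 2, leaving rsv_hpages at 4 again.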
Signed-off-by: Mike Kravetz --- Documentation/vm/hugetlbpage.txt | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt index f2d3a10..83c0305 100644 --- a/Documentation/vm/hugetlbpage.txt +++ b/Documentation/vm/hugetlbpage.txt @@ -267,8 +267,8 @@ call, then it is required that system administrator mount a file system of type hugetlbfs: mount -t hugetlbfs \ - -o uid=,gid=,mode=,size=,nr_inodes= \ - none /mnt/huge + -o uid=,gid=,mode=,size=,min_size=, \ + nr_inodes= none /mnt/huge This command mounts a (pseudo) filesystem of type hugetlbfs on the directory /mnt/huge. Any files created on /mnt/huge uses huge pages. The uid and gid @@ -277,11 +277,18 @@ the uid and gid of the current process are taken. The mode option sets the mode of root of file system to value & 01777. This value is given in octal. By default the value 0755 is picked. The size option sets the maximum value of memory (huge pages) allowed for that filesystem (/mnt/huge). The size is -rounded down to HPAGE_SIZE. The option nr_inodes sets the maximum number of -inodes that /mnt/huge can use. If the size or nr_inodes option is not -provided on command line then no limits are set. For size and nr_inodes -options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For -example, size=2K has the same meaning as size=2048. +rounded down to HPAGE_SIZE. The min_size option sets the minimum value of +memory (huge pages) allowed for the filesystem. Like the size option, +min_size is rounded down to HPAGE_SIZE. At mount time, the number of huge +pages specified by min_size are reserved for use by the filesystem. If +there are not enough free huge pages available, the mount will fail. As +huge pages are allocated to the filesystem and freed, the reserve count +is adjusted so that the sum of allocated and reserved huge pages is always +at least min_size. 
The option nr_inodes sets the maximum number of +inodes that /mnt/huge can use. If the size, min_size or nr_inodes option +is not provided on command line then no limits are set. For size, min_size +and nr_inodes options, you can use [G|g]/[M|m]/[K|k] to represent +giga/mega/kilo. For example, size=2K has the same meaning as size=2048. While read system calls are supported on files that reside on hugetlb file systems, write system calls are not. -- 2.1.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932849AbbCRV0C (ORCPT ); Wed, 18 Mar 2015 17:26:02 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:38843 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754848AbbCRVZ7 (ORCPT ); Wed, 18 Mar 2015 17:25:59 -0400 Date: Wed, 18 Mar 2015 14:25:58 -0700 From: Andrew Morton To: Mike Kravetz Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim Subject: Re: [PATCH V2 1/4] hugetlbfs: add minimum size tracking fields to subpool structure Message-Id: <20150318142558.d2958fbb7f8b083c00c40c0d@linux-foundation.org> In-Reply-To: <1ef964ec5febb254dbee28604481c6768e018268.1426549010.git.mike.kravetz@oracle.com> References: <1ef964ec5febb254dbee28604481c6768e018268.1426549010.git.mike.kravetz@oracle.com> X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 16 Mar 2015 16:53:26 -0700 Mike Kravetz wrote: > Add a field to the subpool structure to indicate the minimimum > number of huge pages to always be used by this subpool. This > minimum count includes allocated pages as well as reserved pages. 
> If the minimum number of pages for the subpool have not been > allocated, pages are reserved up to this minimum. An additional > field (rsv_hpages) is used to track the number of pages reserved > to meet this minimum size. The hstate pointer in the subpool > is convenient to have when reserving and unreserving the pages. > > ... > > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -23,6 +23,8 @@ struct hugepage_subpool { > spinlock_t lock; > long count; > long max_hpages, used_hpages; > + struct hstate *hstate; > + long min_hpages, rsv_hpages; > }; Let's leave room for the descriptive comments which aren't there. --- a/include/linux/hugetlb.h~hugetlbfs-add-minimum-size-tracking-fields-to-subpool-structure-fix +++ a/include/linux/hugetlb.h @@ -22,9 +22,11 @@ struct mmu_gather; struct hugepage_subpool { spinlock_t lock; long count; - long max_hpages, used_hpages; + long max_hpagesl + long used_hpages; struct hstate *hstate; - long min_hpages, rsv_hpages; + long min_hpages; + long rsv_hpages; }; > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -85,6 +85,9 @@ struct hugepage_subpool *hugepage_new_subpool(long nr_blocks) > spool->count = 1; > spool->max_hpages = nr_blocks; > spool->used_hpages = 0; > + spool->hstate = NULL; > + spool->min_hpages = 0; > + spool->rsv_hpages = 0; Four strikes and you're out! 
--- a/mm/hugetlb.c~hugetlbfs-add-minimum-size-tracking-fields-to-subpool-structure-fix +++ a/mm/hugetlb.c @@ -77,17 +77,13 @@ struct hugepage_subpool *hugepage_new_su { struct hugepage_subpool *spool; - spool = kmalloc(sizeof(*spool), GFP_KERNEL); + spool = kzalloc(sizeof(*spool), GFP_KERNEL); if (!spool) return NULL; spin_lock_init(&spool->lock); spool->count = 1; spool->max_hpages = nr_blocks; - spool->used_hpages = 0; - spool->hstate = NULL; - spool->min_hpages = 0; - spool->rsv_hpages = 0; return spool; } _ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964897AbbCRVk6 (ORCPT ); Wed, 18 Mar 2015 17:40:58 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:38881 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933273AbbCRVk4 (ORCPT ); Wed, 18 Mar 2015 17:40:56 -0400 Date: Wed, 18 Mar 2015 14:40:54 -0700 From: Andrew Morton To: Mike Kravetz Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim Subject: Re: [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly Message-Id: <20150318144054.c099e8a5e462303eea707252@linux-foundation.org> In-Reply-To: References: X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 16 Mar 2015 16:53:28 -0700 Mike Kravetz wrote: > Make 'min_size=' be an option when mounting a hugetlbfs. This option > takes the same value as the 'size' option. min_size can be specified > with specifying size. If both are specified, min_size must be less > that or equal to size else the mount will fail. If min_size is > specified, then at mount time an attempt is made to reserve min_size > pages. If the reservation fails, the mount fails. 
At umount time, > the reserved pages are released. > > ... > > @@ -761,14 +763,32 @@ static const struct super_operations hugetlbfs_ops = { > .show_options = generic_show_options, > }; > > +enum { NO_SIZE, SIZE_STD, SIZE_PERCENT }; > + > +static bool > +hugetlbfs_options_setsize(struct hstate *h, long long *size, int setsize) > +{ > + if (setsize == NO_SIZE) > + return false; > + > + if (setsize == SIZE_PERCENT) { > + *size <<= huge_page_shift(h); > + *size *= h->max_huge_pages; > + do_div(*size, 100); I suppose do_div() takes a long long. u64 would be more conventional. I don't *think* all this code needed to use signed types. > + } > + > + *size >>= huge_page_shift(h); > + return true; > +} > + > static int > hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) > { > char *p, *rest; > substring_t args[MAX_OPT_ARGS]; > int option; > - unsigned long long size = 0; > - enum { NO_SIZE, SIZE_STD, SIZE_PERCENT } setsize = NO_SIZE; > + unsigned long long max_size = 0, min_size = 0; > + int max_setsize = NO_SIZE, min_setsize = NO_SIZE; > > if (!options) > return 0; > @@ -806,10 +826,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) > /* memparse() will accept a K/M/G without a digit */ > if (!isdigit(*args[0].from)) > goto bad_val; > - size = memparse(args[0].from, &rest); > - setsize = SIZE_STD; > + max_size = memparse(args[0].from, &rest); > + max_setsize = SIZE_STD; > if (*rest == '%') > - setsize = SIZE_PERCENT; > + max_setsize = SIZE_PERCENT; > break; > } > > @@ -832,6 +852,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) > break; > } > > + case Opt_min_size: { > + /* memparse() will accept a K/M/G without a digit */ > + if (!isdigit(*args[0].from)) > + goto bad_val; > + min_size = memparse(args[0].from, &rest); > + min_setsize = SIZE_STD; > + if (*rest == '%') > + min_setsize = SIZE_PERCENT; > + break; > + } > + > default: > pr_err("Bad mount option: \"%s\"\n", p); > return -EINVAL; > @@ 
-839,15 +870,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) > } > } > > - /* Do size after hstate is set up */ > - if (setsize > NO_SIZE) { > - struct hstate *h = pconfig->hstate; > - if (setsize == SIZE_PERCENT) { > - size <<= huge_page_shift(h); > - size *= h->max_huge_pages; > - do_div(size, 100); > - } > - pconfig->nr_blocks = (size >> huge_page_shift(h)); > + /* Calculate number of huge pages based on hstate */ > + if (hugetlbfs_options_setsize(pconfig->hstate, &max_size, max_setsize)) > + pconfig->nr_blocks = max_size; So hugetlbfs_options_setsize takes an arg which is in units of bytes, modifies it in-place to be in units of pages and then copies it into something which is in units of nr_blocks. > + if (hugetlbfs_options_setsize(pconfig->hstate, &min_size, min_setsize)) > + pconfig->min_size = min_size; > + > + /* If max_size specified, then min_size must be smaller */ > + if (max_setsize > NO_SIZE && min_setsize > NO_SIZE && > + pconfig->min_size > pconfig->nr_blocks) { > + pr_err("minimum size can not be greater than maximum size\n"); > + return -EINVAL; > } > > return 0; > @@ -872,6 +905,7 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent) > config.gid = current_fsgid(); > config.mode = 0755; > config.hstate = &default_hstate; > + config.min_size = 0; /* No default minimum size */ > ret = hugetlbfs_parse_options(data, &config); > if (ret) > return ret; > @@ -885,8 +919,15 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent) > sbinfo->max_inodes = config.nr_inodes; > sbinfo->free_inodes = config.nr_inodes; > sbinfo->spool = NULL; > - if (config.nr_blocks != -1) { > - sbinfo->spool = hugepage_new_subpool(config.nr_blocks); > + /* > + * Allocate and initialize subpool if maximum or minimum size is > + * specified. Any needed reservations (for minimim size) are taken > + * taken when the subpool is created.
> +	 */
> +	if (config.nr_blocks != -1 || config.min_size != 0) {
> +		sbinfo->spool = hugepage_new_subpool(config.hstate,
> +						     config.nr_blocks,
> +						     config.min_size);

And hugepage_new_subpool() takes something in units of nr_blocks and
copies it into something which has units of nr-hugepages. And it takes
an arg called "size" which is no longer number-of-bytes but is actually
number-of-hpages.

It's all rather confusing and unclear. A good philosophy would be
never to use a variable called "size", because the reader doesn't know
what units that size is measured in. Instead, make sure that the name
reflects the variable's units. max_bytes, min_hpages, nr_blocks, etc.

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 18 Mar 2015 14:41:08 -0700
From: Andrew Morton
To: Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim
Subject: Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option
Message-Id: <20150318144108.e235862e0be30ff626e01820@linux-foundation.org>
In-Reply-To: <3c82f2203e5453ddf3b29431863034afc7699303.1426549011.git.mike.kravetz@oracle.com>

On Mon, 16 Mar 2015 16:53:29 -0700 Mike Kravetz wrote:

> Update documentation for the hugetlbfs min_size mount option.
> > Signed-off-by: Mike Kravetz > --- > Documentation/vm/hugetlbpage.txt | 21 ++++++++++++++------- > 1 file changed, 14 insertions(+), 7 deletions(-) > > diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt > index f2d3a10..83c0305 100644 > --- a/Documentation/vm/hugetlbpage.txt > +++ b/Documentation/vm/hugetlbpage.txt > @@ -267,8 +267,8 @@ call, then it is required that system administrator mount a file system of > type hugetlbfs: > > mount -t hugetlbfs \ > - -o uid=,gid=,mode=,size=,nr_inodes= \ > - none /mnt/huge > + -o uid=,gid=,mode=,size=,min_size=, \ > + nr_inodes= none /mnt/huge > > This command mounts a (pseudo) filesystem of type hugetlbfs on the directory > /mnt/huge. Any files created on /mnt/huge uses huge pages. The uid and gid > @@ -277,11 +277,18 @@ the uid and gid of the current process are taken. The mode option sets the > mode of root of file system to value & 01777. This value is given in octal. > By default the value 0755 is picked. The size option sets the maximum value of > memory (huge pages) allowed for that filesystem (/mnt/huge). The size is > -rounded down to HPAGE_SIZE. The option nr_inodes sets the maximum number of > -inodes that /mnt/huge can use. If the size or nr_inodes option is not > -provided on command line then no limits are set. For size and nr_inodes > -options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For > -example, size=2K has the same meaning as size=2048. > +rounded down to HPAGE_SIZE. The min_size option sets the minimum value of > +memory (huge pages) allowed for the filesystem. Like the size option, > +min_size is rounded down to HPAGE_SIZE. At mount time, the number of huge > +pages specified by min_size are reserved for use by the filesystem. If > +there are not enough free huge pages available, the mount will fail. 
As > +huge pages are allocated to the filesystem and freed, the reserve count > +is adjusted so that the sum of allocated and reserved huge pages is always > +at least min_size. The option nr_inodes sets the maximum number of > +inodes that /mnt/huge can use. If the size, min_size or nr_inodes option > +is not provided on command line then no limits are set. For size, min_size > +and nr_inodes options, you can use [G|g]/[M|m]/[K|k] to represent > +giga/mega/kilo. For example, size=2K has the same meaning as size=2048. Nowhere here is the reader told the units of "size". We should at least describe that, and maybe even rename the thing to min_bytes. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965036AbbCRVoA (ORCPT ); Wed, 18 Mar 2015 17:44:00 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:38911 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933337AbbCRVn6 (ORCPT ); Wed, 18 Mar 2015 17:43:58 -0400 Date: Wed, 18 Mar 2015 14:43:57 -0700 From: Andrew Morton To: Mike Kravetz Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso , Aneesh Kumar , Joonsoo Kim Subject: Re: [PATCH V2 2/4] hugetlbfs: add minimum size accounting to subpools Message-Id: <20150318144357.0e7e25cdca5066c39032bae6@linux-foundation.org> In-Reply-To: <464e43df640c54408ed78d1397ad8148784e4ecc.1426549011.git.mike.kravetz@oracle.com> References: <464e43df640c54408ed78d1397ad8148784e4ecc.1426549011.git.mike.kravetz@oracle.com> X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 16 Mar 2015 16:53:27 -0700 Mike Kravetz wrote: > The same routines that perform subpool maximum size accounting > hugepage_subpool_get/put_pages() are modified to also 
> perform minimum size accounting. When a delta value is passed to these
> routines, calculate how global reservations must be adjusted
> to maintain the subpool minimum size. The routines now return
> this global reserve count adjustment. This global adjusted
> reserve count is then passed to the global accounting routine
> hugetlb_acct_memory().
>

The comment layout is a bit chaotic. Also, sentences start with
capital letters and end with little round things!

It's a bit anal but heck, the kernel isn't written in linglish.

--- a/mm/hugetlb.c~hugetlbfs-add-minimum-size-accounting-to-subpools-fix
+++ a/mm/hugetlb.c
@@ -125,8 +125,10 @@ static long hugepage_subpool_get_pages(s
 	if (spool->min_hpages) {		/* minimum size accounting */
 		if (delta > spool->rsv_hpages) {
-			/* asking for more reserves than those already taken
-			 * on behalf of subpool. return difference */
+			/*
+			 * Asking for more reserves than those already taken on
+			 * behalf of subpool. Return difference.
+			 */
 			ret = delta - spool->rsv_hpages;
 			spool->rsv_hpages = 0;
 		} else {
@@ -141,7 +143,7 @@ unlock_ret:
 }
 
 /*
- * subpool accounting for freeing and unreserving pages
+ * Subpool accounting for freeing and unreserving pages.
  * Return the number of global page reservations that must be dropped.
  * The return value may only be different than the passed value (delta)
  * in the case where a subpool minimum size must be maintained.
@@ -170,8 +172,10 @@ static long hugepage_subpool_put_pages(s
 		spool->rsv_hpages = spool->min_hpages;
 	}
 
-	/* If hugetlbfs_put_super couldn't free spool due to
-	 * an outstanding quota reference, free it now. */
+	/*
+	 * If hugetlbfs_put_super couldn't free spool due to an outstanding
+	 * quota reference, free it now.
+	 */
 	unlock_or_release_subpool(spool);
 
 	return ret;
@@ -923,9 +927,9 @@ void free_huge_page(struct page *page)
 	ClearPagePrivate(page);
 
 	/*
-	 * A return code of zero implies that the subpool will be under
-	 * it's minimum size if the reservation is not restored after
-	 * page is free. Therefore, force restore_reserve operation.
+	 * A return code of zero implies that the subpool will be under its
+	 * minimum size if the reservation is not restored after page is free.
+	 * Therefore, force restore_reserve operation.
 	 */
 	if (hugepage_subpool_put_pages(spool, 1) == 0)
 		restore_reserve = true;
@@ -2523,8 +2527,8 @@ static void hugetlb_vm_op_close(struct v
 
 	if (reserve) {
 		/*
-		 * decrement reserve counts. The global reserve count
-		 * may be adjusted if the subpool has a minimum size.
+		 * Decrement reserve counts. The global reserve count may be
+		 * adjusted if the subpool has a minimum size.
 		 */
 		gbl_reserve = hugepage_subpool_put_pages(spool, reserve);
 		hugetlb_acct_memory(h, -gbl_reserve);
_

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <550A2797.3000708@oracle.com>
Date: Wed, 18 Mar 2015 18:34:15 -0700
From: Mike Kravetz
To: Andrew Morton
CC: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim
Subject: Re: [PATCH V2 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly
References: <20150318144054.c099e8a5e462303eea707252@linux-foundation.org>
In-Reply-To: <20150318144054.c099e8a5e462303eea707252@linux-foundation.org>

On 03/18/2015 02:40 PM, Andrew Morton wrote:
> On Mon, 16 Mar 2015 16:53:28 -0700 Mike Kravetz wrote:
>
>> Make 'min_size=' be an option when mounting a hugetlbfs. This option
>> takes the same value as the 'size' option. min_size can be specified
>> without specifying size. If both are specified, min_size must be less
>> than or equal to size else the mount will fail. If min_size is
>> specified, then at mount time an attempt is made to reserve min_size
>> pages. If the reservation fails, the mount fails. At umount time,
>> the reserved pages are released.
>>
>> ...
>>
>> @@ -761,14 +763,32 @@ static const struct super_operations hugetlbfs_ops = {
>>  	.show_options	= generic_show_options,
>>  };
>>
>> +enum { NO_SIZE, SIZE_STD, SIZE_PERCENT };
>> +
>> +static bool
>> +hugetlbfs_options_setsize(struct hstate *h, long long *size, int setsize)
>> +{
>> +	if (setsize == NO_SIZE)
>> +		return false;
>> +
>> +	if (setsize == SIZE_PERCENT) {
>> +		*size <<= huge_page_shift(h);
>> +		*size *= h->max_huge_pages;
>> +		do_div(*size, 100);
>
> I suppose do_div() takes a long long. u64 would be more conventional.
> I don't *think* all this code needed to use signed types.
>
>> +	}
>> +
>> +	*size >>= huge_page_shift(h);
>> +	return true;
>> +}
>> +
>>  static int
>>  hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>>  {
>>  	char *p, *rest;
>>  	substring_t args[MAX_OPT_ARGS];
>>  	int option;
>> -	unsigned long long size = 0;
>> -	enum { NO_SIZE, SIZE_STD, SIZE_PERCENT } setsize = NO_SIZE;
>> +	unsigned long long max_size = 0, min_size = 0;
>> +	int max_setsize = NO_SIZE, min_setsize = NO_SIZE;
>>
>>  	if (!options)
>>  		return 0;
>> @@ -806,10 +826,10 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>>  			/* memparse() will accept a K/M/G without a digit */
>>  			if (!isdigit(*args[0].from))
>>  				goto bad_val;
>> -			size = memparse(args[0].from, &rest);
>> -			setsize = SIZE_STD;
>> +			max_size = memparse(args[0].from, &rest);
>> +			max_setsize = SIZE_STD;
>>  			if (*rest == '%')
>> -				setsize = SIZE_PERCENT;
>> +				max_setsize = SIZE_PERCENT;
>>  			break;
>>  		}
>>
>> @@ -832,6 +852,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>>  			break;
>>  		}
>>
>> +		case Opt_min_size: {
>> +			/* memparse() will accept a K/M/G without a digit */
>> +			if (!isdigit(*args[0].from))
>> +				goto bad_val;
>> +			min_size = memparse(args[0].from, &rest);
>> +			min_setsize = SIZE_STD;
>> +			if (*rest == '%')
>> +				min_setsize = SIZE_PERCENT;
>> +			break;
>> +		}
>> +
>>  		default:
>>  			pr_err("Bad mount option: \"%s\"\n", p);
>>  			return -EINVAL;
>> @@ -839,15 +870,17 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
>>  		}
>>  	}
>>
>> -	/* Do size after hstate is set up */
>> -	if (setsize > NO_SIZE) {
>> -		struct hstate *h = pconfig->hstate;
>> -		if (setsize == SIZE_PERCENT) {
>> -			size <<= huge_page_shift(h);
>> -			size *= h->max_huge_pages;
>> -			do_div(size, 100);
>> -		}
>> -		pconfig->nr_blocks = (size >> huge_page_shift(h));
>> +	/* Calculate number of huge pages based on hstate */
>> +	if (hugetlbfs_options_setsize(pconfig->hstate, &max_size, max_setsize))
>> +		pconfig->nr_blocks = max_size;
>
> So hugetlbfs_options_setsize takes an arg which is in units of bytes,
> modifies it in-place to be in units of pages and then copies it into
> something which is in units of nr_blocks.
>
>
>> +	if (hugetlbfs_options_setsize(pconfig->hstate, &min_size, min_setsize))
>> +		pconfig->min_size = min_size;
>> +
>> +	/* If max_size specified, then min_size must be smaller */
>> +	if (max_setsize > NO_SIZE && min_setsize > NO_SIZE &&
>> +	    pconfig->min_size > pconfig->nr_blocks) {
>> +		pr_err("minimum size can not be greater than maximum size\n");
>> +		return -EINVAL;
>>  	}
>>
>>  	return 0;
>> @@ -872,6 +905,7 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
>>  	config.gid = current_fsgid();
>>  	config.mode = 0755;
>>  	config.hstate = &default_hstate;
>> +	config.min_size = 0; /* No default minimum size */
>>  	ret = hugetlbfs_parse_options(data, &config);
>>  	if (ret)
>>  		return ret;
>> @@ -885,8 +919,15 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
>>  	sbinfo->max_inodes = config.nr_inodes;
>>  	sbinfo->free_inodes = config.nr_inodes;
>>  	sbinfo->spool = NULL;
>> -	if (config.nr_blocks != -1) {
>> -		sbinfo->spool = hugepage_new_subpool(config.nr_blocks);
>> +	/*
>> +	 * Allocate and initialize subpool if maximum or minimum size is
>> +	 * specified. Any needed reservations (for minimum size) are taken
>> +	 * when the subpool is created.
>> +	 */
>> +	if (config.nr_blocks != -1 || config.min_size != 0) {
>> +		sbinfo->spool = hugepage_new_subpool(config.hstate,
>> +							config.nr_blocks,
>> +							config.min_size);
>
> And hugepage_new_subpool() takes something in units of nr_blocks and
> copies it into something which has units of nr-hugepages.
>
> And it takes an arg called "size" which is no longer number-of-bytes
> but is actually number-of-hpages.
>
>
> It's all rather confusing and unclear. A good philosophy would be
> never to use a variable called "size", because the reader doesn't know
> what units that size is measured in.
> Instead, make sure that the name
> reflects the variable's units. max_bytes, min_hpages, nr_blocks, etc.
>

Thanks for the comments.

I didn't want to cut/paste/duplicate the code used to parse the
existing size option. But, it looks like I made it harder to
understand. I'll take a pass at cleaning this up and making it
more clear.

--
Mike Kravetz

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <550A2B9A.3060905@oracle.com>
Date: Wed, 18 Mar 2015 18:51:22 -0700
From: Mike Kravetz
To: Andrew Morton
CC: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim
Subject: Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option
References: <3c82f2203e5453ddf3b29431863034afc7699303.1426549011.git.mike.kravetz@oracle.com> <20150318144108.e235862e0be30ff626e01820@linux-foundation.org>
In-Reply-To: <20150318144108.e235862e0be30ff626e01820@linux-foundation.org>

On 03/18/2015 02:41 PM, Andrew Morton wrote:
> On Mon, 16 Mar 2015 16:53:29 -0700 Mike Kravetz wrote:
>
>> Update documentation for the hugetlbfs min_size mount option.
>>
>> Signed-off-by: Mike Kravetz
>> ---
>>  Documentation/vm/hugetlbpage.txt | 21 ++++++++++++++-------
>>  1 file changed, 14 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
>> index f2d3a10..83c0305 100644
>> --- a/Documentation/vm/hugetlbpage.txt
>> +++ b/Documentation/vm/hugetlbpage.txt
>> @@ -267,8 +267,8 @@ call, then it is required that system administrator mount a file system of
>>  type hugetlbfs:
>>
>>  	mount -t hugetlbfs \
>> -		-o uid=,gid=,mode=,size=,nr_inodes= \
>> -		none /mnt/huge
>> +		-o uid=,gid=,mode=,size=,min_size=, \
>> +		nr_inodes= none /mnt/huge
>>
>>  This command mounts a (pseudo) filesystem of type hugetlbfs on the directory
>>  /mnt/huge. Any files created on /mnt/huge uses huge pages. The uid and gid
>> @@ -277,11 +277,18 @@ the uid and gid of the current process are taken. The mode option sets the
>>  mode of root of file system to value & 01777. This value is given in octal.
>>  By default the value 0755 is picked. The size option sets the maximum value of
>>  memory (huge pages) allowed for that filesystem (/mnt/huge). The size is
>> -rounded down to HPAGE_SIZE. The option nr_inodes sets the maximum number of
>> -inodes that /mnt/huge can use. If the size or nr_inodes option is not
>> -provided on command line then no limits are set. For size and nr_inodes
>> -options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For
>> -example, size=2K has the same meaning as size=2048.
>> +rounded down to HPAGE_SIZE. The min_size option sets the minimum value of
>> +memory (huge pages) allowed for the filesystem. Like the size option,
>> +min_size is rounded down to HPAGE_SIZE. At mount time, the number of huge
>> +pages specified by min_size are reserved for use by the filesystem. If
>> +there are not enough free huge pages available, the mount will fail. As
>> +huge pages are allocated to the filesystem and freed, the reserve count
>> +is adjusted so that the sum of allocated and reserved huge pages is always
>> +at least min_size. The option nr_inodes sets the maximum number of
>> +inodes that /mnt/huge can use. If the size, min_size or nr_inodes option
>> +is not provided on command line then no limits are set. For size, min_size
>> +and nr_inodes options, you can use [G|g]/[M|m]/[K|k] to represent
>> +giga/mega/kilo. For example, size=2K has the same meaning as size=2048.
>
> Nowhere here is the reader told the units of "size". We should at
> least describe that, and maybe even rename the thing to min_bytes.
>

Ok, I will add that the size is in units of bytes. My choice of
'min_size' as a name for the new mount option was influenced by
the existing 'size' mount option. I'm open to any suggestions
for the name of this new mount option.

--
Mike Kravetz

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 18 Mar 2015 19:23:24 -0700
From: Andrew Morton
To: Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim
Subject: Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option
Message-Id: <20150318192324.e0386907.akpm@linux-foundation.org>
In-Reply-To: <550A2B9A.3060905@oracle.com>
References: <3c82f2203e5453ddf3b29431863034afc7699303.1426549011.git.mike.kravetz@oracle.com> <20150318144108.e235862e0be30ff626e01820@linux-foundation.org> <550A2B9A.3060905@oracle.com>

On Wed, 18 Mar 2015 18:51:22 -0700 Mike Kravetz wrote:

> > Nowhere here is the reader told the units of "size". We should at
> > least describe that, and maybe even rename the thing to min_bytes.
> >
>
> Ok, I will add that the size is in units of bytes. My choice of
> 'min_size' as a name for the new mount option was influenced by
> the existing 'size' mount option. I'm open to any suggestions
> for the name of this new mount option.

Yes, due to the preexisting "size" I think we're stuck with "min_size".
We could use min_size_bytes I guess, but the operator needs to go look
up the units of "size" anyway.

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <550C49B0.6070600@oracle.com>
Date: Fri, 20 Mar 2015 09:24:16 -0700
From: Mike Kravetz
To: Andrew Morton
CC: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Davidlohr Bueso, Aneesh Kumar, Joonsoo Kim
Subject: Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option
References: <3c82f2203e5453ddf3b29431863034afc7699303.1426549011.git.mike.kravetz@oracle.com> <20150318144108.e235862e0be30ff626e01820@linux-foundation.org> <550A2B9A.3060905@oracle.com> <20150318192324.e0386907.akpm@linux-foundation.org>
In-Reply-To: <20150318192324.e0386907.akpm@linux-foundation.org>

On 03/18/2015 07:23 PM, Andrew Morton wrote:
> On Wed, 18 Mar 2015 18:51:22 -0700 Mike Kravetz wrote:
>
>>> Nowhere here is the reader told the units of "size". We should at
>>> least describe that, and maybe even rename the thing to min_bytes.
>>>
>>
>> Ok, I will add that the size is in units of bytes. My choice of
>> 'min_size' as a name for the new mount option was influenced by
>> the existing 'size' mount option. I'm open to any suggestions
>> for the name of this new mount option.
>
> Yes, due to the preexisting "size" I think we're stuck with "min_size".
> We could use min_size_bytes I guess, but the operator needs to go look
> up the units of "size" anyway.
>

Well, the existing size option can also be specified as a percentage
of the huge page pool size. This is in the current code.

There is a mount option 'pagesize=' that allows one to select which
huge page (size) pool should be used. If none is specified, the
default huge page pool is used.

There is no documentation for this pagesize option or for using size
to specify a percentage of the huge page pool size. I'll add this to
the hugetlbpage.txt documentation.

--
Mike Kravetz
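As a rough illustration of the unit handling discussed in this thread (a size or min_size value given either in bytes or as a percentage of the huge page pool, converted to a whole number of huge pages, with min_size checked against size), here is a minimal userspace C sketch. It is not the kernel code: the helper names option_to_hpages() and check_min_max(), the uint64_t types and the example page shift are assumptions for illustration. It follows the arithmetic of hugetlbfs_options_setsize() quoted in patch 3/4, starts from a value that has already been parsed (the kernel uses memparse() for the K/M/G suffixes), and uses unit-bearing variable names as Andrew suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the NO_SIZE / SIZE_STD / SIZE_PERCENT enum from the patch. */
enum size_kind { NO_SIZE, SIZE_STD, SIZE_PERCENT };

/*
 * Hypothetical helper: convert an already-parsed size or min_size value
 * into a number of huge pages.
 *   SIZE_STD:     'value' is a byte count (e.g. the parsed form of "101M").
 *   SIZE_PERCENT: 'value' is a percentage of the pool (max_huge_pages).
 * Returns false, leaving *nr_hpages untouched, if the option was absent.
 */
static bool option_to_hpages(enum size_kind kind, uint64_t value,
                             unsigned int hpage_shift,
                             uint64_t max_huge_pages, uint64_t *nr_hpages)
{
    uint64_t nr_bytes;

    if (kind == NO_SIZE)
        return false;

    if (kind == SIZE_PERCENT)
        /* Same order of operations as the patch; huge inputs could overflow. */
        nr_bytes = ((value << hpage_shift) * max_huge_pages) / 100;
    else
        nr_bytes = value;

    /* Shifting right by the huge page shift rounds down to whole pages. */
    *nr_hpages = nr_bytes >> hpage_shift;
    return true;
}

/*
 * Hypothetical helper: when both options are given, the minimum must not
 * exceed the maximum (the patch fails the mount with -EINVAL in this case).
 * Returns 0 if the combination is acceptable, -1 otherwise.
 */
static int check_min_max(bool have_max, uint64_t max_hpages,
                         bool have_min, uint64_t min_hpages)
{
    if (have_max && have_min && min_hpages > max_hpages)
        return -1;
    return 0;
}
```

For example, assuming 2 MB huge pages (hpage_shift = 21) and a 512-page pool, size=50% gives 256 huge pages, while min_size=101M (101 << 20 bytes) rounds down to 50 huge pages, so check_min_max() accepts the pair. Keeping the byte count and the page count in separately named variables (nr_bytes, nr_hpages) avoids the in-place reuse of a single "size" variable for both units that the review objects to.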