* [RFC v2 PATCH 0/5] hugetlbfs: add fallocate support
From: Mike Kravetz @ 2015-04-23 22:13 UTC
To: linux-mm, linux-kernel
Cc: Dave Hansen, Naoya Horiguchi, David Rientjes, Hugh Dickins,
Davidlohr Bueso, Aneesh Kumar, Hillf Danton, Christoph Hellwig,
Mike Kravetz
hugetlbfs is used today by applications that want a high degree of
control over huge page usage. Often, large hugetlbfs files are used
to map a large number of huge pages into the application processes.
The applications know when page ranges within these large files will
no longer be used, and ideally would like to release them back to
the subpool or global pools for other uses. The fallocate() system
call provides an interface for preallocation and hole punching within
files. This patch set adds fallocate functionality to hugetlbfs.
RFC v2:
Addressed alignment and error handling issues noticed by Hillf Danton
New region_del() routine for region tracking/resv_map of ranges
Fixed several issues found during more extensive testing
Error handling in region_del() when kmalloc() fails still needs
to be addressed
madvise remove (MADV_REMOVE) support is retained
Mike Kravetz (5):
hugetlbfs: truncate_hugepages() takes a range of pages
hugetlbfs: remove region_truncte() as region_del() can be used
hugetlbfs: New huge_add_to_page_cache helper routine
hugetlbfs: add hugetlbfs_fallocate()
mm: madvise allow remove operation for hugetlbfs
fs/hugetlbfs/inode.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++--
include/linux/hugetlb.h | 8 ++-
mm/hugetlb.c | 110 ++++++++++++++++++++++---------
mm/madvise.c | 2 +-
4 files changed, 248 insertions(+), 41 deletions(-)
--
2.1.0
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [RFC v2 PATCH 1/5] hugetlbfs: truncate_hugepages() takes a range of pages
From: Mike Kravetz @ 2015-04-23 22:13 UTC
To: linux-mm, linux-kernel
Cc: Dave Hansen, Naoya Horiguchi, David Rientjes, Hugh Dickins,
Davidlohr Bueso, Aneesh Kumar, Hillf Danton, Christoph Hellwig,
Mike Kravetz
Modify truncate_hugepages() to take a range of pages (start, end)
instead of simply start. If the value of end is -1, this indicates
the end of the range is the end of the file. This functionality
will be used for fallocate hole punching.
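To make the new range semantics concrete, here is a minimal user-space sketch (not kernel code) of how a byte range maps to huge page indices, how each lookup batch is clamped so it never asks for pages beyond 'end', and which cached pages fall inside the range. All names are hypothetical: SKETCH_PAGEVEC_SIZE stands in for PAGEVEC_SIZE with an illustrative value, and a sorted array of indices stands in for the page cache.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Illustrative stand-in for PAGEVEC_SIZE (value chosen for the sketch). */
#define SKETCH_PAGEVEC_SIZE 14UL

/* Byte offset -> huge page index. An lend of -1 means "through end of file"
 * and maps to ULONG_MAX; the patch reaches the same value because GCC's
 * >> on a negative loff_t is arithmetic and the result is stored in a
 * pgoff_t. */
static unsigned long byte_to_index(long long lbyte, unsigned int shift)
{
    assert(shift > 0 && shift < 63);
    if (lbyte == -1)
        return ULONG_MAX;
    assert(lbyte >= 0);
    return (unsigned long)(lbyte >> shift);
}

/* Number of pages to request in the next lookup: never more than remain
 * before 'end'. The patch keeps a running lookup_nr that only shrinks;
 * this sketch simply recomputes it. */
static unsigned long lookup_batch(unsigned long next, unsigned long end)
{
    unsigned long nr = SKETCH_PAGEVEC_SIZE;

    assert(next < end);
    if (end - next < nr)
        nr = end - next;
    return nr;
}

/* Count cached page indices inside [start, end), i.e. the pages a range
 * truncate would remove. 'cached' must be sorted in ascending order. */
static size_t pages_in_range(const unsigned long *cached, size_t n,
                             unsigned long start, unsigned long end)
{
    size_t freed = 0;

    for (size_t i = 0; i < n; i++) {
        if (cached[i] >= end)
            break;              /* same stop condition as page->index >= end */
        if (cached[i] >= start)
            freed++;
    }
    return freed;
}
```

With end set to ULONG_MAX (the -1 case), every cached page at or after start is counted, which is the old whole-file truncate behavior.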
Downstream of truncate_hugepages(), the routine hugetlb_unreserve_pages()
must also be modified to accept a range of pages.
A new region tracking/resv_map routine region_del() is added to delete
a range of regions within the reserve maps. As in truncate_hugepages,
a range end value of -1 indicates all regions after the starting value
should be deleted.
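The region_del() cases (skip, split, remove whole region, trim front, trim back) can be modeled in user space with a sorted array of half-open [from, to) regions. This is a sketch under stated assumptions, not the patch's code: the list and spinlock are replaced by a fixed-size array (MAX_REGIONS and all names are hypothetical), and a split that finds no free slot returns -1, loosely mirroring the still-unresolved kmalloc() failure path.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_REGIONS 16          /* hypothetical fixed capacity */

struct region { long from, to; };                       /* half-open [from, to) */
struct resv { struct region r[MAX_REGIONS]; int n; };   /* sorted, disjoint */

static void remove_at(struct resv *m, int i)
{
    memmove(&m->r[i], &m->r[i + 1],
            (size_t)(m->n - i - 1) * sizeof(m->r[0]));
    m->n--;
}

/* Delete [f, t) from the map and return how many pages were removed, or -1
 * if a split needs a slot that is not available. t == -1 means "to the end". */
static long region_del_sketch(struct resv *m, long f, long t)
{
    long chg = 0;
    int i = 0;

    assert(m->n >= 0 && m->n <= MAX_REGIONS);
    if (t == -1)
        t = LONG_MAX;
    while (i < m->n) {
        struct region *rg = &m->r[i];

        if (rg->to <= f) {                      /* ends before the range */
            i++;
            continue;
        }
        if (rg->from >= t)                      /* starts after the range */
            break;
        if (f > rg->from && t < rg->to) {       /* range inside: split */
            if (m->n == MAX_REGIONS)
                return -1;
            memmove(&m->r[i + 2], &m->r[i + 1],
                    (size_t)(m->n - i - 1) * sizeof(m->r[0]));
            m->r[i + 1].from = t;               /* tail of the split region */
            m->r[i + 1].to = rg->to;
            rg->to = f;                         /* head is trimmed in place */
            m->n++;
            chg += t - f;
            break;
        }
        if (f <= rg->from && t >= rg->to) {     /* range covers region */
            chg += rg->to - rg->from;
            remove_at(m, i);                    /* slot i now holds the next one */
            continue;
        }
        if (f <= rg->from) {                    /* trim the front */
            chg += t - rg->from;
            rg->from = t;
        } else {                                /* trim the back */
            chg += rg->to - f;
            rg->to = f;
        }
        i++;
    }
    return chg;
}
```

Starting from { [0,10), [20,30), [40,50) }, deleting [5,25) trims two regions and removes 10 pages, deleting [42,45) splits [40,50) into [40,42) and [45,50), and deleting (0, -1) empties the map.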
Based-on code-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
fs/hugetlbfs/inode.c | 31 +++++++++++++++-----
include/linux/hugetlb.h | 3 +-
mm/hugetlb.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 100 insertions(+), 10 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c274aca..2faf2c4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -324,19 +324,32 @@ static void truncate_huge_page(struct page *page)
delete_from_page_cache(page);
}
-static void truncate_hugepages(struct inode *inode, loff_t lstart)
+static void truncate_hugepages(struct inode *inode, loff_t lstart, loff_t lend)
{
struct hstate *h = hstate_inode(inode);
struct address_space *mapping = &inode->i_data;
const pgoff_t start = lstart >> huge_page_shift(h);
+ const pgoff_t end = lend >> huge_page_shift(h);
struct pagevec pvec;
pgoff_t next;
int i, freed = 0;
+ long lookup_nr = PAGEVEC_SIZE;
pagevec_init(&pvec, 0);
next = start;
- while (1) {
- if (!pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
+ while (next < end) {
+ /*
+ * Make sure to never grab more pages than we
+ * might possibly need.
+ */
+ if (end - next < lookup_nr)
+ lookup_nr = end - next;
+
+ /*
+ * This pagevec_lookup() may return pages past 'end',
+ * so we must check for page->index >= end.
+ */
+ if (!pagevec_lookup(&pvec, mapping, next, lookup_nr)) {
if (next == start)
break;
next = start;
@@ -347,6 +360,11 @@ static void truncate_hugepages(struct inode *inode, loff_t lstart)
struct page *page = pvec.pages[i];
lock_page(page);
+ if (page->index >= end) {
+ unlock_page(page);
+ next = end; /* we are done */
+ break;
+ }
if (page->index > next)
next = page->index;
++next;
@@ -356,15 +374,14 @@ static void truncate_hugepages(struct inode *inode, loff_t lstart)
}
huge_pagevec_release(&pvec);
}
- BUG_ON(!lstart && mapping->nrpages);
- hugetlb_unreserve_pages(inode, start, freed);
+ hugetlb_unreserve_pages(inode, start, end, freed);
}
static void hugetlbfs_evict_inode(struct inode *inode)
{
struct resv_map *resv_map;
- truncate_hugepages(inode, 0);
+ truncate_hugepages(inode, 0, -1);
resv_map = (struct resv_map *)inode->i_mapping->private_data;
/* root inode doesn't have the resv_map, so we should check it */
if (resv_map)
@@ -410,7 +427,7 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
if (!RB_EMPTY_ROOT(&mapping->i_mmap))
hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
i_mmap_unlock_write(mapping);
- truncate_hugepages(inode, offset);
+ truncate_hugepages(inode, offset, -1);
return 0;
}
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 7b57850..de39705 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -75,7 +75,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
int hugetlb_reserve_pages(struct inode *inode, long from, long to,
struct vm_area_struct *vma,
vm_flags_t vm_flags);
-void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
+void hugetlb_unreserve_pages(struct inode *inode, long start, long end,
+ long freed);
int dequeue_hwpoisoned_huge_page(struct page *page);
bool isolate_huge_page(struct page *page, struct list_head *list);
void putback_active_hugepage(struct page *page);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c41b2a0..31e36cd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -257,6 +257,77 @@ out_nrg:
return chg;
}
+static long region_del(struct resv_map *resv, long f, long t)
+{
+ struct list_head *head = &resv->regions;
+ struct file_region *rg, *trg;
+ struct file_region *nrg = NULL;
+ long chg = 0;
+
+ /*
+ * Locate segments we overlap and either split, remove or
+ * trim the existing regions. The end of region (t) == -1
+ * indicates all remaining regions. Special case t == -1 as
+ * all comparisons are signed.
+ */
+ if (t == -1)
+ t = LONG_MAX;
+retry:
+ spin_lock(&resv->lock);
+ list_for_each_entry_safe(rg, trg, head, link) {
+ if (rg->to <= f)
+ continue;
+ if (rg->from >= t)
+ break;
+
+ if (f > rg->from && t < rg->to) { /* must split region */
+ if (!nrg) {
+ spin_unlock(&resv->lock);
+ nrg = kmalloc(sizeof(*nrg),
+ GFP_KERNEL | __GFP_REPEAT);
+ if (!nrg) {
+ /* FIXME FIXME FIXME FIXME */
+ return -ENOMEM;
+ }
+ goto retry;
+ }
+
+ chg += t - f;
+
+ /* new entry for end of split region */
+ nrg->from = t;
+ nrg->to = rg->to;
+ INIT_LIST_HEAD(&nrg->link);
+
+ /* original entry is trimmed */
+ rg->to = f;
+
+ list_add(&nrg->link, &rg->link);
+ nrg = NULL;
+ break;
+ }
+
+ if (f <= rg->from && t >= rg->to) { /* remove entire region */
+ chg += rg->to - rg->from;
+ list_del(&rg->link);
+ kfree(rg);
+ continue;
+ }
+
+ if (f <= rg->from) { /* trim beginning of region */
+ chg += t - rg->from;
+ rg->from = t;
+ } else { /* trim end of region */
+ chg += rg->to - f;
+ rg->to = f;
+ }
+ }
+
+ spin_unlock(&resv->lock);
+ kfree(nrg);
+ return chg;
+}
+
static long region_truncate(struct resv_map *resv, long end)
{
struct list_head *head = &resv->regions;
@@ -3510,7 +3581,8 @@ out_err:
return ret;
}
-void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
+void hugetlb_unreserve_pages(struct inode *inode, long start, long end,
+ long freed)
{
struct hstate *h = hstate_inode(inode);
struct resv_map *resv_map = inode_resv_map(inode);
@@ -3518,7 +3590,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
struct hugepage_subpool *spool = subpool_inode(inode);
if (resv_map)
- chg = region_truncate(resv_map, offset);
+ chg = region_del(resv_map, start, end);
spin_lock(&inode->i_lock);
inode->i_blocks -= (blocks_per_huge_page(h) * freed);
spin_unlock(&inode->i_lock);
--
2.1.0
* [RFC v2 PATCH 2/5] hugetlbfs: remove region_truncte() as region_del() can be used
From: Mike Kravetz @ 2015-04-23 22:13 UTC
To: linux-mm, linux-kernel
Cc: Dave Hansen, Naoya Horiguchi, David Rientjes, Hugh Dickins,
Davidlohr Bueso, Aneesh Kumar, Hillf Danton, Christoph Hellwig,
Mike Kravetz
Now that region_del() exists, the region_truncate() routine can be
removed. Callers of region_truncate() are changed to call region_del()
instead, with an ending value of -1.
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
mm/hugetlb.c | 37 +------------------------------------
1 file changed, 1 insertion(+), 36 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 31e36cd..60a4f21 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -328,41 +328,6 @@ retry:
return chg;
}
-static long region_truncate(struct resv_map *resv, long end)
-{
- struct list_head *head = &resv->regions;
- struct file_region *rg, *trg;
- long chg = 0;
-
- spin_lock(&resv->lock);
- /* Locate the region we are either in or before. */
- list_for_each_entry(rg, head, link)
- if (end <= rg->to)
- break;
- if (&rg->link == head)
- goto out;
-
- /* If we are in the middle of a region then adjust it. */
- if (end > rg->from) {
- chg = rg->to - end;
- rg->to = end;
- rg = list_entry(rg->link.next, typeof(*rg), link);
- }
-
- /* Drop any remaining regions. */
- list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
- if (&rg->link == head)
- break;
- chg += rg->to - rg->from;
- list_del(&rg->link);
- kfree(rg);
- }
-
-out:
- spin_unlock(&resv->lock);
- return chg;
-}
-
static long region_count(struct resv_map *resv, long f, long t)
{
struct list_head *head = &resv->regions;
@@ -494,7 +459,7 @@ void resv_map_release(struct kref *ref)
struct resv_map *resv_map = container_of(ref, struct resv_map, refs);
/* Clear out any active regions before we release the map. */
- region_truncate(resv_map, 0);
+ region_del(resv_map, 0, -1);
kfree(resv_map);
}
--
2.1.0
* [RFC v2 PATCH 3/5] hugetlbfs: New huge_add_to_page_cache helper routine
From: Mike Kravetz @ 2015-04-23 22:13 UTC
To: linux-mm, linux-kernel
Cc: Dave Hansen, Naoya Horiguchi, David Rientjes, Hugh Dickins,
Davidlohr Bueso, Aneesh Kumar, Hillf Danton, Christoph Hellwig,
Mike Kravetz
Currently, there is only a single place where hugetlbfs pages are
added to the page cache. The new fallocate code will be adding a second
one, so break the functionality out into its own helper.
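A toy model of the helper's contract may help: the insert either succeeds and charges the inode one huge page worth of 512-byte blocks, or fails with -EEXIST and leaves the accounting untouched, so each caller can decide to retry (hugetlb_no_page()) or skip (fallocate). The struct and function names are hypothetical, and the block count assumes blocks_per_huge_page() equals the huge page size divided by 512.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NR_INDICES 8        /* hypothetical size of the toy cache */

/* Hypothetical stand-in for an inode plus its page cache. */
struct toy_inode {
    bool present[TOY_NR_INDICES];   /* which huge page indices are cached */
    unsigned long long i_blocks;    /* accounted in 512-byte blocks */
    unsigned long hpage_size;       /* huge page size in bytes */
};

/* Mirrors the helper's contract: on success the page is cached and the
 * inode is charged one huge page of blocks; on -EEXIST nothing changes. */
static int toy_add_to_page_cache(struct toy_inode *inode, unsigned long idx)
{
    assert(idx < TOY_NR_INDICES);
    if (inode->present[idx])
        return -EEXIST;
    inode->present[idx] = true;
    inode->i_blocks += inode->hpage_size / 512;
    return 0;
}
```

With 2 MB huge pages, a successful insert adds 4096 blocks, and a second insert at the same index returns -EEXIST without changing i_blocks.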
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
include/linux/hugetlb.h | 2 ++
mm/hugetlb.c | 27 ++++++++++++++++++---------
2 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index de39705..a8d9238 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -326,6 +326,8 @@ struct huge_bootmem_page {
struct page *alloc_huge_page_node(struct hstate *h, int nid);
struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
+int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
+ pgoff_t idx);
/* arch callback */
int __init alloc_bootmem_huge_page(struct hstate *h);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 60a4f21..23e2c6d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2995,6 +2995,23 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
return page != NULL;
}
+int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
+ pgoff_t idx)
+{
+ struct inode *inode = mapping->host;
+ struct hstate *h = hstate_inode(inode);
+ int err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
+
+ if (err)
+ return err;
+ ClearPagePrivate(page);
+
+ spin_lock(&inode->i_lock);
+ inode->i_blocks += blocks_per_huge_page(h);
+ spin_unlock(&inode->i_lock);
+ return 0;
+}
+
static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct address_space *mapping, pgoff_t idx,
unsigned long address, pte_t *ptep, unsigned int flags)
@@ -3041,21 +3058,13 @@ retry:
__SetPageUptodate(page);
if (vma->vm_flags & VM_MAYSHARE) {
- int err;
- struct inode *inode = mapping->host;
-
- err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
+ int err = huge_add_to_page_cache(page, mapping, idx);
if (err) {
put_page(page);
if (err == -EEXIST)
goto retry;
goto out;
}
- ClearPagePrivate(page);
-
- spin_lock(&inode->i_lock);
- inode->i_blocks += blocks_per_huge_page(h);
- spin_unlock(&inode->i_lock);
} else {
lock_page(page);
if (unlikely(anon_vma_prepare(vma))) {
--
2.1.0
* [RFC v2 PATCH 4/5] hugetlbfs: add hugetlbfs_fallocate()
From: Mike Kravetz @ 2015-04-23 22:13 UTC
To: linux-mm, linux-kernel
Cc: Dave Hansen, Naoya Horiguchi, David Rientjes, Hugh Dickins,
Davidlohr Bueso, Aneesh Kumar, Hillf Danton, Christoph Hellwig,
Mike Kravetz
This is based on the shmem version, but it has diverged quite
a bit. We have no swap to worry about, nor the new file sealing.
What this allows us to do is move physical memory in and out of
a hugetlbfs file without having it mapped. This also gives us
the ability to support MADV_REMOVE since it is currently
implemented using fallocate(). MADV_REMOVE lets madvise() remove
pages from the middle of a hugetlbfs file, which wasn't possible
before.
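hugetlbfs fallocate only operates on whole huge pages: preallocation
rounds the range outward to huge page boundaries, while hole punching
rounds it inward, so partially covered huge pages at either end of a
punched hole are left in place.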
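The rounding rules can be checked with a small user-space sketch that reproduces the arithmetic from hugetlbfs_fallocate() and hugetlbfs_punch_hole(). Function names are hypothetical, offsets use unsigned long long instead of loff_t, overflow near the top of the offset range is ignored, and the example assumes 2 MB huge pages (shift 21).

```c
#include <assert.h>

/* Preallocation: huge page indices [*start, *end). The start rounds down
 * and the end rounds up, so any partially covered huge page is allocated. */
static void prealloc_range(unsigned long long offset, unsigned long long len,
                           unsigned int shift,
                           unsigned long long *start, unsigned long long *end)
{
    unsigned long long hpage_size;

    assert(shift > 0 && shift < 64);
    hpage_size = 1ULL << shift;
    *start = offset >> shift;
    *end = (offset + len + hpage_size - 1) >> shift;
}

/* Hole punch: byte range [*hole_start, *hole_end). The start rounds up and
 * the end rounds down, so only huge pages wholly inside the request are
 * freed. Returns nonzero if at least one huge page would be freed. */
static int punch_range(unsigned long long offset, unsigned long long len,
                       unsigned int shift,
                       unsigned long long *hole_start,
                       unsigned long long *hole_end)
{
    unsigned long long hpage_size, mask;

    assert(shift > 0 && shift < 64);
    hpage_size = 1ULL << shift;
    mask = ~(hpage_size - 1);           /* plays the role of huge_page_mask() */
    *hole_start = (offset + hpage_size - 1) & mask;
    *hole_end = (offset + len) & mask;
    return *hole_end > *hole_start;
}
```

For example, with 2 MB pages a request covering bytes 1 MB to 3 MB preallocates huge pages 0 and 1 but punches nothing, because no huge page lies entirely inside it.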
Based-on code-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
fs/hugetlbfs/inode.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/hugetlb.h | 3 ++
mm/hugetlb.c | 2 +-
3 files changed, 142 insertions(+), 1 deletion(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 2faf2c4..a1de0ff 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -12,6 +12,7 @@
#include <linux/thread_info.h>
#include <asm/current.h>
#include <linux/sched.h> /* remove ASAP */
+#include <linux/falloc.h>
#include <linux/fs.h>
#include <linux/mount.h>
#include <linux/file.h>
@@ -377,6 +378,142 @@ static void truncate_hugepages(struct inode *inode, loff_t lstart, loff_t lend)
hugetlb_unreserve_pages(inode, start, end, freed);
}
+static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
+{
+ struct hstate *h = hstate_inode(inode);
+ unsigned long hpage_size = huge_page_size(h);
+ loff_t hole_start, hole_end;
+
+ /*
+ * For hole punch round up the beginning offset of the hole and
+ * round down the end.
+ */
+ hole_start = (offset + hpage_size - 1) & huge_page_mask(h);
+ hole_end = (offset + len) & huge_page_mask(h);
+
+ if ((u64)hole_end > (u64)hole_start) {
+ struct address_space *mapping = inode->i_mapping;
+
+ mutex_lock(&inode->i_mutex);
+ unmap_mapping_range(mapping, hole_start, hole_end, 0);
+ truncate_hugepages(inode, hole_start, hole_end);
+ mutex_unlock(&inode->i_mutex);
+ }
+
+ return 0;
+}
+
+static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
+ loff_t len)
+{
+ struct inode *inode = file_inode(file);
+ struct address_space *mapping = inode->i_mapping;
+ struct hstate *h = hstate_inode(inode);
+ struct vm_area_struct pseudo_vma;
+ unsigned long hpage_size = huge_page_size(h);
+ unsigned long hpage_shift = huge_page_shift(h);
+ pgoff_t start, index, end;
+ int error;
+
+ if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
+ return -EOPNOTSUPP;
+
+ if (mode & FALLOC_FL_PUNCH_HOLE)
+ return hugetlbfs_punch_hole(inode, offset, len);
+
+ /*
+ * Default preallocate case.
+ * For this range, start is rounded down and end is rounded up.
+ */
+ start = offset >> hpage_shift;
+ end = (offset + len + hpage_size - 1) >> hpage_shift;
+
+ mutex_lock(&inode->i_mutex);
+
+ /* We need to check rlimit even when FALLOC_FL_KEEP_SIZE */
+ error = inode_newsize_ok(inode, offset + len);
+ if (error)
+ goto out;
+
+ /*
+ * Initialize a pseudo vma that just contains the policy used
+ * when allocating the huge pages. The actual policy field
+ * (vm_policy) is determined based on the index in the loop below.
+ */
+ memset(&pseudo_vma, 0, sizeof(struct vm_area_struct));
+ pseudo_vma.vm_start = 0;
+ pseudo_vma.vm_flags |= (VM_HUGETLB | VM_MAYSHARE);
+ pseudo_vma.vm_file = file;
+
+ for (index = start; index < end; index++) {
+ /*
+ * This is supposed to be the vaddr where the page is being
+ * faulted in, but we have no vaddr here.
+ */
+ struct page *page;
+ unsigned long addr;
+ int avoid_reserve = 1;
+
+ cond_resched();
+
+ /*
+ * fallocate(2) manpage permits EINTR; we may have been
+ * interrupted because we are using up too much memory.
+ */
+ if (signal_pending(current)) {
+ error = -EINTR;
+ break;
+ }
+ page = find_get_page(mapping, index);
+ if (page) {
+ put_page(page);
+ continue;
+ }
+
+ /* Get policy based on index */
+ pseudo_vma.vm_policy =
+ mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy,
+ index);
+
+ /* addr is the offset within the file (zero based) */
+ addr = index * hpage_size;
+ page = alloc_huge_page(&pseudo_vma, addr, avoid_reserve);
+ mpol_cond_put(pseudo_vma.vm_policy);
+ if (IS_ERR(page)) {
+ error = PTR_ERR(page);
+ goto out;
+ }
+ clear_huge_page(page, addr, pages_per_huge_page(h));
+ __SetPageUptodate(page);
+ error = huge_add_to_page_cache(page, mapping, index);
+ if (error) {
+ put_page(page);
+ /* Keep going if we see an -EEXIST */
+ if (error == -EEXIST)
+ continue;
+ else
+ goto out;
+ }
+
+ /*
+ * put_page due to reference from alloc_huge_page()
+ * unlock_page because locked by add_to_page_cache()
+ */
+ put_page(page);
+ unlock_page(page);
+ }
+
+ if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
+ i_size_write(inode, offset + len);
+ inode->i_ctime = CURRENT_TIME;
+ spin_lock(&inode->i_lock);
+ inode->i_private = NULL;
+ spin_unlock(&inode->i_lock);
+out:
+ mutex_unlock(&inode->i_mutex);
+ return error;
+}
+
static void hugetlbfs_evict_inode(struct inode *inode)
{
struct resv_map *resv_map;
@@ -743,6 +880,7 @@ const struct file_operations hugetlbfs_file_operations = {
.fsync = noop_fsync,
.get_unmapped_area = hugetlb_get_unmapped_area,
.llseek = default_llseek,
+ .fallocate = hugetlbfs_fallocate,
};
static const struct inode_operations hugetlbfs_dir_inode_operations = {
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a8d9238..65ed626 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -323,6 +323,8 @@ struct huge_bootmem_page {
#endif
};
+struct page *alloc_huge_page(struct vm_area_struct *vma,
+ unsigned long addr, int avoid_reserve);
struct page *alloc_huge_page_node(struct hstate *h, int nid);
struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
@@ -477,6 +479,7 @@ static inline bool hugepages_supported(void)
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
+#define alloc_huge_page(v, a, r) NULL
#define alloc_huge_page_node(h, nid) NULL
#define alloc_huge_page_noerr(v, a, r) NULL
#define alloc_bootmem_huge_page(h) NULL
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 23e2c6d..7f70fc5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1399,7 +1399,7 @@ static void vma_commit_reservation(struct hstate *h,
region_add(resv, idx, idx + 1);
}
-static struct page *alloc_huge_page(struct vm_area_struct *vma,
+struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve)
{
struct hugepage_subpool *spool = subpool_vma(vma);
--
2.1.0
* [RFC v2 PATCH 5/5] mm: madvise allow remove operation for hugetlbfs
From: Mike Kravetz @ 2015-04-23 22:13 UTC
To: linux-mm, linux-kernel
Cc: Dave Hansen, Naoya Horiguchi, David Rientjes, Hugh Dickins,
Davidlohr Bueso, Aneesh Kumar, Hillf Danton, Christoph Hellwig,
Mike Kravetz
Now that we have hole punching support for hugetlbfs, we can
also support the MADV_REMOVE interface to it.
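For reference, madvise_remove() is understood to translate the user address range into a file offset and length and then call fallocate with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, after which hugetlbfs rounds the hole inward as implemented in hugetlbfs_punch_hole(). The sketch below models only the offset translation; struct toy_vma is a hypothetical cut-down VMA, and 4 KB base pages are assumed.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12       /* assumes 4 KB base pages */

/* Hypothetical cut-down VMA: only the fields the translation needs. */
struct toy_vma {
    unsigned long vm_start;     /* first user address of the mapping */
    unsigned long vm_pgoff;     /* file offset of vm_start, in base pages */
};

/* File byte offset backing user address 'start' inside the mapping. */
static long long madvise_file_offset(const struct toy_vma *vma,
                                     unsigned long start)
{
    assert(start >= vma->vm_start);
    return (long long)(start - vma->vm_start) +
           ((long long)vma->vm_pgoff << TOY_PAGE_SHIFT);
}
```

The length passed along is simply end - start, so in this series an MADV_REMOVE range that does not cover a whole huge page frees nothing.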
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
mm/madvise.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index d551475..c4a1027 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -299,7 +299,7 @@ static long madvise_remove(struct vm_area_struct *vma,
*prev = NULL; /* tell sys_madvise we drop mmap_sem */
- if (vma->vm_flags & (VM_LOCKED | VM_HUGETLB))
+ if (vma->vm_flags & VM_LOCKED)
return -EINVAL;
f = vma->vm_file;
--
2.1.0
* Re: [RFC v2 PATCH 0/5] hugetlbfs: add fallocate support
From: Hillf Danton @ 2015-04-29 7:04 UTC
To: 'Mike Kravetz', linux-mm, linux-kernel
Cc: 'Dave Hansen', 'Naoya Horiguchi',
'David Rientjes', 'Hugh Dickins',
'Davidlohr Bueso', 'Aneesh Kumar',
'Christoph Hellwig'
>
> hugetlbfs is used today by applications that want a high degree of
> control over huge page usage. Often, large hugetlbfs files are used
> to map a large number huge pages into the application processes.
> The applications know when page ranges within these large files will
> no longer be used, and ideally would like to release them back to
> the subpool or global pools for other uses. The fallocate() system
> call provides an interface for preallocation and hole punching within
> files. This patch set adds fallocate functionality to hugetlbfs.
>
> RFC v2:
> Addressed alignment and error handling issues noticed by Hillf Danton
> New region_del() routine for region tracking/resv_map of ranges
> Fixed several issues found during more extensive testing
> Error handling in region_del() when kmalloc() fails stills needs
> to be addressed
> madvise remove support remains
>
> Mike Kravetz (5):
> hugetlbfs: truncate_hugepages() takes a range of pages
> hugetlbfs: remove region_truncte() as region_del() can be used
> hugetlbfs: New huge_add_to_page_cache helper routine
> hugetlbfs: add hugetlbfs_fallocate()
> mm: madvise allow remove operation for hugetlbfs
>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
> fs/hugetlbfs/inode.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++--
> include/linux/hugetlb.h | 8 ++-
> mm/hugetlb.c | 110 ++++++++++++++++++++++---------
> mm/madvise.c | 2 +-
> 4 files changed, 248 insertions(+), 41 deletions(-)
>
> --
> 2.1.0