public inbox for linux-ia64@vger.kernel.org
From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
To: 'Ray Bryant' <raybry@sgi.com>
Cc: 'Andy Whitcroft' <apw@shadowen.org>,
	"Martin J. Bligh" <mbligh@aracnet.com>,
	Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, anton@samba.org,
	sds@epoch.ncsc.mil, ak@suse.de, lse-tech@lists.sourceforge.net,
	linux-ia64@vger.kernel.org
Subject: RE: [Lse-tech] RE: [PATCH] HUGETLB memory commitment
Date: Mon, 05 Apr 2004 23:18:17 +0000
Message-ID: <200404052318.i35NIHF29964@unix-os.sc.intel.com>
In-Reply-To: <4071A3DE.2020809@sgi.com>
In-Reply-To: <40717AA8.9050900@sgi.com>

>>>> Ray Bryant wrote on Monday, April 05, 2004 11:22 AM
> > Chen, Kenneth W wrote:
> > I actually started coding yesterday.  It doesn't look too bad (I think).
> > I will post it once I finished it up later today or tomorrow.
>
> Hmmm...so did I.  Oh well.  We can pull the good ideas from both. :-)

I did have a revelation from your original demand-paging patch with per-inode
tracking ;-)  I extended it to track by struct address_space (so we don't
pollute the inode structure) and added per-block tracking.  See the patch at
the end of this post.  I admit I had very pessimistic thoughts until I saw
your patch.
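
To make the per-block tracking idea concrete, here is a minimal user-space
sketch. It is not the kernel code in the patch below. Each mapping keeps a
sorted list of reserved huge-page block ranges, and a new request is charged
only for the blocks that are not already covered. The names resv_map,
reserve_range, uncovered, pool_total and pool_resv are made up for
illustration; pool_total and pool_resv stand in for htlbzone_pages and
htlbpage_resv, malloc/free replace kmalloc/kfree, and locking is left out.

```c
#include <stdlib.h>

/* One reserved range of huge-page blocks, half-open: [start, end). */
struct file_area {
	struct file_area *next;
	unsigned long start;
	unsigned long end;
};

/* Hypothetical per-mapping state: areas sorted by start, never overlapping. */
struct resv_map {
	struct file_area *head;
};

static long pool_total;	/* assumed stand-in for htlbzone_pages */
static long pool_resv;	/* assumed stand-in for htlbpage_resv */

/* Number of blocks in [start, end) not covered by any existing area. */
static unsigned long uncovered(const struct resv_map *map,
			       unsigned long start, unsigned long end)
{
	unsigned long need = end - start;
	const struct file_area *a;

	for (a = map->head; a; a = a->next) {
		unsigned long lo = a->start > start ? a->start : start;
		unsigned long hi = a->end < end ? a->end : end;

		if (lo < hi)
			need -= hi - lo;
	}
	return need;
}

/* Reserve [start, end); returns 0 on success, -1 if the pool cannot cover it. */
static int reserve_range(struct resv_map *map,
			 unsigned long start, unsigned long end)
{
	struct file_area **link, *a, *node;
	unsigned long need;

	if (start >= end)
		return 0;
	need = uncovered(map, start, end);
	if (need == 0)
		return 0;		/* already fully reserved */
	if ((long)need > pool_total - pool_resv)
		return -1;		/* not enough unreserved huge pages */

	node = malloc(sizeof(*node));
	if (!node)
		return -1;

	/* Skip areas that end strictly before the new range. */
	link = &map->head;
	while (*link && (*link)->end < start)
		link = &(*link)->next;

	/* Absorb every area that overlaps or touches the new range. */
	while ((a = *link) != NULL && a->start <= end) {
		if (a->start < start)
			start = a->start;
		if (a->end > end)
			end = a->end;
		*link = a->next;
		free(a);
	}

	node->start = start;
	node->end = end;
	node->next = *link;
	*link = node;
	pool_resv += (long)need;
	return 0;
}
```

With pool_total set to 10, reserving blocks [0,4) and then [2,6) charges 4 and
then only 2 more, because the overlap is already paid for. Computing the
charge before touching the list keeps a failed request from leaving the list
half-updated.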


> > There are still some oddities in the lifetime of the huge page reservation,
> > but that can be discussed once everyone sees the code.

I was thinking the lifetime of the huge page reservation should be the life
of a mapping, i.e., it should persist only from mmap to munmap.  That means
adding a ref count to the per-block tracking.  This seriously complicates the
design, because the ref count then needs to be updated in munmap and the fault
handler in addition to mmap and truncate.  Not to mention that Andy Whitcroft
already pointed out we don't get notification from munmap.  It also makes the
tracking logic considerably more complicated and has a performance downside.

I guess everyone is OK with the reservation living until file truncate?
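
As a sketch of what "reservation lives until truncate" means for the
bookkeeping, the user-space fragment below releases everything at or beyond a
given block index and leaves lower ranges alone. It is an illustration under
assumptions, not the kernel code: release_from and pool_resv are hypothetical
names (pool_resv plays the role of htlbpage_resv), ranges are half-open, and
there is no locking. The case of an area lying wholly below the truncation
point is handled explicitly, since it must not change the counter.

```c
#include <stdlib.h>

/* One reserved range of huge-page blocks, half-open: [start, end). */
struct file_area {
	struct file_area *next;
	unsigned long start;
	unsigned long end;
};

static long pool_resv;	/* assumed stand-in for htlbpage_resv */

/*
 * Drop every reserved block with index >= first.
 * Returns the number of blocks handed back to the pool.
 */
static unsigned long release_from(struct file_area **head, unsigned long first)
{
	struct file_area **link = head, *a;
	unsigned long released = 0;

	while ((a = *link) != NULL) {
		if (a->end <= first) {
			/* Wholly below the truncation point: keep as is. */
			link = &a->next;
		} else if (a->start >= first) {
			/* Wholly at or above it: free the whole area. */
			released += a->end - a->start;
			*link = a->next;
			free(a);
		} else {
			/* Straddles the truncation point: trim the tail. */
			released += a->end - first;
			a->end = first;
			link = &a->next;
		}
	}
	pool_resv -= (long)released;
	return released;
}
```

Truncating to block 0 empties the list and returns the mapping's entire
reservation, which matches the file-lifetime semantics proposed above.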



Patch enclosed, less than 140 lines of change (only x86 and ia64 for now; it
should be trivial to add other arches).  Tested on linux-2.6.5.

 arch/i386/mm/hugetlbpage.c |    5 +
 arch/ia64/mm/hugetlbpage.c |    5 +
 fs/hugetlbfs/inode.c       |  119 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/hugetlb.h    |    8 +++
 4 files changed, 135 insertions(+), 2 deletions(-)

diff -Nurp linux-2.6.5/arch/i386/mm/hugetlbpage.c linux-2.6.5.htlb/arch/i386/mm/hugetlbpage.c
--- linux-2.6.5/arch/i386/mm/hugetlbpage.c	2004-04-03 19:38:15.000000000 -0800
+++ linux-2.6.5.htlb/arch/i386/mm/hugetlbpage.c	2004-04-05 16:09:29.000000000 -0700
@@ -22,7 +22,8 @@

 static long    htlbpagemem;
 int     htlbpage_max;
-static long    htlbzone_pages;
+long    htlbzone_pages;
+long	htlbpage_resv;

 static struct list_head hugepage_freelists[MAX_NUMNODES];
 static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED;
@@ -516,9 +517,11 @@ int hugetlb_report_meminfo(char *buf)
 	return sprintf(buf,
 			"HugePages_Total: %5lu\n"
 			"HugePages_Free:  %5lu\n"
+			"HugePages_Resv:  %5lu\n"
 			"Hugepagesize:    %5lu kB\n",
 			htlbzone_pages,
 			htlbpagemem,
+			htlbpage_resv,
 			HPAGE_SIZE/1024);
 }

diff -Nurp linux-2.6.5/arch/ia64/mm/hugetlbpage.c linux-2.6.5.htlb/arch/ia64/mm/hugetlbpage.c
--- linux-2.6.5/arch/ia64/mm/hugetlbpage.c	2004-04-03 19:37:07.000000000 -0800
+++ linux-2.6.5.htlb/arch/ia64/mm/hugetlbpage.c	2004-04-05 16:09:41.000000000 -0700
@@ -24,7 +24,8 @@

 static long	htlbpagemem;
 int		htlbpage_max;
-static long	htlbzone_pages;
+long		htlbzone_pages;
+long		htlbpage_resv;
 unsigned int	hpage_shift=HPAGE_SHIFT_DEFAULT;

 static struct list_head hugepage_freelists[MAX_NUMNODES];
@@ -579,9 +580,11 @@ int hugetlb_report_meminfo(char *buf)
 	return sprintf(buf,
 			"HugePages_Total: %5lu\n"
 			"HugePages_Free:  %5lu\n"
+			"HugePages_Resv:  %5lu\n"
 			"Hugepagesize:    %5lu kB\n",
 			htlbzone_pages,
 			htlbpagemem,
+			htlbpage_resv,
 			HPAGE_SIZE/1024);
 }

diff -Nurp linux-2.6.5/fs/hugetlbfs/inode.c linux-2.6.5.htlb/fs/hugetlbfs/inode.c
--- linux-2.6.5/fs/hugetlbfs/inode.c	2004-04-03 19:38:14.000000000 -0800
+++ linux-2.6.5.htlb/fs/hugetlbfs/inode.c	2004-04-05 16:09:41.000000000 -0700
@@ -43,6 +43,121 @@ static struct backing_dev_info hugetlbfs
 	.memory_backed	= 1,	/* Does not contribute to dirty memory */
 };

+enum file_area_action {
+	INSERT,
+	FRONT_MERGE,
+	BACK_MERGE,
+	THREE_WAY_MERGE
+};
+
+/*
+ * return 0 if reservation is granted
+ */
+static int hugetlb_reserve_page(struct address_space *mapping,
+				struct vm_area_struct *vma)
+{
+	unsigned long block_start, block_end, resv;
+	struct list_head *p, *head;
+	struct file_area_struct *curr, *next;
+	enum file_area_action action;
+	int ret = -ENOMEM;
+
+	block_start = vma->vm_pgoff >> (HPAGE_SHIFT-PAGE_SHIFT);
+	block_end = block_start + ((vma->vm_end - vma->vm_start) >> HPAGE_SHIFT);
+
+	down(&mapping->i_shared_sem);
+
+	action = INSERT;
+	resv = block_end - block_start;
+	head = &mapping->private_list;
+	curr = next = NULL;
+	list_for_each(p, head) {
+		curr = list_entry(p, struct file_area_struct, list);
+		if (p->next != head)
+			next = list_entry(p->next, struct file_area_struct, list);
+
+		if (block_start <= curr->end) {
+			if (block_end <= curr->end) {
+				ret = 0;
+				goto out;
+			} else if (!next || block_end < next->start) {
+				resv = block_end - curr->end;
+				action = BACK_MERGE;
+			} else {
+				resv = next->start - curr->end;
+				action = THREE_WAY_MERGE;
+			}
+		} else if (!next || block_start < next->start) {
+			if (!next || block_end < next->start) {
+				resv = block_end - block_start;
+				action = INSERT;
+			} else {
+				curr = next;
+				resv = curr->start - block_start;
+				action = FRONT_MERGE;
+			}
+		}
+		else
+			continue;
+	}
+
+	/* check page reservation */
+	if (resv > (htlbzone_pages - htlbpage_resv))
+		goto out;
+
+	/* FIXME: check file system quota */
+
+	/* we have enough hugetlb page, go ahead reserve them */
+	switch(action) {
+		case BACK_MERGE:
+			curr->end = block_end;
+			break;
+		case FRONT_MERGE:
+			curr->start = block_start;
+			break;
+		case THREE_WAY_MERGE:
+			curr->end = next->end;
+			list_del(p->next);
+			kfree(next);
+			break;
+		case INSERT:
+			curr = kmalloc(sizeof(*curr), GFP_KERNEL);
+			if (!curr)
+				goto out;
+			curr->start = block_start;
+			curr->end = block_end;
+			list_add(&curr->list, p);
+			break;
+	}
+	htlbpage_resv += resv;
+	ret = 0;
+out:
+	up(&mapping->i_shared_sem);
+	return ret;
+}
+
+static void hugetlb_unreserve_page(struct address_space *mapping, loff_t lstart)
+{
+	struct file_area_struct *curr, *tmp;
+	unsigned long resv;
+
+	lstart >>= HPAGE_SHIFT;
+	down(&mapping->i_shared_sem);
+	list_for_each_entry_safe(curr, tmp, &mapping->private_list, list) {
+		if (lstart <= curr->start) {
+			resv = curr->end - curr->start;
+			list_del(&curr->list);
+			kfree(curr);
+		}
+		else {
+			resv = curr->end - lstart;
+			curr->end = lstart;
+		}
+		htlbpage_resv -= resv;
+	}
+	up(&mapping->i_shared_sem);
+}
+
 static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file->f_dentry->d_inode;
@@ -59,6 +174,9 @@ static int hugetlbfs_file_mmap(struct fi
 	if (vma->vm_end - vma->vm_start < HPAGE_SIZE)
 		return -EINVAL;

+	if (hugetlb_reserve_page(mapping, vma))
+		return -ENOMEM;
+
 	vma_len = (loff_t)(vma->vm_end - vma->vm_start);

 	down(&inode->i_sem);
@@ -186,6 +304,7 @@ void truncate_hugepages(struct address_s
 		huge_pagevec_release(&pvec);
 	}
 	BUG_ON(!lstart && mapping->nrpages);
+	hugetlb_unreserve_page(mapping, lstart);
 }

 static void hugetlbfs_delete_inode(struct inode *inode)
diff -Nurp linux-2.6.5/include/linux/hugetlb.h linux-2.6.5.htlb/include/linux/hugetlb.h
--- linux-2.6.5/include/linux/hugetlb.h	2004-04-03 19:37:06.000000000 -0800
+++ linux-2.6.5.htlb/include/linux/hugetlb.h	2004-04-05 16:09:41.000000000 -0700
@@ -30,6 +30,8 @@ int is_aligned_hugepage_range(unsigned l
 int pmd_huge(pmd_t pmd);

 extern int htlbpage_max;
+extern long htlbzone_pages;
+extern long htlbpage_resv;

 static inline void
 mark_mm_hugetlb(struct mm_struct *mm, struct vm_area_struct *vma)
@@ -103,6 +105,12 @@ struct hugetlbfs_sb_info {
 	spinlock_t	stat_lock;
 };

+struct file_area_struct {
+	struct list_head	list;
+	unsigned long		start;
+	unsigned long		end;
+};
+
 static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
 {
 	return sb->s_fs_info;
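
For illustration only, and not part of the patch: the block-range arithmetic
at the top of hugetlb_reserve_page() can be tried in user space as below. The
shift values are assumptions (4 KB base pages and 4 MB huge pages, as on
non-PAE i386); vma_to_blocks and struct block_range are hypothetical names.

```c
/* Assumed sizes for the example: 4 KB base pages, 4 MB huge pages. */
#define PAGE_SHIFT	12
#define HPAGE_SHIFT	22

/* Half-open range of huge-page block indexes within the file. */
struct block_range {
	unsigned long start;
	unsigned long end;
};

/*
 * vm_pgoff is the file offset in base pages; vm_start and vm_end are the
 * virtual addresses bounding the mapping. Both shifts mirror the patch.
 */
static struct block_range vma_to_blocks(unsigned long vm_pgoff,
					unsigned long vm_start,
					unsigned long vm_end)
{
	struct block_range r;

	r.start = vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT);
	r.end = r.start + ((vm_end - vm_start) >> HPAGE_SHIFT);
	return r;
}
```

Under these assumptions, a 12 MB mapping at file offset 8 MB (vm_pgoff of
2048) covers blocks 2 through 4, i.e. the half-open range [2, 5).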


