From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>, Greg KH <gregkh@suse.de>,
	wli@holomorphy.com, agl@us.ibm.com, luick@cray.com,
	Lee.Schermerhorn@hp.com, linux-mm@kvack.org
Subject: [RFC][PATCH] hugetlb: add information and interface in sysfs [Was Re: [RFC][PATCH 4/5] Documentation: add node files to sysfs ABI]
Date: Sat, 26 Apr 2008 20:49:42 -0700
Message-ID: <20080427034942.GB12129@us.ibm.com>
In-Reply-To: <20080424071352.GB14543@wotan.suse.de>

On 24.04.2008 [09:13:52 +0200], Nick Piggin wrote:
> On Wed, Apr 23, 2008 at 11:32:52AM -0700, Nishanth Aravamudan wrote:
> > 
> > So, I think, we pretty much agree on how things should be:
> > 
> > Direct translation of the current sysctl:
> > 
> > /sys/kernel/hugepages/nr_hugepages
> >                       nr_overcommit_hugepages
> > 
> > Adding multiple pools:
> > 
> > /sys/kernel/hugepages/nr_hugepages -> nr_hugepages_${default_size}
> >                       nr_overcommit_hugepages -> nr_overcommit_hugepages_${default_size}
> >                       nr_hugepages_${default_size}
> >                       nr_overcommit_hugepages_${default_size}
> >                       nr_hugepages_${other_size1}
> >                       nr_overcommit_hugepages_${other_size2}
> > 
> > Adding per-node control:
> > 
> > /sys/kernel/hugepages/nr_hugepages -> nr_hugepages_${default_size}
> >                       nr_overcommit_hugepages -> nr_overcommit_hugepages_${default_size}
> >                       nr_hugepages_${default_size}
> >                       nr_overcommit_hugepages_${default_size}
> >                       nr_hugepages_${other_size1}
> >                       nr_overcommit_hugepages_${other_size2}
> >                       nodeX/nr_hugepages -> nr_hugepages_${default_size}
> >                             nr_overcommit_hugepages -> nr_overcommit_hugepages_${default_size}
> >                             nr_hugepages_${default_size}
> >                             nr_overcommit_hugepages_${default_size}
> >                             nr_hugepages_${other_size1}
> >                             nr_overcommit_hugepages_${other_size2}
> > 
> > How does that look? Does anyone have any problems with such an
> > arrangement?
> 
> Looks pretty good. I would personally lean toward subdirectories for
> hstates. Pros are that it would be a little easier to navigate from
> the shell, and maybe more regular to program for.
> 
> You could possibly have hugepages_default symlink as well to one of
> the directories of your choice. This could be used by apps which do
> not specify exactly what size they want...
> 
> I don't know, just ideas.

So, here's the first cut of the patch. Still very rough, but it builds
and I'm running it now:

[20:41:34]nacc@arkanoid:/sys/kernel/hugepages$ tree
.
`-- hugepages-2MB
    |-- meminfo
    |-- nr_huge_pages
    `-- nr_overcommit_huge_pages

1 directory, 3 files

[20:41:56]nacc@arkanoid:/sys/kernel/hugepages$ cat /sys/kernel/hugepages/hugepages-2MB/meminfo
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB

[20:42:20]nacc@arkanoid:/sys/kernel/hugepages$ sudo echo 10 > /sys/kernel/hugepages/hugepages-2MB/nr_huge_pages 
[20:42:57]nacc@arkanoid:/sys/kernel/hugepages$ cat /sys/kernel/hugepages/hugepages-2MB/nr_huge_pages 
10
[20:43:02]nacc@arkanoid:/sys/kernel/hugepages$ cat /sys/kernel/hugepages/hugepages-2MB/meminfo 
HugePages_Total:    10
HugePages_Free:     10
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB
[20:43:10]nacc@arkanoid:/sys/kernel/hugepages$ grep Huge /proc/meminfo 
HugePages_Total:    10
HugePages_Free:     10
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB
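
The same check can be made from a program rather than the shell. Below is a minimal userspace sketch, assuming the layout shown in the transcript above; read_ulong_file() is a hypothetical helper, not part of the patch, and the file names may still change (see the note on the underscore below).

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single unsigned decimal value from a
 * sysfs-style file such as
 * /sys/kernel/hugepages/hugepages-2MB/nr_huge_pages.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_ulong_file(const char *path, unsigned long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	/* The attribute is expected to hold one number plus a newline. */
	ok = (fscanf(f, "%lu", val) == 1);
	fclose(f);

	return ok ? 0 : -1;
}
```

A caller would pass the full path of the attribute and check the return value before using the result, since the directory only exists on kernels carrying this patch.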

I haven't tested with multiple pools yet; I will hopefully get to that Monday.
I see one obvious issue: I left an underscore in huge_pages :) Will fix.
How does the naming seem? I don't like having two memfmt()s, but I couldn't
think of a good alternative beyond perhaps having two strings, one for the
magnitude and one for the units, and that seemed gross.
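
For reference, the naming step can be tried outside the kernel. This is a minimal userspace sketch of the same idea as memfmt_nospaces() in the patch below, assuming power-of-two sizes given in bytes; hugepage_dir_name() is a hypothetical name and uses snprintf() rather than the kernel's sprintf()/kasprintf() pair.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a directory name such as "hugepages-2MB"
 * from a huge page size in bytes. The largest unit that fits is used
 * and no space is placed between the magnitude and the unit.
 * Returns what snprintf() returns.
 */
static int hugepage_dir_name(char *buf, size_t len, unsigned long long bytes)
{
	if (bytes >= (1ULL << 30))
		return snprintf(buf, len, "hugepages-%lluGB", bytes >> 30);
	if (bytes >= (1ULL << 20))
		return snprintf(buf, len, "hugepages-%lluMB", bytes >> 20);
	return snprintf(buf, len, "hugepages-%lluKB", bytes >> 10);
}
```

With a 2 MB page size this yields the "hugepages-2MB" name seen in the tree output above.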

A lot of the functions and macros, perhaps all of them, are clones of the ones
used for /sys/kernel/slab. Thanks to those authors for that code!
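
The pattern borrowed from there is roughly: embed a generic object and a generic attribute inside the specific structures, then recover the specific structures with container_of() and dispatch to a typed callback. Here is a minimal userspace sketch of that dispatch. All types here (struct pool, struct pool_attribute, and the stand-in struct kobject and struct attribute) are simplified assumptions for illustration, not the kernel definitions.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the generic kernel objects. */
struct attribute { const char *name; };
struct kobject { const char *name; };

/* The specific object embeds the generic one (like struct hstate). */
struct pool {
	unsigned long nr_pages;
	struct kobject kobj;
};

/* The specific attribute embeds the generic one and adds a typed hook. */
struct pool_attribute {
	struct attribute attr;
	int (*show)(struct pool *p, char *buf, size_t len);
};

static int nr_pages_show(struct pool *p, char *buf, size_t len)
{
	return snprintf(buf, len, "%lu\n", p->nr_pages);
}

static struct pool_attribute nr_pages_attr = {
	.attr = { .name = "nr_pages" },
	.show = nr_pages_show,
};

/*
 * Generic entry point: it only sees the embedded members, so it walks
 * back to the containing structures before calling the typed hook.
 * Returns -1 when no hook is set (the patch returns -EIO there).
 */
static int pool_attr_show(struct kobject *kobj, struct attribute *attr,
			  char *buf, size_t len)
{
	struct pool_attribute *pa = container_of(attr, struct pool_attribute, attr);
	struct pool *p = container_of(kobj, struct pool, kobj);

	if (!pa->show)
		return -1;
	return pa->show(p, buf, len);
}
```

The point of the indirection is that each show/store routine gets its own typed pointer and never has to deal with the generic object directly.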

Greg, do you see any obvious violations of sysfs rules here? Well, beyond
meminfo itself, I guess, which puts several values in one file. Given our
previous snapshot discussion, I left it simple and in the same format as
/proc/meminfo rather than splitting it up.

Not-yet-Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

 include/linux/hugetlb.h |    9 +-
 mm/hugetlb.c            |  317 ++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 251 insertions(+), 75 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 7aa22e7..cac63bd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -3,6 +3,9 @@
 
 #include <linux/fs.h>
 #include <linux/shm.h>
+#include <linux/mempolicy.h>
+#include <asm/tlbflush.h>
+#include <asm/hugetlb.h>
 
 #ifdef CONFIG_HUGETLBFS
 struct hugetlbfs_config {
@@ -69,10 +72,6 @@ static inline void set_file_hugepages(struct file *file)
 
 #ifdef CONFIG_HUGETLB_PAGE
 
-#include <linux/mempolicy.h>
-#include <asm/tlbflush.h>
-#include <asm/hugetlb.h>
-
 struct ctl_table;
 
 static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
@@ -131,6 +130,8 @@ struct hstate {
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
 	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+	const char *name;
+	struct kobject kobj;
 };
 
 void __init huge_add_hstate(unsigned order);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index de03a14..c30e45d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -15,6 +15,7 @@
 #include <linux/cpuset.h>
 #include <linux/mutex.h>
 #include <linux/bootmem.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -604,9 +605,21 @@ static void __init gather_bootmem_prealloc(void)
 	}
 }
 
+static __init char *memfmt_nospaces(char *buf, unsigned long n)
+{
+	if (n >= (1UL << 30))
+		sprintf(buf, "%luGB", n >> 30);
+	else if (n >= (1UL << 20))
+		sprintf(buf, "%luMB", n >> 20);
+	else
+		sprintf(buf, "%luKB", n >> 10);
+	return buf;
+}
+
 static void __init hugetlb_init_hstate(struct hstate *h)
 {
 	unsigned long i;
+	char buf[32];
 
 	/* Don't reinitialize lists if they have been already init'ed */
 	if (!h->hugepage_freelists[0].next) {
@@ -624,6 +637,8 @@ static void __init hugetlb_init_hstate(struct hstate *h)
 			break;
 	}
 	h->max_huge_pages = i;
+	h->name = kasprintf(GFP_KERNEL, "hugepages-%s",
+				memfmt_nospaces(buf, huge_page_size(h)));
 }
 
 static void __init hugetlb_init_hstates(void)
@@ -662,77 +677,6 @@ static void __init report_hugepages(void)
         }
 }
 
-static int __init hugetlb_init(void)
-{
-	BUILD_BUG_ON(HPAGE_SHIFT == 0);
-
-	if (!size_to_hstate(HPAGE_SIZE)) {
-		huge_add_hstate(HUGETLB_PAGE_ORDER);
-		parsed_hstate->max_huge_pages = default_hstate_resv;
-	}
-
-	hugetlb_init_hstates();
-
-	gather_bootmem_prealloc();
-
-	report_hugepages();
-
-	return 0;
-}
-module_init(hugetlb_init);
-
-/* Should be called on processing a hugepagesz=... option */
-void __init huge_add_hstate(unsigned order)
-{
-	struct hstate *h;
-	if (size_to_hstate(PAGE_SIZE << order)) {
-		printk("hugepagesz= specified twice, ignoring\n");
-		return;
-	}
-	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
-	BUG_ON(order < HPAGE_SHIFT - PAGE_SHIFT);
-	h = &hstates[max_hstate++];
-	h->order = order;
-	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
-	hugetlb_init_hstate(h);
-	parsed_hstate = h;
-}
-
-static int __init hugetlb_setup(char *s)
-{
-	unsigned long *mhp;
-
-	if (!max_hstate)
-		mhp = &default_hstate_resv;
-	else
-		mhp = &parsed_hstate->max_huge_pages;
-
-	if (sscanf(s, "%lu", mhp) <= 0)
-		*mhp = 0;
-
-	/*
-	 * Global state is always initialized later in hugetlb_init.
-	 * But we need to allocate >= MAX_ORDER hstates here early to still
-	 * use the bootmem allocator.
-	 */
-	if (max_hstate > 0 && parsed_hstate->order >= MAX_ORDER)
-		hugetlb_init_hstate(parsed_hstate);
-
-	return 1;
-}
-__setup("hugepages=", hugetlb_setup);
-
-static unsigned int cpuset_mems_nr(unsigned int *array)
-{
-	int node;
-	unsigned int nr = 0;
-
-	for_each_node_mask(node, cpuset_current_mems_allowed)
-		nr += array[node];
-
-	return nr;
-}
-
 #ifdef CONFIG_SYSCTL
 #ifdef CONFIG_HIGHMEM
 static void try_to_free_low(struct hstate *h, unsigned long count)
@@ -843,6 +787,237 @@ out:
 	return ret;
 }
 
+#ifdef CONFIG_SYSFS
+#define to_hstate_attr(n) container_of(n, struct hstate_attribute, attr)
+#define to_hstate(n) container_of(n, struct hstate, kobj)
+
+struct hstate_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct hstate *h, char *buf);
+	ssize_t (*store)(struct hstate *h, const char *buf, size_t count);
+};
+
+#define HSTATE_ATTR_RO(_name) \
+	static struct hstate_attribute _name##_attr = __ATTR_RO(_name)
+
+#define HSTATE_ATTR(_name) \
+	static struct hstate_attribute _name##_attr = \
+		__ATTR(_name, 0644, _name##_show, _name##_store)
+
+static ssize_t nr_huge_pages_show(struct hstate *h, char *buf)
+{
+	return sprintf(buf, "%lu\n", h->nr_huge_pages);
+}
+static ssize_t nr_huge_pages_store(struct hstate *h, const char *buf, size_t count)
+{
+	int tmp;
+
+	h->max_huge_pages = set_max_huge_pages(h,
+					simple_strtoul(buf, NULL, 10), &tmp);
+	max_huge_pages[h - hstates] = h->max_huge_pages;
+	return count;
+}
+HSTATE_ATTR(nr_huge_pages);
+
+static ssize_t nr_overcommit_huge_pages_show(struct hstate *h, char *buf)
+{
+	return sprintf(buf, "%lu\n", h->nr_overcommit_huge_pages);
+}
+static ssize_t nr_overcommit_huge_pages_store(struct hstate *h, const char *buf, size_t count)
+{
+	spin_lock(&hugetlb_lock);
+	h->nr_overcommit_huge_pages = simple_strtoul(buf, NULL, 10);
+	sysctl_overcommit_huge_pages[h - hstates] = h->nr_overcommit_huge_pages;
+	spin_unlock(&hugetlb_lock);
+	return count;
+}
+HSTATE_ATTR(nr_overcommit_huge_pages);
+
+static ssize_t meminfo_show(struct hstate *h, char *buf)
+{
+	return sprintf(buf,
+			"HugePages_Total: %5lu\n"
+			"HugePages_Free:  %5lu\n"
+			"HugePages_Rsvd:  %5lu\n"
+			"HugePages_Surp:  %5lu\n"
+			"Hugepagesize:    %5lu kB\n",
+			h->nr_huge_pages,
+			h->free_huge_pages,
+			h->resv_huge_pages,
+			h->surplus_huge_pages,
+			huge_page_size(h) / 1024);
+}
+HSTATE_ATTR_RO(meminfo);
+
+static struct kset *hstate_kset;
+
+static struct attribute *hstate_attrs[] = {
+	&meminfo_attr.attr,
+	&nr_huge_pages_attr.attr,
+	&nr_overcommit_huge_pages_attr.attr,
+};
+
+static struct attribute_group hstate_attr_group = {
+	.attrs = hstate_attrs,
+};
+
+static ssize_t hstate_attr_show(struct kobject *kobj,
+					struct attribute *attr,
+					char *buf)
+{
+	struct hstate_attribute *attribute;
+	struct hstate *h;
+	int err;
+
+	attribute = to_hstate_attr(attr);
+	h = to_hstate(kobj);
+
+	if (!attribute->show)
+		return -EIO;
+
+	err = attribute->show(h, buf);
+
+	return err;
+}
+
+static ssize_t hstate_attr_store(struct kobject *kobj,
+					struct attribute *attr,
+					const char *buf, size_t len)
+{
+	struct hstate_attribute *attribute;
+	struct hstate *h;
+	int err;
+
+	attribute = to_hstate_attr(attr);
+	h = to_hstate(kobj);
+
+	if (!attribute->store)
+		return -EIO;
+
+	err = attribute->store(h, buf, len);
+
+	return err;
+}
+
+static struct sysfs_ops hstate_sysfs_ops = {
+	.show = hstate_attr_show,
+	.store = hstate_attr_store,
+};
+
+static struct kobj_type hstate_ktype = {
+	.sysfs_ops = &hstate_sysfs_ops,
+};
+
+static int __init hugetlb_sysfs_add_hstate(struct hstate *h)
+{
+	int err;
+	h->kobj.kset = hstate_kset;
+	err = kobject_init_and_add(&h->kobj, &hstate_ktype, NULL, h->name);
+	if (err) {
+		kobject_put(&h->kobj);
+		return err;
+	}
+	err = sysfs_create_group(&h->kobj, &hstate_attr_group);
+	if (err)
+		return err;
+	return 0;
+}
+
+static void __init hugetlb_sysfs_init(void)
+{
+	struct hstate *h;
+	int err;
+
+	hstate_kset = kset_create_and_add("hugepages", NULL, kernel_kobj);
+	if (!hstate_kset)
+		return;
+
+	for_each_hstate(h) {
+		err = hugetlb_sysfs_add_hstate(h);
+		if (err)
+			printk(KERN_ERR "Hugetlb: Unable to add hstate %s\n", h->name);
+	}
+}
+#else
+static void __init hugetlb_sysfs_init(void)
+{
+}
+#endif
+
+static int __init hugetlb_init(void)
+{
+	BUILD_BUG_ON(HPAGE_SHIFT == 0);
+
+	if (!size_to_hstate(HPAGE_SIZE)) {
+		huge_add_hstate(HUGETLB_PAGE_ORDER);
+		parsed_hstate->max_huge_pages = default_hstate_resv;
+	}
+
+	hugetlb_init_hstates();
+
+	gather_bootmem_prealloc();
+
+	report_hugepages();
+
+	hugetlb_sysfs_init();
+
+	return 0;
+}
+module_init(hugetlb_init);
+
+/* Should be called on processing a hugepagesz=... option */
+void __init huge_add_hstate(unsigned order)
+{
+	struct hstate *h;
+	if (size_to_hstate(PAGE_SIZE << order)) {
+		printk("hugepagesz= specified twice, ignoring\n");
+		return;
+	}
+	BUG_ON(max_hstate >= HUGE_MAX_HSTATE);
+	BUG_ON(order < HPAGE_SHIFT - PAGE_SHIFT);
+	h = &hstates[max_hstate++];
+	h->order = order;
+	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
+	hugetlb_init_hstate(h);
+	parsed_hstate = h;
+}
+
+static int __init hugetlb_setup(char *s)
+{
+	unsigned long *mhp;
+
+	if (!max_hstate)
+		mhp = &default_hstate_resv;
+	else
+		mhp = &parsed_hstate->max_huge_pages;
+
+	if (sscanf(s, "%lu", mhp) <= 0)
+		*mhp = 0;
+
+	/*
+	 * Global state is always initialized later in hugetlb_init.
+	 * But we need to allocate >= MAX_ORDER hstates here early to still
+	 * use the bootmem allocator.
+	 */
+	if (max_hstate > 0 && parsed_hstate->order >= MAX_ORDER)
+		hugetlb_init_hstate(parsed_hstate);
+
+	return 1;
+}
+__setup("hugepages=", hugetlb_setup);
+
+static unsigned int cpuset_mems_nr(unsigned int *array)
+{
+	int node;
+	unsigned int nr = 0;
+
+	for_each_node_mask(node, cpuset_current_mems_allowed)
+		nr += array[node];
+
+	return nr;
+}
+
+
 int hugetlb_sysctl_handler(struct ctl_table *table, int write,
 			   struct file *file, void __user *buffer,
 			   size_t *length, loff_t *ppos)

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
