From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
To: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
Cc: Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Glauber Costa <glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Mel Gorman <mgorman-l3A5Bk7waGM@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org,
	Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Dave Chinner <dchinner-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH v10 11/35] list_lru: per-node list infrastructure
Date: Thu, 6 Jun 2013 12:21:43 +0400
Message-ID: <51B04697.90106@parallels.com>
In-Reply-To: <20130606032107.GQ29338@dastard>

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

On 06/06/2013 07:21 AM, Dave Chinner wrote:
>> It's unclear that active_nodes is really needed - we could just iterate
>> across all items in list_lru.node[].  Are we sure that the correct
>> tradeoff decision was made here?
> Yup. Think of all the cache line misses that checking
> node[x].nr_items != 0 entails. If MAX_NUMNODES = 1024, there's 1024
> cacheline misses right there. The nodemask is a much more cache
> friendly method of storing active node state.
> 
> not to mention that for small machines with a large MAX_NUMNODES,
> we'd be checking nodes that never have items stored on them...
> 
>> What's the story on NUMA node hotplug, btw?
> Do we care? hotplug doesn't change MAX_NUMNODES, and if you are
> removing a node you have to free all the memory on the node,
> so that should already be taken care of by external code....
> 
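
To make the cache argument above concrete, here is a minimal user-space
sketch, not kernel code. It assumes at most 64 nodes so that a single
uint64_t can stand in for nodemask_t. The names struct lru, lru_add(),
lru_del(), count_scan() and count_mask() are made up for illustration and
do not exist in the series.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 64	/* stand-in for MAX_NUMNODES; fits one 64-bit mask */

struct node_list {
	long nr_items;
	char pad[56];	/* spread entries out, roughly one per cache line */
};

struct lru {
	struct node_list node[MAX_NODES];
	uint64_t active;	/* bit n is set iff node[n].nr_items != 0 */
};

void lru_add(struct lru *l, int nid)
{
	/* First item on this node: mark the node active. */
	if (l->node[nid].nr_items++ == 0)
		l->active |= UINT64_C(1) << nid;
}

void lru_del(struct lru *l, int nid)
{
	assert(l->node[nid].nr_items > 0);
	/* Last item gone: clear the node's bit. */
	if (--l->node[nid].nr_items == 0)
		l->active &= ~(UINT64_C(1) << nid);
}

/* Scan every entry: reads MAX_NODES node structures, empty or not. */
long count_scan(const struct lru *l, int *touched)
{
	long total = 0;
	int nid;

	*touched = 0;
	for (nid = 0; nid < MAX_NODES; nid++) {
		total += l->node[nid].nr_items;
		(*touched)++;
	}
	return total;
}

/* Consult the mask first: reads only node structures that hold items. */
long count_mask(const struct lru *l, int *touched)
{
	long total = 0;
	int nid;

	*touched = 0;
	for (nid = 0; nid < MAX_NODES; nid++) {
		if (!(l->active & (UINT64_C(1) << nid)))
			continue;
		total += l->node[nid].nr_items;
		(*touched)++;
	}
	return total;
}
```

With items on only two nodes, count_scan() reads all 64 node entries,
while count_mask() reads one mask word plus the two populated entries.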

Mel has already complained about the static MAX_NUMNODES sizing.
I have a patch that makes it dynamic, but I didn't include it here
because the series was already too big. I was also hoping to apply it
on top of the others, to avoid disruption.

I am attaching it here for your review.

For the record, nr_node_ids is firmware-provided and counts possible
nodes, not online nodes, so hotplug won't change it.
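
For anyone who would rather not read the whole diff, the shape of the
change is roughly the following. This is only a user-space sketch:
calloc()/free() stand in for kzalloc()/kfree(), the nr_nodes parameter
stands in for nr_node_ids, and struct owner with owner_init() is a
hypothetical container playing the role of the superblock.

```c
#include <errno.h>
#include <stdlib.h>

struct node_list {
	long nr_items;
};

struct lru {
	struct node_list *node;		/* was a MAX_NUMNODES-sized array */
	long *node_totals;		/* likewise, now a runtime-sized array */
	unsigned int nr_nodes;
};

/* Allocate both per-node arrays; undo the first if the second fails. */
int lru_init(struct lru *l, unsigned int nr_nodes)
{
	l->node = calloc(nr_nodes, sizeof(*l->node));
	if (!l->node)
		return -ENOMEM;

	l->node_totals = calloc(nr_nodes, sizeof(*l->node_totals));
	if (!l->node_totals) {
		free(l->node);
		l->node = NULL;
		return -ENOMEM;
	}

	l->nr_nodes = nr_nodes;
	return 0;
}

/* Counterpart of lru_init(); every successful init needs one of these. */
void lru_destroy(struct lru *l)
{
	free(l->node);
	free(l->node_totals);
	l->node = NULL;
	l->node_totals = NULL;
	l->nr_nodes = 0;
}

/* Hypothetical container with two embedded LRUs, like the superblock. */
struct owner {
	struct lru dentry_lru;
	struct lru inode_lru;
};

/* Init can now fail, so the second failure must unwind the first. */
int owner_init(struct owner *o, unsigned int nr_nodes)
{
	int err;

	err = lru_init(&o->dentry_lru, nr_nodes);
	if (err)
		return err;

	err = lru_init(&o->inode_lru, nr_nodes);
	if (err) {
		lru_destroy(&o->dentry_lru);
		return err;
	}
	return 0;
}
```

The cost is an extra pointer dereference per access and an init path
that can fail, in exchange for structures whose size no longer depends
on the compile-time maximum.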



[-- Attachment #2: 0001-list_lru-dynamically-adjust-node-arrays.patch --]
[-- Type: text/x-patch, Size: 7341 bytes --]

From cfc280ee20d93b1901c5ad2dcb13635ce7703d92 Mon Sep 17 00:00:00 2001
From: Glauber Costa <glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Date: Wed, 22 May 2013 09:55:15 +0400
Subject: [PATCH] list_lru: dynamically adjust node arrays

We currently use a compile-time constant (MAX_NUMNODES) to size the node
arrays in the list_lru structure. Because of this, we don't need to
allocate any memory at initialization time. As a consequence, however,
the structures that embed list_lru lists can become way too big (the
superblock, for instance, contains two of them).

This patch ameliorates the situation by dynamically allocating the node
arrays, sized by the firmware-provided nr_node_ids. Initialization can
now fail, so callers check the return value and pair it with
list_lru_destroy().

Signed-off-by: Glauber Costa <glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Cc: Dave Chinner <dchinner-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Mel Gorman <mgorman-l3A5Bk7waGM@public.gmane.org>
---
 fs/super.c               |  9 +++++++--
 fs/xfs/xfs_buf.c         |  6 +++++-
 fs/xfs/xfs_qm.c          | 10 ++++++++--
 include/linux/list_lru.h | 21 ++++---------------
 lib/list_lru.c           | 52 ++++++++++++++++++++++++++++++++++++++++++------
 5 files changed, 70 insertions(+), 28 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index ff40e33..f8dfcec 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -209,8 +209,10 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 		INIT_HLIST_BL_HEAD(&s->s_anon);
 		INIT_LIST_HEAD(&s->s_inodes);
 
-		list_lru_init_memcg(&s->s_dentry_lru);
-		list_lru_init_memcg(&s->s_inode_lru);
+		if (list_lru_init_memcg(&s->s_dentry_lru))
+			goto err_out;
+		if (list_lru_init_memcg(&s->s_inode_lru))
+			goto err_out_dentry_lru;
 
 		INIT_LIST_HEAD(&s->s_mounts);
 		init_rwsem(&s->s_umount);
@@ -251,6 +253,9 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	}
 out:
 	return s;
+
+err_out_dentry_lru:
+	list_lru_destroy(&s->s_dentry_lru);
 err_out:
 	security_sb_free(s);
 #ifdef CONFIG_SMP
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 0d7a619..b8cde02 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1592,6 +1592,7 @@ xfs_free_buftarg(
 	struct xfs_mount	*mp,
 	struct xfs_buftarg	*btp)
 {
+	list_lru_destroy(&btp->bt_lru);
 	unregister_shrinker(&btp->bt_shrinker);
 
 	if (mp->m_flags & XFS_MOUNT_BARRIER)
@@ -1666,9 +1667,12 @@ xfs_alloc_buftarg(
 	if (!btp->bt_bdi)
 		goto error;
 
-	list_lru_init(&btp->bt_lru);
 	if (xfs_setsize_buftarg_early(btp, bdev))
 		goto error;
+
+	if (list_lru_init(&btp->bt_lru))
+		goto error;
+
 	btp->bt_shrinker.count_objects = xfs_buftarg_shrink_count;
 	btp->bt_shrinker.scan_objects = xfs_buftarg_shrink_scan;
 	btp->bt_shrinker.seeks = DEFAULT_SEEKS;
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 85ca39e..29ea575 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -780,11 +780,18 @@ xfs_qm_init_quotainfo(
 
 	qinf = mp->m_quotainfo = kmem_zalloc(sizeof(xfs_quotainfo_t), KM_SLEEP);
 
+	if ((error = list_lru_init(&qinf->qi_lru))) {
+		kmem_free(qinf);
+		mp->m_quotainfo = NULL;
+		return error;
+	}
+
 	/*
 	 * See if quotainodes are setup, and if not, allocate them,
 	 * and change the superblock accordingly.
 	 */
 	if ((error = xfs_qm_init_quotainos(mp))) {
+		list_lru_destroy(&qinf->qi_lru);
 		kmem_free(qinf);
 		mp->m_quotainfo = NULL;
 		return error;
@@ -794,8 +801,6 @@ xfs_qm_init_quotainfo(
 	INIT_RADIX_TREE(&qinf->qi_gquota_tree, GFP_NOFS);
 	mutex_init(&qinf->qi_tree_lock);
 
-	list_lru_init(&qinf->qi_lru);
-
 	/* mutex used to serialize quotaoffs */
 	mutex_init(&qinf->qi_quotaofflock);
 
@@ -883,6 +888,7 @@ xfs_qm_destroy_quotainfo(
 	qi = mp->m_quotainfo;
 	ASSERT(qi != NULL);
 
+	list_lru_destroy(&qi->qi_lru);
 	unregister_shrinker(&qi->qi_shrinker);
 
 	if (qi->qi_uquotaip) {
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index dcb67dc..6d6efda 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -42,18 +42,8 @@ struct list_lru_array {
 };
 
 struct list_lru {
-	/*
-	 * Because we use a fixed-size array, this struct can be very big if
-	 * MAX_NUMNODES is big. If this becomes a problem this is fixable by
-	 * turning this into a pointer and dynamically allocating this to
-	 * nr_node_ids. This quantity is firwmare-provided, and still would
-	 * provide room for all nodes at the cost of a pointer lookup and an
-	 * extra allocation. Because that allocation will most likely come from
-	 * a different slab cache than the main structure holding this
-	 * structure, we may very well fail.
-	 */
-	struct list_lru_node	node[MAX_NUMNODES];
-	atomic_long_t		node_totals[MAX_NUMNODES];
+	struct list_lru_node	*node;
+	atomic_long_t		*node_totals;
 	nodemask_t		active_nodes;
 #ifdef CONFIG_MEMCG_KMEM
 	/* All memcg-aware LRUs will be chained in the lrus list */
@@ -78,14 +68,11 @@ struct mem_cgroup;
 struct list_lru_array *lru_alloc_array(void);
 int memcg_update_all_lrus(unsigned long num);
 void memcg_destroy_all_lrus(struct mem_cgroup *memcg);
-void list_lru_destroy(struct list_lru *lru);
 int __memcg_init_lru(struct list_lru *lru);
-#else
-static inline void list_lru_destroy(struct list_lru *lru)
-{
-}
 #endif
 
+void list_lru_destroy(struct list_lru *lru);
+
 int __list_lru_init(struct list_lru *lru, bool memcg_enabled);
 static inline int list_lru_init(struct list_lru *lru)
 {
diff --git a/lib/list_lru.c b/lib/list_lru.c
index f919f99..1b38d67 100644
--- a/lib/list_lru.c
+++ b/lib/list_lru.c
@@ -334,7 +334,6 @@ int __memcg_init_lru(struct list_lru *lru)
 {
 	int ret;
 
-	INIT_LIST_HEAD(&lru->lrus);
 	mutex_lock(&all_memcg_lrus_mutex);
 	list_add(&lru->lrus, &all_memcg_lrus);
 	ret = memcg_new_lru(lru);
@@ -369,8 +368,11 @@ out:
 	return ret;
 }
 
-void list_lru_destroy(struct list_lru *lru)
+static void list_lru_destroy_memcg(struct list_lru *lru)
 {
+	if (list_empty(&lru->lrus))
+		return;
+
 	mutex_lock(&all_memcg_lrus_mutex);
 	list_del(&lru->lrus);
 	mutex_unlock(&all_memcg_lrus_mutex);
@@ -388,20 +390,58 @@ void memcg_destroy_all_lrus(struct mem_cgroup *memcg)
 	}
 	mutex_unlock(&all_memcg_lrus_mutex);
 }
+
+int memcg_list_lru_init(struct list_lru *lru, bool memcg_enabled)
+{
+	INIT_LIST_HEAD(&lru->lrus);
+	if (memcg_enabled)
+		return memcg_init_lru(lru);
+
+	return 0;
+}
+#else
+static void list_lru_destroy_memcg(struct list_lru *lru)
+{
+}
+
+int memcg_list_lru_init(struct list_lru *lru, bool memcg_enabled)
+{
+	return 0;
+}
 #endif
 
 int __list_lru_init(struct list_lru *lru, bool memcg_enabled)
 {
 	int i;
 
+	size_t size;
+
+	size = sizeof(*lru->node) * nr_node_ids;
+	lru->node = kzalloc(size, GFP_KERNEL);
+	if (!lru->node)
+		return -ENOMEM;
+
+	size = sizeof(*lru->node_totals) * nr_node_ids;
+	lru->node_totals = kzalloc(size, GFP_KERNEL);
+	if (!lru->node_totals) {
+		kfree(lru->node);
+		return -ENOMEM;
+	}
+
 	nodes_clear(lru->active_nodes);
-	for (i = 0; i < MAX_NUMNODES; i++) {
+	for (i = 0; i < nr_node_ids; i++) {
 		list_lru_init_one(&lru->node[i]);
 		atomic_long_set(&lru->node_totals[i], 0);
 	}
 
-	if (memcg_enabled)
-		return memcg_init_lru(lru);
-	return 0;
+	return memcg_list_lru_init(lru, memcg_enabled);
 }
 EXPORT_SYMBOL_GPL(__list_lru_init);
+
+void list_lru_destroy(struct list_lru *lru)
+{
+	kfree(lru->node);
+	kfree(lru->node_totals);
+	list_lru_destroy_memcg(lru);
+}
+EXPORT_SYMBOL_GPL(list_lru_destroy);
-- 
1.8.1.4


2013-06-03 19:30     ` Glauber Costa
2013-06-03 19:30     ` Glauber Costa
2013-06-03 19:30   ` [PATCH v10 33/35] memcg: move initialization to memcg creation Glauber Costa
2013-06-03 19:30     ` Glauber Costa
2013-06-03 19:30     ` Glauber Costa
2013-06-03 19:30   ` [PATCH v10 34/35] vmpressure: in-kernel notifications Glauber Costa
2013-06-03 19:30     ` Glauber Costa
2013-06-03 19:30     ` Glauber Costa
2013-06-03 19:30   ` [PATCH v10 35/35] memcg: reap dead memcgs upon global memory pressure Glauber Costa
2013-06-03 19:30     ` Glauber Costa
2013-06-03 19:30     ` Glauber Costa
2013-06-05 23:09     ` Andrew Morton
2013-06-05 23:09       ` Andrew Morton
2013-06-06  8:33       ` Glauber Costa
2013-06-06  8:33         ` Glauber Costa
2013-06-06  8:33         ` Glauber Costa
2013-06-03 19:29 ` [PATCH v10 20/35] drivers: convert shrinkers to new count/scan API Glauber Costa
2013-06-03 19:29   ` Glauber Costa
2013-06-03 19:30 ` [PATCH v10 31/35] vmscan: take at least one pass with shrinkers Glauber Costa
2013-06-03 19:30   ` Glauber Costa
2013-06-05 23:07 ` [PATCH v10 00/35] kmemcg shrinkers Andrew Morton
2013-06-05 23:07   ` Andrew Morton
2013-06-06  3:44   ` Dave Chinner
2013-06-06  5:51   ` Glauber Costa
2013-06-06  5:51     ` Glauber Costa
2013-06-06  5:51     ` Glauber Costa
     [not found]     ` <51B02347.60809-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-06-06  7:18       ` Andrew Morton
2013-06-06  7:18         ` Andrew Morton
2013-06-06  7:18         ` Andrew Morton
     [not found]         ` <20130606001855.48d9da2e.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2013-06-06  7:37           ` Glauber Costa
2013-06-06  7:37             ` Glauber Costa
2013-06-06  7:37             ` Glauber Costa
2013-06-06  7:47             ` Andrew Morton
2013-06-06  7:47               ` Andrew Morton
2013-06-06  7:59               ` Glauber Costa
2013-06-06  7:59                 ` Glauber Costa
2013-06-06  7:59                 ` Glauber Costa
     [not found]   ` <20130605160721.da995af82eb247ccf8f8537f-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2013-06-07 14:15     ` Michal Hocko
2013-06-07 14:15       ` Michal Hocko
