public inbox for linux-kernel@vger.kernel.org
* [PATCH] dentry/inode accounting for vm_enough_mem()
@ 2003-05-20  0:51 Dave Hansen
  2003-05-20  9:27 ` Andrew Morton
  2003-05-20 10:08 ` Andrew Morton
  0 siblings, 2 replies; 3+ messages in thread
From: Dave Hansen @ 2003-05-20  0:51 UTC (permalink / raw)
  To: lkml

[-- Attachment #1: Type: text/plain, Size: 1750 bytes --]

One of the things on the current must-fix list is:
> - Overcommit accounting gets wrong answers
> 
>   - underestimates reclaimable slab, gives bogus failures when
>     dcache&icache are large.

More comments from Andrew:
> If the cache slab fragmentation is bad it can be hugely wrong.
> A factor of ten has been observed (rarely).  Factors of two happen
> often.
...
> > But, if prune_[di]cache() will only touch those which are being
> >  counted by nr_unused, how can we be more aggressive?
> 
> Well, just by assuming all slab is reclaimable is one way.
> 
> The problem with that is that to read slab accounting we need to do
> get_page_state(), which is too expensive to be called for every mmap()
> on big SMP.

Instead of going through get_page_state(), the following code keeps
track of entries as their space is allocated in the slab via
{con,de}structors. It _will_ overestimate the amount of reclaimable
slab, but the previous approach, based on the .nr_unused stat,
underestimated it and caused too many good allocations to fail.  This
assumes that every dentry/inode allocated in the slab is reclaimable,
which it probably will be if we get desperate enough anyway.

and, as for the counter type being an atomic_t:
> Andrew Morton wrote:
> > Dave Hansen wrote:
> > An atomic_t might be a good idea, but I'm a bit worried that 24 bits
> > might not be enough.  At 160 bytes/dentry, that's 2.5GB of dentries
> > before the counter overflows.  I would imagine that we'll run out of
> > plenty of other things before we get to _that_ many dentries.
> 
> The 24-bit thing is only on sparc32.  I don't think 2G of dentries
> is possible on sparc32 anyway.

The attached patch is against 2.5.69-mm7.
-- 
Dave Hansen
haveblue@us.ibm.com

[-- Attachment #2: (d,i)cache-vm_enough_fix-2.5.69-0.patch --]
[-- Type: text/x-patch, Size: 3722 bytes --]

diff -rup linux-2.5.69-mm8-clean/fs/dcache.c linux-2.5.69-mm8-dcache-count/fs/dcache.c
--- linux-2.5.69-mm8-clean/fs/dcache.c	Mon May 19 13:25:53 2003
+++ linux-2.5.69-mm8-dcache-count/fs/dcache.c	Mon May 19 13:45:05 2003
@@ -1529,6 +1529,16 @@ out:
 	return ino;
 }
 
+void d_ctor(void * objp, struct kmem_cache_s *cachep, unsigned long dflags)
+{
+	atomic_inc(&dentry_stat.nr_alloced);
+}
+
+void d_dtor(void * objp, struct kmem_cache_s *cachep, unsigned long dflags)
+{
+	atomic_dec(&dentry_stat.nr_alloced);
+}
+
 static void __init dcache_init(unsigned long mempages)
 {
 	struct hlist_head *d;
@@ -1548,7 +1558,7 @@ static void __init dcache_init(unsigned 
 					 sizeof(struct dentry),
 					 0,
 					 SLAB_HWCACHE_ALIGN,
-					 NULL, NULL);
+					 d_ctor, d_dtor);
 	if (!dentry_cache)
 		panic("Cannot create dentry cache");
 	
diff -rup linux-2.5.69-mm8-clean/fs/inode.c linux-2.5.69-mm8-dcache-count/fs/inode.c
--- linux-2.5.69-mm8-clean/fs/inode.c	Mon May 19 13:25:54 2003
+++ linux-2.5.69-mm8-dcache-count/fs/inode.c	Mon May 19 16:23:14 2003
@@ -197,6 +197,13 @@ static void init_once(void * foo, kmem_c
 	if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) ==
 	    SLAB_CTOR_CONSTRUCTOR)
 		inode_init_once(inode);
+
+	atomic_inc(&inodes_stat.nr_alloced);
+}
+
+void inode_dtor(void * objp, struct kmem_cache_s *cachep, unsigned long dflags)
+{
+	atomic_dec(&inodes_stat.nr_alloced);
 }
 
 /*
diff -rup linux-2.5.69-mm8-clean/include/linux/dcache.h linux-2.5.69-mm8-dcache-count/include/linux/dcache.h
--- linux-2.5.69-mm8-clean/include/linux/dcache.h	Mon May 19 13:21:34 2003
+++ linux-2.5.69-mm8-dcache-count/include/linux/dcache.h	Mon May 19 14:03:23 2003
@@ -37,6 +37,7 @@ struct qstr {
 struct dentry_stat_t {
 	int nr_dentry;
 	int nr_unused;
+	atomic_t nr_alloced;
 	int age_limit;          /* age in seconds */
 	int want_pages;         /* pages requested by system */
 	int dummy[2];
diff -rup linux-2.5.69-mm8-clean/include/linux/fs.h linux-2.5.69-mm8-dcache-count/include/linux/fs.h
--- linux-2.5.69-mm8-clean/include/linux/fs.h	Mon May 19 13:25:54 2003
+++ linux-2.5.69-mm8-dcache-count/include/linux/fs.h	Mon May 19 13:50:33 2003
@@ -58,6 +58,7 @@ extern struct files_stat_struct files_st
 struct inodes_stat_t {
 	int nr_inodes;
 	int nr_unused;
+	atomic_t nr_alloced;
 	int dummy[5];
 };
 extern struct inodes_stat_t inodes_stat;
diff -rup linux-2.5.69-mm8-clean/mm/mmap.c linux-2.5.69-mm8-dcache-count/mm/mmap.c
--- linux-2.5.69-mm8-clean/mm/mmap.c	Mon May 19 13:25:55 2003
+++ linux-2.5.69-mm8-dcache-count/mm/mmap.c	Mon May 19 16:24:33 2003
@@ -82,16 +82,21 @@ int vm_enough_memory(long pages)
 		free += nr_swap_pages;
 
 		/*
-		 * The code below doesn't account for free space in the
-		 * inode and dentry slab cache, slab cache fragmentation,
-		 * inodes and dentries which will become freeable under
-		 * VM load, etc. Lets just hope all these (complex)
-		 * factors balance out...
+		 * The code below will overestimate the amount of 
+		 * reclaimable slab.  Previously, using the .nr_unused
+		 * stat, this number was too low and caused too many
+		 * good allocations to fail.  This assumes that every 
+		 * dentry/inode allocated in the slab is reclaimable,
+		 * which they probably will be if we get desperate
+		 * enough.
+		 * - Dave Hansen <haveblue@us.ibm.com>
 		 */
-		free += (dentry_stat.nr_unused * sizeof(struct dentry)) >>
-			PAGE_SHIFT;
-		free += (inodes_stat.nr_unused * sizeof(struct inode)) >>
-			PAGE_SHIFT;
+		free += (atomic_read(&dentry_stat.nr_alloced) * 
+			 sizeof(struct dentry)) >>
+			 PAGE_SHIFT;
+		free += (atomic_read(&inodes_stat.nr_alloced) * 
+			 sizeof(struct inode)) >>
+			 PAGE_SHIFT;
 
 		/*
 		 * Leave the last 3% for root


* Re: [PATCH] dentry/inode accounting for vm_enough_mem()
  2003-05-20  0:51 [PATCH] dentry/inode accounting for vm_enough_mem() Dave Hansen
@ 2003-05-20  9:27 ` Andrew Morton
  2003-05-20 10:08 ` Andrew Morton
  1 sibling, 0 replies; 3+ messages in thread
From: Andrew Morton @ 2003-05-20  9:27 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel

Dave Hansen <haveblue@us.ibm.com> wrote:
>
>   struct dentry_stat_t {
>   	int nr_dentry;
>   	int nr_unused;
>  +	atomic_t nr_alloced;
>   	int age_limit;          /* age in seconds */
>   	int want_pages;         /* pages requested by system */
>   	int dummy[2];

We're not at liberty to do this because /proc/sys/fs/dentry-state and
inode-state are implemented assuming that these structs are an array of
integers.  It'll screw up if the architecture's "int" and "atomic_t"
representations are different.

Probably you can just make this an integer and add a spinlock for it, or
not place it in dentry_stat.

Seems otherwise OK though.  Thanks.


* Re: [PATCH] dentry/inode accounting for vm_enough_mem()
  2003-05-20  0:51 [PATCH] dentry/inode accounting for vm_enough_mem() Dave Hansen
  2003-05-20  9:27 ` Andrew Morton
@ 2003-05-20 10:08 ` Andrew Morton
  1 sibling, 0 replies; 3+ messages in thread
From: Andrew Morton @ 2003-05-20 10:08 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel

Dave Hansen <haveblue@us.ibm.com> wrote:
>
> +void inode_dtor(void * objp, struct kmem_cache_s *cachep, unsigned long dflags)
>  +{
>  +	atomic_dec(&inodes_stat.nr_alloced);
>   }

This isn't called anywhere.

inodes are a problem, because their sizes differ, and because each
filesystem uses its own slab cache for them.

I'm wondering if it would not be better to modify slab to do this.  For all
the inode caches and the dentry cache and the mb cache and that new
shrinkable cache which just got added to XFS, we can pass a new flag into
kmem_cache_create() which tells slab.c to account for this cache.

So then, in slab.c, it's just a matter of incrementing or decrementing a
global counter each time slab allocates or frees a page on behalf of a
thus-tagged cache.  And simply read that counter in vm_enough_memory().

