* 2.4.25 - large inode_cache
@ 2004-02-26 1:33 Jakob Oestergaard
2004-02-26 11:19 ` Christian Leber
0 siblings, 1 reply; 9+ messages in thread
From: Jakob Oestergaard @ 2004-02-26 1:33 UTC (permalink / raw)
To: linux-kernel
Dear list,
I have this dual athlon box with 1G memory and a 150G filesystem (four
IDE disks on promise controllers, SW RAID-0+1, ext3fs, user quotas,
HIGHMEM set, plain 2.4.25).
A fresh boot after an unclean shutdown caused it to run quotacheck on
the filesystem, nothing odd about that.
However, after quotacheck completed, I got "order-3 allocation failed"
messages. They kept coming, about one per second. This was happening as
the box entered single user mode - and the messages continued.
From /proc/slabinfo, I got:
inode_cache 1208571 1208571 512 172653 172653 1 : 124 62
dentry_cache 736 268680 128 27 8956 1 : 252 126
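For a sense of scale, the memory those two slab lines pin down can be estimated straight from the fields. A minimal sketch, assuming the 2.4-style slabinfo column order (name, active objs, total objs, obj size, active slabs, total slabs, pages per slab) and 4KB pages:

```python
# Estimate slab-cache memory from 2.4-style /proc/slabinfo lines.
# Assumed column order: name active_objs num_objs objsize
#                       active_slabs num_slabs pages_per_slab : limit batch
def slab_megabytes(line, page_size=4096):
    f = line.split()
    obj_mb = int(f[2]) * int(f[3]) / 2**20               # memory held by objects
    page_mb = int(f[5]) * int(f[6]) * page_size / 2**20  # pages pinned by the slabs
    return f[0], obj_mb, page_mb

for line in ("inode_cache 1208571 1208571 512 172653 172653 1 : 124 62",
             "dentry_cache 736 268680 128 27 8956 1 : 252 126"):
    name, obj_mb, page_mb = slab_megabytes(line)
    print(f"{name}: ~{obj_mb:.0f} MB in objects, ~{page_mb:.0f} MB in slab pages")
```

By this arithmetic the inode_cache alone pins roughly 590 MB of objects (~674 MB of slab pages) on a 1G box, which matches the memory-shortage symptoms described.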
To me it looks like this could be at least a part of the explanation for
the memory shortage - am I completely off track here?
After a clean boot with no quotacheck, it looks like:
inode_cache 3829 3829 512 547 547 1 : 124 62
dentry_cache 4710 4710 128 157 157 1 : 252 126
Besides, after a few days of running, the machine will use about 100MB
of memory for cache, 100MB for buffers, about 100MB for userspace, and
the remaining 600-700 MB of memory for inode_cache and dentry_cache.
It's a file server, and its performance is far from stellar. After
seeing that only about 200MB total was used for cache/buffers, I started
digging into slabinfo.
Is this a known problem? (yes, I know that there's been quite a bit
back and forward on this list about the slabs, but I'm not sure what the
current status is - as far as I know, the only known long-standing
problems with the 2.4 series VM should have been fixed in 2.4.25).
If not, is there anything I can do to actually find out what the cause
of my poor cache sizes is? I believe that a box which does almost
strictly NFS file serving and has 1G of memory should use more than
100M for cache. Or is that just me? (no, there is no significant
amount of free memory, virtually all is used - but not by cache and not
by userspace).
If it is a known problem - any workarounds?
Would 2.6 solve it?
Thanks for any input you may have,
/ jakob
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.4.25 - large inode_cache
2004-02-26 1:33 2.4.25 - large inode_cache Jakob Oestergaard
@ 2004-02-26 11:19 ` Christian Leber
2004-02-26 13:08 ` Marcelo Tosatti
0 siblings, 1 reply; 9+ messages in thread
From: Christian Leber @ 2004-02-26 11:19 UTC (permalink / raw)
To: linux-kernel
On Thu, Feb 26, 2004 at 02:33:14AM +0100, Jakob Oestergaard wrote:
> Besides, after a few days of running, the machine will use about 100MB
> of memory for cache, 100MB for buffers, about 100MB for userspace, and
> the remaining 600-700 MB of memory for inode_cache and dentry_cache.
I have the same problem (it's a dual PIII NFS fileserver with a Promise
SX6000 RAID, a 320 GB ext3 filesystem and only 512 MB RAM).
After only 2 days running the bloatmeter output looks like:
inode_cache: 336585KB 357234KB 94.21
dentry_cache: 50305KB 56523KB 88.99
size-32: 1516KB 1695KB 89.46
free output is this:
total used free shared buffers cached
Mem: 515980 506464 9516 0 2272 19204
-/+ buffers/cache: 484988 30992
Swap: 1951856 7992 1943864
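The "-/+ buffers/cache" row in that free output is just arithmetic on the Mem row; a quick check with the numbers shown above (all in kB):

```python
# Reproduce free's "-/+ buffers/cache" row from the Mem row (all values in kB).
def adjusted(total, used, free_kb, buffers, cached):
    # "used" minus reclaimable buffers/cache, and "free" plus them
    return used - buffers - cached, free_kb + buffers + cached

used_minus, free_plus = adjusted(515980, 506464, 9516, 2272, 19204)
print(used_minus, free_plus)  # -> 484988 30992, matching the row above
```

What the check makes visible: only ~21 MB of the 512 MB is in buffers plus page cache, so nearly all of the 485 MB of "real" usage must be elsewhere, consistent with the slab figures.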
Regards
Christian Leber
--
"Omnis enim res, quae dando non deficit, dum habetur et non datur,
nondum habetur, quomodo habenda est." (Aurelius Augustinus)
Translation: <http://gnuhh.org/work/fsf-europe/augustinus.html>
* Re: 2.4.25 - large inode_cache
2004-02-26 11:19 ` Christian Leber
@ 2004-02-26 13:08 ` Marcelo Tosatti
2004-02-26 13:03 ` Jakob Oestergaard
0 siblings, 1 reply; 9+ messages in thread
From: Marcelo Tosatti @ 2004-02-26 13:08 UTC (permalink / raw)
To: Christian Leber; +Cc: linux-kernel
On Thu, 26 Feb 2004, Christian Leber wrote:
> On Thu, Feb 26, 2004 at 02:33:14AM +0100, Jakob Oestergaard wrote:
> > Besides, after a few days of running, the machine will use about 100MB
> > of memory for cache, 100MB for buffers, about 100MB for userspace, and
> > the remaining 600-700 MB of memory for inode_cache and dentry_cache.
>
> I have the same problem (it's a dual PIII NFS fileserver with a Promise
> SX6000 RAID, a 320 GB ext3 filesystem and only 512 MB RAM).
>
> After only 2 days running the bloatmeter output looks like:
> inode_cache: 336585KB 357234KB 94.21
> dentry_cache: 50305KB 56523KB 88.99
> size-32: 1516KB 1695KB 89.46
>
> free output is this:
> total used free shared buffers cached
> Mem: 515980 506464 9516 0 2272 19204
> -/+ buffers/cache: 484988 30992
> Swap: 1951856 7992 1943864
This should be normal behaviour -- the i/d caches grew because of file
system activity. This memory will be reclaimed in case there's pressure.
Is the behaviour different from previous 2.4 or 2.6 kernels?
* Re: 2.4.25 - large inode_cache
2004-02-26 13:08 ` Marcelo Tosatti
@ 2004-02-26 13:03 ` Jakob Oestergaard
2004-02-26 14:23 ` Marcelo Tosatti
0 siblings, 1 reply; 9+ messages in thread
From: Jakob Oestergaard @ 2004-02-26 13:03 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linux-kernel
On Thu, Feb 26, 2004 at 10:08:23AM -0300, Marcelo Tosatti wrote:
...
> >
> > free output is this:
> > total used free shared buffers cached
> > Mem: 515980 506464 9516 0 2272 19204
> > -/+ buffers/cache: 484988 30992
> > Swap: 1951856 7992 1943864
>
> This should be normal behaviour -- the i/d caches grew because of file
> system activity. This memory will be reclaimed in case there's pressure.
But how is "pressure" defined?
Will a heap of busy knfsd processes doing reads or writes exert
pressure? Or is it only local userspace that can pressurize the VM (by
either anonymously backed memory or file I/O)?
This server happily serves large home directories over NFS, at really
poor speeds. It will happily serve tens or hundreds of gigabytes, read
and write, over the course of a day, and *still* only cache about 100MB.
NFS to/from the server is slow. It's common to see 10 knfsd processes in
D state while vmstat tells me the array works with about 4-6MB/sec
sustained throughput (where hdparm -t would give me more than 70MB/sec
on the md device).
The files read and written are commonly in the 20-60 MB range, so it's
not just because I'm loading the server with small seeks. Many files are
read multiple times within a few minutes, so the cache usage of 100MB is
completely bogus the way that I see it - but maybe there's just
something I don't know about the caching? :)
>
> Is the behaviour different from previous 2.4 or 2.6 kernels?
I never investigated the slabinfo on earlier 2.4. But the performance on
this server has been "under expectations" for as long as I can remember.
So, from the performance experience on this server I would say that
2.4.25 is not any worse than older kernels.
Since this is a production system I have been reluctant to jump on the
2.6 wagon - but my other experiences with 2.6.X have been good, so I'm
probably going to soften up and give it a try in a not too distant
future.
However, if this dcache/icache problem is well known and is (or at least
should be) solved in 2.6, then I can do the test this weekend.
Any enlightenment or suggestions are greatly appreciated :)
Thanks,
/ jakob
* Re: 2.4.25 - large inode_cache
2004-02-26 13:03 ` Jakob Oestergaard
@ 2004-02-26 14:23 ` Marcelo Tosatti
2004-02-26 13:53 ` Jakob Oestergaard
2004-02-26 17:43 ` Andreas Dilger
0 siblings, 2 replies; 9+ messages in thread
From: Marcelo Tosatti @ 2004-02-26 14:23 UTC (permalink / raw)
To: Jakob Oestergaard; +Cc: Marcelo Tosatti, linux-kernel
On Thu, 26 Feb 2004, Jakob Oestergaard wrote:
> On Thu, Feb 26, 2004 at 10:08:23AM -0300, Marcelo Tosatti wrote:
> ...
> > >
> > > free output is this:
> > > total used free shared buffers cached
> > > Mem: 515980 506464 9516 0 2272 19204
> > > -/+ buffers/cache: 484988 30992
> > > Swap: 1951856 7992 1943864
> >
> > This should be normal behaviour -- the i/d caches grew because of file
> > system activity. This memory will be reclaimed in case there's pressure.
>
> But how is "pressure" defined?
>
> Will a heap of busy knfsd processes doing reads or writes exert
> pressure? Or is it only local userspace that can pressurize the VM (by
> either anonymously backed memory or file I/O)?
Any allocator will cause VM pressure.
> This server happily serves large home directories over NFS, at really
> poor speeds. It will happily serve tens or hundreds of gigabytes, read
> and write, over the course of a day, and *still* only cache about 100MB.
> NFS to/from the server is slow. It's common to see 10 knfsd processes in
> D state while vmstat tells me the array works with about 4-6MB/sec
> sustained throughput (where hdparm -t would give me more than 70MB/sec
> on the md device).
>
> The files read and written are commonly in the 20-60 MB range, so it's
> not just because I'm loading the server with small seeks. Many files are
> read multiple times within a few minutes, so the cache usage of 100MB is
> completely bogus the way that I see it - but maybe there's just
> something I don't know about the caching? :)
It sounds like the cache could grow larger for better performance, indeed.
> > Is the behaviour different from previous 2.4 or 2.6 kernels?
>
> I never investigated the slabinfo on earlier 2.4. But the performance on
> this server has been "under expectations" for as long as I can remember.
> So, from the performance experience on this server I would say that
> 2.4.25 is not any worse than older kernels.
>
> Since this is a production system I have been reluctant to jump on the
> 2.6 wagon - but my other experiences with 2.6.X have been good, so I'm
> probably going to soften up and give it a try in a not too distant
> future.
>
> However, if this dcache/icache problem is well known and is (or at least
> should be) solved in 2.6, then I can do the test this weekend.
>
> Any enlightenment or suggestions are greatly appreciated :)
What you can try is to lower the VM tunable vm_vfs_scan_ratio. This sets
what proportion of the unused VFS d/i caches the VM will try to scan in
one freeing pass (one part in N). The default is 6. Try 4 or 3.
/proc/sys/vm/vm_vfs_scan_ratio
You can also play with
/proc/sys/vm/vm_cache_scan_ratio (which is the percentage of cache which
will be scanned in one go).
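As a toy illustration only (not the actual 2.4 kernel code) of the semantics described above, where each freeing pass scans roughly one vm_vfs_scan_ratio'th of the unused d/i cache entries, lowering the value makes each pass reclaim more:

```python
# Toy model: each VM freeing pass scans roughly unused/ratio of the
# unused dentry/inode cache entries (a simplification of the 2.4 logic).
def entries_scanned(unused, vm_vfs_scan_ratio):
    return unused // vm_vfs_scan_ratio

unused_inodes = 1_200_000  # about the inode_cache size from the first mail
for ratio in (6, 4, 3):
    print(f"ratio={ratio}: ~{entries_scanned(unused_inodes, ratio)} entries per pass")
```

Under this model, going from the default of 6 down to 3 doubles the number of unused entries considered per pass, which is why the lower setting shrinks the i/d caches faster.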
* Re: 2.4.25 - large inode_cache
2004-02-26 14:23 ` Marcelo Tosatti
@ 2004-02-26 13:53 ` Jakob Oestergaard
2004-02-26 17:43 ` Andreas Dilger
1 sibling, 0 replies; 9+ messages in thread
From: Jakob Oestergaard @ 2004-02-26 13:53 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linux-kernel
On Thu, Feb 26, 2004 at 11:23:46AM -0300, Marcelo Tosatti wrote:
...
> > Will a heap of busy knfsd processes doing reads or writes exert
> > pressure? Or is it only local userspace that can pressurize the VM (by
> > either anonymously backed memory or file I/O)?
>
> Any allocator will cause VM pressure.
And I suppose that a busy knfsd qualifies as an "allocator" :)
...
> > Any enlightenment or suggestions are greatly appreciated :)
>
> What you can try is to lower the VM tunable vm_vfs_scan_ratio. This sets
> what proportion of the unused VFS d/i caches the VM will try to scan in
> one freeing pass (one part in N). The default is 6. Try 4 or 3.
>
> /proc/sys/vm/vm_vfs_scan_ratio
Done! Set to 3 now - I will let the box run with this setting until
tomorrow, and report back how things look.
> You can also play with
>
> /proc/sys/vm/vm_cache_scan_ratio (which is the percentage of cache which
> will be scanned in one go).
I'm leaving this one be for now (one variable at a time). But let's see
what tomorrow brings.
Judging from the code, it seems that it's the vm_vfs_scan_ratio that
directly affects the icache/dcache and dquot - but I'm sure that there
are subtle interactions far beyond what I can possibly hope to
comprehend ;)
Thanks a lot for your suggestions Marcelo!
/ jakob
* Re: 2.4.25 - large inode_cache
2004-02-26 14:23 ` Marcelo Tosatti
2004-02-26 13:53 ` Jakob Oestergaard
@ 2004-02-26 17:43 ` Andreas Dilger
2004-02-26 20:43 ` Marcelo Tosatti
1 sibling, 1 reply; 9+ messages in thread
From: Andreas Dilger @ 2004-02-26 17:43 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Jakob Oestergaard, linux-kernel
On Feb 26, 2004 11:23 -0300, Marcelo Tosatti wrote:
> On Thu, 26 Feb 2004, Jakob Oestergaard wrote:
> > On Thu, Feb 26, 2004 at 10:08:23AM -0300, Marcelo Tosatti wrote:
> > > This should be normal behaviour -- the i/d caches grew because of file
> > > system activitity. This memory will be reclaimed in case theres pressure.
> >
> > But how is "pressure" defined?
> >
> > Will a heap of busy knfsd processes doing reads or writes exert
> > pressure? Or is it only local userspace that can pressurize the VM (by
> > either anonymously backed memory or file I/O)?
>
> Any allocator will cause VM pressure.
But won't all of the knfsd allocations be by necessity GFP_NOFS to avoid
deadlocks, so they will be unable to clear inodes or dentries? Both
shrink_icache_memory() and shrink_dcache_memory() do nothing if __GFP_FS
isn't set, so if there is no user-space allocation pressure we will never
get into the dcache/icache freeing paths from knfsd allocations.
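The gating described here can be sketched as a simplified model (the flag values below are illustrative, not the kernel's real constants; the point is only that the shrinkers bail out when __GFP_FS is clear):

```python
# Simplified model of the __GFP_FS gate in the 2.4 shrink paths.
# Flag values are illustrative, not the kernel's real GFP constants.
GFP_FS = 1 << 0
GFP_KERNEL = GFP_FS  # userspace-style allocation: __GFP_FS set
GFP_NOFS = 0         # fs-internal allocation: __GFP_FS clear

def free_unused_inodes(priority):
    return 100 // priority  # stand-in for the real reclaim work

def shrink_icache_memory(priority, gfp_mask):
    if not (gfp_mask & GFP_FS):  # GFP_NOFS caller: do nothing, since
        return 0                 # re-entering the fs could deadlock
    return free_unused_inodes(priority)

print(shrink_icache_memory(6, GFP_KERNEL))  # reclaims some inodes
print(shrink_icache_memory(6, GFP_NOFS))    # no-op: returns 0
```

If knfsd's allocations really did all carry GFP_NOFS, only other (userspace-driven) allocations could ever reach the freeing path, which is the concern raised above; Marcelo's reply below addresses whether knfsd actually uses GFP_NOFS.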
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
* Re: 2.4.25 - large inode_cache
2004-02-26 17:43 ` Andreas Dilger
@ 2004-02-26 20:43 ` Marcelo Tosatti
2004-02-27 12:27 ` Jakob Oestergaard
0 siblings, 1 reply; 9+ messages in thread
From: Marcelo Tosatti @ 2004-02-26 20:43 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Marcelo Tosatti, Jakob Oestergaard, linux-kernel
On Thu, 26 Feb 2004, Andreas Dilger wrote:
> On Feb 26, 2004 11:23 -0300, Marcelo Tosatti wrote:
> > On Thu, 26 Feb 2004, Jakob Oestergaard wrote:
> > > On Thu, Feb 26, 2004 at 10:08:23AM -0300, Marcelo Tosatti wrote:
> > > > This should be normal behaviour -- the i/d caches grew because of file
> > > > system activitity. This memory will be reclaimed in case theres pressure.
> > >
> > > But how is "pressure" defined?
> > >
> > > Will a heap of busy knfsd processes doing reads or writes exert
> > > pressure? Or is it only local userspace that can pressurize the VM (by
> > > either anonymously backed memory or file I/O)?
> >
> > Any allocator will cause VM pressure.
>
> But won't all of the knfsd allocations be by necessity GFP_NOFS to avoid
> deadlocks, so they will be unable to clear inodes or dentries? Both
> shrink_icache_memory() and shrink_dcache_memory() do nothing if __GFP_FS
> isn't set, so if there is no user-space allocation pressure we will never
> get into the dcache/icache freeing paths from knfsd allocations.
Hi Andreas,
AFAICS knfsd does not allocate using GFP_NOFS.
* Re: 2.4.25 - large inode_cache
2004-02-26 20:43 ` Marcelo Tosatti
@ 2004-02-27 12:27 ` Jakob Oestergaard
0 siblings, 0 replies; 9+ messages in thread
From: Jakob Oestergaard @ 2004-02-27 12:27 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Andreas Dilger, linux-kernel
On Thu, Feb 26, 2004 at 05:43:49PM -0300, Marcelo Tosatti wrote:
>
...
> > > Any allocator will cause VM pressure.
> >
> > But won't all of the knfsd allocations be by necessity GFP_NOFS to avoid
> > deadlocks, so they will be unable to clear inodes or dentries? Both
> > shrink_icache_memory() and shrink_dcache_memory() do nothing if __GFP_FS
> > isn't set so if there is no user-space allocation pressure we will never
> > get into the dcache/icache freeing paths from knfsd allocations.
>
> Hi Andreas,
>
> AFAICS knfsd does not allocate using GFP_NOFS.
So far, it looks like the vm_vfs_scan_ratio setting of '3' (instead of
'6' which was the default) made a big difference.
Currently the machine is using about ~600MB for cache (up from ~100MB),
and about 150MB for buffers (up from 80-100MB).
I'm going to let it run a little longer just with this setting, before
experimenting further.
I'll ask around about the perceived performance of the machine, and pay
attention to it myself. I'll post the results Monday or Tuesday (not a
lot of interactive users during the weekend ;)
Again, thanks a lot for the quick feedback!
/ jakob
end of thread, other threads: [~2004-02-27 12:28 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-02-26 1:33 2.4.25 - large inode_cache Jakob Oestergaard
2004-02-26 11:19 ` Christian Leber
2004-02-26 13:08 ` Marcelo Tosatti
2004-02-26 13:03 ` Jakob Oestergaard
2004-02-26 14:23 ` Marcelo Tosatti
2004-02-26 13:53 ` Jakob Oestergaard
2004-02-26 17:43 ` Andreas Dilger
2004-02-26 20:43 ` Marcelo Tosatti
2004-02-27 12:27 ` Jakob Oestergaard