On Wed, Mar 18, 2009 at 02:53:47PM +0000, Paul Evans wrote:
> We have a server whose dirty page count keeps increasing all the time,
> to the point where 'sync' takes ages to flush the pages:
>
> root@freehand:~# time sync
>
> real    1m15.570s
> user    0m0.000s
> sys     0m0.052s
>
> We have some graphs of the dirty page count, as captured
> from /proc/vmstat's "nr_dirty" entry:
>
> http://opensource.mxtelecom.com/tmp/freehand-dirty-day.png
> http://opensource.mxtelecom.com/tmp/freehand-dirty-week.png
>
> I have tuned the dirty page flushing sysctls to the following:
>
> root@freehand:~# for F in /proc/sys/vm/dirty_*; do echo -n "$F: "; cat $F; done
> /proc/sys/vm/dirty_background_ratio: 1
> /proc/sys/vm/dirty_expire_centisecs: 3000
> /proc/sys/vm/dirty_ratio: 3
> /proc/sys/vm/dirty_writeback_centisecs: 500
>
> The role of the machine itself is that it performs a large amount of
> kernel iptables routing/firewalling traffic, and runs a set of apache
> servers as HTTP<->Tomcat gateways.
>
> root@freehand:~# uname -r
> 2.6.27-fes
>
> (this is a build of stock 2.6.27 source, with some extra iptables
> patches. There shouldn't be anything mm-related here)
>
> By my understanding of the dirty page flush algorithm, we shouldn't be
> accumulating these pages all the time; any page older than 30 seconds
> ought to be written out, yes?
>
> If we manually 'sync', as above, then the count drops to zero, but then
> slowly starts ramping up again as observed.
>
> As a temporary workaround I've put 'sync' in cron every 10 minutes, but
> is there some more tuning I can do; or at least probing to see where
> these pages are being accumulated from?

Hi Paul,

The attached filecache patch should help identify the dirty files/pages.

Usage:
        # run patched kernel
        modprobe filecache
        cat /proc/filecache

The dirty files will have the 'D' flag in their "state" field.

Thanks,
Fengguang
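
P.S. While testing, it may help to log the counters from /proc/vmstat
next to the filecache output, so that the ramp-up can be lined up with
whatever files show up as dirty. The following is only a rough sketch:
the helper names vmstat_field and watch_dirty are made up for
illustration and are not part of the patch, and it assumes a POSIX
shell with awk available.

```shell
#!/bin/sh
# Print the value of one counter from a vmstat-style "name value" file.
# Usage: vmstat_field NAME [FILE]   (FILE defaults to /proc/vmstat)
# Hypothetical helper, for illustration only.
vmstat_field() {
    awk -v key="$1" '$1 == key { print $2 }' "${2:-/proc/vmstat}"
}

# Print a timestamped nr_dirty/nr_writeback sample every INTERVAL
# seconds, COUNT times in total.
# Usage: watch_dirty [INTERVAL] [COUNT]   (defaults: 5 seconds, 12 samples)
# Hypothetical helper, for illustration only.
watch_dirty() {
    interval=${1:-5}
    count=${2:-12}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(date +%T) nr_dirty=$(vmstat_field nr_dirty) nr_writeback=$(vmstat_field nr_writeback)"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, after sourcing the above, "watch_dirty 10 30" would print
one sample every 10 seconds, 30 times. If nr_dirty keeps climbing
while nr_writeback stays near zero, that would suggest the pages are
not being queued for writeback at all, rather than being queued and
written slowly.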