From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [PATCH v1 0/8] Deferred dput() and iput() -- reducing lock contention
Date: Sat, 17 Jan 2009 08:04:36 +0100
Message-ID: <49718304.90404@cosmosbay.com>
References: <20090117022936.20425.43248.stgit@crlf.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Andrew Morton
To: Mike Waychison
In-Reply-To: <20090117022936.20425.43248.stgit@crlf.corp.google.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Mike Waychison wrote:
> We've noticed that at times it can become very easy for a system to begin
> to livelock on dcache_lock/inode_lock (specifically in
> atomic_dec_and_lock()) when a lot of dentries are getting finalized at the
> same time (massive deletes and large fdtable destructions are two paths
> I've seen cause problems).
>
> This patchset is an attempt to reduce the locking overhead associated
> with final dput() and final iput().  This is done by batching dentries
> and inodes into per-process queues and processing them in 'parallel' to
> consolidate some of the locking.
>
> Besides various workload testing, I threw together a load (at the end of
> this email) that causes massive fdtables (50K sockets by default) to get
> destroyed on each cpu in the system.  It also populates the dcache for
> procfs on those tasks for good measure.  Comparing lock_stat results
> (hardware is a Sun x4600 M2 populated with 8 4-core 2.3GHz packages
> (32 CPUs) + 128GiB RAM):
>

Hello Mike

This seems like quite a large/intrusive infrastructure for a well-known
problem.  I even spent some time on it myself, but it seems nobody cared
too much, or people were too busy:

https://kerneltrap.org/mailarchive/linux-netdev/2008/12/11/4397594

(Patch 6 of that series should be discarded, as followups showed it was
wrong: [PATCH 6/7] fs: struct file move from call_rcu() to
SLAB_DESTROY_BY_RCU.)

Sockets and pipes don't need dcache_lock or inode_lock at all, and I am
sure Google machines also use sockets :)

Your test/bench program is quite biased (populating the dcache for procfs,
using 50K file descriptors on 32 CPUs: not very realistic IMHO).  I had a
workload with processes using 1,000,000 file descriptors (mainly sockets)
and got some latency problems when they had to exit().  This problem was
addressed by a single cond_resched() added in close_files() (commit
944be0b224724fcbf63c3a3fe3a5478c325a6547).
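For reference, that fix was a one-liner in the fd-teardown loop run at
exit time.  A sketch of close_files() from kernel/exit.c of that era,
reconstructed from memory (surrounding details may differ slightly from
the tree at that commit):

static void close_files(struct files_struct *files)
{
	int i, j = 0;
	struct fdtable *fdt;

	/* Safe without ->file_lock: this is the last reference
	 * to the files structure. */
	fdt = files_fdtable(files);
	for (;;) {
		unsigned long set;

		i = j * __NFDBITS;
		if (i >= fdt->max_fds)
			break;
		set = fdt->open_fds->fds_bits[j++];
		while (set) {
			if (set & 1) {
				struct file *file = xchg(&fdt->fd[i], NULL);
				if (file) {
					filp_close(file, files);
					/* the fix: yield between final fputs,
					 * so a task tearing down a huge
					 * fdtable cannot hog the CPU for the
					 * whole teardown */
					cond_resched();
				}
			}
			i++;
			set >>= 1;
		}
	}
}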
> BEFORE PATCHSET:
> -----------------------------------------------------------------------------------------------------------------------------------------
>  class name   con-bounces  contentions  waittime-min  waittime-max  waittime-total  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> dcache_lock:      5666485     10086172         11.16    2274036.47   3821690090.32     11042789      14926411          2.78      42138.28     65887138.46
> -----------
> dcache_lock       3016578  [] d_alloc+0x190/0x1e1
> dcache_lock       1844569  [] d_instantiate+0x2c/0x51
> dcache_lock       4223784  [] _atomic_dec_and_lock+0x3c/0x5c
> dcache_lock        207544  [] d_rehash+0x1b/0x43
> -----------
> dcache_lock       2305449  [] d_instantiate+0x2c/0x51
> dcache_lock       1954160  [] d_alloc+0x190/0x1e1
> dcache_lock       4169163  [] _atomic_dec_and_lock+0x3c/0x5c
> dcache_lock        866247  [] d_rehash+0x1b/0x43
>
> .........................................................................................................................................
>
> inode_lock:        202813       203064         12.91     376655.85     37696597.26      5136095       8001156          3.37      15142.77      5033144.13
> ----------
> inode_lock          20192  [] new_inode+0x27/0x63
> inode_lock              1  [] sync_inode+0x19/0x39
> inode_lock              5  [] __mark_inode_dirty+0xe2/0x165
> inode_lock              2  [] __writeback_single_inode+0x1bf/0x26c
> ----------
> inode_lock          20198  [] new_inode+0x27/0x63
> inode_lock              1  [] __mark_inode_dirty+0xe2/0x165
> inode_lock              5  [] generic_sync_sb_inodes+0x33/0x31a
> inode_lock         165016  [] _atomic_dec_and_lock+0x3c/0x5c
>
> .........................................................................................................................................
>
> &sbsec->isec_lock:  34428        34431         16.77      67593.54      3780131.15      1415432       3200261          4.06      13357.96      1180726.21
>
>
> AFTER PATCHSET [Note that inode_lock doesn't even show up anymore]:
> -----------------------------------------------------------------------------------------------------------------------------------------
>  class name   con-bounces  contentions  waittime-min  waittime-max  waittime-total  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> dcache_lock:      2732103      5254824         11.04    1314926.88   2187466928.10      5551173       9751332          2.43      78012.24     42830806.29
> -----------
> dcache_lock       3015590  [] d_alloc+0x190/0x1e1
> dcache_lock       1966978  [] d_instantiate+0x2c/0x51
> dcache_lock        258657  [] d_rehash+0x1b/0x43
> dcache_lock            30  [] __link_path_walk+0x42d/0x665
> -----------
> dcache_lock       2493589  [] d_instantiate+0x2c/0x51
> dcache_lock       1862921  [] d_alloc+0x190/0x1e1
> dcache_lock        882054  [] d_rehash+0x1b/0x43
> dcache_lock         15504  [] process_postponed_dentries+0x3e/0x29f
>
> .........................................................................................................................................
>
>
> ChangeLog:
> [v1] - Jan 16, 2009
>   - Initial version
>
> Patches in series:
>
> 1) Deferred batching of dput()
> 2) Parallel dput()
> 3) Deferred batching of iput()
> 4) Fixing iput called from put_super path
> 5) Parallelize iput()
> 6) hugetlbfs drop_inode update
> 7) Make drop_caches flush pending dput()s and iput()s
> 8) Make the sync path drain dentries and inodes
>
> Caveats:
>
>  * It's not clear to me how to make 'sync(2)' do what it should.
>    Currently we flush pending dputs and iputs before writing anything
>    out, but presumably there may be hidden iputs in the do_sync path
>    that really ought to be synchronous.
>  * I don't like the changes in patch 4; it should probably be changed
>    to have the ->put_super callback paths call a synchronous version
>    of iput() (perhaps sync_iput()?) instead of having to tag the
>    inodes with S_SYNCIPUT.
>  * cramfs, gfs2 and ocfs2 need to be updated for the new drop_inode
>    semantics.
> ---
>
>  fs/block_dev.c         |    2
>  fs/dcache.c            |  374 ++++++++++++++++++++++++++++++++++------
>  fs/drop_caches.c       |    1
>  fs/ext2/super.c        |    2
>  fs/gfs2/glock.c        |    2
>  fs/gfs2/ops_fstype.c   |    2
>  fs/hugetlbfs/inode.c   |   72 +++++---
>  fs/inode.c             |  441 +++++++++++++++++++++++++++++++++-------
>  fs/ntfs/super.c        |    2
>  fs/smbfs/inode.c       |    2
>  fs/super.c             |    6 -
>  fs/sync.c              |    5 +
>  include/linux/dcache.h |    1
>  include/linux/fs.h     |   18 ++
>
>  14 files changed, 787 insertions(+), 143 deletions(-)
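A closing note on why _atomic_dec_and_lock() dominates both profiles
above: its lockless fast path only covers non-final reference drops, so
every final dput()/iput() still serializes on the global lock.  Its
logic is roughly the following (a simplified sketch of
lib/dec_and_lock.c, not verbatim):

int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
{
	/* Fast path: drop one reference without touching the lock,
	 * unless the counter would reach zero (i.e. it was 1). */
	if (atomic_add_unless(atomic, -1, 1))
		return 0;

	/* Final reference: take the lock, then re-check under it in
	 * case another CPU grabbed a new reference in the meantime. */
	spin_lock(lock);
	if (atomic_dec_and_test(atomic))
		return 1;	/* hit zero; caller finalizes with lock held */
	spin_unlock(lock);
	return 0;
}

So when millions of dentries reach their final reference at once, each
one takes dcache_lock here; batching the final drops, as this patchset
does, amortizes that to one acquisition per batch.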