From: Eric Dumazet <dada1@cosmosbay.com>
To: Mike Waychison <mikew@google.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v1 0/8] Deferred dput() and iput() -- reducing lock contention
Date: Sat, 17 Jan 2009 08:04:36 +0100
Message-ID: <49718304.90404@cosmosbay.com>
In-Reply-To: <20090117022936.20425.43248.stgit@crlf.corp.google.com>

Mike Waychison wrote:
> We've noticed that at times it can become very easy to have a system begin to
> livelock on dcache_lock/inode_lock (specifically in atomic_dec_and_lock()) when
> a lot of dentries are getting finalized at the same time (massive delete and
> large fdtable destructions are two paths I've seen cause problems).
> 
> This patchset is an attempt to try and reduce the locking overheads associated
> with final dput() and final iput().  This is done by batching dentries and
> inodes into per-process queues and processing them in 'parallel' to consolidate
> some of the locking.
> 
> Besides various workload testing, I threw together a load (at the end of this
> email) that causes massive fdtables (50K sockets by default) to get destroyed
> on each cpu in the system.  It also populates the dcache for procfs on those
> tasks for good measure.  Comparing lock_stat results (hardware is a Sun x4600
> M2 populated with 8 4-core 2.3GHz packages (32 CPUs) + 128GiB RAM):
> 
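
For readers unfamiliar with the atomic_dec_and_lock() pattern named above, here is a minimal userspace sketch of its semantics, assuming C11 atomics and a pthread mutex in place of the kernel's atomic_t and spinlock. The name dec_and_lock() is hypothetical and this is not the kernel implementation. The point it illustrates: every final reference drop funnels through the one lock, which is why many simultaneous final dput()/iput() calls contend on dcache_lock/inode_lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace analogue of atomic_dec_and_lock().
 * Returns true with *lock held if the count dropped to zero;
 * otherwise returns false and does not hold the lock.
 */
static bool dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
    int old = atomic_load(cnt);

    /* Fast path: while the count is above 1, decrement without the lock. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return false;
        /* On failure, 'old' now holds the current value; retry. */
    }

    /* Slow path: this may be the last reference, so take the lock first. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(cnt, 1) == 1)
        return true;            /* count hit zero; caller holds the lock */
    pthread_mutex_unlock(lock);
    return false;
}
```

A caller that gets true back tears the object down and then unlocks; a caller that gets false simply returns.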

Hello Mike

This seems like quite a large and intrusive infrastructure for a well-known problem.
I even spent some time on it myself,
but it seems nobody cared much, or people were too busy.

https://kerneltrap.org/mailarchive/linux-netdev/2008/12/11/4397594

(patch 6 should be discarded, as follow-ups showed it was wrong:
[PATCH 6/7] fs: struct file move from call_rcu() to SLAB_DESTROY_BY_RCU)


Sockets and pipes don't need dcache_lock or inode_lock at all, and I am
sure Google machines also use sockets :)

Your test/bench program is quite biased (populating the dcache for procfs and
using 50k file descriptors on 32 CPUs is not very realistic, IMHO).

I had a workload with processes using 1,000,000 file descriptors
(mainly sockets) and got some latency problems when they had to exit().
This problem was addressed by one cond_resched() added in close_files()
(commit 944be0b224724fcbf63c3a3fe3a5478c325a6547).
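
The idea of yielding inside a long teardown loop can be sketched in userspace as follows. This is only an analogue under stated assumptions: sched_yield() stands in for the kernel's cond_resched(), and close_many() and YIELD_EVERY are hypothetical names, not taken from the commit.

```c
#include <sched.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical batch size: how many descriptors to close between yields. */
#define YIELD_EVERY 64

/*
 * Close every non-negative descriptor in fds[0..n-1], offering the CPU to
 * other runnable tasks after each batch so that a huge table does not
 * monopolize it. Returns the number of descriptors closed successfully.
 */
static size_t close_many(const int *fds, size_t n)
{
    size_t closed = 0;

    for (size_t i = 0; i < n; i++) {
        if (fds[i] >= 0 && close(fds[i]) == 0)
            closed++;
        if ((i + 1) % YIELD_EVERY == 0)
            sched_yield();      /* userspace stand-in for cond_resched() */
    }
    return closed;
}
```

The work done is unchanged; the loop just gives the scheduler regular opportunities to run something else.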


> BEFORE PATCHSET:
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
>                              dcache_lock:       5666485       10086172          11.16     2274036.47  3821690090.32       11042789       14926411           2.78       42138.28    65887138.46
>                              -----------
>                              dcache_lock        3016578          [<ffffffff810bd7e7>] d_alloc+0x190/0x1e1
>                              dcache_lock        1844569          [<ffffffff810bcdfc>] d_instantiate+0x2c/0x51
>                              dcache_lock        4223784          [<ffffffff8119bd88>] _atomic_dec_and_lock+0x3c/0x5c
>                              dcache_lock         207544          [<ffffffff810bc36a>] d_rehash+0x1b/0x43
>                              -----------
>                              dcache_lock        2305449          [<ffffffff810bcdfc>] d_instantiate+0x2c/0x51
>                              dcache_lock        1954160          [<ffffffff810bd7e7>] d_alloc+0x190/0x1e1
>                              dcache_lock        4169163          [<ffffffff8119bd88>] _atomic_dec_and_lock+0x3c/0x5c
>                              dcache_lock         866247          [<ffffffff810bc36a>] d_rehash+0x1b/0x43
> 
> ...............................................................................................................................................................................................
> 
>                               inode_lock:        202813         203064          12.91      376655.85    37696597.26        5136095        8001156           3.37       15142.77     5033144.13
>                               ----------
>                               inode_lock          20192          [<ffffffff810bfdbc>] new_inode+0x27/0x63
>                               inode_lock              1          [<ffffffff810c6f58>] sync_inode+0x19/0x39
>                               inode_lock              5          [<ffffffff810c76b5>] __mark_inode_dirty+0xe2/0x165
>                               inode_lock              2          [<ffffffff810c6e92>] __writeback_single_inode+0x1bf/0x26c
>                               ----------
>                               inode_lock          20198          [<ffffffff810bfdbc>] new_inode+0x27/0x63
>                               inode_lock              1          [<ffffffff810c76b5>] __mark_inode_dirty+0xe2/0x165
>                               inode_lock              5          [<ffffffff810c70fc>] generic_sync_sb_inodes+0x33/0x31a
>                               inode_lock         165016          [<ffffffff8119bd88>] _atomic_dec_and_lock+0x3c/0x5c
> 
> ...............................................................................................................................................................................................
> 
>                        &sbsec->isec_lock:         34428          34431          16.77       67593.54     3780131.15        1415432        3200261           4.06       13357.96     1180726.21
> 
> 
> 
> 
> 
> 
> AFTER PATCHSET [Note that inode_lock doesn't even show up anymore]:
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
>                              dcache_lock:       2732103        5254824          11.04     1314926.88  2187466928.10        5551173        9751332           2.43       78012.24    42830806.29
>                              -----------
>                              dcache_lock        3015590          [<ffffffff810bdb04>] d_alloc+0x190/0x1e1
>                              dcache_lock        1966978          [<ffffffff810bd118>] d_instantiate+0x2c/0x51
>                              dcache_lock         258657          [<ffffffff810bc3cd>] d_rehash+0x1b/0x43
>                              dcache_lock             30          [<ffffffff810b7ad4>] __link_path_walk+0x42d/0x665
>                              -----------
>                              dcache_lock        2493589          [<ffffffff810bd118>] d_instantiate+0x2c/0x51
>                              dcache_lock        1862921          [<ffffffff810bdb04>] d_alloc+0x190/0x1e1
>                              dcache_lock         882054          [<ffffffff810bc3cd>] d_rehash+0x1b/0x43
>                              dcache_lock          15504          [<ffffffff810bc5e1>] process_postponed_dentries+0x3e/0x29f
> 
> ...............................................................................................................................................................................................
> 
> 
> ChangeLog:
>   [v1] - Jan 16, 2009
>     - Initial version
> 
> Patches in series:
> 
>   1)  Deferred batching of dput()
>   2)  Parallel dput()
>   3)  Deferred batching of iput()
>   4)  Fixing iput called from put_super path
>   5)  Parallelize iput()
>   6)  hugetlbfs drop_inode update
>   7)  Make drop_caches flush pending dput()s and iput()s
>   8)  Make the sync path drain dentries and inodes
> 
> Caveats:
> 
>   * It's not clear to me how to make 'sync(2)' do what it should.  Currently we
> flush pending dputs and iputs before writing anything out, but presumably,
> there may be hidden iputs in the do_sync path that really ought to be
> synchronous.
>   * I don't like the changes in patch 4 and it should probably be changed to
> have the ->put_super callback paths call a synchronous version of iput()
> (perhaps sync_iput()?) instead of having to tag the inodes with S_SYNCIPUT.
>   * cramfs, gfs2, ocfs2 need to be updated for the new drop_inode semantics.
> ---
> 
>  fs/block_dev.c         |    2 
>  fs/dcache.c            |  374 +++++++++++++++++++++++++++++++++++++----
>  fs/drop_caches.c       |    1 
>  fs/ext2/super.c        |    2 
>  fs/gfs2/glock.c        |    2 
>  fs/gfs2/ops_fstype.c   |    2 
>  fs/hugetlbfs/inode.c   |   72 +++++---
>  fs/inode.c             |  441 +++++++++++++++++++++++++++++++++++++++++-------
>  fs/ntfs/super.c        |    2 
>  fs/smbfs/inode.c       |    2 
>  fs/super.c             |    6 -
>  fs/sync.c              |    5 +
>  include/linux/dcache.h |    1 
>  include/linux/fs.h     |   18 ++
>  14 files changed, 787 insertions(+), 143 deletions(-)
> 


