From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754336Ab1E3Drq (ORCPT ); Sun, 29 May 2011 23:47:46 -0400
Received: from ipmail06.adl6.internode.on.net ([150.101.137.145]:26060 "EHLO ipmail06.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id
S1753275Ab1E3Drp (ORCPT ); Sun, 29 May 2011 23:47:45 -0400
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AiIEAJwS4015LCoegWdsb2JhbABVhEmhYBUBARYmJYhxrHGPWw6BHYNsgQcEn3s
Date: Mon, 30 May 2011 13:47:41 +1000
From: Dave Chinner
To: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [regression, 3.0-rc1] dentry cache growth during unlinks, XFS performance way down
Message-ID: <20110530034741.GD561@dastard>
References: <20110530020604.GC561@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20110530020604.GC561@dastard>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 30, 2011 at 12:06:04PM +1000, Dave Chinner wrote:
> Folks,
>
> I just booted up a 3.0-rc1 kernel, and mounted an XFS filesystem
> with 50M files in it. Running:
>
> $ for i in /mnt/scratch/*; do sudo /usr/bin/time rm -rf $i 2>&1 & done
>
> runs an 8-way parallel unlink on the files. Normally this runs at
> around 80k unlinks/s, and it runs with about 500k-1m dentries and
> inodes cached in the steady state.
>
> The steady state behaviour with 3.0-rc1 is that there are around 10m
> cached dentries - all negative dentries - consuming about 1.6GB of
> RAM (of 4GB total). Previous steady state was, IIRC, around 200MB of
> dentries. My initial suspicions are that the dentry unhashing
> changes may be the cause of this...

So a bisect lands on:

$ git bisect good
79bf7c732b5ff75b96022ed9d29181afd3d2509c is the first bad commit
commit 79bf7c732b5ff75b96022ed9d29181afd3d2509c
Author: Sage Weil <sage@newdream.net>
Date:   Tue May 24 13:06:06 2011 -0700

    vfs: push dentry_unhash on rmdir into file systems

    Only a few file systems need this.  Start by pushing it down into each
    fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
    basis.

    This does not change behavior for any in-tree file systems.

    Acked-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Sage Weil <sage@newdream.net>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

:040000 040000 c45d58718d33f7ca1da87f99fa538f65eaa3fe2c ec71cbecc59e8b142a7bfcabd469fa67486bef30 M	fs

Ok, so the question has to be asked - why wasn't dentry_unhash()
pushed down into XFS?

Further, now that dentry_unhash() has been removed from most
filesystems, what is replacing the shrink_dcache_parent() call that
was cleaning up the "we can never reference again" child dentries of
the unlinked directories? It appears that they are now being left in
memory on the dentry LRU. It also appears that they have the
DCACHE_REFERENCED bit set, so they do not get immediately reclaimed
by the shrinker.

Hence they are much more difficult to remove from memory than in
2.6.39. At the rate at which they are being created, the shrinker is
simply not aggressive enough to free them as fast as it did in
2.6.39, and hence the memory balance of the caches is significantly
changed.

It would seem to me that we still need the call to
shrink_dcache_parent() for unlinked directories - that part of
dentry_unhash() still needs to be run to ensure that we don't
pollute memory with stale dentries. The original patch series
suggests that this is a per-filesystem decision; I think this
problem shows that it is really necessary for most filesystems.
So, do I just fix this in XFS, or should I re-add calls to
shrink_dcache_parent() in the VFS for rmdir and rename?

> of about 20s, where the peak is about 80k unlinks/s, and the trough
> is around 20k unlinks/s. The runtime of the 50m inode delete has
> gone from around 10m on 2.6.39, to:
>
> 11.71user 470.08system 15:07.91elapsed 53%CPU (0avgtext+0avgdata 133184maxresident)k
> 0inputs+0outputs (30major+497228minor)pagefaults 0swaps
> 11.50user 468.30system 15:14.35elapsed 52%CPU (0avgtext+0avgdata 133168maxresident)k
> 0inputs+0outputs (42major+497268minor)pagefaults 0swaps
> 11.34user 466.66system 15:26.04elapsed 51%CPU (0avgtext+0avgdata 133216maxresident)k
> 0inputs+0outputs (18major+497121minor)pagefaults 0swaps
> 12.14user 470.46system 15:26.60elapsed 52%CPU (0avgtext+0avgdata 133216maxresident)k
> 0inputs+0outputs (44major+497309minor)pagefaults 0swaps
> 12.06user 463.74system 15:28.84elapsed 51%CPU (0avgtext+0avgdata 133232maxresident)k
> 0inputs+0outputs (25major+497046minor)pagefaults 0swaps
> 11.37user 468.18system 15:29.07elapsed 51%CPU (0avgtext+0avgdata 133184maxresident)k
> 0inputs+0outputs (55major+497056minor)pagefaults 0swaps
> 11.69user 474.46system 15:47.45elapsed 51%CPU (0avgtext+0avgdata 133232maxresident)k
> 0inputs+0outputs (61major+497284minor)pagefaults 0swaps
> 11.32user 476.93system 16:05.14elapsed 50%CPU (0avgtext+0avgdata 133184maxresident)k
> 0inputs+0outputs (30major+497225minor)pagefaults 0swaps
>
> About 16 minutes. I'm not sure yet whether this change of cache
> behaviour is the cause of the entire performance regression, but
> it's a good chance that it is a contributing factor.

The cache size growth bug does not appear to be responsible for any
of the performance regression.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
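
For anyone wanting to put the quoted timings in terms of unlinks/s, here is a rough sketch of the arithmetic. It is not part of the original report. It assumes the whole 50M-file delete finishes when the slowest of the eight rm processes exits, and it treats "around 10m" for 2.6.39 as exactly 600 seconds; the function names are made up for illustration.

```python
# Rough arithmetic on the /usr/bin/time figures quoted in the message above.
# Assumptions (not stated in the original): the run ends when the slowest
# rm exits, and the 2.6.39 baseline of "around 10m" is taken as 600 seconds.

# Elapsed wall-clock times for the eight parallel rm processes on 3.0-rc1.
ELAPSED_3_0_RC1 = [
    "15:07.91", "15:14.35", "15:26.04", "15:26.60",
    "15:28.84", "15:29.07", "15:47.45", "16:05.14",
]

TOTAL_FILES = 50_000_000      # "50M files" from the report
BASELINE_SECONDS = 10 * 60    # "around 10m on 2.6.39" (approximate)


def to_seconds(stamp: str) -> float:
    """Convert a /usr/bin/time 'M:SS.ss' elapsed value to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def aggregate_rate(total_files: int, wall_seconds: float) -> float:
    """Average unlinks per second over the whole run."""
    return total_files / wall_seconds


slowest = max(to_seconds(s) for s in ELAPSED_3_0_RC1)
rate_new = aggregate_rate(TOTAL_FILES, slowest)
rate_old = aggregate_rate(TOTAL_FILES, BASELINE_SECONDS)

print(f"3.0-rc1: {slowest:.2f}s wall, ~{rate_new / 1000:.1f}k unlinks/s")
print(f"2.6.39:  {BASELINE_SECONDS}s wall, ~{rate_old / 1000:.1f}k unlinks/s")
```

Under those assumptions the 2.6.39 figure comes out a little over 80k unlinks/s, which lines up with the "around 80k unlinks/s" quoted for the normal case, while the 3.0-rc1 average lands between the 80k peak and 20k trough described for the regressed run.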