From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757243Ab0E0CTV (ORCPT ); Wed, 26 May 2010 22:19:21 -0400
Received: from cantor.suse.de ([195.135.220.2]:35508 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753801Ab0E0CTT (ORCPT ); Wed, 26 May 2010 22:19:19 -0400
Date: Thu, 27 May 2010 12:19:05 +1000
From: Nick Piggin
To: Dave Chinner
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, xfs@oss.sgi.com
Subject: Re: [PATCH 3/5] superblock: introduce per-sb cache shrinker infrastructure
Message-ID: <20100527021905.GG22536@laptop>
References: <1274777588-21494-1-git-send-email-david@fromorbit.com> <1274777588-21494-4-git-send-email-david@fromorbit.com> <20100526164116.GD22536@laptop> <20100526231214.GB1395@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20100526231214.GB1395@dastard>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 27, 2010 at 09:12:14AM +1000, Dave Chinner wrote:
> On Thu, May 27, 2010 at 02:41:16AM +1000, Nick Piggin wrote:
> > On Tue, May 25, 2010 at 06:53:06PM +1000, Dave Chinner wrote:
> > > @@ -456,21 +456,16 @@ static void prune_one_dentry(struct dentry * dentry)
> > > +	/*
> > > +	 * if we can't get the umount lock, then there's no point having the
> > > +	 * shrinker try again because the sb is being torn down.
> > > +	 */
> > > +	if (!down_read_trylock(&sb->s_umount))
> > > +		return -1;
> >
> > Would you just elaborate on the lock order problem somewhere? (the
> > comment makes it look like we *could* take the mutex if we wanted
> > to).
>
> The shrinker is unregistered in deactivate_locked_super() which is
> just before ->kill_sb is called. The sb->s_umount lock is held at
> this point. Hence if the shrinker is operating, we will deadlock if
> we try to lock it like this:
>
> 	unmount:			shrinker:
> 					down_read(&shrinker_lock);
> 	down_write(&sb->s_umount)
> 	unregister_shrinker()
> 	down_write(&shrinker_lock)
> 					prune_super()
> 					  down_read(&sb->s_umount);
> 					  (deadlock)
>
> hence if we can't get the sb->s_umount lock in prune_super(), then
> the superblock must be being unmounted and the shrinker should abort
> as the ->kill_sb method will clean up everything after the shrinker
> is unregistered. Hence the down_read_trylock().

You added it to the comment in your updated patch, that was the main
thing I wanted. Thanks.

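A minimal userspace sketch of the trylock-and-bail pattern described in
the quoted explanation above. It is not kernel code: toy_rwsem, toy_sb
and toy_prune_super() are hypothetical single-threaded stand-ins,
assumed here only to show the control flow (fail the read trylock or
find no root, return -1, never block).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded stand-in for a kernel rw_semaphore (hypothetical). */
struct toy_rwsem {
	int  readers;	/* number of current read holders */
	bool writer;	/* true while a writer (e.g. unmount) holds the lock */
};

/* Follows the down_read_trylock() convention: 1 on success, 0 on failure. */
static int toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return 0;	/* would block, so report failure instead */
	sem->readers++;
	return 1;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	sem->readers--;
}

/* Hypothetical cut-down superblock: only the fields this sketch needs. */
struct toy_sb {
	struct toy_rwsem s_umount;
	void *s_root;		/* NULL once the root is gone */
	int nr_unused;		/* objects that could be pruned */
};

/*
 * Returns -1 when the superblock is being torn down (the caller should
 * not retry), otherwise the number of unused objects left after the scan.
 */
static int toy_prune_super(struct toy_sb *sb, int nr_to_scan)
{
	/* Never block on s_umount: unmount may hold it for write. */
	if (!toy_down_read_trylock(&sb->s_umount))
		return -1;

	if (!sb->s_root) {
		toy_up_read(&sb->s_umount);
		return -1;
	}

	if (nr_to_scan > sb->nr_unused)
		nr_to_scan = sb->nr_unused;
	sb->nr_unused -= nr_to_scan;

	toy_up_read(&sb->s_umount);
	return sb->nr_unused;
}
```

Setting s_umount.writer before calling toy_prune_super() models the
unmount side holding the lock; the call then returns -1 without
touching nr_unused.
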
> > > +	if (!sb->s_root) {
> > > +		up_read(&sb->s_umount);
> > > +		return -1;
> > > +	}
> > > +
> > > +	if (nr_to_scan) {
> > > +		/* proportion the scan between the two caches */
> > > +		int total;
> > > +
> > > +		total = sb->s_nr_dentry_unused + sb->s_nr_inodes_unused + 1;
> > > +		count = (nr_to_scan * sb->s_nr_dentry_unused) / total;
> > > +
> > > +		/* prune dcache first as icache is pinned by it */
> > > +		prune_dcache_sb(sb, count);
> > > +		prune_icache_sb(sb, nr_to_scan - count);
> > > +	}
> > > +
> > > +	count = ((sb->s_nr_dentry_unused + sb->s_nr_inodes_unused) / 100)
> > > +						* sysctl_vfs_cache_pressure;
> >
> > Do you think truncating in the divisions is at all a problem? It
> > probably doesn't matter much I suppose.
>
> Same code as currently exists. IIRC, the reasoning is that if we've
> got less than 100 objects to reclaim, then we're unlikely to be able
> to free up any memory from the caches, anyway.

Yeah, which is why I stop short of saying you should change it in
this patch.

But I think we should ensure things can get reclaimed eventually.
100 objects could be 100 slabs, which could be anything from
half a meg to half a dozen. Multiplied by each of the caches.
Could be significant in small systems.

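To make the truncation concern concrete, here is a small sketch of the
arithmetic in plain C. count_divide_first() mirrors the quoted
expression; count_multiply_first() is only an illustrative alternative
ordering assumed for comparison, not something proposed in this
thread. Both function names are hypothetical, and the pressure
argument stands in for sysctl_vfs_cache_pressure (default 100).

```c
/*
 * Same order of operations as the quoted patch: integer-divide by 100
 * first, then scale. Any total below 100 truncates to 0.
 */
static int count_divide_first(int dentry_unused, int inodes_unused,
			      int pressure)
{
	return ((dentry_unused + inodes_unused) / 100) * pressure;
}

/*
 * Illustrative alternative (an assumption, not from the thread):
 * scale first, then divide, so small totals still report a count.
 * The wider type avoids overflow in the intermediate product.
 */
static int count_multiply_first(int dentry_unused, int inodes_unused,
				int pressure)
{
	long long total = (long long)dentry_unused + inodes_unused;

	return (int)((total * pressure) / 100);
}
```

With 60 unused dentries, 39 unused inodes and a pressure of 100, the
divide-first form reports 0 objects while the multiply-first form
reports 99; with 150 and 49 they report 100 and 199 respectively.
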