From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756998Ab0E0ByB (ORCPT ); Wed, 26 May 2010 21:54:01 -0400
Received: from bld-mail15.adl6.internode.on.net ([150.101.137.100]:37032
	"EHLO mail.internode.on.net" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with ESMTP id S1752853Ab0E0Bx7 (ORCPT );
	Wed, 26 May 2010 21:53:59 -0400
Date: Thu, 27 May 2010 11:53:35 +1000
From: Dave Chinner
To: Nick Piggin
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, xfs@oss.sgi.com
Subject: [PATCH 3/5 v2] superblock: introduce per-sb cache shrinker infrastructure
Message-ID: <20100527015335.GD1395@dastard>
References: <1274777588-21494-1-git-send-email-david@fromorbit.com>
	<1274777588-21494-4-git-send-email-david@fromorbit.com>
	<20100526164116.GD22536@laptop>
	<20100526231214.GB1395@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20100526231214.GB1395@dastard>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 27, 2010 at 09:12:14AM +1000, Dave Chinner wrote:
> On Thu, May 27, 2010 at 02:41:16AM +1000, Nick Piggin wrote:
....
> > Nitpick but I prefer just the restart label wher it is previously. This
> > is moving setup for the next iteration into the "error" case.
>
> Ok, will fix.
....
> > Would you just elaborate on the lock order problem somewhere? (the
> > comment makes it look like we *could* take the mutex if we wanted
> > to).
>
> The shrinker is unregistered in deactivate_locked_super() which is
> just before ->kill_sb is called.
> The sb->s_umount lock is held at
> this point. Hence if the shrinker is operating, we will deadlock if
> we try to lock it like this:
>
> 	unmount:			shrinker:
> 					down_read(&shrinker_lock);
> 	down_write(&sb->s_umount)
> 	unregister_shrinker()
> 	down_write(&shrinker_lock)
> 					prune_super()
> 					  down_read(&sb->s_umount);
> 					  (deadlock)
>
> hence if we can't get the sb->s_umount lock in prune_super(), then
> the superblock must be being unmounted and the shrinker should abort
> as the ->kill_sb method will clean up everything after the shrinker
> is unregistered. Hence the down_read_trylock().

Updated patch below with these issues fixed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

superblock: introduce per-sb cache shrinker infrastructure

From: Dave Chinner <dchinner@redhat.com>

With context based shrinkers, we can implement a per-superblock
shrinker that shrinks the caches attached to the superblock. We
currently have global shrinkers for the inode and dentry caches that
split up into per-superblock operations via a coarse proportioning
method that does not batch very well.  The global shrinkers also
have a dependency - dentries pin inodes - so we have to be very
careful about how we register the global shrinkers so that the
implicit call order is always correct.

With a per-sb shrinker callout, we can encode this dependency
directly into the per-sb shrinker, hence avoiding the need for
strictly ordering shrinker registrations. We also have no need for
any code to proportion reclaim across superblocks, as the shrinker
subsystem already provides this functionality across all shrinkers.
Allowing the shrinker to operate on a single superblock at a time
means that we do fewer superblock list traversals and less locking,
and reclaim should batch more effectively. This should result in
less CPU overhead for reclaim and potentially faster reclaim of
items from each filesystem.
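
As a rough user-space illustration of how the per-sb callout divides one
scan request between the two caches (dentries first, since they pin
inodes), the arithmetic used by prune_super() in this patch can be
sketched as below. The names sb_counts, scan_split and split_scan() are
made up for the example and are not kernel interfaces, and the 64-bit
intermediate is an addition of the sketch rather than part of the patch.

```c
/*
 * User-space sketch only: sb_counts, scan_split and split_scan() are
 * hypothetical names, not kernel interfaces.
 */
struct sb_counts {
	int nr_dentry_unused;	/* unused dentries on this superblock's LRU */
	int nr_inodes_unused;	/* unused inodes on this superblock's LRU */
};

struct scan_split {
	int dentries;	/* share of the scan handed to the dcache */
	int inodes;	/* remainder, handed to the icache */
};

static struct scan_split split_scan(const struct sb_counts *sb, int nr_to_scan)
{
	struct scan_split split;
	/* The +1 avoids a divide by zero when both LRUs are empty. */
	long long total = (long long)sb->nr_dentry_unused +
			  sb->nr_inodes_unused + 1;

	/* Assumption of the sketch: widen to 64 bits before multiplying. */
	split.dentries = (int)(((long long)nr_to_scan *
				sb->nr_dentry_unused) / total);
	split.inodes = nr_to_scan - split.dentries;
	return split;
}
```

With 300 unused dentries and 100 unused inodes, a request to scan 128
objects gives 95 to the dcache and 33 to the icache; the caller would
then prune the dcache share before the icache share.
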

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
Version 2:
- change loop restart in __shrink_dcache_sb() to match previous
  restart semantics
- add a better comment in prune_super() to explain the deadlock we
  are avoiding by using down_read_trylock(&sb->s_umount) before
  starting any shrinking.

 fs/dcache.c        |  127 +++++++---------------------------------------------
 fs/inode.c         |  109 +++-----------------------------------------
 fs/super.c         |   58 ++++++++++++++++++++++++
 include/linux/fs.h |    7 +++
 4 files changed, 89 insertions(+), 212 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index dba6b6d..a7cd335 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -456,21 +456,17 @@ static void prune_one_dentry(struct dentry * dentry)
  * which flags are set. This means we don't need to maintain multiple
  * similar copies of this loop.
  */
-static void __shrink_dcache_sb(struct super_block *sb, int *count, int flags)
+static void __shrink_dcache_sb(struct super_block *sb, int count, int flags)
 {
 	LIST_HEAD(referenced);
 	LIST_HEAD(tmp);
 	struct dentry *dentry;
-	int cnt = 0;
 
 	BUG_ON(!sb);
-	BUG_ON((flags & DCACHE_REFERENCED) && count == NULL);
+	BUG_ON((flags & DCACHE_REFERENCED) && count == -1);
 	spin_lock(&dcache_lock);
-	if (count != NULL)
-		/* called from prune_dcache() and shrink_dcache_parent() */
-		cnt = *count;
 restart:
-	if (count == NULL)
+	if (count == -1)
 		list_splice_init(&sb->s_dentry_lru, &tmp);
 	else {
 		while (!list_empty(&sb->s_dentry_lru)) {
@@ -492,8 +488,7 @@ restart:
 			} else {
 				list_move_tail(&dentry->d_lru, &tmp);
 				spin_unlock(&dentry->d_lock);
-				cnt--;
-				if (!cnt)
+				if (--count == 0)
 					break;
 			}
 			cond_resched_lock(&dcache_lock);
@@ -516,88 +511,27 @@ restart:
 		/* dentry->d_lock was dropped in prune_one_dentry() */
 		cond_resched_lock(&dcache_lock);
 	}
-	if (count == NULL && !list_empty(&sb->s_dentry_lru))
+	if (count == -1 && !list_empty(&sb->s_dentry_lru))
 		goto restart;
-	if (count != NULL)
-		*count = cnt;
 	if (!list_empty(&referenced))
 		list_splice(&referenced, &sb->s_dentry_lru);
 	spin_unlock(&dcache_lock);
 }
 
 /**
- * prune_dcache - shrink the dcache
- * @count: number of entries to try to free
+ * prune_dcache_sb - shrink the dcache
+ * @nr_to_scan: number of entries to try to free
  *
- * Shrink the dcache. This is done when we need more memory, or simply when we
- * need to unmount something (at which point we need to unuse all dentries).
+ * Attempt to shrink the superblock dcache LRU by @nr_to_scan entries. This is
+ * done when we need more memory and called from the superblock shrinker
+ * function.
  *
- * This function may fail to free any resources if all the dentries are in use.
+ * This function may fail to free any resources if all the dentries are in
+ * use.
  */
-static void prune_dcache(int count)
+void prune_dcache_sb(struct super_block *sb, int nr_to_scan)
 {
-	struct super_block *sb, *n;
-	int w_count;
-	int unused = dentry_stat.nr_unused;
-	int prune_ratio;
-	int pruned;
-
-	if (unused == 0 || count == 0)
-		return;
-	spin_lock(&dcache_lock);
-	if (count >= unused)
-		prune_ratio = 1;
-	else
-		prune_ratio = unused / count;
-	spin_lock(&sb_lock);
-	list_for_each_entry_safe(sb, n, &super_blocks, s_list) {
-		if (list_empty(&sb->s_instances))
-			continue;
-		if (sb->s_nr_dentry_unused == 0)
-			continue;
-		sb->s_count++;
-		/* Now, we reclaim unused dentrins with fairness.
-		 * We reclaim them same percentage from each superblock.
- * We calculate number of dentries to scan on this sb - * as follows, but the implementation is arranged to avoid - * overflows: - * number of dentries to scan on this sb = - * count * (number of dentries on this sb / - * number of dentries in the machine) - */ - spin_unlock(&sb_lock); - if (prune_ratio != 1) - w_count = (sb->s_nr_dentry_unused / prune_ratio) + 1; - else - w_count = sb->s_nr_dentry_unused; - pruned = w_count; - /* - * We need to be sure this filesystem isn't being unmounted, - * otherwise we could race with generic_shutdown_super(), and - * end up holding a reference to an inode while the filesystem - * is unmounted. So we try to get s_umount, and make sure - * s_root isn't NULL. - */ - if (down_read_trylock(&sb->s_umount)) { - if ((sb->s_root != NULL) && - (!list_empty(&sb->s_dentry_lru))) { - spin_unlock(&dcache_lock); - __shrink_dcache_sb(sb, &w_count, - DCACHE_REFERENCED); - pruned -= w_count; - spin_lock(&dcache_lock); - } - up_read(&sb->s_umount); - } - spin_lock(&sb_lock); - count -= pruned; - __put_super(sb); - /* more work left to do? */ - if (count <= 0) - break; - } - spin_unlock(&sb_lock); - spin_unlock(&dcache_lock); + __shrink_dcache_sb(sb, nr_to_scan, DCACHE_REFERENCED); } /** @@ -610,7 +544,7 @@ static void prune_dcache(int count) */ void shrink_dcache_sb(struct super_block * sb) { - __shrink_dcache_sb(sb, NULL, 0); + __shrink_dcache_sb(sb, -1, 0); } EXPORT_SYMBOL(shrink_dcache_sb); @@ -878,37 +812,10 @@ void shrink_dcache_parent(struct dentry * parent) int found; while ((found = select_parent(parent)) != 0) - __shrink_dcache_sb(sb, &found, 0); + __shrink_dcache_sb(sb, found, 0); } EXPORT_SYMBOL(shrink_dcache_parent); -/* - * Scan `nr' dentries and return the number which remain. - * - * We need to avoid reentering the filesystem if the caller is performing a - * GFP_NOFS allocation attempt. 
One example deadlock is: - * - * ext2_new_block->getblk->GFP->shrink_dcache_memory->prune_dcache-> - * prune_one_dentry->dput->dentry_iput->iput->inode->i_sb->s_op->put_inode-> - * ext2_discard_prealloc->ext2_free_blocks->lock_super->DEADLOCK. - * - * In this case we return -1 to tell the caller that we baled. - */ -static int shrink_dcache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask) -{ - if (nr) { - if (!(gfp_mask & __GFP_FS)) - return -1; - prune_dcache(nr); - } - return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure; -} - -static struct shrinker dcache_shrinker = { - .shrink = shrink_dcache_memory, - .seeks = DEFAULT_SEEKS, -}; - /** * d_alloc - allocate a dcache entry * @parent: parent of entry to allocate @@ -2316,8 +2223,6 @@ static void __init dcache_init(void) */ dentry_cache = KMEM_CACHE(dentry, SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD); - - register_shrinker(&dcache_shrinker); /* Hash may have been set up in dcache_init_early */ if (!hashdist) diff --git a/fs/inode.c b/fs/inode.c index 1e44ec5..5fb4a39 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -25,7 +25,6 @@ #include #include #include -#include "internal.h" /* * This is needed for the following functions: @@ -441,8 +440,10 @@ static int can_unuse(struct inode *inode) } /* - * Scan `goal' inodes on the unused list for freeable ones. They are moved to - * a temporary list and then are freed outside inode_lock by dispose_list(). + * Walk the superblock inode LRU for freeable inodes and attempt to free them. + * This is called from the superblock shrinker function with a number of inodes + * to trim from the LRU. Inodes to be freed are moved to a temporary list and + * then are freed outside inode_lock by dispose_list(). * * Any inodes which are pinned purely because of attached pagecache have their * pagecache removed. We expect the final iput() on that inode to add it to @@ -450,10 +451,10 @@ static int can_unuse(struct inode *inode) * inode is still freeable, proceed. 
The right inode is found 99.9% of the * time in testing on a 4-way. * - * If the inode has metadata buffers attached to mapping->private_list then - * try to remove them. + * If the inode has metadata buffers attached to mapping->private_list then try + * to remove them. */ -static void shrink_icache_sb(struct super_block *sb, int *nr_to_scan) +void prune_icache_sb(struct super_block *sb, int nr_to_scan) { LIST_HEAD(freeable); int nr_pruned = 0; @@ -461,7 +462,7 @@ static void shrink_icache_sb(struct super_block *sb, int *nr_to_scan) unsigned long reap = 0; spin_lock(&inode_lock); - for (nr_scanned = *nr_to_scan; nr_scanned >= 0; nr_scanned--) { + for (nr_scanned = nr_to_scan; nr_scanned >= 0; nr_scanned--) { struct inode *inode; if (list_empty(&sb->s_inode_lru)) @@ -500,103 +501,10 @@ static void shrink_icache_sb(struct super_block *sb, int *nr_to_scan) else __count_vm_events(PGINODESTEAL, reap); spin_unlock(&inode_lock); - *nr_to_scan = nr_scanned; dispose_list(&freeable); } -static void prune_icache(int count) -{ - struct super_block *sb, *n; - int w_count; - int unused = inodes_stat.nr_unused; - int prune_ratio; - int pruned; - - if (unused == 0 || count == 0) - return; - down_read(&iprune_sem); - if (count >= unused) - prune_ratio = 1; - else - prune_ratio = unused / count; - spin_lock(&sb_lock); - list_for_each_entry_safe(sb, n, &super_blocks, s_list) { - if (list_empty(&sb->s_instances)) - continue; - if (sb->s_nr_inodes_unused == 0) - continue; - sb->s_count++; - /* Now, we reclaim unused dentrins with fairness. - * We reclaim them same percentage from each superblock. 
- * We calculate number of dentries to scan on this sb - * as follows, but the implementation is arranged to avoid - * overflows: - * number of dentries to scan on this sb = - * count * (number of dentries on this sb / - * number of dentries in the machine) - */ - spin_unlock(&sb_lock); - if (prune_ratio != 1) - w_count = (sb->s_nr_inodes_unused / prune_ratio) + 1; - else - w_count = sb->s_nr_inodes_unused; - pruned = w_count; - /* - * We need to be sure this filesystem isn't being unmounted, - * otherwise we could race with generic_shutdown_super(), and - * end up holding a reference to an inode while the filesystem - * is unmounted. So we try to get s_umount, and make sure - * s_root isn't NULL. - */ - if (down_read_trylock(&sb->s_umount)) { - if ((sb->s_root != NULL) && - (!list_empty(&sb->s_inode_lru))) { - shrink_icache_sb(sb, &w_count); - pruned -= w_count; - } - up_read(&sb->s_umount); - } - spin_lock(&sb_lock); - count -= pruned; - __put_super(sb); - /* more work left to do? */ - if (count <= 0) - break; - } - spin_unlock(&sb_lock); - up_read(&iprune_sem); -} - -/* - * shrink_icache_memory() will attempt to reclaim some unused inodes. Here, - * "unused" means that no dentries are referring to the inodes: the files are - * not open and the dcache references to those inodes have already been - * reclaimed. - * - * This function is passed the number of inodes to scan, and it returns the - * total number of remaining possibly-reclaimable inodes. - */ -static int shrink_icache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask) -{ - if (nr) { - /* - * Nasty deadlock avoidance. We may hold various FS locks, - * and we don't want to recurse into the FS that called us - * in clear_inode() and friends.. 
- */ - if (!(gfp_mask & __GFP_FS)) - return -1; - prune_icache(nr); - } - return (inodes_stat.nr_unused / 100) * sysctl_vfs_cache_pressure; -} - -static struct shrinker icache_shrinker = { - .shrink = shrink_icache_memory, - .seeks = DEFAULT_SEEKS, -}; - static void __wait_on_freeing_inode(struct inode *inode); /* * Called with the inode lock held. @@ -1634,7 +1542,6 @@ void __init inode_init(void) (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC| SLAB_MEM_SPREAD), init_once); - register_shrinker(&icache_shrinker); /* Hash may have been set up in inode_init_early */ if (!hashdist) diff --git a/fs/super.c b/fs/super.c index c554c53..613339b 100644 --- a/fs/super.c +++ b/fs/super.c @@ -37,6 +37,55 @@ LIST_HEAD(super_blocks); DEFINE_SPINLOCK(sb_lock); +static int prune_super(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask) +{ + struct super_block *sb; + int count; + + sb = container_of(shrink, struct super_block, s_shrink); + + /* + * Deadlock avoidance. We may hold various FS locks, and we don't want + * to recurse into the FS that called us in clear_inode() and friends.. + */ + if (!(gfp_mask & __GFP_FS)) + return -1; + + /* + * If we can't get the umount lock, then it's because the sb is being + * unmounted. If we get here, then the unmount is likely stuck trying + * to unregister the shrinker, so we must not block trying to get the + * sb->s_umount otherwise we deadlock. Hence if we fail to get the + * sb_umount lock, abort shrinking the sb by telling the shrinker not + * to call us again and the unmount process will clean up the cache for + * us after it has unregistered the shrinker. 
+ */ + if (!down_read_trylock(&sb->s_umount)) + return -1; + + if (!sb->s_root) { + up_read(&sb->s_umount); + return -1; + } + + if (nr_to_scan) { + /* proportion the scan between the two caches */ + int total; + + total = sb->s_nr_dentry_unused + sb->s_nr_inodes_unused + 1; + count = (nr_to_scan * sb->s_nr_dentry_unused) / total; + + /* prune dcache first as icache is pinned by it */ + prune_dcache_sb(sb, count); + prune_icache_sb(sb, nr_to_scan - count); + } + + count = ((sb->s_nr_dentry_unused + sb->s_nr_inodes_unused) / 100) + * sysctl_vfs_cache_pressure; + up_read(&sb->s_umount); + return count; +} + /** * alloc_super - create new superblock * @type: filesystem type superblock should belong to @@ -99,6 +148,13 @@ static struct super_block *alloc_super(struct file_system_type *type) s->s_qcop = sb_quotactl_ops; s->s_op = &default_op; s->s_time_gran = 1000000000; + + /* + * The shrinker is set up here but not registered until after + * the superblock has been filled out successfully.
+ */ + s->s_shrink.shrink = prune_super; + s->s_shrink.seeks = DEFAULT_SEEKS; } out: return s; @@ -162,6 +218,7 @@ void deactivate_locked_super(struct super_block *s) struct file_system_type *fs = s->s_type; if (atomic_dec_and_test(&s->s_active)) { vfs_dq_off(s, 0); + unregister_shrinker(&s->s_shrink); fs->kill_sb(s); put_filesystem(fs); put_super(s); @@ -335,6 +392,7 @@ retry: list_add_tail(&s->s_list, &super_blocks); list_add(&s->s_instances, &type->fs_supers); spin_unlock(&sb_lock); + register_shrinker(&s->s_shrink); get_filesystem(type); return s; } diff --git a/include/linux/fs.h b/include/linux/fs.h index 7b90c43..5bff2dc 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -382,6 +382,7 @@ struct inodes_stat_t { #include #include #include +#include #include #include @@ -1385,8 +1386,14 @@ struct super_block { * generic_show_options() */ char *s_options; + + struct shrinker s_shrink; /* per-sb shrinker handle */ }; +/* superblock cache pruning functions */ +void prune_icache_sb(struct super_block *sb, int nr_to_scan); +void prune_dcache_sb(struct super_block *sb, int nr_to_scan); + extern struct timespec current_fs_time(struct super_block *sb); /* From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with SMTP id 984106B01B5 for ; Wed, 26 May 2010 21:58:12 -0400 (EDT) Date: Thu, 27 May 2010 11:53:35 +1000 From: Dave Chinner Subject: [PATCH 3/5 v2] superblock: introduce per-sb cache shrinker infrastructure Message-ID: <20100527015335.GD1395@dastard> References: <1274777588-21494-1-git-send-email-david@fromorbit.com> <1274777588-21494-4-git-send-email-david@fromorbit.com> <20100526164116.GD22536@laptop> <20100526231214.GB1395@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20100526231214.GB1395@dastard> Sender: owner-linux-mm@kvack.org To: Nick Piggin Cc: 
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, xfs@oss.sgi.com
List-ID:

On Thu, May 27, 2010 at 09:12:14AM +1000, Dave Chinner wrote:
> On Thu, May 27, 2010 at 02:41:16AM +1000, Nick Piggin wrote:
....
> > Nitpick but I prefer just the restart label where it was previously. This
> > is moving setup for the next iteration into the "error" case.
>
> Ok, will fix.
....
> > Would you just elaborate on the lock order problem somewhere? (the
> > comment makes it look like we *could* take the mutex if we wanted
> > to).
>
> The shrinker is unregistered in deactivate_locked_super() which is
> just before ->kill_sb is called. The sb->s_umount lock is held at
> this point. Hence if the shrinker is operating, we will deadlock if
> we try to lock it like this:
>
>       unmount:                        shrinker:
>                                       down_read(&shrinker_lock);
>       down_write(&sb->s_umount)
>       unregister_shrinker()
>       down_write(&shrinker_lock)
>                                       prune_super()
>                                         down_read(&sb->s_umount);
>                                         (deadlock)
>
> hence if we can't get the sb->s_umount lock in prune_super(), then
> the superblock must be being unmounted and the shrinker should abort
> as the ->kill_sb method will clean up everything after the shrinker
> is unregistered. Hence the down_read_trylock().

Updated patch below with these issues fixed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

superblock: introduce per-sb cache shrinker infrastructure

From: Dave Chinner

With context based shrinkers, we can implement a per-superblock
shrinker that shrinks the caches attached to the superblock. We
currently have global shrinkers for the inode and dentry caches that
split up into per-superblock operations via a coarse proportioning
method that does not batch very well.  The global shrinkers also
have a dependency - dentries pin inodes - so we have to be very
careful about how we register the global shrinkers so that
the implicit call order is always correct.

With a per-sb shrinker callout, we can encode this dependency
directly into the per-sb shrinker, hence avoiding the need for
strictly ordering shrinker registrations. We also have no need for
any proportioning code, as the shrinker subsystem already provides
this functionality across all shrinkers. Allowing the shrinker to
operate on a single superblock at a time means that we do fewer
superblock list traversals and less locking, and reclaim should batch
more effectively. This should result in less CPU overhead for reclaim
and potentially faster reclaim of items from each filesystem.

Signed-off-by: Dave Chinner
---
Version 2:
- change loop restart in __shrink_dcache_sb() to match previous restart
  semantics
- add a better comment in prune_super() to explain the deadlock we are
  avoiding by using down_read_trylock(&sb->s_umount) before starting any
  shrinking.

 fs/dcache.c        |  127 +++++++---------------------------------------------
 fs/inode.c         |  109 +++-----------------------------------------
 fs/super.c         |   58 ++++++++++++++++++++++++
 include/linux/fs.h |    7 +++
 4 files changed, 89 insertions(+), 212 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c index dba6b6d..a7cd335 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -456,21 +456,17 @@ static void prune_one_dentry(struct dentry * dentry) * which flags are set. This means we don't need to maintain multiple * similar copies of this loop.
*/ -static void __shrink_dcache_sb(struct super_block *sb, int *count, int flags) +static void __shrink_dcache_sb(struct super_block *sb, int count, int flags) { LIST_HEAD(referenced); LIST_HEAD(tmp); struct dentry *dentry; - int cnt = 0; BUG_ON(!sb); - BUG_ON((flags & DCACHE_REFERENCED) && count == NULL); + BUG_ON((flags & DCACHE_REFERENCED) && count == -1); spin_lock(&dcache_lock); - if (count != NULL) - /* called from prune_dcache() and shrink_dcache_parent() */ - cnt = *count; restart: - if (count == NULL) + if (count == -1) list_splice_init(&sb->s_dentry_lru, &tmp); else { while (!list_empty(&sb->s_dentry_lru)) { @@ -492,8 +488,7 @@ restart: } else { list_move_tail(&dentry->d_lru, &tmp); spin_unlock(&dentry->d_lock); - cnt--; - if (!cnt) + if (--count == 0) break; } cond_resched_lock(&dcache_lock); @@ -516,88 +511,27 @@ restart: /* dentry->d_lock was dropped in prune_one_dentry() */ cond_resched_lock(&dcache_lock); } - if (count == NULL && !list_empty(&sb->s_dentry_lru)) + if (count == -1 && !list_empty(&sb->s_dentry_lru)) goto restart; - if (count != NULL) - *count = cnt; if (!list_empty(&referenced)) list_splice(&referenced, &sb->s_dentry_lru); spin_unlock(&dcache_lock); } /** - * prune_dcache - shrink the dcache - * @count: number of entries to try to free + * prune_dcache_sb - shrink the dcache + * @nr_to_scan: number of entries to try to free * - * Shrink the dcache. This is done when we need more memory, or simply when we - * need to unmount something (at which point we need to unuse all dentries). + * Attempt to shrink the superblock dcache LRU by @nr_to_scan entries. This is + * done when we need more memory and is called from the superblock shrinker + * function. * - * This function may fail to free any resources if all the dentries are in use. + * This function may fail to free any resources if all the dentries are in + * use.
*/ -static void prune_dcache(int count) +void prune_dcache_sb(struct super_block *sb, int nr_to_scan) { - struct super_block *sb, *n; - int w_count; - int unused = dentry_stat.nr_unused; - int prune_ratio; - int pruned; - - if (unused == 0 || count == 0) - return; - spin_lock(&dcache_lock); - if (count >= unused) - prune_ratio = 1; - else - prune_ratio = unused / count; - spin_lock(&sb_lock); - list_for_each_entry_safe(sb, n, &super_blocks, s_list) { - if (list_empty(&sb->s_instances)) - continue; - if (sb->s_nr_dentry_unused == 0) - continue; - sb->s_count++; - /* Now, we reclaim unused dentrins with fairness. - * We reclaim them same percentage from each superblock. - * We calculate number of dentries to scan on this sb - * as follows, but the implementation is arranged to avoid - * overflows: - * number of dentries to scan on this sb = - * count * (number of dentries on this sb / - * number of dentries in the machine) - */ - spin_unlock(&sb_lock); - if (prune_ratio != 1) - w_count = (sb->s_nr_dentry_unused / prune_ratio) + 1; - else - w_count = sb->s_nr_dentry_unused; - pruned = w_count; - /* - * We need to be sure this filesystem isn't being unmounted, - * otherwise we could race with generic_shutdown_super(), and - * end up holding a reference to an inode while the filesystem - * is unmounted. So we try to get s_umount, and make sure - * s_root isn't NULL. - */ - if (down_read_trylock(&sb->s_umount)) { - if ((sb->s_root != NULL) && - (!list_empty(&sb->s_dentry_lru))) { - spin_unlock(&dcache_lock); - __shrink_dcache_sb(sb, &w_count, - DCACHE_REFERENCED); - pruned -= w_count; - spin_lock(&dcache_lock); - } - up_read(&sb->s_umount); - } - spin_lock(&sb_lock); - count -= pruned; - __put_super(sb); - /* more work left to do? 
*/ - if (count <= 0) - break; - } - spin_unlock(&sb_lock); - spin_unlock(&dcache_lock); + __shrink_dcache_sb(sb, nr_to_scan, DCACHE_REFERENCED); } /** @@ -610,7 +544,7 @@ static void prune_dcache(int count) */ void shrink_dcache_sb(struct super_block * sb) { - __shrink_dcache_sb(sb, NULL, 0); + __shrink_dcache_sb(sb, -1, 0); } EXPORT_SYMBOL(shrink_dcache_sb); @@ -878,37 +812,10 @@ void shrink_dcache_parent(struct dentry * parent) int found; while ((found = select_parent(parent)) != 0) - __shrink_dcache_sb(sb, &found, 0); + __shrink_dcache_sb(sb, found, 0); } EXPORT_SYMBOL(shrink_dcache_parent); -/* - * Scan `nr' dentries and return the number which remain. - * - * We need to avoid reentering the filesystem if the caller is performing a - * GFP_NOFS allocation attempt. One example deadlock is: - * - * ext2_new_block->getblk->GFP->shrink_dcache_memory->prune_dcache-> - * prune_one_dentry->dput->dentry_iput->iput->inode->i_sb->s_op->put_inode-> - * ext2_discard_prealloc->ext2_free_blocks->lock_super->DEADLOCK. - * - * In this case we return -1 to tell the caller that we baled. 
- */ -static int shrink_dcache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask) -{ - if (nr) { - if (!(gfp_mask & __GFP_FS)) - return -1; - prune_dcache(nr); - } - return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure; -} - -static struct shrinker dcache_shrinker = { - .shrink = shrink_dcache_memory, - .seeks = DEFAULT_SEEKS, -}; - /** * d_alloc - allocate a dcache entry * @parent: parent of entry to allocate @@ -2316,8 +2223,6 @@ static void __init dcache_init(void) */ dentry_cache = KMEM_CACHE(dentry, SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD); - - register_shrinker(&dcache_shrinker); /* Hash may have been set up in dcache_init_early */ if (!hashdist) diff --git a/fs/inode.c b/fs/inode.c index 1e44ec5..5fb4a39 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -25,7 +25,6 @@ #include #include #include -#include "internal.h" /* * This is needed for the following functions: @@ -441,8 +440,10 @@ static int can_unuse(struct inode *inode) } /* - * Scan `goal' inodes on the unused list for freeable ones. They are moved to - * a temporary list and then are freed outside inode_lock by dispose_list(). + * Walk the superblock inode LRU for freeable inodes and attempt to free them. + * This is called from the superblock shrinker function with a number of inodes + * to trim from the LRU. Inodes to be freed are moved to a temporary list and + * then are freed outside inode_lock by dispose_list(). * * Any inodes which are pinned purely because of attached pagecache have their * pagecache removed. We expect the final iput() on that inode to add it to @@ -450,10 +451,10 @@ static int can_unuse(struct inode *inode) * inode is still freeable, proceed. The right inode is found 99.9% of the * time in testing on a 4-way. * - * If the inode has metadata buffers attached to mapping->private_list then - * try to remove them. + * If the inode has metadata buffers attached to mapping->private_list then try + * to remove them. 
*/ -static void shrink_icache_sb(struct super_block *sb, int *nr_to_scan) +void prune_icache_sb(struct super_block *sb, int nr_to_scan) { LIST_HEAD(freeable); int nr_pruned = 0; @@ -461,7 +462,7 @@ static void shrink_icache_sb(struct super_block *sb, int *nr_to_scan) unsigned long reap = 0; spin_lock(&inode_lock); - for (nr_scanned = *nr_to_scan; nr_scanned >= 0; nr_scanned--) { + for (nr_scanned = nr_to_scan; nr_scanned >= 0; nr_scanned--) { struct inode *inode; if (list_empty(&sb->s_inode_lru)) @@ -500,103 +501,10 @@ static void shrink_icache_sb(struct super_block *sb, int *nr_to_scan) else __count_vm_events(PGINODESTEAL, reap); spin_unlock(&inode_lock); - *nr_to_scan = nr_scanned; dispose_list(&freeable); } -static void prune_icache(int count) -{ - struct super_block *sb, *n; - int w_count; - int unused = inodes_stat.nr_unused; - int prune_ratio; - int pruned; - - if (unused == 0 || count == 0) - return; - down_read(&iprune_sem); - if (count >= unused) - prune_ratio = 1; - else - prune_ratio = unused / count; - spin_lock(&sb_lock); - list_for_each_entry_safe(sb, n, &super_blocks, s_list) { - if (list_empty(&sb->s_instances)) - continue; - if (sb->s_nr_inodes_unused == 0) - continue; - sb->s_count++; - /* Now, we reclaim unused dentrins with fairness. - * We reclaim them same percentage from each superblock. - * We calculate number of dentries to scan on this sb - * as follows, but the implementation is arranged to avoid - * overflows: - * number of dentries to scan on this sb = - * count * (number of dentries on this sb / - * number of dentries in the machine) - */ - spin_unlock(&sb_lock); - if (prune_ratio != 1) - w_count = (sb->s_nr_inodes_unused / prune_ratio) + 1; - else - w_count = sb->s_nr_inodes_unused; - pruned = w_count; - /* - * We need to be sure this filesystem isn't being unmounted, - * otherwise we could race with generic_shutdown_super(), and - * end up holding a reference to an inode while the filesystem - * is unmounted. 
So we try to get s_umount, and make sure - * s_root isn't NULL. - */ - if (down_read_trylock(&sb->s_umount)) { - if ((sb->s_root != NULL) && - (!list_empty(&sb->s_inode_lru))) { - shrink_icache_sb(sb, &w_count); - pruned -= w_count; - } - up_read(&sb->s_umount); - } - spin_lock(&sb_lock); - count -= pruned; - __put_super(sb); - /* more work left to do? */ - if (count <= 0) - break; - } - spin_unlock(&sb_lock); - up_read(&iprune_sem); -} - -/* - * shrink_icache_memory() will attempt to reclaim some unused inodes. Here, - * "unused" means that no dentries are referring to the inodes: the files are - * not open and the dcache references to those inodes have already been - * reclaimed. - * - * This function is passed the number of inodes to scan, and it returns the - * total number of remaining possibly-reclaimable inodes. - */ -static int shrink_icache_memory(struct shrinker *shrink, int nr, gfp_t gfp_mask) -{ - if (nr) { - /* - * Nasty deadlock avoidance. We may hold various FS locks, - * and we don't want to recurse into the FS that called us - * in clear_inode() and friends.. - */ - if (!(gfp_mask & __GFP_FS)) - return -1; - prune_icache(nr); - } - return (inodes_stat.nr_unused / 100) * sysctl_vfs_cache_pressure; -} - -static struct shrinker icache_shrinker = { - .shrink = shrink_icache_memory, - .seeks = DEFAULT_SEEKS, -}; - static void __wait_on_freeing_inode(struct inode *inode); /* * Called with the inode lock held. 
@@ -1634,7 +1542,6 @@ void __init inode_init(void) (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC| SLAB_MEM_SPREAD), init_once); - register_shrinker(&icache_shrinker); /* Hash may have been set up in inode_init_early */ if (!hashdist) diff --git a/fs/super.c b/fs/super.c index c554c53..613339b 100644 --- a/fs/super.c +++ b/fs/super.c @@ -37,6 +37,55 @@ LIST_HEAD(super_blocks); DEFINE_SPINLOCK(sb_lock); +static int prune_super(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask) +{ + struct super_block *sb; + int count; + + sb = container_of(shrink, struct super_block, s_shrink); + + /* + * Deadlock avoidance. We may hold various FS locks, and we don't want + * to recurse into the FS that called us in clear_inode() and friends.. + */ + if (!(gfp_mask & __GFP_FS)) + return -1; + + /* + * If we can't get the umount lock, then it's because the sb is being + * unmounted. If we get here, then the unmount is likely stuck trying + * to unregister the shrinker, so we must not block trying to get the + * sb->s_umount otherwise we deadlock. Hence if we fail to get the + * sb_umount lock, abort shrinking the sb by telling the shrinker not + * to call us again and the unmount process will clean up the cache for + * us after it has unregistered the shrinker. + */ + if (!down_read_trylock(&sb->s_umount)) + return -1; + + if (!sb->s_root) { + up_read(&sb->s_umount); + return -1; + } + + if (nr_to_scan) { + /* proportion the scan between the two caches
*/ + int total; + + total = sb->s_nr_dentry_unused + sb->s_nr_inodes_unused + 1; + count = (nr_to_scan * sb->s_nr_dentry_unused) / total; + + /* prune dcache first as icache is pinned by it */ + prune_dcache_sb(sb, count); + prune_icache_sb(sb, nr_to_scan - count); + } + + count = ((sb->s_nr_dentry_unused + sb->s_nr_inodes_unused) / 100) + * sysctl_vfs_cache_pressure; + up_read(&sb->s_umount); + return count; +} + /** * alloc_super - create new superblock * @type: filesystem type superblock should belong to @@ -99,6 +148,13 @@ static struct super_block *alloc_super(struct file_system_type *type) s->s_qcop = sb_quotactl_ops; s->s_op = &default_op; s->s_time_gran = 1000000000; + + /* + * The shrinker is set up here but not registered until after + * the superblock has been filled out successfully. + */ + s->s_shrink.shrink = prune_super; + s->s_shrink.seeks = DEFAULT_SEEKS; } out: return s; @@ -162,6 +218,7 @@ void deactivate_locked_super(struct super_block *s) struct file_system_type *fs = s->s_type; if (atomic_dec_and_test(&s->s_active)) { vfs_dq_off(s, 0); + unregister_shrinker(&s->s_shrink); fs->kill_sb(s); put_filesystem(fs); put_super(s); @@ -335,6 +392,7 @@ retry: list_add_tail(&s->s_list, &super_blocks); list_add(&s->s_instances, &type->fs_supers); spin_unlock(&sb_lock); + register_shrinker(&s->s_shrink); get_filesystem(type); return s; } diff --git a/include/linux/fs.h b/include/linux/fs.h index 7b90c43..5bff2dc 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -382,6 +382,7 @@ struct inodes_stat_t { #include #include #include +#include #include #include @@ -1385,8 +1386,14 @@ struct super_block { * generic_show_options() */ char *s_options; + + struct shrinker s_shrink; /* per-sb shrinker handle */ }; +/* superblock cache pruning functions */ +void prune_icache_sb(struct super_block *sb, int nr_to_scan); +void prune_dcache_sb(struct super_block *sb, int nr_to_scan); + extern struct timespec current_fs_time(struct super_block *sb); /*
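For anyone who wants to check the arithmetic in prune_super() without booting a kernel, here is a rough user-space sketch of its control flow. It is not kernel code: struct sb_model, struct scan_split, split_scan(), min_int() and model_prune_super() are hypothetical names made up for illustration, the two bool fields stand in for the sb->s_root check and a failed down_read_trylock(&sb->s_umount), and the model assumes every scanned object is freed, which the real prune_dcache_sb()/prune_icache_sb() do not guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the superblock fields the patch reads. */
struct sb_model {
	int nr_dentry_unused;	/* models sb->s_nr_dentry_unused */
	int nr_inodes_unused;	/* models sb->s_nr_inodes_unused */
	bool has_root;		/* models sb->s_root != NULL */
	bool umount_busy;	/* models down_read_trylock(&sb->s_umount) failing */
};

struct scan_split {
	int dentries;	/* share handed to the dcache pruner */
	int inodes;	/* remainder handed to the icache pruner */
};

/* Same arithmetic as the "proportion the scan" block in prune_super(). */
static struct scan_split split_scan(const struct sb_model *sb, int nr_to_scan)
{
	struct scan_split split;
	/* with the +1 the divisor cannot be zero, even if both LRUs are empty */
	int total = sb->nr_dentry_unused + sb->nr_inodes_unused + 1;

	assert(nr_to_scan >= 0);
	split.dentries = (nr_to_scan * sb->nr_dentry_unused) / total;
	split.inodes = nr_to_scan - split.dentries;
	return split;
}

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Models the return protocol of prune_super(): -1 means "abort, do not
 * keep calling", otherwise the pressure-scaled count of unused objects.
 */
static int model_prune_super(struct sb_model *sb, int nr_to_scan,
			     int cache_pressure)
{
	/* unmount in progress or superblock not fully set up: bail out */
	if (sb->umount_busy || !sb->has_root)
		return -1;

	if (nr_to_scan) {
		struct scan_split split = split_scan(sb, nr_to_scan);

		/* Assumption: every scanned object is freed (dcache first). */
		sb->nr_dentry_unused -= min_int(split.dentries, sb->nr_dentry_unused);
		sb->nr_inodes_unused -= min_int(split.inodes, sb->nr_inodes_unused);
	}

	return ((sb->nr_dentry_unused + sb->nr_inodes_unused) / 100) * cache_pressure;
}
```

With 300 unused dentries and 100 unused inodes, a request to scan 128 objects splits 95/33 in this model, and a query call with nr_to_scan == 0 only reports the scaled count without touching either cache.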