From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail6.bemta12.messagelabs.com (mail6.bemta12.messagelabs.com [216.82.250.247])
 by kanga.kvack.org (Postfix) with ESMTP id B1FF49000C2
 for ; Wed, 6 Jul 2011 03:17:39 -0400 (EDT)
Date: Wed, 6 Jul 2011 17:17:33 +1000
From: Dave Chinner
Subject: Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering
Message-ID: <20110706071733.GY1026@dastard>
References: <20110629140109.003209430@bombadil.infradead.org>
 <20110629140336.950805096@bombadil.infradead.org>
 <20110701022248.GM561@dastard>
 <20110701041851.GN561@dastard>
 <20110701093305.GA28531@infradead.org>
 <20110701154136.GA17881@localhost>
 <20110704032534.GD1026@dastard>
 <20110706045301.GA11604@localhost>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20110706045301.GA11604@localhost>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Wu Fengguang
Cc: Christoph Hellwig, Mel Gorman, Johannes Weiner, "xfs@oss.sgi.com", "linux-mm@kvack.org"

On Tue, Jul 05, 2011 at 09:53:01PM -0700, Wu Fengguang wrote:
> On Mon, Jul 04, 2011 at 11:25:34AM +0800, Dave Chinner wrote:
> > On Fri, Jul 01, 2011 at 11:41:36PM +0800, Wu Fengguang wrote:
> > We have to remember that memory reclaim is doing LRU reclaim and the
> > flusher threads are doing "oldest first" writeback. IOWs, both are trying
> > to operate in the same direction (oldest to youngest) for the same
> > purpose.
> > The fundamental problem that occurs when memory reclaim
> > starts writing pages back from the LRU is this:
> >
> >     - memory reclaim has run ahead of IO writeback -
> >
> > The LRU usually looks like this:
> >
> >     oldest                                  youngest
> >     +---------------+---------------+--------------+
> >     clean           writeback       dirty
> >                     ^               ^
> >                     |               |
> >                     |               Where flusher will next work from
> >                     |               Where kswapd is working from
> >                     |
> >                     IO submitted by flusher, waiting on completion
> >
> >
> > If memory reclaim is hitting dirty pages on the LRU, it means it has
> > got ahead of writeback without being throttled - it's passed over
> > all the pages currently under writeback and is trying to write back
> > pages that are *newer* than what writeback is working on. IOWs, it
> > starts trying to do the job of the flusher threads, and it does that
> > very badly.
> >
> > The $100 question is *why is it getting ahead of writeback*?
>
> The most important case is: faster reader + relatively slow writer.

Same thing I said to Mel: that is not the workload that is causing
the problem I am seeing.

> Assume for every 10 pages read, 1 page is dirtied, and the dirty speed
> is fast enough to trigger the 20% dirty ratio and hence dirty balancing.
>
> That pattern is able to evenly distribute dirty pages all over the LRU
> list and hence trigger lots of pageout()s. The "skip reclaim writes on
> low pressure" approach can fix this case.

Sure it can, but even better would be to simply skip the dirty pages
and reclaim the interspersed clean pages which greatly
outnumber the dirty pages. That then lets writeback deal with
cleaning the dirty pages in the most optimal manner, and no
writeback from memory reclaim is needed.

IOWs, I don't think writeback from the LRU is the right solution to
the problem you've described, either.
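As a rough illustration of the skip-dirty idea above, here is a minimal
userspace sketch in C. It is a toy model, not kernel code: toy_page,
scan_result, toy_dirty_every() and toy_reclaim_skip_dirty() are hypothetical
names invented for this example, and the LRU is assumed to be a plain array
ordered oldest to youngest. The scan frees clean pages, issues no writeback
of its own, and leaves the dirty pages for the flusher.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one page on an LRU list; not a kernel structure. */
struct toy_page {
	bool dirty;
};

/* Counters produced by one toy reclaim pass. */
struct scan_result {
	size_t reclaimed;	/* clean pages freed */
	size_t skipped_dirty;	/* dirty pages left for the flusher */
	size_t scanned;		/* pages examined */
};

/* Mark every interval-th page dirty, e.g. interval = 10 for "1 in 10". */
static void toy_dirty_every(struct toy_page *lru, size_t nr, size_t interval)
{
	assert(interval > 0);
	for (size_t i = 0; i < nr; i++)
		lru[i].dirty = ((i + 1) % interval == 0);
}

/*
 * Walk the list from oldest (index 0) to youngest, freeing clean pages
 * and skipping dirty ones, until target pages have been freed or the
 * list is exhausted. No writeback is issued from this path.
 */
static struct scan_result toy_reclaim_skip_dirty(const struct toy_page *lru,
						 size_t nr, size_t target)
{
	struct scan_result res = { 0, 0, 0 };

	for (size_t i = 0; i < nr && res.reclaimed < target; i++) {
		res.scanned++;
		if (lru[i].dirty)
			res.skipped_dirty++;
		else
			res.reclaimed++;
	}
	return res;
}
```

In this model, with 100 pages, every 10th one dirty, and a target of 45
pages, the scan examines 49 pages, frees 45 of them and skips 4 dirty ones.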
>
> Thanks,
> Fengguang
> ---
> Subject: writeback: introduce bdi_start_inode_writeback()
> Date: Thu Jul 29 14:41:19 CST 2010
>
> This relays ASYNC file writeback IOs to the flusher threads.
>
> pageout() will continue to serve the SYNC file page writes for necessary
> throttling for preventing OOM, which may happen if the LRU list is small
> and/or the storage is slow, so that the flusher cannot clean enough
> pages before the LRU is full scanned.
>
> Only ASYNC pageout() is relayed to the flusher threads, the less
> frequent SYNC pageout()s will work as before as a last resort.
> This helps to avoid OOM when the LRU list is small and/or the storage is
> slow, and the flusher cannot clean enough pages before the LRU is
> full scanned.

Which ignores the fact that async pageout should not be happening in
most cases. Let's try and fix the root cause of the problem, not
paper over it again...

> The flusher will piggy back more dirty pages for IO
> - it's more IO efficient
> - it helps clean more pages, a good number of them may sit in the same
>   LRU list that is being scanned.
>
> To avoid memory allocations at page reclaim, a mempool is created.
>
> Background/periodic works will quit automatically (as done in another
> patch), so as to clean the pages under reclaim ASAP. However for now the
> sync work can still block us for long time.

>  /*
> + * When flushing an inode page (for page reclaim), try to piggy back up to
> + * 4MB nearby pages for IO efficiency. These pages will have good opportunity
> + * to be in the same LRU list.
> + */
> +#define WRITE_AROUND_PAGES	MIN_WRITEBACK_PAGES

Regardless of the trigger, I think you're going too far in the other
direction, here. If we have to do one IO to clean the page that the
VM wants, then it has to be done with as little latency as possible
but large enough to still maintain decent throughput.

With the above patch, for every single dirty page the VM wants
cleaned, we'll clean 4MB of pages around it.
Ok, but once the VM has
tripped over pages on 25 different inodes, we've now got 100MB of
writeback work to chew through before we can get to the 26th page
the VM wanted cleaned.

At which point, we may as well just ignore what the VM wants and
continue to clean pages via the existing mechanisms because the
latency for cleaning a specific page will be worse than if the VM just
skipped it in the first place....

FWIW, XFS limited such clustering to 64 pages at a time to try to
balance the bandwidth vs completion latency problem.


> +/*
> + * Called by page reclaim code to flush the dirty page ASAP. Do write-around to
> + * improve IO throughput. The nearby pages will have good chance to reside in
> + * the same LRU list that vmscan is working on, and even close to each other
> + * inside the LRU list in the common case of sequential read/write.
> + *
> + * ret > 0: success, found/reused a previous writeback work
> + * ret = 0: success, allocated/queued a new writeback work
> + * ret < 0: failed
> + */
> +long flush_inode_page(struct page *page, struct address_space *mapping)
> +{
> +	struct backing_dev_info *bdi = mapping->backing_dev_info;
> +	struct inode *inode = mapping->host;
> +	pgoff_t offset = page->index;
> +	pgoff_t len = 0;
> +	struct wb_writeback_work *work;
> +	long ret = -ENOENT;
> +
> +	if (unlikely(!inode))
> +		goto out;
> +
> +	len = 1;
> +	spin_lock_bh(&bdi->wb_lock);
> +	list_for_each_entry_reverse(work, &bdi->work_list, list) {
> +		if (work->inode != inode)
> +			continue;
> +		if (extend_writeback_range(work, offset)) {
> +			ret = len;
> +			offset = work->offset;
> +			len = work->nr_pages;
> +			break;
> +		}
> +		if (len++ > 30)	/* do limited search */
> +			break;
> +	}
> +	spin_unlock_bh(&bdi->wb_lock);

I don't think this is a necessary or scalable optimisation.
It won't be useful when there are lots of dirty inodes and dirty pages
are tripped over in their hundreds or thousands - it'll just burn CPU
doing nothing, and serialise against other reclaim and writeback
work. It looks like a case of premature optimisation to me....

Anyway, if there's a page flush near to an existing piece of work the
IO elevator should merge them appropriately.

> +static long wb_flush_inode(struct bdi_writeback *wb,
> +			   struct wb_writeback_work *work)
> +{
> +	loff_t start = work->offset;
> +	loff_t end   = work->offset + work->nr_pages - 1;
> +	int wrote;
> +
> +	wrote = __filemap_fdatawrite_range(work->inode->i_mapping,
> +					   start << PAGE_CACHE_SHIFT,
> +					   end   << PAGE_CACHE_SHIFT,
> +					   WB_SYNC_NONE);
> +	iput(work->inode);
> +	return wrote;
> +}

Out of curiosity, before going down the complex route did you try
just calling this directly and seeing if it solved the problem? i.e.

	igrab()
	get start/end
	unlock page
	__filemap_fdatawrite_range()
	iput()

I mean, much as I dislike the idea of writeback from the LRU, if all
we need to do is call through .writepages() to get decent IO from
reclaim (when it occurs), then why do we need to add this async
complexity to the generic writeback code to achieve the same end?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com