From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from cuda.sgi.com (cuda2.sgi.com [192.48.176.25]) by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id p622gQL5132836 for ; Fri, 1 Jul 2011 21:42:27 -0500 Received: from ipmail07.adl2.internode.on.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 56922472E2 for ; Fri, 1 Jul 2011 19:42:23 -0700 (PDT) Received: from ipmail07.adl2.internode.on.net (ipmail07.adl2.internode.on.net [150.101.137.131]) by cuda.sgi.com with ESMTP id DcNmT5ylsPp981Wo for ; Fri, 01 Jul 2011 19:42:23 -0700 (PDT) Date: Sat, 2 Jul 2011 12:42:19 +1000 From: Dave Chinner Subject: Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering Message-ID: <20110702024219.GT561@dastard> References: <20110629140109.003209430@bombadil.infradead.org> <20110629140336.950805096@bombadil.infradead.org> <20110701022248.GM561@dastard> <20110701041851.GN561@dastard> <20110701093305.GA28531@infradead.org> <20110701145935.GB29530@suse.de> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20110701145935.GB29530@suse.de> List-Id: XFS Filesystem from SGI List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Sender: xfs-bounces@oss.sgi.com Errors-To: xfs-bounces@oss.sgi.com To: Mel Gorman Cc: jack@suse.cz, xfs@oss.sgi.com, Christoph Hellwig , linux-mm@kvack.org, Wu Fengguang , Johannes Weiner
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with SMTP id 35F106B0012 for ; Fri, 1 Jul 2011 22:42:25 -0400 (EDT) Date: Sat, 2 Jul 2011 12:42:19 +1000 From: Dave Chinner Subject: Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering Message-ID: <20110702024219.GT561@dastard> References:
<20110629140109.003209430@bombadil.infradead.org> <20110629140336.950805096@bombadil.infradead.org> <20110701022248.GM561@dastard> <20110701041851.GN561@dastard> <20110701093305.GA28531@infradead.org> <20110701145935.GB29530@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20110701145935.GB29530@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Christoph Hellwig , Johannes Weiner , Wu Fengguang , xfs@oss.sgi.com, jack@suse.cz, linux-mm@kvack.org

On Fri, Jul 01, 2011 at 03:59:35PM +0100, Mel Gorman wrote:
> On Fri, Jul 01, 2011 at 05:33:05AM -0400, Christoph Hellwig wrote:
> > Johannes, Mel, Wu,
> 
> Am adding Jan Kara as he has been working on writeback efficiency
> recently as well.

Writeback looks to be working fine - it's kswapd screwing up the
writeback patterns that appears to be the problem....

> > Dave has been stressing some XFS patches of mine that remove the XFS
> > internal writeback clustering in favour of using write_cache_pages.
> 
> Against what kernel? 2.6.38 was a disaster for reclaim I've been
> finding out this week. I don't know about 2.6.38.8. 2.6.39 was better.

3.0-rc4

....
> The number of pages written from reclaim is exceptionally low (2.6.38
> was a total disaster but that release was bad for a number of reasons,
> haven't tested 2.6.38.8 yet) but reduced by 2.6.37 as expected. Direct
> reclaim usage was reduced and efficiency (ratio of pages scanned to
> pages reclaimed) was high.

And is that consistent across ext3/ext4/xfs/btrfs filesystems? I
doubt it very much, as all have very different .writepage
behaviours...

BTW, calling a workload "fsmark" tells us nothing about the workload
being tested - fsmark can do a lot of interesting things. IOWs, you
need to quote the command line for it to be meaningful to anyone...
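Quoting the exact command line, together with the kernel version asked about above, is what makes a benchmark result reproducible. A minimal POSIX sh sketch of one way to capture that context automatically; `run_logged` is a hypothetical helper, not part of fs_mark or xfstests, and it only prints `uname -r`, `uname -m` and the command line before running the command:

```sh
#!/bin/sh
# run_logged: hypothetical helper that records the context a benchmark
# result needs (kernel release, architecture, exact command line) and
# then runs the command, returning the command's exit status.
run_logged() {
    printf 'kernel: %s\n' "$(uname -r)"
    printf 'arch: %s\n' "$(uname -m)"
    # "$*" joins the arguments with single spaces; arguments that
    # themselves contain spaces are not re-quoted in this simple sketch.
    printf 'cmd: %s\n' "$*"
    "$@"
}
```

Prefixing the real benchmark invocation with `run_logged` puts the kernel, architecture and full command line at the top of the saved output, so whoever reads the numbers later knows exactly which workload produced them.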

> As I look through the results I have at the moment, the number of
> pages written back was simply really low which is why the problem fell
> off my radar.

It doesn't take many to completely screw up writeback IO patterns.
Write a few random pages to a 10MB file well before writeback would
get to the file, and instead of getting optimal sequential writeback
patterns when writeback gets to it, we get multiple disjoint IOs
that require multiple seeks to complete.

Slower, less efficient writeback IO causes memory pressure to last
longer and hence makes kswapd writeback more likely, and it's just
a downward spiral from there....

> > > That means the test is only using 1GB of disk space, and
> > > I'm running on a VM with 1GB RAM. It appears to be related to the VM
> > > triggering random page writeback from the LRU - 100x10MB files more
> > > than fills memory, hence it being the smallest test case i could
> > > reproduce the problem on.
> > > 
> 
> My tests were on a machine with 8G and ext3. I'm running some of
> the tests against ext4 and xfs to see if that makes a difference but
> it's possible the tests are simply not agressive enough so I want to
> reproduce Dave's test if possible.

To tell the truth, I don't think anyone really cares how ext3
performs these days. XFS seems to be the filesystem that brings out
all the bad behaviour in the mm subsystem....

FWIW, the mm subsystem works well enough when there is RAM
available, so I'd suggest that your reclaim testing needs to focus
on smaller memory configurations to really stress the reclaim
algorithms. That's one of the reasons why I regularly test on 1GB, 1p
machines - they show problems that are hard to reproduce on larger
configs....

> I'm assuming "test 180" is from xfstests which was not one of the tests
> I used previously. To run with 1000 files instead of 100, was the file
> "180" simply editted to make it look like this loop instead?

I reduced it to 100 files simply to speed up the testing process for
the "bad file size" problem I was trying to find. If you want to
reproduce the IO collapse in a big way, run it with 1000 files, and
it happens about 2/3rds of the way through the test on my hardware.

> > > It is very clear that from the IO completions that we are getting a
> > > *lot* of kswapd driven writeback directly through .writepage:
> > > 
> > > $ grep "xfs_setfilesize:" t.t |grep "4096$" | wc -l
> > > 801
> > > $ grep "xfs_setfilesize:" t.t |grep -v "4096$" | wc -l
> > > 78
> > > 
> > > So there's ~900 IO completions that change the file size, and 90% of
> > > them are single page updates.
> > > 
> > > $ ps -ef |grep [k]swap
> > > root       514     2  0 12:43 ?        00:00:00 [kswapd0]
> > > $ grep "writepage:" t.t | grep "514 " |wc -l
> > > 799
> > > 
> > > Oh, now that is too close to just be a co-incidence. We're getting
> > > significant amounts of random page writeback from the the ends of
> > > the LRUs done by the VM.
> > > 
> > > <sigh>
> 
> Does the value for nr_vmscan_write in /proc/vmstat correlate? It must
> but lets me sure because I'm using that figure rather than ftrace to
> count writebacks at the moment.

The number in /proc/vmstat is higher. Much higher. I just ran the
test at 1000 files (only collapsed to ~3000 iops this time because I
ran it on a plain 3.0-rc4 kernel that still has the .writepage
clustering in XFS), and I see:

nr_vmscan_write 6723

after the test. The event trace only captures ~1400 writepage events
from kswapd, but it tends to miss a lot of events as the system is
quite unresponsive at times under this workload - it's not uncommon
to have ssh sessions not echo a character for 10s...
e.g.: I started the workload at ~11:08:22:

$ while [ 1 ]; do date; sleep 1; done
Sat Jul  2 11:08:15 EST 2011
Sat Jul  2 11:08:16 EST 2011
Sat Jul  2 11:08:17 EST 2011
Sat Jul  2 11:08:18 EST 2011
Sat Jul  2 11:08:19 EST 2011
Sat Jul  2 11:08:20 EST 2011
Sat Jul  2 11:08:21 EST 2011
Sat Jul  2 11:08:22 EST 2011         <<<<<<<< start test here
Sat Jul  2 11:08:23 EST 2011
Sat Jul  2 11:08:24 EST 2011
Sat Jul  2 11:08:25 EST 2011
Sat Jul  2 11:08:26 EST 2011         <<<<<<<<
Sat Jul  2 11:08:27 EST 2011         <<<<<<<<
Sat Jul  2 11:08:30 EST 2011         <<<<<<<<
Sat Jul  2 11:08:35 EST 2011         <<<<<<<<
Sat Jul  2 11:08:36 EST 2011
Sat Jul  2 11:08:37 EST 2011
Sat Jul  2 11:08:38 EST 2011         <<<<<<<<
Sat Jul  2 11:08:40 EST 2011         <<<<<<<<
Sat Jul  2 11:08:41 EST 2011
Sat Jul  2 11:08:42 EST 2011
Sat Jul  2 11:08:43 EST 2011

And there are quite a few more multi-second holdoffs during the
test, too.

> A more relevant question is this -
> how many pages were reclaimed by kswapd and what percentage is 799
> pages of that? What do you consider an acceptable percentage?

I don't care what the percentage is or what the number is. kswapd is
reclaiming pages most of the time without affecting IO patterns, and
when that happens I just don't care because it is working just fine.

What I care about is what kswapd is doing when it finds dirty pages
and it decides they need to be written back. It's not a problem that
they are found or need to be written, the problem is the utterly
crap way that memory reclaim is throwing the pages at the filesystem.

I'm not sure how to get through to you guys that single, random page
writeback is *BAD*. Using .writepage directly is considered harmful
to IO throughput, and memory reclaim needs to stop doing that.
We've got hacks in the filesystems to try to make the IO that memory
reclaim executes suck less, but ultimately the problem is the IO
that memory reclaim is doing.
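The one-second `date` loop used above exposes the holdoffs as gaps between consecutive timestamps. A minimal POSIX sh sketch that flags those gaps automatically, assuming `date +%s` is available (GNU and BSD date support it, although POSIX does not require it) and an arbitrary default threshold of 2 seconds; `report_stalls` is a hypothetical name:

```sh
#!/bin/sh
# report_stalls: hypothetical helper. Reads one epoch timestamp (seconds)
# per line on stdin and prints every gap longer than MAX_GAP seconds
# (default 2), i.e. every time the sampling loop was held off.
report_stalls() {
    max_gap=${1:-2}
    awk -v max="$max_gap" '
        NR > 1 && $1 - prev > max + 0 {
            printf "stall of %ds ending at %d\n", $1 - prev, $1
        }
        { prev = $1 }'
}

# Usage, in the spirit of the date loop above (runs until interrupted):
#   while :; do date +%s; sleep 1; done | report_stalls 2
```

The threshold is a trade-off: loop drift alone (`sleep 1` plus the time to run `date`) can occasionally skip a second, so a 2s gap is not reported by default; passing 1 reports every skipped second.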
And now the memory reclaim IO patterns are getting in the way of
further improving the writeback path in XFS because we're finding
the hacks we've been carrying for years are *still* the only thing
that is making IO under memory pressure not suck completely.

What I find extremely frustrating is that this is not a new issue.
We (filesystem people) have been asking for a long time to have the
memory reclaim subsystem either defer IO to the writeback threads or
to use the .writepages interface. We're not asking this to be
difficult, we're asking for this so that we can cluster IO in an
optimal manner to avoid these IO collapses that memory reclaim
currently triggers. We now have generic methods of handing off IO
to flusher threads that also provide some level of throttling/
blocking while IO is submitted (e.g. writeback_inodes_sb_nr()), so
this shouldn't be a difficult problem to solve for the memory
reclaim subsystem.

Hell, maybe memory reclaim should take a leaf from the IO-less
throttle work we are doing - hit a bunch of dirty pages on the LRU,
just back off and let the writeback subsystem clean a few more pages
before starting another scan. Letting the writeback code clean
pages is the fastest way to get pages cleaned in the system, so if
we've already got a generic method for cleaning and/or waiting for
pages to be cleaned, why not aim to use that?

And while I'm ranting, when on earth is the
issue-writeback-from-direct-reclaim problem going to be fixed so we
can remove the hacks in the filesystem .writepage implementations to
prevent this from occurring?

I mean, when we combine the two issues, doesn't it imply that the
memory reclaim subsystem needs to be redesigned around the fact it
*can't clean pages directly*?
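The clustering argument, and the earlier point about a few random pages written to a 10MB file, can be made concrete with a toy count of how many separate IOs a set of dirty page indices turns into, treating each contiguous run of pages as one IO. This is a simplification (real IOs are also split at the device's maximum request size, and the block layer may merge neighbouring requests); `count_io_runs` is a hypothetical helper and the page numbers are made up for illustration:

```sh
#!/bin/sh
# count_io_runs: hypothetical helper. Takes dirty page indices as
# arguments (any order, duplicates allowed) and prints how many
# contiguous runs they form, i.e. how many IOs a writer that clusters
# adjacent pages would need in this toy model.
count_io_runs() {
    if [ "$#" -eq 0 ]; then
        echo 0
        return
    fi
    printf '%s\n' "$@" | sort -n -u | awk '
        NR == 1 || $1 != prev + 1 { runs++ }   # gap => start a new IO
        { prev = $1 }
        END { print runs + 0 }'
}

# A 10MB file is 2560 pages of 4KiB, all dirty. The flusher alone
# writes them as a single sequential run:
echo "flusher only: $(count_io_runs $(seq 0 2559)) IO(s)"

# Reclaim first writes pages 100, 900 and 1700 one at a time; the
# flusher then finds the remaining dirty pages split into runs.
rest=$(seq 0 2559 | grep -vx -e 100 -e 900 -e 1700)
echo "after reclaim: 3 single-page IOs + $(count_io_runs $rest) IO(s)"
```

In this toy model, three single-page writes from reclaim turn one sequential stream into seven separate IOs (three single pages plus four runs), each of which may need its own seek.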
This IO collapse issue shows that we really don't want kswapd
issuing IO directly via .writepage, and we already reject IO from
direct reclaim in .writepage in ext4, XFS and BTRFS because we'll
overrun the stack on anything other than trivial storage
configurations.

That says to me in a big, flashing bright pink neon sign way that
memory reclaim simply should not be issuing IO at all. Perhaps it's
time to rethink the way memory reclaim deals with dirty pages to
take into account the current reality?

</rant>

> > On Fri, Jul 01, 2011 at 07:20:21PM +1000, Dave Chinner wrote:
> > > > Looks good.  I still wonder why I haven't been able to hit this.
> > > > Haven't seen any 180 failure for a long time, with both 4k and 512 byte
> > > > filesystems and since yesterday 1k as well.
> > > 
> > > It requires the test to run the VM out of RAM and then force enough
> > > memory pressure for kswapd to start writeback from the LRU. The
> > > reproducer I have is a 1p, 1GB RAM VM with it's disk image on a
> > > 100MB/s HW RAID1 w/ 512MB BBWC disk subsystem.
> > > 
> 
> You say it's a 1G VM but you don't say what architecure.

x86-64 for both the guest and the host.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com