From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f69.google.com (mail-it0-f69.google.com [209.85.214.69]) by kanga.kvack.org (Postfix) with ESMTP id
C95206B0005 for ; Mon, 4 Jul 2016 04:05:24 -0400 (EDT)
Received: by mail-it0-f69.google.com with SMTP id g8so180007297itb.2 for ; Mon, 04 Jul 2016 01:05:24 -0700 (PDT)
Received: from lgeamrelo13.lge.com (LGEAMRELO13.lge.com. [156.147.23.53]) by mx.google.com with ESMTP id b188si9596ite.101.2016.07.04.01.05.23 for ; Mon, 04 Jul 2016 01:05:24 -0700 (PDT)
Date: Mon, 4 Jul 2016 17:04:12 +0900
From: Minchan Kim
Subject: Re: [PATCH 00/31] Move LRU page reclaim from zones to nodes v8
Message-ID: <20160704080412.GA24605@bbox>
References: <1467403299-25786-1-git-send-email-mgorman@techsingularity.net> <20160704013703.GA19943@bbox> <20160704043405.GB11498@techsingularity.net>
MIME-Version: 1.0
In-Reply-To: <20160704043405.GB11498@techsingularity.net>
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Andrew Morton, Linux-MM, Rik van Riel, Vlastimil Babka, Johannes Weiner, LKML, daniel.vetter@intel.com, intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, David Airlie

On Mon, Jul 04, 2016 at 05:34:05AM +0100, Mel Gorman wrote:
> On Mon, Jul 04, 2016 at 10:37:03AM +0900, Minchan Kim wrote:
> > > The reason we have zone-based reclaim is that we used to have
> > > large highmem zones in common configurations and it was necessary
> > > to quickly find ZONE_NORMAL pages for reclaim. Today, this is much
> > > less of a concern as machines with lots of memory will (or should) use
> > > 64-bit kernels. Combinations of 32-bit hardware and 64-bit hardware are
> > > rare. Machines that do use highmem should have relatively low highmem:lowmem
> > > ratios than we worried about in the past.
> >
> > Hello Mel,
> >
> > I agree the direction absolutely. However, I have a concern on highmem
> > system as you already mentioned.
> >
> > Embedded products still use 2 ~ 3 ratio (highmem:lowmem).
> > In such system, LRU churning by skipping other zone pages frequently
> > might be significant for the performance.
> >
> > How big ratio between highmem:lowmem do you think a problem?
> >
>
> That's a "how long is a piece of string" type question. The ratio does
> not matter as much as whether the workload is both under memory pressure
> and requires large amounts of lowmem pages. Even on systems with very high
> ratios, it may not be a problem if HIGHPTE is enabled.

As well as page tables, every kernel allocation that has to mask off
__GFP_HIGHMEM (pgd, kernel stack, zbud, slab and so on) would be a
problem on a 32-bit system.

It also depends on how many drivers in the system need lowmem only.

I don't know how many such drivers exist. When I simply grep, I find
several cases that mask off __GFP_HIGHMEM, and among them I guess DRM
might be the common one for us. However, it might be a really rare
use case among the various i915 use cases.

>
> > >
> > > Conceptually, moving to node LRUs should be easier to understand. The
> > > page allocator plays fewer tricks to game reclaim and reclaim behaves
> > > similarly on all nodes.
> > >
> > > The series has been tested on a 16 core UMA machine and a 2-socket 48
> > > core NUMA machine. The UMA results are presented in most cases as the NUMA
> > > machine behaved similarly.
> >
> > I guess you would already test below with various highmem system(e.g.,
> > 2:1, 3:1, 4:1 and so on). If you have, could you mind sharing it?
> >
>
> I haven't that data, the baseline distribution used doesn't even have
> 32-bit support. Even if it was, the results may not be that interesting.
> The workloads used were not necessarily going to trigger lowmem pressure
> as HIGHPTE was set on the 32-bit configs.

That means we didn't test this on 32-bit with highmem.

I'm not sure this is really too rare a case to spend time testing.
In fact, I really want to test the whole series on our production system,
which is 32-bit with highmem, but as we all know, most embedded system
kernels are rather old, so backporting needs a lot of time and care.
However, if we skip testing on those systems now, we will be surprised
in a year or two.

I don't know what kind of benchmark we could use to check this, so I
cannot insist on it, but you might know one.

Okay, do you have any idea how to fix it if we see such a regression
report on a 32-bit system in the future?
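
The LRU-churn concern raised in this message can be illustrated with a toy model. The sketch below is not kernel code and is not taken from the patch series: the helper names `build_lru` and `scan_for_lowmem`, the uniformly shuffled page mix, and the page counts are all assumptions made only for illustration. It models one node-wide LRU list that holds highmem and lowmem pages at a given highmem:lowmem ratio, and counts how many entries a lowmem-only reclaim request has to walk past before it finds the requested number of lowmem pages.

```python
import random


def build_lru(total_pages, highmem_ratio, seed=0):
    """Return a shuffled list of 'high'/'low' tags modelling one node-wide LRU.

    highmem_ratio is highmem:lowmem, e.g. 3 means three highmem pages
    for every lowmem page (hypothetical model, not a kernel structure).
    """
    low = total_pages // (highmem_ratio + 1)
    high = total_pages - low
    lru = ["high"] * high + ["low"] * low
    random.Random(seed).shuffle(lru)
    return lru


def scan_for_lowmem(lru, nr_to_reclaim):
    """Walk the list from the tail and count scanned and skipped pages until
    nr_to_reclaim lowmem pages are found or the list is exhausted."""
    scanned = skipped = found = 0
    for tag in reversed(lru):
        if found == nr_to_reclaim:
            break
        scanned += 1
        if tag == "low":
            found += 1
        else:
            # A highmem page is useless to a lowmem-only request: skip it.
            skipped += 1
    return scanned, skipped, found


if __name__ == "__main__":
    for ratio in (1, 2, 3, 4):
        lru = build_lru(total_pages=100_000, highmem_ratio=ratio)
        scanned, skipped, found = scan_for_lowmem(lru, nr_to_reclaim=32)
        print(f"{ratio}:1  scanned={scanned} skipped={skipped} found={found}")
```

In this model the number of skipped entries per reclaimed lowmem page grows roughly in proportion to the ratio. Whether that cost matters on a real system depends, as discussed in the thread, on how often the workload needs lowmem-only pages while under memory pressure.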