From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751352AbdBAGlP (ORCPT ); Wed, 1 Feb 2017 01:41:15 -0500
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:45991
        "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1750991AbdBAGlN (ORCPT );
        Wed, 1 Feb 2017 01:41:13 -0500
Subject: Re: [RFC V2 03/12] mm: Change generic FALLBACK zonelist creation process
To: Dave Hansen , Anshuman Khandual , linux-kernel@vger.kernel.org,
        linux-mm@kvack.org
References: <20170130033602.12275-1-khandual@linux.vnet.ibm.com>
        <20170130033602.12275-4-khandual@linux.vnet.ibm.com>
        <07bd439c-6270-b219-227b-4079d36a2788@intel.com>
        <434aa74c-e917-490e-85ab-8c67b1a82d95@linux.vnet.ibm.com>
Cc: mhocko@suse.com, vbabka@suse.cz, mgorman@suse.de, minchan@kernel.org,
        aneesh.kumar@linux.vnet.ibm.com, bsingharora@gmail.com,
        srikar@linux.vnet.ibm.com, haren@linux.vnet.ibm.com,
        jglisse@redhat.com, dan.j.williams@intel.com
From: Anshuman Khandual
Date: Wed, 1 Feb 2017 12:10:55 +0530
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
        Thunderbird/45.5.1
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 17020106-0004-0000-0000-00000532A42A
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17020106-0005-0000-0000-0000132BBDFB
Message-Id: <654e5b6f-4b23-671e-87ee-1ee83e3cc9a6@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,,
        definitions=2017-01-31_12:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
        spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0
        bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1
        engine=8.0.1-1612050000 definitions=main-1702010065
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List:
 linux-kernel@vger.kernel.org

On 01/31/2017 07:27 AM, Dave Hansen wrote:
> On 01/30/2017 05:36 PM, Anshuman Khandual wrote:
>>> Let's say we had a CDM node with 100x more RAM than the rest of the
>>> system and it was just as fast as the rest of the RAM. Would we still
>>> want it isolated like this? Or would we want a different policy?
>>
>> But then the other argument being, dont we want to keep this 100X more
>> memory isolated for some special purpose to be utilized by specific
>> applications ?
>
> I was thinking that in this case, we wouldn't even want to bother with
> having "system RAM" in the fallback lists. A device who got its memory

System RAM is in the fallback list of the CDM node for the following
reason. If the user explicitly asks for CDM memory through mbind() and
the CDM node does not have enough memory to fulfill the request, it is
better to fall back to a system RAM node than to fail the request. This
is in line with what is expected from the mbind() call. User space has
other ways, such as /proc/pid/numa_maps, to find out at runtime exactly
which node a given page came from.

But to keep the options open, I have noted this in the cover letter.

"
FALLBACK zonelist creation:

CDM node's FALLBACK zonelist can also be changed to accommodate other CDM
memory zones along with system RAM zones in which case they can be used as
fallback options instead of first depending on the system RAM zones when
it's own memory falls insufficient during allocation.
"

> usage off by 1% could start to starve the rest of the system. A sane

I did not get this point. Could you please elaborate?

> policy in this case might be to isolate the "system RAM" from the device's.

Hmm.

>
>>> Why do we need this hard-coded along with the cpuset stuff later in the
>>> series. Doesn't taking a node out of the cpuset also take it out of the
>>> fallback lists?
>>
>> There are two mutually exclusive approaches which are described in
>> this patch series.
>>
>> (1) zonelist modification based approach
>> (2) cpuset restriction based approach
>>
>> As mentioned in the cover letter,
>
> Well, I'm glad you coded both of them up, but now that we have them how
> to we pick which one to throw to the wolves? Or, do we just merge both
> of them and let one bitrot? ;)

I am just trying to see how each isolation method stacks up in terms of
benefit and cost, so that we can have an informed debate about their
individual merits. Meanwhile, I have started looking at whether the core
buddy allocator __alloc_pages_nodemask() and its interaction with the
nodemask at various stages can also be modified to implement the intended
solution.
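
To make the intent of approach (1) and the mbind() fallback behaviour
described earlier a bit more concrete, here is a toy user-space model in C.
It is only a sketch under stated assumptions and is not kernel code: struct
toy_node, toy_build_fallback(), toy_alloc_page() and MAX_NODES are made-up
names for illustration. It also ignores zones and node distance, which the
real zonelist construction takes into account. The assumed policy is that a
CDM node's FALLBACK list holds the node itself followed by the system RAM
nodes, while a system RAM node's list never contains a CDM node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8	/* hypothetical upper bound for this toy model */

/* Toy stand-in for a NUMA node; not a kernel structure. */
struct toy_node {
	bool is_cdm;		/* true for a coherent device memory node */
	long free_pages;	/* pages still available on this node */
};

/*
 * Fill `list` with the FALLBACK order for node `self` and return its length.
 * Assumed policy: the node itself first, then every system RAM node.
 * CDM nodes never show up in any other node's list.
 */
size_t toy_build_fallback(const struct toy_node *nodes, size_t nr,
			  size_t self, size_t *list)
{
	size_t n = 0;

	assert(nr <= MAX_NODES && self < nr);
	list[n++] = self;
	for (size_t i = 0; i < nr; i++)
		if (i != self && !nodes[i].is_cdm)
			list[n++] = i;
	return n;
}

/*
 * Take one page for a request that targets node `target`, walking that
 * node's fallback list in order. Returns the node that served the page,
 * or -1 when every candidate in the list is exhausted.
 */
int toy_alloc_page(struct toy_node *nodes, size_t nr, size_t target)
{
	size_t list[MAX_NODES];
	size_t len = toy_build_fallback(nodes, nr, target, list);

	for (size_t i = 0; i < len; i++) {
		struct toy_node *node = &nodes[list[i]];

		if (node->free_pages > 0) {
			node->free_pages--;
			return (int)list[i];
		}
	}
	return -1;
}
```

In this model, a request that targets the CDM node is served from the CDM
node while it has free pages and then from a system RAM node, whereas a
request that targets a system RAM node fails once system RAM is exhausted,
even if the CDM node still has free pages.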