From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx197.postini.com [74.125.245.197])
	by kanga.kvack.org (Postfix) with SMTP id E79E66B0074
	for ; Thu, 5 Jul 2012 21:00:43 -0400 (EDT)
Received: by ghrr18 with SMTP id r18so9954200ghr.14
	for ; Thu, 05 Jul 2012 18:00:43 -0700 (PDT)
Message-ID: <4FF638AD.9040308@gmail.com>
Date: Fri, 06 Jul 2012 09:00:29 +0800
From: Nai Xia
Reply-To: nai.xia@gmail.com
MIME-Version: 1.0
Subject: Re: [PATCH 13/40] autonuma: CPU follow memory algorithm
References: <1340888180-15355-1-git-send-email-aarcange@redhat.com>
	<1340888180-15355-14-git-send-email-aarcange@redhat.com>
	<1340895238.28750.49.camel@twins>
	<20120629125517.GD32637@gmail.com>
	<4FEDDD0C.60609@redhat.com>
	<1340995986.28750.114.camel@twins>
	<4FF5D7CA.5020301@redhat.com>
In-Reply-To: <4FF5D7CA.5020301@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Rik van Riel
Cc: Peter Zijlstra, dlaor@redhat.com, Ingo Molnar, Hillf Danton,
	Andrea Arcangeli, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Dan Smith, Linus Torvalds, Andrew Morton, Thomas Gleixner,
	Ingo Molnar, Paul Turner, Suresh Siddha, Mike Galbraith,
	"Paul E. McKenney", Lai Jiangshan, Bharata B Rao,
	Lee Schermerhorn, Johannes Weiner, Srivatsa Vaddagiri,
	Christoph Lameter, Alex Shi, Mauricio Faria de Oliveira,
	Konrad Rzeszutek Wilk, Don Morris, Benjamin Herrenschmidt

On 2012-07-06 02:07, Rik van Riel wrote:
> On 06/29/2012 04:01 PM, Nai Xia wrote:
>
>> Hey guys, Can I say NAK to these patches ?
>
> Not necessarily the patches, but thinking about your
> points some more, I thought of a much more serious
> potential problem with Andrea's code.
>
>> Now I aware that this sampling algorithm is completely broken, if we take
>> a few seconds to see what it is trying to solve:
>
>> Andrea's patch can only approximate the pages_accessed number in a
>> time unit(scan interval),
>> I don't think it can catch even 1% of average_page_access_frequence
>> on a busy workload.
>
> It is much more "interesting" than that.
>
> Once the first thread gets a NUMA pagefault on a
> particular page, the page is made present in the
> page tables and NO OTHER THREAD will get NUMA
> page faults.
>
> That means when trying to compare the weighting
> of NUMA accesses between different threads in a
> 10 second interval, we only know THE FIRST FAULT.
>
> We have no information on whether any other threads
> tried to access the same page, because we do not
> get faults more frequently.
>
> Not only do we not get use frequency information,
> we may not get the information on which threads use
> which memory, at all.
>
> Somehow Andrea's code still seems to work.

On this point alone, I agree with Andrea's reasoning:

1. The missing information gets averaged out as noise.
2. If a thread statistically gets more faults than the others, then it
   may deserve the bias.

Note that I am agreeing only with the reasoning. I am not confident
that Andrea's code really works this way, since I have not run
micro-benchmarks on this part of the code.

>
> It would be very interesting to know why.

My personal experience is that sometimes you write a complex system and
it works like a charm; later you cut out 30% of its code and it still
works like a charm. Sometimes a part of a system is simply not that
relevant to the result of the overall benchmark, which can make it
appear to be resistant to false negatives/positives.

It's time to look inside with benchmarks, IMO.

Again, I have no intention of disabling this algorithm and nothing to
gain from doing so. I am only curious about the truth. I hope nobody
is offended.
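To make point 2 above a bit more concrete, here is a minimal user-space
sketch of first-fault-only sampling. It is not taken from the autonuma
patches. The names first_fault_sample() and prng_next(), and the model
itself (the thread that takes the single fault on a page is drawn with
probability proportional to an assumed relative access rate), are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift32) so that runs are reproducible. */
static uint32_t prng_next(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Hypothetical model, not the autonuma code: for each of @npages shared
 * pages, exactly one thread is charged with the NUMA hinting fault.
 * The "winner" is drawn with probability proportional to rate[i],
 * the assumed relative access rate of thread i.
 */
static void first_fault_sample(const unsigned int *rate, size_t nthreads,
			       unsigned long npages, uint32_t seed,
			       unsigned long *faults)
{
	uint32_t state = seed ? seed : 1;	/* xorshift state must be nonzero */
	unsigned long total = 0;
	unsigned long page;
	size_t i;

	for (i = 0; i < nthreads; i++) {
		total += rate[i];
		faults[i] = 0;
	}
	assert(total > 0);

	for (page = 0; page < npages; page++) {
		/* Use the upper bits; the slight modulo bias is ignored here. */
		unsigned long pick = (prng_next(&state) >> 8) % total;

		for (i = 0; i < nthreads; i++) {
			if (pick < rate[i]) {
				faults[i]++;	/* only the first toucher is seen */
				break;
			}
			pick -= rate[i];
		}
	}
}
```

Under this model, calling first_fault_sample() with rates of 3 and 1
over a large number of pages should charge the first thread with
roughly three quarters of the faults, even though every page yields
only one sample. Whether real workloads behave like this model is
exactly what the benchmarks would have to show.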
Thanks,

Nai

>
> How much sense does the following code still make,
> considering we may never get all the info on which
> threads use which memory?
>
> +			/*
> +			 * Generate the w_nid/w_cpu_nid from the
> +			 * pre-computed mm/task_numa_weight[] and
> +			 * compute w_other using the w_m/w_t info
> +			 * collected from the other process.
> +			 */
> +			if (mm == p->mm) {
> +				if (w_t > w_t_t)
> +					w_t_t = w_t;
> +				w_other = w_t*AUTONUMA_BALANCE_SCALE/w_t_t;
> +				w_nid = task_numa_weight[nid];
> +				w_cpu_nid = task_numa_weight[cpu_nid];
> +				w_type = W_TYPE_THREAD;
>
> Andrea, what is the real reason your code works?
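For reference, the w_other arithmetic in the quoted hunk can be
reproduced in isolation as below. This is only a sketch: the helper
name scaled_other_weight(), the value 1000 used in place of
AUTONUMA_BALANCE_SCALE, and the guard against a zero denominator are
assumptions of mine and are not in the quoted patch. The hunk does not
spell out what w_t and w_t_t mean, so only the arithmetic is shown.

```c
#include <assert.h>

/* Assumed value for illustration; stands in for AUTONUMA_BALANCE_SCALE. */
#define SKETCH_BALANCE_SCALE 1000UL

/*
 * Mirror of the quoted computation: clamp the denominator so that it is
 * never smaller than w_t, then express w_t as a fraction of it in units
 * of 1/SKETCH_BALANCE_SCALE.
 */
static unsigned long scaled_other_weight(unsigned long w_t,
					 unsigned long w_t_t)
{
	unsigned long w_other;

	if (w_t > w_t_t)
		w_t_t = w_t;
	if (w_t_t == 0)		/* guard added for this sketch only */
		return 0;

	w_other = w_t * SKETCH_BALANCE_SCALE / w_t_t;
	assert(w_other <= SKETCH_BALANCE_SCALE);	/* follows from the clamp */
	return w_other;
}
```

The clamp means the result can never exceed the scale, so an
undercounted w_t_t (for example, because only first faults were
recorded) saturates the ratio instead of overflowing it.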