From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with SMTP id E28646B0204 for ; Thu, 29 Apr 2010 02:42:39 -0400 (EDT) Received: by iwn40 with SMTP id 40so6290337iwn.1 for ; Wed, 28 Apr 2010 23:42:37 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <1272403852-10479-1-git-send-email-mel@csn.ul.ie> <20100428155558.GI15815@csn.ul.ie> <20100428162305.GX510@random.random> <20100428134719.32e8011b@annuminas.surriel.com> <20100428142510.09984e15@annuminas.surriel.com> <20100428161711.5a815fa8@annuminas.surriel.com> <20100428165734.6541bab3@annuminas.surriel.com> <4BD8EA85.2000209@redhat.com> Date: Thu, 29 Apr 2010 15:42:37 +0900 Message-ID: Subject: Re: [RFC PATCH -v3] take all anon_vma locks in anon_vma_lock From: Minchan Kim Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org To: Rik van Riel Cc: Andrea Arcangeli , Mel Gorman , Linux-MM , LKML , KAMEZAWA Hiroyuki , Christoph Lameter , Andrew Morton List-ID: On Thu, Apr 29, 2010 at 11:55 AM, Minchan Kim wrote= : > On Thu, Apr 29, 2010 at 11:10 AM, Rik van Riel wrote: >> On 04/28/2010 08:28 PM, Minchan Kim wrote: >>> >>> On Thu, Apr 29, 2010 at 5:57 AM, Rik van Riel =C2=A0wr= ote: >>>> >>>> Take all the locks for all the anon_vmas in anon_vma_lock, this proper= ly >>>> excludes migration and the transparent hugepage code from VMA changes >>>> done >>>> by mmap/munmap/mprotect/expand_stack/etc... >>>> >>>> Unfortunately, this requires adding a new lock (mm->anon_vma_chain_loc= k), >>>> otherwise we have an unavoidable lock ordering conflict. =C2=A0This ch= anges >>>> the >>>> locking rules for the "same_vma" list to be either mm->mmap_sem for >>>> write, >>>> or mm->mmap_sem for read plus the new mm->anon_vma_chain lock. =C2=A0T= his >>>> limits >>>> the place where the new lock is taken to 2 locations - anon_vma_prepar= e >>>> and >>>> expand_downwards. >>>> >>>> Document the locking rules for the same_vma list in the anon_vma_chain >>>> and >>>> remove the anon_vma_lock call from expand_upwards, which does not need >>>> it. >>>> >>>> Signed-off-by: Rik van Riel >>> >>> This patch makes things simple. So I like this. >>> Actually, I wanted this all-at-once locks approach. >>> But I was worried about that how the patch affects AIM 7 workload >>> which is cause of anon_vma_chain about scalability by Rik. >>> But now Rik himself is sending the patch. So I assume the patch >>> couldn't decrease scalability of the workload heavily. >> >> The thing is, the number of anon_vmas attached to a VMA is >> small (depth of the tree, so for apache or aim the typical >> depth is 2). This N is between 1 and 3. >> >> The problem we had originally is the _width_ of the tree, >> where every sibling process was attached to the same anon_vma >> and the rmap code had to walk the page tables of all the >> processes, for every privately owned page in each child process. >> For large server workloads, this N is between a few hundred and >> a few thousand. >> >> What matters most at this point is correctness - we need to be >> able to exclude rmap walks when messing with a VMA in any way >> that breaks lookups, because rmap walks for page migration and >> hugepage conversion have to be 100% reliable. >> >> That is not a constraint I had in mind with the original >> anon_vma changes, so the code needs to be fixed up now... > > Yes. I understand it. > > When you tried anon_vma_chain patches as I pointed out, what I have a > concern is parent's vma not child's one. > The vma of parent still has N anon_vma. > AFAIR, you said it's trade-off and would be good than old at least. > I agreed. =C2=A0But I just want to remind you because this makes worse. = =C2=A0:) > The corner case is that we have to hold locks of N. > > Do I miss something? > Really, Can't we ignore that case latency although this happen infrequent= ly? > I am not against this patch. I just want to listen your opinion. me/ slaps self. It's about height of tree and I can't imagine high height of scenario.(fork->fork->fork->...->fork) So as Rik pointed out, It's a not big overhead about latency latency, at least. I think. I supports this approach. --=20 Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org