From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with SMTP id 487416B004D for ; Tue, 2 Feb 2010 08:57:53 -0500 (EST) Date: Tue, 2 Feb 2010 14:56:34 +0100 From: Andrea Arcangeli Subject: Re: [PATCH 32 of 32] khugepaged Message-ID: <20100202135634.GK4135@random.random> References: <51b543fab38b1290f176.1264969663@v2.random> <4B670968.7090801@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4B670968.7090801@redhat.com> Sender: owner-linux-mm@kvack.org To: Rik van Riel Cc: linux-mm@kvack.org, Marcelo Tosatti , Adam Litke , Avi Kivity , Izik Eidus , Hugh Dickins , Nick Piggin , Mel Gorman , Dave Hansen , Benjamin Herrenschmidt , Ingo Molnar , Mike Travis , KAMEZAWA Hiroyuki , Christoph Lameter , Chris Wright , Andrew Morton , bpicco@redhat.com, KOSAKI Motohiro , Balbir Singh , Arnd Bergmann List-ID: On Mon, Feb 01, 2010 at 12:03:36PM -0500, Rik van Riel wrote: > On 01/31/2010 03:27 PM, Andrea Arcangeli wrote: > > > + /* stop anon_vma rmap pagetable access */ > > + spin_lock(&vma->anon_vma->lock); > > This is no longer enough. The anon_vma changes that > went into -mm recently mean that a VMA can be associated > with multiple anon_vmas. > > Of course, forcefully COW copying/writing every page in > the VMA will ensure that they are all in the anon_vma > you lock with the code above. > > I suspect the easiest fix would be to lock all the > anon_vmas attached to a VMA. That should not lead to > any deadlocks, since multiple siblings of the same > parent process would be encountering their anon_vma > structs in the same order, due to the way that > anon_vma_clone and anon_vma_fork work. > > This may be too subtle for lockdep, though :/ I hope I can do this in split_huge_page_mm/vma: if (pmd_trans_huge(*pmd)) { spin_lock(&mm->page_table_lock); /* stop pmd updates */ if (pmd_trans_huge(*pmd)) { unsigned long anon_mapping; struct page *page; struct anon_vma *anon_vma; page = pmd_page(*pmd); /* works even while pmd_splitting is set */ anon_mapping = page->mapping; /* ACCESS_ONCE not needed page has to go away before anon_vma */ VM_BUG_ON((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON); VM_BUG_ON(!page_mapped(page)); anon_vma = (struct anon_vma *)((unsigned long)page->mapping - PAGE_MAPPING_ANON); rcu_read_lock(); spin_unlock(&mm->page_table_lock); spin_lock(&anon_vma->lock); rcu_read_unlock(); If this only causes at most 1 more memory allocation for each new vma created in the child, at the very first cow on the newly allocated vma in the child (and no more memory allocation) it should be unmeasurable overhead so it should be worth it and main downside seems to be in making the tricky vma adjusting more complex. I think the most urgent info that is missing is if this enough to fix the regressions in AIM that grinds to an halt now and prevents to get results. That is what started the development of this patch (for the other pages not cowed there is nothing we can do about it and current anon-vma is already as efficient as it can be in complexity terms). Optimizing for one-process-per-connection not so important, even if I know proprietary db do that. I've no idea how many pages there are in AIM that are cowed and how many that are totally shared and purely readonly for all processes in the system, and for the latter this is a noop... If it won't be proven that this fixes AIM we'll have to still to bisect previous patches. So in short I recommend to test this patch on Larry's system but again I think this change is good idea regardless (just I'm not convinced it'll be enough to prevent aim to grind to an halt, if you call page_referenced or try_to_unmap in a loop it won't be showing less in the profiling just because it returns faster, still a loop that will be and I suspect more the caller not the callee given the callee was fine in previous kernels... but I didn't have time to look into it so I may as well be wrong and maybe some legitimate change happened that requires a more frequent page_referenced calls, dunno). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org