Date: Tue, 25 Aug 2009 17:22:17 +0200
From: Andrea Arcangeli
To: Hugh Dickins
Cc: Izik Eidus, Rik van Riel, Chris Wright, Nick Piggin, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 13/12] ksm: fix munlock during exit_mmap deadlock
Message-ID: <20090825152217.GQ14722@random.random>
In-Reply-To: <20090825145832.GP14722@random.random>

From: Andrea Arcangeli

If we can't stop page faults from happening during exit_mmap, munlock fails. The fundamental issue is the absolute lack of serialization after mm_users reaches 0. mmap_sem should be hot in the cache, as we just released it a few nanoseconds before in exit_mm; we only need to take it one last time after mm_users reaches 0, so that drivers can serialize safely against it. Then taking mmap_sem and checking mm_users > 0 is enough for ksm to serialize against exit_mmap, while still noticing when the oom killer or something else wants to release all memory of the mm. When ksm notices, it bails out and allows the memory to be released.
Signed-off-by: Andrea Arcangeli
---

diff --git a/kernel/fork.c b/kernel/fork.c
index 9a16c21..f5af0d3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -515,7 +515,18 @@ void mmput(struct mm_struct *mm)
 	if (atomic_dec_and_test(&mm->mm_users)) {
 		exit_aio(mm);
+
+		/*
+		 * Allow drivers tracking mm without pinning mm_users
+		 * (so that mm_users is allowed to reach 0 while they
+		 * do their tracking) to serialize against exit_mmap
+		 * by taking mmap_sem and checking mm_users is still >
+		 * 0 before working on the mm they're tracking.
+		 */
+		down_read(&mm->mmap_sem);
+		up_read(&mm->mmap_sem);
 		exit_mmap(mm);
+
 		set_mm_exe_file(mm, NULL);
 		if (!list_empty(&mm->mmlist)) {
 			spin_lock(&mmlist_lock);
diff --git a/mm/memory.c b/mm/memory.c
index 4a2c60d..025431e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2603,7 +2603,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 
 	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (!pte_none(*page_table) || ksm_test_exit(mm))
+	if (!pte_none(*page_table))
 		goto release;
 	inc_mm_counter(mm, anon_rss);
@@ -2753,7 +2753,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * handle that later.
 	 */
 	/* Only go through if we didn't race with anybody else... */
-	if (likely(pte_same(*page_table, orig_pte) && !ksm_test_exit(mm))) {
+	if (likely(pte_same(*page_table, orig_pte))) {
 		flush_icache_page(vma, page);
 		entry = mk_pte(page, vma->vm_page_prot);
 		if (flags & FAULT_FLAG_WRITE)