From: Andrea Arcangeli <aarcange@redhat.com>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Rik van Riel <riel@redhat.com>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Marcelo Tosatti <mtosatti@redhat.com>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
Izik Eidus <ieidus@redhat.com>, Nick Piggin <npiggin@suse.de>,
Mel Gorman <mel@csn.ul.ie>, Dave Hansen <dave@linux.vnet.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Ingo Molnar <mingo@elte.hu>, Mike Travis <travis@sgi.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Christoph Lameter <cl@linux-foundation.org>,
Chris Wright <chrisw@sous-sol.org>,
bpicco@redhat.com,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Arnd Bergmann <arnd@arndb.de>,
"Michael S. Tsirkin" <mst@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH 35 of 41] don't leave orhpaned swap cache after ksm merging
Date: Thu, 1 Apr 2010 18:47:58 +0200
Message-ID: <20100401164758.GZ5825@random.random>
In-Reply-To: <alpine.LSU.2.00.1003292302080.11420@sister.anvils>

On Mon, Mar 29, 2010 at 11:56:38PM -0700, Hugh Dickins wrote:
> I deeply resent you forcing me to think like this ;)
sorry ;)
> There is a simple bug with your patch below, isn't there?
> The BUG_ON(!PageLocked(page)) in munlock_vma_page().
> I expect that could be worked around with more messiness.
I didn't notice this. No more messiness is needed, it's just like in do_wp_page:
	lock_page(old_page);	/* for LRU manipulation */
	clear_page_mlock(old_page);
	unlock_page(old_page);
diff --git a/mm/ksm.c b/mm/ksm.c
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -889,7 +889,9 @@ static int try_to_merge_one_page(struct
 		err = -EFAULT;
 	if ((vma->vm_flags & VM_LOCKED) && kpage && !err) {
+		lock_page(page);	/* for LRU manipulation */
 		munlock_vma_page(page);
+		unlock_page(page);
 		if (!PageMlocked(kpage)) {
 			lock_page(kpage);
 			mlock_vma_page(kpage);
So no big deal: the chances that we block on that lock are close to zero
considering we just released it. The VM could still take it because
the page is still in the LRU, but it will bail out when it sees
page_mapcount() == 0. So there is no risk of waiting for I/O.
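The locking contract behind this fix can be sketched in userspace C. This is only a toy model: struct toy_page and the toy_* helpers are hypothetical stand-ins invented for illustration, not kernel APIs, and the BUG_ON() in munlock_vma_page() is modelled as a false return so the failing case can be observed instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; NOT the kernel's definition. */
struct toy_page {
	bool locked;	/* models PG_locked */
	bool mlocked;	/* models PG_mlocked */
};

/* Models lock_page(); single-threaded, so the lock is never contended. */
static void toy_lock_page(struct toy_page *p)
{
	assert(!p->locked);
	p->locked = true;
}

/* Models unlock_page(). */
static void toy_unlock_page(struct toy_page *p)
{
	assert(p->locked);
	p->locked = false;
}

/*
 * Models the precondition of munlock_vma_page(): the caller must hold
 * the page lock. Where the kernel would hit BUG_ON(!PageLocked(page)),
 * this sketch returns false and leaves the page untouched.
 */
static bool toy_munlock_vma_page(struct toy_page *p)
{
	if (!p->locked)
		return false;
	p->mlocked = false;
	return true;
}
```

Calling toy_munlock_vma_page() on an unlocked page fails, which is the case Hugh pointed out; bracketing the call with toy_lock_page() and toy_unlock_page(), as the diff above does with the real helpers, succeeds.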
> But really you're interested in whether I see an absolute reason why
> we have to hold page lock across the replace_page(). And no, I can't
> at this moment name an absolute reason, but still feel as I did when
> I made that change: it makes thinking about the transition easier.
What about do_wp_page? It also reads the orig_pte. It takes the page
lock just to run reuse_swap_page. If that fails it drops the PT lock,
allocates the page, takes the PT lock again, runs pte_same the same way
replace_page does it, and finally it copies and replaces the page.
How is that any different? I mean, are we introducing a new case or
is it the same as do_wp_page?
I think it boils down to the answer to the above question. I think
they're equal, but if you think they're different I'll keep the lock
held during replace_page, no problem. I don't want to introduce new
locking cases, but to me it doesn't look like one!
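The pattern under discussion is: read the pte, drop the PT lock for the slow work, retake it, and replace only if the pte is unchanged. A minimal userspace sketch, assuming a pthread mutex as a stand-in for the PT lock and an unsigned long as a stand-in for a pte; struct toy_mm, toy_read_pte() and toy_replace_pte() are hypothetical names, not kernel functions:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one page table entry guarded by its PT lock. */
struct toy_mm {
	pthread_mutex_t ptl;	/* models the PT lock */
	unsigned long pte;	/* models a single pte value */
};

/* Snapshot the pte under the lock, then drop the lock (models reading orig_pte). */
static unsigned long toy_read_pte(struct toy_mm *mm)
{
	unsigned long pte;

	assert(mm != NULL);
	pthread_mutex_lock(&mm->ptl);
	pte = mm->pte;
	pthread_mutex_unlock(&mm->ptl);
	return pte;
}

/*
 * Retake the lock and install new_pte only if the entry still equals
 * the snapshot (models the pte_same() check). Returns false if the
 * entry changed while the lock was dropped; the caller must then back off.
 */
static bool toy_replace_pte(struct toy_mm *mm, unsigned long orig_pte,
			    unsigned long new_pte)
{
	bool same;

	assert(mm != NULL);
	pthread_mutex_lock(&mm->ptl);
	same = (mm->pte == orig_pte);
	if (same)
		mm->pte = new_pte;
	pthread_mutex_unlock(&mm->ptl);
	return same;
}
```

In this model nothing but the lock-protected comparison guards the replacement, which is the property the question above relies on: a stale snapshot makes toy_replace_pte() fail rather than overwrite a newer entry.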
> So why don't you leave try_to_merge_one_page() just as it is,
> and leave replace_page()'s put_page() as it is, but add in
> if (!page_mapped(page))
> try_to_free_swap(page);
> either before or after the put_page? The page_mapped test
> is not vital; but if the page is still mapped elsewhere,
> we usually take that as justification for keeping its swap.
No doubt we can leave the page lock around replace_page too, but I
personally hate to leave locking in place without knowing it is needed,
especially if there are other places that release the page lock and
rely only on the pte_same check under the PT lock when they replace the
page (do_wp_page).
Originally, before I found the trouble with the gup pins in
page_wrprotect (current write_protect_page) we didn't take the PG_lock
at all. We had to introduce it to do the page_count accounting right
on the swapcache and that's about it...
> (I should note in passing that really the thing to do here is
> not necessarily to free the swap, but to consider transferring
> the swap to the KSM page. If all goes well, the KSM page remains
> stable and we should be able to reread it from swap later on,
> without having to write it out there again. But the way swapping
Agreed that would be ideal. It'd save one I/O if both pte and
swapcache are clean, and it might improve swap locality even when one
of the two is dirty.
> of KSM pages works, the chance that the KSM page will be the one
> that's already PageSwapcache is fairly low; and so we do repeatedly
> write them out to swap. I was working to avoid that when doing the
> KSM swapping, but it grew such a long conditional expression -
> almost as long as the Cc list on this mail - and became so awkward
> between replace_page and try_to_merge_one_page, that I decided to put
> it all off to a later optimization. That I've never yet got around to.)
No problem. It's not high priority for sure. The only high priority
thing as far as KSM is concerned is to make it work on hugepages. For
now, shutting down a VM and not being left with gigabytes of swap used
is enough...