linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Ingo Molnar <mingo@elte.hu>, Nick Piggin <npiggin@novell.com>,
	Hugh Dickins <hugh@veritas.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-mm@kvack.org
Subject: Re: [aarcange@redhat.com: [PATCH] fork vs gup(-fast) fix]
Date: Mon, 16 Mar 2009 19:28:14 +0100	[thread overview]
Message-ID: <20090316182814.GA20555@random.random> (raw)
In-Reply-To: <alpine.LFD.2.00.0903161034030.3675@localhost.localdomain>

On Mon, Mar 16, 2009 at 10:42:48AM -0700, Linus Torvalds wrote:
> Maybe we could go back to also looking at page counts?

Hugh just recently reminded me why we switched to mapcount and
explanation is here: c475a8ab625d567eacf5e30ec35d6d8704558062
which wasn't entirely safe until this was added too:
ab967d86015a19777955370deebc8262d50fed63 which reliably allowed to
takeover swapcache pages taken by gup and at the same time it allowed
the VM to unmap ptes pointing to swapcache taken by GUP.

Yes it's possible to go back to page counts, then we have only to
reintroduce by 2.6.7 solution that will prevent the VM to unmap ptes
that are mapping pages take by GUP. Otherwise do_wp_page won't be able
to remap into the pte the same swapcache that was unmapped by the pte
by the VM leading to disk corruption with swapping (the 2.4 bug, fixed
in 2.4 with a simpler PG_lock local to direct-io, that prevented the
VM to unmap ptes on the page as long as I/O was in progress, and
PG_lock was released by the ->end_io async handler from irq IIRC).

The only problem I can see is if mapcount and page count can change
freely while PT lock and rmap locks are taken, comparing them won't be
as reliable as in ksm/fork (in my version of the fix) where we're
guaranteed mapcount is 1 and stays 1 as long as we hold PT lock,
because pte_write(pte) == true and PageAnon == true (I also added a
BUG_ON to check mapcount to be always 1 with the other two conditions
are true). That makes ksm/forkfix quite obviously safe in this regard.

But for the VM to decide not to unmap a pte taken by GUP, we also have
to deal with a mapcount > 1 and pte_write(pte) == false and PageAnon
== true. So if we solve that ordering issue between reading mapcount
and page count I don't see much of a problem to returning checking the
page count in the VM code to prevent the pte to be unmapped while page
is under GUP and then remove the mapcount-only check from do_wp_page
swapcache-reuse logic.

If we'd return using the page_count instead of mapcount, my first
patch I posted here would then not require any change to take care of
the 'reverse' race (modulo hugetlb) of the child writing to the pages
that are being written to disk by the parent, there would be no need
to de-cow in GUP (again modulo hugetlb).

> I really think we should be able to fix this without _anything_ like that 
> at all. Just the lock (and some reuse_swap_page() logic changes).

I don't see why we should introduce mm wide locks outside GUP
(worrying about the SetPageGUP in gup-fast when gup-fast would then
instead have to take a mm-wide lock sounds small issue) when we can be
page-granular and lockless. I agree it could be simpler and less
invasive into the gup details to add any logic outside of gup, but I
don't think the result will be superior, given it'll most certainly
become an havier-weight lock bouncing across all cpus calling
gup-fast, and it won't make a speed difference for the CPU to execute
an atomic lock op inner or outer of gup-fast. OTOH if the argument for
an outer mm wide lock is to keep the code simpler or more
maintainable, that would explain it. I think fixing it my way is not
more complicated than by fixing outside gup, but then I clearly may be
biased in what it looks simpler to me.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2009-03-16 18:28 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20090311170611.GA2079@elte.hu>
2009-03-11 17:33 ` [aarcange@redhat.com: [PATCH] fork vs gup(-fast) fix] Linus Torvalds
2009-03-11 17:41   ` Ingo Molnar
2009-03-11 17:58     ` Linus Torvalds
2009-03-11 18:37       ` Andrea Arcangeli
2009-03-11 18:46         ` Linus Torvalds
2009-03-11 19:01           ` Linus Torvalds
2009-03-11 19:59             ` Andrea Arcangeli
2009-03-11 20:19               ` Linus Torvalds
2009-03-11 20:33                 ` Linus Torvalds
2009-03-11 20:55                   ` Andrea Arcangeli
2009-03-11 21:28                     ` Linus Torvalds
2009-03-11 21:57                       ` Andrea Arcangeli
2009-03-11 22:06                         ` Linus Torvalds
2009-03-11 22:07                           ` Linus Torvalds
2009-03-11 22:22                           ` Davide Libenzi
2009-03-11 22:32                             ` Linus Torvalds
2009-03-14  5:07                   ` Benjamin Herrenschmidt
2009-03-11 20:48                 ` Andrea Arcangeli
2009-03-14  5:06                 ` Benjamin Herrenschmidt
2009-03-14  5:20                   ` Nick Piggin
2009-03-16 16:01                     ` KOSAKI Motohiro
2009-03-16 16:23                       ` Nick Piggin
2009-03-16 16:32                         ` Linus Torvalds
2009-03-16 16:50                           ` Nick Piggin
2009-03-16 17:02                             ` Linus Torvalds
2009-03-16 17:19                               ` Nick Piggin
2009-03-16 17:42                                 ` Linus Torvalds
2009-03-16 18:02                                   ` Nick Piggin
2009-03-16 18:05                                     ` Nick Piggin
2009-03-16 18:17                                       ` Linus Torvalds
2009-03-16 18:33                                         ` Nick Piggin
2009-03-16 19:22                                           ` Linus Torvalds
2009-03-17  5:44                                             ` Nick Piggin
2009-03-16 18:14                                     ` Linus Torvalds
2009-03-16 18:29                                       ` Nick Piggin
2009-03-16 19:17                                         ` Linus Torvalds
2009-03-17  5:42                                           ` Nick Piggin
2009-03-17  5:58                                             ` Nick Piggin
2009-03-16 18:37                                       ` Andrea Arcangeli
2009-03-16 18:28                                   ` Andrea Arcangeli [this message]
2009-03-16 23:59                             ` KAMEZAWA Hiroyuki
2009-03-18  2:04                         ` KOSAKI Motohiro
2009-03-22 12:23                           ` KOSAKI Motohiro
2009-03-23  0:13                             ` KOSAKI Motohiro
2009-03-23 16:29                               ` Ingo Molnar
2009-03-23 16:46                                 ` Linus Torvalds
2009-03-24  5:08                                   ` KOSAKI Motohiro
2009-03-24 13:43                             ` Nick Piggin
2009-03-24 17:56                               ` Linus Torvalds
2009-03-30 10:52                               ` KOSAKI Motohiro
     [not found]                                 ` <200904022307.12043.nickpiggin@yahoo.com.au>
2009-04-03  3:49                                   ` Nick Piggin
2009-03-17  0:44                       ` Linus Torvalds
2009-03-17  0:56                         ` KAMEZAWA Hiroyuki
2009-03-17 12:19                         ` Andrea Arcangeli
2009-03-17 16:43                           ` Linus Torvalds
2009-03-17 17:01                             ` Linus Torvalds
2009-03-17 17:10                               ` Andrea Arcangeli
2009-03-17 17:43                                 ` Linus Torvalds
2009-03-17 18:09                                   ` Linus Torvalds
2009-03-17 18:19                                     ` Linus Torvalds
2009-03-17 18:46                                       ` Andrea Arcangeli
2009-03-17 19:03                                         ` Linus Torvalds
2009-03-17 19:35                                           ` Andrea Arcangeli
2009-03-17 19:55                                             ` Linus Torvalds
2009-03-11 19:06           ` Andrea Arcangeli
2009-03-12  5:36           ` Nick Piggin
2009-03-12 16:23             ` Nick Piggin
2009-03-12 17:00               ` Andrea Arcangeli
2009-03-12 17:20                 ` Nick Piggin
2009-03-12 17:23                   ` Nick Piggin
2009-03-12 18:06                   ` Andrea Arcangeli
2009-03-12 18:58                     ` Andrea Arcangeli
2009-03-13 16:09                     ` Nick Piggin
2009-03-13 19:34                       ` Andrea Arcangeli
2009-03-14  4:59                         ` Nick Piggin
2009-03-16 13:56                           ` Andrea Arcangeli
2009-03-16 16:01                             ` Nick Piggin
2009-03-14  4:46                       ` Nick Piggin
2009-03-14  5:06                         ` Nick Piggin
2009-03-11 18:53     ` Andrea Arcangeli
2009-03-11 18:22   ` Andrea Arcangeli
2009-03-11 19:06     ` Ingo Molnar
2009-03-11 19:15       ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090316182814.GA20555@random.random \
    --to=aarcange@redhat.com \
    --cc=benh@kernel.crashing.org \
    --cc=hugh@veritas.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=nickpiggin@yahoo.com.au \
    --cc=npiggin@novell.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).