From: Hugh Dickins <hugh.dickins@tiscali.co.uk>
To: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>,
riel@redhat.com, chris.mason@oracle.com,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, fengguang.wu@intel.com
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v5
Date: Tue, 9 Jun 2009 17:05:53 +0100 (BST) [thread overview]
Message-ID: <Pine.LNX.4.64.0906091637430.13213@sister.anvils> (raw)
In-Reply-To: <20090609100922.GF14820@wotan.suse.de>
On Tue, 9 Jun 2009, Nick Piggin wrote:
> On Wed, Jun 03, 2009 at 08:46:47PM +0200, Andi Kleen wrote:
>
> Why not have this in rmap.c and not export the locking?
> I don't know.. does Hugh care?
Thanks for catching my eye with this, Nick.
As I've said to Andi, I don't actually have time to study all this.
To me, it's just another layer of complexity and maintenance burden
that one special-interest group is imposing upon mm, and I can't
keep up with it myself.
But looking at this one set of extracts: you're right that I'm much
happier when page_lock_anon_vma() isn't leaked outside of mm/rmap.c,
though I'd probably hate this whichever way round it was; and I
share your lock ordering concern.
However, if I'm interpreting these extracts correctly, the whole
thing looks very misguided to me. Are we really going to kill any
process that has a cousin who might once have mapped the page that's
been found hwpoisonous? The hwpoison secret police are dangerously
out of control, I'd say.
The usual use of rmap lookup loops is to go on to look into the page
table to see whether the page is actually mapped: I see no attempt
at that here, just an assumption that anyone on the list is guilty
of mapping the page and must be killed. And even if it did go on
to check if the page is there, maybe the process lost interest in
that page several weeks ago, why kill it?
At least in the file's prio_tree case, we'll only be killing those
who mmapped the range which happens to include the page. But in the
anon case, remember the anon_vma is just a bundle of "related" vmas
outside of which the page will not be found; so if one process got a
poisonous page through COW, all the other processes which happen to
be sharing that anon_vma through fork or through adjacent merging,
are going to get killed too.
Guilty by association.
I think a much more sensible approach would be to follow the page
migration technique of replacing the page's ptes by a special swap-like
entry, then do the killing from do_swap_page() if a process actually
tries to access the page.
But perhaps that has already been discussed and found impossible,
or I'm taking "kill" too seriously and other checks are done
elsewhere, or...
Hugh
>
> > +/*
> > + * Collect processes when the error hit an anonymous page.
> > + */
> > +static void collect_procs_anon(struct page *page, struct list_head *to_kill,
> > + struct to_kill **tkc)
> > +{
> > + struct vm_area_struct *vma;
> > + struct task_struct *tsk;
> > + struct anon_vma *av = page_lock_anon_vma(page);
> > +
> > + if (av == NULL) /* Not actually mapped anymore */
> > + return;
> > +
> > + read_lock(&tasklist_lock);
> > + for_each_process (tsk) {
> > + if (!tsk->mm)
> > + continue;
> > + list_for_each_entry (vma, &av->head, anon_vma_node) {
> > + if (vma->vm_mm == tsk->mm)
> > + add_to_kill(tsk, page, vma, to_kill, tkc);
> > + }
> > + }
> > + page_unlock_anon_vma(av);
> > + read_unlock(&tasklist_lock);
> > +}
> > +
> > +/*
> > + * Collect processes when the error hit a file mapped page.
> > + */
> > +static void collect_procs_file(struct page *page, struct list_head *to_kill,
> > + struct to_kill **tkc)
> > +{
> > + struct vm_area_struct *vma;
> > + struct task_struct *tsk;
> > + struct prio_tree_iter iter;
> > + struct address_space *mapping = page_mapping(page);
> > +
> > + /*
> > + * A note on the locking order between the two locks.
> > + * We don't rely on this particular order.
> > + * If you have some other code that needs a different order
> > + * feel free to switch them around. Or add a reverse link
> > + * from mm_struct to task_struct, then this could be all
> > + * done without taking tasklist_lock and looping over all tasks.
> > + */
> > +
> > + read_lock(&tasklist_lock);
> > + spin_lock(&mapping->i_mmap_lock);
>
> This still has my original complaint that it nests tasklist lock inside
> anon vma lock and outside inode mmap lock (and anon_vma nests inside i_mmap).
> I guess the property of our current rw locks means that does not matter,
> but it could if we had "fair" rw locks, or some tree (-rt tree maybe)
> changed rw lock to a plain exclusive lock.
>
>
> > + for_each_process(tsk) {
> > + pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > +
> > + if (!tsk->mm)
> > + continue;
> > +
> > + vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff,
> > + pgoff)
> > + if (vma->vm_mm == tsk->mm)
> > + add_to_kill(tsk, page, vma, to_kill, tkc);
> > + }
> > + spin_unlock(&mapping->i_mmap_lock);
> > + read_unlock(&tasklist_lock);
> > +}
> > +
> > +/*
> > + * Collect the processes who have the corrupted page mapped to kill.
> > + * This is done in two steps for locking reasons.
> > + * First preallocate one tokill structure outside the spin locks,
> > + * so that we can kill at least one process reasonably reliable.
> > + */
> > +static void collect_procs(struct page *page, struct list_head *tokill)
> > +{
> > + struct to_kill *tk;
> > +
> > + tk = kmalloc(sizeof(struct to_kill), GFP_KERNEL);
> > + /* memory allocation failure is implicitly handled */
> > + if (PageAnon(page))
> > + collect_procs_anon(page, tokill, &tk);
> > + else
> > + collect_procs_file(page, tokill, &tk);
> > + kfree(tk);
> > +}
>
> > Index: linux/mm/filemap.c
> > ===================================================================
> > --- linux.orig/mm/filemap.c 2009-06-03 19:37:38.000000000 +0200
> > +++ linux/mm/filemap.c 2009-06-03 20:13:43.000000000 +0200
> > @@ -105,6 +105,10 @@
> > *
> > * ->task->proc_lock
> > * ->dcache_lock (proc_pid_lookup)
> > + *
> > + * (code doesn't rely on that order, so you could switch it around)
> > + * ->tasklist_lock (memory_failure, collect_procs_ao)
> > + * ->i_mmap_lock
> > */
> >
> > /*
> > Index: linux/mm/rmap.c
> > ===================================================================
> > --- linux.orig/mm/rmap.c 2009-06-03 19:37:38.000000000 +0200
> > +++ linux/mm/rmap.c 2009-06-03 20:13:43.000000000 +0200
> > @@ -36,6 +36,10 @@
> > * mapping->tree_lock (widely used, in set_page_dirty,
> > * in arch-dependent flush_dcache_mmap_lock,
> > * within inode_lock in __sync_single_inode)
> > + *
> > + * (code doesn't rely on that order so it could be switched around)
> > + * ->tasklist_lock
> > + * anon_vma->lock (memory_failure, collect_procs_anon)
> > */
> >
> > #include <linux/mm.h>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-06-09 15:33 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-03 18:46 [PATCH] [0/16] HWPOISON: Intro Andi Kleen
2009-06-03 18:46 ` [PATCH] [1/16] HWPOISON: Add page flag for poisoned pages Andi Kleen
2009-06-03 18:46 ` [PATCH] [2/16] HWPOISON: Export some rmap vma locking to outside world Andi Kleen
2009-06-03 18:46 ` [PATCH] [3/16] HWPOISON: Add support for poison swap entries v2 Andi Kleen
2009-06-03 18:46 ` [PATCH] [4/16] HWPOISON: Add new SIGBUS error codes for hardware poison signals Andi Kleen
2009-06-03 18:46 ` [PATCH] [5/16] HWPOISON: Add basic support for poisoned pages in fault handler v3 Andi Kleen
2009-06-03 18:46 ` [PATCH] [6/16] HWPOISON: Add various poison checks in mm/memory.c Andi Kleen
2009-06-04 4:26 ` Wu Fengguang
2009-06-04 5:19 ` Andi Kleen
2009-06-04 11:55 ` Wu Fengguang
2009-06-04 12:52 ` Andi Kleen
2009-06-04 12:50 ` Wu Fengguang
2009-06-04 13:02 ` Andi Kleen
2009-06-04 13:16 ` Wu Fengguang
2009-06-09 10:25 ` Nick Piggin
2009-06-09 12:21 ` Wu Fengguang
2009-06-09 12:35 ` Nick Piggin
2009-06-03 18:46 ` [PATCH] [7/16] HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2 Andi Kleen
2009-06-09 9:54 ` Nick Piggin
2009-06-09 12:34 ` [PATCH] HWPOISON: define VM_FAULT_HWPOISON to 0 when feature is disabled Wu Fengguang
2009-06-03 18:46 ` [PATCH] [8/16] HWPOISON: Use bitmask/action code for try_to_unmap behaviour Andi Kleen
2009-06-09 9:57 ` Nick Piggin
2009-06-10 2:27 ` Wu Fengguang
2009-06-10 6:07 ` Nick Piggin
2009-06-03 18:46 ` [PATCH] [9/16] HWPOISON: Handle hardware poisoned pages in try_to_unmap Andi Kleen
2009-06-04 4:35 ` Wu Fengguang
2009-06-04 5:21 ` Andi Kleen
2009-06-03 18:46 ` [PATCH] [10/16] HWPOISON: Handle poisoned pages in set_page_dirty() Andi Kleen
2009-06-04 0:36 ` Wu Fengguang
2009-06-04 5:27 ` Andi Kleen
2009-06-09 9:59 ` Nick Piggin
2009-06-09 12:51 ` Wu Fengguang
2009-06-03 18:46 ` [PATCH] [11/16] HWPOISON: check and isolate corrupted free pages v2 Andi Kleen
2009-06-09 10:02 ` Nick Piggin
2009-06-09 13:03 ` Wu Fengguang
2009-06-09 13:28 ` Nick Piggin
2009-06-09 13:49 ` Wu Fengguang
2009-06-09 13:55 ` Nick Piggin
2009-06-09 14:56 ` Wu Fengguang
2009-06-09 15:31 ` Nick Piggin
2009-06-03 18:46 ` [PATCH] [12/16] Refactor truncate to allow direct truncating of page Andi Kleen
2009-06-04 4:32 ` Wu Fengguang
2009-06-04 5:20 ` Andi Kleen
2009-06-03 18:46 ` [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v5 Andi Kleen
2009-06-04 3:24 ` Wu Fengguang
2009-06-04 5:13 ` Andi Kleen
2009-06-04 9:07 ` Wu Fengguang
2009-06-04 9:26 ` Andi Kleen
2009-06-09 9:51 ` Nick Piggin
2009-06-09 11:14 ` Nick Piggin
2009-06-09 10:09 ` Nick Piggin
2009-06-09 16:05 ` Hugh Dickins [this message]
2009-06-09 16:35 ` Nick Piggin
2009-06-10 8:38 ` Wu Fengguang
2009-06-10 8:59 ` Nick Piggin
2009-06-10 9:20 ` Wu Fengguang
2009-06-10 11:03 ` Nick Piggin
2009-06-10 12:16 ` Wu Fengguang
2009-06-10 12:36 ` Nick Piggin
2009-06-12 9:58 ` Andi Kleen
2009-06-10 3:10 ` [PATCH] HWPOISON: fix tasklist_lock/anon_vma locking order Wu Fengguang
2009-06-03 18:46 ` [PATCH] [14/16] HWPOISON: FOR TESTING: Enable memory failure code unconditionally Andi Kleen
2009-06-03 18:46 ` [PATCH] [15/16] HWPOISON: Add madvise() based injector for hardware poisoned pages v3 Andi Kleen
2009-06-03 18:46 ` [PATCH] [16/16] HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs Andi Kleen
2009-06-09 10:20 ` [PATCH] [0/16] HWPOISON: Intro Nick Piggin
2009-06-10 9:07 ` Wu Fengguang
2009-06-10 9:18 ` Nick Piggin
2009-06-10 9:45 ` Wu Fengguang
2009-06-10 11:15 ` Nick Piggin
2009-06-10 12:36 ` Wu Fengguang
2009-06-10 12:47 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0906091637430.13213@sister.anvils \
--to=hugh.dickins@tiscali.co.uk \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=chris.mason@oracle.com \
--cc=fengguang.wu@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).