From: Nick Piggin <npiggin@suse.de>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	"hugh@veritas.com" <hugh@veritas.com>,
	"riel@redhat.com" <riel@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"chris.mason@oracle.com" <chris.mason@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3
Date: Mon, 1 Jun 2009 13:50:46 +0200
Message-ID: <20090601115046.GE5018@wotan.suse.de>
In-Reply-To: <20090528135428.GB16528@localhost>

On Thu, May 28, 2009 at 09:54:28PM +0800, Wu Fengguang wrote:
> On Thu, May 28, 2009 at 08:23:57PM +0800, Nick Piggin wrote:
> > On Thu, May 28, 2009 at 05:59:34PM +0800, Wu Fengguang wrote:
> > > Hi Nick,
> > > 
> > > > > +     /*
> > > > > +      * remove_from_page_cache assumes (mapping && !mapped)
> > > > > +      */
> > > > > +     if (page_mapping(p) && !page_mapped(p)) {
> > > > > +             remove_from_page_cache(p);
> > > > > +             page_cache_release(p);
> > > > > +     }
> > > > 
> > > > remove_mapping would probably be a better idea. Otherwise you can
> > > > probably introduce pagecache removal vs page fault races which
> > > > will make the kernel bug.
> > > 
> > > We used remove_mapping() at first, then discovered that it makes a
> > > strong assumption of page_count=2.
> > > 
> > > I guess it is safe from races since we are locking the page?
> > 
> > Yes it probably should (although you will lose get_user_pages data, but
> > I guess that's the aim anyway).
> 
> Yes. We (and truncate) rely heavily on this logic:
> 
>         retry:
>                 lock_page(page);
>                 if (page->mapping == NULL)
>                         goto retry;
>                 // do something on page
>                 unlock_page(page);
> 
> So that we can steal/isolate a page under its page lock.
> 
> The truncate code does wait on writeback pages, but we would like to
> isolate the page ASAP, so as to avoid someone finding it in the page
> cache (or swap cache) and then accessing its content.
> 
> I see no obvious problem with isolating a writeback page from the page
> cache or swap cache. But I'm also not sure it won't break some assumption
> in some corner of the kernel.

The problem is that then you have lost synchronization in the
pagecache. Nothing then prevents a new page from being put
in there and trying to do IO to or from the same device as the
currently running writeback.

 
> > But I just don't like this one file having all that required knowledge
> 
> Yes that's a big problem.
> 
> One major complexity involves classifying the page into different known
> types, by testing page flags, page_mapping, page_mapped, etc. This
> is not avoidable.

No.

 
> Another major complexity is on calling the isolation routines to
> remove references from
>         - PTE
>         - page cache
>         - swap cache
>         - LRU list
> They more or less made some assumptions on their operating environment
> that we have to take care of.  Unfortunately these complexities are
> also not easily resolvable.
> 
> > (and few comments) of all the files in mm/. If you want to get rid
> 
> I promise I'll add more comments :)

OK, but they should still go in their relevant files. Or as best as
possible. Right now it's just silly to have all this here when much
of it could be moved out to filemap.c, swap_state.c, page_alloc.c, etc.


> > of the page and don't care what its count or dirtiness is, then
> > truncate_inode_pages_range is the correct API to use.
> >
> > (or you could extract out some of it so you can call it directly on
> > individual locked pages, if that helps).
>  
> The patch to move over to truncate_complete_page() would look like this.
> It's not a big win indeed.

No I don't mean to do this, but to move the truncate_inode_pages
code for truncating a single, locked, page into another function
in mm/truncate.c and then call that from here.

> 
> ---
>  mm/memory-failure.c |   14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> --- sound-2.6.orig/mm/memory-failure.c
> +++ sound-2.6/mm/memory-failure.c
> @@ -327,20 +327,18 @@ static int me_pagecache_clean(struct pag
>  	if (!isolate_lru_page(p))
>  		page_cache_release(p);
>  
> -	if (page_has_private(p))
> -		do_invalidatepage(p, 0);
> -	if (page_has_private(p) && !try_to_release_page(p, GFP_NOIO))
> -		Dprintk(KERN_ERR "MCE %#lx: failed to release buffers\n",
> -			page_to_pfn(p));
> -
>  	/*
>  	 * remove_from_page_cache assumes (mapping && !mapped)
>  	 */
>  	if (page_mapping(p) && !page_mapped(p)) {
> -		remove_from_page_cache(p);
> -		page_cache_release(p);
> +		ClearPageMlocked(p);
> +		truncate_complete_page(p->mapping, p);
>  	}
>  
> +	if (page_has_private(p) && !try_to_release_page(p, GFP_NOIO))
> +		Dprintk(KERN_ERR "MCE %#lx: failed to release buffers\n",
> +			page_to_pfn(p));
> +
>  	return RECOVERED;
>  }
>  
> 
> > OK this is the point I was missing.
> > 
> > Should all be commented and put into mm/swap_state.c (or somewhere that
> > Hugh prefers).
> 
> But I doubt Hugh will welcome moving those bits into swap*.c ;)

Why not? If he has to look at it anyway, he would probably rather look
at fewer files :)

 
> > > Clean swap cache pages can be directly isolated. A later page fault will bring
> > > in the known good data from disk.
> > 
> > OK, but why do you ClearPageUptodate if it is just to be deleted from
> > swapcache anyway?
> 
> The ClearPageUptodate() is kind of a careless addition, in the hope
> that it will stop some random readers. Need more investigations.

OK. But it just muddies the waters in the meantime, so maybe take
such things out until there is a case for them.

 
> > > > You haven't waited on writeback here AFAIKS, and have you
> > > > *really* verified it is safe to call delete_from_swap_cache?
> > > 
> > > Good catch. I'll soon submit patches for handling the under
> > > read/write IO pages. In this patchset they are simply ignored.
> > 
> > Well that's quite important ;) I would suggest you just wait_on_page_writeback.
> > It is simple and should work. _Unless_ you can show it is a big problem that
> > needs equivalently big mes to fix ;)
> 
> Yes we could do wait_on_page_writeback() if necessary. The downside is,
> keeping writeback page in page cache opens a small time window for
> some one to access the page.

AFAIKS there already is such a window? You're doing lock_page and such.
No, it seems rather insane to do something like this here that no other
code in the mm ever does.
