From: Wu Fengguang <fengguang.wu@intel.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Mel Gorman <mel@csn.ul.ie>, Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Andi Kleen <andi@firstfloor.org>,
	"riel@redhat.com" <riel@redhat.com>,
	"chris.mason@oracle.com" <chris.mason@oracle.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 00/22] HWPOISON: Intro (v5)
Date: Mon, 15 Jun 2009 19:41:49 +0800
Message-ID: <20090615114149.GA7675@localhost>
In-Reply-To: <20090615103619.GC20461@wotan.suse.de>

On Mon, Jun 15, 2009 at 06:36:19PM +0800, Nick Piggin wrote:
> On Mon, Jun 15, 2009 at 06:09:54PM +0800, Wu Fengguang wrote:
> > On Mon, Jun 15, 2009 at 04:14:53PM +0800, Nick Piggin wrote:
> > > On Mon, Jun 15, 2009 at 08:44:47AM +0200, Nick Piggin wrote:
> > > > Did we verify with filesystem maintainers (eg. btrfs) that the
> > > > !ISREG test will be enough to prevent oopses?
> > > 
> > > BTW. this is quite a significant change I think and not
> > > really documented well enough. Previously a filesystem
> > > will know exactly when and why pagecache in a mapping
> > > under its control will be truncated (as opposed to
> > > invalidated).
> > > 
> > > They even have opportunity to hold locks such as i_mutex.
> > > 
> > > And depending on what they do, they could do interesting
> > > things even with ISREG files.
> > > 
> > > So, I really think this needs review by filesystem
> > > maintainers and it would be far safer to use invalidate
> > > until it is known to be safe.
> > 
> > Nick, we are doing invalidate_complete_page() for !S_ISREG inodes now.
> > Do you mean to do invalidate_complete_page() for all inodes for now?
> 
> That would make me a lot happier. It is obviously correct because
> that is basically what page reclaim and inode reclaim and drop
> caches etc does.
> 
> Note that I still don't like exporting invalidate_complete_page
> for the same reasons I don't like exporting truncate_complete_page,
> so I will ask if you can do an invalidate_inode_page function
> along the same lines of the truncate_inode_page one please.

Sure. I did something radical: don't try to isolate dirty/writeback
pages at all, so as to match the invalidate_mapping_pages() behavior exactly.

Let's mess with the dirty/writeback pages some time later.

+/*
+ * Clean (or cleaned) page cache page.
+ */
+static int me_pagecache_clean(struct page *p, unsigned long pfn)
+{
+       struct address_space *mapping;
+
+       if (!isolate_lru_page(p))
+               page_cache_release(p);
+
+       mapping = page_mapping(p);
+       if (mapping == NULL)
+               return RECOVERED;
+
+       /*
+        * Now remove it from page cache.
+        * Currently we only remove clean, unused pages for the sake of safety.
+        */
+       if (!invalidate_inode_page(mapping, p)) {
+               printk(KERN_ERR
+                      "MCE %#lx: failed to remove from page cache\n", pfn);
+               return FAILED;
+       }
+       return RECOVERED;
+}


--- sound-2.6.orig/mm/truncate.c
+++ sound-2.6/mm/truncate.c
@@ -135,6 +135,21 @@ invalidate_complete_page(struct address_
        return ret;
 }

+/*
+ * Safely invalidate one page from its pagecache mapping.
+ * It only drops clean, unused pages. The page must be locked.
+ *
+ * Returns 1 if the page is successfully invalidated, otherwise 0.
+ */
+int invalidate_inode_page(struct address_space *mapping, struct page *page)
+{
+       if (PageDirty(page) || PageWriteback(page))
+               return 0;
+       if (page_mapped(page))
+               return 0;
+       return invalidate_complete_page(mapping, page);
+}
+
 /**
  * truncate_inode_pages - truncate range of pages specified by start & end byte offsets
  * @mapping: mapping to truncate
@@ -311,12 +326,8 @@ unsigned long invalidate_mapping_pages(s
                        if (lock_failed)
                                continue;

-                       if (PageDirty(page) || PageWriteback(page))
-                               goto unlock;
-                       if (page_mapped(page))
-                               goto unlock;
-                       ret += invalidate_complete_page(mapping, page);
-unlock:
+                       ret += invalidate_inode_page(mapping, page);
+
                        unlock_page(page);
                        if (next > end)
                                break;
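
For readers who want to poke at the checks outside the kernel, here is a
rough userspace sketch of what the new helper filters out. It is only a
model: struct model_page and every model_* function are hypothetical
stand-ins invented for illustration, not kernel APIs, and the stand-in for
invalidate_complete_page() simply clears a flag instead of touching a real
mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the state the checks read. */
struct model_page {
	bool dirty;      /* models PageDirty() */
	bool writeback;  /* models PageWriteback() */
	bool mapped;     /* models page_mapped() */
	bool in_cache;   /* true while the page sits in the page cache */
};

/* Assumed simplification of invalidate_complete_page(): drop the page. */
static int model_invalidate_complete_page(struct model_page *page)
{
	if (!page->in_cache)
		return 0;
	page->in_cache = false;
	return 1;
}

/* Same order of checks as the patch: refuse dirty, writeback, mapped. */
static int model_invalidate_inode_page(struct model_page *page)
{
	assert(page != NULL);
	if (page->dirty || page->writeback)
		return 0;
	if (page->mapped)
		return 0;
	return model_invalidate_complete_page(page);
}

/* Mimics the "ret += ..." accumulation in invalidate_mapping_pages(). */
static int model_invalidate_mapping_pages_demo(void)
{
	struct model_page pages[] = {
		{ .in_cache = true },                     /* clean, unused */
		{ .dirty = true, .in_cache = true },      /* dirty */
		{ .writeback = true, .in_cache = true },  /* under writeback */
		{ .mapped = true, .in_cache = true },     /* mapped by a process */
	};
	int ret = 0;

	for (size_t i = 0; i < sizeof(pages) / sizeof(pages[0]); i++)
		ret += model_invalidate_inode_page(&pages[i]);
	return ret;
}
```

Calling model_invalidate_mapping_pages_demo() from a small main() returns 1:
only the clean, unused page is dropped, and the other three stay in the
model cache untouched.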

> 
> > That's a good suggestion, it shall be able to do the job for most
> > pages indeed.
> 
> Yes I think it will be far far safer while only introducing
> another small class of pages which cannot be recovered (probably
> a much smaller set most of the time than the size of the existing
> set of pages which cannot be recovered).

Anyway, dirty pages are limited to 15% of the total memory by default. 

Thanks,
Fengguang
