From: Wu Fengguang <fengguang.wu@intel.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Izik Eidus <ieidus@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Andi Kleen <andi@firstfloor.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH 6/6] mm: sigbus instead of abusing oom
Date: Wed, 11 Nov 2009 12:35:40 +0800
Message-ID: <20091111043540.GA22223@localhost>
In-Reply-To: <20091111114119.FD53.A69D9226@jp.fujitsu.com>

On Wed, Nov 11, 2009 at 10:42:04AM +0800, KOSAKI Motohiro wrote:
> > On Tue, 10 Nov 2009 22:06:49 +0000 (GMT)
> > Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:
> >
> > > When do_nonlinear_fault() realizes that the page table must have been
> > > corrupted for it to have been called, it does print_bad_pte() and
> > > returns ... VM_FAULT_OOM, which is hard to understand.
> > >
> > > It made some sense when I did it for 2.6.15, when do_page_fault()
> > > just killed the current process; but nowadays it lets the OOM killer
> > > decide who to kill - so page table corruption in one process would
> > > be liable to kill another.
> > >
> > > Change it to return VM_FAULT_SIGBUS instead: that doesn't guarantee
> > > that the process will be killed, but is good enough for such a rare
> > > abnormality, accompanied as it is by the "BUG: Bad page map" message.
> > >
> > > And recent HWPOISON work has copied that code into do_swap_page(),
> > > when it finds an impossible swap entry: fix that to VM_FAULT_SIGBUS too.
> > >
> > > Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
> >
> > Thank you !
> > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Thank you, me too.
>
> 	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Thank you!

 	Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>


Some unrelated comments:

We observed that a copy_to_user() onto a hwpoisoned page triggers three
(duplicate) late kills (the last three lines below):

early kill:
        [   56.964041] virtual address 7fffcab7d000 found in vma
        [   56.964390]  7fffcab7d000 phys b4365000
        [   58.089254] Triggering MCE exception on CPU 0
        [   58.089563] Disabling lock debugging due to kernel taint
        [   58.089914] Machine check events logged
        [   58.090187] MCE exception done on CPU 0
        [   58.090462] MCE 0xb4365: page flags 0x100000000100068=uptodate,lru,active,mmap,anonymous,swapbacked count 1 mapcount 1
        [   58.091878] MCE 0xb4365: Killing copy_to_user_te:3768 early due to hardware memory corruption
        [   58.092425] MCE 0xb4365: dirty LRU page recovery: Recovered
late kill on copy_to_user():
        [   59.136331] Copy 4096 bytes to 00007fffcab7d000
        [   59.136641] MCE: Killing copy_to_user_te:3768 due to hardware memory corruption fault at 7fffcab7d000
        [   59.137231] MCE: Killing copy_to_user_te:3768 due to hardware memory corruption fault at 7fffcab7d000
        [   59.137812] MCE: Killing copy_to_user_te:3768 due to hardware memory corruption fault at 7fffcab7d001

This patch does not change that behavior, which is somewhat weird but harmless.

Thanks,
Fengguang
