From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51])
	by kanga.kvack.org (Postfix) with SMTP id 9DCCF6B004F
	for ; Wed, 10 Jun 2009 08:16:28 -0400 (EDT)
Date: Wed, 10 Jun 2009 20:16:45 +0800
From: Wu Fengguang
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v5
Message-ID: <20090610121645.GC5657@localhost>
References: <20090603846.816684333@firstfloor.org>
	<20090603184648.2E2131D028F@basil.firstfloor.org>
	<20090609100922.GF14820@wotan.suse.de>
	<20090610083803.GE6597@localhost>
	<20090610085939.GE31155@wotan.suse.de>
	<20090610092010.GA32584@localhost>
	<20090610110305.GB3876@wotan.suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090610110305.GB3876@wotan.suse.de>
Sender: owner-linux-mm@kvack.org
To: Nick Piggin
Cc: Hugh Dickins , Andi Kleen , "riel@redhat.com" ,
	"chris.mason@oracle.com" , "akpm@linux-foundation.org" ,
	"linux-kernel@vger.kernel.org" , "linux-mm@kvack.org"
List-ID:

On Wed, Jun 10, 2009 at 07:03:05PM +0800, Nick Piggin wrote:
> On Wed, Jun 10, 2009 at 05:20:11PM +0800, Wu Fengguang wrote:
> > On Wed, Jun 10, 2009 at 04:59:39PM +0800, Nick Piggin wrote:
> > > On Wed, Jun 10, 2009 at 04:38:03PM +0800, Wu Fengguang wrote:
> > > > On Wed, Jun 10, 2009 at 12:05:53AM +0800, Hugh Dickins wrote:
> > > > > I think a much more sensible approach would be to follow the page
> > > > > migration technique of replacing the page's ptes by a special swap-like
> > > > > entry, then do the killing from do_swap_page() if a process actually
> > > > > tries to access the page.
> > > >
> > > > We call that "late kill" and will be enabled when
> > > > sysctl_memory_failure_early_kill=0. Its default value is 1.
> > >
> > > What's the use of this? What are the tradeoffs, in what situations
> > > should an admin set this sysctl one way or the other?
> >
> > Good questions.
> >
> > My understanding is, when an application is generating data A, B, C in
> > sequence, and A is found to be corrupted by the kernel. Does it make
> > sense for the application to continue generate B and C? Or, are there
> > data dependencies between them? With late kill, it becomes more likely
> > that the disk contain new versions of B/C and old version of A, so
> > will more likely create data inconsistency.
> >
> > So early kill is more safe.
>
> Hmm, I think that's pretty speculative, and doesn't seem possible for
> an admin (or even kernel programmer) to choose the "right" value.
>

Agreed. It's not easy to choose if I'm myself an admin ;)

> The application equally may not need to touch the data again, so
> killing it might cause some inconsistency in whatever it is currently
> doing.

Yes, early kill can also be evil. What I can do now is to document
the early kill parameter more carefully.

Thanks,
Fengguang