From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754612AbZHKNAY (ORCPT ); Tue, 11 Aug 2009 09:00:24 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754552AbZHKNAU (ORCPT ); Tue, 11 Aug 2009 09:00:20 -0400 Received: from mga03.intel.com ([143.182.124.21]:40969 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752990AbZHKNAT (ORCPT ); Tue, 11 Aug 2009 09:00:19 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.43,360,1246863600"; d="scan'208";a="174539683" Date: Tue, 11 Aug 2009 20:38:19 +0800 From: Wu Fengguang To: Hidehiro Kawai Cc: Andi Kleen , "tytso@mit.edu" , "hch@infradead.org" , "mfasheh@suse.com" , "aia21@cantab.net" , "hugh.dickins@tiscali.co.uk" , "swhiteho@redhat.com" , "akpm@linux-foundation.org" , "npiggin@suse.de" , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , Satoshi OSHIMA , Taketoshi Sakuraba Subject: Re: [PATCH] [16/19] HWPOISON: Enable .remove_error_page for migration aware file systems Message-ID: <20090811123819.GB18881@localhost> References: <200908051136.682859934@firstfloor.org> <20090805093643.E0C00B15D8@basil.firstfloor.org> <4A7FBFD1.2010208@hitachi.com> <20090810070745.GA26533@localhost> <4A80EA14.4030300@hitachi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4A80EA14.4030300@hitachi.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 11, 2009 at 11:48:36AM +0800, Hidehiro Kawai wrote: > Wu Fengguang wrote: > > In fact we proposed a patch for preventing the re-corruption case, see > > > > http://lkml.org/lkml/2009/6/11/294 > > > > However it is hard to answer the (policy) question "How sticky should > > the EIO bit remain?". > > It's a good approach! This approach may also solve my concern, > the re-corruption issue caused by transient IO errors. > > But I also think it needs a bit more consideration. For example, > if the application has the valid data in the user space buffer, > it would try to re-write it after detecting an IO error from the > previous write. In this case, we should clear the sticky error flag. Yes, and maybe more than that. The IO error issue really deserves an independent work, which will inevitably involve lots of discussions with lots of people. For the data re-corruption problem, "vm.memory_failure_recovery = 0" should be the most clean workaround for now. Can we settle with that? Our goal for this initial hwpoison implementation is to achieve good coverage (not necessarily every possible case :). Thanks, Fengguang