From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Morton
Subject: Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
Date: Wed, 23 Nov 2011 01:50:05 -0800
Message-ID: <20111123015005.8f366566.akpm@linux-foundation.org>
References: <1321441935-6802-1-git-send-email-jack@suse.cz>
	<1321441935-6802-3-git-send-email-jack@suse.cz>
	<20111116114421.GA9098@localhost>
	<20111122142805.4e59faae.akpm@linux-foundation.org>
	<20111123090533.GA22420@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Jan Kara, Christoph Hellwig, Al Viro,
	"linux-fsdevel@vger.kernel.org"
To: Wu Fengguang
Return-path:
Received: from mail.linuxfoundation.org ([140.211.169.12]:52795 "EHLO
	mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752726Ab1KWJtf (ORCPT );
	Wed, 23 Nov 2011 04:49:35 -0500
In-Reply-To: <20111123090533.GA22420@localhost>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Wed, 23 Nov 2011 17:05:33 +0800 Wu Fengguang wrote:

> On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote:
> > On Wed, 16 Nov 2011 19:44:21 +0800
> > Wu Fengguang wrote:
> >
> > > Due to the (very low) possibility of data loss by partial writes, IMHO
> > > it would safer to test this patch in linux-next until next merge window,
> >
> > Any such bugs will not be discovered in linux-next testing.
>
> Yup, I'm afraid.
>
> > The only way to find these things in a reasonable period of time is to
> > go in and find them. For example, intensive fsx-linux testing with
> > concurrent heavy memory pressure on various filesystems with various
> > block sizes. And of course concurrent signalling. If you're talking
> > about O_DIRECT then iirc I hacked support for that into fsx-linux. I
> > think.
>
> How are we going to measure the success/failure? Check if it
> eventually resulted in filesystem corruption or whatever?

yup.

> When received SIGKILL, fsx-linux itself will just die.

Well, there are ways of simulating its effect. For example, bale out
of the write() every seventh time if current->comm=="fsx-linux". Or
set a rearming timer which triggers a baled-out write. You'll work it
out ;)

> > Anyway, what _are_ the scenarios in which we think data can be lost?
>
> It's the vision that there may be partial writes on SIGKILL. Before
> patch, the write will either succeed as a whole or not started at
> all, depending on the timing of write/SIGKILL. This is kind of atomic
> operation. However now the write can be half done.
>
> If the application really cares about atomic behavior, it can do
> create-and-rename. However there are always the possibility of broken
> applications.
>
> Maybe this is not that big problem as SIGKILL is considered be to
> destructive already.

Yeah, I have dim dark memories that there are subtle problems with
interrupting write(). Linus might remember.

Others might remember too, but we're only talking to fsdevel which
nobody reads :(
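
For reference, the create-and-rename approach mentioned in the quoted text
can be sketched in user space roughly as follows. This is only an
illustration and not part of the patch under discussion: the helper names
write_all() and replace_file() and the ".tmp.<pid>" suffix are made up for
the example. The idea is that a write() cut short by a signal only ever
leaves a partial temporary file, while the target path is switched over by
a single rename().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all of buf, retrying after short writes
 * and after EINTR.  Returns 0 on success, -1 with errno set on failure.
 */
int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data was written */
			return -1;
		}
		buf += n;			/* short write: carry on with the rest */
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: replace the contents of `path` via a temporary
 * file in the same directory, then rename() it over the target.
 * Returns 0 on success, -1 with errno set on failure.
 */
int replace_file(const char *path, const char *buf, size_t len)
{
	char tmp[4096];
	int fd, n, saved;

	assert(path != NULL && buf != NULL);

	/* Assumed naming scheme for the temporary file. */
	n = snprintf(tmp, sizeof(tmp), "%s.tmp.%ld", path, (long)getpid());
	if (n < 0 || (size_t)n >= sizeof(tmp)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0644);
	if (fd < 0)
		return -1;

	/* Get the data to stable storage before the name is switched. */
	if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
		saved = errno;
		close(fd);
		unlink(tmp);
		errno = saved;
		return -1;
	}
	if (close(fd) < 0 || rename(tmp, path) < 0) {
		saved = errno;
		unlink(tmp);
		errno = saved;
		return -1;
	}
	return 0;
}
```

A caller would use something like replace_file("/etc/foo.conf", data, len),
where the path is just an example. If the process is killed part-way
through, the target still holds its old contents and only the temporary
file is left behind for later cleanup. An application that calls write()
directly on the target gets no such guarantee, which is the "broken
applications" case raised above.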