From: Wu Fengguang
Subject: Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
Date: Wed, 23 Nov 2011 21:27:59 +0800
Message-ID: <20111123132758.GA25373@localhost>
References: <1321441935-6802-1-git-send-email-jack@suse.cz> <1321441935-6802-3-git-send-email-jack@suse.cz> <20111116114421.GA9098@localhost> <20111122142805.4e59faae.akpm@linux-foundation.org> <20111123090533.GA22420@localhost> <20111123130803.GD9775@quack.suse.cz>
In-Reply-To: <20111123130803.GD9775@quack.suse.cz>
To: Jan Kara
Cc: Andrew Morton, Christoph Hellwig, Al Viro, "linux-fsdevel@vger.kernel.org", Theodore Ts'o

On Wed, Nov 23, 2011 at 09:08:03PM +0800, Jan Kara wrote:
> On Wed 23-11-11 17:05:33, Wu Fengguang wrote:
> > On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote:
> > > On Wed, 16 Nov 2011 19:44:21 +0800
> > > Wu Fengguang wrote:
> > >
> > > > Due to the (very low) possibility of data loss by partial writes, IMHO
> > > > it would safer to test this patch in linux-next until next merge window,
> > >
> > > Any such bugs will not be discovered in linux-next testing.
> >
> > Yup, I'm afraid.
> >
> > > The only way to find these things in a reasonable period of time is to
> > > go in and find them. For example, intensive fsx-linux testing with
> > > concurrent heavy memory pressure on various filesystems with various
> > > block sizes. And of course concurrent signalling. If you're talking
> > > about O_DIRECT then iirc I hacked support for that into fsx-linux. I
> > > think.
> >
> > How are we going to measure the success/failure? Check if it
> > eventually resulted in filesystem corruption or whatever?
> There are a few different questions:
> 1) Checking for filesystem corruption via fsck - I find such corruption
> caused by stopping write early extremely unlikely.

Agreed.

> 2) Checking that we do not expose uninitialized data after a partial
> (possibly DIRECT_IO) write - I did not find a place where that could happen
> but this would be worth testing. I think I can write a test for this if
> people are afraid of data exposure problems.

Do we already have this kind of test in xfstests? If not, it sounds
like a good gap to fill :-)

> 3) Is it acceptable for write(2) to be interrupted by SIGKILL in the
> middle? That obviously does happen with my patches so there's no reason
> to test that. The question is whether someone cares or not and that can be
> tested only by reality check :). Since the signal is SIGKILL, the process
> itself cannot notice the interrupted write but someone else can. But as I
> already said earlier, partial writes can already be observed when the
> machine crashes, filesystem is close to ENOSPC or so. Arguably these are
> more severe error conditions than application catching SIGKILL so my
> patch lowers the bar for observing partial writes. But I wouldn't like to
> throw away a sensible thing - allow SIGKILL to interrupt a system call -
> just because of fear of possibility some broken app could rely on this.
> Sure if the reality check shows there are such broken apps and users who
> care enough to report, then I have nothing against biting the bullet
> and reverting the change... Opinions?

Reading Ted's information feed, I tend to disregard the partial write
issue: since the "broken" applications will already fail and get
punished in various other cases, I don't mind adding one more penalty
case for them :-P

Thanks,
Fengguang
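
For reference, a minimal sketch of how an application can cope with short
writes in general (near ENOSPC, or when a caught signal interrupts the
call), rather than assuming write(2) always transfers the full count. The
helper name write_all() is hypothetical and is not part of the patch under
discussion; treating a zero-byte return as EIO is likewise only an
assumption made to keep the loop from spinning. It cannot help with the
SIGKILL case, where the writer never gets to retry and only other readers
of the file can observe the partial data.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling write(2) until all 'len' bytes of
 * 'buf' have been written to 'fd'.
 * Returns 0 on success, -1 on error with errno set.
 */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data moved: retry */
			return -1;		/* real error, e.g. ENOSPC or EBADF */
		}
		if (n == 0) {
			errno = EIO;		/* assumption: no progress counts as an error */
			return -1;
		}
		/* Short write: advance past what was accepted and go again. */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A caller would use it as write_all(fd, data, size) and check for -1. Even
with such a loop, another process may still see a partially written file
if the writer dies in between, which is the case discussed above.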