From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris Mason Subject: Re: [patch 9/9] mm: do_sync_mapping_range integrity fix Date: Fri, 31 Oct 2008 10:10:19 -0400 Message-ID: <1225462219.10549.19.camel@think.oraclecorp.com> References: <20081028144715.683011000@suse.de> <20081028145734.706927000@nick.local0.net> <20081030161344.0ed5ca52.akpm@linux-foundation.org> <20081031091616.GF19268@wotan.suse.de> Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit Cc: Andrew Morton , linux-fsdevel@vger.kernel.org, david@fromorbit.com To: Nick Piggin Return-path: Received: from rcsinet13.oracle.com ([148.87.113.125]:28870 "EHLO rgminet13.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751364AbYJaOMy (ORCPT ); Fri, 31 Oct 2008 10:12:54 -0400 In-Reply-To: <20081031091616.GF19268@wotan.suse.de> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Fri, 2008-10-31 at 10:16 +0100, Nick Piggin wrote: > On Thu, Oct 30, 2008 at 04:13:44PM -0700, Andrew Morton wrote: > > On Wed, 29 Oct 2008 01:47:24 +1100 > > npiggin@suse.de wrote: > > > > > Chris Mason notices do_sync_mapping_range didn't actually ask for > data > > > integrity writeout. Unfortunately, it is advertised as being > usable for > > > data integrity operations. > > > > > > This is a data interity bug. > > > [ use WB_SYNC_ALL instead of WB_SYNC_NONE ] > > If the caller > > is using sync_file_range() for integrity then the caller has done a > > SYNC_FILE_RANGE_WAIT_BEFORE. > > No disputes about whether the API works "by design". But I think the > implementation has a bug. I'll explain: I'll definitely agree the current usage is clumsy, and there is a bug in fs/buffer.c. A grep through the rest of the filesystems doesn't turn up many assumptions that WB_SYNC_NONE means it's ok to skip dirty pages. Greps for WB_SYNC_ALL and WB_SYNC_NONE in the fs code reveal: fs/buffer.c:__block_write_full_page() if (wbc->sync_mode != WB_SYNC_NONE || !wbc->nonblocking) { lock_buffer(bh); } else if (!trylock_buffer(bh)) { redirty_page_for_writepage(wbc, page); continue; } Easily fixed s/||/&&/, which is what XFS does. reiser3 has the same bug in fs/reiserfs/inode.c ntfs and gfs2 each have a check that assumes WB_SYNC_NONE means optional writeback, both seem fixable with one liners. Everywhere we wait on page writeback while we're trying to build nice big bios hurts performance. I'd rather see us switch to something closer to the do_sync_mapping_range expectation of WB_SYNC_NONE than sprinkle WB_SYNC_ALLs everywhere. -chris