From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ipmail06.adl2.internode.on.net ([150.101.137.129]:15989 "EHLO ipmail06.adl2.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752548AbcC1WjT (ORCPT ); Mon, 28 Mar 2016 18:39:19 -0400 Date: Tue, 29 Mar 2016 09:39:16 +1100 From: Dave Chinner To: Jan Kara Cc: linux-fsdevel@vger.kernel.org, Ross Zwisler , Dan Williams , "Wilcox, Matthew R" Subject: Re: DAX data corruption for mmaped and written files Message-ID: <20160328223916.GB30721@dastard> References: <20160324131223.GJ4025@quack.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160324131223.GJ4025@quack.suse.cz> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Thu, Mar 24, 2016 at 02:12:23PM +0100, Jan Kara wrote: > Hello, > > yesterday I have been stress-testing mmap code with my new fault locking > patches and I have found a data corruption issue when file is written both > via mmap and standard write(2). The problem is following: > > CPU1 CPU2 > dax_io() dax_fault() > get_block() - allocates block > ... get_block() - finds allocated block > - zeroes it inside fs > fault completese > > if (buffer_unwritten(bh) || buffer_new(bh)) -> new buffer > dax_new_buf() -> zeroes buffer which may > overwrite user data Which filesystem? XFS should be, in both cases, zeroing the block entirely inside get_block() when it is first allocated. i.e we should see: CPU1 CPU2 dax_io() dax_fault() get_block() - allocates block - zeroes it inside fs ... get_block() - finds allocated block fault completes buffer returned is not new or unwritten. > In some cases the race can also go the other way around and we lose data > written by write. It shouldn't: CPU1 CPU2 dax_io() dax_fault() get_block() - allocates block - zeroes it inside fs .... get_block() - finds allocated block buffer returned is not new or unwritten. fault completes > So either we need to do the zeroing inside fs also for write(2) path (but > that would essentially mean we would write the block twice for each > allocating write) Yup, that's what XFS is supposed to be doing right now... > or we would need dax_io() to also use radix tree locking > to serialize against page faults (in the same way page cache does this with > page lock). Any opinion on what would be better? Don't care, as long as data is not corrupted ;) Cheers, Dave. -- Dave Chinner david@fromorbit.com