From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from aserp1040.oracle.com ([141.146.126.69]:31834 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751736AbdBNWHL (ORCPT ); Tue, 14 Feb 2017 17:07:11 -0500 Date: Tue, 14 Feb 2017 14:06:13 -0800 From: Liu Bo To: fdmanana@kernel.org Cc: linux-btrfs@vger.kernel.org Subject: Re: [PATCH] Btrfs: fix data loss after truncate when using the no-holes feature Message-ID: <20170214220613.GJ31091@localhost.localdomain> Reply-To: bo.li.liu@oracle.com References: <1487104526-13617-1-git-send-email-fdmanana@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <1487104526-13617-1-git-send-email-fdmanana@kernel.org> Sender: linux-btrfs-owner@vger.kernel.org List-ID: On Tue, Feb 14, 2017 at 08:35:26PM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana > > If we have a file with an implicit hole (NO_HOLES feature enabled) that > has an extent following the hole, delayed writes against regions of the > file behind the hole happened before but were not yet flushed and then > we truncate the file to a smaller size that lies inside the hole, we > end up persisting a wrong disk_i_size value for our inode that leads to > data loss after umounting and mounting again the filesystem or after > the inode is evicted and loaded again. > > This happens because at inode.c:btrfs_truncate_inode_items() we end up > setting last_size to the offset of the extent that we deleted and that > followed the hole. We then pass that value to btrfs_ordered_update_i_size() > which updates the inode's disk_i_size to a value smaller then the offset > of the buffered (delayed) writes. > > Example reproducer: > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt > > $ xfs_io -f -c "pwrite -S 0x01 0K 32K" /mnt/foo > $ xfs_io -d -c "pwrite -S 0x02 -b 32K 64K 32K" /mnt/foo > $ xfs_io -c "truncate 60K" /mnt/foo > --> inode's disk_i_size updated to 0 > > $ md5sum /mnt/foo > 3c5ca3c3ab42f4b04d7e7eb0b0d4d806 /mnt/foo > > $ umount /dev/sdb > $ mount /dev/sdb /mnt > > $ md5sum /mnt/foo > d41d8cd98f00b204e9800998ecf8427e /mnt/foo > --> Empty file, all data lost! > > Cc: # 3.14+ > Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents") > Signed-off-by: Filipe Manana > --- > fs/btrfs/inode.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 8eb6703..a0a38ce 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -4703,8 +4703,12 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, > btrfs_abort_transaction(trans, ret); > } > error: > - if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) > + if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) { > + if (last_size > new_size && > + btrfs_fs_incompat(fs_info, NO_HOLES)) > + last_size = new_size; > btrfs_ordered_update_i_size(inode, last_size, NULL); > + } > Looks good to me. Reviewed-by: Liu Bo We can also remove the code that sets last_size = new_size in the case of NO_HOLES inside the while loop, as I think this patch covers that, too. Because we've held the inode mutex and wait for the dirty pages beyond the current i_size, in fact, I couldn't think of a case that with NO_HOLE last_size < new_size, maybe we could use new_size directly and add a ASSERT just in case? Thanks, -liubo > btrfs_free_path(path); > > -- > 2.7.0.rc3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html