From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:41914 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753129AbbHLOiw (ORCPT ); Wed, 12 Aug 2015 10:38:52 -0400 Message-ID: <55CB5A15.5020607@fb.com> Date: Wed, 12 Aug 2015 10:37:09 -0400 From: Josef Bacik MIME-Version: 1.0 To: , CC: Filipe Manana Subject: Re: [PATCH] Btrfs: check if previous transaction aborted to avoid fs corruption References: <1439388661-5316-1-git-send-email-fdmanana@kernel.org> In-Reply-To: <1439388661-5316-1-git-send-email-fdmanana@kernel.org> Content-Type: text/plain; charset="windows-1252"; format=flowed Sender: linux-btrfs-owner@vger.kernel.org List-ID: On 08/12/2015 10:11 AM, fdmanana@kernel.org wrote: > From: Filipe Manana > > While we are committing a transaction, it's possible the previous one is > still finishing its commit and therefore we wait for it to finish first. > However we were not checking if that previous transaction ended up getting > aborted after we waited for it to commit, so we ended up committing the > current transaction which can lead to fs corruption because the new > superblock can point to trees that have had one or more nodes/leafs that > were never durably persisted. > The following sequence diagram exemplifies how this is possible: > > CPU 0 CPU 1 > > transaction N starts > > (...) > > btrfs_commit_transaction(N) > > cur_trans->state = TRANS_STATE_COMMIT_START; > (...) > cur_trans->state = TRANS_STATE_COMMIT_DOING; > (...) > > cur_trans->state = TRANS_STATE_UNBLOCKED; > root->fs_info->running_transaction = NULL; > > btrfs_start_transaction() > --> starts transaction N + 1 > > btrfs_write_and_wait_transaction(trans, root); > --> starts writing all new or COWed ebs created > at transaction N > > creates some new ebs, COWs some > existing ebs but doesn't COW or > deletes eb X > > btrfs_commit_transaction(N + 1) > (...) > cur_trans->state = TRANS_STATE_COMMIT_START; > (...) > wait_for_commit(root, prev_trans); > --> prev_trans == transaction N > > btrfs_write_and_wait_transaction() continues > writing ebs > --> fails writing eb X, we abort transaction N > and set bit BTRFS_FS_STATE_ERROR on > fs_info->fs_state, so no new transactions > can start after setting that bit > > cleanup_transaction() > btrfs_cleanup_one_transaction() > wakes up task at CPU 1 > > continues, doesn't abort because > cur_trans->aborted (transaction N + 1) > is zero, and no checks for bit > BTRFS_FS_STATE_ERROR in fs_info->fs_state > are made > > btrfs_write_and_wait_transaction(trans, root); > --> succeeds, no errors during writeback > > write_ctree_super(trans, root, 0); > --> succeeds > --> we have now a superblock that points us > to some root that uses eb X, which was > never written to disk > > In this scenario future attempts to read eb X from disk results in an > error message like "parent transid verify failed on X wanted Y found Z". > > So fix this by aborting the current transaction if after waiting for the > previous transaction we verify that it was aborted. > > Signed-off-by: Filipe Manana Eesh good catch Filpe, Reviewed-by: Josef Bacik Thanks, Josef