Subject: FAILED: patch "[PATCH] btrfs: honor path->skip_locking in backref code" failed to apply to 4.14-stable tree
To: josef@toxicpanda.com, dsterba@suse.com, fdmanana@suse.com, wqu@suse.com
Cc:
From:
Date: Wed, 20 Mar 2019 18:44:38 +0100
Message-ID: <15531038781972@kroah.com>
X-Mailing-List: stable@vger.kernel.org

The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to .

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From 38e3eebff643db725633657d1d87a3be019d1018 Mon Sep 17 00:00:00 2001
From: Josef Bacik
Date: Wed, 16 Jan 2019 11:00:57 -0500
Subject: [PATCH] btrfs: honor path->skip_locking in backref code

Qgroups will do the old roots lookup at delayed ref time, which could be
while walking down the extent root while running a delayed ref.  This
should be fine, except we specifically lock eb's in the backref walking
code irrespective of path->skip_locking, which deadlocks the system.
Fix up the backref code to honor path->skip_locking; nobody will be
modifying the commit_root when we're searching so it's completely safe
to do.

Since fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting
time out of commit trans"), the kernel may lock up with quota enabled.

There is one backref trace triggered by snapshot dropping along with a
write operation in the source subvolume.  The example can be reliably
reproduced:

  btrfs-cleaner   D    0  4062      2 0x80000000
  Call Trace:
   schedule+0x32/0x90
   btrfs_tree_read_lock+0x93/0x130 [btrfs]
   find_parent_nodes+0x29b/0x1170 [btrfs]
   btrfs_find_all_roots_safe+0xa8/0x120 [btrfs]
   btrfs_find_all_roots+0x57/0x70 [btrfs]
   btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
   btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs]
   btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs]
   do_walk_down+0x541/0x5e3 [btrfs]
   walk_down_tree+0xab/0xe7 [btrfs]
   btrfs_drop_snapshot+0x356/0x71a [btrfs]
   btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs]
   cleaner_kthread+0x12b/0x160 [btrfs]
   kthread+0x112/0x130
   ret_from_fork+0x27/0x50

When dropping snapshots with qgroup enabled, we will trigger a backref
walk.

However, a backref walk at that point is pretty dangerous: if one of the
parent nodes gets write locked by another thread, we could deadlock.

For example:

           FS 260     FS 261 (Dropped)
           node A        node B
          /      \      /      \
      node C      node D      node E
     /   \         /  \        /     \
 leaf F|leaf G|leaf H|leaf I|leaf J|leaf K

The lock sequence would be:

      Thread A (cleaner)             |       Thread B (other writer)
-----------------------------------------------------------------------
write_lock(B)                        |
write_lock(D)                        |
^^^ called by walk_down_tree()       |
                                     |       write_lock(A)
                                     |       write_lock(D) << Stall
read_lock(H) << for backref walk     |
read_lock(D) << lock owner is        |
                the same thread A    |
                so read lock is OK   |
read_lock(A) << Stall                |

So thread A holds write lock D and needs read lock A before it can
unlock, while thread B holds write lock A and needs lock D before it can
unlock.

This will cause a deadlock.

This is not limited to the snapshot dropping case.  The backref walk,
even if it only happens on commit trees, breaks the normal top-down
locking order, which makes it deadlock prone.

Fixes: fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans")
CC: stable@vger.kernel.org # 4.14+
Reported-and-tested-by: David Sterba
Reported-by: Filipe Manana
Reviewed-by: Qu Wenruo
Signed-off-by: Josef Bacik
Reviewed-by: Filipe Manana
[ rebase to latest branch and fix lock assert bug in btrfs/007 ]
Signed-off-by: Qu Wenruo
[ copy logs and deadlock analysis from Qu's patch ]
Signed-off-by: David Sterba

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 136454dbb4af..11459fe84a29 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -712,7 +712,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info,
  * read tree blocks and add keys where required.
  */
 static int add_missing_keys(struct btrfs_fs_info *fs_info,
-			    struct preftrees *preftrees)
+			    struct preftrees *preftrees, bool lock)
 {
 	struct prelim_ref *ref;
 	struct extent_buffer *eb;
@@ -737,12 +737,14 @@ static int add_missing_keys(struct btrfs_fs_info *fs_info,
 			free_extent_buffer(eb);
 			return -EIO;
 		}
-		btrfs_tree_read_lock(eb);
+		if (lock)
+			btrfs_tree_read_lock(eb);
 		if (btrfs_header_level(eb) == 0)
 			btrfs_item_key_to_cpu(eb, &ref->key_for_search, 0);
 		else
 			btrfs_node_key_to_cpu(eb, &ref->key_for_search, 0);
-		btrfs_tree_read_unlock(eb);
+		if (lock)
+			btrfs_tree_read_unlock(eb);
 		free_extent_buffer(eb);
 		prelim_ref_insert(fs_info, &preftrees->indirect, ref, NULL);
 		cond_resched();
@@ -1227,7 +1229,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
 
 	btrfs_release_path(path);
 
-	ret = add_missing_keys(fs_info, &preftrees);
+	ret = add_missing_keys(fs_info, &preftrees, path->skip_locking == 0);
 	if (ret)
 		goto out;
 
@@ -1288,11 +1290,15 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
 					ret = -EIO;
 					goto out;
 				}
-				btrfs_tree_read_lock(eb);
-				btrfs_set_lock_blocking_read(eb);
+
+				if (!path->skip_locking) {
+					btrfs_tree_read_lock(eb);
+					btrfs_set_lock_blocking_read(eb);
+				}
 				ret = find_extent_in_eb(eb, bytenr,
 							*extent_item_pos, &eie, ignore_offset);
-				btrfs_tree_read_unlock_blocking(eb);
+				if (!path->skip_locking)
+					btrfs_tree_read_unlock_blocking(eb);
 				free_extent_buffer(eb);
 				if (ret < 0)
 					goto out;
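
The conditional locking used in the patch can be illustrated outside the
kernel.  What follows is a minimal user-space sketch, not btrfs code:
struct toy_eb, toy_read_trylock(), toy_read_unlock() and
toy_read_first_key() are hypothetical names invented for this
illustration, and a failed trylock stands in for the stall shown in the
lock sequence.  It assumes, as the commit message states, that nobody
modifies the buffer while it is being searched, so reading it without
the lock is safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an extent buffer; not the kernel's struct. */
struct toy_eb {
	int write_owner;	/* id of the thread holding the write lock, 0 if none */
	int read_holders;	/* number of read locks currently held */
	int lock_attempts;	/* how many times a read lock was requested */
	int first_key;		/* the value the backref walk wants to read */
};

/*
 * Try to take a read lock on behalf of thread `tid`.  Returns false where
 * a real blocking lock would stall, i.e. when another thread holds the
 * write lock.  A thread that owns the write lock may still read-lock.
 */
static bool toy_read_trylock(struct toy_eb *eb, int tid)
{
	eb->lock_attempts++;
	if (eb->write_owner != 0 && eb->write_owner != tid)
		return false;
	eb->read_holders++;
	return true;
}

static void toy_read_unlock(struct toy_eb *eb)
{
	assert(eb->read_holders > 0);
	eb->read_holders--;
}

/*
 * Same shape as the patched add_missing_keys(): take the lock only when
 * the caller asks for it.  Returns 0 on success, -1 if locking would stall.
 */
static int toy_read_first_key(struct toy_eb *eb, int tid, bool lock,
			      int *key_out)
{
	if (lock && !toy_read_trylock(eb, tid))
		return -1;
	*key_out = eb->first_key;
	if (lock)
		toy_read_unlock(eb);
	return 0;
}
```

With lock set to false the reader never touches the lock at all, so a
write lock held by another thread cannot stall it.  With lock set to
true, the same read against a buffer that is write locked by someone
else reports that it would block.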