stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Omar Sandoval <osandov@osandov.com>,
	Nikolay Borisov <nborisov@suse.com>,
	Omar Sandoval <osandov@fb.com>, David Sterba <dsterba@suse.com>
Subject: [PATCH 4.16 04/26] btrfs: Fix race condition between delayed refs and blockgroup removal
Date: Wed, 25 Apr 2018 12:33:13 +0200	[thread overview]
Message-ID: <20180425103315.020409043@linuxfoundation.org> (raw)
In-Reply-To: <20180425103314.842517924@linuxfoundation.org>

4.16-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nikolay Borisov <nborisov@suse.com>

commit 5e388e95815408c27f3612190d089afc0774b870 upstream.

When the delayed refs for a head are all run, eventually
cleanup_ref_head is called which (in case of deletion) obtains a
reference for the relevant btrfs_space_info struct by querying the bg
for the range. This is problematic because when the last extent of a
bg is deleted a race window emerges between removal of that bg and the
subsequent invocation of cleanup_ref_head. This can result in cache being null
and either a null pointer dereference or assertion failure.

	task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
	RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
	RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
	RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
	RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
	RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
	R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
	R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
	FS:  00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	Call Trace:
	 __btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
	 btrfs_run_delayed_refs+0x68/0x250 [btrfs]
	 btrfs_should_end_transaction+0x42/0x60 [btrfs]
	 btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
	 btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
	 evict+0xc6/0x190
	 do_unlinkat+0x19c/0x300
	 do_syscall_64+0x74/0x140
	 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
	RIP: 0033:0x7fbf589c57a7

To fix this, introduce a new flag "is_system" to head_ref structs,
which is populated at insertion time. This allows to decouple the
querying for the spaceinfo from querying the possibly deleted bg.

Fixes: d7eae3403f46 ("Btrfs: rework delayed ref total_bytes_pinned accounting")
CC: stable@vger.kernel.org # 4.14+
Suggested-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/delayed-ref.c |   19 ++++++++++++++-----
 fs/btrfs/delayed-ref.h |    1 +
 fs/btrfs/extent-tree.c |   16 +++++++++++-----
 3 files changed, 26 insertions(+), 10 deletions(-)

--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -553,8 +553,10 @@ add_delayed_ref_head(struct btrfs_fs_inf
 		     struct btrfs_delayed_ref_head *head_ref,
 		     struct btrfs_qgroup_extent_record *qrecord,
 		     u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
-		     int action, int is_data, int *qrecord_inserted_ret,
+		     int action, int is_data, int is_system,
+		     int *qrecord_inserted_ret,
 		     int *old_ref_mod, int *new_ref_mod)
+
 {
 	struct btrfs_delayed_ref_head *existing;
 	struct btrfs_delayed_ref_root *delayed_refs;
@@ -598,6 +600,7 @@ add_delayed_ref_head(struct btrfs_fs_inf
 	head_ref->ref_mod = count_mod;
 	head_ref->must_insert_reserved = must_insert_reserved;
 	head_ref->is_data = is_data;
+	head_ref->is_system = is_system;
 	head_ref->ref_tree = RB_ROOT;
 	INIT_LIST_HEAD(&head_ref->ref_add_list);
 	RB_CLEAR_NODE(&head_ref->href_node);
@@ -785,6 +788,7 @@ int btrfs_add_delayed_tree_ref(struct bt
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_qgroup_extent_record *record = NULL;
 	int qrecord_inserted;
+	int is_system = (ref_root == BTRFS_CHUNK_TREE_OBJECTID);
 
 	BUG_ON(extent_op && extent_op->is_data);
 	ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS);
@@ -813,8 +817,8 @@ int btrfs_add_delayed_tree_ref(struct bt
 	 */
 	head_ref = add_delayed_ref_head(fs_info, trans, head_ref, record,
 					bytenr, num_bytes, 0, 0, action, 0,
-					&qrecord_inserted, old_ref_mod,
-					new_ref_mod);
+					is_system, &qrecord_inserted,
+					old_ref_mod, new_ref_mod);
 
 	add_delayed_tree_ref(fs_info, trans, head_ref, &ref->node, bytenr,
 			     num_bytes, parent, ref_root, level, action);
@@ -881,7 +885,7 @@ int btrfs_add_delayed_data_ref(struct bt
 	 */
 	head_ref = add_delayed_ref_head(fs_info, trans, head_ref, record,
 					bytenr, num_bytes, ref_root, reserved,
-					action, 1, &qrecord_inserted,
+					action, 1, 0, &qrecord_inserted,
 					old_ref_mod, new_ref_mod);
 
 	add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
@@ -911,9 +915,14 @@ int btrfs_add_delayed_extent_op(struct b
 	delayed_refs = &trans->transaction->delayed_refs;
 	spin_lock(&delayed_refs->lock);
 
+	/*
+	 * extent_ops just modify the flags of an extent and they don't result
+	 * in ref count changes, hence it's safe to pass false/0 for is_system
+	 * argument
+	 */
 	add_delayed_ref_head(fs_info, trans, head_ref, NULL, bytenr,
 			     num_bytes, 0, 0, BTRFS_UPDATE_DELAYED_HEAD,
-			     extent_op->is_data, NULL, NULL, NULL);
+			     extent_op->is_data, 0, NULL, NULL, NULL);
 
 	spin_unlock(&delayed_refs->lock);
 	return 0;
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -139,6 +139,7 @@ struct btrfs_delayed_ref_head {
 	 */
 	unsigned int must_insert_reserved:1;
 	unsigned int is_data:1;
+	unsigned int is_system:1;
 	unsigned int processing:1;
 };
 
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2615,13 +2615,19 @@ static int cleanup_ref_head(struct btrfs
 	trace_run_delayed_ref_head(fs_info, head, 0);
 
 	if (head->total_ref_mod < 0) {
-		struct btrfs_block_group_cache *cache;
+		struct btrfs_space_info *space_info;
+		u64 flags;
 
-		cache = btrfs_lookup_block_group(fs_info, head->bytenr);
-		ASSERT(cache);
-		percpu_counter_add(&cache->space_info->total_bytes_pinned,
+		if (head->is_data)
+			flags = BTRFS_BLOCK_GROUP_DATA;
+		else if (head->is_system)
+			flags = BTRFS_BLOCK_GROUP_SYSTEM;
+		else
+			flags = BTRFS_BLOCK_GROUP_METADATA;
+		space_info = __find_space_info(fs_info, flags);
+		ASSERT(space_info);
+		percpu_counter_add(&space_info->total_bytes_pinned,
 				   -head->num_bytes);
-		btrfs_put_block_group(cache);
 
 		if (head->is_data) {
 			spin_lock(&delayed_refs->lock);

  parent reply	other threads:[~2018-04-25 10:34 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-25 10:33 [PATCH 4.16 00/26] 4.16.5-stable review Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 01/26] cifs: smbd: Check for iov length on sending the last iov Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 02/26] cifs: do not allow creating sockets except with SMB1 posix exensions Greg Kroah-Hartman
2018-04-25 10:33 ` Greg Kroah-Hartman [this message]
2018-04-25 10:33 ` [PATCH 4.16 06/26] clocksource/imx-tpm: Correct -ETIME return condition check Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 07/26] posix-cpu-timers: Ensure set_process_cpu_timer is always evaluated Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 08/26] x86/tsc: Prevent 32bit truncation in calc_hpet_ref() Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 09/26] drm/vc4: Fix memory leak during BO teardown Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 10/26] drm/i915/gvt: throw error on unhandled vfio ioctls Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 11/26] drm/i915/gvt: Add drm_format_mod update Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 12/26] drm/i915/bios: filter out invalid DDC pins from VBT child devices Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 13/26] drm/i915/audio: Fix audio detection issue on GLK Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 14/26] drm/i915: Do no use kfree() to free a kmem_cache_alloc() return value Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 16/26] alarmtimer: Init nanosleep alarm timer on stack Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 17/26] mm,vmscan: Allow preallocating memory for register_shrinker() Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 18/26] netfilter: x_tables: cap allocations at 512 mbyte Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 19/26] netfilter: x_tables: add counters allocation wrapper Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 20/26] netfilter: compat: prepare xt_compat_init_offsets to return errors Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 21/26] netfilter: compat: reject huge allocation requests Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 22/26] netfilter: x_tables: limit allocation requests for blob rule heads Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 23/26] perf: Fix sample_max_stack maximum check Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 24/26] perf: Return proper values for user stack errors Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 25/26] RDMA/mlx5: Fix NULL dereference while accessing XRC_TGT QPs Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 26/26] Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" Greg Kroah-Hartman
2018-04-25 15:34 ` [PATCH 4.16 00/26] 4.16.5-stable review Guenter Roeck
2018-04-25 15:43   ` Greg Kroah-Hartman
2018-04-25 18:36 ` Shuah Khan
2018-04-26  5:59   ` Greg Kroah-Hartman
2018-04-25 21:42 ` Dan Rue
2018-04-26  6:59   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180425103315.020409043@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dsterba@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nborisov@suse.com \
    --cc=osandov@fb.com \
    --cc=osandov@osandov.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).