From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Omar Sandoval <osandov@osandov.com>,
Nikolay Borisov <nborisov@suse.com>,
Omar Sandoval <osandov@fb.com>, David Sterba <dsterba@suse.com>
Subject: [PATCH 4.16 04/26] btrfs: Fix race condition between delayed refs and blockgroup removal
Date: Wed, 25 Apr 2018 12:33:13 +0200 [thread overview]
Message-ID: <20180425103315.020409043@linuxfoundation.org> (raw)
In-Reply-To: <20180425103314.842517924@linuxfoundation.org>
4.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Borisov <nborisov@suse.com>
commit 5e388e95815408c27f3612190d089afc0774b870 upstream.
When the delayed refs for a head are all run, eventually
cleanup_ref_head is called which (in case of deletion) obtains a
reference for the relevant btrfs_space_info struct by querying the bg
for the range. This is problematic because when the last extent of a
bg is deleted a race window emerges between removal of that bg and the
subsequent invocation of cleanup_ref_head. This can result in cache being null
and either a null pointer dereference or assertion failure.
task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
FS: 00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
btrfs_run_delayed_refs+0x68/0x250 [btrfs]
btrfs_should_end_transaction+0x42/0x60 [btrfs]
btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
evict+0xc6/0x190
do_unlinkat+0x19c/0x300
do_syscall_64+0x74/0x140
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fbf589c57a7
To fix this, introduce a new flag "is_system" to head_ref structs,
which is populated at insertion time. This allows to decouple the
querying for the spaceinfo from querying the possibly deleted bg.
Fixes: d7eae3403f46 ("Btrfs: rework delayed ref total_bytes_pinned accounting")
CC: stable@vger.kernel.org # 4.14+
Suggested-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/btrfs/delayed-ref.c | 19 ++++++++++++++-----
fs/btrfs/delayed-ref.h | 1 +
fs/btrfs/extent-tree.c | 16 +++++++++++-----
3 files changed, 26 insertions(+), 10 deletions(-)
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -553,8 +553,10 @@ add_delayed_ref_head(struct btrfs_fs_inf
struct btrfs_delayed_ref_head *head_ref,
struct btrfs_qgroup_extent_record *qrecord,
u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
- int action, int is_data, int *qrecord_inserted_ret,
+ int action, int is_data, int is_system,
+ int *qrecord_inserted_ret,
int *old_ref_mod, int *new_ref_mod)
+
{
struct btrfs_delayed_ref_head *existing;
struct btrfs_delayed_ref_root *delayed_refs;
@@ -598,6 +600,7 @@ add_delayed_ref_head(struct btrfs_fs_inf
head_ref->ref_mod = count_mod;
head_ref->must_insert_reserved = must_insert_reserved;
head_ref->is_data = is_data;
+ head_ref->is_system = is_system;
head_ref->ref_tree = RB_ROOT;
INIT_LIST_HEAD(&head_ref->ref_add_list);
RB_CLEAR_NODE(&head_ref->href_node);
@@ -785,6 +788,7 @@ int btrfs_add_delayed_tree_ref(struct bt
struct btrfs_delayed_ref_root *delayed_refs;
struct btrfs_qgroup_extent_record *record = NULL;
int qrecord_inserted;
+ int is_system = (ref_root == BTRFS_CHUNK_TREE_OBJECTID);
BUG_ON(extent_op && extent_op->is_data);
ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS);
@@ -813,8 +817,8 @@ int btrfs_add_delayed_tree_ref(struct bt
*/
head_ref = add_delayed_ref_head(fs_info, trans, head_ref, record,
bytenr, num_bytes, 0, 0, action, 0,
- &qrecord_inserted, old_ref_mod,
- new_ref_mod);
+ is_system, &qrecord_inserted,
+ old_ref_mod, new_ref_mod);
add_delayed_tree_ref(fs_info, trans, head_ref, &ref->node, bytenr,
num_bytes, parent, ref_root, level, action);
@@ -881,7 +885,7 @@ int btrfs_add_delayed_data_ref(struct bt
*/
head_ref = add_delayed_ref_head(fs_info, trans, head_ref, record,
bytenr, num_bytes, ref_root, reserved,
- action, 1, &qrecord_inserted,
+ action, 1, 0, &qrecord_inserted,
old_ref_mod, new_ref_mod);
add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
@@ -911,9 +915,14 @@ int btrfs_add_delayed_extent_op(struct b
delayed_refs = &trans->transaction->delayed_refs;
spin_lock(&delayed_refs->lock);
+ /*
+ * extent_ops just modify the flags of an extent and they don't result
+ * in ref count changes, hence it's safe to pass false/0 for is_system
+ * argument
+ */
add_delayed_ref_head(fs_info, trans, head_ref, NULL, bytenr,
num_bytes, 0, 0, BTRFS_UPDATE_DELAYED_HEAD,
- extent_op->is_data, NULL, NULL, NULL);
+ extent_op->is_data, 0, NULL, NULL, NULL);
spin_unlock(&delayed_refs->lock);
return 0;
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -139,6 +139,7 @@ struct btrfs_delayed_ref_head {
*/
unsigned int must_insert_reserved:1;
unsigned int is_data:1;
+ unsigned int is_system:1;
unsigned int processing:1;
};
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2615,13 +2615,19 @@ static int cleanup_ref_head(struct btrfs
trace_run_delayed_ref_head(fs_info, head, 0);
if (head->total_ref_mod < 0) {
- struct btrfs_block_group_cache *cache;
+ struct btrfs_space_info *space_info;
+ u64 flags;
- cache = btrfs_lookup_block_group(fs_info, head->bytenr);
- ASSERT(cache);
- percpu_counter_add(&cache->space_info->total_bytes_pinned,
+ if (head->is_data)
+ flags = BTRFS_BLOCK_GROUP_DATA;
+ else if (head->is_system)
+ flags = BTRFS_BLOCK_GROUP_SYSTEM;
+ else
+ flags = BTRFS_BLOCK_GROUP_METADATA;
+ space_info = __find_space_info(fs_info, flags);
+ ASSERT(space_info);
+ percpu_counter_add(&space_info->total_bytes_pinned,
-head->num_bytes);
- btrfs_put_block_group(cache);
if (head->is_data) {
spin_lock(&delayed_refs->lock);
next prev parent reply other threads:[~2018-04-25 10:34 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-25 10:33 [PATCH 4.16 00/26] 4.16.5-stable review Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 01/26] cifs: smbd: Check for iov length on sending the last iov Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 02/26] cifs: do not allow creating sockets except with SMB1 posix exensions Greg Kroah-Hartman
2018-04-25 10:33 ` Greg Kroah-Hartman [this message]
2018-04-25 10:33 ` [PATCH 4.16 06/26] clocksource/imx-tpm: Correct -ETIME return condition check Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 07/26] posix-cpu-timers: Ensure set_process_cpu_timer is always evaluated Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 08/26] x86/tsc: Prevent 32bit truncation in calc_hpet_ref() Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 09/26] drm/vc4: Fix memory leak during BO teardown Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 10/26] drm/i915/gvt: throw error on unhandled vfio ioctls Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 11/26] drm/i915/gvt: Add drm_format_mod update Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 12/26] drm/i915/bios: filter out invalid DDC pins from VBT child devices Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 13/26] drm/i915/audio: Fix audio detection issue on GLK Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 14/26] drm/i915: Do no use kfree() to free a kmem_cache_alloc() return value Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 16/26] alarmtimer: Init nanosleep alarm timer on stack Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 17/26] mm,vmscan: Allow preallocating memory for register_shrinker() Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 18/26] netfilter: x_tables: cap allocations at 512 mbyte Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 19/26] netfilter: x_tables: add counters allocation wrapper Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 20/26] netfilter: compat: prepare xt_compat_init_offsets to return errors Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 21/26] netfilter: compat: reject huge allocation requests Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 22/26] netfilter: x_tables: limit allocation requests for blob rule heads Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 23/26] perf: Fix sample_max_stack maximum check Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 24/26] perf: Return proper values for user stack errors Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 25/26] RDMA/mlx5: Fix NULL dereference while accessing XRC_TGT QPs Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 26/26] Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" Greg Kroah-Hartman
2018-04-25 15:34 ` [PATCH 4.16 00/26] 4.16.5-stable review Guenter Roeck
2018-04-25 15:43 ` Greg Kroah-Hartman
2018-04-25 18:36 ` Shuah Khan
2018-04-26 5:59 ` Greg Kroah-Hartman
2018-04-25 21:42 ` Dan Rue
2018-04-26 6:59 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180425103315.020409043@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=dsterba@suse.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nborisov@suse.com \
--cc=osandov@fb.com \
--cc=osandov@osandov.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).