From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Omar Sandoval <osandov@osandov.com>,
Nikolay Borisov <nborisov@suse.com>,
Omar Sandoval <osandov@fb.com>, David Sterba <dsterba@suse.com>
Subject: [PATCH 4.16 04/26] btrfs: Fix race condition between delayed refs and blockgroup removal
Date: Wed, 25 Apr 2018 12:33:13 +0200 [thread overview]
Message-ID: <20180425103315.020409043@linuxfoundation.org> (raw)
In-Reply-To: <20180425103314.842517924@linuxfoundation.org>
4.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Borisov <nborisov@suse.com>
commit 5e388e95815408c27f3612190d089afc0774b870 upstream.
When the delayed refs for a head are all run, eventually
cleanup_ref_head is called which (in case of deletion) obtains a
reference for the relevant btrfs_space_info struct by querying the bg
for the range. This is problematic because when the last extent of a
bg is deleted a race window emerges between removal of that bg and the
subsequent invocation of cleanup_ref_head. This can result in cache being null
and either a null pointer dereference or assertion failure.
task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
FS: 00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
btrfs_run_delayed_refs+0x68/0x250 [btrfs]
btrfs_should_end_transaction+0x42/0x60 [btrfs]
btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
evict+0xc6/0x190
do_unlinkat+0x19c/0x300
do_syscall_64+0x74/0x140
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fbf589c57a7
To fix this, introduce a new flag "is_system" to head_ref structs,
which is populated at insertion time. This allows to decouple the
querying for the spaceinfo from querying the possibly deleted bg.
Fixes: d7eae3403f46 ("Btrfs: rework delayed ref total_bytes_pinned accounting")
CC: stable@vger.kernel.org # 4.14+
Suggested-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/btrfs/delayed-ref.c | 19 ++++++++++++++-----
fs/btrfs/delayed-ref.h | 1 +
fs/btrfs/extent-tree.c | 16 +++++++++++-----
3 files changed, 26 insertions(+), 10 deletions(-)
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -553,8 +553,10 @@ add_delayed_ref_head(struct btrfs_fs_inf
struct btrfs_delayed_ref_head *head_ref,
struct btrfs_qgroup_extent_record *qrecord,
u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
- int action, int is_data, int *qrecord_inserted_ret,
+ int action, int is_data, int is_system,
+ int *qrecord_inserted_ret,
int *old_ref_mod, int *new_ref_mod)
+
{
struct btrfs_delayed_ref_head *existing;
struct btrfs_delayed_ref_root *delayed_refs;
@@ -598,6 +600,7 @@ add_delayed_ref_head(struct btrfs_fs_inf
head_ref->ref_mod = count_mod;
head_ref->must_insert_reserved = must_insert_reserved;
head_ref->is_data = is_data;
+ head_ref->is_system = is_system;
head_ref->ref_tree = RB_ROOT;
INIT_LIST_HEAD(&head_ref->ref_add_list);
RB_CLEAR_NODE(&head_ref->href_node);
@@ -785,6 +788,7 @@ int btrfs_add_delayed_tree_ref(struct bt
struct btrfs_delayed_ref_root *delayed_refs;
struct btrfs_qgroup_extent_record *record = NULL;
int qrecord_inserted;
+ int is_system = (ref_root == BTRFS_CHUNK_TREE_OBJECTID);
BUG_ON(extent_op && extent_op->is_data);
ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS);
@@ -813,8 +817,8 @@ int btrfs_add_delayed_tree_ref(struct bt
*/
head_ref = add_delayed_ref_head(fs_info, trans, head_ref, record,
bytenr, num_bytes, 0, 0, action, 0,
- &qrecord_inserted, old_ref_mod,
- new_ref_mod);
+ is_system, &qrecord_inserted,
+ old_ref_mod, new_ref_mod);
add_delayed_tree_ref(fs_info, trans, head_ref, &ref->node, bytenr,
num_bytes, parent, ref_root, level, action);
@@ -881,7 +885,7 @@ int btrfs_add_delayed_data_ref(struct bt
*/
head_ref = add_delayed_ref_head(fs_info, trans, head_ref, record,
bytenr, num_bytes, ref_root, reserved,
- action, 1, &qrecord_inserted,
+ action, 1, 0, &qrecord_inserted,
old_ref_mod, new_ref_mod);
add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
@@ -911,9 +915,14 @@ int btrfs_add_delayed_extent_op(struct b
delayed_refs = &trans->transaction->delayed_refs;
spin_lock(&delayed_refs->lock);
+ /*
+ * extent_ops just modify the flags of an extent and they don't result
+ * in ref count changes, hence it's safe to pass false/0 for is_system
+ * argument
+ */
add_delayed_ref_head(fs_info, trans, head_ref, NULL, bytenr,
num_bytes, 0, 0, BTRFS_UPDATE_DELAYED_HEAD,
- extent_op->is_data, NULL, NULL, NULL);
+ extent_op->is_data, 0, NULL, NULL, NULL);
spin_unlock(&delayed_refs->lock);
return 0;
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -139,6 +139,7 @@ struct btrfs_delayed_ref_head {
*/
unsigned int must_insert_reserved:1;
unsigned int is_data:1;
+ unsigned int is_system:1;
unsigned int processing:1;
};
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2615,13 +2615,19 @@ static int cleanup_ref_head(struct btrfs
trace_run_delayed_ref_head(fs_info, head, 0);
if (head->total_ref_mod < 0) {
- struct btrfs_block_group_cache *cache;
+ struct btrfs_space_info *space_info;
+ u64 flags;
- cache = btrfs_lookup_block_group(fs_info, head->bytenr);
- ASSERT(cache);
- percpu_counter_add(&cache->space_info->total_bytes_pinned,
+ if (head->is_data)
+ flags = BTRFS_BLOCK_GROUP_DATA;
+ else if (head->is_system)
+ flags = BTRFS_BLOCK_GROUP_SYSTEM;
+ else
+ flags = BTRFS_BLOCK_GROUP_METADATA;
+ space_info = __find_space_info(fs_info, flags);
+ ASSERT(space_info);
+ percpu_counter_add(&space_info->total_bytes_pinned,
-head->num_bytes);
- btrfs_put_block_group(cache);
if (head->is_data) {
spin_lock(&delayed_refs->lock);
next prev parent reply other threads:[~2018-04-25 10:35 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-25 10:33 [PATCH 4.16 00/26] 4.16.5-stable review Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 01/26] cifs: smbd: Check for iov length on sending the last iov Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 02/26] cifs: do not allow creating sockets except with SMB1 posix exensions Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 03/26] btrfs: fix unaligned access in readdir Greg Kroah-Hartman
2018-04-25 10:33 ` Greg Kroah-Hartman [this message]
2018-04-25 10:33 ` [PATCH 4.16 05/26] x86/acpi: Prevent X2APIC id 0xffffffff from being accounted Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 06/26] clocksource/imx-tpm: Correct -ETIME return condition check Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 07/26] posix-cpu-timers: Ensure set_process_cpu_timer is always evaluated Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 08/26] x86/tsc: Prevent 32bit truncation in calc_hpet_ref() Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 09/26] drm/vc4: Fix memory leak during BO teardown Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 10/26] drm/i915/gvt: throw error on unhandled vfio ioctls Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 11/26] drm/i915/gvt: Add drm_format_mod update Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 12/26] drm/i915/bios: filter out invalid DDC pins from VBT child devices Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 13/26] drm/i915/audio: Fix audio detection issue on GLK Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 14/26] drm/i915: Do no use kfree() to free a kmem_cache_alloc() return value Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 15/26] drm/i915: Fix LSPCON TMDS output buffer enabling from low-power state Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 16/26] alarmtimer: Init nanosleep alarm timer on stack Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 17/26] mm,vmscan: Allow preallocating memory for register_shrinker() Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 18/26] netfilter: x_tables: cap allocations at 512 mbyte Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 19/26] netfilter: x_tables: add counters allocation wrapper Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 20/26] netfilter: compat: prepare xt_compat_init_offsets to return errors Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 21/26] netfilter: compat: reject huge allocation requests Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 22/26] netfilter: x_tables: limit allocation requests for blob rule heads Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 23/26] perf: Fix sample_max_stack maximum check Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 24/26] perf: Return proper values for user stack errors Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 25/26] RDMA/mlx5: Fix NULL dereference while accessing XRC_TGT QPs Greg Kroah-Hartman
2018-04-25 10:33 ` [PATCH 4.16 26/26] Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" Greg Kroah-Hartman
2018-04-25 15:34 ` [PATCH 4.16 00/26] 4.16.5-stable review Guenter Roeck
2018-04-25 15:43 ` Greg Kroah-Hartman
2018-04-25 15:44 ` kernelci.org bot
2018-04-25 18:36 ` Shuah Khan
2018-04-26 5:59 ` Greg Kroah-Hartman
2018-04-25 21:42 ` Dan Rue
2018-04-26 6:59 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180425103315.020409043@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=dsterba@suse.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nborisov@suse.com \
--cc=osandov@fb.com \
--cc=osandov@osandov.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.