From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Josef Bacik <josef@toxicpanda.com>,
Filipe Manana <fdmanana@suse.com>,
David Sterba <dsterba@suse.com>
Subject: [PATCH 5.2 26/85] Btrfs: fix assertion failure during fsync and use of stale transaction
Date: Wed, 18 Sep 2019 08:18:44 +0200 [thread overview]
Message-ID: <20190918061234.989088316@linuxfoundation.org> (raw)
In-Reply-To: <20190918061234.107708857@linuxfoundation.org>
From: Filipe Manana <fdmanana@suse.com>
commit 410f954cb1d1c79ae485dd83a175f21954fd87cd upstream.
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.
That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.
In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:
1) evict_refill_and_join() will often change the transaction handle's
block reserve (->block_rsv) and set its ->bytes_reserved field to a
value greater than 0. If evict_refill_and_join() never commits the
transaction, the eviction handler ends up decreasing the reference
count (->use_count) of the transaction handle through the call to
btrfs_end_transaction(), and after that point we have a transaction
handle with a NULL ->block_rsv (which is the value prior to the
transaction join from evict_refill_and_join()) and a ->bytes_reserved
value greater than 0. If after the eviction/iput completes the inode
logging path hits an error or it decides that it must fallback to a
transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
non-zero value from btrfs_log_dentry_safe(), and because of that
non-zero value it tries to commit the transaction using a handle with
a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
the transaction commit hit an assertion failure at
btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
the ->block_rsv is NULL. The produced stack trace for that is like the
following:
[192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
[192922.917553] ------------[ cut here ]------------
[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
[192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
(...)
[192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
[192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
[192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
[192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
[192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
[192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
[192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
[192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
[192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[192922.925105] Call Trace:
[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
[192922.926731] do_fsync+0x38/0x60
[192922.927138] __x64_sys_fdatasync+0x13/0x20
[192922.927543] do_syscall_64+0x60/0x1c0
[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[192922.934077] ---[ end trace f00808b12068168f ]---
2) If evict_refill_and_join() decides to commit the transaction, it will
be able to do it, since the nested transaction join only increments the
transaction handle's ->use_count reference counter and it does not
prevent the transaction from getting committed. This means that after
eviction completes, the fsync logging path will be using a transaction
handle that refers to an already committed transaction. What happens
when using such a stale transaction can be unpredictable, we are at
least having a use-after-free on the transaction handle itself, since
the transaction commit will call kmem_cache_free() against the handle
regardless of its ->use_count value, or we can end up silently losing
all the updates to the log tree after that iput() in the logging path,
or using a transaction handle that in the meanwhile was allocated to
another task for a new transaction, etc, pretty much unpredictable
what can happen.
In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).
The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/btrfs/tree-log.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4985,7 +4985,7 @@ static int log_conflicting_inodes(struct
BTRFS_I(inode),
LOG_OTHER_INODE_ALL,
0, LLONG_MAX, ctx);
- iput(inode);
+ btrfs_add_delayed_iput(inode);
}
}
continue;
@@ -5000,7 +5000,7 @@ static int log_conflicting_inodes(struct
ret = btrfs_log_inode(trans, root, BTRFS_I(inode),
LOG_OTHER_INODE, 0, LLONG_MAX, ctx);
if (ret) {
- iput(inode);
+ btrfs_add_delayed_iput(inode);
continue;
}
@@ -5009,7 +5009,7 @@ static int log_conflicting_inodes(struct
key.offset = 0;
ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
if (ret < 0) {
- iput(inode);
+ btrfs_add_delayed_iput(inode);
continue;
}
@@ -5056,7 +5056,7 @@ static int log_conflicting_inodes(struct
}
path->slots[0]++;
}
- iput(inode);
+ btrfs_add_delayed_iput(inode);
}
return ret;
@@ -5689,7 +5689,7 @@ process_leaf:
}
if (btrfs_inode_in_log(BTRFS_I(di_inode), trans->transid)) {
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
break;
}
@@ -5701,7 +5701,7 @@ process_leaf:
if (!ret &&
btrfs_must_commit_transaction(trans, BTRFS_I(di_inode)))
ret = 1;
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
if (ret)
goto next_dir_inode;
if (ctx->log_new_dentries) {
@@ -5848,7 +5848,7 @@ static int btrfs_log_all_parents(struct
if (!ret && ctx && ctx->log_new_dentries)
ret = log_new_dir_dentries(trans, root,
BTRFS_I(dir_inode), ctx);
- iput(dir_inode);
+ btrfs_add_delayed_iput(dir_inode);
if (ret)
goto out;
}
@@ -5891,7 +5891,7 @@ static int log_new_ancestors(struct btrf
ret = btrfs_log_inode(trans, root, BTRFS_I(inode),
LOG_INODE_EXISTS,
0, LLONG_MAX, ctx);
- iput(inode);
+ btrfs_add_delayed_iput(inode);
if (ret)
return ret;
next prev parent reply other threads:[~2019-09-18 6:25 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-18 6:18 [PATCH 5.2 00/85] 5.2.16-stable review Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 01/85] bridge/mdb: remove wrong use of NLM_F_MULTI Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 02/85] cdc_ether: fix rndis support for Mediatek based smartphones Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 03/85] ipv6: Fix the link time qualifier of ping_v6_proc_exit_net() Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 04/85] isdn/capi: check message length in capi_write() Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 05/85] ixgbe: Fix secpath usage for IPsec TX offload Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 06/85] ixgbevf: Fix secpath usage for IPsec Tx offload Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 07/85] net: Fix null de-reference of device refcount Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 08/85] net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 09/85] net: phylink: Fix flow control resolution Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 10/85] net: sched: fix reordering issues Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 11/85] sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 12/85] sctp: Fix the link time qualifier of sctp_ctrlsock_exit() Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 13/85] sctp: use transport pf_retrans in sctp_do_8_2_transport_strike Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 14/85] tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 15/85] tipc: add NULL pointer check before calling kfree_rcu Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 16/85] tun: fix use-after-free when register netdev failed Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 17/85] net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others) Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 18/85] ipv6: addrconf_f6i_alloc - fix non-null pointer check to !IS_ERR() Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 19/85] net: fixed_phy: Add forward declaration for struct gpio_desc; Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 20/85] sctp: fix the missing put_user when dumping transport thresholds Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 21/85] net: sock_map, fix missing ulp check in sock hash case Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 22/85] gpiolib: acpi: Add gpiolib_acpi_run_edge_events_on_boot option and blacklist Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 23/85] gpio: mockup: add missing single_release() Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 24/85] gpio: fix line flag validation in linehandle_create Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 25/85] gpio: fix line flag validation in lineevent_create Greg Kroah-Hartman
2019-09-18 6:18 ` Greg Kroah-Hartman [this message]
2019-09-18 6:18 ` [PATCH 5.2 27/85] cgroup: freezer: fix frozen state inheritance Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 28/85] Revert "mmc: bcm2835: Terminate timeout work synchronously" Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 29/85] Revert "mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller" Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 30/85] mmc: tmio: Fixup runtime PM management during probe Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 31/85] mmc: tmio: Fixup runtime PM management during remove Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 32/85] drm/lima: fix lima_gem_wait() return value Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 33/85] drm/i915: Limit MST to <= 8bpc once again Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 34/85] drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+ Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 35/85] ipc: fix semtimedop for generic 32-bit architectures Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 36/85] ipc: fix sparc64 ipc() wrapper Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 37/85] ixgbe: fix double clean of Tx descriptors with xdp Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 38/85] ixgbe: Prevent u8 wrapping of ITR value to something less than 10us Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 39/85] Revert "rt2800: enable TX_PIN_CFG_LNA_PE_ bits per band" Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 40/85] mt76: mt76x0e: disable 5GHz band for MT7630E Greg Kroah-Hartman
2019-09-18 6:18 ` [PATCH 5.2 41/85] genirq: Prevent NULL pointer dereference in resend_irqs() Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 42/85] regulator: twl: voltage lists for vdd1/2 on twl4030 Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 43/85] KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset() Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 44/85] KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 45/85] KVM: x86: work around leak of uninitialized stack contents Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 46/85] KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 47/85] KVM: nVMX: handle page fault in vmread Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 48/85] x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 49/85] powerpc: Add barrier_nospec to raw_copy_in_user() Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 50/85] kernel/module: Fix mem leak in module_add_modinfo_attrs Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 51/85] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 52/85] x86/ima: check EFI SetupMode too Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 53/85] drm/meson: Add support for XBGR8888 & ABGR8888 formats Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 54/85] clk: Fix debugfs clk_possible_parents for clks without parent string names Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 55/85] clk: Simplify debugfs printing and add a newline Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 56/85] mt76: Fix a signedness bug in mt7615_add_interface() Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 57/85] mt76: mt7615: Use after free in mt7615_mcu_set_bcn() Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 58/85] clk: rockchip: Dont yell about bad mmc phases when getting Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 59/85] mtd: rawnand: mtk: Fix wrongly assigned OOB buffer pointer issue Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 60/85] PCI: Always allow probing with driver_override Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 61/85] ubifs: Correctly use tnc_next() in search_dh_cookie() Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 62/85] driver core: Fix use-after-free and double free on glue directory Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 63/85] crypto: talitos - check AES key size Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 64/85] crypto: talitos - fix CTR alg blocksize Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 65/85] crypto: talitos - check data blocksize in ablkcipher Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 66/85] crypto: talitos - fix ECB algs ivsize Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 67/85] crypto: talitos - Do not modify req->cryptlen on decryption Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 68/85] crypto: talitos - HMAC SNOOP NO AFEU mode requires SW icv checking Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 69/85] firmware: ti_sci: Always request response from firmware Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 70/85] drm: panel-orientation-quirks: Add extra quirk table entry for GPD MicroPC Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 71/85] drm/mediatek: mtk_drm_drv.c: Add of_node_put() before goto Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 72/85] mm/z3fold.c: remove z3fold_migration trylock Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 73/85] mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 74/85] Revert "Bluetooth: btusb: driver to enable the usb-wakeup feature" Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 75/85] iio: adc: stm32-dfsdm: fix output resolution Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 76/85] iio: adc: stm32-dfsdm: fix data type Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 77/85] modules: fix BUG when load module with rodata=n Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 78/85] modules: fix compile error if dont have strict module rwx Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 79/85] modules: always page-align module section allocations Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 80/85] kvm: nVMX: Remove unnecessary sync_roots from handle_invept Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 81/85] KVM: SVM: Fix detection of AMD Errata 1096 Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 82/85] platform/x86: pmc_atom: Add CB4063 Beckhoff Automation board to critclk_systems DMI table Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 83/85] platform/x86: pcengines-apuv2: use KEY_RESTART for front button Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 84/85] rsi: fix a double free bug in rsi_91x_deinit() Greg Kroah-Hartman
2019-09-18 6:19 ` [PATCH 5.2 85/85] x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning Greg Kroah-Hartman
2019-09-18 11:59 ` [PATCH 5.2 00/85] 5.2.16-stable review kernelci.org bot
2019-09-18 15:17 ` Naresh Kamboju
2019-09-19 6:37 ` Greg Kroah-Hartman
2019-09-18 16:28 ` Jon Hunter
2019-09-19 6:37 ` Greg Kroah-Hartman
2019-09-18 19:38 ` Guenter Roeck
2019-09-19 1:22 ` shuah
2019-09-19 6:36 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190918061234.989088316@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=dsterba@suse.com \
--cc=fdmanana@suse.com \
--cc=josef@toxicpanda.com \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).