From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>
Subject: [PATCH 4.3 50/71] Btrfs: fix race leading to incorrect item deletion when dropping extents
Date: Sat, 12 Dec 2015 12:06:14 -0800 [thread overview]
Message-ID: <20151212200539.224037187@linuxfoundation.org> (raw)
In-Reply-To: <20151212200536.761001328@linuxfoundation.org>
4.3-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana <fdmanana@suse.com>
commit aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c upstream.
While running a stress test I got the following warning triggered:
[191627.672810] ------------[ cut here ]------------
[191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
(...)
[191627.701485] Call Trace:
[191627.702037] [<ffffffff8145f077>] dump_stack+0x4f/0x7b
[191627.702992] [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
[191627.704091] [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
[191627.705380] [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
[191627.706637] [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
[191627.707789] [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
[191627.709155] [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
[191627.712444] [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
[191627.714162] [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
[191627.715887] [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
[191627.717287] [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
[191627.728865] [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
[191627.730045] [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
[191627.731256] [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
[191627.732661] [<ffffffff81061119>] process_one_work+0x24c/0x4ae
[191627.733822] [<ffffffff810615b0>] worker_thread+0x206/0x2c2
[191627.734857] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
[191627.736052] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
[191627.737349] [<ffffffff810669a6>] kthread+0xef/0xf7
[191627.738267] [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
[191627.739330] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
[191627.741976] [<ffffffff81465592>] ret_from_fork+0x42/0x70
[191627.743080] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
[191627.744206] ---[ end trace bbfddacb7aaada8d ]---
$ cat -n fs/btrfs/file.c
691 int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
(...)
758 btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
759 if (key.objectid > ino ||
760 key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
761 break;
762
763 fi = btrfs_item_ptr(leaf, path->slots[0],
764 struct btrfs_file_extent_item);
765 extent_type = btrfs_file_extent_type(leaf, fi);
766
767 if (extent_type == BTRFS_FILE_EXTENT_REG ||
768 extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
(...)
774 } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
(...)
778 } else {
779 WARN_ON(1);
780 extent_end = search_start;
781 }
(...)
This happened because the item we were processing did not match a file
extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even on this
case we cast the item to a struct btrfs_file_extent_item pointer and
then find a type field value that does not match any of the expected
values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
due to a tiny time window where a race can happen as exemplified below.
For example, consider the following scenario where we're using the
NO_HOLES feature and we have the following two neighbour leafs:
Leaf X (has N items) Leaf Y
[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ] [ (257 EXTENT_DATA 8192), ... ]
slot N - 2 slot N - 1 slot 0
Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
than explicit because NO_HOLES is enabled). Now if our inode has an
ordered extent for the range [4K, 8K[ that is finishing, the following
can happen:
CPU 1 CPU 2
btrfs_finish_ordered_io()
insert_reserved_file_extent()
__btrfs_drop_extents()
Searches for the key
(257 EXTENT_DATA 4096) through
btrfs_lookup_file_extent()
Key not found and we get a path where
path->nodes[0] == leaf X and
path->slots[0] == N
Because path->slots[0] is >=
btrfs_header_nritems(leaf X), we call
btrfs_next_leaf()
btrfs_next_leaf() releases the path
inserts key
(257 INODE_REF 4096)
at the end of leaf X,
leaf X now has N + 1 keys,
and the new key is at
slot N
btrfs_next_leaf() searches for
key (257 INODE_REF 256), with
path->keep_locks set to 1,
because it was the last key it
saw in leaf X
finds it in leaf X again and
notices it's no longer the last
key of the leaf, so it returns 0
with path->nodes[0] == leaf X and
path->slots[0] == N (which is now
< btrfs_header_nritems(leaf X)),
pointing to the new key
(257 INODE_REF 4096)
__btrfs_drop_extents() casts the
item at path->nodes[0], slot
path->slots[0], to a struct
btrfs_file_extent_item - it does
not skip keys for the target
inode with a type less than
BTRFS_EXTENT_DATA_KEY
(BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)
sees a bogus value for the type
field triggering the WARN_ON in
the trace shown above, and sets
extent_end = search_start (4096)
does the if-then-else logic to
fixup 0 length extent items created
by a past bug from hole punching:
if (extent_end == key.offset &&
extent_end >= search_start)
goto delete_extent_item;
that evaluates to true and it ends
up deleting the key pointed to by
path->slots[0], (257 INODE_REF 4096),
from leaf X
The same could happen for example for a xattr that ends up having a key
with an offset value that matches search_start (very unlikely but not
impossible).
So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
skipped, never casted to struct btrfs_file_extent_item and never deleted
by accident. Also protect against the unexpected case of getting a key
for a lower inode number by skipping that key and issuing a warning.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/btrfs/file.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -756,8 +756,16 @@ next_slot:
}
btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
- if (key.objectid > ino ||
- key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
+
+ if (key.objectid > ino)
+ break;
+ if (WARN_ON_ONCE(key.objectid < ino) ||
+ key.type < BTRFS_EXTENT_DATA_KEY) {
+ ASSERT(del_nr == 0);
+ path->slots[0]++;
+ goto next_slot;
+ }
+ if (key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
break;
fi = btrfs_item_ptr(leaf, path->slots[0],
@@ -776,8 +784,8 @@ next_slot:
btrfs_file_extent_inline_len(leaf,
path->slots[0], fi);
} else {
- WARN_ON(1);
- extent_end = search_start;
+ /* can't happen */
+ BUG();
}
/*
next prev parent reply other threads:[~2015-12-12 20:06 UTC|newest]
Thread overview: 93+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-12 20:05 [PATCH 4.3 00/71] 4.3.3-stable review Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 01/71] certs: add .gitignore to stop git nagging about x509_certificate_list Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 02/71] r8169: fix kasan reported skb use-after-free Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 03/71] af-unix: fix use-after-free with concurrent readers while splicing Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 04/71] af_unix: dont append consumed skbs to sk_receive_queue Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 05/71] af_unix: take receive queue lock while appending new skb Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 06/71] unix: avoid use-after-free in ep_remove_wait_queue Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 07/71] af-unix: passcred support for sendpage Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 08/71] ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 09/71] ipv6: Check expire on DST_NOCACHE route Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 10/71] ipv6: Check rt->dst.from for the " Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 11/71] Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 12/71] tools/net: Use include/uapi with __EXPORTED_HEADERS__ Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 13/71] packet: do skb_probe_transport_header when we actually have data Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 14/71] packet: always probe for transport header Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 15/71] packet: only allow extra vlan len on ethernet devices Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 16/71] packet: infer protocol from ethernet header if unset Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 17/71] packet: fix tpacket_snd max frame len Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 18/71] sctp: translate host order to network order when setting a hmacid Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 19/71] net/mlx5e: Added self loopback prevention Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 20/71] net/mlx4_core: Fix sleeping while holding spinlock at rem_slave_counters Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 21/71] ip_tunnel: disable preemption when updating per-cpu tstats Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 22/71] net: switchdev: fix return code of fdb_dump stub Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 23/71] net: thunder: Check for driver data in nicvf_remove() Greg Kroah-Hartman
2015-12-14 7:17 ` Pavel Fedin
2015-12-14 14:16 ` 'Greg Kroah-Hartman'
2015-12-14 14:51 ` Pavel Fedin
2015-12-12 20:05 ` [PATCH 4.3 24/71] snmp: Remove duplicate OUTMCAST stat increment Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 25/71] net/ip6_tunnel: fix dst leak Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 27/71] tcp: md5: fix lockdep annotation Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 28/71] tcp: disable Fast Open on timeouts after handshake Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 29/71] tcp: fix potential huge kmalloc() calls in TCP_REPAIR Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 30/71] tcp: initialize tp->copied_seq in case of cross SYN connection Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 31/71] net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 32/71] net: ipmr: fix static mfc/dev leaks on table destruction Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 33/71] net: ip6mr: " Greg Kroah-Hartman
2015-12-12 20:05 ` [PATCH 4.3 34/71] vrf: fix double free and memory corruption on register_netdevice failure Greg Kroah-Hartman
2015-12-14 17:45 ` Ben Hutchings
2015-12-14 18:59 ` David Ahern
2015-12-15 5:40 ` Greg Kroah-Hartman
2015-12-15 15:12 ` [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink() Ben Hutchings
2015-12-15 15:15 ` David Ahern
2015-12-15 15:26 ` Ben Hutchings
2015-12-15 15:31 ` [PATCH 4.3 1/2] Revert "vrf: fix double free and memory corruption on register_netdevice failure" Ben Hutchings
2015-12-15 15:49 ` David Ahern
2015-12-17 22:43 ` Patch "Revert "vrf: fix double free and memory corruption on register_netdevice failure"" has been added to the 4.3-stable tree gregkh
2015-12-15 15:32 ` [PATCH 4.3 2/2] vrf: fix double free and memory corruption on register_netdevice failure Nikolay Aleksandrov
2015-12-15 15:50 ` David Ahern
2015-12-15 17:02 ` Ben Hutchings
2015-12-17 22:43 ` Patch "vrf: fix double free and memory corruption on register_netdevice failure" has been added to the 4.3-stable tree gregkh
2015-12-15 17:48 ` [PATCH 4.3] vrf: Fix memory leak on registration failure in vrf_newlink() David Miller
2015-12-12 20:05 ` [PATCH 4.3 35/71] broadcom: fix PHY_ID_BCM5481 entry in the id table Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 36/71] tipc: fix error handling of expanding buffer headroom Greg Kroah-Hartman
2015-12-14 17:46 ` Ben Hutchings
2015-12-14 23:52 ` Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 37/71] ipv6: distinguish frag queues by device for multicast and link-local packets Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 38/71] RDS: fix race condition when sending a message on unbound socket Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 39/71] bpf, array: fix heap out-of-bounds access when updating elements Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 40/71] ipv6: add complete rcu protection around np->opt Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 41/71] net/neighbour: fix crash at dumping device-agnostic proxy entries Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 42/71] ipv6: sctp: implement sctp_v6_destroy_sock() Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 43/71] openvswitch: fix hangup on vxlan/gre/geneve device deletion Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 44/71] net_sched: fix qdisc_tree_decrease_qlen() races Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 45/71] btrfs: fix resending received snapshot with parent Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 46/71] btrfs: check unsupported filters in balance arguments Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 47/71] Btrfs: fix file corruption and data loss after cloning inline extents Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 48/71] Btrfs: fix truncation of compressed and inlined extents Greg Kroah-Hartman
2015-12-12 20:06 ` Greg Kroah-Hartman [this message]
2015-12-12 20:06 ` [PATCH 4.3 51/71] Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 52/71] Btrfs: fix race when listing an inodes xattrs Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 53/71] btrfs: fix signed overflows in btrfs_sync_file Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 54/71] rbd: dont put snap_context twice in rbd_queue_workfn() Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 55/71] ext4 crypto: fix memory leak in ext4_bio_write_page() Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 56/71] ext4 crypto: fix bugs in ext4_encrypted_zeroout() Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 57/71] ext4: fix potential use after free in __ext4_journal_stop Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 58/71] ext4, jbd2: ensure entering into panic after recording an error in superblock Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 59/71] firewire: ohci: fix JMicron JMB38x IT context discovery Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 60/71] nfsd: serialize state seqid morphing operations Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 61/71] nfsd: eliminate sending duplicate and repeated delegations Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 62/71] debugfs: fix refcount imbalance in start_creating Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 63/71] nfs4: start callback_ident at idr 1 Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 64/71] nfs4: resend LAYOUTGET when there is a race that changes the seqid Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 65/71] nfs: if we have no valid attrs, then dont declare the attribute cache valid Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 66/71] ocfs2: fix umask ignored issue Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 67/71] block: fix segment split Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 68/71] ceph: fix message length computation Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 69/71] ALSA: pci: depend on ZONE_DMA Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 70/71] ALSA: hda/hdmi - apply Skylake fix-ups to Broxton display codec Greg Kroah-Hartman
2015-12-12 20:06 ` [PATCH 4.3 71/71] [media] cobalt: fix Kconfig dependency Greg Kroah-Hartman
2015-12-13 3:05 ` [PATCH 4.3 00/71] 4.3.3-stable review Shuah Khan
2015-12-13 3:46 ` Greg Kroah-Hartman
2015-12-13 16:01 ` Guenter Roeck
2015-12-14 3:28 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151212200539.224037187@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=fdmanana@suse.com \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).