From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>
Subject: [PATCH 4.3 50/71] Btrfs: fix race leading to incorrect item deletion when dropping extents
Date: Sat, 12 Dec 2015 12:06:14 -0800
Message-ID: <20151212200539.224037187@linuxfoundation.org>
In-Reply-To: <20151212200536.761001328@linuxfoundation.org>
4.3-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana <fdmanana@suse.com>
commit aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c upstream.
While running a stress test I got the following warning triggered:
[191627.672810] ------------[ cut here ]------------
[191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
(...)
[191627.701485] Call Trace:
[191627.702037] [<ffffffff8145f077>] dump_stack+0x4f/0x7b
[191627.702992] [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
[191627.704091] [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
[191627.705380] [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
[191627.706637] [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
[191627.707789] [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
[191627.709155] [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
[191627.712444] [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
[191627.714162] [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
[191627.715887] [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
[191627.717287] [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
[191627.728865] [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
[191627.730045] [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
[191627.731256] [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
[191627.732661] [<ffffffff81061119>] process_one_work+0x24c/0x4ae
[191627.733822] [<ffffffff810615b0>] worker_thread+0x206/0x2c2
[191627.734857] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
[191627.736052] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
[191627.737349] [<ffffffff810669a6>] kthread+0xef/0xf7
[191627.738267] [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
[191627.739330] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
[191627.741976] [<ffffffff81465592>] ret_from_fork+0x42/0x70
[191627.743080] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
[191627.744206] ---[ end trace bbfddacb7aaada8d ]---
$ cat -n fs/btrfs/file.c
  691  int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
  (...)
  758                  btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
  759                  if (key.objectid > ino ||
  760                      key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
  761                          break;
  762
  763                  fi = btrfs_item_ptr(leaf, path->slots[0],
  764                                      struct btrfs_file_extent_item);
  765                  extent_type = btrfs_file_extent_type(leaf, fi);
  766
  767                  if (extent_type == BTRFS_FILE_EXTENT_REG ||
  768                      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
  (...)
  774                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
  (...)
  778                  } else {
  779                          WARN_ON(1);
  780                          extent_end = search_start;
  781                  }
  (...)
This happened because the item we were processing was not a file
extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even in this
case we cast the item to a struct btrfs_file_extent_item pointer and
then found a type field value that does not match any of the expected
values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
due to a tiny time window in which a race can occur, as exemplified below.
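To illustrate why the type field looks bogus, here is a minimal user-space sketch of what happens when the payload of an inode ref item is read as if it were a file extent item. It is not kernel code. The offset of the type byte (20) and the toy inode ref layout (8-byte index, 2-byte name length, then the name) are assumptions based on the btrfs on-disk format, and the helper names are hypothetical. The real on-disk fields are little-endian; the sketch ignores byte order because only the name bytes matter here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* File extent type values from the btrfs on-disk format. */
enum { FILE_EXTENT_INLINE = 0, FILE_EXTENT_REG = 1, FILE_EXTENT_PREALLOC = 2 };

/*
 * Assumed offset of the 'type' byte inside struct btrfs_file_extent_item:
 * generation (8) + ram_bytes (8) + compression (1) + encryption (1) +
 * other_encoding (2).
 */
#define EXTENT_TYPE_OFFSET 20

/* Return the byte that a file extent accessor would treat as the type. */
uint8_t type_seen_through_cast(const uint8_t *item_data)
{
	return item_data[EXTENT_TYPE_OFFSET];
}

/* True only for the three extent types the caller knows how to handle. */
int is_known_extent_type(uint8_t type)
{
	return type == FILE_EXTENT_INLINE || type == FILE_EXTENT_REG ||
	       type == FILE_EXTENT_PREALLOC;
}

/*
 * Build a toy inode ref payload: 8-byte index, 2-byte name length, then the
 * name bytes. The caller must provide a buffer of at least 10 + strlen(name)
 * bytes. Returns the number of bytes written.
 */
size_t build_inode_ref(uint8_t *buf, uint64_t index, const char *name)
{
	uint16_t name_len = (uint16_t)strlen(name);

	memcpy(buf, &index, sizeof(index));
	memcpy(buf + 8, &name_len, sizeof(name_len));
	memcpy(buf + 10, name, name_len);
	return 10 + (size_t)name_len;
}
```

With a name such as "some_file_name.txt", the byte at offset 20 is the eleventh character of the file name, so is_known_extent_type() returns 0 for it. This corresponds to falling into the final else branch at line 779.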
For example, consider the following scenario where we're using the
NO_HOLES feature and we have the following two neighbouring leaves:
               Leaf X (has N items)                    Leaf Y

[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
          slot N - 2         slot N - 1               slot 0

Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
than explicit because NO_HOLES is enabled). Now if our inode has an
ordered extent for the range [4K, 8K[ that is finishing, the following
can happen:
          CPU 1                                    CPU 2

  btrfs_finish_ordered_io()
    insert_reserved_file_extent()
      __btrfs_drop_extents()
         Searches for the key
          (257 EXTENT_DATA 4096) through
          btrfs_lookup_file_extent()

         Key not found and we get a path where
         path->nodes[0] == leaf X and
         path->slots[0] == N

         Because path->slots[0] is >=
         btrfs_header_nritems(leaf X), we call
         btrfs_next_leaf()

         btrfs_next_leaf() releases the path

                                                  inserts key
                                                  (257 INODE_REF 4096)
                                                  at the end of leaf X,
                                                  leaf X now has N + 1 keys,
                                                  and the new key is at
                                                  slot N

         btrfs_next_leaf() searches for
         key (257 INODE_REF 256), with
         path->keep_locks set to 1,
         because it was the last key it
         saw in leaf X

           finds it in leaf X again and
           notices it's no longer the last
           key of the leaf, so it returns 0
           with path->nodes[0] == leaf X and
           path->slots[0] == N (which is now
           < btrfs_header_nritems(leaf X)),
           pointing to the new key
           (257 INODE_REF 4096)

         __btrfs_drop_extents() casts the
         item at path->nodes[0], slot
         path->slots[0], to a struct
         btrfs_file_extent_item - it does
         not skip keys for the target
         inode with a type less than
         BTRFS_EXTENT_DATA_KEY
         (BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)

         sees a bogus value for the type
         field triggering the WARN_ON in
         the trace shown above, and sets
         extent_end = search_start (4096)

         does the if-then-else logic to
         fixup 0 length extent items created
         by a past bug from hole punching:

           if (extent_end == key.offset &&
               extent_end >= search_start)
               goto delete_extent_item;

         that evaluates to true and it ends
         up deleting the key pointed to by
         path->slots[0], (257 INODE_REF 4096),
         from leaf X

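The interleaving above can be replayed in a small single-threaded user-space model. This is only a sketch, not the kernel's btree code: the leaf is a plain sorted array, LEAF_CAPACITY and all function names are hypothetical, and the "race" is simulated by calling the insert between the two steps of CPU 1. The key type values (1, 12, 108) are the ones defined by the btrfs on-disk format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Key type values as defined by the btrfs on-disk format. */
enum { INODE_ITEM_KEY = 1, INODE_REF_KEY = 12, EXTENT_DATA_KEY = 108 };

#define LEAF_CAPACITY 8 /* toy capacity, not a btrfs constant */

struct key { uint64_t objectid; uint8_t type; uint64_t offset; };
struct leaf { struct key items[LEAF_CAPACITY]; int nritems; };

/* Keys sort by (objectid, type, offset), in that order. */
static int key_cmp(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Return the slot holding k, or the slot where k would be inserted. */
static int leaf_search(const struct leaf *l, const struct key *k)
{
	int slot = 0;

	while (slot < l->nritems && key_cmp(&l->items[slot], k) < 0)
		slot++;
	return slot;
}

/* Insert k at its sorted position, shifting later items up by one. */
static void leaf_insert(struct leaf *l, const struct key *k)
{
	int slot = leaf_search(l, k);

	assert(l->nritems < LEAF_CAPACITY);
	memmove(&l->items[slot + 1], &l->items[slot],
		(size_t)(l->nritems - slot) * sizeof(*k));
	l->items[slot] = *k;
	l->nritems++;
}

/* Pre-fix zero-length-extent check, entered with extent_end = search_start. */
static int old_code_would_delete(const struct key *k, uint64_t search_start)
{
	uint64_t extent_end = search_start;

	return extent_end == k->offset && extent_end >= search_start;
}

/* Replay the interleaving; returns 1 if the INODE_REF would be deleted. */
int simulate_race(void)
{
	struct leaf x = { .items = { { 257, INODE_ITEM_KEY, 0 },
				     { 257, INODE_REF_KEY, 256 } },
			  .nritems = 2 };
	struct key target = { 257, EXTENT_DATA_KEY, 4096 };
	struct key new_ref = { 257, INODE_REF_KEY, 4096 };
	int slot;

	/* CPU 1: key not found, slot == nritems, so the path is released. */
	slot = leaf_search(&x, &target);
	assert(slot == x.nritems);

	/* CPU 2: inserts the new INODE_REF at the end of leaf X. */
	leaf_insert(&x, &new_ref);

	/* CPU 1: the same slot is now valid but holds a non-extent key. */
	assert(slot < x.nritems && x.items[slot].type == INODE_REF_KEY);
	return old_code_would_delete(&x.items[slot], 4096);
}
```

In this model simulate_race() returns 1: the key offset (4096) equals search_start, so the old zero-length-extent check selects the inode ref for deletion.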
The same could happen, for example, for an xattr that ends up having a key
with an offset value that matches search_start (very unlikely but not
impossible).
So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
skipped, never cast to struct btrfs_file_extent_item and never deleted
by accident. Also protect against the unexpected case of getting a key
for a lower inode number by skipping that key and issuing a warning.
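The ordering of the new checks can be sketched as a standalone decision function. This is a simplified user-space model of the patched logic, not the kernel code itself: the enum action values and classify_key() are hypothetical names, and the warning issued for a lower inode number is omitted.

```c
#include <stdint.h>

/* Key type values as defined by the btrfs on-disk format. */
enum { INODE_REF_KEY = 12, XATTR_ITEM_KEY = 24, EXTENT_DATA_KEY = 108 };

struct key { uint64_t objectid; uint8_t type; uint64_t offset; };

enum action {
	ACTION_BREAK,   /* past the range of interest, stop iterating */
	ACTION_SKIP,    /* not a file extent item, move to the next slot */
	ACTION_PROCESS  /* a file extent item inside the range */
};

/* Mirror the order of the checks in the patched loop. */
enum action classify_key(const struct key *k, uint64_t ino, uint64_t end)
{
	if (k->objectid > ino)
		return ACTION_BREAK;
	/* The kernel also warns once when objectid < ino. */
	if (k->objectid < ino || k->type < EXTENT_DATA_KEY)
		return ACTION_SKIP;
	if (k->type > EXTENT_DATA_KEY || k->offset >= end)
		return ACTION_BREAK;
	return ACTION_PROCESS;
}
```

With ino = 257 and end = 8192, the key (257 INODE_REF 4096) from the scenario is classified as ACTION_SKIP, so it is never interpreted as a file extent item and never reaches the deletion path.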
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/btrfs/file.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -756,8 +756,16 @@ next_slot:
 		}
 
 		btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
-		if (key.objectid > ino ||
-		    key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
+
+		if (key.objectid > ino)
+			break;
+		if (WARN_ON_ONCE(key.objectid < ino) ||
+		    key.type < BTRFS_EXTENT_DATA_KEY) {
+			ASSERT(del_nr == 0);
+			path->slots[0]++;
+			goto next_slot;
+		}
+		if (key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
 			break;
 
 		fi = btrfs_item_ptr(leaf, path->slots[0],
@@ -776,8 +784,8 @@ next_slot:
 				btrfs_file_extent_inline_len(leaf,
 							path->slots[0], fi);
 		} else {
-			WARN_ON(1);
-			extent_end = search_start;
+			/* can't happen */
+			BUG();
 		}
 
 		/*