linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>
Subject: [PATCH 3.14 18/47] Btrfs: fix race leading to incorrect item deletion when dropping extents
Date: Wed, 20 Jan 2016 14:00:50 -0800	[thread overview]
Message-ID: <20160120215510.175919984@linuxfoundation.org> (raw)
In-Reply-To: <20160120215507.575738941@linuxfoundation.org>

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana <fdmanana@suse.com>

commit aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c upstream.

While running a stress test I got the following warning triggered:

  [191627.672810] ------------[ cut here ]------------
  [191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
  (...)
  [191627.701485] Call Trace:
  [191627.702037]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
  [191627.702992]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
  [191627.704091]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
  [191627.705380]  [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
  [191627.706637]  [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
  [191627.707789]  [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
  [191627.709155]  [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
  [191627.712444]  [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
  [191627.714162]  [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
  [191627.715887]  [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
  [191627.717287]  [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
  [191627.728865]  [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
  [191627.730045]  [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
  [191627.731256]  [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
  [191627.732661]  [<ffffffff81061119>] process_one_work+0x24c/0x4ae
  [191627.733822]  [<ffffffff810615b0>] worker_thread+0x206/0x2c2
  [191627.734857]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
  [191627.736052]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
  [191627.737349]  [<ffffffff810669a6>] kthread+0xef/0xf7
  [191627.738267]  [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
  [191627.739330]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
  [191627.741976]  [<ffffffff81465592>] ret_from_fork+0x42/0x70
  [191627.743080]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
  [191627.744206] ---[ end trace bbfddacb7aaada8d ]---

  $ cat -n fs/btrfs/file.c
  691  int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
  (...)
  758                  btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
  759                  if (key.objectid > ino ||
  760                      key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
  761                          break;
  762
  763                  fi = btrfs_item_ptr(leaf, path->slots[0],
  764                                      struct btrfs_file_extent_item);
  765                  extent_type = btrfs_file_extent_type(leaf, fi);
  766
  767                  if (extent_type == BTRFS_FILE_EXTENT_REG ||
  768                      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
  (...)
  774                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
  (...)
  778                  } else {
  779                          WARN_ON(1);
  780                          extent_end = search_start;
  781                  }
  (...)

This happened because the item we were processing did not match a file
extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even on this
case we cast the item to a struct btrfs_file_extent_item pointer and
then find a type field value that does not match any of the expected
values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
due to a tiny time window where a race can happen as exemplified below.
For example, consider the following scenario where we're using the
NO_HOLES feature and we have the following two neighbour leafs:

               Leaf X (has N items)                    Leaf Y

[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
          slot N - 2         slot N - 1              slot 0

Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
than explicit because NO_HOLES is enabled). Now if our inode has an
ordered extent for the range [4K, 8K[ that is finishing, the following
can happen:

          CPU 1                                       CPU 2

  btrfs_finish_ordered_io()
    insert_reserved_file_extent()
      __btrfs_drop_extents()
         Searches for the key
          (257 EXTENT_DATA 4096) through
          btrfs_lookup_file_extent()

         Key not found and we get a path where
         path->nodes[0] == leaf X and
         path->slots[0] == N

         Because path->slots[0] is >=
         btrfs_header_nritems(leaf X), we call
         btrfs_next_leaf()

         btrfs_next_leaf() releases the path

                                                  inserts key
                                                  (257 INODE_REF 4096)
                                                  at the end of leaf X,
                                                  leaf X now has N + 1 keys,
                                                  and the new key is at
                                                  slot N

         btrfs_next_leaf() searches for
         key (257 INODE_REF 256), with
         path->keep_locks set to 1,
         because it was the last key it
         saw in leaf X

           finds it in leaf X again and
           notices it's no longer the last
           key of the leaf, so it returns 0
           with path->nodes[0] == leaf X and
           path->slots[0] == N (which is now
           < btrfs_header_nritems(leaf X)),
           pointing to the new key
           (257 INODE_REF 4096)

         __btrfs_drop_extents() casts the
         item at path->nodes[0], slot
         path->slots[0], to a struct
         btrfs_file_extent_item - it does
         not skip keys for the target
         inode with a type less than
         BTRFS_EXTENT_DATA_KEY
         (BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)

         sees a bogus value for the type
         field triggering the WARN_ON in
         the trace shown above, and sets
         extent_end = search_start (4096)

         does the if-then-else logic to
         fixup 0 length extent items created
         by a past bug from hole punching:

           if (extent_end == key.offset &&
               extent_end >= search_start)
               goto delete_extent_item;

         that evaluates to true and it ends
         up deleting the key pointed to by
         path->slots[0], (257 INODE_REF 4096),
         from leaf X

The same could happen for example for a xattr that ends up having a key
with an offset value that matches search_start (very unlikely but not
impossible).

So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
skipped, never casted to struct btrfs_file_extent_item and never deleted
by accident. Also protect against the unexpected case of getting a key
for a lower inode number by skipping that key and issuing a warning.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/file.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -751,8 +751,16 @@ next_slot:
 		}
 
 		btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
-		if (key.objectid > ino ||
-		    key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
+
+		if (key.objectid > ino)
+			break;
+		if (WARN_ON_ONCE(key.objectid < ino) ||
+		    key.type < BTRFS_EXTENT_DATA_KEY) {
+			ASSERT(del_nr == 0);
+			path->slots[0]++;
+			goto next_slot;
+		}
+		if (key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
 			break;
 
 		fi = btrfs_item_ptr(leaf, path->slots[0],
@@ -771,8 +779,8 @@ next_slot:
 				btrfs_file_extent_inline_len(leaf,
 						     path->slots[0], fi);
 		} else {
-			WARN_ON(1);
-			extent_end = search_start;
+			/* can't happen */
+			BUG();
 		}
 
 		if (extent_end <= search_start) {

  parent reply	other threads:[~2016-01-20 22:10 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-20 22:00 [PATCH 3.14 00/47] 3.14.59-stable review Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 01/47] unix: avoid use-after-free in ep_remove_wait_queue Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 02/47] tools/net: Use include/uapi with __EXPORTED_HEADERS__ Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 03/47] packet: do skb_probe_transport_header when we actually have data Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 04/47] packet: always probe for transport header Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 05/47] packet: infer protocol from ethernet header if unset Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 06/47] sctp: translate host order to network order when setting a hmacid Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 07/47] ip_tunnel: disable preemption when updating per-cpu tstats Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 08/47] snmp: Remove duplicate OUTMCAST stat increment Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 10/47] tcp: md5: fix lockdep annotation Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 11/47] tcp: initialize tp->copied_seq in case of cross SYN connection Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 12/47] net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 13/47] net: ipmr: fix static mfc/dev leaks on table destruction Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 14/47] net: ip6mr: " Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 15/47] broadcom: fix PHY_ID_BCM5481 entry in the id table Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 16/47] ipv6: distinguish frag queues by device for multicast and link-local packets Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 17/47] ipv6: sctp: implement sctp_v6_destroy_sock() Greg Kroah-Hartman
2016-01-20 22:00 ` Greg Kroah-Hartman [this message]
2016-01-20 22:00 ` [PATCH 3.14 19/47] Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 20/47] ext4: fix potential use after free in __ext4_journal_stop Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 21/47] ext4, jbd2: ensure entering into panic after recording an error in superblock Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 22/47] firewire: ohci: fix JMicron JMB38x IT context discovery Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 23/47] nfs4: start callback_ident at idr 1 Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 24/47] nfs: if we have no valid attrs, then dont declare the attribute cache valid Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 25/47] ocfs2: fix umask ignored issue Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 26/47] USB: cdc_acm: Ignore Infineon Flash Loader utility Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 27/47] USB: serial: Another Infineon flash loader USB ID Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 28/47] USB: cp210x: Remove CP2110 ID from compatibility list Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 29/47] USB: add quirk for devices with broken LPM Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 30/47] USB: whci-hcd: add check for dma mapping error Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 31/47] usb: Use the USB_SS_MULT() macro to decode burst multiplier for log message Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 32/47] gre6: allow to update all parameters via rtnl Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 33/47] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 34/47] sctp: use the same clock as if sock source timestamps were on Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 35/47] sctp: update the netstamp_needed counter when copying sockets Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 36/47] ipv6: sctp: clone options to avoid use after free Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 38/47] sh_eth: fix kernel oops in skb_put() Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 39/47] vlan: Fix untag operations of stacked vlans with REORDER_HEADER off Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 40/47] skbuff: Fix offset error in skb_reorder_vlan_header Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 41/47] pptp: verify sockaddr_len in pptp_bind() and pptp_connect() Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 42/47] bluetooth: Validate socket address length in sco_sock_bind() Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 43/47] af_unix: Revert lock_interruptible in stream receive code Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 44/47] KEYS: Fix race between key destruction and finding a keyring by name Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 45/47] KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 46/47] KEYS: Fix race between read and revoke Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 47/47] KEYS: Fix keyring ref leak in join_session_keyring() Greg Kroah-Hartman
2016-01-20 23:15 ` [PATCH 3.14 00/47] 3.14.59-stable review Shuah Khan
2016-01-21 12:21 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160120215510.175919984@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=fdmanana@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).