All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>
Subject: [PATCH 3.14 18/47] Btrfs: fix race leading to incorrect item deletion when dropping extents
Date: Wed, 20 Jan 2016 14:00:50 -0800	[thread overview]
Message-ID: <20160120215510.175919984@linuxfoundation.org> (raw)
In-Reply-To: <20160120215507.575738941@linuxfoundation.org>

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana <fdmanana@suse.com>

commit aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c upstream.

While running a stress test I got the following warning triggered:

  [191627.672810] ------------[ cut here ]------------
  [191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
  (...)
  [191627.701485] Call Trace:
  [191627.702037]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
  [191627.702992]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
  [191627.704091]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
  [191627.705380]  [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
  [191627.706637]  [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
  [191627.707789]  [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
  [191627.709155]  [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
  [191627.712444]  [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
  [191627.714162]  [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
  [191627.715887]  [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
  [191627.717287]  [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
  [191627.728865]  [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
  [191627.730045]  [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
  [191627.731256]  [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
  [191627.732661]  [<ffffffff81061119>] process_one_work+0x24c/0x4ae
  [191627.733822]  [<ffffffff810615b0>] worker_thread+0x206/0x2c2
  [191627.734857]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
  [191627.736052]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
  [191627.737349]  [<ffffffff810669a6>] kthread+0xef/0xf7
  [191627.738267]  [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
  [191627.739330]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
  [191627.741976]  [<ffffffff81465592>] ret_from_fork+0x42/0x70
  [191627.743080]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
  [191627.744206] ---[ end trace bbfddacb7aaada8d ]---

  $ cat -n fs/btrfs/file.c
  691  int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
  (...)
  758                  btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
  759                  if (key.objectid > ino ||
  760                      key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
  761                          break;
  762
  763                  fi = btrfs_item_ptr(leaf, path->slots[0],
  764                                      struct btrfs_file_extent_item);
  765                  extent_type = btrfs_file_extent_type(leaf, fi);
  766
  767                  if (extent_type == BTRFS_FILE_EXTENT_REG ||
  768                      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
  (...)
  774                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
  (...)
  778                  } else {
  779                          WARN_ON(1);
  780                          extent_end = search_start;
  781                  }
  (...)

This happened because the item we were processing did not match a file
extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even on this
case we cast the item to a struct btrfs_file_extent_item pointer and
then find a type field value that does not match any of the expected
values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
due to a tiny time window where a race can happen as exemplified below.
For example, consider the following scenario where we're using the
NO_HOLES feature and we have the following two neighbour leafs:

               Leaf X (has N items)                    Leaf Y

[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
          slot N - 2         slot N - 1              slot 0

Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
than explicit because NO_HOLES is enabled). Now if our inode has an
ordered extent for the range [4K, 8K[ that is finishing, the following
can happen:

          CPU 1                                       CPU 2

  btrfs_finish_ordered_io()
    insert_reserved_file_extent()
      __btrfs_drop_extents()
         Searches for the key
          (257 EXTENT_DATA 4096) through
          btrfs_lookup_file_extent()

         Key not found and we get a path where
         path->nodes[0] == leaf X and
         path->slots[0] == N

         Because path->slots[0] is >=
         btrfs_header_nritems(leaf X), we call
         btrfs_next_leaf()

         btrfs_next_leaf() releases the path

                                                  inserts key
                                                  (257 INODE_REF 4096)
                                                  at the end of leaf X,
                                                  leaf X now has N + 1 keys,
                                                  and the new key is at
                                                  slot N

         btrfs_next_leaf() searches for
         key (257 INODE_REF 256), with
         path->keep_locks set to 1,
         because it was the last key it
         saw in leaf X

           finds it in leaf X again and
           notices it's no longer the last
           key of the leaf, so it returns 0
           with path->nodes[0] == leaf X and
           path->slots[0] == N (which is now
           < btrfs_header_nritems(leaf X)),
           pointing to the new key
           (257 INODE_REF 4096)

         __btrfs_drop_extents() casts the
         item at path->nodes[0], slot
         path->slots[0], to a struct
         btrfs_file_extent_item - it does
         not skip keys for the target
         inode with a type less than
         BTRFS_EXTENT_DATA_KEY
         (BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)

         sees a bogus value for the type
         field triggering the WARN_ON in
         the trace shown above, and sets
         extent_end = search_start (4096)

         does the if-then-else logic to
         fixup 0 length extent items created
         by a past bug from hole punching:

           if (extent_end == key.offset &&
               extent_end >= search_start)
               goto delete_extent_item;

         that evaluates to true and it ends
         up deleting the key pointed to by
         path->slots[0], (257 INODE_REF 4096),
         from leaf X

The same could happen for example for a xattr that ends up having a key
with an offset value that matches search_start (very unlikely but not
impossible).

So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
skipped, never casted to struct btrfs_file_extent_item and never deleted
by accident. Also protect against the unexpected case of getting a key
for a lower inode number by skipping that key and issuing a warning.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/file.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -751,8 +751,16 @@ next_slot:
 		}
 
 		btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
-		if (key.objectid > ino ||
-		    key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
+
+		if (key.objectid > ino)
+			break;
+		if (WARN_ON_ONCE(key.objectid < ino) ||
+		    key.type < BTRFS_EXTENT_DATA_KEY) {
+			ASSERT(del_nr == 0);
+			path->slots[0]++;
+			goto next_slot;
+		}
+		if (key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
 			break;
 
 		fi = btrfs_item_ptr(leaf, path->slots[0],
@@ -771,8 +779,8 @@ next_slot:
 				btrfs_file_extent_inline_len(leaf,
 						     path->slots[0], fi);
 		} else {
-			WARN_ON(1);
-			extent_end = search_start;
+			/* can't happen */
+			BUG();
 		}
 
 		if (extent_end <= search_start) {

  parent reply	other threads:[~2016-01-20 22:10 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-20 22:00 [PATCH 3.14 00/47] 3.14.59-stable review Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 01/47] unix: avoid use-after-free in ep_remove_wait_queue Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 02/47] tools/net: Use include/uapi with __EXPORTED_HEADERS__ Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 03/47] packet: do skb_probe_transport_header when we actually have data Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 04/47] packet: always probe for transport header Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 05/47] packet: infer protocol from ethernet header if unset Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 06/47] sctp: translate host order to network order when setting a hmacid Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 07/47] ip_tunnel: disable preemption when updating per-cpu tstats Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 08/47] snmp: Remove duplicate OUTMCAST stat increment Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 10/47] tcp: md5: fix lockdep annotation Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 11/47] tcp: initialize tp->copied_seq in case of cross SYN connection Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 12/47] net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 13/47] net: ipmr: fix static mfc/dev leaks on table destruction Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 14/47] net: ip6mr: " Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 15/47] broadcom: fix PHY_ID_BCM5481 entry in the id table Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 16/47] ipv6: distinguish frag queues by device for multicast and link-local packets Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 17/47] ipv6: sctp: implement sctp_v6_destroy_sock() Greg Kroah-Hartman
2016-01-20 22:00 ` Greg Kroah-Hartman [this message]
2016-01-20 22:00 ` [PATCH 3.14 19/47] Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 20/47] ext4: fix potential use after free in __ext4_journal_stop Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 21/47] ext4, jbd2: ensure entering into panic after recording an error in superblock Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 22/47] firewire: ohci: fix JMicron JMB38x IT context discovery Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 23/47] nfs4: start callback_ident at idr 1 Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 24/47] nfs: if we have no valid attrs, then dont declare the attribute cache valid Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 25/47] ocfs2: fix umask ignored issue Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 26/47] USB: cdc_acm: Ignore Infineon Flash Loader utility Greg Kroah-Hartman
2016-01-20 22:00 ` [PATCH 3.14 27/47] USB: serial: Another Infineon flash loader USB ID Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 28/47] USB: cp210x: Remove CP2110 ID from compatibility list Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 29/47] USB: add quirk for devices with broken LPM Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 30/47] USB: whci-hcd: add check for dma mapping error Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 31/47] usb: Use the USB_SS_MULT() macro to decode burst multiplier for log message Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 32/47] gre6: allow to update all parameters via rtnl Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 33/47] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 34/47] sctp: use the same clock as if sock source timestamps were on Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 35/47] sctp: update the netstamp_needed counter when copying sockets Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 36/47] ipv6: sctp: clone options to avoid use after free Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 38/47] sh_eth: fix kernel oops in skb_put() Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 39/47] vlan: Fix untag operations of stacked vlans with REORDER_HEADER off Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 40/47] skbuff: Fix offset error in skb_reorder_vlan_header Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 41/47] pptp: verify sockaddr_len in pptp_bind() and pptp_connect() Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 42/47] bluetooth: Validate socket address length in sco_sock_bind() Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 43/47] af_unix: Revert lock_interruptible in stream receive code Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 44/47] KEYS: Fix race between key destruction and finding a keyring by name Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 45/47] KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 46/47] KEYS: Fix race between read and revoke Greg Kroah-Hartman
2016-01-20 22:01 ` [PATCH 3.14 47/47] KEYS: Fix keyring ref leak in join_session_keyring() Greg Kroah-Hartman
2016-01-20 23:15 ` [PATCH 3.14 00/47] 3.14.59-stable review Shuah Khan
2016-01-21 12:21 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160120215510.175919984@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=fdmanana@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.