From: lizf@kernel.org
To: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Filipe Manana <fdmanana@suse.com>,
Zefan Li <lizefan@huawei.com>
Subject: [PATCH 3.4 029/125] Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow
Date: Wed, 12 Oct 2016 20:32:25 +0800
Message-ID: <1476275641-4697-29-git-send-email-lizf@kernel.org>
In-Reply-To: <1476275600-4626-1-git-send-email-lizf@kernel.org>
From: Filipe Manana <fdmanana@suse.com>
3.4.113-rc1 review patch. If anyone has any objections, please let me know.
------------------
commit 1d512cb77bdbda80f0dd0620a3b260d697fd581d upstream.
If we are using the NO_HOLES feature, we have a tiny time window when
running delalloc for a nodatacow inode where we can race with a concurrent
link or xattr add operation leading to a BUG_ON.
This happens because at run_delalloc_nocow() we end up casting a leaf item
of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a
file extent item (struct btrfs_file_extent_item) and then analyse its
extent type field, which won't match any of the expected extent types
(values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an
explicit BUG_ON(1).
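
To make the misinterpretation concrete, here is a minimal user-space sketch,
not kernel code. It assumes simplified packed layouts for the two item types
(the kernel reads these fields through accessors such as
btrfs_file_extent_type() on extent buffers, and stores them little-endian).
The struct and function names below are hypothetical and chosen only for
illustration; __attribute__((packed)) is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Extent type values used by btrfs file extent items. */
enum { FILE_EXTENT_INLINE = 0, FILE_EXTENT_REG = 1, FILE_EXTENT_PREALLOC = 2 };

/* Leading fields of a file extent item (simplified, host endianness). */
struct file_extent_hdr {
	uint64_t generation;
	uint64_t ram_bytes;
	uint8_t  compression;
	uint8_t  encryption;
	uint16_t other_encoding;
	uint8_t  type;			/* byte offset 20 within the item */
} __attribute__((packed));

/* Fixed part of an inode ref item; the link name bytes follow it. */
struct inode_ref_hdr {
	uint64_t index;
	uint16_t name_len;
} __attribute__((packed));

/* Read the "extent type" byte from raw item bytes, whatever the item is. */
static uint8_t extent_type_of(const uint8_t *item)
{
	return item[offsetof(struct file_extent_hdr, type)];
}

static int is_known_extent_type(uint8_t t)
{
	return t <= FILE_EXTENT_PREALLOC;
}

/* Serialize an inode ref item into buf and return its length in bytes. */
static size_t build_inode_ref(uint8_t *buf, uint64_t index, const char *name)
{
	struct inode_ref_hdr hdr = {
		.index = index,
		.name_len = (uint16_t)strlen(name),
	};

	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), name, hdr.name_len);
	return sizeof(hdr) + hdr.name_len;
}
```

With a sufficiently long link name, the byte read as the extent type is just
a character of that name, so it will usually not be one of the three known
extent types. With a shorter name the read would land beyond the item itself.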
The following sequence diagram shows how the race happens when running a
no-cow delalloc range [4K, 8K[ for inode 257 and we have the following
neighbour leaves:

             Leaf X (has N items)                      Leaf Y

 [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
              slot N - 2         slot N - 1              slot 0

 (Note the implicit hole for inode 257 regarding the [0, 8K[ range)

       CPU 1                                         CPU 2

 run_delalloc_nocow()
   btrfs_lookup_file_extent()
     --> searches for a key with value
         (257 EXTENT_DATA 4096) in the
         fs/subvol tree
     --> returns us a path with
         path->nodes[0] == leaf X and
         path->slots[0] == N

   because path->slots[0] is >=
   btrfs_header_nritems(leaf X), it
   calls btrfs_next_leaf()

   btrfs_next_leaf()
     --> releases the path

                                              hard link added to our inode,
                                              with key (257 INODE_REF 500)
                                              added to the end of leaf X,
                                              so leaf X now has N + 1 keys

     --> searches for the key
         (257 INODE_REF 256), because
         it was the last key in leaf X
         before it released the path,
         with path->keep_locks set to 1

     --> ends up at leaf X again and
         it verifies that the key
         (257 INODE_REF 256) is no longer
         the last key in the leaf, so it
         returns with path->nodes[0] ==
         leaf X and path->slots[0] == N,
         pointing to the new item with
         key (257 INODE_REF 500)

   the loop iteration of run_delalloc_nocow()
   does not break out of the loop and continues
   because the key referenced in the path
   at path->nodes[0] and path->slots[0] is
   for inode 257, its type is < BTRFS_EXTENT_DATA_KEY
   and its offset (500) is less than our delalloc
   range's end (8192)

   the item pointed to by the path, an inode reference item,
   is (incorrectly) interpreted as a file extent item and
   we get an invalid extent type, leading to the BUG_ON(1):

   if (extent_type == BTRFS_FILE_EXTENT_REG ||
       extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
       (...)
   } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
       (...)
   } else {
       BUG_ON(1);
   }

The same can happen if an xattr is added concurrently and ends up having
a key with an offset smaller than the delalloc range's end.
So fix this by skipping keys with a type smaller than
BTRFS_EXTENT_DATA_KEY.
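
The effect of the change can be sketched in user space as a pure decision
function over a key. This is a simplified model, not the kernel code: it
ignores the later found_key.offset > cur_offset check in run_delalloc_nocow()
and omits the WARN_ON_ONCE(). The struct, enum and function names are
hypothetical; the key type numbers are the btrfs on-disk values.

```c
#include <stdint.h>

/* Key type values from the btrfs on-disk format. */
#define INODE_REF_KEY	12
#define XATTR_ITEM_KEY	24
#define EXTENT_DATA_KEY	108

struct key {
	uint64_t objectid;
	uint8_t  type;
	uint64_t offset;
};

enum action { ACT_BREAK, ACT_SKIP, ACT_PARSE_AS_EXTENT };

/* Check before the fix: any key of our inode with a smaller type and an
 * offset <= end falls through and is parsed as a file extent item. */
static enum action classify_old(const struct key *k, uint64_t ino, uint64_t end)
{
	if (k->objectid > ino || k->type > EXTENT_DATA_KEY || k->offset > end)
		return ACT_BREAK;
	return ACT_PARSE_AS_EXTENT;
}

/* Check after the fix: keys that sort before the inode's extent data
 * (lower objectid or lower type) are skipped instead of parsed. */
static enum action classify_new(const struct key *k, uint64_t ino, uint64_t end)
{
	if (k->objectid > ino)
		return ACT_BREAK;
	if (k->objectid < ino || k->type < EXTENT_DATA_KEY)
		return ACT_SKIP;	/* kernel: path->slots[0]++; goto next_slot; */
	if (k->type > EXTENT_DATA_KEY || k->offset > end)
		return ACT_BREAK;
	return ACT_PARSE_AS_EXTENT;
}
```

For the key from the diagram, (257 INODE_REF 500) with a range end of 8192,
classify_old() returns ACT_PARSE_AS_EXTENT while classify_new() returns
ACT_SKIP.
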
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
---
fs/btrfs/inode.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 575c190..d460390 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1203,8 +1203,14 @@ next_slot:
 		num_bytes = 0;
 		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
 
-		if (found_key.objectid > ino ||
-		    found_key.type > BTRFS_EXTENT_DATA_KEY ||
+		if (found_key.objectid > ino)
+			break;
+		if (WARN_ON_ONCE(found_key.objectid < ino) ||
+		    found_key.type < BTRFS_EXTENT_DATA_KEY) {
+			path->slots[0]++;
+			goto next_slot;
+		}
+		if (found_key.type > BTRFS_EXTENT_DATA_KEY ||
 		    found_key.offset > end)
 			break;
 
--
1.9.1