From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>,
David Sterba <dsterba@suse.com>,
"Sasha Levin (Microsoft)" <sashal@kernel.org>
Subject: [PATCH 5.0 016/101] Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writes
Date: Thu, 2 May 2019 17:20:18 +0200
Message-ID: <20190502143341.147009688@linuxfoundation.org>
In-Reply-To: <20190502143339.434882399@linuxfoundation.org>
[ Upstream commit 609e804d771f59dc5d45a93e5ee0053c74bbe2bf ]
When we are mixing buffered writes with direct IO writes against the same
file and snapshotting is happening concurrently, we can end up with
corrupted file content in the snapshot. Example:
1) Inode/file is empty.
2) Snapshotting starts.
3) Buffered write at offset 0 length 256Kb. This updates the i_size of the
inode to 256Kb, disk_i_size remains zero. This happens after the task
doing the snapshot flushes all existing delalloc.
4) DIO write at offset 256Kb length 768Kb. Once the ordered extent
completes it sets the inode's disk_i_size to 1Mb (256Kb + 768Kb) and
updates the inode item in the fs tree with a size of 1Mb (which is
the value of disk_i_size).
5) The delalloc for the range [0, 256Kb[ did not start yet.
6) The transaction used in the DIO ordered extent completion, which updated
the inode item, is committed by the snapshotting task.
7) Snapshot creation completes.
8) Delalloc for the range [0, 256Kb[ is flushed.
After that when reading the file from the snapshot we always get zeroes for
the range [0, 256Kb[, the file has a size of 1Mb and the data written by
the direct IO write is found. From an application's point of view this is
a corruption, since in the source subvolume it could never read a version
of the file that included the data from the direct IO write without the
data from the buffered write included as well. In the snapshot's tree,
file extent items are missing for the range [0, 256Kb[.
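As a rough illustration only, the sequence above can be replayed with a
small user-space model. This is a sketch under simplifying assumptions,
not kernel code: every toy_* name is hypothetical, the inode is reduced to
two sizes and two extent flags, and the snapshot is modelled as a plain
copy of whatever the committed tree holds at that moment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_KB 1024ULL

/* Toy model of an inode; the fields only mirror the names used above. */
struct toy_inode {
	uint64_t i_size;        /* in-memory file size */
	uint64_t disk_i_size;   /* size stored in the inode item */
	bool delalloc_0_256k;   /* buffered data not yet written back */
	bool extent_0_256k;     /* file extent item exists for [0, 256Kb[ */
	bool extent_256k_1m;    /* file extent item exists for [256Kb, 1Mb[ */
};

/* State of the file as captured by a snapshot of the committed tree. */
struct toy_snapshot {
	uint64_t size;
	bool extent_0_256k;
	bool extent_256k_1m;
};

/* Buffered write: updates i_size only, the data stays as delalloc. */
static void toy_buffered_write(struct toy_inode *inode)
{
	inode->i_size = 256 * TOY_KB;
	inode->delalloc_0_256k = true;
}

/* DIO write: its ordered extent completes and bumps disk_i_size. */
static void toy_dio_write(struct toy_inode *inode)
{
	inode->i_size = 1024 * TOY_KB;
	inode->extent_256k_1m = true;
	inode->disk_i_size = inode->i_size;
}

/* Writeback of the buffered range inserts its file extent item. */
static void toy_flush_delalloc(struct toy_inode *inode)
{
	if (inode->delalloc_0_256k) {
		inode->delalloc_0_256k = false;
		inode->extent_0_256k = true;
	}
}

/* The snapshot sees whatever the committed tree holds right now. */
static struct toy_snapshot toy_take_snapshot(const struct toy_inode *inode)
{
	struct toy_snapshot snap = {
		.size = inode->disk_i_size,
		.extent_0_256k = inode->extent_0_256k,
		.extent_256k_1m = inode->extent_256k_1m,
	};
	return snap;
}

/* Replays the racy sequence and returns what the snapshot captured. */
struct toy_snapshot toy_replay_scenario(void)
{
	struct toy_inode inode = {0};           /* empty file */
	struct toy_snapshot snap;

	toy_buffered_write(&inode);             /* after the ioctl's flush */
	toy_dio_write(&inode);                  /* disk_i_size becomes 1Mb */
	snap = toy_take_snapshot(&inode);       /* commit, delalloc pending */
	toy_flush_delalloc(&inode);             /* too late for the snapshot */
	assert(inode.extent_0_256k);            /* source subvolume is fine */
	return snap;
}

/* Inconsistent if the size covers [0, 256Kb[ but no extent item does. */
bool toy_snapshot_has_hole(const struct toy_snapshot *snap)
{
	return snap->size > 0 && !snap->extent_0_256k;
}
```

In this model the snapshot ends up with a size of 1Mb and the direct IO
extent, but without the extent for the buffered range, which is the state
described above.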
The issue, obviously, does not happen when using the -o flushoncommit
mount option.
Fix this by flushing delalloc for all the roots that are about to be
snapshotted when committing a transaction. This guarantees total ordering
when updating the disk_i_size of an inode since the flush for delalloc is
done when a transaction is in the TRANS_STATE_COMMIT_START state and the
wait is done once no more external writers exist. This is similar to what we
do when using the flushoncommit mount option, but we do it only if the
transaction has snapshots to create and only for the roots of the
subvolumes to be snapshotted. The bulk of the delalloc is flushed in the
snapshot creation ioctl, so the flush work we do inside the transaction
is minimized.
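The two-phase ordering described above (start the flush, then wait, and
only for roots with a pending snapshot) can be sketched in user space as
follows. This is a simplified model, not the patch itself: the toy_* names
and the counters are assumptions made for illustration, and the real wait
for external writers is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a subvolume root; not the kernel's struct btrfs_root. */
struct toy_root {
	bool snapshot_pending;  /* a snapshot of this root is queued */
	int delalloc_ranges;    /* buffered ranges not yet submitted */
	int ordered_in_flight;  /* submitted writes not yet completed */
};

/* Phase 1: submit writeback only for roots about to be snapshotted. */
static void toy_start_delalloc_flush(struct toy_root *roots, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!roots[i].snapshot_pending)
			continue;
		roots[i].ordered_in_flight += roots[i].delalloc_ranges;
		roots[i].delalloc_ranges = 0;
	}
}

/* Phase 2: wait for the writes submitted in phase 1 to complete. */
static void toy_wait_delalloc_flush(struct toy_root *roots, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (roots[i].snapshot_pending)
			roots[i].ordered_in_flight = 0;
	}
}

/* Commit: start the flush, wait for it, then create the snapshots. */
void toy_commit(struct toy_root *roots, size_t n)
{
	toy_start_delalloc_flush(roots, n);
	/* In the kernel, the commit waits for external writers here. */
	toy_wait_delalloc_flush(roots, n);

	for (size_t i = 0; i < n; i++) {
		if (!roots[i].snapshot_pending)
			continue;
		/* Nothing may be outstanding when the snapshot is created. */
		assert(roots[i].delalloc_ranges == 0);
		assert(roots[i].ordered_in_flight == 0);
		roots[i].snapshot_pending = false;
	}
}
```

Roots without a pending snapshot keep their delalloc untouched in this
model, which mirrors the point that the extra flush work inside the
transaction stays limited to the subvolumes being snapshotted.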
This issue, involving buffered and direct IO writes with snapshotting, is
often triggered by fstest btrfs/078, and got reported by fsck when not
using the NO_HOLES feature, for example:
$ cat results/btrfs/078.full
(...)
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
*** fsck.btrfs output ***
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 258 inode 264 errors 100, file extent discount
Found file extent holes:
start: 524288, len: 65536
ERROR: errors found in fs roots
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
---
fs/btrfs/transaction.c | 49 ++++++++++++++++++++++++++++++++++++------
1 file changed, 43 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 4ec2b660d014..7f3ece91a4d0 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1886,8 +1886,10 @@ static void btrfs_cleanup_pending_block_groups(struct btrfs_trans_handle *trans)
}
}
-static inline int btrfs_start_delalloc_flush(struct btrfs_fs_info *fs_info)
+static inline int btrfs_start_delalloc_flush(struct btrfs_trans_handle *trans)
{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+
/*
* We use writeback_inodes_sb here because if we used
* btrfs_start_delalloc_roots we would deadlock with fs freeze.
@@ -1897,15 +1899,50 @@ static inline int btrfs_start_delalloc_flush(struct btrfs_fs_info *fs_info)
* from already being in a transaction and our join_transaction doesn't
* have to re-take the fs freeze lock.
*/
- if (btrfs_test_opt(fs_info, FLUSHONCOMMIT))
+ if (btrfs_test_opt(fs_info, FLUSHONCOMMIT)) {
writeback_inodes_sb(fs_info->sb, WB_REASON_SYNC);
+ } else {
+ struct btrfs_pending_snapshot *pending;
+ struct list_head *head = &trans->transaction->pending_snapshots;
+
+ /*
+ * Flush delalloc for any root that is going to be snapshotted.
+ * This is done to avoid a corrupted version of files, in the
+ * snapshots, that had both buffered and direct IO writes (even
+ * if they were done sequentially) due to an unordered update of
+ * the inode's size on disk.
+ */
+ list_for_each_entry(pending, head, list) {
+ int ret;
+
+ ret = btrfs_start_delalloc_snapshot(pending->root);
+ if (ret)
+ return ret;
+ }
+ }
return 0;
}
-static inline void btrfs_wait_delalloc_flush(struct btrfs_fs_info *fs_info)
+static inline void btrfs_wait_delalloc_flush(struct btrfs_trans_handle *trans)
{
- if (btrfs_test_opt(fs_info, FLUSHONCOMMIT))
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+
+ if (btrfs_test_opt(fs_info, FLUSHONCOMMIT)) {
btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1);
+ } else {
+ struct btrfs_pending_snapshot *pending;
+ struct list_head *head = &trans->transaction->pending_snapshots;
+
+ /*
+ * Wait for any delalloc that we started previously for the roots
+ * that are going to be snapshotted. This is to avoid a corrupted
+ * version of files in the snapshots that had both buffered and
+ * direct IO writes (even if they were done sequentially).
+ */
+ list_for_each_entry(pending, head, list)
+ btrfs_wait_ordered_extents(pending->root,
+ U64_MAX, 0, U64_MAX);
+ }
}
int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
@@ -2024,7 +2061,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
extwriter_counter_dec(cur_trans, trans->type);
- ret = btrfs_start_delalloc_flush(fs_info);
+ ret = btrfs_start_delalloc_flush(trans);
if (ret)
goto cleanup_transaction;
@@ -2040,7 +2077,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
if (ret)
goto cleanup_transaction;
- btrfs_wait_delalloc_flush(fs_info);
+ btrfs_wait_delalloc_flush(trans);
btrfs_scrub_pause(fs_info);
/*
--
2.19.1