From: Sasha Levin <Alexander.Levin@microsoft.com>
To: "stable@vger.kernel.org" <stable@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: Robbie Ko <robbieko@synology.com>,
David Sterba <dsterba@suse.com>,
Sasha Levin <Alexander.Levin@microsoft.com>
Subject: [PATCH AUTOSEL 4.18 03/76] Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space
Date: Mon, 24 Sep 2018 14:48:01 +0000
Message-ID: <20180924144751.164410-3-alexander.levin@microsoft.com>
In-Reply-To: <20180924144751.164410-1-alexander.levin@microsoft.com>
From: Robbie Ko <robbieko@synology.com>
[ Upstream commit 8ecebf4d767e2307a946c8905278d6358eda35c3 ]
Commit e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting") forced
nocow writes to fall back to COW, during writeback, when a snapshot is
created. As a result, writes made before creating the snapshot could
unexpectedly fail with ENOSPC during writeback even though success (0)
had already been returned to user space through the write system call.
The steps leading to this problem are:
1. When it's not possible to allocate data space for a write, the
buffered write path checks if a NOCOW write is possible. If it is,
it will not reserve space and success (0) is returned to user space.
2. Then when a snapshot is created, the root's will_be_snapshotted
atomic is incremented and writeback is triggered for all inodes that
belong to the root being snapshotted. Incrementing that atomic forces
all previous writes to fall back to COW during writeback (running
delalloc).
3. Writeback for those inodes then fails, because the COW fallback has
no reserved data space, which sets the ENOSPC error in their mappings,
so that a subsequent fsync on them will report the error to user space.
So it's not a
completely silent data loss (since fsync will report ENOSPC) but it's
a very unexpected and undesirable behaviour, because if a clean
shutdown/unmount of the filesystem happens without previous calls to
fsync, it is expected to have the data present in the files after
mounting the filesystem again.
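The three steps above can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct old_root`, `old_buffered_write()`, `old_writeback()` and `old_scenario()` are hypothetical names invented here, the space accounting is reduced to a single byte counter, and none of this is the kernel code touched by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct btrfs_root (not the kernel definition). */
struct old_root {
	atomic_int will_be_snapshotted;
	long free_data_bytes;	/* unallocated data space, simplified */
};

/* Step 1: with no space left, a NOCOW-capable write returns 0 unreserved. */
int old_buffered_write(struct old_root *root, long len, bool can_nocow,
		       bool *reserved)
{
	*reserved = false;
	if (root->free_data_bytes >= len) {
		root->free_data_bytes -= len;
		*reserved = true;
		return 0;
	}
	return can_nocow ? 0 : -ENOSPC;
}

/* Steps 2-3: pre-fix writeback falls back to COW if a snapshot is pending. */
int old_writeback(struct old_root *root, bool reserved)
{
	bool force_cow = atomic_load(&root->will_be_snapshotted) != 0;

	if (!force_cow)
		return 0;	/* NOCOW: overwrite the extent in place */
	return reserved ? 0 : -ENOSPC;	/* COW needs a new extent */
}

/* Write succeeds, snapshot starts, writeback of the old write fails. */
int old_scenario(void)
{
	struct old_root root = { .free_data_bytes = 0 };
	bool reserved;
	int ret;

	atomic_init(&root.will_be_snapshotted, 0);
	ret = old_buffered_write(&root, 4096, true, &reserved);
	assert(ret == 0 && !reserved);	/* user space saw success */

	atomic_fetch_add(&root.will_be_snapshotted, 1);	/* snapshot begins */
	return old_writeback(&root, reserved);
}
```

In this model `old_scenario()` returns `-ENOSPC`, which corresponds to the error that is only reported on a later fsync.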
So fix this by adding a new atomic named snapshot_force_cow to the
root structure which prevents this behaviour and works as follows:
1. It is incremented when we start to create a snapshot, after triggering
writeback and before waiting for writeback to finish.
2. This new atomic is now what is used by writeback (running delalloc)
to decide whether we need to fall back to COW or not. Because we
increment this new atomic after triggering writeback in the
snapshot creation ioctl, we ensure that all buffered writes that
happened before snapshot creation will succeed and not fall back to
COW (which would make them fail with ENOSPC).
3. The existing atomic, will_be_snapshotted, is kept because it is used
to force new buffered writes, that start after we started
snapshotting, to reserve data space even when NOCOW is possible.
This makes these writes fail early with ENOSPC when there's no
available space to allocate, preventing the unexpected behaviour of
writeback later failing with ENOSPC due to a fallback to COW mode.
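The division of labour between the two atomics can be sketched in the same user-space style. Again this is a simplified model, not the kernel implementation: `struct fix_root`, `fix_buffered_write()`, `fix_writeback()` and `fix_scenario()` are hypothetical names, and the ordering inside `fix_scenario()` only mirrors the sequence described in the three points above.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct btrfs_root with both counters. */
struct fix_root {
	atomic_int will_be_snapshotted;	/* gates new buffered writes */
	atomic_int snapshot_force_cow;	/* gates writeback (delalloc) */
	long free_data_bytes;
};

/* New writes must reserve space once a snapshot has been announced. */
int fix_buffered_write(struct fix_root *root, long len, bool can_nocow,
		       bool *reserved)
{
	*reserved = false;
	if (root->free_data_bytes >= len) {
		root->free_data_bytes -= len;
		*reserved = true;
		return 0;
	}
	if (can_nocow && atomic_load(&root->will_be_snapshotted) == 0)
		return 0;	/* unreserved NOCOW write is still allowed */
	return -ENOSPC;		/* fail early instead of at writeback */
}

/* Writeback consults only snapshot_force_cow. */
int fix_writeback(struct fix_root *root, bool reserved)
{
	if (atomic_load(&root->snapshot_force_cow) == 0)
		return 0;	/* NOCOW: overwrite in place */
	return reserved ? 0 : -ENOSPC;
}

/* Reports the writeback result of the old write and the new write result. */
void fix_scenario(int *old_write_writeback, int *new_write)
{
	struct fix_root root = { .free_data_bytes = 0 };
	bool reserved, reserved_new;
	int ret;

	atomic_init(&root.will_be_snapshotted, 0);
	atomic_init(&root.snapshot_force_cow, 0);

	ret = fix_buffered_write(&root, 4096, true, &reserved);
	assert(ret == 0 && !reserved);	/* write made before the snapshot */

	atomic_fetch_add(&root.will_be_snapshotted, 1);
	/* Writeback is triggered before snapshot_force_cow is raised. */
	*old_write_writeback = fix_writeback(&root, reserved);
	atomic_fetch_add(&root.snapshot_force_cow, 1);

	/* A write that starts during snapshotting fails up front. */
	*new_write = fix_buffered_write(&root, 4096, true, &reserved_new);

	atomic_fetch_sub(&root.snapshot_force_cow, 1);
	atomic_fetch_sub(&root.will_be_snapshotted, 1);
}
```

In this model the earlier write completes writeback with 0, while a write issued after will_be_snapshotted was raised gets -ENOSPC directly from the write path.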
Fixes: e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting")
Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
fs/btrfs/ctree.h | 1 +
fs/btrfs/disk-io.c | 1 +
fs/btrfs/inode.c | 25 ++++---------------------
fs/btrfs/ioctl.c | 16 ++++++++++++++++
4 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 118346aceea9..663ce0518d27 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1277,6 +1277,7 @@ struct btrfs_root {
int send_in_progress;
struct btrfs_subvolume_writers *subv_writers;
atomic_t will_be_snapshotted;
+ atomic_t snapshot_force_cow;
/* For qgroup metadata reserved space */
spinlock_t qgroup_meta_rsv_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dfed08e70ec1..891b1aab3480 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1217,6 +1217,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
atomic_set(&root->log_batch, 0);
refcount_set(&root->refs, 1);
atomic_set(&root->will_be_snapshotted, 0);
+ atomic_set(&root->snapshot_force_cow, 0);
root->log_transid = 0;
root->log_transid_committed = -1;
root->last_log_commit = 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 071d949f69ec..d3736fbf6774 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1275,7 +1275,7 @@ static noinline int run_delalloc_nocow(struct inode *inode,
u64 disk_num_bytes;
u64 ram_bytes;
int extent_type;
- int ret, err;
+ int ret;
int type;
int nocow;
int check_prev = 1;
@@ -1407,11 +1407,8 @@ static noinline int run_delalloc_nocow(struct inode *inode,
* if there are pending snapshots for this root,
* we fall into common COW way.
*/
- if (!nolock) {
- err = btrfs_start_write_no_snapshotting(root);
- if (!err)
- goto out_check;
- }
+ if (!nolock && atomic_read(&root->snapshot_force_cow))
+ goto out_check;
/*
* force cow if csum exists in the range.
* this ensure that csum for a given extent are
@@ -1420,9 +1417,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
ret = csum_exist_in_range(fs_info, disk_bytenr,
num_bytes);
if (ret) {
- if (!nolock)
- btrfs_end_write_no_snapshotting(root);
-
/*
* ret could be -EIO if the above fails to read
* metadata.
@@ -1435,11 +1429,8 @@ static noinline int run_delalloc_nocow(struct inode *inode,
WARN_ON_ONCE(nolock);
goto out_check;
}
- if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr)) {
- if (!nolock)
- btrfs_end_write_no_snapshotting(root);
+ if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr))
goto out_check;
- }
nocow = 1;
} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
extent_end = found_key.offset +
@@ -1453,8 +1444,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
out_check:
if (extent_end <= start) {
path->slots[0]++;
- if (!nolock && nocow)
- btrfs_end_write_no_snapshotting(root);
if (nocow)
btrfs_dec_nocow_writers(fs_info, disk_bytenr);
goto next_slot;
@@ -1476,8 +1465,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
end, page_started, nr_written, 1,
NULL);
if (ret) {
- if (!nolock && nocow)
- btrfs_end_write_no_snapshotting(root);
if (nocow)
btrfs_dec_nocow_writers(fs_info,
disk_bytenr);
@@ -1497,8 +1484,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
ram_bytes, BTRFS_COMPRESS_NONE,
BTRFS_ORDERED_PREALLOC);
if (IS_ERR(em)) {
- if (!nolock && nocow)
- btrfs_end_write_no_snapshotting(root);
if (nocow)
btrfs_dec_nocow_writers(fs_info,
disk_bytenr);
@@ -1537,8 +1522,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
EXTENT_CLEAR_DATA_RESV,
PAGE_UNLOCK | PAGE_SET_PRIVATE2);
- if (!nolock && nocow)
- btrfs_end_write_no_snapshotting(root);
cur_offset = extent_end;
/*
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f3d6be0c657b..ef7159646615 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -761,6 +761,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
struct btrfs_pending_snapshot *pending_snapshot;
struct btrfs_trans_handle *trans;
int ret;
+ bool snapshot_force_cow = false;
if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
return -EINVAL;
@@ -777,6 +778,11 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
goto free_pending;
}
+ /*
+ * Force new buffered writes to reserve space even when NOCOW is
+ * possible. This is to avoid later writeback (running dealloc) to
+ * fallback to COW mode and unexpectedly fail with ENOSPC.
+ */
atomic_inc(&root->will_be_snapshotted);
smp_mb__after_atomic();
/* wait for no snapshot writes */
@@ -787,6 +793,14 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
if (ret)
goto dec_and_free;
+ /*
+ * All previous writes have started writeback in NOCOW mode, so now
+ * we force future writes to fallback to COW mode during snapshot
+ * creation.
+ */
+ atomic_inc(&root->snapshot_force_cow);
+ snapshot_force_cow = true;
+
btrfs_wait_ordered_extents(root, U64_MAX, 0, (u64)-1);
btrfs_init_block_rsv(&pending_snapshot->block_rsv,
@@ -851,6 +865,8 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
fail:
btrfs_subvolume_release_metadata(fs_info, &pending_snapshot->block_rsv);
dec_and_free:
+ if (snapshot_force_cow)
+ atomic_dec(&root->snapshot_force_cow);
if (atomic_dec_and_test(&root->will_be_snapshotted))
wake_up_var(&root->will_be_snapshotted);
free_pending:
--
2.17.1