public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Josef Bacik <josef@toxicpanda.com>,
	Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.4 10/47] btrfs: fix race when defragmenting leads to unnecessary IO
Date: Mon,  4 Jan 2021 16:57:09 +0100	[thread overview]
Message-ID: <20210104155706.250001425@linuxfoundation.org> (raw)
In-Reply-To: <20210104155705.740576914@linuxfoundation.org>

From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit 7f458a3873ae94efe1f37c8b96c97e7298769e98 ]

When defragmenting we skip ranges that have holes or inline extents, so that
we don't do unnecessary IO and waste space. We do this check when calling
should_defrag_range() at btrfs_defrag_file(). However we do it without
holding the inode's lock. The reason we do it like this is to avoid
blocking other tasks for too long, that possibly want to operate on other
file ranges, since after the call to should_defrag_range() and before
locking the inode, we trigger a synchronous page cache readahead. However
before we were able to lock the inode, some other task might have punched
a hole in our range, or we may now have an inline extent there, in which
case we should not set the range for defrag anymore since that would cause
unnecessary IO and make us waste space (i.e. allocating extents to contain
zeros for a hole).

So after we locked the inode and the range in the iotree, check again if
we have holes or an inline extent, and if we do, just skip the range.

I hit this while testing my next patch that fixes races when updating an
inode's number of bytes (subject "btrfs: update the number of bytes used
by an inode atomically"), and it depends on this change in order to work
correctly. Alternatively I could rework that other patch to detect holes
and flag their range with the 'new delalloc' bit, but this itself fixes
an efficiency problem due a race that from a functional point of view is
not harmful (it could be triggered with btrfs/062 from fstests).

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/ioctl.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f58e03d1775d8..8ed71b3b25466 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1256,6 +1256,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	u64 page_end;
 	u64 page_cnt;
 	u64 start = (u64)start_index << PAGE_SHIFT;
+	u64 search_start;
 	int ret;
 	int i;
 	int i_done;
@@ -1352,6 +1353,40 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 	lock_extent_bits(&BTRFS_I(inode)->io_tree,
 			 page_start, page_end - 1, &cached_state);
+
+	/*
+	 * When defragmenting we skip ranges that have holes or inline extents,
+	 * (check should_defrag_range()), to avoid unnecessary IO and wasting
+	 * space. At btrfs_defrag_file(), we check if a range should be defragged
+	 * before locking the inode and then, if it should, we trigger a sync
+	 * page cache readahead - we lock the inode only after that to avoid
+	 * blocking for too long other tasks that possibly want to operate on
+	 * other file ranges. But before we were able to get the inode lock,
+	 * some other task may have punched a hole in the range, or we may have
+	 * now an inline extent, in which case we should not defrag. So check
+	 * for that here, where we have the inode and the range locked, and bail
+	 * out if that happened.
+	 */
+	search_start = page_start;
+	while (search_start < page_end) {
+		struct extent_map *em;
+
+		em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, search_start,
+				      page_end - search_start, 0);
+		if (IS_ERR(em)) {
+			ret = PTR_ERR(em);
+			goto out_unlock_range;
+		}
+		if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
+			free_extent_map(em);
+			/* Ok, 0 means we did not defrag anything */
+			ret = 0;
+			goto out_unlock_range;
+		}
+		search_start = extent_map_end(em);
+		free_extent_map(em);
+	}
+
 	clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start,
 			  page_end - 1, EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
 			  EXTENT_DEFRAG, 0, 0, &cached_state);
@@ -1382,6 +1417,10 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
 	extent_changeset_free(data_reserved);
 	return i_done;
+
+out_unlock_range:
+	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
+			     page_start, page_end - 1, &cached_state);
 out:
 	for (i = 0; i < i_done; i++) {
 		unlock_page(pages[i]);
-- 
2.27.0




  parent reply	other threads:[~2021-01-04 16:13 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-04 15:56 [PATCH 5.4 00/47] 5.4.87-rc1 review Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 01/47] net/sched: sch_taprio: reset child qdiscs before freeing them Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 02/47] md/raid10: initialize r10_bio->read_slot before use Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 03/47] thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changed Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 04/47] ext4: prevent creating duplicate encrypted filenames Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 05/47] ubifs: " Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 06/47] f2fs: " Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 07/47] fscrypt: add fscrypt_is_nokey_name() Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 08/47] fscrypt: remove kernel-internal constants from UAPI header Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 09/47] vfio/pci: Move dummy_resources_list init in vfio_pci_probe() Greg Kroah-Hartman
2021-01-04 15:57 ` Greg Kroah-Hartman [this message]
2021-01-04 15:57 ` [PATCH 5.4 11/47] ext4: dont remount read-only with errors=continue on reboot Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 12/47] KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL Greg Kroah-Hartman
2021-02-26 11:03   ` Thomas Lamprecht
2021-02-26 11:27     ` Paolo Bonzini
2021-02-26 12:59       ` Greg Kroah-Hartman
2021-02-26 14:15         ` Paolo Bonzini
2021-02-26 14:18           ` Thomas Lamprecht
2021-02-26 14:21             ` Paolo Bonzini
2021-01-04 15:57 ` [PATCH 5.4 13/47] KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 14/47] KVM: x86: reinstate vendor-agnostic check on SPEC_CTRL cpuid bits Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 15/47] powerpc/bitops: Fix possible undefined behaviour with fls() and fls64() Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 16/47] jffs2: Allow setting rp_size to zero during remounting Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 17/47] jffs2: Fix NULL pointer dereference in rp_size fs option parsing Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 18/47] scsi: block: Fix a race in the runtime power management code Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 19/47] uapi: move constants from <linux/kernel.h> to <linux/const.h> Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 20/47] tools headers UAPI: Sync linux/const.h with the kernel headers Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 21/47] null_blk: Fix zone size initialization Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 22/47] of: fix linker-section match-table corruption Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 23/47] cgroup: Fix memory leak when parsing multiple source parameters Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 24/47] scsi: cxgb4i: Fix TLS dependency Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 25/47] Bluetooth: hci_h5: close serdev device and free hu in h5_close Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 26/47] reiserfs: add check for an invalid ih_entry_count Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 27/47] misc: vmw_vmci: fix kernel info-leak by initializing dbells in vmci_ctx_get_chkpt_doorbells() Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 28/47] media: gp8psk: initialize stats at power control logic Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 29/47] f2fs: fix shift-out-of-bounds in sanity_check_raw_super() Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 30/47] ALSA: seq: Use bool for snd_seq_queue internal flags Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 31/47] ALSA: rawmidi: Access runtime->avail always in spinlock Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 32/47] bfs: dont use WARNING: string when its just info Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 33/47] fcntl: Fix potential deadlock in send_sig{io, urg}() Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 34/47] rtc: sun6i: Fix memleak in sun6i_rtc_clk_init Greg Kroah-Hartman
2021-01-06 13:07   ` Pavel Machek
2021-01-04 15:57 ` [PATCH 5.4 35/47] module: set MODULE_STATE_GOING state when a module fails to load Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 36/47] quota: Dont overflow quota file offsets Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 37/47] rtc: pl031: fix resource leak in pl031_probe Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 38/47] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe() Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 39/47] i3c master: fix missing destroy_workqueue() on error in i3c_master_register Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 40/47] NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 41/47] f2fs: avoid race condition for shrinker count Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 42/47] module: delay kobject uevent until after module init call Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 43/47] fs/namespace.c: WARN if mnt_count has become negative Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 44/47] um: ubd: Submit all data segments atomically Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 45/47] tick/sched: Remove bogus boot "safety" check Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 46/47] ALSA: pcm: Clear the full allocated memory at hw_params Greg Kroah-Hartman
2021-01-04 15:57 ` [PATCH 5.4 47/47] dm verity: skip verity work if I/O error when system is shutting down Greg Kroah-Hartman
2021-01-05  6:07 ` [PATCH 5.4 00/47] 5.4.87-rc1 review Daniel Díaz
2021-01-05 16:39 ` Shuah Khan
2021-01-05 18:17 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210104155706.250001425@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox