From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Josef Bacik <jbacik@fb.com>,
Filipe Manana <fdmanana@suse.com>,
David Sterba <dsterba@suse.com>,
Anand Jain <anand.jain@oracle.com>
Subject: [PATCH 4.14 01/30] btrfs: always wait on ordered extents at fsync time
Date: Mon, 25 Oct 2021 21:14:21 +0200 [thread overview]
Message-ID: <20211025190923.166779214@linuxfoundation.org> (raw)
In-Reply-To: <20211025190922.089277904@linuxfoundation.org>
From: Josef Bacik <jbacik@fb.com>
commit b5e6c3e170b77025b5f6174258c7ad71eed2d4de upstream.
There's a priority inversion that exists currently with btrfs fsync. In
some cases we will collect outstanding ordered extents onto a list and
only wait on them at the very last second. However this "very last
second" falls inside of a transaction handle, so if we are in a lower
priority cgroup we can end up holding the transaction open for longer
than needed, so if a high priority cgroup is also trying to fsync()
it'll see latency.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/btrfs/file.c | 56 ++++----------------------------------------------------
1 file changed, 4 insertions(+), 52 deletions(-)
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2102,53 +2102,12 @@ int btrfs_sync_file(struct file *file, l
atomic_inc(&root->log_batch);
full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
&BTRFS_I(inode)->runtime_flags);
+
/*
- * We might have have had more pages made dirty after calling
- * start_ordered_ops and before acquiring the inode's i_mutex.
+ * We have to do this here to avoid the priority inversion of waiting on
+ * IO of a lower priority task while holding a transaciton open.
*/
- if (full_sync) {
- /*
- * For a full sync, we need to make sure any ordered operations
- * start and finish before we start logging the inode, so that
- * all extents are persisted and the respective file extent
- * items are in the fs/subvol btree.
- */
- ret = btrfs_wait_ordered_range(inode, start, len);
- } else {
- /*
- * Start any new ordered operations before starting to log the
- * inode. We will wait for them to finish in btrfs_sync_log().
- *
- * Right before acquiring the inode's mutex, we might have new
- * writes dirtying pages, which won't immediately start the
- * respective ordered operations - that is done through the
- * fill_delalloc callbacks invoked from the writepage and
- * writepages address space operations. So make sure we start
- * all ordered operations before starting to log our inode. Not
- * doing this means that while logging the inode, writeback
- * could start and invoke writepage/writepages, which would call
- * the fill_delalloc callbacks (cow_file_range,
- * submit_compressed_extents). These callbacks add first an
- * extent map to the modified list of extents and then create
- * the respective ordered operation, which means in
- * tree-log.c:btrfs_log_inode() we might capture all existing
- * ordered operations (with btrfs_get_logged_extents()) before
- * the fill_delalloc callback adds its ordered operation, and by
- * the time we visit the modified list of extent maps (with
- * btrfs_log_changed_extents()), we see and process the extent
- * map they created. We then use the extent map to construct a
- * file extent item for logging without waiting for the
- * respective ordered operation to finish - this file extent
- * item points to a disk location that might not have yet been
- * written to, containing random data - so after a crash a log
- * replay will make our inode have file extent items that point
- * to disk locations containing invalid data, as we returned
- * success to userspace without waiting for the respective
- * ordered operation to finish, because it wasn't captured by
- * btrfs_get_logged_extents().
- */
- ret = start_ordered_ops(inode, start, end);
- }
+ ret = btrfs_wait_ordered_range(inode, start, len);
if (ret) {
up_write(&BTRFS_I(inode)->dio_sem);
inode_unlock(inode);
@@ -2283,13 +2242,6 @@ int btrfs_sync_file(struct file *file, l
goto out;
}
}
- if (!full_sync) {
- ret = btrfs_wait_ordered_range(inode, start, len);
- if (ret) {
- btrfs_end_transaction(trans);
- goto out;
- }
- }
ret = btrfs_commit_transaction(trans);
} else {
ret = btrfs_end_transaction(trans);
next prev parent reply other threads:[~2021-10-25 19:23 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-25 19:14 [PATCH 4.14 00/30] 4.14.253-rc1 review Greg Kroah-Hartman
2021-10-25 19:14 ` Greg Kroah-Hartman [this message]
2021-10-25 19:14 ` [PATCH 4.14 02/30] ARM: dts: at91: sama5d2_som1_ek: disable ISC node by default Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 03/30] xtensa: xtfpga: use CONFIG_USE_OF instead of CONFIG_OF Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 04/30] xtensa: xtfpga: Try software restart before simulating CPU reset Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 05/30] NFSD: Keep existing listeners on portlist error Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 06/30] netfilter: ipvs: make global sysctl readonly in non-init netns Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 07/30] NIOS2: irqflags: rename a redefined register name Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 08/30] can: rcar_can: fix suspend/resume Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 09/30] can: peak_usb: pcan_usb_fd_decode_status(): fix back to ERROR_ACTIVE state notification Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 10/30] can: peak_pci: peak_pci_remove(): fix UAF Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 11/30] ocfs2: fix data corruption after conversion from inline format Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 12/30] ocfs2: mount fails with buffer overflow in strlen Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 13/30] elfcore: correct reference to CONFIG_UML Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 14/30] vfs: check fd has read access in kernel_read_file_from_fd() Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 15/30] ALSA: usb-audio: Provide quirk for Sennheiser GSP670 Headset Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 16/30] ASoC: DAPM: Fix missing kctl change notifications Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 17/30] nfc: nci: fix the UAF of rf_conn_info object Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 18/30] isdn: cpai: check ctr->cnr to avoid array index out of bound Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 19/30] netfilter: Kconfig: use default y instead of m for bool config option Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 20/30] btrfs: deal with errors when checking if a dir entry exists during log replay Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 21/30] net: stmmac: add support for dwmac 3.40a Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 22/30] ARM: dts: spear3xx: Fix gmac node Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 23/30] isdn: mISDN: Fix sleeping function called from invalid context Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 24/30] platform/x86: intel_scu_ipc: Update timeout value in comment Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 25/30] ALSA: hda: avoid write to STATESTS if controller is in reset Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 26/30] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma() Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 27/30] usbnet: sanity check for maxpacket Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 28/30] net: mdiobus: Fix memory leak in __mdiobus_register Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 29/30] tracing: Have all levels of checks prevent recursion Greg Kroah-Hartman
2021-10-25 19:14 ` [PATCH 4.14 30/30] ARM: 9122/1: select HAVE_FUTEX_CMPXCHG Greg Kroah-Hartman
2021-10-26 9:20 ` [PATCH 4.14 00/30] 4.14.253-rc1 review Jon Hunter
2021-10-26 13:30 ` Naresh Kamboju
2021-10-26 19:15 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211025190923.166779214@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=anand.jain@oracle.com \
--cc=dsterba@suse.com \
--cc=fdmanana@suse.com \
--cc=jbacik@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox