stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Ross Zwisler <zwisler@google.com>,
	Theodore Tso <tytso@mit.edu>, Jan Kara <jack@suse.cz>
Subject: [PATCH 5.1 54/62] jbd2: introduce jbd2_inode dirty range scoping
Date: Fri, 26 Jul 2019 17:25:06 +0200	[thread overview]
Message-ID: <20190726152307.676250429@linuxfoundation.org> (raw)
In-Reply-To: <20190726152301.720139286@linuxfoundation.org>

From: Ross Zwisler <zwisler@chromium.org>

commit 6ba0e7dc64a5adcda2fbe65adc466891795d639e upstream.

Currently both journal_submit_inode_data_buffers() and
journal_finish_inode_data_buffers() operate on the entire address space
of each of the inodes associated with a given journal entry.  The
consequence of this is that if we have an inode where we are constantly
appending dirty pages we can end up waiting for an indefinite amount of
time in journal_finish_inode_data_buffers() while we wait for all the
pages under writeback to be written out.

The easiest way to cause this type of workload is do just dd from
/dev/zero to a file until it fills the entire filesystem.  This can
cause journal_finish_inode_data_buffers() to wait for the duration of
the entire dd operation.

We can improve this situation by scoping each of the inode dirty ranges
associated with a given transaction.  We do this via the jbd2_inode
structure so that the scoping is contained within jbd2 and so that it
follows the lifetime and locking rules for that structure.

This allows us to limit the writeback & wait in
journal_submit_inode_data_buffers() and
journal_finish_inode_data_buffers() respectively to the dirty range for
a given struct jdb2_inode, keeping us from waiting forever if the inode
in question is still being appended to.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/jbd2/commit.c      |   23 +++++++++++++++++------
 fs/jbd2/journal.c     |    4 ++++
 fs/jbd2/transaction.c |   49 ++++++++++++++++++++++++++++---------------------
 include/linux/jbd2.h  |   22 ++++++++++++++++++++++
 4 files changed, 71 insertions(+), 27 deletions(-)

--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -187,14 +187,15 @@ static int journal_wait_on_commit_record
  * use writepages() because with dealyed allocation we may be doing
  * block allocation in writepages().
  */
-static int journal_submit_inode_data_buffers(struct address_space *mapping)
+static int journal_submit_inode_data_buffers(struct address_space *mapping,
+		loff_t dirty_start, loff_t dirty_end)
 {
 	int ret;
 	struct writeback_control wbc = {
 		.sync_mode =  WB_SYNC_ALL,
 		.nr_to_write = mapping->nrpages * 2,
-		.range_start = 0,
-		.range_end = i_size_read(mapping->host),
+		.range_start = dirty_start,
+		.range_end = dirty_end,
 	};
 
 	ret = generic_writepages(mapping, &wbc);
@@ -218,6 +219,9 @@ static int journal_submit_data_buffers(j
 
 	spin_lock(&journal->j_list_lock);
 	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
+		loff_t dirty_start = jinode->i_dirty_start;
+		loff_t dirty_end = jinode->i_dirty_end;
+
 		if (!(jinode->i_flags & JI_WRITE_DATA))
 			continue;
 		mapping = jinode->i_vfs_inode->i_mapping;
@@ -230,7 +234,8 @@ static int journal_submit_data_buffers(j
 		 * only allocated blocks here.
 		 */
 		trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
-		err = journal_submit_inode_data_buffers(mapping);
+		err = journal_submit_inode_data_buffers(mapping, dirty_start,
+				dirty_end);
 		if (!ret)
 			ret = err;
 		spin_lock(&journal->j_list_lock);
@@ -257,12 +262,16 @@ static int journal_finish_inode_data_buf
 	/* For locking, see the comment in journal_submit_data_buffers() */
 	spin_lock(&journal->j_list_lock);
 	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
+		loff_t dirty_start = jinode->i_dirty_start;
+		loff_t dirty_end = jinode->i_dirty_end;
+
 		if (!(jinode->i_flags & JI_WAIT_DATA))
 			continue;
 		jinode->i_flags |= JI_COMMIT_RUNNING;
 		spin_unlock(&journal->j_list_lock);
-		err = filemap_fdatawait_keep_errors(
-				jinode->i_vfs_inode->i_mapping);
+		err = filemap_fdatawait_range_keep_errors(
+				jinode->i_vfs_inode->i_mapping, dirty_start,
+				dirty_end);
 		if (!ret)
 			ret = err;
 		spin_lock(&journal->j_list_lock);
@@ -282,6 +291,8 @@ static int journal_finish_inode_data_buf
 				&jinode->i_transaction->t_inode_list);
 		} else {
 			jinode->i_transaction = NULL;
+			jinode->i_dirty_start = 0;
+			jinode->i_dirty_end = 0;
 		}
 	}
 	spin_unlock(&journal->j_list_lock);
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -94,6 +94,8 @@ EXPORT_SYMBOL(jbd2_journal_try_to_free_b
 EXPORT_SYMBOL(jbd2_journal_force_commit);
 EXPORT_SYMBOL(jbd2_journal_inode_add_write);
 EXPORT_SYMBOL(jbd2_journal_inode_add_wait);
+EXPORT_SYMBOL(jbd2_journal_inode_ranged_write);
+EXPORT_SYMBOL(jbd2_journal_inode_ranged_wait);
 EXPORT_SYMBOL(jbd2_journal_init_jbd_inode);
 EXPORT_SYMBOL(jbd2_journal_release_jbd_inode);
 EXPORT_SYMBOL(jbd2_journal_begin_ordered_truncate);
@@ -2574,6 +2576,8 @@ void jbd2_journal_init_jbd_inode(struct
 	jinode->i_next_transaction = NULL;
 	jinode->i_vfs_inode = inode;
 	jinode->i_flags = 0;
+	jinode->i_dirty_start = 0;
+	jinode->i_dirty_end = 0;
 	INIT_LIST_HEAD(&jinode->i_list);
 }
 
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2565,7 +2565,7 @@ void jbd2_journal_refile_buffer(journal_
  * File inode in the inode list of the handle's transaction
  */
 static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
-				   unsigned long flags)
+		unsigned long flags, loff_t start_byte, loff_t end_byte)
 {
 	transaction_t *transaction = handle->h_transaction;
 	journal_t *journal;
@@ -2577,26 +2577,17 @@ static int jbd2_journal_file_inode(handl
 	jbd_debug(4, "Adding inode %lu, tid:%d\n", jinode->i_vfs_inode->i_ino,
 			transaction->t_tid);
 
-	/*
-	 * First check whether inode isn't already on the transaction's
-	 * lists without taking the lock. Note that this check is safe
-	 * without the lock as we cannot race with somebody removing inode
-	 * from the transaction. The reason is that we remove inode from the
-	 * transaction only in journal_release_jbd_inode() and when we commit
-	 * the transaction. We are guarded from the first case by holding
-	 * a reference to the inode. We are safe against the second case
-	 * because if jinode->i_transaction == transaction, commit code
-	 * cannot touch the transaction because we hold reference to it,
-	 * and if jinode->i_next_transaction == transaction, commit code
-	 * will only file the inode where we want it.
-	 */
-	if ((jinode->i_transaction == transaction ||
-	    jinode->i_next_transaction == transaction) &&
-	    (jinode->i_flags & flags) == flags)
-		return 0;
-
 	spin_lock(&journal->j_list_lock);
 	jinode->i_flags |= flags;
+
+	if (jinode->i_dirty_end) {
+		jinode->i_dirty_start = min(jinode->i_dirty_start, start_byte);
+		jinode->i_dirty_end = max(jinode->i_dirty_end, end_byte);
+	} else {
+		jinode->i_dirty_start = start_byte;
+		jinode->i_dirty_end = end_byte;
+	}
+
 	/* Is inode already attached where we need it? */
 	if (jinode->i_transaction == transaction ||
 	    jinode->i_next_transaction == transaction)
@@ -2631,12 +2622,28 @@ done:
 int jbd2_journal_inode_add_write(handle_t *handle, struct jbd2_inode *jinode)
 {
 	return jbd2_journal_file_inode(handle, jinode,
-				       JI_WRITE_DATA | JI_WAIT_DATA);
+			JI_WRITE_DATA | JI_WAIT_DATA, 0, LLONG_MAX);
 }
 
 int jbd2_journal_inode_add_wait(handle_t *handle, struct jbd2_inode *jinode)
 {
-	return jbd2_journal_file_inode(handle, jinode, JI_WAIT_DATA);
+	return jbd2_journal_file_inode(handle, jinode, JI_WAIT_DATA, 0,
+			LLONG_MAX);
+}
+
+int jbd2_journal_inode_ranged_write(handle_t *handle,
+		struct jbd2_inode *jinode, loff_t start_byte, loff_t length)
+{
+	return jbd2_journal_file_inode(handle, jinode,
+			JI_WRITE_DATA | JI_WAIT_DATA, start_byte,
+			start_byte + length - 1);
+}
+
+int jbd2_journal_inode_ranged_wait(handle_t *handle, struct jbd2_inode *jinode,
+		loff_t start_byte, loff_t length)
+{
+	return jbd2_journal_file_inode(handle, jinode, JI_WAIT_DATA,
+			start_byte, start_byte + length - 1);
 }
 
 /*
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -454,6 +454,22 @@ struct jbd2_inode {
 	 * @i_flags: Flags of inode [j_list_lock]
 	 */
 	unsigned long i_flags;
+
+	/**
+	 * @i_dirty_start:
+	 *
+	 * Offset in bytes where the dirty range for this inode starts.
+	 * [j_list_lock]
+	 */
+	loff_t i_dirty_start;
+
+	/**
+	 * @i_dirty_end:
+	 *
+	 * Inclusive offset in bytes where the dirty range for this inode
+	 * ends. [j_list_lock]
+	 */
+	loff_t i_dirty_end;
 };
 
 struct jbd2_revoke_table_s;
@@ -1400,6 +1416,12 @@ extern int	   jbd2_journal_force_commit(
 extern int	   jbd2_journal_force_commit_nested(journal_t *);
 extern int	   jbd2_journal_inode_add_write(handle_t *handle, struct jbd2_inode *inode);
 extern int	   jbd2_journal_inode_add_wait(handle_t *handle, struct jbd2_inode *inode);
+extern int	   jbd2_journal_inode_ranged_write(handle_t *handle,
+			struct jbd2_inode *inode, loff_t start_byte,
+			loff_t length);
+extern int	   jbd2_journal_inode_ranged_wait(handle_t *handle,
+			struct jbd2_inode *inode, loff_t start_byte,
+			loff_t length);
 extern int	   jbd2_journal_begin_ordered_truncate(journal_t *journal,
 				struct jbd2_inode *inode, loff_t new_size);
 extern void	   jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode);



  parent reply	other threads:[~2019-07-26 15:31 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-26 15:24 [PATCH 5.1 00/62] 5.1.21-stable review Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 01/62] bnx2x: Prevent load reordering in tx completion processing Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 02/62] caif-hsi: fix possible deadlock in cfhsi_exit_module() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 03/62] hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 04/62] igmp: fix memory leak in igmpv3_del_delrec() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 05/62] ipv4: dont set IPv6 only flags to IPv4 addresses Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 06/62] ipv6: rt6_check should return NULL if from is NULL Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 07/62] ipv6: Unlink sibling route in case of failure Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 08/62] net: bcmgenet: use promisc for unsupported filters Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 09/62] net: dsa: mv88e6xxx: wait after reset deactivation Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 10/62] net: make skb_dst_force return true when dst is refcounted Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 11/62] net: neigh: fix multiple neigh timer scheduling Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 12/62] net: openvswitch: fix csum updates for MPLS actions Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 13/62] net: phy: sfp: hwmon: Fix scaling of RX power Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 14/62] net_sched: unset TCQ_F_CAN_BYPASS when adding filters Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 15/62] net: stmmac: Re-work the queue selection for TSO packets Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 16/62] net/tls: make sure offload also gets the keys wiped Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 17/62] nfc: fix potential illegal memory access Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 18/62] r8169: fix issue with confused RX unit after PHY power-down on RTL8411b Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 19/62] rxrpc: Fix send on a connected, but unbound socket Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 20/62] sctp: fix error handling on stream scheduler initialization Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 21/62] sctp: not bind the socket in sctp_connect Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 22/62] sky2: Disable MSI on ASUS P6T Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 23/62] tcp: be more careful in tcp_fragment() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 24/62] tcp: fix tcp_set_congestion_control() use from bpf hook Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 25/62] tcp: Reset bytes_acked and bytes_received when disconnecting Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 26/62] vrf: make sure skb->data contains ip header to make routing Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 27/62] net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 28/62] net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 29/62] net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 30/62] net: bridge: dont cache ether dest pointer on input Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 31/62] net: bridge: stp: dont cache eth dest pointer before skb pull Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 32/62] macsec: fix use-after-free of skb during RX Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 33/62] macsec: fix checksumming after decryption Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 34/62] netrom: fix a memory leak in nr_rx_frame() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 35/62] netrom: hold sock when setting skb->destructor Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 36/62] selftests: txring_overwrite: fix incorrect test of mmap() return value Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 37/62] net/tls: fix poll ignoring partially copied records Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 38/62] net/tls: reject offload of TLS 1.3 Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 39/62] net/mlx5e: Fix port tunnel GRE entropy control Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 40/62] net/mlx5e: Rx, Fix checksum calculation for new hardware Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 41/62] net/mlx5e: Fix return value from timeout recover function Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 42/62] net/mlx5e: Fix error flow in tx reporter diagnose Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 43/62] dma-buf: balance refcount inbalance Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 44/62] dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 45/62] gpiolib: of: fix a memory leak in of_gpio_flags_quirks() Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 46/62] gpio: davinci: silence error prints in case of EPROBE_DEFER Greg Kroah-Hartman
2019-07-26 15:24 ` [PATCH 5.1 47/62] MIPS: lb60: Fix pin mappings Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 48/62] perf script: Assume native_arch for pipe mode Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 49/62] perf/core: Fix exclusive events grouping Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 50/62] perf/core: Fix race between close() and fork() Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 51/62] ext4: dont allow any modifications to an immutable file Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 52/62] ext4: enforce the immutable flag on open files Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 53/62] mm: add filemap_fdatawait_range_keep_errors() Greg Kroah-Hartman
2019-07-26 15:25 ` Greg Kroah-Hartman [this message]
2019-07-26 15:25 ` [PATCH 5.1 55/62] ext4: use jbd2_inode dirty range scoping Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 56/62] ext4: allow directory holes Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 57/62] KVM: nVMX: do not use dangling shadow VMCS after guest reset Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 58/62] KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 59/62] Revert "kvm: x86: Use task structs fpu field for user" Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 60/62] sd_zbc: Fix report zones buffer allocation Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 61/62] block: Limit zone array allocation size Greg Kroah-Hartman
2019-07-26 15:25 ` [PATCH 5.1 62/62] mm: vmscan: scan anonymous pages on file refaults Greg Kroah-Hartman
2019-07-27  2:34 ` [PATCH 5.1 00/62] 5.1.21-stable review shuah
2019-07-27  4:34 ` kernelci.org bot
2019-07-27  5:34 ` Naresh Kamboju
2019-07-27 16:07 ` Guenter Roeck
2019-07-29  9:02 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190726152307.676250429@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=zwisler@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).