All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ross Zwisler <zwisler@google.com>
To: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <zwisler@chromium.org>,
	linux-kernel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Jan Kara <jack@suse.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Fletcher Woodruff <fletcherw@google.com>,
	Justin TerAvest <teravest@google.com>
Subject: Re: [PATCH 2/3] jbd2: introduce jbd2_inode dirty range scoping
Date: Thu, 20 Jun 2019 09:09:11 -0600	[thread overview]
Message-ID: <20190620150911.GA4488@google.com> (raw)
In-Reply-To: <20190620110454.GL13630@quack2.suse.cz>

On Thu, Jun 20, 2019 at 01:04:54PM +0200, Jan Kara wrote:
> On Wed 19-06-19 11:21:55, Ross Zwisler wrote:
> > Currently both journal_submit_inode_data_buffers() and
> > journal_finish_inode_data_buffers() operate on the entire address space
> > of each of the inodes associated with a given journal entry.  The
> > consequence of this is that if we have an inode where we are constantly
> > appending dirty pages we can end up waiting for an indefinite amount of
> > time in journal_finish_inode_data_buffers() while we wait for all the
> > pages under writeback to be written out.
> > 
> > The easiest way to cause this type of workload is do just dd from
> > /dev/zero to a file until it fills the entire filesystem.  This can
> > cause journal_finish_inode_data_buffers() to wait for the duration of
> > the entire dd operation.
> > 
> > We can improve this situation by scoping each of the inode dirty ranges
> > associated with a given transaction.  We do this via the jbd2_inode
> > structure so that the scoping is contained within jbd2 and so that it
> > follows the lifetime and locking rules for that structure.
> > 
> > This allows us to limit the writeback & wait in
> > journal_submit_inode_data_buffers() and
> > journal_finish_inode_data_buffers() respectively to the dirty range for
> > a given struct jdb2_inode, keeping us from waiting forever if the inode
> > in question is still being appended to.
> > 
> > Signed-off-by: Ross Zwisler <zwisler@google.com>
> 
> The patch looks good to me. I was thinking whether we should not have
> separate ranges for current and the next transaction but I guess it is not
> worth it at least for now. So just one nit below. With that applied feel free
> to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

We could definitely keep separate dirty ranges for each of the current and
next transaction.  I think the case where you would see a difference would be
if you had multiple transactions in a row which grew the dirty range for a
given jbd2_inode, and then had a random I/O workload which kept dirtying pages
inside that enlarged dirty range.

I'm not sure how often this type of workload would be a problem.  For the
workloads I've been testing which purely append to the inode, having a single
dirty range per jbd2_inode is sufficient.

I guess for now this single range seems simpler, but if later we find that
someone would benefit from separate tracking for each of the current and next
transactions, I'll take a shot at adding it.

Thank you for the review!

> > @@ -257,15 +262,24 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
> >  	/* For locking, see the comment in journal_submit_data_buffers() */
> >  	spin_lock(&journal->j_list_lock);
> >  	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
> > +		loff_t dirty_start = jinode->i_dirty_start;
> > +		loff_t dirty_end = jinode->i_dirty_end;
> > +
> >  		if (!(jinode->i_flags & JI_WAIT_DATA))
> >  			continue;
> >  		jinode->i_flags |= JI_COMMIT_RUNNING;
> >  		spin_unlock(&journal->j_list_lock);
> > -		err = filemap_fdatawait_keep_errors(
> > -				jinode->i_vfs_inode->i_mapping);
> > +		err = filemap_fdatawait_range_keep_errors(
> > +				jinode->i_vfs_inode->i_mapping, dirty_start,
> > +				dirty_end);
> >  		if (!ret)
> >  			ret = err;
> >  		spin_lock(&journal->j_list_lock);
> > +
> > +		if (!jinode->i_next_transaction) {
> > +			jinode->i_dirty_start = 0;
> > +			jinode->i_dirty_end = 0;
> > +		}
> 
> This would be more logical in the next loop that moves jinode into the next
> transaction.

Yep, agreed, this is much better.  Fixed in v2.

  reply	other threads:[~2019-06-20 15:09 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-19 17:21 [PATCH 0/3] Add dirty range scoping to jbd2 Ross Zwisler
2019-06-19 17:21 ` [PATCH 1/3] mm: add filemap_fdatawait_range_keep_errors() Ross Zwisler
2019-06-20  9:25   ` Jan Kara
2019-06-19 17:21 ` [PATCH 2/3] jbd2: introduce jbd2_inode dirty range scoping Ross Zwisler
2019-06-20 11:04   ` Jan Kara
2019-06-20 15:09     ` Ross Zwisler [this message]
2019-06-20 17:22       ` Theodore Ts'o
2019-06-19 17:21 ` [PATCH 3/3] ext4: use " Ross Zwisler
2019-06-20 11:15   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190620150911.GA4488@google.com \
    --to=zwisler@google.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=fletcherw@google.com \
    --cc=jack@suse.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=teravest@google.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zwisler@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.