Re: [Cluster-devel] [PATCH 2/2] gfs2: writeout truncated pages

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Benjamin Marzinski" <bmarzins@redhat.com>
To: Bob Peterson <rpeterso@redhat.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	cluster-devel <cluster-devel@redhat.com>
Subject: Re: [Cluster-devel] [PATCH 2/2] gfs2: writeout truncated pages
Date: Wed, 15 Jun 2016 13:45:56 -0500	[thread overview]
Message-ID: <20160615184556.GL8095@octiron.msp.redhat.com> (raw)
In-Reply-To: <2059374940.19514905.1466010482840.JavaMail.zimbra@redhat.com>

On Wed, Jun 15, 2016 at 01:08:02PM -0400, Bob Peterson wrote:
> Hi Ben,
> 
> Comments below.
> 
> ----- Original Message -----
> | When gfs2 attempts to write a page to a file that is being truncated,
> | and notices that the page is completely outside of the file size, it
> | tries to invalidate it.  However, this may require a transaction for
> | journaled data files to revoke any buffers from the page on the active
> | items list. Unfortunately, this can happen inside a log flush, where a
> | transaction cannot be started. Also, gfs2 may need to be able to remove
> | the buffer from the ail1 list before it can finish the log flush.
> | 
> | To deal with this, gfs2 now skips the check to see if the write is
> | outside the file size, and simply writes it anyway. This situation can
> | only occur when the truncate code still has the file locked exclusively,
> | and hasn't marked this block as free in the metadata (which happens
> | later in truc_dealloc).  After gfs2 writes this page out, the truncation
> | code will shortly invalidate it and write out any revokes if necessary.
> | 
> | To do this, gfs2 now implements its own version of block_write_full_page
> | without the check, and calls the newly exported __block_write_full_page.
> | The check still exists in nobh_writepage, which is gfs2 calls in the
> | non-journaled case. But in this case the buffers will never be in the
> | journal. Thus, no revokes will need to be issued and the write can
> | safely be skipped without causing any possible journal replay issues.
> | 
> | Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> | ---
> |  fs/gfs2/aops.c | 37 +++++++++++++++++++++++++++----------
> |  1 file changed, 27 insertions(+), 10 deletions(-)
> | 
> | diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> | index 37b7bc1..d3a7301 100644
> | --- a/fs/gfs2/aops.c
> | +++ b/fs/gfs2/aops.c
> | @@ -100,20 +100,11 @@ static int gfs2_writepage_common(struct page *page,
> |  	struct inode *inode = page->mapping->host;
> |  	struct gfs2_inode *ip = GFS2_I(inode);
> |  	struct gfs2_sbd *sdp = GFS2_SB(inode);
> | -	loff_t i_size = i_size_read(inode);
> | -	pgoff_t end_index = i_size >> PAGE_SHIFT;
> | -	unsigned offset;
> |  
> |  	if (gfs2_assert_withdraw(sdp, gfs2_glock_is_held_excl(ip->i_gl)))
> |  		goto out;
> |  	if (current->journal_info)
> |  		goto redirty;
> | -	/* Is the page fully outside i_size? (truncate in progress) */
> | -	offset = i_size & (PAGE_SIZE-1);
> | -	if (page->index > end_index || (page->index == end_index && !offset)) {
> | -		page->mapping->a_ops->invalidatepage(page, 0, PAGE_SIZE);
> | -		goto out;
> | -	}
> 
> The concept in general looks good. Two things:
> 
> (1) Since this is writepage_common, does it still need to do the
> check for the normal non-jdata writeback? Does it makes sense to
> split this back up into two writepage functions again rather than
> the common one? After all, with your patch, the function does almost
> nothing anymore.

We do not need to worry about the non-jdata case.  nobh_writepage(),
which is called after gfs2_writepage_common in the non-jdata case, has
the identical check. It doesn't invalidate the page, but that's fine,
because the truncate code is about to do that anyway.

This isn't even as big a change as it looks.  In cases where we weren't
doing this writepage during a log flush, there was nothing to guarantee
that the truncate code wouldn't shrink the file size after the check in
gfs2_writepage_common, and cause us to hit the check in nobh_writepage
instead. Nate hit that scenario while testing the jdata case. In fact,
there's nothing to guarantee that the i_size doesn't change after the
check in nobh_writepage either, in which case we simply write out the
page, like in the jdata case.

> (2) Just a nit, but can you eliminate the ugly goto around the return?
> In other words, rather than:
> 
> 	if (current->journal_info)
> 		goto redirty;
> 	return 1;
> redirty:
> 
> Just simply do:
> 
> 	if (!current->journal_info)
> 		return 1;
> 

Sure. I'll clean that up and respin the patches.

-Ben

> Regards,
> 
> Bob Peterson
> Red Hat File Systems

next prev parent reply	other threads:[~2016-06-15 18:45 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-14 22:02 [PATCH 0/2] fix gfs2 truncate race Benjamin Marzinski
2016-06-14 22:02 ` [PATCH 1/2] fs: export __block_write_full_page Benjamin Marzinski
2016-06-14 22:02 ` [PATCH 2/2] gfs2: writeout truncated pages Benjamin Marzinski
2016-06-15 17:08   ` [Cluster-devel] " Bob Peterson
2016-06-15 18:45     ` Benjamin Marzinski [this message]
2016-06-16 15:56   ` Steven Whitehouse
2016-06-16 17:49     ` Benjamin Marzinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160615184556.GL8095@octiron.msp.redhat.com \
    --to=bmarzins@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=rpeterso@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).