cluster-devel.redhat.com archive mirror
From: Andreas Gruenbacher <agruenba@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH 11/12] gfs2: Fix iomap write page reclaim deadlock
Date: Sat,  8 Jun 2019 14:16:01 +0200
Message-ID: <20190608121601.4897-1-agruenba@redhat.com>
In-Reply-To: <5c5d673b-57dd-7720-4420-badac31b76c3@citrix.com>

Hi Ross,

On Fri, 7 Jun 2019 at 18:21, Ross Lagerwall <ross.lagerwall@citrix.com> wrote:
> On 5/7/19 9:32 PM, Andreas Gruenbacher wrote:
> > Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
> > buffered writes by starting a transaction in iomap_begin, writing a range of
> > pages, and ending that transaction in iomap_end.  This approach suffers from
> > two problems:
> >
> >    (1) Any allocations necessary for the write are done in iomap_begin, so when
> >    the data aren't journaled, there is no need for keeping the transaction open
> >    until iomap_end.
> >
> >    (2) Transactions keep the gfs2 log flush lock held.  When
> >    iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
> >    gfs2_write_inode, which will try to flush the log.  This requires taking the
> >    log flush lock which is already held, resulting in a deadlock.
> >
> > Fix both of these issues by not keeping transactions open from iomap_begin to
> > iomap_end.  Instead, start a small transaction in page_prepare and end it in
> > page_done when necessary.
> >
> Unfortunately, this patch broke growing gfs2 filesystems. It is easy to
> reproduce:
>
> $ mkfs.gfs2 -t xxx:yyy /dev/xvdb  4369065
> $ mount /dev/xvdb /mnt
> $ gfs2_grow /mnt (doesn't finish)
> FS: Mount point:             /mnt
> FS: Device:                  /dev/xvdb
> FS: Size:                    4369062 (0x42aaa6)
> DEV: Length:                 13107200 (0xc80000)
> The file system will grow by 34133MB.
>
> Looking at the kernel log, I see it hits the following assertion and
> then hangs trying to withdraw the filesystem (which is a separate
> problem, presumably):
>
> gfs2: fsid=xxx:yyy.0: fatal: assertion "(nbuf <= tr->tr_blocks) &&
> (tr->tr_num_revoke <= tr->tr_revokes)" failed
>     function = gfs2_trans_end, file = fs/gfs2/trans.c, line = 117
> gfs2: fsid=xxx:yyy.0: about to withdraw this file system
>
> Rearranging the code so that it prints information about the transaction
> before the failed withdrawal attempt shows:
> gfs2: fsid=xxx:yyy.0: Transaction created at:
> iomap_write_begin.constprop.45+0xbc/0x380
> gfs2: fsid=xxx:yyy.0: blocks=1 revokes=0 reserved=8 touched=1
> gfs2: fsid=xxx:yyy.0: Buf 1/0 Databuf 1/0 Revoke 0/0
>
> Reverting this commit fixes the issue. Tested with git master as of
> today (16d72dd4891fe).

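Thanks for the error report. This turns out to be a rounding error in
gfs2_iomap_page_prepare: shifting len right by i_blkbits rounds down, so
the trailing partial block of a write is not reserved, and a write shorter
than one block reserves nothing beyond RES_DINODE. The patch below rounds
up instead and should help.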
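A minimal sketch of the arithmetic in plain C, outside the kernel. The helpers trans_blocks_old, trans_blocks_new and trans_end_ok are hypothetical. RES_DINODE is assumed to be 1, which matches "blocks=1" in the log above. The assertion model assumes nbuf is the sum of the "Buf" and "Databuf" counts that the log prints.

```c
/* Assumed value of RES_DINODE (one block for the dinode); it matches the
 * "blocks=1" reported for a write whose len >> i_blkbits is 0. */
#define RES_DINODE 1u

/* Before the fix: the shift truncates and drops a trailing partial block. */
unsigned int trans_blocks_old(unsigned int len, unsigned int blkbits)
{
	return RES_DINODE + (len >> blkbits);
}

/* With the fix: add blocksize - 1 before shifting, which rounds up. */
unsigned int trans_blocks_new(unsigned int len, unsigned int blkbits)
{
	unsigned int blocksize = 1u << blkbits;

	return RES_DINODE + ((len + blocksize - 1) >> blkbits);
}

/* Simplified model of the first half of the gfs2_trans_end assertion,
 * assuming nbuf = new metadata buffers + new data buffers. */
int trans_end_ok(unsigned int tr_blocks, unsigned int buf_new,
		 unsigned int databuf_new)
{
	return buf_new + databuf_new <= tr_blocks;
}
```

With 4096-byte blocks (blkbits = 12), a hypothetical 100-byte write reserves 1 block before the fix and 2 after it. Using the counts from the log (Buf 1, Databuf 1), only the second reservation satisfies nbuf <= tr_blocks. Like the patch, the sketch counts bytes only. A write that starts partway into a block can therefore touch one block more than this count: for example, 100 bytes at offset 1000 with 1024-byte blocks spans two blocks.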

Thanks,
Andreas

---
 fs/gfs2/bmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index f42718d..d2a3f038 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -994,9 +994,11 @@ static void gfs2_write_unlock(struct inode *inode)
 static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
 				   unsigned len, struct iomap *iomap)
 {
+	unsigned int blocks;
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 
-	return gfs2_trans_begin(sdp, RES_DINODE + (len >> inode->i_blkbits), 0);
+	blocks = (len + i_blocksize(inode) - 1) >> inode->i_blkbits;
+	return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
 }
 
 static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
-- 
1.8.3.1




Thread overview: 15+ messages
2019-05-07 20:31 [Cluster-devel] [GFS2 PATCH 01/12] gfs2: Fix loop in gfs2_rbm_find (v2) Andreas Gruenbacher
2019-05-07 20:31 ` [Cluster-devel] [GFS2 PATCH 02/12] gfs2: Fix lru_count going negative Andreas Gruenbacher
2019-05-07 20:31 ` [Cluster-devel] [GFS2 PATCH 03/12] gfs2: clean_journal improperly set sd_log_flush_head Andreas Gruenbacher
2019-05-07 20:31 ` [Cluster-devel] [GFS2 PATCH 04/12] gfs2: Fix occasional glock use-after-free Andreas Gruenbacher
2019-05-07 20:31 ` [Cluster-devel] [GFS2 PATCH 05/12] gfs2: Replace gl_revokes with a GLF flag Andreas Gruenbacher
2019-05-07 20:31 ` [Cluster-devel] [GFS2 PATCH 06/12] gfs2: Remove misleading comments in gfs2_evict_inode Andreas Gruenbacher
2019-05-07 20:31 ` [Cluster-devel] [GFS2 PATCH 07/12] gfs2: Remove unnecessary extern declarations Andreas Gruenbacher
2019-05-07 20:32 ` [Cluster-devel] [GFS2 PATCH 08/12] gfs2: Rename sd_log_le_{revoke, ordered} Andreas Gruenbacher
2019-05-07 20:32 ` [Cluster-devel] [GFS2 PATCH 09/12] gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke} Andreas Gruenbacher
2019-05-07 20:32 ` [Cluster-devel] [GFS2 PATCH 10/12] gfs2: fix race between gfs2_freeze_func and unmount Andreas Gruenbacher
2019-05-07 20:32 ` [Cluster-devel] [GFS2 PATCH 11/12] gfs2: Fix iomap write page reclaim deadlock Andreas Gruenbacher
2019-06-07 16:19   ` Ross Lagerwall
2019-06-08 12:16     ` Andreas Gruenbacher [this message]
2019-06-11  8:29       ` Ross Lagerwall
2019-05-07 20:32 ` [Cluster-devel] [GFS2 PATCH 12/12] gfs2: read journal in large chunks Andreas Gruenbacher
