cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter
  2019-03-06 11:00 [Cluster-devel] GFS2 deadlock in 4.19 (iomap/writeback?) Edwin Török
@ 2019-03-13 17:13 ` Andreas Gruenbacher
  2019-03-14 11:18   ` Ross Lagerwall
  2019-03-15 20:58   ` Andreas Gruenbacher
  0 siblings, 2 replies; 7+ messages in thread
From: Andreas Gruenbacher @ 2019-03-13 17:13 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Edwin,

On Wed, 6 Mar 2019 at 12:08, Edwin T?r?k <edvin.torok@citrix.com> wrote:
> Hello,
>
> I've been trying to debug a GFS2 deadlock that we see in our lab quite frequently with a 4.19 kernel. With 4.4 and older kernels we were not able to reproduce this.
> See below for lockdep dumps and stacktraces.

thanks for the thorough bug report.  Does the below fix work for you?

Thanks,
Andreas

--

Prevent page reclaim on the backing device of the filesystem in
gfs2_file_write_iter:

 - In gfs2_file_write_iter, iomap_file_buffered_write grabs the log
   flush lock (via iomap_apply -> iomap_begin -> gfs2_iomap_begin_write
   -> gfs2_trans_begin) and holds it for the duration of the write.

 - It then calls iomap_write_actor -> balance_dirty_pages.

 - If that triggers writeback on the same filesystem, we would try
   to grab the log flush lock again (via writeback_sb_inodes ->
   gfs2_write_inode -> gfs2_log_flush) and deadlock.

Prevent that by not setting backing_dev_info in gfs2_file_write_iter.

Reported-by: Edwin T?r?k <edvin.torok@citrix.com>
Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index a2dea5bc04276..8c7d296d58608 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -816,16 +816,13 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (ret <= 0)
 		goto out;
 
-	/* We can write back this queue in page reclaim */
-	current->backing_dev_info = inode_to_bdi(inode);
-
 	ret = file_remove_privs(file);
 	if (ret)
-		goto out2;
+		goto out;
 
 	ret = file_update_time(file);
 	if (ret)
-		goto out2;
+		goto out;
 
 	if (iocb->ki_flags & IOCB_DIRECT) {
 		struct address_space *mapping = file->f_mapping;
@@ -834,11 +831,11 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 		written = gfs2_file_direct_write(iocb, from);
 		if (written < 0 || !iov_iter_count(from))
-			goto out2;
+			goto out;
 
 		ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
 		if (unlikely(ret < 0))
-			goto out2;
+			goto out;
 		buffered = ret;
 
 		/*
@@ -867,8 +864,6 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 			iocb->ki_pos += ret;
 	}
 
-out2:
-	current->backing_dev_info = NULL;
 out:
 	inode_unlock(inode);
 	if (likely(ret > 0)) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter
  2019-03-13 17:13 ` [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter Andreas Gruenbacher
@ 2019-03-14 11:18   ` Ross Lagerwall
  2019-03-15 20:58   ` Andreas Gruenbacher
  1 sibling, 0 replies; 7+ messages in thread
From: Ross Lagerwall @ 2019-03-14 11:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 3/13/19 5:13 PM, Andreas Gruenbacher wrote:
> Hi Edwin,
> 
> On Wed, 6 Mar 2019 at 12:08, Edwin T?r?k <edvin.torok@citrix.com> wrote:
>> Hello,
>>
>> I've been trying to debug a GFS2 deadlock that we see in our lab quite frequently with a 4.19 kernel. With 4.4 and older kernels we were not able to reproduce this.
>> See below for lockdep dumps and stacktraces.
> 
> thanks for the thorough bug report.  Does the below fix work for you?
> 
Hi Andreas,

I've tested the patch and it doesn't fix the issue. As far as I can see, 
current->backing_dev_info is not used by any of the code called from 
balance_dirty_pages_ratelimited() so I don't see how it could work.

I found a way of consistently reproducing the issue almost immediately 
(tested with the latest master commit):

# cat a.py
import os

fd = os.open("f", os.O_CREAT|os.O_TRUNC|os.O_WRONLY)

for i in range(1000):
     os.mkdir("xxx" + str(i), 0777)

buf = 'x' * 4096

while True:
     count = os.write(fd, buf)
     if count <= 0:
         break

# cat b.py
import os
while True:
   os.mkdir("x", 0777)
   os.rmdir("x")

# echo 8192 > /proc/sys/vm/dirty_bytes
# cd /gfs2mnt
# (mkdir tmp1; cd tmp1; python2 ~/a.py) &
# (mkdir tmp2; cd tmp2; python2 ~/a.py) &
# (mkdir tmp3; cd tmp3; python2 ~/b.py) &

This should deadlock almost immediately. One of the processes will be 
waiting in balance_dirty_pages() and holding sd_log_flush_lock and 
several others will be waiting for sd_log_flush_lock.

I came up with the following patch which seems to resolve the issue by 
failing to write the inode if it can't take the lock, but it seems like 
a dirty workaround rather than a proper fix:

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index ee20ea42e7b5..0659560bb8c6 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -766,24 +766,29 @@ static void log_write_header(struct gfs2_sbd *sdp, 
u32 flags)
  }

  /**
- * gfs2_log_flush - flush incore transaction(s)
+ * __gfs2_log_flush - flush incore transaction(s)
   * @sdp: the filesystem
   * @gl: The glock structure to flush.  If NULL, flush the whole incore log
   * @flags: The log header flags: GFS2_LOG_HEAD_FLUSH_* and debug flags
+ * @wait: Wait for the sd_log_flush_lock
   *
   */

-void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
+int __gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 
flags,
+		      bool wait)
  {
  	struct gfs2_trans *tr;
  	enum gfs2_freeze_state state = atomic_read(&sdp->sd_freeze_state);

-	down_write(&sdp->sd_log_flush_lock);
+	if (wait)
+		down_write(&sdp->sd_log_flush_lock);
+	else if (!down_write_trylock(&sdp->sd_log_flush_lock))
+		return -EAGAIN;

  	/* Log might have been flushed while we waited for the flush lock */
  	if (gl && !test_bit(GLF_LFLUSH, &gl->gl_flags)) {
  		up_write(&sdp->sd_log_flush_lock);
-		return;
+		return 0;
  	}
  	trace_gfs2_log_flush(sdp, 1, flags);

@@ -857,6 +862,12 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct 
gfs2_glock *gl, u32 flags)
  	up_write(&sdp->sd_log_flush_lock);

  	kfree(tr);
+	return 0;
+}
+
+void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
+{
+	__gfs2_log_flush(sdp, gl, flags, true);
  }

  /**
diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
index 20241436126d..2404167004bb 100644
--- a/fs/gfs2/log.h
+++ b/fs/gfs2/log.h
@@ -72,6 +72,8 @@ extern void gfs2_log_release(struct gfs2_sbd *sdp, 
unsigned int blks);
  extern int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks);
  extern void gfs2_write_log_header(struct gfs2_sbd *sdp, struct 
gfs2_jdesc *jd,
  				  u64 seq, u32 tail, u32 flags, int op_flags);
+extern int __gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl,
+			    u32 type, bool wait);
  extern void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl,
  			   u32 type);
  extern void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans 
*trans);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index a971862b186e..369502a16ad6 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -756,10 +756,14 @@ static int gfs2_write_inode(struct inode *inode, 
struct writeback_control *wbc)
  	int ret = 0;
  	bool flush_all = (wbc->sync_mode == WB_SYNC_ALL || gfs2_is_jdata(ip));

-	if (flush_all)
-		gfs2_log_flush(GFS2_SB(inode), ip->i_gl,
+	if (flush_all) {
+		ret = __gfs2_log_flush(GFS2_SB(inode), ip->i_gl,
  			       GFS2_LOG_HEAD_FLUSH_NORMAL |
-			       GFS2_LFC_WRITE_INODE);
+			       GFS2_LFC_WRITE_INODE,
+			       wbc->sync_mode == WB_SYNC_ALL);
+		if (ret)
+			return ret;
+	}
  	if (bdi->wb.dirty_exceeded)
  		gfs2_ail1_flush(sdp, wbc);
  	else
---

Regards,
-- 
Ross Lagerwall



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter
  2019-03-13 17:13 ` [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter Andreas Gruenbacher
  2019-03-14 11:18   ` Ross Lagerwall
@ 2019-03-15 20:58   ` Andreas Gruenbacher
  1 sibling, 0 replies; 7+ messages in thread
From: Andreas Gruenbacher @ 2019-03-15 20:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Ross,

On Thu, 14 Mar 2019 at 12:18, Ross Lagerwall <ross.lagerwall@citrix.com> wrote:
> On 3/13/19 5:13 PM, Andreas Gruenbacher wrote:
> > Hi Edwin,
> >
> > On Wed, 6 Mar 2019 at 12:08, Edwin T?r?k <edvin.torok@citrix.com>
> > wrote:
> >> Hello,
> >>
> >> I've been trying to debug a GFS2 deadlock that we see in our lab
> >> quite frequently with a 4.19 kernel. With 4.4 and older kernels we
> >> were not able to reproduce this.
> >> See below for lockdep dumps and stacktraces.
> >
> > thanks for the thorough bug report.  Does the below fix work for
> > you?
> >
> Hi Andreas,
>
> I've tested the patch and it doesn't fix the issue. As far as I can see,
> current->backing_dev_info is not used by any of the code called from
> balance_dirty_pages_ratelimited() so I don't see how it could work.

yes, I see now.

> I found a way of consistently reproducing the issue almost immediately
> (tested with the latest master commit):
>
> # cat a.py
> import os
>
> fd = os.open("f", os.O_CREAT|os.O_TRUNC|os.O_WRONLY)
>
> for i in range(1000):
>      os.mkdir("xxx" + str(i), 0777)
>
> buf = 'x' * 4096
>
> while True:
>      count = os.write(fd, buf)
>      if count <= 0:
>          break
>
> # cat b.py
> import os
> while True:
>    os.mkdir("x", 0777)
>    os.rmdir("x")
>
> # echo 8192 > /proc/sys/vm/dirty_bytes
> # cd /gfs2mnt
> # (mkdir tmp1; cd tmp1; python2 ~/a.py) &
> # (mkdir tmp2; cd tmp2; python2 ~/a.py) &
> # (mkdir tmp3; cd tmp3; python2 ~/b.py) &
>
> This should deadlock almost immediately. One of the processes will be
> waiting in balance_dirty_pages() and holding sd_log_flush_lock and
> several others will be waiting for sd_log_flush_lock.

This doesn't work for me: the python processes don't even start properly
when dirty_bytes is set so low.

> I came up with the following patch which seems to resolve the issue by
> failing to write the inode if it can't take the lock, but it seems
> like a dirty workaround rather than a proper fix:
>
> [...]

Looking at ext4_dirty_inode, it seems that we should just be able to
bail out of gfs2_write_inode an return 0 when PF_MEMALLOC is set in
current->flags.

Also, we should probably add the current->flags checks from
xfs_do_writepage to gfs2_writepage_common.

So what do you get with the below patch?

Thanks,
Andreas

---
 fs/gfs2/aops.c  | 7 +++++++
 fs/gfs2/super.c | 4 ++++
 2 files changed, 11 insertions(+)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 05dd78f..694ff91 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -102,6 +102,13 @@ static int gfs2_writepage_common(struct page *page,
 	pgoff_t end_index = i_size >> PAGE_SHIFT;
 	unsigned offset;
 
+	/* (see xfs_do_writepage) */
+	if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
+			 PF_MEMALLOC))
+		goto redirty;
+	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
+		goto redirty;
+
 	if (gfs2_assert_withdraw(sdp, gfs2_glock_is_held_excl(ip->i_gl)))
 		goto out;
 	if (current->journal_info)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index ca71163..540535c 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -756,6 +756,10 @@ static int gfs2_write_inode(struct inode *inode, struct writeback_control *wbc)
 	int ret = 0;
 	bool flush_all = (wbc->sync_mode == WB_SYNC_ALL || gfs2_is_jdata(ip));
 
+	/* (see ext4_dirty_inode) */
+	if (current->flags & PF_MEMALLOC)
+		return 0;
+
 	if (flush_all)
 		gfs2_log_flush(GFS2_SB(inode), ip->i_gl,
 			       GFS2_LOG_HEAD_FLUSH_NORMAL |
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH] gfs2: Prevent writeback in  gfs2_file_write_iter
       [not found] <19:84>
@ 2019-03-16 23:58 ` Mark Syms
  2019-03-17 20:06   ` Andreas Gruenbacher
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Syms @ 2019-03-16 23:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Sadly, this doesn't help and seems to make the situation worse. Our automated tests were previously seeing about 5% failure rate and with this patch its 20%. We still need to verify that they're all down to the same failure but I thought it better to give some early feedback.

Surprised that the script didn't run for you but the limit on dirty bytes was just to get it to occur more frequently. You may have success with larger values.

Mark

Message: 1
Date: Fri, 15 Mar 2019 21:58:12 +0100
From: Andreas Gruenbacher <agruenba at redhat.com<mailto:agruenba@redhat.com>>
To: Ross Lagerwall <ross.lagerwall at citrix.com<mailto:ross.lagerwall@citrix.com>>
Cc: igor.druzhinin at citrix.com<mailto:igor.druzhinin@citrix.com>, Sergey Dyasli
        <sergey.dyasli at citrix.com<mailto:sergey.dyasli@citrix.com>>,     Mark Syms <Mark.Syms at citrix.com<mailto:Mark.Syms@citrix.com>>,
        cluster-devel at redhat.com<mailto:cluster-devel@redhat.com>
Subject: Re: [Cluster-devel] [PATCH] gfs2: Prevent writeback in
        gfs2_file_write_iter
Message-ID: <20190315205812.22727-1-agruenba at redhat.com<mailto:20190315205812.22727-1-agruenba@redhat.com>>
Content-Type: text/plain; charset=UTF-8

Hi Ross,

On Thu, 14 Mar 2019 at 12:18, Ross Lagerwall <ross.lagerwall at citrix.com<mailto:ross.lagerwall@citrix.com>> wrote:
> On 3/13/19 5:13 PM, Andreas Gruenbacher wrote:
> > Hi Edwin,
> >
> > On Wed, 6 Mar 2019 at 12:08, Edwin T?r?k <edvin.torok at citrix.com<mailto:edvin.torok@citrix.com>>
> > wrote:
> >> Hello,
> >>
> >> I've been trying to debug a GFS2 deadlock that we see in our lab
> >> quite frequently with a 4.19 kernel. With 4.4 and older kernels we
> >> were not able to reproduce this.
> >> See below for lockdep dumps and stacktraces.
> >
> > thanks for the thorough bug report.  Does the below fix work for
> > you?
> >
> Hi Andreas,
>
> I've tested the patch and it doesn't fix the issue. As far as I can see,
> current->backing_dev_info is not used by any of the code called from
> balance_dirty_pages_ratelimited() so I don't see how it could work.

yes, I see now.

> I found a way of consistently reproducing the issue almost immediately
> (tested with the latest master commit):
>
> # cat a.py
> import os
>
> fd = os.open("f", os.O_CREAT|os.O_TRUNC|os.O_WRONLY)
>
> for i in range(1000):
>      os.mkdir("xxx" + str(i), 0777)
>
> buf = 'x' * 4096
>
> while True:
>      count = os.write(fd, buf)
>      if count <= 0:
>          break
>
> # cat b.py
> import os
> while True:
>    os.mkdir("x", 0777)
>    os.rmdir("x")
>
> # echo 8192 > /proc/sys/vm/dirty_bytes
> # cd /gfs2mnt
> # (mkdir tmp1; cd tmp1; python2 ~/a.py) &
> # (mkdir tmp2; cd tmp2; python2 ~/a.py) &
> # (mkdir tmp3; cd tmp3; python2 ~/b.py) &
>
> This should deadlock almost immediately. One of the processes will be
> waiting in balance_dirty_pages() and holding sd_log_flush_lock and
> several others will be waiting for sd_log_flush_lock.

This doesn't work for me: the python processes don't even start properly
when dirty_bytes is set so low.

> I came up with the following patch which seems to resolve the issue by
> failing to write the inode if it can't take the lock, but it seems
> like a dirty workaround rather than a proper fix:
>
> [...]

Looking at ext4_dirty_inode, it seems that we should just be able to
bail out of gfs2_write_inode an return 0 when PF_MEMALLOC is set in
current->flags.

Also, we should probably add the current->flags checks from
xfs_do_writepage to gfs2_writepage_common.

So what do you get with the below patch?

Thanks,
Andreas

---
 fs/gfs2/aops.c  | 7 +++++++
 fs/gfs2/super.c | 4 ++++
 2 files changed, 11 insertions(+)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 05dd78f..694ff91 100644<tel:100644>
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -102,6 +102,13 @@ static int gfs2_writepage_common(struct page *page,
         pgoff_t end_index = i_size >> PAGE_SHIFT;
         unsigned offset;

+       /* (see xfs_do_writepage) */
+       if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
+                        PF_MEMALLOC))
+               goto redirty;
+       if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
+               goto redirty;
+
         if (gfs2_assert_withdraw(sdp, gfs2_glock_is_held_excl(ip->i_gl)))
                 goto out;
         if (current->journal_info)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index ca71163..540535c 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -756,6 +756,10 @@ static int gfs2_write_inode(struct inode *inode, struct writeback_control *wbc)
         int ret = 0;
         bool flush_all = (wbc->sync_mode == WB_SYNC_ALL || gfs2_is_jdata(ip));

+       /* (see ext4_dirty_inode) */
+       if (current->flags & PF_MEMALLOC)
+               return 0;
+
         if (flush_all)
                 gfs2_log_flush(GFS2_SB(inode), ip->i_gl,
                                GFS2_LOG_HEAD_FLUSH_NORMAL |
--
1.8.3.1



------------------------------

_______________________________________________
Cluster-devel mailing list
Cluster-devel at redhat.com<mailto:Cluster-devel@redhat.com>
https://www.redhat.com/mailman/listinfo/cluster-devel

End of Cluster-devel Digest, Vol 154, Issue 8
*********************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20190316/c07148c0/attachment.htm>

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter
  2019-03-16 23:58 ` [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter Mark Syms
@ 2019-03-17 20:06   ` Andreas Gruenbacher
  2019-03-18 15:10     ` Mark Syms
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Gruenbacher @ 2019-03-17 20:06 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Sun, 17 Mar 2019 at 00:59, Mark Syms <Mark.Syms@citrix.com> wrote:
>
> Sadly, this doesn't help and seems to make the situation worse. Our automated tests were previously seeing about 5% failure rate and with this patch its 20%.

So the PF_MEMALLOC flag doesn't get set on that code path. In that
case, we should probably check wbc->sync_mode instead. How about the
following?

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 540535c..36de2f7 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -757,7 +757,7 @@ static int gfs2_write_inode(struct inode *inode,
struct writeback_control *wbc)
     bool flush_all = (wbc->sync_mode == WB_SYNC_ALL || gfs2_is_jdata(ip));

     /* (see ext4_dirty_inode) */
-    if (current->flags & PF_MEMALLOC)
+    if (wbc->sync_mode != WB_SYNC_ALL)
         return 0;

     if (flush_all)

It may help to wrap that condition in WARN_ON_ONCE for debugging.

> We still need to verify that they're all down to the same failure but I thought it better to give some early feedback.
>
> Surprised that the script didn't run for you but the limit on dirty bytes was just to get it to occur more frequently. You may have success with larger values.

I'll keep trying to reproduce.

Thanks,
Andreas



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter
  2019-03-17 20:06   ` Andreas Gruenbacher
@ 2019-03-18 15:10     ` Mark Syms
  2019-03-18 16:30       ` Andreas Gruenbacher
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Syms @ 2019-03-18 15:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

-----Original Message-----
From: Andreas Gruenbacher <agruenba@redhat.com> 
Sent: 17 March 2019 20:06
To: Mark Syms <Mark.Syms@citrix.com>
Cc: cluster-devel at redhat.com; Sergey Dyasli <sergey.dyasli@citrix.com>; Igor Druzhinin <igor.druzhinin@citrix.com>; Edvin Torok <edvin.torok@citrix.com>
Subject: Re: [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter

On Sun, 17 Mar 2019 at 00:59, Mark Syms <Mark.Syms@citrix.com> wrote:
>
> Sadly, this doesn't help and seems to make the situation worse. Our automated tests were previously seeing about 5% failure rate and with this patch its 20%.

So the PF_MEMALLOC flag doesn't get set on that code path. In that case, we should probably check wbc->sync_mode instead. How about the following?

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 540535c..36de2f7 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -757,7 +757,7 @@ static int gfs2_write_inode(struct inode *inode, struct writeback_control *wbc)
     bool flush_all = (wbc->sync_mode == WB_SYNC_ALL || gfs2_is_jdata(ip));

     /* (see ext4_dirty_inode) */
-    if (current->flags & PF_MEMALLOC)
+    if (wbc->sync_mode != WB_SYNC_ALL)
         return 0;

     if (flush_all)

It may help to wrap that condition in WARN_ON_ONCE for debugging.

[Mark Syms] That works better. We were wondering whether it might be a bit too aggressive though in that it skips writing the inode entirely unless we have WB_SYNC_ALL whereas the patch that Ross Lagerwall posted originally would use a trylock in the case where we don't have WB_SYNC_ALL and then fail out. Whether or not this should just silently return 0 or -EAGAIN as Ross' patch does is a matter of style I guess. What are your thoughts on this?

Thanks,

Mark.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter
  2019-03-18 15:10     ` Mark Syms
@ 2019-03-18 16:30       ` Andreas Gruenbacher
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Gruenbacher @ 2019-03-18 16:30 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon, 18 Mar 2019 at 16:15, Mark Syms <Mark.Syms@citrix.com> wrote:
> [Mark Syms] That works better.

Okay, great.

> We were wondering whether it might be a bit too aggressive though in that it skips writing the inode entirely unless we have WB_SYNC_ALL whereas the patch that Ross Lagerwall posted originally would use a trylock in the case where we don't have WB_SYNC_ALL and then fail out.

Yes, that may well be true. I'll have to think about it some more;
maybe we can avoid the trylock somehow else.

> Whether or not this should just silently return 0 or -EAGAIN as Ross' patch does is a matter of style I guess. What are your thoughts on this?

Well, write_inode is allowed to skip inodes in the WB_SYNC_NONE case.
If we return an error here, that error might leak and create nasty
surprises, so we should better not.

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-03-18 16:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <19:84>
2019-03-16 23:58 ` [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter Mark Syms
2019-03-17 20:06   ` Andreas Gruenbacher
2019-03-18 15:10     ` Mark Syms
2019-03-18 16:30       ` Andreas Gruenbacher
2019-03-06 11:00 [Cluster-devel] GFS2 deadlock in 4.19 (iomap/writeback?) Edwin Török
2019-03-13 17:13 ` [Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter Andreas Gruenbacher
2019-03-14 11:18   ` Ross Lagerwall
2019-03-15 20:58   ` Andreas Gruenbacher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).