public inbox for linux-xfs@vger.kernel.org
* [PATCH] xfs: move global xfslogd workqueue to per-mount
@ 2014-10-31 16:34 Brian Foster
  2014-11-06 23:59 ` Dave Chinner
  0 siblings, 1 reply; 3+ messages in thread
From: Brian Foster @ 2014-10-31 16:34 UTC
  To: xfs

The xfslogd workqueue is a global, single-job workqueue for buffer ioend
processing. This means we allow for a single work item at a time for all
possible XFS mounts on a system. fsstress testing in loopback XFS over
XFS configurations has reproduced xfslogd deadlocks due to the single
threaded nature of the queue and dependencies introduced between the
separate XFS instances by online discard (-o discard).

Discard over a loopback device converts the discard request to a hole
punch (fallocate) on the underlying file. Online discard requests are
issued synchronously and from xfslogd context in XFS, hence the xfslogd
workqueue is blocked in the upper fs waiting on a hole punch request to
be serviced in the lower fs. If the lower fs issues I/O that depends on
xfslogd to complete, both filesystems end up hung indefinitely. This is
reproduced reliably by generic/013 on XFS->loop->XFS test devices with
the '-o discard' mount option.

Further, docker implementations appear to use this kind of configuration
for container instance filesystems by default (container fs->dm->
loop->base fs) and therefore are subject to this deadlock when running
on XFS.

Replace the global xfslogd workqueue with a per-mount variant. This
guarantees each mount its own worker and prevents deadlocks due to
inter-fs dependencies introduced by discard.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

Hi all,

Thoughts? An alternative was to increase max jobs on the existing
workqueue, but this seems more in line with how we manage workqueues
these days.
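
For reference, the alternative would have looked something like the
following in xfs_buf_init() (untested sketch; the max_active value of 4
is arbitrary, purely for illustration):

	/* keep the global queue, but allow more than one in-flight item */
	xfslogd_workqueue = alloc_workqueue("xfslogd",
				WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_FREEZABLE, 4);
	if (!xfslogd_workqueue)
		goto out_free_buf_zone;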

Brian

 fs/xfs/xfs_buf.c   | 13 ++-----------
 fs/xfs/xfs_mount.h |  1 +
 fs/xfs/xfs_super.c | 11 ++++++++++-
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 24b4ebe..758bc2e 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -44,8 +44,6 @@
 
 static kmem_zone_t *xfs_buf_zone;
 
-static struct workqueue_struct *xfslogd_workqueue;
-
 #ifdef XFS_BUF_LOCK_TRACKING
 # define XB_SET_OWNER(bp)	((bp)->b_last_holder = current->pid)
 # define XB_CLEAR_OWNER(bp)	((bp)->b_last_holder = -1)
@@ -1053,7 +1051,8 @@ xfs_buf_ioend_async(
 	struct xfs_buf	*bp)
 {
 	INIT_WORK(&bp->b_iodone_work, xfs_buf_ioend_work);
-	queue_work(xfslogd_workqueue, &bp->b_iodone_work);
+	queue_work(bp->b_target->bt_mount->m_xfslogd_workqueue,
+		   &bp->b_iodone_work);
 }
 
 void
@@ -1882,15 +1881,8 @@ xfs_buf_init(void)
 	if (!xfs_buf_zone)
 		goto out;
 
-	xfslogd_workqueue = alloc_workqueue("xfslogd",
-				WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_FREEZABLE, 1);
-	if (!xfslogd_workqueue)
-		goto out_free_buf_zone;
-
 	return 0;
 
- out_free_buf_zone:
-	kmem_zone_destroy(xfs_buf_zone);
  out:
 	return -ENOMEM;
 }
@@ -1898,6 +1890,5 @@ xfs_buf_init(void)
 void
 xfs_buf_terminate(void)
 {
-	destroy_workqueue(xfslogd_workqueue);
 	kmem_zone_destroy(xfs_buf_zone);
 }
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b0447c8..664a92b 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -168,6 +168,7 @@ typedef struct xfs_mount {
 						/* low free space thresholds */
 	struct xfs_kobj		m_kobj;
 
+	struct workqueue_struct *m_xfslogd_workqueue;
 	struct workqueue_struct	*m_data_workqueue;
 	struct workqueue_struct	*m_unwritten_workqueue;
 	struct workqueue_struct	*m_cil_workqueue;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 9f622fe..b85385c 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -842,10 +842,16 @@ STATIC int
 xfs_init_mount_workqueues(
 	struct xfs_mount	*mp)
 {
+	mp->m_xfslogd_workqueue = alloc_workqueue("xfslogd/%s",
+			WQ_MEM_RECLAIM|WQ_HIGHPRI|WQ_FREEZABLE, 1,
+			mp->m_fsname);
+	if (!mp->m_xfslogd_workqueue)
+		goto out;
+
 	mp->m_data_workqueue = alloc_workqueue("xfs-data/%s",
 			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
 	if (!mp->m_data_workqueue)
-		goto out;
+		goto out_destroy_xfslogd;
 
 	mp->m_unwritten_workqueue = alloc_workqueue("xfs-conv/%s",
 			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
@@ -884,6 +890,8 @@ out_destroy_unwritten:
 	destroy_workqueue(mp->m_unwritten_workqueue);
 out_destroy_data_iodone_queue:
 	destroy_workqueue(mp->m_data_workqueue);
+out_destroy_xfslogd:
+	destroy_workqueue(mp->m_xfslogd_workqueue);
 out:
 	return -ENOMEM;
 }
@@ -898,6 +906,7 @@ xfs_destroy_mount_workqueues(
 	destroy_workqueue(mp->m_cil_workqueue);
 	destroy_workqueue(mp->m_data_workqueue);
 	destroy_workqueue(mp->m_unwritten_workqueue);
+	destroy_workqueue(mp->m_xfslogd_workqueue);
 }
 
 /*
-- 
1.8.3.1


* Re: [PATCH] xfs: move global xfslogd workqueue to per-mount
  2014-10-31 16:34 [PATCH] xfs: move global xfslogd workqueue to per-mount Brian Foster
@ 2014-11-06 23:59 ` Dave Chinner
  2014-11-07 14:05   ` Brian Foster
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2014-11-06 23:59 UTC
  To: Brian Foster; +Cc: xfs

On Fri, Oct 31, 2014 at 12:34:31PM -0400, Brian Foster wrote:
> The xfslogd workqueue is a global, single-job workqueue for buffer ioend
> processing. This means we allow for a single work item at a time for all
> possible XFS mounts on a system. fsstress testing in loopback XFS over
> XFS configurations has reproduced xfslogd deadlocks due to the single
> threaded nature of the queue and dependencies introduced between the
> separate XFS instances by online discard (-o discard).
> 
> Discard over a loopback device converts the discard request to a hole
> punch (fallocate) on the underlying file. Online discard requests are
> issued synchronously and from xfslogd context in XFS, hence the xfslogd
> workqueue is blocked in the upper fs waiting on a hole punch request to
> be serviced in the lower fs. If the lower fs issues I/O that depends on
> xfslogd to complete, both filesystems end up hung indefinitely. This is
> reproduced reliably by generic/013 on XFS->loop->XFS test devices with
> the '-o discard' mount option.
> 
> Further, docker implementations appear to use this kind of configuration
> for container instance filesystems by default (container fs->dm->
> loop->base fs) and therefore are subject to this deadlock when running
> on XFS.
> 
> Replace the global xfslogd workqueue with a per-mount variant. This
> guarantees each mount its own worker and prevents deadlocks due to
> inter-fs dependencies introduced by discard.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> Hi all,
> 
> Thoughts? An alternative was to increase max jobs on the existing
> workqueue, but this seems more in line with how we manage workqueues
> these days.

First thing is that it's no longer a "log" workqueue. It's an async
buffer completion workqueue, so we really should rename it.
Especially as this change would mean we now have m_log_workqueue
for the log and m_xfslogd_workqueue for buffer completion...

Indeed, is the struct xfs_mount the right place for this? Shouldn't
it be on the relevant buftarg that the buffer is associated with?

> Brian
> 
>  fs/xfs/xfs_buf.c   | 13 ++-----------
>  fs/xfs/xfs_mount.h |  1 +
>  fs/xfs/xfs_super.c | 11 ++++++++++-
>  3 files changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 24b4ebe..758bc2e 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -44,8 +44,6 @@
>  
>  static kmem_zone_t *xfs_buf_zone;
>  
> -static struct workqueue_struct *xfslogd_workqueue;
> -
>  #ifdef XFS_BUF_LOCK_TRACKING
>  # define XB_SET_OWNER(bp)	((bp)->b_last_holder = current->pid)
>  # define XB_CLEAR_OWNER(bp)	((bp)->b_last_holder = -1)
> @@ -1053,7 +1051,8 @@ xfs_buf_ioend_async(
>  	struct xfs_buf	*bp)
>  {
>  	INIT_WORK(&bp->b_iodone_work, xfs_buf_ioend_work);
> -	queue_work(xfslogd_workqueue, &bp->b_iodone_work);
> +	queue_work(bp->b_target->bt_mount->m_xfslogd_workqueue,
> +		   &bp->b_iodone_work);
>  }

ie. queue_work(bp->b_target->bt_iodone_wq, &bp->b_iodone_work);
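
... with a matching pointer added to the buftarg and the queue allocated
when the buftarg is set up. Untested sketch, hypothetical field and
queue names:

	/* in struct xfs_buftarg */
	struct workqueue_struct	*bt_iodone_wq;

	/* at buftarg setup, e.g. in xfs_alloc_buftarg() */
	btp->bt_iodone_wq = alloc_workqueue("xfs-buf-iodone/%s",
			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_FREEZABLE, 1,
			mp->m_fsname);
	if (!btp->bt_iodone_wq)
		goto error;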

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: move global xfslogd workqueue to per-mount
  2014-11-06 23:59 ` Dave Chinner
@ 2014-11-07 14:05   ` Brian Foster
  0 siblings, 0 replies; 3+ messages in thread
From: Brian Foster @ 2014-11-07 14:05 UTC
  To: Dave Chinner; +Cc: xfs

On Fri, Nov 07, 2014 at 10:59:48AM +1100, Dave Chinner wrote:
> On Fri, Oct 31, 2014 at 12:34:31PM -0400, Brian Foster wrote:
> > The xfslogd workqueue is a global, single-job workqueue for buffer ioend
> > processing. This means we allow for a single work item at a time for all
> > possible XFS mounts on a system. fsstress testing in loopback XFS over
> > XFS configurations has reproduced xfslogd deadlocks due to the single
> > threaded nature of the queue and dependencies introduced between the
> > separate XFS instances by online discard (-o discard).
> > 
> > Discard over a loopback device converts the discard request to a hole
> > punch (fallocate) on the underlying file. Online discard requests are
> > issued synchronously and from xfslogd context in XFS, hence the xfslogd
> > workqueue is blocked in the upper fs waiting on a hole punch request to
> > be serviced in the lower fs. If the lower fs issues I/O that depends on
> > xfslogd to complete, both filesystems end up hung indefinitely. This is
> > reproduced reliably by generic/013 on XFS->loop->XFS test devices with
> > the '-o discard' mount option.
> > 
> > Further, docker implementations appear to use this kind of configuration
> > for container instance filesystems by default (container fs->dm->
> > loop->base fs) and therefore are subject to this deadlock when running
> > on XFS.
> > 
> > Replace the global xfslogd workqueue with a per-mount variant. This
> > guarantees each mount its own worker and prevents deadlocks due to
> > inter-fs dependencies introduced by discard.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > Hi all,
> > 
> > Thoughts? An alternative was to increase max jobs on the existing
> > workqueue, but this seems more in line with how we manage workqueues
> > these days.
> 
> First thing is that it's no longer a "log" workqueue. It's an async
> buffer completion workqueue, so we really should rename it.
> Especially as this change would mean we now have m_log_workqueue
> for the log and m_xfslogd_workqueue for buffer completion...
> 

Ok, sounds good. The name didn't make much sense to me given what it's
doing. ;) I guess it's historical.

> Indeed, is the struct xfs_mount the right place for this? Shouldn't
> it be on the relevant buftarg that the buffer is associated with?
> 

That makes sense from a generic design perspective: an iodone queue per
buffer target. It does introduce a behavior change whose side effects we
need to consider, though. This queue currently handles one request at a
time, and retaining that configuration for per-buftarg queues still
allows concurrency between log buf iodone processing and metadata buf
iodone processing when the log is on a separate device.
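
E.g. (untested, using the bt_iodone_wq name from your suggestion below,
and assuming an external log device so the log has its own buftarg):

	/*
	 * Hypothetical per-buftarg layout:
	 *
	 *   mp->m_ddev_targp->bt_iodone_wq   - metadata buffer completion
	 *   mp->m_logdev_targp->bt_iodone_wq - log buffer completion
	 *
	 * Each queue stays at max_active == 1, but log buffer completion
	 * no longer serializes against metadata buffer completion.
	 */
	queue_work(bp->b_target->bt_iodone_wq, &bp->b_iodone_work);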

It's not clear to me why this is a max_active=1 queue, so I'm more
hesitant to change behavior beyond a clear separation between mounts. Do
we have any serialization/locking hacks lying around that depend on this
condition? I also suspect this increases the possibility of things like
adding items to the AIL (xlog_iodone()) and pulling them off (e.g.,
xfs_iflush_done()) on separate CPUs, which makes me wonder whether there
are hidden performance ramifications to such a change.

Maybe none of this matters and the queue config is also a historical
relic..?

Brian

> > Brian
> > 
> >  fs/xfs/xfs_buf.c   | 13 ++-----------
> >  fs/xfs/xfs_mount.h |  1 +
> >  fs/xfs/xfs_super.c | 11 ++++++++++-
> >  3 files changed, 13 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 24b4ebe..758bc2e 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -44,8 +44,6 @@
> >  
> >  static kmem_zone_t *xfs_buf_zone;
> >  
> > -static struct workqueue_struct *xfslogd_workqueue;
> > -
> >  #ifdef XFS_BUF_LOCK_TRACKING
> >  # define XB_SET_OWNER(bp)	((bp)->b_last_holder = current->pid)
> >  # define XB_CLEAR_OWNER(bp)	((bp)->b_last_holder = -1)
> > @@ -1053,7 +1051,8 @@ xfs_buf_ioend_async(
> >  	struct xfs_buf	*bp)
> >  {
> >  	INIT_WORK(&bp->b_iodone_work, xfs_buf_ioend_work);
> > -	queue_work(xfslogd_workqueue, &bp->b_iodone_work);
> > +	queue_work(bp->b_target->bt_mount->m_xfslogd_workqueue,
> > +		   &bp->b_iodone_work);
> >  }
> 
> ie. queue_work(bp->b_target->bt_iodone_wq, &bp->b_iodone_work);
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
