* [PATCH v3 0/2] xfs: split up xfslogd global workqueue
@ 2014-11-13 19:23 Brian Foster
2014-11-13 19:24 ` [PATCH v3 1/2] xfs: replace global xfslogd wq with per-mount wq Brian Foster
2014-11-13 19:24 ` [PATCH RFC v3 2/2] xfs: split metadata and log buffer completion to separate workqueues Brian Foster
0 siblings, 2 replies; 5+ messages in thread
From: Brian Foster @ 2014-11-13 19:23 UTC (permalink / raw)
To: xfs
I've broken this up into two patches because the bugfix is easy (patch
1) and I don't want to rework it for the purpose of refactoring usage of
the queue. Patch 1 is a straightforward, tested, and backportable fix for
a specific problem, and it's no more difficult to refactor queue usage
from a per-mount queue than from a single global queue.
The metadata/log buffer I/O separation is not as straightforward as
defining separate queues. We need to also define a mechanism to split
the completions across multiple queues within the buffer code. The ioend
code does this for the data/unwritten queues via the io type. There are
a few different ways we can go from here with regard to buffers. For
example:
- Define per-buftarg workqueues and configure them appropriately for
data and log devices.
- Split buf completion processing based on a buffer flag (or some other
identification).
- Continue using a single queue (per-mount) for metadata/log
completions.
The first approach introduces variance in that the separation only
occurs with separate data and log devices. This is not ideal and doesn't
seem worthwhile to me. It doesn't change anything in the common case and
introduces a new regression vector for the uncommon case.
The second approach is implemented by the 2/2 RFC of this series using
the XBF_SYNCIO flag simply because it only appears to be used by log
buffers at the moment. If we take this approach, should we define
something like an XBF_LOGIO or XBF_HIGHPRI flag?
The third approach simply involves dropping patch 2/2 and using the
per-mount queue as we use the global queue today.
Brian
v3:
- Split bug fix and buf/log I/O split into separate patches.
- Rename xfslogd workqueue to xfs-buf.
v2: http://oss.sgi.com/archives/xfs/2014-11/msg00164.html
- Rename xfslogd workqueue to xfs-iodone.
v1: http://oss.sgi.com/archives/xfs/2014-10/msg00539.html
Brian Foster (2):
xfs: replace global xfslogd wq with per-mount wq
xfs: split metadata and log buffer completion to separate workqueues
fs/xfs/xfs_buf.c | 19 ++++++++-----------
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_super.c | 12 ++++++++++--
3 files changed, 19 insertions(+), 13 deletions(-)
--
1.8.3.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH v3 1/2] xfs: replace global xfslogd wq with per-mount wq
2014-11-13 19:23 [PATCH v3 0/2] xfs: split up xfslogd global workqueue Brian Foster
@ 2014-11-13 19:24 ` Brian Foster
2014-11-28 2:49 ` Dave Chinner
2014-11-13 19:24 ` [PATCH RFC v3 2/2] xfs: split metadata and log buffer completion to separate workqueues Brian Foster
1 sibling, 1 reply; 5+ messages in thread
From: Brian Foster @ 2014-11-13 19:24 UTC (permalink / raw)
To: xfs
The xfslogd workqueue is a global, single-job workqueue for buffer ioend
processing. This means we allow for a single work item at a time for all
possible XFS mounts on a system. fsstress testing in loopback XFS over
XFS configurations has reproduced xfslogd deadlocks due to the single
threaded nature of the queue and dependencies introduced between the
separate XFS instances by online discard (-o discard).
Discard over a loopback device converts the discard request to a hole
punch (fallocate) on the underlying file. Online discard requests are
issued synchronously and from xfslogd context in XFS, hence the xfslogd
workqueue is blocked in the upper fs waiting on a hole punch request to
be serviced in the lower fs. If the lower fs issues I/O that depends on
xfslogd to complete, both filesystems end up hung indefinitely. This is
reproduced reliably by generic/013 on XFS->loop->XFS test devices with
the '-o discard' mount option.
Further, docker implementations appear to use this kind of configuration
for container instance filesystems by default (container fs->dm->
loop->base fs) and therefore are subject to this deadlock when running
on XFS.
Replace the global xfslogd workqueue with a per-mount variant. This
guarantees each mount access to a single worker and prevents deadlocks
due to inter-fs dependencies introduced by discard. Since the queue is
only responsible for buffer iodone processing at this point in time,
rename xfslogd to xfs-buf.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/xfs/xfs_buf.c | 12 +-----------
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_super.c | 11 ++++++++++-
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 24b4ebe..c06d790 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -44,8 +44,6 @@
static kmem_zone_t *xfs_buf_zone;
-static struct workqueue_struct *xfslogd_workqueue;
-
#ifdef XFS_BUF_LOCK_TRACKING
# define XB_SET_OWNER(bp) ((bp)->b_last_holder = current->pid)
# define XB_CLEAR_OWNER(bp) ((bp)->b_last_holder = -1)
@@ -1053,7 +1051,7 @@ xfs_buf_ioend_async(
struct xfs_buf *bp)
{
INIT_WORK(&bp->b_iodone_work, xfs_buf_ioend_work);
- queue_work(xfslogd_workqueue, &bp->b_iodone_work);
+ queue_work(bp->b_target->bt_mount->m_buf_workqueue, &bp->b_iodone_work);
}
void
@@ -1882,15 +1880,8 @@ xfs_buf_init(void)
if (!xfs_buf_zone)
goto out;
- xfslogd_workqueue = alloc_workqueue("xfslogd",
- WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_FREEZABLE, 1);
- if (!xfslogd_workqueue)
- goto out_free_buf_zone;
-
return 0;
- out_free_buf_zone:
- kmem_zone_destroy(xfs_buf_zone);
out:
return -ENOMEM;
}
@@ -1898,6 +1889,5 @@ xfs_buf_init(void)
void
xfs_buf_terminate(void)
{
- destroy_workqueue(xfslogd_workqueue);
kmem_zone_destroy(xfs_buf_zone);
}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b0447c8..394bc71 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -168,6 +168,7 @@ typedef struct xfs_mount {
/* low free space thresholds */
struct xfs_kobj m_kobj;
+ struct workqueue_struct *m_buf_workqueue;
struct workqueue_struct *m_data_workqueue;
struct workqueue_struct *m_unwritten_workqueue;
struct workqueue_struct *m_cil_workqueue;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 9f622fe..03e3cc2 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -842,10 +842,16 @@ STATIC int
xfs_init_mount_workqueues(
struct xfs_mount *mp)
{
+ mp->m_buf_workqueue = alloc_workqueue("xfs-buf/%s",
+ WQ_MEM_RECLAIM|WQ_HIGHPRI|WQ_FREEZABLE, 1,
+ mp->m_fsname);
+ if (!mp->m_buf_workqueue)
+ goto out;
+
mp->m_data_workqueue = alloc_workqueue("xfs-data/%s",
WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
if (!mp->m_data_workqueue)
- goto out;
+ goto out_destroy_buf;
mp->m_unwritten_workqueue = alloc_workqueue("xfs-conv/%s",
WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
@@ -884,6 +890,8 @@ out_destroy_unwritten:
destroy_workqueue(mp->m_unwritten_workqueue);
out_destroy_data_iodone_queue:
destroy_workqueue(mp->m_data_workqueue);
+out_destroy_buf:
+ destroy_workqueue(mp->m_buf_workqueue);
out:
return -ENOMEM;
}
@@ -898,6 +906,7 @@ xfs_destroy_mount_workqueues(
destroy_workqueue(mp->m_cil_workqueue);
destroy_workqueue(mp->m_data_workqueue);
destroy_workqueue(mp->m_unwritten_workqueue);
+ destroy_workqueue(mp->m_buf_workqueue);
}
/*
--
1.8.3.1
* [PATCH RFC v3 2/2] xfs: split metadata and log buffer completion to separate workqueues
2014-11-13 19:23 [PATCH v3 0/2] xfs: split up xfslogd global workqueue Brian Foster
2014-11-13 19:24 ` [PATCH v3 1/2] xfs: replace global xfslogd wq with per-mount wq Brian Foster
@ 2014-11-13 19:24 ` Brian Foster
2014-11-28 2:48 ` Dave Chinner
1 sibling, 1 reply; 5+ messages in thread
From: Brian Foster @ 2014-11-13 19:24 UTC (permalink / raw)
To: xfs
XFS traditionally sends all buffer I/O completion work to a single
queue. This includes metadata buffer completion and log buffer
completion. The log buffer completion requires a high priority queue to
prevent stalls due to log forces getting stuck behind other queued work.
Rather than continue to prioritize all buffer I/O completion due to the
needs of log completion, split log buffer completion off to
m_log_workqueue and move the high priority flag from m_buf_workqueue to
m_log_workqueue.
[XXX: Use of XBF_SYNCIO is purely for demonstration. Define a new flag.]
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/xfs/xfs_buf.c | 9 ++++++++-
fs/xfs/xfs_super.c | 5 ++---
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index c06d790..58d729c 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1050,8 +1050,15 @@ void
xfs_buf_ioend_async(
struct xfs_buf *bp)
{
+ struct workqueue_struct *wq;
+
+ if (bp->b_flags & XBF_SYNCIO)
+ wq = bp->b_target->bt_mount->m_log_workqueue;
+ else
+ wq = bp->b_target->bt_mount->m_buf_workqueue;
+
INIT_WORK(&bp->b_iodone_work, xfs_buf_ioend_work);
- queue_work(bp->b_target->bt_mount->m_buf_workqueue, &bp->b_iodone_work);
+ queue_work(wq, &bp->b_iodone_work);
}
void
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 03e3cc2..4b8cd37 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -843,8 +843,7 @@ xfs_init_mount_workqueues(
struct xfs_mount *mp)
{
mp->m_buf_workqueue = alloc_workqueue("xfs-buf/%s",
- WQ_MEM_RECLAIM|WQ_HIGHPRI|WQ_FREEZABLE, 1,
- mp->m_fsname);
+ WQ_MEM_RECLAIM|WQ_FREEZABLE, 1, mp->m_fsname);
if (!mp->m_buf_workqueue)
goto out;
@@ -869,7 +868,7 @@ xfs_init_mount_workqueues(
goto out_destroy_cil;
mp->m_log_workqueue = alloc_workqueue("xfs-log/%s",
- WQ_FREEZABLE, 0, mp->m_fsname);
+ WQ_FREEZABLE|WQ_HIGHPRI, 0, mp->m_fsname);
if (!mp->m_log_workqueue)
goto out_destroy_reclaim;
--
1.8.3.1
* Re: [PATCH RFC v3 2/2] xfs: split metadata and log buffer completion to separate workqueues
2014-11-13 19:24 ` [PATCH RFC v3 2/2] xfs: split metadata and log buffer completion to separate workqueues Brian Foster
@ 2014-11-28 2:48 ` Dave Chinner
0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2014-11-28 2:48 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Thu, Nov 13, 2014 at 02:24:01PM -0500, Brian Foster wrote:
> XFS traditionally sends all buffer I/O completion work to a single
> queue. This includes metadata buffer completion and log buffer
> completion. The log buffer completion requires a high priority queue to
> prevent stalls due to log forces getting stuck behind other queued work.
>
> Rather than continue to prioritize all buffer I/O completion due to the
> needs of log completion, split log buffer completion off to
> m_log_workqueue and move the high priority flag from m_buf_workqueue to
> m_log_workqueue.
>
> [XXX: Use of XBF_SYNCIO is purely for demonstration. Define a new flag.]
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> fs/xfs/xfs_buf.c | 9 ++++++++-
> fs/xfs/xfs_super.c | 5 ++---
> 2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index c06d790..58d729c 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1050,8 +1050,15 @@ void
> xfs_buf_ioend_async(
> struct xfs_buf *bp)
> {
> + struct workqueue_struct *wq;
> +
> + if (bp->b_flags & XBF_SYNCIO)
> + wq = bp->b_target->bt_mount->m_log_workqueue;
> + else
> + wq = bp->b_target->bt_mount->m_buf_workqueue;
> +
> INIT_WORK(&bp->b_iodone_work, xfs_buf_ioend_work);
> - queue_work(bp->b_target->bt_mount->m_buf_workqueue, &bp->b_iodone_work);
> + queue_work(wq, &bp->b_iodone_work);
I can see what you are doing here, but I still think it would be
better to set this up at IO submission rather than taking all those
cacheline misses chasing pointers on IO completion. Adding an extra
pointer to the struct xfs_buf is not a big deal....
Otherwise this looks fine....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v3 1/2] xfs: replace global xfslogd wq with per-mount wq
2014-11-13 19:24 ` [PATCH v3 1/2] xfs: replace global xfslogd wq with per-mount wq Brian Foster
@ 2014-11-28 2:49 ` Dave Chinner
0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2014-11-28 2:49 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Thu, Nov 13, 2014 at 02:24:00PM -0500, Brian Foster wrote:
> The xfslogd workqueue is a global, single-job workqueue for buffer ioend
> processing. This means we allow for a single work item at a time for all
> possible XFS mounts on a system. fsstress testing in loopback XFS over
> XFS configurations has reproduced xfslogd deadlocks due to the single
> threaded nature of the queue and dependencies introduced between the
> separate XFS instances by online discard (-o discard).
>
> Discard over a loopback device converts the discard request to a hole
> punch (fallocate) on the underlying file. Online discard requests are
> issued synchronously and from xfslogd context in XFS, hence the xfslogd
> workqueue is blocked in the upper fs waiting on a hole punch request to
> be serviced in the lower fs. If the lower fs issues I/O that depends on
> xfslogd to complete, both filesystems end up hung indefinitely. This is
> reproduced reliably by generic/013 on XFS->loop->XFS test devices with
> the '-o discard' mount option.
>
> Further, docker implementations appear to use this kind of configuration
> for container instance filesystems by default (container fs->dm->
> loop->base fs) and therefore are subject to this deadlock when running
> on XFS.
>
> Replace the global xfslogd workqueue with a per-mount variant. This
> guarantees each mount access to a single worker and prevents deadlocks
> due to inter-fs dependencies introduced by discard. Since the queue is
> only responsible for buffer iodone processing at this point in time,
> rename xfslogd to xfs-buf.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Looks good. I'll take this as is and we can refine the way we point
to the workqueue in the patches that separate the log buffer
completions...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com