public inbox for linux-xfs@vger.kernel.org
* [RFC, PATCH 0/5] xfs: Reduce OOM kill problems under heavy load
@ 2011-02-22 22:16 Dave Chinner
  2011-02-22 22:16 ` [PATCH 1/5] xfs: introduce inode cluster buffer trylocks for xfs_iflush Dave Chinner
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Dave Chinner @ 2011-02-22 22:16 UTC (permalink / raw)
  To: xfs; +Cc: chris.mason

Chris Mason reported recently that a concurrent stress test (basically copying
the linux kernel tree 20 times, verifying md5sums and deleting it in a loop
concurrently) under low memory conditions was triggering the OOM killer
much more easily than for btrfs.

Turns out there are two main problems. The first is that unlinked inodes were
not being reclaimed fast enough, leading to the OOM being declared while there
were large numbers of reclaimable inodes still around. The second was that
atime updates due to the verify step were creating large numbers of dirty
inodes at the VFS level that were not being written back, and hence not made
reclaimable, before the system declared OOM and killed stuff.

The first problem is fixed by making background inode reclaim more frequent and
faster, kicking background reclaim from the inode cache shrinker so that when
memory is low we always have background inode reclaim in progress, and finally
making the shrinker reclaim scan block waiting on inodes to reclaim. This last
step throttles memory reclaim to the speed at which we can reclaim inodes, a
key step needed to prevent memory reclaim from declaring OOM while there are
still reclaimable inodes around. The background inode reclaim prevents this
synchronous flush from finding dirty inodes and blocking on them in most cases
and hence prevents performance regressions in more common workloads due to
reclaim stalls.

To enable this new functionality, the xfssyncd thread is replaced with a
global workqueue, and the existing xfssyncd work is moved onto it. Hence all
filesystems share the same workqueue and we remove all the xfssyncd threads
from the system. The ENOSPC inode flush is converted to use the workqueue,
and optimised to only allow a single flush at a time. This significantly
speeds up ENOSPC processing under concurrent workloads as it removes all the
unnecessary scanning that every single ENOSPC event currently queues to the
xfssyncd. Finally, a new inode reclaim worker is added to the workqueue that
runs 5x more frequently than the xfssyncd to do the background inode reclaim
scan.

The second problem is fixed simply by making the XFS inode cache shrinker kick
the bdi flusher to write back inodes if the bdi flusher is not already active.
This ensures that in low memory situations we are always actively writing back
inodes that are dirty at the VFS level and hence preventing them from building
up in an unreclaimable state. Once again this does not affect performance in
non-memory constrained situations.

The result is not yet perfect - the stress test still triggers the OOM killer
somewhere between 3-6 hours into the test on a CONFIG_XFS_DEBUG kernel with
lockdep enabled (so inodes consume roughly 2x the memory of a production
kernel), though this is a marked improvement. The OOM kill trigger appears to
be a different one to the above two, so expect more patches to address that
soon.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 1/5] xfs: introduce inode cluster buffer trylocks for xfs_iflush
  2011-02-22 22:16 [RFC, PATCH 0/5] xfs: Reduce OOM kill problems under heavy load Dave Chinner
@ 2011-02-22 22:16 ` Dave Chinner
  2011-03-03 15:55   ` Christoph Hellwig
  2011-02-22 22:16 ` [PATCH 2/5] xfs: introduce a xfssyncd workqueue Dave Chinner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2011-02-22 22:16 UTC (permalink / raw)
  To: xfs; +Cc: chris.mason

From: Dave Chinner <dchinner@redhat.com>

There is an ABBA deadlock between synchronous inode flushing in
xfs_reclaim_inode and xfs_icluster_free. xfs_icluster_free locks the
buffer, then takes inode ilocks, whilst synchronous reclaim takes
the ilock followed by the buffer lock in xfs_iflush().

To avoid this deadlock, separate the inode cluster buffer locking
semantics from the synchronous inode flush semantics, allowing
callers to attempt to lock the buffer but still issue synchronous IO
if they can get the buffer. This requires xfs_iflush() calls that
currently use non-blocking semantics to pass SYNC_TRYLOCK rather
than 0 as the flags parameter.

This allows xfs_reclaim_inode to avoid the deadlock on the buffer
lock and detect the failure so that it can drop the inode ilock and
restart the reclaim attempt on the inode. This allows
xfs_ifree_cluster to obtain the inode lock, mark the inode stale and
release it, and hence defuses the deadlock situation. It also has the
pleasant side effect of avoiding IO in xfs_reclaim_inode when it
next tries to reclaim the inode, as it is now marked stale.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_super.c |    2 +-
 fs/xfs/linux-2.6/xfs_sync.c  |   30 +++++++++++++++++++++++++++---
 fs/xfs/linux-2.6/xfs_sync.h  |    1 +
 fs/xfs/xfs_inode.c           |    2 +-
 fs/xfs/xfs_inode_item.c      |    6 +++---
 5 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 4b2b1c7..e010830 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1077,7 +1077,7 @@ xfs_fs_write_inode(
 			error = 0;
 			goto out_unlock;
 		}
-		error = xfs_iflush(ip, 0);
+		error = xfs_iflush(ip, SYNC_TRYLOCK);
 	}
 
  out_unlock:
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 6c10f1d..594cd82 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -761,8 +761,10 @@ xfs_reclaim_inode(
 	struct xfs_perag	*pag,
 	int			sync_mode)
 {
-	int	error = 0;
+	int	error;
 
+restart:
+	error = 0;
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if (!xfs_iflock_nowait(ip)) {
 		if (!(sync_mode & SYNC_WAIT))
@@ -788,9 +790,31 @@ xfs_reclaim_inode(
 	if (xfs_inode_clean(ip))
 		goto reclaim;
 
-	/* Now we have an inode that needs flushing */
-	error = xfs_iflush(ip, sync_mode);
+	/*
+	 * Now we have an inode that needs flushing.
+	 *
+	 * We do a nonblocking flush here even if we are doing a SYNC_WAIT
+	 * reclaim as we can deadlock with inode cluster removal.
+	 * xfs_ifree_cluster() can lock the inode buffer before it locks the
+	 * ip->i_lock, and we are doing the exact opposite here. As a result,
+	 * doing a blocking xfs_itobp() to get the cluster buffer will result
+	 * in an ABBA deadlock with xfs_ifree_cluster().
+	 *
+	 * As xfs_ifree_cluster() must gather all inodes that are active in the
+	 * cache to mark them stale, if we hit this case we don't actually want
+	 * to do IO here - we want the inode marked stale so we can simply
+	 * reclaim it. Hence if we get an EAGAIN error on a SYNC_WAIT flush,
+	 * just unlock the inode, back off and try again. Hopefully the next
+	 * pass through will see the stale flag set on the inode.
+	 */
+	error = xfs_iflush(ip, SYNC_TRYLOCK | sync_mode);
 	if (sync_mode & SYNC_WAIT) {
+		if (error == EAGAIN) {
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			/* backoff longer than in xfs_ifree_cluster */
+			delay(2);
+			goto restart;
+		}
 		xfs_iflock(ip);
 		goto reclaim;
 	}
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 32ba662..0ae48ff 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -34,6 +34,7 @@ typedef struct xfs_sync_work {
 
 int xfs_syncd_init(struct xfs_mount *mp);
 void xfs_syncd_stop(struct xfs_mount *mp);
+void xfs_syncd_queue_sync(struct xfs_mount *mp, int flags);
 
 int xfs_quiesce_data(struct xfs_mount *mp);
 void xfs_quiesce_attr(struct xfs_mount *mp);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 6b3424b..a44c015 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2834,7 +2834,7 @@ xfs_iflush(
 	 * Get the buffer containing the on-disk inode.
 	 */
 	error = xfs_itobp(mp, NULL, ip, &dip, &bp,
-				(flags & SYNC_WAIT) ? XBF_LOCK : XBF_TRYLOCK);
+				(flags & SYNC_TRYLOCK) ? XBF_TRYLOCK : XBF_LOCK);
 	if (error || !bp) {
 		xfs_ifunlock(ip);
 		return error;
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index fd4f398..46cc401 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -760,11 +760,11 @@ xfs_inode_item_push(
 	 * Push the inode to it's backing buffer. This will not remove the
 	 * inode from the AIL - a further push will be required to trigger a
 	 * buffer push. However, this allows all the dirty inodes to be pushed
-	 * to the buffer before it is pushed to disk. THe buffer IO completion
-	 * will pull th einode from the AIL, mark it clean and unlock the flush
+	 * to the buffer before it is pushed to disk. The buffer IO completion
+	 * will pull the inode from the AIL, mark it clean and unlock the flush
 	 * lock.
 	 */
-	(void) xfs_iflush(ip, 0);
+	(void) xfs_iflush(ip, SYNC_TRYLOCK);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 }
 
-- 
1.7.2.3



* [PATCH 2/5] xfs: introduce a xfssyncd workqueue
  2011-02-22 22:16 [RFC, PATCH 0/5] xfs: Reduce OOM kill problems under heavy load Dave Chinner
  2011-02-22 22:16 ` [PATCH 1/5] xfs: introduce inode cluster buffer trylocks for xfs_iflush Dave Chinner
@ 2011-02-22 22:16 ` Dave Chinner
  2011-02-22 22:16 ` [PATCH 3/5] xfs: convert ENOSPC inode flushing to use new syncd workqueue Dave Chinner
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2011-02-22 22:16 UTC (permalink / raw)
  To: xfs; +Cc: chris.mason

From: Dave Chinner <dchinner@redhat.com>

All of the work xfssyncd does is background functionality. There is
no need for a thread per filesystem to do this work - it can all be
managed by a global workqueue now that workqueues manage concurrency
effectively.

Introduce a new global xfssyncd workqueue, and convert the periodic
work to use this new functionality. To do this, use a delayed work
construct to schedule the next run of the periodic sync work
for the filesystem. When the sync work is complete, queue a new
delayed work for the next run of the sync work.

For laptop mode, we wait for the sync work to complete, so ensure
that the sync work queuing interface can flush and wait for work to
complete. This enables the workqueue infrastructure to replace the
current sequence number and wakeup mechanism.

Because the sync work does non-trivial amounts of work, mark the
new workqueue as CPU intensive.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_super.c |   40 +++++++++---------
 fs/xfs/linux-2.6/xfs_sync.c  |   96 +++++++++++++++++++++++-------------------
 fs/xfs/linux-2.6/xfs_sync.h  |    3 +
 fs/xfs/xfs_mount.h           |    4 +-
 4 files changed, 76 insertions(+), 67 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index e010830..73a17b1 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1190,22 +1190,12 @@ xfs_fs_sync_fs(
 		return -error;
 
 	if (laptop_mode) {
-		int	prev_sync_seq = mp->m_sync_seq;
-
 		/*
 		 * The disk must be active because we're syncing.
 		 * We schedule xfssyncd now (now that the disk is
 		 * active) instead of later (when it might not be).
 		 */
-		wake_up_process(mp->m_sync_task);
-		/*
-		 * We have to wait for the sync iteration to complete.
-		 * If we don't, the disk activity caused by the sync
-		 * will come after the sync is completed, and that
-		 * triggers another sync from laptop mode.
-		 */
-		wait_event(mp->m_wait_single_sync_task,
-				mp->m_sync_seq != prev_sync_seq);
+		xfs_syncd_queue_sync(mp, SYNC_WAIT);
 	}
 
 	return 0;
@@ -1491,7 +1481,6 @@ xfs_fs_fill_super(
 	atomic_set(&mp->m_active_trans, 0);
 	INIT_LIST_HEAD(&mp->m_sync_list);
 	spin_lock_init(&mp->m_sync_lock);
-	init_waitqueue_head(&mp->m_wait_single_sync_task);
 
 	mp->m_super = sb;
 	sb->s_fs_info = mp;
@@ -1818,26 +1807,35 @@ init_xfs_fs(void)
 	if (error)
 		goto out_cleanup_procfs;
 
+	xfs_syncd_wq = alloc_workqueue("xfssyncd", WQ_CPU_INTENSIVE, 8);
+	if (!xfs_syncd_wq) {
+		error = -ENOMEM;
+		goto out_sysctl_unregister;
+	}
+	mutex_init(&xfs_syncd_lock);
+
 	vfs_initquota();
 
 	error = register_filesystem(&xfs_fs_type);
 	if (error)
-		goto out_sysctl_unregister;
+		goto out_destroy_xfs_syncd;
 	return 0;
 
- out_sysctl_unregister:
+out_destroy_xfs_syncd:
+	destroy_workqueue(xfs_syncd_wq);
+out_sysctl_unregister:
 	xfs_sysctl_unregister();
- out_cleanup_procfs:
+out_cleanup_procfs:
 	xfs_cleanup_procfs();
- out_buf_terminate:
+out_buf_terminate:
 	xfs_buf_terminate();
- out_filestream_uninit:
+out_filestream_uninit:
 	xfs_filestream_uninit();
- out_mru_cache_uninit:
+out_mru_cache_uninit:
 	xfs_mru_cache_uninit();
- out_destroy_zones:
+out_destroy_zones:
 	xfs_destroy_zones();
- out:
+out:
 	return error;
 }
 
@@ -1846,6 +1844,8 @@ exit_xfs_fs(void)
 {
 	vfs_exitquota();
 	unregister_filesystem(&xfs_fs_type);
+	destroy_workqueue(xfs_syncd_wq);
+	mutex_destroy(&xfs_syncd_lock);
 	xfs_sysctl_unregister();
 	xfs_cleanup_procfs();
 	xfs_buf_terminate();
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 594cd82..28afe11 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -39,6 +39,9 @@
 #include <linux/kthread.h>
 #include <linux/freezer.h>
 
+struct workqueue_struct	*xfs_syncd_wq;	/* sync workqueue */
+struct mutex		xfs_syncd_lock;	/* lock for workqueue */
+
 /*
  * The inode lookup is done in batches to keep the amount of lock traffic and
  * radix tree lookups to a minimum. The batch size is a trade off between
@@ -489,32 +492,6 @@ xfs_flush_inodes(
 	xfs_log_force(ip->i_mount, XFS_LOG_SYNC);
 }
 
-/*
- * Every sync period we need to unpin all items, reclaim inodes and sync
- * disk quotas.  We might need to cover the log to indicate that the
- * filesystem is idle and not frozen.
- */
-STATIC void
-xfs_sync_worker(
-	struct xfs_mount *mp,
-	void		*unused)
-{
-	int		error;
-
-	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
-		/* dgc: errors ignored here */
-		if (mp->m_super->s_frozen == SB_UNFROZEN &&
-		    xfs_log_need_covered(mp))
-			error = xfs_fs_log_dummy(mp);
-		else
-			xfs_log_force(mp, 0);
-		xfs_reclaim_inodes(mp, 0);
-		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
-	}
-	mp->m_sync_seq++;
-	wake_up(&mp->m_wait_single_sync_task);
-}
-
 STATIC int
 xfssyncd(
 	void			*arg)
@@ -535,27 +512,12 @@ xfssyncd(
 			break;
 
 		spin_lock(&mp->m_sync_lock);
-		/*
-		 * We can get woken by laptop mode, to do a sync -
-		 * that's the (only!) case where the list would be
-		 * empty with time remaining.
-		 */
-		if (!timeleft || list_empty(&mp->m_sync_list)) {
-			if (!timeleft)
-				timeleft = xfs_syncd_centisecs *
-							msecs_to_jiffies(10);
-			INIT_LIST_HEAD(&mp->m_sync_work.w_list);
-			list_add_tail(&mp->m_sync_work.w_list,
-					&mp->m_sync_list);
-		}
 		list_splice_init(&mp->m_sync_list, &tmp);
 		spin_unlock(&mp->m_sync_lock);
 
 		list_for_each_entry_safe(work, n, &tmp, w_list) {
 			(*work->w_syncer)(mp, work->w_data);
 			list_del(&work->w_list);
-			if (work == &mp->m_sync_work)
-				continue;
 			if (work->w_completion)
 				complete(work->w_completion);
 			kmem_free(work);
@@ -565,13 +527,56 @@ xfssyncd(
 	return 0;
 }
 
+void
+xfs_syncd_queue_sync(
+	struct xfs_mount        *mp,
+	int			flags)
+{
+	mutex_lock(&xfs_syncd_lock);
+	if (!delayed_work_pending(&mp->m_sync_work))
+		queue_delayed_work(xfs_syncd_wq, &mp->m_sync_work,
+				xfs_syncd_centisecs * msecs_to_jiffies(10));
+	mutex_unlock(&xfs_syncd_lock);
+
+	if (flags & SYNC_WAIT)
+		flush_delayed_work_sync(&mp->m_sync_work);
+}
+
+/*
+ * Every sync period we need to unpin all items, reclaim inodes and sync
+ * disk quotas.  We might need to cover the log to indicate that the
+ * filesystem is idle and not frozen.
+ */
+STATIC void
+xfs_sync_worker(
+	struct work_struct *work)
+{
+	struct xfs_mount *mp = container_of(to_delayed_work(work),
+					struct xfs_mount, m_sync_work);
+	int		error;
+
+	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
+		/* dgc: errors ignored here */
+		if (mp->m_super->s_frozen == SB_UNFROZEN &&
+		    xfs_log_need_covered(mp))
+			error = xfs_fs_log_dummy(mp);
+		else
+			xfs_log_force(mp, 0);
+		xfs_reclaim_inodes(mp, 0);
+		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
+	}
+
+	/* queue us up again */
+	xfs_syncd_queue_sync(mp, 0);
+}
+
 int
 xfs_syncd_init(
 	struct xfs_mount	*mp)
 {
-	mp->m_sync_work.w_syncer = xfs_sync_worker;
-	mp->m_sync_work.w_mount = mp;
-	mp->m_sync_work.w_completion = NULL;
+	INIT_DELAYED_WORK(&mp->m_sync_work, xfs_sync_worker);
+	xfs_syncd_queue_sync(mp, 0);
+
 	mp->m_sync_task = kthread_run(xfssyncd, mp, "xfssyncd/%s", mp->m_fsname);
 	if (IS_ERR(mp->m_sync_task))
 		return -PTR_ERR(mp->m_sync_task);
@@ -582,6 +587,9 @@ void
 xfs_syncd_stop(
 	struct xfs_mount	*mp)
 {
+	mutex_lock(&xfs_syncd_lock);
+	cancel_delayed_work_sync(&mp->m_sync_work);
+	mutex_unlock(&xfs_syncd_lock);
 	kthread_stop(mp->m_sync_task);
 }
 
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 0ae48ff..1cd2fec 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -32,6 +32,9 @@ typedef struct xfs_sync_work {
 #define SYNC_WAIT		0x0001	/* wait for i/o to complete */
 #define SYNC_TRYLOCK		0x0002  /* only try to lock inodes */
 
+extern struct workqueue_struct	*xfs_syncd_wq;	/* sync workqueue */
+extern struct mutex		xfs_syncd_lock;	/* lock for workqueue */
+
 int xfs_syncd_init(struct xfs_mount *mp);
 void xfs_syncd_stop(struct xfs_mount *mp);
 void xfs_syncd_queue_sync(struct xfs_mount *mp, int flags);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index a62e897..2c11e62 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -203,12 +203,10 @@ typedef struct xfs_mount {
 	struct mutex		m_icsb_mutex;	/* balancer sync lock */
 #endif
 	struct xfs_mru_cache	*m_filestream;  /* per-mount filestream data */
+	struct delayed_work	m_sync_work;	/* background sync work */
 	struct task_struct	*m_sync_task;	/* generalised sync thread */
-	xfs_sync_work_t		m_sync_work;	/* work item for VFS_SYNC */
 	struct list_head	m_sync_list;	/* sync thread work item list */
 	spinlock_t		m_sync_lock;	/* work item list lock */
-	int			m_sync_seq;	/* sync thread generation no. */
-	wait_queue_head_t	m_wait_single_sync_task;
 	__int64_t		m_update_flags;	/* sb flags we need to update
 						   on the next remount,rw */
 	struct shrinker		m_inode_shrink;	/* inode reclaim shrinker */
-- 
1.7.2.3



* [PATCH 3/5] xfs: convert ENOSPC inode flushing to use new syncd workqueue
  2011-02-22 22:16 [RFC, PATCH 0/5] xfs: Reduce OOM kill problems under heavy load Dave Chinner
  2011-02-22 22:16 ` [PATCH 1/5] xfs: introduce inode cluster buffer trylocks for xfs_iflush Dave Chinner
  2011-02-22 22:16 ` [PATCH 2/5] xfs: introduce a xfssyncd workqueue Dave Chinner
@ 2011-02-22 22:16 ` Dave Chinner
  2011-03-03 15:34   ` Christoph Hellwig
  2011-02-22 22:16 ` [PATCH 4/5] xfs: introduce background inode reclaim work Dave Chinner
  2011-02-22 22:16 ` [PATCH 5/5] xfs: kick inode writeback when low on memory Dave Chinner
  4 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2011-02-22 22:16 UTC (permalink / raw)
  To: xfs; +Cc: chris.mason

From: Dave Chinner <dchinner@redhat.com>

One of the problems with the current inode flush at ENOSPC is that we
queue a flush per ENOSPC event, regardless of how many are already
queued. This can result in hundreds of queued flushes, most of
which simply burn CPU scanning and do no real work. This simply slows
down allocation at ENOSPC.

We really only need one active flush at a time, and we can easily
implement that via the new xfs_syncd_wq. All we need to do is queue
a flush if one is not already active, then block waiting for the
currently active flush to complete. The result is that we only ever
have a single ENOSPC inode flush active at a time and this greatly
reduces the overhead of ENOSPC processing.

On my 2p test machine, this results in tests exercising ENOSPC
conditions running significantly faster - 042 halves execution time,
083 drops from 60s to 5s, etc - while not introducing test
regressions.

This allows us to remove the old xfssyncd threads and infrastructure
as they are no longer used.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_super.c |    2 -
 fs/xfs/linux-2.6/xfs_sync.c  |  137 ++++++++++++-----------------------------
 fs/xfs/xfs_mount.h           |    4 +-
 3 files changed, 41 insertions(+), 102 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 73a17b1..47eb457 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1479,8 +1479,6 @@ xfs_fs_fill_super(
 	spin_lock_init(&mp->m_sb_lock);
 	mutex_init(&mp->m_growlock);
 	atomic_set(&mp->m_active_trans, 0);
-	INIT_LIST_HEAD(&mp->m_sync_list);
-	spin_lock_init(&mp->m_sync_lock);
 
 	mp->m_super = sb;
 	sb->s_fs_info = mp;
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 28afe11..d47dc45 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -434,99 +434,6 @@ xfs_quiesce_attr(
 	xfs_unmountfs_writesb(mp);
 }
 
-/*
- * Enqueue a work item to be picked up by the vfs xfssyncd thread.
- * Doing this has two advantages:
- * - It saves on stack space, which is tight in certain situations
- * - It can be used (with care) as a mechanism to avoid deadlocks.
- * Flushing while allocating in a full filesystem requires both.
- */
-STATIC void
-xfs_syncd_queue_work(
-	struct xfs_mount *mp,
-	void		*data,
-	void		(*syncer)(struct xfs_mount *, void *),
-	struct completion *completion)
-{
-	struct xfs_sync_work *work;
-
-	work = kmem_alloc(sizeof(struct xfs_sync_work), KM_SLEEP);
-	INIT_LIST_HEAD(&work->w_list);
-	work->w_syncer = syncer;
-	work->w_data = data;
-	work->w_mount = mp;
-	work->w_completion = completion;
-	spin_lock(&mp->m_sync_lock);
-	list_add_tail(&work->w_list, &mp->m_sync_list);
-	spin_unlock(&mp->m_sync_lock);
-	wake_up_process(mp->m_sync_task);
-}
-
-/*
- * Flush delayed allocate data, attempting to free up reserved space
- * from existing allocations.  At this point a new allocation attempt
- * has failed with ENOSPC and we are in the process of scratching our
- * heads, looking about for more room...
- */
-STATIC void
-xfs_flush_inodes_work(
-	struct xfs_mount *mp,
-	void		*arg)
-{
-	struct inode	*inode = arg;
-	xfs_sync_data(mp, SYNC_TRYLOCK);
-	xfs_sync_data(mp, SYNC_TRYLOCK | SYNC_WAIT);
-	iput(inode);
-}
-
-void
-xfs_flush_inodes(
-	xfs_inode_t	*ip)
-{
-	struct inode	*inode = VFS_I(ip);
-	DECLARE_COMPLETION_ONSTACK(completion);
-
-	igrab(inode);
-	xfs_syncd_queue_work(ip->i_mount, inode, xfs_flush_inodes_work, &completion);
-	wait_for_completion(&completion);
-	xfs_log_force(ip->i_mount, XFS_LOG_SYNC);
-}
-
-STATIC int
-xfssyncd(
-	void			*arg)
-{
-	struct xfs_mount	*mp = arg;
-	long			timeleft;
-	xfs_sync_work_t		*work, *n;
-	LIST_HEAD		(tmp);
-
-	set_freezable();
-	timeleft = xfs_syncd_centisecs * msecs_to_jiffies(10);
-	for (;;) {
-		if (list_empty(&mp->m_sync_list))
-			timeleft = schedule_timeout_interruptible(timeleft);
-		/* swsusp */
-		try_to_freeze();
-		if (kthread_should_stop() && list_empty(&mp->m_sync_list))
-			break;
-
-		spin_lock(&mp->m_sync_lock);
-		list_splice_init(&mp->m_sync_list, &tmp);
-		spin_unlock(&mp->m_sync_lock);
-
-		list_for_each_entry_safe(work, n, &tmp, w_list) {
-			(*work->w_syncer)(mp, work->w_data);
-			list_del(&work->w_list);
-			if (work->w_completion)
-				complete(work->w_completion);
-			kmem_free(work);
-		}
-	}
-
-	return 0;
-}
-
 void
 xfs_syncd_queue_sync(
 	struct xfs_mount        *mp,
@@ -570,16 +477,52 @@ xfs_sync_worker(
 	xfs_syncd_queue_sync(mp, 0);
 }
 
+/*
+ * Flush delayed allocate data, attempting to free up reserved space
+ * from existing allocations.  At this point a new allocation attempt
+ * has failed with ENOSPC and we are in the process of scratching our
+ * heads, looking about for more room.
+ *
+ * Queue a new data flush if there isn't one already in progress and
+ * wait for completion of the flush. This means that we only ever have one
+ * inode flush in progress no matter how many ENOSPC events are occurring and
+ * so will prevent the system from bogging down due to every concurrent
+ * ENOSPC event scanning all the active inodes in the system for writeback.
+ */
+void
+xfs_flush_inodes(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	mutex_lock(&xfs_syncd_lock);
+	if (!work_pending(&mp->m_flush_work))
+		queue_work(xfs_syncd_wq, &mp->m_flush_work);
+	mutex_unlock(&xfs_syncd_lock);
+
+	flush_work_sync(&mp->m_flush_work);
+}
+
+STATIC void
+xfs_flush_worker(
+	struct work_struct *work)
+{
+	struct xfs_mount *mp = container_of(work,
+					struct xfs_mount, m_flush_work);
+
+	xfs_sync_data(mp, SYNC_TRYLOCK);
+	xfs_sync_data(mp, SYNC_TRYLOCK | SYNC_WAIT);
+	xfs_log_force(mp, XFS_LOG_SYNC);
+}
+
 int
 xfs_syncd_init(
 	struct xfs_mount	*mp)
 {
+	INIT_WORK(&mp->m_flush_work, xfs_flush_worker);
 	INIT_DELAYED_WORK(&mp->m_sync_work, xfs_sync_worker);
 	xfs_syncd_queue_sync(mp, 0);
 
-	mp->m_sync_task = kthread_run(xfssyncd, mp, "xfssyncd/%s", mp->m_fsname);
-	if (IS_ERR(mp->m_sync_task))
-		return -PTR_ERR(mp->m_sync_task);
 	return 0;
 }
 
@@ -589,8 +532,8 @@ xfs_syncd_stop(
 {
 	mutex_lock(&xfs_syncd_lock);
 	cancel_delayed_work_sync(&mp->m_sync_work);
+	cancel_work_sync(&mp->m_flush_work);
 	mutex_unlock(&xfs_syncd_lock);
-	kthread_stop(mp->m_sync_task);
 }
 
 void
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 2c11e62..a0ad90e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -204,9 +204,7 @@ typedef struct xfs_mount {
 #endif
 	struct xfs_mru_cache	*m_filestream;  /* per-mount filestream data */
 	struct delayed_work	m_sync_work;	/* background sync work */
-	struct task_struct	*m_sync_task;	/* generalised sync thread */
-	struct list_head	m_sync_list;	/* sync thread work item list */
-	spinlock_t		m_sync_lock;	/* work item list lock */
+	struct work_struct	m_flush_work;	/* background inode flush */
 	__int64_t		m_update_flags;	/* sb flags we need to update
 						   on the next remount,rw */
 	struct shrinker		m_inode_shrink;	/* inode reclaim shrinker */
-- 
1.7.2.3



* [PATCH 4/5] xfs: introduce background inode reclaim work
  2011-02-22 22:16 [RFC, PATCH 0/5] xfs: Reduce OOM kill problems under heavy load Dave Chinner
                   ` (2 preceding siblings ...)
  2011-02-22 22:16 ` [PATCH 3/5] xfs: convert ENOSPC inode flushing to use new syncd workqueue Dave Chinner
@ 2011-02-22 22:16 ` Dave Chinner
  2011-03-03 15:36   ` Christoph Hellwig
  2011-02-22 22:16 ` [PATCH 5/5] xfs: kick inode writeback when low on memory Dave Chinner
  4 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2011-02-22 22:16 UTC (permalink / raw)
  To: xfs; +Cc: chris.mason

From: Dave Chinner <dchinner@redhat.com>

Background inode reclaim needs to run more frequently than the XFS
syncd work is run, as 30s is too long between optimal reclaim runs.
Add a new periodic work item to the xfs syncd workqueue to run a
fast, non-blocking inode reclaim scan.

To make memory-reclaim-driven inode reclaim throttle to the rate of
inode cleaning but still reclaim inodes efficiently, make it kick the
background inode reclaim so that when we are low on memory we are
trying to reclaim inodes as efficiently as possible. To complement
this, make the shrinker pass do synchronous inode reclaim so that it
blocks on inodes under IO. This means that the shrinker will reclaim
inodes rather than just skipping over them, but it does not
adversely affect the rate of reclaim because most dirty inodes are
already under IO due to the background reclaim work the shrinker
kicked.

These two modifications solve one of the two OOM killer invocations
Chris Mason reported recently when running a stress testing script.
The particular workload trigger for the OOM killer invocation is
where there are more threads than CPUs, all unlinking files in an
extremely memory constrained environment. Unlike other solutions,
this one does not impact performance when memory is not constrained
or the number of concurrent threads operating is less than or equal
to the number of CPUs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |   63 +++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_mount.h          |    1 +
 2 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index d47dc45..35138dc 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -478,6 +478,51 @@ xfs_sync_worker(
 }
 
 /*
+ * Queue a new inode reclaim pass if there isn't one already in progress.
+ * Wait for completion of the flush if necessary.
+ */
+void
+xfs_syncd_queue_reclaim(
+	struct xfs_mount        *mp,
+	int			flags)
+{
+	mutex_lock(&xfs_syncd_lock);
+	if (!delayed_work_pending(&mp->m_reclaim_work))
+		queue_delayed_work(xfs_syncd_wq, &mp->m_reclaim_work,
+			xfs_syncd_centisecs / 5 * msecs_to_jiffies(10));
+	mutex_unlock(&xfs_syncd_lock);
+
+	if (flags & SYNC_WAIT)
+		flush_delayed_work_sync(&mp->m_reclaim_work);
+}
+
+/*
+ * This is a fast pass over the inode cache to try to get reclaim moving on as
+ * many inodes as possible in a short period of time. It kicks itself every few
+ * seconds, as well as being kicked by the inode cache shrinker when memory
+ * goes low.
+ */
+STATIC void
+xfs_reclaim_worker(
+	struct work_struct *work)
+{
+	struct xfs_mount *mp = container_of(to_delayed_work(work),
+					struct xfs_mount, m_reclaim_work);
+
+	/* first unpin all the dirty and stale inodes. */
+	xfs_log_force(mp, XFS_LOG_SYNC);
+
+	/*
+	 * now scan as quickly as possible, not getting hung up on locked
+	 * inodes or those that are already flushing.
+	 */
+	xfs_reclaim_inodes(mp, SYNC_TRYLOCK);
+
+	/* queue us up again */
+	xfs_syncd_queue_reclaim(mp, 0);
+}
+
+/*
  * Flush delayed allocate data, attempting to free up reserved space
  * from existing allocations.  At this point a new allocation attempt
  * has failed with ENOSPC and we are in the process of scratching our
@@ -521,7 +566,10 @@ xfs_syncd_init(
 {
 	INIT_WORK(&mp->m_flush_work, xfs_flush_worker);
 	INIT_DELAYED_WORK(&mp->m_sync_work, xfs_sync_worker);
+	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
+
 	xfs_syncd_queue_sync(mp, 0);
+	xfs_syncd_queue_reclaim(mp, 0);
 
 	return 0;
 }
@@ -532,6 +580,7 @@ xfs_syncd_stop(
 {
 	mutex_lock(&xfs_syncd_lock);
 	cancel_delayed_work_sync(&mp->m_sync_work);
+	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	cancel_work_sync(&mp->m_flush_work);
 	mutex_unlock(&xfs_syncd_lock);
 }
@@ -968,7 +1017,13 @@ xfs_reclaim_inodes(
 }
 
 /*
- * Shrinker infrastructure.
+ * Inode cache shrinker.
+ *
+ * When called we make sure that there is a background (fast) inode reclaim in
+ * progress, while we will throttle the speed of reclaim via doing synchronous
+ * reclaim of inodes. That means if we come across dirty inodes, we wait for
+ * them to be cleaned, which we hope will not be very long due to the
+ * background walker having already kicked the IO off on those dirty inodes.
  */
 static int
 xfs_reclaim_inode_shrink(
@@ -983,10 +1038,14 @@ xfs_reclaim_inode_shrink(
 
 	mp = container_of(shrink, struct xfs_mount, m_inode_shrink);
 	if (nr_to_scan) {
+		/* kick background reclaimer */
+		xfs_syncd_queue_reclaim(mp, 0);
+
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
 
-		xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK, &nr_to_scan);
+		xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT,
+					&nr_to_scan);
 		/* terminate if we don't exhaust the scan */
 		if (nr_to_scan > 0)
 			return -1;
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index a0ad90e..19af0ab 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -204,6 +204,7 @@ typedef struct xfs_mount {
 #endif
 	struct xfs_mru_cache	*m_filestream;  /* per-mount filestream data */
 	struct delayed_work	m_sync_work;	/* background sync work */
+	struct delayed_work	m_reclaim_work;	/* background inode reclaim */
 	struct work_struct	m_flush_work;	/* background inode flush */
 	__int64_t		m_update_flags;	/* sb flags we need to update
 						   on the next remount,rw */
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/5] xfs: kick inode writeback when low on memory
  2011-02-22 22:16 [RFC, PATCH 0/5] xfs: Reduce OOM kill problems under heavy load Dave Chinner
                   ` (3 preceding siblings ...)
  2011-02-22 22:16 ` [PATCH 4/5] xfs: introduce background inode reclaim work Dave Chinner
@ 2011-02-22 22:16 ` Dave Chinner
  2011-03-02  3:06   ` Dave Chinner
  4 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2011-02-22 22:16 UTC (permalink / raw)
  To: xfs; +Cc: chris.mason

From: Dave Chinner <dchinner@redhat.com>

When the inode cache shrinker runs, we may have lots of dirty inodes queued up
in the VFS dirty queues that have not been expired. The typical case for this
with XFS is atime updates. The result is that a highly concurrent workload that
copies files and then later reads them (say to verify checksums) dirties all
the inodes again, even when relatime is used.

In a constrained memory environment, this results in a large number of dirty
inodes using all of the available memory and memory reclaim being unable to free
them as dirty inodes are considered active. This problem was uncovered by Chris
Mason during recent low memory stress testing.

The fix is to trigger VFS level writeback from the XFS inode cache shrinker if
there isn't already writeback in progress. This ensures that when we enter a
low memory situation we start cleaning inodes (via the flusher thread) on the
filesystem immediately, thereby making it more likely that we will be able to
evict those dirty inodes from the VFS in the near future.

The mechanism is not perfect - it only acts on the current filesystem, so if
all the dirty inodes are on a different filesystem it won't help. However, it
seems to be a valid assumption that the filesystem with lots of dirty inodes
is going to have the shrinker called very soon after the memory shortage
begins, so this shouldn't be an issue.

The other flaw is that there is no guarantee that the flusher thread will make
progress fast enough to clean the dirty inodes so they can be reclaimed in the
near future. However, this mechanism does improve the resilience of the
filesystem under the test conditions - instead of reliably triggering the OOM
killer 20 minutes into the stress test, it took more than 6 hours before it
happened.

This small addition definitely improves the low memory resilience of XFS on
this type of workload, and best of all it has no impact on performance when
memory is not constrained.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 35138dc..3abde91 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -1044,6 +1044,17 @@ xfs_reclaim_inode_shrink(
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
 
+		/*
+		 * make sure VFS is cleaning inodes so they can be pruned
+		 * and marked for reclaim in the XFS inode cache. If we don't
+		 * do this the VFS can accumulate dirty inodes and we can OOM
+		 * before they are cleaned by the periodic VFS writeback.
+		 *
+		 * This takes VFS level locks, so we can only do this after
+		 * the __GFP_FS checks otherwise lockdep gets really unhappy.
+		 */
+		writeback_inodes_sb_nr_if_idle(mp->m_super, nr_to_scan);
+
 		xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT,
 					&nr_to_scan);
 		/* terminate if we don't exhaust the scan */
-- 
1.7.2.3


* Re: [PATCH 5/5] xfs: kick inode writeback when low on memory
  2011-02-22 22:16 ` [PATCH 5/5] xfs: kick inode writeback when low on memory Dave Chinner
@ 2011-03-02  3:06   ` Dave Chinner
  2011-03-02 14:12     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2011-03-02  3:06 UTC (permalink / raw)
  To: xfs; +Cc: chris.mason

On Wed, Feb 23, 2011 at 09:16:09AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When the inode cache shrinker runs, we may have lots of dirty inodes queued up
> in the VFS dirty queues that have not been expired. The typical case for this
> with XFS is atime updates. The result is that a highly concurrent workload that
> copies files and then later reads them (say to verify checksums) dirties all
> the inodes again, even when relatime is used.
> 
> In a constrained memory environment, this results in a large number of dirty
> inodes using all of the available memory and memory reclaim being unable to free
> them as dirty inodes are considered active. This problem was uncovered by Chris
> Mason during recent low memory stress testing.
> 
> The fix is to trigger VFS level writeback from the XFS inode cache shrinker if
> there isn't already writeback in progress. This ensures that when we enter a
> low memory situation we start cleaning inodes (via the flusher thread) on the
> filesystem immediately, thereby making it more likely that we will be able to
> evict those dirty inodes from the VFS in the near future.
> 
> The mechanism is not perfect - it only acts on the current filesystem, so if
> all the dirty inodes are on a different filesystem it won't help. However, it
> seems to be a valid assumption that the filesystem with lots of dirty inodes
> is going to have the shrinker called very soon after the memory shortage
> begins, so this shouldn't be an issue.
> 
> The other flaw is that there is no guarantee that the flusher thread will make
> progress fast enough to clean the dirty inodes so they can be reclaimed in the
> near future. However, this mechanism does improve the resilience of the
> filesystem under the test conditions - instead of reliably triggering the OOM
> killer 20 minutes into the stress test, it took more than 6 hours before it
> happened.
> 
> This small addition definitely improves the low memory resilience of XFS on
> this type of workload, and best of all it has no impact on performance when
> memory is not constrained.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/linux-2.6/xfs_sync.c |   11 +++++++++++
>  1 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
> index 35138dc..3abde91 100644
> --- a/fs/xfs/linux-2.6/xfs_sync.c
> +++ b/fs/xfs/linux-2.6/xfs_sync.c
> @@ -1044,6 +1044,17 @@ xfs_reclaim_inode_shrink(
>  		if (!(gfp_mask & __GFP_FS))
>  			return -1;
>  
> +		/*
> +		 * make sure VFS is cleaning inodes so they can be pruned
> +		 * and marked for reclaim in the XFS inode cache. If we don't
> +		 * do this the VFS can accumulate dirty inodes and we can OOM
> +		 * before they are cleaned by the periodic VFS writeback.
> +		 *
> +		 * This takes VFS level locks, so we can only do this after
> +		 * the __GFP_FS checks otherwise lockdep gets really unhappy.
> +		 */
> +		writeback_inodes_sb_nr_if_idle(mp->m_super, nr_to_scan);
> +

Well, this generates a deadlock if we get a low memory situation
before the bdi flusher thread for the underlying device has been
created. That is, we get low memory, kick
writeback_inodes_sb_nr_if_idle(), we end up with the bdi-default
thread trying to create the flush-x:y thread, which gets stuck
waiting for kthread_create() to complete.

kthread_create() never completes because the do_fork() call in the
kthreadd fails memory allocation and again calls (via the shrinker)
writeback_inodes_sb_nr_if_idle(), which thinks that
writeback_in_progress(bdi) is false, so tries to start
writeback again....

So, writeback_inodes_sb_nr_if_idle() is busted w.r.t. only queuing a
single writeback instance as writeback is only marked as in progress
once the queued callback is running. Perhaps writeback_in_progress()
should return true if the BDI_Pending bit is set, indicating the
flusher thread is being created right now, but I'm not sure that is
sufficient to avoid all the potential races here.

I'm open to ideas here - I could convert the bdi flusher
infrastructure to cmwqs rather than using worker threads, or move
all dirty inode tracking and writeback into XFS, or ???

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 5/5] xfs: kick inode writeback when low on memory
  2011-03-02  3:06   ` Dave Chinner
@ 2011-03-02 14:12     ` Christoph Hellwig
  2011-03-03  2:42       ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2011-03-02 14:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: chris.mason, xfs

On Wed, Mar 02, 2011 at 02:06:02PM +1100, Dave Chinner wrote:
> I'm open to ideas here - I could convert the bdi flusher
> infrastructure to cmwqs rather than using worker threads, or move
> all dirty inode tracking and writeback into XFS, or ???

Tejun posted patches to convert the writeback threads to workqueues.
But I think sooner or later we should stop using VFS dirty state for
metadata.  By allowing the dirty_inode operation to return a value
saying the inode shouldn't be marked dirty, that could be done
relatively easily.


* Re: [PATCH 5/5] xfs: kick inode writeback when low on memory
  2011-03-02 14:12     ` Christoph Hellwig
@ 2011-03-03  2:42       ` Dave Chinner
  2011-03-03 15:48         ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2011-03-03  2:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: chris.mason, xfs

On Wed, Mar 02, 2011 at 09:12:20AM -0500, Christoph Hellwig wrote:
> On Wed, Mar 02, 2011 at 02:06:02PM +1100, Dave Chinner wrote:
> > I'm open to ideas here - I could convert the bdi flusher
> > infrastructure to cmwqs rather than using worker threads, or move
> > all dirty inode tracking and writeback into XFS, or ???
> 
> Tejun posted patches to convert the writeback threads to workqueues.
> But I think sooner or later we should stop using VFS dirty state for
> metadata.  By allowing the dirty_inode operation to return a value
> and say it shouldn't be marked dirty that could be done relatively
> easily.

Yeah, it doesn't seem like there's an easy way around that. I guess
I'll start by tracking VFS dirty inodes via a tag in the per-ag radix
tree and kick writeback via a new xfssyncd work operation. I'll see
if that is sufficient to avoid the OOM problem without needing to
log the inodes in the .dirty_inode callback or changing its
prototype.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 3/5] xfs: convert ENOSPC inode flushing to use new syncd workqueue
  2011-02-22 22:16 ` [PATCH 3/5] xfs: convert ENOSPC inode flushing to use new syncd workqueue Dave Chinner
@ 2011-03-03 15:34   ` Christoph Hellwig
  2011-03-03 22:41     ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2011-03-03 15:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: chris.mason, xfs

I still don't see any point in having the ENOSPC flushing moved to a
different context.

Just add a mutex and flush inline, e.g.

void
xfs_flush_inodes(
	struct xfs_inode	*ip)
{
	struct xfs_mount	*mp = ip->i_mount;

	if (!mutex_trylock(&xfs_syncd_lock))
		return;		/* someone else is flushing right now */
	xfs_sync_data(mp, SYNC_TRYLOCK);
	xfs_sync_data(mp, SYNC_TRYLOCK | SYNC_WAIT);
	xfs_log_force(mp, XFS_LOG_SYNC);
	mutex_unlock(&xfs_syncd_lock);
}


* Re: [PATCH 4/5] xfs: introduce background inode reclaim work
  2011-02-22 22:16 ` [PATCH 4/5] xfs: introduce background inode reclaim work Dave Chinner
@ 2011-03-03 15:36   ` Christoph Hellwig
  2011-03-03 22:43     ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2011-03-03 15:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: chris.mason, xfs

> +void
> +xfs_syncd_queue_reclaim(
> +	struct xfs_mount        *mp,
> +	int			flags)
> +{
> +	mutex_lock(&xfs_syncd_lock);
> +	if (!delayed_work_pending(&mp->m_reclaim_work))
> +		queue_delayed_work(xfs_syncd_wq, &mp->m_reclaim_work,
> +			xfs_syncd_centisecs / 5 * msecs_to_jiffies(10));
> +	mutex_unlock(&xfs_syncd_lock);
> +
> +	if (flags & SYNC_WAIT)
> +		flush_delayed_work_sync(&mp->m_reclaim_work);
> +}

queue_work/queue_delayed_work have a test_set_bit on
WORK_STRUCT_PENDING_BIT, so you can just call queue_work/queue_delayed_work
and it will do the right thing if it is in use.  So you can remove the
mutex and delayed_work_pending check here.

At least currently SYNC_WAIT is never set by any caller, and I wonder if
we should just leave the waiting to the caller if we ever grow one.


* Re: [PATCH 5/5] xfs: kick inode writeback when low on memory
  2011-03-03  2:42       ` Dave Chinner
@ 2011-03-03 15:48         ` Christoph Hellwig
  2011-03-03 16:19           ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2011-03-03 15:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: chris.mason, xfs

On Thu, Mar 03, 2011 at 01:42:28PM +1100, Dave Chinner wrote:
> Yeah, it doesn't seem like there's an easy way around that. I guess
> I'll start by tracking VFS dirty inodes via a tag in the per-ag radix
> tree and kick writeback via a new xfssynd work operation. I'll see
> if that is sufficient to avoid the OOM problem without needing to
> log the inodes in the .dirty_inode callback or changing it's
> prototype.

I don't think we'll be able to get around changing the dirty_inode
callback.  We need a way to prevent the VFS from marking the inode
dirty, otherwise we have no chance of reclaiming it.

Except for that I think it's really simple:

 1) we need to reintroduce the i_update_size flag or an equivalent to
    distinguish unlogged timestamp from unlogged size updates for fsync
    vs fdatasync.  At that point we can stop looking at the VFS dirty
    bits in fsync.
 2) ->dirty_inode needs to tag the inode as dirty in the inode radix
    tree

With those minimal changes we should be set - we already
call xfs_sync_attr from the sync_fs path, and xfs_sync_inode_attr
properly picks up inodes with unlogged changes.

In fact that whole scheme might even be able to speed up sync - right
now we first log the inode and then flush all pending data in the log
back to disk, and with this scheme we avoid logging the inode in the
first place.


* Re: [PATCH 1/5] xfs: introduce inode cluster buffer trylocks for xfs_iflush
  2011-02-22 22:16 ` [PATCH 1/5] xfs: introduce inode cluster buffer trylocks for xfs_iflush Dave Chinner
@ 2011-03-03 15:55   ` Christoph Hellwig
  2011-03-03 22:04     ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2011-03-03 15:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: chris.mason, xfs

> +	 * pass through will see the stale flag set on the inode.
> +	 */
> +	error = xfs_iflush(ip, SYNC_TRYLOCK | sync_mode);
>  	if (sync_mode & SYNC_WAIT) {
> +		if (error == EAGAIN) {
> +			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +			/* backoff longer than in xfs_ifree_cluster */
> +			delay(2);

Do we really need the delay here?  It seems like we'd rather want to
keep going with scanning the next inode cluster and return here from
xfs_reclaim_inodes.

> diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
> index 32ba662..0ae48ff 100644
> --- a/fs/xfs/linux-2.6/xfs_sync.h
> +++ b/fs/xfs/linux-2.6/xfs_sync.h
> @@ -34,6 +34,7 @@ typedef struct xfs_sync_work {
>  
>  int xfs_syncd_init(struct xfs_mount *mp);
>  void xfs_syncd_stop(struct xfs_mount *mp);
> +void xfs_syncd_queue_sync(struct xfs_mount *mp, int flags);

This hunk belongs into a different patch.


* Re: [PATCH 5/5] xfs: kick inode writeback when low on memory
  2011-03-03 15:48         ` Christoph Hellwig
@ 2011-03-03 16:19           ` Christoph Hellwig
  2011-03-09  5:46             ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2011-03-03 16:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: chris.mason, xfs

On Thu, Mar 03, 2011 at 10:48:19AM -0500, Christoph Hellwig wrote:
> I don't think we'll be able to get around changing the dirty_inode
> callback.  We need a way to prevent the VFS from marking the inode
> dirty, otherwise we have no chance of reclaiming it.
> 
> Except for that I think it's really simple:
> 
>  1) we need to reintroduce the i_update_size flag or an equivalent to
>     distinguish unlogged timestamp from unlogged size updates for fsync
>     vs fdatasync.  At that point we can stop looking at the VFS dirty
>     bits in fsync.
>  2) ->dirty_inode needs to tag the inode as dirty in the inode radix
>     tree
> 
> With those minimal changes we should be set - we already
> call xfs_sync_attr from the sync_fs path, and xfs_sync_inode_attr
> properly picks up inodes with unlogged changes.

Actually xfs_sync_attr does not get called from the sync path right now,
which is a bit odd.  But once we add it, possibly with an earlier
trylock pass and/or an inode cluster read-ahead the above plan still
stands.

What's also rather odd is how much we use xfs_sync_data - unlike the
inodes, where our own code doing writeback based on disk order makes
a lot of sense, data is actually handled very well by the core writeback
code.  The two remaining callers of xfs_sync_data are
xfs_flush_inodes_work and xfs_quiesce_data.  The former really
belongs in this patchset - can you try what calling only
writeback_inodes* from the ENOSPC handler instead of doing our own stuff
does?  It should give you the avoidance of double writeout for free, and
get rid of one of xfs_sync_data callers.

After that we just need to look into xfs_quiesce_data.  The core
writeback code now does reliably writeback before calling into
->sync_fs, so the actual writeback should be superfluous.  We will still
need a log force after it, and we might need an iteration through all
inodes to do an xfs_ioend_wait, but this area can be simplified a lot.


* Re: [PATCH 1/5] xfs: introduce inode cluster buffer trylocks for xfs_iflush
  2011-03-03 15:55   ` Christoph Hellwig
@ 2011-03-03 22:04     ` Dave Chinner
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2011-03-03 22:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: chris.mason, xfs

On Thu, Mar 03, 2011 at 10:55:35AM -0500, Christoph Hellwig wrote:
> > +	 * pass through will see the stale flag set on the inode.
> > +	 */
> > +	error = xfs_iflush(ip, SYNC_TRYLOCK | sync_mode);
> >  	if (sync_mode & SYNC_WAIT) {
> > +		if (error == EAGAIN) {
> > +			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +			/* backoff longer than in xfs_ifree_cluster */
> > +			delay(2);
> 
> Do we really need the delay here?  It seems like we'd rather want to
> keep going with scanning the next inode cluster and return here from
> xfs_reclaim_inodes.

I did that because SYNC_WAIT semantics mean "block until the inode
is reclaimed". This is the slow, reliable reclaim path that doesn't
return until the inode is reclaimed, so we have to have a backoff
here to allow xfs_ifree_cluster() to complete its backoff and gain
the locks successfully thereby allowing the inode to be reclaimed
successfully.

> > diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
> > index 32ba662..0ae48ff 100644
> > --- a/fs/xfs/linux-2.6/xfs_sync.h
> > +++ b/fs/xfs/linux-2.6/xfs_sync.h
> > @@ -34,6 +34,7 @@ typedef struct xfs_sync_work {
> >  
> >  int xfs_syncd_init(struct xfs_mount *mp);
> >  void xfs_syncd_stop(struct xfs_mount *mp);
> > +void xfs_syncd_queue_sync(struct xfs_mount *mp, int flags);
> 
> This hunk belongs into a different patch.

Oops. Will fix.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 3/5] xfs: convert ENOSPC inode flushing to use new syncd workqueue
  2011-03-03 15:34   ` Christoph Hellwig
@ 2011-03-03 22:41     ` Dave Chinner
  2011-03-04 12:40       ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2011-03-03 22:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: chris.mason, xfs

On Thu, Mar 03, 2011 at 10:34:10AM -0500, Christoph Hellwig wrote:
> I still don't see any point in having the ENOSPC flushing moved to a
> different context.

IIRC, stack usage has always been an issue, and we also call
xfs_flush_inodes() with the XFS_IOLOCK held (from
xfs_iomap_write_delay()) so the alternate context was used to avoid
deadlocks. I don't think we have that deadlock problem now thanks to
being able to combine SYNC_TRYLOCK | SYNC_WAIT flags, but I'm not
sure we can ignore the stack issues.

> Just add a mutex and flush inline, e.g.
> 
> void
> xfs_flush_inodes(
> 	struct xfs_inode	*ip)
> {
> 	struct xfs_mount	*mp = ip->i_mount;
> 
> 	if (!mutex_trylock(&xfs_syncd_lock))
> 		return;		/* someone else is flushing right now */
> 	xfs_sync_data(mp, SYNC_TRYLOCK);
> 	xfs_sync_data(mp, SYNC_TRYLOCK | SYNC_WAIT);
> 	xfs_log_force(mp, XFS_LOG_SYNC);
> 	mutex_unlock(&xfs_syncd_lock);
> }

This doesn't allow all the concurrent flushes to block on the flush
in progress. i.e. if there is a flush in progress, all the others
will simply return and likely get ENOSPC again because they haven't
waited for any potential space to be freed up. It also really
requires a per-filesystem mutex, not a global mutex, because we
don't want to avoid flushing filesystem X just because filesystem Y is
currently flushing.

Yes, I could play tricks when the trylock case fails, but I'd prefer
to leave it as a work item because then all the concurrent flushers
all block on the same work item and it is clear from the stack
traces what they are all waiting on.

I've also realised the work_pending() check is unnecessary, as is
the lock, because queue_work() will only queue new work if the work
item isn't already pending so there's no need to check it here.
Hence all this actually needs to do is:

	queue_work()
	flush_work_sync()

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 4/5] xfs: introduce background inode reclaim work
  2011-03-03 15:36   ` Christoph Hellwig
@ 2011-03-03 22:43     ` Dave Chinner
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2011-03-03 22:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: chris.mason, xfs

On Thu, Mar 03, 2011 at 10:36:34AM -0500, Christoph Hellwig wrote:
> > +void
> > +xfs_syncd_queue_reclaim(
> > +	struct xfs_mount        *mp,
> > +	int			flags)
> > +{
> > +	mutex_lock(&xfs_syncd_lock);
> > +	if (!delayed_work_pending(&mp->m_reclaim_work))
> > +		queue_delayed_work(xfs_syncd_wq, &mp->m_reclaim_work,
> > +			xfs_syncd_centisecs / 5 * msecs_to_jiffies(10));
> > +	mutex_unlock(&xfs_syncd_lock);
> > +
> > +	if (flags & SYNC_WAIT)
> > +		flush_delayed_work_sync(&mp->m_reclaim_work);
> > +}
> 
> queue_work/queue_delayed_work have a test_set_bit on
> WORK_STRUCT_PENDING_BIT, so can just call queue_work/queue_delayed_work
> and it will do the right thing if it is in use.  So you can remove the
> mutex and delayed_work_pending check here.
> 

Yup, it's already gone. :)

> At least currently SYNC_WAIT is never set by any caller, and I wonder if
> we should just leave the waiting to the caller if we ever grow one.

I can remove it - it is a leftover from testing different methods
of throttling the shrinker.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 3/5] xfs: convert ENOSPC inode flushing to use new syncd workqueue
  2011-03-03 22:41     ` Dave Chinner
@ 2011-03-04 12:40       ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2011-03-04 12:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, chris.mason, xfs

On Fri, Mar 04, 2011 at 09:41:05AM +1100, Dave Chinner wrote:
> On Thu, Mar 03, 2011 at 10:34:10AM -0500, Christoph Hellwig wrote:
> > I still don't see any point in having the ENOSPC flushing moved to a
> > different context.
> 
> IIRC, stack usage has always been an issue, and we also call
> xfs_flush_inodes() with the XFS_IOLOCK held (from
> xfs_iomap_write_delay()) so the alternate context was used to avoid
> deadlocks. I don't think we have that deadlock problem now thanks to
> being able to combine SYNC_TRYLOCK | SYNC_WAIT flags, but I'm not
> sure we can ignore the stack issues.

Given that we wait for completion of the syncing in the caller moving it
to a different context does not help with any deadlocks.  It just makes
them impossible to detect using lockdep.

> I've also realised the work_pending() check is unnecessary, as is
> the lock, because queue_work() will only queue new work if the work
> item isn't already pending so there's no need to check it here.
> Hence all this actually needs to do is:
> 
> 	queue_work()
> 	flush_work_sync()

or in fact only use the writeback_inodes_sb_if_idle call you added
later.  That also causes writeback of data from the flusher threads.


* Re: [PATCH 5/5] xfs: kick inode writeback when low on memory
  2011-03-03 16:19           ` Christoph Hellwig
@ 2011-03-09  5:46             ` Dave Chinner
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2011-03-09  5:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: chris.mason, xfs

On Thu, Mar 03, 2011 at 11:19:29AM -0500, Christoph Hellwig wrote:
> On Thu, Mar 03, 2011 at 10:48:19AM -0500, Christoph Hellwig wrote:
> > I don't think we'll be able to get around changing the dirty_inode
> > callback.  We need a way to prevent the VFS from marking the inode
> > dirty, otherwise we have no chance of reclaiming it.
> > 
> > Except for that I think it's really simple:
> > 
> >  1) we need to reintroduce the i_update_size flag or an equivalent to
> >     distinguish unlogged timestamp from unlogged size updates for fsync
> >     vs fdatasync.  At that point we can stop looking at the VFS dirty
> >     bits in fsync.
> >  2) ->dirty_inode needs to tag the inode as dirty in the inode radix
> >     tree
> > 
> > With those minimal changes we should be set - we already
> > call xfs_sync_attr from the sync_fs path, and xfs_sync_inode_attr
> > properly picks up inodes with unlogged changes.
> 
> Actually xfs_sync_attr does not get called from the sync path right now,
> which is a bit odd.

Right, and that is the root cause of the "filesystem doesn't idle"
problems that have been reported lately. As it is, I've taken the
approach of pushing the AIL every 30s rather than calling
xfs_sync_attr() as the method of avoiding this problem...

> But once we add it, possibly with an earlier
> trylock pass and/or an inode cluster read-ahead the above plan still
> stands.

I don't think that matters very much to the problem at hand.

> What's also rather odd is how much we use xfs_sync_data - unlike the
> inodes, where our own code doing writeback based on disk order makes
> a lot of sense, data is actually handled very well by the core writeback
> code.  The two remaining callers of xfs_sync_data are
> xfs_flush_inodes_work and xfs_quiesce_data.  The former really
> belongs in this patchset - can you try what calling only
> writeback_inodes* from the ENOSPC handler instead of doing our own stuff
> does?  It should give you the avoidance of double writeout for free, and
> get rid of one of xfs_sync_data callers.

Not odd at all - both are doing something the linux VFS has not been
able to do until recently.  However, wherever I've tried to use
writeback_inodes_sb_if_idle() in XFS it has resulted in lockdep
complaints because it takes s_umount....

> After that we just need to look into xfs_quiesce_data.  The core
> writeback code now does reliably writeback before calling into
> ->sync_fs, so the actual writeback should be superfluous.  We will still
> need a log force after it, and we might need an iteration through all
> inodes to do an xfs_ioend_wait, but this area can be simplified a lot.

I still don't fully trust the VFS data writeback to write all data
out in the case of freezing the filesystem, so I'm extremely wary of
dropping the data flushing that XFS is doing there.

And if we still have to do a xfs_ioend_wait() pass (which we do to
wait for direct io to complete), then all we are doing
is dropping 2 or 3 lines of code in xfs_sync_inode_data().

Hence I'm not really inclined to change either of these calls right
now as neither are critical to fixing the OOM problems.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2011-03-09  5:44 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-22 22:16 [RFC, PATCH 0/5] xfs: Reduce OOM kill problems under heavy load Dave Chinner
2011-02-22 22:16 ` [PATCH 1/5] xfs: introduce inode cluster buffer trylocks for xfs_iflush Dave Chinner
2011-03-03 15:55   ` Christoph Hellwig
2011-03-03 22:04     ` Dave Chinner
2011-02-22 22:16 ` [PATCH 2/5] xfs: introduce a xfssyncd workqueue Dave Chinner
2011-02-22 22:16 ` [PATCH 3/5] xfs: convert ENOSPC inode flushing to use new syncd workqueue Dave Chinner
2011-03-03 15:34   ` Christoph Hellwig
2011-03-03 22:41     ` Dave Chinner
2011-03-04 12:40       ` Christoph Hellwig
2011-02-22 22:16 ` [PATCH 4/5] xfs: introduce background inode reclaim work Dave Chinner
2011-03-03 15:36   ` Christoph Hellwig
2011-03-03 22:43     ` Dave Chinner
2011-02-22 22:16 ` [PATCH 5/5] xfs: kick inode writeback when low on memory Dave Chinner
2011-03-02  3:06   ` Dave Chinner
2011-03-02 14:12     ` Christoph Hellwig
2011-03-03  2:42       ` Dave Chinner
2011-03-03 15:48         ` Christoph Hellwig
2011-03-03 16:19           ` Christoph Hellwig
2011-03-09  5:46             ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox