From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29]) by oss.sgi.com (Postfix) with ESMTP id 5F1D57FC9 for ; Thu, 20 Feb 2014 08:51:55 -0600 (CST) Message-ID: <5306168B.8080209@sgi.com> Date: Thu, 20 Feb 2014 08:51:55 -0600 From: Mark Tinguely MIME-Version: 1.0 Subject: Re: [PATCH 1/3] xfs: always do log forces via the workqueue References: <1392783402-4726-1-git-send-email-david@fromorbit.com> <1392783402-4726-2-git-send-email-david@fromorbit.com> <5304F6F6.3070007@redhat.com> <20140220002358.GH4916@dastard> In-Reply-To: <20140220002358.GH4916@dastard> List-Id: XFS Filesystem from SGI List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Errors-To: xfs-bounces@oss.sgi.com Sender: xfs-bounces@oss.sgi.com To: Dave Chinner Cc: Brian Foster , xfs@oss.sgi.com On 02/19/14 18:23, Dave Chinner wrote: > On Wed, Feb 19, 2014 at 01:24:54PM -0500, Brian Foster wrote: >> On 02/18/2014 11:16 PM, Dave Chinner wrote: >>> From: Dave Chinner >>> >>> Log forces can occur deep in the call chain when we have relatively >>> little stack free. Log forces can also happen at close to the call >>> chain leaves (e.g. xfs_buf_lock()) and hence we can trigger IO from >>> places where we really don't want to add more stack overhead. >>> >>> This stack overhead occurs because log forces do foreground CIL >>> pushes (xlog_cil_push_foreground()) rather than waking the >>> background push wq and waiting for the for the push to complete. >>> This foreground push was done to avoid confusing the CFQ Io >>> scheduler when fsync()s were issued, as it has trouble dealing with >>> dependent IOs being issued from different process contexts. >>> >>> Avoiding blowing the stack is much more critical than performance >>> optimisations for CFQ, especially as we've been recommending against >>> the use of CFQ for XFS since 3.2 kernels were release because of >>> it's problems with multi-threaded IO workloads. >>> >>> Hence convert xlog_cil_push_foreground() to move the push work >>> to the CIL workqueue. We already do the waiting for the push to >>> complete in xlog_cil_force_lsn(), so there's nothing else we need to >>> modify to make this work. >>> >>> Signed-off-by: Dave Chinner >>> --- >>> fs/xfs/xfs_log_cil.c | 52 +++++++++++++++++++++++++++++++++++++++------------- >>> 1 file changed, 39 insertions(+), 13 deletions(-) >>> >>> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c >>> index b57a8e0..7e54553 100644 >>> --- a/fs/xfs/xfs_log_cil.c >>> +++ b/fs/xfs/xfs_log_cil.c >>> @@ -499,13 +499,6 @@ xlog_cil_push( >>> cil->xc_ctx = new_ctx; >>> >>> /* >>> - * mirror the new sequence into the cil structure so that we can do >>> - * unlocked checks against the current sequence in log forces without >>> - * risking deferencing a freed context pointer. >>> - */ >>> - cil->xc_current_sequence = new_ctx->sequence; >>> - >>> - /* >>> * The switch is now done, so we can drop the context lock and move out >>> * of a shared context. We can't just go straight to the commit record, >>> * though - we need to synchronise with previous and future commits so >>> @@ -523,8 +516,15 @@ xlog_cil_push( >>> * Hence we need to add this context to the committing context list so >>> * that higher sequences will wait for us to write out a commit record >>> * before they do. >>> + * >>> + * xfs_log_force_lsn requires us to mirror the new sequence into the cil >>> + * structure atomically with the addition of this sequence to the >>> + * committing list. This also ensures that we can do unlocked checks >>> + * against the current sequence in log forces without risking >>> + * deferencing a freed context pointer. >>> */ >>> spin_lock(&cil->xc_push_lock); >>> + cil->xc_current_sequence = new_ctx->sequence; >>> list_add(&ctx->committing,&cil->xc_committing); >>> spin_unlock(&cil->xc_push_lock); >>> up_write(&cil->xc_ctx_lock); >>> @@ -662,8 +662,14 @@ xlog_cil_push_background( >>> >>> } >>> >>> +/* >>> + * xlog_cil_push_now() is used to trigger an immediate CIL push to the sequence >>> + * number that is passed. When it returns, the work will be queued for >>> + * @push_seq, but it won't be completed. The caller is expected to do any >>> + * waiting for push_seq to complete if it is required. >>> + */ >>> static void >>> -xlog_cil_push_foreground( >>> +xlog_cil_push_now( >>> struct xlog *log, >>> xfs_lsn_t push_seq) >>> { >>> @@ -688,10 +694,8 @@ xlog_cil_push_foreground( >>> } >>> >>> cil->xc_push_seq = push_seq; >>> + queue_work(log->l_mp->m_cil_workqueue,&cil->xc_push_work); >>> spin_unlock(&cil->xc_push_lock); >>> - >>> - /* do the push now */ >>> - xlog_cil_push(log); >>> } >>> >>> bool >>> @@ -795,7 +799,8 @@ xlog_cil_force_lsn( >>> * xlog_cil_push() handles racing pushes for the same sequence, >>> * so no need to deal with it here. >>> */ >>> - xlog_cil_push_foreground(log, sequence); >>> +restart: >>> + xlog_cil_push_now(log, sequence); >>> >>> /* >>> * See if we can find a previous sequence still committing. >>> @@ -803,7 +808,6 @@ xlog_cil_force_lsn( >>> * before allowing the force of push_seq to go ahead. Hence block >>> * on commits for those as well. >>> */ >>> -restart: >>> spin_lock(&cil->xc_push_lock); >>> list_for_each_entry(ctx,&cil->xc_committing, committing) { >>> if (ctx->sequence> sequence) >>> @@ -821,6 +825,28 @@ restart: >>> /* found it! */ >>> commit_lsn = ctx->commit_lsn; >>> } >>> + >>> + /* >>> + * The call to xlog_cil_push_now() executes the push in the background. >>> + * Hence by the time we have got here it our sequence may not have been >>> + * pushed yet. This is true if the current sequence still matches the >>> + * push sequence after the above wait loop and the CIL still contains >>> + * dirty objects. >>> + * >>> + * When the push occurs, it will empty the CIL and >>> + * atomically increment the currect sequence past the push sequence and >>> + * move it into the committing list. Of course, if the CIL is clean at >>> + * the time of the push, it won't have pushed the CIL at all, so in that >>> + * case we should try the push for this sequence again from the start >>> + * just in case. >>> + */ >>> + >>> + if (sequence == cil->xc_current_sequence&& >>> + !list_empty(&cil->xc_cil)) { >>> + spin_unlock(&cil->xc_push_lock); >>> + goto restart; >>> + } >>> + >> >> IIUC, the objective here is to make sure we don't leave this code path >> before the push even starts and the ctx makes it onto the committing >> list, due to xlog_cil_push_now() moving things to a workqueue. > > Right. > >> Given that, what's the purpose of re-executing the background push as >> opposed to restarting the wait sequence (as done previously)? It looks >> like push_now() won't queue the work again due to cil->xc_push_seq, but >> it will flush the queue and I suppose make it more likely the push >> starts. Is that the intent? > > Effectively. But the other thing that it is protecting against is > that foreground push is done without holding the cil->xc_ctx_lock, > and so we can get the situation where we try a foreground push > of the current sequence, see that the CIL is empty and return > without pushing, wait for previous sequences to commit, then find > that the CIL has items on the CIL in the sequence we are supposed to > be committing. > > In this case, we don't know if this occurred because the workqueue > has not started working on our push, or whether we raced on an empty > CIL, and hence we need to make sure that everything in the sequence > we are support to commit is pushed to the log. > > Hence if the current sequence is dirty after we've ensure that all > prior sequences are fully checkpointed, need to go back and > push the CIL again to ensure that when we return to the caller the > CIL is checkpointed up to the point in time of the log force > occurring. The desired push sequence was taken from an item on the CIL (either when added or from a pinned item). How could the CIL now be empty other than someone else pushed to at least the desire sequence? A flush_work() should be enough in the case where the ctx of the desire sequence is not on the xc_committing list. The flush_work will wait for the worker to start and place the ctx of the desired sequence into the xc_committing list. This preventing a tight loop waiting for the cil push worker to start. Starting the cil push worker for every wakeup of smaller sequence in the list_for_each_entry loop seems wasteful. We know the later error paths in xfs_cil_push() will not do a wake, now is a good time to fix that. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs