All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	xfs@oss.sgi.com
Subject: Re: [2.6.36-rc3] Workqueues, XFS, dependencies and deadlocks
Date: Wed, 8 Sep 2010 17:34:28 +1000	[thread overview]
Message-ID: <20100908073428.GR705@dastard> (raw)
In-Reply-To: <4C865CC4.9070701@kernel.org>

On Tue, Sep 07, 2010 at 05:39:48PM +0200, Tejun Heo wrote:
> On 09/07/2010 02:48 PM, Dave Chinner wrote:
> > On Tue, Sep 07, 2010 at 12:35:46PM +0200, Tejun Heo wrote:
> > Almost. What happens is that there is a queue of data IO
> > completions on q0, say w1...wN where wX is in the middle of the
> > queue. wX requires lock A, but lock A is held by a a transaction
> > commit that is blocked by IO completion t1 on q1. The dependency
> > chain we then have is:
> > 
> > 	wX on q0 -> lock A -> t1 on q1
> > 
> > To prevent wX from blocking q0, when lock A is not gained, we
> > requeue wX to the tail of q0 such that the queue is not wX+1..wN,wX.
> > this means that wX will not block completion processing of data IO.
> > If wX is the only work on q0, then to stop the work queue from
> > spinning processing wX, queueing wX, processing wX.... there is a
> > delay(1) call to allow some time for other IOs to complete to occur
> > before trying again to process wX again.
> > 
> > At some point, q1 is processed and t1 is run and lock A
> > released. Once this happens, wX will gain lock A and finish the
> > completion and be freed.
> > 
> > The issue I appear to be seeing is that while q0 is doing:
> > 
> > 	wX -> requeue on q0 -> delay(1) -> wX -> requeue q0 -> wX
> > 
> > q1 which contains t1 is never getting processed, and hence the q0/wX
> > loop is never getting broken.
> 
> I see.  The use case itself shouldn't be problematic at all for cmwq
> (sans bugs of course).  In the other reply, you said "the system is
> 100% unresponsive when the livelock occurs", which is kind of
> puzzling.  It isn't really a livelock.

Actually, it is. You don't need to burn CPU to livelock, you just
need a loop in the state machine that cannot be broken by internal
or external events to be considered livelocked.

However, this is not what I was calling the livelock problem - this
is what I was calling the deadlock problem because to all external
appearences the state machine is deadlocked on the inode lock....

The livelock case I described where the system is completely
unresponsive is the one I'm testing the WQ_HIGHPRI mod against.

FWIW, having considered the above case again, and seeing what the
WQ_HIGHPRI mod does in terms of queuing, I think that it may also
solve this deadlock as the log IO completionwill always be queued
ahead of the data IO completion now.


> >> Also, how does delay(1) help with chewing up CPU?  Are you talking
> >> about avoiding constant lock/unlock ops starving other lockers?  In
> >> such case, wouldn't cpu_relax() make more sense?
> > 
> > Basically delay(1) is used in many places in XFS as a "backoff and
> > retry after a short period of time" mechanism in places where
> > blocking would lead to deadlock or we need a state change to occur
> > before retrying the operation that would have deadlocked. If we
> > don't put a backoff in, then we simply burn CPU until the condition
> > clears.
> > 
> > In the case of the data Io completion workqueue processing, the CPU
> > burn occurred when the only item on the workqueue was the inode that
> > we could not lock.  Hence the backoff. It's not a great solution,
> > but it's the only one that could be sued without changing everything
> > to use delayed works and hence suffer the associated structure bloat
> > for what is a rare corner case....
> 
> Hmm... The point where I'm confused is that *delay()'s are busy waits.
> They burn CPU cycles.  I suppose you're referring to *sleep()'s,
> right?

fs/xfs/linux-2.6/time.h:

static inline void delay(long ticks)
{
        schedule_timeout_uninterruptible(ticks);
}

> >> I don't remember but once I increased maximum concurrency for every
> >> workqueue (the limit was 128 or something) and xfs pretty quickly hit
> >> the concurrency limit.  IIRC, there was a function which allocates
> >> work_struct and schedules it.  I'll look through the emails.
> > 
> > How do you get concurrency requirements of 128 when you only have a
> > small number of CPUs?
> 
> Probably I have overloaded the term 'concurrency' too much.  In this
> case, I meant the number of workers assigned to work items of the wq.
> If you fire off N work items which sleep at the same time, cmwq will
> eventually try to create N workers as each previous worker goes to
> sleep so that the CPU doesn't sit idle while there are work items to
> process as long as N < @wq->nr->active.

Ok, so if I queue N items on a single CPU when max_active == N, they
get spread across N worker threads on different CPUs? 

> Documentation follows.

I'll have read of this tonight.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [2.6.36-rc3] Workqueues, XFS, dependencies and deadlocks
Date: Wed, 8 Sep 2010 17:34:28 +1000	[thread overview]
Message-ID: <20100908073428.GR705@dastard> (raw)
In-Reply-To: <4C865CC4.9070701@kernel.org>

On Tue, Sep 07, 2010 at 05:39:48PM +0200, Tejun Heo wrote:
> On 09/07/2010 02:48 PM, Dave Chinner wrote:
> > On Tue, Sep 07, 2010 at 12:35:46PM +0200, Tejun Heo wrote:
> > Almost. What happens is that there is a queue of data IO
> > completions on q0, say w1...wN where wX is in the middle of the
> > queue. wX requires lock A, but lock A is held by a a transaction
> > commit that is blocked by IO completion t1 on q1. The dependency
> > chain we then have is:
> > 
> > 	wX on q0 -> lock A -> t1 on q1
> > 
> > To prevent wX from blocking q0, when lock A is not gained, we
> > requeue wX to the tail of q0 such that the queue is not wX+1..wN,wX.
> > this means that wX will not block completion processing of data IO.
> > If wX is the only work on q0, then to stop the work queue from
> > spinning processing wX, queueing wX, processing wX.... there is a
> > delay(1) call to allow some time for other IOs to complete to occur
> > before trying again to process wX again.
> > 
> > At some point, q1 is processed and t1 is run and lock A
> > released. Once this happens, wX will gain lock A and finish the
> > completion and be freed.
> > 
> > The issue I appear to be seeing is that while q0 is doing:
> > 
> > 	wX -> requeue on q0 -> delay(1) -> wX -> requeue q0 -> wX
> > 
> > q1 which contains t1 is never getting processed, and hence the q0/wX
> > loop is never getting broken.
> 
> I see.  The use case itself shouldn't be problematic at all for cmwq
> (sans bugs of course).  In the other reply, you said "the system is
> 100% unresponsive when the livelock occurs", which is kind of
> puzzling.  It isn't really a livelock.

Actually, it is. You don't need to burn CPU to livelock, you just
need a loop in the state machine that cannot be broken by internal
or external events to be considered livelocked.

However, this is not what I was calling the livelock problem - this
is what I was calling the deadlock problem because to all external
appearences the state machine is deadlocked on the inode lock....

The livelock case I described where the system is completely
unresponsive is the one I'm testing the WQ_HIGHPRI mod against.

FWIW, having considered the above case again, and seeing what the
WQ_HIGHPRI mod does in terms of queuing, I think that it may also
solve this deadlock as the log IO completionwill always be queued
ahead of the data IO completion now.


> >> Also, how does delay(1) help with chewing up CPU?  Are you talking
> >> about avoiding constant lock/unlock ops starving other lockers?  In
> >> such case, wouldn't cpu_relax() make more sense?
> > 
> > Basically delay(1) is used in many places in XFS as a "backoff and
> > retry after a short period of time" mechanism in places where
> > blocking would lead to deadlock or we need a state change to occur
> > before retrying the operation that would have deadlocked. If we
> > don't put a backoff in, then we simply burn CPU until the condition
> > clears.
> > 
> > In the case of the data Io completion workqueue processing, the CPU
> > burn occurred when the only item on the workqueue was the inode that
> > we could not lock.  Hence the backoff. It's not a great solution,
> > but it's the only one that could be sued without changing everything
> > to use delayed works and hence suffer the associated structure bloat
> > for what is a rare corner case....
> 
> Hmm... The point where I'm confused is that *delay()'s are busy waits.
> They burn CPU cycles.  I suppose you're referring to *sleep()'s,
> right?

fs/xfs/linux-2.6/time.h:

static inline void delay(long ticks)
{
        schedule_timeout_uninterruptible(ticks);
}

> >> I don't remember but once I increased maximum concurrency for every
> >> workqueue (the limit was 128 or something) and xfs pretty quickly hit
> >> the concurrency limit.  IIRC, there was a function which allocates
> >> work_struct and schedules it.  I'll look through the emails.
> > 
> > How do you get concurrency requirements of 128 when you only have a
> > small number of CPUs?
> 
> Probably I have overloaded the term 'concurrency' too much.  In this
> case, I meant the number of workers assigned to work items of the wq.
> If you fire off N work items which sleep at the same time, cmwq will
> eventually try to create N workers as each previous worker goes to
> sleep so that the CPU doesn't sit idle while there are work items to
> process as long as N < @wq->nr->active.

Ok, so if I queue N items on a single CPU when max_active == N, they
get spread across N worker threads on different CPUs? 

> Documentation follows.

I'll have read of this tonight.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2010-09-08  7:33 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-07  7:29 [2.6.36-rc3] Workqueues, XFS, dependencies and deadlocks Dave Chinner
2010-09-07  7:29 ` Dave Chinner
2010-09-07  9:04 ` Tejun Heo
2010-09-07  9:04   ` Tejun Heo
2010-09-07 10:01   ` Dave Chinner
2010-09-07 10:01     ` Dave Chinner
2010-09-07 10:35     ` Tejun Heo
2010-09-07 10:35       ` Tejun Heo
2010-09-07 12:26       ` Tejun Heo
2010-09-07 12:26         ` Tejun Heo
2010-09-07 13:02         ` Dave Chinner
2010-09-07 13:02           ` Dave Chinner
2010-09-08  8:22         ` Dave Chinner
2010-09-08  8:22           ` Dave Chinner
2010-09-08  8:51           ` Tejun Heo
2010-09-08  8:51             ` Tejun Heo
2010-09-08 10:05             ` Dave Chinner
2010-09-08 10:05               ` Dave Chinner
2010-09-08 10:05               ` Dave Chinner
2010-09-08 14:10               ` Tejun Heo
2010-09-08 14:10                 ` Tejun Heo
2010-09-07 12:48       ` Dave Chinner
2010-09-07 12:48         ` Dave Chinner
2010-09-07 15:39         ` Tejun Heo
2010-09-07 15:39           ` Tejun Heo
2010-09-08  7:34           ` Dave Chinner [this message]
2010-09-08  7:34             ` Dave Chinner
2010-09-08  8:20             ` Tejun Heo
2010-09-08  8:20               ` Tejun Heo
2010-09-08  8:28               ` Dave Chinner
2010-09-08  8:28                 ` Dave Chinner
2010-09-08  8:46                 ` Tejun Heo
2010-09-08  8:46                   ` Tejun Heo
2010-09-08 10:12                   ` Dave Chinner
2010-09-08 10:12                     ` Dave Chinner
2010-09-08 10:28                     ` Tejun Heo
2010-09-08 10:28                       ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100908073428.GR705@dastard \
    --to=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.