Re: [PATCH] writeback: avoid race when update bandwidth

From: Fengguang Wu <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>,
	linux-kernel@vger.kernel.org,
	Gavin Shan <shangw@linux.vnet.ibm.com>,
	Wanpeng Li <liswp@linux.vnet.ibm.com>
Subject: Re: [PATCH] writeback: avoid race when update bandwidth
Date: Thu, 14 Jun 2012 21:48:18 +0800
Message-ID: <20120614134818.GA15553@localhost>
In-Reply-To: <20120614013645.GA7339@dastard>

On Thu, Jun 14, 2012 at 11:36:45AM +1000, Dave Chinner wrote:
> On Wed, Jun 13, 2012 at 12:21:15PM +0800, Fengguang Wu wrote:
> > On Wed, Jun 13, 2012 at 01:56:47PM +1000, Dave Chinner wrote:
> > > On Tue, Jun 12, 2012 at 07:21:29PM +0800, Fengguang Wu wrote:
> > > > On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
> > > > > From: Wanpeng Li <liwp@linux.vnet.ibm.com>
> > > > 
> > > > That email address is no longer in use?
> > > > 
> > > > > Since bdi->wb.list_lock is used to protect the b_* lists,
> > > > > the flushers that call wb_writeback() to write back pages will
> > > > > get stuck when the bandwidth update policy holds this lock. To
> > > > > avoid this race, we can introduce a new bandwidth_lock that
> > > > > is responsible for protecting the bandwidth update policy.
> > > 
> > > This is not a race condition - it is a lock contention condition.
> > 
> > Nod.
> > 
> > > > This looks good to me. wb.list_lock could be contended and it's better
> > > > for bdi_update_bandwidth() to use a standalone and hardly contended
> > > > lock.
> > > 
> > > I'm not sure it will be "hardly contended". That's a global lock, so
> > > now we'll end up with updates on different bdis contending and it's
> > > not uncommon to see a couple of thousand processes on large machines
> > > beating on balance_dirty_pages().  Putting a global scope lock
> > > around such a function doesn't seem like a good solution to me.
> > 
> > It's the number of bdi's that matters more than the number of processes,
> > because there is a per-bdi 200ms ratelimit:
> > 
> > bdi_update_bandwidth():
> > 
> >         if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
> >                 return;
> >         // lock it
> 
> So now you get a thousand processes on a thousand CPUs all hit that
> case at the same time because they are all writing to disk at the
> same time, all nicely synchronised by MPI. Lock contention ahoy!

Yeah, the cost does increase fast with the number of CPUs...

> > So a global lock should be enough when there are only dozens of disks.
> 
> Only needs one bdi, just with lots of processes trying to hit it at
> the same time such that they all pass the time after check.

It's more related to the number of CPUs: once task A updates
bdi->bw_time_stamp, the other tasks B, C, D, ... will see the updated
value and will all back off for the next 200ms period.
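
A minimal userspace sketch of that back-off, for illustration only. The
names fake_bdi and fake_update_bandwidth are hypothetical, time is passed in
as plain milliseconds rather than jiffies, and the sketch is single-threaded,
so it does not model the window in which many tasks pass the check before
the first one stores the new time stamp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 200ms ratelimit discussed above. */
#define FAKE_BANDWIDTH_INTERVAL_MS 200u

/* Hypothetical stand-in for the per-bdi state; not the kernel struct. */
struct fake_bdi {
	uint64_t bw_time_stamp;  /* time of the last update, in ms */
	unsigned int updates;    /* how many updates were actually done */
};

/* Returns true if this caller did the update, false if it backed off. */
static bool fake_update_bandwidth(struct fake_bdi *bdi, uint64_t now_ms)
{
	/* Same shape as the quoted check: bail out until the interval ends. */
	if (now_ms <= bdi->bw_time_stamp + FAKE_BANDWIDTH_INTERVAL_MS)
		return false;

	/* The real code would take a lock here and recompute bandwidth. */
	bdi->bw_time_stamp = now_ms;
	bdi->updates++;
	return true;
}
```

With a time stamp of 0, a call at 201ms does the update, and later callers
back off until the clock passes 401ms.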

> > However, the global bandwidth_lock will probably become a problem when
> > there are hundreds of disks. If there are (or will be) such setups,
> > I'm fine with reverting to the old per-bdi locking.
> 
> There are setups with hundreds of disks. They also tend to
> have hundreds of CPUs, too....

OK.. I'll drop the change.

> > > Oh, and if you want to remove the dirty_lock from
> > > global_update_limit(), then replacing the lock with a cmpxchg loop
> > > will do it just fine....
> > 
> > Yes. But to be frank, I don't care about that dirty_lock at all,
> > because it has its own 200ms rate limiting :-)
> 
> That has the same problem, only it's currently nested inside another
> lock which isolates it from contention.  This is why measurement is
> important - until there is evidence showing that the lock
> contention is a problem, don't change it because it generally has an
> unpredictable cascading effect that often results in worse
> contention than was there originally....

You are right, it's a good attitude to avoid "might be better" changes
for some "suspected problem".
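
For reference, the cmpxchg loop suggested in the quoted text could look
roughly like the userspace sketch below. It uses C11 atomics instead of the
kernel's cmpxchg(), and fake_dirty_limit, fake_next_limit and
fake_update_limit are hypothetical names. The update rule is a placeholder
chosen for illustration, not the actual global_update_limit() logic.

```c
#include <stdatomic.h>

/* Hypothetical shared limit, updated without taking a lock. */
static _Atomic unsigned long fake_dirty_limit;

/* Placeholder rule: jump up to thresh at once, decay towards it slowly. */
static unsigned long fake_next_limit(unsigned long limit, unsigned long thresh)
{
	if (limit < thresh)
		return thresh;
	return limit - ((limit - thresh) >> 5);
}

/* Recompute from the latest value until the swap succeeds. */
static unsigned long fake_update_limit(unsigned long thresh)
{
	unsigned long old = atomic_load(&fake_dirty_limit);
	unsigned long next;

	do {
		next = fake_next_limit(old, thresh);
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&fake_dirty_limit, &old, next));

	return next;
}
```

A loser of the exchange simply retries with the winner's value, so no caller
ever sleeps or spins on a lock, but the cache line holding the limit is
still shared by every CPU that calls in.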

Thanks,
Fengguang

Thread overview: 8+ messages
2012-06-12 10:26 [PATCH] writeback: avoid race when update bandwidth Wanpeng Li
2012-06-12 11:21 ` Fengguang Wu
2012-06-12 11:29   ` Wanpeng Li
2012-06-12 11:33     ` Fengguang Wu
2012-06-13  3:56   ` Dave Chinner
2012-06-13  4:21     ` Fengguang Wu
2012-06-14  1:36       ` Dave Chinner
2012-06-14 13:48         ` Fengguang Wu [this message]
