From: Fengguang Wu <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>,
	linux-kernel@vger.kernel.org,
	Gavin Shan <shangw@linux.vnet.ibm.com>,
	Wanpeng Li <liswp@linux.vnet.ibm.com>
Subject: Re: [PATCH] writeback: avoid race when update bandwidth
Date: Thu, 14 Jun 2012 21:48:18 +0800
Message-ID: <20120614134818.GA15553@localhost>
In-Reply-To: <20120614013645.GA7339@dastard>

On Thu, Jun 14, 2012 at 11:36:45AM +1000, Dave Chinner wrote:
> On Wed, Jun 13, 2012 at 12:21:15PM +0800, Fengguang Wu wrote:
> > On Wed, Jun 13, 2012 at 01:56:47PM +1000, Dave Chinner wrote:
> > > On Tue, Jun 12, 2012 at 07:21:29PM +0800, Fengguang Wu wrote:
> > > > On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
> > > > > From: Wanpeng Li <liwp@linux.vnet.ibm.com>
> > > > 
> > > > That email address is no longer in use?
> > > > 
> > > > > Since bdi->wb.list_lock is used to protect the b_* lists,
> > > > > the flushers that call wb_writeback() to write back pages will
> > > > > get stuck while the bandwidth update policy holds this lock. To
> > > > > avoid this race we can introduce a new bandwidth_lock that
> > > > > is responsible for protecting the bandwidth update policy.
> > > 
> > > This is not a race condition - it is a lock contention condition.
> > 
> > Nod.
> > 
> > > > This looks good to me. wb.list_lock could be contended and it's better
> > > > for bdi_update_bandwidth() to use a standalone and hardly contended
> > > > lock.
> > > 
> > > I'm not sure it will be "hardly contended". That's a global lock, so
> > > now we'll end up with updates on different bdis contending and it's
> > > not uncommon to see a couple of thousand processes on large machines
> > > beating on balance_dirty_pages().  Putting a global scope lock
> > > around such a function doesn't seem like a good solution to me.
> > 
> > What matters is the number of bdi's rather than the number of
> > processes, because there is a per-bdi 200ms ratelimit:
> > 
> > bdi_update_bandwidth():
> > 
> >         if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
> >                 return;
> >         /* lock it */
> 
> So now you get a thousand processes on a thousand CPUs all hit that
> case at the same time because they are all writing to disk at the
> same time, all nicely synchronised by MPI. Lock contention ahoy!

Yeah, the cost does increase fast with the number of CPUs...

> > So a global lock should be enough when there are only dozens of disks.
> 
> Only needs one bdi, just with lots of processes trying to hit it at
> the same time such that they all pass the time after check.

It's more related to the number of CPUs: once task A updates
bdi->bw_time_stamp, the other tasks B, C, D, ... will see the updated
value and will all back off for the next 200ms period.
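
To make that concrete, here is a minimal user-space sketch of the
check-then-lock pattern. It is not the kernel code: struct fake_bdi,
fake_update_bandwidth(), the atomic_flag spinlock and the millisecond
clock are illustrative stand-ins, and the lock is assumed to be per-device.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define UPDATE_INTERVAL_MS 200UL	/* stands in for BANDWIDTH_INTERVAL */

/* Hypothetical stand-in for the per-device state, not the kernel struct. */
struct fake_bdi {
	atomic_flag	lock;		/* trivial spinlock, one per device */
	atomic_ulong	bw_time_stamp;	/* time of the last update, in ms */
	unsigned long	updates;	/* updates performed, guarded by lock */
};

static void fake_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void fake_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Returns true if this caller performed the update. */
static bool fake_update_bandwidth(struct fake_bdi *bdi, unsigned long now_ms)
{
	bool do_update;

	/* Lockless fast path: callers inside the interval back off here. */
	if (now_ms - atomic_load(&bdi->bw_time_stamp) < UPDATE_INTERVAL_MS)
		return false;

	fake_lock(&bdi->lock);
	/* Re-check under the lock: a racing caller may have updated already. */
	do_update = now_ms - atomic_load(&bdi->bw_time_stamp) >= UPDATE_INTERVAL_MS;
	if (do_update) {
		bdi->updates++;
		atomic_store(&bdi->bw_time_stamp, now_ms);
	}
	fake_unlock(&bdi->lock);

	return do_update;
}
```

Tasks that arrive after the time stamp has moved never touch the lock.
Only the tasks that pass the first check at the same instant contend,
which is why the worst case grows with the number of CPUs.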

> > However, the global bandwidth_lock will probably become a problem when
> > there are hundreds of disks. If there are (or will be) such setups,
> > I'm fine with reverting to the old per-bdi locking.
> 
> There are setups with hundreds of disks. They also tend to
> have hundreds of CPUs, too....

OK, I'll drop the change.

> > > Oh, and if you want to remove the dirty_lock from
> > > global_update_limit(), then replacing the lock with a cmpxchg loop
> > > will do it just fine....
> > 
> > Yes. But to be frank, I don't care about that dirty_lock at all,
> > because it has its own 200ms rate limiting :-)
> 
> That has the same problem, only it's currently nested inside another
> lock which isolates it from contention.  This is why measurement is
> important - until there is evidence that shows the lock
> contention is a problem, don't change it because it generally has an
> unpredictable cascading effect that often results in worse
> contention than was there originally....

You are right, it's a good attitude to avoid "might be better" changes
for some "suspected problem".
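
For reference, a cmpxchg loop of the kind suggested above for
global_update_limit() could look roughly like the following user-space
sketch with C11 atomics. fake_update_limit() and its smoothing policy
(jump up at once, move down by 1/32 of the gap) are assumptions made
for illustration, not the kernel implementation.

```c
#include <stdatomic.h>

/*
 * Lock-free update of a shared limit (hypothetical helper).
 * Policy assumed here: follow the threshold up in one step,
 * follow it down by 1/32 of the remaining gap.
 */
static unsigned long fake_update_limit(atomic_ulong *limit, unsigned long thresh)
{
	unsigned long old = atomic_load(limit);
	unsigned long new;

	do {
		if (old < thresh)
			new = thresh;
		else
			new = old - ((old - thresh) >> 5);
		/*
		 * On failure the compare-exchange reloads 'old' with the
		 * current value, so the next pass recomputes 'new' from it.
		 */
	} while (!atomic_compare_exchange_weak(limit, &old, new));

	return new;
}
```

No caller ever blocks here; a loser of the race simply retries with the
winner's value. Whether that is actually cheaper than the existing lock
is the kind of thing that would need measuring first.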

Thanks,
Fengguang
