From: Fengguang Wu <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>,
linux-kernel@vger.kernel.org,
Gavin Shan <shangw@linux.vnet.ibm.com>,
Wanpeng Li <liswp@linux.vnet.ibm.com>
Subject: Re: [PATCH] writeback: avoid race when update bandwidth
Date: Thu, 14 Jun 2012 21:48:18 +0800
Message-ID: <20120614134818.GA15553@localhost>
In-Reply-To: <20120614013645.GA7339@dastard>
On Thu, Jun 14, 2012 at 11:36:45AM +1000, Dave Chinner wrote:
> On Wed, Jun 13, 2012 at 12:21:15PM +0800, Fengguang Wu wrote:
> > On Wed, Jun 13, 2012 at 01:56:47PM +1000, Dave Chinner wrote:
> > > On Tue, Jun 12, 2012 at 07:21:29PM +0800, Fengguang Wu wrote:
> > > > On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
> > > > > From: Wanpeng Li <liwp@linux.vnet.ibm.com>
> > > >
> > > > That email address is no longer in use?
> > > >
> > > > > Since bdi->wb.list_lock is used to protect the b_* lists,
> > > > > the flushers that call wb_writeback to write back pages will
> > > > > get stuck when the bandwidth update policy holds this lock. In order
> > > > > to avoid this race we can introduce a new bandwidth_lock which
> > > > > is responsible for protecting the bandwidth update policy.
> > >
> > > This is not a race condition - it is a lock contention condition.
> >
> > Nod.
> >
> > > > This looks good to me. wb.list_lock could be contended and it's better
> > > > for bdi_update_bandwidth() to use a standalone and hardly contended
> > > > lock.
> > >
> > > I'm not sure it will be "hardly contended". That's a global lock, so
> > > now we'll end up with updates on different bdis contending and it's
> > > not uncommon to see a couple of thousand processes on large machines
> > > beating on balance_dirty_pages(). Putting a global scope lock
> > > around such a function doesn't seem like a good solution to me.
> >
> > It's more about the number of bdi's than the number of processes that matters.
> > Because here is a per-bdi 200ms ratelimit:
> >
> >         bdi_update_bandwidth():
> >
> >                 if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
> >                         return;
> >                 // lock it
>
> So now you get a thousand processes on a thousand CPUs all hit that
> case at the same time because they are all writing to disk at the
> same time, all nicely synchronised by MPI. Lock contention ahoy!

Yeah, the cost does increase fast with the number of CPUs...

> > So a global lock should be enough when there are only dozens of disks.
>
> Only needs one bdi, just with lots of processes trying to hit it at
> the same time such that they all pass the time after check.

It's more related to the number of CPUs: once task A updates
bdi->bw_time_stamp, the other tasks B, C, D, ... will see the updated
value and will all back off in the next 200ms period.

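The back-off behaviour described here can be sketched in userspace C11. This is only an illustration under stated assumptions, not the kernel code: `struct bw_state`, `maybe_update()`, the millisecond clock and the `atomic_flag` spinlock are all hypothetical stand-ins for the per-bdi state, `bdi_update_bandwidth()`, jiffies and the real lock. The recheck under the lock is a choice made for this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define UPDATE_INTERVAL_MS 200	/* the 200ms period mentioned above */

/* Hypothetical stand-in for per-bdi state; not the kernel's struct. */
struct bw_state {
	atomic_flag lock;		/* tiny spinlock guarding the update */
	_Atomic uint64_t stamp_ms;	/* time of the last update */
	unsigned long updates;		/* protected by lock */
};

static void bw_state_init(struct bw_state *s)
{
	atomic_flag_clear(&s->lock);
	atomic_init(&s->stamp_ms, 0);
	s->updates = 0;
}

static bool update_is_due(struct bw_state *s, uint64_t now_ms)
{
	return now_ms >= atomic_load(&s->stamp_ms) + UPDATE_INTERVAL_MS;
}

/* Returns true only for the caller that actually performed the update. */
static bool maybe_update(struct bw_state *s, uint64_t now_ms)
{
	bool did_update = false;

	/* Cheap unlocked check: once the stamp is fresh, callers stop here. */
	if (!update_is_due(s, now_ms))
		return false;

	/* Callers that passed the check at the same moment all queue here. */
	while (atomic_flag_test_and_set(&s->lock))
		;	/* spin */

	/* Recheck under the lock: an earlier holder may have done the work. */
	if (update_is_due(s, now_ms)) {
		atomic_store(&s->stamp_ms, now_ms);
		s->updates++;
		did_update = true;
	}

	atomic_flag_clear(&s->lock);
	return did_update;
}
```

Once one caller has stored the new stamp, later callers return at the first check without touching the lock. Callers that pass that check before the stamp changes still serialise on the lock, which is the window that grows with the number of CPUs.
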
> > However, the global bandwidth_lock will probably become a problem when
> > there are hundreds of disks. If there are (or will be) such setups,
> > I'm fine to revert to the old per-bdi locking.
>
> There are setups with hundreds of disks. They also tend to
> have hundreds of CPUs, too....

OK.. I'll drop the change.

> > > Oh, and if you want to remove the dirty_lock from
> > > global_update_limit(), then replacing the lock with a cmpxchg loop
> > > will do it just fine....
> >
> > Yes. But to be frank, I don't care about that dirty_lock at all,
> > because it has its own 200ms rate limiting :-)
>
> That has the same problem, only it's currently nested inside another
> lock which isolates it from contention. This is why measurement is
> important - until there is evidence showing that the lock
> contention is a problem, don't change it because it generally has an
> unpredictable cascading effect that often results in worse
> contention than was there originally....

You are right, it's a good attitude to avoid "might be better" changes
for some "suspected problem".

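For reference, the cmpxchg-loop idea quoted above can be sketched with C11 atomics in userspace. This is a hedged illustration, not the kernel's global_update_limit(): the variable `limit_pages`, the helpers `next_limit()` and `update_limit()`, and the update rule itself (jump up at once, decay down gradually) are assumptions chosen to make the example concrete.

```c
#include <stdatomic.h>

/* Hypothetical shared limit; stands in for a lock-protected global. */
static _Atomic unsigned long limit_pages;

/* Illustrative rule only: follow the threshold up immediately, down slowly. */
static unsigned long next_limit(unsigned long limit, unsigned long thresh)
{
	if (limit < thresh)
		return thresh;
	return limit - ((limit - thresh) >> 5);
}

/* Lock-free update: recompute and retry if another thread got in first. */
static unsigned long update_limit(unsigned long thresh)
{
	unsigned long old = atomic_load(&limit_pages);
	unsigned long next;

	do {
		next = next_limit(old, thresh);
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&limit_pages, &old, next));

	return next;
}
```

A loop like this removes the lock but not the cache-line traffic on the shared variable, so the point about measuring before changing anything applies to it as well.
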
Thanks,
Fengguang