From: Gregory Haskins <ghaskins@novell.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: mingo@elte.hu, srostedt@redhat.com, peterz@infradead.org,
linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
npiggin@suse.de, gregory.haskins@gmail.com
Subject: Re: [PATCH 3/5] sched: make double-lock-balance fair
Date: Wed, 27 Aug 2008 07:41:58 -0400 [thread overview]
Message-ID: <48B53D86.8080806@novell.com> (raw)
In-Reply-To: <200808271636.09243.nickpiggin@yahoo.com.au>
[-- Attachment #1: Type: text/plain, Size: 6044 bytes --]
Nick Piggin wrote:
> On Tuesday 26 August 2008 22:23, Gregory Haskins wrote:
>
>> Nick Piggin wrote:
>>
>>> On Tuesday 26 August 2008 06:15, Gregory Haskins wrote:
>>>
>>>> double_lock balance() currently favors logically lower cpus since they
>>>> often do not have to release their own lock to acquire a second lock.
>>>> The result is that logically higher cpus can get starved when there is
>>>> a lot of pressure on the RQs. This can result in higher latencies on
>>>> higher cpu-ids.
>>>>
>>>> This patch makes the algorithm more fair by forcing all paths to have
>>>> to release both locks before acquiring them again. Since callsites to
>>>> double_lock_balance already consider it a potential
>>>> preemption/reschedule point, they have the proper logic to recheck for
>>>> atomicity violations.
>>>>
>>>> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
>>>> ---
>>>>
>>>> kernel/sched.c | 17 +++++------------
>>>> 1 files changed, 5 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/kernel/sched.c b/kernel/sched.c
>>>> index 6e0bde6..b7326cd 100644
>>>> --- a/kernel/sched.c
>>>> +++ b/kernel/sched.c
>>>> @@ -2790,23 +2790,16 @@ static int double_lock_balance(struct rq
>>>> *this_rq, struct rq *busiest) __acquires(busiest->lock)
>>>> __acquires(this_rq->lock)
>>>> {
>>>> - int ret = 0;
>>>> -
>>>> if (unlikely(!irqs_disabled())) {
>>>> /* printk() doesn't work good under rq->lock */
>>>> spin_unlock(&this_rq->lock);
>>>> BUG_ON(1);
>>>> }
>>>> - if (unlikely(!spin_trylock(&busiest->lock))) {
>>>> - if (busiest < this_rq) {
>>>> - spin_unlock(&this_rq->lock);
>>>> - spin_lock(&busiest->lock);
>>>> - spin_lock_nested(&this_rq->lock, SINGLE_DEPTH_NESTING);
>>>> - ret = 1;
>>>> - } else
>>>> - spin_lock_nested(&busiest->lock, SINGLE_DEPTH_NESTING);
>>>> - }
>>>> - return ret;
>>>> +
>>>> + spin_unlock(&this_rq->lock);
>>>> + double_rq_lock(this_rq, busiest);
>>>>
>>> Rather than adding the extra atomic operation, can't you just put this
>>> into the unlikely spin_trylock failure path rather than the unfair logic
>>> there?
>>>
>> The trick is that we *must* first release this_rq before proceeding or
>> the new proposal doesn't work as intended. This patch effectively
>> breaks up the this_rq->lock critical section evenly across all CPUs as
>> if it hit the case common for higher cpus.
>>
>
> I don't exactly see why my proposal would introduce any more latency, because
> we only trylock while holding the existing lock -- this is will only ever add
> a small ~constant time to the critical section, regardless of whether it is a
> high or low CPU runqueue.
>
Its because we are trying to create a break in the critical section of
this_rq->lock, not improve the acquisition of busiest->lock. So whether
you do spin_lock or spin_trylock on busiest does not matter. Busiest
will not be contended in the case that I am concerned with. If you use
my example below: rq[N] will not be contended because cpuN is blocked on
rq[0] after already having released rq[N]. So its the contention
against this_rq that is the problem.
Or am I missing your point completely?
>
>
>> This modification decreased
>> latency by over 800% (went from > 400us to < 50us) on cpus 6 and 7 in my
>> 8-way box namely because they were not forced to wait for all the other
>> lower cores to finish, but rather completions of double_lock_balance
>> were handled in true FIFO w.r.t. to other calls to
>> double_lock_balance(). It has to do with the positioning within your
>> FIFO ticket locks (though even if ticket locks are not present on a
>> given architecture we should still see an improvement.)
>>
>> When a low cpu wants to double lock, it tends to hold this_rq and gets
>> in line for busiest_rq with no bearing on how long it held this_rq.
>> Therefore the following scenario can occur:
>>
>> cpu 0 cpu N
>> ----------------------------------
>> rq[0] locked
>> ..
>> ..
>> ..
>> double_lock(N, 0)
>> rq[N] released
>> blocked on rq[0]
>> ..
>> ..
>> ..
>> ..
>> double_lock(0, N)
>> rq[N] locked
>> double_lock returns
>> ..
>> ..
>> ..
>> ..
>> rq[0] released rq[0] locked
>> double_lock returns
>> ...
>> ...
>> ...
>>
>> ---------------------------------
>>
>> So double lock acquisition favors the lower cpus unfairly. They will
>> always win, even if they were not first. Now with the combination of my
>> patch plus your ticket locks, entry into the double lock becomes FIFO
>> because the "blocked on rq[0]" would have inserted it in the
>> time-ordered head of rq[0].
>>
>
> Right, but I don't think it is particularly wrong to allow a given
> CPU to double_lock_balance ahead of another guy if we're already holding
> the lock.
Its not "wrong". Its just a latency source ;)
> _So long as_ the lock we are trying to acquire is uncontended,
> and we don't introduce this skewed unfairness due to lower CPUs being
> allowed to hold their lower lock while higher CPUs have to release their
> lock and first queue on the lower.
>
> The difference is that with my patch, there is a small window where the
> guy who asks for the double lock first will go through second. I don't
> think this really adds a fundamental amount of latency, and the
> performance benefit should not be ignored.
>
> Linux's traditional and I suppose much largest user base does not require
> realtime or really strict fairness, so IMO it is always questionable to
> make changes like this.
>
Please take a look at the v2 series that I sent out yesterday. I have
now predicated this on CONFIG_PREEMPT, per your comments.
-Greg
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
next prev parent reply other threads:[~2008-08-27 11:44 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-25 20:15 [PATCH 0/5] sched: misc rt fixes for tip/sched/devel Gregory Haskins
2008-08-25 20:15 ` [PATCH 1/5] sched: only try to push a task on wakeup if it is migratable Gregory Haskins
2008-08-25 20:15 ` [PATCH 2/5] sched: pull only one task during NEWIDLE balancing to limit critical section Gregory Haskins
2008-08-26 6:21 ` Nick Piggin
2008-08-26 11:36 ` Gregory Haskins
2008-08-27 6:41 ` Nick Piggin
2008-08-27 11:50 ` Gregory Haskins
2008-08-27 11:57 ` Nick Piggin
2008-08-25 20:15 ` [PATCH 3/5] sched: make double-lock-balance fair Gregory Haskins
2008-08-26 6:14 ` Nick Piggin
2008-08-26 12:23 ` Gregory Haskins
2008-08-27 6:36 ` Nick Piggin
2008-08-27 11:41 ` Gregory Haskins [this message]
2008-08-27 11:53 ` Nick Piggin
2008-08-27 12:10 ` Gregory Haskins
2008-08-25 20:15 ` [PATCH 4/5] sched: add sched_class->needs_post_schedule() member Gregory Haskins
2008-08-25 20:15 ` [PATCH 5/5] sched: create "pushable_tasks" list to limit pushing to one attempt Gregory Haskins
2008-08-26 17:34 ` [PATCH v2 0/6] Series short description Gregory Haskins
2008-08-26 17:34 ` [PATCH v2 1/6] sched: only try to push a task on wakeup if it is migratable Gregory Haskins
2008-08-26 17:34 ` [PATCH v2 2/6] sched: pull only one task during NEWIDLE balancing to limit critical section Gregory Haskins
2008-08-26 17:35 ` [PATCH v2 3/6] sched: make double-lock-balance fair Gregory Haskins
2008-08-27 8:21 ` Peter Zijlstra
2008-08-27 8:25 ` Peter Zijlstra
2008-08-27 10:26 ` Nick Piggin
2008-08-27 10:41 ` Peter Zijlstra
2008-08-27 10:56 ` Nick Piggin
2008-08-27 10:57 ` Nick Piggin
2008-08-27 12:03 ` Gregory Haskins
2008-08-27 11:07 ` Peter Zijlstra
2008-08-27 11:17 ` Russell King
2008-08-27 12:00 ` Gregory Haskins
2008-08-29 12:49 ` Ralf Baechle
2008-08-27 12:13 ` Gregory Haskins
2008-08-27 12:02 ` Gregory Haskins
2008-08-26 17:35 ` [PATCH v2 4/6] sched: add sched_class->needs_post_schedule() member Gregory Haskins
2008-08-26 17:35 ` [PATCH v2 5/6] plist: fix PLIST_NODE_INIT to work with debug enabled Gregory Haskins
2008-08-26 17:35 ` [PATCH v2 6/6] sched: create "pushable_tasks" list to limit pushing to one attempt Gregory Haskins
2008-08-29 13:24 ` Gregory Haskins
2008-08-26 18:16 ` [PATCH v2 0/6] sched: misc rt fixes for tip/sched/devel (was: Series short description) Gregory Haskins
2008-08-27 8:33 ` [PATCH v2 0/6] Series short description Peter Zijlstra
2008-09-04 12:54 ` [TIP/SCHED/DEVEL PATCH v3 0/6] sched: misc rt fixes Gregory Haskins
2008-09-04 12:55 ` [TIP/SCHED/DEVEL PATCH v3 1/6] sched: only try to push a task on wakeup if it is migratable Gregory Haskins
2008-09-04 12:55 ` [TIP/SCHED/DEVEL PATCH v3 2/6] sched: pull only one task during NEWIDLE balancing to limit critical section Gregory Haskins
2008-09-04 12:55 ` [TIP/SCHED/DEVEL PATCH v3 3/6] sched: make double-lock-balance fair Gregory Haskins
2008-09-04 12:55 ` [TIP/SCHED/DEVEL PATCH v3 4/6] sched: add sched_class->needs_post_schedule() member Gregory Haskins
2008-09-04 20:36 ` Steven Rostedt
2008-09-04 20:36 ` Gregory Haskins
2008-09-04 12:55 ` [TIP/SCHED/DEVEL PATCH v3 5/6] plist: fix PLIST_NODE_INIT to work with debug enabled Gregory Haskins
2008-09-04 12:55 ` [TIP/SCHED/DEVEL PATCH v3 6/6] sched: create "pushable_tasks" list to limit pushing to one attempt Gregory Haskins
2008-09-04 21:16 ` Steven Rostedt
2008-09-04 21:26 ` Gregory Haskins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48B53D86.8080806@novell.com \
--to=ghaskins@novell.com \
--cc=gregory.haskins@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rt-users@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=nickpiggin@yahoo.com.au \
--cc=npiggin@suse.de \
--cc=peterz@infradead.org \
--cc=srostedt@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox