From: Khalid Aziz <khalid.aziz@oracle.com>
To: David Lang <david@lang.hm>
Cc: Oleg Nesterov <oleg@redhat.com>, Andi Kleen <andi@firstfloor.org>,
Thomas Gleixner <tglx@linutronix.de>,
One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@kernel.org>,
peterz@infradead.org, akpm@linux-foundation.org,
viro@zeniv.linux.org.uk, linux-kernel@vger.kernel.org
Subject: Re: [RFC] [PATCH] Pre-emption control for userspace
Date: Wed, 05 Mar 2014 16:48:37 -0700
Message-ID: <5317B7D5.2030403@oracle.com>
In-Reply-To: <alpine.DEB.2.02.1403051506550.6682@nftneq.ynat.uz>
On 03/05/2014 04:13 PM, David Lang wrote:
> Yes, you have paid the cost of the context switch, but your original
> problem description talked about having multiple other threads trying to
> get the lock, then spinning trying to get the lock (wasting time if the
> process holding it is asleep, but not if it's running on another core)
> and causing a long delay before the process holding the lock gets a
> chance to run again.
>
> Having the threads immediately yield to the process that has the lock
> reduces this down to two context switches, which isn't perfect, but it's
> a LOT better than what you started from.

OK, let us consider the multiple-core scenario:

- Thread A gets scheduled on core 1 and runs for 95% of its timeslice
  before it gets to its critical section.
- Thread A grabs the lock and reaches the end of its timeslice before
  finishing its critical section.
- Thread A is preempted on core 1 by a completely unrelated thread.
- Thread B, in the meantime, is scheduled on core 2 and happens to get to
  its critical section right away, where it tries to grab the lock held by
  thread A. It spins for a bit waiting to see if the lock becomes
  available, gives up and yields to the next process in the queue.
- Since thread A ran recently, it is now stuck towards the end of the run
  queue, so thread C gets to run on core 2 and meets the same fate as
  thread B.

Now scale this scenario across more cores and more threads that all want
the same lock to execute their small critical sections. The cost of
spinning and context switching could have been avoided if thread A had
been given an additional timeslice to complete its critical section.
Yielding to the process holding the lock happens only after contention has
occurred and we have already paid the price of two context switches for
the lock owner. When yield_to() happens, the lock owner may not get to run
on the core it was on, because an unrelated thread is running on that core
and the lock owner has to wait for that thread's timeslice to run out. If
the lock owner gets scheduled on another core instead, we pay the price of
repopulating the cache for a new thread on that core. yield_to() is better
than letting a convoy of processes build up waiting for the same lock, but
worse than avoiding the contention altogether.
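
To make the cost on thread B's side concrete, here is a rough userspace sketch of the "spin for a bit, then give up and yield" behaviour from the scenario above. It is only an illustration: the names spin_yield_lock, lock_try, lock_acquire, lock_release and the SPIN_LIMIT value are made up for this example and are not taken from the patch or from any particular lock implementation.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin budget; a real lock would tune this. */
#define SPIN_LIMIT 1000

/* Illustrative lock type, not from the patch. */
typedef struct {
    atomic_flag held;
} spin_yield_lock;

#define SPIN_YIELD_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Single attempt: returns true if the caller now owns the lock. */
static bool lock_try(spin_yield_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/*
 * Spin for a bounded number of attempts, then give up the CPU and try
 * again. Returns how many times the caller had to yield, i.e. how many
 * voluntary context switches the contention cost this thread.
 */
static unsigned long lock_acquire(spin_yield_lock *l)
{
    unsigned long yields = 0;

    assert(l);
    for (;;) {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (lock_try(l))
                return yields;
        }
        /* The owner is probably not running; let someone else run. */
        sched_yield();
        yields++;
    }
}

static void lock_release(spin_yield_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

If the owner has been preempted while holding the lock, every waiter burns its spin budget and then pays at least one yield, which is the cost the scenario above describes.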
>
> well, writing to something in /proc isn't free either. And how is the
> thread supposed to know if it needs to do so or if it's going to have
> enough time to finish its work before it's out of time (how can it know
> how much time it would have left anyway?)
The cost is a write to a memory location, since the thread is using mmap:
not insignificant, but hardly expensive. The thread does not need to know
how much time it has left in its current timeslice. It always sets the
flag to request pre-emption immunity before entering the critical section
and clears the flag when it exits the critical section. If the thread
comes up for pre-emption while the flag is set, it gets immunity. If it
does not, the flag is cleared at the end of the critical section anyway.
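
A minimal sketch of that usage pattern from the thread's point of view, under stated assumptions: the flag below lives in an anonymous mmap'd page that stands in for the real mapping, so no scheduler ever looks at it, and the names preempt_flag, preempt_flag_init and critical_section_update are invented for the example. It only shows the ordering of the two stores around the critical section.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Stand-in for the flag word shared with the scheduler. ASSUMPTION: this
 * is an anonymous mapping used purely for illustration; the kernel never
 * reads it. A single flag is used here for brevity.
 */
static volatile unsigned int *preempt_flag;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;

/* Map the memory that holds the flag. Returns 0 on success, -1 on error. */
static int preempt_flag_init(void)
{
    void *p = mmap(NULL, sizeof(*preempt_flag), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    preempt_flag = p;   /* anonymous pages start out zeroed: flag clear */
    return 0;
}

/* A short critical section bracketed by the immunity request. */
static void critical_section_update(long delta)
{
    assert(preempt_flag);

    *preempt_flag = 1;              /* request immunity: one plain store */
    pthread_mutex_lock(&lock);
    shared_counter += delta;        /* the short critical section */
    pthread_mutex_unlock(&lock);
    *preempt_flag = 0;              /* always cleared on the way out */
}
```

The flag is set before the lock is taken so that there is no window in which the thread holds the lock without having asked for immunity. The per-section overhead is the two stores.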
> is this gain from not giving up the CPU at all? or is it from avoiding
> all the delays due to the contending thread trying in turn? the
> yield_to() approach avoids all those other threads trying in turn so it
> should get fairly close to the same benefits.
>
The gain is from avoiding contention by giving the locking thread a chance
to complete its critical section, which is expected to be very short
(certainly shorter than a timeslice). Pre-emption immunity gives it one
and only one additional timeslice.
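
The "one and only one" rule can be modelled as a small decision function. This is a toy model of the policy as described in this mail, not the scheduler code from the patch; struct task_model, its fields and timeslice_expired are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-task state; both fields are assumptions made for this model. */
struct task_model {
    bool immunity_requested;  /* flag the thread set before its critical section */
    bool extension_used;      /* the one extra timeslice was already granted */
};

/*
 * Called when the task's timeslice runs out.
 * Returns true if the task should be preempted now, false if it is
 * allowed to keep running for one more timeslice.
 */
static bool timeslice_expired(struct task_model *t)
{
    assert(t);

    if (t->immunity_requested && !t->extension_used) {
        t->extension_used = true;   /* grant exactly one extension */
        return false;
    }

    /* No request, or the extension is already spent: preempt. */
    t->extension_used = false;      /* start fresh the next time it runs */
    return true;
}
```

A task that leaves the flag set cannot hold the CPU indefinitely in this model: the second expiry always preempts it.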
Hope this helps clear things up.
Thanks,
Khalid