From: "H. Peter Anvin" <hpa@zytor.com>
To: Khalid Aziz <khalid.aziz@oracle.com>,
	tglx@linutronix.de, Ingo Molnar <mingo@kernel.org>,
	peterz@infradead.org, akpm@linux-foundation.org,
	andi.kleen@intel.com, rob@landley.net, viro@zeniv.linux.org.uk,
	oleg@redhat.com, venki@google.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC] [PATCH] Pre-emption control for userspace
Date: Tue, 04 Mar 2014 13:12:08 -0800
Message-ID: <531641A8.40306@zytor.com>
In-Reply-To: <1393870033-31076-1-git-send-email-khalid.aziz@oracle.com>

On 03/03/2014 10:07 AM, Khalid Aziz wrote:
> 
> I am working on a feature that has been requested by database folks that
> helps with performance. Some of the oft executed database code uses
> mutexes to lock other threads out of a critical section. They often see
> a situation where a thread grabs the mutex, runs out of its timeslice
> and gets switched out which then causes another thread to run which
> tries to grab the same mutex, spins for a while and finally gives up.
> This can happen with multiple threads until original lock owner gets the
> CPU again and can complete executing its critical section. This queueing
> and subsequent CPU cycle wastage can be avoided if the locking thread
> could request to be granted an additional timeslice if its current
> timeslice runs out before it gives up the lock. Other operating systems
> have implemented this functionality and is used by databases as well as
> JVM. This functionality has been shown to improve performance by 3%-5%.
> 
> I have implemented similar functionality for Linux. This patch adds a
> file /proc/<tgid>/task/<tid>/sched_preempt_delay for each thread.
> Writing 1 to this file causes CFS scheduler to grant additional time
> slice if the currently running process comes up for pre-emption. Writing
> to this file needs to be very quick operation, so I have implemented
> code to allow mmap'ing /proc/<tgid>/task/<tid>/sched_preempt_delay. This
> allows a userspace task to write this flag very quickly. Usage model is
> a thread mmaps this file during initialization. It then writes a 1 to
> the mmap'd file after it grabs the lock in its critical section where it
> wants immunity from pre-emption. It then writes 0 again to this file
> after it releases the lock and calls sched_yield() to give up the
> processor. I have also added a new field in scheduler statistics -
> nr_preempt_delayed, that counts the number of times a thread has been
> granted amnesty. Further details on using this functionality are in 
> Documentation/scheduler/sched-preempt-delay.txt in the patch. This
> patch is based upon the work Venkatesh Pallipadi had done couple of
> years ago.
> 

Shades of the Android wakelocks, no?

This seems to effectively give userspace an option to turn preemptive
multitasking into cooperative multitasking, which of course is
unacceptable for an unprivileged process (the same reason why
unprivileged processes aren't allowed to run at above-normal priority,
including RT priority).

I have several issues with this interface:

1. First, a process needs to know if it *should* have been preempted
before it calls sched_yield().  So there needs to be a second flag set
by the scheduler when granting amnesty.

2. A process which fails to call sched_yield() after being granted
amnesty must be penalized.

3. I'm not keen on occupying a full page for this.  I'm wondering if
using a pointer into user space, futex-style, might make more sense.
The downside, of course, is what happens if the page being pointed to is
swapped out.

Keep in mind this HAS to be per thread.
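
To make points 1 and 2 concrete, here is a rough userspace-only sketch of
what a two-flag, per-thread protocol might look like. Nothing here is taken
from the patch: struct preempt_ctl, its fields, the helper names, and the
sim_timeslice_expired() stand-in for the scheduler are all hypothetical and
exist only for illustration. The one-extra-slice policy and the penalty
counter are assumptions as well.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread control block shared by a thread and the scheduler. */
struct preempt_ctl {
    atomic_int request;  /* written by the thread: "please delay preemption" */
    atomic_int granted;  /* written by the scheduler: "amnesty was granted"  */
    int penalty;         /* scheduler-side count of abused grants (assumed)  */
};

/* One control block per thread, since the state has to be per thread. */
static _Thread_local struct preempt_ctl self_ctl;

static struct preempt_ctl *preempt_ctl_self(void)
{
    return &self_ctl;
}

/* Thread side: call right after taking the lock. */
static void preempt_delay_begin(struct preempt_ctl *ctl)
{
    assert(ctl != NULL);
    atomic_store(&ctl->request, 1);
}

/*
 * Thread side: call right after dropping the lock.
 * Returns true if amnesty was granted, i.e. the thread should have been
 * preempted and now owes the CPU back via sched_yield().
 */
static bool preempt_delay_end(struct preempt_ctl *ctl)
{
    assert(ctl != NULL);
    atomic_store(&ctl->request, 0);
    return atomic_exchange(&ctl->granted, 0) != 0;
}

/*
 * Simulated scheduler decision at timeslice expiry (illustration only).
 * Returns true if the thread is preempted now, false if it gets one
 * extra slice.
 */
static bool sim_timeslice_expired(struct preempt_ctl *ctl)
{
    if (atomic_load(&ctl->granted)) {
        /* Amnesty already granted and the thread never yielded: penalize. */
        ctl->penalty++;
        atomic_store(&ctl->granted, 0);
        return true;
    }
    if (atomic_load(&ctl->request)) {
        atomic_store(&ctl->granted, 1);  /* tell the thread it owes a yield */
        return false;
    }
    return true;
}
```

A thread would then do something like
"if (preempt_delay_end(preempt_ctl_self())) sched_yield();" after unlocking,
so it only gives up the CPU when it actually ran on borrowed time.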

	-hpa


