linux-kernel.vger.kernel.org archive mirror
From: Ingo Molnar <mingo@kernel.org>
To: Rik van Riel <riel@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	X86 ML <x86@kernel.org>,
	williams@redhat.com, Andrew Lutomirski <luto@kernel.org>,
	fweisbec@redhat.com, Peter Zijlstra <peterz@infradead.org>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Paul E. McKenney" <paulmck@us.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry
Date: Fri, 1 May 2015 20:40:26 +0200
Message-ID: <20150501184025.GA2114@gmail.com>
In-Reply-To: <5543C05E.9040209@redhat.com>


* Rik van Riel <riel@redhat.com> wrote:

> On 05/01/2015 12:34 PM, Ingo Molnar wrote:
> > 
> > * Rik van Riel <riel@redhat.com> wrote:
> > 
> >>> I can understand people running hard-RT workloads not wanting to 
> >>> see the overhead of a timer tick or a scheduler tick with variable 
> >>> (and occasionally heavy) work done in IRQ context, but the jitter 
> >>> caused by a single trivial IPI with constant work should be very, 
> >>> very low and constant.
> >>
> >> Not if the realtime workload is running inside a KVM guest.
> > 
> > I don't buy this:
> > 
> >> At that point an IPI, either on the host or in the guest, involves a 
> >> full VMEXIT & VMENTER cycle.
> > 
> > So a full VMEXIT/VMENTER costs how much, 2000 cycles? That's around 1 
> > usec on recent hardware, and I bet it will get better with time.
> > 
> > I'm not aware of any hard-RT workload that cannot take 1 usec 
> > latencies.
> 
> Now think about doing this kind of IPI from inside a guest, to 
> another VCPU on the same guest.
> 
> Now you are looking at VMEXIT/VMENTER on the first VCPU,

Does it matter? It's not the hard-RT CPU, and this is a slowpath of 
synchronize_rcu().

> plus the cost of the IPI on the host, plus the cost of the emulation 
> layer, plus VMEXIT/VMENTER on the second VCPU to trigger the IPI 
> work, and possibly a second VMEXIT/VMENTER for IPI completion.

Only the VMEXIT/VMENTER on the second VCPU matters to RT latencies.

> I suspect it would be better to do RCU callback offload in some 
> other way.

Well, it's not just about callback offload, but it's about the basic 
synchronization guarantee of synchronize_rcu(): that all RCU read-side 
critical sections have finished executing after the call returns.

So even if a nohz-full CPU never actually queues a callback, it needs 
to stop using resources that a synchronize_rcu() caller expects it to 
stop using.

We can do that only if we know, in an SMP-coherent way, that the 
remote CPU is not in an rcu_read_lock() section.

Sending an IPI is one way to achieve that.

Or we could do that in the syscall path with a single store of a 
constant flag to a location in the task struct. We have a number of 
natural flags that get written on syscall entry, such as:

        pushq_cfi $__USER_DS                    /* pt_regs->ss */

That goes to a constant location on the kernel stack. On return from 
system calls we could write 0 to that location.

So the CPU running synchronize_rcu() would have to read this location 
remotely. There are two cases:

 - If it's 0, then it has observed quiescent state on that CPU. (This 
   does not need atomic operations anymore, as we'd only read the 
   value and MESI coherency takes care of it.)

 - If it's not 0 then the remote CPU is not executing user-space code 
   and we can install (remotely) a TIF_NOHZ flag in it and expect it 
   to process it either on return to user-space or on a context 
   switch.

Unless I'm missing something, this reduces the overhead to a single 
store to a hot cacheline on return-to-userspace - an instruction 
which, if we place it well, might be close to zero cost. No syscall 
entry cost. Slow-return cost only in the (rare) case of someone using 
synchronize_rcu().
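
To make the two cases concrete, here is a minimal userspace sketch of 
the scheme in C11. It is only a model under stated assumptions: 
struct cpu_ctx, syscall_enter(), syscall_exit() and remote_check() 
are hypothetical names, not kernel APIs; nohz_pending merely stands 
in for a remotely installed TIF_NOHZ; and the marker value 1 stands 
in for the nonzero constant that syscall entry already writes. It 
uses C11 atomics for portability and does not try to settle the 
ordering between the marker store and the flag check, which a real 
implementation would have to get right.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; illustrative only, not a kernel structure. */
struct cpu_ctx {
	atomic_uint kernel_marker;   /* nonzero in the kernel, 0 in user space */
	atomic_bool nohz_pending;    /* stands in for a remotely set TIF_NOHZ */
	unsigned int slow_returns;   /* how many times the slow return path ran */
};

/* Syscall entry: models the nonzero constant that entry code stores
 * anyway, so the scheme adds no extra entry cost. */
static void syscall_enter(struct cpu_ctx *c)
{
	atomic_store(&c->kernel_marker, 1u);
}

/* Return to user space: take the slow path only if a remote CPU
 * asked for it, then do the single store of 0. */
static void syscall_exit(struct cpu_ctx *c)
{
	if (atomic_exchange(&c->nohz_pending, false))
		c->slow_returns++;   /* the quiescent state would be reported here */
	atomic_store(&c->kernel_marker, 0u);
}

/* Runs on the CPU that needs the quiescent state. Returns true if it
 * observed the target in user space; otherwise it installs the flag
 * and expects the target to handle it on its way out. */
static bool remote_check(struct cpu_ctx *c)
{
	if (atomic_load(&c->kernel_marker) == 0u)
		return true;
	atomic_store(&c->nohz_pending, true);
	return false;
}
```

In this model a target that stays in user space is never disturbed: 
remote_check() returns true from a plain read. The slow path in 
syscall_exit() runs only after a remote_check() found the target in 
the kernel.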

Hm?

Thanks,

	Ingo
