From: Ingo Molnar <mingo@kernel.org>
To: Rik van Riel <riel@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	X86 ML <x86@kernel.org>,
	williams@redhat.com, Andrew Lutomirski <luto@kernel.org>,
	fweisbec@redhat.com, Peter Zijlstra <peterz@infradead.org>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Paul E. McKenney" <paulmck@us.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry
Date: Fri, 1 May 2015 20:40:26 +0200
Message-ID: <20150501184025.GA2114@gmail.com>
In-Reply-To: <5543C05E.9040209@redhat.com>


* Rik van Riel <riel@redhat.com> wrote:

> On 05/01/2015 12:34 PM, Ingo Molnar wrote:
> > 
> > * Rik van Riel <riel@redhat.com> wrote:
> > 
> >>> I can understand people running hard-RT workloads not wanting to 
> >>> see the overhead of a timer tick or a scheduler tick with variable 
> >>> (and occasionally heavy) work done in IRQ context, but the jitter 
> >>> caused by a single trivial IPI with constant work should be very, 
> >>> very low and constant.
> >>
> >> Not if the realtime workload is running inside a KVM guest.
> > 
> > I don't buy this:
> > 
> >> At that point an IPI, either on the host or in the guest, involves a 
> >> full VMEXIT & VMENTER cycle.
> > 
> > So a full VMEXIT/VMENTER costs how much, 2000 cycles? That's around 1 
> > usec on recent hardware, and I bet it will get better with time.
> > 
> > I'm not aware of any hard-RT workload that cannot take 1 usec 
> > latencies.
> 
> Now think about doing this kind of IPI from inside a guest, to 
> another VCPU on the same guest.
> 
> Now you are looking at VMEXIT/VMENTER on the first VCPU,

Does it matter? It's not the hard-RT CPU, and this is a slowpath of 
synchronize_rcu().

> plus the cost of the IPI on the host, plus the cost of the emulation 
> layer, plus VMEXIT/VMENTER on the second VCPU to trigger the IPI 
> work, and possibly a second VMEXIT/VMENTER for IPI completion.

Only the VMEXIT/VMENTER on the second VCPU matters to RT latencies.

> I suspect it would be better to do RCU callback offload in some 
> other way.

Well, it's not just about callback offload, it's about the basic 
synchronization guarantee of synchronize_rcu(): all RCU read-side 
critical sections that were in progress when it was called have 
finished executing by the time the call returns.

So even if a nohz-full CPU never actually queues a callback, it needs 
to stop using resources that a synchronize_rcu() caller expects it to 
stop using.

We can only rely on that if we know, in an SMP-coherent way, that the 
remote CPU is not inside an rcu_read_lock() section.

Sending an IPI is one way to achieve that.

Or we could do that in the syscall path with a single store of a 
constant flag to a location in the task struct. We have a number of 
natural flags that get written on syscall entry, such as:

        pushq_cfi $__USER_DS                    /* pt_regs->ss */

That goes to a constant location on the kernel stack. On return from 
system calls we could write 0 to that location.

So the remote CPU would only have to read this location. There are 
two cases:

 - If it's 0, then it has observed a quiescent state on that CPU. 
   (This no longer needs atomics, as we only read the value and MESI 
   cache coherency takes care of the rest.)

 - If it's not 0, then the remote CPU is not executing user-space 
   code, and we can (remotely) set the TIF_NOHZ flag on it and expect 
   it to process that flag either on return to user space or on a 
   context switch.

Unless I'm missing something, this reduces the overhead to a single 
store to a hot cacheline on return to user space, an instruction 
that, if placed well, could be close to zero cost. No syscall entry 
cost. Slow-return cost only in the (rare) case of someone using 
synchronize_rcu().

Hm?

Thanks,

	Ingo
