From: Ingo Molnar <mingo@kernel.org>
To: Rik van Riel <riel@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
X86 ML <x86@kernel.org>,
williams@redhat.com, Andrew Lutomirski <luto@kernel.org>,
fweisbec@redhat.com, Peter Zijlstra <peterz@infradead.org>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
"Paul E. McKenney" <paulmck@us.ibm.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry
Date: Fri, 1 May 2015 20:40:26 +0200
Message-ID: <20150501184025.GA2114@gmail.com>
In-Reply-To: <5543C05E.9040209@redhat.com>

* Rik van Riel <riel@redhat.com> wrote:

> On 05/01/2015 12:34 PM, Ingo Molnar wrote:
> >
> > * Rik van Riel <riel@redhat.com> wrote:
> >
> >>> I can understand people running hard-RT workloads not wanting to
> >>> see the overhead of a timer tick or a scheduler tick with variable
> >>> (and occasionally heavy) work done in IRQ context, but the jitter
> >>> caused by a single trivial IPI with constant work should be very,
> >>> very low and constant.
> >>
> >> Not if the realtime workload is running inside a KVM guest.
> >
> > I don't buy this:
> >
> >> At that point an IPI, either on the host or in the guest, involves a
> >> full VMEXIT & VMENTER cycle.
> >
> > So a full VMEXIT/VMENTER costs how much, 2000 cycles? That's around 1
> > usec on recent hardware, and I bet it will get better with time.
> >
> > I'm not aware of any hard-RT workload that cannot take 1 usec
> > latencies.
>
> Now think about doing this kind of IPI from inside a guest, to
> another VCPU on the same guest.
>
> Now you are looking at VMEXIT/VMENTER on the first VCPU,

Does it matter? It's not the hard-RT CPU, and this is a slowpath of
synchronize_rcu().

> plus the cost of the IPI on the host, plus the cost of the emulation
> layer, plus VMEXIT/VMENTER on the second VCPU to trigger the IPI
> work, and possibly a second VMEXIT/VMENTER for IPI completion.

Only the VMEXIT/VMENTER on the second VCPU matters to RT latencies.

> I suspect it would be better to do RCU callback offload in some
> other way.

Well, it's not just about callback offload; it's about the basic
synchronization guarantee of synchronize_rcu(): after the call returns,
all RCU read-side critical sections that were already running when it
was called have finished executing.

So even if a nohz-full CPU never actually queues a callback, it needs
to stop using resources that a synchronize_rcu() caller expects it to
stop using.

We can do that only if we know, in an SMP-coherent way, that the
remote CPU is not in an rcu_read_lock() section.

Sending an IPI is one way to achieve that.

Or we could do that in the syscall path with a single store of a
constant flag to a location in the task struct. We have a number of
natural flags that get written on syscall entry, such as:

        pushq_cfi $__USER_DS            /* pt_regs->ss */

That goes to a constant location on the kernel stack. On return from
system calls we could write 0 to that location.

So the CPU calling synchronize_rcu() would have to read this location
for each remote CPU. There are two cases:

 - If it's 0, then we have observed a quiescent state on that CPU. (It
   does not have to be an atomic operation anymore, as we'd only
   observe the value and MESI cache coherency takes care of it.)

 - If it's not 0, then the remote CPU is not executing user-space code
   and we can install (remotely) a TIF_NOHZ flag in it and expect it
   to process it either on return to user-space or on a context
   switch.

This way, unless I'm missing something, the overhead is reduced to a
single store to a hot cacheline on return to user-space, an
instruction that, if we place it well, might be close to zero cost.

No syscall entry cost. Slow-return cost only in the (rare) case of
someone using synchronize_rcu().

Hm?

Thanks,

	Ingo