From: Kirill Tkhai <ktkhai@parallels.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC] sched, x86: Prevent resched interrupts if task in kernel mode and !CONFIG_PREEMPT
Date: Fri, 23 Jan 2015 20:09:49 +0300
Message-ID: <1422032989.6345.26.camel@tkhai>
In-Reply-To: <CALCETrVEsNj8dvqd-mNqb5tKNQOwQEgtMRUeTtJSS8-EmntAiA@mail.gmail.com>
On Fri, 23/01/2015 at 08:24 -0800, Andy Lutomirski wrote:
> On Fri, Jan 23, 2015 at 8:07 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Jan 23, 2015 at 06:53:32PM +0300, Kirill Tkhai wrote:
> >> It's useless to send reschedule interrupts in such situations. The earliest
> >> point, where schedule() call is possible, is sysret_careful(). But in that
> >> function we directly test TIF_NEED_RESCHED.
> >>
> >> So it's possible to get rid of that type of interrupts.
> >>
> >> How about this idea? Is set_bit() cheap on x86 machines?
> >
> > So you set TIF_POLLING_NRFLAG on syscall entry and clear it again on
> > exit? Thereby we avoid the IPI, because the exit path already checks for
> > TIF_NEED_RESCHED.
>
> The idle code says:
>
> /*
> * If the arch has a polling bit, we maintain an invariant:
> *
> * Our polling bit is clear if we're not scheduled (i.e. if
> * rq->curr != rq->idle). This means that, if rq->idle has
> * the polling bit set, then setting need_resched is
> * guaranteed to cause the cpu to reschedule.
> */
>
> Setting polling on non-idle tasks like this will either involve
> weakening this a bit (it'll still be true for rq->idle) or changing
> the polling state on context switch.
>
> >
> > Should work I suppose, but I'm not too familiar with all that entry.S
> > muck. Andy might know and appreciate this.
> >
> >> ---
> >> arch/x86/kernel/entry_64.S | 10 ++++++++++
> >> 1 file changed, 10 insertions(+)
> >>
> >> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> >> index c653dc4..a046ba8 100644
> >> --- a/arch/x86/kernel/entry_64.S
> >> +++ b/arch/x86/kernel/entry_64.S
> >> @@ -409,6 +409,13 @@ GLOBAL(system_call_after_swapgs)
> >> movq_cfi rax,(ORIG_RAX-ARGOFFSET)
> >> movq %rcx,RIP-ARGOFFSET(%rsp)
> >> CFI_REL_OFFSET rip,RIP-ARGOFFSET
> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> >> + /*
> >> + * Tell resched_curr() do not send useless interrupts to us.
> >> + * Kernel isn't preemptible till sysret_careful() anyway.
> >> + */
> >> + LOCK ; bts $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >> +#endif
>
> That's kind of expensive. What's the !SMP part for?
smp_send_reschedule() is a no-op on UP, so there is no problem there.
>
> >> testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >> jnz tracesys
> >> system_call_fastpath:
> >> @@ -427,6 +434,9 @@ GLOBAL(system_call_after_swapgs)
> >> * Has incomplete stack frame and undefined top of stack.
> >> */
> >> ret_from_sys_call:
> >> +#if !defined(CONFIG_PREEMPT) || !defined(SMP)
> >> + LOCK ; btr $TIF_POLLING_NRFLAG,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> >> +#endif
>
> If only it were this simple. There are lots of ways out of syscalls,
> and this is only one of them :( If we did this, I'd rather do it
> through the do_notify_resume mechanism or something.
Yes, the syscall path is the only one I handled, just as an example.
> I don't see any way to do this without at least one atomic op or
> smp_mb per syscall, and that's kind of expensive.
Just for information: doesn't x86 set_bit() lock only a small area of
memory? I thought it was not very expensive on this arch (some bus
optimizations or something like that).
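To make the question concrete, here is a rough user-space sketch of the handshake the patch relies on, written with C11 atomics rather than the kernel's bitops. All sk_* names and the flag values are made up for illustration; this is not the kernel code. It only shows why the IPI can be skipped, and that the price is two locked read-modify-write operations per syscall.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch, not the real TIF_* values. */
#define SK_NEED_RESCHED  (1u << 0)
#define SK_POLLING       (1u << 1)

/* Stand-in for the thread_info flags word of the task running on a CPU. */
typedef struct {
	atomic_uint flags;
} sk_task;

/* Syscall entry: advertise that we will test NEED_RESCHED ourselves
 * on the way out (corresponds to the LOCK bts in the patch). */
static void sk_enter_kernel(sk_task *t)
{
	atomic_fetch_or(&t->flags, SK_POLLING);
}

/* Syscall exit: stop advertising (the LOCK btr) and report whether a
 * reschedule was requested while we were in kernel mode. */
static bool sk_exit_kernel(sk_task *t)
{
	unsigned int old = atomic_fetch_and(&t->flags, ~SK_POLLING);

	return old & SK_NEED_RESCHED;
}

/* Waker side: set NEED_RESCHED and return true only if an IPI is
 * still needed, i.e. the target was not advertising the polling bit. */
static bool sk_resched_needs_ipi(sk_task *t)
{
	unsigned int old = atomic_fetch_or(&t->flags, SK_NEED_RESCHED);

	return !(old & SK_POLLING);
}
```

Since both sides use atomic read-modify-write on the same word, a waker either sees the polling bit and leaves NEED_RESCHED for the exit path to find, or sees it already cleared and sends the interrupt as usual.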
> Would it make sense to try to use context tracking instead? On
> systems that use context tracking, syscalls are already expensive, and
> we're already keeping track of which CPUs are in user mode.
I'll look at context_tracking, but I'm not sure about the SMP
synchronization there.
Thanks,
Kirill