public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>, X86 ML <x86@kernel.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	Brian Gerst <brgerst@gmail.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Oleg Nesterov <oleg@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts
Date: Wed, 19 Aug 2015 19:10:09 +0200	[thread overview]
Message-ID: <20150819171007.GA21717@lerouge> (raw)
In-Reply-To: <CALCETrVqA0JVxZeD+7vMD-9xmF8udXw5RO0fwQA2X45DSVqM0Q@mail.gmail.com>

On Tue, Aug 18, 2015 at 04:07:51PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 18, 2015 at 4:02 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Tue, Aug 18, 2015 at 03:35:30PM -0700, Andy Lutomirski wrote:
> >> On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >> > On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
> >> >> This fixes a couple minor holes if we took an IRQ very early in syscall
> >> >> processing:
> >> >>
> >> >>  - We could enter the IRQ with CONTEXT_USER.  Everything worked (RCU
> >> >>    was fine), but we could warn if all the debugging options were
> >> >>    set.
> >> >
> >> > So this is fixing issues after your changes that call user_exit() from
> >> > IRQs, right?
> >>
> >> Yes.  Here's an example splat, courtesy of Sasha:
> >>
> >> https://gist.github.com/sashalevin/a006a44989312f6835e7
> >>
> >> >
> >> > But the IRQs aren't supposed to call user_exit(), they have their own hooks.
> >> > That's where the real issue is.
> >>
> >> In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
> >> when entering the kernel for a non-NMI reason.
> >
> > Why? IRQs don't need that! We already have irq_enter()/irq_exit().
> >
> 
> Those are certainly redundant.

So? What's the point in duplicating a hook in arch code that core code already
has?

> I want to have a real hook to call
> that says "switch to IRQ context from CONTEXT_USER" or "switch to IRQ
> context from CONTEXT_KERNEL" (aka noop), but that doesn't currently
> exist.

You're not answering _why_ you want that.

> 
> > And we don't want to call rcu_user_*() pairs on IRQs, you're
> > introducing a serious performance regression here! And I'm talking about
> > the code that's currently in -tip.
> 
> Is there an easy way to fix it?  For example, could we figure out what
> makes it take so long and make it faster?

Sure, just remove your arch IRQ hook.

> If we need to, we could
> back out the IRQ bit and change the assertions for 4.3, but I'd rather
> keep the exact context tracking if at all possible.

I have no idea what you mean by exact context tracking here.

But If we ever want to call irq_enter() using arch hooks, and I have no idea why
we would ever want to do that since that involve complexifying the code
by $NR_ARCHS and moving C code to ASM, we need serious reasons! And that's
certainly not something we are going to plan now for the next week's merge window.

> >> That means that we can
> >> avoid all of the (expensive!) checks for what context we're in.
> >
> > If you're referring to context tracking, the context check is a per-cpu
> > read. Not something that's usually considered expensive.
> 
> In -tip, there aren't even extra branches, except those imposed by the
> user_exit implementation.

No there is the "call enter_from_user_mode" in the IRQ fast path.

> 
> >
> >> It also means that (other than IRQs, which need further cleanup), we only
> >> switch once per user/kernel switch.
> >
> > ???
> 
> In 4.2 and before, we can switch multiple times on the way out of the
> kernel, via SCHEDULE_USER, do_notify_resume, etc.  In -tip, we do it
> exactly once no matter what.

That's what we want for syscalls but not for IRQs.

> 
> >
> >>
> >> The cost for doing should be essentially zero, modulo artifacts from
> >> poor inlining.
> >
> > And modulo rcu_user_*() that do multiple costly atomic_add_return() operations
> > implying full memory barriers. Plus the unnecessary vtime accounting that doubles
> > the existing one in irq_enter/exit() (those even imply a lock currently, which will
> > probably be turned to seqcount, but still, full memory barriers...).
> >
> > I'm sorry but I'm going to NACK any code that does that in IRQs (and again that
> > concerns current tip:x86/asm).
> 
> Why do we need these heavyweight barriers?

Actually it's not full barriers but atomic ones (smp_mb__after_atomic_stuff())
I suspect we can't do much better given RCU requirements.

Still we don't need to call it twice.

> 
> If there's actually a measurable performance hit in IRQs in -tip, then
> can we come up with a better fix?

I'm sure it's very easily measurable.

> For example, we could change all
> the new CT_WARN_ON calls to check "are we in CONTEXT_KERNEL or in IRQ
> context" and make the IRQ entry do a lighter weight context tracking
> operation.

I don't see what we need to check actually. Context tracking can be in any
state while in IRQ.

> 
> But I think I'm still missing something fundamental about the
> performance: why is irq_enter() any faster than user_exit()?

It's stlightly faster at least because it takes care of nesting IRQs which
is likely with softirqs that get interrupted.

Now of course we wouldn't call user_exit() in this case, but the hook is there
in generic code, no need for anything from the arch.

  reply	other threads:[~2015-08-19 17:10 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-18 19:11 [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts Andy Lutomirski
2015-08-18 22:16 ` Frederic Weisbecker
2015-08-18 22:35   ` Andy Lutomirski
2015-08-18 23:02     ` Frederic Weisbecker
2015-08-18 23:07       ` Andy Lutomirski
2015-08-19 17:10         ` Frederic Weisbecker [this message]
2015-08-21  7:50           ` Ingo Molnar
2015-08-21 13:14             ` Frederic Weisbecker
2015-08-19  0:30 ` Andy Lutomirski
2015-08-19 15:54   ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150819171007.GA21717@lerouge \
    --to=fweisbec@gmail.com \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=dvlasenk@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=luto@kernel.org \
    --cc=oleg@redhat.com \
    --cc=riel@redhat.com \
    --cc=sasha.levin@oracle.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox