Re: [PATCH] x86/asm/entry/64: pack interrupt dispatch table tighter

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Denys Vlasenko <dvlasenk@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Oleg Nesterov <oleg@redhat.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Will Drewry <wad@chromium.org>, Kees Cook <keescook@chromium.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86/asm/entry/64: pack interrupt dispatch table tighter
Date: Fri, 03 Apr 2015 20:36:08 +0200	[thread overview]
Message-ID: <551EDD98.9030600@redhat.com> (raw)
In-Reply-To: <CA+55aFxHSCUmsWZR4ey_NfZep-h5oyEoUL15GamQt9v17znyvQ@mail.gmail.com>

On 04/03/2015 08:08 PM, Linus Torvalds wrote:
> On Fri, Apr 3, 2015 at 9:54 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>
>> How about this version?
>> It's still isn't a star of readability,
>> but the structure of the 32-byte code block is more visible now...
> 
> Do we really even want to be this clever in the first place?
> 
> The thing is, when we take an interrupt:
> 
>  (a) the L1 I$ is always cold
> 
>  (b) the instruction decoder has never had time to run ahead
> 
>  (c) there are usually not that many different interrupts anyway, even
> under load (ie you'd have maybe disk and networking)
> 
>  (d) we intentionally spread out the different interrupt vector numbers
> 
>  (e) the 32-byte block thing is questionable, most older
> micro-architectures fetch in 16-byte blocks iirc.
> 
> So what this tells me is that:
> 
>  - (a+b) the jump-to-jump is likely fairly expensive, because even
> though they are in the same cacheline, the front end hasn't gotten
> ahead of anything, so there's no hiding any front end pipeline
> hickups.
> 
>  - (c+d) there is likely very little advantage to trying to "pack"
> things in cachelines

Good points.

>  - (d+e) the 7-instructions-in-one-32-byte-block doesn't really sound
> all that big of a win, and it does cause a 16-byte split for some
> interrupt.

No, this doesn't happen. With current code, none of instructions
cross 16-byte split. Even 8-byte boundary is never crossed.

> In other words, I'd suggest that we just use simple unconditional
> 5-byte branch instead. Add the two-byte "push" instruction, you have 7
> bytes per interrupt. Align that 7 bytes up to 8, and none of them ever
> cross a 16-byte boundary.
> 
> Simple, clean, and slightly bigger in memory footprint, but probably
> not noticeably more so in cache footprint, simply because there
> usually aren't that many active interrupts anyway.
> 
> The people who do millions of networking interrupts per second and
> have network cards that steer things to many different interrupts
> already try to make sure that the steering goes to different CPU's -
> otherwise there wouldn't be any *point* to steering things. So that
> particular case of "lots of active interrupts" doesn't have a bigger
> cache footprint *either*, since any particular CPU L1 I$ will still
> only handle a few interrupts.
> 
> So you get "only" 4 interrupt cases per 32 bytes rather than 7. But is
> that odd double jump and all this complexity really worth it?
> 
> So I really suggest just doing something stupid and straightforward
> (and completely untested) like this:
> 
>     .macro push_vector
>         pushq_cfi $(~vector+0x80)
>         jmp common_interrupt
>         .align 8
>     .endm
> 
>     vector=FIRST_EXTERNAL_VECTOR
>     .align 64
>     ENTRY(irq_entries_start)
>     .rept 256 /* this number does not need to be exact, just big enough */
>          make_vector
>     .endr
> 
> and just be done with it.
> 
> (Of course, you have to change the code that knows about the "7
> entries in 32 bytes" patterns too, but that's just going to be much
> simpler now).

I'll send a patch in ~30 minutes.

     prev parent reply	other threads:[~2015-04-03 18:36 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-03 11:13 [PATCH] x86/asm/entry/64: pack interrupt dispatch table tighter Denys Vlasenko
2015-04-03 14:03 ` Ingo Molnar
2015-04-03 16:12   ` Denys Vlasenko
2015-04-03 16:54   ` Denys Vlasenko
2015-04-03 18:08     ` Linus Torvalds
2015-04-03 18:35       ` H. Peter Anvin
2015-04-03 18:37         ` Linus Torvalds
2015-04-03 19:06           ` H. Peter Anvin
2015-04-04  6:42             ` Ingo Molnar
2015-04-03 18:36       ` Denys Vlasenko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=551EDD98.9030600@redhat.com \
    --to=dvlasenk@redhat.com \
    --cc=ast@plumgrid.com \
    --cc=bp@alien8.de \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    --cc=wad@chromium.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox