Re: [PATCH] x86/asm/entry/64: pack interrupt dispatch table tighter

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Denys Vlasenko <dvlasenk@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Oleg Nesterov <oleg@redhat.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Will Drewry <wad@chromium.org>, Kees Cook <keescook@chromium.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86/asm/entry/64: pack interrupt dispatch table tighter
Date: Fri, 03 Apr 2015 20:36:08 +0200	[thread overview]
Message-ID: <551EDD98.9030600@redhat.com> (raw)
In-Reply-To: <CA+55aFxHSCUmsWZR4ey_NfZep-h5oyEoUL15GamQt9v17znyvQ@mail.gmail.com>

On 04/03/2015 08:08 PM, Linus Torvalds wrote:
> On Fri, Apr 3, 2015 at 9:54 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>
>> How about this version?
>> It's still isn't a star of readability,
>> but the structure of the 32-byte code block is more visible now...
> 
> Do we really even want to be this clever in the first place?
> 
> The thing is, when we take an interrupt:
> 
>  (a) the L1 I$ is always cold
> 
>  (b) the instruction decoder has never had time to run ahead
> 
>  (c) there are usually not that many different interrupts anyway, even
> under load (ie you'd have maybe disk and networking)
> 
>  (d) we intentionally spread out the different interrupt vector numbers
> 
>  (e) the 32-byte block thing is questionable, most older
> micro-architectures fetch in 16-byte blocks iirc.
> 
> So what this tells me is that:
> 
>  - (a+b) the jump-to-jump is likely fairly expensive, because even
> though they are in the same cacheline, the front end hasn't gotten
> ahead of anything, so there's no hiding any front end pipeline
> hickups.
> 
>  - (c+d) there is likely very little advantage to trying to "pack"
> things in cachelines

Good points.

>  - (d+e) the 7-instructions-in-one-32-byte-block doesn't really sound
> all that big of a win, and it does cause a 16-byte split for some
> interrupt.

No, this doesn't happen. With current code, none of instructions
cross 16-byte split. Even 8-byte boundary is never crossed.

> In other words, I'd suggest that we just use simple unconditional
> 5-byte branch instead. Add the two-byte "push" instruction, you have 7
> bytes per interrupt. Align that 7 bytes up to 8, and none of them ever
> cross a 16-byte boundary.
> 
> Simple, clean, and slightly bigger in memory footprint, but probably
> not noticeably more so in cache footprint, simply because there
> usually aren't that many active interrupts anyway.
> 
> The people who do millions of networking interrupts per second and
> have network cards that steer things to many different interrupts
> already try to make sure that the steering goes to different CPU's -
> otherwise there wouldn't be any *point* to steering things. So that
> particular case of "lots of active interrupts" doesn't have a bigger
> cache footprint *either*, since any particular CPU L1 I$ will still
> only handle a few interrupts.
> 
> So you get "only" 4 interrupt cases per 32 bytes rather than 7. But is
> that odd double jump and all this complexity really worth it?
> 
> So I really suggest just doing something stupid and straightforward
> (and completely untested) like this:
> 
>     .macro push_vector
>         pushq_cfi $(~vector+0x80)
>         jmp common_interrupt
>         .align 8
>     .endm
> 
>     vector=FIRST_EXTERNAL_VECTOR
>     .align 64
>     ENTRY(irq_entries_start)
>     .rept 256 /* this number does not need to be exact, just big enough */
>          make_vector
>     .endr
> 
> and just be done with it.
> 
> (Of course, you have to change the code that knows about the "7
> entries in 32 bytes" patterns too, but that's just going to be much
> simpler now).

I'll send a patch in ~30 minutes.

     prev parent reply	other threads:[~2015-04-03 18:36 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-03 11:13 [PATCH] x86/asm/entry/64: pack interrupt dispatch table tighter Denys Vlasenko
2015-04-03 14:03 ` Ingo Molnar
2015-04-03 16:12   ` Denys Vlasenko
2015-04-03 16:54   ` Denys Vlasenko
2015-04-03 18:08     ` Linus Torvalds
2015-04-03 18:35       ` H. Peter Anvin
2015-04-03 18:37         ` Linus Torvalds
2015-04-03 19:06           ` H. Peter Anvin
2015-04-04  6:42             ` Ingo Molnar
2015-04-03 18:36       ` Denys Vlasenko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=551EDD98.9030600@redhat.com \
    --to=dvlasenk@redhat.com \
    --cc=ast@plumgrid.com \
    --cc=bp@alien8.de \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    --cc=wad@chromium.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.