public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Oleg Nesterov <oleg@redhat.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Will Drewry <wad@chromium.org>, Kees Cook <keescook@chromium.org>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86: simplify interrupt dispatch table
Date: Sat, 4 Apr 2015 09:06:36 +0200	[thread overview]
Message-ID: <20150404070636.GB19587@gmail.com> (raw)
In-Reply-To: <1428090553-7283-1-git-send-email-dvlasenk@redhat.com>


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

> Interrupt entry points are handled with the following code:
> Each 32-byte code block contains seven entry points
> 
> 		...
> 		[push][jump 22] // 4 bytes
> 		[push][jump 18] // 4 bytes
> 		[push][jump 14] // 4 bytes
> 		[push][jump 10] // 4 bytes
> 		[push][jump  6] // 4 bytes
> 		[push][jump  2] // 4 bytes
> 		[push][jump common_interrupt][padding] // 8 bytes
> 
> 		[push][jump]
> 		[push][jump]
> 		[push][jump]
> 		[push][jump]
> 		[push][jump]
> 		[push][jump]
> 		[push][jump common_interrupt][padding]
> 
> 		[padding_2]
> 	common_interrupt:
> 
> And there is a table which holds pointers to every entry point,
> IOW: to every push.
> 
> In cold cache, two jumps are still costlier than one, even though we get
> the benefit of them residing in the same cacheline.
> 
> This change replaces short jumps with near ones to common_interrupt, and pads
> every push+jump pair to 8 bytes. This way, each interrupt takes only one jump.
> 
> This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before dispatch table
> with ".align 8" - we do not need anything stronger than that.
> 
> The table of entry addresses (the interrupt[] array) is no longer
> necessary, the address of entries can be easily calculated as
> (irq_entries_start + i*8).
> 
>    text	   data	    bss	    dec	    hex	filename
>   12546	      0	      0	  12546	   3102	entry_64.o.before
>   11626	      0	      0	  11626	   2d6a	entry_64.o
> 
> The size decrease is because 1656 bytes of .init.rodata are gone.
> That's initdata, though. The resident size does go up a bit.

So I like this a lot, as it's straight, simple and obvious, both to 
hardware and to humans as well. (This is btw. quite close to the irq 
entry code layout we used to have historically.)

We could do three other changes that would probably help a lot more in 
practice than the addition or elimination of a single instruction:

1)

  We could try to not spread vectors, as modern APICs seem to handle 
  clustered vectors a lot better and we don't actually use irq 
  priority levels like other OSs so we are free to choose our vectors.

  This compresses the I$ footprint a bit more if lots of related
  irq sources are firing towards the same CPUs that share one or more
  caches (HT threads, cores, node local siblings).

  Even on single-node systems this would still compress the IDT and 
  the entry code cache footprint a bit.

2)

  We could allocate the IDT per CPU (or per node), lowering the D$ 
  cache miss costs on NUMA systems. (This, if we allowed the IDTs to 
  diverge, would also allow more irq sources sent to separate 
  vectors.)

  The simplest model of this, where each IDT is just a copy of each
  other, is relatively easy to implement, as the IDT is page aligned 
  and ro mapped already.

3)

  We could allocate the entry code itself too per cpu (or per node), 
  lowering the I$ cache miss costs on NUMA systems. This would be a 
  bit trickier to implement, as that part of the image has to be 
  relinked during bootup, but is doable.

I'd do 3) only once we are done with the current audit/cleanup/rewrite 
of the entry code.

Thanks,

	Ingo

  reply	other threads:[~2015-04-04  7:06 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-03 19:49 [PATCH] x86: simplify interrupt dispatch table Denys Vlasenko
2015-04-04  7:06 ` Ingo Molnar [this message]
2015-04-07 15:26 ` Borislav Petkov
2015-04-08  7:43 ` [tip:x86/asm] x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout tip-bot for Denys Vlasenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150404070636.GB19587@gmail.com \
    --to=mingo@kernel.org \
    --cc=ast@plumgrid.com \
    --cc=bp@alien8.de \
    --cc=dvlasenk@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=oleg@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    --cc=wad@chromium.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox