Re: [PATCH v2 05/15] x86/alternatives: Use optimized NOPs for padding

From: Ingo Molnar <mingo@kernel.org>
To: Borislav Petkov <bp@alien8.de>
Cc: X86 ML <x86@kernel.org>, Andy Lutomirski <luto@amacapital.net>,
	LKML <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v2 05/15] x86/alternatives: Use optimized NOPs for padding
Date: Wed, 4 Mar 2015 07:43:03 +0100
Message-ID: <20150304064303.GA16387@gmail.com>
In-Reply-To: <1424776497-3180-6-git-send-email-bp@alien8.de>


* Borislav Petkov <bp@alien8.de> wrote:

> From: Borislav Petkov <bp@suse.de>
> 
> Alternatives allow now for an empty old instruction. In this case we go
> and pad the space with NOPs at assembly time. However, there are the
> optimal, longer NOPs which should be used. Do that at patching time by
> adding alt_instr.padlen-sized NOPs at the old instruction address.
> 
> Cc: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/kernel/alternative.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 715af37bf008..af397cc98d05 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -323,6 +323,14 @@ done:
>  		n_dspl, (unsigned long)orig_insn + n_dspl + repl_len);
>  }
>  
> +static void __init_or_module optimize_nops(struct alt_instr *a, u8 *instr)
> +{
> +	add_nops(instr + (a->instrlen - a->padlen), a->padlen);

While looking at this patch I was wondering about the following:
right now add_nops() uses the obvious 'fill with the largest NOPs
first, then cover the remaining bytes with a smaller NOP' logic:

/* Use this to add nops to a buffer, then text_poke the whole buffer. */
static void __init_or_module add_nops(void *insns, unsigned int len)
{
        while (len > 0) {
                unsigned int noplen = len;
                if (noplen > ASM_NOP_MAX)
                        noplen = ASM_NOP_MAX;
                memcpy(insns, ideal_nops[noplen], noplen);
                insns += noplen;
                len -= noplen;
        }
}

This works perfectly fine, but I'm wondering how current decoders
behave when a large NOP crosses a cacheline boundary or a page
boundary. Is there any inefficiency in that case, and if so, could we
avoid it by not spilling NOPs across cacheline or page boundaries?

With potentially thousands of patched instructions, both situations
are bound to occur: dozens of times in the cacheline case, and a few
times in the page boundary case.

There's also the following special case, of a large NOP followed by a 
small NOP, when the number of NOPs would not change if we padded 
differently:

                [      large NOP         ][smaller NOP]
       [         cacheline 1        ][        cacheline 2             ]

which might be more optimally filled with two mid-size NOPs:

                [    midsize NOP    ][   midsize NOP  ]
       [         cacheline 1        ][        cacheline 2             ]

so that no special boundary falls in the middle of a NOP
instruction.
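
To make the idea concrete, here is a rough user-space sketch of such a
boundary-aware split. It is only an illustration, not a proposed
replacement for add_nops(): plan_nops(), NOP_MAX and LINE_SIZE are
made-up names, the 8-byte maximum NOP and the 64-byte line size are
assumptions, and it only computes NOP lengths instead of copying from
ideal_nops[]:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOP_MAX   8u    /* assumed stand-in for ASM_NOP_MAX */
#define LINE_SIZE 64u   /* assumed cacheline size in bytes */

/*
 * Hypothetical helper: split 'len' bytes of padding starting at address
 * 'addr' into NOP lengths so that no single NOP straddles a LINE_SIZE
 * boundary. The lengths are stored in 'out' (capacity 'cap' entries)
 * and the number of NOPs is returned.
 *
 * A page size that is a multiple of LINE_SIZE means that page
 * boundaries are never straddled either.
 */
static size_t plan_nops(uintptr_t addr, unsigned int len,
                        unsigned int *out, size_t cap)
{
        size_t n = 0;

        while (len > 0) {
                /* Bytes left before the next line boundary: 1..LINE_SIZE. */
                unsigned int to_boundary =
                        LINE_SIZE - (unsigned int)(addr % LINE_SIZE);
                unsigned int noplen = len;

                if (noplen > NOP_MAX)
                        noplen = NOP_MAX;
                /* Shorten the NOP so that it ends at the boundary. */
                if (noplen > to_boundary)
                        noplen = to_boundary;

                assert(n < cap);        /* caller must size 'out' for 'len' */
                out[n++] = noplen;
                addr += noplen;
                len -= noplen;
        }

        return n;
}
```

For 9 bytes of padding starting 4 bytes before a line boundary, the
current logic would emit an 8-byte NOP that crosses the boundary plus a
1-byte NOP, while this sketch yields a 4-byte and a 5-byte NOP, so the
NOP count stays the same. In other layouts the split can cost one extra
NOP, which is part of why the question below matters.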

But the main question is, do such alignment details ever matter to 
decoder performance?

Thanks,

	Ingo
