* [RFC PATCH 0/4] x86, alternatives: Insn padding and more robust JMPs
@ 2015-01-05 15:00 Borislav Petkov
2015-01-05 15:00 ` [RFC PATCH 1/4] x86, copy_user: Remove FIX_ALIGNMENT define Borislav Petkov
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Borislav Petkov @ 2015-01-05 15:00 UTC (permalink / raw)
To: X86 ML; +Cc: LKML
From: Borislav Petkov <bp@suse.de>
Hi all,
this is something which hpa and I talked about recently: the ability
for the alternatives code to pad the original instruction in case the
replacement is longer, and the ability to simply write "jmp" without
having to care which JMP instruction exactly the compiler generates and
whether the relative offsets are correct.
So this is a stab at it; it seems to boot in KVM here, but it needs
more staring to make sure we're actually generating the proper code at
all times.
Thus the RFC tag, comments/suggestions are welcome.
Thanks.
Borislav Petkov (4):
x86, copy_user: Remove FIX_ALIGNMENT define
x86, alternatives: Cleanup DPRINTK macro
alternatives: Add instruction padding
alternatives: Make JMPs more robust
arch/x86/include/asm/alternative-asm.h | 38 ++++++++++
arch/x86/include/asm/alternative.h | 34 +++++----
arch/x86/include/asm/cpufeature.h | 12 +--
arch/x86/kernel/alternative.c | 130 +++++++++++++++++++++++++++------
arch/x86/lib/copy_page_64.S | 34 ++++-----
arch/x86/lib/copy_user_64.S | 44 ++---------
6 files changed, 193 insertions(+), 99 deletions(-)
--
2.2.0.33.gc18b867
^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC PATCH 1/4] x86, copy_user: Remove FIX_ALIGNMENT define
2015-01-05 15:00 [RFC PATCH 0/4] x86, alternatives: Insn padding and more robust JMPs Borislav Petkov
@ 2015-01-05 15:00 ` Borislav Petkov
2015-01-05 15:00 ` [RFC PATCH 2/4] x86, alternatives: Cleanup DPRINTK macro Borislav Petkov
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Borislav Petkov @ 2015-01-05 15:00 UTC (permalink / raw)
To: X86 ML; +Cc: LKML
From: Borislav Petkov <bp@suse.de>
It is superfluous now so remove it. No object file change before and
after.
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/lib/copy_user_64.S | 5 -----
1 file changed, 5 deletions(-)
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index dee945d55594..1530ec2c1b12 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -8,9 +8,6 @@
#include <linux/linkage.h>
#include <asm/dwarf2.h>
-
-#define FIX_ALIGNMENT 1
-
#include <asm/current.h>
#include <asm/asm-offsets.h>
#include <asm/thread_info.h>
@@ -45,7 +42,6 @@
.endm
.macro ALIGN_DESTINATION
-#ifdef FIX_ALIGNMENT
/* check for bad alignment of destination */
movl %edi,%ecx
andl $7,%ecx
@@ -67,7 +63,6 @@
_ASM_EXTABLE(100b,103b)
_ASM_EXTABLE(101b,103b)
-#endif
.endm
/* Standard copy_to_user with segment limit checking */
--
2.2.0.33.gc18b867
* [RFC PATCH 2/4] x86, alternatives: Cleanup DPRINTK macro
2015-01-05 15:00 [RFC PATCH 0/4] x86, alternatives: Insn padding and more robust JMPs Borislav Petkov
2015-01-05 15:00 ` [RFC PATCH 1/4] x86, copy_user: Remove FIX_ALIGNMENT define Borislav Petkov
@ 2015-01-05 15:00 ` Borislav Petkov
2015-01-05 17:16 ` Joe Perches
2015-01-05 15:00 ` [RFC PATCH 3/4] alternatives: Add instruction padding Borislav Petkov
2015-01-05 15:00 ` [RFC PATCH 4/4] alternatives: Make JMPs more robust Borislav Petkov
3 siblings, 1 reply; 6+ messages in thread
From: Borislav Petkov @ 2015-01-05 15:00 UTC (permalink / raw)
To: X86 ML; +Cc: LKML
From: Borislav Petkov <bp@suse.de>
Make it pass __func__ implicitly. Also, dump info about each
replacement we're doing. Fix up comments and style while at it.
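The effect of the conversion can be sketched in userspace C; this is an
illustrative stand-in, not the kernel macro itself (it renders into a
buffer instead of calling printk so the result can be inspected, and the
demo() name is ours):

```c
#include <stdio.h>
#include <string.h>

/* Mirrors the patched DPRINTK: the call site no longer passes __func__
 * or a trailing "\n"; the macro supplies both. debug_alternative gates
 * the output just as the kernel's boot parameter does. */
static int debug_alternative = 1;
static char logbuf[256];

#define DPRINTK(fmt, args...)                                           \
do {                                                                    \
	if (debug_alternative)                                          \
		snprintf(logbuf, sizeof(logbuf), "%s: " fmt "\n",       \
			 __func__, ##args);                             \
} while (0)

static const char *demo(void)
{
	/* expands with __func__ == "demo" prepended automatically */
	DPRINTK("alt table %p -> %p", (void *)0x1000, (void *)0x2000);
	return logbuf;
}
```

so a call site that previously had to spell `"%s: ... \n", __func__`
shrinks to just the format string and its arguments.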
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/alternative.c | 41 +++++++++++++++++++++++++----------------
1 file changed, 25 insertions(+), 16 deletions(-)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 703130f469ec..1e86e85bcf58 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -52,10 +52,10 @@ static int __init setup_noreplace_paravirt(char *str)
__setup("noreplace-paravirt", setup_noreplace_paravirt);
#endif
-#define DPRINTK(fmt, ...) \
-do { \
- if (debug_alternative) \
- printk(KERN_DEBUG fmt, ##__VA_ARGS__); \
+#define DPRINTK(fmt, args...) \
+do { \
+ if (debug_alternative) \
+ printk(KERN_DEBUG "%s: " fmt "\n", __func__, ##args); \
} while (0)
/*
@@ -243,12 +243,13 @@ extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
extern s32 __smp_locks[], __smp_locks_end[];
void *text_poke_early(void *addr, const void *opcode, size_t len);
-/* Replace instructions with better alternatives for this CPU type.
- This runs before SMP is initialized to avoid SMP problems with
- self modifying code. This implies that asymmetric systems where
- APs have less capabilities than the boot processor are not handled.
- Tough. Make sure you disable such features by hand. */
-
+/*
+ * Replace instructions with better alternatives for this CPU type. This runs
+ * before SMP is initialized to avoid SMP problems with self modifying code.
+ * This implies that asymmetric systems where APs have less capabilities than
+ * the boot processor are not handled. Tough. Make sure you disable such
+ * features by hand.
+ */
void __init_or_module apply_alternatives(struct alt_instr *start,
struct alt_instr *end)
{
@@ -256,10 +257,10 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
u8 *instr, *replacement;
u8 insnbuf[MAX_PATCH_LEN];
- DPRINTK("%s: alt table %p -> %p\n", __func__, start, end);
+ DPRINTK("alt table %p -> %p", start, end);
/*
* The scan order should be from start to end. A later scanned
- * alternative code can overwrite a previous scanned alternative code.
+ * alternative code can overwrite previously scanned alternative code.
* Some kernel functions (e.g. memcpy, memset, etc) use this order to
* patch code.
*
@@ -275,11 +276,19 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
if (!boot_cpu_has(a->cpuid))
continue;
+ DPRINTK("feat: %d*32+%d, old: (%p, len: %d), repl: (%p, len: %d)",
+ a->cpuid >> 5,
+ a->cpuid & 0x1f,
+ instr, a->instrlen,
+ replacement, a->replacementlen);
+
memcpy(insnbuf, replacement, a->replacementlen);
/* 0xe8 is a relative jump; fix the offset. */
- if (*insnbuf == 0xe8 && a->replacementlen == 5)
- *(s32 *)(insnbuf + 1) += replacement - instr;
+ if (*insnbuf == 0xe8 && a->replacementlen == 5) {
+ *(s32 *)(insnbuf + 1) += replacement - instr;
+ DPRINTK("Fix CALL offset: 0x%x", *(s32 *)(insnbuf + 1));
+ }
add_nops(insnbuf + a->replacementlen,
a->instrlen - a->replacementlen);
@@ -371,8 +380,8 @@ void __init_or_module alternatives_smp_module_add(struct module *mod,
smp->locks_end = locks_end;
smp->text = text;
smp->text_end = text_end;
- DPRINTK("%s: locks %p -> %p, text %p -> %p, name %s\n",
- __func__, smp->locks, smp->locks_end,
+ DPRINTK("locks %p -> %p, text %p -> %p, name %s",
+ smp->locks, smp->locks_end,
smp->text, smp->text_end, smp->name);
list_add_tail(&smp->next, &smp_alt_modules);
--
2.2.0.33.gc18b867
* [RFC PATCH 3/4] alternatives: Add instruction padding
2015-01-05 15:00 [RFC PATCH 0/4] x86, alternatives: Insn padding and more robust JMPs Borislav Petkov
2015-01-05 15:00 ` [RFC PATCH 1/4] x86, copy_user: Remove FIX_ALIGNMENT define Borislav Petkov
2015-01-05 15:00 ` [RFC PATCH 2/4] x86, alternatives: Cleanup DPRINTK macro Borislav Petkov
@ 2015-01-05 15:00 ` Borislav Petkov
2015-01-05 15:00 ` [RFC PATCH 4/4] alternatives: Make JMPs more robust Borislav Petkov
3 siblings, 0 replies; 6+ messages in thread
From: Borislav Petkov @ 2015-01-05 15:00 UTC (permalink / raw)
To: X86 ML; +Cc: LKML
From: Borislav Petkov <bp@suse.de>
Up until now we have always had to pay attention that the length of the
new instruction replacing the old one is less than or equal to the
length of the old instruction. If the new instruction is longer, at the
time it replaces the old instruction it will overwrite the beginning
of the next instruction in the kernel image and cause your pants to
catch fire.
So, instead of having to pay attention, teach the alternatives
framework to pad shorter old instructions with NOPs at build time - but
only in the case when
len(old instruction(s)) < len(new instruction(s))
and to add nothing in the >= case. (In that case we add the NOPs with
add_nops() at patching time instead.)
This way the alternatives user shouldn't have to care about instruction
sizes at all and can simply use the macros.
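The build-time padding relies on a GNU as quirk: a true relational
expression evaluates to -1 (not 1), so the `.skip` expression in the
patch emits exactly max(0, rlen - slen) NOP bytes. A small C sketch of
that arithmetic (gas_gt() and pad_len() are our illustrative names):

```c
/* In GNU as, "a > b" yields -1 when true and 0 when false, so
 *
 *     .skip -(((rlen)-(slen)) > 0) * ((rlen)-(slen)), 0x90
 *
 * skips (rlen - slen) bytes of 0x90 when the replacement is longer,
 * and 0 bytes otherwise. gas_gt() mimics the -1/0 convention so that
 * pad_len() is a literal transcription of the expression. */
static long gas_gt(long a, long b)
{
	return (a > b) ? -1 : 0;	/* gas relational: true == -1 */
}

static long pad_len(long rlen, long slen)
{
	return -gas_gt(rlen - slen, 0) * (rlen - slen);
}
```

e.g. a 2-byte old instruction with a 5-byte replacement gets three NOPs
of padding at build time, while the old >= new case adds nothing.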
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/include/asm/alternative.h | 34 ++++++++++++++++++----------------
arch/x86/include/asm/cpufeature.h | 2 ++
arch/x86/kernel/alternative.c | 6 +++---
3 files changed, 23 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 473bdbee378a..2b08c417e357 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -76,13 +76,25 @@ static inline int alternatives_text_reserved(void *start, void *end)
}
#endif /* CONFIG_SMP */
-#define OLDINSTR(oldinstr) "661:\n\t" oldinstr "\n662:\n"
-
#define b_replacement(number) "663"#number
#define e_replacement(number) "664"#number
-#define alt_slen "662b-661b"
-#define alt_rlen(number) e_replacement(number)"f-"b_replacement(number)"f"
+#define alt_slen "662b-661b"
+#define alt_rlen(number) e_replacement(number)"f-"b_replacement(number)"f"
+
+#define OLDINSTR(oldinstr, num) \
+ "661:\n\t" oldinstr "\n662:\n" \
+ ".skip -(((" alt_rlen(num) ")-(" alt_slen ")) > 0) * " \
+ "((" alt_rlen(num) ")-(" alt_slen ")),0x90\n"
+
+/*
+ * Pad the second replacement alternative with additional NOPs if it is
+ * additionally longer than the first replacement alternative.
+ */
+#define OLDINSTR_2(oldinstr, num1, num2) \
+ OLDINSTR(oldinstr, num1) \
+ ".skip -(((" alt_rlen(num2) ")-(" alt_rlen(num1) ")) > 0) * " \
+ "((" alt_rlen(num2) ")-(" alt_rlen(num1) ")),0x90\n"
#define ALTINSTR_ENTRY(feature, number) \
" .long 661b - .\n" /* label */ \
@@ -91,35 +103,25 @@ static inline int alternatives_text_reserved(void *start, void *end)
" .byte " alt_slen "\n" /* source len */ \
" .byte " alt_rlen(number) "\n" /* replacement len */
-#define DISCARD_ENTRY(number) /* rlen <= slen */ \
- " .byte 0xff + (" alt_rlen(number) ") - (" alt_slen ")\n"
-
#define ALTINSTR_REPLACEMENT(newinstr, feature, number) /* replacement */ \
b_replacement(number)":\n\t" newinstr "\n" e_replacement(number) ":\n\t"
/* alternative assembly primitive: */
#define ALTERNATIVE(oldinstr, newinstr, feature) \
- OLDINSTR(oldinstr) \
+ OLDINSTR(oldinstr, 1) \
".pushsection .altinstructions,\"a\"\n" \
ALTINSTR_ENTRY(feature, 1) \
".popsection\n" \
- ".pushsection .discard,\"aw\",@progbits\n" \
- DISCARD_ENTRY(1) \
- ".popsection\n" \
".pushsection .altinstr_replacement, \"ax\"\n" \
ALTINSTR_REPLACEMENT(newinstr, feature, 1) \
".popsection"
#define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\
- OLDINSTR(oldinstr) \
+ OLDINSTR_2(oldinstr, 1, 2) \
".pushsection .altinstructions,\"a\"\n" \
ALTINSTR_ENTRY(feature1, 1) \
ALTINSTR_ENTRY(feature2, 2) \
".popsection\n" \
- ".pushsection .discard,\"aw\",@progbits\n" \
- DISCARD_ENTRY(1) \
- DISCARD_ENTRY(2) \
- ".popsection\n" \
".pushsection .altinstr_replacement, \"ax\"\n" \
ALTINSTR_REPLACEMENT(newinstr1, feature1, 1) \
ALTINSTR_REPLACEMENT(newinstr2, feature2, 2) \
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index aede2c347bde..1db37780a344 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -489,6 +489,8 @@ static __always_inline __pure bool _static_cpu_has_safe(u16 bit)
*/
asm_volatile_goto("1: .byte 0xe9\n .long %l[t_dynamic] - 2f\n"
"2:\n"
+ ".skip -(((4f-3f) - (2b-1b)) > 0) * "
+ "((4f-3f) - (2b-1b)),0x90\n"
".section .altinstructions,\"a\"\n"
" .long 1b - .\n" /* src offset */
" .long 3f - .\n" /* repl offset */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 1e86e85bcf58..c99b0f13a90e 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -270,7 +270,6 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
for (a = start; a < end; a++) {
instr = (u8 *)&a->instr_offset + a->instr_offset;
replacement = (u8 *)&a->repl_offset + a->repl_offset;
- BUG_ON(a->replacementlen > a->instrlen);
BUG_ON(a->instrlen > sizeof(insnbuf));
BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
if (!boot_cpu_has(a->cpuid))
@@ -290,8 +289,9 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
DPRINTK("Fix CALL offset: 0x%x", *(s32 *)(insnbuf + 1));
}
- add_nops(insnbuf + a->replacementlen,
- a->instrlen - a->replacementlen);
+ if (a->instrlen > a->replacementlen)
+ add_nops(insnbuf + a->replacementlen,
+ a->instrlen - a->replacementlen);
text_poke_early(instr, insnbuf, a->instrlen);
}
--
2.2.0.33.gc18b867
* [RFC PATCH 4/4] alternatives: Make JMPs more robust
2015-01-05 15:00 [RFC PATCH 0/4] x86, alternatives: Insn padding and more robust JMPs Borislav Petkov
` (2 preceding siblings ...)
2015-01-05 15:00 ` [RFC PATCH 3/4] alternatives: Add instruction padding Borislav Petkov
@ 2015-01-05 15:00 ` Borislav Petkov
3 siblings, 0 replies; 6+ messages in thread
From: Borislav Petkov @ 2015-01-05 15:00 UTC (permalink / raw)
To: X86 ML; +Cc: LKML
From: Borislav Petkov <bp@suse.de>
Up until now we had to pay attention to how the relative offsets of
relative JMPs in alternatives get computed so that the jump target is
still correct. Or, as is the case for near CALLs (opcode 0xe8), we
still have to go and readjust the offset at patching time.
What is more, the static_cpu_has_safe() facility had to force 4-byte
JMPs since we couldn't rely on the compiler to generate proper ones -
and, worse than that, it could generate a replacement JMP which is
longer than the original one, thus overwriting the beginning of the
next instruction at patching time.
So, in order to alleviate all that and make using JMPs more
straightforward, we pad the original instruction in an alternative
block with NOPs should the replacement(s) be longer. This way,
alternatives users don't have to pay special attention to original and
replacement instruction sizes: the assembler simply adds padding where
needed and does nothing otherwise.
As a second aspect, we recompute JMPs at patching time so that we can
turn 5-byte JMPs into two-byte ones where possible. If that is not
possible, we still have to recompute the offsets, as the replacement
JMP is placed far away in the .altinstr_replacement section, leading to
a wrong offset if copied verbatim.
For example, on a locally generated kernel image
old insn VA: 0xffffffff810014bd, CPU feat: X86_FEATURE_ALWAYS, size: 2
__switch_to:
ffffffff810014bd: eb 21 jmp ffffffff810014e0
repl insn: size: 5
ffffffff81d0b23c: e9 b1 62 2f ff jmpq ffffffff810014f2
gets corrected to a 2-byte JMP:
apply_alternatives: feat: 3*32+21, old: (ffffffff810014bd, len: 2), repl: (ffffffff81d0b23c, len: 5)
alt_insn: e9 b1 62 2f ff
recompute_jumps: next_rip: ffffffff81d0b241, tgt_rip: ffffffff810014f2, new_displ: 0x00000033, ret len: 2
converted to: eb 33 90 90 90
and a 5-byte JMP:
old insn VA: 0xffffffff81001516, CPU feat: X86_FEATURE_ALWAYS, size: 2
__switch_to:
ffffffff81001516: eb 30 jmp ffffffff81001548
repl insn: size: 5
ffffffff81d0b241: e9 10 63 2f ff jmpq ffffffff81001556
gets shortened into a two-byte one:
apply_alternatives: feat: 3*32+21, old: (ffffffff81001516, len: 2), repl: (ffffffff81d0b241, len: 5)
alt_insn: e9 10 63 2f ff
recompute_jumps: next_rip: ffffffff81d0b246, tgt_rip: ffffffff81001556, new_displ: 0x0000003e, ret len: 2
converted to: eb 3e 90 90 90
... and so on.
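The displacement arithmetic behind those log lines can be sketched in
plain C; the helper names below are ours, but the math mirrors what
recompute_jumps() does and reproduces the values from the first two
log excerpts above:

```c
#include <stdint.h>

/* A JMP rel32 sitting in .altinstr_replacement encodes its target
 * relative to its own end (rip of the next instruction). Retargeting
 * it to run at the original instruction's address means recomputing
 * the displacement from there; if the result fits in a signed byte
 * (after accounting for the 2-byte insn length), the jump can be
 * re-encoded as the short form 0xeb rel8. */

/* displacement of the jump target, measured from the original insn */
static int64_t new_target_displ(uint64_t repl_rip, uint64_t orig_rip,
				int32_t displ)
{
	uint64_t next_rip = repl_rip + 5;	/* end of e9 rel32 */
	return (int64_t)(next_rip + (int64_t)displ - orig_rip);
}

/* resulting instruction length: 2 (eb rel8) or 5 (e9 rel32) */
static int jmp_len(uint64_t repl_rip, uint64_t orig_rip, int32_t displ)
{
	int64_t d = new_target_displ(repl_rip, orig_rip, displ);
	return (-128 <= d && d <= 127) ? 2 : 5;
}

/* displacement field of the re-encoded jump */
static int32_t jmp_displ(uint64_t repl_rip, uint64_t orig_rip,
			 int32_t displ)
{
	int64_t d = new_target_displ(repl_rip, orig_rip, displ);
	return (int32_t)(d - jmp_len(repl_rip, orig_rip, displ));
}
```

Feeding in the first example (repl at ffffffff81d0b23c, original insn
at ffffffff810014bd, rel32 bytes b1 62 2f ff) yields a 2-byte JMP with
displacement 0x33, matching the "converted to: eb 33 90 90 90" line.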
This leads to a net win of 126 bytes of I$ on an AMD guest, which
means some savings of precious instruction cache bandwidth. The padding
to the shorter 2-byte JMPs consists of single-byte NOPs which, on smart
microarchitectures, are discarded at decode time, thus freeing up
execution bandwidth.
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/include/asm/alternative-asm.h | 38 +++++++++++++++
arch/x86/include/asm/cpufeature.h | 10 +---
arch/x86/kernel/alternative.c | 85 ++++++++++++++++++++++++++++++++--
arch/x86/lib/copy_page_64.S | 34 +++++++-------
arch/x86/lib/copy_user_64.S | 39 ++++------------
5 files changed, 146 insertions(+), 60 deletions(-)
diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h
index 372231c22a47..c407705524c0 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -26,6 +26,44 @@
.byte \alt_len
.endm
+.macro ALTERNATIVE feat, orig, alt
+0:
+ \orig
+1:
+ .skip -(((3f-2f)-(1b-0b)) > 0) * ((3f-2f)-(1b-0b)),0x90
+
+ .pushsection .altinstructions,"a"
+ altinstruction_entry 0b,2f,\feat,1b-0b,3f-2f
+ .popsection
+
+ .pushsection .altinstr_replacement,"ax"
+2:
+ \alt
+3:
+ .popsection
+.endm
+
+.macro ALTERNATIVE_2 feat1, feat2, orig, alt1, alt2
+0:
+ \orig
+1:
+ .skip -(((3f-2f)-(1b-0b)) > 0) * ((3f-2f)-(1b-0b)),0x90
+ .skip -(((4f-3f)-(3f-2f)) > 0) * ((4f-3f)-(3f-2f)),0x90
+
+ .pushsection .altinstructions,"a"
+ altinstruction_entry 0b,2f,\feat1,1b-0b,3f-2f
+ altinstruction_entry 0b,3f,\feat2,1b-0b,4f-3f
+ .popsection
+
+ .pushsection .altinstr_replacement,"ax"
+2:
+ \alt1
+3:
+ \alt2
+4:
+ .popsection
+.endm
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_ALTERNATIVE_ASM_H */
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 1db37780a344..b9ea801bd3ed 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -481,13 +481,7 @@ static __always_inline __pure bool __static_cpu_has(u16 bit)
static __always_inline __pure bool _static_cpu_has_safe(u16 bit)
{
#ifdef CC_HAVE_ASM_GOTO
-/*
- * We need to spell the jumps to the compiler because, depending on the offset,
- * the replacement jump can be bigger than the original jump, and this we cannot
- * have. Thus, we force the jump to the widest, 4-byte, signed relative
- * offset even though the last would often fit in less bytes.
- */
- asm_volatile_goto("1: .byte 0xe9\n .long %l[t_dynamic] - 2f\n"
+ asm_volatile_goto("1: jmp %l[t_dynamic]\n"
"2:\n"
".skip -(((4f-3f) - (2b-1b)) > 0) * "
"((4f-3f) - (2b-1b)),0x90\n"
@@ -499,7 +493,7 @@ static __always_inline __pure bool _static_cpu_has_safe(u16 bit)
" .byte 4f - 3f\n" /* repl len */
".previous\n"
".section .altinstr_replacement,\"ax\"\n"
- "3: .byte 0xe9\n .long %l[t_no] - 2b\n"
+ "3: jmp %l[t_no]\n"
"4:\n"
".previous\n"
".section .altinstructions,\"a\"\n"
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c99b0f13a90e..974602c1e20d 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -244,6 +244,80 @@ extern s32 __smp_locks[], __smp_locks_end[];
void *text_poke_early(void *addr, const void *opcode, size_t len);
/*
+ * Are we looking at a near JMP with a 1- or 4-byte displacement?
+ */
+static inline bool is_jmp(const u8 opcode)
+{
+ return opcode == 0xeb || opcode == 0xe9;
+}
+
+static size_t __init_or_module recompute_jumps(struct alt_instr *a, u8 *insnbuf)
+{
+ u8 *next_rip, *tgt_rip;
+ s32 displ, new_displ;
+ size_t ret = 0;
+
+ if (debug_alternative) {
+ int i;
+
+ printk(KERN_DEBUG "alt_insn: ");
+ for (i = 0; i < a->replacementlen; i++)
+ printk(KERN_CONT "%02hhx ", insnbuf[i]);
+ printk(KERN_CONT "\n");
+ }
+
+ /* Already a two-byte JMP */
+ if (a->replacementlen == 2)
+ return ret;
+
+ WARN(a->replacementlen != 5, "WTF replacementlen: %d\n", a->replacementlen);
+
+ /* JMP rel32off */
+ displ = *(s32 *)(insnbuf + 1);
+
+ /*
+ * Clear out the old instruction in case we end up pasting in a shorter
+ * one and remnants from the old instruction would confuse us.
+ */
+ memset(insnbuf, 0x90, a->replacementlen);
+
+ /* next rIP of replacement insn */
+ next_rip = (u8 *)&a->repl_offset + a->repl_offset + a->replacementlen;
+ /* target rIP of replacement insn */
+ tgt_rip = next_rip + displ;
+ /* new displacement */
+ new_displ = tgt_rip - ((u8 *)&a->instr_offset + a->instr_offset);
+
+ if (-128 <= new_displ && new_displ <= 127) {
+ ret = 2;
+ new_displ -= 2;
+
+ insnbuf[0] = 0xeb;
+ insnbuf[1] = (s8)new_displ;
+ } else {
+ ret = 5;
+ new_displ -= 5;
+
+ insnbuf[0] = 0xe9;
+ *(s32 *)&insnbuf[1] = new_displ;
+ }
+
+ DPRINTK("next_rip: %p, tgt_rip: %p, new_displ: 0x%08x, ret len: %ld",
+ next_rip, tgt_rip, new_displ, ret);
+
+ if (debug_alternative) {
+ int i;
+
+ printk(KERN_DEBUG "converted to: ");
+ for (i = 0; i < a->replacementlen; i++)
+ printk(KERN_CONT "%02hhx ", insnbuf[i]);
+ printk(KERN_CONT "\n");
+ }
+
+ return ret;
+}
+
+/*
* Replace instructions with better alternatives for this CPU type. This runs
* before SMP is initialized to avoid SMP problems with self modifying code.
* This implies that asymmetric systems where APs have less capabilities than
@@ -268,6 +342,8 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
* order.
*/
for (a = start; a < end; a++) {
+ size_t insnbuf_len = 0;
+
instr = (u8 *)&a->instr_offset + a->instr_offset;
replacement = (u8 *)&a->repl_offset + a->repl_offset;
BUG_ON(a->instrlen > sizeof(insnbuf));
@@ -289,16 +365,19 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
DPRINTK("Fix CALL offset: 0x%x", *(s32 *)(insnbuf + 1));
}
+ if (is_jmp(instr[0]) && is_jmp(replacement[0]))
+ insnbuf_len = recompute_jumps(a, insnbuf);
+
if (a->instrlen > a->replacementlen)
add_nops(insnbuf + a->replacementlen,
a->instrlen - a->replacementlen);
- text_poke_early(instr, insnbuf, a->instrlen);
+ text_poke_early(instr, insnbuf,
+ (insnbuf_len > 0 ? insnbuf_len : a->instrlen));
}
}
#ifdef CONFIG_SMP
-
static void alternatives_smp_lock(const s32 *start, const s32 *end,
u8 *text, u8 *text_end)
{
@@ -449,7 +528,7 @@ int alternatives_text_reserved(void *start, void *end)
return 0;
}
-#endif
+#endif /* CONFIG_SMP */
#ifdef CONFIG_PARAVIRT
void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
diff --git a/arch/x86/lib/copy_page_64.S b/arch/x86/lib/copy_page_64.S
index 176cca67212b..9d8b1b8da251 100644
--- a/arch/x86/lib/copy_page_64.S
+++ b/arch/x86/lib/copy_page_64.S
@@ -2,23 +2,35 @@
#include <linux/linkage.h>
#include <asm/dwarf2.h>
+#include <asm/cpufeature.h>
#include <asm/alternative-asm.h>
+/*
+ * Some CPUs run faster using the string copy instructions. It is also a lot
+ * simpler. Use this when possible
+ */
+
+ENTRY(copy_page)
+ CFI_STARTPROC
+ ALTERNATIVE X86_FEATURE_REP_GOOD, "jmp _copy_page", "jmp _copy_page_rep"
+ CFI_ENDPROC
+ENDPROC(copy_page)
+
ALIGN
-copy_page_rep:
+ENTRY(_copy_page_rep)
CFI_STARTPROC
movl $4096/8, %ecx
rep movsq
ret
CFI_ENDPROC
-ENDPROC(copy_page_rep)
+ENDPROC(_copy_page_rep)
/*
* Don't use streaming copy unless the CPU indicates X86_FEATURE_REP_GOOD.
* Could vary the prefetch distance based on SMP/UP.
*/
-ENTRY(copy_page)
+ENTRY(_copy_page)
CFI_STARTPROC
subq $2*8, %rsp
CFI_ADJUST_CFA_OFFSET 2*8
@@ -90,21 +102,7 @@ ENTRY(copy_page)
addq $2*8, %rsp
CFI_ADJUST_CFA_OFFSET -2*8
ret
-.Lcopy_page_end:
CFI_ENDPROC
-ENDPROC(copy_page)
-
- /* Some CPUs run faster using the string copy instructions.
- It is also a lot simpler. Use this when possible */
+ENDPROC(_copy_page)
-#include <asm/cpufeature.h>
- .section .altinstr_replacement,"ax"
-1: .byte 0xeb /* jmp <disp8> */
- .byte (copy_page_rep - copy_page) - (2f - 1b) /* offset */
-2:
- .previous
- .section .altinstructions,"a"
- altinstruction_entry copy_page, 1b, X86_FEATURE_REP_GOOD, \
- .Lcopy_page_end-copy_page, 2b-1b
- .previous
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 1530ec2c1b12..3de90e9c9de1 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -16,31 +16,6 @@
#include <asm/asm.h>
#include <asm/smap.h>
-/*
- * By placing feature2 after feature1 in altinstructions section, we logically
- * implement:
- * If CPU has feature2, jmp to alt2 is used
- * else if CPU has feature1, jmp to alt1 is used
- * else jmp to orig is used.
- */
- .macro ALTERNATIVE_JUMP feature1,feature2,orig,alt1,alt2
-0:
- .byte 0xe9 /* 32bit jump */
- .long \orig-1f /* by default jump to orig */
-1:
- .section .altinstr_replacement,"ax"
-2: .byte 0xe9 /* near jump with 32bit immediate */
- .long \alt1-1b /* offset */ /* or alternatively to alt1 */
-3: .byte 0xe9 /* near jump with 32bit immediate */
- .long \alt2-1b /* offset */ /* or alternatively to alt2 */
- .previous
-
- .section .altinstructions,"a"
- altinstruction_entry 0b,2b,\feature1,5,5
- altinstruction_entry 0b,3b,\feature2,5,5
- .previous
- .endm
-
.macro ALIGN_DESTINATION
/* check for bad alignment of destination */
movl %edi,%ecx
@@ -74,9 +49,10 @@ ENTRY(_copy_to_user)
jc bad_to_user
cmpq TI_addr_limit(%rax),%rcx
ja bad_to_user
- ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS, \
- copy_user_generic_unrolled,copy_user_generic_string, \
- copy_user_enhanced_fast_string
+ ALTERNATIVE_2 X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS, \
+ "jmp copy_user_generic_unrolled", \
+ "jmp copy_user_generic_string", \
+ "jmp copy_user_enhanced_fast_string"
CFI_ENDPROC
ENDPROC(_copy_to_user)
@@ -89,9 +65,10 @@ ENTRY(_copy_from_user)
jc bad_from_user
cmpq TI_addr_limit(%rax),%rcx
ja bad_from_user
- ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS, \
- copy_user_generic_unrolled,copy_user_generic_string, \
- copy_user_enhanced_fast_string
+ ALTERNATIVE_2 X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS, \
+ "jmp copy_user_generic_unrolled", \
+ "jmp copy_user_generic_string", \
+ "jmp copy_user_enhanced_fast_string"
CFI_ENDPROC
ENDPROC(_copy_from_user)
--
2.2.0.33.gc18b867
* Re: [RFC PATCH 2/4] x86, alternatives: Cleanup DPRINTK macro
2015-01-05 15:00 ` [RFC PATCH 2/4] x86, alternatives: Cleanup DPRINTK macro Borislav Petkov
@ 2015-01-05 17:16 ` Joe Perches
0 siblings, 0 replies; 6+ messages in thread
From: Joe Perches @ 2015-01-05 17:16 UTC (permalink / raw)
To: Borislav Petkov; +Cc: X86 ML, LKML
On Mon, 2015-01-05 at 16:00 +0100, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
>
> Make it pass __func__ implicitly. Also, dump info about each replacing
> we're doing. Fixup comments and style while at it.
It may be better to use the dynamic debug functionality
directly with pr_debug instead of this __setup with
"debug-alternative".
It's becoming quite a bit more common to use the
#define macro(fmt, ...) style, but here you converted
back to the older #define macro(fmt, args...) style.
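For reference, the two spellings being contrasted expand identically
under GCC (the `##` extension swallows the comma in both when no
variadic arguments follow); a minimal sketch, with buffer-rendering
standing in for printk and the function names being illustrative:

```c
#include <stdio.h>
#include <string.h>

/* C99 form with __VA_ARGS__ vs. the older GNU named-parameter form.
 * Both use GCC's ",##" extension so the trailing comma disappears
 * when only the format string is passed. */
static char out[2][64];

#define DPRINT_C99(buf, fmt, ...) \
	snprintf(buf, sizeof(buf), "%s: " fmt "\n", __func__, ##__VA_ARGS__)
#define DPRINT_GNU(buf, fmt, args...) \
	snprintf(buf, sizeof(buf), "%s: " fmt "\n", __func__, ##args)

static int styles_match(void)
{
	DPRINT_C99(out[0], "val %d", 42);
	DPRINT_GNU(out[1], "val %d", 42);
	return strcmp(out[0], out[1]) == 0;	/* identical expansion */
}
```

so the choice between the two is purely stylistic; the C99 spelling is
the one checkpatch and most new kernel code favor.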
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
[]
> @@ -52,10 +52,10 @@ static int __init setup_noreplace_paravirt(char *str)
> __setup("noreplace-paravirt", setup_noreplace_paravirt);
> #endif
>
> -#define DPRINTK(fmt, ...) \
> -do { \
> - if (debug_alternative) \
> - printk(KERN_DEBUG fmt, ##__VA_ARGS__); \
> +#define DPRINTK(fmt, args...) \
> +do { \
> + if (debug_alternative) \
> + printk(KERN_DEBUG "%s: " fmt "\n", __func__, ##args); \
> } while (0)