* [PATCH 0/3] Reject setting system segments from userspace
@ 2023-12-13 16:34 Brian Gerst
2023-12-13 16:34 ` [PATCH 1/3] x86: Move TSS and LDT to end of the GDT Brian Gerst
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Brian Gerst @ 2023-12-13 16:34 UTC (permalink / raw)
To: linux-kernel, x86
Cc: Ingo Molnar, Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
Peter Zijlstra, Linus Torvalds, Brian Gerst
Michal noted[1] that on systems that support UMIP, the instruction
decoder can be tricked into leaking the address of the TSS or LDT by
using ptrace to set the SS segment to a system segment index. Prevent
this from happening by rejecting attempts to use a system segment in the
ptrace and sigreturn syscalls.
[1] https://lore.kernel.org/lkml/20231206004654.2986026-1-mhal@rbox.co/
Brian Gerst (3):
x86: Move TSS and LDT to end of the GDT
x86/ptrace: Reject system segements
x86/sigreturn: Reject system segements
arch/x86/include/asm/segment.h | 44 ++++++++++++++++++++++++----------
arch/x86/kernel/ptrace.c | 12 ++--------
arch/x86/kernel/signal_32.c | 4 ++++
arch/x86/kernel/signal_64.c | 4 ++++
4 files changed, 42 insertions(+), 22 deletions(-)
base-commit: 3d626e0a7be7ddb635791fee18cb40631bc1d0b3
--
2.43.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] x86: Move TSS and LDT to end of the GDT
2023-12-13 16:34 [PATCH 0/3] Reject setting system segments from userspace Brian Gerst
@ 2023-12-13 16:34 ` Brian Gerst
2023-12-13 18:51 ` Linus Torvalds
2023-12-13 16:34 ` [PATCH 2/3] x86/ptrace: Reject system segements Brian Gerst
2023-12-13 16:34 ` [PATCH 3/3] x86/sigreturn: " Brian Gerst
2 siblings, 1 reply; 14+ messages in thread
From: Brian Gerst @ 2023-12-13 16:34 UTC (permalink / raw)
To: linux-kernel, x86
Cc: Ingo Molnar, Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
Peter Zijlstra, Linus Torvalds, Brian Gerst
This will make testing for system segments easier.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
arch/x86/include/asm/segment.h | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 9d6411c65920..a155843d0c37 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -83,8 +83,8 @@
* 13 - kernel data segment
* 14 - default user CS
* 15 - default user DS
- * 16 - TSS <=== cacheline #5
- * 17 - LDT
+ * 16 - unused <=== cacheline #5
+ * 17 - unused
* 18 - PNPBIOS support (16->32 gate)
* 19 - PNPBIOS support
* 20 - PNPBIOS support <=== cacheline #6
@@ -97,8 +97,11 @@
* 26 - ESPFIX small SS
* 27 - per-cpu [ offset to per-cpu data area ]
* 28 - VDSO getcpu
- * 29 - unused
- * 30 - unused
+ *
+ * ------- start of system segments:
+ *
+ * 29 - TSS
+ * 30 - LDT
* 31 - TSS for double fault handler
*/
#define GDT_ENTRY_TLS_MIN 6
@@ -108,8 +111,6 @@
#define GDT_ENTRY_KERNEL_DS 13
#define GDT_ENTRY_DEFAULT_USER_CS 14
#define GDT_ENTRY_DEFAULT_USER_DS 15
-#define GDT_ENTRY_TSS 16
-#define GDT_ENTRY_LDT 17
#define GDT_ENTRY_PNPBIOS_CS32 18
#define GDT_ENTRY_PNPBIOS_CS16 19
#define GDT_ENTRY_PNPBIOS_DS 20
@@ -121,6 +122,10 @@
#define GDT_ENTRY_PERCPU 27
#define GDT_ENTRY_CPUNODE 28
+/* Start of system segments */
+
+#define GDT_ENTRY_TSS 29
+#define GDT_ENTRY_LDT 30
#define GDT_ENTRY_DOUBLEFAULT_TSS 31
/*
@@ -188,20 +193,22 @@
#define GDT_ENTRY_DEFAULT_USER_DS 5
#define GDT_ENTRY_DEFAULT_USER_CS 6
-/* Needs two entries */
-#define GDT_ENTRY_TSS 8
-/* Needs two entries */
-#define GDT_ENTRY_LDT 10
-
#define GDT_ENTRY_TLS_MIN 12
#define GDT_ENTRY_TLS_MAX 14
#define GDT_ENTRY_CPUNODE 15
+/* Start of system segments */
+
+/* Needs two entries */
+#define GDT_ENTRY_TSS 16
+/* Needs two entries */
+#define GDT_ENTRY_LDT 18
+
/*
* Number of entries in the GDT table:
*/
-#define GDT_ENTRIES 16
+#define GDT_ENTRIES 20
/*
* Segment selector values corresponding to the above entries:
@@ -219,6 +226,8 @@
#endif
+#define GDT_SYSTEM_START GDT_ENTRY_TSS
+
#define IDT_ENTRIES 256
#define NUM_EXCEPTION_VECTORS 32
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/3] x86/ptrace: Reject system segements
2023-12-13 16:34 [PATCH 0/3] Reject setting system segments from userspace Brian Gerst
2023-12-13 16:34 ` [PATCH 1/3] x86: Move TSS and LDT to end of the GDT Brian Gerst
@ 2023-12-13 16:34 ` Brian Gerst
2023-12-13 16:34 ` [PATCH 3/3] x86/sigreturn: " Brian Gerst
2 siblings, 0 replies; 14+ messages in thread
From: Brian Gerst @ 2023-12-13 16:34 UTC (permalink / raw)
To: linux-kernel, x86
Cc: Ingo Molnar, Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
Peter Zijlstra, Linus Torvalds, Brian Gerst, Michal Luczaj
Do not allow system segments (TSS and LDT) from being poked into segment
registers via ptrace. Loading these segments into a segment register
normally results in a general protection fault. But in the case of
ptrace, setting CS or SS to a system segment will cause IRET to fault.
This then results in the instruction decoder attempting to use the
invalid segment. This can be avoided by rejecting system segments in
PTRACE_SETREGS.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reported-By: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/lkml/20231206004654.2986026-1-mhal@rbox.co/
---
arch/x86/include/asm/segment.h | 11 +++++++++++
arch/x86/kernel/ptrace.c | 12 ++----------
2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index a155843d0c37..ede1fa5aa4cc 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -359,6 +359,17 @@ static inline void __loadsegment_fs(unsigned short value)
#define savesegment(seg, value) \
asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
+/*
+ * Determines whether a value may be installed in a segment register.
+ */
+static inline bool valid_user_selector(u16 value)
+{
+ if (unlikely(!(value & SEGMENT_TI_MASK) && value >= (GDT_SYSTEM_START * 8)))
+ return false;
+
+ return likely(value == 0 || (value & SEGMENT_RPL_MASK) == USER_RPL);
+}
+
#endif /* !__ASSEMBLY__ */
#endif /* __KERNEL__ */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 095f04bdabdc..4c3a2278691e 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -162,14 +162,6 @@ const char *regs_query_register_name(unsigned int offset)
X86_EFLAGS_DF | X86_EFLAGS_OF | \
X86_EFLAGS_RF | X86_EFLAGS_AC))
-/*
- * Determines whether a value may be installed in a segment register.
- */
-static inline bool invalid_selector(u16 value)
-{
- return unlikely(value != 0 && (value & SEGMENT_RPL_MASK) != USER_RPL);
-}
-
#ifdef CONFIG_X86_32
#define FLAG_MASK FLAG_MASK_32
@@ -206,7 +198,7 @@ static int set_segment_reg(struct task_struct *task,
/*
* The value argument was already truncated to 16 bits.
*/
- if (invalid_selector(value))
+ if (!valid__user_selector(value))
return -EIO;
/*
@@ -296,7 +288,7 @@ static int set_segment_reg(struct task_struct *task,
/*
* The value argument was already truncated to 16 bits.
*/
- if (invalid_selector(value))
+ if (!valid_user_selector(value))
return -EIO;
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/3] x86/sigreturn: Reject system segements
2023-12-13 16:34 [PATCH 0/3] Reject setting system segments from userspace Brian Gerst
2023-12-13 16:34 ` [PATCH 1/3] x86: Move TSS and LDT to end of the GDT Brian Gerst
2023-12-13 16:34 ` [PATCH 2/3] x86/ptrace: Reject system segements Brian Gerst
@ 2023-12-13 16:34 ` Brian Gerst
2023-12-13 18:54 ` Linus Torvalds
2 siblings, 1 reply; 14+ messages in thread
From: Brian Gerst @ 2023-12-13 16:34 UTC (permalink / raw)
To: linux-kernel, x86
Cc: Ingo Molnar, Thomas Gleixner, Borislav Petkov, H . Peter Anvin,
Peter Zijlstra, Linus Torvalds, Brian Gerst, Michal Luczaj
Do not allow system segments (TSS and LDT) from being loaded into segment
registers via sigreturn. Loading these segments into a segment register
normally results in a general protection fault. In the case of sigreturn,
setting CS or SS to a system segment will cause IRET to fault. This
then results in the instruction decoder attempting to use the invalid
segment. This can be avoided by rejecting system segments in the
sigreturn() syscall.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reported-By: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/lkml/20231206004654.2986026-1-mhal@rbox.co/
---
arch/x86/kernel/signal_32.c | 4 ++++
arch/x86/kernel/signal_64.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
index c12624bc82a3..0e1926b676b0 100644
--- a/arch/x86/kernel/signal_32.c
+++ b/arch/x86/kernel/signal_32.c
@@ -98,7 +98,11 @@ static bool ia32_restore_sigcontext(struct pt_regs *regs,
/* Get CS/SS and force CPL3 */
regs->cs = sc.cs | 0x03;
+ if (!valid_user_selector(regs->cs))
+ return false;
regs->ss = sc.ss | 0x03;
+ if (!valid_user_selector(regs->ss))
+ return false;
regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc.flags & FIX_EFLAGS);
/* disable syscall checks */
diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c
index 23d8aaf8d9fd..666b147bf43a 100644
--- a/arch/x86/kernel/signal_64.c
+++ b/arch/x86/kernel/signal_64.c
@@ -79,7 +79,11 @@ static bool restore_sigcontext(struct pt_regs *regs,
/* Get CS/SS and force CPL3 */
regs->cs = sc.cs | 0x03;
+ if (!valid_user_selector(regs->cs))
+ return false;
regs->ss = sc.ss | 0x03;
+ if (!valid_user_selector(regs->ss))
+ return false;
regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc.flags & FIX_EFLAGS);
/* disable syscall checks */
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] x86: Move TSS and LDT to end of the GDT
2023-12-13 16:34 ` [PATCH 1/3] x86: Move TSS and LDT to end of the GDT Brian Gerst
@ 2023-12-13 18:51 ` Linus Torvalds
2023-12-13 19:08 ` Linus Torvalds
2023-12-17 21:09 ` H. Peter Anvin
0 siblings, 2 replies; 14+ messages in thread
From: Linus Torvalds @ 2023-12-13 18:51 UTC (permalink / raw)
To: Brian Gerst
Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
H . Peter Anvin, Peter Zijlstra
On Wed, 13 Dec 2023 at 08:34, Brian Gerst <brgerst@gmail.com> wrote:
>
> This will make testing for system segments easier.
It seems to make more sense organizationally too, with the special
non-data/code segments clearly separate at the end.
So I think this is fine conceptually.
HOWEVER, I think that you might want to expand on this a bit more,
because there are other special segments selectors that might not be
thing you want to expose to user space.
We have GDT_ENTRY_PERCPU for example, which is a kernel-only segment.
It also happens to be 32-bit only, it doesn't matter for the thing
you're trying to fix, but that valid_user_selector() thing is then
used on x86-32 too.
So the ESPFIX and per-cpu segments are kernel-only, but then the VDSO
getcpu one is a user segment.
And the PnP and APM BIOS segments are similarly kernel-only.
But then the VDSO getcpu segment is user-visible, in the middle, and
again, it's 32-bit only but that whole GDT_SYSTEM_START thing is
supposed to work there too.
End result: this seems incomplete and not really fully baked.
I wonder if instead of GDT_SYSTEM_START, you'd be better off just
making a trivial constant bitmap of "these are user visible segments
in the GDT". No need to re-order things, just have something like
#define USER_SEGMENTS_MASK \
((1ul << GDT_ENTRY_DEFAULT_USER_CS) |
,,,,
and use that for the test (remember to check for GDT_ENTRIES as the max).
Hmm?
Linus
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] x86/sigreturn: Reject system segements
2023-12-13 16:34 ` [PATCH 3/3] x86/sigreturn: " Brian Gerst
@ 2023-12-13 18:54 ` Linus Torvalds
2023-12-17 21:07 ` H. Peter Anvin
0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2023-12-13 18:54 UTC (permalink / raw)
To: Brian Gerst
Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
H . Peter Anvin, Peter Zijlstra, Michal Luczaj
On Wed, 13 Dec 2023 at 08:34, Brian Gerst <brgerst@gmail.com> wrote:
>
> @@ -98,7 +98,11 @@ static bool ia32_restore_sigcontext(struct pt_regs *regs,
>
> /* Get CS/SS and force CPL3 */
> regs->cs = sc.cs | 0x03;
> + if (!valid_user_selector(regs->cs))
> + return false;
> regs->ss = sc.ss | 0x03;
> + if (!valid_user_selector(regs->ss))
> + return false;
Side note: the SS/CS checks could be stricter than the usual selector tests.
In particular, normal segments can be Null segments. But CS/SS must not be.
Also, since you're now checking the validity, maybe we shouldn't do
the "force cpl3" any more, and just make it an error to try to load a
non-cpl3 segment at sigreturn..
That forcing was literally just because we weren't checking it for sanity...
Linus
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] x86: Move TSS and LDT to end of the GDT
2023-12-13 18:51 ` Linus Torvalds
@ 2023-12-13 19:08 ` Linus Torvalds
2023-12-16 18:24 ` Vegard Nossum
2023-12-17 21:09 ` H. Peter Anvin
1 sibling, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2023-12-13 19:08 UTC (permalink / raw)
To: Brian Gerst
Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
H . Peter Anvin, Peter Zijlstra
On Wed, 13 Dec 2023 at 10:51, Linus Torvalds
<torvalds@linuxfoundation.org> wrote:
>
> We have GDT_ENTRY_PERCPU for example, which is a kernel-only segment.
> It also happens to be 32-bit only, it doesn't matter for the thing
> you're trying to fix, but that valid_user_selector() thing is then
> used on x86-32 too.
>
> So the ESPFIX and per-cpu segments are kernel-only, but then the VDSO
> getcpu one is a user segment.
>
> And the PnP and APM BIOS segments are similarly kernel-only.
Final (?) note: when looking at this, I have to say that our
GDT_ENTRY_INIT() and GDT_ENTRY() macros are horrendous.
I know exactly *why* they are horrendous, with all the history of
passing in raw flags values, etc, and you can most certainly see that
whole thing in the GDT_ENTRY() macro. It's used in assembly code in a
couple of cases too.
But then you look at GDT_ENTRY_INIT(), and it turns that illegible
"flags" value into (slightly more) legible S/DPL/etc values. So it
literally makes people use those odd "this is how this is encoded"
values even when the code actually wants to use a structure definition
that has the flags split out.
I guess it's much too much work to really fix things, but maybe we
could at least add #defines and comments for the special values.
So instead of
GDT_ENTRY_INIT(0xc093, 0, 0xfffff)
we could maybe have
#define GDT_ENTRY_FLAGS(type,s,dpl,p,avl,l,d,g) \
((type) |
(s)<<4) | \
(dpl) << 5) | ....
and have #defines for those 0xc093 values (with comments), so that we'd have
GDT_ENTRY_INIT(KERNEL_DATA_FLAGS, 0, 0xffff)
instead of a magic 0xc093 number.
This would require some nit-picky "read all those values and know the
crazy descriptor table layout" thing. Maybe somebody has a serious
case of insomnia and boredom?
Linus
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] x86: Move TSS and LDT to end of the GDT
2023-12-13 19:08 ` Linus Torvalds
@ 2023-12-16 18:24 ` Vegard Nossum
2023-12-16 18:40 ` Linus Torvalds
0 siblings, 1 reply; 14+ messages in thread
From: Vegard Nossum @ 2023-12-16 18:24 UTC (permalink / raw)
To: Linus Torvalds, Brian Gerst
Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
H . Peter Anvin, Peter Zijlstra
[-- Attachment #1: Type: text/plain, Size: 1568 bytes --]
On 13/12/2023 20:08, Linus Torvalds wrote:
> I guess it's much too much work to really fix things, but maybe we
> could at least add #defines and comments for the special values.
>
> So instead of
>
> GDT_ENTRY_INIT(0xc093, 0, 0xfffff)
>
> we could maybe have
>
> #define GDT_ENTRY_FLAGS(type,s,dpl,p,avl,l,d,g) \
> ((type) |
> (s)<<4) | \
> (dpl) << 5) | ....
>
> and have #defines for those 0xc093 values (with comments), so that we'd have
>
> GDT_ENTRY_INIT(KERNEL_DATA_FLAGS, 0, 0xffff)
>
> instead of a magic 0xc093 number.
>
> This would require some nit-picky "read all those values and know the
> crazy descriptor table layout" thing. Maybe somebody has a serious
> case of insomnia and boredom?
I took a stab at this, see attached RFC patch. Maybe this does too much,
though.
I did basic build and boot tests on both 64- and 32-bit, but I would
also try a binary diff before/after just to verify nothing changed by
accident, as well as making sure all the code is actually compiled (some
of the BIOS stuff only gets used on 32-bit with ISA/PNP enabled, for
example).
While preparing the patch I also came across some things that are
unclear to me:
- why do we want some segments with the A (accessed) bit set and some
with it cleared -- is there an actual reason for the difference, or
could we just set it for all of them?
- why does setup_percpu_segment() want the DB (size) flag clear? This
seems to indicate that it's a 16-bit segment -- is this correct?
Vegard
[-- Attachment #2: 0001-x86-replace-magic-numbers-in-GDT-descriptors.patch --]
[-- Type: text/x-patch, Size: 22004 bytes --]
From 3211be5ff46abb597492a9d16746c51111e3b257 Mon Sep 17 00:00:00 2001
From: Vegard Nossum <vegard.nossum@oracle.com>
Date: Sat, 16 Dec 2023 15:45:11 +0100
Subject: [PATCH] x86: replace magic numbers in GDT descriptors
Linus suggested replacing the magic numbers in the GDT descriptors
using preprocessor macros. This patch isn't precisely what was asked
for; rationale as follows:
Designing the interface properly is actually pretty hard -- there are
several constraints:
- you want the final expressions to be readable at a glance; something
like GDT_ENTRY_FLAGS(5, 1, 0, 1, 0, 1, 1, 0) isn't because you need
to visit the definition to understand what each parameter represents
and then match up parameters in the user and the definition (which is
hard when there are so many of them)
- you want the final expressions to be fairly short/information-dense;
something like GDT_ENTRY_PRESENT | GDT_ENTRY_DATA_WRITABLE |
GDT_ENTRY_SYSTEM | GDT_ENTRY_DB | GDT_ENTRY_GRANULARITY_4K is a bit
too verbose to write out every time and is actually hard to read as
well because of all the repetition
- you may want to assume defaults for some things (e.g. entries are
DPL-0 a.k.a. kernel segments by default) and allow the user to
override the default -- but this works best if you can OR in the
override; if you want DPL-3 by default and override with DPL-0 you
would need to start masking off bits instead of OR-ing them in and
that just becomes harder to read
- you may want to parameterize some things (e.g. CODE vs. DATA or
KERNEL vs. USER) since both values are used and you don't really
want prefer either one by default -- or DPL, which is always some
value that is always specified
This patch tries to balance these requirements and has two layers of
definitions -- low-level and high-level:
- the low-level defines are the mapping between human-readable names
and the actual bit numbers
- the high-level defines are the mapping from high-level intent to
combinations of low-level flags, representing roughly a tuple
(data/code, 64/32/16-bits) plus an override for DPL-3, since that's
relatively rare but still very important to mark properly for those
segments.
Things that are still not so good in this patch are:
- some lines overflow 80 columns, although I've heard that coule be
fine in 2023
- we have *_BIOS variants for 32-bit code and data segments that don't
have the G flag set and give the limit in terms of bytes instead of
pages
While preparing the patch I also came across some things that are
unclear to me:
- why do we want some segments with the A (accessed) bit set and some
with it cleared -- is there an actual reason for the difference, or
could we just set it for all of them?
- why does setup_percpu_segment() want the DB (size) flag clear? This
seems to indicate that it's a 16-bit segment -- is this correct?
I took the liberty to remove some comments from the PNPBIOS/APMBIOS
entries that are now superfluous since what they were expressing is now
directly readable from the code.
I used this to find all the existing users of the GDT_ENTRY*() macros:
$ git grep -P 'GDT_ENTRY(_INIT)?\('
Testing:
I added a selftest that verifies that the new symbolic values are the
same as the old numeric ones, although it's more of a test that this
commit doesn't change anything as opposed to a true self test; run it
with:
$ make -C tools/testing/selftests/x86 desc
$ tools/testing/selftests/x86/desc_64
Link: https://lore.kernel.org/all/CAHk-=wib5XLebuEra7y2YH96wxdk=8vJnA8XoVq0FExpzVvN=Q@mail.gmail.com/
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
---
arch/x86/boot/pm.c | 7 +-
arch/x86/include/asm/desc_defs.h | 61 +++++++++--
arch/x86/kernel/apm_32.c | 2 +-
arch/x86/kernel/cpu/common.c | 50 ++++-----
arch/x86/kernel/head64.c | 6 +-
arch/x86/kernel/setup_percpu.c | 4 +-
arch/x86/platform/pvh/head.S | 7 +-
arch/x86/realmode/rm/reboot.S | 3 +-
drivers/firmware/efi/libstub/x86-5lvl.c | 4 +-
drivers/pnp/pnpbios/bioscalls.c | 2 +-
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/desc.c | 134 ++++++++++++++++++++++++
12 files changed, 228 insertions(+), 54 deletions(-)
create mode 100644 tools/testing/selftests/x86/desc.c
diff --git a/arch/x86/boot/pm.c b/arch/x86/boot/pm.c
index 40031a614712..ab35b52d2c4b 100644
--- a/arch/x86/boot/pm.c
+++ b/arch/x86/boot/pm.c
@@ -11,6 +11,7 @@
*/
#include "boot.h"
+#include <asm/desc_defs.h>
#include <asm/segment.h>
/*
@@ -67,13 +68,13 @@ static void setup_gdt(void)
being 8-byte unaligned. Intel recommends 16 byte alignment. */
static const u64 boot_gdt[] __attribute__((aligned(16))) = {
/* CS: code, read/execute, 4 GB, base 0 */
- [GDT_ENTRY_BOOT_CS] = GDT_ENTRY(0xc09b, 0, 0xfffff),
+ [GDT_ENTRY_BOOT_CS] = GDT_ENTRY(DESC_CODE32 | _DESC_ACCESSED, 0, 0xfffff),
/* DS: data, read/write, 4 GB, base 0 */
- [GDT_ENTRY_BOOT_DS] = GDT_ENTRY(0xc093, 0, 0xfffff),
+ [GDT_ENTRY_BOOT_DS] = GDT_ENTRY(DESC_DATA32 | _DESC_ACCESSED, 0, 0xfffff),
/* TSS: 32-bit tss, 104 bytes, base 4096 */
/* We only have a TSS here to keep Intel VT happy;
we don't actually use it for anything. */
- [GDT_ENTRY_BOOT_TSS] = GDT_ENTRY(0x0089, 4096, 103),
+ [GDT_ENTRY_BOOT_TSS] = GDT_ENTRY(DESC_TSS32, 4096, 103),
};
/* Xen HVM incorrectly stores a pointer to the gdt_ptr, instead
of the gdt_ptr contents. Thus, make it static so it will
diff --git a/arch/x86/include/asm/desc_defs.h b/arch/x86/include/asm/desc_defs.h
index f7e7099af595..8843332b0975 100644
--- a/arch/x86/include/asm/desc_defs.h
+++ b/arch/x86/include/asm/desc_defs.h
@@ -8,6 +8,49 @@
* archs.
*/
+/*
+ * Low-level interface mapping flags/field names to bits
+ */
+
+#define _DESC_ACCESSED 0x0001
+#define _DESC_CODE_READABLE 0x0002
+#define _DESC_DATA_WRITABLE 0x0002
+#define _DESC_DATA_DIRECTION_DOWN 0x0004
+#define _DESC_CODE_CONFORMING 0x0004
+#define _DESC_EXECUTABLE 0x0008
+#define _DESC_SYSTEM 0x0010
+#define _DESC_DPL(dpl) ((dpl) << 5)
+#define _DESC_PRESENT 0x0080
+
+#define _DESC_LONG_CODE 0x2000
+#define _DESC_DB 0x4000 /* "size flag": 0 = 16-bit, 1 = 32-bit */
+#define _DESC_GRANULARITY_4K 0x8000
+
+/*
+ * High-level interface mapping intended usage to low-level combinations
+ * of flags
+ */
+
+#define _DESC_DATA (_DESC_PRESENT | _DESC_DATA_WRITABLE | _DESC_SYSTEM)
+#define _DESC_CODE (_DESC_PRESENT | _DESC_CODE_READABLE | _DESC_SYSTEM | _DESC_EXECUTABLE)
+
+#define DESC_DATA16 (_DESC_DATA)
+#define DESC_CODE16 (_DESC_CODE)
+
+#define DESC_DATA32 (_DESC_DATA | _DESC_GRANULARITY_4K | _DESC_DB)
+#define DESC_DATA32_BIOS (_DESC_DATA | _DESC_DB)
+
+#define DESC_CODE32 (_DESC_CODE | _DESC_GRANULARITY_4K | _DESC_DB)
+#define DESC_CODE32_BIOS (_DESC_CODE | _DESC_DB)
+
+/* don't ask why, this is just how you specify a 32-bit TSS descriptor */
+#define DESC_TSS32 (_DESC_PRESENT | _DESC_EXECUTABLE | _DESC_ACCESSED)
+
+#define DESC_DATA64 (_DESC_DATA | _DESC_GRANULARITY_4K | _DESC_DB)
+#define DESC_CODE64 (_DESC_CODE | _DESC_GRANULARITY_4K | _DESC_LONG_CODE)
+
+#define DESC_USER (_DESC_DPL(3))
+
#ifndef __ASSEMBLY__
#include <linux/types.h>
@@ -27,14 +70,14 @@ struct desc_struct {
.base0 = (u16) (base), \
.base1 = ((base) >> 16) & 0xFF, \
.base2 = ((base) >> 24) & 0xFF, \
- .type = (flags & 0x0f), \
- .s = (flags >> 4) & 0x01, \
- .dpl = (flags >> 5) & 0x03, \
- .p = (flags >> 7) & 0x01, \
- .avl = (flags >> 12) & 0x01, \
- .l = (flags >> 13) & 0x01, \
- .d = (flags >> 14) & 0x01, \
- .g = (flags >> 15) & 0x01, \
+ .type = ((flags) & 0x0f), \
+ .s = ((flags) >> 4) & 0x01, \
+ .dpl = ((flags) >> 5) & 0x03, \
+ .p = ((flags) >> 7) & 0x01, \
+ .avl = ((flags) >> 12) & 0x01, \
+ .l = ((flags) >> 13) & 0x01, \
+ .d = ((flags) >> 14) & 0x01, \
+ .g = ((flags) >> 15) & 0x01, \
}
enum {
@@ -94,6 +137,7 @@ struct gate_struct {
typedef struct gate_struct gate_desc;
+#ifndef _SETUP
static inline unsigned long gate_offset(const gate_desc *g)
{
#ifdef CONFIG_X86_64
@@ -108,6 +152,7 @@ static inline unsigned long gate_segment(const gate_desc *g)
{
return g->segment;
}
+#endif
struct desc_ptr {
unsigned short size;
diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index 5934ee5bc087..76a5ced278c2 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -420,7 +420,7 @@ static DEFINE_MUTEX(apm_mutex);
* This is for buggy BIOS's that refer to (real mode) segment 0x40
* even though they are called in protected mode.
*/
-static struct desc_struct bad_bios_desc = GDT_ENTRY_INIT(0x4092,
+static struct desc_struct bad_bios_desc = GDT_ENTRY_INIT(DESC_DATA32_BIOS,
(unsigned long)__va(0x400UL), PAGE_SIZE - 0x400 - 1);
static const char driver_version[] = "1.16ac"; /* no spaces */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b14fc8c1c953..32934a0656af 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -188,45 +188,37 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
* TLS descriptors are currently at a different place compared to i386.
* Hopefully nobody expects them at a fixed place (Wine?)
*/
- [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
- [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
- [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
- [GDT_ENTRY_DEFAULT_USER32_CS] = GDT_ENTRY_INIT(0xc0fb, 0, 0xfffff),
- [GDT_ENTRY_DEFAULT_USER_DS] = GDT_ENTRY_INIT(0xc0f3, 0, 0xfffff),
- [GDT_ENTRY_DEFAULT_USER_CS] = GDT_ENTRY_INIT(0xa0fb, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(DESC_CODE32 | _DESC_ACCESSED, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(DESC_CODE64 | _DESC_ACCESSED, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(DESC_DATA64 | _DESC_ACCESSED, 0, 0xfffff),
+ [GDT_ENTRY_DEFAULT_USER32_CS] = GDT_ENTRY_INIT(DESC_CODE32 | DESC_USER | _DESC_ACCESSED, 0, 0xfffff),
+ [GDT_ENTRY_DEFAULT_USER_DS] = GDT_ENTRY_INIT(DESC_DATA64 | DESC_USER | _DESC_ACCESSED, 0, 0xfffff),
+ [GDT_ENTRY_DEFAULT_USER_CS] = GDT_ENTRY_INIT(DESC_CODE64 | DESC_USER | _DESC_ACCESSED, 0, 0xfffff),
#else
- [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xc09a, 0, 0xfffff),
- [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
- [GDT_ENTRY_DEFAULT_USER_CS] = GDT_ENTRY_INIT(0xc0fa, 0, 0xfffff),
- [GDT_ENTRY_DEFAULT_USER_DS] = GDT_ENTRY_INIT(0xc0f2, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(DESC_CODE32, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(DESC_DATA32, 0, 0xfffff),
+ [GDT_ENTRY_DEFAULT_USER_CS] = GDT_ENTRY_INIT(DESC_CODE32 | DESC_USER, 0, 0xfffff),
+ [GDT_ENTRY_DEFAULT_USER_DS] = GDT_ENTRY_INIT(DESC_DATA32 | DESC_USER, 0, 0xfffff),
/*
* Segments used for calling PnP BIOS have byte granularity.
* They code segments and data segments have fixed 64k limits,
* the transfer segment sizes are set at run time.
*/
- /* 32-bit code */
- [GDT_ENTRY_PNPBIOS_CS32] = GDT_ENTRY_INIT(0x409a, 0, 0xffff),
- /* 16-bit code */
- [GDT_ENTRY_PNPBIOS_CS16] = GDT_ENTRY_INIT(0x009a, 0, 0xffff),
- /* 16-bit data */
- [GDT_ENTRY_PNPBIOS_DS] = GDT_ENTRY_INIT(0x0092, 0, 0xffff),
- /* 16-bit data */
- [GDT_ENTRY_PNPBIOS_TS1] = GDT_ENTRY_INIT(0x0092, 0, 0),
- /* 16-bit data */
- [GDT_ENTRY_PNPBIOS_TS2] = GDT_ENTRY_INIT(0x0092, 0, 0),
+ [GDT_ENTRY_PNPBIOS_CS32] = GDT_ENTRY_INIT(DESC_CODE32_BIOS, 0, 0xffff),
+ [GDT_ENTRY_PNPBIOS_CS16] = GDT_ENTRY_INIT(DESC_CODE16, 0, 0xffff),
+ [GDT_ENTRY_PNPBIOS_DS] = GDT_ENTRY_INIT(DESC_DATA16, 0, 0xffff),
+ [GDT_ENTRY_PNPBIOS_TS1] = GDT_ENTRY_INIT(DESC_DATA16, 0, 0),
+ [GDT_ENTRY_PNPBIOS_TS2] = GDT_ENTRY_INIT(DESC_DATA16, 0, 0),
/*
* The APM segments have byte granularity and their bases
* are set at run time. All have 64k limits.
*/
- /* 32-bit code */
- [GDT_ENTRY_APMBIOS_BASE] = GDT_ENTRY_INIT(0x409a, 0, 0xffff),
- /* 16-bit code */
- [GDT_ENTRY_APMBIOS_BASE+1] = GDT_ENTRY_INIT(0x009a, 0, 0xffff),
- /* data */
- [GDT_ENTRY_APMBIOS_BASE+2] = GDT_ENTRY_INIT(0x4092, 0, 0xffff),
-
- [GDT_ENTRY_ESPFIX_SS] = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
- [GDT_ENTRY_PERCPU] = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
+ [GDT_ENTRY_APMBIOS_BASE] = GDT_ENTRY_INIT(DESC_CODE32_BIOS, 0, 0xffff),
+ [GDT_ENTRY_APMBIOS_BASE+1] = GDT_ENTRY_INIT(DESC_CODE16, 0, 0xffff),
+ [GDT_ENTRY_APMBIOS_BASE+2] = GDT_ENTRY_INIT(DESC_DATA32_BIOS, 0, 0xffff),
+
+ [GDT_ENTRY_ESPFIX_SS] = GDT_ENTRY_INIT(DESC_DATA32, 0, 0xfffff),
+ [GDT_ENTRY_PERCPU] = GDT_ENTRY_INIT(DESC_DATA32, 0, 0xfffff),
#endif
} };
EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 05a110c97111..00dbddfdfece 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -71,9 +71,9 @@ EXPORT_SYMBOL(vmemmap_base);
* GDT used on the boot CPU before switching to virtual addresses.
*/
static struct desc_struct startup_gdt[GDT_ENTRIES] __initdata = {
- [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
- [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
- [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(DESC_CODE32 | _DESC_ACCESSED, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(DESC_CODE64 | _DESC_ACCESSED, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(DESC_DATA64 | _DESC_ACCESSED, 0, 0xfffff),
};
/*
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 2c97bf7b56ae..f2583de97a64 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -106,8 +106,8 @@ void __init pcpu_populate_pte(unsigned long addr)
static inline void setup_percpu_segment(int cpu)
{
#ifdef CONFIG_X86_32
- struct desc_struct d = GDT_ENTRY_INIT(0x8092, per_cpu_offset(cpu),
- 0xFFFFF);
+ struct desc_struct d = GDT_ENTRY_INIT(DESC_DATA32 & ~_DESC_DB,
+ per_cpu_offset(cpu), 0xFFFFF);
write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_PERCPU, &d, DESCTYPE_S);
#endif
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index c4365a05ab83..7c6a1089ce1c 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -11,6 +11,7 @@
#include <linux/elfnote.h>
#include <linux/init.h>
#include <linux/linkage.h>
+#include <asm/desc_defs.h>
#include <asm/segment.h>
#include <asm/asm.h>
#include <asm/boot.h>
@@ -148,11 +149,11 @@ SYM_DATA_END(gdt)
SYM_DATA_START_LOCAL(gdt_start)
.quad 0x0000000000000000 /* NULL descriptor */
#ifdef CONFIG_X86_64
- .quad GDT_ENTRY(0xa09a, 0, 0xfffff) /* PVH_CS_SEL */
+ .quad GDT_ENTRY(DESC_CODE64, 0, 0xfffff) /* PVH_CS_SEL */
#else
- .quad GDT_ENTRY(0xc09a, 0, 0xfffff) /* PVH_CS_SEL */
+ .quad GDT_ENTRY(DESC_CODE32, 0, 0xfffff) /* PVH_CS_SEL */
#endif
- .quad GDT_ENTRY(0xc092, 0, 0xfffff) /* PVH_DS_SEL */
+ .quad GDT_ENTRY(DESC_DATA32, 0, 0xfffff) /* PVH_DS_SEL */
SYM_DATA_END_LABEL(gdt_start, SYM_L_LOCAL, gdt_end)
.balign 16
diff --git a/arch/x86/realmode/rm/reboot.S b/arch/x86/realmode/rm/reboot.S
index f10515b10e0a..5bc068b9acdd 100644
--- a/arch/x86/realmode/rm/reboot.S
+++ b/arch/x86/realmode/rm/reboot.S
@@ -1,5 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/linkage.h>
+#include <asm/desc_defs.h>
#include <asm/segment.h>
#include <asm/page_types.h>
#include <asm/processor-flags.h>
@@ -153,5 +154,5 @@ SYM_DATA_START(machine_real_restart_gdt)
* base value 0x100; since this is consistent with real mode
* semantics we don't have to reload the segments once CR0.PE = 0.
*/
- .quad GDT_ENTRY(0x0093, 0x100, 0xffff)
+ .quad GDT_ENTRY(DESC_DATA16 | _DESC_ACCESSED, 0x100, 0xffff)
SYM_DATA_END(machine_real_restart_gdt)
diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
index 479dd445acdc..005dd9b14f95 100644
--- a/drivers/firmware/efi/libstub/x86-5lvl.c
+++ b/drivers/firmware/efi/libstub/x86-5lvl.c
@@ -13,8 +13,8 @@ bool efi_no5lvl;
static void (*la57_toggle)(void *cr3);
static const struct desc_struct gdt[] = {
- [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
- [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(DESC_CODE32 | _DESC_ACCESSED, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(DESC_CODE64 | _DESC_ACCESSED, 0, 0xfffff),
};
/*
diff --git a/drivers/pnp/pnpbios/bioscalls.c b/drivers/pnp/pnpbios/bioscalls.c
index ddc6f2163c8e..1f31dce5835a 100644
--- a/drivers/pnp/pnpbios/bioscalls.c
+++ b/drivers/pnp/pnpbios/bioscalls.c
@@ -60,7 +60,7 @@ do { \
set_desc_limit(&gdt[(selname) >> 3], (size) - 1); \
} while(0)
-static struct desc_struct bad_bios_desc = GDT_ENTRY_INIT(0x4092,
+static struct desc_struct bad_bios_desc = GDT_ENTRY_INIT(DESC_DATA32_BIOS,
(unsigned long)__va(0x400UL), PAGE_SIZE - 0x400 - 1);
/*
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 0b872c0a42d2..509310cec626 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -14,7 +14,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap
check_initial_reg_state sigreturn iopl ioperm \
test_vsyscall mov_ss_trap \
syscall_arg_fault fsgsbase_restore sigaltstack
-TARGETS_C_BOTHBITS += nx_stack
+TARGETS_C_BOTHBITS += nx_stack desc
TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
diff --git a/tools/testing/selftests/x86/desc.c b/tools/testing/selftests/x86/desc.c
new file mode 100644
index 000000000000..7f30a482ad5d
--- /dev/null
+++ b/tools/testing/selftests/x86/desc.c
@@ -0,0 +1,134 @@
+#include <stdint.h>
+#include <stdio.h>
+
+#define u16 uint16_t
+
+#include "../../../../arch/x86/include/asm/desc_defs.h"
+
+struct testcase {
+ const char *name;
+ uint16_t flags;
+ uint16_t flags2;
+};
+
+#define TESTCASE(flags_, name_, flags2_) \
+ { \
+ .name = name_, \
+ .flags = flags_, \
+ .flags2 = flags2_, \
+ }
+
+/*
+ * Find all of the existing uses of raw descriptor values with:
+ *
+ * git grep -P 'GDT_ENTRY(_INIT)?\(' v6.7-rc5
+ *
+ */
+struct testcase testcases[] = {
+ // arch/x86/boot/pm.c
+ TESTCASE(0xc09b, "boot_cs", DESC_CODE32 | _DESC_ACCESSED),
+ TESTCASE(0xc093, "boot_ds", DESC_DATA32 | _DESC_ACCESSED),
+ TESTCASE(0x0089, "boot_tss", DESC_TSS32),
+
+ // arch/x86/kernel/apm_32.c
+ TESTCASE(0x4092, "apmbios_32", DESC_DATA32_BIOS),
+
+ // arch/x86/kernel/cpu/common.c
+ TESTCASE(0xc09b, "kernel32_cs", DESC_CODE32 | _DESC_ACCESSED),
+ TESTCASE(0xa09b, "kernel64_cs", DESC_CODE64 | _DESC_ACCESSED),
+ TESTCASE(0xc093, "kernel64_ds", DESC_DATA64 | _DESC_ACCESSED),
+ TESTCASE(0xc0fb, "default_user32_cs", DESC_CODE32 | DESC_USER | _DESC_ACCESSED),
+ TESTCASE(0xc0f3, "default_user64_ds", DESC_DATA64 | DESC_USER | _DESC_ACCESSED),
+ TESTCASE(0xa0fb, "default_user64_cs", DESC_CODE64 | DESC_USER | _DESC_ACCESSED),
+
+ TESTCASE(0xc09a, "kernel32_cs", DESC_CODE32),
+ TESTCASE(0xc092, "kernel32_ds", DESC_DATA32),
+ TESTCASE(0xc0fa, "default_user_cs", DESC_CODE32 | DESC_USER),
+ TESTCASE(0xc0f2, "default_user_ds", DESC_DATA32 | DESC_USER),
+
+ TESTCASE(0x409a, "pnpbios_cs32", DESC_CODE32_BIOS),
+ TESTCASE(0x009a, "pbpbios_cs16", DESC_CODE16),
+ TESTCASE(0x0092, "pnpbios_ds", DESC_DATA16),
+ TESTCASE(0x0092, "pnpbios_ts1", DESC_DATA16),
+ TESTCASE(0x0092, "pnpbios_ts2", DESC_DATA16),
+
+ TESTCASE(0x409a, "apmbios_cs32", DESC_CODE32_BIOS),
+ TESTCASE(0x009a, "apmbios_cs16", DESC_CODE16),
+ TESTCASE(0x4092, "apmbios_ds", DESC_DATA32_BIOS),
+ TESTCASE(0xc092, "espfix_ss", DESC_DATA32),
+ TESTCASE(0xc092, "percpu", DESC_DATA32),
+
+ // arch/x86/kernel/head64.c
+ TESTCASE(0xc09b, "kernel32_cs", DESC_CODE32 | _DESC_ACCESSED),
+ TESTCASE(0xa09b, "kernel64_cs", DESC_CODE64 | _DESC_ACCESSED),
+ TESTCASE(0xc093, "kernel64_ds", DESC_DATA64 | _DESC_ACCESSED),
+
+ // arch/x86/kernel/setup_percpu.c
+ TESTCASE(0x8092, "percpu_32", DESC_DATA32 & ~_DESC_DB),
+
+ // arch/x86/platform/pvh/head.S
+ TESTCASE(0xa09a, "pvh_cs64", DESC_CODE64),
+ TESTCASE(0xc09a, "pvh_cs32", DESC_CODE32),
+ TESTCASE(0xc092, "pvh_ds", DESC_DATA32),
+
+ // arch/x86/realmode/rm/reboot.S
+ TESTCASE(0x0093, "data16", DESC_DATA16 | _DESC_ACCESSED),
+
+ // drivers/firmware/efi/libstub/x86-5lvl.c
+ TESTCASE(0xc09b, "kernel32_cs", DESC_CODE32 | _DESC_ACCESSED),
+ TESTCASE(0xa09b, "kernel64_cs", DESC_CODE64 | _DESC_ACCESSED),
+
+ // drivers/pnp/pnpbios/bioscalls.c
+ TESTCASE(0x4092, "pnpbios_bad", DESC_DATA32_BIOS),
+};
+
+// decode flags based on struct desc_struct + GDT_ENTRY()
+// for debugging only
+
+struct desc_flags {
+ u16 base1: 8, a:1, rw:1, dc:1, e: 1, s: 1, dpl: 2, p: 1;
+ u16 limit1: 4, avl: 1, l: 1, d: 1, g: 1, base2: 8;
+} __attribute__((packed));
+
+#define DESC_FLAGS(flags) \
+ { \
+ .a = (flags >> 0) & 0x01, \
+ .rw = (flags >> 1) & 0x01, \
+ .dc = (flags >> 2) & 0x01, \
+ .e = (flags >> 3) & 0x01, \
+ .s = (flags >> 4) & 0x01, \
+ .dpl = (flags >> 5) & 0x03, \
+ .p = (flags >> 7) & 0x01, \
+ .avl = (flags >> 12) & 0x01, \
+ .l = (flags >> 13) & 0x01, \
+ .d = (flags >> 14) & 0x01, \
+ .g = (flags >> 15) & 0x01, \
+ }
+
+int main(int argc, char *argv[])
+{
+ for (int i = 0; i < sizeof(testcases) / sizeof(*testcases); ++i) {
+ struct testcase t = testcases[i];
+ struct desc_flags desc = DESC_FLAGS(t.flags);
+ int ok = (t.flags == t.flags2);
+
+ printf("%s %04x %04x a=%x rw=%x dc=%x e=%x s=%x dpl=%x p=%x avl=%x l=%x db=%x g=%x %s\n",
+ ok ? "ok!" : "err",
+ t.flags,
+ t.flags2,
+ desc.a,
+ desc.rw,
+ desc.dc,
+ desc.e,
+ desc.s,
+ desc.dpl,
+ desc.p,
+ desc.avl,
+ desc.l,
+ desc.d,
+ desc.g,
+ t.name);
+ }
+
+ return 0;
+}
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] x86: Move TSS and LDT to end of the GDT
2023-12-16 18:24 ` Vegard Nossum
@ 2023-12-16 18:40 ` Linus Torvalds
0 siblings, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2023-12-16 18:40 UTC (permalink / raw)
To: Vegard Nossum
Cc: Brian Gerst, linux-kernel, x86, Ingo Molnar, Thomas Gleixner,
Borislav Petkov, H . Peter Anvin, Peter Zijlstra
On Sat, 16 Dec 2023 at 10:25, Vegard Nossum <vegard.nossum@oracle.com> wrote:
>
> While preparing the patch I also came across some things that are
> unclear to me:
>
> - why do we want some segments with the A (accessed) bit set and some
> with it cleared -- is there an actual reason for the difference, or
> could we just set it for all of them?
I think it's random, and an effect of just having hardcoded numbers
and not having any structure to it.
But I do think you're right that we should just start with all
kernel-created segment descriptors marked as accessed. I do not
believe that we have any actual *use* for the descriptor access bit.
> - why does setup_percpu_segment() want the DB (size) flag clear? This
> seems to indicate that it's a 16-bit segment -- is this correct?
I think it's nonsensical and doesn't matter, and is another mistake
from us just having random numbers.
I don't think the DB bit matters except for when it's used for the
code or stack segment (or, apparently, if it's a grow-down segment).
So I think your patch looks good, and I would keep it in that form if
it makes it easier to just verify that it generates an identical
kernel image.
And then as a separate patch, I would remove that DB bit clear thing.
Anyway, I do like your patch, and I think the fact that you found
those oddities is a good argument *for* the patch, but at the same
time I think I'll just bow to the x86 maintainers who may think that
this is churn in an area that they'd rather not touch any more.
So consider that an "ack" from me, but with that caveat of yes, I
think a binary diff would be a good thing because this is *so* odd and
low-level and maybe people just think it's not worth it.
Thanks,
Linus
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] x86/sigreturn: Reject system segements
2023-12-13 18:54 ` Linus Torvalds
@ 2023-12-17 21:07 ` H. Peter Anvin
2023-12-17 21:40 ` Linus Torvalds
0 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2023-12-17 21:07 UTC (permalink / raw)
To: Linus Torvalds, Brian Gerst
Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
Peter Zijlstra, Michal Luczaj
On December 13, 2023 10:54:00 AM PST, Linus Torvalds <torvalds@linuxfoundation.org> wrote:
>On Wed, 13 Dec 2023 at 08:34, Brian Gerst <brgerst@gmail.com> wrote:
>>
>> @@ -98,7 +98,11 @@ static bool ia32_restore_sigcontext(struct pt_regs *regs,
>>
>> /* Get CS/SS and force CPL3 */
>> regs->cs = sc.cs | 0x03;
>> + if (!valid_user_selector(regs->cs))
>> + return false;
>> regs->ss = sc.ss | 0x03;
>> + if (!valid_user_selector(regs->ss))
>> + return false;
>
>Side note: the SS/CS checks could be stricter than the usual selector tests.
>
>In particular, normal segments can be Null segments. But CS/SS must not be.
>
>Also, since you're now checking the validity, maybe we shouldn't do
>the "force cpl3" any more, and just make it an error to try to load a
>non-cpl3 segment at sigreturn..
>
>That forcing was literally just because we weren't checking it for sanity...
>
> Linus
Not to mention that changing a null descriptor to 3 is wrong.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] x86: Move TSS and LDT to end of the GDT
2023-12-13 18:51 ` Linus Torvalds
2023-12-13 19:08 ` Linus Torvalds
@ 2023-12-17 21:09 ` H. Peter Anvin
1 sibling, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2023-12-17 21:09 UTC (permalink / raw)
To: Linus Torvalds, Brian Gerst
Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
Peter Zijlstra
On December 13, 2023 10:51:11 AM PST, Linus Torvalds <torvalds@linuxfoundation.org> wrote:
>On Wed, 13 Dec 2023 at 08:34, Brian Gerst <brgerst@gmail.com> wrote:
>>
>> This will make testing for system segments easier.
>
>It seems to make more sense organizationally too, with the special
>non-data/code segments clearly separate at the end.
>
>So I think this is fine conceptually.
>
>HOWEVER, I think that you might want to expand on this a bit more,
>because there are other special segments selectors that might not be
>thing you want to expose to user space.
>
>We have GDT_ENTRY_PERCPU for example, which is a kernel-only segment.
>It also happens to be 32-bit only, it doesn't matter for the thing
>you're trying to fix, but that valid_user_selector() thing is then
>used on x86-32 too.
>
>So the ESPFIX and per-cpu segments are kernel-only, but then the VDSO
>getcpu one is a user segment.
>
>And the PnP and APM BIOS segments are similarly kernel-only.
>
>But then the VDSO getcpu segment is user-visible, in the middle, and
>again, it's 32-bit only but that whole GDT_SYSTEM_START thing is
>supposed to work there too.
>
>End result: this seems incomplete and not really fully baked.
>
>I wonder if instead of GDT_SYSTEM_START, you'd be better off just
>making a trivial constant bitmap of "these are user visible segments
>in the GDT". No need to re-order things, just have something like
>
> #define USER_SEGMENTS_MASK \
> ((1ul << GDT_ENTRY_DEFAULT_USER_CS) |
> ,,,,
>
>and use that for the test (remember to check for GDT_ENTRIES as the max).
>
>Hmm?
>
> Linus
Somewhat unrelated: X86_BUG_ESPFIX should just be deleted, as we aren't actually ever cleaning it. All current x86 processors have that problem (until FRED).
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] x86/sigreturn: Reject system segements
2023-12-17 21:07 ` H. Peter Anvin
@ 2023-12-17 21:40 ` Linus Torvalds
2023-12-17 21:45 ` H. Peter Anvin
0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2023-12-17 21:40 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Brian Gerst, linux-kernel, x86, Ingo Molnar, Thomas Gleixner,
Borislav Petkov, Peter Zijlstra, Michal Luczaj
On Sun, 17 Dec 2023 at 13:08, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On December 13, 2023 10:54:00 AM PST, Linus Torvalds <torvalds@linuxfoundation.org> wrote:
]> >Side note: the SS/CS checks could be stricter than the usual selector tests.
> >
> >In particular, normal segments can be Null segments. But CS/SS must not be.
> >
> >Also, since you're now checking the validity, maybe we shouldn't do
> >the "force cpl3" any more, and just make it an error to try to load a
> >non-cpl3 segment at sigreturn..
> >
> >That forcing was literally just because we weren't checking it for sanity...
> >
> > Linus
>
> Not to mention that changing a null descriptor to 3 is wrong.
I don't think it is. All of 0-3 are "Null selectors". The RPL of the
selector simply doesn't matter when the index is zero, afaik.
But we obviously only do this for CS/SS, which can't be (any kind of)
Null selector and iret will GP on them regardless of the RPL in the
selector.
Linus
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] x86/sigreturn: Reject system segements
2023-12-17 21:40 ` Linus Torvalds
@ 2023-12-17 21:45 ` H. Peter Anvin
2023-12-18 8:31 ` Li, Xin3
0 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2023-12-17 21:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Brian Gerst, linux-kernel, x86, Ingo Molnar, Thomas Gleixner,
Borislav Petkov, Peter Zijlstra, Michal Luczaj
On December 17, 2023 1:40:53 PM PST, Linus Torvalds <torvalds@linuxfoundation.org> wrote:
>On Sun, 17 Dec 2023 at 13:08, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> On December 13, 2023 10:54:00 AM PST, Linus Torvalds <torvalds@linuxfoundation.org> wrote:
>]> >Side note: the SS/CS checks could be stricter than the usual selector tests.
>> >
>> >In particular, normal segments can be Null segments. But CS/SS must not be.
>> >
>> >Also, since you're now checking the validity, maybe we shouldn't do
>> >the "force cpl3" any more, and just make it an error to try to load a
>> >non-cpl3 segment at sigreturn..
>> >
>> >That forcing was literally just because we weren't checking it for sanity...
>> >
>> > Linus
>>
>> Not to mention that changing a null descriptor to 3 is wrong.
>
>I don't think it is. All of 0-3 are "Null selectors". The RPL of the
>selector simply doesn't matter when the index is zero, afaik.
>
>But we obviously only do this for CS/SS, which can't be (any kind of)
>Null selector and iret will GP on them regardless of the RPL in the
>selector.
>
> Linus
Of course not for CS/SS, but I would agree that if the selector is 0 before the signal it shouldn't mysteriously and asynchronously become 3.
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH 3/3] x86/sigreturn: Reject system segements
2023-12-17 21:45 ` H. Peter Anvin
@ 2023-12-18 8:31 ` Li, Xin3
0 siblings, 0 replies; 14+ messages in thread
From: Li, Xin3 @ 2023-12-18 8:31 UTC (permalink / raw)
To: H. Peter Anvin, Linus Torvalds
Cc: Brian Gerst, linux-kernel@vger.kernel.org, x86@kernel.org,
Ingo Molnar, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
Michal Luczaj
> -----Original Message-----
> From: H. Peter Anvin <hpa@zytor.com>
> Sent: Sunday, December 17, 2023 1:46 PM
> To: Linus Torvalds <torvalds@linuxfoundation.org>
> Cc: Brian Gerst <brgerst@gmail.com>; linux-kernel@vger.kernel.org;
> x86@kernel.org; Ingo Molnar <mingo@kernel.org>; Thomas Gleixner
> <tglx@linutronix.de>; Borislav Petkov <bp@alien8.de>; Peter Zijlstra
> <peterz@infradead.org>; Michal Luczaj <mhal@rbox.co>
> Subject: Re: [PATCH 3/3] x86/sigreturn: Reject system segements
>
> On December 17, 2023 1:40:53 PM PST, Linus Torvalds
> <torvalds@linuxfoundation.org> wrote:
> >On Sun, 17 Dec 2023 at 13:08, H. Peter Anvin <hpa@zytor.com> wrote:
> >>
> >> On December 13, 2023 10:54:00 AM PST, Linus Torvalds
> <torvalds@linuxfoundation.org> wrote:
> >]> >Side note: the SS/CS checks could be stricter than the usual selector tests.
> >> >
> >> >In particular, normal segments can be Null segments. But CS/SS must not be.
> >> >
> >> >Also, since you're now checking the validity, maybe we shouldn't do
> >> >the "force cpl3" any more, and just make it an error to try to load
> >> >a
> >> >non-cpl3 segment at sigreturn..
> >> >
> >> >That forcing was literally just because we weren't checking it for sanity...
> >> >
> >> > Linus
> >>
> >> Not to mention that changing a null descriptor to 3 is wrong.
> >
> >I don't think it is. All of 0-3 are "Null selectors". The RPL of the
> >selector simply doesn't matter when the index is zero, afaik.
> >
> >But we obviously only do this for CS/SS, which can't be (any kind of)
> >Null selector and iret will GP on them regardless of the RPL in the
> >selector.
> >
> > Linus
>
> Of course not for CS/SS, but I would agree that if the selector is 0 before the
> signal it shouldn't mysteriously and asynchronously become 3.
Unfortunately reload_segments() _always_ sets DPL bits of GS/FS/DS/ES
to 3, even for 0. And IRET clears DPL bits when loading a NULL selector
into GS/FS/DS/ES:
https://lore.kernel.org/lkml/20230706052231.2183-1-xin3.li@intel.com/
Thanks!
Xin
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-12-18 8:31 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-13 16:34 [PATCH 0/3] Reject setting system segments from userspace Brian Gerst
2023-12-13 16:34 ` [PATCH 1/3] x86: Move TSS and LDT to end of the GDT Brian Gerst
2023-12-13 18:51 ` Linus Torvalds
2023-12-13 19:08 ` Linus Torvalds
2023-12-16 18:24 ` Vegard Nossum
2023-12-16 18:40 ` Linus Torvalds
2023-12-17 21:09 ` H. Peter Anvin
2023-12-13 16:34 ` [PATCH 2/3] x86/ptrace: Reject system segements Brian Gerst
2023-12-13 16:34 ` [PATCH 3/3] x86/sigreturn: " Brian Gerst
2023-12-13 18:54 ` Linus Torvalds
2023-12-17 21:07 ` H. Peter Anvin
2023-12-17 21:40 ` Linus Torvalds
2023-12-17 21:45 ` H. Peter Anvin
2023-12-18 8:31 ` Li, Xin3
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.