From: Paolo Bonzini <pbonzini@redhat.com>
Date: Wed, 10 Jul 2024 23:08:28 +0200
Subject: Re: [PATCH 00/10] target/i386/tcg: fixes for seg_helper.c
To: Robert Henry <rrh.henry@gmail.com>
Cc: qemu-devel <qemu-devel@nongnu.org>, Richard Henderson

On Wed, Jul 10, 2024 at 11:01 PM Robert Henry <rrh.henry@gmail.com> wrote:

> I have only skimmed the diffs. Your knowledge of the deep semantics,
> gained by close differential reading of the Intel and AMD docs, is truly
> amazing. Many thanks for pushing this through!

Thanks for bringing this to our attention too: apart from fixing the
practical bug, a more precise implementation will hopefully help future
readers. I tried to acknowledge your contribution in the commit messages.

> I have 2 nits, perhaps stylistic only.
>
> For code like "sp -= 2" or "sp += 2", followed or preceded by a write to
> the stack of a uint16_t variable 'x', would it be better/more robust to
> rewrite it as "sp -= sizeof(x)"?

I think that's intentional: the value subtracted relates to the "stw" or
"stl" in the store (and likewise the increment after a load) more than to
the size of x.

> There are a lot of masks constructed using -1. I think it would be
> clearer to use 0xffffffff (for 32-bit masks), as that reminds the reader
> that this is a bit mask. But it seems that using -1 is how the original
> code was written.

-1 is used for 64-bit masks only. They get unwieldy quickly. :)
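To make the first point concrete, here is a standalone sketch (plain C,
not QEMU's actual helpers; stw()/stl() are stand-ins for the 16-bit and
32-bit guest store helpers): the same uint16_t can be pushed with either
operand size, and the adjustment has to match the width of the store,
which is why "sp -= sizeof(x)" would be wrong in the stl case.

    /* Standalone sketch, not QEMU code. */
    #include <stdint.h>
    #include <string.h>

    static uint8_t guest_mem[64];

    /* stand-ins for the 16-bit and 32-bit store helpers */
    static void stw(uint32_t addr, uint16_t v) { memcpy(&guest_mem[addr], &v, 2); }
    static void stl(uint32_t addr, uint32_t v) { memcpy(&guest_mem[addr], &v, 4); }

    /* The decrement pairs with the store (stw -> 2, stl -> 4); for
     * pushl it must be 4 even though sizeof(x) == 2. */
    static uint32_t pushw(uint32_t sp, uint16_t x) { sp -= 2; stw(sp, x); return sp; }
    static uint32_t pushl(uint32_t sp, uint16_t x) { sp -= 4; stl(sp, x); return sp; }

And for the second point, it is really only the 64-bit all-ones literal
that gets unwieldy:

    uint64_t mask64 = -1;            /* same as 0xffffffffffffffff */
    uint32_t mask32 = 0xffffffff;    /* short enough to spell out */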
Paolo

> On Tue, Jul 9, 2024 at 11:29 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>> This includes bugfixes:
>> - allowing IRET from user mode to user mode with SMAP (do not use implicit
>>   kernel accesses, which break if the stack is in userspace)
>>
>> - use DPL-level accesses for interrupts and call gates
>>
>> - various fixes for task switching
>>
>> And two related cleanups: computing the MMU index once for far calls and
>> returns (including task switches), and using X86Access for TSS access.
>>
>> Tested with a really ugly patch to kvm-unit-tests, included after the
>> signature.
>>
>> Paolo Bonzini (7):
>>   target/i386/tcg: Allow IRET from user mode to user mode with SMAP
>>   target/i386/tcg: use PUSHL/PUSHW for error code
>>   target/i386/tcg: Compute MMU index once
>>   target/i386/tcg: Use DPL-level accesses for interrupts and call gates
>>   target/i386/tcg: check for correct busy state before switching to a
>>     new task
>>   target/i386/tcg: use X86Access for TSS access
>>   target/i386/tcg: save current task state before loading new one
>>
>> Richard Henderson (3):
>>   target/i386/tcg: Remove SEG_ADDL
>>   target/i386/tcg: Reorg push/pop within seg_helper.c
>>   target/i386/tcg: Introduce x86_mmu_index_{kernel_,}pl
>>
>>  target/i386/cpu.h            |  11 +-
>>  target/i386/cpu.c            |  27 +-
>>  target/i386/tcg/seg_helper.c | 606 +++++++++++++++++++----------------
>>  3 files changed, 354 insertions(+), 290 deletions(-)
>>
>> --
>> 2.45.2
>>
>> diff --git a/lib/x86/usermode.c b/lib/x86/usermode.c
>> index c3ec0ad7..0bf40c6d 100644
>> --- a/lib/x86/usermode.c
>> +++ b/lib/x86/usermode.c
>> @@ -5,13 +5,15 @@
>>  #include "x86/desc.h"
>>  #include "x86/isr.h"
>>  #include "alloc.h"
>> +#include "alloc_page.h"
>>  #include "setjmp.h"
>>  #include "usermode.h"
>>
>>  #include "libcflat.h"
>>  #include <stdint.h>
>>
>> -#define USERMODE_STACK_SIZE    0x2000
>> +#define USERMODE_STACK_ORDER   1 /* 8k */
>> +#define USERMODE_STACK_SIZE    (1 << (12 + USERMODE_STACK_ORDER))
>>  #define RET_TO_KERNEL_IRQ      0x20
>>
>>  static jmp_buf jmpbuf;
>> @@ -37,9 +39,14 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector,
>>  {
>>         extern char ret_to_kernel;
>>         volatile uint64_t rax = 0;
>> -       static unsigned char user_stack[USERMODE_STACK_SIZE];
>> +       static unsigned char *user_stack;
>>         handler old_ex;
>>
>> +       if (!user_stack) {
>> +               user_stack = alloc_pages(USERMODE_STACK_ORDER);
>> +               printf("%p\n", user_stack);
>> +       }
>> +
>>         *raised_vector = 0;
>>         set_idt_entry(RET_TO_KERNEL_IRQ, &ret_to_kernel, 3);
>>         old_ex = handle_exception(fault_vector,
>> @@ -51,6 +58,8 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector,
>>                 return 0;
>>         }
>>
>> +       memcpy(user_stack + USERMODE_STACK_SIZE - 8, &func, 8);
>> +
>>         asm volatile (
>>                         /* Prepare kernel SP for exception handlers */
>>                         "mov %%rsp, %[rsp0]\n\t"
>> @@ -63,12 +72,13 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector,
>>                         "pushq %[user_stack_top]\n\t"
>>                         "pushfq\n\t"
>>                         "pushq %[user_cs]\n\t"
>> -                       "lea user_mode(%%rip), %%rax\n\t"
>> +                       "lea user_mode+0x800000(%%rip), %%rax\n\t" // smap.flat places usermode addresses at 8MB-16MB
>>                         "pushq %%rax\n\t"
>>                         "iretq\n"
>>
>>                         "user_mode:\n\t"
>>                         /* Back up volatile registers before invoking func */
>> +                       "pop %%rax\n\t"
>>                         "push %%rcx\n\t"
>>                         "push %%rdx\n\t"
>>                         "push %%rdi\n\t"
>> @@ -78,11 +88,12 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector,
>>                         "push %%r10\n\t"
>>                         "push %%r11\n\t"
>>                         /* Call user mode function */
>> +                       "add $0x800000,%%rbp\n\t"
>>                         "mov %[arg1], %%rdi\n\t"
>>                         "mov %[arg2], %%rsi\n\t"
>>                         "mov %[arg3], %%rdx\n\t"
>>                         "mov %[arg4], %%rcx\n\t"
>> -                       "call *%[func]\n\t"
>> +                       "call *%%rax\n\t"
>>                         /* Restore registers */
>>                         "pop %%r11\n\t"
>>                         "pop %%r10\n\t"
>> @@ -112,12 +123,11 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector,
>>                         [arg2]"m"(arg2),
>>                         [arg3]"m"(arg3),
>>                         [arg4]"m"(arg4),
>> -                       [func]"m"(func),
>>                         [user_ds]"i"(USER_DS),
>>                         [user_cs]"i"(USER_CS),
>>                         [kernel_ds]"rm"(KERNEL_DS),
>>                         [user_stack_top]"r"(user_stack +
>> -                                           sizeof(user_stack)),
>> +                                           USERMODE_STACK_SIZE - 8),
>>                         [kernel_entry_vector]"i"(RET_TO_KERNEL_IRQ));
>>
>>         handle_exception(fault_vector, old_ex);
>> diff --git a/x86/smap.c b/x86/smap.c
>> index 9a823a55..65119442 100644
>> --- a/x86/smap.c
>> +++ b/x86/smap.c
>> @@ -2,6 +2,7 @@
>>  #include <alloc_page.h>
>>  #include "x86/desc.h"
>>  #include "x86/processor.h"
>> +#include "x86/usermode.h"
>>  #include "x86/vm.h"
>>
>>  volatile int pf_count = 0;
>> @@ -89,6 +90,31 @@ static void check_smap_nowp(void)
>>         write_cr3(read_cr3());
>>  }
>>
>> +#ifdef __x86_64__
>> +static void iret(void)
>> +{
>> +       asm volatile(
>> +               "mov %%rsp, %%rcx;"
>> +               "movl %%ss, %%ebx; pushq %%rbx; pushq %%rcx;"
>> +               "pushf;"
>> +               "movl %%cs, %%ebx; pushq %%rbx; "
>> +               "lea 1f(%%rip), %%rbx; pushq %%rbx; iretq; 1:"
>> +
>> +               : : : "ebx", "ecx", "cc"); /* RPL=0 */
>> +}
>> +
>> +static void test_user_iret(void)
>> +{
>> +       bool raised_vector;
>> +       uintptr_t user_iret = (uintptr_t)iret + USER_BASE;
>> +
>> +       run_in_user((usermode_func)user_iret, PF_VECTOR, 0, 0, 0, 0,
>> +                   &raised_vector);
>> +
>> +       report(!raised_vector, "No #PF on CPL=3 DPL=3 iret");
>> +}
>> +#endif
>> +
>>  int main(int ac, char **av)
>>  {
>>         unsigned long i;
>> @@ -196,7 +222,9 @@ int main(int ac, char **av)
>>
>>         check_smap_nowp();
>>
>> -       // TODO: implicit kernel access from ring 3 (e.g. int)
>> +#ifdef __x86_64__
>> +       test_user_iret();
>> +#endif
>>
>>         return report_summary();
>>  }
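A side note on the usermode.c hunks above, since the inline asm is dense:
func is now handed to the user-mode code through the top of the freshly
allocated stack (the memcpy before the asm block) and fetched back by the
"pop %%rax" right after the iretq; the old [func]"m"(func) operand would
have had to be read while already running in user mode. Reduced to plain,
runnable C, the hand-off mechanism is just the following (a sketch of the
idea only, not the real trampoline; assumes 64-bit pointers):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef uint64_t (*usermode_func)(void);

    static uint64_t forty_two(void) { return 42; }

    int main(void)
    {
        size_t size = 8192;                  /* like USERMODE_STACK_SIZE */
        unsigned char *stack = malloc(size);
        usermode_func func = forty_two;

        /* caller side: write the pointer at the top of the new stack,
         * like memcpy(user_stack + USERMODE_STACK_SIZE - 8, &func, 8) */
        memcpy(stack + size - 8, &func, 8);

        /* callee side, standing in for "pop %%rax ... call *%%rax":
         * read the pointer back off the stack top and call through it */
        usermode_func f;
        memcpy(&f, stack + size - 8, 8);
        printf("%d\n", (int)f());            /* prints 42 */

        free(stack);
        return 0;
    }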