LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 36/63] scsi: ibmvscsi: Avoid multi-field memset() overflow by aiming at srp
From: Kees Cook @ 2021-08-18  6:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tyrel Datwyler, Kees Cook, Martin K. Petersen, Greg Kroah-Hartman,
	linux-block, James E.J. Bottomley, linux-staging, linux-wireless,
	Gustavo A. R. Silva, dri-devel, linux-scsi, clang-built-linux,
	netdev, Paul Mackerras, linux-hardening, Andrew Morton,
	linuxppc-dev, Rasmus Villemoes, linux-kbuild
In-Reply-To: <20210818060533.3569517-1-keescook@chromium.org>

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Instead of writing beyond the end of evt_struct->iu.srp.cmd, target the
upper union (evt_struct->iu.srp) instead, as that's what is being wiped.

Cc: Tyrel Datwyler <tyreld@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/lkml/yq135rzp79c.fsf@ca-mkp.ca.oracle.com
Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/lkml/6eae8434-e9a7-aa74-628b-b515b3695359@linux.ibm.com
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 50df7dd9cb91..ea8e01f49cba 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -1055,8 +1055,9 @@ static int ibmvscsi_queuecommand_lck(struct scsi_cmnd *cmnd,
 		return SCSI_MLQUEUE_HOST_BUSY;
 
 	/* Set up the actual SRP IU */
+	BUILD_BUG_ON(sizeof(evt_struct->iu.srp) != SRP_MAX_IU_LEN);
+	memset(&evt_struct->iu.srp, 0x00, sizeof(evt_struct->iu.srp));
 	srp_cmd = &evt_struct->iu.srp.cmd;
-	memset(srp_cmd, 0x00, SRP_MAX_IU_LEN);
 	srp_cmd->opcode = SRP_CMD;
 	memcpy(srp_cmd->cdb, cmnd->cmnd, sizeof(srp_cmd->cdb));
 	int_to_scsilun(lun, &srp_cmd->lun);
-- 
2.30.2


^ permalink raw reply related

* Re: [PATCH v7 1/2] tty: hvc: pass DMA capable memory to put_chars()
From: Xianting TIan @ 2021-08-18  3:35 UTC (permalink / raw)
  To: Jiri Slaby, gregkh, amit, arnd, osandov
  Cc: linuxppc-dev, linux-kernel, virtualization
In-Reply-To: <5b728c71-a754-b3ef-4ad3-6e33db1b7647@kernel.org>


在 2021/8/18 上午11:17, Jiri Slaby 写道:
> Hi,
>
> On 17. 08. 21, 15:22, Xianting Tian wrote:
>> As well known, hvc backend can register its opertions to hvc backend.
>> the opertions contain put_chars(), get_chars() and so on.
>
> "operations". And there too:
>
>> Some hvc backend may do dma in its opertions. eg, put_chars() of
>> virtio-console. But in the code of hvc framework, it may pass DMA
>> incapable memory to put_chars() under a specific configuration, which
>> is explained in commit c4baad5029(virtio-console: avoid DMA from stack):
>> 1, c[] is on stack,
>>     hvc_console_print():
>>     char c[N_OUTBUF] __ALIGNED__;
>>     cons_ops[index]->put_chars(vtermnos[index], c, i);
>> 2, ch is on stack,
>>     static void hvc_poll_put_char(,,char ch)
>>     {
>>     struct tty_struct *tty = driver->ttys[0];
>>     struct hvc_struct *hp = tty->driver_data;
>>     int n;
>>
>>     do {
>>         n = hp->ops->put_chars(hp->vtermno, &ch, 1);
>>     } while (n <= 0);
>>     }
>>
>> Commit c4baad5029 is just the fix to avoid DMA from stack memory, which
>> is passed to virtio-console by hvc framework in above code. But I think
>> the fix is aggressive, it directly uses kmemdup() to alloc new buffer
>> from kmalloc area and do memcpy no matter the memory is in kmalloc area
>> or not. But most importantly, it should better be fixed in the hvc
>> framework, by changing it to never pass stack memory to the put_chars()
>> function in the first place. Otherwise, we still face the same issue if
>> a new hvc backend using dma added in the furture.
>>
>> We make 'char c[N_OUTBUF]' part of 'struct hvc_struct', so hp->c is no
>> longer the stack memory. we can use it in above two cases.
>
> In fact, you need only a single char for the poll case 
> (hvc_poll_put_char), so hvc_struct needs to contain only c, not an array.
>
> OTOH, you need c[N_OUTBUF] in the console case (hvc_console_print), 
> but not whole hvc_struct. So cons_hvcs should be an array of structs 
> composed of only the lock and the buffer.
>
> Hum.
>
> Or maybe rethink and take care of the console case by kmemdup and be 
> done with that case? What problem do you have with allocating 16 
> bytes? It should be quite easy and really fast (lockless) in most 
> cases. On the contrary, your solution has to take a spinlock to access 
> the global buffer.

May be we can change hvc_struct as below,

struct hvc_struct {

         ...
         char out_ch;
         char c[N_OUTBUF] __ALIGNED__;
         int outbuf_size;
         char outbuf[0] __ALIGNED__;
};

c[N_OUTBUF]  is only used for hvc_console_print(); out_ch is only used 
for hvc_poll_put_char(). Thus no competition exits, the spinlock can be 
removed.

Then cons_hvcs array can only contains the buffer.

Is it OK for you?  thanks
>
>> Other fix is use L1_CACHE_BYTES as the alignment, use 'sizeof(long)' as
>> dma alignment is wrong. And use struct_size() to calculate size of
>> hvc_struct.
>
> This ought to be in separate patches.
OK, thanks
>
> thanks,

^ permalink raw reply

* Re: [PATCH/RFC] powerpc/module_64: allow .init_array constructors to run
From: Daniel Axtens @ 2021-08-18  3:32 UTC (permalink / raw)
  To: Jan Stancek, mpe, benh, paulus, christophe.leroy, linuxppc-dev
  Cc: linux-kernel, jstancek
In-Reply-To: <920acea9aa18e4f2956581a8e158bdaa376fdf63.1629203945.git.jstancek@redhat.com>

Jan Stancek <jstancek@redhat.com> writes:

> gcov and kasan rely on compiler generated constructor code.
> For modules, gcc-8 with gcov enabled generates .init_array section,
> but on ppc64le it doesn't get executed. find_module_sections() never
> finds .init_array section, because module_frob_arch_sections() renames
> it to _init_array.
>
> Avoid renaming .init_array section, so do_mod_ctors() can use it.

This (the existing renaming) breaks a KASAN test too, so I'd love to see
this fixed.

However, I don't know that renaming the section is a complete fix: from
memory it is still be possible to get relocations that are too far away
and require separate trampolines. But I wasn't able to construct a
module that exhibited this behaviour and test a fix before I got pulled
away to other things.

Kind regards,
Daniel

>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Signed-off-by: Jan Stancek <jstancek@redhat.com>
> ---
> I wasn't able to trace the comment:
>   "We don't handle .init for the moment: rename to _init"
> to original patch (it pre-dates .git). I'm not sure if it
> still applies today, so I limited patch to .init_array. This
> fixes gcov for modules for me on ppc64le 5.14.0-rc6.
>
> Renaming issue is also mentioned in kasan patches here:
>   https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20210319144058.772525-1-dja@axtens
>
>  arch/powerpc/kernel/module_64.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 6baa676e7cb6..c604b13ea6bf 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -299,8 +299,16 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
>  					  sechdrs[i].sh_size);
>  
>  		/* We don't handle .init for the moment: rename to _init */
> -		while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init")))
> +		while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init"))) {
> +#ifdef CONFIG_CONSTRUCTORS
> +			/* find_module_sections() needs .init_array intact */
> +			if (strstr(secstrings + sechdrs[i].sh_name,
> +				".init_array")) {
> +				break;
> +			}
> +#endif
>  			p[0] = '_';
> +		}
>  
>  		if (sechdrs[i].sh_type == SHT_SYMTAB)
>  			dedotify((void *)hdr + sechdrs[i].sh_offset,
> -- 
> 2.27.0

^ permalink raw reply

* Re: [PATCH v7 1/2] tty: hvc: pass DMA capable memory to put_chars()
From: Jiri Slaby @ 2021-08-18  3:17 UTC (permalink / raw)
  To: Xianting Tian, gregkh, amit, arnd, osandov
  Cc: linuxppc-dev, linux-kernel, virtualization
In-Reply-To: <20210817132300.165014-2-xianting.tian@linux.alibaba.com>

Hi,

On 17. 08. 21, 15:22, Xianting Tian wrote:
> As well known, hvc backend can register its opertions to hvc backend.
> the opertions contain put_chars(), get_chars() and so on.

"operations". And there too:

> Some hvc backend may do dma in its opertions. eg, put_chars() of
> virtio-console. But in the code of hvc framework, it may pass DMA
> incapable memory to put_chars() under a specific configuration, which
> is explained in commit c4baad5029(virtio-console: avoid DMA from stack):
> 1, c[] is on stack,
>     hvc_console_print():
> 	char c[N_OUTBUF] __ALIGNED__;
> 	cons_ops[index]->put_chars(vtermnos[index], c, i);
> 2, ch is on stack,
>     static void hvc_poll_put_char(,,char ch)
>     {
> 	struct tty_struct *tty = driver->ttys[0];
> 	struct hvc_struct *hp = tty->driver_data;
> 	int n;
> 
> 	do {
> 		n = hp->ops->put_chars(hp->vtermno, &ch, 1);
> 	} while (n <= 0);
>     }
> 
> Commit c4baad5029 is just the fix to avoid DMA from stack memory, which
> is passed to virtio-console by hvc framework in above code. But I think
> the fix is aggressive, it directly uses kmemdup() to alloc new buffer
> from kmalloc area and do memcpy no matter the memory is in kmalloc area
> or not. But most importantly, it should better be fixed in the hvc
> framework, by changing it to never pass stack memory to the put_chars()
> function in the first place. Otherwise, we still face the same issue if
> a new hvc backend using dma added in the furture.
> 
> We make 'char c[N_OUTBUF]' part of 'struct hvc_struct', so hp->c is no
> longer the stack memory. we can use it in above two cases.

In fact, you need only a single char for the poll case 
(hvc_poll_put_char), so hvc_struct needs to contain only c, not an array.

OTOH, you need c[N_OUTBUF] in the console case (hvc_console_print), but 
not whole hvc_struct. So cons_hvcs should be an array of structs 
composed of only the lock and the buffer.

Hum.

Or maybe rethink and take care of the console case by kmemdup and be 
done with that case? What problem do you have with allocating 16 bytes? 
It should be quite easy and really fast (lockless) in most cases. On the 
contrary, your solution has to take a spinlock to access the global buffer.

> Other fix is use L1_CACHE_BYTES as the alignment, use 'sizeof(long)' as
> dma alignment is wrong. And use struct_size() to calculate size of
> hvc_struct.

This ought to be in separate patches.

thanks,
-- 
js
suse labs

^ permalink raw reply

* Re: [PATCH] scsi: ibmvfc: Stop using scsi_cmnd.tag
From: Bart Van Assche @ 2021-08-18  3:04 UTC (permalink / raw)
  To: John Garry, tyreld, mpe, benh, paulus, jejb, martin.petersen
  Cc: sfr, linux-scsi, linux-kernel, linux-next, hare, linuxppc-dev,
	hch
In-Reply-To: <1629207817-211936-1-git-send-email-john.garry@huawei.com>

On 8/17/21 6:43 AM, John Garry wrote:
> Use scsi_cmd_to_rq(scsi_cmnd)->tag in preference to scsi_cmnd.tag.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply

* Re: [PATCH] scsi: ibmvfc: Stop using scsi_cmnd.tag
From: Martin K. Petersen @ 2021-08-18  2:31 UTC (permalink / raw)
  To: John Garry
  Cc: tyreld, linux-scsi, martin.petersen, sfr, jejb, linux-kernel,
	linux-next, paulus, hare, linuxppc-dev, hch, bvanassche
In-Reply-To: <1629207817-211936-1-git-send-email-john.garry@huawei.com>


John,

> Use scsi_cmd_to_rq(scsi_cmnd)->tag in preference to scsi_cmnd.tag.

Applied to 5.15/scsi-staging and rebased for bisectability.

Just to be picky it looks like there's another scsi_cmmd tag lurking in
qla1280.c but it's sitting behind an #ifdef DEBUG_QLA1280.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH v7 1/2] tty: hvc: pass DMA capable memory to put_chars()
From: Xianting TIan @ 2021-08-18  1:49 UTC (permalink / raw)
  To: Greg KH
  Cc: arnd, amit, jirislaby, linux-kernel, virtualization, linuxppc-dev,
	osandov
In-Reply-To: <YRvaN0RwW03kkO1O@kroah.com>


在 2021/8/17 下午11:48, Greg KH 写道:
> On Tue, Aug 17, 2021 at 09:22:59PM +0800, Xianting Tian wrote:
>> We tested the patch, it worked normally.
> Who is "we"?
>
>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> Like I said before, I need another developer from your company to review
> and sign-off on this patch (and the other one), before I am willing to
> look at it, based on the previous mistakes that have happened here.
thanks, I will add the developer in v8 and also with fix a build 
warning, which I don't meet in my build process.
>
> thanks,
>
> greg k-h

^ permalink raw reply

* [PATCH 5/5] KVM: selftests: Remove __NR_userfaultfd syscall fallback
From: Sean Christopherson @ 2021-08-18  0:12 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Oleg Nesterov,
	Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: linux-s390, kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210818001210.4073390-1-seanjc@google.com>

Revert the __NR_userfaultfd syscall fallback added for KVM selftests now
that x86's unistd_{32,63}.h overrides are under uapi/ and thus not in
KVM sefltests' search path, i.e. now that KVM gets x86 syscall numbers
from the installed kernel headers.

No functional change intended.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/arch/x86/include/uapi/asm/unistd_64.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/arch/x86/include/uapi/asm/unistd_64.h b/tools/arch/x86/include/uapi/asm/unistd_64.h
index 4205ed4158bf..cb52a3a8b8fc 100644
--- a/tools/arch/x86/include/uapi/asm/unistd_64.h
+++ b/tools/arch/x86/include/uapi/asm/unistd_64.h
@@ -1,7 +1,4 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __NR_userfaultfd
-#define __NR_userfaultfd 282
-#endif
 #ifndef __NR_perf_event_open
 # define __NR_perf_event_open 298
 #endif
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related

* [PATCH 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
From: Sean Christopherson @ 2021-08-18  0:12 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Oleg Nesterov,
	Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: linux-s390, kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210818001210.4073390-1-seanjc@google.com>

Add a test to verify an rseq's CPU ID is updated correctly if the task is
migrated while the kernel is handling KVM_RUN.  This is a regression test
for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
without updating rseq, leading to a stale CPU ID and other badness.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/.gitignore  |   1 +
 tools/testing/selftests/kvm/Makefile    |   3 +
 tools/testing/selftests/kvm/rseq_test.c | 131 ++++++++++++++++++++++++
 3 files changed, 135 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/rseq_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 0709af0144c8..6d031ff6b68e 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -47,6 +47,7 @@
 /kvm_page_table_test
 /memslot_modification_stress_test
 /memslot_perf_test
+/rseq_test
 /set_memory_region_test
 /steal_time
 /kvm_binary_stats_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 5832f510a16c..0756e79cb513 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -80,6 +80,7 @@ TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_x86_64 += kvm_page_table_test
 TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
 TEST_GEN_PROGS_x86_64 += memslot_perf_test
+TEST_GEN_PROGS_x86_64 += rseq_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
@@ -92,6 +93,7 @@ TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_aarch64 += kvm_page_table_test
+TEST_GEN_PROGS_aarch64 += rseq_test
 TEST_GEN_PROGS_aarch64 += set_memory_region_test
 TEST_GEN_PROGS_aarch64 += steal_time
 TEST_GEN_PROGS_aarch64 += kvm_binary_stats_test
@@ -103,6 +105,7 @@ TEST_GEN_PROGS_s390x += demand_paging_test
 TEST_GEN_PROGS_s390x += dirty_log_test
 TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
 TEST_GEN_PROGS_s390x += kvm_page_table_test
+TEST_GEN_PROGS_s390x += rseq_test
 TEST_GEN_PROGS_s390x += set_memory_region_test
 TEST_GEN_PROGS_s390x += kvm_binary_stats_test
 
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
new file mode 100644
index 000000000000..90ed535eded7
--- /dev/null
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <errno.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <signal.h>
+#include <syscall.h>
+#include <sys/ioctl.h>
+#include <linux/rseq.h>
+#include <linux/unistd.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+
+#define VCPU_ID 0
+
+static __thread volatile struct rseq __rseq = {
+	.cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
+};
+
+#define RSEQ_SIG 0xdeadbeef
+
+static pthread_t migration_thread;
+static cpu_set_t possible_mask;
+static bool done;
+
+static void guest_code(void)
+{
+	for (;;)
+		GUEST_SYNC(0);
+}
+
+static void sys_rseq(int flags)
+{
+	int r;
+
+	r = syscall(__NR_rseq, &__rseq, sizeof(__rseq), flags, RSEQ_SIG);
+	TEST_ASSERT(!r, "rseq failed, errno = %d (%s)", errno, strerror(errno));
+}
+
+static void *migration_worker(void *ign)
+{
+	cpu_set_t allowed_mask;
+	int r, i, nr_cpus, cpu;
+
+	CPU_ZERO(&allowed_mask);
+
+	nr_cpus = CPU_COUNT(&possible_mask);
+
+	for (i = 0; i < 20000; i++) {
+		cpu = i % nr_cpus;
+		if (!CPU_ISSET(cpu, &possible_mask))
+			continue;
+
+		CPU_SET(cpu, &allowed_mask);
+
+		r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
+		TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)", errno,
+			    strerror(errno));
+
+		CPU_CLR(cpu, &allowed_mask);
+
+		usleep(10);
+	}
+	done = true;
+	return NULL;
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vm *vm;
+	u32 cpu, rseq_cpu;
+	int r;
+
+	/* Tell stdout not to buffer its content */
+	setbuf(stdout, NULL);
+
+	r = sched_getaffinity(0, sizeof(possible_mask), &possible_mask);
+	TEST_ASSERT(!r, "sched_getaffinity failed, errno = %d (%s)", errno,
+		    strerror(errno));
+
+	if (CPU_COUNT(&possible_mask) < 2) {
+		print_skip("Only one CPU, task migration not possible\n");
+		exit(KSFT_SKIP);
+	}
+
+	sys_rseq(0);
+
+	/*
+	 * Create and run a dummy VM that immediately exits to userspace via
+	 * GUEST_SYNC, while concurrently migrating the process by setting its
+	 * CPU affinity.
+	 */
+	vm = vm_create_default(VCPU_ID, 0, guest_code);
+
+	pthread_create(&migration_thread, NULL, migration_worker, 0);
+
+	while (!done) {
+		vcpu_run(vm, VCPU_ID);
+		TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC,
+			    "Guest failed?");
+
+		cpu = sched_getcpu();
+		rseq_cpu = READ_ONCE(__rseq.cpu_id);
+
+		/*
+		 * Verify rseq's CPU matches sched's CPU, and that sched's CPU
+		 * is stable.  This doesn't handle the case where the task is
+		 * migrated between sched_getcpu() and reading rseq, and again
+		 * between reading rseq and sched_getcpu(), but in practice no
+		 * false positives have been observed, while on the other hand
+		 * blocking migration while this thread reads CPUs messes with
+		 * the timing and prevents hitting failures on a buggy kernel.
+		 */
+		TEST_ASSERT(rseq_cpu == cpu || cpu != sched_getcpu(),
+			    "rseq CPU = %d, sched CPU = %d\n", rseq_cpu, cpu);
+	}
+
+	pthread_join(migration_thread, NULL);
+
+	kvm_vm_free(vm);
+
+	sys_rseq(RSEQ_FLAG_UNREGISTER);
+
+	return 0;
+}
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related

* [PATCH 3/5] tools: Move x86 syscall number fallbacks to .../uapi/
From: Sean Christopherson @ 2021-08-18  0:12 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Oleg Nesterov,
	Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: linux-s390, kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210818001210.4073390-1-seanjc@google.com>

Move unistd_{32,64}.h from x86/include/asm to x86/include/uapi/asm so
that tools/selftests that install kernel headers, e.g. KVM selftests, can
include non-uapi tools headers, e.g. to get 'struct list_head', without
effectively overriding the installed non-tool uapi headers.

Swapping KVM's search order, e.g. to search the kernel headers before
tool headers, is not a viable option as doing results in linux/type.h and
other core headers getting pulled from the kernel headers, which do not
have the kernel-internal typedefs that are used through tools, including
many files outside of selftests/kvm's control.

Prior to commit cec07f53c398 ("perf tools: Move syscall number fallbacks
from perf-sys.h to tools/arch/x86/include/asm/"), the handcoded numbers
were actual fallbacks, i.e. overriding unistd_{32,64}.h from the kernel
headers was unintentional.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/arch/x86/include/{ => uapi}/asm/unistd_32.h | 0
 tools/arch/x86/include/{ => uapi}/asm/unistd_64.h | 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_32.h (100%)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_64.h (100%)

diff --git a/tools/arch/x86/include/asm/unistd_32.h b/tools/arch/x86/include/uapi/asm/unistd_32.h
similarity index 100%
rename from tools/arch/x86/include/asm/unistd_32.h
rename to tools/arch/x86/include/uapi/asm/unistd_32.h
diff --git a/tools/arch/x86/include/asm/unistd_64.h b/tools/arch/x86/include/uapi/asm/unistd_64.h
similarity index 100%
rename from tools/arch/x86/include/asm/unistd_64.h
rename to tools/arch/x86/include/uapi/asm/unistd_64.h
-- 
2.33.0.rc1.237.g0d66db33f3-goog

^ permalink raw reply

* [PATCH 2/5] entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()
From: Sean Christopherson @ 2021-08-18  0:12 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Oleg Nesterov,
	Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: linux-s390, kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210818001210.4073390-1-seanjc@google.com>

Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
that the two function are always called back-to-back by architectures
that have rseq.  The rseq helper is stubbed out for architectures that
don't support rseq, i.e. this is a nop across the board.

Note, tracehook_notify_resume() is horribly named and arguably does not
belong in tracehook.h as literally every line of code in it has nothing
to do with tracing.  But, that's been true since commit a42c6ded827d
("move key_repace_session_keyring() into tracehook_notify_resume()")
first usurped tracehook_notify_resume() back in 2012.  Punt cleaning that
mess up to future patches.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm/kernel/signal.c     | 1 -
 arch/arm64/kernel/signal.c   | 1 -
 arch/csky/kernel/signal.c    | 4 +---
 arch/mips/kernel/signal.c    | 4 +---
 arch/powerpc/kernel/signal.c | 4 +---
 arch/s390/kernel/signal.c    | 1 -
 include/linux/tracehook.h    | 2 ++
 kernel/entry/common.c        | 4 +---
 kernel/entry/kvm.c           | 4 +---
 9 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index a3a38d0a4c85..9df68d139965 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -670,7 +670,6 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
 				uprobe_notify_resume(regs);
 			} else {
 				tracehook_notify_resume(regs);
-				rseq_handle_notify_resume(NULL, regs);
 			}
 		}
 		local_irq_disable();
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 23036334f4dc..22b55db13da6 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -951,7 +951,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 
 			if (thread_flags & _TIF_NOTIFY_RESUME) {
 				tracehook_notify_resume(regs);
-				rseq_handle_notify_resume(NULL, regs);
 
 				/*
 				 * If we reschedule after checking the affinity
diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c
index 312f046d452d..bc4238b9f709 100644
--- a/arch/csky/kernel/signal.c
+++ b/arch/csky/kernel/signal.c
@@ -260,8 +260,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 	if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
 		do_signal(regs);
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-		rseq_handle_notify_resume(NULL, regs);
-	}
 }
diff --git a/arch/mips/kernel/signal.c b/arch/mips/kernel/signal.c
index f1e985109da0..c9b2a75563e1 100644
--- a/arch/mips/kernel/signal.c
+++ b/arch/mips/kernel/signal.c
@@ -906,10 +906,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, void *unused,
 	if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
 		do_signal(regs);
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-		rseq_handle_notify_resume(NULL, regs);
-	}
 
 	user_enter();
 }
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index e600764a926c..b93b87df499d 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -293,10 +293,8 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 		do_signal(current);
 	}
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-		rseq_handle_notify_resume(NULL, regs);
-	}
 }
 
 static unsigned long get_tm_stackpointer(struct task_struct *tsk)
diff --git a/arch/s390/kernel/signal.c b/arch/s390/kernel/signal.c
index 78ef53b29958..b307db26bf2d 100644
--- a/arch/s390/kernel/signal.c
+++ b/arch/s390/kernel/signal.c
@@ -537,5 +537,4 @@ void arch_do_signal_or_restart(struct pt_regs *regs, bool has_signal)
 void do_notify_resume(struct pt_regs *regs)
 {
 	tracehook_notify_resume(regs);
-	rseq_handle_notify_resume(NULL, regs);
 }
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 3e80c4bc66f7..2564b7434b4d 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -197,6 +197,8 @@ static inline void tracehook_notify_resume(struct pt_regs *regs)
 
 	mem_cgroup_handle_over_high();
 	blkcg_maybe_throttle_current();
+
+	rseq_handle_notify_resume(NULL, regs);
 }
 
 /*
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index bf16395b9e13..d5a61d565ad5 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -171,10 +171,8 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
 			handle_signal_work(regs, ti_work);
 
-		if (ti_work & _TIF_NOTIFY_RESUME) {
+		if (ti_work & _TIF_NOTIFY_RESUME)
 			tracehook_notify_resume(regs);
-			rseq_handle_notify_resume(NULL, regs);
-		}
 
 		/* Architecture specific TIF work */
 		arch_exit_to_user_mode_work(regs, ti_work);
diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index 049fd06b4c3d..49972ee99aff 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -19,10 +19,8 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
 		if (ti_work & _TIF_NEED_RESCHED)
 			schedule();
 
-		if (ti_work & _TIF_NOTIFY_RESUME) {
+		if (ti_work & _TIF_NOTIFY_RESUME)
 			tracehook_notify_resume(NULL);
-			rseq_handle_notify_resume(NULL, NULL);
-		}
 
 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
 		if (ret)
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related

* [PATCH 0/5] KVM: rseq: Fix and a test for a KVM+rseq bug
From: Sean Christopherson @ 2021-08-18  0:12 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Oleg Nesterov,
	Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: linux-s390, kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel

Patch 1 fixes a KVM+rseq bug where KVM's handling of TIF_NOTIFY_RESUME,
e.g. for task migration, clears the flag without informing rseq and leads
to stale data in userspace's rseq struct.

Patch 2 is a cleanup to try and make future bugs less likely.  It's also
a baby step towards moving and renaming tracehook_notify_resume() since
it has nothing to do with tracing.  It kills me to not do the move/rename
as part of this series, but having a dedicated series/discussion seems
more appropriate given the sheer number of architectures that call
tracehook_notify_resume() and the lack of an obvious home for the code.

Patch 3 is a fix/cleanup to stop overriding x86's unistd_{32,64}.h when
the include path (intentionally) omits tools' uapi headers.  KVM's
selftests do exactly that so that they can pick up the uapi headers from
the installed kernel headers, and still use various tools/ headers that
mirror kernel code, e.g. linux/types.h.  This allows the new test in
patch 4 to reference __NR_rseq without having to manually define it.

Patch 4 is a regression test for the KVM+rseq bug.

Patch 5 is a cleanup made possible by patch 3.

Sean Christopherson (5):
  KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM
    guest
  entry: rseq: Call rseq_handle_notify_resume() in
    tracehook_notify_resume()
  tools: Move x86 syscall number fallbacks to .../uapi/
  KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration
    bugs
  KVM: selftests: Remove __NR_userfaultfd syscall fallback

 arch/arm/kernel/signal.c                      |   1 -
 arch/arm64/kernel/signal.c                    |   1 -
 arch/csky/kernel/signal.c                     |   4 +-
 arch/mips/kernel/signal.c                     |   4 +-
 arch/powerpc/kernel/signal.c                  |   4 +-
 arch/s390/kernel/signal.c                     |   1 -
 include/linux/tracehook.h                     |   2 +
 kernel/entry/common.c                         |   4 +-
 kernel/rseq.c                                 |   4 +-
 .../x86/include/{ => uapi}/asm/unistd_32.h    |   0
 .../x86/include/{ => uapi}/asm/unistd_64.h    |   3 -
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 tools/testing/selftests/kvm/rseq_test.c       | 131 ++++++++++++++++++
 14 files changed, 143 insertions(+), 20 deletions(-)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_32.h (100%)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_64.h (83%)
 create mode 100644 tools/testing/selftests/kvm/rseq_test.c

-- 
2.33.0.rc1.237.g0d66db33f3-goog

^ permalink raw reply

* [PATCH 1/5] KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
From: Sean Christopherson @ 2021-08-18  0:12 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Oleg Nesterov,
	Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: linux-s390, kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210818001210.4073390-1-seanjc@google.com>

Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to
transferring to a KVM guest, which is roughly equivalent to an exit to
userspace and processes many of the same pending actions.  While the task
cannot be in an rseq critical section as the KVM path is reachable only
via ioctl(KVM_RUN), the side effects that apply to rseq outside of a
critical section still apply, e.g. the CPU ID needs to be updated if the
task is migrated.

Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults
and other badness in userspace VMMs that use rseq in combination with KVM,
e.g. due to the CPU ID being stale after task migration.

Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function")
Reported-by: Peter Foley <pefoley@google.com>
Bisected-by: Doug Evans <dje@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 kernel/entry/kvm.c | 4 +++-
 kernel/rseq.c      | 4 ++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index 49972ee99aff..049fd06b4c3d 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -19,8 +19,10 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
 		if (ti_work & _TIF_NEED_RESCHED)
 			schedule();
 
-		if (ti_work & _TIF_NOTIFY_RESUME)
+		if (ti_work & _TIF_NOTIFY_RESUME) {
 			tracehook_notify_resume(NULL);
+			rseq_handle_notify_resume(NULL, NULL);
+		}
 
 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
 		if (ret)
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 35f7bd0fced0..58c79a7918cd 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -236,7 +236,7 @@ static bool in_rseq_cs(unsigned long ip, struct rseq_cs *rseq_cs)
 
 static int rseq_ip_fixup(struct pt_regs *regs)
 {
-	unsigned long ip = instruction_pointer(regs);
+	unsigned long ip = regs ? instruction_pointer(regs) : 0;
 	struct task_struct *t = current;
 	struct rseq_cs rseq_cs;
 	int ret;
@@ -250,7 +250,7 @@ static int rseq_ip_fixup(struct pt_regs *regs)
 	 * If not nested over a rseq critical section, restart is useless.
 	 * Clear the rseq_cs pointer and return.
 	 */
-	if (!in_rseq_cs(ip, &rseq_cs))
+	if (!regs || !in_rseq_cs(ip, &rseq_cs))
 		return clear_rseq_cs(t);
 	ret = rseq_need_restart(t, rseq_cs.flags);
 	if (ret <= 0)
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related

* Re: [PATCH kernel] KVM: PPC: Book3S HV: Make unique debugfs nodename
From: Fabiano Rosas @ 2021-08-17 22:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Paul Mackerras; +Cc: linuxppc-dev, linux-kernel, kvm-ppc
In-Reply-To: <be02290c-60a0-48af-0491-61e8a6d5b7b7@ozlabs.ru>

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 07/07/2021 14:13, Alexey Kardashevskiy wrote:

> alternatively move this debugfs stuff under the platform-independent
> directory, how about that?

That's a good idea. I only now realized we have two separate directories
for the same guest:

$ ls /sys/kernel/debug/kvm/ | grep $pid
19062-11
vm19062               

Looks like we would have to implement kvm_arch_create_vcpu_debugfs for
the vcpu information and add a similar hook for the vm.

>> ---
>>   arch/powerpc/kvm/book3s_hv.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 1d1fcc290fca..0223ddc0eed0 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -5227,7 +5227,7 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
>>   	/*
>>   	 * Create a debugfs directory for the VM
>>   	 */
>> -	snprintf(buf, sizeof(buf), "vm%d", current->pid);
>> +	snprintf(buf, sizeof(buf), "vm%d-lp%ld", current->pid, lpid);
>>   	kvm->arch.debugfs_dir = debugfs_create_dir(buf, kvm_debugfs_dir);
>>   	kvmppc_mmu_debugfs_init(kvm);
>>   	if (radix_enabled())
>> 

^ permalink raw reply

* Re: [PATCH v2 06/12] x86/sev: Replace occurrences of sev_active() with prot_guest_has()
From: Borislav Petkov @ 2021-08-17 18:43 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Sathyanarayanan Kuppuswamy, linux-efi, Brijesh Singh, kvm,
	Peter Zijlstra, Dave Hansen, dri-devel, platform-driver-x86,
	Ard Biesheuvel, linux-s390, Andi Kleen, Joerg Roedel, x86,
	amd-gfx, Ingo Molnar, linux-graphics-maintainer, Joerg Roedel,
	Tianyu Lan, Andy Lutomirski, Thomas Gleixner, kexec, linux-kernel,
	iommu, linux-fsdevel, linuxppc-dev
In-Reply-To: <b346ae1b-dbd3-cdbd-b5cd-b5ab9c304737@amd.com>

On Tue, Aug 17, 2021 at 10:26:18AM -0500, Tom Lendacky wrote:
> >>  	/*
> >> -	 * If SME is active we need to be sure that kexec pages are
> >> -	 * not encrypted because when we boot to the new kernel the
> >> +	 * If host memory encryption is active we need to be sure that kexec
> >> +	 * pages are not encrypted because when we boot to the new kernel the
> >>  	 * pages won't be accessed encrypted (initially).
> >>  	 */
> > 
> > That hunk belongs logically into the previous patch which removes
> > sme_active().
> 
> I was trying to keep the sev_active() changes separate... so even though
> it's an SME thing, I kept it here. But I can move it to the previous
> patch, it just might look strange.

Oh I meant only the comment because it is a SME-related change. But not
too important so whatever.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCH v2 05/12] x86/sme: Replace occurrences of sme_active() with prot_guest_has()
From: Borislav Petkov @ 2021-08-17 18:41 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Sathyanarayanan Kuppuswamy, linux-efi, Brijesh Singh, kvm,
	Peter Zijlstra, Dave Hansen, dri-devel, platform-driver-x86,
	Will Deacon, linux-s390, Andi Kleen, Joerg Roedel, x86, amd-gfx,
	Ingo Molnar, linux-graphics-maintainer, Joerg Roedel, Tianyu Lan,
	Andy Lutomirski, Thomas Gleixner, kexec, linux-kernel, iommu,
	linux-fsdevel, linuxppc-dev
In-Reply-To: <2996b1c8-1ea1-0e56-3aad-08b46fc207f0@amd.com>

On Tue, Aug 17, 2021 at 09:46:58AM -0500, Tom Lendacky wrote:
> I'm ok with letting the TDX folks make changes to these calls to be SME or
> SEV specific, if necessary, later.

Yap, exactly. Let's add the specific stuff only when really needed.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()
From: Borislav Petkov @ 2021-08-17 18:39 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Sathyanarayanan Kuppuswamy, linux-efi, Brijesh Singh, kvm,
	Peter Zijlstra, Dave Hansen, dri-devel, platform-driver-x86,
	linux-s390, Andi Kleen, Joerg Roedel, x86, amd-gfx, Ingo Molnar,
	linux-graphics-maintainer, Joerg Roedel, Tianyu Lan,
	Andy Lutomirski, Thomas Gleixner, kexec, linux-kernel, iommu,
	linux-fsdevel, linuxppc-dev
In-Reply-To: <f4b8bc3c-5158-cd04-6e12-77f08036ea19@amd.com>

On Tue, Aug 17, 2021 at 10:22:52AM -0500, Tom Lendacky wrote:
> I can change it to be an AMD/HYGON check...  although, I'll have to check
> to see if any (very) early use of the function will work with that.

We can always change it later if really needed. It is just that I'm not
a fan of such "preemptive" changes.

> At a minimum, the check in arch/x86/kernel/head64.c will have to be
> changed or removed. I'll take a closer look.

Yeah, sme_me_mask, already discussed on IRC.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCH v7 1/2] tty: hvc: pass DMA capable memory to put_chars()
From: kernel test robot @ 2021-08-17 18:24 UTC (permalink / raw)
  To: Xianting Tian, gregkh, jirislaby, amit, arnd, osandov
  Cc: kbuild-all, Xianting Tian, linux-kernel, virtualization,
	clang-built-linux, linuxppc-dev
In-Reply-To: <20210817132300.165014-2-xianting.tian@linux.alibaba.com>

[-- Attachment #1: Type: text/plain, Size: 4844 bytes --]

Hi Xianting,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tty/tty-testing]
[also build test WARNING on char-misc/char-misc-testing soc/for-next v5.14-rc6 next-20210817]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xianting-Tian/make-hvc-pass-dma-capable-memory-to-its-backend/20210817-212556
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-testing
config: i386-randconfig-r021-20210816 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 2c6448cdc2f68f8c28fd0bd9404182b81306e6e6)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f12c3bee9f2413ed7643d858b40ce2337329fdae
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xianting-Tian/make-hvc-pass-dma-capable-memory-to-its-backend/20210817-212556
        git checkout f12c3bee9f2413ed7643d858b40ce2337329fdae
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   clang-14: warning: optimization flag '-falign-jumps=0' is not supported [-Wignored-optimization-argument]
   In file included from drivers/tty/hvc/hvc_console.c:15:
   In file included from include/linux/kbd_kern.h:5:
   In file included from include/linux/tty.h:5:
   In file included from include/linux/fs.h:6:
   In file included from include/linux/wait_bit.h:8:
   In file included from include/linux/wait.h:9:
   In file included from include/linux/spinlock.h:51:
   In file included from include/linux/preempt.h:78:
   In file included from arch/x86/include/asm/preempt.h:7:
   In file included from include/linux/thread_info.h:60:
   arch/x86/include/asm/thread_info.h:172:13: warning: calling '__builtin_frame_address' with a nonzero argument is unsafe [-Wframe-address]
           oldframe = __builtin_frame_address(1);
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/thread_info.h:174:11: warning: calling '__builtin_frame_address' with a nonzero argument is unsafe [-Wframe-address]
                   frame = __builtin_frame_address(2);
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/tty/hvc/hvc_console.c:160:18: warning: address of array 'hp->c' will always evaluate to 'true' [-Wpointer-bool-conversion]
           if (!hp || !hp->c)
                      ~~~~~^
   3 warnings generated.


vim +160 drivers/tty/hvc/hvc_console.c

   136	
   137	/*
   138	 * Console APIs, NOT TTY.  These APIs are available immediately when
   139	 * hvc_console_setup() finds adapters.
   140	 */
   141	
   142	static void hvc_console_print(struct console *co, const char *b,
   143				      unsigned count)
   144	{
   145		char *c;
   146		unsigned i = 0, n = 0;
   147		int r, donecr = 0, index = co->index;
   148		unsigned long flags;
   149		struct hvc_struct *hp;
   150	
   151		/* Console access attempt outside of acceptable console range. */
   152		if (index >= MAX_NR_HVC_CONSOLES)
   153			return;
   154	
   155		/* This console adapter was removed so it is not usable. */
   156		if (vtermnos[index] == -1)
   157			return;
   158	
   159		hp = cons_hvcs[index];
 > 160		if (!hp || !hp->c)
   161			return;
   162	
   163		c = hp->c;
   164	
   165		spin_lock_irqsave(&hp->c_lock, flags);
   166		while (count > 0 || i > 0) {
   167			if (count > 0 && i < sizeof(c)) {
   168				if (b[n] == '\n' && !donecr) {
   169					c[i++] = '\r';
   170					donecr = 1;
   171				} else {
   172					c[i++] = b[n++];
   173					donecr = 0;
   174					--count;
   175				}
   176			} else {
   177				r = cons_ops[index]->put_chars(vtermnos[index], c, i);
   178				if (r <= 0) {
   179					/* throw away characters on error
   180					 * but spin in case of -EAGAIN */
   181					if (r != -EAGAIN) {
   182						i = 0;
   183					} else {
   184						hvc_console_flush(cons_ops[index],
   185							      vtermnos[index]);
   186					}
   187				} else if (r > 0) {
   188					i -= r;
   189					if (i > 0)
   190						memmove(c, c+r, i);
   191				}
   192			}
   193		}
   194		spin_unlock_irqrestore(&hp->c_lock, flags);
   195		hvc_console_flush(cons_ops[index], vtermnos[index]);
   196	}
   197	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36668 bytes --]

^ permalink raw reply

* Re: [PATCH] macintosh: no need to initilise statics to 0
From: Joe Perches @ 2021-08-17 18:05 UTC (permalink / raw)
  To: Christophe Leroy, Jason Wang, benh; +Cc: yukuai3, linuxppc-dev, linux-kernel
In-Reply-To: <2105ef52-b736-cc18-def9-02ac174d1922@csgroup.eu>

On Tue, 2021-08-17 at 13:59 +0200, Christophe Leroy wrote:
> 
> Le 17/08/2021 à 13:51, Jason Wang a écrit :
> > Global static variables dont need to be initialised to 0. Because
> > the compiler will initilise them.
> 
> It is not the compiler, it is the Kernel. It is done here:
> 
> https://elixir.bootlin.com/linux/v5.14-rc6/source/arch/powerpc/kernel/early_32.c

I don't know why that's done generally.

From memory, it's also required by the c spec unless it's for a union
where the first union member is smaller in size than other members.



^ permalink raw reply

* Re: [PATCH] powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP
From: Segher Boessenkool @ 2021-08-17 18:03 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: userm57, fthain, linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <0426a0d3-bdc6-1a34-1018-71b34282a6c6@csgroup.eu>

Hi!

On Tue, Aug 17, 2021 at 07:13:44PM +0200, Christophe Leroy wrote:
> Le 17/08/2021 à 18:22, Segher Boessenkool a écrit :
> >On Tue, Aug 17, 2021 at 02:43:15PM +0000, Christophe Leroy wrote:
> >>Commit b5efec00b671 ("powerpc/32s: Move KUEP locking/unlocking in C")
> >>removed the 'isync' instruction after adding/removing NX bit in user
> >>segments. The reasoning behind this change was that when setting the
> >>NX bit we don't mind it taking effect with delay as the kernel never
> >>executes text from userspace, and when clearing the NX bit this is
> >>to return to userspace and then the 'rfi' should synchronise the
> >>context.
> >>
> >>However, it looks like on book3s/32 having a hash page table, at least
> >>on the G3 processor, we get an unexpected fault from userspace, then
> >>this is followed by something wrong in the verification of MSR_PR
> >>at end of another interrupt.
> >>
> >>This is fixed by adding back the removed isync() following update
> >>of NX bit in user segment registers. Only do it for cores with an
> >>hash table, as 603 cores don't exhibit that problem and the two isync
> >>increase ./null_syscall selftest by 6 cycles on an MPC 832x.
> >>
> >>First problem: unexpected PROTFAULT
> >>
> >>	[   62.896426] WARNING: CPU: 0 PID: 1660 at 
> >>	arch/powerpc/mm/fault.c:354 do_page_fault+0x6c/0x5b0
> >>	[   62.918111] Modules linked in:
> >>	[   62.923350] CPU: 0 PID: 1660 Comm: Xorg Not tainted 
> >>	5.13.0-pmac-00028-gb3c15b60339a #40
> >>	[   62.943476] NIP:  c001b5c8 LR: c001b6f8 CTR: 00000000
> >>	[   62.954714] REGS: e2d09e40 TRAP: 0700   Not tainted  
> >>	(5.13.0-pmac-00028-gb3c15b60339a)
> >
> >That is not a protection fault.  What causes this?
> 
> That's the WARN_ON(error_code & DSISR_PROTFAULT) at
> 
> https://elixir.bootlin.com/linux/v5.13/source/arch/powerpc/mm/fault.c#L354

Ah okay.  How confusing :-/

> >A CSI (like isync) is required both before and after mtsr.  It may work
> >on some cores without -- what part of that is luck, if there is anything
> >that guarantees it, is anyone's guess :-/
> 
> kuep_lock() is called when entering interrupts, it means we recently got an 
> 'rfi' to re-enable MMU.
> kuep_unlock() is called when exit interrupts, it means we are soon going to 
> call 'rfi' to go back to user.
> 
> In between, nobody is going to exec any userspace code, so who minds that 
> the 'mtsr' changing user segments is not completely finished ?

Hey, that is my question!  :-)

So why does this not work on 750 then?

> >>@@ -28,6 +30,8 @@ static inline void kuep_lock(void)
> >>  		return;
> >>  
> >>  	update_user_segments(mfsr(0) | SR_NX);
> >>+	if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
> >>+		isync();	/* Context sync required after mtsr() */
> >>  }
> >
> >This needs a comment why you are not doing this for systems without
> >hardware page table walk, at the least?
> 
> Ok, will add a comment tomorrow.

Thanks!


Segher

^ permalink raw reply

* Re: [PATCH] macintosh: no need to initilise statics to 0
From: Segher Boessenkool @ 2021-08-17 17:44 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev, linux-kernel, Jason Wang, yukuai3
In-Reply-To: <2105ef52-b736-cc18-def9-02ac174d1922@csgroup.eu>

On Tue, Aug 17, 2021 at 01:59:33PM +0200, Christophe Leroy wrote:
> 
> 
> Le 17/08/2021 à 13:51, Jason Wang a écrit :
> >Global static variables dont need to be initialised to 0. Because
> >the compiler will initilise them.
> 
> It is not the compiler, it is the Kernel. It is done here:
> 
> https://elixir.bootlin.com/linux/v5.14-rc6/source/arch/powerpc/kernel/early_32.c

This implements part of the C language: everything with static storage
duration (which includes all global objects, not just those marked
"static") is initialized.

> Among those 44 changes, only 2 are related to the commit's description.

Yeah.  Don't do that please :-/


Segher

^ permalink raw reply

* Re: [PATCH] powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP
From: Christophe Leroy @ 2021-08-17 17:13 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: userm57, fthain, linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <20210817162239.GF1583@gate.crashing.org>



Le 17/08/2021 à 18:22, Segher Boessenkool a écrit :
> On Tue, Aug 17, 2021 at 02:43:15PM +0000, Christophe Leroy wrote:
>> Commit b5efec00b671 ("powerpc/32s: Move KUEP locking/unlocking in C")
>> removed the 'isync' instruction after adding/removing NX bit in user
>> segments. The reasoning behind this change was that when setting the
>> NX bit we don't mind it taking effect with delay as the kernel never
>> executes text from userspace, and when clearing the NX bit this is
>> to return to userspace and then the 'rfi' should synchronise the
>> context.
>>
>> However, it looks like on book3s/32 having a hash page table, at least
>> on the G3 processor, we get an unexpected fault from userspace, then
>> this is followed by something wrong in the verification of MSR_PR
>> at end of another interrupt.
>>
>> This is fixed by adding back the removed isync() following update
>> of NX bit in user segment registers. Only do it for cores with an
>> hash table, as 603 cores don't exhibit that problem and the two isync
>> increase ./null_syscall selftest by 6 cycles on an MPC 832x.
>>
>> First problem: unexpected PROTFAULT
>>
>> 	[   62.896426] WARNING: CPU: 0 PID: 1660 at arch/powerpc/mm/fault.c:354 do_page_fault+0x6c/0x5b0
>> 	[   62.918111] Modules linked in:
>> 	[   62.923350] CPU: 0 PID: 1660 Comm: Xorg Not tainted 5.13.0-pmac-00028-gb3c15b60339a #40
>> 	[   62.943476] NIP:  c001b5c8 LR: c001b6f8 CTR: 00000000
>> 	[   62.954714] REGS: e2d09e40 TRAP: 0700   Not tainted  (5.13.0-pmac-00028-gb3c15b60339a)
> 
> That is not a protection fault.  What causes this?

That's the WARN_ON(error_code & DSISR_PROTFAULT) at

https://elixir.bootlin.com/linux/v5.13/source/arch/powerpc/mm/fault.c#L354


> 
> A CSI (like isync) is required both before and after mtsr.  It may work
> on some cores without -- what part of that is luck, if there is anything
> that guarantees it, is anyone's guess :-/

kuep_lock() is called when entering interrupts, it means we recently got an 'rfi' to re-enable MMU.
kuep_unlock() is called when exit interrupts, it means we are soon going to call 'rfi' to go back to 
user.

In between, nobody is going to exec any userspace code, so who minds that the 'mtsr' changing user 
segments is not completely finished ?

> 
>> @@ -28,6 +30,8 @@ static inline void kuep_lock(void)
>>   		return;
>>   
>>   	update_user_segments(mfsr(0) | SR_NX);
>> +	if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
>> +		isync();	/* Context sync required after mtsr() */
>>   }
> 
> This needs a comment why you are not doing this for systems without
> hardware page table walk, at the least?

Ok, will add a comment tomorrow.

Christophe

^ permalink raw reply

* Re: [PATCH] powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP
From: Segher Boessenkool @ 2021-08-17 16:22 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: userm57, fthain, linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <1d28441dd80845e6428d693c0724cb6457247466.1629211378.git.christophe.leroy@csgroup.eu>

On Tue, Aug 17, 2021 at 02:43:15PM +0000, Christophe Leroy wrote:
> Commit b5efec00b671 ("powerpc/32s: Move KUEP locking/unlocking in C")
> removed the 'isync' instruction after adding/removing NX bit in user
> segments. The reasoning behind this change was that when setting the
> NX bit we don't mind it taking effect with delay as the kernel never
> executes text from userspace, and when clearing the NX bit this is
> to return to userspace and then the 'rfi' should synchronise the
> context.
> 
> However, it looks like on book3s/32 having a hash page table, at least
> on the G3 processor, we get an unexpected fault from userspace, then
> this is followed by something wrong in the verification of MSR_PR
> at end of another interrupt.
> 
> This is fixed by adding back the removed isync() following update
> of NX bit in user segment registers. Only do it for cores with an
> hash table, as 603 cores don't exhibit that problem and the two isync
> increase ./null_syscall selftest by 6 cycles on an MPC 832x.
> 
> First problem: unexpected PROTFAULT
> 
> 	[   62.896426] WARNING: CPU: 0 PID: 1660 at arch/powerpc/mm/fault.c:354 do_page_fault+0x6c/0x5b0
> 	[   62.918111] Modules linked in:
> 	[   62.923350] CPU: 0 PID: 1660 Comm: Xorg Not tainted 5.13.0-pmac-00028-gb3c15b60339a #40
> 	[   62.943476] NIP:  c001b5c8 LR: c001b6f8 CTR: 00000000
> 	[   62.954714] REGS: e2d09e40 TRAP: 0700   Not tainted  (5.13.0-pmac-00028-gb3c15b60339a)

That is not a protection fault.  What causes this?

A CSI (like isync) is required both before and after mtsr.  It may work
on some cores without -- what part of that is luck, if there is anything
that guarantees it, is anyone's guess :-/

> @@ -28,6 +30,8 @@ static inline void kuep_lock(void)
>  		return;
>  
>  	update_user_segments(mfsr(0) | SR_NX);
> +	if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
> +		isync();	/* Context sync required after mtsr() */
>  }

This needs a comment why you are not doing this for systems without
hardware page table walk, at the least?


Segher

^ permalink raw reply

* [PATCH] powerpc/ptrace: Make user_mode() common to PPC32 and PPC64
From: Christophe Leroy @ 2021-08-17 16:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

Today we have:

	#ifdef __powerpc64__
	#define user_mode(regs) ((((regs)->msr) >> MSR_PR_LG) & 0x1)
	#else
	#define user_mode(regs) (((regs)->msr & MSR_PR) != 0)
	#endif

With ppc64_defconfig, we get:

	if (!user_mode(regs))
    14b4:	e9 3e 01 08 	ld      r9,264(r30)
    14b8:	71 29 40 00 	andi.   r9,r9,16384
    14bc:	41 82 07 a4 	beq     1c60 <.emulate_instruction+0x7d0>

If taking the ppc32 definition of user_mode(), the exact same code
is generated for ppc64_defconfig.

So, only keep one version of user_mode(), preferably the one not
using MSR_PR_LG which should be kept internal to reg.h.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/ptrace.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 14422e851494..fd60538737a0 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -197,11 +197,7 @@ static inline unsigned long frame_pointer(struct pt_regs *regs)
 	return 0;
 }
 
-#ifdef __powerpc64__
-#define user_mode(regs) ((((regs)->msr) >> MSR_PR_LG) & 0x1)
-#else
 #define user_mode(regs) (((regs)->msr & MSR_PR) != 0)
-#endif
 
 #define force_successful_syscall_return()   \
 	do { \
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH v7 1/2] tty: hvc: pass DMA capable memory to put_chars()
From: Greg KH @ 2021-08-17 15:48 UTC (permalink / raw)
  To: Xianting Tian
  Cc: arnd, amit, jirislaby, linux-kernel, virtualization, linuxppc-dev,
	osandov
In-Reply-To: <20210817132300.165014-2-xianting.tian@linux.alibaba.com>

On Tue, Aug 17, 2021 at 09:22:59PM +0800, Xianting Tian wrote:
> We tested the patch, it worked normally.

Who is "we"?

> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>

Like I said before, I need another developer from your company to review
and sign-off on this patch (and the other one), before I am willing to
look at it, based on the previous mistakes that have happened here.

thanks,

greg k-h

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox