LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v2 1/2] powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state
From: Nicholas Piggin @ 2021-09-02  3:33 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: Eirik Fuller
In-Reply-To: <65ed1ac8-f4af-742a-1d2a-e5db7e71a920@csgroup.eu>

Excerpts from Christophe Leroy's message of September 2, 2021 3:21 am:
> 
> 
> Le 01/09/2021 à 18:54, Nicholas Piggin a écrit :
>> If a system call is made with a transaction active, the kernel
>> immediately aborts it and returns. scv system calls disable irqs even
>> earlier in their interrupt handler, and tabort_syscall does not fix this
>> up.
>> 
>> This can result in irq soft-mask state being messed up on the next
>> kernel entry, and crashing at BUG_ON(arch_irq_disabled_regs(regs)) in
>> the kernel exit handlers, or possibly worse.
>> 
>> Fix this by having tabort_syscall setting irq soft-mask back to enabled
>> (which requires MSR[EE] be disabled first).
>> 
>> Reported-by: Eirik Fuller <efuller@redhat.com>
>> Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>> 
>> Tested the wrong kernel before sending v1 and missed a bug, sorry.
>> 
>>   arch/powerpc/kernel/interrupt_64.S | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
>> index d4212d2ff0b5..9c31d65b4851 100644
>> --- a/arch/powerpc/kernel/interrupt_64.S
>> +++ b/arch/powerpc/kernel/interrupt_64.S
>> @@ -428,16 +428,22 @@ RESTART_TABLE(.Lsyscall_rst_start, .Lsyscall_rst_end, syscall_restart)
>>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>   tabort_syscall:
>>   _ASM_NOKPROBE_SYMBOL(tabort_syscall)
>> -	/* Firstly we need to enable TM in the kernel */
>> +	/* We need to enable TM in the kernel, and disable EE (for scv) */
>>   	mfmsr	r10
>>   	li	r9, 1
>>   	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
>> +	LOAD_REG_IMMEDIATE(r9, MSR_EE)
>> +	andc	r10, r10, r9
> 
> Why not use 'rlwinm' to mask out MSR_EE ?
> 
> Something like
> 
> 	rlwinm	r10, r10, 0, ~MSR_EE

Mainly because I'm bad at powerpc assembly. Why do you think I'm trying 
to change as much as possible to C?

Actually there should really be no need for mfmsr either, I wanted to
rewrite the thing entirely as

	ld      r10,PACAKMSR(r13)
	LOAD_REG_IMMEDIATE(r9, MSR_TM)
	or	r10,r10,r9
	mtmsrd	r10

But I thought that's not a minimal bug fix.

Thanks,
Nick
> 
>>   	mtmsrd	r10, 0
>>   
>>   	/* tabort, this dooms the transaction, nothing else */
>>   	li	r9, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
>>   	TABORT(R9)
>>   
>> +	/* scv has disabled irqs so must re-enable. sc just remains enabled */
>> +	li	r9,IRQS_ENABLED
>> +	stb	r9,PACAIRQSOFTMASK(r13)
>> +
>>   	/*
>>   	 * Return directly to userspace. We have corrupted user register state,
>>   	 * but userspace will never see that register state. Execution will
>> 
> 

^ permalink raw reply

* Re: [PATCH v2 2/2] selftests/powerpc: Add scv versions of the basic TM syscall tests
From: Nicholas Piggin @ 2021-09-02  3:27 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: Eirik Fuller
In-Reply-To: <f99fa6c6-cebe-c261-0971-0f485cbcea2d@csgroup.eu>

Excerpts from Christophe Leroy's message of September 2, 2021 3:15 am:
> 
> 
> Le 01/09/2021 à 18:54, Nicholas Piggin a écrit :
>> The basic TM vs syscall test code hard codes an sc instruction for the
>> system call, which fails to cover scv even when the userspace libc has
>> support for it.
>> 
>> Duplicate the tests with hard coded scv variants so both are tested
>> when possible.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   .../selftests/powerpc/tm/tm-syscall-asm.S     | 46 +++++++++++++++++++
>>   .../testing/selftests/powerpc/tm/tm-syscall.c | 36 ++++++++++++---
>>   2 files changed, 75 insertions(+), 7 deletions(-)
>> 
>> diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
>> index bd1ca25febe4..849316831e6a 100644
>> --- a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
>> +++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
>> @@ -2,6 +2,10 @@
>>   #include <ppc-asm.h>
>>   #include <asm/unistd.h>
>>   
>> +/* ppc-asm.h does not define r0 or r1 */
>> +#define r0 0
>> +#define r1 1
>> +
> 
> See https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/ppc-asm.h
> 
> It doesn't not define r1 but it defines r0.

Oops, I'll fix that.

> And it defines 'sp' as register 1.

Does userspace code typically use that? Kernel code AFAIKS does not.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH 0/5] KVM: PPC: Book3S: Modules cleanup and unification
From: David Gibson @ 2021-09-02  1:28 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: linuxppc-dev, npiggin, kvm-ppc
In-Reply-To: <20210901173357.3183658-1-farosas@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 3261 bytes --]

On Wed, Sep 01, 2021 at 02:33:52PM -0300, Fabiano Rosas wrote:
> This series merges our three kvm modules kvm.ko, kvm-hv.ko and
> kvm-pr.ko into one kvm.ko module.

That doesn't sound like a good idea to me.  People who aren't on BookS
servers don't want - and can't use - kvm-hv.  Almost nobody wants
kvm-pr.  It's also kind of inconsistent with x86, which has the
separate AMD and Intel modules.

> The main reason for this is to deal with the issue that kvm.ko can be
> loaded on its own without any of the other modules present. This can
> happen if one or both of the modules fail to init or if the user loads
> kvm.ko only.
> 
> With only kvm.ko loaded, the userspace can call any of the KVM ioctls
> which will fail more or less gracefully depending on what kind of
> verification we do in powerpc.c.

I see that that's awkward, but I'm not sure it justifies compromising
the actual natural structure of the dependencies.

> Instead of adding a check to every entry point or finding a hack to
> link the modules so that when one fails (hv/pr), the other (kvm)
> exits, I think it is cleaner to just make them all a single module.
> 
> The two KVM implementations are already selected by Kconfig options,
> so the only thing that changes is that they are now in the same
> module. I also kept kvm-hv and kvm-pr as aliases to kvm, so that
> people don't get too surprised with the change.
> 
> There is a possible issue with the larger module size for kernel
> builds that should support both HV-only and PR-only environments, but
> PR is usually not used in production so I'm not sure if that is a real
> issue.
> 
> Patches 1,2,3 are standalone cleanups.
> Patches 4,5 are the unification work.
> 
> Fabiano Rosas (5):
>   KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
>   KVM: PPC: Book3S HV: Delay setting of kvm ops
>   KVM: PPC: Book3S HV: Free allocated memory if module init fails
>   KVM: PPC: Book3S: Unify kvm-hv and kvm-pr modules
>   KVM: PPC: Book3S: Stop exporting non-builtin symbols
> 
>  arch/powerpc/configs/powernv_defconfig |  2 +-
>  arch/powerpc/configs/ppc64_defconfig   |  2 +-
>  arch/powerpc/configs/pseries_defconfig |  2 +-
>  arch/powerpc/kvm/Kconfig               | 72 ++++++++++++--------------
>  arch/powerpc/kvm/Makefile              | 11 ++--
>  arch/powerpc/kvm/book3s.c              | 61 ++++++++++++++--------
>  arch/powerpc/kvm/book3s.h              | 19 +++++++
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |  3 --
>  arch/powerpc/kvm/book3s_64_vio.c       |  3 --
>  arch/powerpc/kvm/book3s_hv.c           | 38 ++++++++------
>  arch/powerpc/kvm/book3s_pr.c           | 13 -----
>  arch/powerpc/kvm/book3s_rtas.c         |  1 -
>  arch/powerpc/kvm/book3s_xics.c         |  4 --
>  arch/powerpc/kvm/book3s_xive.c         |  6 ---
>  arch/powerpc/kvm/emulate.c             |  1 -
>  arch/powerpc/kvm/powerpc.c             | 14 -----
>  kernel/irq/irqdesc.c                   |  2 +-
>  17 files changed, 125 insertions(+), 129 deletions(-)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH 04/10] ps3vram: add error handling support for add_disk()
From: Luis Chamberlain @ 2021-09-01 21:00 UTC (permalink / raw)
  To: axboe, bhelgaas, liushixin2, thunder.leizhen, lee.jones, geoff,
	mpe, benh, paulus, jim, haris.iqbal, jinpu.wang, josh.h.morris,
	pjk1939, tim, richard, miquel.raynal, vigneshr
  Cc: linux-block, linuxppc-dev, Luis Chamberlain, linux-mtd,
	linux-kernel
In-Reply-To: <20210901210028.1750956-1-mcgrof@kernel.org>

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/ps3vram.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index c7b19e128b03..af2a0d09c598 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -755,9 +755,14 @@ static int ps3vram_probe(struct ps3_system_bus_device *dev)
 	dev_info(&dev->core, "%s: Using %llu MiB of GPU memory\n",
 		 gendisk->disk_name, get_capacity(gendisk) >> 11);
 
-	device_add_disk(&dev->core, gendisk, NULL);
+	error = device_add_disk(&dev->core, gendisk, NULL);
+	if (error)
+		goto out_cleanup_disk;
+
 	return 0;
 
+out_cleanup_disk:
+	blk_cleanup_disk(gendisk);
 out_cache_cleanup:
 	remove_proc_entry(DEVICE_NAME, NULL);
 	ps3vram_cache_cleanup(dev);
-- 
2.30.2


^ permalink raw reply related

* [PATCH 02/10] pktcdvd: add error handling support for add_disk()
From: Luis Chamberlain @ 2021-09-01 21:00 UTC (permalink / raw)
  To: axboe, bhelgaas, liushixin2, thunder.leizhen, lee.jones, geoff,
	mpe, benh, paulus, jim, haris.iqbal, jinpu.wang, josh.h.morris,
	pjk1939, tim, richard, miquel.raynal, vigneshr
  Cc: linux-block, linuxppc-dev, Luis Chamberlain, linux-mtd,
	linux-kernel
In-Reply-To: <20210901210028.1750956-1-mcgrof@kernel.org>

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

The out_mem2 error label already does what we need so
re-use that.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/pktcdvd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 0f26b2510a75..415248962e67 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2729,7 +2729,9 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
 	/* inherit events of the host device */
 	disk->events = pd->bdev->bd_disk->events;
 
-	add_disk(disk);
+	ret = add_disk(disk);
+	if (ret)
+		goto out_mem2;
 
 	pkt_sysfs_dev_new(pd);
 	pkt_debugfs_dev_new(pd);
-- 
2.30.2


^ permalink raw reply related

* [PATCH 01/10] mtip32xx: add error handling support for add_disk()
From: Luis Chamberlain @ 2021-09-01 21:00 UTC (permalink / raw)
  To: axboe, bhelgaas, liushixin2, thunder.leizhen, lee.jones, geoff,
	mpe, benh, paulus, jim, haris.iqbal, jinpu.wang, josh.h.morris,
	pjk1939, tim, richard, miquel.raynal, vigneshr
  Cc: linux-block, linuxppc-dev, Luis Chamberlain, linux-mtd,
	linux-kernel
In-Reply-To: <20210901210028.1750956-1-mcgrof@kernel.org>

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

The read_capacity_error error label already does what we need,
so just re-use that.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/mtip32xx/mtip32xx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 901855717cb5..d0b40309f47e 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3633,7 +3633,9 @@ static int mtip_block_initialize(struct driver_data *dd)
 	set_capacity(dd->disk, capacity);
 
 	/* Enable the block device and add it to /dev */
-	device_add_disk(&dd->pdev->dev, dd->disk, mtip_disk_attr_groups);
+	rv = device_add_disk(&dd->pdev->dev, dd->disk, mtip_disk_attr_groups);
+	if (rv)
+		goto read_capacity_error;
 
 	if (dd->mtip_svc_handler) {
 		set_bit(MTIP_DDF_INIT_DONE_BIT, &dd->dd_flag);
-- 
2.30.2


^ permalink raw reply related

* [PATCH 00/10] block: fourth batch of add_disk() error handling conversions
From: Luis Chamberlain @ 2021-09-01 21:00 UTC (permalink / raw)
  To: axboe, bhelgaas, liushixin2, thunder.leizhen, lee.jones, geoff,
	mpe, benh, paulus, jim, haris.iqbal, jinpu.wang, josh.h.morris,
	pjk1939, tim, richard, miquel.raynal, vigneshr
  Cc: linux-block, linuxppc-dev, Luis Chamberlain, linux-mtd,
	linux-kernel

The full set of changes can be found on my branch titled
20210901-for-axboe-add-disk-error-handling [0] which is
now based on axboe/master.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20210901-for-axboe-add-disk-error-handling

Luis Chamberlain (10):
  mtip32xx: add error handling support for add_disk()
  pktcdvd: add error handling support for add_disk()
  ps3disk: add error handling support for add_disk()
  ps3vram: add error handling support for add_disk()
  rnbd: add error handling support for add_disk()
  block/rsxx: add error handling support for add_disk()
  block/sunvdc: add error handling support for add_disk()
  block/sx8: add error handling support for add_disk()
  pf: add error handling support for add_disk()
  mtd/ubi/block: add error handling support for add_disk()

 drivers/block/mtip32xx/mtip32xx.c |  4 +++-
 drivers/block/paride/pf.c         |  4 +++-
 drivers/block/pktcdvd.c           |  4 +++-
 drivers/block/ps3disk.c           |  8 ++++++--
 drivers/block/ps3vram.c           |  7 ++++++-
 drivers/block/rnbd/rnbd-clt.c     | 13 +++++++++----
 drivers/block/rsxx/core.c         |  4 +++-
 drivers/block/rsxx/dev.c          | 12 +++++++++---
 drivers/block/sunvdc.c            | 14 +++++++++++---
 drivers/block/sx8.c               | 13 +++++++++----
 drivers/mtd/ubi/block.c           |  8 +++++++-
 11 files changed, 69 insertions(+), 22 deletions(-)

-- 
2.30.2


^ permalink raw reply

* [PATCH 08/10] block/sx8: add error handling support for add_disk()
From: Luis Chamberlain @ 2021-09-01 21:00 UTC (permalink / raw)
  To: axboe, bhelgaas, liushixin2, thunder.leizhen, lee.jones, geoff,
	mpe, benh, paulus, jim, haris.iqbal, jinpu.wang, josh.h.morris,
	pjk1939, tim, richard, miquel.raynal, vigneshr
  Cc: linux-block, linuxppc-dev, Luis Chamberlain, linux-mtd,
	linux-kernel
In-Reply-To: <20210901210028.1750956-1-mcgrof@kernel.org>

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

A completion is used to notify the initial probe what is
happening and so we must defer error handling on completion.
Do this by remembering the error and using the shared cleanup
function.

The tags are shared and so are hanlded later for the
driver already.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/sx8.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/block/sx8.c b/drivers/block/sx8.c
index 420cd952ddc4..1c79248c4826 100644
--- a/drivers/block/sx8.c
+++ b/drivers/block/sx8.c
@@ -297,6 +297,7 @@ struct carm_host {
 
 	struct work_struct		fsm_task;
 
+	int probe_err;
 	struct completion		probe_comp;
 };
 
@@ -1181,8 +1182,11 @@ static void carm_fsm_task (struct work_struct *work)
 				struct gendisk *disk = port->disk;
 
 				set_capacity(disk, port->capacity);
-				add_disk(disk);
-				activated++;
+				host->probe_err = add_disk(disk);
+				if (!host->probe_err)
+					activated++;
+				else
+					break;
 			}
 
 		printk(KERN_INFO DRV_NAME "(%s): %d ports activated\n",
@@ -1192,11 +1196,9 @@ static void carm_fsm_task (struct work_struct *work)
 		reschedule = 1;
 		break;
 	}
-
 	case HST_PROBE_FINISHED:
 		complete(&host->probe_comp);
 		break;
-
 	case HST_ERROR:
 		/* FIXME: TODO */
 		break;
@@ -1507,7 +1509,10 @@ static int carm_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_out_free_irq;
 
 	DPRINTK("waiting for probe_comp\n");
+	host->probe_err = -ENODEV;
 	wait_for_completion(&host->probe_comp);
+	if (host->probe_err)
+		goto err_out_free_irq;
 
 	printk(KERN_INFO "%s: pci %s, ports %d, io %llx, irq %u, major %d\n",
 	       host->name, pci_name(pdev), (int) CARM_MAX_PORTS,
-- 
2.30.2


^ permalink raw reply related

* [PATCH 09/10] pf: add error handling support for add_disk()
From: Luis Chamberlain @ 2021-09-01 21:00 UTC (permalink / raw)
  To: axboe, bhelgaas, liushixin2, thunder.leizhen, lee.jones, geoff,
	mpe, benh, paulus, jim, haris.iqbal, jinpu.wang, josh.h.morris,
	pjk1939, tim, richard, miquel.raynal, vigneshr
  Cc: linux-block, linuxppc-dev, Luis Chamberlain, linux-mtd,
	linux-kernel
In-Reply-To: <20210901210028.1750956-1-mcgrof@kernel.org>

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/paride/pf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/paride/pf.c b/drivers/block/paride/pf.c
index f471d48a87bc..380d80e507c7 100644
--- a/drivers/block/paride/pf.c
+++ b/drivers/block/paride/pf.c
@@ -962,7 +962,9 @@ static int __init pf_init_unit(struct pf_unit *pf, bool autoprobe, int port,
 	if (pf_probe(pf))
 		goto out_pi_release;
 
-	add_disk(disk);
+	ret = add_disk(disk);
+	if (ret)
+		goto out_pi_release;
 	pf->present = 1;
 	return 0;
 
-- 
2.30.2


^ permalink raw reply related

* [PATCH 05/10] rnbd: add error handling support for add_disk()
From: Luis Chamberlain @ 2021-09-01 21:00 UTC (permalink / raw)
  To: axboe, bhelgaas, liushixin2, thunder.leizhen, lee.jones, geoff,
	mpe, benh, paulus, jim, haris.iqbal, jinpu.wang, josh.h.morris,
	pjk1939, tim, richard, miquel.raynal, vigneshr
  Cc: linux-block, linuxppc-dev, Luis Chamberlain, linux-mtd,
	linux-kernel
In-Reply-To: <20210901210028.1750956-1-mcgrof@kernel.org>

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/rnbd/rnbd-clt.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c
index bd4a41afbbfc..1ba1c868535a 100644
--- a/drivers/block/rnbd/rnbd-clt.c
+++ b/drivers/block/rnbd/rnbd-clt.c
@@ -1384,8 +1384,10 @@ static void setup_request_queue(struct rnbd_clt_dev *dev)
 	blk_queue_write_cache(dev->queue, dev->wc, dev->fua);
 }
 
-static void rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev, int idx)
+static int rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev, int idx)
 {
+	int err;
+
 	dev->gd->major		= rnbd_client_major;
 	dev->gd->first_minor	= idx << RNBD_PART_BITS;
 	dev->gd->minors		= 1 << RNBD_PART_BITS;
@@ -1410,7 +1412,11 @@ static void rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev, int idx)
 
 	if (!dev->rotational)
 		blk_queue_flag_set(QUEUE_FLAG_NONROT, dev->queue);
-	add_disk(dev->gd);
+	err = add_disk(dev->gd);
+	if (err)
+		blk_cleanup_disk(dev->gd);
+
+	return err;
 }
 
 static int rnbd_client_setup_device(struct rnbd_clt_dev *dev)
@@ -1426,8 +1432,7 @@ static int rnbd_client_setup_device(struct rnbd_clt_dev *dev)
 	rnbd_init_mq_hw_queues(dev);
 
 	setup_request_queue(dev);
-	rnbd_clt_setup_gen_disk(dev, idx);
-	return 0;
+	return rnbd_clt_setup_gen_disk(dev, idx);
 }
 
 static struct rnbd_clt_dev *init_dev(struct rnbd_clt_session *sess,
-- 
2.30.2


^ permalink raw reply related

* [PATCH 07/10] block/sunvdc: add error handling support for add_disk()
From: Luis Chamberlain @ 2021-09-01 21:00 UTC (permalink / raw)
  To: axboe, bhelgaas, liushixin2, thunder.leizhen, lee.jones, geoff,
	mpe, benh, paulus, jim, haris.iqbal, jinpu.wang, josh.h.morris,
	pjk1939, tim, richard, miquel.raynal, vigneshr
  Cc: linux-block, linuxppc-dev, Luis Chamberlain, linux-mtd,
	linux-kernel
In-Reply-To: <20210901210028.1750956-1-mcgrof@kernel.org>

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

We re-use the same free tag call, so we also add a label for
that as well.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/sunvdc.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
index 4d4bb810c2ae..6f45a53f7cbf 100644
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -826,8 +826,8 @@ static int probe_disk(struct vdc_port *port)
 	if (IS_ERR(g)) {
 		printk(KERN_ERR PFX "%s: Could not allocate gendisk.\n",
 		       port->vio.name);
-		blk_mq_free_tag_set(&port->tag_set);
-		return PTR_ERR(g);
+		err = PTR_ERR(g);
+		goto out_free_tag;
 	}
 
 	port->disk = g;
@@ -879,9 +879,17 @@ static int probe_disk(struct vdc_port *port)
 	       port->vdisk_size, (port->vdisk_size >> (20 - 9)),
 	       port->vio.ver.major, port->vio.ver.minor);
 
-	device_add_disk(&port->vio.vdev->dev, g, NULL);
+	err = device_add_disk(&port->vio.vdev->dev, g, NULL);
+	if (err)
+		goto out_cleanup_disk;
 
 	return 0;
+
+out_cleanup_disk:
+	blk_cleanup_disk(g);
+out_free_tag:
+	blk_mq_free_tag_set(&port->tag_set);
+	return err;
 }
 
 static struct ldc_channel_config vdc_ldc_cfg = {
-- 
2.30.2


^ permalink raw reply related

* [PATCH 10/10] mtd/ubi/block: add error handling support for add_disk()
From: Luis Chamberlain @ 2021-09-01 21:00 UTC (permalink / raw)
  To: axboe, bhelgaas, liushixin2, thunder.leizhen, lee.jones, geoff,
	mpe, benh, paulus, jim, haris.iqbal, jinpu.wang, josh.h.morris,
	pjk1939, tim, richard, miquel.raynal, vigneshr
  Cc: linux-block, linuxppc-dev, Luis Chamberlain, linux-mtd,
	linux-kernel
In-Reply-To: <20210901210028.1750956-1-mcgrof@kernel.org>

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/mtd/ubi/block.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/ubi/block.c b/drivers/mtd/ubi/block.c
index e003b4b44ffa..062e6c2c45f5 100644
--- a/drivers/mtd/ubi/block.c
+++ b/drivers/mtd/ubi/block.c
@@ -447,12 +447,18 @@ int ubiblock_create(struct ubi_volume_info *vi)
 	list_add_tail(&dev->list, &ubiblock_devices);
 
 	/* Must be the last step: anyone can call file ops from now on */
-	add_disk(dev->gd);
+	ret = add_disk(dev->gd);
+	if (ret)
+		goto out_destroy_wq;
+
 	dev_info(disk_to_dev(dev->gd), "created from ubi%d:%d(%s)",
 		 dev->ubi_num, dev->vol_id, vi->name);
 	mutex_unlock(&devices_mutex);
 	return 0;
 
+out_destroy_wq:
+	list_del(&dev->list);
+	destroy_workqueue(dev->wq);
 out_remove_minor:
 	idr_remove(&ubiblock_minor_idr, gd->first_minor);
 out_cleanup_disk:
-- 
2.30.2


^ permalink raw reply related

* [PATCH 03/10] ps3disk: add error handling support for add_disk()
From: Luis Chamberlain @ 2021-09-01 21:00 UTC (permalink / raw)
  To: axboe, bhelgaas, liushixin2, thunder.leizhen, lee.jones, geoff,
	mpe, benh, paulus, jim, haris.iqbal, jinpu.wang, josh.h.morris,
	pjk1939, tim, richard, miquel.raynal, vigneshr
  Cc: linux-block, linuxppc-dev, Luis Chamberlain, linux-mtd,
	linux-kernel
In-Reply-To: <20210901210028.1750956-1-mcgrof@kernel.org>

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/ps3disk.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 8d51efbe045d..3054adf77460 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -467,9 +467,13 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
 		 gendisk->disk_name, priv->model, priv->raw_capacity >> 11,
 		 get_capacity(gendisk) >> 11);
 
-	device_add_disk(&dev->sbd.core, gendisk, NULL);
-	return 0;
+	error = device_add_disk(&dev->sbd.core, gendisk, NULL);
+	if (error)
+		goto fail_cleanup_disk;
 
+	return 0;
+fail_cleanup_disk:
+	blk_cleanup_disk(gendisk);
 fail_free_tag_set:
 	blk_mq_free_tag_set(&priv->tag_set);
 fail_teardown:
-- 
2.30.2


^ permalink raw reply related

* [PATCH 06/10] block/rsxx: add error handling support for add_disk()
From: Luis Chamberlain @ 2021-09-01 21:00 UTC (permalink / raw)
  To: axboe, bhelgaas, liushixin2, thunder.leizhen, lee.jones, geoff,
	mpe, benh, paulus, jim, haris.iqbal, jinpu.wang, josh.h.morris,
	pjk1939, tim, richard, miquel.raynal, vigneshr
  Cc: linux-block, linuxppc-dev, Luis Chamberlain, linux-mtd,
	linux-kernel
In-Reply-To: <20210901210028.1750956-1-mcgrof@kernel.org>

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/rsxx/core.c |  4 +++-
 drivers/block/rsxx/dev.c  | 12 +++++++++---
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rsxx/core.c b/drivers/block/rsxx/core.c
index 83636714b8d7..8d9d69f5dfbc 100644
--- a/drivers/block/rsxx/core.c
+++ b/drivers/block/rsxx/core.c
@@ -935,7 +935,9 @@ static int rsxx_pci_probe(struct pci_dev *dev,
 			card->size8 = 0;
 	}
 
-	rsxx_attach_dev(card);
+	st = rsxx_attach_dev(card);
+	if (st)
+		goto failed_create_dev;
 
 	/************* Setup Debugfs *************/
 	rsxx_debugfs_dev_new(card);
diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 1cc40b0ea761..b2d3ac3efce2 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -192,6 +192,8 @@ static bool rsxx_discard_supported(struct rsxx_cardinfo *card)
 
 int rsxx_attach_dev(struct rsxx_cardinfo *card)
 {
+	int err = 0;
+
 	mutex_lock(&card->dev_lock);
 
 	/* The block device requires the stripe size from the config. */
@@ -200,13 +202,17 @@ int rsxx_attach_dev(struct rsxx_cardinfo *card)
 			set_capacity(card->gendisk, card->size8 >> 9);
 		else
 			set_capacity(card->gendisk, 0);
-		device_add_disk(CARD_TO_DEV(card), card->gendisk, NULL);
-		card->bdev_attached = 1;
+		err = device_add_disk(CARD_TO_DEV(card), card->gendisk, NULL);
+		if (err == 0)
+			card->bdev_attached = 1;
 	}
 
 	mutex_unlock(&card->dev_lock);
 
-	return 0;
+	if (err)
+		blk_cleanup_disk(card->gendisk);
+
+	return err;
 }
 
 void rsxx_detach_dev(struct rsxx_cardinfo *card)
-- 
2.30.2


^ permalink raw reply related

* [PATCH v3 5/5] KVM: selftests: Remove __NR_userfaultfd syscall fallback
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210901203030.1292304-1-seanjc@google.com>

Revert the __NR_userfaultfd syscall fallback added for KVM selftests now
that x86's unistd_{32,63}.h overrides are under uapi/ and thus not in
KVM sefltests' search path, i.e. now that KVM gets x86 syscall numbers
from the installed kernel headers.

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/arch/x86/include/uapi/asm/unistd_64.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/arch/x86/include/uapi/asm/unistd_64.h b/tools/arch/x86/include/uapi/asm/unistd_64.h
index 4205ed4158bf..cb52a3a8b8fc 100644
--- a/tools/arch/x86/include/uapi/asm/unistd_64.h
+++ b/tools/arch/x86/include/uapi/asm/unistd_64.h
@@ -1,7 +1,4 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __NR_userfaultfd
-#define __NR_userfaultfd 282
-#endif
 #ifndef __NR_perf_event_open
 # define __NR_perf_event_open 298
 #endif
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related

* [PATCH v3 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210901203030.1292304-1-seanjc@google.com>

Add a test to verify an rseq's CPU ID is updated correctly if the task is
migrated while the kernel is handling KVM_RUN.  This is a regression test
for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
without updating rseq, leading to a stale CPU ID and other badness.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/.gitignore  |   1 +
 tools/testing/selftests/kvm/Makefile    |   3 +
 tools/testing/selftests/kvm/rseq_test.c | 236 ++++++++++++++++++++++++
 3 files changed, 240 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/rseq_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 0709af0144c8..6d031ff6b68e 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -47,6 +47,7 @@
 /kvm_page_table_test
 /memslot_modification_stress_test
 /memslot_perf_test
+/rseq_test
 /set_memory_region_test
 /steal_time
 /kvm_binary_stats_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 5832f510a16c..0756e79cb513 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -80,6 +80,7 @@ TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_x86_64 += kvm_page_table_test
 TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
 TEST_GEN_PROGS_x86_64 += memslot_perf_test
+TEST_GEN_PROGS_x86_64 += rseq_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
@@ -92,6 +93,7 @@ TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_aarch64 += kvm_page_table_test
+TEST_GEN_PROGS_aarch64 += rseq_test
 TEST_GEN_PROGS_aarch64 += set_memory_region_test
 TEST_GEN_PROGS_aarch64 += steal_time
 TEST_GEN_PROGS_aarch64 += kvm_binary_stats_test
@@ -103,6 +105,7 @@ TEST_GEN_PROGS_s390x += demand_paging_test
 TEST_GEN_PROGS_s390x += dirty_log_test
 TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
 TEST_GEN_PROGS_s390x += kvm_page_table_test
+TEST_GEN_PROGS_s390x += rseq_test
 TEST_GEN_PROGS_s390x += set_memory_region_test
 TEST_GEN_PROGS_s390x += kvm_binary_stats_test
 
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
new file mode 100644
index 000000000000..060538bd405a
--- /dev/null
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <errno.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <signal.h>
+#include <syscall.h>
+#include <sys/ioctl.h>
+#include <asm/barrier.h>
+#include <linux/atomic.h>
+#include <linux/rseq.h>
+#include <linux/unistd.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+
+#define VCPU_ID 0
+
+static __thread volatile struct rseq __rseq = {
+	.cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
+};
+
+/*
+ * Use an arbitrary, bogus signature for configuring rseq, this test does not
+ * actually enter an rseq critical section.
+ */
+#define RSEQ_SIG 0xdeadbeef
+
+/*
+ * Any bug related to task migration is likely to be timing-dependent; perform
+ * a large number of migrations to reduce the odds of a false negative.
+ */
+#define NR_TASK_MIGRATIONS 100000
+
+static pthread_t migration_thread;
+static cpu_set_t possible_mask;
+static bool done;
+
+static atomic_t seq_cnt;
+
+static void guest_code(void)
+{
+	for (;;)
+		GUEST_SYNC(0);
+}
+
+static void sys_rseq(int flags)
+{
+	int r;
+
+	r = syscall(__NR_rseq, &__rseq, sizeof(__rseq), flags, RSEQ_SIG);
+	TEST_ASSERT(!r, "rseq failed, errno = %d (%s)", errno, strerror(errno));
+}
+
+static void *migration_worker(void *ign)
+{
+	cpu_set_t allowed_mask;
+	int r, i, nr_cpus, cpu;
+
+	CPU_ZERO(&allowed_mask);
+
+	nr_cpus = CPU_COUNT(&possible_mask);
+
+	for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
+		cpu = i % nr_cpus;
+		if (!CPU_ISSET(cpu, &possible_mask))
+			continue;
+
+		CPU_SET(cpu, &allowed_mask);
+
+		/*
+		 * Bump the sequence count twice to allow the reader to detect
+		 * that a migration may have occurred in between rseq and sched
+		 * CPU ID reads.  An odd sequence count indicates a migration
+		 * is in-progress, while a completely different count indicates
+		 * a migration occurred since the count was last read.
+		 */
+		atomic_inc(&seq_cnt);
+
+		/*
+		 * Ensure the odd count is visible while sched_getcpu() isn't
+		 * stable, i.e. while changing affinity is in-progress.
+		 */
+		smp_wmb();
+		r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
+		TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
+			    errno, strerror(errno));
+		smp_wmb();
+		atomic_inc(&seq_cnt);
+
+		CPU_CLR(cpu, &allowed_mask);
+
+		/*
+		 * Wait 1-10us before proceeding to the next iteration and more
+		 * specifically, before bumping seq_cnt again.  A delay is
+		 * needed on three fronts:
+		 *
+		 *  1. To allow sched_setaffinity() to prompt migration before
+		 *     ioctl(KVM_RUN) enters the guest so that TIF_NOTIFY_RESUME
+		 *     (or TIF_NEED_RESCHED, which indirectly leads to handling
+		 *     NOTIFY_RESUME) is handled in KVM context.
+		 *
+		 *     If NOTIFY_RESUME/NEED_RESCHED is set after KVM enters
+		 *     the guest, the guest will trigger a IO/MMIO exit all the
+		 *     way to userspace and the TIF flags will be handled by
+		 *     the generic "exit to userspace" logic, not by KVM.  The
+		 *     exit to userspace is necessary to give the test a chance
+		 *     to check the rseq CPU ID (see #2).
+		 *
+		 *     Alternatively, guest_code() could include an instruction
+		 *     to trigger an exit that is handled by KVM, but any such
+		 *     exit requires architecture specific code.
+		 *
+		 *  2. To let ioctl(KVM_RUN) make its way back to the test
+		 *     before the next round of migration.  The test's check on
+		 *     the rseq CPU ID must wait for migration to complete in
+		 *     order to avoid false positive, thus any kernel rseq bug
+		 *     will be missed if the next migration starts before the
+		 *     check completes.
+		 *
+		 *  3. To ensure the read-side makes efficient forward progress,
+		 *     e.g. if sched_getcpu() involves a syscall.  Stalling the
+		 *     read-side means the test will spend more time waiting for
+		 *     sched_getcpu() to stabilize and less time trying to hit
+		 *     the timing-dependent bug.
+		 *
+		 * Because any bug in this area is likely to be timing-dependent,
+		 * run with a range of delays at 1us intervals from 1us to 10us
+		 * as a best effort to avoid tuning the test to the point where
+		 * it can hit _only_ the original bug and not detect future
+		 * regressions.
+		 *
+		 * The original bug can reproduce with a delay up to ~500us on
+		 * x86-64, but starts to require more iterations to reproduce
+		 * as the delay creeps above ~10us, and the average runtime of
+		 * each iteration obviously increases as well.  Cap the delay
+		 * at 10us to keep test runtime reasonable while minimizing
+		 * potential coverage loss.
+		 *
+		 * The lower bound for reproducing the bug is likely below 1us,
+		 * e.g. failures occur on x86-64 with nanosleep(0), but at that
+		 * point the overhead of the syscall likely dominates the delay.
+		 * Use usleep() for simplicity and to avoid unnecessary kernel
+		 * dependencies.
+		 */
+		usleep((i % 10) + 1);
+	}
+	done = true;
+	return NULL;
+}
+
+int main(int argc, char *argv[])
+{
+	int r, i, snapshot;
+	struct kvm_vm *vm;
+	u32 cpu, rseq_cpu;
+
+	/* Tell stdout not to buffer its content */
+	setbuf(stdout, NULL);
+
+	r = sched_getaffinity(0, sizeof(possible_mask), &possible_mask);
+	TEST_ASSERT(!r, "sched_getaffinity failed, errno = %d (%s)", errno,
+		    strerror(errno));
+
+	if (CPU_COUNT(&possible_mask) < 2) {
+		print_skip("Only one CPU, task migration not possible\n");
+		exit(KSFT_SKIP);
+	}
+
+	sys_rseq(0);
+
+	/*
+	 * Create and run a dummy VM that immediately exits to userspace via
+	 * GUEST_SYNC, while concurrently migrating the process by setting its
+	 * CPU affinity.
+	 */
+	vm = vm_create_default(VCPU_ID, 0, guest_code);
+
+	pthread_create(&migration_thread, NULL, migration_worker, 0);
+
+	for (i = 0; !done; i++) {
+		vcpu_run(vm, VCPU_ID);
+		TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC,
+			    "Guest failed?");
+
+		/*
+		 * Verify rseq's CPU matches sched's CPU.  Ensure migration
+		 * doesn't occur between sched_getcpu() and reading the rseq
+		 * cpu_id by rereading both if the sequence count changes, or
+		 * if the count is odd (migration in-progress).
+		 */
+		do {
+			/*
+			 * Drop bit 0 to force a mismatch if the count is odd,
+			 * i.e. if a migration is in-progress.
+			 */
+			snapshot = atomic_read(&seq_cnt) & ~1;
+
+			/*
+			 * Ensure reading sched_getcpu() and rseq.cpu_id
+			 * complete in a single "no migration" window, i.e. are
+			 * not reordered across the seq_cnt reads.
+			 */
+			smp_rmb();
+			cpu = sched_getcpu();
+			rseq_cpu = READ_ONCE(__rseq.cpu_id);
+			smp_rmb();
+		} while (snapshot != atomic_read(&seq_cnt));
+
+		TEST_ASSERT(rseq_cpu == cpu,
+			    "rseq CPU = %d, sched CPU = %d\n", rseq_cpu, cpu);
+	}
+
+	/*
+	 * Sanity check that the test was able to enter the guest a reasonable
+	 * number of times, e.g. didn't get stalled too often/long waiting for
+	 * sched_getcpu() to stabilize.  A 2:1 migration:KVM_RUN ratio is a
+	 * fairly conservative ratio on x86-64, which can do _more_ KVM_RUNs
+	 * than migrations given the 1us+ delay in the migration task.
+	 */
+	TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
+		    "Only performed %d KVM_RUNs, task stalled too much?\n", i);
+
+	pthread_join(migration_thread, NULL);
+
+	kvm_vm_free(vm);
+
+	sys_rseq(RSEQ_FLAG_UNREGISTER);
+
+	return 0;
+}
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related

* [PATCH v3 3/5] tools: Move x86 syscall number fallbacks to .../uapi/
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210901203030.1292304-1-seanjc@google.com>

Move unistd_{32,64}.h from x86/include/asm to x86/include/uapi/asm so
that tools/selftests that install kernel headers, e.g. KVM selftests, can
include non-uapi tools headers, e.g. to get 'struct list_head', without
effectively overriding the installed non-tool uapi headers.

Swapping KVM's search order, e.g. to search the kernel headers before
tool headers, is not a viable option as doing results in linux/type.h and
other core headers getting pulled from the kernel headers, which do not
have the kernel-internal typedefs that are used through tools, including
many files outside of selftests/kvm's control.

Prior to commit cec07f53c398 ("perf tools: Move syscall number fallbacks
from perf-sys.h to tools/arch/x86/include/asm/"), the handcoded numbers
were actual fallbacks, i.e. overriding unistd_{32,64}.h from the kernel
headers was unintentional.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/arch/x86/include/{ => uapi}/asm/unistd_32.h | 0
 tools/arch/x86/include/{ => uapi}/asm/unistd_64.h | 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_32.h (100%)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_64.h (100%)

diff --git a/tools/arch/x86/include/asm/unistd_32.h b/tools/arch/x86/include/uapi/asm/unistd_32.h
similarity index 100%
rename from tools/arch/x86/include/asm/unistd_32.h
rename to tools/arch/x86/include/uapi/asm/unistd_32.h
diff --git a/tools/arch/x86/include/asm/unistd_64.h b/tools/arch/x86/include/uapi/asm/unistd_64.h
similarity index 100%
rename from tools/arch/x86/include/asm/unistd_64.h
rename to tools/arch/x86/include/uapi/asm/unistd_64.h
-- 
2.33.0.153.gba50c8fa24-goog

^ permalink raw reply

* [PATCH v3 2/5] entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210901203030.1292304-1-seanjc@google.com>

Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
that the two function are always called back-to-back by architectures
that have rseq.  The rseq helper is stubbed out for architectures that
don't support rseq, i.e. this is a nop across the board.

Note, tracehook_notify_resume() is horribly named and arguably does not
belong in tracehook.h as literally every line of code in it has nothing
to do with tracing.  But, that's been true since commit a42c6ded827d
("move key_repace_session_keyring() into tracehook_notify_resume()")
first usurped tracehook_notify_resume() back in 2012.  Punt cleaning that
mess up to future patches.

No functional change intended.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm/kernel/signal.c     | 1 -
 arch/arm64/kernel/signal.c   | 1 -
 arch/csky/kernel/signal.c    | 4 +---
 arch/mips/kernel/signal.c    | 4 +---
 arch/powerpc/kernel/signal.c | 4 +---
 include/linux/tracehook.h    | 2 ++
 kernel/entry/common.c        | 4 +---
 kernel/entry/kvm.c           | 4 +---
 8 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index a3a38d0a4c85..9df68d139965 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -670,7 +670,6 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
 				uprobe_notify_resume(regs);
 			} else {
 				tracehook_notify_resume(regs);
-				rseq_handle_notify_resume(NULL, regs);
 			}
 		}
 		local_irq_disable();
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 23036334f4dc..22b55db13da6 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -951,7 +951,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 
 			if (thread_flags & _TIF_NOTIFY_RESUME) {
 				tracehook_notify_resume(regs);
-				rseq_handle_notify_resume(NULL, regs);
 
 				/*
 				 * If we reschedule after checking the affinity
diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c
index 312f046d452d..bc4238b9f709 100644
--- a/arch/csky/kernel/signal.c
+++ b/arch/csky/kernel/signal.c
@@ -260,8 +260,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 	if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
 		do_signal(regs);
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-		rseq_handle_notify_resume(NULL, regs);
-	}
 }
diff --git a/arch/mips/kernel/signal.c b/arch/mips/kernel/signal.c
index f1e985109da0..c9b2a75563e1 100644
--- a/arch/mips/kernel/signal.c
+++ b/arch/mips/kernel/signal.c
@@ -906,10 +906,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, void *unused,
 	if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
 		do_signal(regs);
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-		rseq_handle_notify_resume(NULL, regs);
-	}
 
 	user_enter();
 }
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index e600764a926c..b93b87df499d 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -293,10 +293,8 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 		do_signal(current);
 	}
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-		rseq_handle_notify_resume(NULL, regs);
-	}
 }
 
 static unsigned long get_tm_stackpointer(struct task_struct *tsk)
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 3e80c4bc66f7..2564b7434b4d 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -197,6 +197,8 @@ static inline void tracehook_notify_resume(struct pt_regs *regs)
 
 	mem_cgroup_handle_over_high();
 	blkcg_maybe_throttle_current();
+
+	rseq_handle_notify_resume(NULL, regs);
 }
 
 /*
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index bf16395b9e13..d5a61d565ad5 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -171,10 +171,8 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
 			handle_signal_work(regs, ti_work);
 
-		if (ti_work & _TIF_NOTIFY_RESUME) {
+		if (ti_work & _TIF_NOTIFY_RESUME)
 			tracehook_notify_resume(regs);
-			rseq_handle_notify_resume(NULL, regs);
-		}
 
 		/* Architecture specific TIF work */
 		arch_exit_to_user_mode_work(regs, ti_work);
diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index 049fd06b4c3d..49972ee99aff 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -19,10 +19,8 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
 		if (ti_work & _TIF_NEED_RESCHED)
 			schedule();
 
-		if (ti_work & _TIF_NOTIFY_RESUME) {
+		if (ti_work & _TIF_NOTIFY_RESUME)
 			tracehook_notify_resume(NULL);
-			rseq_handle_notify_resume(NULL, NULL);
-		}
 
 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
 		if (ret)
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related

* [PATCH v3 1/5] KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210901203030.1292304-1-seanjc@google.com>

Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to
transferring to a KVM guest, which is roughly equivalent to an exit to
userspace and processes many of the same pending actions.  While the task
cannot be in an rseq critical section as the KVM path is reachable only
by via ioctl(KVM_RUN), the side effects that apply to rseq outside of a
critical section still apply, e.g. the current CPU needs to be updated if
the task is migrated.

Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults
and other badness in userspace VMMs that use rseq in combination with KVM,
e.g. due to the CPU ID being stale after task migration.

Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function")
Reported-by: Peter Foley <pefoley@google.com>
Bisected-by: Doug Evans <dje@google.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 kernel/entry/kvm.c |  4 +++-
 kernel/rseq.c      | 14 +++++++++++---
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index 49972ee99aff..049fd06b4c3d 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -19,8 +19,10 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
 		if (ti_work & _TIF_NEED_RESCHED)
 			schedule();
 
-		if (ti_work & _TIF_NOTIFY_RESUME)
+		if (ti_work & _TIF_NOTIFY_RESUME) {
 			tracehook_notify_resume(NULL);
+			rseq_handle_notify_resume(NULL, NULL);
+		}
 
 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
 		if (ret)
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 35f7bd0fced0..6d45ac3dae7f 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -282,9 +282,17 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
 
 	if (unlikely(t->flags & PF_EXITING))
 		return;
-	ret = rseq_ip_fixup(regs);
-	if (unlikely(ret < 0))
-		goto error;
+
+	/*
+	 * regs is NULL if and only if the caller is in a syscall path.  Skip
+	 * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
+	 * kill a misbehaving userspace on debug kernels.
+	 */
+	if (regs) {
+		ret = rseq_ip_fixup(regs);
+		if (unlikely(ret < 0))
+			goto error;
+	}
 	if (unlikely(rseq_update_cpu_id(t)))
 		goto error;
 	return;
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related

* [PATCH v3 0/5] KVM: rseq: Fix and a test for a KVM+rseq bug
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel

Patch 1 fixes a KVM+rseq bug where KVM's handling of TIF_NOTIFY_RESUME,
e.g. for task migration, clears the flag without informing rseq and leads
to stale data in userspace's rseq struct.

Patch 2 is a cleanup to try and make future bugs less likely.  It's also
a baby step towards moving and renaming tracehook_notify_resume() since
it has nothing to do with tracing.

Patch 3 is a fix/cleanup to stop overriding x86's unistd_{32,64}.h when
the include path (intentionally) omits tools' uapi headers.  KVM's
selftests do exactly that so that they can pick up the uapi headers from
the installed kernel headers, and still use various tools/ headers that
mirror kernel code, e.g. linux/types.h.  This allows the new test in
patch 4 to reference __NR_rseq without having to manually define it.

Patch 4 is a regression test for the KVM+rseq bug.

Patch 5 is a cleanup made possible by patch 3.

Based on commit 835d31d319d9 ("Merge tag 'media/v5.15-1' of ...").

v3:
  - Collect Ack/Review. [Mathieu, Ben]
  - Add explicit smp_wmb() instead of relying on atomic_inc() to do a full
    barrier. [Mathieu]
  - Add lots and lots of comments in the selftest, especially around why
    the migration thread needs a udelay(). [Mathieu]
  - Delay between 1us and 10us to reduce the odds of having a hard
    dependency on arch/kernel behavior.  [Mathieu]
  - Dropped an s390 change in patch 2 after a rebase to upstream master.

v2:
  - https://lkml.kernel.org/r/20210820225002.310652-1-seanjc@google.com
  - Don't touch rseq_cs when handling KVM case so that rseq_syscall() will
    still detect a naughty userspace. [Mathieu]
  - Use a sequence counter + retry in the test to ensure the process isn't
    migrated between sched_getcpu() and reading rseq.cpu_id, i.e. to
    avoid a flaky test. [Mathieu]
  - Add Mathieu's ack for patch 2.
  - Add more comments in the test.

v1: https://lkml.kernel.org/r/20210818001210.4073390-1-seanjc@google.com

Sean Christopherson (5):
  KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM
    guest
  entry: rseq: Call rseq_handle_notify_resume() in
    tracehook_notify_resume()
  tools: Move x86 syscall number fallbacks to .../uapi/
  KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration
    bugs
  KVM: selftests: Remove __NR_userfaultfd syscall fallback

 arch/arm/kernel/signal.c                      |   1 -
 arch/arm64/kernel/signal.c                    |   1 -
 arch/csky/kernel/signal.c                     |   4 +-
 arch/mips/kernel/signal.c                     |   4 +-
 arch/powerpc/kernel/signal.c                  |   4 +-
 include/linux/tracehook.h                     |   2 +
 kernel/entry/common.c                         |   4 +-
 kernel/rseq.c                                 |  14 +-
 .../x86/include/{ => uapi}/asm/unistd_32.h    |   0
 .../x86/include/{ => uapi}/asm/unistd_64.h    |   3 -
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 tools/testing/selftests/kvm/rseq_test.c       | 236 ++++++++++++++++++
 13 files changed, 257 insertions(+), 20 deletions(-)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_32.h (100%)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_64.h (83%)
 create mode 100644 tools/testing/selftests/kvm/rseq_test.c

-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply

* Re: [GIT PULL] retire legacy WR sbc8548 and sbc8641 platforms
From: Paul Gortmaker @ 2021-09-01 19:51 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Scott Wood, Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <87pmu09rg1.fsf@mpe.ellerman.id.au>

[Re: [GIT PULL] retire legacy WR sbc8548 and sbc8641 platforms] On 27/08/2021 (Fri 01:05) Michael Ellerman wrote:

> Paul Gortmaker <paul.gortmaker@windriver.com> writes:
> > This is unchanged from the original wr_sbc-delete branch sent in January,
> > other than to add the Acks from Scott in July, and update the baseline.
> >
> > Built with ppc64 defconfig and mpc85xx_cds_defconfig and mpc86xx_defconfig
> > just to make sure I didn't fat finger anything in the baseline update.
> 
> Thanks for following up on this.
> 
> I ended up cherry-picking the patches into my branch. I like to keep my
> next based on rc2, and merging this would have pulled in everything up
> to rc7 into my branch.
> 
> I don't think you were planning to merge this branch anywhere else, so
> it shouldn't make any difference, but let me know if it's a problem.

That is 100% fine - as you guessed, it is a dead end branch, and there is
no real underlying value in preserving the baseline for removals like this.

Thanks!
Paul.
--

> 
> It should appear here soon:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/log/?h=next
> 
> 
> cheers
>  
> 
> > Original v1 text follows below, from:
> >
> > https://lore.kernel.org/lkml/20210111082823.99562-1-paul.gortmaker@windriver.com
> >
> > It would be nice to get this in and off our collective to-do list.
> >
> > Thanks,
> > Paul.
> >
> >   ---
> >
> > In v2.6.27 (2008, 917f0af9e5a9) the sbc8260 support was implicitly
> > retired by not being carried forward through the ppc --> powerpc
> > device tree transition.
> >
> > Then, in v3.6 (2012, b048b4e17cbb) we retired the support for the
> > sbc8560 boards.
> >
> > Next, in v4.18 (2017, 3bc6cf5a86e5) we retired the support for the
> > 2006 vintage sbc834x boards.
> >
> > The sbc8548 and sbc8641d boards were maybe 1-2 years newer than the
> > sbc834x boards, but it is also 3+ years later, so it makes sense to
> > now retire them as well - which is what is done here.
> >
> > These two remaining WR boards were based on the Freescale MPC8548-CDS
> > and the MPC8641D-HPCN reference board implementations.  Having had the
> > chance to use these and many other Fsl ref boards, I know this:  The
> > Freescale reference boards were typically produced in limited quantity
> > and primarily available to BSP developers and hardware designers, and
> > not likely to have found a 2nd life with hobbyists and/or collectors.
> >
> > It was good to have that BSP code subjected to mainline review and
> > hence also widely available back in the day. But given the above, we
> > should probably also be giving serious consideration to retiring
> > additional similar age/type reference board platforms as well.
> >
> > I've always felt it is important for us to be proactive in retiring
> > old code, since it has a genuine non-zero carrying cost, as described
> > in the 930d52c012b8 merge log.  But for the here and now, we just
> > clean up the remaining BSP code that I had added for SBC platforms.
> >
> > --- 
> >
> > The following changes since commit e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93:
> >
> >   Linux 5.14-rc7 (2021-08-22 14:24:56 -0700)
> >
> > are available in the Git repository at:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux.git wr_sbc-delete-v2
> >
> > for you to fetch changes up to d44e2dc12ea2112e74cdd25090eeda2727ed09cc:
> >
> >   MAINTAINERS: update for Paul Gortmaker (2021-08-24 08:19:01 -0400)
> >
> > ----------------------------------------------------------------
> > Paul Gortmaker (3):
> >       powerpc: retire sbc8548 board support
> >       powerpc: retire sbc8641d board support
> >       MAINTAINERS: update for Paul Gortmaker
> >
> >  MAINTAINERS                                 |   1 -
> >  arch/powerpc/boot/Makefile                  |   1 -
> >  arch/powerpc/boot/dts/fsl/sbc8641d.dts      | 176 -----------------
> >  arch/powerpc/boot/dts/sbc8548-altflash.dts  | 111 -----------
> >  arch/powerpc/boot/dts/sbc8548-post.dtsi     | 289 ----------------------------
> >  arch/powerpc/boot/dts/sbc8548-pre.dtsi      |  48 -----
> >  arch/powerpc/boot/dts/sbc8548.dts           | 106 ----------
> >  arch/powerpc/boot/wrapper                   |   2 +-
> >  arch/powerpc/configs/85xx/sbc8548_defconfig |  50 -----
> >  arch/powerpc/configs/mpc85xx_base.config    |   1 -
> >  arch/powerpc/configs/mpc86xx_base.config    |   1 -
> >  arch/powerpc/configs/ppc6xx_defconfig       |   1 -
> >  arch/powerpc/platforms/85xx/Kconfig         |   6 -
> >  arch/powerpc/platforms/85xx/Makefile        |   1 -
> >  arch/powerpc/platforms/85xx/sbc8548.c       | 134 -------------
> >  arch/powerpc/platforms/86xx/Kconfig         |   8 +-
> >  arch/powerpc/platforms/86xx/Makefile        |   1 -
> >  arch/powerpc/platforms/86xx/sbc8641d.c      |  87 ---------
> >  18 files changed, 2 insertions(+), 1022 deletions(-)
> >  delete mode 100644 arch/powerpc/boot/dts/fsl/sbc8641d.dts
> >  delete mode 100644 arch/powerpc/boot/dts/sbc8548-altflash.dts
> >  delete mode 100644 arch/powerpc/boot/dts/sbc8548-post.dtsi
> >  delete mode 100644 arch/powerpc/boot/dts/sbc8548-pre.dtsi
> >  delete mode 100644 arch/powerpc/boot/dts/sbc8548.dts
> >  delete mode 100644 arch/powerpc/configs/85xx/sbc8548_defconfig
> >  delete mode 100644 arch/powerpc/platforms/85xx/sbc8548.c
> >  delete mode 100644 arch/powerpc/platforms/86xx/sbc8641d.c

^ permalink raw reply

* [PATCH 2/5] KVM: PPC: Book3S HV: Delay setting of kvm ops
From: Fabiano Rosas @ 2021-09-01 17:33 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, npiggin, david
In-Reply-To: <20210901173357.3183658-1-farosas@linux.ibm.com>

Delay the setting of kvm_hv_ops until after all init code has
completed. This avoids leaving the ops still accessible if the init
fails.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4cdf2e6e1cf5..fe20074faa61 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5985,9 +5985,6 @@ static int kvmppc_book3s_init_hv(void)
 	}
 #endif
 
-	kvm_ops_hv.owner = THIS_MODULE;
-	kvmppc_hv_ops = &kvm_ops_hv;
-
 	init_default_hcalls();
 
 	init_vcore_lists();
@@ -6003,10 +6000,15 @@ static int kvmppc_book3s_init_hv(void)
 	}
 
 	r = kvmppc_uvmem_init();
-	if (r < 0)
+	if (r < 0) {
 		pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r);
+		return r;
+	}
 
-	return r;
+	kvm_ops_hv.owner = THIS_MODULE;
+	kvmppc_hv_ops = &kvm_ops_hv;
+
+	return 0;
 }
 
 static void kvmppc_book3s_exit_hv(void)
-- 
2.29.2


^ permalink raw reply related

* [PATCH 1/5] KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
From: Fabiano Rosas @ 2021-09-01 17:33 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, npiggin, david
In-Reply-To: <20210901173357.3183658-1-farosas@linux.ibm.com>

The return of the function is being shadowed by the call to
kvmppc_uvmem_init.

Fixes: ca9f4942670c ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 085fb8ecbf68..4cdf2e6e1cf5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5996,8 +5996,11 @@ static int kvmppc_book3s_init_hv(void)
 	if (r)
 		return r;
 
-	if (kvmppc_radix_possible())
+	if (kvmppc_radix_possible()) {
 		r = kvmppc_radix_init();
+		if (r)
+			return r;
+	}
 
 	r = kvmppc_uvmem_init();
 	if (r < 0)
-- 
2.29.2


^ permalink raw reply related

* [PATCH 5/5] KVM: PPC: Book3S: Stop exporting non-builtin symbols
From: Fabiano Rosas @ 2021-09-01 17:33 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, npiggin, david
In-Reply-To: <20210901173357.3183658-1-farosas@linux.ibm.com>

Now that we have only one kvm module, we can stop exporting some
symbols.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 arch/powerpc/kvm/book3s.c              | 15 ---------------
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  3 ---
 arch/powerpc/kvm/book3s_64_vio.c       |  3 ---
 arch/powerpc/kvm/book3s_rtas.c         |  1 -
 arch/powerpc/kvm/book3s_xics.c         |  4 ----
 arch/powerpc/kvm/book3s_xive.c         |  6 ------
 arch/powerpc/kvm/emulate.c             |  1 -
 arch/powerpc/kvm/powerpc.c             | 14 --------------
 8 files changed, 47 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c381bb17b0f9..120a68c76501 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -191,27 +191,23 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
 	printk(KERN_INFO "Queueing interrupt %x\n", vec);
 #endif
 }
-EXPORT_SYMBOL_GPL(kvmppc_book3s_queue_irqprio);
 
 void kvmppc_core_queue_machine_check(struct kvm_vcpu *vcpu, ulong flags)
 {
 	/* might as well deliver this straight away */
 	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_MACHINE_CHECK, flags);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_queue_machine_check);
 
 void kvmppc_core_queue_syscall(struct kvm_vcpu *vcpu)
 {
 	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_SYSCALL, 0);
 }
-EXPORT_SYMBOL(kvmppc_core_queue_syscall);
 
 void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
 {
 	/* might as well deliver this straight away */
 	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_PROGRAM, flags);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_queue_program);
 
 void kvmppc_core_queue_fpunavail(struct kvm_vcpu *vcpu)
 {
@@ -235,19 +231,16 @@ void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu)
 {
 	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_queue_dec);
 
 int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
 {
 	return test_bit(BOOK3S_IRQPRIO_DECREMENTER, &vcpu->arch.pending_exceptions);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_pending_dec);
 
 void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu)
 {
 	kvmppc_book3s_dequeue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_dequeue_dec);
 
 void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
                                 struct kvm_interrupt *irq)
@@ -290,13 +283,11 @@ void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, ulong dar,
 	kvmppc_set_dsisr(vcpu, flags);
 	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE, 0);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_queue_data_storage);
 
 void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong flags)
 {
 	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_INST_STORAGE, flags);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_queue_inst_storage);
 
 static int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu,
 					 unsigned int priority)
@@ -428,7 +419,6 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_prepare_to_enter);
 
 kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
 			bool *writable)
@@ -454,7 +444,6 @@ kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
 
 	return gfn_to_pfn_prot(vcpu->kvm, gfn, writing, writable);
 }
-EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn);
 
 int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, enum xlate_instdata xlid,
 		 enum xlate_readwrite xlrw, struct kvmppc_pte *pte)
@@ -501,7 +490,6 @@ int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
 	else
 		return EMULATE_AGAIN;
 }
-EXPORT_SYMBOL_GPL(kvmppc_load_last_inst);
 
 int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu)
 {
@@ -788,7 +776,6 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
 {
 	vcpu->kvm->arch.kvm_ops->set_msr(vcpu, msr);
 }
-EXPORT_SYMBOL_GPL(kvmppc_set_msr);
 
 int kvmppc_vcpu_run(struct kvm_vcpu *vcpu)
 {
@@ -964,7 +951,6 @@ int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
 
 	return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_load);
 
 int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
 {
@@ -1004,7 +990,6 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
 
 	return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_store);
 
 int kvmppc_core_check_processor_compat(void)
 {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index b5905ae4377c..f354ccfed56d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -81,7 +81,6 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(__kvmhv_copy_tofrom_guest_radix);
 
 static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr,
 					  void *to, void *from, unsigned long n)
@@ -117,14 +116,12 @@ long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *to,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvmhv_copy_from_guest_radix);
 
 long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *from,
 			       unsigned long n)
 {
 	return kvmhv_copy_tofrom_guest_radix(vcpu, eaddr, NULL, from, n);
 }
-EXPORT_SYMBOL_GPL(kvmhv_copy_to_guest_radix);
 
 int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr,
 			       struct kvmppc_pte *gpte, u64 root,
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 8da93fdfa59e..5e717512d9c4 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -606,7 +606,6 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
 
 long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		unsigned long liobn, unsigned long ioba,
@@ -703,7 +702,6 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
 
 long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 		unsigned long liobn, unsigned long ioba,
@@ -752,4 +750,3 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
index 0f847f1e5ddd..91094dfb9bc4 100644
--- a/arch/powerpc/kvm/book3s_rtas.c
+++ b/arch/powerpc/kvm/book3s_rtas.c
@@ -294,7 +294,6 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
 	 */
 	return rc;
 }
-EXPORT_SYMBOL_GPL(kvmppc_rtas_hcall);
 
 void kvmppc_rtas_tokens_free(struct kvm *kvm)
 {
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 303e3cb096db..234a80c958f1 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -870,7 +870,6 @@ int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
 
 	return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xics_rm_complete);
 
 int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 {
@@ -917,7 +916,6 @@ int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 
 	return rc;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xics_hcall);
 
 
 /* -- Initialisation code etc. -- */
@@ -1498,7 +1496,6 @@ void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long irq,
 	ics->irq_state[idx].host_irq = host_irq;
 	ics->irq_state[idx].intr_cpu = -1;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xics_set_mapped);
 
 void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long irq,
 			    unsigned long host_irq)
@@ -1513,4 +1510,3 @@ void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long irq,
 
 	ics->irq_state[idx].host_irq = 0;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xics_clr_mapped);
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 8cfab3547494..77b350805649 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -126,7 +126,6 @@ void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu)
 			vcpu->arch.xive_esc_on = 0;
 	}
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_push_vcpu);
 
 /*
  * Pull a vcpu's context from the XIVE on guest exit.
@@ -157,7 +156,6 @@ void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu)
 	vcpu->arch.xive_pushed = 0;
 	eieio();
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_pull_vcpu);
 
 void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
 {
@@ -191,7 +189,6 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
 	}
 	mb();
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_rearm_escalation);
 
 /*
  * This is a simple trigger for a generic XIVE IRQ. This must
@@ -1016,7 +1013,6 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_set_mapped);
 
 int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
 			   struct irq_desc *host_desc)
@@ -1097,7 +1093,6 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_clr_mapped);
 
 void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
 {
@@ -2169,7 +2164,6 @@ int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 
 	return H_UNSUPPORTED;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_xics_hcall);
 
 int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index ee1147c98cd8..468efa502d56 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -304,4 +304,3 @@ int kvmppc_emulate_instruction(struct kvm_vcpu *vcpu)
 
 	return emulated;
 }
-EXPORT_SYMBOL_GPL(kvmppc_emulate_instruction);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b4e6f70b97b9..b3a8853bf6ba 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -41,9 +41,7 @@
 #include "trace.h"
 
 struct kvmppc_ops *kvmppc_hv_ops;
-EXPORT_SYMBOL_GPL(kvmppc_hv_ops);
 struct kvmppc_ops *kvmppc_pr_ops;
-EXPORT_SYMBOL_GPL(kvmppc_pr_ops);
 
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
@@ -135,7 +133,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 	local_irq_enable();
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvmppc_prepare_to_enter);
 
 #if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
 static void kvmppc_swab_shared(struct kvm_vcpu *vcpu)
@@ -248,7 +245,6 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvmppc_kvm_pv);
 
 int kvmppc_sanity_check(struct kvm_vcpu *vcpu)
 {
@@ -277,7 +273,6 @@ int kvmppc_sanity_check(struct kvm_vcpu *vcpu)
 	vcpu->arch.sane = r;
 	return r ? 0 : -EINVAL;
 }
-EXPORT_SYMBOL_GPL(kvmppc_sanity_check);
 
 int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 {
@@ -319,7 +314,6 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvmppc_emulate_mmio);
 
 int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	      bool data)
@@ -362,7 +356,6 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 
 	return EMULATE_DONE;
 }
-EXPORT_SYMBOL_GPL(kvmppc_st);
 
 int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 		      bool data)
@@ -411,7 +404,6 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 
 	return EMULATE_DONE;
 }
-EXPORT_SYMBOL_GPL(kvmppc_ld);
 
 int kvm_arch_hardware_enable(void)
 {
@@ -1281,7 +1273,6 @@ int kvmppc_handle_load(struct kvm_vcpu *vcpu,
 {
 	return __kvmppc_handle_load(vcpu, rt, bytes, is_default_endian, 0);
 }
-EXPORT_SYMBOL_GPL(kvmppc_handle_load);
 
 /* Same as above, but sign extends */
 int kvmppc_handle_loads(struct kvm_vcpu *vcpu,
@@ -1378,7 +1369,6 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
 
 	return EMULATE_DO_MMIO;
 }
-EXPORT_SYMBOL_GPL(kvmppc_handle_store);
 
 #ifdef CONFIG_VSX
 static inline int kvmppc_get_vsr_data(struct kvm_vcpu *vcpu, int rs, u64 *val)
@@ -2478,26 +2468,22 @@ long kvmppc_alloc_lpid(void)
 
 	return lpid;
 }
-EXPORT_SYMBOL_GPL(kvmppc_alloc_lpid);
 
 void kvmppc_claim_lpid(long lpid)
 {
 	set_bit(lpid, lpid_inuse);
 }
-EXPORT_SYMBOL_GPL(kvmppc_claim_lpid);
 
 void kvmppc_free_lpid(long lpid)
 {
 	clear_bit(lpid, lpid_inuse);
 }
-EXPORT_SYMBOL_GPL(kvmppc_free_lpid);
 
 void kvmppc_init_lpid(unsigned long nr_lpids_param)
 {
 	nr_lpids = min_t(unsigned long, KVMPPC_NR_LPIDS, nr_lpids_param);
 	memset(lpid_inuse, 0, sizeof(lpid_inuse));
 }
-EXPORT_SYMBOL_GPL(kvmppc_init_lpid);
 
 int kvm_arch_init(void *opaque)
 {
-- 
2.29.2


^ permalink raw reply related

* [PATCH 4/5] KVM: PPC: Book3S: Unify kvm-hv and kvm-pr modules
From: Fabiano Rosas @ 2021-09-01 17:33 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, npiggin, david
In-Reply-To: <20210901173357.3183658-1-farosas@linux.ibm.com>

Our three virtualization modules (kvm, kvm-hv, kvm-pr) can be
loaded/unloaded in such a way that could leave kvm.ko loaded without
kvm-hv.ko or kvm-pr.ko. That means the userspace could continue to
issue ioctls to KVM while there is no code on our side to service
them.

We currently select at config time which module will be built, either
kvm-hv, kvm-pr or both, with the kvm module being always present. For
32 bits the kvm module is the only one present and contains the code
from kvm-pr in it. We can do the same for 64 bit and keep hv and pr
inside kvm.ko.

With this, we do not lose the ability of running two guests at the
same time, each using a different implementation (although only POWER
bare-metal with Hash MMU supports HV and PR at the same time).

We lose the ability of loading and unloading the code, which means any
Linux installation that wants to support *both* KVM-HV guests on POWER
bare-metal and KVM-PR nested guests on top of PowerVM would have a
binary with a larger memory footprint (assuming PR is only used when
HV is not present).

This patch alters the build to output only one module at all times and
relies on the Kconfig to select which implementation will be present.

The module init code was rearranged to initialize and cleanup both
implementations or only one of them.

The Kconfig was altered to provide one boolean for each implementation
(KVM_BOOK3S_PR_POSSIBLE, KVM_BOOK3S_HV_POSSIBLE), while keeping the
old tristate for the kvm.ko module (KVM_BOOK3S_64). The old
KVM_BOOK3S_32 already selects KVM_BOOK3S_PR_POSSIBLE, so nothing
changes there.

Two module aliases were created to facilitate the introduction of the
new scheme so `modprobe kvm-hv`and `modprobe kvm-pr` are the same as
`modprobe kvm`.

The license macro was removed because it is already included by
virt/kvm/kvm_main.c.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>

---
Here's a summary of the environments that can run KVM:

* 64 bit powernv w/Radix              KVM-HV
* 64 bit powernv w/Hash               KVM-HV        KVM-PR
  32 bit powernv w/Hash                             KVM-PR
  32 bit powernv w/Radix              N/A
* 64 bit pseries w/Radix on powernv   KVM-HV nested
* 64 bit pseries w/Hash  on powernv                 KVM-PR nested
  32 bit pseries w/Hash  on powernv                 KVM-PR nested
* 64 bit pseries w/Radix on PowerVM   N/A
* 64 bit pseries w/Hash  on PowerVM                 KVM-PR nested

Lines with an asterisk were tested with all combinations for the
module and the HV/PR configs.
---
 arch/powerpc/configs/powernv_defconfig |  2 +-
 arch/powerpc/configs/ppc64_defconfig   |  2 +-
 arch/powerpc/configs/pseries_defconfig |  2 +-
 arch/powerpc/kvm/Kconfig               | 72 ++++++++++++--------------
 arch/powerpc/kvm/Makefile              | 11 ++--
 arch/powerpc/kvm/book3s.c              | 46 +++++++++++++---
 arch/powerpc/kvm/book3s.h              | 19 +++++++
 arch/powerpc/kvm/book3s_hv.c           | 10 +---
 arch/powerpc/kvm/book3s_pr.c           | 13 -----
 kernel/irq/irqdesc.c                   |  2 +-
 10 files changed, 105 insertions(+), 74 deletions(-)

diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index 8bfeea6c7de7..853b95685d9f 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -341,7 +341,7 @@ CONFIG_CRYPTO_DEV_NX=y
 CONFIG_CRYPTO_DEV_VMX=y
 CONFIG_VIRTUALIZATION=y
 CONFIG_KVM_BOOK3S_64=m
-CONFIG_KVM_BOOK3S_64_HV=m
+CONFIG_KVM_BOOK3S_HV_POSSIBLE=y
 CONFIG_VHOST_NET=m
 CONFIG_PRINTK_TIME=y
 CONFIG_PRINTK_CALLER=y
diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index 0ad2291337a7..46ace457def6 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -61,7 +61,7 @@ CONFIG_PCCARD=y
 CONFIG_ELECTRA_CF=y
 CONFIG_VIRTUALIZATION=y
 CONFIG_KVM_BOOK3S_64=m
-CONFIG_KVM_BOOK3S_64_HV=m
+CONFIG_KVM_BOOK3S_HV_POSSIBLE=y
 CONFIG_VHOST_NET=m
 CONFIG_KPROBES=y
 CONFIG_JUMP_LABEL=y
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index b183629f1bcf..6804b1a6bef1 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -321,7 +321,7 @@ CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
 CONFIG_CRYPTO_DEV_VMX=y
 CONFIG_VIRTUALIZATION=y
 CONFIG_KVM_BOOK3S_64=m
-CONFIG_KVM_BOOK3S_64_HV=m
+CONFIG_KVM_BOOK3S_HV_POSSIBLE=y
 CONFIG_VHOST_NET=m
 CONFIG_PRINTK_TIME=y
 CONFIG_PRINTK_CALLER=y
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index e45644657d49..2d080f57ce3b 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -41,12 +41,43 @@ config KVM_BOOK3S_64_HANDLER
 	select PPC_DAWR_FORCE_ENABLE
 
 config KVM_BOOK3S_PR_POSSIBLE
-	bool
+	bool "KVM support without using hypervisor mode in host"
+	depends on KVM_BOOK3S_64 || KVM_BOOK3S_32
 	select KVM_MMIO
 	select MMU_NOTIFIER
+	help
+	  Support running guest kernels in virtual machines on processors
+	  without using hypervisor mode in the host, by running the
+	  guest in user mode (problem state) and emulating all
+	  privileged instructions and registers.
+
+	  This is not as fast as using hypervisor mode, but works on
+	  where hypervisor mode is not available or not usable,
+	  and can emulate processors that are different from the host
+	  processor, including emulating 32-bit processors on a 64-bit
+	  host.
+
 
 config KVM_BOOK3S_HV_POSSIBLE
-	bool
+	bool "KVM for POWER7 and later using hypervisor mode in host"
+	depends on KVM_BOOK3S_64
+	select MMU_NOTIFIER
+	select CMA
+	help
+	  Support running unmodified book3s_64 guest kernels in
+	  virtual machines on POWER7 and newer processors that have
+	  hypervisor mode available to the host.
+
+	  If you say Y here, KVM will use the hardware virtualization
+	  facilities of POWER7 (and later) processors, meaning that
+	  guest operating systems will run at full hardware speed
+	  using supervisor and user modes.  However, this also means
+	  that KVM is not usable under PowerVM (pHyp), is only usable
+	  on POWER7 or later processors, and cannot emulate a
+	  different processor from the host processor.
+
+	  If unsure, say N.
+
 
 config KVM_BOOK3S_32
 	tristate "KVM support for PowerPC book3s_32 processors"
@@ -80,43 +111,6 @@ config KVM_BOOK3S_64
 
 	  If unsure, say N.
 
-config KVM_BOOK3S_64_HV
-	tristate "KVM for POWER7 and later using hypervisor mode in host"
-	depends on KVM_BOOK3S_64 && PPC_POWERNV
-	select KVM_BOOK3S_HV_POSSIBLE
-	select MMU_NOTIFIER
-	select CMA
-	help
-	  Support running unmodified book3s_64 guest kernels in
-	  virtual machines on POWER7 and newer processors that have
-	  hypervisor mode available to the host.
-
-	  If you say Y here, KVM will use the hardware virtualization
-	  facilities of POWER7 (and later) processors, meaning that
-	  guest operating systems will run at full hardware speed
-	  using supervisor and user modes.  However, this also means
-	  that KVM is not usable under PowerVM (pHyp), is only usable
-	  on POWER7 or later processors, and cannot emulate a
-	  different processor from the host processor.
-
-	  If unsure, say N.
-
-config KVM_BOOK3S_64_PR
-	tristate "KVM support without using hypervisor mode in host"
-	depends on KVM_BOOK3S_64
-	select KVM_BOOK3S_PR_POSSIBLE
-	help
-	  Support running guest kernels in virtual machines on processors
-	  without using hypervisor mode in the host, by running the
-	  guest in user mode (problem state) and emulating all
-	  privileged instructions and registers.
-
-	  This is not as fast as using hypervisor mode, but works on
-	  machines where hypervisor mode is not available or not usable,
-	  and can emulate processors that are different from the host
-	  processor, including emulating 32-bit processors on a 64-bit
-	  host.
-
 config KVM_BOOK3S_HV_EXIT_TIMING
 	bool "Detailed timing for hypervisor real-mode code"
 	depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 583c14ef596e..704380065df3 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -108,6 +108,14 @@ kvm-book3s_64-module-objs := \
 	book3s_rtas.o \
 	$(kvm-book3s_64-objs-y)
 
+ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+kvm-book3s_64-module-objs += $(kvm-hv-y)
+endif
+
+ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
+kvm-book3s_64-module-objs += $(kvm-pr-y)
+endif
+
 kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-module-objs)
 
 kvm-book3s_32-objs := \
@@ -134,7 +142,4 @@ obj-$(CONFIG_KVM_E500MC) += kvm.o
 obj-$(CONFIG_KVM_BOOK3S_64) += kvm.o
 obj-$(CONFIG_KVM_BOOK3S_32) += kvm.o
 
-obj-$(CONFIG_KVM_BOOK3S_64_PR) += kvm-pr.o
-obj-$(CONFIG_KVM_BOOK3S_64_HV) += kvm-hv.o
-
 obj-y += $(kvm-book3s_64-builtin-objs-y)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 79833f78d1da..c381bb17b0f9 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1065,6 +1065,24 @@ int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
 
 #endif /* CONFIG_KVM_XICS */
 
+static int kvmppc_init(void)
+{
+	int r;
+
+	r = kvmppc_book3s_init_hv();
+	if (r)
+		pr_err("KVM-HV: could not load (%d)\n", r);
+
+	r = kvmppc_book3s_init_pr();
+	if (r)
+		pr_err("KVM-PR: could not load (%d)\n", r);
+
+	if (!kvmppc_hv_ops && !kvmppc_pr_ops)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int kvmppc_book3s_init(void)
 {
 	int r;
@@ -1072,9 +1090,10 @@ static int kvmppc_book3s_init(void)
 	r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
 	if (r)
 		return r;
-#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
-	r = kvmppc_book3s_init_pr();
-#endif
+
+	r = kvmppc_init();
+	if (r)
+		goto exit;
 
 #ifdef CONFIG_KVM_XICS
 #ifdef CONFIG_KVM_XIVE
@@ -1087,22 +1106,35 @@ static int kvmppc_book3s_init(void)
 #endif
 		kvm_register_device_ops(&kvm_xics_ops, KVM_DEV_TYPE_XICS);
 #endif
+	return 0;
+
+exit:
+	kvm_exit();
 	return r;
 }
 
 static void kvmppc_book3s_exit(void)
 {
-#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
 	kvmppc_book3s_exit_pr();
-#endif
+	kvmppc_book3s_exit_hv();
 	kvm_exit();
 }
 
 module_init(kvmppc_book3s_init);
 module_exit(kvmppc_book3s_exit);
 
-/* On 32bit this is our one and only kernel module */
-#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
 MODULE_ALIAS_MISCDEV(KVM_MINOR);
 MODULE_ALIAS("devname:kvm");
+
+/*
+ * Whether we use KVM-HV or KVM-PR is dependent exclusively on the
+ * config options. These aliases are here only to ease the transition
+ * to the one module model we have now.
+ */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+MODULE_ALIAS("kvm-hv");
+#endif
+
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
+MODULE_ALIAS("kvm-pr");
 #endif
diff --git a/arch/powerpc/kvm/book3s.h b/arch/powerpc/kvm/book3s.h
index 740e51def5a5..d21772904971 100644
--- a/arch/powerpc/kvm/book3s.h
+++ b/arch/powerpc/kvm/book3s.h
@@ -22,8 +22,27 @@ extern int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu,
 					int sprn, ulong spr_val);
 extern int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu,
 					int sprn, ulong *spr_val);
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 extern int kvmppc_book3s_init_pr(void);
 extern void kvmppc_book3s_exit_pr(void);
+#else
+static inline int kvmppc_book3s_init_pr(void)
+{
+	return 0;
+}
+static inline void kvmppc_book3s_exit_pr(void) {}
+#endif
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+extern int kvmppc_book3s_init_hv(void);
+extern void kvmppc_book3s_exit_hv(void);
+#else
+static inline int kvmppc_book3s_init_hv(void)
+{
+	return 0;
+}
+static inline void kvmppc_book3s_exit_hv(void) {}
+#endif
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 extern void kvmppc_emulate_tabort(struct kvm_vcpu *vcpu, int ra_val);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a7b82eb438f5..6ba9545ef58e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5941,7 +5941,7 @@ static int kvmppc_radix_possible(void)
 	return cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled();
 }
 
-static int kvmppc_book3s_init_hv(void)
+int kvmppc_book3s_init_hv(void)
 {
 	int r;
 
@@ -6018,7 +6018,7 @@ static int kvmppc_book3s_init_hv(void)
 	return r;
 }
 
-static void kvmppc_book3s_exit_hv(void)
+void kvmppc_book3s_exit_hv(void)
 {
 	kvmppc_uvmem_free();
 	kvmppc_free_host_rm_ops();
@@ -6027,9 +6027,3 @@ static void kvmppc_book3s_exit_hv(void)
 	kvmppc_hv_ops = NULL;
 	kvmhv_nested_exit();
 }
-
-module_init(kvmppc_book3s_init_hv);
-module_exit(kvmppc_book3s_exit_hv);
-MODULE_LICENSE("GPL");
-MODULE_ALIAS_MISCDEV(KVM_MINOR);
-MODULE_ALIAS("devname:kvm");
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 6bc9425acb32..1d017c4b3eb4 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -2100,16 +2100,3 @@ void kvmppc_book3s_exit_pr(void)
 	kvmppc_pr_ops = NULL;
 	kvmppc_mmu_hpte_sysexit();
 }
-
-/*
- * We only support separate modules for book3s 64
- */
-#ifdef CONFIG_PPC_BOOK3S_64
-
-module_init(kvmppc_book3s_init_pr);
-module_exit(kvmppc_book3s_exit_pr);
-
-MODULE_LICENSE("GPL");
-MODULE_ALIAS_MISCDEV(KVM_MINOR);
-MODULE_ALIAS("devname:kvm");
-#endif
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 4e3c29bb603c..7c9c9e794c01 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -352,7 +352,7 @@ struct irq_desc *irq_to_desc(unsigned int irq)
 {
 	return radix_tree_lookup(&irq_desc_tree, irq);
 }
-#ifdef CONFIG_KVM_BOOK3S_64_HV_MODULE
+#ifdef CONFIG_KVM_BOOK3S_64_MODULE
 EXPORT_SYMBOL_GPL(irq_to_desc);
 #endif
 
-- 
2.29.2


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox