LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH] powerpc/fault: fix wrong KUAP fault for IO_URING
From: Christophe Leroy @ 2021-01-29 12:26 UTC
  To: Zorro Lang, Aneesh Kumar K.V; +Cc: Jens Axboe, linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210129065220.GS14354@localhost.localdomain>

+Aneesh

On 29/01/2021 at 07:52, Zorro Lang wrote:
> On Thu, Jan 28, 2021 at 03:44:21PM +0100, Christophe Leroy wrote:
>>
>>
>> On 28/01/2021 at 15:42, Jens Axboe wrote:
>>> On 1/28/21 6:52 AM, Zorro Lang wrote:
>>>> On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
>>>>> On 1/27/21 8:13 PM, Zorro Lang wrote:
>>>>>> On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
>>>>>>> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
>>>>>>>> On 1/27/21 9:38 AM, Christophe Leroy wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 27/01/2021 at 15:56, Zorro Lang wrote:
>>>>>>>>>> On powerpc, an io_uring test hit the KUAP fault below in __do_page_fault.
>>>>>>>>>> The failing source line is:
>>>>>>>>>>
>>>>>>>>>>      if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
>>>>>>>>>>          return SIGSEGV;
>>>>>>>>>>
>>>>>>>>>> is_user is based on user_mode(regs) only. This is not suitable for
>>>>>>>>>> io_uring, where the helper thread can assume the user app's identity
>>>>>>>>>> and could handle this fault just fine. So use the mm instead to decide
>>>>>>>>>> whether the fault is valid.
>>>>>>>>>
>>>>>>>>> I don't understand why testing is_user would be an issue. KUAP's purpose
>>>>>>>>> is to block any unallowed access from the kernel to user memory
>>>>>>>>> (equivalent to SMAP on x86). So it really must be based on the MSR_PR bit,
>>>>>>>>> which is what is_user provides.
>>>>>>>>>
>>>>>>>>> If the kernel access is legitimate, the kernel should have opened
>>>>>>>>> userspace access, and then you shouldn't get this "Bug: Read fault blocked
>>>>>>>>> by KUAP!".
>>>>>>>>>
>>>>>>>>> As far as I understand, the fault occurs in
>>>>>>>>> iov_iter_fault_in_readable(), which calls fault_in_pages_readable(), and
>>>>>>>>> fault_in_pages_readable() uses __get_user(), so it is a legitimate
>>>>>>>>> access and you really shouldn't get a KUAP fault.
>>>>>>>>>
>>>>>>>>> So the problem is somewhere else. I think your proposed patch just
>>>>>>>>> hides the problem; it doesn't fix it.
>>>>>>>>
>>>>>>>> If we do kthread_use_mm(), can we agree that the user access is valid?
>>>>>>>
>>>>>>> Yeah, the io_uring code is fine, provided it uses the uaccess primitives
>>>>>>> like any other kernel code. It's looking more like an arch/powerpc bug.
>>>>>>>
>>>>>>>> We should be able to copy to/from user space, and including faults, if
>>>>>>>> that's been done and the new mm assigned. Because it really should be.
>>>>>>>> If SMAP was a problem on x86, we would have seen it long ago.
>>>>>>>>
>>>>>>>> I'm assuming this may be breakage related to the recent uaccess changes
>>>>>>>> related to set_fs and friends? Or maybe recent changes on the powerpc
>>>>>>>> side?
>>>>>>>>
>>>>>>>> Zorro, did 5.10 work?
>>>>>>>
>>>>>>> Would be interesting to know.
>>>>>>
>>>>>> Sure Nick and Jens, which 5.10 rc version do you want to know about? Or any
>>>>>> git commit (as the HEAD) in the 5.10 phase?
>>>>>
>>>>> I forget which versions had what series of this, but 5.10 final - and if
>>>>> that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
>>>>> 5.10 definitely has them.
>>>>
>>>> I just built linux v5.10 with the same .config file and ran the same test.
>>>> v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug:
>>>>
>>>> # ./check generic/013 generic/051
>>>> FSTYP         -- xfs (non-debug)
>>>> PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
>>>> MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
>>>> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
>>>>
>>>> generic/013 138s ...  77s
>>>> generic/051 103s ...  143s
>>>> Ran: generic/013 generic/051
>>>> Passed all 2 tests
>>>
>>> Thanks for testing that, so I think it's safe to conclude that there's a
>>> regression in powerpc fault handling for kthreads that use
>>> kthread_use_mm in this release. A bisect would definitely find it, but
>>> might be pointless if Christophe or Nick already have an idea of what it
>>> is.
>>>
>>
>> I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops.
> 
> I just uploaded the vmlinux and .config file. The vmlinux is too big, so I had
> to upload it to my Google Drive and share the links below:
> 
> config file: https://drive.google.com/file/d/1pMszboxdjbMPqSNXnMH-1UCZC-vtDnI9/view?usp=sharing
> vmlinux: https://drive.google.com/file/d/1s7g2eBPAFFV61aM2dO9bvVTERGQ9mLYk/view?usp=sharing
> 
> I used the latest upstream mainline Linux; the HEAD commit is:
> 76c057c84d (HEAD -> master, origin/master, origin/HEAD) Merge branch 'parisc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
> 
> The test failed on this kernel as follows:
> 
> # dmesg
> [   96.200296] ------------[ cut here ]------------
> [   96.200304] Bug: Read fault blocked by KUAP!
> [   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310

> [   96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350
> [   96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350
> [   96.200747] --- interrupt: 300


The problem happens in a section where userspace access is supposed to be granted,
so the patch you proposed is definitely not the right fix.

c000000000849408:	2c 01 00 4c 	isync
c00000000084940c:	a6 03 3d 7d 	mtspr   29,r9  <== granting userspace access permission
c000000000849410:	2c 01 00 4c 	isync
c000000000849414:	00 00 36 e9 	ld      r9,0(r22)
c000000000849418:	20 00 29 81 	lwz     r9,32(r9)
c00000000084941c:	00 02 29 71 	andi.   r9,r9,512
c000000000849420:	78 d3 5e 7f 	mr      r30,r26
==> c000000000849424:	00 00 bf 8b 	lbz     r29,0(r31)  <== accessing userspace
c000000000849428:	10 00 82 41 	beq     c000000000849438 <fault_in_pages_readable+0x118>
c00000000084942c:	2c 01 00 4c 	isync
c000000000849430:	a6 03 bd 7e 	mtspr   29,r21  <== clearing userspace access permission
c000000000849434:	2c 01 00 4c 	isync

My first guess is that the problem is linked to the following function (see the comment):

/*
  * For kernel thread that doesn't have thread.regs return
  * default AMR/IAMR values.
  */
static inline u64 current_thread_amr(void)
{
	if (current->thread.regs)
		return current->thread.regs->amr;
	return AMR_KUAP_BLOCKED;
}

The above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys:
Don't update SPRN_AMR when in kernel mode").
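
To make the suspicion concrete, here is a minimal user-space sketch of how a per-thread default of "blocked" could defeat an open uaccess window. It is only a toy model: every `toy_*` name and both AMR values are hypothetical, and the idea that something reloads AMR from the per-thread default while the window is open is an assumption of the sketch, not something established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in values; the real AMR encoding differs. */
#define TOY_AMR_OPEN    UINT64_C(0)
#define TOY_AMR_BLOCKED UINT64_C(1)

struct toy_regs { uint64_t amr; };
struct toy_thread { struct toy_regs *regs; /* NULL models a kernel thread */ };

static uint64_t toy_amr = TOY_AMR_BLOCKED; /* models the AMR register */

/* Same shape as current_thread_amr(): no regs means "blocked". */
static uint64_t toy_thread_amr(const struct toy_thread *t)
{
	assert(t != NULL);
	if (t->regs)
		return t->regs->amr;
	return TOY_AMR_BLOCKED;
}

/*
 * Opens the window, optionally reloads AMR from the per-thread default
 * (the assumed event), then reports whether a user read would be blocked.
 */
static bool toy_read_blocked_in_window(const struct toy_thread *t,
				       bool reload_inside_window)
{
	bool blocked;

	toy_amr = TOY_AMR_OPEN;			/* grant userspace access */
	if (reload_inside_window)
		toy_amr = toy_thread_amr(t);	/* assumed reload */
	blocked = (toy_amr != TOY_AMR_OPEN);	/* the lbz would fault here */
	toy_amr = TOY_AMR_BLOCKED;		/* revoke userspace access */
	return blocked;
}
```

In this model only a thread without regs is affected, and only when the reload lands inside the window, which would be consistent with a kthread that adopted a user mm hitting the warning while ordinary tasks do not.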

Aneesh, any idea?

Christophe


* Re: [PATCH 03/13] module: unexport find_module and module_mutex
From: Miroslav Benes @ 2021-01-29 15:16 UTC
  To: Christoph Hellwig
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, Daniel Vetter, linuxppc-dev
In-Reply-To: <20210128181421.2279-4-hch@lst.de>

On Thu, 28 Jan 2021, Christoph Hellwig wrote:

> find_module is not used by modular code any more, and random driver code
> has no business calling it to start with.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Miroslav Benes <mbenes@suse.cz>

M


* Re: [PATCH 04/13] module: use RCU to synchronize find_module
From: Miroslav Benes @ 2021-01-29 15:29 UTC
  To: Christoph Hellwig
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, Daniel Vetter, linuxppc-dev
In-Reply-To: <20210128181421.2279-5-hch@lst.de>

On Thu, 28 Jan 2021, Christoph Hellwig wrote:

> Allow for a RCU-sched critical section around find_module, following
> the lower level find_module_all helper, and switch the two callers
> outside of module.c to use such a RCU-sched critical section instead
> of module_mutex.

That's a nice idea.
 
> @@ -57,7 +58,7 @@ static void klp_find_object_module(struct klp_object *obj)
>  	if (!klp_is_module(obj))
>  		return;
>  
> -	mutex_lock(&module_mutex);
> +	rcu_read_lock_sched();
>  	/*
>  	 * We do not want to block removal of patched modules and therefore
>  	 * we do not take a reference here. The patches are removed by
> @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object *obj)
>  	if (mod && mod->klp_alive)

RCU always baffles me a bit, so I'll ask. Don't we need
rcu_dereference_sched() here? "mod" comes from an RCU-protected list, so I
wonder.

>  		obj->mod = mod;
>  
> -	mutex_unlock(&module_mutex);
> +	rcu_read_unlock_sched();
>  }

Thanks

Miroslav
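
The question above is about how a reader picks up a pointer that an updater has published. A rough user-space analogy in C11 atomics, not kernel code, is sketched below. The `toy_*` names are hypothetical, and the acquire load is a stronger stand-in for the dependency-ordered load that the rcu_dereference family provides. Whether klp_find_object_module() needs an explicit rcu_dereference_sched() depends on whether the lookup helper already performs such a load internally, which this sketch does not settle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_module {
	const char *name;
	int klp_alive;
};

/* Single published slot standing in for an RCU-protected list entry. */
static _Atomic(struct toy_module *) toy_slot;

/* Updater side: initialise the object first, then publish the pointer. */
static void toy_publish(struct toy_module *mod)
{
	assert(mod != NULL);
	atomic_store_explicit(&toy_slot, mod, memory_order_release);
}

/* Reader side: the ordered load plays the role of rcu_dereference(). */
static struct toy_module *toy_lookup(void)
{
	return atomic_load_explicit(&toy_slot, memory_order_acquire);
}

/* Mirrors the "mod && mod->klp_alive" check in the quoted hunk. */
static int toy_is_alive(void)
{
	struct toy_module *mod = toy_lookup();

	return mod && mod->klp_alive;
}
```

The point of the ordered load is that the reader sees the fields written before the pointer was published; a plain load carries no such guarantee.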


* [PATCH AUTOSEL 5.10 18/41] scsi: ibmvfc: Set default timeout to avoid crash during migration
From: Sasha Levin @ 2021-01-29 15:36 UTC
  To: linux-kernel, stable
  Cc: Brian King, Sasha Levin, linuxppc-dev, linux-scsi,
	Martin K . Petersen
In-Reply-To: <20210129153713.1592185-1-sashal@kernel.org>

From: Brian King <brking@linux.vnet.ibm.com>

[ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ]

While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:

CPU 0                          CPU 1
-------------------------------  ----------------------------------
scsi_queue_rq                    migration_store
 -> blk_mq_start_request          -> rtas_ibm_suspend_me
  -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me
              _______________________________________V
             |
             V
    -> IPI from CPU 1
     -> rtas_percpu_suspend_me
                                     -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                       -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.

In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

The patch below simply increases the default I/O timeout to avoid this race
condition. This is also the timeout value that nearly all IBM SAN storage
recommends setting as the default value.

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 070cf516b98fe..57c9a71fa33a7 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2957,8 +2957,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev)
 	unsigned long flags = 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (sdev->type == TYPE_DISK)
+	if (sdev->type == TYPE_DISK) {
 		sdev->allow_restart = 1;
+		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return 0;
 }
-- 
2.27.0
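
The effect of the longer timeout on the race described in the commit message can be sketched with a small user-space model. It assumes the previous timeout was 30 seconds (as the "> 30 seconds" in the scenario suggests) and that time keeps advancing while the partition is suspended; `TOY_HZ` and the `toy_*` helpers are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_HZ 100UL /* hypothetical tick rate, standing in for HZ */

/* Converts seconds to ticks, like the "120 * HZ" expression in the patch. */
static unsigned long toy_timeout_ticks(unsigned long seconds)
{
	return seconds * TOY_HZ;
}

/*
 * Returns true if a request timer armed just before the suspension has
 * already expired by the time the partition resumes.
 */
static bool toy_timer_expired_on_resume(unsigned long timeout_s,
					unsigned long suspended_s)
{
	assert(timeout_s > 0);
	return toy_timeout_ticks(suspended_s) >= toy_timeout_ticks(timeout_s);
}
```

For a 45 second suspension the model reports an expired timer with a 30 second timeout and a live one with 120 seconds. A suspension of 120 seconds or more would still expire the timer, so in this model the change narrows the window rather than closing it.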



* [PATCH AUTOSEL 5.10 28/41] selftests/powerpc: Only test lwm/stmw on big endian
From: Sasha Levin @ 2021-01-29 15:36 UTC
  To: linux-kernel, stable
  Cc: Sasha Levin, linuxppc-dev, linux-kselftest, Libor Pechacek
In-Reply-To: <20210129153713.1592185-1-sashal@kernel.org>

From: Michael Ellerman <mpe@ellerman.id.au>

[ Upstream commit dd3a44c06f7b4f14e90065bf05d62c255b20005f ]

Newer binutils (>= 2.36) refuse to assemble lmw/stmw when building in
little endian mode. That breaks compilation of our alignment handler
test:

  /tmp/cco4l14N.s: Assembler messages:
  /tmp/cco4l14N.s:1440: Error: `lmw' invalid when little-endian
  /tmp/cco4l14N.s:1814: Error: `stmw' invalid when little-endian
  make[2]: *** [../../lib.mk:139: /output/kselftest/powerpc/alignment/alignment_handler] Error 1

These tests do pass on little endian machines, as the kernel will
still emulate those instructions even when running little
endian (which is arguably a kernel bug).

But we don't really need to test that case, so ifdef those
instructions out to get the alignment test building again.

Reported-by: Libor Pechacek <lpechacek@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Libor Pechacek <lpechacek@suse.com>
Link: https://lore.kernel.org/r/20210119041800.3093047-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../testing/selftests/powerpc/alignment/alignment_handler.c  | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index cb53a8b777e68..c25cf7cd45e9f 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -443,7 +443,6 @@ int test_alignment_handler_integer(void)
 	LOAD_DFORM_TEST(ldu);
 	LOAD_XFORM_TEST(ldx);
 	LOAD_XFORM_TEST(ldux);
-	LOAD_DFORM_TEST(lmw);
 	STORE_DFORM_TEST(stb);
 	STORE_XFORM_TEST(stbx);
 	STORE_DFORM_TEST(stbu);
@@ -462,7 +461,11 @@ int test_alignment_handler_integer(void)
 	STORE_XFORM_TEST(stdx);
 	STORE_DFORM_TEST(stdu);
 	STORE_XFORM_TEST(stdux);
+
+#ifdef __BIG_ENDIAN__
+	LOAD_DFORM_TEST(lmw);
 	STORE_DFORM_TEST(stmw);
+#endif
 
 	return rc;
 }
-- 
2.27.0
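
The guard used in the patch can be tried outside the selftest with a short stand-alone sketch. It assumes, as the patch does, that the compiler predefines `__BIG_ENDIAN__` only when targeting big endian; the `toy_*` helpers are hypothetical and do not exist in the selftest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compile-time decision, same guard as in the patch. */
static int toy_multiword_tests_built(void)
{
#ifdef __BIG_ENDIAN__
	return 1;	/* lmw/stmw tests would be compiled in */
#else
	return 0;	/* guarded out, so the assembler never sees them */
#endif
}

/* Run-time probe: inspect the first byte in memory of a known value. */
static int toy_runtime_is_big_endian(void)
{
	const uint16_t probe = 0x0102;
	unsigned char first;

	memcpy(&first, &probe, sizeof(first));
	return first == 0x01;
}

/* Number of tests that end up in the binary: the base set plus lmw and stmw. */
static int toy_count_enabled_tests(int base_tests)
{
	assert(base_tests >= 0);
	return base_tests + (toy_multiword_tests_built() ? 2 : 0);
}
```

Because the decision is made by the preprocessor, the guarded instructions are never handed to the assembler on a little-endian build, which is what gets the test compiling again with newer binutils.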



* [PATCH AUTOSEL 5.4 10/19] scsi: ibmvfc: Set default timeout to avoid crash during migration
From: Sasha Levin @ 2021-01-29 15:37 UTC
  To: linux-kernel, stable
  Cc: Brian King, Sasha Levin, linuxppc-dev, linux-scsi,
	Martin K . Petersen
In-Reply-To: <20210129153806.1592565-1-sashal@kernel.org>

From: Brian King <brking@linux.vnet.ibm.com>

[ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ]

While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:

CPU 0                          CPU 1
-------------------------------  ----------------------------------
scsi_queue_rq                    migration_store
 -> blk_mq_start_request          -> rtas_ibm_suspend_me
  -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me
              _______________________________________V
             |
             V
    -> IPI from CPU 1
     -> rtas_percpu_suspend_me
                                     -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                       -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.

In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

The patch below simply increases the default I/O timeout to avoid this race
condition. This is also the timeout value that nearly all IBM SAN storage
recommends setting as the default value.

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 8a76284b59b08..523809a8a2323 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2881,8 +2881,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev)
 	unsigned long flags = 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (sdev->type == TYPE_DISK)
+	if (sdev->type == TYPE_DISK) {
 		sdev->allow_restart = 1;
+		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return 0;
 }
-- 
2.27.0



* [PATCH AUTOSEL 5.4 14/19] selftests/powerpc: Only test lwm/stmw on big endian
From: Sasha Levin @ 2021-01-29 15:38 UTC
  To: linux-kernel, stable
  Cc: Sasha Levin, linuxppc-dev, linux-kselftest, Libor Pechacek
In-Reply-To: <20210129153806.1592565-1-sashal@kernel.org>

From: Michael Ellerman <mpe@ellerman.id.au>

[ Upstream commit dd3a44c06f7b4f14e90065bf05d62c255b20005f ]

Newer binutils (>= 2.36) refuse to assemble lmw/stmw when building in
little endian mode. That breaks compilation of our alignment handler
test:

  /tmp/cco4l14N.s: Assembler messages:
  /tmp/cco4l14N.s:1440: Error: `lmw' invalid when little-endian
  /tmp/cco4l14N.s:1814: Error: `stmw' invalid when little-endian
  make[2]: *** [../../lib.mk:139: /output/kselftest/powerpc/alignment/alignment_handler] Error 1

These tests do pass on little endian machines, as the kernel will
still emulate those instructions even when running little
endian (which is arguably a kernel bug).

But we don't really need to test that case, so ifdef those
instructions out to get the alignment test building again.

Reported-by: Libor Pechacek <lpechacek@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Libor Pechacek <lpechacek@suse.com>
Link: https://lore.kernel.org/r/20210119041800.3093047-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../testing/selftests/powerpc/alignment/alignment_handler.c  | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 0453c50c949cb..0725239bbd85c 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -380,7 +380,6 @@ int test_alignment_handler_integer(void)
 	LOAD_DFORM_TEST(ldu);
 	LOAD_XFORM_TEST(ldx);
 	LOAD_XFORM_TEST(ldux);
-	LOAD_DFORM_TEST(lmw);
 	STORE_DFORM_TEST(stb);
 	STORE_XFORM_TEST(stbx);
 	STORE_DFORM_TEST(stbu);
@@ -399,7 +398,11 @@ int test_alignment_handler_integer(void)
 	STORE_XFORM_TEST(stdx);
 	STORE_DFORM_TEST(stdu);
 	STORE_XFORM_TEST(stdux);
+
+#ifdef __BIG_ENDIAN__
+	LOAD_DFORM_TEST(lmw);
 	STORE_DFORM_TEST(stmw);
+#endif
 
 	return rc;
 }
-- 
2.27.0



* [PATCH AUTOSEL 4.19 08/12] scsi: ibmvfc: Set default timeout to avoid crash during migration
From: Sasha Levin @ 2021-01-29 15:38 UTC
  To: linux-kernel, stable
  Cc: Brian King, Sasha Levin, linuxppc-dev, linux-scsi,
	Martin K . Petersen
In-Reply-To: <20210129153831.1592769-1-sashal@kernel.org>

From: Brian King <brking@linux.vnet.ibm.com>

[ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ]

While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:

CPU 0                          CPU 1
-------------------------------  ----------------------------------
scsi_queue_rq                    migration_store
 -> blk_mq_start_request          -> rtas_ibm_suspend_me
  -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me
              _______________________________________V
             |
             V
    -> IPI from CPU 1
     -> rtas_percpu_suspend_me
                                     -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                       -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.

In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

The patch below simply increases the default I/O timeout to avoid this race
condition. This is also the timeout value that nearly all IBM SAN storage
recommends setting as the default value.

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 090ab377f65e5..50078a199fea0 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2890,8 +2890,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev)
 	unsigned long flags = 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (sdev->type == TYPE_DISK)
+	if (sdev->type == TYPE_DISK) {
 		sdev->allow_restart = 1;
+		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return 0;
 }
-- 
2.27.0



* [PATCH AUTOSEL 4.19 09/12] selftests/powerpc: Only test lwm/stmw on big endian
From: Sasha Levin @ 2021-01-29 15:38 UTC
  To: linux-kernel, stable
  Cc: Sasha Levin, linuxppc-dev, linux-kselftest, Libor Pechacek
In-Reply-To: <20210129153831.1592769-1-sashal@kernel.org>

From: Michael Ellerman <mpe@ellerman.id.au>

[ Upstream commit dd3a44c06f7b4f14e90065bf05d62c255b20005f ]

Newer binutils (>= 2.36) refuse to assemble lmw/stmw when building in
little endian mode. That breaks compilation of our alignment handler
test:

  /tmp/cco4l14N.s: Assembler messages:
  /tmp/cco4l14N.s:1440: Error: `lmw' invalid when little-endian
  /tmp/cco4l14N.s:1814: Error: `stmw' invalid when little-endian
  make[2]: *** [../../lib.mk:139: /output/kselftest/powerpc/alignment/alignment_handler] Error 1

These tests do pass on little endian machines, as the kernel will
still emulate those instructions even when running little
endian (which is arguably a kernel bug).

But we don't really need to test that case, so ifdef those
instructions out to get the alignment test building again.

Reported-by: Libor Pechacek <lpechacek@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Libor Pechacek <lpechacek@suse.com>
Link: https://lore.kernel.org/r/20210119041800.3093047-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../testing/selftests/powerpc/alignment/alignment_handler.c  | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 169a8b9719fb9..4f8335e0c9858 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -384,7 +384,6 @@ int test_alignment_handler_integer(void)
 	LOAD_DFORM_TEST(ldu);
 	LOAD_XFORM_TEST(ldx);
 	LOAD_XFORM_TEST(ldux);
-	LOAD_DFORM_TEST(lmw);
 	STORE_DFORM_TEST(stb);
 	STORE_XFORM_TEST(stbx);
 	STORE_DFORM_TEST(stbu);
@@ -403,7 +402,11 @@ int test_alignment_handler_integer(void)
 	STORE_XFORM_TEST(stdx);
 	STORE_DFORM_TEST(stdu);
 	STORE_XFORM_TEST(stdux);
+
+#ifdef __BIG_ENDIAN__
+	LOAD_DFORM_TEST(lmw);
 	STORE_DFORM_TEST(stmw);
+#endif
 
 	return rc;
 }
-- 
2.27.0



* [PATCH AUTOSEL 4.14 6/8] scsi: ibmvfc: Set default timeout to avoid crash during migration
From: Sasha Levin @ 2021-01-29 15:38 UTC
  To: linux-kernel, stable
  Cc: Brian King, Sasha Levin, linuxppc-dev, linux-scsi,
	Martin K . Petersen
In-Reply-To: <20210129153848.1592916-1-sashal@kernel.org>

From: Brian King <brking@linux.vnet.ibm.com>

[ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ]

While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:

CPU 0                          CPU 1
-------------------------------  ----------------------------------
scsi_queue_rq                    migration_store
 -> blk_mq_start_request          -> rtas_ibm_suspend_me
  -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me
              _______________________________________V
             |
             V
    -> IPI from CPU 1
     -> rtas_percpu_suspend_me
                                     -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                       -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.

In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

The patch below simply increases the default I/O timeout to avoid this race
condition. This is also the timeout value that nearly all IBM SAN storage
recommends setting as the default value.

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index dbacd9830d3df..460014ded14de 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2891,8 +2891,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev)
 	unsigned long flags = 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (sdev->type == TYPE_DISK)
+	if (sdev->type == TYPE_DISK) {
 		sdev->allow_restart = 1;
+		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return 0;
 }
-- 
2.27.0



* [PATCH AUTOSEL 4.9 3/4] scsi: ibmvfc: Set default timeout to avoid crash during migration
From: Sasha Levin @ 2021-01-29 15:38 UTC
  To: linux-kernel, stable
  Cc: Brian King, Sasha Levin, linuxppc-dev, linux-scsi,
	Martin K . Petersen
In-Reply-To: <20210129153859.1593031-1-sashal@kernel.org>

From: Brian King <brking@linux.vnet.ibm.com>

[ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ]

While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:

CPU 0                          CPU 1
-------------------------------  ----------------------------------
scsi_queue_rq                    migration_store
 -> blk_mq_start_request          -> rtas_ibm_suspend_me
  -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me
              _______________________________________V
             |
             V
    -> IPI from CPU 1
     -> rtas_percpu_suspend_me
                                     -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                       -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.

In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

The patch below simply increases the default I/O timeout to avoid this race
condition. This is also the timeout value that nearly all IBM SAN storage
recommends setting as the default value.

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 04b3ac17531db..7865feb8e5e83 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2891,8 +2891,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev)
 	unsigned long flags = 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (sdev->type == TYPE_DISK)
+	if (sdev->type == TYPE_DISK) {
 		sdev->allow_restart = 1;
+		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return 0;
 }
-- 
2.27.0
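
The timing argument in the commit message can be put as a small sketch. It is not kernel code: the helper name and both constants are illustrative, and the 30 second figure is an assumption taken from the "suspended for > 30 seconds" note in the trace. It only shows why a timer armed before the suspend has already expired on resume with the old value but not with 120 seconds.

```c
#include <stdbool.h>

/* Illustrative values, not read from the kernel sources. */
#define OLD_TIMEOUT_SECS	30	/* assumed default before the patch */
#define NEW_TIMEOUT_SECS	120	/* matches 120 * HZ in the patch */

/*
 * Hypothetical helper: the request timer is armed just before the
 * partition is suspended. On resume the timer has expired if the
 * suspension lasted at least as long as the timeout.
 */
static bool timer_expired_on_resume(unsigned int timeout_secs,
				    unsigned int suspended_secs)
{
	return suspended_secs >= timeout_secs;
}
```

With a 35 second suspension the old timeout has expired on resume, so the timeout handler can race with the requeue path; with 120 seconds it has not, which narrows the race window rather than closing it.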



* [PATCH AUTOSEL 4.4 2/2] scsi: ibmvfc: Set default timeout to avoid crash during migration
From: Sasha Levin @ 2021-01-29 15:39 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Brian King, Sasha Levin, linuxppc-dev, linux-scsi,
	Martin K . Petersen
In-Reply-To: <20210129153906.1593114-1-sashal@kernel.org>

From: Brian King <brking@linux.vnet.ibm.com>

[ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ]

While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:

CPU 0                          CPU 1
-------------------------------  ----------------------------------
scsi_queue_rq                    migration_store
 -> blk_mq_start_request          -> rtas_ibm_suspend_me
  -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me
              _______________________________________V
             |
             V
    -> IPI from CPU 1
     -> rtas_percpu_suspend_me
                                     -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                       -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.

In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

The patch below simply increases the default I/O timeout to avoid this race
condition. This is also the timeout value that nearly all IBM SAN storage
recommends setting as the default value.

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index db80ab8335dfb..aa74f72e582ab 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2883,8 +2883,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev)
 	unsigned long flags = 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (sdev->type == TYPE_DISK)
+	if (sdev->type == TYPE_DISK) {
 		sdev->allow_restart = 1;
+		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return 0;
 }
-- 
2.27.0



* Re: [PATCH v2] tpm: ibmvtpm: fix error return code in tpm_ibmvtpm_probe()
From: Jarkko Sakkinen @ 2021-01-29 17:35 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Wang Hai, linux-kernel, Hulk Robot, paulus, linux-integrity,
	linuxppc-dev, Stefan Berger
In-Reply-To: <20210126014753.340299-1-stefanb@linux.vnet.ibm.com>

On Mon, Jan 25, 2021 at 08:47:53PM -0500, Stefan Berger wrote:
> From: Stefan Berger <stefanb@linux.ibm.com>
> 
> Return error code -ETIMEDOUT rather than '0' when waiting for the
> rtce_buf to be set has timed out.
> 
> Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before proceeding")
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Wang Hai <wanghai38@huawei.com>
> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> ---

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

Thanks! Should I add

Cc: stable@vger.kernel.org to this?

/Jarkko


>  drivers/char/tpm/tpm_ibmvtpm.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
> index 994385bf37c0..813eb2cac0ce 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.c
> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
> @@ -687,6 +687,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
>  				ibmvtpm->rtce_buf != NULL,
>  				HZ)) {
>  		dev_err(dev, "CRQ response timed out\n");
> +		rc = -ETIMEDOUT;
>  		goto init_irq_cleanup;
>  	}
>  
> -- 
> 2.25.4
> 
> 


* Re: [PATCH v2] tpm: ibmvtpm: fix error return code in tpm_ibmvtpm_probe()
From: Stefan Berger @ 2021-01-29 18:57 UTC (permalink / raw)
  To: Jarkko Sakkinen, Stefan Berger
  Cc: Wang Hai, linux-kernel, Hulk Robot, paulus, linux-integrity,
	linuxppc-dev
In-Reply-To: <YBRHfZeqAirQolIN@kernel.org>

On 1/29/21 12:35 PM, Jarkko Sakkinen wrote:
> On Mon, Jan 25, 2021 at 08:47:53PM -0500, Stefan Berger wrote:
>> From: Stefan Berger <stefanb@linux.ibm.com>
>>
>> Return error code -ETIMEDOUT rather than '0' when waiting for the
>> rtce_buf to be set has timed out.
>>
>> Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before proceeding")
>> Reported-by: Hulk Robot <hulkci@huawei.com>
>> Signed-off-by: Wang Hai <wanghai38@huawei.com>
>> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
>> ---
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
>
> Thanks! Should I add
>
> Cc: stable@vger.kernel.org to this?

Yes, that would be good! Thank you!


    Stefan


>
> /Jarkko
>
>
>>   drivers/char/tpm/tpm_ibmvtpm.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
>> index 994385bf37c0..813eb2cac0ce 100644
>> --- a/drivers/char/tpm/tpm_ibmvtpm.c
>> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
>> @@ -687,6 +687,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
>>   				ibmvtpm->rtce_buf != NULL,
>>   				HZ)) {
>>   		dev_err(dev, "CRQ response timed out\n");
>> +		rc = -ETIMEDOUT;
>>   		goto init_irq_cleanup;
>>   	}
>>   
>> -- 
>> 2.25.4
>>
>>
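
For readers skimming the thread, the bug class fixed by the quoted patch can be sketched in isolation. This is a minimal stand-in, not the driver's code: probe_sketch() and its parameter are hypothetical, and the cleanup label only marks where resources would be released. It shows why a goto to a shared cleanup path needs rc set first.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a probe routine with a shared error path. */
static int probe_sketch(bool buffer_ready_in_time)
{
	int rc = 0;	/* earlier setup steps succeeded, so rc is still 0 */

	if (!buffer_ready_in_time) {
		/* Without this assignment the error path would return 0. */
		rc = -ETIMEDOUT;
		goto cleanup;
	}

	return 0;

cleanup:
	/* Resources acquired earlier would be released here. */
	return rc;
}
```

A caller that treats 0 as success would otherwise consider the device probed even though initialization was abandoned.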



* Re: [PATCH] vio: make remove callback return void
From: Tyrel Datwyler @ 2021-01-29 18:58 UTC (permalink / raw)
  To: Uwe Kleine-König, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, David S. Miller, Jens Axboe, Matt Mackall,
	Herbert Xu, Peter Huewe, Jarkko Sakkinen, Jason Gunthorpe,
	Haren Myneni, Breno Leitão, Nayna Jain,
	Paulo Flabiano Smorigo, Steven Royer, Arnd Bergmann,
	Greg Kroah-Hartman, Cristobal Forno, Jakub Kicinski, Dany Madden,
	Lijun Pan, Sukadev Bhattiprolu, James E.J. Bottomley,
	Martin K. Petersen, Michael Cyr, Jiri Slaby
  Cc: linux-scsi, netdev, linux-kernel, linux-block, target-devel,
	linux-crypto, sparclinux, linux-integrity, linuxppc-dev
In-Reply-To: <20210127215010.99954-1-uwe@kleine-koenig.org>

On 1/27/21 1:50 PM, Uwe Kleine-König wrote:
> The driver core ignores the return value of struct bus_type::remove()
> because there is little that can be done. To simplify the quest to
> make this function return void, let struct vio_driver::remove() return
> void, too. All users already unconditionally return 0; this commit makes
> it obvious, also for future driver authors, that returning an error
> code is a bad idea and isn't intended.
> 
> Note there are two nominally different implementations for a vio bus:
> one in arch/sparc/kernel/vio.c and the other in
> arch/powerpc/platforms/pseries/vio.c. I didn't care to check which
> driver is using which of these busses (or if even some of them can be
> used with both) and simply adapt all drivers and the two bus codes in
> one go.
> 
> Note that for the powerpc implementation there is a semantic change:
> Before this patch, vio_cmo_bus_remove(viodev) wasn't called for a device
> that was bound to a driver without a remove callback. As the device core
> still considers the device unbound after vio_bus_remove() returns,
> calling this unconditionally is the consistent behaviour, which is
> implemented here.
> 
> Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>

Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>

> ---
> Hello,
> 
> note that this change depends on
> https://lore.kernel.org/r/20210121062005.53271-1-ljp@linux.ibm.com which removes
> an (ignored) return -EBUSY in drivers/net/ethernet/ibm/ibmvnic.c.
> I don't know when/if this latter patch will be applied, so it might take
> some time until my patch can go in.
> 
> Best regards
> Uwe
> 
>  arch/powerpc/include/asm/vio.h           | 2 +-
>  arch/powerpc/platforms/pseries/vio.c     | 7 +++----
>  arch/sparc/include/asm/vio.h             | 2 +-
>  arch/sparc/kernel/ds.c                   | 6 ------
>  arch/sparc/kernel/vio.c                  | 4 ++--
>  drivers/block/sunvdc.c                   | 3 +--
>  drivers/char/hw_random/pseries-rng.c     | 3 +--
>  drivers/char/tpm/tpm_ibmvtpm.c           | 4 +---
>  drivers/crypto/nx/nx-842-pseries.c       | 4 +---
>  drivers/crypto/nx/nx.c                   | 4 +---
>  drivers/misc/ibmvmc.c                    | 4 +---
>  drivers/net/ethernet/ibm/ibmveth.c       | 4 +---
>  drivers/net/ethernet/ibm/ibmvnic.c       | 4 +---
>  drivers/net/ethernet/sun/ldmvsw.c        | 4 +---
>  drivers/net/ethernet/sun/sunvnet.c       | 3 +--
>  drivers/scsi/ibmvscsi/ibmvfc.c           | 3 +--
>  drivers/scsi/ibmvscsi/ibmvscsi.c         | 4 +---
>  drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 4 +---
>  drivers/tty/hvc/hvcs.c                   | 3 +--
>  drivers/tty/vcc.c                        | 4 +---
>  20 files changed, 22 insertions(+), 54 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/vio.h b/arch/powerpc/include/asm/vio.h
> index 0cf52746531b..721c0d6715ac 100644
> --- a/arch/powerpc/include/asm/vio.h
> +++ b/arch/powerpc/include/asm/vio.h
> @@ -113,7 +113,7 @@ struct vio_driver {
>  	const char *name;
>  	const struct vio_device_id *id_table;
>  	int (*probe)(struct vio_dev *dev, const struct vio_device_id *id);
> -	int (*remove)(struct vio_dev *dev);
> +	void (*remove)(struct vio_dev *dev);
>  	/* A driver must have a get_desired_dma() function to
>  	 * be loaded in a CMO environment if it uses DMA.
>  	 */
> diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c
> index b2797cfe4e2b..9cb4fc839fd5 100644
> --- a/arch/powerpc/platforms/pseries/vio.c
> +++ b/arch/powerpc/platforms/pseries/vio.c
> @@ -1261,7 +1261,6 @@ static int vio_bus_remove(struct device *dev)
>  	struct vio_dev *viodev = to_vio_dev(dev);
>  	struct vio_driver *viodrv = to_vio_driver(dev->driver);
>  	struct device *devptr;
> -	int ret = 1;
>  
>  	/*
>  	 * Hold a reference to the device after the remove function is called
> @@ -1270,13 +1269,13 @@ static int vio_bus_remove(struct device *dev)
>  	devptr = get_device(dev);
>  
>  	if (viodrv->remove)
> -		ret = viodrv->remove(viodev);
> +		viodrv->remove(viodev);
>  
> -	if (!ret && firmware_has_feature(FW_FEATURE_CMO))
> +	if (firmware_has_feature(FW_FEATURE_CMO))
>  		vio_cmo_bus_remove(viodev);
>  
>  	put_device(devptr);
> -	return ret;
> +	return 0;
>  }
>  
>  /**
> diff --git a/arch/sparc/include/asm/vio.h b/arch/sparc/include/asm/vio.h
> index 059f0eb678e0..8a1a83bbb6d5 100644
> --- a/arch/sparc/include/asm/vio.h
> +++ b/arch/sparc/include/asm/vio.h
> @@ -362,7 +362,7 @@ struct vio_driver {
>  	struct list_head		node;
>  	const struct vio_device_id	*id_table;
>  	int (*probe)(struct vio_dev *dev, const struct vio_device_id *id);
> -	int (*remove)(struct vio_dev *dev);
> +	void (*remove)(struct vio_dev *dev);
>  	void (*shutdown)(struct vio_dev *dev);
>  	unsigned long			driver_data;
>  	struct device_driver		driver;
> diff --git a/arch/sparc/kernel/ds.c b/arch/sparc/kernel/ds.c
> index 522e5b51050c..4a5bdb0df779 100644
> --- a/arch/sparc/kernel/ds.c
> +++ b/arch/sparc/kernel/ds.c
> @@ -1236,11 +1236,6 @@ static int ds_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>  	return err;
>  }
>  
> -static int ds_remove(struct vio_dev *vdev)
> -{
> -	return 0;
> -}
> -
>  static const struct vio_device_id ds_match[] = {
>  	{
>  		.type = "domain-services-port",
> @@ -1251,7 +1246,6 @@ static const struct vio_device_id ds_match[] = {
>  static struct vio_driver ds_driver = {
>  	.id_table	= ds_match,
>  	.probe		= ds_probe,
> -	.remove		= ds_remove,
>  	.name		= "ds",
>  };
>  
> diff --git a/arch/sparc/kernel/vio.c b/arch/sparc/kernel/vio.c
> index 4f57056ed463..348a88691219 100644
> --- a/arch/sparc/kernel/vio.c
> +++ b/arch/sparc/kernel/vio.c
> @@ -105,10 +105,10 @@ static int vio_device_remove(struct device *dev)
>  		 * routines to do so at the moment. TBD
>  		 */
>  
> -		return drv->remove(vdev);
> +		drv->remove(vdev);
>  	}
>  
> -	return 1;
> +	return 0;
>  }
>  
>  static ssize_t devspec_show(struct device *dev,
> diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
> index 39aeebc6837d..1547d4345ad8 100644
> --- a/drivers/block/sunvdc.c
> +++ b/drivers/block/sunvdc.c
> @@ -1071,7 +1071,7 @@ static int vdc_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>  	return err;
>  }
>  
> -static int vdc_port_remove(struct vio_dev *vdev)
> +static void vdc_port_remove(struct vio_dev *vdev)
>  {
>  	struct vdc_port *port = dev_get_drvdata(&vdev->dev);
>  
> @@ -1094,7 +1094,6 @@ static int vdc_port_remove(struct vio_dev *vdev)
>  
>  		kfree(port);
>  	}
> -	return 0;
>  }
>  
>  static void vdc_requeue_inflight(struct vdc_port *port)
> diff --git a/drivers/char/hw_random/pseries-rng.c b/drivers/char/hw_random/pseries-rng.c
> index 8038a8a9fb58..f4949b689bd5 100644
> --- a/drivers/char/hw_random/pseries-rng.c
> +++ b/drivers/char/hw_random/pseries-rng.c
> @@ -54,10 +54,9 @@ static int pseries_rng_probe(struct vio_dev *dev,
>  	return hwrng_register(&pseries_rng);
>  }
>  
> -static int pseries_rng_remove(struct vio_dev *dev)
> +static void pseries_rng_remove(struct vio_dev *dev)
>  {
>  	hwrng_unregister(&pseries_rng);
> -	return 0;
>  }
>  
>  static const struct vio_device_id pseries_rng_driver_ids[] = {
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
> index 994385bf37c0..903604769de9 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.c
> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
> @@ -343,7 +343,7 @@ static int ibmvtpm_crq_send_init_complete(struct ibmvtpm_dev *ibmvtpm)
>   *
>   * Return: Always 0.
>   */
> -static int tpm_ibmvtpm_remove(struct vio_dev *vdev)
> +static void tpm_ibmvtpm_remove(struct vio_dev *vdev)
>  {
>  	struct tpm_chip *chip = dev_get_drvdata(&vdev->dev);
>  	struct ibmvtpm_dev *ibmvtpm = dev_get_drvdata(&chip->dev);
> @@ -372,8 +372,6 @@ static int tpm_ibmvtpm_remove(struct vio_dev *vdev)
>  	kfree(ibmvtpm);
>  	/* For tpm_ibmvtpm_get_desired_dma */
>  	dev_set_drvdata(&vdev->dev, NULL);
> -
> -	return 0;
>  }
>  
>  /**
> diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
> index 2de5e3672e42..cc8dd3072b8b 100644
> --- a/drivers/crypto/nx/nx-842-pseries.c
> +++ b/drivers/crypto/nx/nx-842-pseries.c
> @@ -1042,7 +1042,7 @@ static int nx842_probe(struct vio_dev *viodev,
>  	return ret;
>  }
>  
> -static int nx842_remove(struct vio_dev *viodev)
> +static void nx842_remove(struct vio_dev *viodev)
>  {
>  	struct nx842_devdata *old_devdata;
>  	unsigned long flags;
> @@ -1063,8 +1063,6 @@ static int nx842_remove(struct vio_dev *viodev)
>  	if (old_devdata)
>  		kfree(old_devdata->counters);
>  	kfree(old_devdata);
> -
> -	return 0;
>  }
>  
>  static const struct vio_device_id nx842_vio_driver_ids[] = {
> diff --git a/drivers/crypto/nx/nx.c b/drivers/crypto/nx/nx.c
> index 0d2dc5be7f19..1d0e8a1ba160 100644
> --- a/drivers/crypto/nx/nx.c
> +++ b/drivers/crypto/nx/nx.c
> @@ -783,7 +783,7 @@ static int nx_probe(struct vio_dev *viodev, const struct vio_device_id *id)
>  	return nx_register_algs();
>  }
>  
> -static int nx_remove(struct vio_dev *viodev)
> +static void nx_remove(struct vio_dev *viodev)
>  {
>  	dev_dbg(&viodev->dev, "entering nx_remove for UA 0x%x\n",
>  		viodev->unit_address);
> @@ -811,8 +811,6 @@ static int nx_remove(struct vio_dev *viodev)
>  		nx_unregister_skcipher(&nx_ecb_aes_alg, NX_FC_AES,
>  				       NX_MODE_AES_ECB);
>  	}
> -
> -	return 0;
>  }
>  
>  
> diff --git a/drivers/misc/ibmvmc.c b/drivers/misc/ibmvmc.c
> index 2d778d0f011e..c0fe3295c330 100644
> --- a/drivers/misc/ibmvmc.c
> +++ b/drivers/misc/ibmvmc.c
> @@ -2288,15 +2288,13 @@ static int ibmvmc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>  	return -EPERM;
>  }
>  
> -static int ibmvmc_remove(struct vio_dev *vdev)
> +static void ibmvmc_remove(struct vio_dev *vdev)
>  {
>  	struct crq_server_adapter *adapter = dev_get_drvdata(&vdev->dev);
>  
>  	dev_info(adapter->dev, "Entering remove for UA 0x%x\n",
>  		 vdev->unit_address);
>  	ibmvmc_release_crq_queue(adapter);
> -
> -	return 0;
>  }
>  
>  static struct vio_device_id ibmvmc_device_table[] = {
> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> index c3ec9ceed833..7fea9ae60f13 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -1758,7 +1758,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
>  	return 0;
>  }
>  
> -static int ibmveth_remove(struct vio_dev *dev)
> +static void ibmveth_remove(struct vio_dev *dev)
>  {
>  	struct net_device *netdev = dev_get_drvdata(&dev->dev);
>  	struct ibmveth_adapter *adapter = netdev_priv(netdev);
> @@ -1771,8 +1771,6 @@ static int ibmveth_remove(struct vio_dev *dev)
>  
>  	free_netdev(netdev);
>  	dev_set_drvdata(&dev->dev, NULL);
> -
> -	return 0;
>  }
>  
>  static struct attribute veth_active_attr;
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index a187d51bcf92..2eec0652760c 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -5430,7 +5430,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)
>  	return rc;
>  }
>  
> -static int ibmvnic_remove(struct vio_dev *dev)
> +static void ibmvnic_remove(struct vio_dev *dev)
>  {
>  	struct net_device *netdev = dev_get_drvdata(&dev->dev);
>  	struct ibmvnic_adapter *adapter = netdev_priv(netdev);
> @@ -5460,8 +5460,6 @@ static int ibmvnic_remove(struct vio_dev *dev)
>  	device_remove_file(&dev->dev, &dev_attr_failover);
>  	free_netdev(netdev);
>  	dev_set_drvdata(&dev->dev, NULL);
> -
> -	return 0;
>  }
>  
>  static ssize_t failover_store(struct device *dev, struct device_attribute *attr,
> diff --git a/drivers/net/ethernet/sun/ldmvsw.c b/drivers/net/ethernet/sun/ldmvsw.c
> index 01ea0d6f8819..50bd4e3b0af9 100644
> --- a/drivers/net/ethernet/sun/ldmvsw.c
> +++ b/drivers/net/ethernet/sun/ldmvsw.c
> @@ -404,7 +404,7 @@ static int vsw_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>  	return err;
>  }
>  
> -static int vsw_port_remove(struct vio_dev *vdev)
> +static void vsw_port_remove(struct vio_dev *vdev)
>  {
>  	struct vnet_port *port = dev_get_drvdata(&vdev->dev);
>  	unsigned long flags;
> @@ -430,8 +430,6 @@ static int vsw_port_remove(struct vio_dev *vdev)
>  
>  		free_netdev(port->dev);
>  	}
> -
> -	return 0;
>  }
>  
>  static void vsw_cleanup(void)
> diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
> index 96b883f965f6..58ee89223951 100644
> --- a/drivers/net/ethernet/sun/sunvnet.c
> +++ b/drivers/net/ethernet/sun/sunvnet.c
> @@ -510,7 +510,7 @@ static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>  	return err;
>  }
>  
> -static int vnet_port_remove(struct vio_dev *vdev)
> +static void vnet_port_remove(struct vio_dev *vdev)
>  {
>  	struct vnet_port *port = dev_get_drvdata(&vdev->dev);
>  
> @@ -533,7 +533,6 @@ static int vnet_port_remove(struct vio_dev *vdev)
>  
>  		kfree(port);
>  	}
> -	return 0;
>  }
>  
>  static const struct vio_device_id vnet_port_match[] = {
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 42e4d35e0d35..0a472acaca5d 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -5253,7 +5253,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>   * Return value:
>   * 	0
>   **/
> -static int ibmvfc_remove(struct vio_dev *vdev)
> +static void ibmvfc_remove(struct vio_dev *vdev)
>  {
>  	struct ibmvfc_host *vhost = dev_get_drvdata(&vdev->dev);
>  	unsigned long flags;
> @@ -5282,7 +5282,6 @@ static int ibmvfc_remove(struct vio_dev *vdev)
>  	spin_unlock(&ibmvfc_driver_lock);
>  	scsi_host_put(vhost->host);
>  	LEAVE;
> -	return 0;
>  }
>  
>  /**
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index 29fcc44be2d5..77fafb1bc173 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -2335,7 +2335,7 @@ static int ibmvscsi_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>  	return -1;
>  }
>  
> -static int ibmvscsi_remove(struct vio_dev *vdev)
> +static void ibmvscsi_remove(struct vio_dev *vdev)
>  {
>  	struct ibmvscsi_host_data *hostdata = dev_get_drvdata(&vdev->dev);
>  
> @@ -2356,8 +2356,6 @@ static int ibmvscsi_remove(struct vio_dev *vdev)
>  	spin_unlock(&ibmvscsi_driver_lock);
>  
>  	scsi_host_put(hostdata->host);
> -
> -	return 0;
>  }
>  
>  /**
> diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> index cc3908c2d2f9..9abd9e253af6 100644
> --- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> +++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> @@ -3595,7 +3595,7 @@ static int ibmvscsis_probe(struct vio_dev *vdev,
>  	return rc;
>  }
>  
> -static int ibmvscsis_remove(struct vio_dev *vdev)
> +static void ibmvscsis_remove(struct vio_dev *vdev)
>  {
>  	struct scsi_info *vscsi = dev_get_drvdata(&vdev->dev);
>  
> @@ -3622,8 +3622,6 @@ static int ibmvscsis_remove(struct vio_dev *vdev)
>  	list_del(&vscsi->list);
>  	spin_unlock_bh(&ibmvscsis_dev_lock);
>  	kfree(vscsi);
> -
> -	return 0;
>  }
>  
>  static ssize_t system_id_show(struct device *dev,
> diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
> index 3e0461285c34..80874945ded8 100644
> --- a/drivers/tty/hvc/hvcs.c
> +++ b/drivers/tty/hvc/hvcs.c
> @@ -819,7 +819,7 @@ static int hvcs_probe(
>  	return 0;
>  }
>  
> -static int hvcs_remove(struct vio_dev *dev)
> +static void hvcs_remove(struct vio_dev *dev)
>  {
>  	struct hvcs_struct *hvcsd = dev_get_drvdata(&dev->dev);
>  	unsigned long flags;
> @@ -849,7 +849,6 @@ static int hvcs_remove(struct vio_dev *dev)
>  
>  	printk(KERN_INFO "HVCS: vty-server@%X removed from the"
>  			" vio bus.\n", dev->unit_address);
> -	return 0;
>  };
>  
>  static struct vio_driver hvcs_vio_driver = {
> diff --git a/drivers/tty/vcc.c b/drivers/tty/vcc.c
> index e2d6205f83ce..5f72ebf93821 100644
> --- a/drivers/tty/vcc.c
> +++ b/drivers/tty/vcc.c
> @@ -677,7 +677,7 @@ static int vcc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>   *
>   * Return: status of removal
>   */
> -static int vcc_remove(struct vio_dev *vdev)
> +static void vcc_remove(struct vio_dev *vdev)
>  {
>  	struct vcc_port *port = dev_get_drvdata(&vdev->dev);
>  
> @@ -712,8 +712,6 @@ static int vcc_remove(struct vio_dev *vdev)
>  		kfree(port->domain);
>  		kfree(port);
>  	}
> -
> -	return 0;
>  }
>  
>  static const struct vio_device_id vcc_match[] = {
> 



* Re: [PATCH] vio: make remove callback return void
From: Lijun Pan @ 2021-01-29 19:08 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Cristobal Forno, Tyrel Datwyler, sparclinux, target-devel,
	Paul Mackerras, Breno Leitão, Peter Huewe,
	Sukadev Bhattiprolu, Jiri Slaby, Herbert Xu, linux-scsi,
	Nayna Jain, Jason Gunthorpe, Michael Cyr, Jakub Kicinski,
	Arnd Bergmann, James E.J. Bottomley, linux-block, Lijun Pan,
	Matt Mackall, Jens Axboe, Steven Royer, Martin K. Petersen,
	Greg Kroah-Hartman, linux-kernel, Jarkko Sakkinen, linux-crypto,
	netdev, Dany Madden, Paulo Flabiano Smorigo, linux-integrity,
	linuxppc-dev, David S. Miller
In-Reply-To: <20210127215010.99954-1-uwe@kleine-koenig.org>

On Wed, Jan 27, 2021 at 6:41 PM Uwe Kleine-König <uwe@kleine-koenig.org> wrote:
>
> The driver core ignores the return value of struct bus_type::remove()
> because there is little that can be done. To simplify the quest to
> make this function return void, let struct vio_driver::remove() return
> void, too. All users already unconditionally return 0; this commit makes
> it obvious, also for future driver authors, that returning an error
> code is a bad idea and isn't intended.
>
> Note there are two nominally different implementations for a vio bus:
> one in arch/sparc/kernel/vio.c and the other in
> arch/powerpc/platforms/pseries/vio.c. I didn't care to check which
> driver is using which of these busses (or if even some of them can be
> used with both) and simply adapt all drivers and the two bus codes in
> one go.
>
> Note that for the powerpc implementation there is a semantic change:
> Before this patch, vio_cmo_bus_remove(viodev) wasn't called for a device
> that was bound to a driver without a remove callback. As the device core
> still considers the device unbound after vio_bus_remove() returns,
> calling this unconditionally is the consistent behaviour, which is
> implemented here.
>
> Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>

Acked-by: Lijun Pan <ljp@linux.ibm.com>
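
The semantic change described in the quoted commit message can be sketched outside the kernel. Everything below is a simplified model under stated assumptions: dev_sketch, drv_sketch, and bus_remove_sketch() are hypothetical names, and the has_cmo flag stands in for the firmware feature check. It models only the control flow after the patch, where the bus-level cleanup runs whether or not the driver has a remove callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the device and driver structs. */
struct dev_sketch {
	int removed;		/* set by the driver's remove callback */
	int cmo_released;	/* set by the bus-level cleanup */
};

struct drv_sketch {
	void (*remove)(struct dev_sketch *dev);	/* returns void, may be NULL */
};

static void example_remove(struct dev_sketch *dev)
{
	dev->removed = 1;
}

/* Models the bus remove path after the patch. */
static int bus_remove_sketch(struct dev_sketch *dev,
			     const struct drv_sketch *drv, bool has_cmo)
{
	if (drv->remove)
		drv->remove(dev);

	/* No longer depends on a return value from the callback. */
	if (has_cmo)
		dev->cmo_released = 1;

	return 0;
}
```

With a NULL remove callback and has_cmo set, the bus-level cleanup still runs, which is the behaviour the commit message calls consistent.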


* Re: [PATCH 04/13] module: use RCU to synchronize find_module
From: Thiago Jung Bauermann @ 2021-01-29 21:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, Daniel Vetter, Miroslav Benes, linuxppc-dev
In-Reply-To: <20210129051012.GA2053@lst.de>


Christoph Hellwig <hch@lst.de> writes:

> On Thu, Jan 28, 2021 at 05:50:56PM -0300, Thiago Jung Bauermann wrote:
>> >  struct module *find_module(const char *name)
>> >  {
>> > -	module_assert_mutex();
>> 
>> Does it make sense to replace the assert above with the warn below (untested)?
>> 
>>      RCU_LOCKDEP_WARN(rcu_read_lock_sched_held());
>
> One caller actually holds module_mutex still.  And find_module_all,
> which implements the actual logic, already asserts that either
> module_mutex or rcu_read_lock is held, so I don't think we need an
> extra one here.

Ok, thanks for the clarification.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


* Re: [PATCH] powerpc: remove unneeded semicolons
From: Chengyang Fan @ 2021-01-30  1:31 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: joe, linuxppc-dev
In-Reply-To: <87v9bgc6al.fsf@mpe.ellerman.id.au>

Although they are harmless, I think we should keep the kernel coding
style consistent.

Thanks


On 2021/1/29 19:48, Michael Ellerman wrote:
> Chengyang Fan <cy.fan@huawei.com> writes:
>> Remove superfluous semicolons after function definitions.
> Is there a good reason why?
>
> I realise they're superfluous, but they're also harmless as far as I'm
> aware.
>
> cheers
>
>>   arch/powerpc/include/asm/book3s/32/mmu-hash.h       |  2 +-
>>   arch/powerpc/include/asm/book3s/64/mmu.h            |  2 +-
>>   arch/powerpc/include/asm/book3s/64/tlbflush-radix.h |  2 +-
>>   arch/powerpc/include/asm/book3s/64/tlbflush.h       |  2 +-
>>   arch/powerpc/include/asm/firmware.h                 |  2 +-
>>   arch/powerpc/include/asm/kvm_ppc.h                  |  6 +++---
>>   arch/powerpc/include/asm/paca.h                     |  6 +++---
>>   arch/powerpc/include/asm/rtas.h                     |  2 +-
>>   arch/powerpc/include/asm/setup.h                    |  6 +++---
>>   arch/powerpc/include/asm/simple_spinlock.h          |  4 ++--
>>   arch/powerpc/include/asm/smp.h                      |  2 +-
>>   arch/powerpc/include/asm/xmon.h                     |  4 ++--
>>   arch/powerpc/kernel/prom.c                          |  2 +-
>>   arch/powerpc/kernel/setup.h                         | 12 ++++++------
>>   arch/powerpc/platforms/powernv/subcore.h            |  2 +-
>>   arch/powerpc/platforms/pseries/pseries.h            |  2 +-
>>   16 files changed, 29 insertions(+), 29 deletions(-)
> .


* [PATCH] powerpc64/idle: Fix SP offsets when saving GPRs
From: Christopher M. Riedl @ 2021-01-30  3:04 UTC (permalink / raw)
  To: linuxppc-dev

The idle entry/exit code saves/restores GPRs in the stack "red zone"
(Protected Zone according to PowerPC64 ELF ABI v2). However, the offset
used for the first GPR is incorrect and overwrites the back chain - the
Protected Zone actually starts below the current SP. In practice this is
probably not an issue, but it's still incorrect so fix it.
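
The offset arithmetic can be checked with a small sketch. The helper names are hypothetical and the slot count is taken from the diff below (r2, r14 to r31, plus LR and CR via r4 and r5, 21 doublewords in total). Offsets are relative to r1, where offset 0 holds the back chain.

```c
#define NUM_SAVED_REGS	21	/* r2, r14..r31, LR (r4), CR (r5) */
#define SLOT_BYTES	8

/* The save area is 168 bytes, matching the comment in the assembly. */
_Static_assert(NUM_SAVED_REGS * SLOT_BYTES == 168, "save area size");

/* Offset of save slot i (0-based) from r1 before the fix: -8*i. */
static long old_slot_offset(int i)
{
	return -(long)SLOT_BYTES * i;
}

/* Offset of save slot i (0-based) from r1 after the fix: -8*(i+1). */
static long new_slot_offset(int i)
{
	return -(long)SLOT_BYTES * (i + 1);
}
```

Before the fix slot 0 lands at offset 0, on top of the back chain; after the fix the slots run from -8 down to -168, entirely below r1.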

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
 arch/powerpc/kernel/idle_book3s.S | 126 +++++++++++++++---------------
 1 file changed, 63 insertions(+), 63 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 22f249b6f58d..80cf35183e9d 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -53,27 +53,27 @@ _GLOBAL(isa300_idle_stop_mayloss)
 	mflr	r4
 	mfcr	r5
 	/* use stack red zone rather than a new frame for saving regs */
-	std	r2,-8*0(r1)
-	std	r14,-8*1(r1)
-	std	r15,-8*2(r1)
-	std	r16,-8*3(r1)
-	std	r17,-8*4(r1)
-	std	r18,-8*5(r1)
-	std	r19,-8*6(r1)
-	std	r20,-8*7(r1)
-	std	r21,-8*8(r1)
-	std	r22,-8*9(r1)
-	std	r23,-8*10(r1)
-	std	r24,-8*11(r1)
-	std	r25,-8*12(r1)
-	std	r26,-8*13(r1)
-	std	r27,-8*14(r1)
-	std	r28,-8*15(r1)
-	std	r29,-8*16(r1)
-	std	r30,-8*17(r1)
-	std	r31,-8*18(r1)
-	std	r4,-8*19(r1)
-	std	r5,-8*20(r1)
+	std	r2,-8*1(r1)
+	std	r14,-8*2(r1)
+	std	r15,-8*3(r1)
+	std	r16,-8*4(r1)
+	std	r17,-8*5(r1)
+	std	r18,-8*6(r1)
+	std	r19,-8*7(r1)
+	std	r20,-8*8(r1)
+	std	r21,-8*9(r1)
+	std	r22,-8*10(r1)
+	std	r23,-8*11(r1)
+	std	r24,-8*12(r1)
+	std	r25,-8*13(r1)
+	std	r26,-8*14(r1)
+	std	r27,-8*15(r1)
+	std	r28,-8*16(r1)
+	std	r29,-8*17(r1)
+	std	r30,-8*18(r1)
+	std	r31,-8*19(r1)
+	std	r4,-8*20(r1)
+	std	r5,-8*21(r1)
 	/* 168 bytes */
 	PPC_STOP
 	b	.	/* catch bugs */
@@ -89,8 +89,8 @@ _GLOBAL(isa300_idle_stop_mayloss)
  */
 _GLOBAL(idle_return_gpr_loss)
 	ld	r1,PACAR1(r13)
-	ld	r4,-8*19(r1)
-	ld	r5,-8*20(r1)
+	ld	r4,-8*20(r1)
+	ld	r5,-8*21(r1)
 	mtlr	r4
 	mtcr	r5
 	/*
@@ -98,25 +98,25 @@ _GLOBAL(idle_return_gpr_loss)
 	 * from PACATOC. This could be avoided for that less common case
 	 * if KVM saved its r2.
 	 */
-	ld	r2,-8*0(r1)
-	ld	r14,-8*1(r1)
-	ld	r15,-8*2(r1)
-	ld	r16,-8*3(r1)
-	ld	r17,-8*4(r1)
-	ld	r18,-8*5(r1)
-	ld	r19,-8*6(r1)
-	ld	r20,-8*7(r1)
-	ld	r21,-8*8(r1)
-	ld	r22,-8*9(r1)
-	ld	r23,-8*10(r1)
-	ld	r24,-8*11(r1)
-	ld	r25,-8*12(r1)
-	ld	r26,-8*13(r1)
-	ld	r27,-8*14(r1)
-	ld	r28,-8*15(r1)
-	ld	r29,-8*16(r1)
-	ld	r30,-8*17(r1)
-	ld	r31,-8*18(r1)
+	ld	r2,-8*1(r1)
+	ld	r14,-8*2(r1)
+	ld	r15,-8*3(r1)
+	ld	r16,-8*4(r1)
+	ld	r17,-8*5(r1)
+	ld	r18,-8*6(r1)
+	ld	r19,-8*7(r1)
+	ld	r20,-8*8(r1)
+	ld	r21,-8*9(r1)
+	ld	r22,-8*10(r1)
+	ld	r23,-8*11(r1)
+	ld	r24,-8*12(r1)
+	ld	r25,-8*13(r1)
+	ld	r26,-8*14(r1)
+	ld	r27,-8*15(r1)
+	ld	r28,-8*16(r1)
+	ld	r29,-8*17(r1)
+	ld	r30,-8*18(r1)
+	ld	r31,-8*19(r1)
 	blr
 
 /*
@@ -155,27 +155,27 @@ _GLOBAL(isa206_idle_insn_mayloss)
 	mflr	r4
 	mfcr	r5
 	/* use stack red zone rather than a new frame for saving regs */
-	std	r2,-8*0(r1)
-	std	r14,-8*1(r1)
-	std	r15,-8*2(r1)
-	std	r16,-8*3(r1)
-	std	r17,-8*4(r1)
-	std	r18,-8*5(r1)
-	std	r19,-8*6(r1)
-	std	r20,-8*7(r1)
-	std	r21,-8*8(r1)
-	std	r22,-8*9(r1)
-	std	r23,-8*10(r1)
-	std	r24,-8*11(r1)
-	std	r25,-8*12(r1)
-	std	r26,-8*13(r1)
-	std	r27,-8*14(r1)
-	std	r28,-8*15(r1)
-	std	r29,-8*16(r1)
-	std	r30,-8*17(r1)
-	std	r31,-8*18(r1)
-	std	r4,-8*19(r1)
-	std	r5,-8*20(r1)
+	std	r2,-8*1(r1)
+	std	r14,-8*2(r1)
+	std	r15,-8*3(r1)
+	std	r16,-8*4(r1)
+	std	r17,-8*5(r1)
+	std	r18,-8*6(r1)
+	std	r19,-8*7(r1)
+	std	r20,-8*8(r1)
+	std	r21,-8*9(r1)
+	std	r22,-8*10(r1)
+	std	r23,-8*11(r1)
+	std	r24,-8*12(r1)
+	std	r25,-8*13(r1)
+	std	r26,-8*14(r1)
+	std	r27,-8*15(r1)
+	std	r28,-8*16(r1)
+	std	r29,-8*17(r1)
+	std	r30,-8*18(r1)
+	std	r31,-8*19(r1)
+	std	r4,-8*20(r1)
+	std	r5,-8*21(r1)
 	cmpwi	r3,PNV_THREAD_NAP
 	bne	1f
 	IDLE_STATE_ENTER_SEQ_NORET(PPC_NAP)
-- 
2.26.1


* Re: [PATCH] powerpc/fault: fix wrong KUAP fault for IO_URING
From: Michael Ellerman @ 2021-01-30 11:22 UTC (permalink / raw)
  To: Christophe Leroy, Zorro Lang, Aneesh Kumar K.V
  Cc: Jens Axboe, linuxppc-dev, Nicholas Piggin
In-Reply-To: <18dd441b-440a-fe95-0907-d8cec5b49410@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> +Aneesh
>
> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
..
>> [   96.200296] ------------[ cut here ]------------
>> [   96.200304] Bug: Read fault blocked by KUAP!
>> [   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>
>> [   96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350
>> [   96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350
>> [   96.200747] --- interrupt: 300
>
>
> Problem happens in a section where userspace access is supposed to be granted, so the patch you 
> proposed is definitely not the right fix.
>
> c000000000849408:	2c 01 00 4c 	isync
> c00000000084940c:	a6 03 3d 7d 	mtspr   29,r9  <== granting userspace access permission
> c000000000849410:	2c 01 00 4c 	isync
> c000000000849414:	00 00 36 e9 	ld      r9,0(r22)
> c000000000849418:	20 00 29 81 	lwz     r9,32(r9)
> c00000000084941c:	00 02 29 71 	andi.   r9,r9,512
> c000000000849420:	78 d3 5e 7f 	mr      r30,r26
> ==> c000000000849424:	00 00 bf 8b 	lbz     r29,0(r31)  <== accessing userspace
> c000000000849428:	10 00 82 41 	beq     c000000000849438 <fault_in_pages_readable+0x118>
> c00000000084942c:	2c 01 00 4c 	isync
> c000000000849430:	a6 03 bd 7e 	mtspr   29,r21  <== clearing userspace access permission
> c000000000849434:	2c 01 00 4c 	isync
>
> My first guess is that the problem is linked to the following function, see the comment
>
> /*
>   * For kernel thread that doesn't have thread.regs return
>   * default AMR/IAMR values.
>   */
> static inline u64 current_thread_amr(void)
> {
> 	if (current->thread.regs)
> 		return current->thread.regs->amr;
> 	return AMR_KUAP_BLOCKED;
> }
>
> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR 
> when in kernel mode")

Yeah that's a bit of a curly one.

At some point io_uring did kthread_use_mm(), which is supposed to mean
the kthread can operate on behalf of the original process that submitted
the IO.

But because KUAP is implemented using memory protection keys, it depends
on the value of the AMR register, which is not part of the mm, it's in
thread.regs->amr.

And what's worse, by the time we're in kthread_use_mm() we no longer have
access to the thread.regs->amr of the original process that submitted
the IO.

We also can't simply move the AMR into the mm, precisely because it's
per thread, not per mm.

So TBH I don't know how we're going to fix this.

I guess we could return AMR=unblocked for kernel threads, but that's
arguably a bug because it allows a process to circumvent memory keys by
asking the kernel to do the access.
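
The mismatch described above can be sketched as a small user-space model.
This is not kernel code: the struct names, model_use_mm() and the two AMR
constants are hypothetical stand-ins chosen for illustration, and only the
shape of current_thread_amr() is taken from the quoted function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_AMR_BLOCKED   0x3ULL  /* hypothetical stand-in for AMR_KUAP_BLOCKED */
#define MODEL_AMR_UNBLOCKED 0x0ULL  /* hypothetical "user access allowed" value */

struct model_mm   { int id; };
struct model_regs { uint64_t amr; };
struct model_task {
	struct model_mm   *mm;    /* address space: can be adopted by a kthread */
	struct model_regs *regs;  /* per-thread state: NULL for a kernel thread */
};

/* Same shape as the quoted current_thread_amr(): no regs means "blocked". */
static uint64_t model_thread_amr(const struct model_task *t)
{
	if (t->regs)
		return t->regs->amr;
	return MODEL_AMR_BLOCKED;
}

/* Stand-in for kthread_use_mm(): only the mm is adopted, never the regs. */
static void model_use_mm(struct model_task *kthread, struct model_mm *mm)
{
	kthread->mm = mm;
}

/* Returns 1 if the helper shares the mm but still sees a blocked AMR. */
static int model_helper_blocked(void)
{
	struct model_mm mm = { 1 };
	struct model_regs user_regs = { MODEL_AMR_UNBLOCKED };
	struct model_task submitter = { &mm, &user_regs };
	struct model_task helper = { NULL, NULL };

	model_use_mm(&helper, submitter.mm);
	return helper.mm == submitter.mm &&
	       model_thread_amr(&helper) == MODEL_AMR_BLOCKED &&
	       model_thread_amr(&submitter) == MODEL_AMR_UNBLOCKED;
}
```

In this model the helper ends up with the submitter's mm but not its AMR,
which is the gap the paragraphs above describe.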


cheers

* Re: [PATCH] powerpc64/idle: Fix SP offsets when saving GPRs
From: Michael Ellerman @ 2021-01-30 11:32 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210130030430.11369-1-cmr@codefail.de>

"Christopher M. Riedl" <cmr@codefail.de> writes:
> The idle entry/exit code saves/restores GPRs in the stack "red zone"
> (Protected Zone according to PowerPC64 ELF ABI v2). However, the offset
> used for the first GPR is incorrect and overwrites the back chain - the
> Protected Zone actually starts below the current SP. In practice this is
> probably not an issue, but it's still incorrect so fix it.

Nice catch.

Corrupting the back chain means you can't backtrace from there, which
could be confusing for debugging one day.
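
For illustration, a toy user-space model of the offsets (not kernel code).
The stack is modelled as an array of doublewords; SLOTS, SP_IDX and the
back chain value are arbitrary assumptions, and only the count of 21 saved
values and the -8*N(r1) addressing come from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NSAVED 21   /* r2, r14..r31, LR and CR, as saved in the patch */
#define SLOTS  32   /* model: 32 doublewords of stack */
#define SP_IDX 24   /* model: index of the doubleword at 0(r1) */

/* Model of "std val,off(r1)"; off must be a multiple of 8. */
static void std_at(uint64_t *stack, long off, uint64_t val)
{
	assert(off % 8 == 0);
	stack[SP_IDX + off / 8] = val;
}

/*
 * Save NSAVED values starting at -8*first(r1).
 * Returns 1 if the back chain at 0(r1) is still intact afterwards.
 */
static int save_regs(long first)
{
	uint64_t stack[SLOTS];
	const uint64_t back_chain = 0xc0000000deadbeefULL;
	int i;

	memset(stack, 0, sizeof(stack));
	stack[SP_IDX] = back_chain;
	for (i = 0; i < NSAVED; i++)
		std_at(stack, -8 * (first + i), 0x1000 + i);
	return stack[SP_IDX] == back_chain;
}
```

With first = 0 (the old offsets) the first store lands on 0(r1) and
save_regs() reports the back chain as lost; with first = 1 (the patched
offsets) the 21 doublewords, 168 bytes, sit entirely below the SP.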

It does make me wonder why we don't just create a stack frame and use
the normal macros? It would use a bit more stack space, but we shouldn't
be short of stack space when going idle.

Nick, was there a particular reason for using the red zone?

cheers


> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index 22f249b6f58d..80cf35183e9d 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -53,27 +53,27 @@ _GLOBAL(isa300_idle_stop_mayloss)
>  	mflr	r4
>  	mfcr	r5
>  	/* use stack red zone rather than a new frame for saving regs */
> -	std	r2,-8*0(r1)
> -	std	r14,-8*1(r1)
> -	std	r15,-8*2(r1)
> -	std	r16,-8*3(r1)
> -	std	r17,-8*4(r1)
> -	std	r18,-8*5(r1)
> -	std	r19,-8*6(r1)
> -	std	r20,-8*7(r1)
> -	std	r21,-8*8(r1)
> -	std	r22,-8*9(r1)
> -	std	r23,-8*10(r1)
> -	std	r24,-8*11(r1)
> -	std	r25,-8*12(r1)
> -	std	r26,-8*13(r1)
> -	std	r27,-8*14(r1)
> -	std	r28,-8*15(r1)
> -	std	r29,-8*16(r1)
> -	std	r30,-8*17(r1)
> -	std	r31,-8*18(r1)
> -	std	r4,-8*19(r1)
> -	std	r5,-8*20(r1)
> +	std	r2,-8*1(r1)
> +	std	r14,-8*2(r1)
> +	std	r15,-8*3(r1)
> +	std	r16,-8*4(r1)
> +	std	r17,-8*5(r1)
> +	std	r18,-8*6(r1)
> +	std	r19,-8*7(r1)
> +	std	r20,-8*8(r1)
> +	std	r21,-8*9(r1)
> +	std	r22,-8*10(r1)
> +	std	r23,-8*11(r1)
> +	std	r24,-8*12(r1)
> +	std	r25,-8*13(r1)
> +	std	r26,-8*14(r1)
> +	std	r27,-8*15(r1)
> +	std	r28,-8*16(r1)
> +	std	r29,-8*17(r1)
> +	std	r30,-8*18(r1)
> +	std	r31,-8*19(r1)
> +	std	r4,-8*20(r1)
> +	std	r5,-8*21(r1)
>  	/* 168 bytes */
>  	PPC_STOP
>  	b	.	/* catch bugs */
> @@ -89,8 +89,8 @@ _GLOBAL(isa300_idle_stop_mayloss)
>   */
>  _GLOBAL(idle_return_gpr_loss)
>  	ld	r1,PACAR1(r13)
> -	ld	r4,-8*19(r1)
> -	ld	r5,-8*20(r1)
> +	ld	r4,-8*20(r1)
> +	ld	r5,-8*21(r1)
>  	mtlr	r4
>  	mtcr	r5
>  	/*
> @@ -98,25 +98,25 @@ _GLOBAL(idle_return_gpr_loss)
>  	 * from PACATOC. This could be avoided for that less common case
>  	 * if KVM saved its r2.
>  	 */
> -	ld	r2,-8*0(r1)
> -	ld	r14,-8*1(r1)
> -	ld	r15,-8*2(r1)
> -	ld	r16,-8*3(r1)
> -	ld	r17,-8*4(r1)
> -	ld	r18,-8*5(r1)
> -	ld	r19,-8*6(r1)
> -	ld	r20,-8*7(r1)
> -	ld	r21,-8*8(r1)
> -	ld	r22,-8*9(r1)
> -	ld	r23,-8*10(r1)
> -	ld	r24,-8*11(r1)
> -	ld	r25,-8*12(r1)
> -	ld	r26,-8*13(r1)
> -	ld	r27,-8*14(r1)
> -	ld	r28,-8*15(r1)
> -	ld	r29,-8*16(r1)
> -	ld	r30,-8*17(r1)
> -	ld	r31,-8*18(r1)
> +	ld	r2,-8*1(r1)
> +	ld	r14,-8*2(r1)
> +	ld	r15,-8*3(r1)
> +	ld	r16,-8*4(r1)
> +	ld	r17,-8*5(r1)
> +	ld	r18,-8*6(r1)
> +	ld	r19,-8*7(r1)
> +	ld	r20,-8*8(r1)
> +	ld	r21,-8*9(r1)
> +	ld	r22,-8*10(r1)
> +	ld	r23,-8*11(r1)
> +	ld	r24,-8*12(r1)
> +	ld	r25,-8*13(r1)
> +	ld	r26,-8*14(r1)
> +	ld	r27,-8*15(r1)
> +	ld	r28,-8*16(r1)
> +	ld	r29,-8*17(r1)
> +	ld	r30,-8*18(r1)
> +	ld	r31,-8*19(r1)
>  	blr
>  
>  /*
> @@ -155,27 +155,27 @@ _GLOBAL(isa206_idle_insn_mayloss)
>  	mflr	r4
>  	mfcr	r5
>  	/* use stack red zone rather than a new frame for saving regs */
> -	std	r2,-8*0(r1)
> -	std	r14,-8*1(r1)
> -	std	r15,-8*2(r1)
> -	std	r16,-8*3(r1)
> -	std	r17,-8*4(r1)
> -	std	r18,-8*5(r1)
> -	std	r19,-8*6(r1)
> -	std	r20,-8*7(r1)
> -	std	r21,-8*8(r1)
> -	std	r22,-8*9(r1)
> -	std	r23,-8*10(r1)
> -	std	r24,-8*11(r1)
> -	std	r25,-8*12(r1)
> -	std	r26,-8*13(r1)
> -	std	r27,-8*14(r1)
> -	std	r28,-8*15(r1)
> -	std	r29,-8*16(r1)
> -	std	r30,-8*17(r1)
> -	std	r31,-8*18(r1)
> -	std	r4,-8*19(r1)
> -	std	r5,-8*20(r1)
> +	std	r2,-8*1(r1)
> +	std	r14,-8*2(r1)
> +	std	r15,-8*3(r1)
> +	std	r16,-8*4(r1)
> +	std	r17,-8*5(r1)
> +	std	r18,-8*6(r1)
> +	std	r19,-8*7(r1)
> +	std	r20,-8*8(r1)
> +	std	r21,-8*9(r1)
> +	std	r22,-8*10(r1)
> +	std	r23,-8*11(r1)
> +	std	r24,-8*12(r1)
> +	std	r25,-8*13(r1)
> +	std	r26,-8*14(r1)
> +	std	r27,-8*15(r1)
> +	std	r28,-8*16(r1)
> +	std	r29,-8*17(r1)
> +	std	r30,-8*18(r1)
> +	std	r31,-8*19(r1)
> +	std	r4,-8*20(r1)
> +	std	r5,-8*21(r1)
>  	cmpwi	r3,PNV_THREAD_NAP
>  	bne	1f
>  	IDLE_STATE_ENTER_SEQ_NORET(PPC_NAP)
> -- 
> 2.26.1

* [PATCH v7 00/42] powerpc: interrupt wrappers
From: Nicholas Piggin @ 2021-01-30 13:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Athira Rajeev, Nicholas Piggin

This adds interrupt handler wrapper functions, similar to the
generic / x86 code, and moves several common operations into them
from either asm or open coded in the individual handlers.
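
As a rough sketch of the wrapper idea only: the macro and helper names
below (MODEL_INTERRUPT_HANDLER, model_interrupt_enter/exit) are made up for
this user-space example and are not claimed to match what the series adds
in asm/interrupt.h.

```c
#include <assert.h>

struct model_regs { unsigned long trap; };

static int model_depth;    /* bookkeeping owned by the wrapper */
static int model_handled;  /* side effect of the handler body */

/* Common entry/exit work that would otherwise be open coded per handler. */
static void model_interrupt_enter(struct model_regs *regs) { (void)regs; model_depth++; }
static void model_interrupt_exit(struct model_regs *regs)  { (void)regs; model_depth--; }

/* Define func() as a wrapper around a body function written by the user. */
#define MODEL_INTERRUPT_HANDLER(func)					\
	static void func##_body(struct model_regs *regs);		\
	static void func(struct model_regs *regs)			\
	{								\
		model_interrupt_enter(regs);				\
		func##_body(regs);					\
		model_interrupt_exit(regs);				\
	}								\
	static void func##_body(struct model_regs *regs)

MODEL_INTERRUPT_HANDLER(model_timer_interrupt)
{
	/* The body runs with the common entry work already done. */
	assert(model_depth == 1);
	model_handled += (int)regs->trap;
}

/* Run the handler once and return what the body recorded. */
static int model_run(void)
{
	struct model_regs regs = { 0x900 };

	model_handled = 0;
	model_timer_interrupt(&regs);
	return model_handled;
}
```

The handler author writes only the body; anything later moved into the
enter/exit helpers applies to every handler defined this way.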

This series is based on powerpc fixes-test tree, there's another
unrelated pending fix in patch 1 of the series which clashes a
bit.

This includes more changes and fixes suggested by Christophe,
a few minor bug fixes and compile fix noticed by kbuild, and
some NMI changes Athira asked about -- PMI interrupts don't
block tracing when they are soft-NMI.

Since v1:
- Fixed a couple of compile issues
- Fixed perf weirdness (sometimes NMI, sometimes not)
- Also move irq_enter/exit into wrappers

Since v2:
- Rebased upstream
- Took code in patch 3 from Christophe
- Fixed some compile errors from 0day

Since v3:
- Rebased
- Split Christophe's 32s DABR patch into its own patch
- Fixed missing asm from 32s on patch 3 noticed by Christophe.
- Moved changes around, split out one more patch (patch 9) to make
  changes more logical and atomic.
- Add comments explaining _RAW handlers (SLB, HPTE) interrupts better

Since v4:
- Rebased (on top of scv fallback flush fix)
- Rearranged a few changes into different patches from Christophe,
  e.g., the ___do_page_fault change from patch 2 to 10. I didn't
  do everything (e.g., splitting to update __hash_page to drop the
  msr argument before the bulk of patch 2 seemed like churn without
  much improvement), and also other things like removing the new
  ___do_page_fault variant if we can change hash fault context tracking
  I didn't get time to completely investigate and implement. I think
  this shouldn't be a showstopper, though; we can make more improvements
  as we go.

Since v5:
- Lots of good review suggestions from Christophe, see v5 email threads.
- Major change being do_break is left in asm and selected early as an
  alternate interrupt handler now, which is a smaller step and matches
  other subarchs better.
- Rearranged patches, split, moved things, bug fixes, etc.
- Converted a few more missed exception handlers for debug and ras

Since v6:
- Move related interrupt handler de-argify patches together [Christophe]
- Split do_bad_page_fault patch [Christophe]
- Change do_page_fault cleanup patch [Christophe]
- entry_32.S can't avoid saving r4/r5 until later in the series [Christophe]
- Soft-NMI decrementer and perf don't block ftrace [Athira]
- Rebased on some fixes
- Fixed mismerge / duplicate line in patch 40
- Fix kbuild hash missing declaration bug

Christophe Leroy (1):
  powerpc/32s: move DABR match out of handle_page_fault

Nicholas Piggin (41):
  powerpc/64s: interrupt exit improve bounding of interrupt recursion
  KVM: PPC: Book3S HV: Context tracking exit guest context before
    enabling irqs
  powerpc/64s: move DABR match out of handle_page_fault
  powerpc/64s: move the hash fault handling logic to C
  powerpc: remove arguments from fault handler functions
  powerpc/fsl_booke/32: CacheLockingException remove args
  powerpc: do_break get registers from regs
  powerpc: DebugException remove args
  powerpc/32: transfer can avoid saving r4/r5 over trace call
  powerpc: bad_page_fault get registers from regs
  powerpc/64s: add do_bad_page_fault_segv handler
  powerpc: rearrange do_page_fault error case to be inside
    exception_enter
  powerpc/64s: move bad_page_fault handling to C
  powerpc/64s: split do_hash_fault
  powerpc/mm: Remove stale do_page_fault comment referring to SLB faults
  powerpc/64s: slb comment update
  powerpc/traps: add NOKPROBE_SYMBOL for sreset and mce
  powerpc/perf: move perf irq/nmi handling details into traps.c
  powerpc/time: move timer_broadcast_interrupt prototype to asm/time.h
  powerpc: add and use unknown_async_exception
  powerpc/cell: tidy up pervasive declarations
  powerpc: introduce die_mce
  powerpc/mce: ensure machine check handler always tests RI
  powerpc: improve handling of unrecoverable system reset
  powerpc: interrupt handler wrapper functions
  powerpc: add interrupt wrapper entry / exit stub functions
  powerpc: convert interrupt handlers to use wrappers
  powerpc: add interrupt_cond_local_irq_enable helper
  powerpc/64: context tracking remove _TIF_NOHZ
  powerpc/64s/hash: improve context tracking of hash faults
  powerpc/64: context tracking move to interrupt wrappers
  powerpc/64: add context tracking to asynchronous interrupts
  powerpc: handle irq_enter/irq_exit in interrupt handler wrappers
  powerpc/64s: move context tracking exit to interrupt exit path
  powerpc/64s: reconcile interrupts in C
  powerpc/64: move account_stolen_time into its own function
  powerpc/64: entry cpu time accounting in C
  powerpc: move NMI entry/exit code into wrapper
  powerpc/64s: move NMI soft-mask handling to C
  powerpc/64s: runlatch interrupt handling in C
  powerpc/64s: power4 nap fixup in C

 arch/powerpc/Kconfig                       |   1 -
 arch/powerpc/include/asm/asm-prototypes.h  |  29 --
 arch/powerpc/include/asm/bug.h             |   9 +-
 arch/powerpc/include/asm/cputime.h         |  14 +
 arch/powerpc/include/asm/debug.h           |   4 -
 arch/powerpc/include/asm/hw_irq.h          |   9 -
 arch/powerpc/include/asm/interrupt.h       | 437 +++++++++++++++++++++
 arch/powerpc/include/asm/ppc_asm.h         |  24 --
 arch/powerpc/include/asm/processor.h       |   1 +
 arch/powerpc/include/asm/thread_info.h     |  10 +-
 arch/powerpc/include/asm/time.h            |   2 +
 arch/powerpc/kernel/dbell.c                |   9 +-
 arch/powerpc/kernel/entry_32.S             |  25 +-
 arch/powerpc/kernel/exceptions-64e.S       |   8 +-
 arch/powerpc/kernel/exceptions-64s.S       | 310 ++-------------
 arch/powerpc/kernel/head_40x.S             |  11 +-
 arch/powerpc/kernel/head_8xx.S             |  11 +-
 arch/powerpc/kernel/head_book3s_32.S       |  14 +-
 arch/powerpc/kernel/head_booke.h           |   6 +-
 arch/powerpc/kernel/head_fsl_booke.S       |   6 +-
 arch/powerpc/kernel/idle_book3s.S          |   4 +
 arch/powerpc/kernel/irq.c                  |   7 +-
 arch/powerpc/kernel/mce.c                  |  16 +-
 arch/powerpc/kernel/process.c              |   8 +-
 arch/powerpc/kernel/ptrace/ptrace.c        |   4 -
 arch/powerpc/kernel/signal.c               |   4 -
 arch/powerpc/kernel/syscall_64.c           |  90 +++--
 arch/powerpc/kernel/tau_6xx.c              |   5 +-
 arch/powerpc/kernel/time.c                 |   7 +-
 arch/powerpc/kernel/traps.c                | 265 ++++++-------
 arch/powerpc/kernel/watchdog.c             |  15 +-
 arch/powerpc/kvm/book3s_hv.c               |   7 +-
 arch/powerpc/kvm/book3s_hv_builtin.c       |   1 +
 arch/powerpc/kvm/booke.c                   |   1 +
 arch/powerpc/mm/book3s64/hash_utils.c      |  97 +++--
 arch/powerpc/mm/book3s64/slb.c             |  40 +-
 arch/powerpc/mm/fault.c                    |  76 ++--
 arch/powerpc/perf/core-book3s.c            |  35 +-
 arch/powerpc/perf/core-fsl-emb.c           |  25 --
 arch/powerpc/platforms/8xx/machine_check.c |   2 +-
 arch/powerpc/platforms/cell/pervasive.c    |   1 +
 arch/powerpc/platforms/cell/pervasive.h    |   3 -
 arch/powerpc/platforms/cell/ras.c          |   6 +-
 arch/powerpc/platforms/cell/ras.h          |   9 +-
 arch/powerpc/platforms/powernv/idle.c      |   1 +
 arch/powerpc/platforms/powernv/opal.c      |   2 +-
 arch/powerpc/platforms/pseries/ras.c       |   2 +-
 47 files changed, 914 insertions(+), 759 deletions(-)
 create mode 100644 arch/powerpc/include/asm/interrupt.h

-- 
2.23.0


* [PATCH v7 01/42] powerpc/64s: interrupt exit improve bounding of interrupt recursion
From: Nicholas Piggin @ 2021-01-30 13:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Athira Rajeev, Nicholas Piggin
In-Reply-To: <20210130130852.2952424-1-npiggin@gmail.com>

When an interrupt returns to an irqs-enabled context, pending soft-masked
interrupts are replayed. A special case is required if the returning
interrupt was asynchronous, to avoid unbounded interrupt recursion.

This case was not tested for when the asynchronous interrupt hit in user
context, because a subsequent nested interrupt would by definition hit in
kernel mode, and would then exit via the kernel path, which does test
this case.

There is no reason to allow this for such interrupts. While recursion is
bounded at the next level, it's simpler and uses less stack to apply the
replay logic consistently.

This also expands the comment which was really pretty poor and didn't
explain the problem (I can say that because I wrote it).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/syscall_64.c | 55 +++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 7c85ed04a164..e0eb2a502db3 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -138,8 +138,12 @@ notrace long system_call_exception(long r3, long r4, long r5,
 /*
  * local irqs must be disabled. Returns false if the caller must re-enable
  * them, check for new work, and try again.
+ *
+ * This should be called with local irqs disabled, but if they were previously
+ * enabled when the interrupt handler returns (indicating a process-context /
+ * synchronous interrupt) then irqs_enabled should be true.
  */
-static notrace inline bool prep_irq_for_enabled_exit(bool clear_ri)
+static notrace inline bool prep_irq_for_enabled_exit(bool clear_ri, bool irqs_enabled)
 {
 	/* This must be done with RI=1 because tracing may touch vmaps */
 	trace_hardirqs_on();
@@ -156,6 +160,29 @@ static notrace inline bool prep_irq_for_enabled_exit(bool clear_ri)
 		trace_hardirqs_off();
 		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
 
+		/*
+		 * Must replay pending soft-masked interrupts now. Don't just
+		 * local_irq_enable(); local_irq_disable(); because if we are
+		 * returning from an asynchronous interrupt here, another one
+		 * might hit after irqs are enabled, and it would exit via this
+		 * same path allowing another to fire, and so on unbounded.
+		 *
+		 * If interrupts were enabled when this interrupt exited,
+		 * indicating a process context (synchronous) interrupt,
+		 * local_irq_enable/disable can be used, which will enable
+		 * interrupts rather than keeping them masked (unclear how
+		 * much benefit this is over just replaying for all cases,
+		 * because we immediately disable again, so all we're really
+		 * doing is allowing hard interrupts to execute directly for
+		 * a very small time, rather than being masked and replayed).
+		 */
+		if (irqs_enabled) {
+			local_irq_enable();
+			local_irq_disable();
+		} else {
+			replay_soft_interrupts();
+		}
+
 		return false;
 	}
 	local_paca->irq_happened = 0;
@@ -212,8 +239,9 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
 		ret |= _TIF_RESTOREALL;
 	}
 
-again:
 	local_irq_disable();
+
+again:
 	ti_flags = READ_ONCE(*ti_flagsp);
 	while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
 		local_irq_enable();
@@ -258,10 +286,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
 	}
 
 	/* scv need not set RI=0 because SRRs are not used */
-	if (unlikely(!prep_irq_for_enabled_exit(!scv))) {
-		local_irq_enable();
+	if (unlikely(!prep_irq_for_enabled_exit(!scv, true)))
 		goto again;
-	}
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	local_paca->tm_scratch = regs->msr;
@@ -336,11 +362,8 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
 		}
 	}
 
-	if (unlikely(!prep_irq_for_enabled_exit(true))) {
-		local_irq_enable();
-		local_irq_disable();
+	if (unlikely(!prep_irq_for_enabled_exit(true, !irqs_disabled_flags(flags))))
 		goto again;
-	}
 
 #ifdef CONFIG_PPC_BOOK3E
 	if (unlikely(ts->debug.dbcr0 & DBCR0_IDM)) {
@@ -403,20 +426,8 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
 			}
 		}
 
-		if (unlikely(!prep_irq_for_enabled_exit(true))) {
-			/*
-			 * Can't local_irq_restore to replay if we were in
-			 * interrupt context. Must replay directly.
-			 */
-			if (irqs_disabled_flags(flags)) {
-				replay_soft_interrupts();
-			} else {
-				local_irq_restore(flags);
-				local_irq_save(flags);
-			}
-			/* Took an interrupt, may have more exit work to do. */
+		if (unlikely(!prep_irq_for_enabled_exit(true, !irqs_disabled_flags(flags))))
 			goto again;
-		}
 	} else {
 		/* Returning to a kernel context with local irqs disabled. */
 		__hard_EE_RI_disable();
-- 
2.23.0
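
A toy model of the recursion argument in the comment added above. It is
user-space C, not the kernel logic: the exit_kind enum, the pending counter
and the depth tracking are assumptions made only to show why the two exit
strategies differ.

```c
#include <assert.h>

enum exit_kind { EXIT_NONE, EXIT_ENABLE_DISABLE, EXIT_REPLAY };

static int pending;    /* modelled interrupts waiting to be taken */
static int depth;      /* current nesting of interrupt handlers */
static int max_depth;  /* deepest nesting observed */

static void model_interrupt(enum exit_kind kind)
{
	depth++;
	if (depth > max_depth)
		max_depth = depth;

	if (kind == EXIT_ENABLE_DISABLE) {
		/* irqs briefly on: a pending interrupt nests and exits the same way */
		if (pending > 0) {
			pending--;
			model_interrupt(EXIT_ENABLE_DISABLE);
		}
	} else if (kind == EXIT_REPLAY) {
		/* irqs stay masked: each replayed handler returns to irqs-off context */
		while (pending > 0) {
			pending--;
			model_interrupt(EXIT_NONE);
		}
	}

	depth--;
}

/* Take one interrupt with npending more queued; return the deepest nesting. */
static int model_max_depth(int npending, enum exit_kind kind)
{
	pending = npending;
	depth = 0;
	max_depth = 0;
	model_interrupt(kind);
	return max_depth;
}
```

In this model the enable/disable exit nests once per queued interrupt,
while the replay exit never goes deeper than two levels regardless of how
many are queued.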


* [PATCH v7 02/42] KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs
From: Nicholas Piggin @ 2021-01-30 13:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Athira Rajeev, Nicholas Piggin
In-Reply-To: <20210130130852.2952424-1-npiggin@gmail.com>

Interrupts that occur in kernel mode expect that context tracking
is set to kernel. Enabling local irqs before context tracking
switches from guest to host means interrupts can come in and trigger
warnings about wrong context, and possibly worse.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kvm/book3s_hv.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6f612d240392..d348e77cee20 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3407,8 +3407,9 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 
 	kvmppc_set_host_core(pcpu);
 
+	guest_exit_irqoff();
+
 	local_irq_enable();
-	guest_exit();
 
 	/* Let secondaries go back to the offline loop */
 	for (i = 0; i < controlled_threads; ++i) {
@@ -4217,8 +4218,9 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 
 	kvmppc_set_host_core(pcpu);
 
+	guest_exit_irqoff();
+
 	local_irq_enable();
-	guest_exit();
 
 	cpumask_clear_cpu(pcpu, &kvm->arch.cpu_in_guest);
 
-- 
2.23.0
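
The ordering issue can be pictured with a minimal user-space model. The
struct and function names are hypothetical; the only facts taken from the
patch are the two call orders and the expectation that an interrupt taken
in kernel mode sees kernel context.

```c
#include <assert.h>

enum model_ctx { MODEL_CTX_GUEST, MODEL_CTX_KERNEL };

struct model_cpu {
	enum model_ctx ctx;  /* what context tracking currently believes */
	int irqs_on;         /* local irqs enabled? */
	int warnings;        /* "wrong context" warnings raised */
};

/* An interrupt is only taken while irqs are on, and expects kernel context. */
static void model_irq_arrives(struct model_cpu *cpu)
{
	if (cpu->irqs_on && cpu->ctx != MODEL_CTX_KERNEL)
		cpu->warnings++;
}

/* Old order: local_irq_enable(); guest_exit(); */
static int model_old_order(void)
{
	struct model_cpu cpu = { MODEL_CTX_GUEST, 0, 0 };

	cpu.irqs_on = 1;
	model_irq_arrives(&cpu);     /* window: irqs on, still guest context */
	cpu.ctx = MODEL_CTX_KERNEL;
	model_irq_arrives(&cpu);
	return cpu.warnings;
}

/* New order: guest_exit_irqoff(); local_irq_enable(); */
static int model_new_order(void)
{
	struct model_cpu cpu = { MODEL_CTX_GUEST, 0, 0 };

	model_irq_arrives(&cpu);     /* irqs off: nothing can be taken here */
	cpu.ctx = MODEL_CTX_KERNEL;
	cpu.irqs_on = 1;
	model_irq_arrives(&cpu);     /* context is already kernel */
	return cpu.warnings;
}
```

The old order leaves a window in which an interrupt sees guest context;
the new order closes it by switching context before irqs are enabled.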


* [PATCH v7 03/42] powerpc/32s: move DABR match out of handle_page_fault
From: Nicholas Piggin @ 2021-01-30 13:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Athira Rajeev, Nicholas Piggin
In-Reply-To: <20210130130852.2952424-1-npiggin@gmail.com>

From: Christophe Leroy <christophe.leroy@csgroup.eu>

handle_page_fault() has some code dedicated to book3s/32 to
call do_break() when the DSI is a DABR match.

On other platforms, do_break() is handled separately.

Do the same for book3s/32, do it earlier in the process of DSI.

This change also avoids doing the test on ISI.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/entry_32.S       | 15 ---------------
 arch/powerpc/kernel/head_book3s_32.S |  3 +++
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1c9b0ccc2172..238eacfda7b0 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -670,10 +670,6 @@ ppc_swapcontext:
 	.globl	handle_page_fault
 handle_page_fault:
 	addi	r3,r1,STACK_FRAME_OVERHEAD
-#ifdef CONFIG_PPC_BOOK3S_32
-	andis.  r0,r5,DSISR_DABRMATCH@h
-	bne-    handle_dabr_fault
-#endif
 	bl	do_page_fault
 	cmpwi	r3,0
 	beq+	ret_from_except
@@ -687,17 +683,6 @@ handle_page_fault:
 	bl	__bad_page_fault
 	b	ret_from_except_full
 
-#ifdef CONFIG_PPC_BOOK3S_32
-	/* We have a data breakpoint exception - handle it */
-handle_dabr_fault:
-	SAVE_NVGPRS(r1)
-	lwz	r0,_TRAP(r1)
-	clrrwi	r0,r0,1
-	stw	r0,_TRAP(r1)
-	bl      do_break
-	b	ret_from_except_full
-#endif
-
 /*
  * This routine switches between two different tasks.  The process
  * state of one is saved on its kernel stack.  Then the state
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 858fbc8b19f3..6d411b8fd5d3 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -689,7 +689,10 @@ handle_page_fault_tramp_1:
 	lwz	r5, _DSISR(r11)
 	/* fall through */
 handle_page_fault_tramp_2:
+	andis.	r0, r5, DSISR_DABRMATCH@h
+	bne-	1f
 	EXC_XFER_LITE(0x300, handle_page_fault)
+1:	EXC_XFER_STD(0x300, do_break)
 
 #ifdef CONFIG_VMAP_STACK
 .macro save_regs_thread		thread
-- 
2.23.0
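
In C terms, the two instructions added to handle_page_fault_tramp_2 amount
to the test below. This is a user-space sketch: the enum and function names
are hypothetical, and the value used for the DABR match bit is an
assumption about DSISR_DABRMATCH, not something stated in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the kernel's DSISR_DABRMATCH bit. */
#define MODEL_DSISR_DABRMATCH 0x00400000u

enum model_target { TARGET_PAGE_FAULT, TARGET_DO_BREAK };

/* Decide where a DSI goes, based on the DSISR value held in r5. */
static enum model_target model_dsi_dispatch(uint32_t dsisr)
{
	/* andis. r0,r5,X@h ANDs r5 with the upper halfword of X, shifted left 16 */
	uint32_t hit = dsisr & (MODEL_DSISR_DABRMATCH & 0xffff0000u);

	/* bne- 1f: a non-zero result branches to the do_break transfer */
	return hit ? TARGET_DO_BREAK : TARGET_PAGE_FAULT;
}
```

Because the test now lives on the DSI path only, an ISI never reaches it,
which is the point of the last sentence of the commit message.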

