[PATCH v4 0/2] m68k uaccess fault handling fixes

public inbox for linux-m68k@lists.linux-m68k.org
 help / color / mirror / Atom feed

* [PATCH v4 0/2] m68k uaccess fault handling fixes
@ 2024-04-29  3:09 Michael Schmitz
  2024-04-29  3:09 ` [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully Michael Schmitz
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Michael Schmitz @ 2024-04-29  3:09 UTC (permalink / raw)
  To: linux-m68k, geert; +Cc: gerg

(Actually now) version 4 of fixes for uaccess fault handling
on 68030, tested on 68030 and 68040.

Patch 1 has one exception table entry fixed (residual
calculated in __generic_copy_to_user did not take into
account the number of longword transfers omitted due to a
fault) si as well as a final NOP added in __clear_user.

Patch 2 is unchanged from v3 (actually since RFC v1), now
has Tested-by tag by Finn Thain added.

These patches would still benefit from testing on 68060
and Coldfire. 

Cheers,

   Michael

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-04-29  3:09 [PATCH v4 0/2] m68k uaccess fault handling fixes Michael Schmitz
@ 2024-04-29  3:09 ` Michael Schmitz
  2024-08-07  8:14   ` Finn Thain
  2024-04-29  3:09 ` [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling Michael Schmitz
  2024-04-29  7:58 ` [PATCH v4 0/2] m68k uaccess fault handling fixes Greg Ungerer
  2 siblings, 1 reply; 17+ messages in thread
From: Michael Schmitz @ 2024-04-29  3:09 UTC (permalink / raw)
  To: linux-m68k, geert; +Cc: gerg, Michael Schmitz, Finn Thain, linux-m68k

As mentioned by Finn Thain in his patch to improve put_user
exception handling on 040, a similar problem exists on 030
processors.

A moves instruction that crosses a page boundary from a
mapped page into an unmapped one will cause a mid-instruction
bus error exception (frame format b), with the PC pointing
(usually) two instructions past the faulting movesl instruction.

Our exception handling in __generic_copy_to_user only covers
the instruction immediately following the faulting one. As
a result, fixup_exception in send_fault_sig does not detect
this case, and cause send_fault_sig to oops.

Extend the exception table to cover one additional instruction
beyond the moves[lwb] instructions.

Tested on 68030 (Atari Falcon 030) with transfers beginning
at one to six bytes offset from the end of a mapped page,
followed by further bytes on an unmapped page (testcase
derived from stress-ng sysbadaddr stressor by Finn Thain).

Tested on 68040 (Mac Quadra) and 68030 (Mac IIci) by Finn Thain.

A similar problem is present in __clear_user(); modify the
exception table for that function in the same way (tested by
Finn Thain).

Cc: Finn Thain <fthain@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Tested-by: Finn Thain <fthain@linux-m68k.org>
Link: https://lore.kernel.org/all/e0f23460779e6d16e2633486ac4841790ef2aca0.1713176294.git.fthain@linux-m68k.org
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>

---

Changes from v3:

Finn Thain:

- correct exception table entry for movew instruction in
  __generic_copy_to_user

- add final NOP in __clear_user

Changes from RFC v2:

Finn Thain:

- add missing extension table entries and final NOP in
  __generic_copy_to_user faults in 040 tests

Michael Schmitz:

- add yet another exception table entry in __clear_user

Changes from RFC v1:

Michael Schmitz:

- use extended exception table instead of additional NOPs
---
 arch/m68k/lib/uaccess.c | 65 ++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 27 deletions(-)

diff --git a/arch/m68k/lib/uaccess.c b/arch/m68k/lib/uaccess.c
index 86b6fed5151c..c63efa6ea4d4 100644
--- a/arch/m68k/lib/uaccess.c
+++ b/arch/m68k/lib/uaccess.c
@@ -62,35 +62,42 @@ unsigned long __generic_copy_to_user(void __user *to, const void *from,
 
 	asm volatile ("\n"
 		"	tst.l	%0\n"
-		"	jeq	4f\n"
+		"	jeq	5f\n"
 		"1:	move.l	(%1)+,%3\n"
 		"2:	"MOVES".l	%3,(%2)+\n"
 		"3:	subq.l	#1,%0\n"
-		"	jne	1b\n"
-		"4:	btst	#1,%5\n"
-		"	jeq	6f\n"
-		"	move.w	(%1)+,%3\n"
-		"5:	"MOVES".w	%3,(%2)+\n"
-		"6:	btst	#0,%5\n"
+		"4:	jne	1b\n"
+		"5:	btst	#1,%5\n"
 		"	jeq	8f\n"
-		"	move.b	(%1)+,%3\n"
-		"7:	"MOVES".b  %3,(%2)+\n"
-		"8:\n"
+		"6:	move.w	(%1)+,%3\n"
+		"7:	"MOVES".w	%3,(%2)+\n"
+		"8:	btst	#0,%5\n"
+		"9:	jeq	13f\n"
+		"10:	move.b	(%1)+,%3\n"
+		"11:	"MOVES".b	%3,(%2)+\n"
+		"12:	nop\n"
+		"13:\n"
 		"	.section .fixup,\"ax\"\n"
 		"	.even\n"
 		"20:	lsl.l	#2,%0\n"
 		"50:	add.l	%5,%0\n"
-		"	jra	8b\n"
+		"	jra	13b\n"
 		"	.previous\n"
 		"\n"
 		"	.section __ex_table,\"a\"\n"
 		"	.align	4\n"
+		"	.long	1b,20b\n"
 		"	.long	2b,20b\n"
 		"	.long	3b,20b\n"
-		"	.long	5b,50b\n"
-		"	.long	6b,50b\n"
+		"	.long	4b,20b\n"
+		"	.long	5b,20b\n"
+		"	.long	6b,20b\n"
 		"	.long	7b,50b\n"
 		"	.long	8b,50b\n"
+		"	.long	9b,50b\n"
+		"	.long	10b,50b\n"
+		"	.long	11b,50b\n"
+		"	.long	12b,50b\n"
 		"	.previous"
 		: "=d" (res), "+a" (from), "+a" (to), "=&d" (tmp)
 		: "0" (n / 4), "d" (n & 3));
@@ -109,32 +116,36 @@ unsigned long __clear_user(void __user *to, unsigned long n)
 
 	asm volatile ("\n"
 		"	tst.l	%0\n"
-		"	jeq	3f\n"
+		"	jeq	4f\n"
 		"1:	"MOVES".l	%2,(%1)+\n"
 		"2:	subq.l	#1,%0\n"
-		"	jne	1b\n"
-		"3:	btst	#1,%4\n"
-		"	jeq	5f\n"
-		"4:	"MOVES".w	%2,(%1)+\n"
-		"5:	btst	#0,%4\n"
-		"	jeq	7f\n"
-		"6:	"MOVES".b	%2,(%1)\n"
-		"7:\n"
+		"3:	jne	1b\n"
+		"4:	btst	#1,%4\n"
+		"	jeq	6f\n"
+		"5:	"MOVES".w	%2,(%1)+\n"
+		"6:	btst	#0,%4\n"
+		"7:	jeq	9f\n"
+		"8:	"MOVES".b	%2,(%1)\n"
+		"9:	nop\n"
+		"10:\n"
 		"	.section .fixup,\"ax\"\n"
 		"	.even\n"
-		"10:	lsl.l	#2,%0\n"
+		"20:	lsl.l	#2,%0\n"
 		"40:	add.l	%4,%0\n"
-		"	jra	7b\n"
+		"	jra	10b\n"
 		"	.previous\n"
 		"\n"
 		"	.section __ex_table,\"a\"\n"
 		"	.align	4\n"
-		"	.long	1b,10b\n"
-		"	.long	2b,10b\n"
-		"	.long	4b,40b\n"
+		"	.long	1b,20b\n"
+		"	.long	2b,20b\n"
+		"	.long	3b,20b\n"
+		"	.long	4b,20b\n"
 		"	.long	5b,40b\n"
 		"	.long	6b,40b\n"
 		"	.long	7b,40b\n"
+		"	.long	8b,40b\n"
+		"	.long	9b,40b\n"
 		"	.previous"
 		: "=d" (res), "+a" (to)
 		: "d" (0), "0" (n / 4), "d" (n & 3));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-04-29  3:09 ` [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully Michael Schmitz
@ 2024-08-07  8:14   ` Finn Thain
  2024-08-07 19:32     ` Michael Schmitz
  0 siblings, 1 reply; 17+ messages in thread
From: Finn Thain @ 2024-08-07  8:14 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k, geert, gerg, linux-m68k


On Mon, 29 Apr 2024, Michael Schmitz wrote:

> As mentioned by Finn Thain in his patch to improve put_user
> exception handling on 040, a similar problem exists on 030
> processors.
> 
> A moves instruction that crosses a page boundary from a
> mapped page into an unmapped one will cause a mid-instruction
> bus error exception (frame format b), with the PC pointing
> (usually) two instructions past the faulting movesl instruction.
> 
> Our exception handling in __generic_copy_to_user only covers
> the instruction immediately following the faulting one. As
> a result, fixup_exception in send_fault_sig does not detect
> this case, and cause send_fault_sig to oops.
> 
> Extend the exception table to cover one additional instruction
> beyond the moves[lwb] instructions.
> 
> Tested on 68030 (Atari Falcon 030) with transfers beginning
> at one to six bytes offset from the end of a mapped page,
> followed by further bytes on an unmapped page (testcase
> derived from stress-ng sysbadaddr stressor by Finn Thain).
> 
> Tested on 68040 (Mac Quadra) and 68030 (Mac IIci) by Finn Thain.
> 
> A similar problem is present in __clear_user(); modify the
> exception table for that function in the same way (tested by
> Finn Thain).
> 

Well, that __clear_user() bug is no longer theoretical. I accidentally 
bumped into it when I sent a ^C to a shell script I wrote to test some 
mac_scsi driver patches...


[    0.000000] Linux version 6.10.0-mac-00011-gd1d490afeb2a (fthain@nippy) (m68k-unknown-linux-musl-gcc (Gentoo 13.2.1_p20240210 p14) 13.2.1 20240210, GNU ld (Gentoo 2.40 p7) 2.40.0) #39 Sat Aug  3 19:57:43 AEST 2024

...

root@(none):/# bash scsi-test.sh /dev/sda5
bs=512k count=1
[  130.090000] sd 0:0:0:0: [sda] tag#9 PDMA fixup: DRQ timeout
[  130.090000] sd 0:0:0:0: [sda] tag#9 switching to slow handshake
[  130.180000] sd 0:0:0:0: Power-on or device reset occurred
1+0 records in
1+0 records out
524288 bytes (524 kB, 512 KiB) copied, 5.13732 s, 102 kB/s
[  136.030000] bash (57): drop_caches: 3
^C
root@(none):/# bash scsi-test.sh /dev/sdb5
bs=512k count=1
1+0 records in
1+0 records out
524288 bytes (524 kB, 512 KiB) copied, 1.1273 s, 465 kB/s
[  160.440000] bash (61): drop_caches: 3
1+0 records in
1+0 records out
524288 bytes (524 kB, 512 KiB) copied, 1.12763 s, 465 kB/s
[  171.920000] bash (61): drop_caches: 3

bs=64k count=8
8+0 records in
8+0 records out
524288 bytes (524 kB, 512 KiB) copied, 1.03469 s, 507 kB/s
[  184.480000] bash (61): drop_caches: 3
8+0 records in
8+0 records out
524288 bytes (524 kB, 512 KiB) copied, 1.03221 s, 508 kB/s
[  196.340000] bash (61): drop_caches: 3

bs=4k count=128
128+0 records in
128+0 records out
524288 bytes (524 kB, 512 KiB) copied, 1.13089 s, 464 kB/s
[  208.860000] bash (61): drop_caches: 3
128+0 records in
128+0 records out
524288 bytes (524 kB, 512 KiB) copied, 1.13089 s, 464 kB/s
[  220.850000] bash (61): drop_caches: 3

bs=512 count=1k
1024+0 records in
1024+0 records out
524288 bytes (524 kB, 512 KiB) copied, 3.69678 s, 142 kB/s
[  236.020000] bash (61): drop_caches: 3
1024+0 records in
1024+0 records out
524288 bytes (524 kB, 512 KiB) copied, 3.68874 s, 142 kB/s
[  250.570000] bash (61): drop_caches: 3

bs=512k count=1
1+0 records in
1+0 records out
524288 bytes (524 kB, 512 KiB) copied, 1.12338 s, 467 kB/s
[  263.120000] bash (61): drop_caches: 3
1+0 records in
1+0 records out
524288 bytes (524 kB, 512 KiB) copied, 1.12665 s, 465 kB/s
[  275.110000] bash (61): drop_caches: 3

bs=64k count=8
8+0 records in
8+0 records out
524288 bytes (524 kB, 512 KiB) copied, 1.08861 s, 482 kB/s
[  287.610000] bash (61): drop_caches: 3
8+0 records in
8+0 records out
524288 bytes (524 kB, 512 KiB) copied, 1.03319 s, 507 kB/s
[  299.510000] bash (61): drop_caches: 3
^C[  303.190000] Unable to handle kernel access at virtual address d7d17fd7
[  303.200000] Oops: 00000000
[  303.210000] Modules linked in:
[  303.220000] PC: [<004301fe>] __clear_user+0x22/0x40
[  303.240000] SR: 2000  SP: 4841d159  a2: 00b78590
[  303.250000] d0: 000003ab    d1: 00000000    d2: 00000000    d3: 8000c150
[  303.260000] d4: 00002000    d5: 00000000    a0: 8000c154    a1: 009bdb3c
[  303.280000] Process cmp (pid: 92, task=7d95deea)
[  303.290000] Frame format=B ssw=0709 isc=0801 isb=0001 daddr=8000c150 dobuf=00000000
[  303.310000] baddr=00430202 dibuf=8000c150 ver=f
[  303.320000] Stack from 0149de8c:
[  303.320000]         8001c2c4 0149deb8 00139904 8000c150 00000eb0 00983c00 00000000 00000003
[  303.320000]         00000003 8000bf00 00983a00 0149df40 0013a1fa 00912480 8000bf00 00983c60
[  303.320000]         00000003 00000012 00000000 fffffff8 00000006 d012f728 00000000 00983a5a
[  303.320000]         00000004 00983a00 000579d4 0052b7e8 009125a0 00912480 00983a5a 80000034
[  303.320000]         000d1704 014994c0 80008864 80000000 00000000 80008864 80008864 00000000
[  303.320000]         01497800 00000000 00000034 fffffff8 fffffff8 0149df7c 000e9296 00983a00
[  303.380000] Call Trace: [<00139904>] elf_load+0x192/0x1da
[  303.390000]  [<0013a1fa>] load_elf_binary+0x8ae/0xe46
[  303.400000]  [<000579d4>] try_module_get+0x0/0x48
[  303.410000]  [<000d1704>] kfree+0x0/0x160
[  303.420000]  [<000e9296>] bprm_execve+0x134/0x416
[  303.430000]  [<000e9daa>] copy_string_kernel+0x0/0x184
[  303.440000]  [<000e9fa4>] copy_strings+0x0/0x310
[  303.460000]  [<000ea3f6>] do_execveat_common+0x142/0x1d2
[  303.480000]  [<000eb128>] sys_execve+0x2a/0x36
[  303.490000]  [<0000275e>] syscall+0x8/0xc
[  303.510000]  [<0008c010>] do_read_cache_folio+0xde/0x2c8
[  303.520000] 
[  303.520000] Code: 206e 0008 4282 4a80 6708 0e98 2800 5380 <66f8> 0801 0001 6704 0e58 2800 0801 0000 6704 0e10 2800 241f 4e5e 4e75 4e75 4e56
[  303.560000] Disabling lock debugging due to kernel taint

root@(none):/# 


Also, it appears that the patch Michael posted in this thread would have 
prevented that oops because the "jne 1b" at __clear_user+0x22 would have 
been found in the exception table.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-08-07  8:14   ` Finn Thain
@ 2024-08-07 19:32     ` Michael Schmitz
  2024-08-08  1:57       ` Finn Thain
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Schmitz @ 2024-08-07 19:32 UTC (permalink / raw)
  To: Finn Thain; +Cc: linux-m68k, geert, gerg, linux-m68k

Hi Finn,

thanks for (inadvertently) proving this bug in __clear_user() does exist!

I'm afraid I've lost track of where we're at with this patch series. 
Does it need more work, or more bug reports such as the one below? The 
previous bug reports might be considered somewhat contrived but this 
one's from 'real' user space code, and none too complex at that?

Cheers,

     Michael

On 7/08/24 20:14, Finn Thain wrote:
> On Mon, 29 Apr 2024, Michael Schmitz wrote:
>
>> As mentioned by Finn Thain in his patch to improve put_user
>> exception handling on 040, a similar problem exists on 030
>> processors.
>>
>> A moves instruction that crosses a page boundary from a
>> mapped page into an unmapped one will cause a mid-instruction
>> bus error exception (frame format b), with the PC pointing
>> (usually) two instructions past the faulting movesl instruction.
>>
>> Our exception handling in __generic_copy_to_user only covers
>> the instruction immediately following the faulting one. As
>> a result, fixup_exception in send_fault_sig does not detect
>> this case, and cause send_fault_sig to oops.
>>
>> Extend the exception table to cover one additional instruction
>> beyond the moves[lwb] instructions.
>>
>> Tested on 68030 (Atari Falcon 030) with transfers beginning
>> at one to six bytes offset from the end of a mapped page,
>> followed by further bytes on an unmapped page (testcase
>> derived from stress-ng sysbadaddr stressor by Finn Thain).
>>
>> Tested on 68040 (Mac Quadra) and 68030 (Mac IIci) by Finn Thain.
>>
>> A similar problem is present in __clear_user(); modify the
>> exception table for that function in the same way (tested by
>> Finn Thain).
>>
> Well, that __clear_user() bug is no longer theoretical. I accidentally
> bumped into it when I sent a ^C to a shell script I wrote to test some
> mac_scsi driver patches...
>
>
> [    0.000000] Linux version 6.10.0-mac-00011-gd1d490afeb2a (fthain@nippy) (m68k-unknown-linux-musl-gcc (Gentoo 13.2.1_p20240210 p14) 13.2.1 20240210, GNU ld (Gentoo 2.40 p7) 2.40.0) #39 Sat Aug  3 19:57:43 AEST 2024
>
> ...
>
> root@(none):/# bash scsi-test.sh /dev/sda5
> bs=512k count=1
> [  130.090000] sd 0:0:0:0: [sda] tag#9 PDMA fixup: DRQ timeout
> [  130.090000] sd 0:0:0:0: [sda] tag#9 switching to slow handshake
> [  130.180000] sd 0:0:0:0: Power-on or device reset occurred
> 1+0 records in
> 1+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 5.13732 s, 102 kB/s
> [  136.030000] bash (57): drop_caches: 3
> ^C
> root@(none):/# bash scsi-test.sh /dev/sdb5
> bs=512k count=1
> 1+0 records in
> 1+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 1.1273 s, 465 kB/s
> [  160.440000] bash (61): drop_caches: 3
> 1+0 records in
> 1+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 1.12763 s, 465 kB/s
> [  171.920000] bash (61): drop_caches: 3
>
> bs=64k count=8
> 8+0 records in
> 8+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 1.03469 s, 507 kB/s
> [  184.480000] bash (61): drop_caches: 3
> 8+0 records in
> 8+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 1.03221 s, 508 kB/s
> [  196.340000] bash (61): drop_caches: 3
>
> bs=4k count=128
> 128+0 records in
> 128+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 1.13089 s, 464 kB/s
> [  208.860000] bash (61): drop_caches: 3
> 128+0 records in
> 128+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 1.13089 s, 464 kB/s
> [  220.850000] bash (61): drop_caches: 3
>
> bs=512 count=1k
> 1024+0 records in
> 1024+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 3.69678 s, 142 kB/s
> [  236.020000] bash (61): drop_caches: 3
> 1024+0 records in
> 1024+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 3.68874 s, 142 kB/s
> [  250.570000] bash (61): drop_caches: 3
>
> bs=512k count=1
> 1+0 records in
> 1+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 1.12338 s, 467 kB/s
> [  263.120000] bash (61): drop_caches: 3
> 1+0 records in
> 1+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 1.12665 s, 465 kB/s
> [  275.110000] bash (61): drop_caches: 3
>
> bs=64k count=8
> 8+0 records in
> 8+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 1.08861 s, 482 kB/s
> [  287.610000] bash (61): drop_caches: 3
> 8+0 records in
> 8+0 records out
> 524288 bytes (524 kB, 512 KiB) copied, 1.03319 s, 507 kB/s
> [  299.510000] bash (61): drop_caches: 3
> ^C[  303.190000] Unable to handle kernel access at virtual address d7d17fd7
> [  303.200000] Oops: 00000000
> [  303.210000] Modules linked in:
> [  303.220000] PC: [<004301fe>] __clear_user+0x22/0x40
> [  303.240000] SR: 2000  SP: 4841d159  a2: 00b78590
> [  303.250000] d0: 000003ab    d1: 00000000    d2: 00000000    d3: 8000c150
> [  303.260000] d4: 00002000    d5: 00000000    a0: 8000c154    a1: 009bdb3c
> [  303.280000] Process cmp (pid: 92, task=7d95deea)
> [  303.290000] Frame format=B ssw=0709 isc=0801 isb=0001 daddr=8000c150 dobuf=00000000
> [  303.310000] baddr=00430202 dibuf=8000c150 ver=f
> [  303.320000] Stack from 0149de8c:
> [  303.320000]         8001c2c4 0149deb8 00139904 8000c150 00000eb0 00983c00 00000000 00000003
> [  303.320000]         00000003 8000bf00 00983a00 0149df40 0013a1fa 00912480 8000bf00 00983c60
> [  303.320000]         00000003 00000012 00000000 fffffff8 00000006 d012f728 00000000 00983a5a
> [  303.320000]         00000004 00983a00 000579d4 0052b7e8 009125a0 00912480 00983a5a 80000034
> [  303.320000]         000d1704 014994c0 80008864 80000000 00000000 80008864 80008864 00000000
> [  303.320000]         01497800 00000000 00000034 fffffff8 fffffff8 0149df7c 000e9296 00983a00
> [  303.380000] Call Trace: [<00139904>] elf_load+0x192/0x1da
> [  303.390000]  [<0013a1fa>] load_elf_binary+0x8ae/0xe46
> [  303.400000]  [<000579d4>] try_module_get+0x0/0x48
> [  303.410000]  [<000d1704>] kfree+0x0/0x160
> [  303.420000]  [<000e9296>] bprm_execve+0x134/0x416
> [  303.430000]  [<000e9daa>] copy_string_kernel+0x0/0x184
> [  303.440000]  [<000e9fa4>] copy_strings+0x0/0x310
> [  303.460000]  [<000ea3f6>] do_execveat_common+0x142/0x1d2
> [  303.480000]  [<000eb128>] sys_execve+0x2a/0x36
> [  303.490000]  [<0000275e>] syscall+0x8/0xc
> [  303.510000]  [<0008c010>] do_read_cache_folio+0xde/0x2c8
> [  303.520000]
> [  303.520000] Code: 206e 0008 4282 4a80 6708 0e98 2800 5380 <66f8> 0801 0001 6704 0e58 2800 0801 0000 6704 0e10 2800 241f 4e5e 4e75 4e75 4e56
> [  303.560000] Disabling lock debugging due to kernel taint
>
> root@(none):/#
>
>
> Also, it appears that the patch Michael posted in this thread would have
> prevented that oops because the "jne 1b" at __clear_user+0x22 would have
> been found in the exception table.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-08-07 19:32     ` Michael Schmitz
@ 2024-08-08  1:57       ` Finn Thain
  2024-08-08  6:05         ` Greg Ungerer
  0 siblings, 1 reply; 17+ messages in thread
From: Finn Thain @ 2024-08-08  1:57 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k, geert, gerg, linux-m68k


Hello Michael

On Thu, 8 Aug 2024, Michael Schmitz wrote:

> > Well, that __clear_user() bug is no longer theoretical. I accidentally 
> > bumped into it when I sent a ^C to a shell script I wrote to test some 
> > mac_scsi driver patches...

...

> 
> I'm afraid I've lost track of where we're at with this patch series. 
> Does it need more work, or more bug reports such as the one below?

Apparently the series is waiting for some testing on a Coldfire system 
with MMU.

> The previous bug reports might be considered somewhat contrived but this 
> one's from 'real' user space code, and none too complex at that?
> 

Right. That code was as follows. There's nothing here aimed at 
arch/m68k/lib/uaccess.c in particular, just IO to a block device and a 
tmpfs filesystem.


#!/bin/bash

set -e -u

filename=$1

rand=/tmp/rand_test_data
zero=/tmp/zero_test_data
len=512K

dd if=/dev/urandom of=$rand bs=$len count=1 2>/dev/null
dd if=/dev/zero of=$zero bs=$len count=1 2>/dev/null

write() {
        dd $args if=$orig of=$filename # oflag=direct
}

compare() {
        echo 3 > /proc/sys/vm/drop_caches
        if ! cmp -n $len $orig $filename ; then
                diff -u <(hexdump -C < $orig) <(hexdump -C < $filename)
        fi
}

while true; do
        for args in "bs=512k count=1" "bs=64k count=8" "bs=4k count=128" "bs=512 count=1k" ; do
                echo $args
                orig=$rand ; write ; compare
                orig=$zero ; write ; compare
                sync
                echo
        done
done

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-08-08  1:57       ` Finn Thain
@ 2024-08-08  6:05         ` Greg Ungerer
  2024-08-08  6:56           ` Finn Thain
  2024-08-08  6:58           ` Michael Schmitz
  0 siblings, 2 replies; 17+ messages in thread
From: Greg Ungerer @ 2024-08-08  6:05 UTC (permalink / raw)
  To: Finn Thain, Michael Schmitz; +Cc: linux-m68k, geert, linux-m68k

Hi Michael, Finn,

On 8/8/24 11:57, Finn Thain wrote:
> 
> Hello Michael
> 
> On Thu, 8 Aug 2024, Michael Schmitz wrote:
> 
>>> Well, that __clear_user() bug is no longer theoretical. I accidentally
>>> bumped into it when I sent a ^C to a shell script I wrote to test some
>>> mac_scsi driver patches...
> 
> ...
> 
>>
>> I'm afraid I've lost track of where we're at with this patch series.
>> Does it need more work, or more bug reports such as the one below?
> 
> Apparently the series is waiting for some testing on a Coldfire system
> with MMU.

Ok, I am in a state that I can do that now (I managed to fix my M5475EVB board).
If I test the v4 versions of this patch set that should do the job?

Regards
Greg



>> The previous bug reports might be considered somewhat contrived but this
>> one's from 'real' user space code, and none too complex at that?
>>
> 
> Right. That code was as follows. There's nothing here aimed at
> arch/m68k/lib/uaccess.c in particular, just IO to a block device and a
> tmpfs filesystem.
> 
> 
> #!/bin/bash
> 
> set -e -u
> 
> filename=$1
> 
> rand=/tmp/rand_test_data
> zero=/tmp/zero_test_data
> len=512K
> 
> dd if=/dev/urandom of=$rand bs=$len count=1 2>/dev/null
> dd if=/dev/zero of=$zero bs=$len count=1 2>/dev/null
> 
> write() {
>          dd $args if=$orig of=$filename # oflag=direct
> }
> 
> compare() {
>          echo 3 > /proc/sys/vm/drop_caches
>          if ! cmp -n $len $orig $filename ; then
>                  diff -u <(hexdump -C < $orig) <(hexdump -C < $filename)
>          fi
> }
> 
> while true; do
>          for args in "bs=512k count=1" "bs=64k count=8" "bs=4k count=128" "bs=512 count=1k" ; do
>                  echo $args
>                  orig=$rand ; write ; compare
>                  orig=$zero ; write ; compare
>                  sync
>                  echo
>          done
> done

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-08-08  6:05         ` Greg Ungerer
@ 2024-08-08  6:56           ` Finn Thain
  2024-08-08 14:52             ` Greg Ungerer
  2024-08-08  6:58           ` Michael Schmitz
  1 sibling, 1 reply; 17+ messages in thread
From: Finn Thain @ 2024-08-08  6:56 UTC (permalink / raw)
  To: Greg Ungerer; +Cc: Michael Schmitz, linux-m68k, geert, linux-m68k


On Thu, 8 Aug 2024, Greg Ungerer wrote:

> On 8/8/24 11:57, Finn Thain wrote:
> > 
> > >
> > > I'm afraid I've lost track of where we're at with this patch series. 
> > > Does it need more work, or more bug reports such as the one below?
> > 
> > Apparently the series is waiting for some testing on a Coldfire system 
> > with MMU.
> 
> Ok, I am in a state that I can do that now (I managed to fix my M5475EVB 
> board). 

Thanks, Greg.

> If I test the v4 versions of this patch set that should do the job?
> 

There are 3 patches that need some more testing, one from me and two from 
Michael:

[PATCH] m68k: Handle put_user() faults more carefully
[PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
[PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling

They were posted in two threads:

https://lore.kernel.org/linux-m68k/1ed9c4c753395510c1a8df9c35e2ad4c31c90f95.1714265926.git.fthain@linux-m68k.org/T/
https://lore.kernel.org/linux-m68k/CAMuHMdVrOnOQwCxk42YCjEkbfL-YDSvpf_xTaouv9hUs2bO+qg@mail.gmail.com/T/

On 680x0, one of the bugs was brought to light with 'stress-ng 
--sysbadaddr -1'. The others required special test programs.

I've no idea whether Coldfire silicon is susceptable at all, and if it is, 
whether the same reproducers would work. But that's neither here nor there 
from a regression testing standpoint.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-08-08  6:56           ` Finn Thain
@ 2024-08-08 14:52             ` Greg Ungerer
  2024-08-08 19:27               ` Michael Schmitz
  2024-08-09  3:22               ` Finn Thain
  0 siblings, 2 replies; 17+ messages in thread
From: Greg Ungerer @ 2024-08-08 14:52 UTC (permalink / raw)
  To: Finn Thain; +Cc: Michael Schmitz, linux-m68k, geert, linux-m68k

Hi Finn,

On 8/8/24 16:56, Finn Thain wrote:
> On Thu, 8 Aug 2024, Greg Ungerer wrote:
>> On 8/8/24 11:57, Finn Thain wrote:
>>>> I'm afraid I've lost track of where we're at with this patch series.
>>>> Does it need more work, or more bug reports such as the one below?
>>>
>>> Apparently the series is waiting for some testing on a Coldfire system
>>> with MMU.
>>
>> Ok, I am in a state that I can do that now (I managed to fix my M5475EVB
>> board).
> 
> Thanks, Greg.
> 
>> If I test the v4 versions of this patch set that should do the job?
>>
> 
> There are 3 patches that need some more testing, one from me and two from
> Michael:
> 
> [PATCH] m68k: Handle put_user() faults more carefully
> [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
> [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling
> 
> They were posted in two threads:
> 
> https://lore.kernel.org/linux-m68k/1ed9c4c753395510c1a8df9c35e2ad4c31c90f95.1714265926.git.fthain@linux-m68k.org/T/
> https://lore.kernel.org/linux-m68k/CAMuHMdVrOnOQwCxk42YCjEkbfL-YDSvpf_xTaouv9hUs2bO+qg@mail.gmail.com/T/
> 
> On 680x0, one of the bugs was brought to light with 'stress-ng
> --sysbadaddr -1'. The others required special test programs.
> 
> I've no idea whether Coldfire silicon is susceptable at all, and if it is,
> whether the same reproducers would work. But that's neither here nor there
> from a regression testing standpoint.

Ok, thanks for the links. I have applied and tested those, no obvious regressions.
So for all of these patches:

Tested-by: Greg Ungerer <gerg@linux-m68k.org>

I tried out the "stress-ng --sysbadaddr -1" test, and that didn't go so well for me:

# stress-ng --sysbadaddr -1
stress-ng: info:  [37] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
stress-ng: info:  [37] dispatching hogs: 1 sysbadaddr
*** ILLEGAL INSTRUCTION ***   FORMAT=4
Current process id is 39
BAD KERNEL TRAP: 00000000
Modules linked in:
PC: [<00000000>] 0x0
SR: 2004  SP: 6504e563  a2: 008ee380
d0: 000000f7    d1: 00000000    d2: 00000000    d3: 00000000
d4: 00a87b80    d5: bfbf3814    a0: 00000000    a1: bfbf3814
Process stress-ng-sysba (pid: 39, task=4dbb2ec5)
Frame format=4 eff addr=480a2004 pc=0002b154
Stack from 00adff20:
         00ade000 00000000 00000000 000000f7 00000000 00000004 00a87b80 00000000
         00000000 00000000 00000000 008ee380 0002ab5c 00000100 00000122 fffffff6
         bfbf376c 0002b29e 000000f7 bfbf3814 00000000 00000000 00ade000 0002b222
         00ae0800 80118988 00000000 00000005 bfbf37a0 00000005 bfbf3814 00adffcc
         00023d2c 00adffcc 00000000 00000000 00000000 00000000 000000f7 00000000
         80118b46 00021850 00024b00 000000f7 bfbf3814 00000000 00000000 bfbf3814
Call Trace: [<0002ab5c>] child_wait_callback+0x0/0x24
  [<0002b29e>] sys_wait4+0x7c/0x8e
  [<0002b222>] sys_wait4+0x0/0x8e
  [<00023d2c>] buserr_c+0xb0/0x152
  [<00021850>] buserr+0x28/0x30
  [<00024b00>] system_call+0x54/0xa8

But that is the same with and without these patches.

Regards
Greg


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-08-08 14:52             ` Greg Ungerer
@ 2024-08-08 19:27               ` Michael Schmitz
  2024-08-09  3:34                 ` Finn Thain
  2024-08-09  3:22               ` Finn Thain
  1 sibling, 1 reply; 17+ messages in thread
From: Michael Schmitz @ 2024-08-08 19:27 UTC (permalink / raw)
  To: Greg Ungerer, Finn Thain; +Cc: linux-m68k, geert, linux-m68k

Hi Greg,

tanks for testing!

On 9/08/24 02:52, Greg Ungerer wrote:
> Hi Finn,
>
> On 8/8/24 16:56, Finn Thain wrote:
>> On Thu, 8 Aug 2024, Greg Ungerer wrote:
>>> On 8/8/24 11:57, Finn Thain wrote:
>>>>> I'm afraid I've lost track of where we're at with this patch series.
>>>>> Does it need more work, or more bug reports such as the one below?
>>>>
>>>> Apparently the series is waiting for some testing on a Coldfire system
>>>> with MMU.
>>>
>>> Ok, I am in a state that I can do that now (I managed to fix my 
>>> M5475EVB
>>> board).
>>
>> Thanks, Greg.
>>
>>> If I test the v4 versions of this patch set that should do the job?
>>>
>>
>> There are 3 patches that need some more testing, one from me and two 
>> from
>> Michael:
>>
>> [PATCH] m68k: Handle put_user() faults more carefully
>> [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
>> [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault 
>> handling
>>
>> They were posted in two threads:
>>
>> https://lore.kernel.org/linux-m68k/1ed9c4c753395510c1a8df9c35e2ad4c31c90f95.1714265926.git.fthain@linux-m68k.org/T/ 
>>
>> https://lore.kernel.org/linux-m68k/CAMuHMdVrOnOQwCxk42YCjEkbfL-YDSvpf_xTaouv9hUs2bO+qg@mail.gmail.com/T/ 
>>
>>
>> On 680x0, one of the bugs was brought to light with 'stress-ng
>> --sysbadaddr -1'. The others required special test programs.
>>
>> I've no idea whether Coldfire silicon is susceptable at all, and if 
>> it is,
>> whether the same reproducers would work. But that's neither here nor 
>> there
>> from a regression testing standpoint.
>
> Ok, thanks for the links. I have applied and tested those, no obvious 
> regressions.
> So for all of these patches:
>
> Tested-by: Greg Ungerer <gerg@linux-m68k.org>
>
> I tried out the "stress-ng --sysbadaddr -1" test, and that didn't go 
> so well for me:
>
> # stress-ng --sysbadaddr -1
> stress-ng: info:  [37] defaulting to a 86400 second (1 day, 0.00 secs) 
> run per stressor
> stress-ng: info:  [37] dispatching hogs: 1 sysbadaddr
> *** ILLEGAL INSTRUCTION ***   FORMAT=4
> Current process id is 39
> BAD KERNEL TRAP: 00000000
> Modules linked in:
> PC: [<00000000>] 0x0
> SR: 2004  SP: 6504e563  a2: 008ee380
> d0: 000000f7    d1: 00000000    d2: 00000000    d3: 00000000
> d4: 00a87b80    d5: bfbf3814    a0: 00000000    a1: bfbf3814
> Process stress-ng-sysba (pid: 39, task=4dbb2ec5)
> Frame format=4 eff addr=480a2004 pc=0002b154
> Stack from 00adff20:
>         00ade000 00000000 00000000 000000f7 00000000 00000004 00a87b80 
> 00000000
>         00000000 00000000 00000000 008ee380 0002ab5c 00000100 00000122 
> fffffff6
>         bfbf376c 0002b29e 000000f7 bfbf3814 00000000 00000000 00ade000 
> 0002b222
>         00ae0800 80118988 00000000 00000005 bfbf37a0 00000005 bfbf3814 
> 00adffcc
>         00023d2c 00adffcc 00000000 00000000 00000000 00000000 000000f7 
> 00000000
>         80118b46 00021850 00024b00 000000f7 bfbf3814 00000000 00000000 
> bfbf3814
> Call Trace: [<0002ab5c>] child_wait_callback+0x0/0x24
>  [<0002b29e>] sys_wait4+0x7c/0x8e
>  [<0002b222>] sys_wait4+0x0/0x8e
>  [<00023d2c>] buserr_c+0xb0/0x152
>  [<00021850>] buserr+0x28/0x30
>  [<00024b00>] system_call+0x54/0xa8
>
> But that is the same with and without these patches.

I wonder if recent signal handling changes (e.g. commit 
0d4276cfbe6fd4c4a21acdee803b05a3a6192082) have rare unexpected side 
effects on Coldfire here ... OTOH, signal handling as such works just 
fine, right?

Cheers,

     Michael


>
> Regards
> Greg
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-08-08 19:27               ` Michael Schmitz
@ 2024-08-09  3:34                 ` Finn Thain
  2024-08-09  8:03                   ` Michael Schmitz
  0 siblings, 1 reply; 17+ messages in thread
From: Finn Thain @ 2024-08-09  3:34 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Greg Ungerer, linux-m68k, geert, linux-m68k

[-- Attachment #1: Type: text/plain, Size: 2360 bytes --]


On Fri, 9 Aug 2024, Michael Schmitz wrote:

> >
> > I tried out the "stress-ng --sysbadaddr -1" test, and that didn't go 
> > so well for me:
> >
> > # stress-ng --sysbadaddr -1
> > stress-ng: info:  [37] defaulting to a 86400 second (1 day, 0.00 secs) run
> > per stressor
> > stress-ng: info:  [37] dispatching hogs: 1 sysbadaddr
> > *** ILLEGAL INSTRUCTION ***   FORMAT=4
> > Current process id is 39
> > BAD KERNEL TRAP: 00000000
> > Modules linked in:
> > PC: [<00000000>] 0x0
> > SR: 2004  SP: 6504e563  a2: 008ee380
> > d0: 000000f7    d1: 00000000    d2: 00000000    d3: 00000000
> > d4: 00a87b80    d5: bfbf3814    a0: 00000000    a1: bfbf3814
> > Process stress-ng-sysba (pid: 39, task=4dbb2ec5)
> > Frame format=4 eff addr=480a2004 pc=0002b154
> > Stack from 00adff20:
> >         00ade000 00000000 00000000 000000f7 00000000 00000004 00a87b80
> > 00000000
> >         00000000 00000000 00000000 008ee380 0002ab5c 00000100 00000122
> > fffffff6
> >         bfbf376c 0002b29e 000000f7 bfbf3814 00000000 00000000 00ade000
> > 0002b222
> >         00ae0800 80118988 00000000 00000005 bfbf37a0 00000005 bfbf3814
> > 00adffcc
> >         00023d2c 00adffcc 00000000 00000000 00000000 00000000 000000f7
> > 00000000
> >         80118b46 00021850 00024b00 000000f7 bfbf3814 00000000 00000000
> > bfbf3814
> > Call Trace: [<0002ab5c>] child_wait_callback+0x0/0x24
> >  [<0002b29e>] sys_wait4+0x7c/0x8e
> >  [<0002b222>] sys_wait4+0x0/0x8e
> >  [<00023d2c>] buserr_c+0xb0/0x152
> >  [<00021850>] buserr+0x28/0x30
> >  [<00024b00>] system_call+0x54/0xa8
> >
> > But that is the same with and without these patches.
> 
> I wonder if recent signal handling changes (e.g. commit
> 0d4276cfbe6fd4c4a21acdee803b05a3a6192082) have rare unexpected side effects on
> Coldfire here ... OTOH, signal handling as such works just fine, right?
> 

That would be commit b845b574f86d ("m68k: Move signal frame following 
exception on 68020/030") on mainline. If it caused a regression, that 
would have first appeared in v6.4. I can't imagine how that commit could 
affect Coldfire but that's no reason not to test an older kernel.
FWIW, my hunch is that the other stressors which call wait4() will 
probably crash too (regardless of kernel version).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-08-09  3:34                 ` Finn Thain
@ 2024-08-09  8:03                   ` Michael Schmitz
  2024-08-09 12:58                     ` Greg Ungerer
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Schmitz @ 2024-08-09  8:03 UTC (permalink / raw)
  To: Finn Thain; +Cc: Greg Ungerer, linux-m68k, geert, linux-m68k

Hi Finn,

Am 09.08.2024 um 15:34 schrieb Finn Thain:
>
> On Fri, 9 Aug 2024, Michael Schmitz wrote:
>
>>>
>>> I tried out the "stress-ng --sysbadaddr -1" test, and that didn't go
>>> so well for me:
>>>
>>> # stress-ng --sysbadaddr -1
>>> stress-ng: info:  [37] defaulting to a 86400 second (1 day, 0.00 secs) run
>>> per stressor
>>> stress-ng: info:  [37] dispatching hogs: 1 sysbadaddr
>>> *** ILLEGAL INSTRUCTION ***   FORMAT=4
>>> Current process id is 39
>>> BAD KERNEL TRAP: 00000000
>>> Modules linked in:
>>> PC: [<00000000>] 0x0
>>> SR: 2004  SP: 6504e563  a2: 008ee380
>>> d0: 000000f7    d1: 00000000    d2: 00000000    d3: 00000000
>>> d4: 00a87b80    d5: bfbf3814    a0: 00000000    a1: bfbf3814
>>> Process stress-ng-sysba (pid: 39, task=4dbb2ec5)
>>> Frame format=4 eff addr=480a2004 pc=0002b154
>>> Stack from 00adff20:
>>>         00ade000 00000000 00000000 000000f7 00000000 00000004 00a87b80
>>> 00000000
>>>         00000000 00000000 00000000 008ee380 0002ab5c 00000100 00000122
>>> fffffff6
>>>         bfbf376c 0002b29e 000000f7 bfbf3814 00000000 00000000 00ade000
>>> 0002b222
>>>         00ae0800 80118988 00000000 00000005 bfbf37a0 00000005 bfbf3814
>>> 00adffcc
>>>         00023d2c 00adffcc 00000000 00000000 00000000 00000000 000000f7
>>> 00000000
>>>         80118b46 00021850 00024b00 000000f7 bfbf3814 00000000 00000000
>>> bfbf3814
>>> Call Trace: [<0002ab5c>] child_wait_callback+0x0/0x24
>>>  [<0002b29e>] sys_wait4+0x7c/0x8e
>>>  [<0002b222>] sys_wait4+0x0/0x8e
>>>  [<00023d2c>] buserr_c+0xb0/0x152
>>>  [<00021850>] buserr+0x28/0x30
>>>  [<00024b00>] system_call+0x54/0xa8
>>>
>>> But that is the same with and without these patches.
>>
>> I wonder if recent signal handling changes (e.g. commit
>> 0d4276cfbe6fd4c4a21acdee803b05a3a6192082) have rare unexpected side effects on
>> Coldfire here ... OTOH, signal handling as such works just fine, right?
>>
>
> That would be commit b845b574f86d ("m68k: Move signal frame following
> exception on 68020/030") on mainline. If it caused a regression, that
> would have first appeared in v6.4. I can't imagine how that commit could
> affect Coldfire but that's no reason not to test an older kernel.

Yes, testing older kernels is certainly the fastest way to rule out 
involvement of that commit.

On a closer look, there is no possible way in that this commit can be 
responsible for the bug unless Coldfire does use frame format B for 
access errors. I don't think that's likely?

> FWIW, my hunch is that the other stressors which call wait4() will
> probably crash too (regardless of kernel version).

Quite likely ...

Cheers,

	Michael



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-08-09  8:03                   ` Michael Schmitz
@ 2024-08-09 12:58                     ` Greg Ungerer
  0 siblings, 0 replies; 17+ messages in thread
From: Greg Ungerer @ 2024-08-09 12:58 UTC (permalink / raw)
  To: Michael Schmitz, Finn Thain; +Cc: linux-m68k, geert, linux-m68k


Hi Michael, Finn,

On 9/8/24 18:03, Michael Schmitz wrote:
> Hi Finn,
> 
> Am 09.08.2024 um 15:34 schrieb Finn Thain:
>>
>> On Fri, 9 Aug 2024, Michael Schmitz wrote:
>>
>>>>
>>>> I tried out the "stress-ng --sysbadaddr -1" test, and that didn't go
>>>> so well for me:
>>>>
>>>> # stress-ng --sysbadaddr -1
>>>> stress-ng: info:  [37] defaulting to a 86400 second (1 day, 0.00 
>>>> secs) run
>>>> per stressor
>>>> stress-ng: info:  [37] dispatching hogs: 1 sysbadaddr
>>>> *** ILLEGAL INSTRUCTION ***   FORMAT=4
>>>> Current process id is 39
>>>> BAD KERNEL TRAP: 00000000
>>>> Modules linked in:
>>>> PC: [<00000000>] 0x0
>>>> SR: 2004  SP: 6504e563  a2: 008ee380
>>>> d0: 000000f7    d1: 00000000    d2: 00000000    d3: 00000000
>>>> d4: 00a87b80    d5: bfbf3814    a0: 00000000    a1: bfbf3814
>>>> Process stress-ng-sysba (pid: 39, task=4dbb2ec5)
>>>> Frame format=4 eff addr=480a2004 pc=0002b154
>>>> Stack from 00adff20:
>>>>         00ade000 00000000 00000000 000000f7 00000000 00000004 00a87b80
>>>> 00000000
>>>>         00000000 00000000 00000000 008ee380 0002ab5c 00000100 00000122
>>>> fffffff6
>>>>         bfbf376c 0002b29e 000000f7 bfbf3814 00000000 00000000 00ade000
>>>> 0002b222
>>>>         00ae0800 80118988 00000000 00000005 bfbf37a0 00000005 bfbf3814
>>>> 00adffcc
>>>>         00023d2c 00adffcc 00000000 00000000 00000000 00000000 000000f7
>>>> 00000000
>>>>         80118b46 00021850 00024b00 000000f7 bfbf3814 00000000 00000000
>>>> bfbf3814
>>>> Call Trace: [<0002ab5c>] child_wait_callback+0x0/0x24
>>>>  [<0002b29e>] sys_wait4+0x7c/0x8e
>>>>  [<0002b222>] sys_wait4+0x0/0x8e
>>>>  [<00023d2c>] buserr_c+0xb0/0x152
>>>>  [<00021850>] buserr+0x28/0x30
>>>>  [<00024b00>] system_call+0x54/0xa8
>>>>
>>>> But that is the same with and without these patches.
>>>
>>> I wonder if recent signal handling changes (e.g. commit
>>> 0d4276cfbe6fd4c4a21acdee803b05a3a6192082) have rare unexpected side 
>>> effects on
>>> Coldfire here ... OTOH, signal handling as such works just fine, right?
>>>
>>
>> That would be commit b845b574f86d ("m68k: Move signal frame following
>> exception on 68020/030") on mainline. If it caused a regression, that
>> would have first appeared in v6.4. I can't imagine how that commit could
>> affect Coldfire but that's no reason not to test an older kernel.
> 
> Yes, testing older kernels is certainly the fastest way to rule out 
> involvement of that commit.

I will go back a few revisions and see if I can shed some light on it.

Regards
Greg


> On a closer look, there is no possible way in that this commit can be 
> responsible for the bug unless Coldfire does use frame format B for 
> access errors. I don't think that's likely?
> 
>> FWIW, my hunch is that the other stressors which call wait4() will
>> probably crash too (regardless of kernel version).
> 
> Quite likely ...
> 
> Cheers,
> 
>      Michael
> 
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-08-08 14:52             ` Greg Ungerer
  2024-08-08 19:27               ` Michael Schmitz
@ 2024-08-09  3:22               ` Finn Thain
  1 sibling, 0 replies; 17+ messages in thread
From: Finn Thain @ 2024-08-09  3:22 UTC (permalink / raw)
  To: Greg Ungerer; +Cc: Michael Schmitz, linux-m68k, geert, linux-m68k


On Fri, 9 Aug 2024, Greg Ungerer wrote:

> 
> Ok, thanks for the links. I have applied and tested those, no obvious
> regressions.
> So for all of these patches:
> 
> Tested-by: Greg Ungerer <gerg@linux-m68k.org>
> 

Thanks for testing!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully
  2024-08-08  6:05         ` Greg Ungerer
  2024-08-08  6:56           ` Finn Thain
@ 2024-08-08  6:58           ` Michael Schmitz
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Schmitz @ 2024-08-08  6:58 UTC (permalink / raw)
  To: Greg Ungerer, Finn Thain; +Cc: linux-m68k, geert, linux-m68k

Hi Greg,

Am 08.08.2024 um 18:05 schrieb Greg Ungerer:
> Hi Michael, Finn,
>
> On 8/8/24 11:57, Finn Thain wrote:
>>
>> Hello Michael
>>
>> On Thu, 8 Aug 2024, Michael Schmitz wrote:
>>
>>>> Well, that __clear_user() bug is no longer theoretical. I accidentally
>>>> bumped into it when I sent a ^C to a shell script I wrote to test some
>>>> mac_scsi driver patches...
>>
>> ...
>>
>>>
>>> I'm afraid I've lost track of where we're at with this patch series.
>>> Does it need more work, or more bug reports such as the one below?
>>
>> Apparently the series is waiting for some testing on a Coldfire system
>> with MMU.
>
> Ok, I am in a state that I can do that now (I managed to fix my M5475EVB
> board).
> If I test the v4 versions of this patch set that should do the job?

Yes, v4 is the one to test.

I'll see that I can dig out my test code to reproduce the bug.

Thanks,

	Michael

>
> Regards
> Greg
>
>
>
>>> The previous bug reports might be considered somewhat contrived but this
>>> one's from 'real' user space code, and none too complex at that?
>>>
>>
>> Right. That code was as follows. There's nothing here aimed at
>> arch/m68k/lib/uaccess.c in particular, just IO to a block device and a
>> tmpfs filesystem.
>>
>>
>> #!/bin/bash
>>
>> set -e -u
>>
>> filename=$1
>>
>> rand=/tmp/rand_test_data
>> zero=/tmp/zero_test_data
>> len=512K
>>
>> dd if=/dev/urandom of=$rand bs=$len count=1 2>/dev/null
>> dd if=/dev/zero of=$zero bs=$len count=1 2>/dev/null
>>
>> write() {
>>          dd $args if=$orig of=$filename # oflag=direct
>> }
>>
>> compare() {
>>          echo 3 > /proc/sys/vm/drop_caches
>>          if ! cmp -n $len $orig $filename ; then
>>                  diff -u <(hexdump -C < $orig) <(hexdump -C < $filename)
>>          fi
>> }
>>
>> while true; do
>>          for args in "bs=512k count=1" "bs=64k count=8" "bs=4k
>> count=128" "bs=512 count=1k" ; do
>>                  echo $args
>>                  orig=$rand ; write ; compare
>>                  orig=$zero ; write ; compare
>>                  sync
>>                  echo
>>          done
>> done

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling
  2024-04-29  3:09 [PATCH v4 0/2] m68k uaccess fault handling fixes Michael Schmitz
  2024-04-29  3:09 ` [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully Michael Schmitz
@ 2024-04-29  3:09 ` Michael Schmitz
  2024-04-29  7:58 ` [PATCH v4 0/2] m68k uaccess fault handling fixes Greg Ungerer
  2 siblings, 0 replies; 17+ messages in thread
From: Michael Schmitz @ 2024-04-29  3:09 UTC (permalink / raw)
  To: linux-m68k, geert; +Cc: gerg, Michael Schmitz, Finn Thain

A problem similar to that reported for __put_user_asm and
__generic_copy_to_user is also present in
__constant_copy_to_user_asm.

Address the problem by extending the exception table to cover
two instructions past each moves instruction, and adding a
single NOP at the very end to catch faults on the final
instruction (which is not guaranteed to be a movesb!).

Tested on 68030 (Atari Falcon 030) with a transfer beginning
at a single byte at the end of a mapped page followed by
seven more bytes on an unmapped page (testcase derived from
stress-ng sysbadaddr stressor by Finn Thain and modified to
use the llseek syscall).

Cc: Finn Thain <fthain@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Finn Thain <fthain@linux-m68k.org>
Link: https://lore.kernel.org/all/e0f23460779e6d16e2633486ac4841790ef2aca0.1713176294.git.fthain@linux-m68k.org
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
---
 arch/m68k/include/asm/uaccess.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
index 44e52d8323e5..f1f4b62d6f69 100644
--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@ -288,10 +288,11 @@ __constant_copy_from_user(void *to, const void __user *from, unsigned long n)
 		"21:	"MOVES"."#s2"	%3,(%1)+\n"			\
 		"22:\n"							\
 		"	.ifnc	\""#s3"\",\"\"\n"			\
-		"	move."#s3"	(%2)+,%3\n"			\
-		"31:	"MOVES"."#s3"	%3,(%1)+\n"			\
-		"32:\n"							\
+		"31:	move."#s3"	(%2)+,%3\n"			\
+		"32:	"MOVES"."#s3"	%3,(%1)+\n"			\
+		"33:\n"							\
 		"	.endif\n"					\
+		"34:	nop\n"						\
 		"4:\n"							\
 		"\n"							\
 		"	.section __ex_table,\"a\"\n"			\
@@ -303,7 +304,9 @@ __constant_copy_from_user(void *to, const void __user *from, unsigned long n)
 		"	.ifnc	\""#s3"\",\"\"\n"			\
 		"	.long	31b,5f\n"				\
 		"	.long	32b,5f\n"				\
+		"	.long	33b,5f\n"				\
 		"	.endif\n"					\
+		"	.long	34b,5f\n"				\
 		"	.previous\n"					\
 		"\n"							\
 		"	.section .fixup,\"ax\"\n"			\
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/2] m68k uaccess fault handling fixes
  2024-04-29  3:09 [PATCH v4 0/2] m68k uaccess fault handling fixes Michael Schmitz
  2024-04-29  3:09 ` [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully Michael Schmitz
  2024-04-29  3:09 ` [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling Michael Schmitz
@ 2024-04-29  7:58 ` Greg Ungerer
  2024-04-29  8:08   ` Geert Uytterhoeven
  2 siblings, 1 reply; 17+ messages in thread
From: Greg Ungerer @ 2024-04-29  7:58 UTC (permalink / raw)
  To: Michael Schmitz, linux-m68k, geert

Hi Michael,

On 29/4/24 13:09, Michael Schmitz wrote:
> (Actually now) version 4 of fixes for uaccess fault handling
> on 68030, tested on 68030 and 68040.
> 
> Patch 1 has one exception table entry fixed (residual
> calculated in __generic_copy_to_user did not take into
> account the number of longword transfers omitted due to a
> fault) si as well as a final NOP added in __clear_user.
> 
> Patch 2 is unchanged from v3 (actually since RFC v1), now
> has Tested-by tag by Finn Thain added.
> 
> These patches would still benefit from testing on 68060
> and Coldfire.

I would very much like to test them on ColdFire. Unfortunately my only
ColdFire with MMU board (a M547xEVB) failed a couple of months back.
I am trying to find a replacement, but so far have not been able to get
my hands on one.

Maybe someone else out there has one they can test with?

Regards
Greg



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/2] m68k uaccess fault handling fixes
  2024-04-29  7:58 ` [PATCH v4 0/2] m68k uaccess fault handling fixes Greg Ungerer
@ 2024-04-29  8:08   ` Geert Uytterhoeven
  0 siblings, 0 replies; 17+ messages in thread
From: Geert Uytterhoeven @ 2024-04-29  8:08 UTC (permalink / raw)
  To: Greg Ungerer, Angelo Dureghello; +Cc: Michael Schmitz, linux-m68k

On Mon, Apr 29, 2024 at 9:58 AM Greg Ungerer <gerg@linux-m68k.org> wrote:
> On 29/4/24 13:09, Michael Schmitz wrote:
> > (Actually now) version 4 of fixes for uaccess fault handling
> > on 68030, tested on 68030 and 68040.
> >
> > Patch 1 has one exception table entry fixed (residual
> > calculated in __generic_copy_to_user did not take into
> > account the number of longword transfers omitted due to a
> > fault) si as well as a final NOP added in __clear_user.
> >
> > Patch 2 is unchanged from v3 (actually since RFC v1), now
> > has Tested-by tag by Finn Thain added.
> >
> > These patches would still benefit from testing on 68060
> > and Coldfire.
>
> I would very much like to test them on ColdFire. Unfortunately my only
> ColdFire with MMU board (a M547xEVB) failed a couple of months back.
> I am trying to find a replacement, but so far have not been able to get
> my hands on one.
>
> Maybe someone else out there has one they can test with?

Angelo?

Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-08-09 12:58 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-29  3:09 [PATCH v4 0/2] m68k uaccess fault handling fixes Michael Schmitz
2024-04-29  3:09 ` [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully Michael Schmitz
2024-08-07  8:14   ` Finn Thain
2024-08-07 19:32     ` Michael Schmitz
2024-08-08  1:57       ` Finn Thain
2024-08-08  6:05         ` Greg Ungerer
2024-08-08  6:56           ` Finn Thain
2024-08-08 14:52             ` Greg Ungerer
2024-08-08 19:27               ` Michael Schmitz
2024-08-09  3:34                 ` Finn Thain
2024-08-09  8:03                   ` Michael Schmitz
2024-08-09 12:58                     ` Greg Ungerer
2024-08-09  3:22               ` Finn Thain
2024-08-08  6:58           ` Michael Schmitz
2024-04-29  3:09 ` [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling Michael Schmitz
2024-04-29  7:58 ` [PATCH v4 0/2] m68k uaccess fault handling fixes Greg Ungerer
2024-04-29  8:08   ` Geert Uytterhoeven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox