* [PATCH v4 0/2] m68k uaccess fault handling fixes
@ 2024-04-29 3:09 Michael Schmitz
2024-04-29 3:09 ` [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully Michael Schmitz
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Michael Schmitz @ 2024-04-29 3:09 UTC (permalink / raw)
To: linux-m68k, geert; +Cc: gerg
(Actually now) version 4 of fixes for uaccess fault handling
on 68030, tested on 68030 and 68040.
Patch 1 has one exception table entry fixed (residual
calculated in __generic_copy_to_user did not take into
account the number of longword transfers omitted due to a
fault) si as well as a final NOP added in __clear_user.
Patch 2 is unchanged from v3 (actually since RFC v1), now
has Tested-by tag by Finn Thain added.
These patches would still benefit from testing on 68060
and Coldfire.
Cheers,
Michael
^ permalink raw reply [flat|nested] 17+ messages in thread* [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-04-29 3:09 [PATCH v4 0/2] m68k uaccess fault handling fixes Michael Schmitz @ 2024-04-29 3:09 ` Michael Schmitz 2024-08-07 8:14 ` Finn Thain 2024-04-29 3:09 ` [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling Michael Schmitz 2024-04-29 7:58 ` [PATCH v4 0/2] m68k uaccess fault handling fixes Greg Ungerer 2 siblings, 1 reply; 17+ messages in thread From: Michael Schmitz @ 2024-04-29 3:09 UTC (permalink / raw) To: linux-m68k, geert; +Cc: gerg, Michael Schmitz, Finn Thain, linux-m68k As mentioned by Finn Thain in his patch to improve put_user exception handling on 040, a similar problem exists on 030 processors. A moves instruction that crosses a page boundary from a mapped page into an unmapped one will cause a mid-instruction bus error exception (frame format b), with the PC pointing (usually) two instructions past the faulting movesl instruction. Our exception handling in __generic_copy_to_user only covers the instruction immediately following the faulting one. As a result, fixup_exception in send_fault_sig does not detect this case, and cause send_fault_sig to oops. Extend the exception table to cover one additional instruction beyond the moves[lwb] instructions. Tested on 68030 (Atari Falcon 030) with transfers beginning at one to six bytes offset from the end of a mapped page, followed by further bytes on an unmapped page (testcase derived from stress-ng sysbadaddr stressor by Finn Thain). Tested on 68040 (Mac Quadra) and 68030 (Mac IIci) by Finn Thain. A similar problem is present in __clear_user(); modify the exception table for that function in the same way (tested by Finn Thain). Cc: Finn Thain <fthain@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: linux-m68k@lists.linux-m68k.org Tested-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/all/e0f23460779e6d16e2633486ac4841790ef2aca0.1713176294.git.fthain@linux-m68k.org Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> --- Changes from v3: Finn Thain: - correct exception table entry for movew instruction in __generic_copy_to_user - add final NOP in __clear_user Changes from RFC v2: Finn Thain: - add missing extension table entries and final NOP in __generic_copy_to_user faults in 040 tests Michael Schmitz: - add yet another exception table entry in __clear_user Changes from RFC v1: Michael Schmitz: - use extended exception table instead of additional NOPs --- arch/m68k/lib/uaccess.c | 65 ++++++++++++++++++++++++----------------- 1 file changed, 38 insertions(+), 27 deletions(-) diff --git a/arch/m68k/lib/uaccess.c b/arch/m68k/lib/uaccess.c index 86b6fed5151c..c63efa6ea4d4 100644 --- a/arch/m68k/lib/uaccess.c +++ b/arch/m68k/lib/uaccess.c @@ -62,35 +62,42 @@ unsigned long __generic_copy_to_user(void __user *to, const void *from, asm volatile ("\n" " tst.l %0\n" - " jeq 4f\n" + " jeq 5f\n" "1: move.l (%1)+,%3\n" "2: "MOVES".l %3,(%2)+\n" "3: subq.l #1,%0\n" - " jne 1b\n" - "4: btst #1,%5\n" - " jeq 6f\n" - " move.w (%1)+,%3\n" - "5: "MOVES".w %3,(%2)+\n" - "6: btst #0,%5\n" + "4: jne 1b\n" + "5: btst #1,%5\n" " jeq 8f\n" - " move.b (%1)+,%3\n" - "7: "MOVES".b %3,(%2)+\n" - "8:\n" + "6: move.w (%1)+,%3\n" + "7: "MOVES".w %3,(%2)+\n" + "8: btst #0,%5\n" + "9: jeq 13f\n" + "10: move.b (%1)+,%3\n" + "11: "MOVES".b %3,(%2)+\n" + "12: nop\n" + "13:\n" " .section .fixup,\"ax\"\n" " .even\n" "20: lsl.l #2,%0\n" "50: add.l %5,%0\n" - " jra 8b\n" + " jra 13b\n" " .previous\n" "\n" " .section __ex_table,\"a\"\n" " .align 4\n" + " .long 1b,20b\n" " .long 2b,20b\n" " .long 3b,20b\n" - " .long 5b,50b\n" - " .long 6b,50b\n" + " .long 4b,20b\n" + " .long 5b,20b\n" + " .long 6b,20b\n" " .long 7b,50b\n" " .long 8b,50b\n" + " .long 9b,50b\n" + " .long 10b,50b\n" + " .long 11b,50b\n" + " .long 12b,50b\n" " .previous" : "=d" (res), "+a" (from), "+a" (to), "=&d" (tmp) : "0" (n / 4), "d" (n & 3)); @@ -109,32 +116,36 @@ unsigned long __clear_user(void __user *to, unsigned long n) asm volatile ("\n" " tst.l %0\n" - " jeq 3f\n" + " jeq 4f\n" "1: "MOVES".l %2,(%1)+\n" "2: subq.l #1,%0\n" - " jne 1b\n" - "3: btst #1,%4\n" - " jeq 5f\n" - "4: "MOVES".w %2,(%1)+\n" - "5: btst #0,%4\n" - " jeq 7f\n" - "6: "MOVES".b %2,(%1)\n" - "7:\n" + "3: jne 1b\n" + "4: btst #1,%4\n" + " jeq 6f\n" + "5: "MOVES".w %2,(%1)+\n" + "6: btst #0,%4\n" + "7: jeq 9f\n" + "8: "MOVES".b %2,(%1)\n" + "9: nop\n" + "10:\n" " .section .fixup,\"ax\"\n" " .even\n" - "10: lsl.l #2,%0\n" + "20: lsl.l #2,%0\n" "40: add.l %4,%0\n" - " jra 7b\n" + " jra 10b\n" " .previous\n" "\n" " .section __ex_table,\"a\"\n" " .align 4\n" - " .long 1b,10b\n" - " .long 2b,10b\n" - " .long 4b,40b\n" + " .long 1b,20b\n" + " .long 2b,20b\n" + " .long 3b,20b\n" + " .long 4b,20b\n" " .long 5b,40b\n" " .long 6b,40b\n" " .long 7b,40b\n" + " .long 8b,40b\n" + " .long 9b,40b\n" " .previous" : "=d" (res), "+a" (to) : "d" (0), "0" (n / 4), "d" (n & 3)); -- 2.17.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-04-29 3:09 ` [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully Michael Schmitz @ 2024-08-07 8:14 ` Finn Thain 2024-08-07 19:32 ` Michael Schmitz 0 siblings, 1 reply; 17+ messages in thread From: Finn Thain @ 2024-08-07 8:14 UTC (permalink / raw) To: Michael Schmitz; +Cc: linux-m68k, geert, gerg, linux-m68k On Mon, 29 Apr 2024, Michael Schmitz wrote: > As mentioned by Finn Thain in his patch to improve put_user > exception handling on 040, a similar problem exists on 030 > processors. > > A moves instruction that crosses a page boundary from a > mapped page into an unmapped one will cause a mid-instruction > bus error exception (frame format b), with the PC pointing > (usually) two instructions past the faulting movesl instruction. > > Our exception handling in __generic_copy_to_user only covers > the instruction immediately following the faulting one. As > a result, fixup_exception in send_fault_sig does not detect > this case, and cause send_fault_sig to oops. > > Extend the exception table to cover one additional instruction > beyond the moves[lwb] instructions. > > Tested on 68030 (Atari Falcon 030) with transfers beginning > at one to six bytes offset from the end of a mapped page, > followed by further bytes on an unmapped page (testcase > derived from stress-ng sysbadaddr stressor by Finn Thain). > > Tested on 68040 (Mac Quadra) and 68030 (Mac IIci) by Finn Thain. > > A similar problem is present in __clear_user(); modify the > exception table for that function in the same way (tested by > Finn Thain). > Well, that __clear_user() bug is no longer theoretical. I accidentally bumped into it when I sent a ^C to a shell script I wrote to test some mac_scsi driver patches... [ 0.000000] Linux version 6.10.0-mac-00011-gd1d490afeb2a (fthain@nippy) (m68k-unknown-linux-musl-gcc (Gentoo 13.2.1_p20240210 p14) 13.2.1 20240210, GNU ld (Gentoo 2.40 p7) 2.40.0) #39 Sat Aug 3 19:57:43 AEST 2024 ... root@(none):/# bash scsi-test.sh /dev/sda5 bs=512k count=1 [ 130.090000] sd 0:0:0:0: [sda] tag#9 PDMA fixup: DRQ timeout [ 130.090000] sd 0:0:0:0: [sda] tag#9 switching to slow handshake [ 130.180000] sd 0:0:0:0: Power-on or device reset occurred 1+0 records in 1+0 records out 524288 bytes (524 kB, 512 KiB) copied, 5.13732 s, 102 kB/s [ 136.030000] bash (57): drop_caches: 3 ^C root@(none):/# bash scsi-test.sh /dev/sdb5 bs=512k count=1 1+0 records in 1+0 records out 524288 bytes (524 kB, 512 KiB) copied, 1.1273 s, 465 kB/s [ 160.440000] bash (61): drop_caches: 3 1+0 records in 1+0 records out 524288 bytes (524 kB, 512 KiB) copied, 1.12763 s, 465 kB/s [ 171.920000] bash (61): drop_caches: 3 bs=64k count=8 8+0 records in 8+0 records out 524288 bytes (524 kB, 512 KiB) copied, 1.03469 s, 507 kB/s [ 184.480000] bash (61): drop_caches: 3 8+0 records in 8+0 records out 524288 bytes (524 kB, 512 KiB) copied, 1.03221 s, 508 kB/s [ 196.340000] bash (61): drop_caches: 3 bs=4k count=128 128+0 records in 128+0 records out 524288 bytes (524 kB, 512 KiB) copied, 1.13089 s, 464 kB/s [ 208.860000] bash (61): drop_caches: 3 128+0 records in 128+0 records out 524288 bytes (524 kB, 512 KiB) copied, 1.13089 s, 464 kB/s [ 220.850000] bash (61): drop_caches: 3 bs=512 count=1k 1024+0 records in 1024+0 records out 524288 bytes (524 kB, 512 KiB) copied, 3.69678 s, 142 kB/s [ 236.020000] bash (61): drop_caches: 3 1024+0 records in 1024+0 records out 524288 bytes (524 kB, 512 KiB) copied, 3.68874 s, 142 kB/s [ 250.570000] bash (61): drop_caches: 3 bs=512k count=1 1+0 records in 1+0 records out 524288 bytes (524 kB, 512 KiB) copied, 1.12338 s, 467 kB/s [ 263.120000] bash (61): drop_caches: 3 1+0 records in 1+0 records out 524288 bytes (524 kB, 512 KiB) copied, 1.12665 s, 465 kB/s [ 275.110000] bash (61): drop_caches: 3 bs=64k count=8 8+0 records in 8+0 records out 524288 bytes (524 kB, 512 KiB) copied, 1.08861 s, 482 kB/s [ 287.610000] bash (61): drop_caches: 3 8+0 records in 8+0 records out 524288 bytes (524 kB, 512 KiB) copied, 1.03319 s, 507 kB/s [ 299.510000] bash (61): drop_caches: 3 ^C[ 303.190000] Unable to handle kernel access at virtual address d7d17fd7 [ 303.200000] Oops: 00000000 [ 303.210000] Modules linked in: [ 303.220000] PC: [<004301fe>] __clear_user+0x22/0x40 [ 303.240000] SR: 2000 SP: 4841d159 a2: 00b78590 [ 303.250000] d0: 000003ab d1: 00000000 d2: 00000000 d3: 8000c150 [ 303.260000] d4: 00002000 d5: 00000000 a0: 8000c154 a1: 009bdb3c [ 303.280000] Process cmp (pid: 92, task=7d95deea) [ 303.290000] Frame format=B ssw=0709 isc=0801 isb=0001 daddr=8000c150 dobuf=00000000 [ 303.310000] baddr=00430202 dibuf=8000c150 ver=f [ 303.320000] Stack from 0149de8c: [ 303.320000] 8001c2c4 0149deb8 00139904 8000c150 00000eb0 00983c00 00000000 00000003 [ 303.320000] 00000003 8000bf00 00983a00 0149df40 0013a1fa 00912480 8000bf00 00983c60 [ 303.320000] 00000003 00000012 00000000 fffffff8 00000006 d012f728 00000000 00983a5a [ 303.320000] 00000004 00983a00 000579d4 0052b7e8 009125a0 00912480 00983a5a 80000034 [ 303.320000] 000d1704 014994c0 80008864 80000000 00000000 80008864 80008864 00000000 [ 303.320000] 01497800 00000000 00000034 fffffff8 fffffff8 0149df7c 000e9296 00983a00 [ 303.380000] Call Trace: [<00139904>] elf_load+0x192/0x1da [ 303.390000] [<0013a1fa>] load_elf_binary+0x8ae/0xe46 [ 303.400000] [<000579d4>] try_module_get+0x0/0x48 [ 303.410000] [<000d1704>] kfree+0x0/0x160 [ 303.420000] [<000e9296>] bprm_execve+0x134/0x416 [ 303.430000] [<000e9daa>] copy_string_kernel+0x0/0x184 [ 303.440000] [<000e9fa4>] copy_strings+0x0/0x310 [ 303.460000] [<000ea3f6>] do_execveat_common+0x142/0x1d2 [ 303.480000] [<000eb128>] sys_execve+0x2a/0x36 [ 303.490000] [<0000275e>] syscall+0x8/0xc [ 303.510000] [<0008c010>] do_read_cache_folio+0xde/0x2c8 [ 303.520000] [ 303.520000] Code: 206e 0008 4282 4a80 6708 0e98 2800 5380 <66f8> 0801 0001 6704 0e58 2800 0801 0000 6704 0e10 2800 241f 4e5e 4e75 4e75 4e56 [ 303.560000] Disabling lock debugging due to kernel taint root@(none):/# Also, it appears that the patch Michael posted in this thread would have prevented that oops because the "jne 1b" at __clear_user+0x22 would have been found in the exception table. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-08-07 8:14 ` Finn Thain @ 2024-08-07 19:32 ` Michael Schmitz 2024-08-08 1:57 ` Finn Thain 0 siblings, 1 reply; 17+ messages in thread From: Michael Schmitz @ 2024-08-07 19:32 UTC (permalink / raw) To: Finn Thain; +Cc: linux-m68k, geert, gerg, linux-m68k Hi Finn, thanks for (inadvertently) proving this bug in __clear_user() does exist! I'm afraid I've lost track of where we're at with this patch series. Does it need more work, or more bug reports such as the one below? The previous bug reports might be considered somewhat contrived but this one's from 'real' user space code, and none too complex at that? Cheers, Michael On 7/08/24 20:14, Finn Thain wrote: > On Mon, 29 Apr 2024, Michael Schmitz wrote: > >> As mentioned by Finn Thain in his patch to improve put_user >> exception handling on 040, a similar problem exists on 030 >> processors. >> >> A moves instruction that crosses a page boundary from a >> mapped page into an unmapped one will cause a mid-instruction >> bus error exception (frame format b), with the PC pointing >> (usually) two instructions past the faulting movesl instruction. >> >> Our exception handling in __generic_copy_to_user only covers >> the instruction immediately following the faulting one. As >> a result, fixup_exception in send_fault_sig does not detect >> this case, and cause send_fault_sig to oops. >> >> Extend the exception table to cover one additional instruction >> beyond the moves[lwb] instructions. >> >> Tested on 68030 (Atari Falcon 030) with transfers beginning >> at one to six bytes offset from the end of a mapped page, >> followed by further bytes on an unmapped page (testcase >> derived from stress-ng sysbadaddr stressor by Finn Thain). >> >> Tested on 68040 (Mac Quadra) and 68030 (Mac IIci) by Finn Thain. >> >> A similar problem is present in __clear_user(); modify the >> exception table for that function in the same way (tested by >> Finn Thain). >> > Well, that __clear_user() bug is no longer theoretical. I accidentally > bumped into it when I sent a ^C to a shell script I wrote to test some > mac_scsi driver patches... > > > [ 0.000000] Linux version 6.10.0-mac-00011-gd1d490afeb2a (fthain@nippy) (m68k-unknown-linux-musl-gcc (Gentoo 13.2.1_p20240210 p14) 13.2.1 20240210, GNU ld (Gentoo 2.40 p7) 2.40.0) #39 Sat Aug 3 19:57:43 AEST 2024 > > ... > > root@(none):/# bash scsi-test.sh /dev/sda5 > bs=512k count=1 > [ 130.090000] sd 0:0:0:0: [sda] tag#9 PDMA fixup: DRQ timeout > [ 130.090000] sd 0:0:0:0: [sda] tag#9 switching to slow handshake > [ 130.180000] sd 0:0:0:0: Power-on or device reset occurred > 1+0 records in > 1+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 5.13732 s, 102 kB/s > [ 136.030000] bash (57): drop_caches: 3 > ^C > root@(none):/# bash scsi-test.sh /dev/sdb5 > bs=512k count=1 > 1+0 records in > 1+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 1.1273 s, 465 kB/s > [ 160.440000] bash (61): drop_caches: 3 > 1+0 records in > 1+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 1.12763 s, 465 kB/s > [ 171.920000] bash (61): drop_caches: 3 > > bs=64k count=8 > 8+0 records in > 8+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 1.03469 s, 507 kB/s > [ 184.480000] bash (61): drop_caches: 3 > 8+0 records in > 8+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 1.03221 s, 508 kB/s > [ 196.340000] bash (61): drop_caches: 3 > > bs=4k count=128 > 128+0 records in > 128+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 1.13089 s, 464 kB/s > [ 208.860000] bash (61): drop_caches: 3 > 128+0 records in > 128+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 1.13089 s, 464 kB/s > [ 220.850000] bash (61): drop_caches: 3 > > bs=512 count=1k > 1024+0 records in > 1024+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 3.69678 s, 142 kB/s > [ 236.020000] bash (61): drop_caches: 3 > 1024+0 records in > 1024+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 3.68874 s, 142 kB/s > [ 250.570000] bash (61): drop_caches: 3 > > bs=512k count=1 > 1+0 records in > 1+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 1.12338 s, 467 kB/s > [ 263.120000] bash (61): drop_caches: 3 > 1+0 records in > 1+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 1.12665 s, 465 kB/s > [ 275.110000] bash (61): drop_caches: 3 > > bs=64k count=8 > 8+0 records in > 8+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 1.08861 s, 482 kB/s > [ 287.610000] bash (61): drop_caches: 3 > 8+0 records in > 8+0 records out > 524288 bytes (524 kB, 512 KiB) copied, 1.03319 s, 507 kB/s > [ 299.510000] bash (61): drop_caches: 3 > ^C[ 303.190000] Unable to handle kernel access at virtual address d7d17fd7 > [ 303.200000] Oops: 00000000 > [ 303.210000] Modules linked in: > [ 303.220000] PC: [<004301fe>] __clear_user+0x22/0x40 > [ 303.240000] SR: 2000 SP: 4841d159 a2: 00b78590 > [ 303.250000] d0: 000003ab d1: 00000000 d2: 00000000 d3: 8000c150 > [ 303.260000] d4: 00002000 d5: 00000000 a0: 8000c154 a1: 009bdb3c > [ 303.280000] Process cmp (pid: 92, task=7d95deea) > [ 303.290000] Frame format=B ssw=0709 isc=0801 isb=0001 daddr=8000c150 dobuf=00000000 > [ 303.310000] baddr=00430202 dibuf=8000c150 ver=f > [ 303.320000] Stack from 0149de8c: > [ 303.320000] 8001c2c4 0149deb8 00139904 8000c150 00000eb0 00983c00 00000000 00000003 > [ 303.320000] 00000003 8000bf00 00983a00 0149df40 0013a1fa 00912480 8000bf00 00983c60 > [ 303.320000] 00000003 00000012 00000000 fffffff8 00000006 d012f728 00000000 00983a5a > [ 303.320000] 00000004 00983a00 000579d4 0052b7e8 009125a0 00912480 00983a5a 80000034 > [ 303.320000] 000d1704 014994c0 80008864 80000000 00000000 80008864 80008864 00000000 > [ 303.320000] 01497800 00000000 00000034 fffffff8 fffffff8 0149df7c 000e9296 00983a00 > [ 303.380000] Call Trace: [<00139904>] elf_load+0x192/0x1da > [ 303.390000] [<0013a1fa>] load_elf_binary+0x8ae/0xe46 > [ 303.400000] [<000579d4>] try_module_get+0x0/0x48 > [ 303.410000] [<000d1704>] kfree+0x0/0x160 > [ 303.420000] [<000e9296>] bprm_execve+0x134/0x416 > [ 303.430000] [<000e9daa>] copy_string_kernel+0x0/0x184 > [ 303.440000] [<000e9fa4>] copy_strings+0x0/0x310 > [ 303.460000] [<000ea3f6>] do_execveat_common+0x142/0x1d2 > [ 303.480000] [<000eb128>] sys_execve+0x2a/0x36 > [ 303.490000] [<0000275e>] syscall+0x8/0xc > [ 303.510000] [<0008c010>] do_read_cache_folio+0xde/0x2c8 > [ 303.520000] > [ 303.520000] Code: 206e 0008 4282 4a80 6708 0e98 2800 5380 <66f8> 0801 0001 6704 0e58 2800 0801 0000 6704 0e10 2800 241f 4e5e 4e75 4e75 4e56 > [ 303.560000] Disabling lock debugging due to kernel taint > > root@(none):/# > > > Also, it appears that the patch Michael posted in this thread would have > prevented that oops because the "jne 1b" at __clear_user+0x22 would have > been found in the exception table. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-08-07 19:32 ` Michael Schmitz @ 2024-08-08 1:57 ` Finn Thain 2024-08-08 6:05 ` Greg Ungerer 0 siblings, 1 reply; 17+ messages in thread From: Finn Thain @ 2024-08-08 1:57 UTC (permalink / raw) To: Michael Schmitz; +Cc: linux-m68k, geert, gerg, linux-m68k Hello Michael On Thu, 8 Aug 2024, Michael Schmitz wrote: > > Well, that __clear_user() bug is no longer theoretical. I accidentally > > bumped into it when I sent a ^C to a shell script I wrote to test some > > mac_scsi driver patches... ... > > I'm afraid I've lost track of where we're at with this patch series. > Does it need more work, or more bug reports such as the one below? Apparently the series is waiting for some testing on a Coldfire system with MMU. > The previous bug reports might be considered somewhat contrived but this > one's from 'real' user space code, and none too complex at that? > Right. That code was as follows. There's nothing here aimed at arch/m68k/lib/uaccess.c in particular, just IO to a block device and a tmpfs filesystem. #!/bin/bash set -e -u filename=$1 rand=/tmp/rand_test_data zero=/tmp/zero_test_data len=512K dd if=/dev/urandom of=$rand bs=$len count=1 2>/dev/null dd if=/dev/zero of=$zero bs=$len count=1 2>/dev/null write() { dd $args if=$orig of=$filename # oflag=direct } compare() { echo 3 > /proc/sys/vm/drop_caches if ! cmp -n $len $orig $filename ; then diff -u <(hexdump -C < $orig) <(hexdump -C < $filename) fi } while true; do for args in "bs=512k count=1" "bs=64k count=8" "bs=4k count=128" "bs=512 count=1k" ; do echo $args orig=$rand ; write ; compare orig=$zero ; write ; compare sync echo done done ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-08-08 1:57 ` Finn Thain @ 2024-08-08 6:05 ` Greg Ungerer 2024-08-08 6:56 ` Finn Thain 2024-08-08 6:58 ` Michael Schmitz 0 siblings, 2 replies; 17+ messages in thread From: Greg Ungerer @ 2024-08-08 6:05 UTC (permalink / raw) To: Finn Thain, Michael Schmitz; +Cc: linux-m68k, geert, linux-m68k Hi Michael, Finn, On 8/8/24 11:57, Finn Thain wrote: > > Hello Michael > > On Thu, 8 Aug 2024, Michael Schmitz wrote: > >>> Well, that __clear_user() bug is no longer theoretical. I accidentally >>> bumped into it when I sent a ^C to a shell script I wrote to test some >>> mac_scsi driver patches... > > ... > >> >> I'm afraid I've lost track of where we're at with this patch series. >> Does it need more work, or more bug reports such as the one below? > > Apparently the series is waiting for some testing on a Coldfire system > with MMU. Ok, I am in a state that I can do that now (I managed to fix my M5475EVB board). If I test the v4 versions of this patch set that should do the job? Regards Greg >> The previous bug reports might be considered somewhat contrived but this >> one's from 'real' user space code, and none too complex at that? >> > > Right. That code was as follows. There's nothing here aimed at > arch/m68k/lib/uaccess.c in particular, just IO to a block device and a > tmpfs filesystem. > > > #!/bin/bash > > set -e -u > > filename=$1 > > rand=/tmp/rand_test_data > zero=/tmp/zero_test_data > len=512K > > dd if=/dev/urandom of=$rand bs=$len count=1 2>/dev/null > dd if=/dev/zero of=$zero bs=$len count=1 2>/dev/null > > write() { > dd $args if=$orig of=$filename # oflag=direct > } > > compare() { > echo 3 > /proc/sys/vm/drop_caches > if ! cmp -n $len $orig $filename ; then > diff -u <(hexdump -C < $orig) <(hexdump -C < $filename) > fi > } > > while true; do > for args in "bs=512k count=1" "bs=64k count=8" "bs=4k count=128" "bs=512 count=1k" ; do > echo $args > orig=$rand ; write ; compare > orig=$zero ; write ; compare > sync > echo > done > done ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-08-08 6:05 ` Greg Ungerer @ 2024-08-08 6:56 ` Finn Thain 2024-08-08 14:52 ` Greg Ungerer 2024-08-08 6:58 ` Michael Schmitz 1 sibling, 1 reply; 17+ messages in thread From: Finn Thain @ 2024-08-08 6:56 UTC (permalink / raw) To: Greg Ungerer; +Cc: Michael Schmitz, linux-m68k, geert, linux-m68k On Thu, 8 Aug 2024, Greg Ungerer wrote: > On 8/8/24 11:57, Finn Thain wrote: > > > > > > > > I'm afraid I've lost track of where we're at with this patch series. > > > Does it need more work, or more bug reports such as the one below? > > > > Apparently the series is waiting for some testing on a Coldfire system > > with MMU. > > Ok, I am in a state that I can do that now (I managed to fix my M5475EVB > board). Thanks, Greg. > If I test the v4 versions of this patch set that should do the job? > There are 3 patches that need some more testing, one from me and two from Michael: [PATCH] m68k: Handle put_user() faults more carefully [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling They were posted in two threads: https://lore.kernel.org/linux-m68k/1ed9c4c753395510c1a8df9c35e2ad4c31c90f95.1714265926.git.fthain@linux-m68k.org/T/ https://lore.kernel.org/linux-m68k/CAMuHMdVrOnOQwCxk42YCjEkbfL-YDSvpf_xTaouv9hUs2bO+qg@mail.gmail.com/T/ On 680x0, one of the bugs was brought to light with 'stress-ng --sysbadaddr -1'. The others required special test programs. I've no idea whether Coldfire silicon is susceptable at all, and if it is, whether the same reproducers would work. But that's neither here nor there from a regression testing standpoint. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-08-08 6:56 ` Finn Thain @ 2024-08-08 14:52 ` Greg Ungerer 2024-08-08 19:27 ` Michael Schmitz 2024-08-09 3:22 ` Finn Thain 0 siblings, 2 replies; 17+ messages in thread From: Greg Ungerer @ 2024-08-08 14:52 UTC (permalink / raw) To: Finn Thain; +Cc: Michael Schmitz, linux-m68k, geert, linux-m68k Hi Finn, On 8/8/24 16:56, Finn Thain wrote: > On Thu, 8 Aug 2024, Greg Ungerer wrote: >> On 8/8/24 11:57, Finn Thain wrote: >>>> I'm afraid I've lost track of where we're at with this patch series. >>>> Does it need more work, or more bug reports such as the one below? >>> >>> Apparently the series is waiting for some testing on a Coldfire system >>> with MMU. >> >> Ok, I am in a state that I can do that now (I managed to fix my M5475EVB >> board). > > Thanks, Greg. > >> If I test the v4 versions of this patch set that should do the job? >> > > There are 3 patches that need some more testing, one from me and two from > Michael: > > [PATCH] m68k: Handle put_user() faults more carefully > [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully > [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling > > They were posted in two threads: > > https://lore.kernel.org/linux-m68k/1ed9c4c753395510c1a8df9c35e2ad4c31c90f95.1714265926.git.fthain@linux-m68k.org/T/ > https://lore.kernel.org/linux-m68k/CAMuHMdVrOnOQwCxk42YCjEkbfL-YDSvpf_xTaouv9hUs2bO+qg@mail.gmail.com/T/ > > On 680x0, one of the bugs was brought to light with 'stress-ng > --sysbadaddr -1'. The others required special test programs. > > I've no idea whether Coldfire silicon is susceptable at all, and if it is, > whether the same reproducers would work. But that's neither here nor there > from a regression testing standpoint. Ok, thanks for the links. I have applied and tested those, no obvious regressions. So for all of these patches: Tested-by: Greg Ungerer <gerg@linux-m68k.org> I tried out the "stress-ng --sysbadaddr -1" test, and that didn't go so well for me: # stress-ng --sysbadaddr -1 stress-ng: info: [37] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor stress-ng: info: [37] dispatching hogs: 1 sysbadaddr *** ILLEGAL INSTRUCTION *** FORMAT=4 Current process id is 39 BAD KERNEL TRAP: 00000000 Modules linked in: PC: [<00000000>] 0x0 SR: 2004 SP: 6504e563 a2: 008ee380 d0: 000000f7 d1: 00000000 d2: 00000000 d3: 00000000 d4: 00a87b80 d5: bfbf3814 a0: 00000000 a1: bfbf3814 Process stress-ng-sysba (pid: 39, task=4dbb2ec5) Frame format=4 eff addr=480a2004 pc=0002b154 Stack from 00adff20: 00ade000 00000000 00000000 000000f7 00000000 00000004 00a87b80 00000000 00000000 00000000 00000000 008ee380 0002ab5c 00000100 00000122 fffffff6 bfbf376c 0002b29e 000000f7 bfbf3814 00000000 00000000 00ade000 0002b222 00ae0800 80118988 00000000 00000005 bfbf37a0 00000005 bfbf3814 00adffcc 00023d2c 00adffcc 00000000 00000000 00000000 00000000 000000f7 00000000 80118b46 00021850 00024b00 000000f7 bfbf3814 00000000 00000000 bfbf3814 Call Trace: [<0002ab5c>] child_wait_callback+0x0/0x24 [<0002b29e>] sys_wait4+0x7c/0x8e [<0002b222>] sys_wait4+0x0/0x8e [<00023d2c>] buserr_c+0xb0/0x152 [<00021850>] buserr+0x28/0x30 [<00024b00>] system_call+0x54/0xa8 But that is the same with and without these patches. Regards Greg ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-08-08 14:52 ` Greg Ungerer @ 2024-08-08 19:27 ` Michael Schmitz 2024-08-09 3:34 ` Finn Thain 2024-08-09 3:22 ` Finn Thain 1 sibling, 1 reply; 17+ messages in thread From: Michael Schmitz @ 2024-08-08 19:27 UTC (permalink / raw) To: Greg Ungerer, Finn Thain; +Cc: linux-m68k, geert, linux-m68k Hi Greg, tanks for testing! On 9/08/24 02:52, Greg Ungerer wrote: > Hi Finn, > > On 8/8/24 16:56, Finn Thain wrote: >> On Thu, 8 Aug 2024, Greg Ungerer wrote: >>> On 8/8/24 11:57, Finn Thain wrote: >>>>> I'm afraid I've lost track of where we're at with this patch series. >>>>> Does it need more work, or more bug reports such as the one below? >>>> >>>> Apparently the series is waiting for some testing on a Coldfire system >>>> with MMU. >>> >>> Ok, I am in a state that I can do that now (I managed to fix my >>> M5475EVB >>> board). >> >> Thanks, Greg. >> >>> If I test the v4 versions of this patch set that should do the job? >>> >> >> There are 3 patches that need some more testing, one from me and two >> from >> Michael: >> >> [PATCH] m68k: Handle put_user() faults more carefully >> [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully >> [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault >> handling >> >> They were posted in two threads: >> >> https://lore.kernel.org/linux-m68k/1ed9c4c753395510c1a8df9c35e2ad4c31c90f95.1714265926.git.fthain@linux-m68k.org/T/ >> >> https://lore.kernel.org/linux-m68k/CAMuHMdVrOnOQwCxk42YCjEkbfL-YDSvpf_xTaouv9hUs2bO+qg@mail.gmail.com/T/ >> >> >> On 680x0, one of the bugs was brought to light with 'stress-ng >> --sysbadaddr -1'. The others required special test programs. >> >> I've no idea whether Coldfire silicon is susceptable at all, and if >> it is, >> whether the same reproducers would work. But that's neither here nor >> there >> from a regression testing standpoint. > > Ok, thanks for the links. I have applied and tested those, no obvious > regressions. > So for all of these patches: > > Tested-by: Greg Ungerer <gerg@linux-m68k.org> > > I tried out the "stress-ng --sysbadaddr -1" test, and that didn't go > so well for me: > > # stress-ng --sysbadaddr -1 > stress-ng: info: [37] defaulting to a 86400 second (1 day, 0.00 secs) > run per stressor > stress-ng: info: [37] dispatching hogs: 1 sysbadaddr > *** ILLEGAL INSTRUCTION *** FORMAT=4 > Current process id is 39 > BAD KERNEL TRAP: 00000000 > Modules linked in: > PC: [<00000000>] 0x0 > SR: 2004 SP: 6504e563 a2: 008ee380 > d0: 000000f7 d1: 00000000 d2: 00000000 d3: 00000000 > d4: 00a87b80 d5: bfbf3814 a0: 00000000 a1: bfbf3814 > Process stress-ng-sysba (pid: 39, task=4dbb2ec5) > Frame format=4 eff addr=480a2004 pc=0002b154 > Stack from 00adff20: > 00ade000 00000000 00000000 000000f7 00000000 00000004 00a87b80 > 00000000 > 00000000 00000000 00000000 008ee380 0002ab5c 00000100 00000122 > fffffff6 > bfbf376c 0002b29e 000000f7 bfbf3814 00000000 00000000 00ade000 > 0002b222 > 00ae0800 80118988 00000000 00000005 bfbf37a0 00000005 bfbf3814 > 00adffcc > 00023d2c 00adffcc 00000000 00000000 00000000 00000000 000000f7 > 00000000 > 80118b46 00021850 00024b00 000000f7 bfbf3814 00000000 00000000 > bfbf3814 > Call Trace: [<0002ab5c>] child_wait_callback+0x0/0x24 > [<0002b29e>] sys_wait4+0x7c/0x8e > [<0002b222>] sys_wait4+0x0/0x8e > [<00023d2c>] buserr_c+0xb0/0x152 > [<00021850>] buserr+0x28/0x30 > [<00024b00>] system_call+0x54/0xa8 > > But that is the same with and without these patches. I wonder if recent signal handling changes (e.g. commit 0d4276cfbe6fd4c4a21acdee803b05a3a6192082) have rare unexpected side effects on Coldfire here ... OTOH, signal handling as such works just fine, right? Cheers, Michael > > Regards > Greg > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-08-08 19:27 ` Michael Schmitz @ 2024-08-09 3:34 ` Finn Thain 2024-08-09 8:03 ` Michael Schmitz 0 siblings, 1 reply; 17+ messages in thread From: Finn Thain @ 2024-08-09 3:34 UTC (permalink / raw) To: Michael Schmitz; +Cc: Greg Ungerer, linux-m68k, geert, linux-m68k [-- Attachment #1: Type: text/plain, Size: 2360 bytes --] On Fri, 9 Aug 2024, Michael Schmitz wrote: > > > > I tried out the "stress-ng --sysbadaddr -1" test, and that didn't go > > so well for me: > > > > # stress-ng --sysbadaddr -1 > > stress-ng: info: [37] defaulting to a 86400 second (1 day, 0.00 secs) run > > per stressor > > stress-ng: info: [37] dispatching hogs: 1 sysbadaddr > > *** ILLEGAL INSTRUCTION *** FORMAT=4 > > Current process id is 39 > > BAD KERNEL TRAP: 00000000 > > Modules linked in: > > PC: [<00000000>] 0x0 > > SR: 2004 SP: 6504e563 a2: 008ee380 > > d0: 000000f7 d1: 00000000 d2: 00000000 d3: 00000000 > > d4: 00a87b80 d5: bfbf3814 a0: 00000000 a1: bfbf3814 > > Process stress-ng-sysba (pid: 39, task=4dbb2ec5) > > Frame format=4 eff addr=480a2004 pc=0002b154 > > Stack from 00adff20: > > 00ade000 00000000 00000000 000000f7 00000000 00000004 00a87b80 > > 00000000 > > 00000000 00000000 00000000 008ee380 0002ab5c 00000100 00000122 > > fffffff6 > > bfbf376c 0002b29e 000000f7 bfbf3814 00000000 00000000 00ade000 > > 0002b222 > > 00ae0800 80118988 00000000 00000005 bfbf37a0 00000005 bfbf3814 > > 00adffcc > > 00023d2c 00adffcc 00000000 00000000 00000000 00000000 000000f7 > > 00000000 > > 80118b46 00021850 00024b00 000000f7 bfbf3814 00000000 00000000 > > bfbf3814 > > Call Trace: [<0002ab5c>] child_wait_callback+0x0/0x24 > > [<0002b29e>] sys_wait4+0x7c/0x8e > > [<0002b222>] sys_wait4+0x0/0x8e > > [<00023d2c>] buserr_c+0xb0/0x152 > > [<00021850>] buserr+0x28/0x30 > > [<00024b00>] system_call+0x54/0xa8 > > > > But that is the same with and without these patches. > > I wonder if recent signal handling changes (e.g. commit > 0d4276cfbe6fd4c4a21acdee803b05a3a6192082) have rare unexpected side effects on > Coldfire here ... OTOH, signal handling as such works just fine, right? > That would be commit b845b574f86d ("m68k: Move signal frame following exception on 68020/030") on mainline. If it caused a regression, that would have first appeared in v6.4. I can't imagine how that commit could affect Coldfire but that's no reason not to test an older kernel. FWIW, my hunch is that the other stressors which call wait4() will probably crash too (regardless of kernel version). ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-08-09 3:34 ` Finn Thain @ 2024-08-09 8:03 ` Michael Schmitz 2024-08-09 12:58 ` Greg Ungerer 0 siblings, 1 reply; 17+ messages in thread From: Michael Schmitz @ 2024-08-09 8:03 UTC (permalink / raw) To: Finn Thain; +Cc: Greg Ungerer, linux-m68k, geert, linux-m68k Hi Finn, Am 09.08.2024 um 15:34 schrieb Finn Thain: > > On Fri, 9 Aug 2024, Michael Schmitz wrote: > >>> >>> I tried out the "stress-ng --sysbadaddr -1" test, and that didn't go >>> so well for me: >>> >>> # stress-ng --sysbadaddr -1 >>> stress-ng: info: [37] defaulting to a 86400 second (1 day, 0.00 secs) run >>> per stressor >>> stress-ng: info: [37] dispatching hogs: 1 sysbadaddr >>> *** ILLEGAL INSTRUCTION *** FORMAT=4 >>> Current process id is 39 >>> BAD KERNEL TRAP: 00000000 >>> Modules linked in: >>> PC: [<00000000>] 0x0 >>> SR: 2004 SP: 6504e563 a2: 008ee380 >>> d0: 000000f7 d1: 00000000 d2: 00000000 d3: 00000000 >>> d4: 00a87b80 d5: bfbf3814 a0: 00000000 a1: bfbf3814 >>> Process stress-ng-sysba (pid: 39, task=4dbb2ec5) >>> Frame format=4 eff addr=480a2004 pc=0002b154 >>> Stack from 00adff20: >>> 00ade000 00000000 00000000 000000f7 00000000 00000004 00a87b80 >>> 00000000 >>> 00000000 00000000 00000000 008ee380 0002ab5c 00000100 00000122 >>> fffffff6 >>> bfbf376c 0002b29e 000000f7 bfbf3814 00000000 00000000 00ade000 >>> 0002b222 >>> 00ae0800 80118988 00000000 00000005 bfbf37a0 00000005 bfbf3814 >>> 00adffcc >>> 00023d2c 00adffcc 00000000 00000000 00000000 00000000 000000f7 >>> 00000000 >>> 80118b46 00021850 00024b00 000000f7 bfbf3814 00000000 00000000 >>> bfbf3814 >>> Call Trace: [<0002ab5c>] child_wait_callback+0x0/0x24 >>> [<0002b29e>] sys_wait4+0x7c/0x8e >>> [<0002b222>] sys_wait4+0x0/0x8e >>> [<00023d2c>] buserr_c+0xb0/0x152 >>> [<00021850>] buserr+0x28/0x30 >>> [<00024b00>] system_call+0x54/0xa8 >>> >>> But that is the same with and without these patches. >> >> I wonder if recent signal handling changes (e.g. commit >> 0d4276cfbe6fd4c4a21acdee803b05a3a6192082) have rare unexpected side effects on >> Coldfire here ... OTOH, signal handling as such works just fine, right? >> > > That would be commit b845b574f86d ("m68k: Move signal frame following > exception on 68020/030") on mainline. If it caused a regression, that > would have first appeared in v6.4. I can't imagine how that commit could > affect Coldfire but that's no reason not to test an older kernel. Yes, testing older kernels is certainly the fastest way to rule out involvement of that commit. On a closer look, there is no possible way in that this commit can be responsible for the bug unless Coldfire does use frame format B for access errors. I don't think that's likely? > FWIW, my hunch is that the other stressors which call wait4() will > probably crash too (regardless of kernel version). Quite likely ... Cheers, Michael ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-08-09 8:03 ` Michael Schmitz @ 2024-08-09 12:58 ` Greg Ungerer 0 siblings, 0 replies; 17+ messages in thread From: Greg Ungerer @ 2024-08-09 12:58 UTC (permalink / raw) To: Michael Schmitz, Finn Thain; +Cc: linux-m68k, geert, linux-m68k Hi Michael, Finn, On 9/8/24 18:03, Michael Schmitz wrote: > Hi Finn, > > Am 09.08.2024 um 15:34 schrieb Finn Thain: >> >> On Fri, 9 Aug 2024, Michael Schmitz wrote: >> >>>> >>>> I tried out the "stress-ng --sysbadaddr -1" test, and that didn't go >>>> so well for me: >>>> >>>> # stress-ng --sysbadaddr -1 >>>> stress-ng: info: [37] defaulting to a 86400 second (1 day, 0.00 >>>> secs) run >>>> per stressor >>>> stress-ng: info: [37] dispatching hogs: 1 sysbadaddr >>>> *** ILLEGAL INSTRUCTION *** FORMAT=4 >>>> Current process id is 39 >>>> BAD KERNEL TRAP: 00000000 >>>> Modules linked in: >>>> PC: [<00000000>] 0x0 >>>> SR: 2004 SP: 6504e563 a2: 008ee380 >>>> d0: 000000f7 d1: 00000000 d2: 00000000 d3: 00000000 >>>> d4: 00a87b80 d5: bfbf3814 a0: 00000000 a1: bfbf3814 >>>> Process stress-ng-sysba (pid: 39, task=4dbb2ec5) >>>> Frame format=4 eff addr=480a2004 pc=0002b154 >>>> Stack from 00adff20: >>>> 00ade000 00000000 00000000 000000f7 00000000 00000004 00a87b80 >>>> 00000000 >>>> 00000000 00000000 00000000 008ee380 0002ab5c 00000100 00000122 >>>> fffffff6 >>>> bfbf376c 0002b29e 000000f7 bfbf3814 00000000 00000000 00ade000 >>>> 0002b222 >>>> 00ae0800 80118988 00000000 00000005 bfbf37a0 00000005 bfbf3814 >>>> 00adffcc >>>> 00023d2c 00adffcc 00000000 00000000 00000000 00000000 000000f7 >>>> 00000000 >>>> 80118b46 00021850 00024b00 000000f7 bfbf3814 00000000 00000000 >>>> bfbf3814 >>>> Call Trace: [<0002ab5c>] child_wait_callback+0x0/0x24 >>>> [<0002b29e>] sys_wait4+0x7c/0x8e >>>> [<0002b222>] sys_wait4+0x0/0x8e >>>> [<00023d2c>] buserr_c+0xb0/0x152 >>>> [<00021850>] buserr+0x28/0x30 >>>> [<00024b00>] system_call+0x54/0xa8 >>>> >>>> But that is the same with and without these patches. >>> >>> I wonder if recent signal handling changes (e.g. commit >>> 0d4276cfbe6fd4c4a21acdee803b05a3a6192082) have rare unexpected side >>> effects on >>> Coldfire here ... OTOH, signal handling as such works just fine, right? >>> >> >> That would be commit b845b574f86d ("m68k: Move signal frame following >> exception on 68020/030") on mainline. If it caused a regression, that >> would have first appeared in v6.4. I can't imagine how that commit could >> affect Coldfire but that's no reason not to test an older kernel. > > Yes, testing older kernels is certainly the fastest way to rule out > involvement of that commit. I will go back a few revisions and see if I can shed some light on it. Regards Greg > On a closer look, there is no possible way in that this commit can be > responsible for the bug unless Coldfire does use frame format B for > access errors. I don't think that's likely? > >> FWIW, my hunch is that the other stressors which call wait4() will >> probably crash too (regardless of kernel version). > > Quite likely ... > > Cheers, > > Michael > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-08-08 14:52 ` Greg Ungerer 2024-08-08 19:27 ` Michael Schmitz @ 2024-08-09 3:22 ` Finn Thain 1 sibling, 0 replies; 17+ messages in thread From: Finn Thain @ 2024-08-09 3:22 UTC (permalink / raw) To: Greg Ungerer; +Cc: Michael Schmitz, linux-m68k, geert, linux-m68k On Fri, 9 Aug 2024, Greg Ungerer wrote: > > Ok, thanks for the links. I have applied and tested those, no obvious > regressions. > So for all of these patches: > > Tested-by: Greg Ungerer <gerg@linux-m68k.org> > Thanks for testing! ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully 2024-08-08 6:05 ` Greg Ungerer 2024-08-08 6:56 ` Finn Thain @ 2024-08-08 6:58 ` Michael Schmitz 1 sibling, 0 replies; 17+ messages in thread From: Michael Schmitz @ 2024-08-08 6:58 UTC (permalink / raw) To: Greg Ungerer, Finn Thain; +Cc: linux-m68k, geert, linux-m68k Hi Greg, Am 08.08.2024 um 18:05 schrieb Greg Ungerer: > Hi Michael, Finn, > > On 8/8/24 11:57, Finn Thain wrote: >> >> Hello Michael >> >> On Thu, 8 Aug 2024, Michael Schmitz wrote: >> >>>> Well, that __clear_user() bug is no longer theoretical. I accidentally >>>> bumped into it when I sent a ^C to a shell script I wrote to test some >>>> mac_scsi driver patches... >> >> ... >> >>> >>> I'm afraid I've lost track of where we're at with this patch series. >>> Does it need more work, or more bug reports such as the one below? >> >> Apparently the series is waiting for some testing on a Coldfire system >> with MMU. > > Ok, I am in a state that I can do that now (I managed to fix my M5475EVB > board). > If I test the v4 versions of this patch set that should do the job? Yes, v4 is the one to test. I'll see that I can dig out my test code to reproduce the bug. Thanks, Michael > > Regards > Greg > > > >>> The previous bug reports might be considered somewhat contrived but this >>> one's from 'real' user space code, and none too complex at that? >>> >> >> Right. That code was as follows. There's nothing here aimed at >> arch/m68k/lib/uaccess.c in particular, just IO to a block device and a >> tmpfs filesystem. >> >> >> #!/bin/bash >> >> set -e -u >> >> filename=$1 >> >> rand=/tmp/rand_test_data >> zero=/tmp/zero_test_data >> len=512K >> >> dd if=/dev/urandom of=$rand bs=$len count=1 2>/dev/null >> dd if=/dev/zero of=$zero bs=$len count=1 2>/dev/null >> >> write() { >> dd $args if=$orig of=$filename # oflag=direct >> } >> >> compare() { >> echo 3 > /proc/sys/vm/drop_caches >> if ! cmp -n $len $orig $filename ; then >> diff -u <(hexdump -C < $orig) <(hexdump -C < $filename) >> fi >> } >> >> while true; do >> for args in "bs=512k count=1" "bs=64k count=8" "bs=4k >> count=128" "bs=512 count=1k" ; do >> echo $args >> orig=$rand ; write ; compare >> orig=$zero ; write ; compare >> sync >> echo >> done >> done ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling 2024-04-29 3:09 [PATCH v4 0/2] m68k uaccess fault handling fixes Michael Schmitz 2024-04-29 3:09 ` [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully Michael Schmitz @ 2024-04-29 3:09 ` Michael Schmitz 2024-04-29 7:58 ` [PATCH v4 0/2] m68k uaccess fault handling fixes Greg Ungerer 2 siblings, 0 replies; 17+ messages in thread From: Michael Schmitz @ 2024-04-29 3:09 UTC (permalink / raw) To: linux-m68k, geert; +Cc: gerg, Michael Schmitz, Finn Thain A problem similar to that reported for __put_user_asm and __generic_copy_to_user is also present in __constant_copy_to_user_asm. Address the problem by extending the exception table to cover two instructions past each moves instruction, and adding a single NOP at the very end to catch faults on the final instruction (which is not guaranteed to be a movesb!). Tested on 68030 (Atari Falcon 030) with a transfer beginning at a single byte at the end of a mapped page followed by seven more bytes on an unmapped page (testcase derived from stress-ng sysbadaddr stressor by Finn Thain and modified to use the llseek syscall). Cc: Finn Thain <fthain@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Finn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/all/e0f23460779e6d16e2633486ac4841790ef2aca0.1713176294.git.fthain@linux-m68k.org Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> --- arch/m68k/include/asm/uaccess.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h index 44e52d8323e5..f1f4b62d6f69 100644 --- a/arch/m68k/include/asm/uaccess.h +++ b/arch/m68k/include/asm/uaccess.h @@ -288,10 +288,11 @@ __constant_copy_from_user(void *to, const void __user *from, unsigned long n) "21: "MOVES"."#s2" %3,(%1)+\n" \ "22:\n" \ " .ifnc \""#s3"\",\"\"\n" \ - " move."#s3" (%2)+,%3\n" \ - "31: "MOVES"."#s3" %3,(%1)+\n" \ - "32:\n" \ + "31: move."#s3" (%2)+,%3\n" \ + "32: "MOVES"."#s3" %3,(%1)+\n" \ + "33:\n" \ " .endif\n" \ + "34: nop\n" \ "4:\n" \ "\n" \ " .section __ex_table,\"a\"\n" \ @@ -303,7 +304,9 @@ __constant_copy_from_user(void *to, const void __user *from, unsigned long n) " .ifnc \""#s3"\",\"\"\n" \ " .long 31b,5f\n" \ " .long 32b,5f\n" \ + " .long 33b,5f\n" \ " .endif\n" \ + " .long 34b,5f\n" \ " .previous\n" \ "\n" \ " .section .fixup,\"ax\"\n" \ -- 2.17.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v4 0/2] m68k uaccess fault handling fixes 2024-04-29 3:09 [PATCH v4 0/2] m68k uaccess fault handling fixes Michael Schmitz 2024-04-29 3:09 ` [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully Michael Schmitz 2024-04-29 3:09 ` [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling Michael Schmitz @ 2024-04-29 7:58 ` Greg Ungerer 2024-04-29 8:08 ` Geert Uytterhoeven 2 siblings, 1 reply; 17+ messages in thread From: Greg Ungerer @ 2024-04-29 7:58 UTC (permalink / raw) To: Michael Schmitz, linux-m68k, geert Hi Michael, On 29/4/24 13:09, Michael Schmitz wrote: > (Actually now) version 4 of fixes for uaccess fault handling > on 68030, tested on 68030 and 68040. > > Patch 1 has one exception table entry fixed (residual > calculated in __generic_copy_to_user did not take into > account the number of longword transfers omitted due to a > fault) si as well as a final NOP added in __clear_user. > > Patch 2 is unchanged from v3 (actually since RFC v1), now > has Tested-by tag by Finn Thain added. > > These patches would still benefit from testing on 68060 > and Coldfire. I would very much like to test them on ColdFire. Unfortunately my only ColdFire with MMU board (a M547xEVB) failed a couple of months back. I am trying to find a replacement, but so far have not been able to get my hands on one. Maybe someone else out there has one they can test with? Regards Greg ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 0/2] m68k uaccess fault handling fixes 2024-04-29 7:58 ` [PATCH v4 0/2] m68k uaccess fault handling fixes Greg Ungerer @ 2024-04-29 8:08 ` Geert Uytterhoeven 0 siblings, 0 replies; 17+ messages in thread From: Geert Uytterhoeven @ 2024-04-29 8:08 UTC (permalink / raw) To: Greg Ungerer, Angelo Dureghello; +Cc: Michael Schmitz, linux-m68k On Mon, Apr 29, 2024 at 9:58 AM Greg Ungerer <gerg@linux-m68k.org> wrote: > On 29/4/24 13:09, Michael Schmitz wrote: > > (Actually now) version 4 of fixes for uaccess fault handling > > on 68030, tested on 68030 and 68040. > > > > Patch 1 has one exception table entry fixed (residual > > calculated in __generic_copy_to_user did not take into > > account the number of longword transfers omitted due to a > > fault) si as well as a final NOP added in __clear_user. > > > > Patch 2 is unchanged from v3 (actually since RFC v1), now > > has Tested-by tag by Finn Thain added. > > > > These patches would still benefit from testing on 68060 > > and Coldfire. > > I would very much like to test them on ColdFire. Unfortunately my only > ColdFire with MMU board (a M547xEVB) failed a couple of months back. > I am trying to find a replacement, but so far have not been able to get > my hands on one. > > Maybe someone else out there has one they can test with? Angelo? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-08-09 12:58 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-04-29 3:09 [PATCH v4 0/2] m68k uaccess fault handling fixes Michael Schmitz 2024-04-29 3:09 ` [PATCH v4 1/2] m68k: Handle __generic_copy_to_user faults more carefully Michael Schmitz 2024-08-07 8:14 ` Finn Thain 2024-08-07 19:32 ` Michael Schmitz 2024-08-08 1:57 ` Finn Thain 2024-08-08 6:05 ` Greg Ungerer 2024-08-08 6:56 ` Finn Thain 2024-08-08 14:52 ` Greg Ungerer 2024-08-08 19:27 ` Michael Schmitz 2024-08-09 3:34 ` Finn Thain 2024-08-09 8:03 ` Michael Schmitz 2024-08-09 12:58 ` Greg Ungerer 2024-08-09 3:22 ` Finn Thain 2024-08-08 6:58 ` Michael Schmitz 2024-04-29 3:09 ` [PATCH v4 2/2] m68k: improve __constant_copy_to_user_asm() fault handling Michael Schmitz 2024-04-29 7:58 ` [PATCH v4 0/2] m68k uaccess fault handling fixes Greg Ungerer 2024-04-29 8:08 ` Geert Uytterhoeven
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox