* Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
@ 2024-08-12 22:32 Kolbjørn Barmen
  2024-08-13 5:49 ` Jonáš Vidra
  0 siblings, 1 reply; 11+ messages in thread
From: Kolbjørn Barmen @ 2024-08-12 22:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-kernel

Ever since 6.10, my macmini G4 has been unstable when dealing with lots of
I/O activity, such as sync'ing of Gentoo portage tree, unpacking kernel
source tarball, building large software packages (or kernel) etc.

After a bit of testing, and patient kernel rebuilding (while crashing) I
found the culprit to be this commit/change

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030

Example of what an oops/panic looks like (and they all look very much alike)

https://share.icloud.com/photos/042BHRkrXqPO-fllvpxMFl2CA

-- kolla

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
  2024-08-12 22:32 Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c Kolbjørn Barmen
@ 2024-08-13 5:49 ` Jonáš Vidra
  2024-08-13 9:54 ` Niklas Cassel
  0 siblings, 1 reply; 11+ messages in thread
From: Jonáš Vidra @ 2024-08-13 5:49 UTC (permalink / raw)
  To: Kolbjørn Barmen
  Cc: linuxppc-dev, linux-kernel, linux-ide, mpe, cassel, linux

On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
> Ever since 6.10, my macmini G4 has been unstable when dealing with lots of
> I/O activity, such as sync'ing of Gentoo portage tree, unpacking kernel
> source tarball, building large software packages (or kernel) etc.
>
> After a bit of testing, and patient kernel rebuilding (while crashing) I
> found the culprit to be this commit/change
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030

I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
with the 6.10.3 kernel version. Reverting the linked change
("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
the errors go away.

CCing linux-ide and the authors of that patch; I hope this is OK with you guys.

> Example of what an oops/panic looks like (and they all look very much alike)
>
> https://share.icloud.com/photos/042BHRkrXqPO-fllvpxMFl2CA

Textual form for easier searching:

------------[ cut here ]------------
kernel BUG at drivers/ata/pata_macio.c:544!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 DEBUG_PAGEALLOC PowerMac
Modules linked in: ipv6 binfmt_misc b43 mac80211 radeon libarc4 cfg80211
 snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa rfkill snd_aoa_i2sbus hwmon
 drm_suballoc_helper snd_aoa_soundbus i2c_algo_bit snd_pcm backlight
 drm_ttm_helper ttm xhci_pci pmac_zilog therm_windtunnel xhci_hcd
 drm_display_helper firewire_ohci snd_timer snd firewire_core serial_base ssb
 soundcore crc_itu_t
CPU: 1 PID: 1870 Comm: kworker/u10:4 Tainted: G T 6.10.3-gentoo #1
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Workqueue: btrfs-worker btrfs_work_helper
NIP: c0719670 LR: c0719678 CTR: 00000001
REGS: f2db9bf0 TRAP: 0700 Tainted: G T (6.10.3-gentoo)
MSR: 00021032 <ME,IR,DR,RI> CR: 44008408 XER: 20000000

GPR00: c06fc28c f2db9cb0 c10d8020 c12d28cc 00000000 00000000 00000000 c109cff4
GPR08: 69fd0000 00000100 00010000 00000000 00000000 00000000 c007801c c40c1980
GPR16: 00000000 00000000 00000000 00000000 00000000 00000100 00000122 c11377c8
GPR24: 000000ff 00000008 0000ff00 00000000 c14200a8 00000101 00000000 c109d000
NIP [c0719670] pata_macio_qc_prep+0xf4/0x190
LR [c0719678] pata_macio_qc_prep+0xfc/0x190
Call Trace:
[f2db9cb0] [c1421660] 0xc1421660 (unreliable)
[f2db9ce0] [c06fc28c] ata_qc_issue+0x14c/0x2d4
[f2db9d00] [c0707c5c] __ata_scsi_queuecmd+0x200/0x53c
[f2db9d20] [c0707fe8] ata_scsi_queuecmd+0x50/0xe0
[f2db9d40] [c06e2644] scsi_queue_rq+0x788/0xb1c
[f2db9d80] [c0492464] __blk_mq_issue_directly+0x58/0xf4
[f2db9db0] [c0497828] blk_mq_plug_issue_direct+0x8c/0x1b4
[f2db9de0] [c0498074] blk_mq_flush_plug_list.part.0+0x584/0x5e0
[f2db9e30] [c0485a40] __blk_flush_plug+0xf8/0x194
[f2db9e70] [c0485f88] __submit_bio+0x1b8/0x2e0
[f2db9ec0] [c04862e0] submit_bio_noacct_nocheck+0x230/0x304
[f2db9f00] [c03aaf30] btrfs_work_helper+0x200/0x338
[f2db9f40] [c006cae0] process_one_work+0x1a8/0x338
[f2db9f70] [c006d79c] worker_thread+0x364/0x4c0
[f2db9fc0] [c007811c] kthread+0x100/0x104
[f2db9ff0] [c001b304] start_kernel_thread+0x10/0x14
Code: 38ff0004 b37f0002 7d20ff2c 3bff0010 7d003d2c 7d084a14 93dffff8
 b3dffffe b3dffffc 41820010 3bbd0001 4200ffc0 <0fe00000> 4bdcbb01 813c0044
 3b180001
---[ end trace 0000000000000000 ]---

note: kworker/u10:4[1870] exited with irqs disabled
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1870 at kernel/exit.c:825 do_exit+0x854/0x9ec
Modules linked in: ipv6 binfmt_misc b43 mac80211 radeon libarc4 cfg80211
 snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa rfkill snd_aoa_i2sbus hwmon
 drm_suballoc_helper snd_aoa_soundbus i2c_algo_bit snd_pcm backlight
 drm_ttm_helper ttm xhci_pci pmac_zilog therm_windtunnel xhci_hcd
 drm_display_helper firewire_ohci snd_timer snd firewire_core serial_base ssb
 soundcore crc_itu_t
CPU: 1 PID: 1870 Comm: kworker/u10:4 Tainted: G D T 6.10.3-gentoo #1
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Workqueue: btrfs-worker btrfs_work_helper
NIP: c004f09c LR: c004e8a4 CTR: 00000000
REGS: f2db9a80 TRAP: 0700 Tainted: G D T (6.10.3-gentoo)
MSR: 00029032 <EE,ME,IR,DR,RI> CR: 88db92e2 XER: 00000000

GPR00: c004f2c4 f2db9b40 c10d8020 00000000 00002710 00000000 00000000 00000000
GPR08: 00000000 f2db9e88 00000004 00000000 28db92e2 00000000 c007801c c40c1980
GPR16: 00000000 00000000 00000000 00000000 00000000 00000100 00000122 c11377c8
GPR24: 000000ff c0db0000 00001032 c0a21000 c138d520 00000005 c10d8020 c1447220
NIP [c004f09c] do_exit+0x854/0x9ec
LR [c004e8a4] do_exit+0x5c/0x9ec
Call Trace:
[f2db9b40] [c00b0c38] _printk+0x78/0xc4 (unreliable)
[f2db9b90] [c004f2c4] make_task_dead+0x90/0x174
[f2db9bb0] [c0010b9c] die+0x324/0x32c
[f2db9be0] [c0004828] ProgramCheck_virt+0x108/0x158
--- interrupt: 700 at pata_macio_qc_prep+0xf4/0x190
NIP: c0719670 LR: c0719678 CTR: 00000001
REGS: f2db9bf0 TRAP: 0700 Tainted: G D T (6.10.3-gentoo)
MSR: 00021032 <ME,IR,DR,RI> CR: 44008408 XER: 20000000

GPR00: c06fc28c f2db9cb0 c10d8020 c12d28cc 00000000 00000000 00000000 c109cff4
GPR08: 69fd0000 00000100 00010000 00000000 00000000 00000000 c007801c c40c1980
GPR16: 00000000 00000000 00000000 00000000 00000000 00000100 00000122 c11377c8
GPR24: 000000ff 00000008 0000ff00 00000000 c14200a8 00000101 00000000 c109d000
NIP [c0719670] pata_macio_qc_prep+0xf4/0x190
LR [c0719678] pata_macio_qc_prep+0xfc/0x190
--- interrupt: 700
[f2db9cb0] [c1421660] 0xc1421660 (unreliable)
[f2db9ce0] [c06fc28c] ata_qc_issue+0x14c/0x2d4
[f2db9d00] [c0707c5c] __ata_scsi_queuecmd+0x200/0x53c
[f2db9d20] [c0707fe8] ata_scsi_queuecmd+0x50/0xe0
[f2db9d40] [c06e2644] scsi_queue_rq+0x788/0xb1c
[f2db9d80] [c0492464] __blk_mq_issue_directly+0x58/0xf4
[f2db9db0] [c0497828] blk_mq_plug_issue_direct+0x8c/0x1b4
[f2db9de0] [c0498074] blk_mq_flush_plug_list.part.0+0x584/0x5e0
[f2db9e30] [c0485a40] __blk_flush_plug+0xf8/0x194
[f2db9e70] [c0485f88] __submit_bio+0x1b8/0x2e0
[f2db9ec0] [c04862e0] submit_bio_noacct_nocheck+0x230/0x304
[f2db9f00] [c03aaf30] btrfs_work_helper+0x200/0x338
[f2db9f40] [c006cae0] process_one_work+0x1a8/0x338
[f2db9f70] [c006d79c] worker_thread+0x364/0x4c0
[f2db9fc0] [c007811c] kthread+0x100/0x104
[f2db9ff0] [c001b304] start_kernel_thread+0x10/0x14
Code: 915e02fc 81410014 912a0004 915e03c0 939e03c4 91210014 813e04cc
 4bfffcec 807e0370 38800000 4bffe195 4bfffc9c <0fe00000> 4bfff848 0fe00000
 4bfff7ec
---[ end trace 0000000000000000 ]---

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
  2024-08-13 5:49 ` Jonáš Vidra
@ 2024-08-13 9:54 ` Niklas Cassel
  2024-08-13 9:58 ` Jonáš Vidra
  2024-08-13 12:32 ` Michael Ellerman
  0 siblings, 2 replies; 11+ messages in thread
From: Niklas Cassel @ 2024-08-13 9:54 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide, Jonáš Vidra, Christoph Hellwig, linux

Hello Jonáš, Kolbjørn,

thank you for the report.

On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
> On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
> > Ever since 6.10, my macmini G4 has been unstable when dealing with lots of
> > I/O activity, such as sync'ing of Gentoo portage tree, unpacking kernel
> > source tarball, building large software packages (or kernel) etc.
> >
> > After a bit of testing, and patient kernel rebuilding (while crashing) I
> > found the culprit to be this commit/change
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
>
> I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
> with the 6.10.3 kernel version. Reverting the linked change
> ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
> the errors go away.

Michael, as the author of this commit, could you please look into
this issue?

We could revert your patch, which appears to work for some users,
but that would again break setups with PAGE_SIZE == 64K.
(I assume that Jonáš and Kolbjørn are not building with PAGE_SIZE == 64K.)

>
> ------------[ cut here ]------------
> kernel BUG at drivers/ata/pata_macio.c:544!

https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544

It seems that the
while (sg_len) loop does not play nice with the new .max_segment_size.

* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
  2024-08-13 9:54 ` Niklas Cassel
@ 2024-08-13 9:58 ` Jonáš Vidra
  2024-08-13 12:32 ` Michael Ellerman
  1 sibling, 0 replies; 11+ messages in thread
From: Jonáš Vidra @ 2024-08-13 9:58 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Michael Ellerman, Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide, Christoph Hellwig, linux

On Tuesday, 13 August 2024 11:54:57 CEST, Niklas Cassel wrote:
> Hello Jonáš, Kolbjørn,
>
> thank you for the report.
>
> On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
>
>> On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:

...

>
> Michael, as the author of this commit, could you please look into
> this issue?
>
> We could revert your patch, which appears to work for some users,
> but that would again break setups with PAGE_SIZE == 64K.
> (I assume that Jonáš and Kolbjørn are not building with PAGE_SIZE == 64K.)

This is from a PPC32 machine, so it doesn't even have that option. It
only supports 4K pages.

>> ------------[ cut here ]------------
>> kernel BUG at drivers/ata/pata_macio.c:544!
>
>
> https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
>
> It seems that the
> while (sg_len) loop does not play nice with the new .max_segment_size.

* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
  2024-08-13 9:54 ` Niklas Cassel
  2024-08-13 9:58 ` Jonáš Vidra
@ 2024-08-13 12:32 ` Michael Ellerman
  2024-08-13 14:33 ` Kolbjørn Barmen
  2024-08-13 14:59 ` Niklas Cassel
  1 sibling, 2 replies; 11+ messages in thread
From: Michael Ellerman @ 2024-08-13 12:32 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide, Jonáš Vidra, Christoph Hellwig, linux

Niklas Cassel <cassel@kernel.org> writes:
> Hello Jonáš, Kolbjørn,
>
> thank you for the report.
>
> On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
>> On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
>> > Ever since 6.10, my macmini G4 has been unstable when dealing with lots of
>> > I/O activity, such as sync'ing of Gentoo portage tree, unpacking kernel
>> > source tarball, building large software packages (or kernel) etc.
>> >
>> > After a bit of testing, and patient kernel rebuilding (while crashing) I
>> > found the culprit to be this commit/change
>> >
>> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
>>
>> I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
>> with the 6.10.3 kernel version. Reverting the linked change
>> ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
>> the errors go away.
>
> Michael, as the author of this commit, could you please look into
> this issue?

I can. My commit was really just working around the warning in the SCSI
core which appeared after afd53a3d8528, it was supposed to just fix the
warning without changing behaviour. Though obviously it did for 4KB
PAGE_SIZE kernels.

I don't have easy access to my mac-mini so it would be helpful if you
can test changes Jonáš and/or Kolbjørn.

> We could revert your patch, which appears to work for some users,
> but that would again break setups with PAGE_SIZE == 64K.
> (I assume that Jonáš and Kolbjørn are not building with PAGE_SIZE == 64K.)

Yes they are using 4K, it says so in the oops.

>> ------------[ cut here ]------------
>> kernel BUG at drivers/ata/pata_macio.c:544!
>
> https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
>
> It seems that the
> while (sg_len) loop does not play nice with the new .max_segment_size.

Right, but only for 4KB kernels for some reason. Is there some limit
elsewhere that prevents the bug tripping on 64KB kernels, or is it just
luck that no one has hit it?

I wonder if the best solution is something like below. It effectively
reverts to the old behaviour for 4KB page size, and should avoid the
same bug happening on 64KB page size kernels.

cheers

diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c
index 1b85e8bf4ef9..eaffa510de49 100644
--- a/drivers/ata/pata_macio.c
+++ b/drivers/ata/pata_macio.c
@@ -208,6 +208,19 @@ static const char* macio_ata_names[] = {
 /* Don't let a DMA segment go all the way to 64K */
 #define MAX_DBDMA_SEG		0xff00
 
+#ifdef CONFIG_PAGE_SIZE_64KB
+/*
+ * The SCSI core requires the segment size to cover at least a page, so
+ * for 64K page size kernels it must be at least 64K. However the
+ * hardware can't handle 64K, so pata_macio_qc_prep() will split large
+ * requests. To handle the split requests the tablesize must be halved.
+ */
+#define MAX_SEGMENT_SIZE	SZ_64K
+#define SG_TABLESIZE		(MAX_DCMDS / 2)
+#else
+#define MAX_SEGMENT_SIZE	MAX_DBDMA_SEG
+#define SG_TABLESIZE		MAX_DCMDS
+#endif
 
 /*
  * Wait 1s for disk to answer on IDE bus after a hard reset
@@ -912,16 +925,10 @@ static int pata_macio_do_resume(struct pata_macio_priv *priv)
 
 static const struct scsi_host_template pata_macio_sht = {
 	__ATA_BASE_SHT(DRV_NAME),
-	.sg_tablesize		= MAX_DCMDS,
+	.sg_tablesize		= SG_TABLESIZE,
 	/* We may not need that strict one */
 	.dma_boundary		= ATA_DMA_BOUNDARY,
-	/*
-	 * The SCSI core requires the segment size to cover at least a page, so
-	 * for 64K page size kernels this must be at least 64K. However the
-	 * hardware can't handle 64K, so pata_macio_qc_prep() will split large
-	 * requests.
-	 */
-	.max_segment_size	= SZ_64K,
+	.max_segment_size	= MAX_SEGMENT_SIZE,
 	.device_configure	= pata_macio_device_configure,
 	.sdev_groups		= ata_common_sdev_groups,
 	.can_queue		= ATA_DEF_QUEUE,

^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
  2024-08-13 12:32 ` Michael Ellerman
@ 2024-08-13 14:33 ` Kolbjørn Barmen
  2024-08-13 14:59 ` Niklas Cassel
  1 sibling, 0 replies; 11+ messages in thread
From: Kolbjørn Barmen @ 2024-08-13 14:33 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Niklas Cassel, Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide, Jonáš Vidra, Christoph Hellwig, linux

On Tue, 13 Aug 2024, Michael Ellerman wrote:

> Niklas Cassel <cassel@kernel.org> writes:
> > Hello Jonáš, Kolbjørn,
> >
> > thank you for the report.
> >
> > On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
> >> On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
> >> > Ever since 6.10, my macmini G4 has been unstable when dealing with lots of
> >> > I/O activity, such as sync'ing of Gentoo portage tree, unpacking kernel
> >> > source tarball, building large software packages (or kernel) etc.
> >> >
> >> > After a bit of testing, and patient kernel rebuilding (while crashing) I
> >> > found the culprit to be this commit/change
> >> >
> >> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
> >>
> >> I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
> >> with the 6.10.3 kernel version. Reverting the linked change
> >> ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
> >> the errors go away.
> >
> > Michael, as the author of this commit, could you please look into
> > this issue?
>
> I can. My commit was really just working around the warning in the SCSI
> core which appeared after afd53a3d8528, it was supposed to just fix the
> warning without changing behaviour. Though obviously it did for 4KB
> PAGE_SIZE kernels.
>
> I don't have easy access to my mac-mini so it would be helpful if you
> can test changes Jonáš and/or Kolbjørn.

I applied your patch (to 6.10.4 sources) and built a kernel, and did some
stress testing (tarring and untarring large archives) and so far it looks
good.

Thanks! :)

-- kolla

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
  2024-08-13 12:32 ` Michael Ellerman
  2024-08-13 14:33 ` Kolbjørn Barmen
@ 2024-08-13 14:59 ` Niklas Cassel
  2024-08-14 12:20 ` Michael Ellerman
  1 sibling, 1 reply; 11+ messages in thread
From: Niklas Cassel @ 2024-08-13 14:59 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide, Jonáš Vidra, Christoph Hellwig, linux

Hello Michael,

On Tue, Aug 13, 2024 at 10:32:36PM +1000, Michael Ellerman wrote:
> Niklas Cassel <cassel@kernel.org> writes:
> > Hello Jonáš, Kolbjørn,
> >
> > thank you for the report.
> >
> > On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
> >> On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
> >> > Ever since 6.10, my macmini G4 has been unstable when dealing with lots of
> >> > I/O activity, such as sync'ing of Gentoo portage tree, unpacking kernel
> >> > source tarball, building large software packages (or kernel) etc.
> >> >
> >> > After a bit of testing, and patient kernel rebuilding (while crashing) I
> >> > found the culprit to be this commit/change
> >> >
> >> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
> >>
> >> I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
> >> with the 6.10.3 kernel version. Reverting the linked change
> >> ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
> >> the errors go away.
> >
> > Michael, as the author of this commit, could you please look into
> > this issue?
>
> I can. My commit was really just working around the warning in the SCSI
> core which appeared after afd53a3d8528, it was supposed to just fix the
> warning without changing behaviour. Though obviously it did for 4KB
> PAGE_SIZE kernels.
>
> I don't have easy access to my mac-mini so it would be helpful if you
> can test changes Jonáš and/or Kolbjørn.
>
> > We could revert your patch, which appears to work for some users,
> > but that would again break setups with PAGE_SIZE == 64K.
> > (I assume that Jonáš and Kolbjørn are not building with PAGE_SIZE == 64K.)
>
> Yes they are using 4K, it says so in the oops.
>
> >> ------------[ cut here ]------------
> >> kernel BUG at drivers/ata/pata_macio.c:544!
> >
> > https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
> >
> > It seems that the
> > while (sg_len) loop does not play nice with the new .max_segment_size.
>
> Right, but only for 4KB kernels for some reason. Is there some limit
> elsewhere that prevents the bug tripping on 64KB kernels, or is it just
> luck that no one has hit it?

Have you tried running fio (flexible I/O tester) with reads using very
large block sizes?

I would be surprised if it isn't possible to trigger the same bug with
64K page size.

max segment size = 64K
MAX_DCMDS = 256
256 * 64K = 16 MiB

What happens if you run fio with a 16 MiB blocksize?

Something like:
$ sudo fio --name=test --filename=/dev/sdX --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M

Kind regards,
Niklas

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
  2024-08-13 14:59 ` Niklas Cassel
@ 2024-08-14 12:20 ` Michael Ellerman
  2024-08-14 14:06 ` Niklas Cassel
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Ellerman @ 2024-08-14 12:20 UTC
To: Niklas Cassel
Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide, Jonáš Vidra, Christoph Hellwig, linux

Niklas Cassel <cassel@kernel.org> writes:
> On Tue, Aug 13, 2024 at 10:32:36PM +1000, Michael Ellerman wrote:
>> Niklas Cassel <cassel@kernel.org> writes:
>> > On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
...
>> >> ------------[ cut here ]------------
>> >> kernel BUG at drivers/ata/pata_macio.c:544!
>> >
>> > https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
>> >
>> > It seems that the
>> > while (sg_len) loop does not play nice with the new .max_segment_size.
>>
>> Right, but only for 4KB kernels for some reason. Is there some limit
>> elsewhere that prevents the bug tripping on 64KB kernels, or is it just
>> luck that no one has hit it?
>
> Have you tried running fio (flexible I/O tester), with reads with very
> large block sizes?
>
> I would be surprised if it isn't possible to trigger the same bug with
> 64K page size.
>
> max segment size = 64K
> MAX_DCMDS = 256
> 256 * 64K = 16 MiB
> What happens if you run fio with a 16 MiB blocksize?
>
> Something like:
> $ sudo fio --name=test --filename=/dev/sdX --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M

Nothing interesting happens, fio succeeds.

The largest request that comes into pata_macio_qc_prep() is 1280KB,
which results in 40 DMA list entries.

I tried with a larger block size but it doesn't change anything. I guess
there's some limit somewhere else in the stack?

That was testing on qemu, but I don't think it should matter?

I guess there's no way to run the fio test against a file, i.e. without a
raw partition? My real G5 doesn't have any spare disks/partitions in it.

cheers


fio-3.37
Starting 1 process

test: (groupid=0, jobs=1): err= 0: pid=257: Wed Aug 14 22:18:59 2024
  read: IOPS=6, BW=195MiB/s (204MB/s)(96.0MiB/493msec)
    slat (usec): min=32973, max=35222, avg=33836.35, stdev=1212.51
    clat (msec): min=378, max=448, avg=413.35, stdev=34.99
     lat (msec): min=413, max=481, avg=447.19, stdev=33.87
    clat percentiles (msec):
     | 1.00th=[ 380], 5.00th=[ 380], 10.00th=[ 380], 20.00th=[ 380],
     | 30.00th=[ 380], 40.00th=[ 414], 50.00th=[ 414], 60.00th=[ 414],
     | 70.00th=[ 447], 80.00th=[ 447], 90.00th=[ 447], 95.00th=[ 447],
     | 99.00th=[ 447], 99.50th=[ 447], 99.90th=[ 447], 99.95th=[ 447],
     | 99.99th=[ 447]
   bw ( KiB/s): min=195047, max=195047, per=97.82%, avg=195047.00, stdev= 0.00, samples=1
   iops : min= 5, max= 5, avg= 5.00, stdev= 0.00, samples=1
  lat (msec) : 500=100.00%
  cpu : usr=1.62%, sys=11.97%, ctx=22, majf=0, minf=1540
  IO depths : 1=33.3%, 2=66.7%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=3,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=195MiB/s (204MB/s), 195MiB/s-195MiB/s (204MB/s-204MB/s), io=96.0MiB (101MB), run=493-493msec

Disk stats (read/write):
  sda: ios=78/0, sectors=196608/0, merge=0/0, ticks=745/0, in_queue=745, util=66.89%
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
  2024-08-14 12:20 ` Michael Ellerman
@ 2024-08-14 14:06 ` Niklas Cassel
  2024-08-16 23:46 ` Michael Ellerman
  0 siblings, 1 reply; 11+ messages in thread
From: Niklas Cassel @ 2024-08-14 14:06 UTC
To: Michael Ellerman
Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide, Jonáš Vidra, Christoph Hellwig, linux

On Wed, Aug 14, 2024 at 10:20:55PM +1000, Michael Ellerman wrote:
> Niklas Cassel <cassel@kernel.org> writes:
> > On Tue, Aug 13, 2024 at 10:32:36PM +1000, Michael Ellerman wrote:
> >> Niklas Cassel <cassel@kernel.org> writes:
> >> > On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
> ...
> >> >> ------------[ cut here ]------------
> >> >> kernel BUG at drivers/ata/pata_macio.c:544!
> >> >
> >> > https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
> >> >
> >> > It seems that the
> >> > while (sg_len) loop does not play nice with the new .max_segment_size.
> >>
> >> Right, but only for 4KB kernels for some reason. Is there some limit
> >> elsewhere that prevents the bug tripping on 64KB kernels, or is it just
> >> luck that no one has hit it?
> >
> > Have you tried running fio (flexible I/O tester), with reads with very
> > large block sizes?
> >
> > I would be surprised if it isn't possible to trigger the same bug with
> > 64K page size.
> >
> > max segment size = 64K
> > MAX_DCMDS = 256
> > 256 * 64K = 16 MiB
> > What happens if you run fio with a 16 MiB blocksize?
> >
> > Something like:
> > $ sudo fio --name=test --filename=/dev/sdX --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M
>
> Nothing interesting happens, fio succeeds.
>
> The largest request that comes into pata_macio_qc_prep() is 1280KB,
> which results in 40 DMA list entries.
>
> I tried with a larger block size but it doesn't change anything. I guess
> there's some limit somewhere else in the stack?
>
> That was testing on qemu, but I don't think it should matter?
>
> I guess there's no way to run the fio test against a file, i.e. without a
> raw partition? My real G5 doesn't have any spare disks/partitions in it.


You can definitely run fio against a file.

e.g.
$ dd if=/dev/random of=/tmp/my_file bs=1M count=1024

$ sudo fio --name=test --filename=/tmp/my_file --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M


Perhaps try with 32M block size, so that it is larger than
max segment size = 64K
MAX_DCMDS = 256
256 * 64K = 16 MiB

Perhaps also try with and without --direct.
It could be interesting to use the page cache if you do --rw=readwrite
that might possibly result in larger bios.


Kind regards,
Niklas
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
  2024-08-14 14:06 ` Niklas Cassel
@ 2024-08-16 23:46 ` Michael Ellerman
  2024-08-17 3:32 ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Ellerman @ 2024-08-16 23:46 UTC
To: Niklas Cassel
Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide, Jonáš Vidra, Christoph Hellwig, linux

Niklas Cassel <cassel@kernel.org> writes:
> On Wed, Aug 14, 2024 at 10:20:55PM +1000, Michael Ellerman wrote:
>> Niklas Cassel <cassel@kernel.org> writes:
>> > On Tue, Aug 13, 2024 at 10:32:36PM +1000, Michael Ellerman wrote:
>> >> Niklas Cassel <cassel@kernel.org> writes:
>> >> > On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
>> ...
>> >> >> ------------[ cut here ]------------
>> >> >> kernel BUG at drivers/ata/pata_macio.c:544!
>> >> >
>> >> > https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
>> >> >
>> >> > It seems that the
>> >> > while (sg_len) loop does not play nice with the new .max_segment_size.
>> >>
>> >> Right, but only for 4KB kernels for some reason. Is there some limit
>> >> elsewhere that prevents the bug tripping on 64KB kernels, or is it just
>> >> luck that no one has hit it?
>> >
>> > Have you tried running fio (flexible I/O tester), with reads with very
>> > large block sizes?
>> >
>> > I would be surprised if it isn't possible to trigger the same bug with
>> > 64K page size.
>> >
>> > max segment size = 64K
>> > MAX_DCMDS = 256
>> > 256 * 64K = 16 MiB
>> > What happens if you run fio with a 16 MiB blocksize?
>> >
>> > Something like:
>> > $ sudo fio --name=test --filename=/dev/sdX --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M
>>
>> Nothing interesting happens, fio succeeds.
>>
>> The largest request that comes into pata_macio_qc_prep() is 1280KB,
>> which results in 40 DMA list entries.
>>
>> I tried with a larger block size but it doesn't change anything. I guess
>> there's some limit somewhere else in the stack?
>>
>> That was testing on qemu, but I don't think it should matter?
>>
>> I guess there's no way to run the fio test against a file, i.e. without a
>> raw partition? My real G5 doesn't have any spare disks/partitions in it.
>
>
> You can definitely run fio against a file.
>
> e.g.
> $ dd if=/dev/random of=/tmp/my_file bs=1M count=1024
>
> $ sudo fio --name=test --filename=/tmp/my_file --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M
>
>
> Perhaps try with 32M block size, so that it is larger than
> max segment size = 64K
> MAX_DCMDS = 256
> 256 * 64K = 16 MiB
>
> Perhaps also try with and without --direct.
> It could be interesting to use the page cache if you do --rw=readwrite
> that might possibly result in larger bios.

Changing the fio settings didn't help.

I did some tracing and noticed it was always splitting the bio in
__bio_split_to_limits() based on get_max_io_size().

That eventually led me to max_sectors_kb in sysfs, which is by default
(on my system at least) 1280 (KB) - which is exactly the size I see in
pata_macio.

Increasing max_sectors_kb with:

 # echo 16384 > /sys/devices/pci0000:f0/0000:f0:0c.0/0.80000000:mac-io/0.00020000:ata-3/ata1/host0/target0:0:0/0:0:0:0/block/sda/queue/max_sectors_kb

Allows me to trip the bug (I turned it into a WARN to keep the system alive):

[ 1804.988552] ------------[ cut here ]------------
[ 1804.988963] DMA table overflow!
[ 1804.989781] WARNING: CPU: 0 PID: 299 at drivers/ata/pata_macio.c:546 pata_macio_qc_prep+0x27c/0x2a4
[ 1804.991157] Modules linked in:
[ 1804.991945] CPU: 0 PID: 299 Comm: iou-wrk-298 Not tainted 6.10.4-dirty #242
[ 1804.992688] Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
[ 1804.993512] NIP: c0000000008bcfb4 LR: c0000000008bcfb0 CTR: 0000000000000000
[ 1804.994244] REGS: c0000000052d6fb0 TRAP: 0700 Not tainted (6.10.4-dirty)
[ 1804.994998] MSR: 800000000202b032 <SF,VEC,EE,FP,ME,IR,DR,RI> CR: 44484240 XER: 00000000
[ 1804.996178] IRQMASK: 1
[ 1804.996178] GPR00: c0000000008bcfb0 c0000000052d7250 c000000000f50b00 0000000000000013
[ 1804.996178] GPR04: 0000000100000282 c0000000014806c0 fffffffffffec230 000000003ed10000
[ 1804.996178] GPR08: 0000000000000027 c00000003fe02410 0000000000000001 0000000044484240
[ 1804.996178] GPR12: c0000000014806a8 c0000000017b0000 c0000000006c9488 c000000005026b40
[ 1804.996178] GPR16: 0000000000000000 0000000002000000 c000000000cecaa8 c000000000e44ac8
[ 1804.996178] GPR20: 0000000000800000 0000000000000080 000000000000ff00 c000000000d12730
[ 1804.996178] GPR24: c000000000e20788 c00000000330eae8 0000000000000000 0000000000000020
[ 1804.996178] GPR28: c0000000036c8130 0000000000000100 0000000000000000 c000000003fb1000
[ 1805.003085] NIP [c0000000008bcfb4] pata_macio_qc_prep+0x27c/0x2a4
[ 1805.003715] LR [c0000000008bcfb0] pata_macio_qc_prep+0x278/0x2a4
[ 1805.004564] Call Trace:
[ 1805.004963] [c0000000052d7250] [c0000000008bcfb0] pata_macio_qc_prep+0x278/0x2a4 (unreliable)
[ 1805.005974] [c0000000052d7310] [c00000000089840c] ata_qc_issue+0x170/0x390
[ 1805.006719] [c0000000052d7390] [c0000000008a5160] __ata_scsi_queuecmd+0x220/0x7d4
[ 1805.007472] [c0000000052d7410] [c0000000008a5778] ata_scsi_queuecmd+0x64/0xe8
[ 1805.008194] [c0000000052d7450] [c00000000085b450] scsi_queue_rq+0x408/0xd74
[ 1805.008904] [c0000000052d7500] [c00000000067bfc8] blk_mq_dispatch_rq_list+0x160/0x914
[ 1805.009696] [c0000000052d75b0] [c000000000683d50] __blk_mq_sched_dispatch_requests+0x5fc/0x77c
[ 1805.010551] [c0000000052d7680] [c000000000683f68] blk_mq_sched_dispatch_requests+0x44/0x90
[ 1805.011371] [c0000000052d76b0] [c000000000677328] blk_mq_run_hw_queue+0x220/0x240
[ 1805.012138] [c0000000052d76f0] [c00000000067b084] blk_mq_flush_plug_list.part.0+0x214/0x75c
[ 1805.012975] [c0000000052d77a0] [c00000000067b664] blk_add_rq_to_plug+0x98/0x1f0
[ 1805.013717] [c0000000052d77e0] [c00000000067cd4c] blk_mq_submit_bio+0x5b0/0x888
[ 1805.014457] [c0000000052d7890] [c000000000667bf0] __submit_bio+0xa4/0x2e4
[ 1805.015149] [c0000000052d7910] [c0000000006680bc] submit_bio_noacct_nocheck+0x28c/0x404
[ 1805.015952] [c0000000052d7980] [c00000000065bf68] blkdev_direct_IO+0x63c/0x824
[ 1805.016688] [c0000000052d7aa0] [c00000000065c614] blkdev_read_iter+0x10c/0x1c8
[ 1805.017423] [c0000000052d7af0] [c0000000006b2cdc] __io_read+0xe0/0x5a0
[ 1805.018091] [c0000000052d7b50] [c0000000006b3a70] io_read+0x30/0x74
[ 1805.018733] [c0000000052d7b80] [c0000000006a9040] io_issue_sqe+0x8c/0x768
[ 1805.019419] [c0000000052d7c00] [c0000000006a9850] io_wq_submit_work+0x118/0x518
[ 1805.020153] [c0000000052d7c60] [c0000000006c8ebc] io_worker_handle_work+0x23c/0x800
[ 1805.020923] [c0000000052d7d00] [c0000000006c95f8] io_wq_worker+0x178/0x51c
[ 1805.021621] [c0000000052d7e50] [c00000000000bd94] ret_from_kernel_user_thread+0x14/0x1c

Same behaviour on a kernel with PAGE_SIZE = 4KB.

I don't know why max_sectors_kb starts out with a different value on my
system, but anyway the bug is lurking there, even if it doesn't trip by
default in some configurations.

I'll clean up and send my patch from earlier in the thread.

cheers
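The overflow reported above can be reproduced on paper with a small model of the splitting loop. This is a hedged sketch, not the driver code. MAX_DCMDS = 256 and the 64K segment size come from the thread. The 0xff00 per-command limit (called MAX_DBDMA_SEG here) and the function name dma_entries are assumptions made for illustration. The model also assumes every request is made of full, contiguous 64K segments.

```python
# Values quoted in the thread.
MAX_DCMDS = 256                # DMA table entries available
MAX_SEGMENT_SIZE = 64 * 1024   # bytes per scatter/gather segment

# Assumption (not stated in the thread): the most one DMA command can move.
MAX_DBDMA_SEG = 0xFF00


def dma_entries(request_bytes, segment_size=MAX_SEGMENT_SIZE,
                per_cmd=MAX_DBDMA_SEG):
    """Count table entries when each segment is split into chunks of at
    most per_cmd bytes, mimicking a `while (sg_len)` style loop."""
    entries = 0
    remaining = request_bytes
    while remaining > 0:
        sg_len = min(remaining, segment_size)  # next segment of the request
        remaining -= sg_len
        while sg_len > 0:                      # split the segment into commands
            sg_len -= min(sg_len, per_cmd)
            entries += 1
    return entries


for kib in (1280, 8192, 16384):
    n = dma_entries(kib * 1024)
    print(f"{kib:6d} KiB -> {n:3d} entries, exceeds table: {n > MAX_DCMDS}")
```

Under these assumptions a 1280 KiB request needs 40 entries, which matches the figure reported earlier in the thread, because each 64K segment is split into two commands. A 16384 KiB request would need 512 entries, twice the table size. That would be consistent with the warning appearing only after max_sectors_kb is raised.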
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
  2024-08-16 23:46 ` Michael Ellerman
@ 2024-08-17 3:32 ` Christoph Hellwig
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2024-08-17 3:32 UTC
To: Michael Ellerman
Cc: Niklas Cassel, Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide, Jonáš Vidra, Christoph Hellwig, linux

On Sat, Aug 17, 2024 at 09:46:31AM +1000, Michael Ellerman wrote:
> Same behaviour on a kernel with PAGE_SIZE = 4KB.
>
> I don't know why max_sectors_kb starts out with a different value on my
> system, but anyway the bug is lurking there, even if it doesn't trip by
> default in some configurations.

Various distributions use udev rules to increase it.