* Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
@ 2024-08-12 22:32 Kolbjørn Barmen
2024-08-13 5:49 ` Jonáš Vidra
0 siblings, 1 reply; 11+ messages in thread
From: Kolbjørn Barmen @ 2024-08-12 22:32 UTC
To: linuxppc-dev; +Cc: linux-kernel
Ever since 6.10, my macmini G4 has been unstable under heavy I/O
activity, such as syncing the Gentoo portage tree, unpacking a kernel
source tarball, or building large software packages (or the kernel).
After a bit of testing, and patient kernel rebuilding (while crashing), I
found the culprit to be this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
Example of what an oops/panic looks like (they all look very much alike):
https://share.icloud.com/photos/042BHRkrXqPO-fllvpxMFl2CA
-- kolla
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
2024-08-12 22:32 Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c Kolbjørn Barmen
@ 2024-08-13 5:49 ` Jonáš Vidra
2024-08-13 9:54 ` Niklas Cassel
0 siblings, 1 reply; 11+ messages in thread
From: Jonáš Vidra @ 2024-08-13 5:49 UTC
To: Kolbjørn Barmen
Cc: linuxppc-dev, linux-kernel, linux-ide, mpe, cassel, linux
On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
> Ever since 6.10, my macmini G4 has been unstable under heavy I/O
> activity, such as syncing the Gentoo portage tree, unpacking a kernel
> source tarball, or building large software packages (or the kernel).
>
> After a bit of testing, and patient kernel rebuilding (while crashing), I
> found the culprit to be this commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
with the 6.10.3 kernel version. Reverting the linked change
("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
the errors go away.
CCing linux-ide and the authors of that patch; I hope this is OK with
you guys.
> Example of what an oops/panic looks like (they all look very much alike):
>
> https://share.icloud.com/photos/042BHRkrXqPO-fllvpxMFl2CA
Textual form for easier searching:
------------[ cut here ]------------
kernel BUG at drivers/ata/pata_macio.c:544!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 DEBUG_PAGEALLOC PowerMac
Modules linked in: ipv6 binfmt_misc b43 mac80211 radeon libarc4 cfg80211
snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa rfkill snd_aoa_i2sbus hwmon
drm_suballoc_helper snd_aoa_soundbus i2c_algo_bit snd_pcm backlight
drm_ttm_helper ttm xhci_pci pmac_zilog therm_windtunnel xhci_hcd
drm_display_helper firewire_ohci snd_timer snd firewire_core serial_base
ssb soundcore crc_itu_t
CPU: 1 PID: 1870 Comm: kworker/u10:4 Tainted: G T 6.10.3-gentoo #1
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Workqueue: btrfs-worker btrfs_work_helper
NIP: c0719670 LR: c0719678 CTR: 00000001
REGS: f2db9bf0 TRAP: 0700 Tainted: G T (6.10.3-gentoo)
MSR: 00021032 <ME,IR,DR,RI> CR: 44008408 XER: 20000000
GPR00: c06fc28c f2db9cb0 c10d8020 c12d28cc 00000000 00000000 00000000 c109cff4
GPR08: 69fd0000 00000100 00010000 00000000 00000000 00000000 c007801c c40c1980
GPR16: 00000000 00000000 00000000 00000000 00000000 00000100 00000122 c11377c8
GPR24: 000000ff 00000008 0000ff00 00000000 c14200a8 00000101 00000000 c109d000
NIP [c0719670] pata_macio_qc_prep+0xf4/0x190
LR [c0719678] pata_macio_qc_prep+0xfc/0x190
Call Trace:
[f2db9cb0] [c1421660] 0xc1421660 (unreliable)
[f2db9ce0] [c06fc28c] ata_qc_issue+0x14c/0x2d4
[f2db9d00] [c0707c5c] __ata_scsi_queuecmd+0x200/0x53c
[f2db9d20] [c0707fe8] ata_scsi_queuecmd+0x50/0xe0
[f2db9d40] [c06e2644] scsi_queue_rq+0x788/0xb1c
[f2db9d80] [c0492464] __blk_mq_issue_directly+0x58/0xf4
[f2db9db0] [c0497828] blk_mq_plug_issue_direct+0x8c/0x1b4
[f2db9de0] [c0498074] blk_mq_flush_plug_list.part.0+0x584/0x5e0
[f2db9e30] [c0485a40] __blk_flush_plug+0xf8/0x194
[f2db9e70] [c0485f88] __submit_bio+0x1b8/0x2e0
[f2db9ec0] [c04862e0] submit_bio_noacct_nocheck+0x230/0x304
[f2db9f00] [c03aaf30] btrfs_work_helper+0x200/0x338
[f2db9f40] [c006cae0] process_one_work+0x1a8/0x338
[f2db9f70] [c006d79c] worker_thread+0x364/0x4c0
[f2db9fc0] [c007811c] kthread+0x100/0x104
[f2db9ff0] [c001b304] start_kernel_thread+0x10/0x14
Code: 38ff0004 b37f0002 7d20ff2c 3bff0010 7d003d2c 7d084a14 93dffff8
b3dffffe b3dffffc 41820010 3bbd0001 4200ffc0 <0fe00000> 4bdcbb01 813c0044
3b180001
---[ end trace 0000000000000000 ]---
note: kworker/u10:4[1870] exited with irqs disabled
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1870 at kernel/exit.c:825 do_exit+0x854/0x9ec
Modules linked in: ipv6 binfmt_misc b43 mac80211 radeon libarc4 cfg80211
snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa rfkill snd_aoa_i2sbus hwmon
drm_suballoc_helper snd_aoa_soundbus i2c_algo_bit snd_pcm backlight
drm_ttm_helper ttm xhci_pci pmac_zilog therm_windtunnel xhci_hcd
drm_display_helper firewire_ohci snd_timer snd firewire_core serial_base
ssb soundcore crc_itu_t
CPU: 1 PID: 1870 Comm: kworker/u10:4 Tainted: G D T 6.10.3-gentoo #1
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Workqueue: btrfs-worker btrfs_work_helper
NIP: c004f09c LR: c004e8a4 CTR: 00000000
REGS: f2db9a80 TRAP: 0700 Tainted: G D T (6.10.3-gentoo)
MSR: 00029032 <EE,ME,IR,DR,RI> CR: 88db92e2 XER: 00000000
GPR00: c004f2c4 f2db9b40 c10d8020 00000000 00002710 00000000 00000000 00000000
GPR08: 00000000 f2db9e88 00000004 00000000 28db92e2 00000000 c007801c c40c1980
GPR16: 00000000 00000000 00000000 00000000 00000000 00000100 00000122 c11377c8
GPR24: 000000ff c0db0000 00001032 c0a21000 c138d520 00000005 c10d8020 c1447220
NIP [c004f09c] do_exit+0x854/0x9ec
LR [c004e8a4] do_exit+0x5c/0x9ec
Call Trace:
[f2db9b40] [c00b0c38] _printk+0x78/0xc4 (unreliable)
[f2db9b90] [c004f2c4] make_task_dead+0x90/0x174
[f2db9bb0] [c0010b9c] die+0x324/0x32c
[f2db9be0] [c0004828] ProgramCheck_virt+0x108/0x158
--- interrupt: 700 at pata_macio_qc_prep+0xf4/0x190
NIP: c0719670 LR: c0719678 CTR: 00000001
REGS: f2db9bf0 TRAP: 0700 Tainted: G D T (6.10.3-gentoo)
MSR: 00021032 <ME,IR,DR,RI> CR: 44008408 XER: 20000000
GPR00: c06fc28c f2db9cb0 c10d8020 c12d28cc 00000000 00000000 00000000 c109cff4
GPR08: 69fd0000 00000100 00010000 00000000 00000000 00000000 c007801c c40c1980
GPR16: 00000000 00000000 00000000 00000000 00000000 00000100 00000122 c11377c8
GPR24: 000000ff 00000008 0000ff00 00000000 c14200a8 00000101 00000000 c109d000
NIP [c0719670] pata_macio_qc_prep+0xf4/0x190
LR [c0719678] pata_macio_qc_prep+0xfc/0x190
--- interrupt: 700
[f2db9cb0] [c1421660] 0xc1421660 (unreliable)
[f2db9ce0] [c06fc28c] ata_qc_issue+0x14c/0x2d4
[f2db9d00] [c0707c5c] __ata_scsi_queuecmd+0x200/0x53c
[f2db9d20] [c0707fe8] ata_scsi_queuecmd+0x50/0xe0
[f2db9d40] [c06e2644] scsi_queue_rq+0x788/0xb1c
[f2db9d80] [c0492464] __blk_mq_issue_directly+0x58/0xf4
[f2db9db0] [c0497828] blk_mq_plug_issue_direct+0x8c/0x1b4
[f2db9de0] [c0498074] blk_mq_flush_plug_list.part.0+0x584/0x5e0
[f2db9e30] [c0485a40] __blk_flush_plug+0xf8/0x194
[f2db9e70] [c0485f88] __submit_bio+0x1b8/0x2e0
[f2db9ec0] [c04862e0] submit_bio_noacct_nocheck+0x230/0x304
[f2db9f00] [c03aaf30] btrfs_work_helper+0x200/0x338
[f2db9f40] [c006cae0] process_one_work+0x1a8/0x338
[f2db9f70] [c006d79c] worker_thread+0x364/0x4c0
[f2db9fc0] [c007811c] kthread+0x100/0x104
[f2db9ff0] [c001b304] start_kernel_thread+0x10/0x14
Code: 915e02fc 81410014 912a0004 915e03c0 939e03c4 91210014 813e04cc
4bfffcec 807e0370 38800000 4bffe195 4bfffc9c <0fe00000> 4bfff848 0fe00000
4bfff7ec
---[ end trace 0000000000000000 ]---
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
2024-08-13 5:49 ` Jonáš Vidra
@ 2024-08-13 9:54 ` Niklas Cassel
2024-08-13 9:58 ` Jonáš Vidra
2024-08-13 12:32 ` Michael Ellerman
0 siblings, 2 replies; 11+ messages in thread
From: Niklas Cassel @ 2024-08-13 9:54 UTC
To: Michael Ellerman
Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide,
Jonáš Vidra, Christoph Hellwig, linux
Hello Jonáš, Kolbjørn,
thank you for the report.
On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
> On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
> > Ever since 6.10, my macmini G4 has been unstable under heavy I/O
> > activity, such as syncing the Gentoo portage tree, unpacking a kernel
> > source tarball, or building large software packages (or the kernel).
> >
> > After a bit of testing, and patient kernel rebuilding (while crashing), I
> > found the culprit to be this commit:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
>
> I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
> with the 6.10.3 kernel version. Reverting the linked change
> ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
> the errors go away.
Michael, as the author of this commit, could you please look into
this issue?
We could revert your patch, which appears to work for some users,
but that would again break setups with PAGE_SIZE == 64K.
(I assume that Jonáš and Kolbjørn are not building with PAGE_SIZE == 64K.)
>
> ------------[ cut here ]------------
> kernel BUG at drivers/ata/pata_macio.c:544!
https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
It seems that the
while (sg_len) loop does not play nice with the new .max_segment_size.
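To make that interaction concrete, here is a minimal user-space sketch of such a splitting loop. It is not the driver code: count_dbdma_entries() is a hypothetical helper, and it assumes (based on the BUG location and the values mentioned elsewhere in this thread, MAX_DBDMA_SEG = 0xff00 and MAX_DCMDS = 256) that the check at line 544 guards against overflowing a command table of MAX_DCMDS entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DCMDS      256u     /* assumed size of the DBDMA command table */
#define MAX_DBDMA_SEG  0xff00u  /* largest length a single entry may carry */

/*
 * Hypothetical model of the while (sg_len) loop: every scatterlist segment
 * is cut into chunks of at most MAX_DBDMA_SEG bytes, one table entry each.
 * Returns the number of entries used, or -1 at the point where the table
 * would overflow (where the real driver is assumed to hit its BUG_ON).
 */
static int count_dbdma_entries(const uint32_t *seg_len, size_t n_segs)
{
    unsigned int used = 0;

    assert(seg_len != NULL || n_segs == 0);

    for (size_t i = 0; i < n_segs; i++) {
        uint32_t remaining = seg_len[i];

        while (remaining) {
            uint32_t len;

            if (used >= MAX_DCMDS)
                return -1;      /* table overflow */
            used++;

            len = remaining < MAX_DBDMA_SEG ? remaining : MAX_DBDMA_SEG;
            remaining -= len;
        }
    }
    return (int)used;
}
```

Under these assumptions, 256 segments of 0xff00 bytes fit exactly, while a single 64K segment already needs two entries (0xff00 + 0x100), so a full list of 256 such segments overflows the table halfway through.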
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
2024-08-13 9:54 ` Niklas Cassel
@ 2024-08-13 9:58 ` Jonáš Vidra
2024-08-13 12:32 ` Michael Ellerman
1 sibling, 0 replies; 11+ messages in thread
From: Jonáš Vidra @ 2024-08-13 9:58 UTC
To: Niklas Cassel
Cc: Michael Ellerman, Kolbjørn Barmen, linuxppc-dev,
linux-kernel, linux-ide, Christoph Hellwig, linux
On Tuesday, 13 August 2024 11:54:57 CEST, Niklas Cassel wrote:
> Hello Jonáš, Kolbjørn,
>
> thank you for the report.
>
> On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
>
>> On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote: ...
>
> Michael, as the author of this commit, could you please look into
> this issue?
>
> We could revert your patch, which appears to work for some users,
> but that would again break setups with PAGE_SIZE == 64K.
> (I assume that Jonáš and Kolbjørn are not building with PAGE_SIZE == 64K.)
This is from a PPC32 machine, so it doesn't even have that option.
It only supports 4K pages.
>> ------------[ cut here ]------------
>> kernel BUG at drivers/ata/pata_macio.c:544!
>
>
> https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
>
> It seems that the
> while (sg_len) loop does not play nice with the new .max_segment_size.
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
2024-08-13 9:54 ` Niklas Cassel
2024-08-13 9:58 ` Jonáš Vidra
@ 2024-08-13 12:32 ` Michael Ellerman
2024-08-13 14:33 ` Kolbjørn Barmen
2024-08-13 14:59 ` Niklas Cassel
1 sibling, 2 replies; 11+ messages in thread
From: Michael Ellerman @ 2024-08-13 12:32 UTC
To: Niklas Cassel
Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide,
Jonáš Vidra, Christoph Hellwig, linux
Niklas Cassel <cassel@kernel.org> writes:
> Hello Jonáš, Kolbjørn,
>
> thank you for the report.
>
> On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
>> On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
>> > Ever since 6.10, my macmini G4 has been unstable under heavy I/O
>> > activity, such as syncing the Gentoo portage tree, unpacking a kernel
>> > source tarball, or building large software packages (or the kernel).
>> >
>> > After a bit of testing, and patient kernel rebuilding (while crashing), I
>> > found the culprit to be this commit:
>> >
>> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
>>
>> I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
>> with the 6.10.3 kernel version. Reverting the linked change
>> ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
>> the errors go away.
>
> Michael, as the author of this commit, could you please look into
> this issue?
I can. My commit was really just working around the warning in the SCSI
core which appeared after afd53a3d8528; it was supposed to fix the
warning without changing behaviour, though obviously it did change
behaviour for 4KB PAGE_SIZE kernels.
I don't have easy access to my mac-mini, so it would be helpful if you,
Jonáš and/or Kolbjørn, could test changes.
> We could revert your patch, which appears to work for some users,
> but that would again break setups with PAGE_SIZE == 64K.
> (I assume that Jonáš and Kolbjørn are not building with PAGE_SIZE == 64K.)
Yes they are using 4K, it says so in the oops.
>> ------------[ cut here ]------------
>> kernel BUG at drivers/ata/pata_macio.c:544!
>
> https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
>
> It seems that the
> while (sg_len) loop does not play nice with the new .max_segment_size.
Right, but only for 4KB kernels for some reason. Is there some limit
elsewhere that prevents the bug tripping on 64KB kernels, or is it just
luck that no one has hit it?
I wonder if the best solution is something like below. It effectively
reverts to the old behaviour for 4KB page size, and should avoid the
same bug happening on 64KB page size kernels.
cheers
diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c
index 1b85e8bf4ef9..eaffa510de49 100644
--- a/drivers/ata/pata_macio.c
+++ b/drivers/ata/pata_macio.c
@@ -208,6 +208,19 @@ static const char* macio_ata_names[] = {
/* Don't let a DMA segment go all the way to 64K */
#define MAX_DBDMA_SEG 0xff00
+#ifdef CONFIG_PAGE_SIZE_64KB
+/*
+ * The SCSI core requires the segment size to cover at least a page, so
+ * for 64K page size kernels it must be at least 64K. However the
+ * hardware can't handle 64K, so pata_macio_qc_prep() will split large
+ * requests. To handle the split requests the tablesize must be halved.
+ */
+#define MAX_SEGMENT_SIZE SZ_64K
+#define SG_TABLESIZE (MAX_DCMDS / 2)
+#else
+#define MAX_SEGMENT_SIZE MAX_DBDMA_SEG
+#define SG_TABLESIZE MAX_DCMDS
+#endif
/*
* Wait 1s for disk to answer on IDE bus after a hard reset
@@ -912,16 +925,10 @@ static int pata_macio_do_resume(struct pata_macio_priv *priv)
static const struct scsi_host_template pata_macio_sht = {
__ATA_BASE_SHT(DRV_NAME),
- .sg_tablesize = MAX_DCMDS,
+ .sg_tablesize = SG_TABLESIZE,
/* We may not need that strict one */
.dma_boundary = ATA_DMA_BOUNDARY,
- /*
- * The SCSI core requires the segment size to cover at least a page, so
- * for 64K page size kernels this must be at least 64K. However the
- * hardware can't handle 64K, so pata_macio_qc_prep() will split large
- * requests.
- */
- .max_segment_size = SZ_64K,
+ .max_segment_size = MAX_SEGMENT_SIZE,
.device_configure = pata_macio_device_configure,
.sdev_groups = ata_common_sdev_groups,
.can_queue = ATA_DEF_QUEUE,
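The limits chosen in the patch above can be sanity-checked with a small worst-case calculation. This is only a sketch under stated assumptions: worst_case_entries() and entries_per_segment() are hypothetical helpers, the command table is assumed to hold MAX_DCMDS = 256 entries (the value Niklas quotes later in the thread), and every segment is assumed to be cut into chunks of at most MAX_DBDMA_SEG bytes.

```c
#include <assert.h>

#define MAX_DCMDS      256u      /* assumed size of the DBDMA command table */
#define MAX_DBDMA_SEG  0xff00u   /* from the driver: stay below 64K per entry */
#define SZ_64K         0x10000u

/* Table entries needed for one segment of seg_len bytes (ceiling division). */
static unsigned int entries_per_segment(unsigned int seg_len)
{
    return (seg_len + MAX_DBDMA_SEG - 1) / MAX_DBDMA_SEG;
}

/*
 * Worst case for a request: all `tablesize` scatterlist segments have the
 * maximum permitted length `max_segment_size`.
 */
static unsigned int worst_case_entries(unsigned int tablesize,
                                       unsigned int max_segment_size)
{
    assert(max_segment_size > 0);
    return tablesize * entries_per_segment(max_segment_size);
}
```

With these helpers, the 4K configuration of the patch (256 segments of at most 0xff00 bytes) and its 64K configuration (128 segments of at most 64K) both come out at 256 entries, which matches the table size. The combination that shipped in 6.10 (256 segments of up to 64K) comes out at 512.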
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
2024-08-13 12:32 ` Michael Ellerman
@ 2024-08-13 14:33 ` Kolbjørn Barmen
2024-08-13 14:59 ` Niklas Cassel
1 sibling, 0 replies; 11+ messages in thread
From: Kolbjørn Barmen @ 2024-08-13 14:33 UTC
To: Michael Ellerman
Cc: Niklas Cassel, Kolbjørn Barmen, linuxppc-dev, linux-kernel,
linux-ide, Jonáš Vidra, Christoph Hellwig, linux
On Tue, 13 Aug 2024, Michael Ellerman wrote:
> Niklas Cassel <cassel@kernel.org> writes:
> > Hello Jonáš, Kolbjørn,
> >
> > thank you for the report.
> >
> > On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
> >> On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
> >> > Ever since 6.10, my macmini G4 has been unstable under heavy I/O
> >> > activity, such as syncing the Gentoo portage tree, unpacking a kernel
> >> > source tarball, or building large software packages (or the kernel).
> >> >
> >> > After a bit of testing, and patient kernel rebuilding (while crashing), I
> >> > found the culprit to be this commit:
> >> >
> >> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
> >>
> >> I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
> >> with the 6.10.3 kernel version. Reverting the linked change
> >> ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
> >> the errors go away.
> >
> > Michael, as the author of this commit, could you please look into
> > this issue?
>
> I can. My commit was really just working around the warning in the SCSI
> core which appeared after afd53a3d8528; it was supposed to fix the
> warning without changing behaviour, though obviously it did change
> behaviour for 4KB PAGE_SIZE kernels.
>
> I don't have easy access to my mac-mini, so it would be helpful if you,
> Jonáš and/or Kolbjørn, could test changes.
I applied your patch (to 6.10.4 sources), built a kernel, and did some stress
testing (tarring and untarring large archives); so far it looks good.
Thanks! :)
-- kolla
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
2024-08-13 12:32 ` Michael Ellerman
2024-08-13 14:33 ` Kolbjørn Barmen
@ 2024-08-13 14:59 ` Niklas Cassel
2024-08-14 12:20 ` Michael Ellerman
1 sibling, 1 reply; 11+ messages in thread
From: Niklas Cassel @ 2024-08-13 14:59 UTC
To: Michael Ellerman
Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide,
Jonáš Vidra, Christoph Hellwig, linux
Hello Michael,
On Tue, Aug 13, 2024 at 10:32:36PM +1000, Michael Ellerman wrote:
> Niklas Cassel <cassel@kernel.org> writes:
> > Hello Jonáš, Kolbjørn,
> >
> > thank you for the report.
> >
> > On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
> >> On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
> >> > Ever since 6.10, my macmini G4 has been unstable under heavy I/O
> >> > activity, such as syncing the Gentoo portage tree, unpacking a kernel
> >> > source tarball, or building large software packages (or the kernel).
> >> >
> >> > After a bit of testing, and patient kernel rebuilding (while crashing), I
> >> > found the culprit to be this commit:
> >> >
> >> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
> >>
> >> I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
> >> with the 6.10.3 kernel version. Reverting the linked change
> >> ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
> >> the errors go away.
> >
> > Michael, as the author of this commit, could you please look into
> > this issue?
>
> I can. My commit was really just working around the warning in the SCSI
> core which appeared after afd53a3d8528; it was supposed to fix the
> warning without changing behaviour, though obviously it did change
> behaviour for 4KB PAGE_SIZE kernels.
>
> I don't have easy access to my mac-mini, so it would be helpful if you,
> Jonáš and/or Kolbjørn, could test changes.
>
> > We could revert your patch, which appears to work for some users,
> > but that would again break setups with PAGE_SIZE == 64K.
> > (I assume that Jonáš and Kolbjørn are not building with PAGE_SIZE == 64K.)
>
> Yes they are using 4K, it says so in the oops.
>
> >> ------------[ cut here ]------------
> >> kernel BUG at drivers/ata/pata_macio.c:544!
> >
> > https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
> >
> > It seems that the
> > while (sg_len) loop does not play nice with the new .max_segment_size.
>
> Right, but only for 4KB kernels for some reason. Is there some limit
> elsewhere that prevents the bug tripping on 64KB kernels, or is it just
> luck that no one has hit it?
Have you tried running fio (flexible I/O tester) with reads using a very
large block size?
I would be surprised if it isn't possible to trigger the same bug with
64K page size.
max segment size = 64K
MAX_DCMDS = 256
256 * 64K = 16 MiB
What happens if you run fio with a 16 MiB blocksize?
Something like:
$ sudo fio --name=test --filename=/dev/sdX --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M
Kind regards,
Niklas
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
2024-08-13 14:59 ` Niklas Cassel
@ 2024-08-14 12:20 ` Michael Ellerman
2024-08-14 14:06 ` Niklas Cassel
0 siblings, 1 reply; 11+ messages in thread
From: Michael Ellerman @ 2024-08-14 12:20 UTC (permalink / raw)
To: Niklas Cassel
Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide,
Jonáš Vidra, Christoph Hellwig, linux
Niklas Cassel <cassel@kernel.org> writes:
> On Tue, Aug 13, 2024 at 10:32:36PM +1000, Michael Ellerman wrote:
>> Niklas Cassel <cassel@kernel.org> writes:
>> > On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
...
>> >> ------------[ cut here ]------------
>> >> kernel BUG at drivers/ata/pata_macio.c:544!
>> >
>> > https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
>> >
>> > It seems that the
>> > while (sg_len) loop does not play nice with the new .max_segment_size.
>>
>> Right, but only for 4KB kernels for some reason. Is there some limit
>> elsewhere that prevents the bug tripping on 64KB kernels, or is it just
>> luck that no one has hit it?
>
> Have you tried running fio (flexible I/O tester) with reads using a very
> large block size?
>
> I would be surprised if it isn't possible to trigger the same bug with
> 64K page size.
>
> max segment size = 64K
> MAX_DCMDS = 256
> 256 * 64K = 16 MiB
> What happens if you run fio with a 16 MiB blocksize?
>
> Something like:
> $ sudo fio --name=test --filename=/dev/sdX --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M
Nothing interesting happens, fio succeeds.

The largest request that comes into pata_macio_qc_prep() is 1280KB,
which results in 40 DMA list entries.

I tried with a larger block size but it doesn't change anything. I guess
there's some limit somewhere else in the stack?

That was testing on qemu, but I don't think it should matter?

I guess there's no way to run the fio test against a file, i.e. without a
raw partition? My real G5 doesn't have any spare disks/partitions in it.

cheers

fio-3.37
Starting 1 process
test: (groupid=0, jobs=1): err= 0: pid=257: Wed Aug 14 22:18:59 2024
read: IOPS=6, BW=195MiB/s (204MB/s)(96.0MiB/493msec)
slat (usec): min=32973, max=35222, avg=33836.35, stdev=1212.51
clat (msec): min=378, max=448, avg=413.35, stdev=34.99
lat (msec): min=413, max=481, avg=447.19, stdev=33.87
clat percentiles (msec):
| 1.00th=[ 380], 5.00th=[ 380], 10.00th=[ 380], 20.00th=[ 380],
| 30.00th=[ 380], 40.00th=[ 414], 50.00th=[ 414], 60.00th=[ 414],
| 70.00th=[ 447], 80.00th=[ 447], 90.00th=[ 447], 95.00th=[ 447],
| 99.00th=[ 447], 99.50th=[ 447], 99.90th=[ 447], 99.95th=[ 447],
| 99.99th=[ 447]
bw ( KiB/s): min=195047, max=195047, per=97.82%, avg=195047.00, stdev= 0.00, samples=1
iops : min= 5, max= 5, avg= 5.00, stdev= 0.00, samples=1
lat (msec) : 500=100.00%
cpu : usr=1.62%, sys=11.97%, ctx=22, majf=0, minf=1540
IO depths : 1=33.3%, 2=66.7%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=3,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=4
Run status group 0 (all jobs):
READ: bw=195MiB/s (204MB/s), 195MiB/s-195MiB/s (204MB/s-204MB/s), io=96.0MiB (101MB), run=493-493msec
Disk stats (read/write):
sda: ios=78/0, sectors=196608/0, merge=0/0, ticks=745/0, in_queue=745, util=66.89%
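
The 40 DMA list entries reported for the 1280KB request can be reproduced with a small model. This is only a sketch, and it rests on two assumptions that are not stated in the thread: that the request arrives as full 64K segments, and that one DBDMA command carries at most 0xff00 bytes (believed to be the driver's MAX_DBDMA_SEG). The function names are made up for illustration.

```shell
#!/bin/sh
# Rough model of how many DBDMA commands one request needs.

MAX_DCMDS=256          # table capacity quoted in the thread
MAX_DBDMA_SEG=65280    # 0xff00; ASSUMED per-command byte cap, not stated in the thread

# entries_for_segment BYTES: commands needed for one scatter/gather segment
entries_for_segment() {
    echo $(( ($1 + MAX_DBDMA_SEG - 1) / MAX_DBDMA_SEG ))
}

# entries_for_request TOTAL_KIB SEG_KIB: request made of equal, full segments
entries_for_request() {
    total=$(( $1 * 1024 ))
    seg=$(( $2 * 1024 ))
    nsegs=$(( total / seg ))
    per_seg=$(entries_for_segment "$seg")
    echo $(( nsegs * per_seg ))
}

entries_for_request 1280 64     # prints 40: 20 segments, 2 commands each
entries_for_request 16384 64    # prints 512, twice MAX_DCMDS
```

Under these assumptions a 64K segment is 256 bytes too large for a single command, so every full segment costs two table entries. Requests whose segments are smaller than 64K would behave differently, so the model is not a general bound.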
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
2024-08-14 12:20 ` Michael Ellerman
@ 2024-08-14 14:06 ` Niklas Cassel
2024-08-16 23:46 ` Michael Ellerman
0 siblings, 1 reply; 11+ messages in thread
From: Niklas Cassel @ 2024-08-14 14:06 UTC (permalink / raw)
To: Michael Ellerman
Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide,
Jonáš Vidra, Christoph Hellwig, linux
On Wed, Aug 14, 2024 at 10:20:55PM +1000, Michael Ellerman wrote:
> Niklas Cassel <cassel@kernel.org> writes:
> > On Tue, Aug 13, 2024 at 10:32:36PM +1000, Michael Ellerman wrote:
> >> Niklas Cassel <cassel@kernel.org> writes:
> >> > On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
> ...
> >> >> ------------[ cut here ]------------
> >> >> kernel BUG at drivers/ata/pata_macio.c:544!
> >> >
> >> > https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
> >> >
> >> > It seems that the
> >> > while (sg_len) loop does not play nice with the new .max_segment_size.
> >>
> >> Right, but only for 4KB kernels for some reason. Is there some limit
> >> elsewhere that prevents the bug tripping on 64KB kernels, or is it just
> >> luck that no one has hit it?
> >
> > Have you tried running fio (flexible I/O tester) with reads using a very
> > large block size?
> >
> > I would be surprised if it isn't possible to trigger the same bug with
> > 64K page size.
> >
> > max segment size = 64K
> > MAX_DCMDS = 256
> > 256 * 64K = 16 MiB
> > What happens if you run fio with a 16 MiB blocksize?
> >
> > Something like:
> > $ sudo fio --name=test --filename=/dev/sdX --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M
>
> Nothing interesting happens, fio succeeds.
>
> The largest request that comes into pata_macio_qc_prep() is 1280KB,
> which results in 40 DMA list entries.
>
> I tried with a larger block size but it doesn't change anything. I guess
> there's some limit somewhere else in the stack?
>
> That was testing on qemu, but I don't think it should matter?
>
> I guess there's no way to run the fio test against a file, ie. without a
> raw partition? My real G5 doesn't have any spare disks/partitions in it.
You can definitely run fio against a file.

e.g.
$ dd if=/dev/random of=/tmp/my_file bs=1M count=1024

$ sudo fio --name=test --filename=/tmp/my_file --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M

Perhaps try with 32M block size, so that it is larger than
max segment size = 64K
MAX_DCMDS = 256
256 * 64K = 16 MiB

Perhaps also try with and without --direct.
It could be interesting to use the page cache if you do --rw=readwrite
that might possibly result in larger bios.

Kind regards,
Niklas
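
The variations suggested above (16M and 32M block sizes, with and without --direct) could be enumerated with a small dry-run script. This is a sketch only: it prints the fio command lines instead of running them, it reuses just the options already shown in this thread, and the TARGET variable and function names are hypothetical.

```shell
#!/bin/sh
# Dry run: print the fio command lines to try, one per line, without running them.

target=${TARGET:-/tmp/my_file}   # test file, e.g. one created with dd as shown above

# fio_cmd FILE BS DIRECT: print one fio invocation using the options from the thread
fio_cmd() {
    echo "fio --name=test --filename=$1 --direct=$3 --runtime=60" \
         "--ioengine=io_uring --rw=read --iodepth=4 --bs=$2"
}

# list_runs FILE: every combination of block size and direct/buffered I/O
list_runs() {
    for bs in 16M 32M; do
        for direct in 1 0; do
            fio_cmd "$1" "$bs" "$direct"
        done
    done
}

list_runs "$target"
```

Piping the output to sh (as root, if the target needs it) would run the four jobs in turn.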
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
2024-08-14 14:06 ` Niklas Cassel
@ 2024-08-16 23:46 ` Michael Ellerman
2024-08-17 3:32 ` Christoph Hellwig
0 siblings, 1 reply; 11+ messages in thread
From: Michael Ellerman @ 2024-08-16 23:46 UTC (permalink / raw)
To: Niklas Cassel
Cc: Kolbjørn Barmen, linuxppc-dev, linux-kernel, linux-ide,
Jonáš Vidra, Christoph Hellwig, linux
Niklas Cassel <cassel@kernel.org> writes:
> On Wed, Aug 14, 2024 at 10:20:55PM +1000, Michael Ellerman wrote:
>> Niklas Cassel <cassel@kernel.org> writes:
>> > On Tue, Aug 13, 2024 at 10:32:36PM +1000, Michael Ellerman wrote:
>> >> Niklas Cassel <cassel@kernel.org> writes:
>> >> > On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
>> ...
>> >> >> ------------[ cut here ]------------
>> >> >> kernel BUG at drivers/ata/pata_macio.c:544!
>> >> >
>> >> > https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
>> >> >
>> >> > It seems that the
>> >> > while (sg_len) loop does not play nice with the new .max_segment_size.
>> >>
>> >> Right, but only for 4KB kernels for some reason. Is there some limit
>> >> elsewhere that prevents the bug tripping on 64KB kernels, or is it just
>> >> luck that no one has hit it?
>> >
>> > Have you tried running fio (flexible I/O tester) with reads using a very
>> > large block size?
>> >
>> > I would be surprised if it isn't possible to trigger the same bug with
>> > 64K page size.
>> >
>> > max segment size = 64K
>> > MAX_DCMDS = 256
>> > 256 * 64K = 16 MiB
>> > What happens if you run fio with a 16 MiB blocksize?
>> >
>> > Something like:
>> > $ sudo fio --name=test --filename=/dev/sdX --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M
>>
>> Nothing interesting happens, fio succeeds.
>>
>> The largest request that comes into pata_macio_qc_prep() is 1280KB,
>> which results in 40 DMA list entries.
>>
>> I tried with a larger block size but it doesn't change anything. I guess
>> there's some limit somewhere else in the stack?
>>
>> That was testing on qemu, but I don't think it should matter?
>>
>> I guess there's no way to run the fio test against a file, ie. without a
>> raw partition? My real G5 doesn't have any spare disks/partitions in it.
>
>
> You can definitely run fio against a file.
>
> e.g.
> $ dd if=/dev/random of=/tmp/my_file bs=1M count=1024
>
> $ sudo fio --name=test --filename=/tmp/my_file --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M
>
>
> Perhaps try with 32M block size, so that it is larger than
> max segment size = 64K
> MAX_DCMDS = 256
> 256 * 64K = 16 MiB
>
> Perhaps also try with and without --direct.
> It could be interesting to use the page cache if you do --rw=readwrite
> that might possibly result in larger bios.
Changing the fio settings didn't help.

I did some tracing and noticed it was always splitting the bio in
__bio_split_to_limits() based on get_max_io_size().

That eventually led me to max_sectors_kb in sysfs, which is by default
(on my system at least) 1280 (KB) - which is exactly the size I see in
pata_macio.

Increasing max_sectors_kb with:

# echo 16384 > /sys/devices/pci0000:f0/0000:f0:0c.0/0.80000000:mac-io/0.00020000:ata-3/ata1/host0/target0:0:0/0:0:0:0/block/sda/queue/max_sectors_kb

allows me to trip the bug (I turned it into a WARN to keep the system alive):

[ 1804.988552] ------------[ cut here ]------------
[ 1804.988963] DMA table overflow!
[ 1804.989781] WARNING: CPU: 0 PID: 299 at drivers/ata/pata_macio.c:546 pata_macio_qc_prep+0x27c/0x2a4
[ 1804.991157] Modules linked in:
[ 1804.991945] CPU: 0 PID: 299 Comm: iou-wrk-298 Not tainted 6.10.4-dirty #242
[ 1804.992688] Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
[ 1804.993512] NIP: c0000000008bcfb4 LR: c0000000008bcfb0 CTR: 0000000000000000
[ 1804.994244] REGS: c0000000052d6fb0 TRAP: 0700 Not tainted (6.10.4-dirty)
[ 1804.994998] MSR: 800000000202b032 <SF,VEC,EE,FP,ME,IR,DR,RI> CR: 44484240 XER: 00000000
[ 1804.996178] IRQMASK: 1
[ 1804.996178] GPR00: c0000000008bcfb0 c0000000052d7250 c000000000f50b00 0000000000000013
[ 1804.996178] GPR04: 0000000100000282 c0000000014806c0 fffffffffffec230 000000003ed10000
[ 1804.996178] GPR08: 0000000000000027 c00000003fe02410 0000000000000001 0000000044484240
[ 1804.996178] GPR12: c0000000014806a8 c0000000017b0000 c0000000006c9488 c000000005026b40
[ 1804.996178] GPR16: 0000000000000000 0000000002000000 c000000000cecaa8 c000000000e44ac8
[ 1804.996178] GPR20: 0000000000800000 0000000000000080 000000000000ff00 c000000000d12730
[ 1804.996178] GPR24: c000000000e20788 c00000000330eae8 0000000000000000 0000000000000020
[ 1804.996178] GPR28: c0000000036c8130 0000000000000100 0000000000000000 c000000003fb1000
[ 1805.003085] NIP [c0000000008bcfb4] pata_macio_qc_prep+0x27c/0x2a4
[ 1805.003715] LR [c0000000008bcfb0] pata_macio_qc_prep+0x278/0x2a4
[ 1805.004564] Call Trace:
[ 1805.004963] [c0000000052d7250] [c0000000008bcfb0] pata_macio_qc_prep+0x278/0x2a4 (unreliable)
[ 1805.005974] [c0000000052d7310] [c00000000089840c] ata_qc_issue+0x170/0x390
[ 1805.006719] [c0000000052d7390] [c0000000008a5160] __ata_scsi_queuecmd+0x220/0x7d4
[ 1805.007472] [c0000000052d7410] [c0000000008a5778] ata_scsi_queuecmd+0x64/0xe8
[ 1805.008194] [c0000000052d7450] [c00000000085b450] scsi_queue_rq+0x408/0xd74
[ 1805.008904] [c0000000052d7500] [c00000000067bfc8] blk_mq_dispatch_rq_list+0x160/0x914
[ 1805.009696] [c0000000052d75b0] [c000000000683d50] __blk_mq_sched_dispatch_requests+0x5fc/0x77c
[ 1805.010551] [c0000000052d7680] [c000000000683f68] blk_mq_sched_dispatch_requests+0x44/0x90
[ 1805.011371] [c0000000052d76b0] [c000000000677328] blk_mq_run_hw_queue+0x220/0x240
[ 1805.012138] [c0000000052d76f0] [c00000000067b084] blk_mq_flush_plug_list.part.0+0x214/0x75c
[ 1805.012975] [c0000000052d77a0] [c00000000067b664] blk_add_rq_to_plug+0x98/0x1f0
[ 1805.013717] [c0000000052d77e0] [c00000000067cd4c] blk_mq_submit_bio+0x5b0/0x888
[ 1805.014457] [c0000000052d7890] [c000000000667bf0] __submit_bio+0xa4/0x2e4
[ 1805.015149] [c0000000052d7910] [c0000000006680bc] submit_bio_noacct_nocheck+0x28c/0x404
[ 1805.015952] [c0000000052d7980] [c00000000065bf68] blkdev_direct_IO+0x63c/0x824
[ 1805.016688] [c0000000052d7aa0] [c00000000065c614] blkdev_read_iter+0x10c/0x1c8
[ 1805.017423] [c0000000052d7af0] [c0000000006b2cdc] __io_read+0xe0/0x5a0
[ 1805.018091] [c0000000052d7b50] [c0000000006b3a70] io_read+0x30/0x74
[ 1805.018733] [c0000000052d7b80] [c0000000006a9040] io_issue_sqe+0x8c/0x768
[ 1805.019419] [c0000000052d7c00] [c0000000006a9850] io_wq_submit_work+0x118/0x518
[ 1805.020153] [c0000000052d7c60] [c0000000006c8ebc] io_worker_handle_work+0x23c/0x800
[ 1805.020923] [c0000000052d7d00] [c0000000006c95f8] io_wq_worker+0x178/0x51c
[ 1805.021621] [c0000000052d7e50] [c00000000000bd94] ret_from_kernel_user_thread+0x14/0x1c

Same behaviour on a kernel with PAGE_SIZE = 4KB.

I don't know why max_sectors_kb starts out with a different value on my
system, but anyway the bug is lurking there, even if it doesn't trip by
default in some configurations.

I'll clean up and send my patch from earlier in the thread.

cheers
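
To compare the starting value of max_sectors_kb between systems, the relevant block-queue limits can be dumped from sysfs. The following is a minimal sketch: the attribute names are standard queue attributes, but the device name sda and the helper names are assumptions, and /sys/block/sda/queue is assumed to point at the same queue directory as the long /sys/devices/... path used above.

```shell
#!/bin/sh
# Print the block-queue limits relevant to this bug for one disk.

queue_dir=${1:-/sys/block/sda/queue}   # sda is an assumption; pass another directory as $1

# read_limit DIR NAME: print the value of one queue attribute, or "?" if unreadable
read_limit() {
    if [ -r "$1/$2" ]; then
        cat "$1/$2"
    else
        echo "?"
    fi
}

# report DIR: print the limits one per line as name=value
report() {
    for name in max_sectors_kb max_hw_sectors_kb max_segments max_segment_size; do
        echo "$name=$(read_limit "$1" "$name")"
    done
}

report "$queue_dir"
```

Running it on an affected machine and on one that does not trip the bug should show whether something (a udev rule, for instance) has raised max_sectors_kb from its default.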
* Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c
2024-08-16 23:46 ` Michael Ellerman
@ 2024-08-17 3:32 ` Christoph Hellwig
0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2024-08-17 3:32 UTC (permalink / raw)
To: Michael Ellerman
Cc: Niklas Cassel, Kolbjørn Barmen, linuxppc-dev, linux-kernel,
linux-ide, Jonáš Vidra, Christoph Hellwig, linux
On Sat, Aug 17, 2024 at 09:46:31AM +1000, Michael Ellerman wrote:
> Same behaviour on a kernel with PAGE_SIZE = 4KB.
>
> I don't know why max_sectors_kb starts out with a different value on my
> system, but anyway the bug is lurking there, even if it doesn't trip by
> default in some configurations.
Various distributions use udev rules to increase it.
Thread overview: 11+ messages
2024-08-12 22:32 Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c Kolbjørn Barmen
2024-08-13 5:49 ` Jonáš Vidra
2024-08-13 9:54 ` Niklas Cassel
2024-08-13 9:58 ` Jonáš Vidra
2024-08-13 12:32 ` Michael Ellerman
2024-08-13 14:33 ` Kolbjørn Barmen
2024-08-13 14:59 ` Niklas Cassel
2024-08-14 12:20 ` Michael Ellerman
2024-08-14 14:06 ` Niklas Cassel
2024-08-16 23:46 ` Michael Ellerman
2024-08-17 3:32 ` Christoph Hellwig