* Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
@ 2023-09-03 1:34 Bagas Sanjaya
2023-09-03 3:29 ` Hugh Dickins
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Bagas Sanjaya @ 2023-09-03 1:34 UTC (permalink / raw)
To: Paul E. McKenney, Ziwei Dai, Hugh Dickins, Marcus Seyfarth
Cc: Linux Kernel Mailing List, Linux Regressions, Linux RCU
Hi,
I notice a regression report on Bugzilla [1]. Quoting from it:
> I've just made the transition from 6.4.14 to 6.5.1 and my Haswell-EP X99 machine took way longer to boot (55 seconds instead of 16 seconds). The following trace was seen in dmesg which was also not present on 6.4.14 (and might be the cause for the long boot time); this is on bare metal.
>
> [ +0,000021] CPU: 13 PID: 338 Comm: kworker/13:1 Not tainted 6.5.1-3.1-cachyos-lto #1 c414458bd5e5db6e6f9addca639c3a78811b24e7
> [ +0,000003] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
> [ +0,000002] Workqueue: events kfree_rcu_work
> [ +0,000004] RIP: 0010:kvfree_rcu_bulk+0x13b/0x160
> [ +0,000004] Code: 80 04 00 00 80 bf 89 04 00 00 00 75 24 85 c0 75 20 31 f6 ba 02 00 00 00 e8 72 50 bd ff 5b 41 5c 41 5e 41 5f 5d e9 05 df ba ff <0f> 0b e9 54 ff ff ff a9 ff ff ff 7f 74 e5 80 bf 88 04 00 00 >
> [ +0,000002] RSP: 0018:ffff8fe4611cbd90 EFLAGS: 00010206
> [ +0,000002] RAX: 0000000000000048 RBX: ffff8fe8e04f7000 RCX: fffffffffffffffc
> [ +0,000002] RDX: 0000000000000000 RSI: ffff8fe8e04f7000 RDI: ffff8fe9df95cac8
> [ +0,000001] RBP: ffff8fe4611cbe40 R08: 8080808080808080 R09: fefefefefefefeff
> [ +0,000002] R10: 000073746e657665 R11: 8080000000000000 R12: 0000000000000000
> [ +0,000001] R13: ffff8fe4611cbde0 R14: ffff8fe9df95cac8 R15: ffff8fe4611cbdd0
> [ +0,000001] FS: 0000000000000000(0000) GS:ffff8fe9df940000(0000) knlGS:0000000000000000
> [ +0,000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ +0,000002] CR2: 00007f8287bff008 CR3: 00000005e8f73001 CR4: 00000000001706e0
> [ +0,000001] Call Trace:
> [ +0,000003] <TASK>
> [ +0,000001] ? __warn+0x9e/0x160
> [ +0,000004] ? kvfree_rcu_bulk+0x13b/0x160
> [ +0,000004] ? report_bug+0x112/0x180
> [ +0,000003] ? handle_bug+0x3d/0x80
> [ +0,000003] ? exc_invalid_op+0x16/0x40
> [ +0,000003] ? asm_exc_invalid_op+0x16/0x20
> [ +0,000005] ? kvfree_rcu_bulk+0x13b/0x160
> [ +0,000003] kfree_rcu_work+0xcd/0x200
> [ +0,000005] process_one_work+0x21a/0x620
> [ +0,000004] ? wake_up_process+0x1d3/0x1720
> [ +0,000004] worker_thread+0x12b/0x4c0
> [ +0,000003] ? compat_get_bitmap+0xa0/0xa0
> [ +0,000003] kthread+0xf1/0x160
> [ +0,000004] ? getreg32+0x1e0/0x1e0
> [ +0,000003] ret_from_fork+0x30/0x40
> [ +0,000005] ? getreg32+0x1e0/0x1e0
> [ +0,000003] ret_from_fork_asm+0x11/0x20
> [ +0,000005] </TASK>
> [ +0,000001] ---[ end trace 0000000000000000 ]---
Later, the reporter came up with another trace:
> I just saw a patch from Hugh Dickins on the LKML (https://www.spinics.net/lists/kernel/msg4919906.html) and indeed, with my self-compiled 6.5.1 Kernel, the trace is now downgraded to a warning (see below). However, the slow boot still remains and also my games won't load up due to missing a rendering device. But that might be a different issue.
>
>
> [ +0,000227] ------------[ cut here ]------------
> [ +0,000002] WARNING: CPU: 21 PID: 345 at kernel/rcu/tree.c:2952 kvfree_rcu_bulk+0x13b/0x160
> [ +0,000011] Modules linked in: pkcs8_key_parser crypto_user fuse loop zram bpf_preload ip_tables x_tables ext4 crc32c_generic mbcache crc16 jbd2 usbhid amdgpu mfd_core drm_buddy drm_suballoc_helper crc32c_i>
> [ +0,000027] CPU: 21 PID: 345 Comm: kworker/21:1 Not tainted 6.5.1-3.1-cachyos-lto #1 de6495663682da00bbe0d80bdc163dd768b25681
> [ +0,000004] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
> [ +0,000002] Workqueue: events kfree_rcu_work
> [ +0,000006] RIP: 0010:kvfree_rcu_bulk+0x13b/0x160
> [ +0,000006] Code: 80 04 00 00 80 bf 89 04 00 00 00 75 24 85 c0 75 20 31 f6 ba 02 00 00 00 e8 52 50 bd ff 5b 41 5c 41 5e 41 5f 5d e9 e5 de ba ff <0f> 0b e9 54 ff ff ff a9 ff ff ff 7f 74 e5 80 bf 88 04 00 00 >
> [ +0,000003] RSP: 0018:ffff8df57a5ffd90 EFLAGS: 00010206
> [ +0,000003] RAX: 0000000000000020 RBX: ffff8df44cb40000 RCX: fffffffffffffffc
> [ +0,000003] RDX: 0000000000000000 RSI: ffff8df44cb40000 RDI: ffff8df91fb5cac8
> [ +0,000002] RBP: ffff8df57a5ffe40 R08: 8080808080808080 R09: fefefefefefefeff
> [ +0,000002] R10: 000073746e657665 R11: 8080000000000000 R12: 0000000000000000
> [ +0,000002] R13: ffff8df57a5ffde0 R14: ffff8df91fb5cac8 R15: ffff8df57a5ffdd0
> [ +0,000002] FS: 0000000000000000(0000) GS:ffff8df91fb40000(0000) knlGS:0000000000000000
> [ +0,000003] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ +0,000002] CR2: 000055fe51d3f700 CR3: 000000065de7d002 CR4: 00000000001706e0
> [ +0,000002] Call Trace:
> [ +0,000003] <TASK>
> [ +0,000002] ? __warn+0x9e/0x160
> [ +0,000006] ? kvfree_rcu_bulk+0x13b/0x160
> [ +0,000005] ? report_bug+0x112/0x180
> [ +0,000005] ? handle_bug+0x3d/0x80
> [ +0,000005] ? exc_invalid_op+0x16/0x40
> [ +0,000003] ? asm_exc_invalid_op+0x16/0x20
> [ +0,000007] ? kvfree_rcu_bulk+0x13b/0x160
> [ +0,000006] kfree_rcu_work+0xcd/0x200
> [ +0,000006] process_one_work+0x21a/0x620
> [ +0,000006] ? wake_up_process+0x1d3/0x1720
> [ +0,000005] worker_thread+0x12b/0x4c0
> [ +0,000005] ? compat_get_bitmap+0xa0/0xa0
> [ +0,000004] kthread+0xf1/0x160
> [ +0,000006] ? getreg32+0x1e0/0x1e0
> [ +0,000004] ret_from_fork+0x30/0x40
> [ +0,000007] ? getreg32+0x1e0/0x1e0
> [ +0,000003] ret_from_fork_asm+0x11/0x20
> [ +0,000009] </TASK>
> [ +0,000001] ---[ end trace 0000000000000000 ]---
See Bugzilla for the full thread.
Anyway, I'm adding this regression to be tracked by regzbot:
#regzbot introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217864
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217864
--
An old man doll... just what I always wanted! - Clara
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
2023-09-03 1:34 Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk) Bagas Sanjaya
@ 2023-09-03 3:29 ` Hugh Dickins
2023-09-03 10:45 ` Paul E. McKenney
2023-09-11 13:25 ` Bagas Sanjaya
2 siblings, 0 replies; 13+ messages in thread
From: Hugh Dickins @ 2023-09-03 3:29 UTC (permalink / raw)
To: Bagas Sanjaya
Cc: Paul E. McKenney, Ziwei Dai, Hugh Dickins, Marcus Seyfarth,
Linux Kernel Mailing List, Linux Regressions, Linux RCU
On Sun, 3 Sep 2023, Bagas Sanjaya wrote:
> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
> > I've just made the transition from 6.4.14 to 6.5.1 and my Haswell-EP X99 machine took way longer to boot (55 seconds instead of 16 seconds). The following trace was seen in dmesg which was also not present on 6.4.14 (and might be the cause for the long boot time); this is on bare metal.
> >
> > [ +0,000021] CPU: 13 PID: 338 Comm: kworker/13:1 Not tainted 6.5.1-3.1-cachyos-lto #1 c414458bd5e5db6e6f9addca639c3a78811b24e7
> > [ +0,000003] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
> > [ +0,000002] Workqueue: events kfree_rcu_work
> > [ +0,000004] RIP: 0010:kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000004] Code: 80 04 00 00 80 bf 89 04 00 00 00 75 24 85 c0 75 20 31 f6 ba 02 00 00 00 e8 72 50 bd ff 5b 41 5c 41 5e 41 5f 5d e9 05 df ba ff <0f> 0b e9 54 ff ff ff a9 ff ff ff 7f 74 e5 80 bf 88 04 00 00 >
> > [ +0,000002] RSP: 0018:ffff8fe4611cbd90 EFLAGS: 00010206
> > [ +0,000002] RAX: 0000000000000048 RBX: ffff8fe8e04f7000 RCX: fffffffffffffffc
> > [ +0,000002] RDX: 0000000000000000 RSI: ffff8fe8e04f7000 RDI: ffff8fe9df95cac8
> > [ +0,000001] RBP: ffff8fe4611cbe40 R08: 8080808080808080 R09: fefefefefefefeff
> > [ +0,000002] R10: 000073746e657665 R11: 8080000000000000 R12: 0000000000000000
> > [ +0,000001] R13: ffff8fe4611cbde0 R14: ffff8fe9df95cac8 R15: ffff8fe4611cbdd0
> > [ +0,000001] FS: 0000000000000000(0000) GS:ffff8fe9df940000(0000) knlGS:0000000000000000
> > [ +0,000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ +0,000002] CR2: 00007f8287bff008 CR3: 00000005e8f73001 CR4: 00000000001706e0
> > [ +0,000001] Call Trace:
> > [ +0,000003] <TASK>
> > [ +0,000001] ? __warn+0x9e/0x160
> > [ +0,000004] ? kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000004] ? report_bug+0x112/0x180
> > [ +0,000003] ? handle_bug+0x3d/0x80
> > [ +0,000003] ? exc_invalid_op+0x16/0x40
> > [ +0,000003] ? asm_exc_invalid_op+0x16/0x20
> > [ +0,000005] ? kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000003] kfree_rcu_work+0xcd/0x200
> > [ +0,000005] process_one_work+0x21a/0x620
> > [ +0,000004] ? wake_up_process+0x1d3/0x1720
> > [ +0,000004] worker_thread+0x12b/0x4c0
> > [ +0,000003] ? compat_get_bitmap+0xa0/0xa0
> > [ +0,000003] kthread+0xf1/0x160
> > [ +0,000004] ? getreg32+0x1e0/0x1e0
> > [ +0,000003] ret_from_fork+0x30/0x40
> > [ +0,000005] ? getreg32+0x1e0/0x1e0
> > [ +0,000003] ret_from_fork_asm+0x11/0x20
> > [ +0,000005] </TASK>
> > [ +0,000001] ---[ end trace 0000000000000000 ]---
>
> Later, the reporter came up with another trace:
>
> > I just saw a patch from Hugh Dickins on the LKML (https://www.spinics.net/lists/kernel/msg4919906.html) and indeed, with my self-compiled 6.5.1 Kernel, the trace is now downgraded to a warning (see below). However, the slow boot still remains and also my games won't load up due to missing a rendering device. But that might be a different issue.
> >
> >
> > [ +0,000227] ------------[ cut here ]------------
> > [ +0,000002] WARNING: CPU: 21 PID: 345 at kernel/rcu/tree.c:2952 kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000011] Modules linked in: pkcs8_key_parser crypto_user fuse loop zram bpf_preload ip_tables x_tables ext4 crc32c_generic mbcache crc16 jbd2 usbhid amdgpu mfd_core drm_buddy drm_suballoc_helper crc32c_i>
> > [ +0,000027] CPU: 21 PID: 345 Comm: kworker/21:1 Not tainted 6.5.1-3.1-cachyos-lto #1 de6495663682da00bbe0d80bdc163dd768b25681
> > [ +0,000004] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
> > [ +0,000002] Workqueue: events kfree_rcu_work
> > [ +0,000006] RIP: 0010:kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000006] Code: 80 04 00 00 80 bf 89 04 00 00 00 75 24 85 c0 75 20 31 f6 ba 02 00 00 00 e8 52 50 bd ff 5b 41 5c 41 5e 41 5f 5d e9 e5 de ba ff <0f> 0b e9 54 ff ff ff a9 ff ff ff 7f 74 e5 80 bf 88 04 00 00 >
> > [ +0,000003] RSP: 0018:ffff8df57a5ffd90 EFLAGS: 00010206
> > [ +0,000003] RAX: 0000000000000020 RBX: ffff8df44cb40000 RCX: fffffffffffffffc
> > [ +0,000003] RDX: 0000000000000000 RSI: ffff8df44cb40000 RDI: ffff8df91fb5cac8
> > [ +0,000002] RBP: ffff8df57a5ffe40 R08: 8080808080808080 R09: fefefefefefefeff
> > [ +0,000002] R10: 000073746e657665 R11: 8080000000000000 R12: 0000000000000000
> > [ +0,000002] R13: ffff8df57a5ffde0 R14: ffff8df91fb5cac8 R15: ffff8df57a5ffdd0
> > [ +0,000002] FS: 0000000000000000(0000) GS:ffff8df91fb40000(0000) knlGS:0000000000000000
> > [ +0,000003] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ +0,000002] CR2: 000055fe51d3f700 CR3: 000000065de7d002 CR4: 00000000001706e0
> > [ +0,000002] Call Trace:
> > [ +0,000003] <TASK>
> > [ +0,000002] ? __warn+0x9e/0x160
> > [ +0,000006] ? kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000005] ? report_bug+0x112/0x180
> > [ +0,000005] ? handle_bug+0x3d/0x80
> > [ +0,000005] ? exc_invalid_op+0x16/0x40
> > [ +0,000003] ? asm_exc_invalid_op+0x16/0x20
> > [ +0,000007] ? kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000006] kfree_rcu_work+0xcd/0x200
> > [ +0,000006] process_one_work+0x21a/0x620
> > [ +0,000006] ? wake_up_process+0x1d3/0x1720
> > [ +0,000005] worker_thread+0x12b/0x4c0
> > [ +0,000005] ? compat_get_bitmap+0xa0/0xa0
> > [ +0,000004] kthread+0xf1/0x160
> > [ +0,000006] ? getreg32+0x1e0/0x1e0
> > [ +0,000004] ret_from_fork+0x30/0x40
> > [ +0,000007] ? getreg32+0x1e0/0x1e0
> > [ +0,000003] ret_from_fork_asm+0x11/0x20
> > [ +0,000009] </TASK>
> > [ +0,000001] ---[ end trace 0000000000000000 ]---
>
> See Bugzilla for the full thread.
>
> Anyway, I'm adding this regression to be tracked by regzbot:
>
> #regzbot introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217864
>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217864
Sorry, please delete me from this thread: I'm no expert on system
slowdowns or cachyos or kvfree_rcu_bulk(), and have to stick to those
things which I might be able to help with.
IIRC sometimes slowdowns like that come from an area of uncached memory
getting to be used by mistake; and presumably two bisections of 6.4->6.5
on that machine would help identify where the slowdown and the warning
come in - but I shall not be participating.
The 6.6 patch of mine which Marcus applied is not wrong on 6.5, but not
helpful there either; and not relevant to whatever is going on here.
He shows it as "changing" a warning with the first few lines left out
to a warning with the first few lines included i.e. no change,
so no need for me to get involved.
Hugh
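
The "two bisections" idea above (one for the slowdown, one for the warning) amounts to running the same binary search twice with a different pass/fail test. A minimal sketch, assuming the history is linear and that every commit after the first bad one is also bad; the commit names and the `is_bad` predicate are hypothetical stand-ins for "build, boot, and check":

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of commits tested).

    Assumes commits[0] is known good and commits[-1] is known bad,
    which is how a bisection between two releases starts.
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(commits[mid]):
            bad = mid      # regression is at mid or earlier
        else:
            good = mid     # regression is after mid
    return commits[bad], tested


if __name__ == "__main__":
    # Hypothetical history of 1000 commits with the regression at index 637.
    history = [f"c{i}" for i in range(1000)]
    print(first_bad(history, lambda c: int(c[1:]) >= 637))
```

Each bisection needs roughly log2(N) test boots, so even a range of many thousands of commits is covered in a dozen or so builds. The two symptoms may land on different commits, which is why they would be bisected separately.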
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
2023-09-03 1:34 Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk) Bagas Sanjaya
2023-09-03 3:29 ` Hugh Dickins
@ 2023-09-03 10:45 ` Paul E. McKenney
[not found] ` <CA+FbhJOSfqcb3=ecL-y=13j81b1Ts13wHpzBSURyCRQUvd2NWQ@mail.gmail.com>
2023-09-11 13:25 ` Bagas Sanjaya
2 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2023-09-03 10:45 UTC (permalink / raw)
To: Bagas Sanjaya
Cc: Ziwei Dai, Hugh Dickins, Marcus Seyfarth,
Linux Kernel Mailing List, Linux Regressions, Linux RCU,
Uladzislau Rezki
On Sun, Sep 03, 2023 at 08:34:44AM +0700, Bagas Sanjaya wrote:
> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
> > I've just made the transition from 6.4.14 to 6.5.1 and my Haswell-EP X99 machine took way longer to boot (55 seconds instead of 16 seconds). The following trace was seen in dmesg which was also not present on 6.4.14 (and might be the cause for the long boot time); this is on bare metal.
> >
> > [ +0,000021] CPU: 13 PID: 338 Comm: kworker/13:1 Not tainted 6.5.1-3.1-cachyos-lto #1 c414458bd5e5db6e6f9addca639c3a78811b24e7
This looks like part of the splat was omitted.
> > [ +0,000003] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
> > [ +0,000002] Workqueue: events kfree_rcu_work
> > [ +0,000004] RIP: 0010:kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000004] Code: 80 04 00 00 80 bf 89 04 00 00 00 75 24 85 c0 75 20 31 f6 ba 02 00 00 00 e8 72 50 bd ff 5b 41 5c 41 5e 41 5f 5d e9 05 df ba ff <0f> 0b e9 54 ff ff ff a9 ff ff ff 7f 74 e5 80 bf 88 04 00 00 >
> > [ +0,000002] RSP: 0018:ffff8fe4611cbd90 EFLAGS: 00010206
> > [ +0,000002] RAX: 0000000000000048 RBX: ffff8fe8e04f7000 RCX: fffffffffffffffc
> > [ +0,000002] RDX: 0000000000000000 RSI: ffff8fe8e04f7000 RDI: ffff8fe9df95cac8
> > [ +0,000001] RBP: ffff8fe4611cbe40 R08: 8080808080808080 R09: fefefefefefefeff
> > [ +0,000002] R10: 000073746e657665 R11: 8080000000000000 R12: 0000000000000000
> > [ +0,000001] R13: ffff8fe4611cbde0 R14: ffff8fe9df95cac8 R15: ffff8fe4611cbdd0
> > [ +0,000001] FS: 0000000000000000(0000) GS:ffff8fe9df940000(0000) knlGS:0000000000000000
> > [ +0,000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ +0,000002] CR2: 00007f8287bff008 CR3: 00000005e8f73001 CR4: 00000000001706e0
> > [ +0,000001] Call Trace:
> > [ +0,000003] <TASK>
> > [ +0,000001] ? __warn+0x9e/0x160
> > [ +0,000004] ? kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000004] ? report_bug+0x112/0x180
> > [ +0,000003] ? handle_bug+0x3d/0x80
> > [ +0,000003] ? exc_invalid_op+0x16/0x40
> > [ +0,000003] ? asm_exc_invalid_op+0x16/0x20
> > [ +0,000005] ? kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000003] kfree_rcu_work+0xcd/0x200
> > [ +0,000005] process_one_work+0x21a/0x620
> > [ +0,000004] ? wake_up_process+0x1d3/0x1720
> > [ +0,000004] worker_thread+0x12b/0x4c0
> > [ +0,000003] ? compat_get_bitmap+0xa0/0xa0
> > [ +0,000003] kthread+0xf1/0x160
> > [ +0,000004] ? getreg32+0x1e0/0x1e0
> > [ +0,000003] ret_from_fork+0x30/0x40
> > [ +0,000005] ? getreg32+0x1e0/0x1e0
> > [ +0,000003] ret_from_fork_asm+0x11/0x20
> > [ +0,000005] </TASK>
> > [ +0,000001] ---[ end trace 0000000000000000 ]---
>
> Later, the reporter came up with another trace:
>
> > I just saw a patch from Hugh Dickins on the LKML (https://www.spinics.net/lists/kernel/msg4919906.html) and indeed, with my self-compiled 6.5.1 Kernel, the trace is now downgraded to a warning (see below). However, the slow boot still remains and also my games won't load up due to missing a rendering device. But that might be a different issue.
> >
> >
> > [ +0,000227] ------------[ cut here ]------------
> > [ +0,000002] WARNING: CPU: 21 PID: 345 at kernel/rcu/tree.c:2952 kvfree_rcu_bulk+0x13b/0x160
In -stable v6.5.1, this line is the following:
rcu_lock_acquire(&rcu_callback_map);
None of the patches in the github URL listed in the full version of that
bugzilla comment affect this file, but some of them could potentially
produce a slowdown.
Nevertheless adding Uladzislau on CC for his thoughts.
In the meantime, I echo Artem S. Tashkinov's suggestion of bisection.
Thanx, Paul
> > [ +0,000011] Modules linked in: pkcs8_key_parser crypto_user fuse loop zram bpf_preload ip_tables x_tables ext4 crc32c_generic mbcache crc16 jbd2 usbhid amdgpu mfd_core drm_buddy drm_suballoc_helper crc32c_i>
> > [ +0,000027] CPU: 21 PID: 345 Comm: kworker/21:1 Not tainted 6.5.1-3.1-cachyos-lto #1 de6495663682da00bbe0d80bdc163dd768b25681
> > [ +0,000004] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
> > [ +0,000002] Workqueue: events kfree_rcu_work
> > [ +0,000006] RIP: 0010:kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000006] Code: 80 04 00 00 80 bf 89 04 00 00 00 75 24 85 c0 75 20 31 f6 ba 02 00 00 00 e8 52 50 bd ff 5b 41 5c 41 5e 41 5f 5d e9 e5 de ba ff <0f> 0b e9 54 ff ff ff a9 ff ff ff 7f 74 e5 80 bf 88 04 00 00 >
> > [ +0,000003] RSP: 0018:ffff8df57a5ffd90 EFLAGS: 00010206
> > [ +0,000003] RAX: 0000000000000020 RBX: ffff8df44cb40000 RCX: fffffffffffffffc
> > [ +0,000003] RDX: 0000000000000000 RSI: ffff8df44cb40000 RDI: ffff8df91fb5cac8
> > [ +0,000002] RBP: ffff8df57a5ffe40 R08: 8080808080808080 R09: fefefefefefefeff
> > [ +0,000002] R10: 000073746e657665 R11: 8080000000000000 R12: 0000000000000000
> > [ +0,000002] R13: ffff8df57a5ffde0 R14: ffff8df91fb5cac8 R15: ffff8df57a5ffdd0
> > [ +0,000002] FS: 0000000000000000(0000) GS:ffff8df91fb40000(0000) knlGS:0000000000000000
> > [ +0,000003] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ +0,000002] CR2: 000055fe51d3f700 CR3: 000000065de7d002 CR4: 00000000001706e0
> > [ +0,000002] Call Trace:
> > [ +0,000003] <TASK>
> > [ +0,000002] ? __warn+0x9e/0x160
> > [ +0,000006] ? kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000005] ? report_bug+0x112/0x180
> > [ +0,000005] ? handle_bug+0x3d/0x80
> > [ +0,000005] ? exc_invalid_op+0x16/0x40
> > [ +0,000003] ? asm_exc_invalid_op+0x16/0x20
> > [ +0,000007] ? kvfree_rcu_bulk+0x13b/0x160
> > [ +0,000006] kfree_rcu_work+0xcd/0x200
> > [ +0,000006] process_one_work+0x21a/0x620
> > [ +0,000006] ? wake_up_process+0x1d3/0x1720
> > [ +0,000005] worker_thread+0x12b/0x4c0
> > [ +0,000005] ? compat_get_bitmap+0xa0/0xa0
> > [ +0,000004] kthread+0xf1/0x160
> > [ +0,000006] ? getreg32+0x1e0/0x1e0
> > [ +0,000004] ret_from_fork+0x30/0x40
> > [ +0,000007] ? getreg32+0x1e0/0x1e0
> > [ +0,000003] ret_from_fork_asm+0x11/0x20
> > [ +0,000009] </TASK>
> > [ +0,000001] ---[ end trace 0000000000000000 ]---
>
> See Bugzilla for the full thread.
>
> Anyway, I'm adding this regression to be tracked by regzbot:
>
> #regzbot introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217864
>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217864
>
> --
> An old man doll... just what I always wanted! - Clara
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
[not found] ` <CA+FbhJPxU+QKf1tTBd8XcKcSsTeRmJ=ji_L4QYiad--hGqRB5w@mail.gmail.com>
@ 2023-09-03 15:36 ` Paul E. McKenney
[not found] ` <CA+FbhJPtmFG2vKNXWr67Tuh-4HUi8n81PmKxwftv9iK1a3On-A@mail.gmail.com>
0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2023-09-03 15:36 UTC (permalink / raw)
To: Marcus Seyfarth
Cc: Bagas Sanjaya, Ziwei Dai, Uladzislau Rezki, linux-kernel, rcu
On Sun, Sep 03, 2023 at 04:06:15PM +0200, Marcus Seyfarth wrote:
> > > I should also probably add my RCU config settings here:
> > >
> > > # RCU Subsystem
> > > #
> > > CONFIG_TREE_RCU=y
> > > CONFIG_PREEMPT_RCU=y
> > > CONFIG_RCU_EXPERT=y
> > > CONFIG_TREE_SRCU=y
> > > CONFIG_TASKS_RCU_GENERIC=y
> > > # CONFIG_FORCE_TASKS_RCU is not set
> > > CONFIG_TASKS_RCU=y
> > > # CONFIG_FORCE_TASKS_RUDE_RCU is not set
> > > CONFIG_TASKS_RUDE_RCU=y
> > > # CONFIG_FORCE_TASKS_TRACE_RCU is not set
> > > CONFIG_TASKS_TRACE_RCU=y
> > > CONFIG_RCU_STALL_COMMON=y
> > > CONFIG_RCU_NEED_SEGCBLIST=y
> > > CONFIG_RCU_FANOUT=32
> > > CONFIG_RCU_FANOUT_LEAF=32
> >
> > This CONFIG_RCU_FANOUT_LEAF=32 could result in lock contention, but much
> > depends on your workload.
>
> I followed the advice of ChatGPT 3.5 on that setting (while it also warned me
> about the potential lock contention, but I haven't observed performance
> problems in my benchmarking):
>
> ChatGPT: "Since the previous recommendation for the CONFIG_RCU_FANOUT
> option for the Intel Xeon E5-2696 v3 processor was to start with a value of
> 32, it would be appropriate to set CONFIG_RCU_FANOUT_LEAF to the same value
> of 32. This aligns the fanout value at both the inner levels and the leaf
> level of the RCU hierarchy. Keeping CONFIG_RCU_FANOUT_LEAF consistent with
> CONFIG_RCU_FANOUT helps maintain a balanced distribution of resources and
> avoids potential performance bottlenecks."
Me, I would not trust ChatGPT for this sort of thing, but it is your life,
so feel free to do what you want.
But if you for whatever reason trust ChatGPT, why don't you ask it for
the root cause of the slowdown and the warning? Why bother asking me?
> > CONFIG_RCU_BOOST=y
> > > CONFIG_RCU_BOOST_DELAY=2
> >
> > Boosting after only two milliseconds is supported, but aggressive.
>
> That was also advised for gaming workloads by ChatGPT 3.5 but not as
> strongly:
>
> ChatGPT: "For gaming performance, the impact of CONFIG_RCU_BOOST_DELAY is
> usually minimal, and the default value should be sufficient for most
> scenarios. The default value is typically set to 4 milliseconds (ms) in
> many kernel configurations. If you wish to experiment with different values
> to optimize gaming performance, you can try reducing the
> CONFIG_RCU_BOOST_DELAY value slightly. For example, you can experiment with
> values like 2 ms or even 1 ms to potentially improve responsiveness for
> real-time tasks, including the game itself."
Much of ChatGPT's response is counter-factual, as in ChatGPT just made a
fool of itself here, as it has done quite often in the past. Please do
everyone (and especially yourself) a big favor and stop trusting its
statistical word-salad mashups regarding RCU.
Maybe the day will come when AI understands RCU, but today is not that
day.
> > > # CONFIG_RCU_EXP_KTHREAD is not set
> > > CONFIG_RCU_NOCB_CPU=y
> > > CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y
> > > CONFIG_RCU_NOCB_CPU_CB_BOOST=y
> > > # CONFIG_TASKS_TRACE_RCU_READ_MB is not set
> > > CONFIG_RCU_LAZY=y
> >
> > As noted earlier, please try CONFIG_RCU_LAZY=y.
>
> As CONFIG_RCU_LAZY=y is set above, I am confused by this suggestion. Do you
> want me to unset the other settings?
Apologies.
I want you to follow the advice I gave you in my earlier email, the one
with Message-ID c054b588-b029-4380-9ec5-4ae50ee37d08@paulmck-laptop.
I would give you a URL, but you seem to have dropped the public email
lists. I have added them back.
> > > # CONFIG_RCU_DOUBLE_CHECK_CB_TIME is not set
> > > # end of RCU Subsystem
> > >
> > > I also added this to the CachyOS config file:
> > >
> > > # Suggestions from BARD for my Xeon E5-2696V3
> > > kernel.rcu_nocbs = 0
> >
> > Well and good.
> >
> > > kernel.rcu_cpu_stall_count = 18
> > > kernel.rcu_queue_length = 1024
> > > kernel.rcu_interval = 200
> >
> > I might be having a bad code-search day, but I don't see any sign of
> > any of these in mainline or in -rcu.
> >
> > Do any of the patches that added these also add a call_rcu() that is
> > invoked during the time that you observe the slowdowns? More generally,
> > I suggest inspecting these patches carefully. I did not take any of
> > them into account when reviewing recent changes (nor should I or any of
> > the other RCU maintainers or reviewers be expected to), so it is quite
> > possible that a recent change invalidated one of those patches.
>
> These were not a result of any RCU extra patches. These four entries were
> added by me manually to the CachyOS-Settings file
> (/etc/sysctl.d/99-cachyos-settings.conf) that controls various knobs on
> that distro in this central place.
Unless there are additional patches (perhaps generated by ChatGPT?),
those settings do nothing.
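
One way to check whether a key in a sysctl.d file corresponds to anything the running kernel exposes is to look for the matching file under /proc/sys. A minimal sketch, assuming the usual mapping of dots in a key to path separators; the helper names are hypothetical:

```python
import os


def sysctl_path(key: str, root: str = "/proc/sys") -> str:
    """Map a sysctl key such as 'kernel.foo' to its file under root."""
    return os.path.join(root, *key.split("."))


def unknown_keys(keys, root: str = "/proc/sys"):
    """Return the keys that have no corresponding file under root."""
    return [k for k in keys if not os.path.exists(sysctl_path(k, root))]


if __name__ == "__main__":
    # The four keys quoted above from the CachyOS settings file.
    keys = [
        "kernel.rcu_nocbs",
        "kernel.rcu_cpu_stall_count",
        "kernel.rcu_queue_length",
        "kernel.rcu_interval",
    ]
    print(unknown_keys(keys))
```

Any key this reports as unknown has no effect when placed in a sysctl.d file.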
> The only patch CachyOS ships with 6.5 that could be somewhat related is:
> https://lore.kernel.org/linux-mm/20230703184928.GB4378@monkey/T/#m36fa1f2a52341d57a8ac39f5bd2d64376d26bfe5
>
> From the discussion, I see that it was problematic. I will delete it in my
> next experiments.
>
> The CachyOS Kernel default RCU settings are available under this link:
> https://github.com/CachyOS/linux-cachyos/blob/fa4eda73dd00e29fad3c98d49a8843d813b1c1fe/linux-cachyos/config#L163
> [which also reproduced the problems with the default distro Kernel 6.5.0-2].
>
> Thanks a lot for your help!
You are welcome, but at this point, I am afraid that you are on your own.
You have the CONFIG_RCU_LAZY=n suggestion and the bisection suggestion.
I wish you the best of everything in your quest.
If this problem occurs in mainline, someone will reproduce it reasonably
soon against a clean mainline or -stable release.
Thanx, Paul
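
The CONFIG_RCU_FANOUT_LEAF remark earlier in this message can be illustrated with a simplified model of the rcu_node tree: each leaf node covers up to RCU_FANOUT_LEAF CPUs, and each interior node covers up to RCU_FANOUT children. This is a sketch only, not the kernel's geometry code. The CPU count of 36 is an assumption for the reporter's Xeon, and 16 is used purely as a comparison value:

```python
import math


def rcu_node_counts(nr_cpus: int, fanout_leaf: int, fanout: int):
    """Return the number of rcu_node structures per level, leaf level first.

    Simplified model: leaves cover up to fanout_leaf CPUs each, and every
    level above covers up to fanout nodes of the level below.
    """
    counts = [math.ceil(nr_cpus / fanout_leaf)]
    while counts[-1] > 1:
        counts.append(math.ceil(counts[-1] / fanout))
    return counts


if __name__ == "__main__":
    nr_cpus = 36  # assumption, not taken from the report
    for leaf in (16, 32):
        print(leaf, rcu_node_counts(nr_cpus, leaf, fanout=32))
```

A larger leaf fanout means fewer leaf nodes, so more CPUs can end up sharing a single leaf rcu_node lock. That is where the potential for contention comes from, and whether it matters depends on the workload.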
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
[not found] ` <CA+FbhJPtmFG2vKNXWr67Tuh-4HUi8n81PmKxwftv9iK1a3On-A@mail.gmail.com>
@ 2023-09-04 11:23 ` Uladzislau Rezki
[not found] ` <CA+FbhJPNZ-E3e7WBH_jAvi3Rn-2gV4TVk9S9qmheXkqXw+Sakg@mail.gmail.com>
0 siblings, 1 reply; 13+ messages in thread
From: Uladzislau Rezki @ 2023-09-04 11:23 UTC (permalink / raw)
To: Marcus Seyfarth
Cc: paulmck, Bagas Sanjaya, Ziwei Dai, Uladzislau Rezki, linux-kernel,
rcu
Hello, Marcus.
>
> Apologies.
>
> I want you to follow the advice I gave you in my earlier email, the one
> with Message-ID c054b588-b029-4380-9ec5-4ae50ee37d08@paulmck-laptop.
> I would give you a URL, but you seem to have dropped the public email
> lists. I have added them back.
>
>
> Sorry, I've overlooked that previous mail. I've now tried CONFIG_RCU_LAZY=n and
> also reduced the amount of extra-patches and extra-flags to a bare minimum. It
> didn't help though. The slow boot is still there and shutdown/reboot also
> didn't seem to work again (while the trace went away by overwriting
> /kernel/rcu/tree.c with the file from 6.4.14). I am also not subscribed to the LKML, so my
> E-Mails won't reach the list anyway. Unfortunately Zhang Qiang doesn't seem to
> work for Intel any longer as my previous mail did not reach him either. I hope
> the reported issues will be fixed eventually, I'll stay away from 6.5 for some
> time.
>
Could you please clarify some items:
1.
<snip>
if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
        debug_rcu_bhead_unqueue(bnode);
        rcu_lock_acquire(&rcu_callback_map);
<snip>
Do you see this warning? I mean the one that is in the if()?
2. Please provide a full .config file.
3. Could you please also be more specific about how to reproduce the boot delay and the warning you see?
4. Please provide your full dmesg.
Thank you in advance!
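
The control flow asked about in item 1 can be modelled outside the kernel. A minimal sketch, assuming only what the snippet shows: the guarded body runs when the grace-period poll succeeds, and otherwise a warning is reported once and the body is skipped. The class and function names are hypothetical:

```python
class WarnOnce:
    """Model of one WARN_ON_ONCE() call site: log the first hit only."""

    def __init__(self, where: str):
        self.where = where
        self.fired = False
        self.log = []

    def __call__(self, condition: bool) -> bool:
        # Like the macro, return the condition so it can sit inside an if().
        if condition and not self.fired:
            self.fired = True
            self.log.append(f"WARNING: at {self.where}")
        return bool(condition)


def body_runs(warn: WarnOnce, grace_period_elapsed: bool) -> bool:
    """Mirror `if (!WARN_ON_ONCE(!elapsed)) { ... }`: True if the body runs."""
    return not warn(not grace_period_elapsed)


if __name__ == "__main__":
    warn = WarnOnce("kernel/rcu/tree.c:2952")
    print([body_runs(warn, ok) for ok in (True, False, False)])
    print(warn.log)
```

Later failures at the same call site still skip the body but are silent, so a single splat in dmesg does not show how often the condition was hit.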
--
Uladzislau Rezki
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
[not found] ` <CA+FbhJPNZ-E3e7WBH_jAvi3Rn-2gV4TVk9S9qmheXkqXw+Sakg@mail.gmail.com>
@ 2023-09-04 15:05 ` Uladzislau Rezki
[not found] ` <CA+FbhJMr6LzmOpVNkYyiSERAsNEqqvQwQ7SwJK=CmwvV9d2Z-A@mail.gmail.com>
0 siblings, 1 reply; 13+ messages in thread
From: Uladzislau Rezki @ 2023-09-04 15:05 UTC (permalink / raw)
To: Marcus Seyfarth
Cc: Uladzislau Rezki, paulmck, Bagas Sanjaya, Ziwei Dai, linux-kernel,
rcu
> Could you please clarify some items:
>
> 1.
> <snip>
> if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
>         debug_rcu_bhead_unqueue(bnode);
>         rcu_lock_acquire(&rcu_callback_map);
> <snip>
>
> Do you see this warning? I mean the one that is in the if()?
>
>
> Hi! From my limited understanding, the warning points to that snippet: [
> 7.108424] WARNING: CPU: 13 PID: 338 at kernel/rcu/tree.c:2952
> kvfree_rcu_bulk+0x13b/0x160
>
OK. Since you have a compiled vmlinux, just to be sure, could you please
perform the steps below:
<snip>
urezki@pc638:~/data/raid0/coding/linux.git$ gdb ./vmlinux
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./vmlinux...
(gdb) l *kvfree_rcu_bulk+0x13b
<snip>
and post the output here?
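
The argument to the gdb `l` command above comes straight from the `symbol+0xoffset/0xsize` form used in the RIP and Call Trace lines. A minimal sketch of pulling those fields out of a dmesg line, assuming the x86 trace format quoted in this thread; the helper names are hypothetical:

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    symbol: str
    offset: int     # byte offset into the function
    size: int       # total size of the function
    reliable: bool  # False for entries printed with a leading "?"


_FRAME_RE = re.compile(
    r"(\?\s+)?([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Extract the first symbol+offset/size triple from a trace line."""
    m = _FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(m.group(2), int(m.group(3), 16), int(m.group(4), 16),
                 m.group(1) is None)


def gdb_list_command(frame: Frame) -> str:
    """Build the gdb command that lists the source at this address."""
    return f"l *{frame.symbol}+{frame.offset:#x}"


if __name__ == "__main__":
    frame = parse_frame("RIP: 0010:kvfree_rcu_bulk+0x13b/0x160")
    print(frame, gdb_list_command(frame))
```

The lookup only gives meaningful line numbers when run against the vmlinux of the exact build that produced the trace, since the offsets change with config, compiler, and patches.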
>
> I also don't get that warning any longer in my experiments when overwriting
> /kernel/rcu/tree.c with the file from 6.4.14.
>
6.4 does not have this warning, so you do not see it there :)
>
> 2. Please provide a full .config file.
> 3. Could you please also be more specific about how to reproduce the boot delay
> and the warning you see?
> 4. Please provide your full dmesg.
>
>
> The boot delay and warnings happen with both my self-compiled 6.5.1 and the
> CachyOS default Kernel 6.5.0-2. Reproducing is as simple as booting into that
> system and open up dmesg after the task bar eventually shows up (which takes
> 10 - 20 seconds longer than normal). The warning goes away when overwriting
> /kernel/rcu/tree.c with the file from 6.4.14. But the slow boot, task bar and
> the shutdown/reboot problem remained. Journalctl signals some failures with
> powerdevil that do not happen with the older Kernel installed.
>
Let's focus on your own self-compiled kernel. As for the 6.5.1 kernel,
could you please point me to your SHA1 so I can take a vanilla kernel and base
my testing on exactly the same baseline?
> I've attached the files to this mail for convenience.
>
I appreciate it, and thank you for the help!
--
Uladzislau Rezki
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
[not found] ` <CA+FbhJMr6LzmOpVNkYyiSERAsNEqqvQwQ7SwJK=CmwvV9d2Z-A@mail.gmail.com>
@ 2023-09-04 16:53 ` Uladzislau Rezki
2023-09-04 19:13 ` Uladzislau Rezki
0 siblings, 1 reply; 13+ messages in thread
From: Uladzislau Rezki @ 2023-09-04 16:53 UTC (permalink / raw)
To: Marcus Seyfarth
Cc: Uladzislau Rezki, paulmck, Bagas Sanjaya, Ziwei Dai, linux-kernel,
rcu
>
> > Could you please clarify some items:
> >
> > 1.
> > <snip>
> > if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
> >         debug_rcu_bhead_unqueue(bnode);
> >         rcu_lock_acquire(&rcu_callback_map);
> > <snip>
> >
> > Do you see this warning? I mean the one that is in the if()?
> >
> >
> > Hi! From my limited understanding, the warning points to that snippet: [
> > 7.108424] WARNING: CPU: 13 PID: 338 at kernel/rcu/tree.c:2952
> > kvfree_rcu_bulk+0x13b/0x160
> >
> OK. Since you have a compiled vmlinux, just to be sure, could you please
> perform the steps below:
>
> <snip>
> urezki@pc638:~/data/raid0/coding/linux.git$ gdb ./vmlinux
> GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
> Copyright (C) 2021 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <https://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from ./vmlinux...
> (gdb) l *kvfree_rcu_bulk+0x13b
> <snip>
>
> and post the output here?
>
>
> I guess that is with my self-compiled 6.5.1 installed, right? I might come back
> to this task with the requested data later, but as that machine is used in
> production for daily tasks this might take a couple of days.
>
Yes, please.
>
> I don't have an SHA1 but you can take the vanilla 6.5.1 kernel and apply all
> seven 0001-*.patch files from my repo that are available here:
> https://github.com/ms178/archpkgbuilds/tree/main/packages/linux-cachyos
> to get it into the same state.
>
Sounds good. I will try my best to reproduce it locally with the set of
extra patches + your .config + the 6.5.1 kernel.
Thank you for helping to debug it!
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 13+ messages in thread
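Note on the gdb step requested above: the `l *kvfree_rcu_bulk+0x13b` lookup can also be run non-interactively, which makes the output easier to paste into a reply. The following is only a minimal sketch. It assumes an unstripped vmlinux that matches the kernel which produced the trace; the helper names `sym_offset` and `resolve` and the ./vmlinux path are placeholders for illustration and do not come from the thread.

```shell
#!/bin/sh
# Extract "symbol+0xOFFSET" from a kernel trace line read on stdin, e.g.
# "RIP: 0010:kvfree_rcu_bulk+0x13b/0x160". Only the first match is used.
sym_offset() {
    grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' |
        head -n 1 |
        sed 's|/0x[0-9a-f]*$||'   # drop the trailing "/0xSIZE" part
}

# Resolve the location with gdb in batch mode (hypothetical helper).
# $1 is the path to the vmlinux image; this path is an assumption.
resolve() {
    vmlinux=$1
    loc=$(sym_offset)
    [ -n "$loc" ] || { echo "no symbol+offset found on stdin" >&2; return 1; }
    gdb -batch -ex "list *$loc" "$vmlinux"
}
```

Possible usage, run from the kernel build directory: `dmesg | grep 'RIP: 0010:' | resolve ./vmlinux`. The output is only meaningful if vmlinux was built from the same sources and .config as the running kernel.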
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
2023-09-04 16:53 ` Uladzislau Rezki
@ 2023-09-04 19:13 ` Uladzislau Rezki
[not found] ` <CA+FbhJPNK=4s8J5OqOBaDC8EDNQzevQMQ+fwZnfxG92ReabQOA@mail.gmail.com>
0 siblings, 1 reply; 13+ messages in thread
From: Uladzislau Rezki @ 2023-09-04 19:13 UTC (permalink / raw)
To: Marcus Seyfarth
Cc: Marcus Seyfarth, paulmck, Bagas Sanjaya, Ziwei Dai, linux-kernel,
rcu
>> https://github.com/ms178/archpkgbuilds/tree/main/packages/linux-cachyos
0001-EEVDF-cachy.patch
0001-arch-x86-Makefile.patch
0001-bore-eevdf.patch
0001-cachyos-base-all.patch
0001-lrng.patch
0001-makefile-clang-ms178.patch
0001-ms178.patch
Is the order of applying them important, or is there no dependency?
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
[not found] ` <CA+FbhJMEqLycroz=J6VvC=4OBaJSwz8K+K6Zgki80M-5YdYp6A@mail.gmail.com>
@ 2023-09-05 10:41 ` Uladzislau Rezki
2023-09-05 16:46 ` Uladzislau Rezki
1 sibling, 0 replies; 13+ messages in thread
From: Uladzislau Rezki @ 2023-09-05 10:41 UTC (permalink / raw)
To: Marcus Seyfarth
Cc: Uladzislau Rezki, paulmck, Bagas Sanjaya, Ziwei Dai, linux-kernel,
rcu
>
> For a comparison, I've attached a journalctl log for my custom 6.4.14 Kernel.
>
Thank you!
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
[not found] ` <CA+FbhJMEqLycroz=J6VvC=4OBaJSwz8K+K6Zgki80M-5YdYp6A@mail.gmail.com>
2023-09-05 10:41 ` Uladzislau Rezki
@ 2023-09-05 16:46 ` Uladzislau Rezki
[not found] ` <CA+FbhJOekDxBjQH6jUFXusgakRVx_Y0S3s5avko23c6XqCc2Mw@mail.gmail.com>
1 sibling, 1 reply; 13+ messages in thread
From: Uladzislau Rezki @ 2023-09-05 16:46 UTC (permalink / raw)
To: Marcus Seyfarth
Cc: Uladzislau Rezki, paulmck, Bagas Sanjaya, Ziwei Dai, linux-kernel,
rcu
>
> For a comparison, I've attached a journalctl log for my custom 6.4.14 Kernel.
>
1. I tried to apply your patches on the stable 6.5.1 kernel. All of them
apply except one, 0001-ms178.patch, which produces the rejects below:
<snip>
arch/alpha/kernel/syscalls/syscall.tbl.rej
arch/arm/tools/syscall.tbl.rej
arch/arm64/include/asm/unistd.h.rej
arch/arm64/include/asm/unistd32.h.rej
arch/ia64/kernel/syscalls/syscall.tbl.rej
arch/m68k/kernel/syscalls/syscall.tbl.rej
arch/microblaze/kernel/syscalls/syscall.tbl.rej
arch/mips/kernel/syscalls/syscall_n32.tbl.rej
arch/mips/kernel/syscalls/syscall_n64.tbl.rej
arch/mips/kernel/syscalls/syscall_o32.tbl.rej
arch/parisc/kernel/syscalls/syscall.tbl.rej
arch/powerpc/kernel/syscalls/syscall.tbl.rej
arch/s390/kernel/syscalls/syscall.tbl.rej
arch/sh/kernel/syscalls/syscall.tbl.rej
arch/sparc/kernel/syscalls/syscall.tbl.rej
arch/x86/entry/syscalls/syscall_32.tbl.rej
arch/x86/entry/syscalls/syscall_64.tbl.rej
arch/xtensa/kernel/syscalls/syscall.tbl.rej
include/uapi/asm-generic/unistd.h.rej
<snip>
But let's skip that part.
2. One of the patches also modifies the kernel/rcu/tree.c file:
<snip>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1449cb69a0e0..fbc20c6cdbeb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2810,6 +2810,7 @@ struct kfree_rcu_cpu_work {
/**
* struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
+ * @gp_snap: The GP snapshot recorded at the last scheduling of monitor work.
* @head: List of kfree_rcu() objects not yet waiting for a grace period
* @head_gp_snap: Snapshot of RCU state for objects placed to "@head"
* @bulk_head: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
@@ -2849,6 +2850,7 @@ struct kfree_rcu_cpu {
struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
raw_spinlock_t lock;
struct delayed_work monitor_work;
+ unsigned long gp_snap;
bool initialized;
struct delayed_work page_cache_work;
@@ -3095,6 +3097,7 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
mod_delayed_work(system_wq, &krcp->monitor_work, delay);
return;
}
+ krcp->gp_snap = get_state_synchronize_rcu();
queue_delayed_work(system_wq, &krcp->monitor_work, delay);
}
@@ -3187,7 +3190,10 @@ static void kfree_rcu_monitor(struct work_struct *work)
// be that the work is in the pending state when
// channels have been detached following by each
// other.
- queue_rcu_work(system_wq, &krwp->rcu_work);
+ if (poll_state_synchronize_rcu(krcp->gp_snap))
+ queue_work(system_wq, &krwp->rcu_work.work);
+ else
+ queue_rcu_work(system_wq, &krwp->rcu_work);
}
}
<snip>
I do not understand where you got this patch or what the reason for
applying it is.
3. Could you please remove that patch(revert it) and try one more time?
4. When I apply all your patches I see that you modify:
urezki@pc638:~/data/raid0/coding/linux-stable.git$ git st | wc -l
455
urezki@pc638:~/data/raid0/coding/linux-stable.git$
455 files. The delta is huge. It touches arch, mm, rcu, drivers, crypto,
include/linux/ generic headers, init, kernel, lib, net, etc parts.
So as a result we have:
<snip>
451 files changed, 34218 insertions(+), 5576 deletions(-)
<snip>
--
Uladzislau Rezki
^ permalink raw reply related [flat|nested] 13+ messages in thread
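Note on items 1 and 2 above: with a large out-of-tree patch set, two quick checks help narrow things down, namely which patch files touch a given source file and how many rejects were left behind after applying. A rough sketch under stated assumptions; `patches_touching` and `count_rejects` are hypothetical helper names, and the match relies on git-style "+++ b/<path>" diff headers.

```shell
#!/bin/sh
# List the *.patch files in directory $1 whose diff headers touch path $2.
# Assumes git-style headers of the form "+++ b/<path>".
patches_touching() {
    dir=$1
    path=$2
    grep -lF -- "+++ b/$path" "$dir"/*.patch 2>/dev/null
}

# Count leftover reject files (*.rej) anywhere under the source tree $1.
count_rejects() {
    find "$1" -type f -name '*.rej' | wc -l | tr -d ' '
}
```

For example, `patches_touching ./patches kernel/rcu/tree.c` would print the patch files that modify that file, and `count_rejects .` run at the top of the tree reports how many hunks failed to apply. The ./patches directory is a placeholder.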
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
[not found] ` <CA+FbhJNQfGW5RMJc-WaOmjqmYuTnqdvRPYO_40TP5=P4LFPDYQ@mail.gmail.com>
@ 2023-09-05 18:46 ` Uladzislau Rezki
[not found] ` <CA+FbhJO4xfdrUtZLWMRMaZdM2W-G+ZKtg9ESdwT8DUFJZKmW-Q@mail.gmail.com>
0 siblings, 1 reply; 13+ messages in thread
From: Uladzislau Rezki @ 2023-09-05 18:46 UTC (permalink / raw)
To: Marcus Seyfarth
Cc: Uladzislau Rezki, paulmck, Bagas Sanjaya, Ziwei Dai, linux-kernel,
rcu
On Tue, Sep 05, 2023 at 08:17:27PM +0200, Marcus Seyfarth wrote:
> Systemd-analyze also shows improvements. But still the firmware part looks
> suspicious:
>
> ❯ systemd-analyze
> Startup finished in 2min 42.000s (firmware) + 4.201s (loader) + 6.895s (kernel)
> + 1.541s (userspace) = 2min 54.640s
> graphical.target reached after 1.539s in userspace.
>
>
> On Tue, Sep 5, 2023 at 20:12, Marcus Seyfarth <m.seyfarth@gmail.com> wrote:
>
> 3. Could you please remove that patch(revert it) and try one more time?
>
>
> Okay, I've just removed that patch and indeed the warning is gone on my
> self-compiled 6.5.1 Kernel. I've attached the journalctl output. The slow
> boot remains though. But at least the performance in WebGL Aquarium is back
> where I expected it to be.
>
Good. Please do not apply that patch on top of 6.5.1 and later kernels. It just
breaks things and is not applicable there, because the functionality that
improves the reclaim process is already in place.
As for the slow boot time, it is a separate issue and not related to RCU, IMHO.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 13+ messages in thread
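Note on the systemd-analyze summary quoted above: to see at a glance which phase dominates, the per-phase times can be converted to seconds and sorted. This is only a sketch; `phase_seconds` is a hypothetical helper, and the unit handling (h, min, s, ms) is an assumption about the time format systemd prints.

```shell
#!/bin/sh
# Read a systemd-analyze summary on stdin and print "<phase> <seconds>"
# for each phase. Wrapped lines are joined first, since mail clients
# often break the summary across lines.
phase_seconds() {
    tr '\n' ' ' | awk '
    {
        sub(/^.*Startup finished in /, "")   # drop everything up to the label
        sub(/ = .*$/, "")                    # drop the total and anything after it
        n = split($0, parts, / *[+] */)      # one entry per phase
        for (i = 1; i <= n; i++) {
            m = split(parts[i], tok, " ")
            secs = 0; name = ""
            for (j = 1; j <= m; j++) {
                t = tok[j]
                if (t ~ /^\(.*\)$/)          name = substr(t, 2, length(t) - 2)
                else if (t ~ /^[0-9.]+h$/)   secs += 3600 * (t + 0)
                else if (t ~ /^[0-9.]+min$/) secs += 60 * (t + 0)
                else if (t ~ /^[0-9.]+ms$/)  secs += (t + 0) / 1000
                else if (t ~ /^[0-9.]+s$/)   secs += (t + 0)
            }
            printf "%s %.3f\n", name, secs
        }
    }'
}
```

Possible usage: `systemd-analyze | phase_seconds | sort -k2,2nr | head -n 1`. For the numbers quoted above, the firmware phase (2min 42.000s) accounts for most of the total, which fits the assessment that the slow boot is a separate issue from the RCU warning.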
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
[not found] ` <CA+FbhJO4xfdrUtZLWMRMaZdM2W-G+ZKtg9ESdwT8DUFJZKmW-Q@mail.gmail.com>
@ 2023-09-06 10:31 ` Uladzislau Rezki
0 siblings, 0 replies; 13+ messages in thread
From: Uladzislau Rezki @ 2023-09-06 10:31 UTC (permalink / raw)
To: Marcus Seyfarth
Cc: Uladzislau Rezki, paulmck, Bagas Sanjaya, Ziwei Dai, linux-kernel,
rcu
On Tue, Sep 05, 2023 at 09:07:40PM +0200, Marcus Seyfarth wrote:
> Good. Please do not apply that patch on top of 6.5.1 and later kernels.
> It just breaks things and is not applicable there, because the
> functionality that improves the reclaim process is already in place.
>
> As for the slow boot time, it is a separate issue and not related to RCU, IMHO.
>
>
> Thanks for your help, I will close the RCU-related bug report and file a
> separate report for the boot/shutdown/reboot problem.
>
You are welcome! Thank you for closing the RCU report. As for the boot
time issue, I see that user space is up and running while it seems the
kernel is still loading. It might be that some modules take time during
insertion/probing. As you mentioned, it could be firmware-related timing
issues.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk)
2023-09-03 1:34 Fwd: [6.5.1] Slow boot and opening files (RIP: kvfree_rcu_bulk) Bagas Sanjaya
2023-09-03 3:29 ` Hugh Dickins
2023-09-03 10:45 ` Paul E. McKenney
@ 2023-09-11 13:25 ` Bagas Sanjaya
2 siblings, 0 replies; 13+ messages in thread
From: Bagas Sanjaya @ 2023-09-11 13:25 UTC (permalink / raw)
To: Paul E. McKenney, Ziwei Dai, Hugh Dickins, Marcus Seyfarth
Cc: Linux Kernel Mailing List, Linux Regressions, Linux RCU
On Sun, Sep 03, 2023 at 08:34:44AM +0700, Bagas Sanjaya wrote:
> #regzbot introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217864
>
As requested by the reporter,
#regzbot dup-of: https://lore.kernel.org/lkml/5011708f-b0ae-2853-0f87-a3b59845a2cc@gmail.com/
Thanks.
--
An old man doll... just what I always wanted! - Clara
^ permalink raw reply [flat|nested] 13+ messages in thread