* [PATCH] net/sched: taprio: fix CPU stuck due to the taprio hrtimer
[not found] <00000000000022a23c061604edb3@google.com>
@ 2024-05-03 8:53 ` Yun Lu
2024-05-03 9:18 ` Yun Lu
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Yun Lu @ 2024-05-03 8:53 UTC (permalink / raw)
To: syzbot+1acbadd9f48eeeacda29; +Cc: syzkaller-bugs, netdev
Syzbot reported the issues which rcu_preempt detected stalls on CPUs, and
the Call Trace shows that CPU is stuck on taprio hrtimer [1] [2].
rcu_lock_release include/linux/rcupdate.h:308 [inline]
rcu_read_unlock include/linux/rcupdate.h:783 [inline]
advance_sched+0xb37/0xca0 net/sched/sch_taprio.c:987
__run_hrtimer kernel/time/hrtimer.c:1692 [inline]
__hrtimer_run_queues+0x597/0xd00 kernel/time/hrtimer.c:1756
hrtimer_interrupt+0x396/0x990 kernel/time/hrtimer.c:1818
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1032 [inline]
__sysvec_apic_timer_interrupt+0x109/0x3a0 arch/x86/kernel/apic/apic.c:1049
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0x52/0xc0 arch/x86/kernel/apic/apic.c:1043
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
......
Assuming the clockid of hrtimer is set to REALTIME and hrtimer is started,
the system time is then adjusted by a significant value backwards, and this
adjustment is not taken into account in the advance_sched function, which
still calculating the hrtimer expires based on the previous end_time.
This will result in the hrtimer expires being much smaller than the current
system time. Consequently, this hrtimer keeps getting inserted as the first
node in the timerqueue, causing the CPU to enter an infinite loop in the
__hrtimer_run_queues function, getting stuck and unable to exit or respond
to any interrupts.
To address this, when calculating the start time of the hrtimer, retain a
record of the offset between the current time corresponding to clockid and
the monotonic time. Subsequently, when setting the hrtimer expires, check
if this offset has changed. If it has, the hrtimer expires should be
adjusted accordingly by adding this difference offset value.
[1] https://lore.kernel.org/all/00000000000022a23c061604edb3@google.com/
[2] https://lore.kernel.org/all/000000000000d929dd0614a8ba8c@google.com/
Signed-off-by: Yun Lu <luyun@kylinos.cn>
Reported-by: syzbot+1acbadd9f48eeeacda29@syzkaller.appspotmail.com
---
net/sched/sch_taprio.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index a0d54b422186..360778f65d9e 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -104,6 +104,7 @@ struct taprio_sched {
u32 max_sdu[TC_MAX_QUEUE]; /* save info from the user */
u32 fp[TC_QOPT_MAX_QUEUE]; /* only for dump and offloading */
u32 txtime_delay;
+ ktime_t offset;
};
struct __tc_taprio_qopt_offload {
@@ -170,6 +171,13 @@ static ktime_t sched_base_time(const struct sched_gate_list *sched)
return ns_to_ktime(sched->base_time);
}
+static ktime_t taprio_get_offset(enum tk_offsets tk_offset)
+{
+ ktime_t time = ktime_get();
+
+ return ktime_sub_ns(ktime_mono_to_any(time, tk_offset), time);
+}
+
static ktime_t taprio_mono_to_any(const struct taprio_sched *q, ktime_t mono)
{
/* This pairs with WRITE_ONCE() in taprio_parse_clockid() */
@@ -918,6 +926,8 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
int num_tc = netdev_get_num_tc(dev);
struct sched_entry *entry, *next;
struct Qdisc *sch = q->root;
+ enum tk_offsets tk_offset = READ_ONCE(q->tk_offset);
+ ktime_t now_offset = taprio_get_offset(tk_offset);
ktime_t end_time;
int tc;
@@ -957,6 +967,14 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
end_time = ktime_add_ns(entry->end_time, next->interval);
end_time = min_t(ktime_t, end_time, oper->cycle_end_time);
+ if (q->offset != now_offset) {
+ ktime_t diff = ktime_sub_ns(now_offset, q->offset);
+
+ end_time = ktime_add_ns(end_time, diff);
+ oper->cycle_end_time = ktime_add_ns(oper->cycle_end_time, diff);
+ q->offset = now_offset;
+ }
+
for (tc = 0; tc < num_tc; tc++) {
if (next->gate_duration[tc] == oper->cycle_time)
next->gate_close_time[tc] = KTIME_MAX;
@@ -1205,11 +1223,13 @@ static int taprio_get_start_time(struct Qdisc *sch,
ktime_t *start)
{
struct taprio_sched *q = qdisc_priv(sch);
+ enum tk_offsets tk_offset = READ_ONCE(q->tk_offset);
ktime_t now, base, cycle;
s64 n;
base = sched_base_time(sched);
now = taprio_get_time(q);
+ q->offset = taprio_get_offset(tk_offset);
if (ktime_after(base, now)) {
*start = base;
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH] net/sched: taprio: fix CPU stuck due to the taprio hrtimer
[not found] <00000000000022a23c061604edb3@google.com>
2024-05-03 8:53 ` [PATCH] net/sched: taprio: fix CPU stuck due to the taprio hrtimer Yun Lu
@ 2024-05-03 9:18 ` Yun Lu
2024-05-03 12:03 ` [syzbot] [kasan?] [mm?] INFO: rcu detected stall in __run_timer_base syzbot
2024-05-03 21:34 ` [PATCH] net/sched: taprio: fix CPU stuck due to the taprio hrtimer Jakub Kicinski
2024-05-03 19:58 ` [syzbot] [kasan?] [mm?] INFO: rcu detected stall in __run_timer_base syzbot
2024-07-31 11:57 ` [syzbot] [usb?] " syzbot
3 siblings, 2 replies; 8+ messages in thread
From: Yun Lu @ 2024-05-03 9:18 UTC (permalink / raw)
To: syzbot+1acbadd9f48eeeacda29; +Cc: syzkaller-bugs, netdev
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
---
net/sched/sch_taprio.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index a0d54b422186..360778f65d9e 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -104,6 +104,7 @@ struct taprio_sched {
u32 max_sdu[TC_MAX_QUEUE]; /* save info from the user */
u32 fp[TC_QOPT_MAX_QUEUE]; /* only for dump and offloading */
u32 txtime_delay;
+ ktime_t offset;
};
struct __tc_taprio_qopt_offload {
@@ -170,6 +171,13 @@ static ktime_t sched_base_time(const struct sched_gate_list *sched)
return ns_to_ktime(sched->base_time);
}
+static ktime_t taprio_get_offset(enum tk_offsets tk_offset)
+{
+ ktime_t time = ktime_get();
+
+ return ktime_sub_ns(ktime_mono_to_any(time, tk_offset), time);
+}
+
static ktime_t taprio_mono_to_any(const struct taprio_sched *q, ktime_t mono)
{
/* This pairs with WRITE_ONCE() in taprio_parse_clockid() */
@@ -918,6 +926,8 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
int num_tc = netdev_get_num_tc(dev);
struct sched_entry *entry, *next;
struct Qdisc *sch = q->root;
+ enum tk_offsets tk_offset = READ_ONCE(q->tk_offset);
+ ktime_t now_offset = taprio_get_offset(tk_offset);
ktime_t end_time;
int tc;
@@ -957,6 +967,14 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
end_time = ktime_add_ns(entry->end_time, next->interval);
end_time = min_t(ktime_t, end_time, oper->cycle_end_time);
+ if (q->offset != now_offset) {
+ ktime_t diff = ktime_sub_ns(now_offset, q->offset);
+
+ end_time = ktime_add_ns(end_time, diff);
+ oper->cycle_end_time = ktime_add_ns(oper->cycle_end_time, diff);
+ q->offset = now_offset;
+ }
+
for (tc = 0; tc < num_tc; tc++) {
if (next->gate_duration[tc] == oper->cycle_time)
next->gate_close_time[tc] = KTIME_MAX;
@@ -1205,11 +1223,13 @@ static int taprio_get_start_time(struct Qdisc *sch,
ktime_t *start)
{
struct taprio_sched *q = qdisc_priv(sch);
+ enum tk_offsets tk_offset = READ_ONCE(q->tk_offset);
ktime_t now, base, cycle;
s64 n;
base = sched_base_time(sched);
now = taprio_get_time(q);
+ q->offset = taprio_get_offset(tk_offset);
if (ktime_after(base, now)) {
*start = base;
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [syzbot] [kasan?] [mm?] INFO: rcu detected stall in __run_timer_base
2024-05-03 9:18 ` Yun Lu
@ 2024-05-03 12:03 ` syzbot
2024-05-03 21:34 ` [PATCH] net/sched: taprio: fix CPU stuck due to the taprio hrtimer Jakub Kicinski
1 sibling, 0 replies; 8+ messages in thread
From: syzbot @ 2024-05-03 12:03 UTC (permalink / raw)
To: linux-kernel, luyun, netdev, syzkaller-bugs
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
UBSAN: array-index-out-of-bounds in ktime_mono_to_any
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in kernel/time/timekeeping.c:927:20
index 3 is out of range for type 'ktime_t *[3]' (aka 'long long *[3]')
CPU: 0 PID: 5514 Comm: syz-executor.0 Not tainted 6.9.0-rc6-syzkaller-00131-gf03359bca01b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
ubsan_epilogue lib/ubsan.c:231 [inline]
__ubsan_handle_out_of_bounds+0x121/0x150 lib/ubsan.c:429
ktime_mono_to_any+0xf7/0x100 kernel/time/timekeeping.c:927
taprio_get_offset net/sched/sch_taprio.c:178 [inline]
taprio_get_start_time+0xf6/0x2a0 net/sched/sch_taprio.c:1232
taprio_change+0x30c5/0x42d0 net/sched/sch_taprio.c:1963
taprio_init+0x9da/0xc80 net/sched/sch_taprio.c:2135
qdisc_create+0x9d6/0x1190 net/sched/sch_api.c:1355
tc_modify_qdisc+0xa26/0x1e40 net/sched/sch_api.c:1776
rtnetlink_rcv_msg+0x89d/0x10d0 net/core/rtnetlink.c:6595
netlink_rcv_skb+0x1e5/0x430 net/netlink/af_netlink.c:2559
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0x7ec/0x980 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x8e1/0xcb0 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x223/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fca4dc7de69
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fca4e9720c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fca4ddabf80 RCX: 00007fca4dc7de69
RDX: 0000000000000000 RSI: 00000000200007c0 RDI: 0000000000000004
RBP: 00007fca4dcca47a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fca4ddabf80 R15: 00007fffc66d5928
</TASK>
---[ end trace ]---
Tested on:
commit: f03359bc Merge tag 'for-6.9-rc6-tag' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=17aef62f180000
kernel config: https://syzkaller.appspot.com/x/.config?x=38cdad974684e704
dashboard link: https://syzkaller.appspot.com/bug?extid=1acbadd9f48eeeacda29
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1665ecc0980000
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] [kasan?] [mm?] INFO: rcu detected stall in __run_timer_base
[not found] <00000000000022a23c061604edb3@google.com>
2024-05-03 8:53 ` [PATCH] net/sched: taprio: fix CPU stuck due to the taprio hrtimer Yun Lu
2024-05-03 9:18 ` Yun Lu
@ 2024-05-03 19:58 ` syzbot
2024-07-31 11:57 ` [syzbot] [usb?] " syzbot
3 siblings, 0 replies; 8+ messages in thread
From: syzbot @ 2024-05-03 19:58 UTC (permalink / raw)
To: akpm, brauner, davem, dvyukov, elver, glider, hdanton, jhs,
kasan-dev, keescook, linux-fsdevel, linux-kernel, linux-mm, luyun,
netdev, pctammela, syzkaller-bugs, victor, viro, vladimir.oltean
syzbot has bisected this issue to:
commit da71714e359b64bd7aab3bd56ec53f307f058133
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Tue Aug 22 10:12:31 2023 +0000
net/sched: fix a qdisc modification with ambiguous command request
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13b9b317180000
start commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
git tree: upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=1079b317180000
console output: https://syzkaller.appspot.com/x/log.txt?x=17b9b317180000
kernel config: https://syzkaller.appspot.com/x/.config?x=fe78468a74fdc3b7
dashboard link: https://syzkaller.appspot.com/bug?extid=1acbadd9f48eeeacda29
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16435913180000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=111600cb180000
Reported-by: syzbot+1acbadd9f48eeeacda29@syzkaller.appspotmail.com
Fixes: da71714e359b ("net/sched: fix a qdisc modification with ambiguous command request")
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] net/sched: taprio: fix CPU stuck due to the taprio hrtimer
2024-05-03 9:18 ` Yun Lu
2024-05-03 12:03 ` [syzbot] [kasan?] [mm?] INFO: rcu detected stall in __run_timer_base syzbot
@ 2024-05-03 21:34 ` Jakub Kicinski
1 sibling, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2024-05-03 21:34 UTC (permalink / raw)
To: Yun Lu; +Cc: syzbot+1acbadd9f48eeeacda29, syzkaller-bugs, netdev
On Fri, 3 May 2024 17:18:44 +0800 Yun Lu wrote:
> #syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
Why are you CCing netdev@ on this? 🤔️
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] [kasan?] [mm?] INFO: rcu detected stall in __run_timer_base
[not found] <20240414025336.2016-1-hdanton@sina.com>
@ 2024-05-07 3:32 ` Yun Lu
0 siblings, 0 replies; 8+ messages in thread
From: Yun Lu @ 2024-05-07 3:32 UTC (permalink / raw)
To: hdanton
Cc: linux-kernel, syzbot+1acbadd9f48eeeacda29, syzkaller-bugs,
vinicius.gomes, jhs, netdev
Hello,
Indeed, the taprio hrtimer does cause CPU stuck in certain specific scenarios,
and this patch indirectly confirms it.
However, it seems this patch isn't the final solution?
On my testing machine, after starting Taprio hrtimer and then adjusting the
system time backward, it can be observed that the taprio hrtimer indeed gets stuck
in an infinite loop through tracing. And, the patch below can effectively resolve
this issue, but it doesn't work for syzbot's tests.
https://lore.kernel.org/all/20240506023617.1309937-1-luyun@kylinos.cn/
Are there any better suggestions from others regarding the resolution of this CPU stuck issue?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] [usb?] INFO: rcu detected stall in __run_timer_base
[not found] <00000000000022a23c061604edb3@google.com>
` (2 preceding siblings ...)
2024-05-03 19:58 ` [syzbot] [kasan?] [mm?] INFO: rcu detected stall in __run_timer_base syzbot
@ 2024-07-31 11:57 ` syzbot
2024-07-31 14:08 ` Alan Stern
3 siblings, 1 reply; 8+ messages in thread
From: syzbot @ 2024-07-31 11:57 UTC (permalink / raw)
To: akpm, brauner, davem, dvyukov, elver, glider, gregkh, hdanton,
jhs, kasan-dev, keescook, kuba, linux-fsdevel, linux-kernel,
linux-mm, linux-usb, luyun, netdev, pctammela, rafael, stern,
syzkaller-bugs, victor, vinicius.gomes, viro, vladimir.oltean
syzbot suspects this issue was fixed by commit:
commit 22f00812862564b314784167a89f27b444f82a46
Author: Alan Stern <stern@rowland.harvard.edu>
Date: Fri Jun 14 01:30:43 2024 +0000
USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=14f906bd980000
start commit: 89be4025b0db Merge tag '6.10-rc1-smb3-client-fixes' of git..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=b9016f104992d69c
dashboard link: https://syzkaller.appspot.com/bug?extid=1acbadd9f48eeeacda29
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=145ed3fc980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11c1541c980000
If the result looks correct, please mark the issue as fixed by replying with:
#syz fix: USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] [usb?] INFO: rcu detected stall in __run_timer_base
2024-07-31 11:57 ` [syzbot] [usb?] " syzbot
@ 2024-07-31 14:08 ` Alan Stern
0 siblings, 0 replies; 8+ messages in thread
From: Alan Stern @ 2024-07-31 14:08 UTC (permalink / raw)
To: syzbot
Cc: akpm, brauner, davem, dvyukov, elver, glider, gregkh, hdanton,
jhs, kasan-dev, keescook, kuba, linux-fsdevel, linux-kernel,
linux-mm, linux-usb, luyun, netdev, pctammela, rafael,
syzkaller-bugs, victor, vinicius.gomes, viro, vladimir.oltean
On Wed, Jul 31, 2024 at 04:57:02AM -0700, syzbot wrote:
> syzbot suspects this issue was fixed by commit:
>
> commit 22f00812862564b314784167a89f27b444f82a46
> Author: Alan Stern <stern@rowland.harvard.edu>
> Date: Fri Jun 14 01:30:43 2024 +0000
>
> USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=14f906bd980000
> start commit: 89be4025b0db Merge tag '6.10-rc1-smb3-client-fixes' of git..
> git tree: upstream
> kernel config: https://syzkaller.appspot.com/x/.config?x=b9016f104992d69c
> dashboard link: https://syzkaller.appspot.com/bug?extid=1acbadd9f48eeeacda29
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=145ed3fc980000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11c1541c980000
>
> If the result looks correct, please mark the issue as fixed by replying with:
>
#syz fix: USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-07-31 14:08 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <00000000000022a23c061604edb3@google.com>
2024-05-03 8:53 ` [PATCH] net/sched: taprio: fix CPU stuck due to the taprio hrtimer Yun Lu
2024-05-03 9:18 ` Yun Lu
2024-05-03 12:03 ` [syzbot] [kasan?] [mm?] INFO: rcu detected stall in __run_timer_base syzbot
2024-05-03 21:34 ` [PATCH] net/sched: taprio: fix CPU stuck due to the taprio hrtimer Jakub Kicinski
2024-05-03 19:58 ` [syzbot] [kasan?] [mm?] INFO: rcu detected stall in __run_timer_base syzbot
2024-07-31 11:57 ` [syzbot] [usb?] " syzbot
2024-07-31 14:08 ` Alan Stern
[not found] <20240414025336.2016-1-hdanton@sina.com>
2024-05-07 3:32 ` [syzbot] [kasan?] [mm?] " Yun Lu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).