* [PATCH net] net/sched: taprio: fix duration_to_length()
@ 2024-05-23 13:45 Eric Dumazet
2024-05-23 19:05 ` Simon Horman
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Eric Dumazet @ 2024-05-23 13:45 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet, syzbot, Vladimir Oltean, Vinicius Costa Gomes
duration_to_length() is incorrectly using div_u64()
instead of div64_u64().
syzbot reported:
Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 PID: 15391 Comm: syz-executor.0 Not tainted 6.9.0-syzkaller-08544-g4b377b4868ef #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
RIP: 0010:div_u64_rem include/linux/math64.h:29 [inline]
RIP: 0010:div_u64 include/linux/math64.h:130 [inline]
RIP: 0010:duration_to_length net/sched/sch_taprio.c:259 [inline]
RIP: 0010:taprio_update_queue_max_sdu+0x287/0x870 net/sched/sch_taprio.c:288
Code: be 08 00 00 00 e8 99 5b 6a f8 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 13 59 6a f8 48 8b 03 89 c1 48 89 e8 31 d2 <48> f7 f1 48 89 c5 48 83 7c 24 50 00 4c 8b 74 24 30 74 47 e8 c1 19
RSP: 0018:ffffc9000506eb38 EFLAGS: 00010246
RAX: 0000000000001f40 RBX: ffff88802f3562e0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88802f3562e0
RBP: 0000000000001f40 R08: ffff88802f3562e7 R09: 1ffff11005e6ac5c
R10: dffffc0000000000 R11: ffffed1005e6ac5d R12: 00000000ffffffff
R13: dffffc0000000000 R14: ffff88801ef59400 R15: 00000000003f0008
FS: 00007fee340bf6c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2c524000 CR3: 0000000024a52000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
taprio_change+0x2dce/0x42d0 net/sched/sch_taprio.c:1911
taprio_init+0x9da/0xc80 net/sched/sch_taprio.c:2112
qdisc_create+0x9d4/0x11a0 net/sched/sch_api.c:1355
tc_modify_qdisc+0xa26/0x1e40 net/sched/sch_api.c:1777
rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6595
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x8e1/0xcb0 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fee3327cee9
Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
net/sched/sch_taprio.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 1ab17e8a72605385280fad9b7f656a6771236acc..827fb81fc63a098304bad198fadd4aed55d1fec4 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -256,7 +256,8 @@ static int length_to_duration(struct taprio_sched *q, int len)
static int duration_to_length(struct taprio_sched *q, u64 duration)
{
- return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
+ return div64_u64(duration * PSEC_PER_NSEC,
+ atomic64_read(&q->picos_per_byte));
}
/* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
--
2.45.1.288.g0e0cd299f1-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net] net/sched: taprio: fix duration_to_length()
2024-05-23 13:45 [PATCH net] net/sched: taprio: fix duration_to_length() Eric Dumazet
@ 2024-05-23 19:05 ` Simon Horman
2024-05-23 23:08 ` Vinicius Costa Gomes
2024-05-24 15:39 ` Vladimir Oltean
2 siblings, 0 replies; 9+ messages in thread
From: Simon Horman @ 2024-05-23 19:05 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, netdev, eric.dumazet, syzbot,
Vladimir Oltean, Vinicius Costa Gomes
On Thu, May 23, 2024 at 01:45:49PM +0000, Eric Dumazet wrote:
> duration_to_length() is incorrectly using div_u64()
> instead of div64_u64().
>
> syzbot reported:
>
> Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
> CPU: 1 PID: 15391 Comm: syz-executor.0 Not tainted 6.9.0-syzkaller-08544-g4b377b4868ef #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
> RIP: 0010:div_u64_rem include/linux/math64.h:29 [inline]
> RIP: 0010:div_u64 include/linux/math64.h:130 [inline]
> RIP: 0010:duration_to_length net/sched/sch_taprio.c:259 [inline]
> RIP: 0010:taprio_update_queue_max_sdu+0x287/0x870 net/sched/sch_taprio.c:288
> Code: be 08 00 00 00 e8 99 5b 6a f8 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 13 59 6a f8 48 8b 03 89 c1 48 89 e8 31 d2 <48> f7 f1 48 89 c5 48 83 7c 24 50 00 4c 8b 74 24 30 74 47 e8 c1 19
> RSP: 0018:ffffc9000506eb38 EFLAGS: 00010246
> RAX: 0000000000001f40 RBX: ffff88802f3562e0 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88802f3562e0
> RBP: 0000000000001f40 R08: ffff88802f3562e7 R09: 1ffff11005e6ac5c
> R10: dffffc0000000000 R11: ffffed1005e6ac5d R12: 00000000ffffffff
> R13: dffffc0000000000 R14: ffff88801ef59400 R15: 00000000003f0008
> FS: 00007fee340bf6c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b2c524000 CR3: 0000000024a52000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> taprio_change+0x2dce/0x42d0 net/sched/sch_taprio.c:1911
> taprio_init+0x9da/0xc80 net/sched/sch_taprio.c:2112
> qdisc_create+0x9d4/0x11a0 net/sched/sch_api.c:1355
> tc_modify_qdisc+0xa26/0x1e40 net/sched/sch_api.c:1777
> rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6595
> netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
> netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
> netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
> netlink_sendmsg+0x8e1/0xcb0 net/netlink/af_netlink.c:1905
> sock_sendmsg_nosec net/socket.c:730 [inline]
> __sock_sendmsg+0x221/0x270 net/socket.c:745
> ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
> ___sys_sendmsg net/socket.c:2638 [inline]
> __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fee3327cee9
>
> Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations")
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net/sched: taprio: fix duration_to_length()
2024-05-23 13:45 [PATCH net] net/sched: taprio: fix duration_to_length() Eric Dumazet
2024-05-23 19:05 ` Simon Horman
@ 2024-05-23 23:08 ` Vinicius Costa Gomes
2024-05-24 15:39 ` Vladimir Oltean
2 siblings, 0 replies; 9+ messages in thread
From: Vinicius Costa Gomes @ 2024-05-23 23:08 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet, syzbot, Vladimir Oltean
Eric Dumazet <edumazet@google.com> writes:
> duration_to_length() is incorrectly using div_u64()
> instead of div64_u64().
>
> syzbot reported:
>
> Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
> CPU: 1 PID: 15391 Comm: syz-executor.0 Not tainted 6.9.0-syzkaller-08544-g4b377b4868ef #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
> RIP: 0010:div_u64_rem include/linux/math64.h:29 [inline]
> RIP: 0010:div_u64 include/linux/math64.h:130 [inline]
> RIP: 0010:duration_to_length net/sched/sch_taprio.c:259 [inline]
> RIP: 0010:taprio_update_queue_max_sdu+0x287/0x870 net/sched/sch_taprio.c:288
> Code: be 08 00 00 00 e8 99 5b 6a f8 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 13 59 6a f8 48 8b 03 89 c1 48 89 e8 31 d2 <48> f7 f1 48 89 c5 48 83 7c 24 50 00 4c 8b 74 24 30 74 47 e8 c1 19
> RSP: 0018:ffffc9000506eb38 EFLAGS: 00010246
> RAX: 0000000000001f40 RBX: ffff88802f3562e0 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88802f3562e0
> RBP: 0000000000001f40 R08: ffff88802f3562e7 R09: 1ffff11005e6ac5c
> R10: dffffc0000000000 R11: ffffed1005e6ac5d R12: 00000000ffffffff
> R13: dffffc0000000000 R14: ffff88801ef59400 R15: 00000000003f0008
> FS: 00007fee340bf6c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b2c524000 CR3: 0000000024a52000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> taprio_change+0x2dce/0x42d0 net/sched/sch_taprio.c:1911
> taprio_init+0x9da/0xc80 net/sched/sch_taprio.c:2112
> qdisc_create+0x9d4/0x11a0 net/sched/sch_api.c:1355
> tc_modify_qdisc+0xa26/0x1e40 net/sched/sch_api.c:1777
> rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6595
> netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
> netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
> netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
> netlink_sendmsg+0x8e1/0xcb0 net/netlink/af_netlink.c:1905
> sock_sendmsg_nosec net/socket.c:730 [inline]
> __sock_sendmsg+0x221/0x270 net/socket.c:745
> ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
> ___sys_sendmsg net/socket.c:2638 [inline]
> __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fee3327cee9
>
> Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations")
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
--
Vinicius
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net/sched: taprio: fix duration_to_length()
2024-05-23 13:45 [PATCH net] net/sched: taprio: fix duration_to_length() Eric Dumazet
2024-05-23 19:05 ` Simon Horman
2024-05-23 23:08 ` Vinicius Costa Gomes
@ 2024-05-24 15:39 ` Vladimir Oltean
2024-05-24 15:50 ` Eric Dumazet
2 siblings, 1 reply; 9+ messages in thread
From: Vladimir Oltean @ 2024-05-24 15:39 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, netdev, eric.dumazet, syzbot,
Vinicius Costa Gomes
On Thu, May 23, 2024 at 01:45:49PM +0000, Eric Dumazet wrote:
> duration_to_length() is incorrectly using div_u64()
> instead of div64_u64().
> ---
> net/sched/sch_taprio.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 1ab17e8a72605385280fad9b7f656a6771236acc..827fb81fc63a098304bad198fadd4aed55d1fec4 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -256,7 +256,8 @@ static int length_to_duration(struct taprio_sched *q, int len)
>
> static int duration_to_length(struct taprio_sched *q, u64 duration)
> {
> - return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
> + return div64_u64(duration * PSEC_PER_NSEC,
> + atomic64_read(&q->picos_per_byte));
> }
There's a netdev_dbg() in taprio_set_picos_per_byte(). Could you turn
that on? I'm curious what was the q->picos_per_byte value that triggered
the 64-bit division fault. There are a few weird things about
q->picos_per_byte's representation and use as an atomic64_t (s64) type.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net/sched: taprio: fix duration_to_length()
2024-05-24 15:39 ` Vladimir Oltean
@ 2024-05-24 15:50 ` Eric Dumazet
2024-05-24 15:52 ` Eric Dumazet
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2024-05-24 15:50 UTC (permalink / raw)
To: Vladimir Oltean
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, netdev, eric.dumazet, syzbot,
Vinicius Costa Gomes
On Fri, May 24, 2024 at 5:39 PM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> On Thu, May 23, 2024 at 01:45:49PM +0000, Eric Dumazet wrote:
> > duration_to_length() is incorrectly using div_u64()
> > instead of div64_u64().
> > ---
> > net/sched/sch_taprio.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> > index 1ab17e8a72605385280fad9b7f656a6771236acc..827fb81fc63a098304bad198fadd4aed55d1fec4 100644
> > --- a/net/sched/sch_taprio.c
> > +++ b/net/sched/sch_taprio.c
> > @@ -256,7 +256,8 @@ static int length_to_duration(struct taprio_sched *q, int len)
> >
> > static int duration_to_length(struct taprio_sched *q, u64 duration)
> > {
> > - return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
> > + return div64_u64(duration * PSEC_PER_NSEC,
> > + atomic64_read(&q->picos_per_byte));
> > }
>
> There's a netdev_dbg() in taprio_set_picos_per_byte(). Could you turn
> that on? I'm curious what was the q->picos_per_byte value that triggered
> the 64-bit division fault. There are a few weird things about
> q->picos_per_byte's representation and use as an atomic64_t (s64) type.
No repro yet.
Anything with 32 low order bits cleared would trigger a divide by 0.
(1ULL << 32) picoseconds is only 4.294 ms
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net/sched: taprio: fix duration_to_length()
2024-05-24 15:50 ` Eric Dumazet
@ 2024-05-24 15:52 ` Eric Dumazet
2024-05-24 16:07 ` Vladimir Oltean
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2024-05-24 15:52 UTC (permalink / raw)
To: Vladimir Oltean
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, netdev, eric.dumazet, syzbot,
Vinicius Costa Gomes
On Fri, May 24, 2024 at 5:50 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, May 24, 2024 at 5:39 PM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> >
> > On Thu, May 23, 2024 at 01:45:49PM +0000, Eric Dumazet wrote:
> > > duration_to_length() is incorrectly using div_u64()
> > > instead of div64_u64().
> > > ---
> > > net/sched/sch_taprio.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> > > index 1ab17e8a72605385280fad9b7f656a6771236acc..827fb81fc63a098304bad198fadd4aed55d1fec4 100644
> > > --- a/net/sched/sch_taprio.c
> > > +++ b/net/sched/sch_taprio.c
> > > @@ -256,7 +256,8 @@ static int length_to_duration(struct taprio_sched *q, int len)
> > >
> > > static int duration_to_length(struct taprio_sched *q, u64 duration)
> > > {
> > > - return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
> > > + return div64_u64(duration * PSEC_PER_NSEC,
> > > + atomic64_read(&q->picos_per_byte));
> > > }
> >
> > There's a netdev_dbg() in taprio_set_picos_per_byte(). Could you turn
> > that on? I'm curious what was the q->picos_per_byte value that triggered
> > the 64-bit division fault. There are a few weird things about
> > q->picos_per_byte's representation and use as an atomic64_t (s64) type.
>
>
> No repro yet.
>
> Anything with 32 low order bits cleared would trigger a divide by 0.
>
> (1ULL << 32) picoseconds is only 4.294 ms
BTW, just a reminder, div_u64() is a divide by a 32bit value...
static inline u64 div_u64(u64 dividend, u32 divisor)
...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net/sched: taprio: fix duration_to_length()
2024-05-24 15:52 ` Eric Dumazet
@ 2024-05-24 16:07 ` Vladimir Oltean
2024-05-27 8:07 ` Eric Dumazet
0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Oltean @ 2024-05-24 16:07 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, netdev, eric.dumazet, syzbot,
Vinicius Costa Gomes
On Fri, May 24, 2024 at 05:52:17PM +0200, Eric Dumazet wrote:
> On Fri, May 24, 2024 at 5:50 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Fri, May 24, 2024 at 5:39 PM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> > >
> > > On Thu, May 23, 2024 at 01:45:49PM +0000, Eric Dumazet wrote:
> > > > duration_to_length() is incorrectly using div_u64()
> > > > instead of div64_u64().
> > > > ---
> > > > net/sched/sch_taprio.c | 3 ++-
> > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> > > > index 1ab17e8a72605385280fad9b7f656a6771236acc..827fb81fc63a098304bad198fadd4aed55d1fec4 100644
> > > > --- a/net/sched/sch_taprio.c
> > > > +++ b/net/sched/sch_taprio.c
> > > > @@ -256,7 +256,8 @@ static int length_to_duration(struct taprio_sched *q, int len)
> > > >
> > > > static int duration_to_length(struct taprio_sched *q, u64 duration)
> > > > {
> > > > - return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
> > > > + return div64_u64(duration * PSEC_PER_NSEC,
> > > > + atomic64_read(&q->picos_per_byte));
> > > > }
> > >
> > > There's a netdev_dbg() in taprio_set_picos_per_byte(). Could you turn
> > > that on? I'm curious what was the q->picos_per_byte value that triggered
> > > the 64-bit division fault. There are a few weird things about
> > > q->picos_per_byte's representation and use as an atomic64_t (s64) type.
> >
> >
> > No repro yet.
> >
> > Anything with 32 low order bits cleared would trigger a divide by 0.
> >
> > (1ULL << 32) picoseconds is only 4.294 ms
>
> BTW, just a reminder, div_u64() is a divide by a 32bit value...
>
> static inline u64 div_u64(u64 dividend, u32 divisor)
> ...
The thing is that I don't see how q->picos_per_byte could take any sane
value of either 0 or a multiple of 2^32. Its formula is "(USEC_PER_SEC * 8) / speed"
where "speed" is the link speed: 10, 100, 1000 etc. The special cases
of speed=0 and speed=SPEED_UNKNOWN are handled by falling back to SPEED_10
in the picos_per_byte calculation.
For q->picos_per_byte to be larger than 2^32, "speed" would have to be
smaller than 8000000 / U32_MAX (0.001862645).
For q->picos_per_byte to be exactly 0, "speed" would have to be larger
than 8000000. But the largest defined speed in include/uapi/linux/ethtool.h
is precisely SPEED_800000, leading to an expected q->picos_per_byte of 1.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net/sched: taprio: fix duration_to_length()
2024-05-24 16:07 ` Vladimir Oltean
@ 2024-05-27 8:07 ` Eric Dumazet
2024-05-27 11:43 ` Vladimir Oltean
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2024-05-27 8:07 UTC (permalink / raw)
To: Vladimir Oltean
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, netdev, eric.dumazet, syzbot,
Vinicius Costa Gomes
On Fri, May 24, 2024 at 6:07 PM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> On Fri, May 24, 2024 at 05:52:17PM +0200, Eric Dumazet wrote:
> > On Fri, May 24, 2024 at 5:50 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Fri, May 24, 2024 at 5:39 PM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> > > >
> > > > On Thu, May 23, 2024 at 01:45:49PM +0000, Eric Dumazet wrote:
> > > > > duration_to_length() is incorrectly using div_u64()
> > > > > instead of div64_u64().
> > > > > ---
> > > > > net/sched/sch_taprio.c | 3 ++-
> > > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> > > > > index 1ab17e8a72605385280fad9b7f656a6771236acc..827fb81fc63a098304bad198fadd4aed55d1fec4 100644
> > > > > --- a/net/sched/sch_taprio.c
> > > > > +++ b/net/sched/sch_taprio.c
> > > > > @@ -256,7 +256,8 @@ static int length_to_duration(struct taprio_sched *q, int len)
> > > > >
> > > > > static int duration_to_length(struct taprio_sched *q, u64 duration)
> > > > > {
> > > > > - return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
> > > > > + return div64_u64(duration * PSEC_PER_NSEC,
> > > > > + atomic64_read(&q->picos_per_byte));
> > > > > }
> > > >
> > > > There's a netdev_dbg() in taprio_set_picos_per_byte(). Could you turn
> > > > that on? I'm curious what was the q->picos_per_byte value that triggered
> > > > the 64-bit division fault. There are a few weird things about
> > > > q->picos_per_byte's representation and use as an atomic64_t (s64) type.
> > >
> > >
> > > No repro yet.
> > >
> > > Anything with 32 low order bits cleared would trigger a divide by 0.
> > >
> > > (1ULL << 32) picoseconds is only 4.294 ms
> >
> > BTW, just a reminder, div_u64() is a divide by a 32bit value...
> >
> > static inline u64 div_u64(u64 dividend, u32 divisor)
> > ...
>
> The thing is that I don't see how q->picos_per_byte could take any sane
> value of either 0 or a multiple of 2^32. Its formula is "(USEC_PER_SEC * 8) / speed"
> where "speed" is the link speed: 10, 100, 1000 etc. The special cases
> of speed=0 and speed=SPEED_UNKNOWN are handled by falling back to SPEED_10
> in the picos_per_byte calculation.
>
> For q->picos_per_byte to be larger than 2^32, "speed" would have to be
> smaller than 8000000 / U32_MAX (0.001862645).
>
> For q->picos_per_byte to be exactly 0, "speed" would have to be larger
> than 8000000. But the largest defined speed in include/uapi/linux/ethtool.h
> is precisely SPEED_800000, leading to an expected q->picos_per_byte of 1.
This suggests q->picos_per_byte should be a mere u32, and that
taprio_set_picos_per_byte()
should make sure to not set 0 in q->picos_per_byte
Presumably some devices must get a speed bigger than SPEED_800000
team driver could do that, according to team_ethtool_get_link_ksettings()
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 1ab17e8a72605385280fad9b7f656a6771236acc..71087a53630362863cc6c5e462b29dbef8cd5d74
100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -89,9 +89,9 @@ struct taprio_sched {
bool offloaded;
bool detected_mqprio;
bool broken_mqprio;
- atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
- * speeds it's sub-nanoseconds per byte
- */
+ atomic_t picos_per_byte; /* Using picoseconds because for 10Gbps+
+ * speeds it's sub-nanoseconds per byte
+ */
/* Protects the update side of the RCU protected current_entry */
spinlock_t current_entry_lock;
@@ -251,12 +251,12 @@ static ktime_t get_interval_end_time(struct
sched_gate_list *sched,
static int length_to_duration(struct taprio_sched *q, int len)
{
- return div_u64(len * atomic64_read(&q->picos_per_byte), PSEC_PER_NSEC);
+ return div_u64((u64)len * atomic_read(&q->picos_per_byte),
PSEC_PER_NSEC);
}
static int duration_to_length(struct taprio_sched *q, u64 duration)
{
- return div_u64(duration * PSEC_PER_NSEC,
atomic64_read(&q->picos_per_byte));
+ return div_u64(duration * PSEC_PER_NSEC,
atomic_read(&q->picos_per_byte));
}
/* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
@@ -666,8 +666,8 @@ static void taprio_set_budgets(struct taprio_sched *q,
if (entry->gate_duration[tc] == sched->cycle_time)
budget = INT_MAX;
else
- budget =
div64_u64((u64)entry->gate_duration[tc] * PSEC_PER_NSEC,
- atomic64_read(&q->picos_per_byte));
+ budget = div_u64((u64)entry->gate_duration[tc]
* PSEC_PER_NSEC,
+ atomic_read(&q->picos_per_byte));
atomic_set(&entry->budget[tc], budget);
}
@@ -1291,7 +1291,7 @@ static void taprio_set_picos_per_byte(struct
net_device *dev,
{
struct ethtool_link_ksettings ecmd;
int speed = SPEED_10;
- int picos_per_byte;
+ u32 picos_per_byte;
int err;
err = __ethtool_get_link_ksettings(dev, &ecmd);
@@ -1303,11 +1303,11 @@ static void taprio_set_picos_per_byte(struct
net_device *dev,
skip:
picos_per_byte = (USEC_PER_SEC * 8) / speed;
-
- atomic64_set(&q->picos_per_byte, picos_per_byte);
- netdev_dbg(dev, "taprio: set %s's picos_per_byte to: %lld,
linkspeed: %d\n",
- dev->name, (long long)atomic64_read(&q->picos_per_byte),
- ecmd.base.speed);
+ if (!picos_per_byte)
+ picos_per_byte = 1U;
+ atomic_set(&q->picos_per_byte, picos_per_byte);
+ netdev_dbg(dev, "taprio: set %s's picos_per_byte to: %u,
linkspeed: %d\n",
+ dev->name, picos_per_byte, ecmd.base.speed);
}
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net/sched: taprio: fix duration_to_length()
2024-05-27 8:07 ` Eric Dumazet
@ 2024-05-27 11:43 ` Vladimir Oltean
0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2024-05-27 11:43 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, netdev, eric.dumazet, syzbot,
Vinicius Costa Gomes
On Mon, May 27, 2024 at 10:07:31AM +0200, Eric Dumazet wrote:
> On Fri, May 24, 2024 at 6:07 PM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> >
> > On Fri, May 24, 2024 at 05:52:17PM +0200, Eric Dumazet wrote:
> > > On Fri, May 24, 2024 at 5:50 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Fri, May 24, 2024 at 5:39 PM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> > > > >
> > > > > On Thu, May 23, 2024 at 01:45:49PM +0000, Eric Dumazet wrote:
> > > > > > duration_to_length() is incorrectly using div_u64()
> > > > > > instead of div64_u64().
> > > > > > ---
> > > > > > net/sched/sch_taprio.c | 3 ++-
> > > > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> > > > > > index 1ab17e8a72605385280fad9b7f656a6771236acc..827fb81fc63a098304bad198fadd4aed55d1fec4 100644
> > > > > > --- a/net/sched/sch_taprio.c
> > > > > > +++ b/net/sched/sch_taprio.c
> > > > > > @@ -256,7 +256,8 @@ static int length_to_duration(struct taprio_sched *q, int len)
> > > > > >
> > > > > > static int duration_to_length(struct taprio_sched *q, u64 duration)
> > > > > > {
> > > > > > - return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
> > > > > > + return div64_u64(duration * PSEC_PER_NSEC,
> > > > > > + atomic64_read(&q->picos_per_byte));
> > > > > > }
> > > > >
> > > > > There's a netdev_dbg() in taprio_set_picos_per_byte(). Could you turn
> > > > > that on? I'm curious what was the q->picos_per_byte value that triggered
> > > > > the 64-bit division fault. There are a few weird things about
> > > > > q->picos_per_byte's representation and use as an atomic64_t (s64) type.
> > > >
> > > >
> > > > No repro yet.
> > > >
> > > > Anything with 32 low order bits cleared would trigger a divide by 0.
> > > >
> > > > (1ULL << 32) picoseconds is only 4.294 ms
> > >
> > > BTW, just a reminder, div_u64() is a divide by a 32bit value...
> > >
> > > static inline u64 div_u64(u64 dividend, u32 divisor)
> > > ...
> >
> > The thing is that I don't see how q->picos_per_byte could take any sane
> > value of either 0 or a multiple of 2^32. Its formula is "(USEC_PER_SEC * 8) / speed"
> > where "speed" is the link speed: 10, 100, 1000 etc. The special cases
> > of speed=0 and speed=SPEED_UNKNOWN are handled by falling back to SPEED_10
> > in the picos_per_byte calculation.
> >
> > For q->picos_per_byte to be larger than 2^32, "speed" would have to be
> > smaller than 8000000 / U32_MAX (0.001862645).
> >
> > For q->picos_per_byte to be exactly 0, "speed" would have to be larger
> > than 8000000. But the largest defined speed in include/uapi/linux/ethtool.h
> > is precisely SPEED_800000, leading to an expected q->picos_per_byte of 1.
>
> This suggests q->picos_per_byte should be a mere u32, and that
> taprio_set_picos_per_byte()
> should make sure to not set 0 in q->picos_per_byte
This is what I was hinting at, indeed. But we're getting farther away
from the problem, which is the fact that syzbot _was_ able to trigger a
division by zero somehow, when zero was not a valid value that I can see.
> Presumably some devices must get a speed bigger than SPEED_800000
>
> team driver could do that, according to team_ethtool_get_link_ksettings()
I misspoke in the earlier email. SPEED_800000 is still 1 order of
magnitude lower than the maximum representable speed (picos_per_byte
should be 10 for it, not 1). So, we should still be good.
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 1ab17e8a72605385280fad9b7f656a6771236acc..71087a53630362863cc6c5e462b29dbef8cd5d74
> 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -89,9 +89,9 @@ struct taprio_sched {
> bool offloaded;
> bool detected_mqprio;
> bool broken_mqprio;
> - atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
> - * speeds it's sub-nanoseconds per byte
> - */
> + atomic_t picos_per_byte; /* Using picoseconds because for 10Gbps+
> + * speeds it's sub-nanoseconds per byte
> + */
>
> /* Protects the update side of the RCU protected current_entry */
> spinlock_t current_entry_lock;
> @@ -251,12 +251,12 @@ static ktime_t get_interval_end_time(struct
> sched_gate_list *sched,
>
> static int length_to_duration(struct taprio_sched *q, int len)
> {
> - return div_u64(len * atomic64_read(&q->picos_per_byte), PSEC_PER_NSEC);
> + return div_u64((u64)len * atomic_read(&q->picos_per_byte),
> PSEC_PER_NSEC);
> }
>
> static int duration_to_length(struct taprio_sched *q, u64 duration)
> {
> - return div_u64(duration * PSEC_PER_NSEC,
> atomic64_read(&q->picos_per_byte));
> + return div_u64(duration * PSEC_PER_NSEC,
> atomic_read(&q->picos_per_byte));
> }
>
> /* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
> @@ -666,8 +666,8 @@ static void taprio_set_budgets(struct taprio_sched *q,
> if (entry->gate_duration[tc] == sched->cycle_time)
> budget = INT_MAX;
> else
> - budget =
> div64_u64((u64)entry->gate_duration[tc] * PSEC_PER_NSEC,
> - atomic64_read(&q->picos_per_byte));
> + budget = div_u64((u64)entry->gate_duration[tc]
> * PSEC_PER_NSEC,
> + atomic_read(&q->picos_per_byte));
>
> atomic_set(&entry->budget[tc], budget);
> }
> @@ -1291,7 +1291,7 @@ static void taprio_set_picos_per_byte(struct
> net_device *dev,
> {
> struct ethtool_link_ksettings ecmd;
> int speed = SPEED_10;
> - int picos_per_byte;
> + u32 picos_per_byte;
> int err;
>
> err = __ethtool_get_link_ksettings(dev, &ecmd);
> @@ -1303,11 +1303,11 @@ static void taprio_set_picos_per_byte(struct
> net_device *dev,
>
> skip:
> picos_per_byte = (USEC_PER_SEC * 8) / speed;
> -
> - atomic64_set(&q->picos_per_byte, picos_per_byte);
> - netdev_dbg(dev, "taprio: set %s's picos_per_byte to: %lld,
> linkspeed: %d\n",
> - dev->name, (long long)atomic64_read(&q->picos_per_byte),
> - ecmd.base.speed);
> + if (!picos_per_byte)
> + picos_per_byte = 1U;
> + atomic_set(&q->picos_per_byte, picos_per_byte);
> + netdev_dbg(dev, "taprio: set %s's picos_per_byte to: %u,
> linkspeed: %d\n",
> + dev->name, picos_per_byte, ecmd.base.speed);
> }
I would be cautious about making this change not having certainty what
was the picos_per_byte value (and associated speed) that triggered the fault.
I'm hoping we're not masking some larger issue about how the speed is
retrieved or processed.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-05-27 11:43 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-23 13:45 [PATCH net] net/sched: taprio: fix duration_to_length() Eric Dumazet
2024-05-23 19:05 ` Simon Horman
2024-05-23 23:08 ` Vinicius Costa Gomes
2024-05-24 15:39 ` Vladimir Oltean
2024-05-24 15:50 ` Eric Dumazet
2024-05-24 15:52 ` Eric Dumazet
2024-05-24 16:07 ` Vladimir Oltean
2024-05-27 8:07 ` Eric Dumazet
2024-05-27 11:43 ` Vladimir Oltean
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).