public inbox for linux-kernel@vger.kernel.org
* divide error in x86 and cputime
@ 2025-07-07  8:14 Li,Rongqing
  2025-07-07 15:11 ` Steven Rostedt
  2025-07-07 22:09 ` Oleg Nesterov
  0 siblings, 2 replies; 28+ messages in thread
From: Li,Rongqing @ 2025-07-07  8:14 UTC (permalink / raw)
  To: oleg@redhat.com
  Cc: linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, rostedt@goodmis.org,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, peterz@infradead.org, mingo@redhat.com

Hi:

I see a divide error on an x86 machine; the stack trace is below:


[78250815.703847] divide error: 0000 [#1] PREEMPT SMP NOPTI
[78250815.703852] CPU: 127 PID: 83435 Comm: killall Kdump: loaded Tainted: P           OE K   5.10.0 #1
[78250815.703853] Hardware name: Inspur SSINSPURMBX-XA3-100D-B356/NF5280A6, BIOS 3.00.21 06/27/2022
[78250815.703859] RIP: 0010:cputime_adjust+0x55/0xb0
[78250815.703860] Code: 3b 4c 8b 4d 10 48 89 c6 49 8d 04 38 4c 39 c8 73 38 48 8b 45 00 48 8b 55 08 48 85 c0 74 16 48 85 d2 74 49 48 8d 0c 10 49 f7 e1 <48> f7 f1 49 39 c0 4c 0f 42 c0 4c 89 c8 4c 29 c0 48 39 c7 77 25 48
[78250815.703861] RSP: 0018:ffffa34c2517bc40 EFLAGS: 00010887
[78250815.703864] RAX: 69f98da9ba980c00 RBX: ffff976c93d2a5e0 RCX: 0000000709e00900
[78250815.703864] RDX: 00f5dfffab0fc352 RSI: 0000000000000082 RDI: ff07410dca0bcd5e
[78250815.703865] RBP: ffffa34c2517bc70 R08: 00f5dfff54f8e5ce R09: fffd213aabd74626
[78250815.703866] R10: ffffa34c2517bed8 R11: 0000000000000000 R12: ffff976c93d2a5f0
[78250815.703867] R13: ffffa34c2517bd78 R14: ffffa34c2517bd70 R15: 0000000000001000
[78250815.703868] FS:  00007f58060f97a0(0000) GS:ffff976afe9c0000(0000) knlGS:0000000000000000
[78250815.703869] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[78250815.703870] CR2: 00007f580610e000 CR3: 0000017e3b3d2004 CR4: 0000000000770ee0
[78250815.703870] PKRU: 55555554
[78250815.703871] Call Trace:
[78250815.703877]  thread_group_cputime_adjusted+0x4a/0x70
[78250815.703881]  do_task_stat+0x2ed/0xe00
[78250815.703885]  ? khugepaged_enter_vma_merge+0x12/0xd0
[78250815.703888]  proc_single_show+0x51/0xc0
[78250815.703892]  seq_read_iter+0x185/0x3c0
[78250815.703895]  seq_read+0x106/0x150
[78250815.703898]  vfs_read+0x98/0x180
[78250815.703900]  ksys_read+0x59/0xd0
[78250815.703904]  do_syscall_64+0x33/0x40
[78250815.703907]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[78250815.703910] RIP: 0033:0x318aeda360


It is caused by a process with many threads running for a very long time; utime+stime overflowed 64 bits, which then caused the division below:

mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626, 0x09e00900);

I see the comment of mul_u64_u64_div_u64() says:

Will generate an #DE when the result doesn't fit u64, could fix with an
__ex_table[] entry when it becomes an issue


It seems the __ex_table[] entry for div does not work?

Thanks

-Li




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: divide error in x86 and cputime
  2025-07-07  8:14 divide error in x86 and cputime Li,Rongqing
@ 2025-07-07 15:11 ` Steven Rostedt
  2025-07-07 22:09 ` Oleg Nesterov
  1 sibling, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-07-07 15:11 UTC (permalink / raw)
  To: Li,Rongqing
  Cc: oleg@redhat.com, linux-kernel@vger.kernel.org,
	vschneid@redhat.com, mgorman@suse.de, bsegall@google.com,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, peterz@infradead.org, mingo@redhat.com

On Mon, 7 Jul 2025 08:14:41 +0000
"Li,Rongqing" <lirongqing@baidu.com> wrote:

> Hi:
> 
> I see a divide error on x86 machine, the stack is below:
> 
> 
> [78250815.703847] divide error: 0000 [#1] PREEMPT SMP NOPTI
> [78250815.703852] CPU: 127 PID: 83435 Comm: killall Kdump: loaded Tainted: P           OE K   5.10.0 #1

Did you see this on a 5.10 kernel?

Do you see it on something more recent? Preferably 6.15 or 6.16.

-- Steve

> [78250815.703853] Hardware name: Inspur SSINSPURMBX-XA3-100D-B356/NF5280A6, BIOS 3.00.21 06/27/2022
> [78250815.703859] RIP: 0010:cputime_adjust+0x55/0xb0
> [78250815.703860] Code: 3b 4c 8b 4d 10 48 89 c6 49 8d 04 38 4c 39 c8 73 38 48 8b 45 00 48 8b 55 08 48 85 c0 74 16 48 85 d2 74 49 48 8d 0c 10 49 f7 e1 <48> f7 f1 49 39 c0 4c 0f 42 c0 4c 89 c8 4c 29 c0 48 39 c7 77 25 48
> [78250815.703861] RSP: 0018:ffffa34c2517bc40 EFLAGS: 00010887
> [78250815.703864] RAX: 69f98da9ba980c00 RBX: ffff976c93d2a5e0 RCX: 0000000709e00900
> [78250815.703864] RDX: 00f5dfffab0fc352 RSI: 0000000000000082 RDI: ff07410dca0bcd5e
> [78250815.703865] RBP: ffffa34c2517bc70 R08: 00f5dfff54f8e5ce R09: fffd213aabd74626
> [78250815.703866] R10: ffffa34c2517bed8 R11: 0000000000000000 R12: ffff976c93d2a5f0
> [78250815.703867] R13: ffffa34c2517bd78 R14: ffffa34c2517bd70 R15: 0000000000001000
> [78250815.703868] FS:  00007f58060f97a0(0000) GS:ffff976afe9c0000(0000) knlGS:0000000000000000
> [78250815.703869] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [78250815.703870] CR2: 00007f580610e000 CR3: 0000017e3b3d2004 CR4: 0000000000770ee0
> [78250815.703870] PKRU: 55555554
> [78250815.703871] Call Trace:
> [78250815.703877]  thread_group_cputime_adjusted+0x4a/0x70
> [78250815.703881]  do_task_stat+0x2ed/0xe00
> [78250815.703885]  ? khugepaged_enter_vma_merge+0x12/0xd0
> [78250815.703888]  proc_single_show+0x51/0xc0
> [78250815.703892]  seq_read_iter+0x185/0x3c0
> [78250815.703895]  seq_read+0x106/0x150
> [78250815.703898]  vfs_read+0x98/0x180
> [78250815.703900]  ksys_read+0x59/0xd0
> [78250815.703904]  do_syscall_64+0x33/0x40
> [78250815.703907]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [78250815.703910] RIP: 0033:0x318aeda360
> 
> 
> It caused by a process with many threads running very long, and utime+stime overflowed 64bit, then cause the below div
> 
> mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626, 0x09e00900);
> 
> I see the comments of mul_u64_u64_div_u64() say:
> 
> Will generate an #DE when the result doesn't fit u64, could fix with an
> __ex_table[] entry when it becomes an issu
> 
> 
> Seem __ex_table[] entry for div does not work ?
> 
> Thanks
> 
> -Li
> 
> 



* Re: divide error in x86 and cputime
  2025-07-07  8:14 divide error in x86 and cputime Li,Rongqing
  2025-07-07 15:11 ` Steven Rostedt
@ 2025-07-07 22:09 ` Oleg Nesterov
  2025-07-07 22:20   ` Steven Rostedt
  2025-07-07 22:30   ` Oleg Nesterov
  1 sibling, 2 replies; 28+ messages in thread
From: Oleg Nesterov @ 2025-07-07 22:09 UTC (permalink / raw)
  To: Li,Rongqing, Peter Zijlstra, David Laight
  Cc: linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, rostedt@goodmis.org,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, mingo@redhat.com

On 07/07, Li,Rongqing wrote:
>
> [78250815.703847] divide error: 0000 [#1] PREEMPT SMP NOPTI

...

> It caused by a process with many threads running very long,
> and utime+stime overflowed 64bit, then cause the below div
>
> mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626, 0x09e00900);
>
> I see the comments of mul_u64_u64_div_u64() say:
>
> Will generate an #DE when the result doesn't fit u64, could fix with an
> __ex_table[] entry when it becomes an issu
>
> Seem __ex_table[] entry for div does not work ?

Well, the current version doesn't have an __ex_table[] entry for div...

I do not know what can/should we do in this case... Perhaps

	static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
	{
		int ok = 0;
		u64 q;

		asm ("mulq %3; 1: divq %4; movl $1,%1; 2:\n"
			_ASM_EXTABLE(1b, 2b)
			: "=a" (q), "+r" (ok)
			: "a" (a), "rm" (mul), "rm" (div)
			: "rdx");

		return ok ? q : -1ul;
	}

?

Should return ULLONG_MAX on #DE.

Oleg.



* Re: divide error in x86 and cputime
  2025-07-07 22:09 ` Oleg Nesterov
@ 2025-07-07 22:20   ` Steven Rostedt
  2025-07-07 22:33     ` Steven Rostedt
  2025-07-07 22:30   ` Oleg Nesterov
  1 sibling, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2025-07-07 22:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Li,Rongqing, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On Tue, 8 Jul 2025 00:09:38 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> Well, the current version doesn't have an __ex_table[] entry for div...
> 
> I do not know what can/should we do in this case... Perhaps
> 
> 	static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> 	{
> 		int ok = 0;
> 		u64 q;
> 
> 		asm ("mulq %3; 1: divq %4; movl $1,%1; 2:\n"
> 			_ASM_EXTABLE(1b, 2b)
> 			: "=a" (q), "+r" (ok)
> 			: "a" (a), "rm" (mul), "rm" (div)
> 			: "rdx");
> 
> 		return ok ? q : -1ul;
> 	}
> 
> ?
> 
> Should return ULLONG_MAX on #DE.

I would say this should never happen and if it does, let the kernel crash.

-- Steve


* Re: divide error in x86 and cputime
  2025-07-07 22:09 ` Oleg Nesterov
  2025-07-07 22:20   ` Steven Rostedt
@ 2025-07-07 22:30   ` Oleg Nesterov
  2025-07-07 23:41     ` Re: " Li,Rongqing
  1 sibling, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-07-07 22:30 UTC (permalink / raw)
  To: Li,Rongqing, Peter Zijlstra, David Laight
  Cc: linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, rostedt@goodmis.org,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, mingo@redhat.com

On a second thought, this

    mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626, 0x09e00900);
                        stime               rtime               stime + utime	

looks suspicious:

	- stime > stime + utime

	- rtime = 0xfffd213aabd74626 is absurdly huge

so perhaps there is another problem?

Oleg.

On 07/08, Oleg Nesterov wrote:
>
> On 07/07, Li,Rongqing wrote:
> >
> > [78250815.703847] divide error: 0000 [#1] PREEMPT SMP NOPTI
>
> ...
>
> > It caused by a process with many threads running very long,
> > and utime+stime overflowed 64bit, then cause the below div
> >
> > mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626, 0x09e00900);
> >
> > I see the comments of mul_u64_u64_div_u64() say:
> >
> > Will generate an #DE when the result doesn't fit u64, could fix with an
> > __ex_table[] entry when it becomes an issu
> >
> > Seem __ex_table[] entry for div does not work ?
>
> Well, the current version doesn't have an __ex_table[] entry for div...
>
> I do not know what can/should we do in this case... Perhaps
>
> 	static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> 	{
> 		int ok = 0;
> 		u64 q;
>
> 		asm ("mulq %3; 1: divq %4; movl $1,%1; 2:\n"
> 			_ASM_EXTABLE(1b, 2b)
> 			: "=a" (q), "+r" (ok)
> 			: "a" (a), "rm" (mul), "rm" (div)
> 			: "rdx");
>
> 		return ok ? q : -1ul;
> 	}
>
> ?
>
> Should return ULLONG_MAX on #DE.
>
> Oleg.



* Re: divide error in x86 and cputime
  2025-07-07 22:20   ` Steven Rostedt
@ 2025-07-07 22:33     ` Steven Rostedt
  2025-07-07 23:00       ` Oleg Nesterov
  2025-07-08  1:40       ` Re: " Li,Rongqing
  0 siblings, 2 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-07-07 22:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Li,Rongqing, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On Mon, 7 Jul 2025 18:20:56 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> I would say this should never happen and if it does, let the kernel crash.

>> [78250815.703852] CPU: 127 PID: 83435 Comm: killall Kdump: loaded Tainted: P           OE K   5.10.0 #1

This happened on a 5.10 kernel with a proprietary module loaded, so
honestly, if it can't be reproduced on a newer kernel without any
proprietary modules loaded, I say we don't worry about it.

I also don't buy the utime + stime overflowing a 64-bit number.

  2^64 / 2 = 2^63 = 9223372036854775808

That would be:

                                   minutes    days
                                      v        v
  9223372036854775808 / 1000000000 / 60 / 60 / 24 / 365.25 = 292.27
                           ^               ^         ^
                        ns -> sec       hours       years

So even though the report says they have threads running for a very long
time, that would still be 292 years of run time!

-- Steve


* Re: divide error in x86 and cputime
  2025-07-07 22:33     ` Steven Rostedt
@ 2025-07-07 23:00       ` Oleg Nesterov
  2025-07-08 11:00         ` David Laight
  2025-07-08  1:40       ` Re: " Li,Rongqing
  1 sibling, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-07-07 23:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Li,Rongqing, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On 07/07, Steven Rostedt wrote:
>
> On Mon, 7 Jul 2025 18:20:56 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > I would say this should never happen and if it does, let the kernel crash.
>
> >> [78250815.703852] CPU: 127 PID: 83435 Comm: killall Kdump: loaded Tainted: P           OE K   5.10.0 #1
>
> This happened on a 5.10 kernel with a proprietary module loaded, so
> honestly, if it can't be reproduced on a newer kernel without any
> proprietary modules loaded, I say we don't worry about it.

Yes, agreed, see my reply to myself.

Oleg.

> I also don't by the utime + stime overflowing a 64bit number.
>
>   2^64 / 2 = 2^63 = 9223372036854775808
>
> That would be:
>
>                                    minutes    days
>                                       v        v
>   9223372036854775808 / 1000000000 / 60 / 60 / 24 / 365.25 = 292.27
>                            ^               ^         ^
>                         ns -> sec       hours       years
>
> So the report says they have threads running for a very long time, it would
> still be 292 years of run time!
>
> -- Steve
>



* Re: divide error in x86 and cputime
  2025-07-07 22:30   ` Oleg Nesterov
@ 2025-07-07 23:41     ` Li,Rongqing
  2025-07-07 23:53       ` Steven Rostedt
                         ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Li,Rongqing @ 2025-07-07 23:41 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, David Laight
  Cc: linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, rostedt@goodmis.org,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, mingo@redhat.com



> On a second thought, this
> 
>     mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626,
> 0x09e00900);
>                         stime               rtime
> stime + utime
> 
> looks suspicious:
> 
> 	- stime > stime + utime
> 
> 	- rtime = 0xfffd213aabd74626 is absurdly huge
> 
> so perhaps there is another problem?
> 

It happened when a process with 236 busy-polling threads ran for about 904 days; the total time overflowed 64 bits.

Non-x86 systems may have the same issue: once (stime + utime) overflows 64 bits, mul_u64_u64_div_u64() from lib/math/div64.c may cause a division by 0.

So for cputime, could cputime_adjust() just return the unadjusted stime if stime + utime overflows?

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 6dab4854..db0c273 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -579,6 +579,10 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
                goto update;
        }

+       if (stime > (stime + utime)) {
+               goto update;
+       }
+
        stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
        /*
         * Because mul_u64_u64_div_u64() can approximate on some


Thanks

-Li


> Oleg.
> 
> On 07/08, Oleg Nesterov wrote:
> >
> > On 07/07, Li,Rongqing wrote:
> > >
> > > [78250815.703847] divide error: 0000 [#1] PREEMPT SMP NOPTI
> >
> > ...
> >
> > > It caused by a process with many threads running very long, and
> > > utime+stime overflowed 64bit, then cause the below div
> > >
> > > mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626,
> > > 0x09e00900);
> > >
> > > I see the comments of mul_u64_u64_div_u64() say:
> > >
> > > Will generate an #DE when the result doesn't fit u64, could fix with
> > > an __ex_table[] entry when it becomes an issu
> > >
> > > Seem __ex_table[] entry for div does not work ?
> >
> > Well, the current version doesn't have an __ex_table[] entry for div...
> >
> > I do not know what can/should we do in this case... Perhaps
> >
> > 	static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> > 	{
> > 		int ok = 0;
> > 		u64 q;
> >
> > 		asm ("mulq %3; 1: divq %4; movl $1,%1; 2:\n"
> > 			_ASM_EXTABLE(1b, 2b)
> > 			: "=a" (q), "+r" (ok)
> > 			: "a" (a), "rm" (mul), "rm" (div)
> > 			: "rdx");
> >
> > 		return ok ? q : -1ul;
> > 	}
> >
> > ?
> >
> > Should return ULLONG_MAX on #DE.
> >
> > Oleg.



* Re: divide error in x86 and cputime
  2025-07-07 23:41     ` Re: " Li,Rongqing
@ 2025-07-07 23:53       ` Steven Rostedt
  2025-07-08  0:10         ` Re: " Li,Rongqing
  2025-07-08  0:23       ` Re: " Li,Rongqing
  2026-01-04 13:23       ` Xia Fukun
  2 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2025-07-07 23:53 UTC (permalink / raw)
  To: Li,Rongqing
  Cc: Oleg Nesterov, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On Mon, 7 Jul 2025 23:41:14 +0000
"Li,Rongqing" <lirongqing@baidu.com> wrote:

> > On a second thought, this
> > 
> >     mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626,
> > 0x09e00900);
> >                         stime               rtime
> > stime + utime
> > 
> > looks suspicious:
> > 
> > 	- stime > stime + utime
> > 
> > 	- rtime = 0xfffd213aabd74626 is absurdly huge
> > 
> > so perhaps there is another problem?
> >   
> 
> it happened when a process with 236 busy polling threads , run about 904 days, the total time will overflow the 64bit
> 
> non-x86 system maybe has same issue, once (stime + utime) overflows 64bit, mul_u64_u64_div_u64 from lib/math/div64.c maybe cause division by 0
> 
> so to cputime, could cputime_adjust() return stime if stime if stime + utime is overflow
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 6dab4854..db0c273 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -579,6 +579,10 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
>                 goto update;
>         }
> 
> +       if (stime > (stime + utime)) {
> +               goto update;
> +       }
> +
>         stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
>         /*
>          * Because mul_u64_u64_div_u64() can approximate on some
> 

Are you running 5.10.0? Because a diff of 5.10.238 from 5.10.0 gives:

@@ -579,6 +579,12 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
        }
 
        stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
+       /*
+        * Because mul_u64_u64_div_u64() can approximate on some
+        * achitectures; enforce the constraint that: a*b/(b+c) <= a.
+        */
+       if (unlikely(stime > rtime))
+               stime = rtime;
 
 update:


Thus the result is what's getting screwed up.

-- Steve


* Re: divide error in x86 and cputime
  2025-07-07 23:53       ` Steven Rostedt
@ 2025-07-08  0:10         ` Li,Rongqing
  2025-07-08  0:30           ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Li,Rongqing @ 2025-07-08  0:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Oleg Nesterov, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

> On Mon, 7 Jul 2025 23:41:14 +0000
> "Li,Rongqing" <lirongqing@baidu.com> wrote:
> 
> > > On a second thought, this
> > >
> > >     mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626,
> > > 0x09e00900);
> > >                         stime               rtime
> > > stime + utime
> > >
> > > looks suspicious:
> > >
> > > 	- stime > stime + utime
> > >
> > > 	- rtime = 0xfffd213aabd74626 is absurdly huge
> > >
> > > so perhaps there is another problem?
> > >
> >
> > it happened when a process with 236 busy polling threads , run about
> > 904 days, the total time will overflow the 64bit
> >
> > non-x86 system maybe has same issue, once (stime + utime) overflows
> > 64bit, mul_u64_u64_div_u64 from lib/math/div64.c maybe cause division
> > by 0
> >
> > so to cputime, could cputime_adjust() return stime if stime if stime +
> > utime is overflow
> >
> > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index
> > 6dab4854..db0c273 100644
> > --- a/kernel/sched/cputime.c
> > +++ b/kernel/sched/cputime.c
> > @@ -579,6 +579,10 @@ void cputime_adjust(struct task_cputime *curr,
> struct prev_cputime *prev,
> >                 goto update;
> >         }
> >
> > +       if (stime > (stime + utime)) {
> > +               goto update;
> > +       }
> > +
> >         stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
> >         /*
> >          * Because mul_u64_u64_div_u64() can approximate on some
> >
> 
> Are you running 5.10.0? Because a diff of 5.10.238 from 5.10.0 gives:
> 
> @@ -579,6 +579,12 @@ void cputime_adjust(struct task_cputime *curr, struct
> prev_cputime *prev,
>         }
> 
>         stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
> +       /*
> +        * Because mul_u64_u64_div_u64() can approximate on some
> +        * achitectures; enforce the constraint that: a*b/(b+c) <= a.
> +        */
> +       if (unlikely(stime > rtime))
> +               stime = rtime;


My 5.10 does not have the patch "sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime",
but I am sure this patch cannot fix this overflow issue, since the divide error happened inside mul_u64_u64_div_u64()

Thanks

-Li


> 
>  update:
> 
> 
> Thus the result is what's getting screwed up.
> 
> -- Steve


* Re: divide error in x86 and cputime
  2025-07-07 23:41     ` Re: " Li,Rongqing
  2025-07-07 23:53       ` Steven Rostedt
@ 2025-07-08  0:23       ` Li,Rongqing
  2026-01-04 13:23       ` Xia Fukun
  2 siblings, 0 replies; 28+ messages in thread
From: Li,Rongqing @ 2025-07-08  0:23 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, David Laight
  Cc: linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, rostedt@goodmis.org,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, mingo@redhat.com

> non-x86 system maybe has same issue, once (stime + utime) overflows 64bit,
> mul_u64_u64_div_u64 from lib/math/div64.c maybe cause division by 0
> 

A correction: mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626, 0x009e00900) from lib/math/div64.c returns 0xffffffffffffffff


> so to cputime, could cputime_adjust() return stime if stime if stime + utime is
> overflow
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index
> 6dab4854..db0c273 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -579,6 +579,10 @@ void cputime_adjust(struct task_cputime *curr, struct
> prev_cputime *prev,
>                 goto update;
>         }
> 
> +       if (stime > (stime + utime)) {
> +               goto update;
> +       }
> +
>         stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
>         /*
>          * Because mul_u64_u64_div_u64() can approximate on some
> 
> 
> Thanks
> 
> -Li
> 
> 
> > Oleg.
> >
> > On 07/08, Oleg Nesterov wrote:
> > >
> > > On 07/07, Li,Rongqing wrote:
> > > >
> > > > [78250815.703847] divide error: 0000 [#1] PREEMPT SMP NOPTI
> > >
> > > ...
> > >
> > > > It caused by a process with many threads running very long, and
> > > > utime+stime overflowed 64bit, then cause the below div
> > > >
> > > > mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626,
> > > > 0x09e00900);
> > > >
> > > > I see the comments of mul_u64_u64_div_u64() say:
> > > >
> > > > Will generate an #DE when the result doesn't fit u64, could fix
> > > > with an __ex_table[] entry when it becomes an issu
> > > >
> > > > Seem __ex_table[] entry for div does not work ?
> > >
> > > Well, the current version doesn't have an __ex_table[] entry for div...
> > >
> > > I do not know what can/should we do in this case... Perhaps
> > >
> > > 	static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> > > 	{
> > > 		int ok = 0;
> > > 		u64 q;
> > >
> > > 		asm ("mulq %3; 1: divq %4; movl $1,%1; 2:\n"
> > > 			_ASM_EXTABLE(1b, 2b)
> > > 			: "=a" (q), "+r" (ok)
> > > 			: "a" (a), "rm" (mul), "rm" (div)
> > > 			: "rdx");
> > >
> > > 		return ok ? q : -1ul;
> > > 	}
> > >
> > > ?
> > >
> > > Should return ULLONG_MAX on #DE.
> > >
> > > Oleg.



* Re: divide error in x86 and cputime
  2025-07-08  0:10         ` Re: " Li,Rongqing
@ 2025-07-08  0:30           ` Steven Rostedt
  2025-07-08  1:17             ` Re: " Li,Rongqing
  2025-07-08 10:35             ` Re: " David Laight
  0 siblings, 2 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-07-08  0:30 UTC (permalink / raw)
  To: Li,Rongqing
  Cc: Oleg Nesterov, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On Tue, 8 Jul 2025 00:10:54 +0000
"Li,Rongqing" <lirongqing@baidu.com> wrote:

> >         stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
> > +       /*
> > +        * Because mul_u64_u64_div_u64() can approximate on some
> > +        * achitectures; enforce the constraint that: a*b/(b+c) <= a.
> > +        */
> > +       if (unlikely(stime > rtime))
> > +               stime = rtime;  
> 
> 
> My 5.10 has not this patch " sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime ",
> but I am sure this patch can not fix this overflow issue, Since division error happened in mul_u64_u64_div_u64()

Have you tried it? Or are you just making an assumption?

How can you be so sure? Did you even *look* at the commit?

    sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime
    
    In extreme test scenarios:
    the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime,
    utime = 18446744073709518790 ns, rtime = 135989749728000 ns
    
    In cputime_adjust() process, stime is greater than rtime due to
    mul_u64_u64_div_u64() precision problem.
    before call mul_u64_u64_div_u64(),
    stime = 175136586720000, rtime = 135989749728000, utime = 1416780000.
    after call mul_u64_u64_div_u64(),
    stime = 135989949653530
    
    unsigned reversion occurs because rtime is less than stime.
    utime = rtime - stime = 135989749728000 - 135989949653530
                          = -199925530
                          = (u64)18446744073709518790
    
    Trigger condition:
      1). User task run in kernel mode most of time
      2). ARM64 architecture
      3). TICK_CPU_ACCOUNTING=y
          CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
    
    Fix mul_u64_u64_div_u64() conversion precision by reset stime to rtime


When stime ends up greater than rtime, it causes utime to go NEGATIVE!

That means *YES* it can overflow a u64 number. That's your bug.

Next time, look to see if there's fixes in the code that is triggering
issues for you and test them out, before bothering upstream.

Goodbye.

-- Steve


* Re: divide error in x86 and cputime
  2025-07-08  0:30           ` Steven Rostedt
@ 2025-07-08  1:17             ` Li,Rongqing
  2025-07-08  1:41               ` Steven Rostedt
  2025-07-08 10:35             ` Re: " David Laight
  1 sibling, 1 reply; 28+ messages in thread
From: Li,Rongqing @ 2025-07-08  1:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Oleg Nesterov, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

> Have you tried it? Or are you just making an assumption?
> 
> How can you be so sure? Did you even *look* at the commit?
> 
>     sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime
> 
>     In extreme test scenarios:
>     the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime,
>     utime = 18446744073709518790 ns, rtime = 135989749728000 ns
> 
>     In cputime_adjust() process, stime is greater than rtime due to
>     mul_u64_u64_div_u64() precision problem.
>     before call mul_u64_u64_div_u64(),
>     stime = 175136586720000, rtime = 135989749728000, utime =
> 1416780000.
>     after call mul_u64_u64_div_u64(),
>     stime = 135989949653530
> 
>     unsigned reversion occurs because rtime is less than stime.
>     utime = rtime - stime = 135989749728000 - 135989949653530
>                           = -199925530
>                           = (u64)18446744073709518790
> 

I will try to test this patch, but I think it is a different case;

stime is not greater than rtime in my case (stime = 0x69f98da9ba980c00, rtime = 0xfffd213aabd74626, stime + utime = 0x9e00900, so utime should be 0x960672564f47fd00), and this overflowing process has 236 busy-poll threads running for about 904 days, so I think these times are correct


Thanks

-Li

>     Trigger condition:
>       1). User task run in kernel mode most of time
>       2). ARM64 architecture
>       3). TICK_CPU_ACCOUNTING=y
>           CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
> 
>     Fix mul_u64_u64_div_u64() conversion precision by reset stime to rtime
> 
> 
> When stime ends up greater than rtime, it causes utime to go NEGATIVE!
> 
> That means *YES* it can overflow a u64 number. That's your bug.
> 
> Next time, look to see if there's fixes in the code that is triggering issues for you
> and test them out, before bothering upstream.
> 
> Goodbye.
> 
> -- Steve


* Re: divide error in x86 and cputime
  2025-07-07 22:33     ` Steven Rostedt
  2025-07-07 23:00       ` Oleg Nesterov
@ 2025-07-08  1:40       ` Li,Rongqing
  2025-07-08  1:53         ` Steven Rostedt
  1 sibling, 1 reply; 28+ messages in thread
From: Li,Rongqing @ 2025-07-08  1:40 UTC (permalink / raw)
  To: Steven Rostedt, Oleg Nesterov
  Cc: Peter Zijlstra, David Laight, linux-kernel@vger.kernel.org,
	vschneid@redhat.com, mgorman@suse.de, bsegall@google.com,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, mingo@redhat.com

> That would be:
> 
>                                    minutes    days
>                                       v        v
>   9223372036854775808 / 1000000000 / 60 / 60 / 24 / 365.25 = 292.27
>                            ^               ^         ^
>                         ns -> sec       hours       years
> 
> So the report says they have threads running for a very long time, it would still
> be 292 years of run time!

Utime/rtime is u64, which means overflow needs 292.27*2 = 584 years,

But with multiple threads, say 292 threads, it only needs two years, since it is the thread group's total running time


void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
{
    struct task_cputime cputime;

    thread_group_cputime(p, &cputime);
    cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
}
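These figures can be sanity-checked with a small user-space sketch (assuming every thread is 100% busy the whole time, which is what busy polling approximates):

```c
#include <stdint.h>

/* Sketch: whole days of group CPU time (in nanoseconds) a process can
 * accumulate before the u64 running total wraps, assuming all of its
 * nthreads threads are fully busy. Illustrative only. */
static uint64_t days_to_u64_wrap(uint64_t nthreads)
{
    const uint64_t ns_per_day = 86400ULL * 1000000000ULL;

    return UINT64_MAX / (nthreads * ns_per_day);
}
```

For 236 threads this gives 904 days, matching the report earlier in the thread; for 292 threads it gives 731 days, i.e. roughly the two years mentioned above; for a single thread it is 213503 days, about 584 years.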

-Li




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: divide error in x86 and cputime
  2025-07-08  1:17             ` 答复: [????] " Li,Rongqing
@ 2025-07-08  1:41               ` Steven Rostedt
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-07-08  1:41 UTC (permalink / raw)
  To: Li,Rongqing
  Cc: Oleg Nesterov, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On Tue, 8 Jul 2025 01:17:50 +0000
"Li,Rongqing" <lirongqing@baidu.com> wrote:

> Stime is not greater than rtime in my case (stime = 0x69f98da9ba980c00,
> rtime = 0xfffd213aabd74626, stime + utime = 0x709e00900, so utime should
> be about 0x960672564f47fd00), and this overflowed in a process with 236
> busy-poll threads running for about 904 days, so I think these times are correct
> 

But look at rtime, it is *negative*. So maybe that fix isn't going to fix
this bug, but rtime is most definitely screwed up. That value is:

  0xfffd213aabd74626 = (u64)18445936184654251558 = (s64)-807889055300058

There's no way run time should be 584 years in nanoseconds.

So if it's not fixed by that commit, it's a bug that happened before you even
got to the mul_u64_u64_div_u64() function. Touching that is only putting a
band-aid on the symptom, you haven't touched the real bug.

I bet there's likely another fix between what you are using and 5.10.238.
There's 31,101 commits between those two. You are using a way old kernel
without any fixes to it. It is known to be buggy. You will hit bugs with
it. No need to tell us about it.

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [????] Re: divide error in x86 and cputime
  2025-07-08  1:40       ` 答复: [????] " Li,Rongqing
@ 2025-07-08  1:53         ` Steven Rostedt
  2025-07-08  1:58           ` 答复: [????] " Li,Rongqing
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2025-07-08  1:53 UTC (permalink / raw)
  To: Li,Rongqing
  Cc: Oleg Nesterov, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On Tue, 8 Jul 2025 01:40:27 +0000
"Li,Rongqing" <lirongqing@baidu.com> wrote:

> > That would be:
> > 
> >                                    minutes    days
> >                                       v        v
> >   9223372036854775808 / 1000000000 / 60 / 60 / 24 / 365.25 = 292.27
> >                            ^               ^         ^
> >                         ns -> sec       hours       years
> > 
> > So the report says they have threads running for a very long time, it would still
> > be 292 years of run time!  
> 
> Utime/rtime is u64, which means overflow needs 292.27*2 = 584 years,
> 
> But with multiple threads, say 292 threads, it only needs two years, since it is the thread group's total running time
> 
> 
> void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
> {
>     struct task_cputime cputime;
> 
>     thread_group_cputime(p, &cputime);
>     cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
> }
> 

So you are saying that you have been running this for over two years
without a reboot?

Then the issue isn't the divider, it's that the thread group cputime can
overflow. Perhaps it needs a cap, or a way to "reset" somehow after "so long"?

-- Steve


^ permalink raw reply	[flat|nested] 28+ messages in thread

* 答复: [????] Re: [????] Re: divide error in x86 and cputime
  2025-07-08  1:53         ` Steven Rostedt
@ 2025-07-08  1:58           ` Li,Rongqing
  2025-07-08  2:05             ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Li,Rongqing @ 2025-07-08  1:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Oleg Nesterov, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

> "Li,Rongqing" <lirongqing@baidu.com> wrote:
> 
> > > That would be:
> > >
> > >                                    minutes    days
> > >                                       v        v
> > >   9223372036854775808 / 1000000000 / 60 / 60 / 24 / 365.25 = 292.27
> > >                            ^               ^         ^
> > >                         ns -> sec       hours       years
> > >
> > > So the report says they have threads running for a very long time,
> > > it would still be 292 years of run time!
> >
> > Utime/rtime is u64, which means overflow needs 292.27*2 = 584 years,
> >
> > But with multiple threads, say 292 threads, it only needs two years,
> > since it is the thread group's total running time
> >
> >
> > void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64
> > *st) {
> >     struct task_cputime cputime;
> >
> >     thread_group_cputime(p, &cputime);
> >     cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st); }
> >
> 
> So you are saying that you have been running this for over two years without a
> reboot?
> 

Yes. Considering that machines have more and more CPUs, I think this is a common case


> Then the issue isn't the divider, it's that the thread group cputime can overflow.
> Perhaps it needs a cap, or a way to "reset" somehow after "so long"?


It is not clear how to reset.

But mul_u64_u64_div_u64() for x86 should not trigger a divide-error panic; maybe it should return ULLONG_MAX on #DE (like the non-x86 mul_u64_u64_div_u64())

> 
> -- Steve


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [????] Re: [????] Re: divide error in x86 and cputime
  2025-07-08  1:58           ` 答复: [????] " Li,Rongqing
@ 2025-07-08  2:05             ` Steven Rostedt
  2025-07-08  2:17               ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2025-07-08  2:05 UTC (permalink / raw)
  To: Li,Rongqing
  Cc: Oleg Nesterov, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On Tue, 8 Jul 2025 01:58:00 +0000
"Li,Rongqing" <lirongqing@baidu.com> wrote:

> But mul_u64_u64_div_u64() for x86 should not trigger a division error panic, maybe should return a ULLONG_MAX on #DE (like non-x86 mul_u64_u64_div_u64(),)

Perhaps. But it is still producing garbage.

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [????] Re: [????] Re: divide error in x86 and cputime
  2025-07-08  2:05             ` Steven Rostedt
@ 2025-07-08  2:17               ` Oleg Nesterov
  2025-07-08  9:58                 ` David Laight
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-07-08  2:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Li,Rongqing, Peter Zijlstra, David Laight,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On 07/07, Steven Rostedt wrote:
>
> On Tue, 8 Jul 2025 01:58:00 +0000
> "Li,Rongqing" <lirongqing@baidu.com> wrote:
>
> > But mul_u64_u64_div_u64() for x86 should not trigger a division error panic,
> maybe should return a ULLONG_MAX on #DE (like non-x86 mul_u64_u64_div_u64(),)
>
> Perhaps.

So do you think

	static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
	{
		int ok = 0;
		u64 q;

		asm ("mulq %3; 1: divq %4; movl $1,%1; 2:\n"
			_ASM_EXTABLE(1b, 2b)
			: "=a" (q), "+r" (ok)
			: "a" (a), "rm" (mul), "rm" (div)
			: "rdx");

		return ok ? q : -1ul;
	}

makes sense at least for consistency with the generic implementation
in lib/math/div64.c ?
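For reference, the semantics the asm above aims for can be sketched portably with __int128 (illustrative only; this is not the proposed kernel code, which must avoid relying on 128-bit division):

```c
#include <stdint.h>

/* Sketch: return all-ones instead of raising #DE, for both the
 * divide-by-zero and the quotient-overflow case that divq traps on. */
static uint64_t mul_u64_u64_div_u64_sketch(uint64_t a, uint64_t mul, uint64_t div)
{
    unsigned __int128 prod = (unsigned __int128)a * mul;

    if (div == 0 || prod / div > UINT64_MAX)
        return UINT64_MAX;          /* the cases where divq raises #DE */
    return (uint64_t)(prod / div);
}
```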

>  But it is still producing garbage.

Agreed. And not a solution to this particular problem.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [????] Re: [????] Re: divide error in x86 and cputime
  2025-07-08  2:17               ` Oleg Nesterov
@ 2025-07-08  9:58                 ` David Laight
  0 siblings, 0 replies; 28+ messages in thread
From: David Laight @ 2025-07-08  9:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Li,Rongqing, Peter Zijlstra,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On Tue, 8 Jul 2025 04:17:04 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 07/07, Steven Rostedt wrote:
> >
> > On Tue, 8 Jul 2025 01:58:00 +0000
> > "Li,Rongqing" <lirongqing@baidu.com> wrote:
> >  
> > > But mul_u64_u64_div_u64() for x86 should not trigger a division error panic,  
> > maybe should return a ULLONG_MAX on #DE (like non-x86 mul_u64_u64_div_u64(),)
> >
> > Perhaps.  
> 
> So do you think
> 
> 	static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> 	{
> 		int ok = 0;
> 		u64 q;
> 
> 		asm ("mulq %3; 1: divq %4; movl $1,%1; 2:\n"
> 			_ASM_EXTABLE(1b, 2b)
> 			: "=a" (q), "+r" (ok)
> 			: "a" (a), "rm" (mul), "rm" (div)
> 			: "rdx");
> 
> 		return ok ? q : -1ul;

You need to decide what to return/do when 'div' is zero.
So perhaps:
		if (ok)
			return q;
		BUG_ON(!div);
		return ~(u64)0;

But maybe 0/0 should return 0.

> 	}
> 
> makes sense at least for consistency with the generic implementation
> in lib/math/div64.c ?

I don't like the way the current version handles divide by zero at all.
Even forcing the cpu to execute a 'divide by zero' doesn't seem right.
The result should be well defined (and useful).
It might even be worth adding an extra parameter to report overflow
and return ~0 for overflow and 0 for divide by zero (I think that is
less likely to cause grief in the following instructions). 
That does 'pass the buck' to the caller.

> 
> >  But it is still producing garbage.  
> 
> Agreed. And not a solution to this particular problem.

Using mul_u64_u64_div_u64() here is also horribly expensive for a
simple split between (IIRC) utime and stime.
It isn't too bad on x86-64, but everywhere else it is horrid.
For 'random' values the code hits 900 clocks on x86-32 - and that
is in userspace with cmov and %ebp as a general register.
My new version is ~230 for x86-32 and ~130 for x86-64 (not doing
the fast asm) on ivy bridge, ~80 for x86-64 on zen5.
(I'm on holiday and have limited systems available.)

	David

> 
> Oleg.
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [????] Re: [????] Re: divide error in x86 and cputime
  2025-07-08  0:30           ` Steven Rostedt
  2025-07-08  1:17             ` 答复: [????] " Li,Rongqing
@ 2025-07-08 10:35             ` David Laight
  2025-07-08 11:12               ` 答复: [????] " Li,Rongqing
  1 sibling, 1 reply; 28+ messages in thread
From: David Laight @ 2025-07-08 10:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Li,Rongqing, Oleg Nesterov, Peter Zijlstra,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On Mon, 7 Jul 2025 20:30:57 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 8 Jul 2025 00:10:54 +0000
> "Li,Rongqing" <lirongqing@baidu.com> wrote:
> 
> > >         stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
> > > +       /*
> > > +        * Because mul_u64_u64_div_u64() can approximate on some
> > > +        * architectures; enforce the constraint that: a*b/(b+c) <= a.
> > > +        */
> > > +       if (unlikely(stime > rtime))
> > > +               stime = rtime;    
> > 
> > 
> > My 5.10 has not this patch " sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime ",
> > but I am sure this patch can not fix this overflow issue, Since division error happened in mul_u64_u64_div_u64()  
> 
> Have you tried it? Or are you just making an assumption?
> 
> How can you be so sure? Did you even *look* at the commit?

It can't be relevant.
That change is after the mul_u64_u64_div_u64() call that trapped.
It is also not relevant for x86-64 because it uses the asm version.

At some point mul_u64_u64_div_u64() got changed to be accurate (and slow)
so that check isn't needed any more.

	David

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: divide error in x86 and cputime
  2025-07-07 23:00       ` Oleg Nesterov
@ 2025-07-08 11:00         ` David Laight
  0 siblings, 0 replies; 28+ messages in thread
From: David Laight @ 2025-07-08 11:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Li,Rongqing, Peter Zijlstra,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	mingo@redhat.com

On Tue, 8 Jul 2025 01:00:57 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 07/07, Steven Rostedt wrote:
> >
> > On Mon, 7 Jul 2025 18:20:56 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >  
> > > I would say this should never happen and if it does, let the kernel crash.  
> >  
> > >> [78250815.703852] CPU: 127 PID: 83435 Comm: killall Kdump: loaded Tainted: P           OE K   5.10.0 #1  
> >
> > This happened on a 5.10 kernel with a proprietary module loaded, so
> > honestly, if it can't be reproduced on a newer kernel without any
> > proprietary modules loaded, I say we don't worry about it.  
> 
> Yes, agreed, see my reply to myself.

Except that just isn't relevant.
The problem is that the process running time (across all threads) can
easily exceed 2^64 nanoseconds.

With CPUs having more and more cores and software spinning to reduce
latency, it will get more and more common.

Perhaps standardising on ns for timers (etc) wasn't such a bright idea.
Maybe 100ns would have been better.

But the process 'rtime' does need dividing down somewhat.
Thread 'rtime' is fine - 584 years isn't going to be our problem!
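One way of dividing down, similar in spirit to the scale_stime() loop that older kernels used, is to shed low bits until the multiply fits in 64 bits; a rough user-space sketch (an assumption about the approach, not the kernel code):

```c
#include <stdint.h>

/* Sketch: trade precision for range. Halve rtime and total together
 * until stime * rtime fits in 64 bits, then do a plain 64-bit divide. */
static uint64_t scale_stime_sketch(uint64_t stime, uint64_t rtime, uint64_t total)
{
    for (;;) {
        if (stime == 0 || rtime == 0)
            return 0;
        if (rtime <= UINT64_MAX / stime)   /* product now fits */
            break;
        rtime >>= 1;
        total >>= 1;
    }
    if (total == 0)         /* degenerate: everything shifted away */
        return rtime;
    return stime * rtime / total;
}
```

The result is only approximate once shifting kicks in, which is exactly the precision loss the accurate-but-slow mul_u64_u64_div_u64() was introduced to avoid.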

	David
  


^ permalink raw reply	[flat|nested] 28+ messages in thread

* 答复: [????] Re: [????] Re: [????] Re: divide error in x86 and cputime
  2025-07-08 10:35             ` [????] Re: [????] " David Laight
@ 2025-07-08 11:12               ` Li,Rongqing
  0 siblings, 0 replies; 28+ messages in thread
From: Li,Rongqing @ 2025-07-08 11:12 UTC (permalink / raw)
  To: David Laight, Steven Rostedt
  Cc: Oleg Nesterov, Peter Zijlstra, linux-kernel@vger.kernel.org,
	vschneid@redhat.com, mgorman@suse.de, bsegall@google.com,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, mingo@redhat.com

> > On Tue, 8 Jul 2025 00:10:54 +0000
> > "Li,Rongqing" <lirongqing@baidu.com> wrote:
> >
> > > >         stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
> > > > +       /*
> > > > +        * Because mul_u64_u64_div_u64() can approximate on some
> > > > +        * architectures; enforce the constraint that: a*b/(b+c) <= a.
> > > > +        */
> > > > +       if (unlikely(stime > rtime))
> > > > +               stime = rtime;
> > >
> > >
> > > My 5.10 has not this patch " sched/cputime: Fix
> > > mul_u64_u64_div_u64() precision for cputime ", but I am sure this
> > > patch can not fix this overflow issue, Since division error happened
> > > in mul_u64_u64_div_u64()
> >
> > Have you tried it? Or are you just making an assumption?
> >
> > How can you be so sure? Did you even *look* at the commit?
> 
> It can't be relevant.
> That change is after the mul_u64_u64_div_u64() call that trapped.
> It is also not relevant for x86-64 because it uses the asm version.
> 
> At some point mul_u64_u64_div_u64() got changed to be accurate (and slow) so
> that check isn't needed any more.
> 

I see that this patch is not relevant.

Thank you very much for your confirmation

-Li

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 答复: [????] Re: divide error in x86 and cputime
  2025-07-07 23:41     ` 答复: [????] " Li,Rongqing
  2025-07-07 23:53       ` Steven Rostedt
  2025-07-08  0:23       ` 答复: " Li,Rongqing
@ 2026-01-04 13:23       ` Xia Fukun
  2026-01-04 14:23         ` Oleg Nesterov
  2 siblings, 1 reply; 28+ messages in thread
From: Xia Fukun @ 2026-01-04 13:23 UTC (permalink / raw)
  To: Li,Rongqing, Oleg Nesterov, Peter Zijlstra, David Laight
  Cc: linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, rostedt@goodmis.org,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, mingo@redhat.com, Zhangqiao (2012 lab),
	Xia Fukun


On 7/8/2025 7:41 AM, Li,Rongqing wrote:
> 
> it happened when a process with 236 busy-polling threads ran for about 904 days; the total time will overflow 64 bits
> 
> non-x86 systems may have the same issue: once (stime + utime) overflows 64 bits, mul_u64_u64_div_u64() from lib/math/div64.c may cause a division by 0
> 

We have encountered the same issue in an environment with x86 architecture and kernel version 5.10.

[48734536.498953] divide error: 0000 [#1] SMP NOPTI
[48734536.504336] CPU: 273 PID: 4619 Comm: nano-sysmonitor Kdump: loaded Tainted: G           OE     5.10.0-60.18.0.50.r1209_60_175.hce2.x86_64 #1
[48734536.518065] Hardware name: XFUSION 5885H V7/BC15MBHA, BIOS 01.02.01.03 01/01/2024
[48734536.526620] RIP: 0010:cputime_adjust+0x55/0xb0
[48734536.532093] Code: 0b 48 8b 7d 10 49 89 c0 48 8d 04 0e 48 39 f8 73 38 48 8b 45 00 48 8b 55 08 48 85 c0 74 16 48 85 d2 74 4d 4c 8d 0c 10 48 f7 e7 <49> f7 f1 48 39 c6 48 0f 42 f0 48 89 f8 48 29 f0 48 39 c1 77 29 48
[48734536.552057] RSP: 0018:ffffae408e07bbc8 EFLAGS: 00010807
[48734536.558328] RAX: 2facb95ea704eb6a RBX: ffff98b6293db180 RCX: fff9b822b886cabf
[48734536.566529] RDX: 0005cf0f135b9489 RSI: 0005cf0ec21afa94 RDI: ffff93c922ae82ee
[48734536.574727] RBP: ffffae408e07bbf8 R08: 0000000000000082 R09: 000007333e295d49
[48734536.582930] R10: 8000000000000000 R11: 0000000000000000 R12: ffffae408e07bcf8
[48734536.591131] R13: ffffae408e07bcf0 R14: ffff98b6293db190 R15: fffa2e80e26a98fd
[48734536.599334] FS:  00007f0bc58c3740(0000) GS:ffff98bb75040000(0000) knlGS:0000000000000000
[48734536.608498] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[48734536.615294] CR2: 0000557c0ddca1c8 CR3: 00000600ae12a002 CR4: 0000000000372ee0
[48734536.623497] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[48734536.631697] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[48734536.639898] Call Trace:
[48734536.712624]  thread_group_cputime_adjusted+0x4b/0x70
[48734536.718634]  do_task_stat+0x2d8/0xdc0
[48734536.723326]  task_info_proc_get_info+0x133/0x150


Specifically, a division error occurs in cputime_adjust() during the following calculation:

mul_u64_u64_div_u64(0x5cf1187f5ad33, 0xffff93c922ae82ee, 0x7333e295d49)

Is the patch provided here feasible? Or are there any known workarounds?

> so for cputime, could cputime_adjust() just return stime if stime + utime overflows
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 6dab4854..db0c273 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -579,6 +579,10 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
>                 goto update;
>         }
> 
> +       if (stime > (stime + utime)) {
> +               goto update;
> +       }
> +
>         stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
>         /*
>          * Because mul_u64_u64_div_u64() can approximate on some
> 




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 答复: [????] Re: divide error in x86 and cputime
  2026-01-04 13:23       ` Xia Fukun
@ 2026-01-04 14:23         ` Oleg Nesterov
  2026-01-04 18:15           ` David Laight
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2026-01-04 14:23 UTC (permalink / raw)
  To: Xia Fukun, Peter Zijlstra, Ingo Molnar
  Cc: Li,Rongqing, David Laight, linux-kernel@vger.kernel.org,
	vschneid@redhat.com, mgorman@suse.de, bsegall@google.com,
	rostedt@goodmis.org, dietmar.eggemann@arm.com,
	vincent.guittot@linaro.org, juri.lelli@redhat.com,
	Zhangqiao (2012 lab)

Peter, Ingo,

can you take

	[PATCH v3 0/2] x86/math64: handle #DE in mul_u64_u64_div_u64()
	https://lore.kernel.org/all/20250815164009.GA11676@redhat.com/

? at least 1/2 which fixes the problem with #DE ...

Oleg.

On 01/04, Xia Fukun wrote:
> 
> On 7/8/2025 7:41 AM, Li,Rongqing wrote:
> > 
> > it happened when a process with 236 busy-polling threads ran for about 904 days; the total time will overflow 64 bits
> > 
> > non-x86 systems may have the same issue: once (stime + utime) overflows 64 bits, mul_u64_u64_div_u64() from lib/math/div64.c may cause a division by 0
> > 
> 
> We have encountered the same issue in an environment with x86 architecture and kernel version 5.10.
> 
> [48734536.498953] divide error: 0000 [#1] SMP NOPTI
> [48734536.504336] CPU: 273 PID: 4619 Comm: nano-sysmonitor Kdump: loaded Tainted: G           OE     5.10.0-60.18.0.50.r1209_60_175.hce2.x86_64 #1
> [48734536.518065] Hardware name: XFUSION 5885H V7/BC15MBHA, BIOS 01.02.01.03 01/01/2024
> [48734536.526620] RIP: 0010:cputime_adjust+0x55/0xb0
> [48734536.532093] Code: 0b 48 8b 7d 10 49 89 c0 48 8d 04 0e 48 39 f8 73 38 48 8b 45 00 48 8b 55 08 48 85 c0 74 16 48 85 d2 74 4d 4c 8d 0c 10 48 f7 e7 <49> f7 f1 48 39 c6 48 0f 42 f0 48 89 f8 48 29 f0 48 39 c1 77 29 48
> [48734536.552057] RSP: 0018:ffffae408e07bbc8 EFLAGS: 00010807
> [48734536.558328] RAX: 2facb95ea704eb6a RBX: ffff98b6293db180 RCX: fff9b822b886cabf
> [48734536.566529] RDX: 0005cf0f135b9489 RSI: 0005cf0ec21afa94 RDI: ffff93c922ae82ee
> [48734536.574727] RBP: ffffae408e07bbf8 R08: 0000000000000082 R09: 000007333e295d49
> [48734536.582930] R10: 8000000000000000 R11: 0000000000000000 R12: ffffae408e07bcf8
> [48734536.591131] R13: ffffae408e07bcf0 R14: ffff98b6293db190 R15: fffa2e80e26a98fd
> [48734536.599334] FS:  00007f0bc58c3740(0000) GS:ffff98bb75040000(0000) knlGS:0000000000000000
> [48734536.608498] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [48734536.615294] CR2: 0000557c0ddca1c8 CR3: 00000600ae12a002 CR4: 0000000000372ee0
> [48734536.623497] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [48734536.631697] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> [48734536.639898] Call Trace:
> [48734536.712624]  thread_group_cputime_adjusted+0x4b/0x70
> [48734536.718634]  do_task_stat+0x2d8/0xdc0
> [48734536.723326]  task_info_proc_get_info+0x133/0x150
> 
> 
> Specifically, a division error occurs in cputime_adjust() during the following calculation:
> 
> mul_u64_u64_div_u64(0x5cf1187f5ad33, 0xffff93c922ae82ee, 0x7333e295d49)
> 
> Is the patch provided here feasible? Or are there any known workarounds?
> 
> > so for cputime, could cputime_adjust() just return stime if stime + utime overflows
> > 
> > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> > index 6dab4854..db0c273 100644
> > --- a/kernel/sched/cputime.c
> > +++ b/kernel/sched/cputime.c
> > @@ -579,6 +579,10 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
> >                 goto update;
> >         }
> > 
> > +       if (stime > (stime + utime)) {
> > +               goto update;
> > +       }
> > +
> >         stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
> >         /*
> >          * Because mul_u64_u64_div_u64() can approximate on some
> > 
> 
> 
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [????] Re: divide error in x86 and cputime
  2026-01-04 14:23         ` Oleg Nesterov
@ 2026-01-04 18:15           ` David Laight
  2026-01-04 20:30             ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: David Laight @ 2026-01-04 18:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Xia Fukun, Peter Zijlstra, Ingo Molnar, Li,Rongqing,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, rostedt@goodmis.org,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, Zhangqiao (2012 lab)

On Sun, 4 Jan 2026 15:23:19 +0100
Oleg Nesterov <oleg@redhat.com> wrote:

> Peter, Ingo,
> 
> can you take
> 
> 	[PATCH v3 0/2] x86/math64: handle #DE in mul_u64_u64_div_u64()
> 	https://lore.kernel.org/all/20250815164009.GA11676@redhat.com/
> 
> ? at least 1/2 which fixes the problem with #DE ...

I need to look at the state of my mul_u64_u64_div_u64() patch as well.
I think that has got lost somewhere.
Partially due to arguments about how to handle overflow and divide by zero.
I don't see a problem returning ~0ull for both - it is extremely unlikely
to be a valid result (esp. for code that doesn't need to handle overflow).

But this code needs a completely different fix.
Either the total runtime needs holding in some other units, or the calculation
needs to use the 'delta runtime' rather than the 'absolute runtime' so that
modular arithmetic avoids the overflow.
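A minimal sketch of why the delta approach is safe: unsigned subtraction is already modular, so a per-interval delta is correct even after the running total has wrapped past 2^64, and deltas stay small enough to multiply without overflow:

```c
#include <stdint.h>

/* Sketch: a delta between two u64 snapshots is well-defined modulo
 * 2^64, even if the counter wrapped between the two readings. */
static uint64_t wrapped_delta(uint64_t now, uint64_t prev)
{
    return now - prev;   /* defined for unsigned, even when now < prev */
}
```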

The extra check before the divide will stop the panic, but the returned value
isn't going to be correct.
After 'not much longer' utime will be large enough that the divide no
longer overflows - at which point the calculated value is complete garbage.

	David 

> 
> Oleg.
> 
> On 01/04, Xia Fukun wrote:
> > 
> > On 7/8/2025 7:41 AM, Li,Rongqing wrote:  
> > > 
> > > it happened when a process with 236 busy-polling threads ran for about 904 days; the total time will overflow 64 bits
> > > 
> > > non-x86 systems may have the same issue: once (stime + utime) overflows 64 bits, mul_u64_u64_div_u64() from lib/math/div64.c may cause a division by 0
> > >   
> > 
> > We have encountered the same issue in an environment with x86 architecture and kernel version 5.10.
> > 
> > [48734536.498953] divide error: 0000 [#1] SMP NOPTI
> > [48734536.504336] CPU: 273 PID: 4619 Comm: nano-sysmonitor Kdump: loaded Tainted: G           OE     5.10.0-60.18.0.50.r1209_60_175.hce2.x86_64 #1
> > [48734536.518065] Hardware name: XFUSION 5885H V7/BC15MBHA, BIOS 01.02.01.03 01/01/2024
> > [48734536.526620] RIP: 0010:cputime_adjust+0x55/0xb0
> > [48734536.532093] Code: 0b 48 8b 7d 10 49 89 c0 48 8d 04 0e 48 39 f8 73 38 48 8b 45 00 48 8b 55 08 48 85 c0 74 16 48 85 d2 74 4d 4c 8d 0c 10 48 f7 e7 <49> f7 f1 48 39 c6 48 0f 42 f0 48 89 f8 48 29 f0 48 39 c1 77 29 48
> > [48734536.552057] RSP: 0018:ffffae408e07bbc8 EFLAGS: 00010807
> > [48734536.558328] RAX: 2facb95ea704eb6a RBX: ffff98b6293db180 RCX: fff9b822b886cabf
> > [48734536.566529] RDX: 0005cf0f135b9489 RSI: 0005cf0ec21afa94 RDI: ffff93c922ae82ee
> > [48734536.574727] RBP: ffffae408e07bbf8 R08: 0000000000000082 R09: 000007333e295d49
> > [48734536.582930] R10: 8000000000000000 R11: 0000000000000000 R12: ffffae408e07bcf8
> > [48734536.591131] R13: ffffae408e07bcf0 R14: ffff98b6293db190 R15: fffa2e80e26a98fd
> > [48734536.599334] FS:  00007f0bc58c3740(0000) GS:ffff98bb75040000(0000) knlGS:0000000000000000
> > [48734536.608498] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [48734536.615294] CR2: 0000557c0ddca1c8 CR3: 00000600ae12a002 CR4: 0000000000372ee0
> > [48734536.623497] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [48734536.631697] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> > [48734536.639898] Call Trace:
> > [48734536.712624]  thread_group_cputime_adjusted+0x4b/0x70
> > [48734536.718634]  do_task_stat+0x2d8/0xdc0
> > [48734536.723326]  task_info_proc_get_info+0x133/0x150
> > 
> > 
> > Specifically, a division error occurs in cputime_adjust() during the following calculation:
> > 
> > mul_u64_u64_div_u64(0x5cf1187f5ad33, 0xffff93c922ae82ee, 0x7333e295d49)
> > 
> > Is the patch provided here feasible? Or are there any known workarounds?
> >   
> > > so for cputime, could cputime_adjust() just return stime if stime + utime overflows
> > > 
> > > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> > > index 6dab4854..db0c273 100644
> > > --- a/kernel/sched/cputime.c
> > > +++ b/kernel/sched/cputime.c
> > > @@ -579,6 +579,10 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
> > >                 goto update;
> > >         }
> > > 
> > > +       if (stime > (stime + utime)) {
> > > +               goto update;
> > > +       }
> > > +
> > >         stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
> > >         /*
> > >          * Because mul_u64_u64_div_u64() can approximate on some
> > >   
> > 
> > 
> >   
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [????] Re: divide error in x86 and cputime
  2026-01-04 18:15           ` David Laight
@ 2026-01-04 20:30             ` Oleg Nesterov
  2026-01-04 22:03               ` David Laight
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2026-01-04 20:30 UTC (permalink / raw)
  To: David Laight
  Cc: Xia Fukun, Peter Zijlstra, Ingo Molnar, Li,Rongqing,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, rostedt@goodmis.org,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, Zhangqiao (2012 lab)

On 01/04, David Laight wrote:
>
> On Sun, 4 Jan 2026 15:23:19 +0100
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Peter, Ingo,
> >
> > can you take
> >
> > 	[PATCH v3 0/2] x86/math64: handle #DE in mul_u64_u64_div_u64()
> > 	https://lore.kernel.org/all/20250815164009.GA11676@redhat.com/
> >
> > ? at least 1/2 which fixes the problem with #DE ...
>

...

> But this code needs a completely different fix.

Of course, this is clear. The fix above doesn't even try to address the
problems in cputime_adjust() paths.

But. To me the fix above makes sense regardless. mul_u64_u64_div_u64()
should never trigger #DE, and I thought that we already discussed this
before.

Do you agree?

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [????] Re: divide error in x86 and cputime
  2026-01-04 20:30             ` Oleg Nesterov
@ 2026-01-04 22:03               ` David Laight
  0 siblings, 0 replies; 28+ messages in thread
From: David Laight @ 2026-01-04 22:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Xia Fukun, Peter Zijlstra, Ingo Molnar, Li,Rongqing,
	linux-kernel@vger.kernel.org, vschneid@redhat.com,
	mgorman@suse.de, bsegall@google.com, rostedt@goodmis.org,
	dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	juri.lelli@redhat.com, Zhangqiao (2012 lab)

On Sun, 4 Jan 2026 21:30:39 +0100
Oleg Nesterov <oleg@redhat.com> wrote:

> On 01/04, David Laight wrote:
> >
> > On Sun, 4 Jan 2026 15:23:19 +0100
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >  
> > > Peter, Ingo,
> > >
> > > can you take
> > >
> > > 	[PATCH v3 0/2] x86/math64: handle #DE in mul_u64_u64_div_u64()
> > > 	https://lore.kernel.org/all/20250815164009.GA11676@redhat.com/
> > >
> > > ? at least 1/2 which fixes the problem with #DE ...  
> >  
> 
> ...
> 
> > But this code needs a completely different fix.  
> 
> Of course, this is clear. The fix above doesn't even try to address the
> problems in cputime_adjust() paths.
> 
> But. To me the fix above makes sense regardless. mul_u64_u64_div_u64()
> should never trigger #DE, and I thought that we already discussed this
> before.
> 
> Do you agree?

Yes, I think both the generic and x86 versions should just return ~0ull
for both overflow and divide by zero.

	David

> 
> Oleg.
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2026-01-04 22:03 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-07  8:14 divide error in x86 and cputime Li,Rongqing
2025-07-07 15:11 ` Steven Rostedt
2025-07-07 22:09 ` Oleg Nesterov
2025-07-07 22:20   ` Steven Rostedt
2025-07-07 22:33     ` Steven Rostedt
2025-07-07 23:00       ` Oleg Nesterov
2025-07-08 11:00         ` David Laight
2025-07-08  1:40       ` 答复: [????] " Li,Rongqing
2025-07-08  1:53         ` Steven Rostedt
2025-07-08  1:58           ` 答复: [????] " Li,Rongqing
2025-07-08  2:05             ` Steven Rostedt
2025-07-08  2:17               ` Oleg Nesterov
2025-07-08  9:58                 ` David Laight
2025-07-07 22:30   ` Oleg Nesterov
2025-07-07 23:41     ` 答复: [????] " Li,Rongqing
2025-07-07 23:53       ` Steven Rostedt
2025-07-08  0:10         ` 答复: [????] " Li,Rongqing
2025-07-08  0:30           ` Steven Rostedt
2025-07-08  1:17             ` 答复: [????] " Li,Rongqing
2025-07-08  1:41               ` Steven Rostedt
2025-07-08 10:35             ` [????] Re: [????] " David Laight
2025-07-08 11:12               ` 答复: [????] " Li,Rongqing
2025-07-08  0:23       ` 答复: " Li,Rongqing
2026-01-04 13:23       ` Xia Fukun
2026-01-04 14:23         ` Oleg Nesterov
2026-01-04 18:15           ` David Laight
2026-01-04 20:30             ` Oleg Nesterov
2026-01-04 22:03               ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox