Kernel oops caused by signed divide

public inbox for bpf@vger.kernel.org

* Kernel oops caused by signed divide
@ 2024-09-09 17:21 Zac Ecob
  2024-09-09 17:27 ` Yonghong Song
  2024-09-09 17:29 ` Alexei Starovoitov
  0 siblings, 2 replies; 16+ messages in thread
From: Zac Ecob @ 2024-09-09 17:21 UTC (permalink / raw)
  To: bpf@vger.kernel.org

Hello,

I recently received a kernel 'oops' about a divide error. 
After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.

The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).
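
To make the asymmetry concrete, here is a minimal sketch, assuming 64-bit two's complement. The helper name magnitude64 is hypothetical and only for illustration. It computes the magnitude in unsigned arithmetic and shows that the magnitude of INT64_MIN is one larger than INT64_MAX, so the quotient INT64_MIN / -1 has no signed 64-bit representation.

```c
#include <stdint.h>

/*
 * Hypothetical helper: magnitude of a signed 64-bit value.
 * The subtraction is done on uint64_t, where wraparound is well
 * defined, so INT64_MIN does not trigger signed overflow.
 */
static uint64_t magnitude64(int64_t v)
{
    return v < 0 ? 0 - (uint64_t)v : (uint64_t)v;
}

/* True when the magnitude of v also fits in a non-negative int64_t. */
static int magnitude_fits_s64(int64_t v)
{
    return magnitude64(v) <= (uint64_t)INT64_MAX;
}
```

With these definitions, magnitude_fits_s64() is false only for INT64_MIN.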

Apologies if this is already known / not a relevant concern.


* Re: Kernel oops caused by signed divide
  2024-09-09 17:21 Kernel oops caused by signed divide Zac Ecob
@ 2024-09-09 17:27 ` Yonghong Song
  2024-09-09 17:29 ` Alexei Starovoitov
  1 sibling, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2024-09-09 17:27 UTC (permalink / raw)
  To: Zac Ecob, bpf@vger.kernel.org


On 9/9/24 10:21 AM, Zac Ecob wrote:
> Hello,
>
> I recently received a kernel 'oops' about a divide error.
> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.
>
> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).

Could you provide a reproducible test case for this? It will make the issue easier to debug.



* Re: Kernel oops caused by signed divide
  2024-09-09 17:21 Kernel oops caused by signed divide Zac Ecob
  2024-09-09 17:27 ` Yonghong Song
@ 2024-09-09 17:29 ` Alexei Starovoitov
  2024-09-09 23:47   ` Yonghong Song
  2024-09-10 14:21   ` Yonghong Song
  1 sibling, 2 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2024-09-09 17:29 UTC (permalink / raw)
  To: Zac Ecob, Yonghong Song, Daniel Borkmann; +Cc: bpf@vger.kernel.org

On Mon, Sep 9, 2024 at 10:21 AM Zac Ecob <zacecob@protonmail.com> wrote:
>
> Hello,
>
> I recently received a kernel 'oops' about a divide error.
> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.
>
> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).
>
>
> Apologies if this is already known / not a relevant concern.

Thanks for the report. This is a new issue.

Yonghong,

it's related to the new signed div insn.
It sounds like we need to update chk_and_div[] part of
the verifier to account for signed div differently.


* Re: Kernel oops caused by signed divide
  2024-09-09 17:29 ` Alexei Starovoitov
@ 2024-09-09 23:47   ` Yonghong Song
  2024-09-10 14:21   ` Yonghong Song
  1 sibling, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2024-09-09 23:47 UTC (permalink / raw)
  To: Alexei Starovoitov, Zac Ecob, Daniel Borkmann; +Cc: bpf@vger.kernel.org


On 9/9/24 10:29 AM, Alexei Starovoitov wrote:
> On Mon, Sep 9, 2024 at 10:21 AM Zac Ecob <zacecob@protonmail.com> wrote:
>> Hello,
>>
>> I recently received a kernel 'oops' about a divide error.
>> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.
>>
>> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).
>>
>>
>> Apologies if this is already known / not a relevant concern.
> Thanks for the report. This is a new issue.
>
> Yonghong,
>
> it's related to the new signed div insn.
> It sounds like we need to update chk_and_div[] part of
> the verifier to account for signed div differently.

Okay. Indeed, INT64_MIN/(-1) cannot be represented.
I will do something similar to chk_and_div[] to filter
out this corner case.



* Re: Kernel oops caused by signed divide
  2024-09-09 17:29 ` Alexei Starovoitov
  2024-09-09 23:47   ` Yonghong Song
@ 2024-09-10 14:21   ` Yonghong Song
  2024-09-10 14:44     ` Dave Thaler
  2024-09-10 15:21     ` Alexei Starovoitov
  1 sibling, 2 replies; 16+ messages in thread
From: Yonghong Song @ 2024-09-10 14:21 UTC (permalink / raw)
  To: Alexei Starovoitov, Zac Ecob, Daniel Borkmann; +Cc: bpf@vger.kernel.org


On 9/9/24 10:29 AM, Alexei Starovoitov wrote:
> On Mon, Sep 9, 2024 at 10:21 AM Zac Ecob <zacecob@protonmail.com> wrote:
>> Hello,
>>
>> I recently received a kernel 'oops' about a divide error.
>> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.
>>
>> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).
>>
>>
>> Apologies if this is already known / not a relevant concern.
> Thanks for the report. This is a new issue.
>
> Yonghong,
>
> it's related to the new signed div insn.
> It sounds like we need to update chk_and_div[] part of
> the verifier to account for signed div differently.

In the verifier, we have
   /* [R,W]x div 0 -> 0 */
   /* [R,W]x mod 0 -> [R,W]x */

What should the value be for
   Rx_a sdiv Rx_b -> ?
where Rx_a = INT64_MIN and Rx_b = -1?

Should we just do
   INT64_MIN sdiv -1 -> -1
or some other value?



* RE: Kernel oops caused by signed divide
  2024-09-10 14:21   ` Yonghong Song
@ 2024-09-10 14:44     ` Dave Thaler
  2024-09-10 15:18       ` Yonghong Song
  2024-09-10 15:21     ` Alexei Starovoitov
  1 sibling, 1 reply; 16+ messages in thread
From: Dave Thaler @ 2024-09-10 14:44 UTC (permalink / raw)
  To: 'Yonghong Song', 'Alexei Starovoitov',
	'Zac Ecob', 'Daniel Borkmann'
  Cc: bpf

Yonghong Song wrote: 
[...]
> In verifier, we have
>    /* [R,W]x div 0 -> 0 */
>    /* [R,W]x mod 0 -> [R,W]x */
> 
> What the value for
>    Rx_a sdiv Rx_b -> ?
> where Rx_a = INT64_MIN and Rx_b = -1?
> 
> Should we just do
>    INT64_MIN sdiv -1 -> -1
> or some other values?

What happens for BPF_NEG INT64_MIN?

Dave



* Re: Kernel oops caused by signed divide
  2024-09-10 14:44     ` Dave Thaler
@ 2024-09-10 15:18       ` Yonghong Song
  2024-09-10 15:21         ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2024-09-10 15:18 UTC (permalink / raw)
  To: Dave Thaler, 'Alexei Starovoitov', 'Zac Ecob',
	'Daniel Borkmann'
  Cc: bpf


On 9/10/24 7:44 AM, Dave Thaler wrote:
> Yonghong Song wrote:
> [...]
>> In verifier, we have
>>     /* [R,W]x div 0 -> 0 */
>>     /* [R,W]x mod 0 -> [R,W]x */
>>
>> What the value for
>>     Rx_a sdiv Rx_b -> ?
>> where Rx_a = INT64_MIN and Rx_b = -1?
>>
>> Should we just do
>>     INT64_MIN sdiv -1 -> -1
>> or some other values?
> What happens for BPF_NEG INT64_MIN?

Right. This is equivalent to INT64_MIN/-1. Indeed, we need to check for and protect against this case as well.



* Re: Kernel oops caused by signed divide
  2024-09-10 14:21   ` Yonghong Song
  2024-09-10 14:44     ` Dave Thaler
@ 2024-09-10 15:21     ` Alexei Starovoitov
  2024-09-10 18:02       ` Yonghong Song
  1 sibling, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2024-09-10 15:21 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Zac Ecob, Daniel Borkmann, bpf@vger.kernel.org

On Tue, Sep 10, 2024 at 7:21 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 9/9/24 10:29 AM, Alexei Starovoitov wrote:
> > On Mon, Sep 9, 2024 at 10:21 AM Zac Ecob <zacecob@protonmail.com> wrote:
> >> Hello,
> >>
> >> I recently received a kernel 'oops' about a divide error.
> >> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.
> >>
> >> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).
> >>
> >>
> >> Apologies if this is already known / not a relevant concern.
> > Thanks for the report. This is a new issue.
> >
> > Yonghong,
> >
> > it's related to the new signed div insn.
> > It sounds like we need to update chk_and_div[] part of
> > the verifier to account for signed div differently.
>
> In verifier, we have
>    /* [R,W]x div 0 -> 0 */
>    /* [R,W]x mod 0 -> [R,W]x */

The verifier is doing what the hardware does. In this case, that is the arm64 behavior.

> What the value for
>    Rx_a sdiv Rx_b -> ?
> where Rx_a = INT64_MIN and Rx_b = -1?

Why does it matter what Rx_a contains?

What do CPUs do in this case?

> Should we just do
>    INT64_MIN sdiv -1 -> -1
> or some other values?
>


* Re: Kernel oops caused by signed divide
  2024-09-10 15:18       ` Yonghong Song
@ 2024-09-10 15:21         ` Alexei Starovoitov
  2024-09-10 18:12           ` Yonghong Song
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2024-09-10 15:21 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Dave Thaler, Zac Ecob, Daniel Borkmann, bpf

On Tue, Sep 10, 2024 at 8:18 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 9/10/24 7:44 AM, Dave Thaler wrote:
> > Yonghong Song wrote:
> > [...]
> >> In verifier, we have
> >>     /* [R,W]x div 0 -> 0 */
> >>     /* [R,W]x mod 0 -> [R,W]x */
> >>
> >> What the value for
> >>     Rx_a sdiv Rx_b -> ?
> >> where Rx_a = INT64_MIN and Rx_b = -1?
> >>
> >> Should we just do
> >>     INT64_MIN sdiv -1 -> -1
> >> or some other values?
> > What happens for BPF_NEG INT64_MIN?
>
> Right. This is equivalent to INT64_MIN/-1. Indeed, we need check and protect for this case as well.

Why? What's wrong with bpf_neg -1?


* Re: Kernel oops caused by signed divide
  2024-09-10 15:21     ` Alexei Starovoitov
@ 2024-09-10 18:02       ` Yonghong Song
  2024-09-10 18:25         ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2024-09-10 18:02 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Zac Ecob, Daniel Borkmann, bpf@vger.kernel.org


On 9/10/24 8:21 AM, Alexei Starovoitov wrote:
> On Tue, Sep 10, 2024 at 7:21 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 9/9/24 10:29 AM, Alexei Starovoitov wrote:
>>> On Mon, Sep 9, 2024 at 10:21 AM Zac Ecob <zacecob@protonmail.com> wrote:
>>>> Hello,
>>>>
>>>> I recently received a kernel 'oops' about a divide error.
>>>> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.
>>>>
>>>> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).
>>>>
>>>>
>>>> Apologies if this is already known / not a relevant concern.
>>> Thanks for the report. This is a new issue.
>>>
>>> Yonghong,
>>>
>>> it's related to the new signed div insn.
>>> It sounds like we need to update chk_and_div[] part of
>>> the verifier to account for signed div differently.
>> In verifier, we have
>>     /* [R,W]x div 0 -> 0 */
>>     /* [R,W]x mod 0 -> [R,W]x */
> the verifier is doing what hw does. In this case this is arm64 behavior.

Okay, I see. I tried on an arm64 machine and it indeed behaves like the above.

# uname -a
Linux ... #1 SMP PREEMPT_DYNAMIC Thu Aug  1 06:58:32 PDT 2024 aarch64 aarch64 aarch64 GNU/Linux
# cat t2.c
#include <stdio.h>
#include <limits.h>
int main(void) {
   volatile long long a = 5;
   volatile long long b = 0;
   printf("a/b = %lld\n", a/b);
   return 0;
}
# cat t3.c
#include <stdio.h>
#include <limits.h>
int main(void) {
   volatile long long a = 5;
   volatile long long b = 0;
   printf("a%%b = %lld\n", a%b);
   return 0;
}
# gcc -O2 t2.c && ./a.out
a/b = 0
# gcc -O2 t3.c && ./a.out
a%b = 5

On arm64, a clang18-compiled binary gives the same result:

# clang -O2 t2.c && ./a.out
a/b = 0
# clang -O2 t3.c && ./a.out
a%b = 5

The same source code, compiled on x86_64 with -O2 as well,
generates:
   Floating point exception (core dumped)

>
>> What the value for
>>     Rx_a sdiv Rx_b -> ?
>> where Rx_a = INT64_MIN and Rx_b = -1?
> Why does it matter what Rx_a contains ?

It does matter. See below:

on arm64:

# cat t1.c
#include <stdio.h>
#include <limits.h>
int main(void) {
   volatile long long a = LLONG_MIN;
   volatile long long b = -1;
   printf("a/b = %lld\n", a/b);
   return 0;
}
# clang -O2 t1.c && ./a.out
a/b = -9223372036854775808
# gcc -O2 t1.c && ./a.out
a/b = -9223372036854775808

So the result of a/b is LLONG_MIN.

The same code will cause an exception on x86_64:

$ uname -a
Linux ... #1 SMP Wed Jun  5 06:21:21 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
[yhs@devvm1513.prn0 ~]$ gcc -O2 t1.c && ./a.out
Floating point exception (core dumped)
[yhs@devvm1513.prn0 ~]$ clang -O2 t1.c && ./a.out
Floating point exception (core dumped)

So this is what we care about.

So I guess we can follow arm64 result too.

>
> What cpus do in this case?

See above. arm64 produces *some* result while x86_64 raises an exception.
We do need to special-case LLONG_MIN/(-1).
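
As a minimal sketch of what such special handling could look like in plain C (this is not the kernel's code, and the helper names safe_sdiv64 and safe_smod64 are hypothetical): the divide-by-zero results mirror the verifier comments quoted above, the LLONG_MIN/(-1) quotient follows the arm64 output shown above, and the remainder of 0 for a divisor of -1 is an assumption based on the fact that every integer is a multiple of -1.

```c
#include <stdint.h>

/* Hypothetical helper: signed 64-bit division that never traps. */
static int64_t safe_sdiv64(int64_t dividend, int64_t divisor)
{
    if (divisor == 0)
        return 0;                 /* mirrors "[R,W]x div 0 -> 0" */
    if (divisor == -1)
        /*
         * Negate in unsigned arithmetic to avoid the trapping divide.
         * INT64_MIN maps to itself, matching the arm64 result above.
         * The conversion back to int64_t is implementation-defined
         * before C23; gcc and clang wrap modulo 2^64.
         */
        return (int64_t)(0 - (uint64_t)dividend);
    return dividend / divisor;
}

/* Hypothetical helper: signed 64-bit remainder that never traps. */
static int64_t safe_smod64(int64_t dividend, int64_t divisor)
{
    if (divisor == 0)
        return dividend;          /* mirrors "[R,W]x mod 0 -> [R,W]x" */
    if (divisor == -1)
        return 0;                 /* assumption: x mod -1 is always 0 */
    return dividend % divisor;
}
```

The divisor == -1 test comes before the hardware divide, so the one operand pair that raises the exception on x86_64 never reaches it.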

>
>> Should we just do
>>     INT64_MIN sdiv -1 -> -1
>> or some other values?
>>


* Re: Kernel oops caused by signed divide
  2024-09-10 15:21         ` Alexei Starovoitov
@ 2024-09-10 18:12           ` Yonghong Song
  0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2024-09-10 18:12 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Dave Thaler, Zac Ecob, Daniel Borkmann, bpf


On 9/10/24 8:21 AM, Alexei Starovoitov wrote:
> On Tue, Sep 10, 2024 at 8:18 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 9/10/24 7:44 AM, Dave Thaler wrote:
>>> Yonghong Song wrote:
>>> [...]
>>>> In verifier, we have
>>>>      /* [R,W]x div 0 -> 0 */
>>>>      /* [R,W]x mod 0 -> [R,W]x */
>>>>
>>>> What the value for
>>>>      Rx_a sdiv Rx_b -> ?
>>>> where Rx_a = INT64_MIN and Rx_b = -1?
>>>>
>>>> Should we just do
>>>>      INT64_MIN sdiv -1 -> -1
>>>> or some other values?
>>> What happens for BPF_NEG INT64_MIN?
>> Right. This is equivalent to INT64_MIN/-1. Indeed, we need check and protect for this case as well.
> why? what's wrong with bpf_neg -1 ?

I think you are right. 'bpf_neg <num>' should not cause any exception.
In this particular case 'bpf_neg LLONG_MIN' equals LLONG_MIN.

On arm64,

# cat t4.c
#include <stdio.h>
#include <limits.h>
int main(void) {
   volatile long long a = LLONG_MIN;
   printf("-a = %lld\n", -a);
   return 0;
}
# gcc -O2 t4.c && ./a.out
-a = -9223372036854775808

In the above -a also equals LLONG_MIN.

On x86, we get the same result.

$ uname -a
Linux ... #1 SMP Wed Jun  5 06:21:21 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
$ cat t4.c
#include <stdio.h>
#include <limits.h>
int main(void) {
   volatile long long a = LLONG_MIN;
   printf("-a = %lld\n", -a);
   return 0;
}
$ gcc -O2 t4.c && ./a.out
-a = -9223372036854775808
$ clang -O2 t4.c && ./a.out
-a = -9223372036854775808
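
Strictly speaking, -a with a == LLONG_MIN is signed overflow in C, so the outputs above show what these compilers and CPUs happen to do rather than something the language guarantees. A minimal sketch of the same negation done in unsigned arithmetic follows; the helper name wrapping_neg64 is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: two's-complement negation that wraps instead of
 * overflowing. The subtraction is done on uint64_t, where wraparound is
 * well defined. The conversion back to int64_t is implementation-defined
 * before C23; gcc and clang wrap modulo 2^64.
 */
static int64_t wrapping_neg64(int64_t v)
{
    return (int64_t)(0 - (uint64_t)v);
}
```

Under those assumptions, wrapping_neg64(INT64_MIN) returns INT64_MIN, which agrees with the printed results, and no division instruction is involved.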





* Re: Kernel oops caused by signed divide
  2024-09-10 18:02       ` Yonghong Song
@ 2024-09-10 18:25         ` Alexei Starovoitov
  2024-09-10 19:32           ` Yonghong Song
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2024-09-10 18:25 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Zac Ecob, Daniel Borkmann, bpf@vger.kernel.org

On Tue, Sep 10, 2024 at 11:02 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 9/10/24 8:21 AM, Alexei Starovoitov wrote:
> > On Tue, Sep 10, 2024 at 7:21 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>
> >> On 9/9/24 10:29 AM, Alexei Starovoitov wrote:
> >>> On Mon, Sep 9, 2024 at 10:21 AM Zac Ecob <zacecob@protonmail.com> wrote:
> >>>> Hello,
> >>>>
> >>>> I recently received a kernel 'oops' about a divide error.
> >>>> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.
> >>>>
> >>>> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).
> >>>>
> >>>>
> >>>> Apologies if this is already known / not a relevant concern.
> >>> Thanks for the report. This is a new issue.
> >>>
> >>> Yonghong,
> >>>
> >>> it's related to the new signed div insn.
> >>> It sounds like we need to update chk_and_div[] part of
> >>> the verifier to account for signed div differently.
> >> In verifier, we have
> >>     /* [R,W]x div 0 -> 0 */
> >>     /* [R,W]x mod 0 -> [R,W]x */
> > the verifier is doing what hw does. In this case this is arm64 behavior.
>
> > Okay, I see. I tried on an arm64 machine and it indeed behaves like the above.
>
> # uname -a
> Linux ... #1 SMP PREEMPT_DYNAMIC Thu Aug  1 06:58:32 PDT 2024 aarch64 aarch64 aarch64 GNU/Linux
> # cat t2.c
> #include <stdio.h>
> #include <limits.h>
> int main(void) {
>    volatile long long a = 5;
>    volatile long long b = 0;
>    printf("a/b = %lld\n", a/b);
>    return 0;
> }
> # cat t3.c
> #include <stdio.h>
> #include <limits.h>
> int main(void) {
>    volatile long long a = 5;
>    volatile long long b = 0;
>    printf("a%%b = %lld\n", a%b);
>    return 0;
> }
> # gcc -O2 t2.c && ./a.out
> a/b = 0
> # gcc -O2 t3.c && ./a.out
> a%b = 5
>
> on arm64, clang18 compiled binary has the same result
>
> # clang -O2 t2.c && ./a.out
> a/b = 0
> # clang -O2 t3.c && ./a.out
> a%b = 5
>
> The same source code, compiled on x86_64 with -O2 as well,
> it generates:
>    Floating point exception (core dumped)
>
> >
> >> What the value for
> >>     Rx_a sdiv Rx_b -> ?
> >> where Rx_a = INT64_MIN and Rx_b = -1?
> > Why does it matter what Rx_a contains ?
>
> It does matter. See below:
>
> on arm64:
>
> # cat t1.c
> #include <stdio.h>
> #include <limits.h>
> int main(void) {
>    volatile long long a = LLONG_MIN;
>    volatile long long b = -1;
>    printf("a/b = %lld\n", a/b);
>    return 0;
> }
> # clang -O2 t1.c && ./a.out
> a/b = -9223372036854775808
> # gcc -O2 t1.c && ./a.out
> a/b = -9223372036854775808
>
> So the result of a/b is LLONG_MIN
>
> The same code will cause exception on x86_64:
>
> $ uname -a
> Linux ... #1 SMP Wed Jun  5 06:21:21 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
> [yhs@devvm1513.prn0 ~]$ gcc -O2 t1.c && ./a.out
> Floating point exception (core dumped)
> [yhs@devvm1513.prn0 ~]$ clang -O2 t1.c && ./a.out
> Floating point exception (core dumped)
>
> So this is what we care about.
>
> So I guess we can follow arm64 result too.
>
> >
> > What cpus do in this case?
>
> See above. arm64 produces *some* result while x64 cause exception.
> We do need to special handle for LLONG_MIN/(-1) case.

My point about Rx_a is that idiv will cause an out-of-range exception
for many values other than Rx_a == INT64_MIN.
I'm not sure that divisor -1 is the only such case either.
It probably is, since intuitively -2 and all other divisors should fit fine.
So the check likely needs Rx_b == -1 and a check for the high bit in Rx_a?


* Re: Kernel oops caused by signed divide
  2024-09-10 18:25         ` Alexei Starovoitov
@ 2024-09-10 19:32           ` Yonghong Song
  2024-09-10 21:53             ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2024-09-10 19:32 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Zac Ecob, Daniel Borkmann, bpf@vger.kernel.org


On 9/10/24 11:25 AM, Alexei Starovoitov wrote:
> On Tue, Sep 10, 2024 at 11:02 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 9/10/24 8:21 AM, Alexei Starovoitov wrote:
>>> On Tue, Sep 10, 2024 at 7:21 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> On 9/9/24 10:29 AM, Alexei Starovoitov wrote:
>>>>> On Mon, Sep 9, 2024 at 10:21 AM Zac Ecob <zacecob@protonmail.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I recently received a kernel 'oops' about a divide error.
>>>>>> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.
>>>>>>
>>>>>> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).
>>>>>>
>>>>>>
>>>>>> Apologies if this is already known / not a relevant concern.
>>>>> Thanks for the report. This is a new issue.
>>>>>
>>>>> Yonghong,
>>>>>
>>>>> it's related to the new signed div insn.
>>>>> It sounds like we need to update chk_and_div[] part of
>>>>> the verifier to account for signed div differently.
>>>> In verifier, we have
>>>>      /* [R,W]x div 0 -> 0 */
>>>>      /* [R,W]x mod 0 -> [R,W]x */
>>> the verifier is doing what hw does. In this case this is arm64 behavior.
>> Okay, I see. I tried on an arm64 machine and it indeed behaves like the above.
>>
>> # uname -a
>> Linux ... #1 SMP PREEMPT_DYNAMIC Thu Aug  1 06:58:32 PDT 2024 aarch64 aarch64 aarch64 GNU/Linux
>> # cat t2.c
>> #include <stdio.h>
>> #include <limits.h>
>> int main(void) {
>>     volatile long long a = 5;
>>     volatile long long b = 0;
>>     printf("a/b = %lld\n", a/b);
>>     return 0;
>> }
>> # cat t3.c
>> #include <stdio.h>
>> #include <limits.h>
>> int main(void) {
>>     volatile long long a = 5;
>>     volatile long long b = 0;
>>     printf("a%%b = %lld\n", a%b);
>>     return 0;
>> }
>> # gcc -O2 t2.c && ./a.out
>> a/b = 0
>> # gcc -O2 t3.c && ./a.out
>> a%b = 5
>>
>> on arm64, clang18 compiled binary has the same result
>>
>> # clang -O2 t2.c && ./a.out
>> a/b = 0
>> # clang -O2 t3.c && ./a.out
>> a%b = 5
>>
>> The same source code, compiled on x86_64 with -O2 as well,
>> it generates:
>>     Floating point exception (core dumped)
>>
>>>> What the value for
>>>>      Rx_a sdiv Rx_b -> ?
>>>> where Rx_a = INT64_MIN and Rx_b = -1?
>>> Why does it matter what Rx_a contains ?
>> It does matter. See below:
>>
>> on arm64:
>>
>> # cat t1.c
>> #include <stdio.h>
>> #include <limits.h>
>> int main(void) {
>>     volatile long long a = LLONG_MIN;
>>     volatile long long b = -1;
>>     printf("a/b = %lld\n", a/b);
>>     return 0;
>> }
>> # clang -O2 t1.c && ./a.out
>> a/b = -9223372036854775808
>> # gcc -O2 t1.c && ./a.out
>> a/b = -9223372036854775808
>>
>> So the result of a/b is LLONG_MIN
>>
>> The same code will cause exception on x86_64:
>>
>> $ uname -a
>> Linux ... #1 SMP Wed Jun  5 06:21:21 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
>> [yhs@devvm1513.prn0 ~]$ gcc -O2 t1.c && ./a.out
>> Floating point exception (core dumped)
>> [yhs@devvm1513.prn0 ~]$ clang -O2 t1.c && ./a.out
>> Floating point exception (core dumped)
>>
>> So this is what we care about.
>>
>> So I guess we can follow arm64 result too.
>>
>>> What cpus do in this case?
>> See above. arm64 produces *some* result while x64 cause exception.
>> We do need to special handle for LLONG_MIN/(-1) case.
> My point about Rx_a that idiv will cause out-of-range exception
> for many other values than Rx_a == INT64_MIN.
> I'm not sure that divisor -1 is the only such case either.
> Probably is, since intuitively -2 and all other divisors should fit fine.
> So the check likely needs Rx_b == -1 and a check for high bit in Rx_a ?

Looks like only Rx_a == INT64_MIN may cause the problem.
All other Rx_a numbers (from INT64_MIN+1 to INT64_MAX)
should be okay. Some selective testing below on x64 host:

$ cat t5.c
#include <stdio.h>
#include <limits.h>

unsigned long long res;
int main(void) {
   volatile long long a;
   long long i;
   for (i = LLONG_MIN + 1; i <= LLONG_MIN + 100; i++) {
     volatile long long b = -1;
     a = i;
     res += (unsigned long long)(a/b);
   }
   for (i = LLONG_MAX - 100; i <= LLONG_MAX - 1; i++) {
     volatile long long b = -1;
     a = i;
     res += (unsigned long long)(a/b);
   }
   printf("res = %llx\n", res);
   return 0;
}
$ gcc -O2 t5.c && ./a.out
res = 64

So I think it should be okay if the range is from LLONG_MIN + 1
to LLONG_MAX - 1.

Now for LLONG_MAX/(-1)

$ cat t6.c
#include <stdio.h>
#include <limits.h>
int main(void) {
   volatile long long a = LLONG_MAX;
   volatile long long b = -1;
   printf("a/b = %lld\n", a/b);
   return 0;
}
$ gcc -O2 t6.c && ./a.out
a/b = -9223372036854775807

It is okay too. So I think LLONG_MIN/(-1) is the only case
we should take care of.
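
The selective testing above cannot cover all 64-bit operands, but the same question can be checked exhaustively at a smaller width. The sketch below uses int8_t as a stand-in for int64_t (an assumption made only to keep the search space small; the function name is hypothetical). It counts the operand pairs whose exact quotient does not fit in the signed type.

```c
#include <stdint.h>

/*
 * Hypothetical check: count (dividend, divisor) pairs with a nonzero
 * divisor whose exact quotient is not representable in int8_t.
 * Operands are held in int, so the division itself cannot overflow.
 */
static int count_int8_quotient_overflows(void)
{
    int count = 0;

    for (int a = INT8_MIN; a <= INT8_MAX; a++) {
        for (int b = INT8_MIN; b <= INT8_MAX; b++) {
            if (b == 0)
                continue;         /* divide by zero is handled separately */
            int q = a / b;
            if (q < INT8_MIN || q > INT8_MAX)
                count++;
        }
    }
    return count;
}
```

The only pair counted is a == INT8_MIN with b == -1, the 8-bit analogue of LLONG_MIN/(-1).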



* Re: Kernel oops caused by signed divide
  2024-09-10 19:32           ` Yonghong Song
@ 2024-09-10 21:53             ` Alexei Starovoitov
  2024-09-10 22:00               ` Yonghong Song
  2024-09-10 22:43               ` Andrii Nakryiko
  0 siblings, 2 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2024-09-10 21:53 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Zac Ecob, Daniel Borkmann, bpf@vger.kernel.org

On Tue, Sep 10, 2024 at 12:32 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 9/10/24 11:25 AM, Alexei Starovoitov wrote:
> > On Tue, Sep 10, 2024 at 11:02 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>
> >> On 9/10/24 8:21 AM, Alexei Starovoitov wrote:
> >>> On Tue, Sep 10, 2024 at 7:21 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>>> On 9/9/24 10:29 AM, Alexei Starovoitov wrote:
> >>>>> On Mon, Sep 9, 2024 at 10:21 AM Zac Ecob <zacecob@protonmail.com> wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> I recently received a kernel 'oops' about a divide error.
> >>>>>> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.
> >>>>>>
> >>>>>> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).
> >>>>>>
> >>>>>>
> >>>>>> Apologies if this is already known / not a relevant concern.
> >>>>> Thanks for the report. This is a new issue.
> >>>>>
> >>>>> Yonghong,
> >>>>>
> >>>>> it's related to the new signed div insn.
> >>>>> It sounds like we need to update chk_and_div[] part of
> >>>>> the verifier to account for signed div differently.
> >>>> In verifier, we have
> >>>>      /* [R,W]x div 0 -> 0 */
> >>>>      /* [R,W]x mod 0 -> [R,W]x */
> >>> the verifier is doing what hw does. In this case this is arm64 behavior.
> >> Okay, I see. I tried on an arm64 machine and it indeed behaves like the above.
> >>
> >> # uname -a
> >> Linux ... #1 SMP PREEMPT_DYNAMIC Thu Aug  1 06:58:32 PDT 2024 aarch64 aarch64 aarch64 GNU/Linux
> >> # cat t2.c
> >> #include <stdio.h>
> >> #include <limits.h>
> >> int main(void) {
> >>     volatile long long a = 5;
> >>     volatile long long b = 0;
> >>     printf("a/b = %lld\n", a/b);
> >>     return 0;
> >> }
> >> # cat t3.c
> >> #include <stdio.h>
> >> #include <limits.h>
> >> int main(void) {
> >>     volatile long long a = 5;
> >>     volatile long long b = 0;
> >>     printf("a%%b = %lld\n", a%b);
> >>     return 0;
> >> }
> >> # gcc -O2 t2.c && ./a.out
> >> a/b = 0
> >> # gcc -O2 t3.c && ./a.out
> >> a%b = 5
> >>
> >> on arm64, clang18 compiled binary has the same result
> >>
> >> # clang -O2 t2.c && ./a.out
> >> a/b = 0
> >> # clang -O2 t3.c && ./a.out
> >> a%b = 5
> >>
> >> The same source code, compiled on x86_64 with -O2 as well,
> >> it generates:
> >>     Floating point exception (core dumped)
> >>
> >>>> What the value for
> >>>>      Rx_a sdiv Rx_b -> ?
> >>>> where Rx_a = INT64_MIN and Rx_b = -1?
> >>> Why does it matter what Rx_a contains ?
> >> It does matter. See below:
> >>
> >> on arm64:
> >>
> >> # cat t1.c
> >> #include <stdio.h>
> >> #include <limits.h>
> >> int main(void) {
> >>     volatile long long a = LLONG_MIN;
> >>     volatile long long b = -1;
> >>     printf("a/b = %lld\n", a/b);
> >>     return 0;
> >> }
> >> # clang -O2 t1.c && ./a.out
> >> a/b = -9223372036854775808
> >> # gcc -O2 t1.c && ./a.out
> >> a/b = -9223372036854775808
> >>
> >> So the result of a/b is LLONG_MIN
> >>
> >> The same code will cause exception on x86_64:
> >>
> >> $ uname -a
> >> Linux ... #1 SMP Wed Jun  5 06:21:21 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
> >> [yhs@devvm1513.prn0 ~]$ gcc -O2 t1.c && ./a.out
> >> Floating point exception (core dumped)
> >> [yhs@devvm1513.prn0 ~]$ clang -O2 t1.c && ./a.out
> >> Floating point exception (core dumped)
> >>
> >> So this is what we care about.
> >>
> >> So I guess we can follow arm64 result too.
> >>
> >>> What cpus do in this case?
> >> See above. arm64 produces *some* result while x64 cause exception.
> >> We do need to special handle for LLONG_MIN/(-1) case.
> > My point about Rx_a that idiv will cause out-of-range exception
> > for many other values than Rx_a == INT64_MIN.
> > I'm not sure that divisor -1 is the only such case either.
> > Probably is, since intuitively -2 and all other divisors should fit fine.
> > So the check likely needs Rx_b == -1 and a check for high bit in Rx_a ?
>
> Looks like only Rx_a == INT64_MIN may cause the problem.
> All other Rx_a numbers (from INT64_MIN+1 to INT64_MAX)
> should be okay. Some selective testing below on x64 host:
>
> $ cat t5.c
> #include <stdio.h>
> #include <limits.h>
>
> unsigned long long res;
> int main(void) {
>    volatile long long a;
>    long long i;
>    for (i = LLONG_MIN + 1; i <= LLONG_MIN + 100; i++) {
>      volatile long long b = -1;
>      a = i;
>      res += (unsigned long long)(a/b);
>    }
>    for (i = LLONG_MAX - 100; i <= LLONG_MAX - 1; i++) {

Changing this test to i <= LLONG_MAX
and compiling with gcc -O0 or clang -O2 or clang -O0
causes an exception,
because 'a' becomes LLONG_MIN.
Compilers are doing some odd code gen.
I don't understand how 'i' can wrap this way.
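
One likely explanation: for a long long 'i', the condition i <= LLONG_MAX is always true, so the loop never exits on its own, and the i++ executed at LLONG_MAX is signed overflow, which is undefined behavior in C. A wrap to LLONG_MIN on that increment would be consistent with 'a' becoming LLONG_MIN. A minimal sketch of an inclusive loop that avoids the overflow follows; the helper name count_inclusive is hypothetical.

```c
#include <limits.h>

/*
 * Hypothetical helper: visit every value in [lo, hi] inclusive, even
 * when hi == LLONG_MAX. The exit test runs before the increment, so
 * i++ is never executed once i has reached hi.
 */
static unsigned long long count_inclusive(long long lo, long long hi)
{
    unsigned long long n = 0;

    if (lo > hi)
        return 0;
    for (long long i = lo; ; i++) {
        n++;                      /* loop body goes here */
        if (i == hi)
            break;
    }
    return n;
}
```

With this loop shape, LLONG_MAX itself can be included in the range and 'i' never wraps.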

>      volatile long long b = -1;
>      a = i;
>      res += (unsigned long long)(a/b);
>    }
>    printf("res = %llx\n", res);
>    return 0;
> }
> $ gcc -O2 t5.c && ./a.out
> res = 64
>
> So I think it should be okay if the range is from LLONG_MIN + 1
> to LLONG_MAX - 1.
>
> Now for LLONG_MAX/(-1)
>
> $ cat t6.c
> #include <stdio.h>
> #include <limits.h>
> int main(void) {
>    volatile long long a = LLONG_MAX;
>    volatile long long b = -1;
>    printf("a/b = %lld\n", a/b);
>    return 0;
> }
> $ gcc -O2 t6.c && ./a.out
> a/b = -9223372036854775807
>
> It is okay too. So I think LLONG_MIN/(-1) is the only case
> we should take care of.

The test shows that that's the case, but I still can't wrap
my head around the fact that only LLONG_MIN/(-1) is a problem.

Any math experts can explain this?
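
One way to see it, as a sketch assuming 64-bit two's complement operands and truncating division (the symbols a, b and q are introduced here only for illustration):

```latex
\begin{align*}
&\text{Let } a, b \in [-2^{63},\ 2^{63}-1],\quad b \neq 0,\quad q = \operatorname{trunc}(a/b). \\
&|b| \ge 2: \quad |q| \le \frac{|a|}{|b|} \le \frac{2^{63}}{2} = 2^{62} \le 2^{63}-1,
  \text{ so } q \text{ fits.} \\
&b = 1: \quad q = a, \text{ so } q \text{ fits.} \\
&b = -1: \quad q = -a \in [-(2^{63}-1),\ 2^{63}],
  \text{ and } q = 2^{63} \text{ only when } a = -2^{63}.
\end{align*}
```

So the quotient leaves the signed 64-bit range for exactly one operand pair, a = LLONG_MIN with b = -1. A high bit set in the dividend is not enough on its own, because every other negative dividend negates to a representable positive value.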


* Re: Kernel oops caused by signed divide
  2024-09-10 21:53             ` Alexei Starovoitov
@ 2024-09-10 22:00               ` Yonghong Song
  2024-09-10 22:43               ` Andrii Nakryiko
  1 sibling, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2024-09-10 22:00 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Zac Ecob, Daniel Borkmann, bpf@vger.kernel.org


On 9/10/24 2:53 PM, Alexei Starovoitov wrote:
> On Tue, Sep 10, 2024 at 12:32 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 9/10/24 11:25 AM, Alexei Starovoitov wrote:
>>> On Tue, Sep 10, 2024 at 11:02 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> On 9/10/24 8:21 AM, Alexei Starovoitov wrote:
>>>>> On Tue, Sep 10, 2024 at 7:21 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>>> On 9/9/24 10:29 AM, Alexei Starovoitov wrote:
>>>>>>> On Mon, Sep 9, 2024 at 10:21 AM Zac Ecob <zacecob@protonmail.com> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I recently received a kernel 'oops' about a divide error.
>>>>>>>> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.
>>>>>>>>
>>>>>>>> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).
>>>>>>>>
>>>>>>>>
>>>>>>>> Apologies if this is already known / not a relevant concern.
>>>>>>> Thanks for the report. This is a new issue.
>>>>>>>
>>>>>>> Yonghong,
>>>>>>>
>>>>>>> it's related to the new signed div insn.
>>>>>>> It sounds like we need to update chk_and_div[] part of
>>>>>>> the verifier to account for signed div differently.
>>>>>> In verifier, we have
>>>>>>       /* [R,W]x div 0 -> 0 */
>>>>>>       /* [R,W]x mod 0 -> [R,W]x */
>>>>> the verifier is doing what hw does. In this case this is arm64 behavior.
>>>> Okay, I see. I tried on a arm64 machine it indeed hehaves like the above.
>>>>
>>>> # uname -a
>>>> Linux ... #1 SMP PREEMPT_DYNAMIC Thu Aug  1 06:58:32 PDT 2024 aarch64 aarch64 aarch64 GNU/Linux
>>>> # cat t2.c
>>>> #include <stdio.h>
>>>> #include <limits.h>
>>>> int main(void) {
>>>>      volatile long long a = 5;
>>>>      volatile long long b = 0;
>>>>      printf("a/b = %lld\n", a/b);
>>>>      return 0;
>>>> }
>>>> # cat t3.c
>>>> #include <stdio.h>
>>>> #include <limits.h>
>>>> int main(void) {
>>>>      volatile long long a = 5;
>>>>      volatile long long b = 0;
>>>>      printf("a%%b = %lld\n", a%b);
>>>>      return 0;
>>>> }
>>>> # gcc -O2 t2.c && ./a.out
>>>> a/b = 0
>>>> # gcc -O2 t3.c && ./a.out
>>>> a%b = 5
>>>>
>>>> on arm64, clang18 compiled binary has the same result
>>>>
>>>> # clang -O2 t2.c && ./a.out
>>>> a/b = 0
>>>> # clang -O2 t3.c && ./a.out
>>>> a%b = 5
>>>>
>>>> The same source code, compiled on x86_64 with -O2 as well,
>>>> it generates:
>>>>      Floating point exception (core dumped)
>>>>
>>>>>> What the value for
>>>>>>       Rx_a sdiv Rx_b -> ?
>>>>>> where Rx_a = INT64_MIN and Rx_b = -1?
>>>>> Why does it matter what Rx_a contains ?
>>>> It does matter. See below:
>>>>
>>>> on arm64:
>>>>
>>>> # cat t1.c
>>>> #include <stdio.h>
>>>> #include <limits.h>
>>>> int main(void) {
>>>>      volatile long long a = LLONG_MIN;
>>>>      volatile long long b = -1;
>>>>      printf("a/b = %lld\n", a/b);
>>>>      return 0;
>>>> }
>>>> # clang -O2 t1.c && ./a.out
>>>> a/b = -9223372036854775808
>>>> # gcc -O2 t1.c && ./a.out
>>>> a/b = -9223372036854775808
>>>>
>>>> So the result of a/b is LLONG_MIN
>>>>
>>>> The same code will cause exception on x86_64:
>>>>
>>>> $ uname -a
>>>> Linux ... #1 SMP Wed Jun  5 06:21:21 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
>>>> [yhs@devvm1513.prn0 ~]$ gcc -O2 t1.c && ./a.out
>>>> Floating point exception (core dumped)
>>>> [yhs@devvm1513.prn0 ~]$ clang -O2 t1.c && ./a.out
>>>> Floating point exception (core dumped)
>>>>
>>>> So this is what we care about.
>>>>
>>>> So I guess we can follow arm64 result too.
>>>>
>>>>> What cpus do in this case?
>>>> See above. arm64 produces *some* result while x64 cause exception.
>>>> We do need to special handle for LLONG_MIN/(-1) case.
>>> My point about Rx_a that idiv will cause out-of-range exception
>>> for many other values than Rx_a == INT64_MIN.
>>> I'm not sure that divisor -1 is the only such case either.
>>> Probably is, since intuitively -2 and all other divisors should fit fine.
>>> So the check likely needs Rx_b == -1 and a check for high bit in Rx_a ?
>> Looks like only Rx_a == INT64_MIN may cause the problem.
>> All other Rx_a numbers (from INT64_MIN+1 to INT64_MAX)
>> should be okay. Some selective testing below on x64 host:
>>
>> $ cat t5.c
>> #include <stdio.h>
>> #include <limits.h>
>>
>> unsigned long long res;
>> int main(void) {
>>     volatile long long a;
>>     long long i;
>>     for (i = LLONG_MIN + 1; i <= LLONG_MIN + 100; i++) {
>>       volatile long long b = -1;
>>       a = i;
>>       res += (unsigned long long)(a/b);
>>     }
>>     for (i = LLONG_MAX - 100; i <= LLONG_MAX - 1; i++) {
> Changing this test to i <= LLONG_MAX
> and compiling with gcc -O0 or clang -O2 or clang -O0
> is causing an exception,
> because 'a' becomes LLONG_MIN.

This is my theory.
If the test is changed to i <= LLONG_MAX, then after the
iteration with i = LLONG_MAX the loop still does 'i++'.
That is where it becomes tricky: the increment goes out of
range for a signed long long, so undefined behavior
kicks in.
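
A minimal sketch of how a loop can cover an inclusive range that ends
at LLONG_MAX without ever executing the overflowing 'i++'. The helper
name count_inclusive is hypothetical and is not part of t5.c; it only
counts iterations so the behavior is easy to check.

```c
#include <limits.h>

/*
 * Hypothetical helper (not from t5.c): visit every value in the
 * inclusive range [lo, hi] and return how many values were visited.
 * The exit test runs before the increment, so 'i++' is never
 * executed when i == hi, even if hi == LLONG_MAX.
 */
unsigned long long count_inclusive(long long lo, long long hi)
{
    unsigned long long n = 0;
    long long i = lo;

    if (lo > hi)
        return 0;           /* empty range */

    for (;;) {
        n++;                /* the loop body would go here */
        if (i == hi)
            break;          /* stop before incrementing past hi */
        i++;
    }
    return n;
}
```

With a plain 'for (...; i <= LLONG_MAX; i++)' the condition is always
true for a long long, so the loop can only end through the overflow.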

> Compilers are doing some odd code gen.
> I don't understand how 'i' can wrap this way.
>
>>       volatile long long b = -1;
>>       a = i;
>>       res += (unsigned long long)(a/b);
>>     }
>>     printf("res = %llx\n", res);
>>     return 0;
>> }
>> $ gcc -O2 t5.c && ./a.out
>> res = 64
>>
>> So I think it should be okay if the range is from LLONG_MIN + 1
>> to LLONG_MAX - 1.
>>
>> Now for LLONG_MAX/(-1)
>>
>> $ cat t6.c
>> #include <stdio.h>
>> #include <limits.h>
>> int main(void) {
>>     volatile long long a = LLONG_MAX;
>>     volatile long long b = -1;
>>     printf("a/b = %lld\n", a/b);
>>     return 0;
>> }
>> $ gcc -O2 t6.c && ./a.out
>> a/b = -9223372036854775807
>>
>> It is okay too. So I think LLONG_MIN/(-1) is the only case
>> we should take care of.
> The test shows that that's the case, but I still can wrap
> my head around that only LLONG_MIN/(-1) is a problem.
>
> Any math experts can explain this?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Kernel oops caused by signed divide
  2024-09-10 21:53             ` Alexei Starovoitov
  2024-09-10 22:00               ` Yonghong Song
@ 2024-09-10 22:43               ` Andrii Nakryiko
  1 sibling, 0 replies; 16+ messages in thread
From: Andrii Nakryiko @ 2024-09-10 22:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yonghong Song, Zac Ecob, Daniel Borkmann, bpf@vger.kernel.org

On Tue, Sep 10, 2024 at 2:53 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Sep 10, 2024 at 12:32 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >
> >
> > On 9/10/24 11:25 AM, Alexei Starovoitov wrote:
> > > On Tue, Sep 10, 2024 at 11:02 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> > >>
> > >> On 9/10/24 8:21 AM, Alexei Starovoitov wrote:
> > >>> On Tue, Sep 10, 2024 at 7:21 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> > >>>> On 9/9/24 10:29 AM, Alexei Starovoitov wrote:
> > >>>>> On Mon, Sep 9, 2024 at 10:21 AM Zac Ecob <zacecob@protonmail.com> wrote:
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> I recently received a kernel 'oops' about a divide error.
> > >>>>>> After some research, it seems that the 'div64_s64' function used for the 'MOD'/'REM' instructions boils down to an 'idiv'.
> > >>>>>>
> > >>>>>> The 'dividend' is set to INT64_MIN, and the 'divisor' to -1, then because of two's complement, there is no corresponding positive value, causing the error (at least to my understanding).
> > >>>>>>
> > >>>>>>
> > >>>>>> Apologies if this is already known / not a relevant concern.
> > >>>>> Thanks for the report. This is a new issue.
> > >>>>>
> > >>>>> Yonghong,
> > >>>>>
> > >>>>> it's related to the new signed div insn.
> > >>>>> It sounds like we need to update chk_and_div[] part of
> > >>>>> the verifier to account for signed div differently.
> > >>>> In verifier, we have
> > >>>>      /* [R,W]x div 0 -> 0 */
> > >>>>      /* [R,W]x mod 0 -> [R,W]x */
> > >>> the verifier is doing what hw does. In this case this is arm64 behavior.
> > >> Okay, I see. I tried on a arm64 machine it indeed hehaves like the above.
> > >>
> > >> # uname -a
> > >> Linux ... #1 SMP PREEMPT_DYNAMIC Thu Aug  1 06:58:32 PDT 2024 aarch64 aarch64 aarch64 GNU/Linux
> > >> # cat t2.c
> > >> #include <stdio.h>
> > >> #include <limits.h>
> > >> int main(void) {
> > >>     volatile long long a = 5;
> > >>     volatile long long b = 0;
> > >>     printf("a/b = %lld\n", a/b);
> > >>     return 0;
> > >> }
> > >> # cat t3.c
> > >> #include <stdio.h>
> > >> #include <limits.h>
> > >> int main(void) {
> > >>     volatile long long a = 5;
> > >>     volatile long long b = 0;
> > >>     printf("a%%b = %lld\n", a%b);
> > >>     return 0;
> > >> }
> > >> # gcc -O2 t2.c && ./a.out
> > >> a/b = 0
> > >> # gcc -O2 t3.c && ./a.out
> > >> a%b = 5
> > >>
> > >> on arm64, clang18 compiled binary has the same result
> > >>
> > >> # clang -O2 t2.c && ./a.out
> > >> a/b = 0
> > >> # clang -O2 t3.c && ./a.out
> > >> a%b = 5
> > >>
> > >> The same source code, compiled on x86_64 with -O2 as well,
> > >> it generates:
> > >>     Floating point exception (core dumped)
> > >>
> > >>>> What the value for
> > >>>>      Rx_a sdiv Rx_b -> ?
> > >>>> where Rx_a = INT64_MIN and Rx_b = -1?
> > >>> Why does it matter what Rx_a contains ?
> > >> It does matter. See below:
> > >>
> > >> on arm64:
> > >>
> > >> # cat t1.c
> > >> #include <stdio.h>
> > >> #include <limits.h>
> > >> int main(void) {
> > >>     volatile long long a = LLONG_MIN;
> > >>     volatile long long b = -1;
> > >>     printf("a/b = %lld\n", a/b);
> > >>     return 0;
> > >> }
> > >> # clang -O2 t1.c && ./a.out
> > >> a/b = -9223372036854775808
> > >> # gcc -O2 t1.c && ./a.out
> > >> a/b = -9223372036854775808
> > >>
> > >> So the result of a/b is LLONG_MIN
> > >>
> > >> The same code will cause exception on x86_64:
> > >>
> > >> $ uname -a
> > >> Linux ... #1 SMP Wed Jun  5 06:21:21 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
> > >> [yhs@devvm1513.prn0 ~]$ gcc -O2 t1.c && ./a.out
> > >> Floating point exception (core dumped)
> > >> [yhs@devvm1513.prn0 ~]$ clang -O2 t1.c && ./a.out
> > >> Floating point exception (core dumped)
> > >>
> > >> So this is what we care about.
> > >>
> > >> So I guess we can follow arm64 result too.
> > >>
> > >>> What cpus do in this case?
> > >> See above. arm64 produces *some* result while x64 cause exception.
> > >> We do need to special handle for LLONG_MIN/(-1) case.
> > > My point about Rx_a that idiv will cause out-of-range exception
> > > for many other values than Rx_a == INT64_MIN.
> > > I'm not sure that divisor -1 is the only such case either.
> > > Probably is, since intuitively -2 and all other divisors should fit fine.
> > > So the check likely needs Rx_b == -1 and a check for high bit in Rx_a ?
> >
> > Looks like only Rx_a == INT64_MIN may cause the problem.
> > All other Rx_a numbers (from INT64_MIN+1 to INT64_MAX)
> > should be okay. Some selective testing below on x64 host:
> >
> > $ cat t5.c
> > #include <stdio.h>
> > #include <limits.h>
> >
> > unsigned long long res;
> > int main(void) {
> >    volatile long long a;
> >    long long i;
> >    for (i = LLONG_MIN + 1; i <= LLONG_MIN + 100; i++) {
> >      volatile long long b = -1;
> >      a = i;
> >      res += (unsigned long long)(a/b);
> >    }
> >    for (i = LLONG_MAX - 100; i <= LLONG_MAX - 1; i++) {
>
> Changing this test to i <= LLONG_MAX
> and compiling with gcc -O0 or clang -O2 or clang -O0
> is causing an exception,
> because 'a' becomes LLONG_MIN.
> Compilers are doing some odd code gen.
> I don't understand how 'i' can wrap this way.
>
> >      volatile long long b = -1;
> >      a = i;
> >      res += (unsigned long long)(a/b);
> >    }
> >    printf("res = %llx\n", res);
> >    return 0;
> > }
> > $ gcc -O2 t5.c && ./a.out
> > res = 64
> >
> > So I think it should be okay if the range is from LLONG_MIN + 1
> > to LLONG_MAX - 1.
> >
> > Now for LLONG_MAX/(-1)
> >
> > $ cat t6.c
> > #include <stdio.h>
> > #include <limits.h>
> > int main(void) {
> >    volatile long long a = LLONG_MAX;
> >    volatile long long b = -1;
> >    printf("a/b = %lld\n", a/b);
> >    return 0;
> > }
> > $ gcc -O2 t6.c && ./a.out
> > a/b = -9223372036854775807
> >
> > It is okay too. So I think LLONG_MIN/(-1) is the only case
> > we should take care of.
>
> The test shows that that's the case, but I still can wrap
> my head around that only LLONG_MIN/(-1) is a problem.
>
> Any math experts can explain this?
>

Not a math expert, but this is because LLONG_MIN / (-1) needs to be
-LLONG_MIN, right? But -LLONG_MIN is not representable in two's
complement, because the positive and negative sides are not
"symmetrical":

LLONG_MIN = -9,223,372,036,854,775,808
LLONG_MAX =  9,223,372,036,854,775,807

-LLONG_MIN would be 9,223,372,036,854,775,808, which is beyond the
representable range for a 64-bit signed integer.
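
A short worked argument for why no other operand pair overflows,
assuming 64-bit two's complement and truncating division (the symbols
a and b are just the dividend and divisor):

```latex
\[
  -2^{63} \le a \le 2^{63}-1, \qquad b \ne 0
\]
\[
  |b| \ge 2 \;\Rightarrow\; |a/b| \le 2^{62}, \qquad
  b = 1 \;\Rightarrow\; a/b = a, \qquad
  b = -1 \;\Rightarrow\; a/b = -a
\]
\[
  -a \le 2^{63}-1 \iff a \ge -2^{63}+1
\]
```

So the quotient always fits unless b = -1 and a = -2^63, which is
exactly LLONG_MIN / (-1).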

That's why Dave asked about BPF_NEG for LLONG_MIN: it's a similar
problem, since its result is an unrepresentable value. So in practice
-LLONG_MIN == LLONG_MIN :)

$ cat main.c
#include <stdio.h>
#include <stdint.h>

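int main(void)
{
        long long x = INT64_MIN;
        /* Negate as unsigned: -x on a signed INT64_MIN is undefined
         * behavior in C, while unsigned negation wraps to the same bits.
         */
        unsigned long long ux = (unsigned long long)x;

        printf("%lld %llx %llx\n", x, ux, -ux);

        return 0;
}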
$ cc main.c && ./a.out
-9223372036854775808 8000000000000000 8000000000000000
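
For illustration only, a minimal C sketch of a guarded signed divide
and modulo along the lines discussed in this thread. The names
sdiv64_sketch and smod64_sketch are hypothetical; this is not the
verifier's code and not a proposed patch. It assumes the "div 0 -> 0"
and "mod 0 -> x" rules quoted above, the arm64 result for
LLONG_MIN / (-1), and (my assumption) a remainder of 0 for any
x % (-1).

```c
#include <limits.h>

/* Hypothetical helper: signed 64-bit divide that never traps. */
long long sdiv64_sketch(long long a, long long b)
{
    if (b == 0)
        return 0;               /* "[R,W]x div 0 -> 0" */
    if (a == LLONG_MIN && b == -1)
        return LLONG_MIN;       /* arm64 result seen with t1.c */
    return a / b;
}

/* Hypothetical helper: signed 64-bit modulo that never traps. */
long long smod64_sketch(long long a, long long b)
{
    if (b == 0)
        return a;               /* "[R,W]x mod 0 -> [R,W]x" */
    if (b == -1)
        return 0;               /* assumption: every a is divisible by -1;
                                 * also avoids evaluating LLONG_MIN % -1 */
    return a % b;
}
```

The b == -1 test in the modulo helper covers all dividends at once, so
no separate check on the dividend is needed there.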

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-09-10 22:43 UTC | newest]

Thread overview: 16+ messages
2024-09-09 17:21 Kernel oops caused by signed divide Zac Ecob
2024-09-09 17:27 ` Yonghong Song
2024-09-09 17:29 ` Alexei Starovoitov
2024-09-09 23:47   ` Yonghong Song
2024-09-10 14:21   ` Yonghong Song
2024-09-10 14:44     ` Dave Thaler
2024-09-10 15:18       ` Yonghong Song
2024-09-10 15:21         ` Alexei Starovoitov
2024-09-10 18:12           ` Yonghong Song
2024-09-10 15:21     ` Alexei Starovoitov
2024-09-10 18:02       ` Yonghong Song
2024-09-10 18:25         ` Alexei Starovoitov
2024-09-10 19:32           ` Yonghong Song
2024-09-10 21:53             ` Alexei Starovoitov
2024-09-10 22:00               ` Yonghong Song
2024-09-10 22:43               ` Andrii Nakryiko
