linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
@ 2016-10-17 18:38 Will Deacon
  2016-10-17 19:43 ` Ard Biesheuvel
  2016-10-19 15:37 ` Markus Trippelsdorf
  0 siblings, 2 replies; 30+ messages in thread
From: Will Deacon @ 2016-10-17 18:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
believe that the new compiler behaviour at the heart of the problem
has the potential to affect other architectures and other pieces of
kernel code relying on dead-code elimination to remove deliberately
undefined functions.

The failure looks like:

  | drivers/built-in.o: In function `armada_3700_add_composite_clk':
  |
  | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
  | undefined reference to `____ilog2_NaN'
  |
  | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
  | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
  | `____ilog2_NaN'
  |
  | make: *** [vmlinux] Error 1

and if we look at the source for armada_3700_add_composite_clk, we see
that this is caused by:

  int table_size = 0;

  rate->reg = reg + (u64)rate->reg;
  for (clkt = rate->table; clkt->div; clkt++)
	 table_size++;
  rate->width = order_base_2(table_size);

order_base_2 calls ilog2, which has the ____ilog2_NaN call:

#define ilog2(n)				\
(						\
	__builtin_constant_p(n) ? (		\
		(n) < 1 ? ____ilog2_NaN() :	\

This is because we're in a curious case where GCC has emitted a
special-cased version of armada_3700_add_composite_clk, with table_size
effectively constant-folded as 0. Whilst we shouldn't see this in a
non-buggy kernel (hence the deliberate call to the undefined function
____ilog2_NaN), it means that the final link fails because we have a
____ilog2_NaN in the code, with a runtime check on table_size.

In other words, __builtin_constant_p appears to be weaker than we've
been assuming. Talking to the compiler guys here, this is due to the
"jump-threading" optimisation pass, so the patch below disables that.

A simpler example is:

int foo();
int bar();

int count(int *argc)
{
	int table_size = 0;

	for (; *argc; argc++)
		table_size++;

	if (__builtin_constant_p(table_size))
		return table_size == 0 ? foo() : bar();

	return bar();
}

which compiles to:

count:
	ldr	w0, [x0]
	cbz	w0, .L4
	b	bar
	.p2align 3
.L4:
	b	foo

and, with the "optimisation" disabled:

count:
	b	bar

Thoughts? It feels awfully fragile disabling passes like this, but with
GCC transforming the code like this, I can't immediately think of a way
to preserve the intended behaviour of the code.

Will

--->8

diff --git a/Makefile b/Makefile
index 512e47a53e9a..750873d6d11e 100644
--- a/Makefile
+++ b/Makefile
@@ -641,6 +641,11 @@ endif
 # Tell gcc to never replace conditional load with a non-conditional one
 KBUILD_CFLAGS	+= $(call cc-option,--param=allow-store-data-races=0)
 
+# Stop gcc from converting switches into a form that defeats dead code
+# elimination and can subsequently lead to calls to intentionally
+# undefined functions appearing in the final link.
+KBUILD_CFLAGS	+= $(call cc-option,--param=max-fsm-thread-path-insns=1)
+
 include scripts/Makefile.gcc-plugins
 
 ifdef CONFIG_READABLE_ASM

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-17 18:38 Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness Will Deacon
@ 2016-10-17 19:43 ` Ard Biesheuvel
  2016-10-19 13:35   ` Will Deacon
  2016-10-19 15:37 ` Markus Trippelsdorf
  1 sibling, 1 reply; 30+ messages in thread
From: Ard Biesheuvel @ 2016-10-17 19:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 17 October 2016 at 19:38, Will Deacon <will.deacon@arm.com> wrote:
> Hi all,
>
> I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
> believe that the new compiler behaviour at the heart of the problem
> has the potential to affect other architectures and other pieces of
> kernel code relying on dead-code elimination to remove deliberately
> undefined functions.
>
> The failure looks like:
>
>   | drivers/built-in.o: In function `armada_3700_add_composite_clk':
>   |
>   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
>   | undefined reference to `____ilog2_NaN'
>   |
>   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
>   | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
>   | `____ilog2_NaN'
>   |
>   | make: *** [vmlinux] Error 1
>
> and if we look at the source for armada_3700_add_composite_clk, we see
> that this is caused by:
>
>   int table_size = 0;
>
>   rate->reg = reg + (u64)rate->reg;
>   for (clkt = rate->table; clkt->div; clkt++)
>          table_size++;
>   rate->width = order_base_2(table_size);
>
> order_base_2 calls ilog2, which has the ____ilog2_NaN call:
>
> #define ilog2(n)                                \
> (                                               \
>         __builtin_constant_p(n) ? (             \
>                 (n) < 1 ? ____ilog2_NaN() :     \
>
> This is because we're in a curious case where GCC has emitted a
> special-cased version of armada_3700_add_composite_clk, with table_size
> effectively constant-folded as 0. Whilst we shouldn't see this in a
> non-buggy kernel (hence the deliberate call to the undefined function
> ____ilog2_NaN), it means that the final link fails because we have a
> ____ilog2_NaN in the code, with a runtime check on table_size.
>

This is indeed an unintended side effect, but I would not call it
weird behaviour at all. The code in its current form does not handle
the case where it could end up passing 0 into order_base_2(), and we
simply need to handle that case. If order_base_2() is not defined for
input 0, it should BUG() in that case, and the associated
__builtin_unreachable() should prevent the special version from being
emitted. If order_base_2() is defined for input 0, it should not
invoke ilog2() with that argument, and the problem should go away as
well.

-- 
Ard.


> In other words, __builtin_constant_p appears to be weaker than we've
> been assuming. Talking to the compiler guys here, this is due to the
> "jump-threading" optimisation pass, so the patch below disables that.
>
> A simpler example is:
>
> int foo();
> int bar();
>
> int count(int *argc)
> {
>         int table_size = 0;
>
>         for (; *argc; argc++)
>                 table_size++;
>
>         if (__builtin_constant_p(table_size))
>                 return table_size == 0 ? foo() : bar();
>
>         return bar();
> }
>
> which compiles to:
>
> count:
>         ldr     w0, [x0]
>         cbz     w0, .L4
>         b       bar
>         .p2align 3
> .L4:
>         b       foo
>
> and, with the "optimisation" disabled:
>
> count:
>         b       bar
>
> Thoughts? It feels awfully fragile disabling passes like this, but with
> GCC transforming the code like this, I can't immediately think of a way
> to preserve the intended behaviour of the code.
>
> Will
>
> --->8
>
> diff --git a/Makefile b/Makefile
> index 512e47a53e9a..750873d6d11e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -641,6 +641,11 @@ endif
>  # Tell gcc to never replace conditional load with a non-conditional one
>  KBUILD_CFLAGS  += $(call cc-option,--param=allow-store-data-races=0)
>
> +# Stop gcc from converting switches into a form that defeats dead code
> +# elimination and can subsequently lead to calls to intentionally
> +# undefined functions appearing in the final link.
> +KBUILD_CFLAGS  += $(call cc-option,--param=max-fsm-thread-path-insns=1)
> +
>  include scripts/Makefile.gcc-plugins
>
>  ifdef CONFIG_READABLE_ASM

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-17 19:43 ` Ard Biesheuvel
@ 2016-10-19 13:35   ` Will Deacon
  2016-10-19 14:59     ` Ard Biesheuvel
  2016-10-19 15:58     ` Russell King - ARM Linux
  0 siblings, 2 replies; 30+ messages in thread
From: Will Deacon @ 2016-10-19 13:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

On Mon, Oct 17, 2016 at 08:43:19PM +0100, Ard Biesheuvel wrote:
> On 17 October 2016 at 19:38, Will Deacon <will.deacon@arm.com> wrote:
> > I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
> > believe that the new compiler behaviour at the heart of the problem
> > has the potential to affect other architectures and other pieces of
> > kernel code relying on dead-code elimination to remove deliberately
> > undefined functions.
> >
> > The failure looks like:
> >
> >   | drivers/built-in.o: In function `armada_3700_add_composite_clk':
> >   |
> >   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
> >   | undefined reference to `____ilog2_NaN'
> >   |
> >   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
> >   | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
> >   | `____ilog2_NaN'
> >   |
> >   | make: *** [vmlinux] Error 1
> >
> > and if we look at the source for armada_3700_add_composite_clk, we see
> > that this is caused by:
> >
> >   int table_size = 0;
> >
> >   rate->reg = reg + (u64)rate->reg;
> >   for (clkt = rate->table; clkt->div; clkt++)
> >          table_size++;
> >   rate->width = order_base_2(table_size);
> >
> > order_base_2 calls ilog2, which has the ____ilog2_NaN call:
> >
> > #define ilog2(n)                                \
> > (                                               \
> >         __builtin_constant_p(n) ? (             \
> >                 (n) < 1 ? ____ilog2_NaN() :     \
> >
> > This is because we're in a curious case where GCC has emitted a
> > special-cased version of armada_3700_add_composite_clk, with table_size
> > effectively constant-folded as 0. Whilst we shouldn't see this in a
> > non-buggy kernel (hence the deliberate call to the undefined function
> > ____ilog2_NaN), it means that the final link fails because we have a
> > ____ilog2_NaN in the code, with a runtime check on table_size.
> >
> 
> This is indeed an unintended side effect, but I would not call it
> weird behaviour at all. The code in its current form does not handle
> the case where it could end up passing 0 into order_base_2(), and we
> simply need to handle that case.

The reasons I think it's weird are:

  (1) The optimisation doesn't generate better code in this case --
      optimising for the table_size == 0 case is uninformed, particularly
      as that *cannot* happen at runtime (GCC probably can't tell, due
      to things like container_of, but all the clock data is static).

  (2) __builtin_constant_p(n) could be interpreted by a developer as
     "this code will execute with a constant n at runtime". With this
     issue, GCC could (in theory) generate a specialisation for every
     possible value of a variable, and return __builtin_constant_p as
     true for all of them, which somewhat undermines the point of the
     builtin.

> If order_base_2() is not defined for input 0, it should BUG() in that
> case, and the associated __builtin_unreachable() should prevent the
> special version from being emitted. If order_base_2() is defined for input
> 0, it should not invoke ilog2() with that argument, and the problem should
> go away as well.

I don't necessarily think it should BUG() if it's not defined for input
0; things like __ffs don't do that and we'd be introducing conditional
checks for cases that should not happen. The comment above order_base_2
does suggest that ob2(0) should return 0, but it can actually end up
invoking ilog2(-1), which is obviously wrong.

I could update the comment, but that doesn't fix the build issue.

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 13:35   ` Will Deacon
@ 2016-10-19 14:59     ` Ard Biesheuvel
  2016-10-19 15:01       ` Ard Biesheuvel
  2016-10-19 15:58     ` Russell King - ARM Linux
  1 sibling, 1 reply; 30+ messages in thread
From: Ard Biesheuvel @ 2016-10-19 14:59 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 October 2016 at 14:35, Will Deacon <will.deacon@arm.com> wrote:
> Hi Ard,
>
> On Mon, Oct 17, 2016 at 08:43:19PM +0100, Ard Biesheuvel wrote:
>> On 17 October 2016 at 19:38, Will Deacon <will.deacon@arm.com> wrote:
>> > I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
>> > believe that the new compiler behaviour at the heart of the problem
>> > has the potential to affect other architectures and other pieces of
>> > kernel code relying on dead-code elimination to remove deliberately
>> > undefined functions.
>> >
>> > The failure looks like:
>> >
>> >   | drivers/built-in.o: In function `armada_3700_add_composite_clk':
>> >   |
>> >   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
>> >   | undefined reference to `____ilog2_NaN'
>> >   |
>> >   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
>> >   | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
>> >   | `____ilog2_NaN'
>> >   |
>> >   | make: *** [vmlinux] Error 1
>> >
>> > and if we look at the source for armada_3700_add_composite_clk, we see
>> > that this is caused by:
>> >
>> >   int table_size = 0;
>> >
>> >   rate->reg = reg + (u64)rate->reg;
>> >   for (clkt = rate->table; clkt->div; clkt++)
>> >          table_size++;
>> >   rate->width = order_base_2(table_size);
>> >
>> > order_base_2 calls ilog2, which has the ____ilog2_NaN call:
>> >
>> > #define ilog2(n)                                \
>> > (                                               \
>> >         __builtin_constant_p(n) ? (             \
>> >                 (n) < 1 ? ____ilog2_NaN() :     \
>> >
>> > This is because we're in a curious case where GCC has emitted a
>> > special-cased version of armada_3700_add_composite_clk, with table_size
>> > effectively constant-folded as 0. Whilst we shouldn't see this in a
>> > non-buggy kernel (hence the deliberate call to the undefined function
>> > ____ilog2_NaN), it means that the final link fails because we have a
>> > ____ilog2_NaN in the code, with a runtime check on table_size.
>> >
>>
>> This is indeed an unintended side effect, but I would not call it
>> weird behaviour at all. The code in its current form does not handle
>> the case where it could end up passing 0 into order_base_2(), and we
>> simply need to handle that case.
>
> The reasons I think it's weird are:
>
>   (1) The optimisation doesn't generate better code in this case --
>       optimising for the table_size == 0 case is uninformed, particularly
>       as that *cannot* happen at runtime (GCC probably can't tell, due
>       to things like container_of, but all the clock data is static).
>

AFAICT, the references to the static clock data are indirected via
of_device_get_match_data(), which means there is no way the compiler
can prove that table_size is always non-zero.

>   (2) __builtin_constant_p(n) could be interpreted by a developer as
>      "this code will execute with a constant n at runtime". With this
>      issue, GCC could (in theory) generate a specialisation for every
>      possible value of a variable, and return __builtin_constant_p as
>      true for all of them, which somewhat undermines the point of the
>      builtin.
>

Yes, and that would be perfectly legal from a correctness point of
view, and would likely help performance as well. By using
__builtin_constant_p(), you are choosing to perform a build time
evaluation of an expression that would ordinarily be evaluated only at
runtime. This implies that you have to address undefined behavior at
build time rather than at runtime as well.

>> If order_base_2() is not defined for input 0, it should BUG() in that
>> case, and the associated __builtin_unreachable() should prevent the
>> special version from being emitted. If order_base_2() is defined for input
>> 0, it should not invoke ilog2() with that argument, and the problem should
>> go away as well.
>
> I don't necessarily think it should BUG() if it's not defined for input
> 0; things like __ffs don't do that and we'd be introducing conditional
> checks for cases that should not happen. The comment above order_base_2
> does suggest that ob2(0) should return 0, but it can actually end up
> invoking ilog2(-1), which is obviously wrong.
>
> I could update the comment, but that doesn't fix the build issue.
>

Fixing roundup_pow_of_two() [which is arguably incorrect] would
probably fix the build issue as well, no?

diff --git a/include/linux/log2.h b/include/linux/log2.h
index fd7ff3d91e6a..8a4be5e4223b 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -168,7 +168,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
 #define roundup_pow_of_two(n)                  \
 (                                              \
        __builtin_constant_p(n) ? (             \
-               (n == 1) ? 1 :                  \
+               (n <= 1) ? 1 :                  \
                (1UL << (ilog2((n) - 1) + 1))   \
                                   ) :          \
        __roundup_pow_of_two(n)                 \

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 14:59     ` Ard Biesheuvel
@ 2016-10-19 15:01       ` Ard Biesheuvel
  2016-10-19 15:11         ` Arnd Bergmann
  0 siblings, 1 reply; 30+ messages in thread
From: Ard Biesheuvel @ 2016-10-19 15:01 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 October 2016 at 15:59, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 19 October 2016 at 14:35, Will Deacon <will.deacon@arm.com> wrote:
>> Hi Ard,
>>
>> On Mon, Oct 17, 2016 at 08:43:19PM +0100, Ard Biesheuvel wrote:
>>> On 17 October 2016 at 19:38, Will Deacon <will.deacon@arm.com> wrote:
>>> > I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
>>> > believe that the new compiler behaviour at the heart of the problem
>>> > has the potential to affect other architectures and other pieces of
>>> > kernel code relying on dead-code elimination to remove deliberately
>>> > undefined functions.
>>> >
>>> > The failure looks like:
>>> >
>>> >   | drivers/built-in.o: In function `armada_3700_add_composite_clk':
>>> >   |
>>> >   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
>>> >   | undefined reference to `____ilog2_NaN'
>>> >   |
>>> >   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
>>> >   | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
>>> >   | `____ilog2_NaN'
>>> >   |
>>> >   | make: *** [vmlinux] Error 1
>>> >
>>> > and if we look at the source for armada_3700_add_composite_clk, we see
>>> > that this is caused by:
>>> >
>>> >   int table_size = 0;
>>> >
>>> >   rate->reg = reg + (u64)rate->reg;
>>> >   for (clkt = rate->table; clkt->div; clkt++)
>>> >          table_size++;
>>> >   rate->width = order_base_2(table_size);
>>> >
>>> > order_base_2 calls ilog2, which has the ____ilog2_NaN call:
>>> >
>>> > #define ilog2(n)                                \
>>> > (                                               \
>>> >         __builtin_constant_p(n) ? (             \
>>> >                 (n) < 1 ? ____ilog2_NaN() :     \
>>> >
>>> > This is because we're in a curious case where GCC has emitted a
>>> > special-cased version of armada_3700_add_composite_clk, with table_size
>>> > effectively constant-folded as 0. Whilst we shouldn't see this in a
>>> > non-buggy kernel (hence the deliberate call to the undefined function
>>> > ____ilog2_NaN), it means that the final link fails because we have a
>>> > ____ilog2_NaN in the code, with a runtime check on table_size.
>>> >
>>>
>>> This is indeed an unintended side effect, but I would not call it
>>> weird behaviour at all. The code in its current form does not handle
>>> the case where it could end up passing 0 into order_base_2(), and we
>>> simply need to handle that case.
>>
>> The reasons I think it's weird are:
>>
>>   (1) The optimisation doesn't generate better code in this case --
>>       optimising for the table_size == 0 case is uninformed, particularly
>>       as that *cannot* happen at runtime (GCC probably can't tell, due
>>       to things like container_of, but all the clock data is static).
>>
>
> AFAICT, the references to the static clock data are indirected via
> of_device_get_match_data(), which means there is no way the compiler
> can prove that table_size is always non-zero.
>
>>   (2) __builtin_constant_p(n) could be interpreted by a developer as
>>      "this code will execute with a constant n at runtime". With this
>>      issue, GCC could (in theory) generate a specialisation for every
>>      possible value of a variable, and return __builtin_constant_p as
>>      true for all of them, which somewhat undermines the point of the
>>      builtin.
>>
>
> Yes, and that would be perfectly legal from a correctness point of
> view, and would likely help performance as well. By using
> __builtin_constant_p(), you are choosing to perform a build time
> evaluation of an expression that would ordinarily be evaluated only at
> runtime. This implies that you have to address undefined behavior at
> build time rather than at runtime as well.
>
>>> If order_base_2() is not defined for input 0, it should BUG() in that
>>> case, and the associated __builtin_unreachable() should prevent the
>>> special version from being emitted. If order_base_2() is defined for input
>>> 0, it should not invoke ilog2() with that argument, and the problem should
>>> go away as well.
>>
>> I don't necessarily think it should BUG() if it's not defined for input
>> 0; things like __ffs don't do that and we'd be introducing conditional
>> checks for cases that should not happen. The comment above order_base_2
>> does suggest that ob2(0) should return 0, but it can actually end up
>> invoking ilog2(-1), which is obviously wrong.
>>
>> I could update the comment, but that doesn't fix the build issue.
>>
>
> Fixing roundup_pow_of_two() [which is arguably incorrect]

I just spotted the comment that says it is undefined. But that means
it could legally return 1 for input 0, i suppose

> would
> probably fix the build issue as well, no?
>
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index fd7ff3d91e6a..8a4be5e4223b 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -168,7 +168,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>  #define roundup_pow_of_two(n)                  \
>  (                                              \
>         __builtin_constant_p(n) ? (             \
> -               (n == 1) ? 1 :                  \
> +               (n <= 1) ? 1 :                  \
>                 (1UL << (ilog2((n) - 1) + 1))   \
>                                    ) :          \
>         __roundup_pow_of_two(n)                 \

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 15:01       ` Ard Biesheuvel
@ 2016-10-19 15:11         ` Arnd Bergmann
  2016-10-19 15:27           ` Ard Biesheuvel
  2016-10-19 15:32           ` Gregory CLEMENT
  0 siblings, 2 replies; 30+ messages in thread
From: Arnd Bergmann @ 2016-10-19 15:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday, October 19, 2016 4:01:58 PM CEST Ard Biesheuvel wrote:
> On 19 October 2016 at 15:59, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > On 19 October 2016 at 14:35, Will Deacon <will.deacon@arm.com> wrote:
> >> On Mon, Oct 17, 2016 at 08:43:19PM +0100, Ard Biesheuvel wrote:
> >>> On 17 October 2016 at 19:38, Will Deacon <will.deacon@arm.com> wrote:
> >
> > Yes, and that would be perfectly legal from a correctness point of
> > view, and would likely help performance as well. By using
> > __builtin_constant_p(), you are choosing to perform a build time
> > evaluation of an expression that would ordinarily be evaluated only at
> > runtime. This implies that you have to address undefined behavior at
> > build time rather than at runtime as well.
> >
> >>> If order_base_2() is not defined for input 0, it should BUG() in that
> >>> case, and the associated __builtin_unreachable() should prevent the
> >>> special version from being emitted. If order_base_2() is defined for input
> >>> 0, it should not invoke ilog2() with that argument, and the problem should
> >>> go away as well.
> >>
> >> I don't necessarily think it should BUG() if it's not defined for input
> >> 0; things like __ffs don't do that and we'd be introducing conditional
> >> checks for cases that should not happen. The comment above order_base_2
> >> does suggest that ob2(0) should return 0, but it can actually end up
> >> invoking ilog2(-1), which is obviously wrong.
> >>
> >> I could update the comment, but that doesn't fix the build issue.
> >>
> >
> > Fixing roundup_pow_of_two() [which is arguably incorrect]
> 
> I just spotted the comment that says it is undefined. But that means
> it could legally return 1 for input 0, i suppose

I think having the link error in roundup_pow_of_two() is safer than
returning 1.

Why not turn it into a runtime warning in this driver?

diff --git a/drivers/clk/mvebu/armada-37xx-periph.c b/drivers/clk/mvebu/armada-37xx-periph.c
index cecb0fdfaef6..711d1d9842cc 100644
--- a/drivers/clk/mvebu/armada-37xx-periph.c
+++ b/drivers/clk/mvebu/armada-37xx-periph.c
@@ -349,8 +349,10 @@ static int armada_3700_add_composite_clk(const struct clk_periph_data *data,
 			rate->reg = reg + (u64)rate->reg;
 			for (clkt = rate->table; clkt->div; clkt++)
 				table_size++;
-			rate->width = order_base_2(table_size);
-			rate->lock = lock;
+			if (!WARN_ON(table_size == 0)) {
+				rate->width = order_base_2(table_size);
+				rate->lock = lock;
+			}
 		}
 	}
 

	
	Arnd

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 15:11         ` Arnd Bergmann
@ 2016-10-19 15:27           ` Ard Biesheuvel
  2016-10-19 15:44             ` Arnd Bergmann
  2016-10-19 15:32           ` Gregory CLEMENT
  1 sibling, 1 reply; 30+ messages in thread
From: Ard Biesheuvel @ 2016-10-19 15:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 October 2016 at 16:11, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday, October 19, 2016 4:01:58 PM CEST Ard Biesheuvel wrote:
>> On 19 October 2016 at 15:59, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > On 19 October 2016 at 14:35, Will Deacon <will.deacon@arm.com> wrote:
>> >> On Mon, Oct 17, 2016 at 08:43:19PM +0100, Ard Biesheuvel wrote:
>> >>> On 17 October 2016 at 19:38, Will Deacon <will.deacon@arm.com> wrote:
>> >
>> > Yes, and that would be perfectly legal from a correctness point of
>> > view, and would likely help performance as well. By using
>> > __builtin_constant_p(), you are choosing to perform a build time
>> > evaluation of an expression that would ordinarily be evaluated only at
>> > runtime. This implies that you have to address undefined behavior at
>> > build time rather than at runtime as well.
>> >
>> >>> If order_base_2() is not defined for input 0, it should BUG() in that
>> >>> case, and the associated __builtin_unreachable() should prevent the
>> >>> special version from being emitted. If order_base_2() is defined for input
>> >>> 0, it should not invoke ilog2() with that argument, and the problem should
>> >>> go away as well.
>> >>
>> >> I don't necessarily think it should BUG() if it's not defined for input
>> >> 0; things like __ffs don't do that and we'd be introducing conditional
>> >> checks for cases that should not happen. The comment above order_base_2
>> >> does suggest that ob2(0) should return 0, but it can actually end up
>> >> invoking ilog2(-1), which is obviously wrong.
>> >>
>> >> I could update the comment, but that doesn't fix the build issue.
>> >>
>> >
>> > Fixing roundup_pow_of_two() [which is arguably incorrect]
>>
>> I just spotted the comment that says it is undefined. But that means
>> it could legally return 1 for input 0, i suppose
>
> I think having the link error in roundup_pow_of_two() is safer than
> returning 1.
>
> Why not turn it into a runtime warning in this driver?
>
> diff --git a/drivers/clk/mvebu/armada-37xx-periph.c b/drivers/clk/mvebu/armada-37xx-periph.c
> index cecb0fdfaef6..711d1d9842cc 100644
> --- a/drivers/clk/mvebu/armada-37xx-periph.c
> +++ b/drivers/clk/mvebu/armada-37xx-periph.c
> @@ -349,8 +349,10 @@ static int armada_3700_add_composite_clk(const struct clk_periph_data *data,
>                         rate->reg = reg + (u64)rate->reg;
>                         for (clkt = rate->table; clkt->div; clkt++)
>                                 table_size++;
> -                       rate->width = order_base_2(table_size);
> -                       rate->lock = lock;
> +                       if (!WARN_ON(table_size == 0)) {
> +                               rate->width = order_base_2(table_size);
> +                               rate->lock = lock;
> +                       }
>                 }
>         }
>

I guess Will is not looking for a way to fix the driver, but for a way
to eliminate this issue entirely going forward.

In general, I think the issue where constant folding results in
ilog2() or other similar functions being called with invalid build
time constant parameter values is simply something we have to deal
with.

In this case, it is in fact order_base_2() that deviates from its
documented behavior (as Will points out), and fixing /that/ should
make this particular issue go away afaict.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 15:11         ` Arnd Bergmann
  2016-10-19 15:27           ` Ard Biesheuvel
@ 2016-10-19 15:32           ` Gregory CLEMENT
  1 sibling, 0 replies; 30+ messages in thread
From: Gregory CLEMENT @ 2016-10-19 15:32 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Arnd,
 
 On mer., oct. 19 2016, Arnd Bergmann <arnd@arndb.de> wrote:

> On Wednesday, October 19, 2016 4:01:58 PM CEST Ard Biesheuvel wrote:
>> On 19 October 2016 at 15:59, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > On 19 October 2016 at 14:35, Will Deacon <will.deacon@arm.com> wrote:
>> >> On Mon, Oct 17, 2016 at 08:43:19PM +0100, Ard Biesheuvel wrote:
>> >>> On 17 October 2016 at 19:38, Will Deacon <will.deacon@arm.com> wrote:
>> >
>> > Yes, and that would be perfectly legal from a correctness point of
>> > view, and would likely help performance as well. By using
>> > __builtin_constant_p(), you are choosing to perform a build time
>> > evaluation of an expression that would ordinarily be evaluated only at
>> > runtime. This implies that you have to address undefined behavior at
>> > build time rather than at runtime as well.
>> >
>> >>> If order_base_2() is not defined for input 0, it should BUG() in that
>> >>> case, and the associated __builtin_unreachable() should prevent the
>> >>> special version from being emitted. If order_base_2() is defined for input
>> >>> 0, it should not invoke ilog2() with that argument, and the problem should
>> >>> go away as well.
>> >>
>> >> I don't necessarily think it should BUG() if it's not defined for input
>> >> 0; things like __ffs don't do that and we'd be introducing conditional
>> >> checks for cases that should not happen. The comment above order_base_2
>> >> does suggest that ob2(0) should return 0, but it can actually end up
>> >> invoking ilog2(-1), which is obviously wrong.
>> >>
>> >> I could update the comment, but that doesn't fix the build issue.
>> >>
>> >
>> > Fixing roundup_pow_of_two() [which is arguably incorrect]
>> 
>> I just spotted the comment that says it is undefined. But that means
>> it could legally return 1 for input 0, i suppose
>
> I think having the link error in roundup_pow_of_two() is safer than
> returning 1.
>
> Why not turn it into a runtime warning in this driver?
>
> diff --git a/drivers/clk/mvebu/armada-37xx-periph.c b/drivers/clk/mvebu/armada-37xx-periph.c
> index cecb0fdfaef6..711d1d9842cc 100644
> --- a/drivers/clk/mvebu/armada-37xx-periph.c
> +++ b/drivers/clk/mvebu/armada-37xx-periph.c
> @@ -349,8 +349,10 @@ static int armada_3700_add_composite_clk(const struct clk_periph_data *data,
>  			rate->reg = reg + (u64)rate->reg;
>  			for (clkt = rate->table; clkt->div; clkt++)
>  				table_size++;
> -			rate->width = order_base_2(table_size);
> -			rate->lock = lock;
> +			if (!WARN_ON(table_size == 0)) {
> +				rate->width = order_base_2(table_size);
> +				rate->lock = lock;
> +			}

With the way the data are constructed in the driver I don't see how the
table_size can be 0.

However I understand it is more something for the compiler.

In this case it is better to nullify the rate_hw as having width=0 will
lead to trouble in the clk_divider operations


If it is the needed solution for this build error I can submit this kind
of patch:
diff --git a/drivers/clk/mvebu/armada-37xx-periph.c b/drivers/clk/mvebu/armada-37xx-periph.c
index 45905fc0d75b..dbc49359406d 100644
--- a/drivers/clk/mvebu/armada-37xx-periph.c
+++ b/drivers/clk/mvebu/armada-37xx-periph.c
@@ -345,11 +345,16 @@ static int armada_3700_add_composite_clk(const struct clk_periph_data *data,
                        const struct clk_div_table *clkt;
                        int table_size = 0;
 
-                       rate->reg = reg + (u64)rate->reg;
                        for (clkt = rate->table; clkt->div; clkt++)
                                table_size++;
-                       rate->width = order_base_2(table_size);
-                       rate->lock = lock;
+                       if (!WARN_ON(table_size == 0)) {
+                               rate->reg = reg + (u64)rate->reg;
+                               rate->width = order_base_2(table_size);
+                               rate->lock = lock;
+                       } else {
+                               rate_hw = NULL;
+                               rate_ops = NULL;
+                       }
                }
        }


Gregory

>  		}
>  	}
>  
>
> 	
> 	Arnd
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-17 18:38 Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness Will Deacon
  2016-10-17 19:43 ` Ard Biesheuvel
@ 2016-10-19 15:37 ` Markus Trippelsdorf
  2016-10-19 15:55   ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Markus Trippelsdorf @ 2016-10-19 15:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 2016.10.17 at 19:38 +0100, Will Deacon wrote:
> Hi all,
> 
> I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
> believe that the new compiler behaviour at the heart of the problem
> has the potential to affect other architectures and other pieces of
> kernel code relying on dead-code elimination to remove deliberately
> undefined functions.
> 
> The failure looks like:
> 
>   | drivers/built-in.o: In function `armada_3700_add_composite_clk':
>   |
>   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
>   | undefined reference to `____ilog2_NaN'
>   |
>   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
>   | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
>   | `____ilog2_NaN'
>   |
>   | make: *** [vmlinux] Error 1
> 

This is a gcc bug, see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785

-- 
Markus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 15:27           ` Ard Biesheuvel
@ 2016-10-19 15:44             ` Arnd Bergmann
  0 siblings, 0 replies; 30+ messages in thread
From: Arnd Bergmann @ 2016-10-19 15:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday, October 19, 2016 4:27:51 PM CEST Ard Biesheuvel wrote:
> >
> > Why not turn it into a runtime warning in this driver?
> >
> > diff --git a/drivers/clk/mvebu/armada-37xx-periph.c b/drivers/clk/mvebu/armada-37xx-periph.c
> > index cecb0fdfaef6..711d1d9842cc 100644
> > --- a/drivers/clk/mvebu/armada-37xx-periph.c
> > +++ b/drivers/clk/mvebu/armada-37xx-periph.c
> > @@ -349,8 +349,10 @@ static int armada_3700_add_composite_clk(const struct clk_periph_data *data,
> >                         rate->reg = reg + (u64)rate->reg;
> >                         for (clkt = rate->table; clkt->div; clkt++)
> >                                 table_size++;
> > -                       rate->width = order_base_2(table_size);
> > -                       rate->lock = lock;
> > +                       if (!WARN_ON(table_size == 0)) {
> > +                               rate->width = order_base_2(table_size);
> > +                               rate->lock = lock;
> > +                       }
> >                 }
> >         }
> >
> 
> I guess Will is not looking for a way to fix the driver, but for a way
> to eliminate this issue entirely going forward.
>
> In general, I think the issue where constant folding results in
> ilog2() or other similar functions being called with invalid build
> time constant parameter values is simply something we have to deal
> with.
> 
> In this case, it is in fact order_base_2() that deviates from its
> documented behavior (as Will points out), and fixing /that/ should
> make this particular issue go away afaict.

Ah, right. I also noticed that order_base_2() is defined as
log2(1 << (log2(n-1)+1)), which seems a bit redundant.
Maybe we can simplify it to something like

#define order_base_2(n) ((n) <= 1) ? 0 : log2((n) - 1) + 1)

	Arnd

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 15:37 ` Markus Trippelsdorf
@ 2016-10-19 15:55   ` Linus Torvalds
  2016-10-19 15:56     ` Markus Trippelsdorf
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2016-10-19 15:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 19, 2016 at 8:37 AM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
>
> This is a gcc bug, see:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785

Well, in the meantime we apparently have to live with it. Unless Will
is using some unreleased gcc version that nobody else is using and we
can just ignore it?

I don't think the link-time check is so important that we need to
notice it, and the "____ilog2_NaN()" could just be replaced with "0".

                 Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 15:55   ` Linus Torvalds
@ 2016-10-19 15:56     ` Markus Trippelsdorf
  2016-10-19 16:00       ` Ard Biesheuvel
  2016-10-19 16:01       ` Linus Torvalds
  0 siblings, 2 replies; 30+ messages in thread
From: Markus Trippelsdorf @ 2016-10-19 15:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
> On Wed, Oct 19, 2016 at 8:37 AM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
> >
> > This is a gcc bug, see:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785
> 
> Well, in the meantime we apparently have to live with it. Unless Will
> is using some unreleased gcc version that nobody else is using and we
> can just ignore it?

Yes, he is using gcc-7 that is unreleased. (It will be released April
next year.) 

-- 
Markus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 13:35   ` Will Deacon
  2016-10-19 14:59     ` Ard Biesheuvel
@ 2016-10-19 15:58     ` Russell King - ARM Linux
  1 sibling, 0 replies; 30+ messages in thread
From: Russell King - ARM Linux @ 2016-10-19 15:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 19, 2016 at 02:35:00PM +0100, Will Deacon wrote:
> Hi Ard,
> 
> On Mon, Oct 17, 2016 at 08:43:19PM +0100, Ard Biesheuvel wrote:
> > If order_base_2() is not defined for input 0, it should BUG() in that
> > case, and the associated __builtin_unreachable() should prevent the
> > special version from being emitted. If order_base_2() is defined for input
> > 0, it should not invoke ilog2() with that argument, and the problem should
> > go away as well.
> 
> I don't necessarily think it should BUG() if it's not defined for input
> 0;

In any case, Linus will have a rant about that: Linus has already been
concerned about the abuse of BUG().  BUG() should not be used as an
assert() replacement, but should be used where we have absolutely
no other option than to crash the kernel, because (eg) continuing
would result in the users' data being corrupted.

So no, BUG() is not the answer here.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 15:56     ` Markus Trippelsdorf
@ 2016-10-19 16:00       ` Ard Biesheuvel
  2016-10-19 16:01       ` Linus Torvalds
  1 sibling, 0 replies; 30+ messages in thread
From: Ard Biesheuvel @ 2016-10-19 16:00 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 October 2016 at 16:56, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
>> On Wed, Oct 19, 2016 at 8:37 AM, Markus Trippelsdorf
>> <markus@trippelsdorf.de> wrote:
>> >
>> > This is a gcc bug, see:
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785
>>
>> Well, in the meantime we apparently have to live with it. Unless Will
>> is using some unreleased gcc version that nobody else is using and we
>> can just ignore it?
>
> Yes, he is using gcc-7 that is unreleased. (It will be released April
> next year.)
>

order_base_2() is still broken though, given that it is documented as

 * The first few values calculated by this routine:
 *  ob2(0) = 0
 *  ob2(1) = 0
 *  ob2(2) = 1
 *  ob2(3) = 2
 *  ob2(4) = 2
 *  ob2(5) = 3

whereas order_base_2(0) actually ends up invoking
roundup_pow_of_two(0), which is documented as being undefined.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 15:56     ` Markus Trippelsdorf
  2016-10-19 16:00       ` Ard Biesheuvel
@ 2016-10-19 16:01       ` Linus Torvalds
  2016-10-19 16:22         ` Will Deacon
  1 sibling, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2016-10-19 16:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 19, 2016 at 8:56 AM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
>>
>> Well, in the meantime we apparently have to live with it. Unless Will
>> is using some unreleased gcc version that nobody else is using and we
>> can just ignore it?
>
> Yes, he is using gcc-7 that is unreleased. (It will be released April
> next year.)

Ahh, self-built? So it's not part of some experimental ARM distro
setup and this will be annoying lots of people?

If so, still think that we could just get rid of the ____ilog2_NaN()
thing as it's not _that_ important, but it's certainly not very
high-priority. Will can do it in his tree too for testing, and it can
remind people to get the gcc problem fixed.

              Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 16:01       ` Linus Torvalds
@ 2016-10-19 16:22         ` Will Deacon
  2017-02-01 16:58           ` Laura Abbott
  0 siblings, 1 reply; 30+ messages in thread
From: Will Deacon @ 2016-10-19 16:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 19, 2016 at 09:01:33AM -0700, Linus Torvalds wrote:
> On Wed, Oct 19, 2016 at 8:56 AM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
> > On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
> >>
> >> Well, in the meantime we apparently have to live with it. Unless Will
> >> is using some unreleased gcc version that nobody else is using and we
> >> can just ignore it?
> >
> > Yes, he is using gcc-7 that is unreleased. (It will be released April
> > next year.)
> 
> Ahh, self-built? So it's not part of some experimental ARM distro
> setup and this will be annoying lots of people?

Our friendly compiler guys built it, but it's just a snapshot of trunk,
so it's all heading towards GCC 7.0. AFAIU, the problematic optimisation
is also a mid-end pass, so it would affect other architectures too.

> If so, still think that we could just get rid of the ____ilog2_NaN()
> thing as it's not _that_ important, but it's certainly not very
> high-priority. Will can do it in his tree too for testing, and it can
> remind people to get the gcc problem fixed.

I'm carrying the diff below, which fixes arm64 defconfig, but I'm worried
that we might be relying on this trick elsewhere. The arm __bad_cmpxchg
function, for example.

Will

--->8

diff --git a/include/linux/log2.h b/include/linux/log2.h
index fd7ff3d91e6a..9cf5ad69065d 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -16,12 +16,6 @@
 #include <linux/bitops.h>
 
 /*
- * deal with unrepresentable constant logarithms
- */
-extern __attribute__((const, noreturn))
-int ____ilog2_NaN(void);
-
-/*
  * non-constant log of base 2 calculators
  * - the arch may override these in asm/bitops.h if they can be implemented
  *   more efficiently than using fls() and fls64()
@@ -85,7 +79,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
 #define ilog2(n)				\
 (						\
 	__builtin_constant_p(n) ? (		\
-		(n) < 1 ? ____ilog2_NaN() :	\
+		(n) < 1 ? 0 :			\
 		(n) & (1ULL << 63) ? 63 :	\
 		(n) & (1ULL << 62) ? 62 :	\
 		(n) & (1ULL << 61) ? 61 :	\
@@ -149,9 +143,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
 		(n) & (1ULL <<  3) ?  3 :	\
 		(n) & (1ULL <<  2) ?  2 :	\
 		(n) & (1ULL <<  1) ?  1 :	\
-		(n) & (1ULL <<  0) ?  0 :	\
-		____ilog2_NaN()			\
-				   ) :		\
+		0) :				\
 	(sizeof(n) <= 4) ?			\
 	__ilog2_u32(n) :			\
 	__ilog2_u64(n)				\
@@ -194,7 +186,6 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
  * @n: parameter
  *
  * The first few values calculated by this routine:
- *  ob2(0) = 0
  *  ob2(1) = 0
  *  ob2(2) = 1
  *  ob2(3) = 2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2016-10-19 16:22         ` Will Deacon
@ 2017-02-01 16:58           ` Laura Abbott
  2017-02-01 17:36             ` Ard Biesheuvel
  0 siblings, 1 reply; 30+ messages in thread
From: Laura Abbott @ 2017-02-01 16:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/19/2016 09:22 AM, Will Deacon wrote:
> On Wed, Oct 19, 2016 at 09:01:33AM -0700, Linus Torvalds wrote:
>> On Wed, Oct 19, 2016 at 8:56 AM, Markus Trippelsdorf
>> <markus@trippelsdorf.de> wrote:
>>> On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
>>>>
>>>> Well, in the meantime we apparently have to live with it. Unless Will
>>>> is using some unreleased gcc version that nobody else is using and we
>>>> can just ignore it?
>>>
>>> Yes, he is using gcc-7 that is unreleased. (It will be released April
>>> next year.)
>>
>> Ahh, self-built? So it's not part of some experimental ARM distro
>> setup and this will be annoying lots of people?
> 
> Our friendly compiler guys built it, but it's just a snapshot of trunk,
> so it's all heading towards GCC 7.0. AFAIU, the problematic optimisation
> is also a mid-end pass, so it would affect other architectures too.
> 
>> If so, still think that we could just get rid of the ____ilog2_NaN()
>> thing as it's not _that_ important, but it's certainly not very
>> high-priority. Will can do it in his tree too for testing, and it can
>> remind people to get the gcc problem fixed.
> 
> I'm carrying the diff below, which fixes arm64 defconfig, but I'm worried
> that we might be relying on this trick elsewhere. The arm __bad_cmpxchg
> function, for example.
> 
> Will
> 
> --->8
> 
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index fd7ff3d91e6a..9cf5ad69065d 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -16,12 +16,6 @@
>  #include <linux/bitops.h>
>  
>  /*
> - * deal with unrepresentable constant logarithms
> - */
> -extern __attribute__((const, noreturn))
> -int ____ilog2_NaN(void);
> -
> -/*
>   * non-constant log of base 2 calculators
>   * - the arch may override these in asm/bitops.h if they can be implemented
>   *   more efficiently than using fls() and fls64()
> @@ -85,7 +79,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>  #define ilog2(n)				\
>  (						\
>  	__builtin_constant_p(n) ? (		\
> -		(n) < 1 ? ____ilog2_NaN() :	\
> +		(n) < 1 ? 0 :			\
>  		(n) & (1ULL << 63) ? 63 :	\
>  		(n) & (1ULL << 62) ? 62 :	\
>  		(n) & (1ULL << 61) ? 61 :	\
> @@ -149,9 +143,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>  		(n) & (1ULL <<  3) ?  3 :	\
>  		(n) & (1ULL <<  2) ?  2 :	\
>  		(n) & (1ULL <<  1) ?  1 :	\
> -		(n) & (1ULL <<  0) ?  0 :	\
> -		____ilog2_NaN()			\
> -				   ) :		\
> +		0) :				\
>  	(sizeof(n) <= 4) ?			\
>  	__ilog2_u32(n) :			\
>  	__ilog2_u64(n)				\
> @@ -194,7 +186,6 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>   * @n: parameter
>   *
>   * The first few values calculated by this routine:
> - *  ob2(0) = 0
>   *  ob2(1) = 0
>   *  ob2(2) = 1
>   *  ob2(3) = 2
> 

Reviving this thread as gcc 7 has now hit Fedora rawhide and has this
same issue. I pulled in the above patch from Will as a temporary work
around for building. It didn't look like there was consensus on a
permanent solution though from the thread.

Thanks,
Laura

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-01 16:58           ` Laura Abbott
@ 2017-02-01 17:36             ` Ard Biesheuvel
  2017-02-01 18:19               ` Ard Biesheuvel
  2017-02-01 21:50               ` Laura Abbott
  0 siblings, 2 replies; 30+ messages in thread
From: Ard Biesheuvel @ 2017-02-01 17:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 February 2017 at 16:58, Laura Abbott <labbott@redhat.com> wrote:
> On 10/19/2016 09:22 AM, Will Deacon wrote:
>> On Wed, Oct 19, 2016 at 09:01:33AM -0700, Linus Torvalds wrote:
>>> On Wed, Oct 19, 2016 at 8:56 AM, Markus Trippelsdorf
>>> <markus@trippelsdorf.de> wrote:
>>>> On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
>>>>>
>>>>> Well, in the meantime we apparently have to live with it. Unless Will
>>>>> is using some unreleased gcc version that nobody else is using and we
>>>>> can just ignore it?
>>>>
>>>> Yes, he is using gcc-7 that is unreleased. (It will be released April
>>>> next year.)
>>>
>>> Ahh, self-built? So it's not part of some experimental ARM distro
>>> setup and this will be annoying lots of people?
>>
>> Our friendly compiler guys built it, but it's just a snapshot of trunk,
>> so it's all heading towards GCC 7.0. AFAIU, the problematic optimisation
>> is also a mid-end pass, so it would affect other architectures too.
>>
>>> If so, still think that we could just get rid of the ____ilog2_NaN()
>>> thing as it's not _that_ important, but it's certainly not very
>>> high-priority. Will can do it in his tree too for testing, and it can
>>> remind people to get the gcc problem fixed.
>>
>> I'm carrying the diff below, which fixes arm64 defconfig, but I'm worried
>> that we might be relying on this trick elsewhere. The arm __bad_cmpxchg
>> function, for example.
>>
>> Will
>>
>> --->8
>>
>> diff --git a/include/linux/log2.h b/include/linux/log2.h
>> index fd7ff3d91e6a..9cf5ad69065d 100644
>> --- a/include/linux/log2.h
>> +++ b/include/linux/log2.h
>> @@ -16,12 +16,6 @@
>>  #include <linux/bitops.h>
>>
>>  /*
>> - * deal with unrepresentable constant logarithms
>> - */
>> -extern __attribute__((const, noreturn))
>> -int ____ilog2_NaN(void);
>> -
>> -/*
>>   * non-constant log of base 2 calculators
>>   * - the arch may override these in asm/bitops.h if they can be implemented
>>   *   more efficiently than using fls() and fls64()
>> @@ -85,7 +79,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>  #define ilog2(n)                             \
>>  (                                            \
>>       __builtin_constant_p(n) ? (             \
>> -             (n) < 1 ? ____ilog2_NaN() :     \
>> +             (n) < 1 ? 0 :                   \
>>               (n) & (1ULL << 63) ? 63 :       \
>>               (n) & (1ULL << 62) ? 62 :       \
>>               (n) & (1ULL << 61) ? 61 :       \
>> @@ -149,9 +143,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>               (n) & (1ULL <<  3) ?  3 :       \
>>               (n) & (1ULL <<  2) ?  2 :       \
>>               (n) & (1ULL <<  1) ?  1 :       \
>> -             (n) & (1ULL <<  0) ?  0 :       \
>> -             ____ilog2_NaN()                 \
>> -                                ) :          \
>> +             0) :                            \
>>       (sizeof(n) <= 4) ?                      \
>>       __ilog2_u32(n) :                        \
>>       __ilog2_u64(n)                          \
>> @@ -194,7 +186,6 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>   * @n: parameter
>>   *
>>   * The first few values calculated by this routine:
>> - *  ob2(0) = 0
>>   *  ob2(1) = 0
>>   *  ob2(2) = 1
>>   *  ob2(3) = 2
>>
>
> Reviving this thread as gcc 7 has now hit Fedora rawhide and has this
> same issue. I pulled in the above patch from Will as a temporary work
> around for building. It didn't look like there was consensus on a
> permanent solution though from the thread.
>

I still think order_base_2() is broken, since it may invoke
roundup_pow_of_two() with an input value that is documented as
producing undefined output. I would argue that the below is the
correct fix.

diff --git a/include/linux/log2.h b/include/linux/log2.h
index fd7ff3d91e6a..46523731bec0 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
  *  ... and so on.
  */

-#define order_base_2(n) ilog2(roundup_pow_of_two(n))
+static inline __attribute__((__const__))
+unsigned long __order_base_2(unsigned long n)
+{
+       return n ? 1UL << fls_long(n - 1) : 1;
+}
+
+#define order_base_2(n)                                \
+(                                              \
+       __builtin_constant_p(n) ? (             \
+               ((n) < 2) ? (n) :               \
+               ilog2((n) - 1) + 1) :           \
+       ilog2(__order_base_2(n))                \
+ )

 #endif /* _LINUX_LOG2_H */

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-01 17:36             ` Ard Biesheuvel
@ 2017-02-01 18:19               ` Ard Biesheuvel
  2017-02-01 19:04                 ` Joe Perches
  2017-02-01 21:50               ` Laura Abbott
  1 sibling, 1 reply; 30+ messages in thread
From: Ard Biesheuvel @ 2017-02-01 18:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 February 2017 at 17:36, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 1 February 2017 at 16:58, Laura Abbott <labbott@redhat.com> wrote:
>> On 10/19/2016 09:22 AM, Will Deacon wrote:
>>> On Wed, Oct 19, 2016 at 09:01:33AM -0700, Linus Torvalds wrote:
>>>> On Wed, Oct 19, 2016 at 8:56 AM, Markus Trippelsdorf
>>>> <markus@trippelsdorf.de> wrote:
>>>>> On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
>>>>>>
>>>>>> Well, in the meantime we apparently have to live with it. Unless Will
>>>>>> is using some unreleased gcc version that nobody else is using and we
>>>>>> can just ignore it?
>>>>>
>>>>> Yes, he is using gcc-7 that is unreleased. (It will be released April
>>>>> next year.)
>>>>
>>>> Ahh, self-built? So it's not part of some experimental ARM distro
>>>> setup and this will be annoying lots of people?
>>>
>>> Our friendly compiler guys built it, but it's just a snapshot of trunk,
>>> so it's all heading towards GCC 7.0. AFAIU, the problematic optimisation
>>> is also a mid-end pass, so it would affect other architectures too.
>>>
>>>> If so, still think that we could just get rid of the ____ilog2_NaN()
>>>> thing as it's not _that_ important, but it's certainly not very
>>>> high-priority. Will can do it in his tree too for testing, and it can
>>>> remind people to get the gcc problem fixed.
>>>
>>> I'm carrying the diff below, which fixes arm64 defconfig, but I'm worried
>>> that we might be relying on this trick elsewhere. The arm __bad_cmpxchg
>>> function, for example.
>>>
>>> Will
>>>
>>> --->8
>>>
>>> diff --git a/include/linux/log2.h b/include/linux/log2.h
>>> index fd7ff3d91e6a..9cf5ad69065d 100644
>>> --- a/include/linux/log2.h
>>> +++ b/include/linux/log2.h
>>> @@ -16,12 +16,6 @@
>>>  #include <linux/bitops.h>
>>>
>>>  /*
>>> - * deal with unrepresentable constant logarithms
>>> - */
>>> -extern __attribute__((const, noreturn))
>>> -int ____ilog2_NaN(void);
>>> -
>>> -/*
>>>   * non-constant log of base 2 calculators
>>>   * - the arch may override these in asm/bitops.h if they can be implemented
>>>   *   more efficiently than using fls() and fls64()
>>> @@ -85,7 +79,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>  #define ilog2(n)                             \
>>>  (                                            \
>>>       __builtin_constant_p(n) ? (             \
>>> -             (n) < 1 ? ____ilog2_NaN() :     \
>>> +             (n) < 1 ? 0 :                   \
>>>               (n) & (1ULL << 63) ? 63 :       \
>>>               (n) & (1ULL << 62) ? 62 :       \
>>>               (n) & (1ULL << 61) ? 61 :       \
>>> @@ -149,9 +143,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>               (n) & (1ULL <<  3) ?  3 :       \
>>>               (n) & (1ULL <<  2) ?  2 :       \
>>>               (n) & (1ULL <<  1) ?  1 :       \
>>> -             (n) & (1ULL <<  0) ?  0 :       \
>>> -             ____ilog2_NaN()                 \
>>> -                                ) :          \
>>> +             0) :                            \
>>>       (sizeof(n) <= 4) ?                      \
>>>       __ilog2_u32(n) :                        \
>>>       __ilog2_u64(n)                          \
>>> @@ -194,7 +186,6 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>   * @n: parameter
>>>   *
>>>   * The first few values calculated by this routine:
>>> - *  ob2(0) = 0
>>>   *  ob2(1) = 0
>>>   *  ob2(2) = 1
>>>   *  ob2(3) = 2
>>>
>>
>> Reviving this thread as gcc 7 has now hit Fedora rawhide and has this
>> same issue. I pulled in the above patch from Will as a temporary work
>> around for building. It didn't look like there was consensus on a
>> permanent solution though from the thread.
>>
>
> I still think order_base_2() is broken, since it may invoke
> roundup_pow_of_two() with an input value that is documented as
> producing undefined output. I would argue that the below is the
> correct fix.
>
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index fd7ff3d91e6a..46523731bec0 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>   *  ... and so on.
>   */
>
> -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
> +static inline __attribute__((__const__))
> +unsigned long __order_base_2(unsigned long n)
> +{
> +       return n ? 1UL << fls_long(n - 1) : 1;
> +}
> +
> +#define order_base_2(n)                                \
> +(                                              \
> +       __builtin_constant_p(n) ? (             \
> +               ((n) < 2) ? (n) :               \
> +               ilog2((n) - 1) + 1) :           \
> +       ilog2(__order_base_2(n))                \
> + )
>
>  #endif /* _LINUX_LOG2_H */

Actually, there is a still a redundant shift/fls() in there, this is
even simpler:

diff --git a/include/linux/log2.h b/include/linux/log2.h
index fd7ff3d91e6a..4741534bd7af 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
  *  ... and so on.
  */

-#define order_base_2(n) ilog2(roundup_pow_of_two(n))
+static inline __attribute__((__const__))
+unsigned long __order_base_2(unsigned long n)
+{
+       return n > 1 ? ilog2(n - 1) + 1 : 0;
+}
+
+#define order_base_2(n)                                \
+(                                              \
+       __builtin_constant_p(n) ? (             \
+               ((n) < 2) ? (n) :               \
+               ilog2((n) - 1) + 1) :           \
+       __order_base_2(n)                       \
+ )

 #endif /* _LINUX_LOG2_H */

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-01 18:19               ` Ard Biesheuvel
@ 2017-02-01 19:04                 ` Joe Perches
  2017-02-01 19:31                   ` Ard Biesheuvel
  2017-02-02  9:02                   ` Peter Zijlstra
  0 siblings, 2 replies; 30+ messages in thread
From: Joe Perches @ 2017-02-01 19:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2017-02-01 at 18:19 +0000, Ard Biesheuvel wrote:
> On 1 February 2017 at 17:36, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > I still think order_base_2() is broken, since it may invoke
> > roundup_pow_of_two() with an input value that is documented as
> > producing undefined output. I would argue that the below is the
> > correct fix.
> > 
> > diff --git a/include/linux/log2.h b/include/linux/log2.h
> > index fd7ff3d91e6a..46523731bec0 100644
> > --- a/include/linux/log2.h
> > +++ b/include/linux/log2.h
> > @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
> >   *  ... and so on.
> >   */
> > 
> > -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
> > +static inline __attribute__((__const__))
> > +unsigned long __order_base_2(unsigned long n)
> > +{
> > +       return n ? 1UL << fls_long(n - 1) : 1;
> > +}
> > +
> > +#define order_base_2(n)                                \
> > +(                                              \
> > +       __builtin_constant_p(n) ? (             \
> > +               ((n) < 2) ? (n) :               \
> > +               ilog2((n) - 1) + 1) :           \
> > +       ilog2(__order_base_2(n))                \
> > + )
> > 
> >  #endif /* _LINUX_LOG2_H */
> 
> Actually, there is a still a redundant shift/fls() in there, this is
> even simpler:
> 
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index fd7ff3d91e6a..4741534bd7af 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>   *  ... and so on.
>   */
> 
> -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
> +static inline __attribute__((__const__))

commonly __attribute_const__

> +unsigned long __order_base_2(unsigned long n)
> +{
> +       return n > 1 ? ilog2(n - 1) + 1 : 0;
> +}
> +
> +#define order_base_2(n)                                \
> +(                                              \
> +       __builtin_constant_p(n) ? (             \
> +               ((n) < 2) ? (n) :               \
> +               ilog2((n) - 1) + 1) :           \
> +       __order_base_2(n)                       \
> + )

Does this work properly when n is a signed negative value?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-01 19:04                 ` Joe Perches
@ 2017-02-01 19:31                   ` Ard Biesheuvel
  2017-02-01 19:49                     ` Joe Perches
  2017-02-02  9:02                   ` Peter Zijlstra
  1 sibling, 1 reply; 30+ messages in thread
From: Ard Biesheuvel @ 2017-02-01 19:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 February 2017 at 19:04, Joe Perches <joe@perches.com> wrote:
> On Wed, 2017-02-01 at 18:19 +0000, Ard Biesheuvel wrote:
>> On 1 February 2017 at 17:36, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > I still think order_base_2() is broken, since it may invoke
>> > roundup_pow_of_two() with an input value that is documented as
>> > producing undefined output. I would argue that the below is the
>> > correct fix.
>> >
>> > diff --git a/include/linux/log2.h b/include/linux/log2.h
>> > index fd7ff3d91e6a..46523731bec0 100644
>> > --- a/include/linux/log2.h
>> > +++ b/include/linux/log2.h
>> > @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>> >   *  ... and so on.
>> >   */
>> >
>> > -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
>> > +static inline __attribute__((__const__))
>> > +unsigned long __order_base_2(unsigned long n)
>> > +{
>> > +       return n ? 1UL << fls_long(n - 1) : 1;
>> > +}
>> > +
>> > +#define order_base_2(n)                                \
>> > +(                                              \
>> > +       __builtin_constant_p(n) ? (             \
>> > +               ((n) < 2) ? (n) :               \
>> > +               ilog2((n) - 1) + 1) :           \
>> > +       ilog2(__order_base_2(n))                \
>> > + )
>> >
>> >  #endif /* _LINUX_LOG2_H */
>>
>> Actually, there is a still a redundant shift/fls() in there, this is
>> even simpler:
>>
>> diff --git a/include/linux/log2.h b/include/linux/log2.h
>> index fd7ff3d91e6a..4741534bd7af 100644
>> --- a/include/linux/log2.h
>> +++ b/include/linux/log2.h
>> @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>   *  ... and so on.
>>   */
>>
>> -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
>> +static inline __attribute__((__const__))
>
> commonly __attribute_const__
>

... except in <linux/ilog2.h>, which probably predates that.

>> +unsigned long __order_base_2(unsigned long n)
>> +{
>> +       return n > 1 ? ilog2(n - 1) + 1 : 0;
>> +}
>> +
>> +#define order_base_2(n)                                \
>> +(                                              \
>> +       __builtin_constant_p(n) ? (             \
>> +               ((n) < 2) ? (n) :               \
>> +               ilog2((n) - 1) + 1) :           \
>> +       __order_base_2(n)                       \
>> + )
>
> Does this work properly when n is a signed negative value?
>

No, but order_base_2() is explicitly only defined for inputs [0, ->),
so its behavior for negative inputs is best left undefined.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-01 19:31                   ` Ard Biesheuvel
@ 2017-02-01 19:49                     ` Joe Perches
  2017-02-01 19:53                       ` Ard Biesheuvel
  0 siblings, 1 reply; 30+ messages in thread
From: Joe Perches @ 2017-02-01 19:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2017-02-01 at 19:31 +0000, Ard Biesheuvel wrote:
> On 1 February 2017 at 19:04, Joe Perches <joe@perches.com> wrote:
> > On Wed, 2017-02-01 at 18:19 +0000, Ard Biesheuvel wrote:
> > > On 1 February 2017 at 17:36, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > > > I still think order_base_2() is broken, since it may invoke
> > > > roundup_pow_of_two() with an input value that is documented as
> > > > producing undefined output. I would argue that the below is the
> > > > correct fix.
> > > > 
> > > > diff --git a/include/linux/log2.h b/include/linux/log2.h
> > > > index fd7ff3d91e6a..46523731bec0 100644
> > > > --- a/include/linux/log2.h
> > > > +++ b/include/linux/log2.h
> > > > @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
> > > >   *  ... and so on.
> > > >   */
> > > > 
> > > > -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
> > > > +static inline __attribute__((__const__))
> > > > +unsigned long __order_base_2(unsigned long n)
> > > > +{
> > > > +       return n ? 1UL << fls_long(n - 1) : 1;
> > > > +}
> > > > +
> > > > +#define order_base_2(n)                                \
> > > > +(                                              \
> > > > +       __builtin_constant_p(n) ? (             \
> > > > +               ((n) < 2) ? (n) :               \
> > > > +               ilog2((n) - 1) + 1) :           \
> > > > +       ilog2(__order_base_2(n))                \
> > > > + )
> > > > 
> > > >  #endif /* _LINUX_LOG2_H */
> > > 
> > > Actually, there is a still a redundant shift/fls() in there, this is
> > > even simpler:
> > > 
> > > diff --git a/include/linux/log2.h b/include/linux/log2.h
> > > index fd7ff3d91e6a..4741534bd7af 100644
> > > --- a/include/linux/log2.h
> > > +++ b/include/linux/log2.h
> > > @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
> > >   *  ... and so on.
> > >   */
> > > 
> > > -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
> > > +static inline __attribute__((__const__))
> > 
> > commonly __attribute_const__
> > 
> 
> ... except in <linux/ilog2.h>, which probably predates that.
> 
> > > +unsigned long __order_base_2(unsigned long n)
> > > +{
> > > +       return n > 1 ? ilog2(n - 1) + 1 : 0;
> > > +}
> > > +
> > > +#define order_base_2(n)                                \
> > > +(                                              \
> > > +       __builtin_constant_p(n) ? (             \
> > > +               ((n) < 2) ? (n) :               \
> > > +               ilog2((n) - 1) + 1) :           \
> > > +       __order_base_2(n)                       \
> > > + )
> > 
> > Does this work properly when n is a signed negative value?
> > 
> 
> No, but order_base_2() is explicitly only defined for inputs [0, ->),

where?

> so its behavior for negative inputs is best left undefined.

Or maybe add a BUILD_BUG_ON something like:

#define order_base_2(n)							\
({									\
	typeof(n) _n = n;						\
	BUILD_BUG_ON(__builtin_constant_p(_n) && _n < 0);		\
	__builtin_constant_p(_n) ? (_n < 2 ? _n :?ilog2((_n) - 1) + 1))	\
				 :?__order_base_2(_n);			\
})

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-01 19:49                     ` Joe Perches
@ 2017-02-01 19:53                       ` Ard Biesheuvel
  2017-02-01 20:34                         ` Joe Perches
  0 siblings, 1 reply; 30+ messages in thread
From: Ard Biesheuvel @ 2017-02-01 19:53 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 February 2017 at 19:49, Joe Perches <joe@perches.com> wrote:
> On Wed, 2017-02-01 at 19:31 +0000, Ard Biesheuvel wrote:
>> On 1 February 2017 at 19:04, Joe Perches <joe@perches.com> wrote:
>> > On Wed, 2017-02-01 at 18:19 +0000, Ard Biesheuvel wrote:
>> > > On 1 February 2017 at 17:36, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > > > I still think order_base_2() is broken, since it may invoke
>> > > > roundup_pow_of_two() with an input value that is documented as
>> > > > producing undefined output. I would argue that the below is the
>> > > > correct fix.
>> > > >
>> > > > diff --git a/include/linux/log2.h b/include/linux/log2.h
>> > > > index fd7ff3d91e6a..46523731bec0 100644
>> > > > --- a/include/linux/log2.h
>> > > > +++ b/include/linux/log2.h
>> > > > @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>> > > >   *  ... and so on.
>> > > >   */
>> > > >
>> > > > -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
>> > > > +static inline __attribute__((__const__))
>> > > > +unsigned long __order_base_2(unsigned long n)
>> > > > +{
>> > > > +       return n ? 1UL << fls_long(n - 1) : 1;
>> > > > +}
>> > > > +
>> > > > +#define order_base_2(n)                                \
>> > > > +(                                              \
>> > > > +       __builtin_constant_p(n) ? (             \
>> > > > +               ((n) < 2) ? (n) :               \
>> > > > +               ilog2((n) - 1) + 1) :           \
>> > > > +       ilog2(__order_base_2(n))                \
>> > > > + )
>> > > >
>> > > >  #endif /* _LINUX_LOG2_H */
>> > >
>> > > Actually, there is a still a redundant shift/fls() in there, this is
>> > > even simpler:
>> > >
>> > > diff --git a/include/linux/log2.h b/include/linux/log2.h
>> > > index fd7ff3d91e6a..4741534bd7af 100644
>> > > --- a/include/linux/log2.h
>> > > +++ b/include/linux/log2.h
>> > > @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>> > >   *  ... and so on.
>> > >   */
>> > >
>> > > -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
>> > > +static inline __attribute__((__const__))
>> >
>> > commonly __attribute_const__
>> >
>>
>> ... except in <linux/ilog2.h>, which probably predates that.
>>
>> > > +unsigned long __order_base_2(unsigned long n)
>> > > +{
>> > > +       return n > 1 ? ilog2(n - 1) + 1 : 0;
>> > > +}
>> > > +
>> > > +#define order_base_2(n)                                \
>> > > +(                                              \
>> > > +       __builtin_constant_p(n) ? (             \
>> > > +               ((n) < 2) ? (n) :               \
>> > > +               ilog2((n) - 1) + 1) :           \
>> > > +       __order_base_2(n)                       \
>> > > + )
>> >
>> > Does this work properly when n is a signed negative value?
>> >
>>
>> No, but order_base_2() is explicitly only defined for inputs [0, ->),
>
> where?
>

The comment describes it as follows

 /**
  * order_base_2 - calculate the (rounded up) base 2 order of the argument
  * @n: parameter
  *
  * The first few values calculated by this routine:
  *  ob2(0) = 0
  *  ob2(1) = 0
  *  ob2(2) = 1
  *  ob2(3) = 2
  *  ob2(4) = 2
  *  ob2(5) = 3
  *  ... and so on.
  */

i.e., it defines the output for inputs 0, 1, 2, 3, 4, 5, ..., and not
for negative inputs, hence undefined.

>> so its behavior for negative inputs is best left undefined.
>
> Or maybe add a BUILD_BUG_ON something like:
>
> #define order_base_2(n)                                                 \
> ({                                                                      \
>         typeof(n) _n = n;                                               \
>         BUILD_BUG_ON(__builtin_constant_p(_n) && _n < 0);               \
>         __builtin_constant_p(_n) ? (_n < 2 ? _n : ilog2((_n) - 1) + 1)) \
>                                  : __order_base_2(_n);                  \
> })
>

This would interfere with the ability to use order_base_2() in
initializers for global variables.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-01 19:53                       ` Ard Biesheuvel
@ 2017-02-01 20:34                         ` Joe Perches
  2017-02-01 21:11                           ` Ard Biesheuvel
  0 siblings, 1 reply; 30+ messages in thread
From: Joe Perches @ 2017-02-01 20:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2017-02-01 at 19:53 +0000, Ard Biesheuvel wrote:
> On 1 February 2017 at 19:49, Joe Perches <joe@perches.com> wrote:
[]
> > Or maybe add a BUILD_BUG_ON something like:
> > 
> > #define order_base_2(n)                                                 \
> > ({                                                                      \
> >         typeof(n) _n = n;                                               \
> >         BUILD_BUG_ON(__builtin_constant_p(_n) && _n < 0);               \
> >         __builtin_constant_p(_n) ? (_n < 2 ? _n : ilog2((_n) - 1) + 1)) \
> >                                  : __order_base_2(_n);                  \
> > })
> > 
> 
> This would interfere with the ability to use order_base_2() in
> initializers for global variables.

There aren't any as far as I can tell and would using
order_base_2() for a global initializer make sense?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-01 20:34                         ` Joe Perches
@ 2017-02-01 21:11                           ` Ard Biesheuvel
  0 siblings, 0 replies; 30+ messages in thread
From: Ard Biesheuvel @ 2017-02-01 21:11 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 February 2017 at 20:34, Joe Perches <joe@perches.com> wrote:
> On Wed, 2017-02-01 at 19:53 +0000, Ard Biesheuvel wrote:
>> On 1 February 2017 at 19:49, Joe Perches <joe@perches.com> wrote:
> []
>> > Or maybe add a BUILD_BUG_ON something like:
>> >
>> > #define order_base_2(n)                                                 \
>> > ({                                                                      \
>> >         typeof(n) _n = n;                                               \
>> >         BUILD_BUG_ON(__builtin_constant_p(_n) && _n < 0);               \
>> >         __builtin_constant_p(_n) ? (_n < 2 ? _n : ilog2((_n) - 1) + 1)) \
>> >                                  : __order_base_2(_n);                  \
>> > })
>> >
>>
>> This would interfere with the ability to use order_base_2() in
>> initializers for global variables.
>
> There aren't any as far as I can tell and would using
> order_base_2() for a global initializer make sense?
>

Why wouldn't it make sense?

In any case, we could also solve this by doing this instead

#define order_base_2(n)                        \
(                                              \
       __builtin_constant_p(n) ? (             \
               ((n) == 0 || (n) == 1) ? 0 :    \
               ilog2((n) - 1) + 1) :           \
       __order_base_2(n)                       \
)

which will emit the usual unresolveable __ilog2_NaN reference when
constants < 0 are passed to order_base_2()

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-01 17:36             ` Ard Biesheuvel
  2017-02-01 18:19               ` Ard Biesheuvel
@ 2017-02-01 21:50               ` Laura Abbott
  2017-02-02  9:17                 ` Ard Biesheuvel
  1 sibling, 1 reply; 30+ messages in thread
From: Laura Abbott @ 2017-02-01 21:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/01/2017 09:36 AM, Ard Biesheuvel wrote:
> On 1 February 2017 at 16:58, Laura Abbott <labbott@redhat.com> wrote:
>> On 10/19/2016 09:22 AM, Will Deacon wrote:
>>> On Wed, Oct 19, 2016 at 09:01:33AM -0700, Linus Torvalds wrote:
>>>> On Wed, Oct 19, 2016 at 8:56 AM, Markus Trippelsdorf
>>>> <markus@trippelsdorf.de> wrote:
>>>>> On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
>>>>>>
>>>>>> Well, in the meantime we apparently have to live with it. Unless Will
>>>>>> is using some unreleased gcc version that nobody else is using and we
>>>>>> can just ignore it?
>>>>>
>>>>> Yes, he is using gcc-7 that is unreleased. (It will be released April
>>>>> next year.)
>>>>
>>>> Ahh, self-built? So it's not part of some experimental ARM distro
>>>> setup and this will be annoying lots of people?
>>>
>>> Our friendly compiler guys built it, but it's just a snapshot of trunk,
>>> so it's all heading towards GCC 7.0. AFAIU, the problematic optimisation
>>> is also a mid-end pass, so it would affect other architectures too.
>>>
>>>> If so, still think that we could just get rid of the ____ilog2_NaN()
>>>> thing as it's not _that_ important, but it's certainly not very
>>>> high-priority. Will can do it in his tree too for testing, and it can
>>>> remind people to get the gcc problem fixed.
>>>
>>> I'm carrying the diff below, which fixes arm64 defconfig, but I'm worried
>>> that we might be relying on this trick elsewhere. The arm __bad_cmpxchg
>>> function, for example.
>>>
>>> Will
>>>
>>> --->8
>>>
>>> diff --git a/include/linux/log2.h b/include/linux/log2.h
>>> index fd7ff3d91e6a..9cf5ad69065d 100644
>>> --- a/include/linux/log2.h
>>> +++ b/include/linux/log2.h
>>> @@ -16,12 +16,6 @@
>>>  #include <linux/bitops.h>
>>>
>>>  /*
>>> - * deal with unrepresentable constant logarithms
>>> - */
>>> -extern __attribute__((const, noreturn))
>>> -int ____ilog2_NaN(void);
>>> -
>>> -/*
>>>   * non-constant log of base 2 calculators
>>>   * - the arch may override these in asm/bitops.h if they can be implemented
>>>   *   more efficiently than using fls() and fls64()
>>> @@ -85,7 +79,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>  #define ilog2(n)                             \
>>>  (                                            \
>>>       __builtin_constant_p(n) ? (             \
>>> -             (n) < 1 ? ____ilog2_NaN() :     \
>>> +             (n) < 1 ? 0 :                   \
>>>               (n) & (1ULL << 63) ? 63 :       \
>>>               (n) & (1ULL << 62) ? 62 :       \
>>>               (n) & (1ULL << 61) ? 61 :       \
>>> @@ -149,9 +143,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>               (n) & (1ULL <<  3) ?  3 :       \
>>>               (n) & (1ULL <<  2) ?  2 :       \
>>>               (n) & (1ULL <<  1) ?  1 :       \
>>> -             (n) & (1ULL <<  0) ?  0 :       \
>>> -             ____ilog2_NaN()                 \
>>> -                                ) :          \
>>> +             0) :                            \
>>>       (sizeof(n) <= 4) ?                      \
>>>       __ilog2_u32(n) :                        \
>>>       __ilog2_u64(n)                          \
>>> @@ -194,7 +186,6 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>   * @n: parameter
>>>   *
>>>   * The first few values calculated by this routine:
>>> - *  ob2(0) = 0
>>>   *  ob2(1) = 0
>>>   *  ob2(2) = 1
>>>   *  ob2(3) = 2
>>>
>>
>> Reviving this thread as gcc 7 has now hit Fedora rawhide and has this
>> same issue. I pulled in the above patch from Will as a temporary work
>> around for building. It didn't look like there was consensus on a
>> permanent solution though from the thread.
>>
> 
> I still think order_base_2() is broken, since it may invoke
> roundup_pow_of_two() with an input value that is documented as
> producing undefined output. I would argue that the below is the
> correct fix.
> 
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index fd7ff3d91e6a..46523731bec0 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>   *  ... and so on.
>   */
> 
> -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
> +static inline __attribute__((__const__))
> +unsigned long __order_base_2(unsigned long n)
> +{
> +       return n ? 1UL << fls_long(n - 1) : 1;
> +}
> +
> +#define order_base_2(n)                                \
> +(                                              \
> +       __builtin_constant_p(n) ? (             \
> +               ((n) < 2) ? (n) :               \
> +               ilog2((n) - 1) + 1) :           \
> +       ilog2(__order_base_2(n))                \
> + )
> 
>  #endif /* _LINUX_LOG2_H */
> 

This fixes the problem although the comments should be updated
as well. Is it worth fixing this for ilog2 as well just to avoid
the link nonsense there as well?

Thanks,
Laura

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-01 19:04                 ` Joe Perches
  2017-02-01 19:31                   ` Ard Biesheuvel
@ 2017-02-02  9:02                   ` Peter Zijlstra
  1 sibling, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2017-02-02  9:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 01, 2017 at 11:04:54AM -0800, Joe Perches wrote:
> > +#define order_base_2(n)                                \
> > +(                                              \
> > +       __builtin_constant_p(n) ? (             \
> > +               ((n) < 2) ? (n) :               \
> > +               ilog2((n) - 1) + 1) :           \
> > +       __order_base_2(n)                       \
> > + )
> 
> Does this work properly when n is a signed negative value?

Do you see it returning a complex number?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-01 21:50               ` Laura Abbott
@ 2017-02-02  9:17                 ` Ard Biesheuvel
  2017-02-02 15:43                   ` Laura Abbott
  0 siblings, 1 reply; 30+ messages in thread
From: Ard Biesheuvel @ 2017-02-02  9:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 February 2017 at 21:50, Laura Abbott <labbott@redhat.com> wrote:
> On 02/01/2017 09:36 AM, Ard Biesheuvel wrote:
>> On 1 February 2017 at 16:58, Laura Abbott <labbott@redhat.com> wrote:
>>> On 10/19/2016 09:22 AM, Will Deacon wrote:
>>>> On Wed, Oct 19, 2016 at 09:01:33AM -0700, Linus Torvalds wrote:
>>>>> On Wed, Oct 19, 2016 at 8:56 AM, Markus Trippelsdorf
>>>>> <markus@trippelsdorf.de> wrote:
>>>>>> On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
>>>>>>>
>>>>>>> Well, in the meantime we apparently have to live with it. Unless Will
>>>>>>> is using some unreleased gcc version that nobody else is using and we
>>>>>>> can just ignore it?
>>>>>>
>>>>>> Yes, he is using gcc-7 that is unreleased. (It will be released April
>>>>>> next year.)
>>>>>
>>>>> Ahh, self-built? So it's not part of some experimental ARM distro
>>>>> setup and this will be annoying lots of people?
>>>>
>>>> Our friendly compiler guys built it, but it's just a snapshot of trunk,
>>>> so it's all heading towards GCC 7.0. AFAIU, the problematic optimisation
>>>> is also a mid-end pass, so it would affect other architectures too.
>>>>
>>>>> If so, still think that we could just get rid of the ____ilog2_NaN()
>>>>> thing as it's not _that_ important, but it's certainly not very
>>>>> high-priority. Will can do it in his tree too for testing, and it can
>>>>> remind people to get the gcc problem fixed.
>>>>
>>>> I'm carrying the diff below, which fixes arm64 defconfig, but I'm worried
>>>> that we might be relying on this trick elsewhere. The arm __bad_cmpxchg
>>>> function, for example.
>>>>
>>>> Will
>>>>
>>>> --->8
>>>>
>>>> diff --git a/include/linux/log2.h b/include/linux/log2.h
>>>> index fd7ff3d91e6a..9cf5ad69065d 100644
>>>> --- a/include/linux/log2.h
>>>> +++ b/include/linux/log2.h
>>>> @@ -16,12 +16,6 @@
>>>>  #include <linux/bitops.h>
>>>>
>>>>  /*
>>>> - * deal with unrepresentable constant logarithms
>>>> - */
>>>> -extern __attribute__((const, noreturn))
>>>> -int ____ilog2_NaN(void);
>>>> -
>>>> -/*
>>>>   * non-constant log of base 2 calculators
>>>>   * - the arch may override these in asm/bitops.h if they can be implemented
>>>>   *   more efficiently than using fls() and fls64()
>>>> @@ -85,7 +79,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>>  #define ilog2(n)                             \
>>>>  (                                            \
>>>>       __builtin_constant_p(n) ? (             \
>>>> -             (n) < 1 ? ____ilog2_NaN() :     \
>>>> +             (n) < 1 ? 0 :                   \
>>>>               (n) & (1ULL << 63) ? 63 :       \
>>>>               (n) & (1ULL << 62) ? 62 :       \
>>>>               (n) & (1ULL << 61) ? 61 :       \
>>>> @@ -149,9 +143,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>>               (n) & (1ULL <<  3) ?  3 :       \
>>>>               (n) & (1ULL <<  2) ?  2 :       \
>>>>               (n) & (1ULL <<  1) ?  1 :       \
>>>> -             (n) & (1ULL <<  0) ?  0 :       \
>>>> -             ____ilog2_NaN()                 \
>>>> -                                ) :          \
>>>> +             0) :                            \
>>>>       (sizeof(n) <= 4) ?                      \
>>>>       __ilog2_u32(n) :                        \
>>>>       __ilog2_u64(n)                          \
>>>> @@ -194,7 +186,6 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>>   * @n: parameter
>>>>   *
>>>>   * The first few values calculated by this routine:
>>>> - *  ob2(0) = 0
>>>>   *  ob2(1) = 0
>>>>   *  ob2(2) = 1
>>>>   *  ob2(3) = 2
>>>>
>>>
>>> Reviving this thread as gcc 7 has now hit Fedora rawhide and has this
>>> same issue. I pulled in the above patch from Will as a temporary work
>>> around for building. It didn't look like there was consensus on a
>>> permanent solution though from the thread.
>>>
>>
>> I still think order_base_2() is broken, since it may invoke
>> roundup_pow_of_two() with an input value that is documented as
>> producing undefined output. I would argue that the below is the
>> correct fix.
>>
>> diff --git a/include/linux/log2.h b/include/linux/log2.h
>> index fd7ff3d91e6a..46523731bec0 100644
>> --- a/include/linux/log2.h
>> +++ b/include/linux/log2.h
>> @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>   *  ... and so on.
>>   */
>>
>> -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
>> +static inline __attribute__((__const__))
>> +unsigned long __order_base_2(unsigned long n)
>> +{
>> +       return n ? 1UL << fls_long(n - 1) : 1;
>> +}
>> +
>> +#define order_base_2(n)                                \
>> +(                                              \
>> +       __builtin_constant_p(n) ? (             \
>> +               ((n) < 2) ? (n) :               \
>> +               ilog2((n) - 1) + 1) :           \
>> +       ilog2(__order_base_2(n))                \
>> + )
>>
>>  #endif /* _LINUX_LOG2_H */
>>
>
> This fixes the problem although the comments should be updated
> as well.

This brings order_base_2() in line with the comments, so I am not sure
what you'd want to update here?

> Is it worth fixing this for ilog2 as well just to avoid
> the link nonsense there as well?
>

ilog2() is truly undefined for input 0, so the link nonsense is
justified there imo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-02  9:17                 ` Ard Biesheuvel
@ 2017-02-02 15:43                   ` Laura Abbott
  2017-02-02 15:45                     ` Ard Biesheuvel
  0 siblings, 1 reply; 30+ messages in thread
From: Laura Abbott @ 2017-02-02 15:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/02/2017 01:17 AM, Ard Biesheuvel wrote:
> On 1 February 2017 at 21:50, Laura Abbott <labbott@redhat.com> wrote:
>> On 02/01/2017 09:36 AM, Ard Biesheuvel wrote:
>>> On 1 February 2017 at 16:58, Laura Abbott <labbott@redhat.com> wrote:
>>>> On 10/19/2016 09:22 AM, Will Deacon wrote:
>>>>> On Wed, Oct 19, 2016 at 09:01:33AM -0700, Linus Torvalds wrote:
>>>>>> On Wed, Oct 19, 2016 at 8:56 AM, Markus Trippelsdorf
>>>>>> <markus@trippelsdorf.de> wrote:
>>>>>>> On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
>>>>>>>>
>>>>>>>> Well, in the meantime we apparently have to live with it. Unless Will
>>>>>>>> is using some unreleased gcc version that nobody else is using and we
>>>>>>>> can just ignore it?
>>>>>>>
>>>>>>> Yes, he is using gcc-7 that is unreleased. (It will be released April
>>>>>>> next year.)
>>>>>>
>>>>>> Ahh, self-built? So it's not part of some experimental ARM distro
>>>>>> setup and this will be annoying lots of people?
>>>>>
>>>>> Our friendly compiler guys built it, but it's just a snapshot of trunk,
>>>>> so it's all heading towards GCC 7.0. AFAIU, the problematic optimisation
>>>>> is also a mid-end pass, so it would affect other architectures too.
>>>>>
>>>>>> If so, still think that we could just get rid of the ____ilog2_NaN()
>>>>>> thing as it's not _that_ important, but it's certainly not very
>>>>>> high-priority. Will can do it in his tree too for testing, and it can
>>>>>> remind people to get the gcc problem fixed.
>>>>>
>>>>> I'm carrying the diff below, which fixes arm64 defconfig, but I'm worried
>>>>> that we might be relying on this trick elsewhere. The arm __bad_cmpxchg
>>>>> function, for example.
>>>>>
>>>>> Will
>>>>>
>>>>> --->8
>>>>>
>>>>> diff --git a/include/linux/log2.h b/include/linux/log2.h
>>>>> index fd7ff3d91e6a..9cf5ad69065d 100644
>>>>> --- a/include/linux/log2.h
>>>>> +++ b/include/linux/log2.h
>>>>> @@ -16,12 +16,6 @@
>>>>>  #include <linux/bitops.h>
>>>>>
>>>>>  /*
>>>>> - * deal with unrepresentable constant logarithms
>>>>> - */
>>>>> -extern __attribute__((const, noreturn))
>>>>> -int ____ilog2_NaN(void);
>>>>> -
>>>>> -/*
>>>>>   * non-constant log of base 2 calculators
>>>>>   * - the arch may override these in asm/bitops.h if they can be implemented
>>>>>   *   more efficiently than using fls() and fls64()
>>>>> @@ -85,7 +79,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>>>  #define ilog2(n)                             \
>>>>>  (                                            \
>>>>>       __builtin_constant_p(n) ? (             \
>>>>> -             (n) < 1 ? ____ilog2_NaN() :     \
>>>>> +             (n) < 1 ? 0 :                   \
>>>>>               (n) & (1ULL << 63) ? 63 :       \
>>>>>               (n) & (1ULL << 62) ? 62 :       \
>>>>>               (n) & (1ULL << 61) ? 61 :       \
>>>>> @@ -149,9 +143,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>>>               (n) & (1ULL <<  3) ?  3 :       \
>>>>>               (n) & (1ULL <<  2) ?  2 :       \
>>>>>               (n) & (1ULL <<  1) ?  1 :       \
>>>>> -             (n) & (1ULL <<  0) ?  0 :       \
>>>>> -             ____ilog2_NaN()                 \
>>>>> -                                ) :          \
>>>>> +             0) :                            \
>>>>>       (sizeof(n) <= 4) ?                      \
>>>>>       __ilog2_u32(n) :                        \
>>>>>       __ilog2_u64(n)                          \
>>>>> @@ -194,7 +186,6 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>>>   * @n: parameter
>>>>>   *
>>>>>   * The first few values calculated by this routine:
>>>>> - *  ob2(0) = 0
>>>>>   *  ob2(1) = 0
>>>>>   *  ob2(2) = 1
>>>>>   *  ob2(3) = 2
>>>>>
>>>>
>>>> Reviving this thread as gcc 7 has now hit Fedora rawhide and has this
>>>> same issue. I pulled in the above patch from Will as a temporary work
>>>> around for building. It didn't look like there was consensus on a
>>>> permanent solution though from the thread.
>>>>
>>>
>>> I still think order_base_2() is broken, since it may invoke
>>> roundup_pow_of_two() with an input value that is documented as
>>> producing undefined output. I would argue that the below is the
>>> correct fix.
>>>
>>> diff --git a/include/linux/log2.h b/include/linux/log2.h
>>> index fd7ff3d91e6a..46523731bec0 100644
>>> --- a/include/linux/log2.h
>>> +++ b/include/linux/log2.h
>>> @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>   *  ... and so on.
>>>   */
>>>
>>> -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
>>> +static inline __attribute__((__const__))
>>> +unsigned long __order_base_2(unsigned long n)
>>> +{
>>> +       return n ? 1UL << fls_long(n - 1) : 1;
>>> +}
>>> +
>>> +#define order_base_2(n)                                \
>>> +(                                              \
>>> +       __builtin_constant_p(n) ? (             \
>>> +               ((n) < 2) ? (n) :               \
>>> +               ilog2((n) - 1) + 1) :           \
>>> +       ilog2(__order_base_2(n))                \
>>> + )
>>>
>>>  #endif /* _LINUX_LOG2_H */
>>>
>>
>> This fixes the problem although the comments should be updated
>> as well.
> 
> This brings order_base_2() in line with the comments, so I am not sure
> what you'd want to update here?
> 

ob2(1) = 1  for the __builtin_constant_p case which doesn't
match the comment of ob2(1) = 0. So my statement should actually
be is this correct?

Thanks,
Laura

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
  2017-02-02 15:43                   ` Laura Abbott
@ 2017-02-02 15:45                     ` Ard Biesheuvel
  0 siblings, 0 replies; 30+ messages in thread
From: Ard Biesheuvel @ 2017-02-02 15:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 2 February 2017 at 15:43, Laura Abbott <labbott@redhat.com> wrote:
> On 02/02/2017 01:17 AM, Ard Biesheuvel wrote:
>> On 1 February 2017 at 21:50, Laura Abbott <labbott@redhat.com> wrote:
>>> On 02/01/2017 09:36 AM, Ard Biesheuvel wrote:
>>>> On 1 February 2017 at 16:58, Laura Abbott <labbott@redhat.com> wrote:
>>>>> On 10/19/2016 09:22 AM, Will Deacon wrote:
>>>>>> On Wed, Oct 19, 2016 at 09:01:33AM -0700, Linus Torvalds wrote:
>>>>>>> On Wed, Oct 19, 2016 at 8:56 AM, Markus Trippelsdorf
>>>>>>> <markus@trippelsdorf.de> wrote:
>>>>>>>> On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
>>>>>>>>>
>>>>>>>>> Well, in the meantime we apparently have to live with it. Unless Will
>>>>>>>>> is using some unreleased gcc version that nobody else is using and we
>>>>>>>>> can just ignore it?
>>>>>>>>
>>>>>>>> Yes, he is using gcc-7 that is unreleased. (It will be released April
>>>>>>>> next year.)
>>>>>>>
>>>>>>> Ahh, self-built? So it's not part of some experimental ARM distro
>>>>>>> setup and this will be annoying lots of people?
>>>>>>
>>>>>> Our friendly compiler guys built it, but it's just a snapshot of trunk,
>>>>>> so it's all heading towards GCC 7.0. AFAIU, the problematic optimisation
>>>>>> is also a mid-end pass, so it would affect other architectures too.
>>>>>>
>>>>>>> If so, still think that we could just get rid of the ____ilog2_NaN()
>>>>>>> thing as it's not _that_ important, but it's certainly not very
>>>>>>> high-priority. Will can do it in his tree too for testing, and it can
>>>>>>> remind people to get the gcc problem fixed.
>>>>>>
>>>>>> I'm carrying the diff below, which fixes arm64 defconfig, but I'm worried
>>>>>> that we might be relying on this trick elsewhere. The arm __bad_cmpxchg
>>>>>> function, for example.
>>>>>>
>>>>>> Will
>>>>>>
>>>>>> --->8
>>>>>>
>>>>>> diff --git a/include/linux/log2.h b/include/linux/log2.h
>>>>>> index fd7ff3d91e6a..9cf5ad69065d 100644
>>>>>> --- a/include/linux/log2.h
>>>>>> +++ b/include/linux/log2.h
>>>>>> @@ -16,12 +16,6 @@
>>>>>>  #include <linux/bitops.h>
>>>>>>
>>>>>>  /*
>>>>>> - * deal with unrepresentable constant logarithms
>>>>>> - */
>>>>>> -extern __attribute__((const, noreturn))
>>>>>> -int ____ilog2_NaN(void);
>>>>>> -
>>>>>> -/*
>>>>>>   * non-constant log of base 2 calculators
>>>>>>   * - the arch may override these in asm/bitops.h if they can be implemented
>>>>>>   *   more efficiently than using fls() and fls64()
>>>>>> @@ -85,7 +79,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>>>>  #define ilog2(n)                             \
>>>>>>  (                                            \
>>>>>>       __builtin_constant_p(n) ? (             \
>>>>>> -             (n) < 1 ? ____ilog2_NaN() :     \
>>>>>> +             (n) < 1 ? 0 :                   \
>>>>>>               (n) & (1ULL << 63) ? 63 :       \
>>>>>>               (n) & (1ULL << 62) ? 62 :       \
>>>>>>               (n) & (1ULL << 61) ? 61 :       \
>>>>>> @@ -149,9 +143,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>>>>               (n) & (1ULL <<  3) ?  3 :       \
>>>>>>               (n) & (1ULL <<  2) ?  2 :       \
>>>>>>               (n) & (1ULL <<  1) ?  1 :       \
>>>>>> -             (n) & (1ULL <<  0) ?  0 :       \
>>>>>> -             ____ilog2_NaN()                 \
>>>>>> -                                ) :          \
>>>>>> +             0) :                            \
>>>>>>       (sizeof(n) <= 4) ?                      \
>>>>>>       __ilog2_u32(n) :                        \
>>>>>>       __ilog2_u64(n)                          \
>>>>>> @@ -194,7 +186,6 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>>>>   * @n: parameter
>>>>>>   *
>>>>>>   * The first few values calculated by this routine:
>>>>>> - *  ob2(0) = 0
>>>>>>   *  ob2(1) = 0
>>>>>>   *  ob2(2) = 1
>>>>>>   *  ob2(3) = 2
>>>>>>
>>>>>
>>>>> Reviving this thread as gcc 7 has now hit Fedora rawhide and has this
>>>>> same issue. I pulled in the above patch from Will as a temporary work
>>>>> around for building. It didn't look like there was consensus on a
>>>>> permanent solution though from the thread.
>>>>>
>>>>
>>>> I still think order_base_2() is broken, since it may invoke
>>>> roundup_pow_of_two() with an input value that is documented as
>>>> producing undefined output. I would argue that the below is the
>>>> correct fix.
>>>>
>>>> diff --git a/include/linux/log2.h b/include/linux/log2.h
>>>> index fd7ff3d91e6a..46523731bec0 100644
>>>> --- a/include/linux/log2.h
>>>> +++ b/include/linux/log2.h
>>>> @@ -203,6 +203,18 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>>>>   *  ... and so on.
>>>>   */
>>>>
>>>> -#define order_base_2(n) ilog2(roundup_pow_of_two(n))
>>>> +static inline __attribute__((__const__))
>>>> +unsigned long __order_base_2(unsigned long n)
>>>> +{
>>>> +       return n ? 1UL << fls_long(n - 1) : 1;
>>>> +}
>>>> +
>>>> +#define order_base_2(n)                                \
>>>> +(                                              \
>>>> +       __builtin_constant_p(n) ? (             \
>>>> +               ((n) < 2) ? (n) :               \
>>>> +               ilog2((n) - 1) + 1) :           \
>>>> +       ilog2(__order_base_2(n))                \
>>>> + )
>>>>
>>>>  #endif /* _LINUX_LOG2_H */
>>>>
>>>
>>> This fixes the problem although the comments should be updated
>>> as well.
>>
>> This brings order_base_2() in line with the comments, so I am not sure
>> what you'd want to update here?
>>
>
> ob2(1) = 1  for the __builtin_constant_p case which doesn't
> match the comment of ob2(1) = 0. So my statement should actually
> be is this correct?
>
> Thanks,
> Laura

You are right. But the final one I proposed is correct:

#define order_base_2(n)                        \
(                                              \
       __builtin_constant_p(n) ? (             \
               ((n) == 0 || (n) == 1) ? 0 :    \
               ilog2((n) - 1) + 1) :           \
       __order_base_2(n)                       \
)

and solves Joe's issue as well.

I will submit this as a proper patch.

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2017-02-02 15:45 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-10-17 18:38 Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness Will Deacon
2016-10-17 19:43 ` Ard Biesheuvel
2016-10-19 13:35   ` Will Deacon
2016-10-19 14:59     ` Ard Biesheuvel
2016-10-19 15:01       ` Ard Biesheuvel
2016-10-19 15:11         ` Arnd Bergmann
2016-10-19 15:27           ` Ard Biesheuvel
2016-10-19 15:44             ` Arnd Bergmann
2016-10-19 15:32           ` Gregory CLEMENT
2016-10-19 15:58     ` Russell King - ARM Linux
2016-10-19 15:37 ` Markus Trippelsdorf
2016-10-19 15:55   ` Linus Torvalds
2016-10-19 15:56     ` Markus Trippelsdorf
2016-10-19 16:00       ` Ard Biesheuvel
2016-10-19 16:01       ` Linus Torvalds
2016-10-19 16:22         ` Will Deacon
2017-02-01 16:58           ` Laura Abbott
2017-02-01 17:36             ` Ard Biesheuvel
2017-02-01 18:19               ` Ard Biesheuvel
2017-02-01 19:04                 ` Joe Perches
2017-02-01 19:31                   ` Ard Biesheuvel
2017-02-01 19:49                     ` Joe Perches
2017-02-01 19:53                       ` Ard Biesheuvel
2017-02-01 20:34                         ` Joe Perches
2017-02-01 21:11                           ` Ard Biesheuvel
2017-02-02  9:02                   ` Peter Zijlstra
2017-02-01 21:50               ` Laura Abbott
2017-02-02  9:17                 ` Ard Biesheuvel
2017-02-02 15:43                   ` Laura Abbott
2017-02-02 15:45                     ` Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).