public inbox for stable@vger.kernel.org
* Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
@ 2024-04-28 14:58 Mikhail Novosyolov
  2024-04-29  2:56 ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Mikhail Novosyolov @ 2024-04-28 14:58 UTC
  To: willy, riel, mgorman, mgorman, peterz, mingo, akpm, stable,
	sashal
  Cc: Бетхер Александр,
	i.gaptrakhmanov

Hello, colleagues.

Commit f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a "bounds: support non-power-of-two CONFIG_NR_CPUS" (https://github.com/torvalds/linux/commit/f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a) was backported to 6.1.x-stable (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=428ca0000f0abd5c99354c52a36becf2b815ca21), but it causes a serious regression (kernel panics) on quite a lot of hardware with AMD GPUs.

It was backported to 6.1.84: 6.1.84 has the problem, 6.1.83 does not, and the newest 6.1.88 still has it.

The problem is described here: https://gitlab.freedesktop.org/drm/amd/-/issues/3347




* Re: Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
  2024-04-28 14:58 Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS" Mikhail Novosyolov
@ 2024-04-29  2:56 ` Matthew Wilcox
  2024-04-29  4:07   ` Михаил Новоселов
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2024-04-29  2:56 UTC
  To: Mikhail Novosyolov
  Cc: riel, mgorman, peterz, mingo, akpm, stable, sashal,
	Бетхер Александр,
	i.gaptrakhmanov

On Sun, Apr 28, 2024 at 05:58:08PM +0300, Mikhail Novosyolov wrote:
> Hello, colleagues.
> 
> Commit f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a "bounds: support non-power-of-two CONFIG_NR_CPUS" (https://github.com/torvalds/linux/commit/f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a) was backported to 6.1.x-stable (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=428ca0000f0abd5c99354c52a36becf2b815ca21), but causes a serious regression on quite a lot of hardware with AMD GPUs, kernel panics.
> 
> It was backported to 6.1.84, 6.1.84 has problems, 6.1.83 does not, the newest 6.1.88 still has this problem.

Does v6.8.3 (which contains cf778fff03be) have this problem?
How about current Linus master?

What kernel config were you using?  I don't see that info on
https://linux-hardware.org/?probe=9c92ac1222
(maybe my tired eyes can't see it)


* Re: Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
  2024-04-29  2:56 ` Matthew Wilcox
@ 2024-04-29  4:07   ` Михаил Новоселов
  2024-04-29 12:17     ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Михаил Новоселов @ 2024-04-29  4:07 UTC
  To: Matthew Wilcox
  Cc: riel, mgorman, peterz, mingo, akpm, stable, sashal,
	Бетхер Александр,
	i gaptrakhmanov

(Resending in plain text; sorry for accidentally sending it in HTML.)

----- Original Message -----
> From: "Matthew Wilcox" <willy@infradead.org>
> To: "Михаил Новоселов" <m.novosyolov@rosalinux.ru>
> Cc: riel@surriel.com, mgorman@techsingularity.net, peterz@infradead.org, mingo@kernel.org, akpm@linux-foundation.org,
> stable@vger.kernel.org, sashal@kernel.org, "Бетхер Александр" <a.betkher@rosalinux.ru>, "i gaptrakhmanov"
> <i.gaptrakhmanov@rosalinux.ru>
> Sent: Monday, 29 April 2024 5:56:29
> Subject: Re: Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"

> On Sun, Apr 28, 2024 at 05:58:08PM +0300, Mikhail Novosyolov wrote:
>> Hello, colleagues.
>> 
>> Commit f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a "bounds: support
>> non-power-of-two CONFIG_NR_CPUS"
>> (https://github.com/torvalds/linux/commit/f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a)
>> was backported to 6.1.x-stable
>> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=428ca0000f0abd5c99354c52a36becf2b815ca21),
>> but causes a serious regression on quite a lot of hardware with AMD GPUs,
>> kernel panics.
>> 
>> It was backported to 6.1.84, 6.1.84 has problems, 6.1.83 does not, the newest
>> 6.1.88 still has this problem.
> 
> Does v6.8.3 (which contains cf778fff03be) have this problem?
> How about current Linus master?

6.1.88: has the problem
6.6.27: does not have the problem
6.9-rc (commit efdfbbc4dcc8f98754056971f88af0f7ff906144 from https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git): does not have the problem

6.8.3 was not tested, but we can test it if needed.


> 
> What kernel config were you using?  I don't see that info on
> https://linux-hardware.org/?probe=9c92ac1222
> (maybe my tired eyes can't see it)

Kernel config for 6.1: https://abf.io/import/kernel-6.1/blob/bcb3e9611f/kernel-x86_64.config
For 6.6: https://abf.io/import/kernel-6.6/blob/7404a4d3d5/kernel-x86_64.config
6.9-rc was built with the config copied from 6.6 (https://abf.io/build_lists/5028240).


* Re: Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
  2024-04-29  4:07   ` Михаил Новоселов
@ 2024-04-29 12:17     ` Matthew Wilcox
  2024-04-29 13:37       ` Ильфат Гаптрахманов
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2024-04-29 12:17 UTC
  To: Михаил Новоселов
  Cc: riel, mgorman, peterz, mingo, akpm, stable, sashal,
	Бетхер Александр,
	i gaptrakhmanov

On Mon, Apr 29, 2024 at 07:07:39AM +0300, Михаил Новоселов wrote:
> >> It was backported to 6.1.84, 6.1.84 has problems, 6.1.83 does not, the newest
> >> 6.1.88 still has this problem.
> > 
> > Does v6.8.3 (which contains cf778fff03be) have this problem?
> > How about current Linus master?
> 
> 6.1.88: has the problem
> 6.6.27: does not have the problem
> 6.9-rc (commit efdfbbc4dcc8f98754056971f88af0f7ff906144 from https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git): does not have the problem
> 
> 6.8.3 was not tested, but we can test it if needed.

How curious.

> > What kernel config were you using?  I don't see that info on
> > https://linux-hardware.org/?probe=9c92ac1222
> > (maybe my tired eyes can't see it)
> 
> Kernel config for 6.1: https://abf.io/import/kernel-6.1/blob/bcb3e9611f/kernel-x86_64.config

CONFIG_NR_CPUS=8192

> For 6.6: https://abf.io/import/kernel-6.6/blob/7404a4d3d5/kernel-x86_64.config

CONFIG_NR_CPUS=8192

Since you're using a power-of-two, this should have been a no-op.
But bits_per() doesn't work the way I thought it did!

#define bits_per(n)                             \
(                                               \
        __builtin_constant_p(n) ? (             \
                ((n) == 0 || (n) == 1)          \
                        ? 1 : ilog2(n) + 1      \
        ) :                                     \

CONFIG_NR_CPUS is obviously a constant, and larger than 1, so we end up
calling ilog2(n) + 1.  So we allocate one extra bit.

I should have changed this to
DEFINE(NR_CPUS_BITS, bits_per(CONFIG_NR_CPUS - 1))

Can you test that and report back?  I'll prepare a fix for mainline in
the meantime.


* Re: Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
  2024-04-29 12:17     ` Matthew Wilcox
@ 2024-04-29 13:37       ` Ильфат Гаптрахманов
  0 siblings, 0 replies; 5+ messages in thread
From: Ильфат Гаптрахманов @ 2024-04-29 13:37 UTC
  To: Matthew Wilcox,
	Михаил Новоселов
  Cc: riel, mgorman, peterz, mingo, akpm, stable, sashal,
	Бетхер Александр

On 29.04.2024 15:17, Matthew Wilcox wrote:
> CONFIG_NR_CPUS=8192
>
> Since you're using a power-of-two, this should have been a no-op.
> But bits_per() doesn't work the way I thought it did!
>
> #define bits_per(n)                             \
> (                                               \
>          __builtin_constant_p(n) ? (             \
>                  ((n) == 0 || (n) == 1)          \
>                          ? 1 : ilog2(n) + 1      \
>          ) :                                     \
>
> CONFIG_NR_CPUS is obviously a constant, and larger than 1, so we end up
> calling ilog2(n) + 1.  So we allocate one extra bit.
>
> I should have changed this to
> DEFINE(NR_CPUS_BITS, bits_per(CONFIG_NR_CPUS - 1))
>
> Can you test that and report back?  I'll prepare a fix for mainline in
> the meantime.
Yes, this fix solved the problem.


end of thread

Thread overview: 5+ messages
2024-04-28 14:58 Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS" Mikhail Novosyolov
2024-04-29  2:56 ` Matthew Wilcox
2024-04-29  4:07   ` Михаил Новоселов
2024-04-29 12:17     ` Matthew Wilcox
2024-04-29 13:37       ` Ильфат Гаптрахманов
