* Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
@ 2024-04-28 14:58 Mikhail Novosyolov
2024-04-29 2:56 ` Matthew Wilcox
0 siblings, 1 reply; 5+ messages in thread
From: Mikhail Novosyolov @ 2024-04-28 14:58 UTC (permalink / raw)
To: willy, riel, mgorman, mgorman, peterz, mingo, akpm, stable,
sashal
Cc: Бетхер Александр,
i.gaptrakhmanov
Hello, colleagues.
Commit f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a "bounds: support non-power-of-two CONFIG_NR_CPUS" (https://github.com/torvalds/linux/commit/f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a) was backported to 6.1.x-stable (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=428ca0000f0abd5c99354c52a36becf2b815ca21), but it causes a serious regression (kernel panics) on quite a lot of hardware with AMD GPUs.
The backport landed in 6.1.84: 6.1.84 has the problem, 6.1.83 does not, and the newest 6.1.88 still has it.
The problem is described here: https://gitlab.freedesktop.org/drm/amd/-/issues/3347
* Re: Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
2024-04-28 14:58 Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS" Mikhail Novosyolov
@ 2024-04-29 2:56 ` Matthew Wilcox
2024-04-29 4:07 ` Михаил Новоселов
0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2024-04-29 2:56 UTC (permalink / raw)
To: Mikhail Novosyolov
Cc: riel, mgorman, peterz, mingo, akpm, stable, sashal,
Бетхер Александр,
i.gaptrakhmanov
On Sun, Apr 28, 2024 at 05:58:08PM +0300, Mikhail Novosyolov wrote:
> Hello, colleagues.
>
> Commit f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a "bounds: support non-power-of-two CONFIG_NR_CPUS" (https://github.com/torvalds/linux/commit/f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a) was backported to 6.1.x-stable (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=428ca0000f0abd5c99354c52a36becf2b815ca21), but causes a serious regression on quite a lot of hardware with AMD GPUs, kernel panics.
>
> It was backported to 6.1.84, 6.1.84 has problems, 6.1.83 does not, the newest 6.1.88 still has this problem.
Does v6.8.3 (which contains cf778fff03be) have this problem?
How about current Linus master?
What kernel config were you using? I don't see that info on
https://linux-hardware.org/?probe=9c92ac1222
(maybe my tired eyes can't see it)
* Re: Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
2024-04-29 2:56 ` Matthew Wilcox
@ 2024-04-29 4:07 ` Михаил Новоселов
2024-04-29 12:17 ` Matthew Wilcox
0 siblings, 1 reply; 5+ messages in thread
From: Михаил Новоселов @ 2024-04-29 4:07 UTC (permalink / raw)
To: Matthew Wilcox
Cc: riel, mgorman, peterz, mingo, akpm, stable, sashal,
Бетхер Александр,
i gaptrakhmanov
(Resending in plain text, sorry for accidentally sending in HTML)
----- Original Message -----
> From: "Matthew Wilcox" <willy@infradead.org>
> To: "Михаил Новоселов" <m.novosyolov@rosalinux.ru>
> Cc: riel@surriel.com, mgorman@techsingularity.net, peterz@infradead.org, mingo@kernel.org, akpm@linux-foundation.org,
> stable@vger.kernel.org, sashal@kernel.org, "Бетхер Александр" <a.betkher@rosalinux.ru>, "i gaptrakhmanov"
> <i.gaptrakhmanov@rosalinux.ru>
> Sent: Monday, 29 April 2024 5:56:29
> Subject: Re: Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
> On Sun, Apr 28, 2024 at 05:58:08PM +0300, Mikhail Novosyolov wrote:
>> Hello, colleagues.
>>
>> Commit f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a "bounds: support
>> non-power-of-two CONFIG_NR_CPUS"
>> (https://github.com/torvalds/linux/commit/f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a)
>> was backported to 6.1.x-stable
>> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=428ca0000f0abd5c99354c52a36becf2b815ca21),
>> but causes a serious regression on quite a lot of hardware with AMD GPUs,
>> kernel panics.
>>
>> It was backported to 6.1.84, 6.1.84 has problems, 6.1.83 does not, the newest
>> 6.1.88 still has this problem.
>
> Does v6.8.3 (which contains cf778fff03be) have this problem?
> How about current Linus master?
6.1.88 - has the problem
6.6.27 - does not have the problem
6.9-rc from commit efdfbbc4dcc8f98754056971f88af0f7ff906144 https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git - does not have the problem
6.8.3 has not been tested, but we can test it if needed.
>
> What kernel config were you using? I don't see that info on
> https://linux-hardware.org/?probe=9c92ac1222
> (maybe my tired eyes can't see it)
Kernel config for 6.1: https://abf.io/import/kernel-6.1/blob/bcb3e9611f/kernel-x86_64.config
For 6.6: https://abf.io/import/kernel-6.6/blob/7404a4d3d5/kernel-x86_64.config
6.9-rc was built with a config copy-pasted from 6.6 (https://abf.io/build_lists/5028240)
* Re: Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
2024-04-29 4:07 ` Михаил Новоселов
@ 2024-04-29 12:17 ` Matthew Wilcox
2024-04-29 13:37 ` Ильфат Гаптрахманов
0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2024-04-29 12:17 UTC (permalink / raw)
To: Михаил Новоселов
Cc: riel, mgorman, peterz, mingo, akpm, stable, sashal,
Бетхер Александр,
i gaptrakhmanov
On Mon, Apr 29, 2024 at 07:07:39AM +0300, Михаил Новоселов wrote:
> >> It was backported to 6.1.84, 6.1.84 has problems, 6.1.83 does not, the newest
> >> 6.1.88 still has this problem.
> >
> > Does v6.8.3 (which contains cf778fff03be) have this problem?
> > How about current Linus master?
>
> 6.1.88 - has problem
> 6.6.27 - does not have problem
> 6.9-rc from commit efdfbbc4dcc8f98754056971f88af0f7ff906144 https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git - does not have problem
>
> 6.8.3 was not tested, but we can test it if needed.
How curious.
> > What kernel config were you using? I don't see that info on
> > https://linux-hardware.org/?probe=9c92ac1222
> > (maybe my tired eyes can't see it)
>
> Kernel config for 6.1: https://abf.io/import/kernel-6.1/blob/bcb3e9611f/kernel-x86_64.config
CONFIG_NR_CPUS=8192
> For 6.6: https://abf.io/import/kernel-6.6/blob/7404a4d3d5/kernel-x86_64.config
CONFIG_NR_CPUS=8192
Since you're using a power-of-two, this should have been a no-op.
But bits_per() doesn't work the way I thought it did!
#define bits_per(n)                             \
(                                               \
        __builtin_constant_p(n) ? (             \
                ((n) == 0 || (n) == 1)          \
                        ? 1 : ilog2(n) + 1      \
        ) :                                     \
CONFIG_NR_CPUS is obviously a constant, and larger than 1, so we end up
calling ilog2(n) + 1. So we allocate one extra bit.
I should have changed this to
DEFINE(NR_CPUS_BITS, bits_per(CONFIG_NR_CPUS - 1))
Can you test that and report back? I'll prepare a fix for mainline in
the meantime.
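To make the off-by-one concrete, here is a minimal userspace sketch of the constant branch of bits_per() quoted above. The helper names ilog2_u() and bits_per_u() are hypothetical stand-ins written for this illustration, not the kernel's own macros, and the loop-based log2 is an assumption that only matches ilog2() for n >= 1.

```c
/*
 * Userspace illustration only. ilog2_u() and bits_per_u() are hypothetical
 * stand-ins for the kernel's ilog2() and the constant branch of bits_per().
 */

/* floor(log2(n)) for n >= 1; returns 0 for n == 0 (not meaningful there). */
static unsigned int ilog2_u(unsigned long n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Number of bits needed to store the value n itself. */
static unsigned int bits_per_u(unsigned long n)
{
	return (n == 0 || n == 1) ? 1 : ilog2_u(n) + 1;
}
```

With CONFIG_NR_CPUS=8192, bits_per_u(8192) is 14 because it counts the bits needed to store the value 8192, while CPU numbers only run from 0 to 8191 and fit in 13 bits, which is what bits_per_u(8192 - 1) returns. For a non-power-of-two such as 6, both bits_per_u(6) and bits_per_u(5) give 3, so subtracting one only changes the result for powers of two.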
* Re: Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
2024-04-29 12:17 ` Matthew Wilcox
@ 2024-04-29 13:37 ` Ильфат Гаптрахманов
0 siblings, 0 replies; 5+ messages in thread
From: Ильфат Гаптрахманов @ 2024-04-29 13:37 UTC (permalink / raw)
To: Matthew Wilcox,
Михаил Новоселов
Cc: riel, mgorman, peterz, mingo, akpm, stable, sashal,
Бетхер Александр
On 29.04.2024 15:17, Matthew Wilcox wrote:
> CONFIG_NR_CPUS=8192
>
> Since you're using a power-of-two, this should have been a no-op.
> But bits_per() doesn't work the way I thought it did!
>
> #define bits_per(n)                             \
> (                                               \
>         __builtin_constant_p(n) ? (             \
>                 ((n) == 0 || (n) == 1)          \
>                         ? 1 : ilog2(n) + 1      \
>         ) :                                     \
>
> CONFIG_NR_CPUS is obviously a constant, and larger than 1, so we end up
> calling ilog2(n) + 1. So we allocate one extra bit.
>
> I should have changed this to
> DEFINE(NR_CPUS_BITS, bits_per(CONFIG_NR_CPUS - 1))
>
> Can you test that and report back? I'll prepare a fix for mainline in
> the meantime.
Yes, this fix solved the problem.