* CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
From: Greg Kroah-Hartman @ 2024-09-27 12:40 UTC
To: linux-cve-announce
Cc: Greg Kroah-Hartman
Description
===========
In the Linux kernel, the following vulnerability has been resolved:
workqueue: Improve scalability of workqueue watchdog touch
On a ~2000 CPU powerpc system, hard lockups have been observed in the
workqueue code when stop_machine runs (in this case due to CPU hotplug).
This is due to lots of CPUs spinning in multi_cpu_stop, calling
touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
and that can find itself in the same cacheline as other important
workqueue data, which slows down operations to the point of lockups.
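To make the cacheline effect concrete, the sketch below checks whether two fields land on the same line. Everything in it is an assumption for illustration: the 128-byte line size, both struct layouts, and all field and function names are hypothetical, and the real kernel variables are separate globals placed by the linker rather than struct members. The padded layout is shown only to contrast with the colliding one; it is not what the fix in this announcement does.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>

#define ASSUMED_CACHELINE 128 /* assumption: cacheline size in bytes */

/* Hypothetical layout: a write-hot timestamp next to read-mostly data. */
struct shared_layout {
    unsigned long touched;   /* written on every watchdog touch */
    void *pool_lookup_root;  /* read on every queueing operation */
};

/* Both fields must fit in one line for the collision case to hold. */
static_assert(sizeof(unsigned long) + sizeof(void *) <= ASSUMED_CACHELINE,
              "unpadded fields are expected to share one line");

/* Same fields, each forced onto its own line. */
struct padded_layout {
    alignas(ASSUMED_CACHELINE) unsigned long touched;
    alignas(ASSUMED_CACHELINE) void *pool_lookup_root;
};

/* True if two offsets fall in the same line, assuming the enclosing
 * object itself starts on a line boundary. */
static bool same_line(size_t off_a, size_t off_b)
{
    return off_a / ASSUMED_CACHELINE == off_b / ASSUMED_CACHELINE;
}

static bool shared_layout_collides(void)
{
    return same_line(offsetof(struct shared_layout, touched),
                     offsetof(struct shared_layout, pool_lookup_root));
}

static bool padded_layout_collides(void)
{
    return same_line(offsetof(struct padded_layout, touched),
                     offsetof(struct padded_layout, pool_lookup_root));
}
```

In the colliding layout, every store to `touched` invalidates the line that readers of `pool_lookup_root` need, which is the kind of interference the description attributes to wq_watchdog_touched and worker_pool_idr.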
In the case of the following abridged trace, worker_pool_idr was in
the hot line, causing the lockups to always appear at idr_find.
watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
Call Trace:
get_work_pool
__queue_work
call_timer_fn
run_timer_softirq
__do_softirq
do_softirq_own_stack
irq_exit
timer_interrupt
decrementer_common_virt
* interrupt: 900 (timer) at multi_cpu_stop
multi_cpu_stop
cpu_stopper_thread
smpboot_thread_fn
kthread
Fix this by having wq_watchdog_touch() only write to the line if the
last time a touch was recorded exceeds 1/4 of the watchdog threshold.
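The throttling described above can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the kernel patch itself: plain tick counts stand in for jiffies, C11 relaxed atomics stand in for the kernel's own accessors, and `wq_watchdog_touch_sketch`, `ticks_after`, `simulate_touches`, and `store_count` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the shared, contended variable named in the description. */
static _Atomic unsigned long wq_watchdog_touched;
/* Instrumentation only: counts stores that reach the shared variable. */
static unsigned long store_count;

/* Wraparound-safe "a is after b" comparison on tick counters. */
static bool ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Record a touch at time `now`, but store to the shared variable only
 * when the last recorded touch is more than thresh/4 ticks old. */
static void wq_watchdog_touch_sketch(unsigned long now, unsigned long thresh)
{
    unsigned long last =
        atomic_load_explicit(&wq_watchdog_touched, memory_order_relaxed);

    if (ticks_after(now, last + thresh / 4)) {
        atomic_store_explicit(&wq_watchdog_touched, now,
                              memory_order_relaxed);
        store_count++;
    }
}

/* Touch once per tick for `ticks` ticks (single-threaded) and return
 * how many stores were actually performed. */
static unsigned long simulate_touches(unsigned long ticks, unsigned long thresh)
{
    assert(thresh >= 4); /* keep thresh/4 non-zero */
    atomic_store(&wq_watchdog_touched, 0);
    store_count = 0;
    for (unsigned long now = 1; now <= ticks; now++)
        wq_watchdog_touch_sketch(now, thresh);
    return store_count;
}
```

With a threshold of 40 ticks, `simulate_touches(100, 40)` performs 9 stores instead of 100: most calls only read the shared variable, so the line is dirtied far less often. The simulation is single-threaded; on the real system the saving is multiplied across every CPU spinning in multi_cpu_stop.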
The Linux kernel CVE team has assigned CVE-2024-46839 to this issue.
Affected and fixed versions
===========================
Fixed in 5.15.167 with commit 9d08fce64dd7
Fixed in 6.1.110 with commit a2abd35e7dc5
Fixed in 6.6.51 with commit 241bce1c757d
Fixed in 6.10.10 with commit da5f374103a1
Fixed in 6.11 with commit 98f887f820c9
Please see https://www.kernel.org for a full list of currently supported
kernel versions by the kernel community.
Unaffected versions might change over time as fixes are backported to
older supported kernel versions. The official CVE entry at
https://cve.org/CVERecord/?id=CVE-2024-46839
will be updated if fixes are backported; please check that for the most
up-to-date information about this issue.
Affected files
==============
The file(s) affected by this issue are:
kernel/workqueue.c
Mitigation
==========
The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this, and many other bugfixes. Individual
changes are never tested alone, but rather are part of a larger kernel
release. Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all. If, however, updating to
the latest release is impossible, the individual changes to resolve this
issue can be found at these commits:
https://git.kernel.org/stable/c/9d08fce64dd77f42e2361a4818dbc4b50f3c7dad
https://git.kernel.org/stable/c/a2abd35e7dc55bf9ed01e2b3481fa78e086d3bf4
https://git.kernel.org/stable/c/241bce1c757d0587721512296952e6bba69631ed
https://git.kernel.org/stable/c/da5f374103a1e0881bbd35847dc57b04ac155eb0
https://git.kernel.org/stable/c/98f887f820c993e05a12e8aa816c80b8661d4c87
* Re: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
From: Petr Mladek @ 2024-10-01 8:02 UTC
To: cve, linux-kernel
Cc: linux-cve-announce, Greg Kroah-Hartman, Srikar Dronamraju,
Nicholas Piggin, Paul E. McKenney, Tejun Heo, Sasha Levin,
Michal Hocko, Michal Koutný
On Fri 2024-09-27 14:40:07, Greg Kroah-Hartman wrote:
> Description
> ===========
>
> In the Linux kernel, the following vulnerability has been resolved:
>
> workqueue: Improve scalability of workqueue watchdog touch
>
> On a ~2000 CPU powerpc system, hard lockups have been observed in the
> workqueue code when stop_machine runs (in this case due to CPU hotplug).
I believe that this does not qualify as a security vulnerability.
Any hotplug is a privileged operation.
Best Regards,
Petr
* Re: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
From: Greg Kroah-Hartman @ 2024-10-01 8:22 UTC
To: Petr Mladek
Cc: cve, linux-kernel, linux-cve-announce, Srikar Dronamraju,
Nicholas Piggin, Paul E. McKenney, Tejun Heo, Sasha Levin,
Michal Hocko, Michal Koutný
On Tue, Oct 01, 2024 at 10:02:02AM +0200, Petr Mladek wrote:
> On Fri 2024-09-27 14:40:07, Greg Kroah-Hartman wrote:
> > Description
> > ===========
> >
> > In the Linux kernel, the following vulnerability has been resolved:
> >
> > workqueue: Improve scalability of workqueue watchdog touch
> >
> > On a ~2000 CPU powerpc system, hard lockups have been observed in the
> > workqueue code when stop_machine runs (in this case due to CPU hotplug).
>
> I believe that this does not qualify as a security vulnerability.
> Any hotplug is a privileged operation.
Really? I see that happen on many embedded systems all the time, they
add/remove CPUs while the device runs/sleeps constantly.
Now to be fair, right now an "embedded system" usually doesn't have 2000
cpus, but what's wrong with marking this real bugfix as a vulnerability
resolution? If you don't run your system in a way that allows cpus to
be stopped unless an admin says so, it will not be relevant.
thanks,
greg k-h
* Re: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
From: Michal Hocko @ 2024-10-01 9:07 UTC
To: Greg Kroah-Hartman
Cc: Petr Mladek, cve, linux-kernel, linux-cve-announce,
Srikar Dronamraju, Nicholas Piggin, Paul E. McKenney, Tejun Heo,
Sasha Levin, Michal Koutný
On Tue 01-10-24 10:22:51, Greg KH wrote:
> On Tue, Oct 01, 2024 at 10:02:02AM +0200, Petr Mladek wrote:
> > On Fri 2024-09-27 14:40:07, Greg Kroah-Hartman wrote:
> > > Description
> > > ===========
> > >
> > > In the Linux kernel, the following vulnerability has been resolved:
> > >
> > > workqueue: Improve scalability of workqueue watchdog touch
> > >
> > > On a ~2000 CPU powerpc system, hard lockups have been observed in the
> > > workqueue code when stop_machine runs (in this case due to CPU hotplug).
> >
> > I believe that this does not qualify as a security vulnerability.
> > Any hotplug is a privileged operation.
>
> Really? I see that happen on many embedded systems all the time, they
> add/remove CPUs while the device runs/sleeps constantly.
This is a powerpc-specific fix. Other architectures are not affected.
> Now to be fair, right now an "embedded system" usually doesn't have 2000
> cpus, but what's wrong with marking this real bugfix as a vulnerability
> resolution?
Yes, this is indeed a scalability fix for huge systems with a lot of
CPUs. Anybody owning those systems was simply not able to use memory
hotplug without seeing those hard lockup messages. The system is not
really locked up. The progress of the hotplug operation is just utterly
slow. Calling this a vulnerability is a stretch IMHO.
The only potential attack vector is to have the machine configured to panic
on hard lockups on those huge ppc systems and allow cpu hotremove to an
adversary, which in itself seems like a very bad idea anyway because
availability of such a system is then effectively compromised.
--
Michal Hocko
SUSE Labs
* Re: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
From: Paul E. McKenney @ 2024-10-01 11:37 UTC
To: Michal Hocko
Cc: Greg Kroah-Hartman, Petr Mladek, cve, linux-kernel,
linux-cve-announce, Srikar Dronamraju, Nicholas Piggin, Tejun Heo,
Sasha Levin, Michal Koutný
On Tue, Oct 01, 2024 at 11:07:49AM +0200, Michal Hocko wrote:
> On Tue 01-10-24 10:22:51, Greg KH wrote:
> > On Tue, Oct 01, 2024 at 10:02:02AM +0200, Petr Mladek wrote:
> > > On Fri 2024-09-27 14:40:07, Greg Kroah-Hartman wrote:
> > > > Description
> > > > ===========
> > > >
> > > > In the Linux kernel, the following vulnerability has been resolved:
> > > >
> > > > workqueue: Improve scalability of workqueue watchdog touch
> > > >
> > > > On a ~2000 CPU powerpc system, hard lockups have been observed in the
> > > > workqueue code when stop_machine runs (in this case due to CPU hotplug).
> > >
> > > I believe that this does not qualify as a security vulnerability.
> > > Any hotplug is a privileged operation.
> >
> > Really? I see that happen on many embedded systems all the time, they
> > add/remove CPUs while the device runs/sleeps constantly.
>
> This is a powerpc-specific fix. Other architectures are not affected.
>
> > Now to be fair, right now an "embedded system" usually doesn't have 2000
> > cpus, but what's wrong with marking this real bugfix as a vulnerability
> > resolution?
>
> Yes, this is indeed a scalability fix for huge systems with a lot of
> CPUs. Anybody owning those systems was simply not able to use memory
> hotplug without seeing those hard lockup messages. The system is not
> really locked up. The progress of the hotplug operation is just utterly
> slow. Calling this a vulnerability is a stretch IMHO.
>
> The only potential attack vector is to have the machine configured to panic
> on hard lockups on those huge ppc systems and allow cpu hotremove to an
> adversary, which in itself seems like a very bad idea anyway because
> availability of such a system is then effectively compromised.
If the attacker can do CPU hotplug, then an effective (though admittedly
non-CVE) attack is to simply offline all but one of the CPUs. Whatever
that system was doing with its 2,000 CPUs, it is unlikely to be doing
with only one of them.
And taking Michal's point further, if the load rises high enough, you
might get various types of lockups, and the system might be configured
to panic. For example, the load resulting from dumping 2,000 CPUs' worth of
workload onto a single CPU could easily starve RCU's grace-period kthread
for the 21 seconds required to result in an RCU CPU stall warning. And if
the system has sysctl_panic_on_rcu_stall set, then the system will panic.
But this really should be considered to be expected behavior given
privileged abuse rather than a vulnerability, correct?
Thanx, Paul
* Re: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
From: Greg Kroah-Hartman @ 2024-10-01 13:53 UTC
To: Michal Hocko
Cc: Petr Mladek, cve, linux-kernel, linux-cve-announce,
Srikar Dronamraju, Nicholas Piggin, Paul E. McKenney, Tejun Heo,
Sasha Levin, Michal Koutný
On Tue, Oct 01, 2024 at 11:07:49AM +0200, Michal Hocko wrote:
> On Tue 01-10-24 10:22:51, Greg KH wrote:
> > On Tue, Oct 01, 2024 at 10:02:02AM +0200, Petr Mladek wrote:
> > > On Fri 2024-09-27 14:40:07, Greg Kroah-Hartman wrote:
> > > > Description
> > > > ===========
> > > >
> > > > In the Linux kernel, the following vulnerability has been resolved:
> > > >
> > > > workqueue: Improve scalability of workqueue watchdog touch
> > > >
> > > > On a ~2000 CPU powerpc system, hard lockups have been observed in the
> > > > workqueue code when stop_machine runs (in this case due to CPU hotplug).
> > >
> > > I believe that this does not qualify as a security vulnerability.
> > > Any hotplug is a privileged operation.
> >
> > Really? I see that happen on many embedded systems all the time, they
> > add/remove CPUs while the device runs/sleeps constantly.
>
> This is a powerpc-specific fix. Other architectures are not affected.
>
> > Now to be fair, right now an "embedded system" usually doesn't have 2000
> > cpus, but what's wrong with marking this real bugfix as a vulnerability
> > resolution?
>
> Yes, this is indeed a scalability fix for huge systems with a lot of
> CPUs. Anybody owning those systems was simply not able to use memory
> hotplug without seeing those hard lockup messages. The system is not
> really locked up. The progress of the hotplug operation is just utterly
> slow. Calling this a vulnerability is a stretch IMHO.
>
> The only potential attack vector is to have the machine configured to panic
> on hard lockups on those huge ppc systems and allow cpu hotremove to an
> adversary, which in itself seems like a very bad idea anyway because
> availability of such a system is then effectively compromised.
Ok, now rejected, thanks.
greg k-h