* Re: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
       [not found] <2024092754-CVE-2024-46839-cfab@gregkh>
From: Petr Mladek @ 2024-10-01 8:02 UTC
To: cve, linux-kernel
Cc: linux-cve-announce, Greg Kroah-Hartman, Srikar Dronamraju, Nicholas Piggin, Paul E. McKenney, Tejun Heo, Sasha Levin, Michal Hocko, Michal Koutný

On Fri 2024-09-27 14:40:07, Greg Kroah-Hartman wrote:
> Description
> ===========
>
> In the Linux kernel, the following vulnerability has been resolved:
>
> workqueue: Improve scalability of workqueue watchdog touch
>
> On a ~2000 CPU powerpc system, hard lockups have been observed in the
> workqueue code when stop_machine runs (in this case due to CPU hotplug).

I believe that this does not qualify as a security vulnerability.
Any hotplug is a privileged operation.

Best Regards,
Petr

> This is due to lots of CPUs spinning in multi_cpu_stop, calling
> touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
> wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
> and that can find itself in the same cacheline as other important
> workqueue data, which slows down operations to the point of lockups.
>
> In the case of the following abridged trace, worker_pool_idr was in
> the hot line, causing the lockups to always appear at idr_find.
>
> watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
> Call Trace:
> get_work_pool
> __queue_work
> call_timer_fn
> run_timer_softirq
> __do_softirq
> do_softirq_own_stack
> irq_exit
> timer_interrupt
> decrementer_common_virt
> * interrupt: 900 (timer) at multi_cpu_stop
> multi_cpu_stop
> cpu_stopper_thread
> smpboot_thread_fn
> kthread
>
> Fix this by having wq_watchdog_touch() only write to the line if the
> last time a touch was recorded exceeds 1/4 of the watchdog threshold.
>
> The Linux kernel CVE team has assigned CVE-2024-46839 to this issue.
>
>
> Affected and fixed versions
> ===========================
>
> Fixed in 5.15.167 with commit 9d08fce64dd7
> Fixed in 6.1.110 with commit a2abd35e7dc5
> Fixed in 6.6.51 with commit 241bce1c757d
> Fixed in 6.10.10 with commit da5f374103a1
> Fixed in 6.11 with commit 98f887f820c9
>
> Please see https://www.kernel.org for a full list of currently supported
> kernel versions by the kernel community.
>
> Unaffected versions might change over time as fixes are backported to
> older supported kernel versions. The official CVE entry at
> https://cve.org/CVERecord/?id=CVE-2024-46839
> will be updated if fixes are backported, please check that for the most
> up to date information about this issue.
>
>
> Affected files
> ==============
>
> The file(s) affected by this issue are:
> kernel/workqueue.c
>
>
> Mitigation
> ==========
>
> The Linux kernel CVE team recommends that you update to the latest
> stable kernel version for this, and many other bugfixes. Individual
> changes are never tested alone, but rather are part of a larger kernel
> release. Cherry-picking individual commits is not recommended or
> supported by the Linux kernel community at all. If however, updating to
> the latest release is impossible, the individual changes to resolve this
> issue can be found at these commits:
> https://git.kernel.org/stable/c/9d08fce64dd77f42e2361a4818dbc4b50f3c7dad
> https://git.kernel.org/stable/c/a2abd35e7dc55bf9ed01e2b3481fa78e086d3bf4
> https://git.kernel.org/stable/c/241bce1c757d0587721512296952e6bba69631ed
> https://git.kernel.org/stable/c/da5f374103a1e0881bbd35847dc57b04ac155eb0
> https://git.kernel.org/stable/c/98f887f820c993e05a12e8aa816c80b8661d4c87

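
The fix described in the quoted CVE text amounts to rate-limiting a store to a shared timestamp, so that many CPUs touching the watchdog mostly read the cacheline instead of writing it. Below is a minimal userspace sketch of that idea in C11. It is not the kernel patch: the names touch_state, ticks_after and rate_limited_touch are hypothetical, a plain tick counter stands in for jiffies, and relaxed atomics are an assumption made for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared watchdog timestamp. */
struct touch_state {
    _Atomic unsigned long last_touch; /* written rarely, read by everyone */
    unsigned long threshold;          /* watchdog threshold, in ticks */
};

/* Wraparound-safe "a is later than b" on a free-running tick counter. */
static bool ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Record a touch at time `now`. The shared timestamp is stored only when
 * more than threshold / 4 ticks have passed since the last recorded touch.
 * Returns true if the store happened, false if it was skipped.
 */
static bool rate_limited_touch(struct touch_state *st, unsigned long now)
{
    assert(st != NULL);

    unsigned long last =
        atomic_load_explicit(&st->last_touch, memory_order_relaxed);

    /* Common case under contention: read only, no write to the hot line. */
    if (!ticks_after(now, last + st->threshold / 4))
        return false;

    atomic_store_explicit(&st->last_touch, now, memory_order_relaxed);
    return true;
}
```

With a threshold of 400 ticks and a last touch at tick 1000, calls at ticks 1050 and 1100 skip the store, and a call at tick 1101 performs it. A watchdog that only fires after a full threshold without touches should tolerate a timestamp that is up to a quarter threshold stale, which is the trade-off the quoted description relies on.
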
* Re: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
From: Greg Kroah-Hartman @ 2024-10-01 8:22 UTC
To: Petr Mladek
Cc: cve, linux-kernel, linux-cve-announce, Srikar Dronamraju, Nicholas Piggin, Paul E. McKenney, Tejun Heo, Sasha Levin, Michal Hocko, Michal Koutný

On Tue, Oct 01, 2024 at 10:02:02AM +0200, Petr Mladek wrote:
> On Fri 2024-09-27 14:40:07, Greg Kroah-Hartman wrote:
> > Description
> > ===========
> >
> > In the Linux kernel, the following vulnerability has been resolved:
> >
> > workqueue: Improve scalability of workqueue watchdog touch
> >
> > On a ~2000 CPU powerpc system, hard lockups have been observed in the
> > workqueue code when stop_machine runs (in this case due to CPU hotplug).
>
> I believe that this does not qualify as a security vulnerability.
> Any hotplug is a privileged operation.

Really? I see that happen on many embedded systems all the time, they
add/remove CPUs while the device runs/sleeps constantly.

Now to be fair, right now an "embedded system" usually doesn't have 2000
cpus, but what's wrong with marking this real bugfix as a vulnerability
resolution?

If you don't run your system in a way that allows cpus to be stopped
unless an admin says so, it will not be relevant.

thanks,

greg k-h

* Re: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
From: Michal Hocko @ 2024-10-01 9:07 UTC
To: Greg Kroah-Hartman
Cc: Petr Mladek, cve, linux-kernel, linux-cve-announce, Srikar Dronamraju, Nicholas Piggin, Paul E. McKenney, Tejun Heo, Sasha Levin, Michal Koutný

On Tue 01-10-24 10:22:51, Greg KH wrote:
> On Tue, Oct 01, 2024 at 10:02:02AM +0200, Petr Mladek wrote:
> > On Fri 2024-09-27 14:40:07, Greg Kroah-Hartman wrote:
> > > Description
> > > ===========
> > >
> > > In the Linux kernel, the following vulnerability has been resolved:
> > >
> > > workqueue: Improve scalability of workqueue watchdog touch
> > >
> > > On a ~2000 CPU powerpc system, hard lockups have been observed in the
> > > workqueue code when stop_machine runs (in this case due to CPU hotplug).
> >
> > I believe that this does not qualify as a security vulnerability.
> > Any hotplug is a privileged operation.
>
> Really? I see that happen on many embedded systems all the time, they
> add/remove CPUs while the device runs/sleeps constantly.

This is a powerpc specific fix. Other architectures are not affected.

> Now to be fair, right now an "embedded system" usually doesn't have 2000
> cpus, but what's wrong with marking this real bugfix as a vulnerability
> resolution?

Yes, this is indeed a scalability fix for huge systems with a lot of
CPUs; anybody owning those systems was simply not able to use memory
hotplug without seeing those hard lockup messages. The system is not
really locked up. The progress of the hotplug operation is just utterly
slow. Calling this a vulnerability is a stretch IMHO.

The only potential attack vector is to have a machine configured to panic
on hard lockups on those huge ppc systems and allow cpu hotremove to an
adversary which in itself seems like a very bad idea anyway because
availability of such a system is then effectively compromised.
--
Michal Hocko
SUSE Labs

* Re: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
From: Paul E. McKenney @ 2024-10-01 11:37 UTC
To: Michal Hocko
Cc: Greg Kroah-Hartman, Petr Mladek, cve, linux-kernel, linux-cve-announce, Srikar Dronamraju, Nicholas Piggin, Tejun Heo, Sasha Levin, Michal Koutný

On Tue, Oct 01, 2024 at 11:07:49AM +0200, Michal Hocko wrote:
> On Tue 01-10-24 10:22:51, Greg KH wrote:
> > On Tue, Oct 01, 2024 at 10:02:02AM +0200, Petr Mladek wrote:
> > > On Fri 2024-09-27 14:40:07, Greg Kroah-Hartman wrote:
> > > > Description
> > > > ===========
> > > >
> > > > In the Linux kernel, the following vulnerability has been resolved:
> > > >
> > > > workqueue: Improve scalability of workqueue watchdog touch
> > > >
> > > > On a ~2000 CPU powerpc system, hard lockups have been observed in the
> > > > workqueue code when stop_machine runs (in this case due to CPU hotplug).
> > >
> > > I believe that this does not qualify as a security vulnerability.
> > > Any hotplug is a privileged operation.
> >
> > Really? I see that happen on many embedded systems all the time, they
> > add/remove CPUs while the device runs/sleeps constantly.
>
> This is a powerpc specific fix. Other architectures are not affected.
>
> > Now to be fair, right now an "embedded system" usually doesn't have 2000
> > cpus, but what's wrong with marking this real bugfix as a vulnerability
> > resolution?
>
> Yes, this is indeed a scalability fix for huge systems with a lot of
> CPUs; anybody owning those systems was simply not able to use memory
> hotplug without seeing those hard lockup messages. The system is not
> really locked up. The progress of the hotplug operation is just utterly
> slow. Calling this a vulnerability is a stretch IMHO.
>
> The only potential attack vector is to have a machine configured to panic
> on hard lockups on those huge ppc systems and allow cpu hotremove to an
> adversary which in itself seems like a very bad idea anyway because
> availability of such a system is then effectively compromised.

If the attacker can do CPU hotplug, then an effective (though admittedly
non-CVE) attack is to simply offline all but one of the CPUs. Whatever
that system was doing with its 2,000 CPUs, it is unlikely to be doing
with only one of them.

And taking Michal's point further, if the load rises high enough, you
might get various types of lockups, and the system might be configured
to panic. For example, the load resulting from dumping 2000 CPUs worth
of workload onto a single CPU could easily starve RCU's grace-period
kthread for the 21 seconds required to result in an RCU CPU stall
warning. And if the system has sysctl_panic_on_rcu_stall set, then the
system will panic.

But this really should be considered to be expected behavior given
privileged abuse rather than a vulnerability, correct?

Thanx, Paul

* Re: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
From: Greg Kroah-Hartman @ 2024-10-01 13:53 UTC
To: Michal Hocko
Cc: Petr Mladek, cve, linux-kernel, linux-cve-announce, Srikar Dronamraju, Nicholas Piggin, Paul E. McKenney, Tejun Heo, Sasha Levin, Michal Koutný

On Tue, Oct 01, 2024 at 11:07:49AM +0200, Michal Hocko wrote:
> On Tue 01-10-24 10:22:51, Greg KH wrote:
> > On Tue, Oct 01, 2024 at 10:02:02AM +0200, Petr Mladek wrote:
> > > On Fri 2024-09-27 14:40:07, Greg Kroah-Hartman wrote:
> > > > Description
> > > > ===========
> > > >
> > > > In the Linux kernel, the following vulnerability has been resolved:
> > > >
> > > > workqueue: Improve scalability of workqueue watchdog touch
> > > >
> > > > On a ~2000 CPU powerpc system, hard lockups have been observed in the
> > > > workqueue code when stop_machine runs (in this case due to CPU hotplug).
> > >
> > > I believe that this does not qualify as a security vulnerability.
> > > Any hotplug is a privileged operation.
> >
> > Really? I see that happen on many embedded systems all the time, they
> > add/remove CPUs while the device runs/sleeps constantly.
>
> This is a powerpc specific fix. Other architectures are not affected.
>
> > Now to be fair, right now an "embedded system" usually doesn't have 2000
> > cpus, but what's wrong with marking this real bugfix as a vulnerability
> > resolution?
>
> Yes, this is indeed a scalability fix for huge systems with a lot of
> CPUs; anybody owning those systems was simply not able to use memory
> hotplug without seeing those hard lockup messages. The system is not
> really locked up. The progress of the hotplug operation is just utterly
> slow. Calling this a vulnerability is a stretch IMHO.
>
> The only potential attack vector is to have a machine configured to panic
> on hard lockups on those huge ppc systems and allow cpu hotremove to an
> adversary which in itself seems like a very bad idea anyway because
> availability of such a system is then effectively compromised.

Ok, now rejected, thanks.

greg k-h
