* [PATCH v3 0/1] PM / Freezer: Skip zombie/dead processes to reduce
@ 2025-06-11 10:12 Zihuan Zhang
2025-06-11 10:12 ` [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to Zihuan Zhang
0 siblings, 1 reply; 13+ messages in thread
From: Zihuan Zhang @ 2025-06-11 10:12 UTC (permalink / raw)
To: rafael, pavel, len.brown; +Cc: linux-pm, linux-kernel, Zihuan Zhang
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 6017 bytes --]
Hi all,
This patch series improves the performance of the process freezer by
skipping zombie tasks during freezing.
In the suspend and hibernation paths, the freezer traverses all tasks
and attempts to freeze them. However, zombie tasks (EXIT_ZOMBIE with
PF_EXITING) are already dead — they are not schedulable and cannot enter
the refrigerator. Attempting to freeze such tasks is redundant and
unnecessarily increases freezing time.
In particular, on systems under fork storm conditions (e.g., many
short-lived processes quickly becoming zombies), the number of zombie tasks
can spike into the thousands or more. We observed that this causes the
freezer loop to waste significant time processing tasks that are guaranteed
to not need freezing.
Testing and Results
Platform:
- Architecture: x86_64
- CPU: ZHAOXIN KaiXian KX-7000
- RAM: 16 GB
- Kernel: v6.6.93
result without the patch:
dmesg | grep elap
[ 219.608992] Freezing user space processes completed (elapsed 0.010 seconds)
[ 219.617355] Freezing remaining freezable tasks completed (elapsed 0.008 seconds)
[ 228.029119] Freezing user space processes completed (elapsed 0.013 seconds)
[ 228.040672] Freezing remaining freezable tasks completed (elapsed 0.011 seconds)
[ 236.879065] Freezing user space processes completed (elapsed 0.020 seconds)
[ 236.897976] Freezing remaining freezable tasks completed (elapsed 0.018 seconds)
[ 246.276679] Freezing user space processes completed (elapsed 0.026 seconds)
[ 246.298636] Freezing remaining freezable tasks completed (elapsed 0.021 seconds)
[ 256.221504] Freezing user space processes completed (elapsed 0.030 seconds)
[ 256.248955] Freezing remaining freezable tasks completed (elapsed 0.027 seconds)
[ 266.674987] Freezing user space processes completed (elapsed 0.040 seconds)
[ 266.709811] Freezing remaining freezable tasks completed (elapsed 0.034 seconds)
[ 277.701679] Freezing user space processes completed (elapsed 0.046 seconds)
[ 277.742048] Freezing remaining freezable tasks completed (elapsed 0.040 seconds)
[ 289.246611] Freezing user space processes completed (elapsed 0.046 seconds)
[ 289.290753] Freezing remaining freezable tasks completed (elapsed 0.044 seconds)
[ 301.516854] Freezing user space processes completed (elapsed 0.041 seconds)
[ 301.576287] Freezing remaining freezable tasks completed (elapsed 0.059 seconds)
[ 314.422499] Freezing user space processes completed (elapsed 0.043 seconds)
[ 314.465804] Freezing remaining freezable tasks completed (elapsed 0.043 seconds)
result with the patch:
dmesg | grep elap
[ 54.161674] Freezing user space processes completed (elapsed 0.007 seconds)
[ 54.171536] Freezing remaining freezable tasks completed (elapsed 0.009 seconds)
[ 62.556462] Freezing user space processes completed (elapsed 0.006 seconds)
[ 62.566496] Freezing remaining freezable tasks completed (elapsed 0.010 seconds)
[ 71.395421] Freezing user space processes completed (elapsed 0.009 seconds)
[ 71.402820] Freezing remaining freezable tasks completed (elapsed 0.007 seconds)
[ 80.785463] Freezing user space processes completed (elapsed 0.010 seconds)
[ 80.793914] Freezing remaining freezable tasks completed (elapsed 0.008 seconds)
[ 90.962659] Freezing user space processes completed (elapsed 0.012 seconds)
[ 90.973519] Freezing remaining freezable tasks completed (elapsed 0.010 seconds)
[ 101.435638] Freezing user space processes completed (elapsed 0.013 seconds)
[ 101.449023] Freezing remaining freezable tasks completed (elapsed 0.013 seconds)
[ 112.669786] Freezing user space processes completed (elapsed 0.015 seconds)
[ 112.683540] Freezing remaining freezable tasks completed (elapsed 0.013 seconds)
[ 124.585681] Freezing user space processes completed (elapsed 0.017 seconds)
[ 124.599553] Freezing remaining freezable tasks completed (elapsed 0.013 seconds)
[ 136.826635] Freezing user space processes completed (elapsed 0.016 seconds)
[ 136.841840] Freezing remaining freezable tasks completed (elapsed 0.015 seconds)
[ 149.686575] Freezing user space processes completed (elapsed 0.016 seconds)
[ 149.701549] Freezing remaining freezable tasks completed (elapsed 0.014 seconds)
Here is the user-space fork storm simulator used for testing:
```c
// create_zombie.c
void usage(const char *prog) {
fprintf(stderr, "Usage: %s <number_of_zombies>\n", prog);
exit(EXIT_FAILURE);
}
int main(int argc, char *argv[]) {
if (argc != 2) {
usage(argv[0]);
}
long num_zombies = strtol(argv[1], NULL, 10);
if (num_zombies <= 0 || num_zombies > 1000000) {
fprintf(stderr, "Invalid number of zombies: %ld\n", num_zombies);
exit(EXIT_FAILURE);
}
printf("Creating %ld zombie processes...\n", num_zombies);
for (long i = 0; i < num_zombies; i++) {
pid_t pid = fork();
if (pid < 0) {
perror("fork failed");
exit(EXIT_FAILURE);
} else if (pid == 0) {
// Child exits immediately
exit(0);
}
// Parent does NOT wait, leaving zombie
}
printf("All child processes created. Sleeping for 60 seconds...\n");
sleep(60);
printf("Parent exiting, zombies will be reaped by init.\n");
return 0;
}
```
And we used a shell loop to suspend repeatedly:
```bash
LOOPS=10
echo none > /sys/power/pm_test
echo 5 > /sys/module/suspend/parameters/pm_test_delay
for ((i=1; i<=LOOPS; i++)); do
echo "===== Test round $i/$LOOPS ====="
./create_zombie $((i * 3000)) &
sleep 5
echo mem > /sys/power/state
pkill create_zombie
echo "Round $i complete. Waiting 5s..."
sleep 5
done
echo "==== All $LOOPS rounds complete ===="
```
Zihuan Zhang (1):
PM / Freezer: Skip zombie/dead processes to reduce freeze latency
kernel/power/process.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.25.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-06-11 10:12 [PATCH v3 0/1] PM / Freezer: Skip zombie/dead processes to reduce Zihuan Zhang
@ 2025-06-11 10:12 ` Zihuan Zhang
2025-07-03 14:15 ` Rafael J. Wysocki
0 siblings, 1 reply; 13+ messages in thread
From: Zihuan Zhang @ 2025-06-11 10:12 UTC (permalink / raw)
To: rafael, pavel, len.brown; +Cc: linux-pm, linux-kernel, Zihuan Zhang
When freezing user space during suspend or hibernation, the freezer
iterates over all tasks and attempts to freeze them via
try_to_freeze_tasks().
However, zombie processes (i.e., tasks in EXIT_ZOMBIE state) are no
longer running and will never enter the refrigerator. Trying to freeze
them is meaningless and causes extra overhead, especially when there are
thousands of zombies created during stress conditions such as fork
storms.
This patch skips zombie processes during the freezing phase.
In our testing with ~30,000 user processes (including many zombies), the
average freeze time during suspend (S3) dropped from ~43 ms to ~16 ms:
- Without the patch: ~43 ms average freeze latency
- With the patch: ~16 ms average freeze latency
- Improvement: ~62%
This confirms that skipping zombies significantly speeds up the freezing
process when the system is under heavy load with many short-lived tasks.
Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
Changes in v3:
- Added performance test
Changes in v2:
- Simplified code, added judgment of dead processes
- Rewrite changelog
---
kernel/power/process.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/power/process.c b/kernel/power/process.c
index a6f7ba2d283d..2bbe22610522 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -51,7 +51,7 @@ static int try_to_freeze_tasks(bool user_only)
todo = 0;
read_lock(&tasklist_lock);
for_each_process_thread(g, p) {
- if (p == current || !freeze_task(p))
+ if (p == current || p->exit_state || !freeze_task(p))
continue;
todo++;
--
2.25.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-06-11 10:12 ` [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to Zihuan Zhang
@ 2025-07-03 14:15 ` Rafael J. Wysocki
2025-07-03 16:40 ` Peter Zijlstra
0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-07-03 14:15 UTC (permalink / raw)
To: Zihuan Zhang, Peter Zijlstra
Cc: rafael, pavel, len.brown, linux-pm, linux-kernel
The patch subject appears to be incomplete.
On Wed, Jun 11, 2025 at 12:13 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>
> When freezing user space during suspend or hibernation, the freezer
> iterates over all tasks and attempts to freeze them via
> try_to_freeze_tasks().
>
> However, zombie processes (i.e., tasks in EXIT_ZOMBIE state) are no
> longer running and will never enter the refrigerator. Trying to freeze
> them is meaningless and causes extra overhead, especially when there are
> thousands of zombies created during stress conditions such as fork
> storms.
>
> This patch skips zombie processes during the freezing phase.
>
> In our testing with ~30,000 user processes (including many zombies), the
> average freeze time during suspend (S3) dropped from ~43 ms to ~16 ms:
>
> - Without the patch: ~43 ms average freeze latency
> - With the patch: ~16 ms average freeze latency
> - Improvement: ~62%
And what's the total suspend time on the system in question?
> This confirms that skipping zombies significantly speeds up the freezing
> process when the system is under heavy load with many short-lived tasks.
>
> Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
>
> Changes in v3:
> - Added performance test
>
> Changes in v2:
> - Simplified code, added judgment of dead processes
> - Rewrite changelog
> ---
> kernel/power/process.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/power/process.c b/kernel/power/process.c
> index a6f7ba2d283d..2bbe22610522 100644
> --- a/kernel/power/process.c
> +++ b/kernel/power/process.c
> @@ -51,7 +51,7 @@ static int try_to_freeze_tasks(bool user_only)
> todo = 0;
> read_lock(&tasklist_lock);
> for_each_process_thread(g, p) {
> - if (p == current || !freeze_task(p))
> + if (p == current || p->exit_state || !freeze_task(p))
> continue;
>
> todo++;
> --
This is basically fine by me, but I wonder what other people think.
Peter?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-07-03 14:15 ` Rafael J. Wysocki
@ 2025-07-03 16:40 ` Peter Zijlstra
2025-07-03 17:15 ` Rafael J. Wysocki
0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2025-07-03 16:40 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Zihuan Zhang, pavel, len.brown, linux-pm, linux-kernel,
Oleg Nesterov
On Thu, Jul 03, 2025 at 04:15:10PM +0200, Rafael J. Wysocki wrote:
> The patch subject appears to be incomplete.
>
> On Wed, Jun 11, 2025 at 12:13 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
> >
> > When freezing user space during suspend or hibernation, the freezer
> > iterates over all tasks and attempts to freeze them via
> > try_to_freeze_tasks().
> >
> > However, zombie processes (i.e., tasks in EXIT_ZOMBIE state) are no
> > longer running and will never enter the refrigerator. Trying to freeze
> > them is meaningless and causes extra overhead, especially when there are
> > thousands of zombies created during stress conditions such as fork
> > storms.
> >
> > This patch skips zombie processes during the freezing phase.
> >
> > In our testing with ~30,000 user processes (including many zombies), the
> > average freeze time during suspend (S3) dropped from ~43 ms to ~16 ms:
> >
> > - Without the patch: ~43 ms average freeze latency
> > - With the patch: ~16 ms average freeze latency
> > - Improvement: ~62%
>
> And what's the total suspend time on the system in question?
>
> > This confirms that skipping zombies significantly speeds up the freezing
> > process when the system is under heavy load with many short-lived tasks.
> >
> > Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
> >
> > Changes in v3:
> > - Added performance test
> >
> > Changes in v2:
> > - Simplified code, added judgment of dead processes
> > - Rewrite changelog
> > ---
> > kernel/power/process.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/power/process.c b/kernel/power/process.c
> > index a6f7ba2d283d..2bbe22610522 100644
> > --- a/kernel/power/process.c
> > +++ b/kernel/power/process.c
> > @@ -51,7 +51,7 @@ static int try_to_freeze_tasks(bool user_only)
> > todo = 0;
> > read_lock(&tasklist_lock);
> > for_each_process_thread(g, p) {
> > - if (p == current || !freeze_task(p))
> > + if (p == current || p->exit_state || !freeze_task(p))
> > continue;
> >
> > todo++;
> > --
>
> This is basically fine by me, but I wonder what other people think.
>
> Peter?
How realistic is it to have a significant amount of zombies when
freezing? This seems like an artificial corner case at best.
Zombie tasks are stuck waiting on their parent to consume their exit
state or something, right? And those parents being frozen, they pretty
much stay there.
So I suppose the logic holds, but urgh, do we really need this?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-07-03 16:40 ` Peter Zijlstra
@ 2025-07-03 17:15 ` Rafael J. Wysocki
2025-07-04 1:45 ` Zihuan Zhang
2025-07-04 8:19 ` Peter Zijlstra
0 siblings, 2 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-07-03 17:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Zihuan Zhang, pavel, len.brown, linux-pm,
linux-kernel, Oleg Nesterov
On Thu, Jul 3, 2025 at 6:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jul 03, 2025 at 04:15:10PM +0200, Rafael J. Wysocki wrote:
> > The patch subject appears to be incomplete.
> >
> > On Wed, Jun 11, 2025 at 12:13 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
> > >
> > > When freezing user space during suspend or hibernation, the freezer
> > > iterates over all tasks and attempts to freeze them via
> > > try_to_freeze_tasks().
> > >
> > > However, zombie processes (i.e., tasks in EXIT_ZOMBIE state) are no
> > > longer running and will never enter the refrigerator. Trying to freeze
> > > them is meaningless and causes extra overhead, especially when there are
> > > thousands of zombies created during stress conditions such as fork
> > > storms.
> > >
> > > This patch skips zombie processes during the freezing phase.
> > >
> > > In our testing with ~30,000 user processes (including many zombies), the
> > > average freeze time during suspend (S3) dropped from ~43 ms to ~16 ms:
> > >
> > > - Without the patch: ~43 ms average freeze latency
> > > - With the patch: ~16 ms average freeze latency
> > > - Improvement: ~62%
> >
> > And what's the total suspend time on the system in question?
> >
> > > This confirms that skipping zombies significantly speeds up the freezing
> > > process when the system is under heavy load with many short-lived tasks.
> > >
> > > Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
> > >
> > > Changes in v3:
> > > - Added performance test
> > >
> > > Changes in v2:
> > > - Simplified code, added judgment of dead processes
> > > - Rewrite changelog
> > > ---
> > > kernel/power/process.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/power/process.c b/kernel/power/process.c
> > > index a6f7ba2d283d..2bbe22610522 100644
> > > --- a/kernel/power/process.c
> > > +++ b/kernel/power/process.c
> > > @@ -51,7 +51,7 @@ static int try_to_freeze_tasks(bool user_only)
> > > todo = 0;
> > > read_lock(&tasklist_lock);
> > > for_each_process_thread(g, p) {
> > > - if (p == current || !freeze_task(p))
> > > + if (p == current || p->exit_state || !freeze_task(p))
> > > continue;
> > >
> > > todo++;
> > > --
> >
> > This is basically fine by me, but I wonder what other people think.
> >
> > Peter?
>
> How realistic is it to have a significant amount of zombies when
> freezing? This seems like an artificial corner case at best.
>
> Zombie tasks are stuck waiting on their parent to consume their exit
> state or something, right? And those parents being frozen, they pretty
> much stay there.
>
> So I suppose the logic holds, but urgh, do we really need this?
Unlikely in practice, but the code change is small and it would be
prudent to get this addressed IMV (at least so we don't need to
revisit it).
But I would ask for a comment above this check to explain that zombies
need not be frozen.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-07-03 17:15 ` Rafael J. Wysocki
@ 2025-07-04 1:45 ` Zihuan Zhang
2025-07-04 8:19 ` Peter Zijlstra
1 sibling, 0 replies; 13+ messages in thread
From: Zihuan Zhang @ 2025-07-04 1:45 UTC (permalink / raw)
To: Rafael J. Wysocki, Peter Zijlstra
Cc: pavel, len.brown, linux-pm, linux-kernel, Oleg Nesterov
Hi Rafael,
在 2025/7/4 01:15, Rafael J. Wysocki 写道:
> On Thu, Jul 3, 2025 at 6:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
>> On Thu, Jul 03, 2025 at 04:15:10PM +0200, Rafael J. Wysocki wrote:
>>> The patch subject appears to be incomplete.
You’re right — the patch subject was indeed incomplete. That was an
oversight on my part, and I’ll fix it in the next version.
Thanks for pointing it out.
>>> On Wed, Jun 11, 2025 at 12:13 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>>>> When freezing user space during suspend or hibernation, the freezer
>>>> iterates over all tasks and attempts to freeze them via
>>>> try_to_freeze_tasks().
>>>>
>>>> However, zombie processes (i.e., tasks in EXIT_ZOMBIE state) are no
>>>> longer running and will never enter the refrigerator. Trying to freeze
>>>> them is meaningless and causes extra overhead, especially when there are
>>>> thousands of zombies created during stress conditions such as fork
>>>> storms.
>>>>
>>>> This patch skips zombie processes during the freezing phase.
>>>>
>>>> In our testing with ~30,000 user processes (including many zombies), the
>>>> average freeze time during suspend (S3) dropped from ~43 ms to ~16 ms:
>>>>
>>>> - Without the patch: ~43 ms average freeze latency
>>>> - With the patch: ~16 ms average freeze latency
>>>> - Improvement: ~62%
>>> And what's the total suspend time on the system in question?
>>>
We used the sleepgraph tool to measure the full suspend-to-RAM (S3)
latency on our test platform. The total suspend time was around 1859.055
ms, so the optimization to skip zombie processes — reducing freeze time
from ~43 ms to ~16 ms — accounts for roughly 1% of the total suspend
latency.
While the absolute gain is relatively small, it helps reduce unnecessary
overhead under stress conditions such as fork storms with many zombie
tasks, and improves freeze-time predictability.
>>>> This confirms that skipping zombies significantly speeds up the freezing
>>>> process when the system is under heavy load with many short-lived tasks.
>>>>
>>>> Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
>>>>
>>>> Changes in v3:
>>>> - Added performance test
>>>>
>>>> Changes in v2:
>>>> - Simplified code, added judgment of dead processes
>>>> - Rewrite changelog
>>>> ---
>>>> kernel/power/process.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/power/process.c b/kernel/power/process.c
>>>> index a6f7ba2d283d..2bbe22610522 100644
>>>> --- a/kernel/power/process.c
>>>> +++ b/kernel/power/process.c
>>>> @@ -51,7 +51,7 @@ static int try_to_freeze_tasks(bool user_only)
>>>> todo = 0;
>>>> read_lock(&tasklist_lock);
>>>> for_each_process_thread(g, p) {
>>>> - if (p == current || !freeze_task(p))
>>>> + if (p == current || p->exit_state || !freeze_task(p))
>>>> continue;
>>>>
>>>> todo++;
>>>> --
>>> This is basically fine by me, but I wonder what other people think.
>>>
>>> Peter?
>> How realistic is it to have a significant amount of zombies when
>> freezing? This seems like an artificial corner case at best.
>>
>> Zombie tasks are stuck waiting on their parent to consume their exit
>> state or something, right? And those parents being frozen, they pretty
>> much stay there.
>>
>> So I suppose the logic holds, but urgh, do we really need this?
> Unlikely in practice, but the code change is small and it would be
> prudent to get this addressed IMV (at least so we don't need to
> revisit it).
>
> But I would ask for a comment above this check to explain that zombies
> need not be frozen.
Thanks for the suggestion.
Yes, I’ll add a comment to clarify the rationale. Planning to add the
following just above the check:
> /*
> * Zombie and dead tasks are not running anymore and cannot enter
> * the __refrigerator(). Skipping them avoids unnecessary freeze attempts.
> */
I’ll include this in the next version of the patch.
Best regards,
Zihuan Zhang
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-07-03 17:15 ` Rafael J. Wysocki
2025-07-04 1:45 ` Zihuan Zhang
@ 2025-07-04 8:19 ` Peter Zijlstra
2025-07-04 8:48 ` Zihuan Zhang
1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2025-07-04 8:19 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Zihuan Zhang, pavel, len.brown, linux-pm, linux-kernel,
Oleg Nesterov
On Thu, Jul 03, 2025 at 07:15:23PM +0200, Rafael J. Wysocki wrote:
> > How realistic is it to have a significant amount of zombies when
> > freezing? This seems like an artificial corner case at best.
> >
> > Zombie tasks are stuck waiting on their parent to consume their exit
> > state or something, right? And those parents being frozen, they pretty
> > much stay there.
> >
> > So I suppose the logic holds, but urgh, do we really need this?
>
> Unlikely in practice, but the code change is small and it would be
> prudent to get this addressed IMV (at least so we don't need to
> revisit it).
>
> But I would ask for a comment above this check to explain that zombies
> need not be frozen.
Depending on where they wait (I can't seem to find in a hurry) it might
make sense to make that wait FREEZABLE anyway.
For example, AFAICT it wouldn't hurt, and might even help some, to make
kernel/exit.c:do_wait() TASK_FREEZABLE.
So where do ZOMBIEs sleep? Don't they simply pass through do_task_dead()
and never get scheduled again? Notably, do_task_dead() already marks the
tasks as PF_NOFREEZE.
Anyway, yes, the condition it adds is relatively simple, but I really
don't see why we should complicate things *at*all*.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-07-04 8:19 ` Peter Zijlstra
@ 2025-07-04 8:48 ` Zihuan Zhang
2025-07-04 9:21 ` Peter Zijlstra
0 siblings, 1 reply; 13+ messages in thread
From: Zihuan Zhang @ 2025-07-04 8:48 UTC (permalink / raw)
To: Peter Zijlstra, Rafael J. Wysocki
Cc: pavel, len.brown, linux-pm, linux-kernel, Oleg Nesterov
Hi Peter,
Thanks for the feedback.
在 2025/7/4 16:19, Peter Zijlstra 写道:
>
> Depending on where they wait (I can't seem to find in a hurry) it might
> make sense to make that wait FREEZABLE anyway.
>
> For example, AFAICT it wouldn't hurt, and might even help some, to make
> kernel/exit.c:do_wait() TASK_FREEZABLE.
>
> So where do ZOMBIEs sleep? Don't they simply pass through do_task_dead()
> and never get scheduled again? Notably, do_task_dead() already marks the
> tasks as PF_NOFREEZE.
>
> Anyway, yes, the condition it adds is relatively simple, but I really
> don't see why we should complicate things *at*all*.
You’re absolutely right — zombie processes won’t be frozen in practice,
since PF_NOFREEZE is already set in do_task_dead(). However, if we don’t
explicitly skip them early in try_to_freeze_task(), they still go
through the freezer logic path, including calls like freeze_task() →
freezing() before eventually returning without freezing.
This not only introduces unnecessary code path traversal, but also
involves locking (e.g., spin_lock_irqsave/restore()), which could be
avoided altogether if we bail out earlier.
Additionally, skipping zombies directly helps reduce the list traversal
overhead in freeze_processes(), especially on systems with a large
number of tasks, where zombies can account for a non-trivial fraction.
So while the practical effect might be small, the gain is low-risk and
helps streamline the freezer logic a bit more.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-07-04 8:48 ` Zihuan Zhang
@ 2025-07-04 9:21 ` Peter Zijlstra
2025-07-07 1:00 ` Zihuan Zhang
0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2025-07-04 9:21 UTC (permalink / raw)
To: Zihuan Zhang
Cc: Rafael J. Wysocki, pavel, len.brown, linux-pm, linux-kernel,
Oleg Nesterov
On Fri, Jul 04, 2025 at 04:48:20PM +0800, Zihuan Zhang wrote:
> Hi Peter,
>
> Thanks for the feedback.
>
> ??? 2025/7/4 16:19, Peter Zijlstra ??????:
> > ????????? ??? ??? ?????? ??? ??? ??? ??? ??? ??? ??? ??? ??? ??????
> > Depending on where they wait (I can't seem to find in a hurry) it might
> > make sense to make that wait FREEZABLE anyway.
> >
> > For example, AFAICT it wouldn't hurt, and might even help some, to make
> > kernel/exit.c:do_wait() TASK_FREEZABLE.
> >
> > So where do ZOMBIEs sleep? Don't they simply pass through do_task_dead()
> > and never get scheduled again? Notably, do_task_dead() already marks the
> > tasks as PF_NOFREEZE.
> >
> > Anyway, yes, the condition it adds is relatively simple, but I really
> > don't see why we should complicate things *at*all*.
>
> You???re absolutely right ??? zombie processes won???t be frozen in
> practice, since PF_NOFREEZE is already set in do_task_dead(). However, if we
> don???t explicitly skip them early in try_to_freeze_task(), they still go
> through the freezer logic path, including calls like freeze_task() ???
> freezing() before eventually returning without freezing.
>
> This not only introduces unnecessary code path traversal, but also involves
> locking (e.g., spin_lock_irqsave/restore()), which could be avoided
> altogether if we bail out earlier.
>
> Additionally, skipping zombies directly helps reduce the list traversal
> overhead in freeze_processes(), especially on systems with a large number of
> tasks, where zombies can account for a non-trivial fraction.
>
> So while the practical effect might be small, the gain is low-risk and helps
> streamline the freezer logic a bit more.
You're missing the obvious. How about we 'fix' the PF_NOFREEZE handling
and help all cases that set that and not only zombies?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-07-04 9:21 ` Peter Zijlstra
@ 2025-07-07 1:00 ` Zihuan Zhang
2025-07-07 8:42 ` Peter Zijlstra
0 siblings, 1 reply; 13+ messages in thread
From: Zihuan Zhang @ 2025-07-07 1:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, pavel, len.brown, linux-pm, linux-kernel,
Oleg Nesterov
在 2025/7/4 17:21, Peter Zijlstra 写道:
> You're missing the obvious. How about we 'fix' the PF_NOFREEZE handling
> and help all cases that set that and not only zombies?
It sounds like a good idea, but there’s a potential risk in relying
solely on PF_NOFREEZE: it’s a mutable flag that can be set or cleared
dynamically during runtime, even within the freeze window.
If a task changes its PF_NOFREEZE state after passing the early check in
try_to_freeze_task(), we might skip freezing it incorrectly, leading to
inconsistent behavior or unexpected task escapes. This is particularly
tricky for some kernel threads or exit paths where PF_NOFREEZE is
manipulated as part of cleanup or teardown.
In contrast, checking exit_state for EXIT_ZOMBIE or EXIT_DEAD is fully
stable and irreversible — once a task becomes a zombie, it's no longer
schedulable and doesn’t require freezing. That makes the condition both
safe and predictable, without risk of races or missed transitions.
Currently, the safest way to skip tasks based on PF_NOFREEZE would be:
*if (!(p->flags & PF_KTHREAD) && (p->flags & PF_NOFREEZE))*
to avoid skipping kernel threads that may dynamically toggle this flag.
But this increases code complexity compared to simply checking
*if (p->exit_state)*
which is more straightforward.
Importantly, exit_state is set earlier and more reliably than PF_NOFREEZE,
So while improving PF_NOFREEZE handling is a worthwhile direction (and
I’d be happy to explore that in a separate patch RFC
[1]<https://lore.kernel.org/all/20250624115244.19260-1-zhangzihuan@kylinos.cn/>),
this small patch keeps the logic minimal and avoids subtle corner cases.
It also reduces unnecessary locking for zombie tasks, which can help
when there are lots of them around.
[1]:
https://lore.kernel.org/all/20250624115244.19260-1-zhangzihuan@kylinos.cn/
Thanks again for your insightful feedback — much appreciated.
Best regards,
Zihuan Zhang
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-07-07 1:00 ` Zihuan Zhang
@ 2025-07-07 8:42 ` Peter Zijlstra
2025-07-08 0:56 ` Zihuan Zhang
2025-07-16 16:16 ` Oleg Nesterov
0 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2025-07-07 8:42 UTC (permalink / raw)
To: Zihuan Zhang
Cc: Rafael J. Wysocki, pavel, len.brown, linux-pm, linux-kernel,
Oleg Nesterov
On Mon, Jul 07, 2025 at 09:00:12AM +0800, Zihuan Zhang wrote:
>
> 在 2025/7/4 17:21, Peter Zijlstra 写道:
> > You're missing the obvious. How about we 'fix' the PF_NOFREEZE handling
> > and help all cases that set that and not only zombies?
>
> It sounds like a good idea, but there’s a potential risk in relying solely
> on PF_NOFREEZE: it’s a mutable flag that can be set or cleared dynamically
> during runtime, even within the freeze window.
> If a task changes its PF_NOFREEZE state after passing the early check in
> try_to_freeze_task(), we might skip freezing it incorrectly, leading to
> inconsistent behavior or unexpected task escapes. This is particularly
> tricky for some kernel threads or exit paths where PF_NOFREEZE is
> manipulated as part of cleanup or teardown.
A quick browse through the code seems to suggest that for user tasks,
PF_NOFREEZE is set just like exit_state, once at death.
For kernel threads the situation is a little more complex; but typically
a kthread is spawned with PF_NOFREEZE set, and then some will clear it
again, but always before then calling a TASK_FREEZABLE wait.
The only thing I didn't fully investigate is this
{,un}lock_system_sleep() thing. But that would appear to need at least
the below fixlet.
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 3d484630505a..a415e7d30a2c 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -52,8 +52,8 @@ void pm_restrict_gfp_mask(void)
unsigned int lock_system_sleep(void)
{
unsigned int flags = current->flags;
- current->flags |= PF_NOFREEZE;
mutex_lock(&system_transition_mutex);
+ current->flags |= PF_NOFREEZE;
return flags;
}
EXPORT_SYMBOL_GPL(lock_system_sleep);
Anyway, this seems to suggest something relatively simple like this here
should do:
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 8d530d0949ff..8b7cecd17564 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -162,20 +162,22 @@ static bool __freeze_task(struct task_struct *p)
*/
bool freeze_task(struct task_struct *p)
{
- unsigned long flags;
-
- spin_lock_irqsave(&freezer_lock, flags);
- if (!freezing(p) || frozen(p) || __freeze_task(p)) {
- spin_unlock_irqrestore(&freezer_lock, flags);
+ /*
+ * User tasks get NOFREEZE in do_task_dead().
+ */
+ if ((p->flags & (PF_NOFREEZE | PF_KTHREAD)) == PF_NOFREEZE)
return false;
- }
- if (!(p->flags & PF_KTHREAD))
- fake_signal_wake_up(p);
- else
- wake_up_state(p, TASK_NORMAL);
+ scoped_guard (spinlock_irqsave, &freezer_lock) {
+ if (!freezing(p) || frozen(p) || __freeze_task(p))
+ return false;
+
+ if (!(p->flags & PF_KTHREAD))
+ fake_signal_wake_up(p);
+ else
+ wake_up_state(p, TASK_NORMAL);
+ }
- spin_unlock_irqrestore(&freezer_lock, flags);
return true;
}
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-07-07 8:42 ` Peter Zijlstra
@ 2025-07-08 0:56 ` Zihuan Zhang
2025-07-16 16:16 ` Oleg Nesterov
1 sibling, 0 replies; 13+ messages in thread
From: Zihuan Zhang @ 2025-07-08 0:56 UTC (permalink / raw)
To: Peter Zijlstra, Rafael J. Wysocki
Cc: Rafael J. Wysocki, pavel, len.brown, linux-pm, linux-kernel,
Oleg Nesterov
在 2025/7/7 16:42, Peter Zijlstra 写道:
> A quick browse through the code seems to suggest that for user tasks,
> PF_NOFREEZE is set just like exit_state, once at death.
>
I couldn’t agree more — for user tasks, PF_NOFREEZE is indeed set at the
same time as exit_state, right at death.
> For kernel threads the situation is a little more complex; but typically
> a kthread is spawned with PF_NOFREEZE set, and then some will clear it
> again, but always before then calling a TASK_FREEZABLE wait.
While that’s generally the expected pattern, it depends on each kthread
correctly managing the PF_NOFREEZE flag before entering a TASK_FREEZABLE
wait.
This assumption can be fragile in practice — a missed update or
unconventional usage could lead to inconsistent freezing behavior.
> The only thing I didn't fully investigate is this
> {,un}lock_system_sleep() thing. But that would appear to need at least
> the below fixlet.
>
> diff --git a/kernel/power/main.c b/kernel/power/main.c index
> 3d484630505a..a415e7d30a2c 100644 --- a/kernel/power/main.c +++
> b/kernel/power/main.c @@ -52,8 +52,8 @@ void pm_restrict_gfp_mask(void) unsigned int lock_system_sleep(void)
> {
> unsigned int flags = current->flags;
> - current->flags |= PF_NOFREEZE; mutex_lock(&system_transition_mutex);
> + current->flags |= PF_NOFREEZE; return flags;
> }
> EXPORT_SYMBOL_GPL(lock_system_sleep);
It seems to me that setting PF_NOFREEZE before acquiring
system_transition_mutex might be intentional — possibly to prevent
deadlocks.
If the task were to be frozen while holding or waiting for the mutex, it
could block suspend or resume paths. So changing the order may risk
breaking that protection.
So, although PF_NOFREEZE could be the better long-term solution, right
now depending exclusively on it might cause issues.
It would require further standardization and guarantees about the flag’s
stability during the freezing process before we can fully rely on it.
I’m looking forward to your thoughts on this.
>
> Anyway, this seems to suggest something relatively simple like this here
> should do:
>
> diff --git a/kernel/freezer.c b/kernel/freezer.c index
> 8d530d0949ff..8b7cecd17564 100644 --- a/kernel/freezer.c +++
> b/kernel/freezer.c @@ -162,20 +162,22 @@ static bool
> __freeze_task(struct task_struct *p) */
> bool freeze_task(struct task_struct *p)
> {
> - unsigned long flags; - - spin_lock_irqsave(&freezer_lock, flags); -
> if (!freezing(p) || frozen(p) || __freeze_task(p)) { -
> spin_unlock_irqrestore(&freezer_lock, flags); + /* + * User tasks get
> NOFREEZE in do_task_dead(). + */ + if ((p->flags & (PF_NOFREEZE |
> PF_KTHREAD)) == PF_NOFREEZE) return false;
> - }
> - if (!(p->flags & PF_KTHREAD)) - fake_signal_wake_up(p); - else -
> wake_up_state(p, TASK_NORMAL); + scoped_guard (spinlock_irqsave,
> &freezer_lock) { + if (!freezing(p) || frozen(p) || __freeze_task(p))
> + return false; + + if (!(p->flags & PF_KTHREAD)) +
> fake_signal_wake_up(p); + else + wake_up_state(p, TASK_NORMAL); + }
> - spin_unlock_irqrestore(&freezer_lock, flags); return true;
> }
Thanks for the suggestion — this looks really clean and simplifies the
logic nicely! The use of a scoped spinlock and the early return based on
PF_NOFREEZE | PF_KTHREAD makes the flow easier to follow.
By the way, in the code above, since for user tasks the PF_NOFREEZE flag
is only set once at death (similar to how exit_state is handled), would
it make sense to check p->exit_state directly here instead?
It seems semantically equivalent for user tasks, and exit_state might be
more explicit in conveying the task's lifecycle state. I'm curious if
there's a specific reason to prefer PF_NOFREEZE over exit_state in this
case.
Best regards,
Zihuan Zhang
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
2025-07-07 8:42 ` Peter Zijlstra
2025-07-08 0:56 ` Zihuan Zhang
@ 2025-07-16 16:16 ` Oleg Nesterov
1 sibling, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2025-07-16 16:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Zihuan Zhang, Rafael J. Wysocki, pavel, len.brown, linux-pm,
linux-kernel
I don't like the patch Zihuan, but
On 07/07, Peter Zijlstra wrote:
>
> Anyway, this seems to suggest something relatively simple like this here
> should do:
>
> diff --git a/kernel/freezer.c b/kernel/freezer.c
> index 8d530d0949ff..8b7cecd17564 100644
> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -162,20 +162,22 @@ static bool __freeze_task(struct task_struct *p)
> */
> bool freeze_task(struct task_struct *p)
> {
> - unsigned long flags;
> -
> - spin_lock_irqsave(&freezer_lock, flags);
> - if (!freezing(p) || frozen(p) || __freeze_task(p)) {
> - spin_unlock_irqrestore(&freezer_lock, flags);
> + /*
> + * User tasks get NOFREEZE in do_task_dead().
> + */
> + if ((p->flags & (PF_NOFREEZE | PF_KTHREAD)) == PF_NOFREEZE)
> return false;
> - }
I don't understand your change...
It probably makes sense to avoid freezer_lock when PF_NOFREEZE is set but
it can't change the current behaciour, freezing() -> freezing_slow_path()
checks PF_NOFREEZE too and returns false in this case.
Oleg.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-07-16 16:18 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-11 10:12 [PATCH v3 0/1] PM / Freezer: Skip zombie/dead processes to reduce Zihuan Zhang
2025-06-11 10:12 ` [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to Zihuan Zhang
2025-07-03 14:15 ` Rafael J. Wysocki
2025-07-03 16:40 ` Peter Zijlstra
2025-07-03 17:15 ` Rafael J. Wysocki
2025-07-04 1:45 ` Zihuan Zhang
2025-07-04 8:19 ` Peter Zijlstra
2025-07-04 8:48 ` Zihuan Zhang
2025-07-04 9:21 ` Peter Zijlstra
2025-07-07 1:00 ` Zihuan Zhang
2025-07-07 8:42 ` Peter Zijlstra
2025-07-08 0:56 ` Zihuan Zhang
2025-07-16 16:16 ` Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox