Linux Power Management development
 help / color / mirror / Atom feed
* [PATCH v3 0/1] PM / Freezer: Skip zombie/dead processes to reduce
@ 2025-06-11 10:12 Zihuan Zhang
  2025-06-11 10:12 ` [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to Zihuan Zhang
  0 siblings, 1 reply; 13+ messages in thread
From: Zihuan Zhang @ 2025-06-11 10:12 UTC (permalink / raw)
  To: rafael, pavel, len.brown; +Cc: linux-pm, linux-kernel, Zihuan Zhang

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 6017 bytes --]

Hi all,

This patch series improves the performance of the process freezer by
skipping zombie tasks during freezing.

In the suspend and hibernation paths, the freezer traverses all tasks
and attempts to freeze them. However, zombie tasks (EXIT_ZOMBIE with
PF_EXITING) are already dead — they are not schedulable and cannot enter
the refrigerator. Attempting to freeze such tasks is redundant and
unnecessarily increases freezing time.

In particular, on systems under fork storm conditions (e.g., many
short-lived processes quickly becoming zombies), the number of zombie tasks
can spike into the thousands or more. We observed that this causes the
freezer loop to waste significant time processing tasks that are guaranteed
to not need freezing.

Testing and Results

Platform:
- Architecture: x86_64
- CPU: ZHAOXIN KaiXian KX-7000
- RAM: 16 GB
- Kernel: v6.6.93

result without the patch:
dmesg | grep elap
[  219.608992] Freezing user space processes completed (elapsed 0.010 seconds)
[  219.617355] Freezing remaining freezable tasks completed (elapsed 0.008 seconds)
[  228.029119] Freezing user space processes completed (elapsed 0.013 seconds)
[  228.040672] Freezing remaining freezable tasks completed (elapsed 0.011 seconds)
[  236.879065] Freezing user space processes completed (elapsed 0.020 seconds)
[  236.897976] Freezing remaining freezable tasks completed (elapsed 0.018 seconds)
[  246.276679] Freezing user space processes completed (elapsed 0.026 seconds)
[  246.298636] Freezing remaining freezable tasks completed (elapsed 0.021 seconds)
[  256.221504] Freezing user space processes completed (elapsed 0.030 seconds)
[  256.248955] Freezing remaining freezable tasks completed (elapsed 0.027 seconds)
[  266.674987] Freezing user space processes completed (elapsed 0.040 seconds)
[  266.709811] Freezing remaining freezable tasks completed (elapsed 0.034 seconds)
[  277.701679] Freezing user space processes completed (elapsed 0.046 seconds)
[  277.742048] Freezing remaining freezable tasks completed (elapsed 0.040 seconds)
[  289.246611] Freezing user space processes completed (elapsed 0.046 seconds)
[  289.290753] Freezing remaining freezable tasks completed (elapsed 0.044 seconds)
[  301.516854] Freezing user space processes completed (elapsed 0.041 seconds)
[  301.576287] Freezing remaining freezable tasks completed (elapsed 0.059 seconds)
[  314.422499] Freezing user space processes completed (elapsed 0.043 seconds)
[  314.465804] Freezing remaining freezable tasks completed (elapsed 0.043 seconds)

result with the patch:
dmesg | grep elap
[   54.161674] Freezing user space processes completed (elapsed 0.007 seconds)
[   54.171536] Freezing remaining freezable tasks completed (elapsed 0.009 seconds)
[   62.556462] Freezing user space processes completed (elapsed 0.006 seconds)
[   62.566496] Freezing remaining freezable tasks completed (elapsed 0.010 seconds)
[   71.395421] Freezing user space processes completed (elapsed 0.009 seconds)
[   71.402820] Freezing remaining freezable tasks completed (elapsed 0.007 seconds)
[   80.785463] Freezing user space processes completed (elapsed 0.010 seconds)
[   80.793914] Freezing remaining freezable tasks completed (elapsed 0.008 seconds)
[   90.962659] Freezing user space processes completed (elapsed 0.012 seconds)
[   90.973519] Freezing remaining freezable tasks completed (elapsed 0.010 seconds)
[  101.435638] Freezing user space processes completed (elapsed 0.013 seconds)
[  101.449023] Freezing remaining freezable tasks completed (elapsed 0.013 seconds)
[  112.669786] Freezing user space processes completed (elapsed 0.015 seconds)
[  112.683540] Freezing remaining freezable tasks completed (elapsed 0.013 seconds)
[  124.585681] Freezing user space processes completed (elapsed 0.017 seconds)
[  124.599553] Freezing remaining freezable tasks completed (elapsed 0.013 seconds)
[  136.826635] Freezing user space processes completed (elapsed 0.016 seconds)
[  136.841840] Freezing remaining freezable tasks completed (elapsed 0.015 seconds)
[  149.686575] Freezing user space processes completed (elapsed 0.016 seconds)
[  149.701549] Freezing remaining freezable tasks completed (elapsed 0.014 seconds)

Here is the user-space fork storm simulator used for testing:

```c
// create_zombie.c

void usage(const char *prog) {
    fprintf(stderr, "Usage: %s <number_of_zombies>\n", prog);
    exit(EXIT_FAILURE);
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        usage(argv[0]);
    }

    long num_zombies = strtol(argv[1], NULL, 10);
    if (num_zombies <= 0 || num_zombies > 1000000) {
        fprintf(stderr, "Invalid number of zombies: %ld\n", num_zombies);
        exit(EXIT_FAILURE);
    }

    printf("Creating %ld zombie processes...\n", num_zombies);

    for (long i = 0; i < num_zombies; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork failed");
            exit(EXIT_FAILURE);
        } else if (pid == 0) {
            // Child exits immediately
            exit(0);
        }
        // Parent does NOT wait, leaving zombie
    }

    printf("All child processes created. Sleeping for 60 seconds...\n");
    sleep(60);

    printf("Parent exiting, zombies will be reaped by init.\n");
    return 0;
}
```

And we used a shell loop to suspend repeatedly:

```bash
LOOPS=10

echo none > /sys/power/pm_test
echo 5 > /sys/module/suspend/parameters/pm_test_delay
for ((i=1; i<=LOOPS; i++)); do
echo "===== Test round $i/$LOOPS ====="
./create_zombie $((i * 3000)) &
sleep 5
echo mem > /sys/power/state

pkill create_zombie
echo "Round $i complete. Waiting 5s..."
sleep 5

done
echo "==== All $LOOPS rounds complete ===="
```

Zihuan Zhang (1):
  PM / Freezer: Skip zombie/dead processes to reduce freeze latency

 kernel/power/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-06-11 10:12 [PATCH v3 0/1] PM / Freezer: Skip zombie/dead processes to reduce Zihuan Zhang
@ 2025-06-11 10:12 ` Zihuan Zhang
  2025-07-03 14:15   ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Zihuan Zhang @ 2025-06-11 10:12 UTC (permalink / raw)
  To: rafael, pavel, len.brown; +Cc: linux-pm, linux-kernel, Zihuan Zhang

When freezing user space during suspend or hibernation, the freezer
iterates over all tasks and attempts to freeze them via
try_to_freeze_tasks().

However, zombie processes (i.e., tasks in EXIT_ZOMBIE state) are no
longer running and will never enter the refrigerator. Trying to freeze
them is meaningless and causes extra overhead, especially when there are
thousands of zombies created during stress conditions such as fork
storms.

This patch skips zombie processes during the freezing phase.

In our testing with ~30,000 user processes (including many zombies), the
average freeze time during suspend (S3) dropped from ~43 ms to ~16 ms:

    - Without the patch: ~43 ms average freeze latency
    - With the patch:    ~16 ms average freeze latency
    - Improvement:       ~62%

This confirms that skipping zombies significantly speeds up the freezing
process when the system is under heavy load with many short-lived tasks.

Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>

Changes in v3:
- Added performance test

Changes in v2:
- Simplified code, added judgment of dead processes
- Rewrite changelog
---
 kernel/power/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/power/process.c b/kernel/power/process.c
index a6f7ba2d283d..2bbe22610522 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -51,7 +51,7 @@ static int try_to_freeze_tasks(bool user_only)
 		todo = 0;
 		read_lock(&tasklist_lock);
 		for_each_process_thread(g, p) {
-			if (p == current || !freeze_task(p))
+			if (p == current || p->exit_state || !freeze_task(p))
 				continue;
 
 			todo++;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-06-11 10:12 ` [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to Zihuan Zhang
@ 2025-07-03 14:15   ` Rafael J. Wysocki
  2025-07-03 16:40     ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-07-03 14:15 UTC (permalink / raw)
  To: Zihuan Zhang, Peter Zijlstra
  Cc: rafael, pavel, len.brown, linux-pm, linux-kernel

The patch subject appears to be incomplete.

On Wed, Jun 11, 2025 at 12:13 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>
> When freezing user space during suspend or hibernation, the freezer
> iterates over all tasks and attempts to freeze them via
> try_to_freeze_tasks().
>
> However, zombie processes (i.e., tasks in EXIT_ZOMBIE state) are no
> longer running and will never enter the refrigerator. Trying to freeze
> them is meaningless and causes extra overhead, especially when there are
> thousands of zombies created during stress conditions such as fork
> storms.
>
> This patch skips zombie processes during the freezing phase.
>
> In our testing with ~30,000 user processes (including many zombies), the
> average freeze time during suspend (S3) dropped from ~43 ms to ~16 ms:
>
>     - Without the patch: ~43 ms average freeze latency
>     - With the patch:    ~16 ms average freeze latency
>     - Improvement:       ~62%

And what's the total suspend time on the system in question?

> This confirms that skipping zombies significantly speeds up the freezing
> process when the system is under heavy load with many short-lived tasks.
>
> Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
>
> Changes in v3:
> - Added performance test
>
> Changes in v2:
> - Simplified code, added judgment of dead processes
> - Rewrite changelog
> ---
>  kernel/power/process.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/power/process.c b/kernel/power/process.c
> index a6f7ba2d283d..2bbe22610522 100644
> --- a/kernel/power/process.c
> +++ b/kernel/power/process.c
> @@ -51,7 +51,7 @@ static int try_to_freeze_tasks(bool user_only)
>                 todo = 0;
>                 read_lock(&tasklist_lock);
>                 for_each_process_thread(g, p) {
> -                       if (p == current || !freeze_task(p))
> +                       if (p == current || p->exit_state || !freeze_task(p))
>                                 continue;
>
>                         todo++;
> --

This is basically fine by me, but I wonder what other people think.

Peter?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-07-03 14:15   ` Rafael J. Wysocki
@ 2025-07-03 16:40     ` Peter Zijlstra
  2025-07-03 17:15       ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2025-07-03 16:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Zihuan Zhang, pavel, len.brown, linux-pm, linux-kernel,
	Oleg Nesterov

On Thu, Jul 03, 2025 at 04:15:10PM +0200, Rafael J. Wysocki wrote:
> The patch subject appears to be incomplete.
> 
> On Wed, Jun 11, 2025 at 12:13 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
> >
> > When freezing user space during suspend or hibernation, the freezer
> > iterates over all tasks and attempts to freeze them via
> > try_to_freeze_tasks().
> >
> > However, zombie processes (i.e., tasks in EXIT_ZOMBIE state) are no
> > longer running and will never enter the refrigerator. Trying to freeze
> > them is meaningless and causes extra overhead, especially when there are
> > thousands of zombies created during stress conditions such as fork
> > storms.
> >
> > This patch skips zombie processes during the freezing phase.
> >
> > In our testing with ~30,000 user processes (including many zombies), the
> > average freeze time during suspend (S3) dropped from ~43 ms to ~16 ms:
> >
> >     - Without the patch: ~43 ms average freeze latency
> >     - With the patch:    ~16 ms average freeze latency
> >     - Improvement:       ~62%
> 
> And what's the total suspend time on the system in question?
> 
> > This confirms that skipping zombies significantly speeds up the freezing
> > process when the system is under heavy load with many short-lived tasks.
> >
> > Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
> >
> > Changes in v3:
> > - Added performance test
> >
> > Changes in v2:
> > - Simplified code, added judgment of dead processes
> > - Rewrite changelog
> > ---
> >  kernel/power/process.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/power/process.c b/kernel/power/process.c
> > index a6f7ba2d283d..2bbe22610522 100644
> > --- a/kernel/power/process.c
> > +++ b/kernel/power/process.c
> > @@ -51,7 +51,7 @@ static int try_to_freeze_tasks(bool user_only)
> >                 todo = 0;
> >                 read_lock(&tasklist_lock);
> >                 for_each_process_thread(g, p) {
> > -                       if (p == current || !freeze_task(p))
> > +                       if (p == current || p->exit_state || !freeze_task(p))
> >                                 continue;
> >
> >                         todo++;
> > --
> 
> This is basically fine by me, but I wonder what other people think.
> 
> Peter?

How realistic is it to have a significant amount of zombies when
freezing? This seems like an artificial corner case at best.

Zombie tasks are stuck waiting on their parent to consume their exit
state or something, right? And those parents being frozen, they pretty
much stay there.

So I suppose the logic holds, but urgh, do we really need this?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-07-03 16:40     ` Peter Zijlstra
@ 2025-07-03 17:15       ` Rafael J. Wysocki
  2025-07-04  1:45         ` Zihuan Zhang
  2025-07-04  8:19         ` Peter Zijlstra
  0 siblings, 2 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-07-03 17:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Zihuan Zhang, pavel, len.brown, linux-pm,
	linux-kernel, Oleg Nesterov

On Thu, Jul 3, 2025 at 6:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jul 03, 2025 at 04:15:10PM +0200, Rafael J. Wysocki wrote:
> > The patch subject appears to be incomplete.
> >
> > On Wed, Jun 11, 2025 at 12:13 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
> > >
> > > When freezing user space during suspend or hibernation, the freezer
> > > iterates over all tasks and attempts to freeze them via
> > > try_to_freeze_tasks().
> > >
> > > However, zombie processes (i.e., tasks in EXIT_ZOMBIE state) are no
> > > longer running and will never enter the refrigerator. Trying to freeze
> > > them is meaningless and causes extra overhead, especially when there are
> > > thousands of zombies created during stress conditions such as fork
> > > storms.
> > >
> > > This patch skips zombie processes during the freezing phase.
> > >
> > > In our testing with ~30,000 user processes (including many zombies), the
> > > average freeze time during suspend (S3) dropped from ~43 ms to ~16 ms:
> > >
> > >     - Without the patch: ~43 ms average freeze latency
> > >     - With the patch:    ~16 ms average freeze latency
> > >     - Improvement:       ~62%
> >
> > And what's the total suspend time on the system in question?
> >
> > > This confirms that skipping zombies significantly speeds up the freezing
> > > process when the system is under heavy load with many short-lived tasks.
> > >
> > > Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
> > >
> > > Changes in v3:
> > > - Added performance test
> > >
> > > Changes in v2:
> > > - Simplified code, added judgment of dead processes
> > > - Rewrite changelog
> > > ---
> > >  kernel/power/process.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/power/process.c b/kernel/power/process.c
> > > index a6f7ba2d283d..2bbe22610522 100644
> > > --- a/kernel/power/process.c
> > > +++ b/kernel/power/process.c
> > > @@ -51,7 +51,7 @@ static int try_to_freeze_tasks(bool user_only)
> > >                 todo = 0;
> > >                 read_lock(&tasklist_lock);
> > >                 for_each_process_thread(g, p) {
> > > -                       if (p == current || !freeze_task(p))
> > > +                       if (p == current || p->exit_state || !freeze_task(p))
> > >                                 continue;
> > >
> > >                         todo++;
> > > --
> >
> > This is basically fine by me, but I wonder what other people think.
> >
> > Peter?
>
> How realistic is it to have a significant amount of zombies when
> freezing? This seems like an artificial corner case at best.
>
> Zombie tasks are stuck waiting on their parent to consume their exit
> state or something, right? And those parents being frozen, they pretty
> much stay there.
>
> So I suppose the logic holds, but urgh, do we really need this?

Unlikely in practice, but the code change is small and it would be
prudent to get this addressed IMV (at least so we don't need to
revisit it).

But I would ask for a comment above this check to explain that zombies
need not be frozen.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-07-03 17:15       ` Rafael J. Wysocki
@ 2025-07-04  1:45         ` Zihuan Zhang
  2025-07-04  8:19         ` Peter Zijlstra
  1 sibling, 0 replies; 13+ messages in thread
From: Zihuan Zhang @ 2025-07-04  1:45 UTC (permalink / raw)
  To: Rafael J. Wysocki, Peter Zijlstra
  Cc: pavel, len.brown, linux-pm, linux-kernel, Oleg Nesterov

Hi Rafael,

在 2025/7/4 01:15, Rafael J. Wysocki 写道:
> On Thu, Jul 3, 2025 at 6:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
>> On Thu, Jul 03, 2025 at 04:15:10PM +0200, Rafael J. Wysocki wrote:
>>> The patch subject appears to be incomplete.
You’re right — the patch subject was indeed incomplete. That was an 
oversight on my part, and I’ll fix it in the next version.

Thanks for pointing it out.
>>> On Wed, Jun 11, 2025 at 12:13 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>>>> When freezing user space during suspend or hibernation, the freezer
>>>> iterates over all tasks and attempts to freeze them via
>>>> try_to_freeze_tasks().
>>>>
>>>> However, zombie processes (i.e., tasks in EXIT_ZOMBIE state) are no
>>>> longer running and will never enter the refrigerator. Trying to freeze
>>>> them is meaningless and causes extra overhead, especially when there are
>>>> thousands of zombies created during stress conditions such as fork
>>>> storms.
>>>>
>>>> This patch skips zombie processes during the freezing phase.
>>>>
>>>> In our testing with ~30,000 user processes (including many zombies), the
>>>> average freeze time during suspend (S3) dropped from ~43 ms to ~16 ms:
>>>>
>>>>      - Without the patch: ~43 ms average freeze latency
>>>>      - With the patch:    ~16 ms average freeze latency
>>>>      - Improvement:       ~62%
>>> And what's the total suspend time on the system in question?
>>>
We used the sleepgraph tool to measure the full suspend-to-RAM (S3) 
latency on our test platform. The total suspend time was around 1859.055 
ms, so the optimization to skip zombie processes — reducing freeze time 
from ~43 ms to ~16 ms — accounts for roughly 1% of the total suspend 
latency.

While the absolute gain is relatively small, it helps reduce unnecessary 
overhead under stress conditions such as fork storms with many zombie 
tasks, and improves freeze-time predictability.

>>>> This confirms that skipping zombies significantly speeds up the freezing
>>>> process when the system is under heavy load with many short-lived tasks.
>>>>
>>>> Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
>>>>
>>>> Changes in v3:
>>>> - Added performance test
>>>>
>>>> Changes in v2:
>>>> - Simplified code, added judgment of dead processes
>>>> - Rewrite changelog
>>>> ---
>>>>   kernel/power/process.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/power/process.c b/kernel/power/process.c
>>>> index a6f7ba2d283d..2bbe22610522 100644
>>>> --- a/kernel/power/process.c
>>>> +++ b/kernel/power/process.c
>>>> @@ -51,7 +51,7 @@ static int try_to_freeze_tasks(bool user_only)
>>>>                  todo = 0;
>>>>                  read_lock(&tasklist_lock);
>>>>                  for_each_process_thread(g, p) {
>>>> -                       if (p == current || !freeze_task(p))
>>>> +                       if (p == current || p->exit_state || !freeze_task(p))
>>>>                                  continue;
>>>>
>>>>                          todo++;
>>>> --
>>> This is basically fine by me, but I wonder what other people think.
>>>
>>> Peter?
>> How realistic is it to have a significant amount of zombies when
>> freezing? This seems like an artificial corner case at best.
>>
>> Zombie tasks are stuck waiting on their parent to consume their exit
>> state or something, right? And those parents being frozen, they pretty
>> much stay there.
>>
>> So I suppose the logic holds, but urgh, do we really need this?
> Unlikely in practice, but the code change is small and it would be
> prudent to get this addressed IMV (at least so we don't need to
> revisit it).
>
> But I would ask for a comment above this check to explain that zombies
> need not be frozen.
Thanks for the suggestion.

Yes, I’ll add a comment to clarify the rationale. Planning to add the 
following just above the check:
> /*
>   * Zombie and dead tasks are not running anymore and cannot enter
>   * the __refrigerator(). Skipping them avoids unnecessary freeze attempts.
>   */

I’ll include this in the next version of the patch.

Best regards,
Zihuan Zhang


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-07-03 17:15       ` Rafael J. Wysocki
  2025-07-04  1:45         ` Zihuan Zhang
@ 2025-07-04  8:19         ` Peter Zijlstra
  2025-07-04  8:48           ` Zihuan Zhang
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2025-07-04  8:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Zihuan Zhang, pavel, len.brown, linux-pm, linux-kernel,
	Oleg Nesterov

On Thu, Jul 03, 2025 at 07:15:23PM +0200, Rafael J. Wysocki wrote:

> > How realistic is it to have a significant amount of zombies when
> > freezing? This seems like an artificial corner case at best.
> >
> > Zombie tasks are stuck waiting on their parent to consume their exit
> > state or something, right? And those parents being frozen, they pretty
> > much stay there.
> >
> > So I suppose the logic holds, but urgh, do we really need this?
> 
> Unlikely in practice, but the code change is small and it would be
> prudent to get this addressed IMV (at least so we don't need to
> revisit it).
> 
> But I would ask for a comment above this check to explain that zombies
> need not be frozen.

Depending on where they wait (I can't seem to find in a hurry) it might
make sense to make that wait FREEZABLE anyway.

For example, AFAICT it wouldn't hurt, and might even help some, to make
kernel/exit.c:do_wait() TASK_FREEZABLE.

So where do ZOMBIEs sleep? Don't they simply pass through do_task_dead()
and never get scheduled again? Notably, do_task_dead() already marks the
tasks as PF_NOFREEZE.

Anyway, yes, the condition it adds is relatively simple, but I really
don't see why we should complicate things *at*all*.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-07-04  8:19         ` Peter Zijlstra
@ 2025-07-04  8:48           ` Zihuan Zhang
  2025-07-04  9:21             ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Zihuan Zhang @ 2025-07-04  8:48 UTC (permalink / raw)
  To: Peter Zijlstra, Rafael J. Wysocki
  Cc: pavel, len.brown, linux-pm, linux-kernel, Oleg Nesterov

Hi Peter,

Thanks for the feedback.

在 2025/7/4 16:19, Peter Zijlstra 写道:
>                                
> Depending on where they wait (I can't seem to find in a hurry) it might
> make sense to make that wait FREEZABLE anyway.
>
> For example, AFAICT it wouldn't hurt, and might even help some, to make
> kernel/exit.c:do_wait() TASK_FREEZABLE.
>
> So where do ZOMBIEs sleep? Don't they simply pass through do_task_dead()
> and never get scheduled again? Notably, do_task_dead() already marks the
> tasks as PF_NOFREEZE.
>
> Anyway, yes, the condition it adds is relatively simple, but I really
> don't see why we should complicate things *at*all*.

You’re absolutely right — zombie processes won’t be frozen in practice, 
since PF_NOFREEZE is already set in do_task_dead(). However, if we don’t 
explicitly skip them early in try_to_freeze_task(), they still go 
through the freezer logic path, including calls like freeze_task() → 
freezing() before eventually returning without freezing.

This not only introduces unnecessary code path traversal, but also 
involves locking (e.g., spin_lock_irqsave/restore()), which could be 
avoided altogether if we bail out earlier.

Additionally, skipping zombies directly helps reduce the list traversal 
overhead in freeze_processes(), especially on systems with a large 
number of tasks, where zombies can account for a non-trivial fraction.

So while the practical effect might be small, the gain is low-risk and 
helps streamline the freezer logic a bit more.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-07-04  8:48           ` Zihuan Zhang
@ 2025-07-04  9:21             ` Peter Zijlstra
  2025-07-07  1:00               ` Zihuan Zhang
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2025-07-04  9:21 UTC (permalink / raw)
  To: Zihuan Zhang
  Cc: Rafael J. Wysocki, pavel, len.brown, linux-pm, linux-kernel,
	Oleg Nesterov

On Fri, Jul 04, 2025 at 04:48:20PM +0800, Zihuan Zhang wrote:
> Hi Peter,
> 
> Thanks for the feedback.
> 
> ??? 2025/7/4 16:19, Peter Zijlstra ??????:
> > ????????? ??? ??? ?????? ??? ??? ??? ??? ??? ??? ??? ??? ??? ??????
> > Depending on where they wait (I can't seem to find in a hurry) it might
> > make sense to make that wait FREEZABLE anyway.
> > 
> > For example, AFAICT it wouldn't hurt, and might even help some, to make
> > kernel/exit.c:do_wait() TASK_FREEZABLE.
> > 
> > So where do ZOMBIEs sleep? Don't they simply pass through do_task_dead()
> > and never get scheduled again? Notably, do_task_dead() already marks the
> > tasks as PF_NOFREEZE.
> > 
> > Anyway, yes, the condition it adds is relatively simple, but I really
> > don't see why we should complicate things *at*all*.
> 
> You???re absolutely right ??? zombie processes won???t be frozen in
> practice, since PF_NOFREEZE is already set in do_task_dead(). However, if we
> don???t explicitly skip them early in try_to_freeze_task(), they still go
> through the freezer logic path, including calls like freeze_task() ???
> freezing() before eventually returning without freezing.
> 
> This not only introduces unnecessary code path traversal, but also involves
> locking (e.g., spin_lock_irqsave/restore()), which could be avoided
> altogether if we bail out earlier.
> 
> Additionally, skipping zombies directly helps reduce the list traversal
> overhead in freeze_processes(), especially on systems with a large number of
> tasks, where zombies can account for a non-trivial fraction.
> 
> So while the practical effect might be small, the gain is low-risk and helps
> streamline the freezer logic a bit more.

You're missing the obvious. How about we 'fix' the PF_NOFREEZE handling
and help all cases that set that and not only zombies?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-07-04  9:21             ` Peter Zijlstra
@ 2025-07-07  1:00               ` Zihuan Zhang
  2025-07-07  8:42                 ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Zihuan Zhang @ 2025-07-07  1:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, pavel, len.brown, linux-pm, linux-kernel,
	Oleg Nesterov


在 2025/7/4 17:21, Peter Zijlstra 写道:
> You're missing the obvious. How about we 'fix' the PF_NOFREEZE handling
> and help all cases that set that and not only zombies?                               

It sounds like a good idea, but there’s a potential risk in relying 
solely on PF_NOFREEZE: it’s a mutable flag that can be set or cleared 
dynamically during runtime, even within the freeze window.
If a task changes its PF_NOFREEZE state after passing the early check in 
try_to_freeze_task(), we might skip freezing it incorrectly, leading to 
inconsistent behavior or unexpected task escapes. This is particularly 
tricky for some kernel threads or exit paths where PF_NOFREEZE is 
manipulated as part of cleanup or teardown.
In contrast, checking exit_state for EXIT_ZOMBIE or EXIT_DEAD is fully 
stable and irreversible — once a task becomes a zombie, it's no longer 
schedulable and doesn’t require freezing. That makes the condition both 
safe and predictable, without risk of races or missed transitions.
Currently, the safest way to skip tasks based on PF_NOFREEZE would be:
*if (!(p->flags & PF_KTHREAD) && (p->flags & PF_NOFREEZE))*
to avoid skipping kernel threads that may dynamically toggle this flag. 
But this increases code complexity compared to simply checking
*if (p->exit_state)*

which is more straightforward.
Importantly, exit_state is set earlier and more reliably than PF_NOFREEZE,
So while improving PF_NOFREEZE handling is a worthwhile direction (and 
I’d be happy to explore that in a separate patch RFC 
[1]<https://lore.kernel.org/all/20250624115244.19260-1-zhangzihuan@kylinos.cn/>), 
this small patch keeps the logic minimal and avoids subtle corner cases. 
It also reduces unnecessary locking for zombie tasks, which can help 
when there are lots of them around.
[1]: 
https://lore.kernel.org/all/20250624115244.19260-1-zhangzihuan@kylinos.cn/
Thanks again for your insightful feedback — much appreciated.

Best regards,
Zihuan Zhang


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-07-07  1:00               ` Zihuan Zhang
@ 2025-07-07  8:42                 ` Peter Zijlstra
  2025-07-08  0:56                   ` Zihuan Zhang
  2025-07-16 16:16                   ` Oleg Nesterov
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2025-07-07  8:42 UTC (permalink / raw)
  To: Zihuan Zhang
  Cc: Rafael J. Wysocki, pavel, len.brown, linux-pm, linux-kernel,
	Oleg Nesterov

On Mon, Jul 07, 2025 at 09:00:12AM +0800, Zihuan Zhang wrote:
> 
> 在 2025/7/4 17:21, Peter Zijlstra 写道:
> > You're missing the obvious. How about we 'fix' the PF_NOFREEZE handling
> > and help all cases that set that and not only zombies?                               
> 
> It sounds like a good idea, but there’s a potential risk in relying solely
> on PF_NOFREEZE: it’s a mutable flag that can be set or cleared dynamically
> during runtime, even within the freeze window.

> If a task changes its PF_NOFREEZE state after passing the early check in
> try_to_freeze_task(), we might skip freezing it incorrectly, leading to
> inconsistent behavior or unexpected task escapes. This is particularly
> tricky for some kernel threads or exit paths where PF_NOFREEZE is
> manipulated as part of cleanup or teardown.

A quick browse through the code seems to suggest that for user tasks,
PF_NOFREEZE is set just like exit_state, once at death.

For kernel threads the situation is a little more complex; but typically
a kthread is spawned with PF_NOFREEZE set, and then some will clear it
again, but always before then calling a TASK_FREEZABLE wait.

The only thing I didn't fully investigate is this
{,un}lock_system_sleep() thing. But that would appear to need at least
the below fixlet.

diff --git a/kernel/power/main.c b/kernel/power/main.c
index 3d484630505a..a415e7d30a2c 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -52,8 +52,8 @@ void pm_restrict_gfp_mask(void)
 unsigned int lock_system_sleep(void)
 {
 	unsigned int flags = current->flags;
-	current->flags |= PF_NOFREEZE;
 	mutex_lock(&system_transition_mutex);
+	current->flags |= PF_NOFREEZE;
 	return flags;
 }
 EXPORT_SYMBOL_GPL(lock_system_sleep);


Anyway, this seems to suggest something relatively simple like this here
should do:

diff --git a/kernel/freezer.c b/kernel/freezer.c
index 8d530d0949ff..8b7cecd17564 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -162,20 +162,22 @@ static bool __freeze_task(struct task_struct *p)
  */
 bool freeze_task(struct task_struct *p)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(&freezer_lock, flags);
-	if (!freezing(p) || frozen(p) || __freeze_task(p)) {
-		spin_unlock_irqrestore(&freezer_lock, flags);
+	/*
+	 * User tasks get NOFREEZE in do_task_dead().
+	 */
+	if ((p->flags & (PF_NOFREEZE | PF_KTHREAD)) == PF_NOFREEZE)
 		return false;
-	}
 
-	if (!(p->flags & PF_KTHREAD))
-		fake_signal_wake_up(p);
-	else
-		wake_up_state(p, TASK_NORMAL);
+	scoped_guard (spinlock_irqsave, &freezer_lock) {
+		if (!freezing(p) || frozen(p) || __freeze_task(p))
+			return false;
+
+		if (!(p->flags & PF_KTHREAD))
+			fake_signal_wake_up(p);
+		else
+			wake_up_state(p, TASK_NORMAL);
+	}
 
-	spin_unlock_irqrestore(&freezer_lock, flags);
 	return true;
 }
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-07-07  8:42                 ` Peter Zijlstra
@ 2025-07-08  0:56                   ` Zihuan Zhang
  2025-07-16 16:16                   ` Oleg Nesterov
  1 sibling, 0 replies; 13+ messages in thread
From: Zihuan Zhang @ 2025-07-08  0:56 UTC (permalink / raw)
  To: Peter Zijlstra, Rafael J. Wysocki
  Cc: Rafael J. Wysocki, pavel, len.brown, linux-pm, linux-kernel,
	Oleg Nesterov


在 2025/7/7 16:42, Peter Zijlstra 写道:
> A quick browse through the code seems to suggest that for user tasks,
> PF_NOFREEZE is set just like exit_state, once at death.
>
I couldn’t agree more — for user tasks, PF_NOFREEZE is indeed set at the 
same time as exit_state, right at death.
> For kernel threads the situation is a little more complex; but typically
> a kthread is spawned with PF_NOFREEZE set, and then some will clear it
> again, but always before then calling a TASK_FREEZABLE wait.      

While that’s generally the expected pattern, it depends on each kthread 
correctly managing the PF_NOFREEZE flag before entering a TASK_FREEZABLE 
wait.

This assumption can be fragile in practice — a missed update or 
unconventional usage could lead to inconsistent freezing behavior.

> The only thing I didn't fully investigate is this
> {,un}lock_system_sleep() thing. But that would appear to need at least
> the below fixlet.
>
> diff --git a/kernel/power/main.c b/kernel/power/main.c index 
> 3d484630505a..a415e7d30a2c 100644 --- a/kernel/power/main.c +++ 
> b/kernel/power/main.c @@ -52,8 +52,8 @@ void pm_restrict_gfp_mask(void)  unsigned int lock_system_sleep(void)
>   {
>   	unsigned int flags = current->flags;
> - current->flags |= PF_NOFREEZE;  	mutex_lock(&system_transition_mutex);
> + current->flags |= PF_NOFREEZE;  	return flags;
>   }
>   EXPORT_SYMBOL_GPL(lock_system_sleep);      
It seems to me that setting PF_NOFREEZE before acquiring 
system_transition_mutex might be intentional — possibly to prevent 
deadlocks.

If the task were to be frozen while holding or waiting for the mutex, it 
could block suspend or resume paths. So changing the order may risk 
breaking that protection.

So, although PF_NOFREEZE could be the better long-term solution, right 
now depending exclusively on it might cause issues.

It would require further standardization and guarantees about the flag’s 
stability during the freezing process before we can fully rely on it.

I’m looking forward to your thoughts on this.

>             
> Anyway, this seems to suggest something relatively simple like this here
> should do:
>
> diff --git a/kernel/freezer.c b/kernel/freezer.c index 
> 8d530d0949ff..8b7cecd17564 100644 --- a/kernel/freezer.c +++ 
> b/kernel/freezer.c @@ -162,20 +162,22 @@ static bool 
> __freeze_task(struct task_struct *p)   */
>   bool freeze_task(struct task_struct *p)
>   {
> - unsigned long flags; - - spin_lock_irqsave(&freezer_lock, flags); - 
> if (!freezing(p) || frozen(p) || __freeze_task(p)) { - 
> spin_unlock_irqrestore(&freezer_lock, flags); + /* + * User tasks get 
> NOFREEZE in do_task_dead(). + */ + if ((p->flags & (PF_NOFREEZE | 
> PF_KTHREAD)) == PF_NOFREEZE)  		return false;
> - }  
> - if (!(p->flags & PF_KTHREAD)) - fake_signal_wake_up(p); - else - 
> wake_up_state(p, TASK_NORMAL); + scoped_guard (spinlock_irqsave, 
> &freezer_lock) { + if (!freezing(p) || frozen(p) || __freeze_task(p)) 
> + return false; + + if (!(p->flags & PF_KTHREAD)) + 
> fake_signal_wake_up(p); + else + wake_up_state(p, TASK_NORMAL); + }  
> - spin_unlock_irqrestore(&freezer_lock, flags);  	return true;
>   }

Thanks for the suggestion — this looks really clean and simplifies the 
logic nicely! The use of a scoped spinlock and the early return based on 
PF_NOFREEZE | PF_KTHREAD makes the flow easier to follow.

By the way, in the code above, since for user tasks the PF_NOFREEZE flag 
is only set once at death (similar to how exit_state is handled), would 
it make sense to check p->exit_state directly here instead?

It seems semantically equivalent for user tasks, and exit_state might be 
more explicit in conveying the task's lifecycle state. I'm curious if 
there's a specific reason to prefer PF_NOFREEZE over exit_state in this 
case.

Best regards,
Zihuan Zhang



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
  2025-07-07  8:42                 ` Peter Zijlstra
  2025-07-08  0:56                   ` Zihuan Zhang
@ 2025-07-16 16:16                   ` Oleg Nesterov
  1 sibling, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2025-07-16 16:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Zihuan Zhang, Rafael J. Wysocki, pavel, len.brown, linux-pm,
	linux-kernel

I don't like the patch Zihuan, but

On 07/07, Peter Zijlstra wrote:
>
> Anyway, this seems to suggest something relatively simple like this here
> should do:
>
> diff --git a/kernel/freezer.c b/kernel/freezer.c
> index 8d530d0949ff..8b7cecd17564 100644
> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -162,20 +162,22 @@ static bool __freeze_task(struct task_struct *p)
>   */
>  bool freeze_task(struct task_struct *p)
>  {
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&freezer_lock, flags);
> -	if (!freezing(p) || frozen(p) || __freeze_task(p)) {
> -		spin_unlock_irqrestore(&freezer_lock, flags);
> +	/*
> +	 * User tasks get NOFREEZE in do_task_dead().
> +	 */
> +	if ((p->flags & (PF_NOFREEZE | PF_KTHREAD)) == PF_NOFREEZE)
>  		return false;
> -	}

I don't understand your change...

It probably makes sense to avoid freezer_lock when PF_NOFREEZE is set but
it can't change the current behaciour, freezing() -> freezing_slow_path()
checks PF_NOFREEZE too and returns false in this case.

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-07-16 16:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-11 10:12 [PATCH v3 0/1] PM / Freezer: Skip zombie/dead processes to reduce Zihuan Zhang
2025-06-11 10:12 ` [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to Zihuan Zhang
2025-07-03 14:15   ` Rafael J. Wysocki
2025-07-03 16:40     ` Peter Zijlstra
2025-07-03 17:15       ` Rafael J. Wysocki
2025-07-04  1:45         ` Zihuan Zhang
2025-07-04  8:19         ` Peter Zijlstra
2025-07-04  8:48           ` Zihuan Zhang
2025-07-04  9:21             ` Peter Zijlstra
2025-07-07  1:00               ` Zihuan Zhang
2025-07-07  8:42                 ` Peter Zijlstra
2025-07-08  0:56                   ` Zihuan Zhang
2025-07-16 16:16                   ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox