linux-api.vger.kernel.org archive mirror
* Re: [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT
       [not found] ` <20220117164633.322550-3-valentin.schneider@arm.com>
@ 2022-01-17 19:12   ` Eric W. Biederman
  2022-01-18 17:29     ` Valentin Schneider
  0 siblings, 1 reply; 5+ messages in thread
From: Eric W. Biederman @ 2022-01-17 19:12 UTC
  To: Valentin Schneider
  Cc: linux-kernel, Uwe Kleine-König, Steven Rostedt,
	Sebastian Andrzej Siewior, Abhijeet Dharmapurikar,
	Dietmar Eggemann, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Thomas Gleixner, Juri Lelli, Daniel Bristot de Oliveira,
	Kees Cook, Andrew Morton, Alexey Gladkov, Kenta.Tada@sony.com,
	Randy Dunlap, Ed Tsai, linux-api

Valentin Schneider <valentin.schneider@arm.com> writes:

> TASK_RTLOCK_WAIT currently isn't part of TASK_REPORT, thus a task blocking
> on an rtlock will appear as having a task state == 0, IOW TASK_RUNNING.
>
> The actual state is saved in p->saved_state, but reading it after reading
> p->__state has a few issues:
> o that could still be TASK_RUNNING in the case of e.g. rt_spin_lock
> o ttwu_state_match() might have changed that to TASK_RUNNING
>
> Add TASK_RTLOCK_WAIT to TASK_REPORT.
>
> Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  fs/proc/array.c              |  3 ++-
>  include/linux/sched.h        | 17 +++++++++--------
>  include/trace/events/sched.h |  1 +
>  3 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index ff869a66b34e..f4cae65529a6 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -128,9 +128,10 @@ static const char * const task_state_array[] = {
>  	"X (dead)",		/* 0x10 */
>  	"Z (zombie)",		/* 0x20 */
>  	"P (parked)",		/* 0x40 */
> +	"L (rt-locked)",        /* 0x80 */
>  
>  	/* states beyond TASK_REPORT: */
> -	"I (idle)",		/* 0x80 */
> +	"I (idle)",		/* 0x100 */
>  };

I think this is at least possibly an ABI break.  I have a vague memory
that userspace is not ready to be reported new task states, which is
why we encode some of our states the way we do.

Maybe it was just someone being very conservative.

Still, if you are going to expose new states to userspace and risk
breaking it, can you do some basic analysis and report what ps and
similar programs do?

Simply changing userspace without even mentioning that you are changing
the userspace output of proc looks dangerous indeed.

Looking in the history, commit 74e37200de8e ("proc: cleanup/simplify
get_task_state/task_state_array") seems to best document the concern
that userspace does not know how to handle new states.

The fact we have had a parked state for quite a few years despite that
concern seems to argue it is possible to extend the states.  Or perhaps
it just argues that parked states are rare enough it does not matter.

It is definitely the case that the ps manpage documents the possible
states and as such they could be a part of anyone's shell scripts.

From the ps man page:
>        Here are the different values that the s, stat and state output
>        specifiers (header "STAT" or "S") will display to describe the
>        state of a process:
> 
>                D    uninterruptible sleep (usually IO)
>                I    Idle kernel thread
>                R    running or runnable (on run queue)
>                S    interruptible sleep (waiting for an event to complete)
>                T    stopped by job control signal
>                t    stopped by debugger during the tracing
>                W    paging (not valid since the 2.6.xx kernel)
>                X    dead (should never be seen)
>                Z    defunct ("zombie") process, terminated but not reaped by its parent
> 

So it looks like a change that adds to the number of states in the
kernel should update the ps man page as well.

Eric
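
[Editor's sketch] The concern above is that tools and scripts match on a fixed
set of state letters. The following is a minimal illustration, in C, of how a
userspace consumer might read the state letter from a /proc/<pid>/stat line
and tolerate a letter it does not know. The helper names stat_state_letter()
and describe_state() are hypothetical and are not taken from ps or procps. The
descriptions for R, S, D, T and t are assumed from memory of task_state_array
and are not shown in the patch hunk quoted above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the state letter from one line of
 * /proc/<pid>/stat, or 0 if the line is malformed.
 *
 * The comm field is wrapped in parentheses and may itself contain spaces
 * and parentheses, so the last ')' is searched for instead of splitting
 * on whitespace.
 */
char stat_state_letter(const char *line)
{
	const char *close = strrchr(line, ')');

	if (close == NULL || close[1] != ' ' || close[2] == '\0')
		return 0;
	return close[2];
}

/*
 * Hypothetical helper: map a state letter to a description.  A letter
 * that is not in the table (for example a newly introduced 'L') falls
 * through to "unknown" instead of being treated as an error.
 */
const char *describe_state(char letter)
{
	switch (letter) {
	case 'R': return "running";
	case 'S': return "sleeping";
	case 'D': return "disk sleep";
	case 'T': return "stopped";
	case 't': return "tracing stop";
	case 'X': return "dead";
	case 'Z': return "zombie";
	case 'P': return "parked";
	case 'I': return "idle";
	default:  return "unknown";
	}
}
```

A consumer written this way would keep working if the kernel started reporting
a new letter. A script that compares against a fixed list, or a tool that
indexes a table by letter without a default case, would not, which is the risk
being discussed.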


* Re: [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT
  2022-01-17 19:12   ` [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT Eric W. Biederman
@ 2022-01-18 17:29     ` Valentin Schneider
  2022-01-18 18:10       ` Eric W. Biederman
  0 siblings, 1 reply; 5+ messages in thread
From: Valentin Schneider @ 2022-01-18 17:29 UTC
  To: Eric W. Biederman
  Cc: linux-kernel, Uwe Kleine-König, Steven Rostedt,
	Sebastian Andrzej Siewior, Abhijeet Dharmapurikar,
	Dietmar Eggemann, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Thomas Gleixner, Juri Lelli, Daniel Bristot de Oliveira,
	Kees Cook, Andrew Morton, Alexey Gladkov, Kenta.Tada@sony.com,
	Randy Dunlap, Ed Tsai, linux-api

On 17/01/22 13:12, Eric W. Biederman wrote:
> Valentin Schneider <valentin.schneider@arm.com> writes:
>> --- a/fs/proc/array.c
>> +++ b/fs/proc/array.c
>> @@ -128,9 +128,10 @@ static const char * const task_state_array[] = {
>>  	"X (dead)",		/* 0x10 */
>>  	"Z (zombie)",		/* 0x20 */
>>  	"P (parked)",		/* 0x40 */
>> +	"L (rt-locked)",        /* 0x80 */
>>  
>>  	/* states beyond TASK_REPORT: */
>> -	"I (idle)",		/* 0x80 */
>> +	"I (idle)",		/* 0x100 */
>>  };
>
> I think this is at least possibly an ABI break.  I have a vague memory
> that userspace is not ready to be reported new task states, which is
> why we encode some of our states the way we do.
>
> Maybe it was just someone being very conservative.
>
> Still, if you are going to expose new states to userspace and risk
> breaking it, can you do some basic analysis and report what ps and
> similar programs do?
>
> Simply changing userspace without even mentioning that you are changing
> the userspace output of proc looks dangerous indeed.
>

Yeah, you're right.

> Looking in the history commit 74e37200de8e ("proc: cleanup/simplify
> get_task_state/task_state_array") seems to best document the concern
> that userspace does not know how to handle new states.
>

Thanks for the sha1 and for digging around. Now, I read
74e37200de8e ("proc: cleanup/simplify get_task_state/task_state_array")
as "get_task_state() isn't clear vs what value is actually exposed to
userspace" rather than "get_task_state() could expose things userspace
doesn't know what to do with".

> The fact we have had a parked state for quite a few years despite that
> concern seems to argue it is possible to extend the states.  Or perhaps
> it just argues that parked states are rare enough it does not matter.
>
> It is definitely the case that the ps manpage documents the possible
> states and as such they could be a part of anyone's shell scripts.
>

06eb61844d84 ("sched/debug: Add explicit TASK_IDLE printing") for instance
seems to suggest extending the states is OK, but you're right that this then
requires updating ps' manpage.

Alternatively, TASK_RTLOCK_WAIT could be masqueraded as
TASK_(UN)INTERRUPTIBLE when reported to userspace - it is actually somewhat
similar, unlike TASK_IDLE vs TASK_UNINTERRUPTIBLE for instance. The
handling in get_task_state() will be fugly, but it might be preferable over
exposing a detail userspace might not need to be made aware of?

> From the ps man page:
>>        Here are the different values that the s, stat and state output
>>        specifiers (header "STAT" or "S") will display to describe the
>>        state of a process:
>> 
>>                D    uninterruptible sleep (usually IO)
>>                I    Idle kernel thread
>>                R    running or runnable (on run queue)
>>                S    interruptible sleep (waiting for an event to complete)
>>                T    stopped by job control signal
>>                t    stopped by debugger during the tracing
>>                W    paging (not valid since the 2.6.xx kernel)
>>                X    dead (should never be seen)
>>                Z    defunct ("zombie") process, terminated but not reaped by its parent
>> 
>
> So it looks like a change that adds to the number of states in the
> kernel should update the ps man page as well.
>
> Eric


* Re: [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT
  2022-01-18 17:29     ` Valentin Schneider
@ 2022-01-18 18:10       ` Eric W. Biederman
  2022-01-19 18:38         ` Valentin Schneider
  0 siblings, 1 reply; 5+ messages in thread
From: Eric W. Biederman @ 2022-01-18 18:10 UTC
  To: Valentin Schneider
  Cc: linux-kernel, Uwe Kleine-König, Steven Rostedt,
	Sebastian Andrzej Siewior, Abhijeet Dharmapurikar,
	Dietmar Eggemann, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Thomas Gleixner, Juri Lelli, Daniel Bristot de Oliveira,
	Kees Cook, Andrew Morton, Alexey Gladkov, Kenta.Tada@sony.com,
	Randy Dunlap, Ed Tsai, linux-api

Valentin Schneider <valentin.schneider@arm.com> writes:

> On 17/01/22 13:12, Eric W. Biederman wrote:
>> Valentin Schneider <valentin.schneider@arm.com> writes:
>>> --- a/fs/proc/array.c
>>> +++ b/fs/proc/array.c
>>> @@ -128,9 +128,10 @@ static const char * const task_state_array[] = {
>>>  	"X (dead)",		/* 0x10 */
>>>  	"Z (zombie)",		/* 0x20 */
>>>  	"P (parked)",		/* 0x40 */
>>> +	"L (rt-locked)",        /* 0x80 */
>>>  
>>>  	/* states beyond TASK_REPORT: */
>>> -	"I (idle)",		/* 0x80 */
>>> +	"I (idle)",		/* 0x100 */
>>>  };
>>
>> I think this is at least possibly an ABI break.  I have a vague memory
>> that userspace is not ready to be reported new task states, which is
>> why we encode some of our states the way we do.
>>
>> Maybe it was just someone being very conservative.
>>
>> Still, if you are going to expose new states to userspace and risk
>> breaking it, can you do some basic analysis and report what ps and
>> similar programs do?
>>
>> Simply changing userspace without even mentioning that you are changing
>> the userspace output of proc looks dangerous indeed.
>>
>
> Yeah, you're right.
>
>> Looking in the history commit 74e37200de8e ("proc: cleanup/simplify
>> get_task_state/task_state_array") seems to best document the concern
>> that userspace does not know how to handle new states.
>>
>
> Thanks for the sha1 and for digging around. Now, I read
> 74e37200de8e ("proc: cleanup/simplify get_task_state/task_state_array")
> as "get_task_state() isn't clear vs what value is actually exposed to
> userspace" rather than "get_task_state() could expose things userspace
> doesn't know what to do with".

There is also commit abd50b39e783 ("wait: introduce EXIT_TRACE to avoid
the racy EXIT_DEAD->EXIT_ZOMBIE transition").

Which I think is more of what I was remembering.

>> The fact we have had a parked state for quite a few years despite that
>> concern seems to argue it is possible to extend the states.  Or perhaps
>> it just argues that parked states are rare enough it does not matter.
>>
>> It is definitely the case that the ps manpage documents the possible
>> states and as such they could be a part of anyone's shell scripts.
>>
>
> 06eb61844d84 ("sched/debug: Add explicit TASK_IDLE printing") for instance
> seems to suggest extending the states is OK, but you're right that this then
> requires updating ps' manpage.
>
> Alternatively, TASK_RTLOCK_WAIT could be masqueraded as
> TASK_(UN)INTERRUPTIBLE when reported to userspace - it is actually somewhat
> similar, unlike TASK_IDLE vs TASK_UNINTERRUPTIBLE for instance. The
> handling in get_task_state() will be fugly, but it might be preferable over
> exposing a detail userspace might not need to be made aware of?

Right.

Frequently I have seen people do a cost/benefit analysis.

If the benefit is large enough, and tracking down the userspace programs
that need to be verified to work with the change is inexpensive enough,
the change is made, always keeping in mind that if something was missed
and the change causes a regression, the change will need to be reverted.

If there is little benefit, or the cost to track down userspace is great
enough, the work is put in to hide the change from userspace, simply
because it is too much trouble to expose it to userspace.

I honestly don't have any kind of sense about how hard it is to verify
that a userspace regression won't result from a change like this.  I
just know that the question needs to be asked.

Eric


* Re: [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT
  2022-01-18 18:10       ` Eric W. Biederman
@ 2022-01-19 18:38         ` Valentin Schneider
  2022-01-19 19:13           ` Eric W. Biederman
  0 siblings, 1 reply; 5+ messages in thread
From: Valentin Schneider @ 2022-01-19 18:38 UTC
  To: Eric W. Biederman
  Cc: linux-kernel, Uwe Kleine-König, Steven Rostedt,
	Sebastian Andrzej Siewior, Abhijeet Dharmapurikar,
	Dietmar Eggemann, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Thomas Gleixner, Juri Lelli, Daniel Bristot de Oliveira,
	Kees Cook, Andrew Morton, Alexey Gladkov, Kenta.Tada@sony.com,
	Randy Dunlap, Ed Tsai, linux-api

On 18/01/22 12:10, Eric W. Biederman wrote:
> Valentin Schneider <valentin.schneider@arm.com> writes:
>>
>> Alternatively, TASK_RTLOCK_WAIT could be masqueraded as
>> TASK_(UN)INTERRUPTIBLE when reported to userspace - it is actually somewhat
>> similar, unlike TASK_IDLE vs TASK_UNINTERRUPTIBLE for instance. The
>> handling in get_task_state() will be fugly, but it might be preferable over
>> exposing a detail userspace might not need to be made aware of?
>
> Right.
>
> Frequently I have seen people do a cost/benefit analysis.
>
> If the benefit is large enough, and tracking down the userspace programs
> that need to be verified to work with the change is inexpensive enough,
> the change is made, always keeping in mind that if something was missed
> and the change causes a regression, the change will need to be reverted.
>
> If there is little benefit, or the cost to track down userspace is great
> enough, the work is put in to hide the change from userspace, simply
> because it is too much trouble to expose it to userspace.
>
> I honestly don't have any kind of sense about how hard it is to verify
> that a userspace regression won't result from a change like this.  I
> just know that the question needs to be asked.
>

I see it as: does it actually make sense to expose a new state? All the
information this is conveying is: "this task took a lock that is
substituted by a sleepable lock under PREEMPT_RT". Now that you brought
this up, I don't really see much value in this vs just conveying that the
task is sleeping on a lock, i.e. just report the same as if it had gone
through rt_mutex_lock(), aka:

---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d00837d12b9d..ac7b3eef4a61 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1626,6 +1626,14 @@ static inline unsigned int __task_state_index(unsigned int tsk_state,
 	if (tsk_state == TASK_IDLE)
 		state = TASK_REPORT_IDLE;
 
+	/*
+	 * We're lying here, but rather than expose a completely new task state
+	 * to userspace, we can make this appear as if the task had gone through
+	 * a regular rt_mutex_lock() call.
+	 */
+	if (tsk_state == TASK_RTLOCK_WAIT)
+		state = TASK_UNINTERRUPTIBLE;
+
 	return fls(state);
 }
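
[Editor's sketch] The effect of the hunk above can be modelled in plain
userspace C. This is a simplified model under stated assumptions, not the
kernel source: the numeric values of TASK_NOLOAD and TASK_RTLOCK_WAIT, the
TASK_REPORT mask (the OR of the reportable bits 0x01 through 0x40), and the
letter string "RSDTtXZPI" are recalled from include/linux/sched.h of that
period and should be checked against the tree in use. The names model_fls(),
model_task_state_index() and model_state_char(), and the masquerade
parameter, are hypothetical.

```c
#include <stddef.h>

/* Assumed values, see the note above; not taken from the quoted patch. */
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_NOLOAD		0x0400
#define TASK_RTLOCK_WAIT	0x1000
#define TASK_IDLE		(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
#define TASK_REPORT		0x007f	/* reportable bits 0x01..0x40 */
#define TASK_REPORT_IDLE	(TASK_REPORT + 1)

/* Userspace stand-in for fls(): 1-based index of the highest set bit, 0 for 0. */
static unsigned int model_fls(unsigned int x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/*
 * Model of __task_state_index().  With masquerade == 0 this follows the
 * code before the hunk; with masquerade != 0 it adds the proposed
 * TASK_RTLOCK_WAIT -> TASK_UNINTERRUPTIBLE substitution.
 */
unsigned int model_task_state_index(unsigned int tsk_state,
				    unsigned int tsk_exit_state,
				    int masquerade)
{
	unsigned int state = (tsk_state | tsk_exit_state) & TASK_REPORT;

	if (tsk_state == TASK_IDLE)
		state = TASK_REPORT_IDLE;

	if (masquerade && tsk_state == TASK_RTLOCK_WAIT)
		state = TASK_UNINTERRUPTIBLE;

	return model_fls(state);
}

/* Map an index to the letter userspace sees; '?' if out of range. */
char model_state_char(unsigned int index)
{
	static const char letters[] = "RSDTtXZPI";

	return index < sizeof(letters) - 1 ? letters[index] : '?';
}
```

In this model, a task in TASK_RTLOCK_WAIT is masked down to 0 and shows up as
'R' without the substitution, and as 'D' with it. No new letter reaches
userspace in either case.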
 



* Re: [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT
  2022-01-19 18:38         ` Valentin Schneider
@ 2022-01-19 19:13           ` Eric W. Biederman
  0 siblings, 0 replies; 5+ messages in thread
From: Eric W. Biederman @ 2022-01-19 19:13 UTC
  To: Valentin Schneider
  Cc: linux-kernel, Uwe Kleine-König, Steven Rostedt,
	Sebastian Andrzej Siewior, Abhijeet Dharmapurikar,
	Dietmar Eggemann, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Thomas Gleixner, Juri Lelli, Daniel Bristot de Oliveira,
	Kees Cook, Andrew Morton, Alexey Gladkov, Kenta.Tada@sony.com,
	Randy Dunlap, Ed Tsai, linux-api

Valentin Schneider <valentin.schneider@arm.com> writes:

> On 18/01/22 12:10, Eric W. Biederman wrote:
>> Valentin Schneider <valentin.schneider@arm.com> writes:
>>>
>>> Alternatively, TASK_RTLOCK_WAIT could be masqueraded as
>>> TASK_(UN)INTERRUPTIBLE when reported to userspace - it is actually somewhat
>>> similar, unlike TASK_IDLE vs TASK_UNINTERRUPTIBLE for instance. The
>>> handling in get_task_state() will be fugly, but it might be preferable over
>>> exposing a detail userspace might not need to be made aware of?
>>
>> Right.
>>
>> Frequently I have seen people do a cost/benefit analysis.
>>
>> If the benefit is large enough, and tracking down the userspace programs
>> that need to be verified to work with the change is inexpensive enough,
>> the change is made, always keeping in mind that if something was missed
>> and the change causes a regression, the change will need to be reverted.
>>
>> If there is little benefit, or the cost to track down userspace is great
>> enough, the work is put in to hide the change from userspace, simply
>> because it is too much trouble to expose it to userspace.
>>
>> I honestly don't have any kind of sense about how hard it is to verify
>> that a userspace regression won't result from a change like this.  I
>> just know that the question needs to be asked.
>>
>
> I see it as: does it actually make sense to expose a new state? All the
> information this is conveying is: "this task took a lock that is
> substituted by a sleepable lock under PREEMPT_RT". Now that you brought
> this up, I don't really see much value in this vs just conveying that the
> task is sleeping on a lock, i.e. just report the same as if it had gone
> through rt_mutex_lock(), aka:

That seems reasonable to me.

Eric

>
> ---
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d00837d12b9d..ac7b3eef4a61 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1626,6 +1626,14 @@ static inline unsigned int __task_state_index(unsigned int tsk_state,
>  	if (tsk_state == TASK_IDLE)
>  		state = TASK_REPORT_IDLE;
>  
> +	/*
> +	 * We're lying here, but rather than expose a completely new task state
> +	 * to userspace, we can make this appear as if the task had gone through
> +	 * a regular rt_mutex_lock() call.
> +	 */
> +	if (tsk_state == TASK_RTLOCK_WAIT)
> +		state = TASK_UNINTERRUPTIBLE;
> +
>  	return fls(state);
>  }
>  


Thread overview: 5+ messages
     [not found] <20220117164633.322550-1-valentin.schneider@arm.com>
     [not found] ` <20220117164633.322550-3-valentin.schneider@arm.com>
2022-01-17 19:12   ` [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT Eric W. Biederman
2022-01-18 17:29     ` Valentin Schneider
2022-01-18 18:10       ` Eric W. Biederman
2022-01-19 18:38         ` Valentin Schneider
2022-01-19 19:13           ` Eric W. Biederman
