* Re: [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT
       [not found] ` <20220117164633.322550-3-valentin.schneider@arm.com>
From: Eric W. Biederman @ 2022-01-17 19:12 UTC
To: Valentin Schneider
Cc: linux-kernel, Uwe Kleine-König, Steven Rostedt,
	Sebastian Andrzej Siewior, Abhijeet Dharmapurikar,
	Dietmar Eggemann, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Thomas Gleixner, Juri Lelli, Daniel Bristot de Oliveira, Kees Cook,
	Andrew Morton, Alexey Gladkov, Kenta.Tada@sony.com, Randy Dunlap,
	Ed Tsai, linux-api

Valentin Schneider <valentin.schneider@arm.com> writes:

> TASK_RTLOCK_WAIT currently isn't part of TASK_REPORT, thus a task blocking
> on an rtlock will appear as having a task state == 0, IOW TASK_RUNNING.
>
> The actual state is saved in p->saved_state, but reading it after reading
> p->__state has a few issues:
> o that could still be TASK_RUNNING in the case of e.g. rt_spin_lock
> o ttwu_state_match() might have changed that to TASK_RUNNING
>
> Add TASK_RTLOCK_WAIT to TASK_REPORT.
>
> Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  fs/proc/array.c              |  3 ++-
>  include/linux/sched.h        | 17 +++++++++--------
>  include/trace/events/sched.h |  1 +
>  3 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index ff869a66b34e..f4cae65529a6 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -128,9 +128,10 @@ static const char * const task_state_array[] = {
>  	"X (dead)",		/* 0x10 */
>  	"Z (zombie)",		/* 0x20 */
>  	"P (parked)",		/* 0x40 */
> +	"L (rt-locked)",	/* 0x80 */
>
>  	/* states beyond TASK_REPORT: */
> -	"I (idle)",		/* 0x80 */
> +	"I (idle)",		/* 0x100 */
>  };

I think this is at least possibly an ABI break.  I have a vague memory
that userspace is not ready being reported new task states.  Which is
why we encode some of our states the way we do.

Maybe it was just someone being very conservative.

Still if you are going to add new states to userspace and risk breaking
them can you do some basic analysis and report what ps and similar
programs do.

Simply changing userspace without even mentioning that you are changing
the userspace output of proc looks dangerous indeed.

Looking in the history commit 74e37200de8e ("proc: cleanup/simplify
get_task_state/task_state_array") seems to best document the concern
that userspace does not know how to handle new states.

The fact we have had a parked state for quite a few years despite that
concern seems to argue it is possible to extend the states.  Or perhaps
it just argues that parked states are rare enough it does not matter.

It is definitely the case that the ps manpage documents the possible
states and as such they could be a part of anyone's shell scripts.

From the ps man page:
>  Here are the different values that the s, stat and state output
>  specifiers (header "STAT" or "S") will display to describe the
>  state of a process:
>
>     D    uninterruptible sleep (usually IO)
>     I    Idle kernel thread
>     R    running or runnable (on run queue)
>     S    interruptible sleep (waiting for an event to complete)
>     T    stopped by job control signal
>     t    stopped by debugger during the tracing
>     W    paging (not valid since the 2.6.xx kernel)
>     X    dead (should never be seen)
>     Z    defunct ("zombie") process, terminated but not reaped by its parent
>

So it looks like a change that adds to the number of states in the
kernel should update the ps man page as well.

Eric

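To make the ABI concern above concrete, here is a minimal sketch of how inserting one string into the table changes what userspace reads. It assumes that the table is indexed by the position of the highest set bit of the reported state (the later hunk in this thread ends with `return fls(state)`), and that the entries not shown in the quoted hunk ("R" through "t") match the kernel of that era; those five entries are recalled from memory, not taken from the thread. The names `fls_sketch`, `names_before`, `names_after` and `state_name` are hypothetical helpers for illustration only.

```c
#include <stddef.h>

/*
 * Portable stand-in for the kernel's fls(): 1-based position of the
 * highest set bit, 0 when no bit is set.
 */
static unsigned int fls_sketch(unsigned int x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Table before the patch. The first five entries are an assumption. */
static const char *const names_before[] = {
	"R (running)",		/* 0x00 */
	"S (sleeping)",		/* 0x01 */
	"D (disk sleep)",	/* 0x02 */
	"T (stopped)",		/* 0x04 */
	"t (tracing stop)",	/* 0x08 */
	"X (dead)",		/* 0x10 */
	"Z (zombie)",		/* 0x20 */
	"P (parked)",		/* 0x40 */
	"I (idle)",		/* 0x80 */
};

/* Table after the quoted hunk: "L" takes 0x80, "I" moves to 0x100. */
static const char *const names_after[] = {
	"R (running)",		/* 0x00 */
	"S (sleeping)",		/* 0x01 */
	"D (disk sleep)",	/* 0x02 */
	"T (stopped)",		/* 0x04 */
	"t (tracing stop)",	/* 0x08 */
	"X (dead)",		/* 0x10 */
	"Z (zombie)",		/* 0x20 */
	"P (parked)",		/* 0x40 */
	"L (rt-locked)",	/* 0x80 */
	"I (idle)",		/* 0x100 */
};

#define NAMES_LEN(t) (sizeof(t) / sizeof((t)[0]))

/* Look up the string for one report bit; NULL if it is past the table. */
static const char *state_name(const char *const *table, size_t len,
			      unsigned int report_bit)
{
	unsigned int idx = fls_sketch(report_bit);

	return idx < len ? table[idx] : NULL;
}
```

With these tables, `state_name(names_before, NAMES_LEN(names_before), 0x80)` yields "I (idle)", while the same bit looked up in `names_after` yields "L (rt-locked)". The letter "L" is the part that a consumer of /proc would not have seen before.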
* Re: [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT
From: Valentin Schneider @ 2022-01-18 17:29 UTC
To: Eric W. Biederman
Cc: linux-kernel, Uwe Kleine-König, Steven Rostedt,
	Sebastian Andrzej Siewior, Abhijeet Dharmapurikar,
	Dietmar Eggemann, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Thomas Gleixner, Juri Lelli, Daniel Bristot de Oliveira, Kees Cook,
	Andrew Morton, Alexey Gladkov, Kenta.Tada@sony.com, Randy Dunlap,
	Ed Tsai, linux-api

On 17/01/22 13:12, Eric W. Biederman wrote:
> Valentin Schneider <valentin.schneider@arm.com> writes:
>> --- a/fs/proc/array.c
>> +++ b/fs/proc/array.c
>> @@ -128,9 +128,10 @@ static const char * const task_state_array[] = {
>>  	"X (dead)",		/* 0x10 */
>>  	"Z (zombie)",		/* 0x20 */
>>  	"P (parked)",		/* 0x40 */
>> +	"L (rt-locked)",	/* 0x80 */
>>
>>  	/* states beyond TASK_REPORT: */
>> -	"I (idle)",		/* 0x80 */
>> +	"I (idle)",		/* 0x100 */
>>  };
>
> I think this is at least possibly an ABI break.  I have a vague memory
> that userspace is not ready being reported new task states.  Which is
> why we encode some of our states the way we do.
>
> Maybe it was just someone being very conservative.
>
> Still if you are going to add new states to userspace and risk breaking
> them can you do some basic analysis and report what ps and similar
> programs do.
>
> Simply changing userspace without even mentioning that you are changing
> the userspace output of proc looks dangerous indeed.
>

Yeah, you're right.

> Looking in the history commit 74e37200de8e ("proc: cleanup/simplify
> get_task_state/task_state_array") seems to best document the concern
> that userspace does not know how to handle new states.
>

Thanks for the sha1 and for digging around. Now, I read
74e37200de8e ("proc: cleanup/simplify get_task_state/task_state_array")
as "get_task_state() isn't clear vs what value is actually exposed to
userspace" rather than "get_task_state() could expose things userspace
doesn't know what to do with".

> The fact we have had a parked state for quite a few years despite that
> concern seems to argue it is possible to extend the states.  Or perhaps
> it just argues that parked states are rare enough it does not matter.
>
> It is definitely the case that the ps manpage documents the possible
> states and as such they could be a part of anyone's shell scripts.
>

06eb61844d84 ("sched/debug: Add explicit TASK_IDLE printing") for instance
seems to suggest extending the states OK, but you're right that this then
requires updating ps' manpage.

Alternatively, TASK_RTLOCK_WAIT could be masqueraded as
TASK_(UN)INTERRUPTIBLE when reported to userspace - it is actually somewhat
similar, unlike TASK_IDLE vs TASK_UNINTERRUPTIBLE for instance. The
handling in get_task_state() will be fugly, but it might be preferable over
exposing a detail userspace might not need to be made aware of?

> From the ps man page:
>>  Here are the different values that the s, stat and state output
>>  specifiers (header "STAT" or "S") will display to describe the
>>  state of a process:
>>
>>     D    uninterruptible sleep (usually IO)
>>     I    Idle kernel thread
>>     R    running or runnable (on run queue)
>>     S    interruptible sleep (waiting for an event to complete)
>>     T    stopped by job control signal
>>     t    stopped by debugger during the tracing
>>     W    paging (not valid since the 2.6.xx kernel)
>>     X    dead (should never be seen)
>>     Z    defunct ("zombie") process, terminated but not reaped by its parent
>>
>
> So it looks like a change that adds to the number of states in the
> kernel should update the ps man page as well.
>
> Eric

* Re: [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT
From: Eric W. Biederman @ 2022-01-18 18:10 UTC
To: Valentin Schneider
Cc: linux-kernel, Uwe Kleine-König, Steven Rostedt,
	Sebastian Andrzej Siewior, Abhijeet Dharmapurikar,
	Dietmar Eggemann, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Thomas Gleixner, Juri Lelli, Daniel Bristot de Oliveira, Kees Cook,
	Andrew Morton, Alexey Gladkov, Kenta.Tada@sony.com, Randy Dunlap,
	Ed Tsai, linux-api

Valentin Schneider <valentin.schneider@arm.com> writes:

> On 17/01/22 13:12, Eric W. Biederman wrote:
>> Valentin Schneider <valentin.schneider@arm.com> writes:
>>> --- a/fs/proc/array.c
>>> +++ b/fs/proc/array.c
>>> @@ -128,9 +128,10 @@ static const char * const task_state_array[] = {
>>>  	"X (dead)",		/* 0x10 */
>>>  	"Z (zombie)",		/* 0x20 */
>>>  	"P (parked)",		/* 0x40 */
>>> +	"L (rt-locked)",	/* 0x80 */
>>>
>>>  	/* states beyond TASK_REPORT: */
>>> -	"I (idle)",		/* 0x80 */
>>> +	"I (idle)",		/* 0x100 */
>>>  };
>>
>> I think this is at least possibly an ABI break.  I have a vague memory
>> that userspace is not ready being reported new task states.  Which is
>> why we encode some of our states the way we do.
>>
>> Maybe it was just someone being very conservative.
>>
>> Still if you are going to add new states to userspace and risk breaking
>> them can you do some basic analysis and report what ps and similar
>> programs do.
>>
>> Simply changing userspace without even mentioning that you are changing
>> the userspace output of proc looks dangerous indeed.
>>
>
> Yeah, you're right.
>
>> Looking in the history commit 74e37200de8e ("proc: cleanup/simplify
>> get_task_state/task_state_array") seems to best document the concern
>> that userspace does not know how to handle new states.
>>
>
> Thanks for the sha1 and for digging around. Now, I read
> 74e37200de8e ("proc: cleanup/simplify get_task_state/task_state_array")
> as "get_task_state() isn't clear vs what value is actually exposed to
> userspace" rather than "get_task_state() could expose things userspace
> doesn't know what to do with".

There is also commit abd50b39e783 ("wait: introduce EXIT_TRACE to avoid
the racy EXIT_DEAD->EXIT_ZOMBIE transition").  Which I think is more of
what I was remembering.

>> The fact we have had a parked state for quite a few years despite that
>> concern seems to argue it is possible to extend the states.  Or perhaps
>> it just argues that parked states are rare enough it does not matter.
>>
>> It is definitely the case that the ps manpage documents the possible
>> states and as such they could be a part of anyone's shell scripts.
>>
>
> 06eb61844d84 ("sched/debug: Add explicit TASK_IDLE printing") for instance
> seems to suggest extending the states OK, but you're right that this then
> requires updating ps' manpage.
>
> Alternatively, TASK_RTLOCK_WAIT could be masqueraded as
> TASK_(UN)INTERRUPTIBLE when reported to userspace - it is actually somewhat
> similar, unlike TASK_IDLE vs TASK_UNINTERRUPTIBLE for instance. The
> handling in get_task_state() will be fugly, but it might be preferable over
> exposing a detail userspace might not need to be made aware of?

Right.

Frequently I have seen people do a cost/benefit analysis.

If the benefit is enough, and tracking down the userspace programs that
need to be verified to work with the change is inexpensive enough the
change is made.  Always keeping in mind that if something was missed and
the change causes a regression the change will need to be reverted.

If there is little benefit or the cost to track down userspace is great
enough the work is put in to hide the change from userspace.  Just
because it is too much trouble to expose it to userspace.

I honestly don't have any kind of sense about how hard it is to verify
that a userspace regression won't result from a change like this.  I
just know that the question needs to be asked.

Eric

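As an illustration of what "verifying userspace" might involve, the sketch below shows a hypothetical consumer that reads the state letter from one line of /proc/<pid>/stat and maps it to a description. It is not a description of how ps or any real tool is implemented; `stat_state_letter` and `describe_state` are made-up names. The letter set is the one quoted from the ps man page earlier in the thread, and the only assumption about the file format is the usual "pid (comm) state ..." layout, where comm may itself contain parentheses and spaces.

```c
#include <string.h>

/*
 * Return the state letter from one /proc/<pid>/stat line, or '\0' if the
 * line does not look like "pid (comm) state ...". The last ')' is used
 * because the command name may contain ')' itself.
 */
static char stat_state_letter(const char *line)
{
	const char *close = strrchr(line, ')');

	if (!close || close[1] != ' ' || close[2] == '\0')
		return '\0';
	return close[2];
}

/*
 * Map a state letter to a description. The known letters are the ones
 * listed in the ps man page excerpt; anything else falls through to a
 * generic answer instead of being treated as an error.
 */
static const char *describe_state(char letter)
{
	switch (letter) {
	case 'D': return "uninterruptible sleep";
	case 'I': return "idle kernel thread";
	case 'R': return "running or runnable";
	case 'S': return "interruptible sleep";
	case 'T': return "stopped by job control signal";
	case 't': return "stopped by debugger";
	case 'W': return "paging";
	case 'X': return "dead";
	case 'Z': return "zombie";
	default:  return "unknown state";
	}
}
```

A consumer written with a default branch like this would keep working if a new letter such as "L" appeared; one that indexes a fixed table or rejects unlisted letters might not. Which of the two a given tool does is exactly the question raised above, and this sketch does not answer it for any real program.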
* Re: [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT
From: Valentin Schneider @ 2022-01-19 18:38 UTC
To: Eric W. Biederman
Cc: linux-kernel, Uwe Kleine-König, Steven Rostedt,
	Sebastian Andrzej Siewior, Abhijeet Dharmapurikar,
	Dietmar Eggemann, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Thomas Gleixner, Juri Lelli, Daniel Bristot de Oliveira, Kees Cook,
	Andrew Morton, Alexey Gladkov, Kenta.Tada@sony.com, Randy Dunlap,
	Ed Tsai, linux-api

On 18/01/22 12:10, Eric W. Biederman wrote:
> Valentin Schneider <valentin.schneider@arm.com> writes:
>>
>> Alternatively, TASK_RTLOCK_WAIT could be masqueraded as
>> TASK_(UN)INTERRUPTIBLE when reported to userspace - it is actually somewhat
>> similar, unlike TASK_IDLE vs TASK_UNINTERRUPTIBLE for instance. The
>> handling in get_task_state() will be fugly, but it might be preferable over
>> exposing a detail userspace might not need to be made aware of?
>
> Right.
>
> Frequently I have seen people do a cost/benefit analysis.
>
> If the benefit is enough, and tracking down the userspace programs that
> need to be verified to work with the change is inexpensive enough the
> change is made.  Always keeping in mind that if something was missed and
> the change causes a regression the change will need to be reverted.
>
> If there is little benefit or the cost to track down userspace is great
> enough the work is put in to hide the change from userspace.  Just
> because it is too much trouble to expose it to userspace.
>
> I honestly don't have any kind of sense about how hard it is to verify
> that a userspace regression won't result from a change like this.  I
> just know that the question needs to be asked.
>

I see it as: does it actually make sense to expose a new state? All the
information this is conveying is: "this task took a lock that is
substituted by a sleepable lock under PREEMPT_RT". Now that you brought
this up, I don't really see much value in this vs just conveying that the
task is sleeping on a lock, i.e. just report the same as if it had gone
through rt_mutex_lock(), aka:

---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d00837d12b9d..ac7b3eef4a61 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1626,6 +1626,14 @@ static inline unsigned int __task_state_index(unsigned int tsk_state,
 	if (tsk_state == TASK_IDLE)
 		state = TASK_REPORT_IDLE;
 
+	/*
+	 * We're lying here, but rather than expose a completely new task state
+	 * to userspace, we can make this appear as if the task had gone through
+	 * a regular rt_mutex_lock() call.
+	 */
+	if (tsk_state == TASK_RTLOCK_WAIT)
+		state = TASK_UNINTERRUPTIBLE;
+
 	return fls(state);
 }

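The effect of the hunk above can be tried outside the kernel with a small stand-alone sketch. The numeric values of the state bits and the body of the function before the added lines are assumptions recalled from include/linux/sched.h of that period; the thread itself only shows the tail of `__task_state_index()`. The leading underscores of `__TASK_STOPPED` and `__TASK_TRACED` are dropped here because such names are reserved in ordinary C programs, and `fls_portable`, `task_state_index_before` and `task_state_index_proposed` are hypothetical names.

```c
/* Assumed bit values; not quoted anywhere in this thread. */
#define TASK_RUNNING		0x0000
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_STOPPED		0x0004	/* __TASK_STOPPED in the kernel */
#define TASK_TRACED		0x0008	/* __TASK_TRACED in the kernel */
#define EXIT_DEAD		0x0010
#define EXIT_ZOMBIE		0x0020
#define TASK_PARKED		0x0040
#define TASK_NOLOAD		0x0400
#define TASK_RTLOCK_WAIT	0x1000

#define TASK_IDLE	(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
#define TASK_REPORT	(TASK_RUNNING | TASK_INTERRUPTIBLE | \
			 TASK_UNINTERRUPTIBLE | TASK_STOPPED | \
			 TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
			 TASK_PARKED)
#define TASK_REPORT_IDLE	(TASK_REPORT + 1)

/* Stand-in for the kernel's fls(): 1-based highest set bit, 0 for 0. */
static unsigned int fls_portable(unsigned int x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Behaviour without the hunk: bits outside TASK_REPORT are masked away. */
static unsigned int task_state_index_before(unsigned int tsk_state,
					    unsigned int tsk_exit_state)
{
	unsigned int state = (tsk_state | tsk_exit_state) & TASK_REPORT;

	if (tsk_state == TASK_IDLE)
		state = TASK_REPORT_IDLE;

	return fls_portable(state);
}

/* Behaviour with the hunk: an rtlock waiter is reported as uninterruptible. */
static unsigned int task_state_index_proposed(unsigned int tsk_state,
					      unsigned int tsk_exit_state)
{
	unsigned int state = (tsk_state | tsk_exit_state) & TASK_REPORT;

	if (tsk_state == TASK_IDLE)
		state = TASK_REPORT_IDLE;

	if (tsk_state == TASK_RTLOCK_WAIT)
		state = TASK_UNINTERRUPTIBLE;

	return fls_portable(state);
}
```

Under these assumed values, `task_state_index_before(TASK_RTLOCK_WAIT, 0)` returns 0, which is the "appears as TASK_RUNNING" problem from the original commit message, while `task_state_index_proposed(TASK_RTLOCK_WAIT, 0)` returns the same index as `TASK_UNINTERRUPTIBLE`. No new index, and therefore no new letter, is introduced by this variant.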
* Re: [PATCH v2 2/2] sched/tracing: Add TASK_RTLOCK_WAIT to TASK_REPORT
From: Eric W. Biederman @ 2022-01-19 19:13 UTC
To: Valentin Schneider
Cc: linux-kernel, Uwe Kleine-König, Steven Rostedt,
	Sebastian Andrzej Siewior, Abhijeet Dharmapurikar,
	Dietmar Eggemann, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Thomas Gleixner, Juri Lelli, Daniel Bristot de Oliveira, Kees Cook,
	Andrew Morton, Alexey Gladkov, Kenta.Tada@sony.com, Randy Dunlap,
	Ed Tsai, linux-api

Valentin Schneider <valentin.schneider@arm.com> writes:

> On 18/01/22 12:10, Eric W. Biederman wrote:
>> Valentin Schneider <valentin.schneider@arm.com> writes:
>>>
>>> Alternatively, TASK_RTLOCK_WAIT could be masqueraded as
>>> TASK_(UN)INTERRUPTIBLE when reported to userspace - it is actually somewhat
>>> similar, unlike TASK_IDLE vs TASK_UNINTERRUPTIBLE for instance. The
>>> handling in get_task_state() will be fugly, but it might be preferable over
>>> exposing a detail userspace might not need to be made aware of?
>>
>> Right.
>>
>> Frequently I have seen people do a cost/benefit analysis.
>>
>> If the benefit is enough, and tracking down the userspace programs that
>> need to be verified to work with the change is inexpensive enough the
>> change is made.  Always keeping in mind that if something was missed and
>> the change causes a regression the change will need to be reverted.
>>
>> If there is little benefit or the cost to track down userspace is great
>> enough the work is put in to hide the change from userspace.  Just
>> because it is too much trouble to expose it to userspace.
>>
>> I honestly don't have any kind of sense about how hard it is to verify
>> that a userspace regression won't result from a change like this.  I
>> just know that the question needs to be asked.
>>
>
> I see it as: does it actually make sense to expose a new state? All the
> information this is conveying is: "this task took a lock that is
> substituted by a sleepable lock under PREEMPT_RT". Now that you brought
> this up, I don't really see much value in this vs just conveying that the
> task is sleeping on a lock, i.e. just report the same as if it had gone
> through rt_mutex_lock(), aka:

That seems reasonable to me.

Eric

>
> ---
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d00837d12b9d..ac7b3eef4a61 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1626,6 +1626,14 @@ static inline unsigned int __task_state_index(unsigned int tsk_state,
>  	if (tsk_state == TASK_IDLE)
>  		state = TASK_REPORT_IDLE;
>
> +	/*
> +	 * We're lying here, but rather than expose a completely new task state
> +	 * to userspace, we can make this appear as if the task had gone through
> +	 * a regular rt_mutex_lock() call.
> +	 */
> +	if (tsk_state == TASK_RTLOCK_WAIT)
> +		state = TASK_UNINTERRUPTIBLE;
> +
>  	return fls(state);
>  }
>