* Should EXIT_DEAD be visible to userspace ? @ 2021-10-11 19:40 Dave Jones 2021-10-11 20:33 ` Linus Torvalds 0 siblings, 1 reply; 3+ messages in thread From: Dave Jones @ 2021-10-11 19:40 UTC (permalink / raw) To: Linux Kernel; +Cc: Linus Torvalds, Christian Brauner, Oleg Nesterov One of our users reported a crash in some userspace tooling this morning, which scrapes /proc/pid to gather stack traces, process states etc of everything running at the time. The crash occurred because it fell over an unexpected task state, which was 'X'. According to the procps man-pages, this state should never be seen, but here it clearly was. The kernel running at the time was kinda old (5.2) but I don't see much change in the EXIT_DEAD space that would explain something that got fixed subsequently. It's also probably going to be difficult to reproduce unfortunately. So my question is, is procps wrong and code should expect to see X state processes in proc ? The code in question is being hardened to handle unexpected inputs, but I'm curious if the kernel is leaking some state that it shouldn't. Dave ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Should EXIT_DEAD be visible to userspace ? 2021-10-11 19:40 Should EXIT_DEAD be visible to userspace ? Dave Jones @ 2021-10-11 20:33 ` Linus Torvalds 2021-10-12 10:43 ` Christian Brauner 0 siblings, 1 reply; 3+ messages in thread From: Linus Torvalds @ 2021-10-11 20:33 UTC (permalink / raw) To: Dave Jones, Linux Kernel, Christian Brauner, Oleg Nesterov On Mon, Oct 11, 2021 at 12:40 PM Dave Jones <davej@codemonkey.org.uk> wrote: > > One of our users reported a crash in some userspace tooling this > morning, which scrapes /proc/pid to gather stack traces, process states > etc of everything running at the time. > > The crash occurred because it fell over an unexpected task state, > which was 'X'. According to the procps man-pages, this state should > never be seen, but here it clearly was. Heh. > The kernel running at the time was kinda old (5.2) but I don't see much > change in the EXIT_DEAD space that would explain something that got > fixed subsequently. It's also probably going to be difficult to > reproduce unfortunately. > > So my question is, is procps wrong and code should expect to see X state > processes in proc ? The code in question is being hardened to handle > unexpected inputs, but I'm curious if the kernel is leaking some state > that it shouldn't. My gut feel is that the man-pages have clearly been wrong - or at least misleading - for at least the last couple of years (and possibly longer), and this is the first report we've ever had of it actually causing problems. The docs *do* mention 'X'. Even if they say 'should never be seen', it's not like it's not right there. So we could either ask to just have the man-pages fixed to be a little less strongly worded ("never" -> "seldom" or whatever). And apparently procps is already getting fixed. Or we could hide the 'X' state in newer kernels, and just call them zombies to user space. We could literally just change the string from "X (dead)" to "Z (dead)" and the "dead" part would still be there (and different from "Z (zombie)"). And either way, it's likely not going to be something that people will notice ever again. You update your system, and you wouldn't see the problem, because whether the kernel was changed or not, procps was updated. And if the argument is that people didn't update procps, but *did* update the kernel, then sure, that could avoid somebody hitting this again, but that's where the "at least a couple of years and nobody has noticed before" comes in. So I can certainly take a patch that hides 'X', and we can even mark it for stable. But it feels like realistically nobody will actually care, and the real fix is the one to procps, and that fix will make any kernel change irrelevant (and possibly even a slight negative, since now procps might report interesting cases?). End result: if somebody cares enough and sends me a tested patch, I'll apply it. But I personally wouldn't care much, and wouldn't push for it unless we get somebody who does. And I really think that "should never be seen" in the docs is just wrong. Yes, we hold the task lock in task_state() when you look at /proc/<pid>/status. So maybe it will never be seen there, because maybe (I didn't check) we move from X->Z while holding the lock. But other parts of /proc don't actually do that lock_task(), I think. IOW, looking at /proc/<pid>/stat (which shows just the first letter of the state), doesn't do that, I think. So it's not actually serialized with the process going through its dying moments. So I _think_ the "never" was always just "almost never". In fact, take a look at commit ad86622b478e ("wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space"). That 'X' has been seen before. It's not great when it happens, but I think this is an example of "the 'never' has never been true, and goes back to at least 2014". .. and that 2014 was just when we happened on it before. The actual issue of 'X' showing up looks like it probably predates it (and likely goes back to before git history). So I suspect that stray 'X' is not actually a regression. Just rare enough to be _almost_ "never". Linus ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Should EXIT_DEAD be visible to userspace ? 2021-10-11 20:33 ` Linus Torvalds @ 2021-10-12 10:43 ` Christian Brauner 0 siblings, 0 replies; 3+ messages in thread From: Christian Brauner @ 2021-10-12 10:43 UTC (permalink / raw) To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Oleg Nesterov On Mon, Oct 11, 2021 at 01:33:21PM -0700, Linus Torvalds wrote: > On Mon, Oct 11, 2021 at 12:40 PM Dave Jones <davej@codemonkey.org.uk> wrote: > > > > One of our users reported a crash in some userspace tooling this > > morning, which scrapes /proc/pid to gather stack traces, process states > > etc of everything running at the time. > > > > The crash occurred because it fell over an unexpected task state, > > which was 'X'. According to the procps man-pages, this state should > > never be seen, but here it clearly was. > > Heh. > > > The kernel running at the time was kinda old (5.2) but I don't see much > > change in the EXIT_DEAD space that would explain something that got > > fixed subsequently. It's also probably going to be difficult to > > reproduce unfortunately. > > > > So my question is, is procps wrong and code should expect to see X state > > processes in proc ? The code in question is being hardened to handle > > unexpected inputs, but I'm curious if the kernel is leaking some state > > that it shouldn't. > > My gut feel is that the man-pages have clearly been wrong - or at > least misleading - for at least the last couple of years (and possibly > longer), and this is the first report we've ever had of it actually > causing problems. > > The docs *do* mention 'X'. Even if they say 'should never be seen', > it's not like it's not right there. > > So we could either ask to just have the man-pages fixed to be a little > less strongly worded ("never" -> "seldom" or whatever). And apparently > procps is already getting fixed. > > Or we could hide the 'X' state in newer kernels, and just call them > zombies to user space. We could literally just change the string from > "X (dead)" to "Z (dead)" and the "dead" part would still be there (and > different from "Z (zombie)"). > > And either way, it's likely not going to be something that people will > notice ever again. You update your system, and you wouldn't see the > problem, because whether the kernel was changed or not, procps was > updated. > > And if the argument is that people didn't update procps, but *did* > update the kernel, then sure, that could avoid somebody hitting this > again, but that's where the "at least a couple of years and nobody has > noticed before" comes in. > > So I can certainly take a patch that hides 'X', and we can even mark > it for stable. > > But it feels like realistically nobody will actually care, and the > real fix is the one to procps, and that fix will make any kernel > change irrelevant (and possibly even a slight negative, since now > procps might report interesting cases?). > > End result: if somebody cares enough and sends me a tested patch, I'll > apply it. But I personally wouldn't care much, and wouldn't push for > it unless we get somebody who does. > > And I really think that "should never be seen" in the docs is just wrong. > > Yes, we hold the task lock in task_state() when you look at > /proc/<pid>/status. So maybe it will never be seen there, because > maybe (I didn't check) we move from X->Z while holding the lock. > > But other parts of /proc don't actually do that lock_task(), I think. > IOW, looking at /proc/<pid>/stat (which shows just the first letter of > the state), doesn't do that, I think. So it's not actually serialized > with the process going through its dying moments. > > So I _think_ the "never" was always just "almost never". > > In fact, take a look at commit ad86622b478e ("wait: swap EXIT_ZOMBIE > and EXIT_DEAD to hide EXIT_TRACE from user-space"). That 'X' has been > seen before. It's not great when it happens, but I think this is an > example of "the 'never' has never been true, and goes back to at least > 2014". > > .. and that 2014 was just when we happened on it before. The actual > issue of 'X' showing up looks like it probably predates it (and likely > goes back to before git history). > > So I suspect that stray 'X' is not actually a regression. Just rare > enough to be _almost_ "never". Imho, let's 1. update the manpage 2. update procps and call it done. Christian ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-10-12 10:43 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-10-11 19:40 Should EXIT_DEAD be visible to userspace ? Dave Jones 2021-10-11 20:33 ` Linus Torvalds 2021-10-12 10:43 ` Christian Brauner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox