* Oops on 2.4.x invalid procfs i_ino value
@ 2004-12-17 22:49 Brent Casavant
2004-12-18 0:38 ` William Lee Irwin III
0 siblings, 1 reply; 7+ messages in thread
From: Brent Casavant @ 2004-12-17 22:49 UTC (permalink / raw)
To: linux-kernel; +Cc: wli, mingo
I've run into a number of crashes while closing procfs stat files
when a system is under load. I think I've found the problem, but
am a little unsure how to proceed. This all happens to be on a
2.4.21 based kernel, but by my brief code inspection I think the
problem still exists on more recent 2.4.x kernels.
In procfs the fake_ino() macro is used to construct the inode number
for each entry.
#define fake_ino(pid,ino) (((pid)<<16)|(ino))
In particular this is used in proc_pid_make_inode:
inode->i_ino = fake_ino(task->pid, ino);
Note that a pid may be more than 16 bits in width (e.g. in IA64), and
we're trying to stuff it into the upper 16 bits of the inode number.
This isn't usually a problem, except when the lower 16 bits of the
inode happen to be 0 (i.e. pids that are a multiple of 65536).
Why does zero matter? Glad you asked.
In proc_delete_inode there is a check to see if the inode is is
a "proper" (whatever that means) procfs inode. The whole function
is:
static void proc_delete_inode(struct inode *inode)
{
struct proc_dir_entry *de = inode->u.generic_ip;
inode->i_state = I_CLEAR;
if (PROC_INODE_PROPER(inode)) {
proc_pid_delete_inode(inode);
return;
}
if (de) {
if (de->owner)
__MOD_DEC_USE_COUNT(de->owner);
de_put(de);
}
}
PROC_INODE_PROPER() is:
#define PROC_INODE_PROPER(inode) ((inode)->i_ino & ~0xffff)
In other words, it checks whether the top 16 bits of the inode number
(equivalent to the bottom 16 bits of the pid) are non-zero.
Thus closing a proc entry for any task with a pid that is a multiple of
65536 will fail this check, skip proc_pid_delete_inode, and call
__MOD_DEC_USE_COUNT, more than likely causing a panic on an invalid
memory access, and minimally corrupting something in memory otherwise.
I don't have a solution coded up (mostly because I'm a bit bleary
eyed after looking at crash dumps all day) -- but are there any
thoughts on how to go about addressing this one? An obvious workaround
is setting kernel.pid_max to 65535, but that's only a workaround, not
a solution.
On a related note, if it matters, on about half the crash dumps I've
looked at, I see a pid of 0 has been assigned to a user process,
tripping this same problem. I suspect there's another bug somewhere
that's allowing a pid of 0 to be chosen in the first place -- but I
don't totally discount that this problem may lay in SGI's patches to
this particular kernel -- I'll need to take a more thorough look.
Thanks,
Brent
--
Brent Casavant If you had nothing to fear,
bcasavan@sgi.com how then could you be brave?
Silicon Graphics, Inc. -- Queen Dama, Source Wars
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-17 22:49 Oops on 2.4.x invalid procfs i_ino value Brent Casavant
@ 2004-12-18 0:38 ` William Lee Irwin III
2004-12-18 0:47 ` William Lee Irwin III
0 siblings, 1 reply; 7+ messages in thread
From: William Lee Irwin III @ 2004-12-18 0:38 UTC (permalink / raw)
To: Brent Casavant; +Cc: linux-kernel, mingo
On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> Thus closing a proc entry for any task with a pid that is a multiple of
> 65536 will fail this check, skip proc_pid_delete_inode, and call
> __MOD_DEC_USE_COUNT, more than likely causing a panic on an invalid
> memory access, and minimally corrupting something in memory otherwise.
> I don't have a solution coded up (mostly because I'm a bit bleary
> eyed after looking at crash dumps all day) -- but are there any
> thoughts on how to go about addressing this one? An obvious workaround
> is setting kernel.pid_max to 65535, but that's only a workaround, not
> a solution.
> On a related note, if it matters, on about half the crash dumps I've
> looked at, I see a pid of 0 has been assigned to a user process,
> tripping this same problem. I suspect there's another bug somewhere
> that's allowing a pid of 0 to be chosen in the first place -- but I
> don't totally discount that this problem may lay in SGI's patches to
> this particular kernel -- I'll need to take a more thorough look.
That's rather ominous. I'll pore over pid.c and see what's going on.
Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
-- wli
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-18 0:38 ` William Lee Irwin III
@ 2004-12-18 0:47 ` William Lee Irwin III
2004-12-20 22:35 ` Brent Casavant
0 siblings, 1 reply; 7+ messages in thread
From: William Lee Irwin III @ 2004-12-18 0:47 UTC (permalink / raw)
To: Brent Casavant; +Cc: linux-kernel, mingo
On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
>> On a related note, if it matters, on about half the crash dumps I've
>> looked at, I see a pid of 0 has been assigned to a user process,
>> tripping this same problem. I suspect there's another bug somewhere
>> that's allowing a pid of 0 to be chosen in the first place -- but I
>> don't totally discount that this problem may lay in SGI's patches to
>> this particular kernel -- I'll need to take a more thorough look.
On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> That's rather ominous. I'll pore over pid.c and see what's going on.
> Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
-- wli
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-18 0:47 ` William Lee Irwin III
@ 2004-12-20 22:35 ` Brent Casavant
2004-12-22 15:46 ` Marcelo Tosatti
2004-12-27 21:40 ` William Lee Irwin III
0 siblings, 2 replies; 7+ messages in thread
From: Brent Casavant @ 2004-12-20 22:35 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel, mingo
On Fri, 17 Dec 2004, William Lee Irwin III wrote:
> On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> >> On a related note, if it matters, on about half the crash dumps I've
> >> looked at, I see a pid of 0 has been assigned to a user process,
> >> tripping this same problem. I suspect there's another bug somewhere
> >> that's allowing a pid of 0 to be chosen in the first place -- but I
> >> don't totally discount that this problem may lay in SGI's patches to
> >> this particular kernel -- I'll need to take a more thorough look.
>
> On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> > That's rather ominous. I'll pore over pid.c and see what's going on.
> > Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
>
> Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
I wouldn't worry about the pid=0 issue -- I think it's most likely
due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
some sort of problem at process teardown (all the pid=0 processes are
in the process of exiting).
I'm more concerned about the (0 == pid & 0xffff) bug, which is present
in the unpatched mainline 2.4.x kernel. It seems that the easiest fix
is marking such pids as in-use at pidmap allocation, so that they are
never assigned to real tasks. I've got the code almost done, but need
to port it to top-of-tree before submitting a patch.
Brent
--
Brent Casavant If you had nothing to fear,
bcasavan@sgi.com how then could you be brave?
Silicon Graphics, Inc. -- Queen Dama, Source Wars
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-20 22:35 ` Brent Casavant
@ 2004-12-22 15:46 ` Marcelo Tosatti
2004-12-27 19:04 ` Marcelo Tosatti
2004-12-27 21:40 ` William Lee Irwin III
1 sibling, 1 reply; 7+ messages in thread
From: Marcelo Tosatti @ 2004-12-22 15:46 UTC (permalink / raw)
To: Brent Casavant; +Cc: William Lee Irwin III, linux-kernel, mingo, Al Viro
On Mon, Dec 20, 2004 at 04:35:18PM -0600, Brent Casavant wrote:
> On Fri, 17 Dec 2004, William Lee Irwin III wrote:
>
> > On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> > >> On a related note, if it matters, on about half the crash dumps I've
> > >> looked at, I see a pid of 0 has been assigned to a user process,
> > >> tripping this same problem. I suspect there's another bug somewhere
> > >> that's allowing a pid of 0 to be chosen in the first place -- but I
> > >> don't totally discount that this problem may lay in SGI's patches to
> > >> this particular kernel -- I'll need to take a more thorough look.
> >
> > On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> > > That's rather ominous. I'll pore over pid.c and see what's going on.
> > > Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
> >
> > Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
>
> I wouldn't worry about the pid=0 issue -- I think it's most likely
> due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
> some sort of problem at process teardown (all the pid=0 processes are
> in the process of exiting).
>
> I'm more concerned about the (0 == pid & 0xffff) bug, which is present
> in the unpatched mainline 2.4.x kernel. It seems that the easiest fix
> is marking such pids as in-use at pidmap allocation, so that they are
> never assigned to real tasks. I've got the code almost done, but need
> to port it to top-of-tree before submitting a patch.
Hi Brent,
Wouldnt it be feasible to have another "procfs inode type" to indicate such
lower 16-bit zeroed pid's with a new type PROC_PID_INO_ZERO16BIT (or a better
name) and have fake_ino() handle these case by then using the upper 16-bits on
the inode for this "special" pid's.
And have proc_pid_make_inode() and related code handle this new type? No?
I'm not a big fan of making such pids unuseable for real tasks, so it would be
nice if we could come up a fix for the buggy proc inode logic.
Thanks for finding this out!
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-22 15:46 ` Marcelo Tosatti
@ 2004-12-27 19:04 ` Marcelo Tosatti
0 siblings, 0 replies; 7+ messages in thread
From: Marcelo Tosatti @ 2004-12-27 19:04 UTC (permalink / raw)
To: Brent Casavant; +Cc: William Lee Irwin III, linux-kernel, mingo, Al Viro
On Wed, Dec 22, 2004 at 01:46:27PM -0200, Marcelo Tosatti wrote:
> On Mon, Dec 20, 2004 at 04:35:18PM -0600, Brent Casavant wrote:
> > On Fri, 17 Dec 2004, William Lee Irwin III wrote:
> >
> > > On Fri, Dec 17, 2004 at 04:49:44PM -0600, Brent Casavant wrote:
> > > >> On a related note, if it matters, on about half the crash dumps I've
> > > >> looked at, I see a pid of 0 has been assigned to a user process,
> > > >> tripping this same problem. I suspect there's another bug somewhere
> > > >> that's allowing a pid of 0 to be chosen in the first place -- but I
> > > >> don't totally discount that this problem may lay in SGI's patches to
> > > >> this particular kernel -- I'll need to take a more thorough look.
> > >
> > > On Fri, Dec 17, 2004 at 04:38:35PM -0800, William Lee Irwin III wrote:
> > > > That's rather ominous. I'll pore over pid.c and see what's going on.
> > > > Also, does the pid.c in your kernel version match 2.6.x-CURRENT?
> > >
> > > Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
> >
> > I wouldn't worry about the pid=0 issue -- I think it's most likely
> > due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
> > some sort of problem at process teardown (all the pid=0 processes are
> > in the process of exiting).
> >
> > I'm more concerned about the (0 == pid & 0xffff) bug, which is present
> > in the unpatched mainline 2.4.x kernel. It seems that the easiest fix
> > is marking such pids as in-use at pidmap allocation, so that they are
> > never assigned to real tasks. I've got the code almost done, but need
> > to port it to top-of-tree before submitting a patch.
>
> Hi Brent,
>
> Wouldnt it be feasible to have another "procfs inode type" to indicate such
> lower 16-bit zeroed pid's with a new type PROC_PID_INO_ZERO16BIT (or a better
> name) and have fake_ino() handle these case by then using the upper 16-bits on
> the inode for this "special" pid's.
>
> And have proc_pid_make_inode() and related code handle this new type? No?
>
> I'm not a big fan of making such pids unuseable for real tasks, so it would be
> nice if we could come up a fix for the buggy proc inode logic.
Actually, while talking to wli on IRC:
#define PID_MAX 0x8000
So it shouldnt be a problem at all in mainline v2.4.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Oops on 2.4.x invalid procfs i_ino value
2004-12-20 22:35 ` Brent Casavant
2004-12-22 15:46 ` Marcelo Tosatti
@ 2004-12-27 21:40 ` William Lee Irwin III
1 sibling, 0 replies; 7+ messages in thread
From: William Lee Irwin III @ 2004-12-27 21:40 UTC (permalink / raw)
To: Brent Casavant; +Cc: linux-kernel, mingo
On Fri, 17 Dec 2004, William Lee Irwin III wrote:
>> Ouch, 2.4.21; this will be trouble. So next, what patches atop 2.4.21?
On Mon, Dec 20, 2004 at 04:35:18PM -0600, Brent Casavant wrote:
> I wouldn't worry about the pid=0 issue -- I think it's most likely
> due to the PAGG patches (http://oss.sgi.com/projects/pagg) causing
> some sort of problem at process teardown (all the pid=0 processes are
> in the process of exiting).
> I'm more concerned about the (0 == pid & 0xffff) bug, which is present
> in the unpatched mainline 2.4.x kernel. It seems that the easiest fix
> is marking such pids as in-use at pidmap allocation, so that they are
> never assigned to real tasks. I've got the code almost done, but need
> to port it to top-of-tree before submitting a patch.
I see no 0 == pid & 0xffff nor any pidmap in unpatched 2.4.x. Also,
please notice that pid & ~(PID_MAX-1) a.k.a. pid & ~0x7fff a.k.a.
pid & 0xffff8000? And so it appears numerous checks of this form are
already there.
Perhaps the pristine sources are not as pristine as one had hoped?
-- wli
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2004-12-27 21:40 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-12-17 22:49 Oops on 2.4.x invalid procfs i_ino value Brent Casavant
2004-12-18 0:38 ` William Lee Irwin III
2004-12-18 0:47 ` William Lee Irwin III
2004-12-20 22:35 ` Brent Casavant
2004-12-22 15:46 ` Marcelo Tosatti
2004-12-27 19:04 ` Marcelo Tosatti
2004-12-27 21:40 ` William Lee Irwin III
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.