* Re: 2.6.24-rc3: find complains about /proc/net
[not found] <20071119191000.GA1560@elf.ucw.cz>
@ 2007-11-19 22:04 ` Rafael J. Wysocki
2007-11-20 15:51 ` Pavel Emelyanov
0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2007-11-19 22:04 UTC
To: Pavel Machek; +Cc: kernel list, netdev
On Monday, 19 of November 2007, Pavel Machek wrote:
> Hi!
>
> I think that this worked before:
>
> root@amd:/proc# find . -name "timer_info"
> find: WARNING: Hard link count is wrong for ./net: this may be a bug
> in your filesystem driver. Automatically turning on find's -noleaf
> option. Earlier results may have failed to include directories that
> should have been searched.
> root@amd:/proc#
I'm seeing that too.
Rafael
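Some context on the warning Pavel quotes. Roughly speaking, GNU find's leaf optimisation assumes that a directory's link count is 2 plus the number of its subdirectories (the entry in the parent, the directory's own ".", and one ".." per child). When that assumption turns out to be wrong, find falls back to -noleaf. The helper below is a minimal user-space sketch, not find's actual code, and the names `count_subdirs` and `nlink_matches_subdirs` are hypothetical. It counts subdirectories with lstat() so the result can be compared against st_nlink.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the subdirectories of `path`, ignoring "." and "..".
 * lstat() is used instead of d_type because some filesystems
 * report DT_UNKNOWN. Returns -1 if the directory cannot be opened. */
static int count_subdirs(const char *path)
{
    assert(path != NULL);
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    int n = 0;
    struct dirent *e;
    char child[4096];
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        snprintf(child, sizeof child, "%s/%s", path, e->d_name);
        struct stat st;
        if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode))
            n++;
    }
    closedir(d);
    return n;
}

/* Returns 1 if st_nlink == 2 + number of subdirectories (the assumption
 * behind the leaf optimisation), 0 if not, -1 on error. Some filesystems
 * do not maintain directory link counts this way, so 0 alone does not
 * prove a kernel bug. */
static int nlink_matches_subdirs(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    int n = count_subdirs(path);
    if (n < 0)
        return -1;
    return (long)st.st_nlink == (long)n + 2;
}
```

Calling `nlink_matches_subdirs()` on the affected /proc/net would presumably report a mismatch; the exact result depends on the kernel in use.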
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-19 22:04 ` Rafael J. Wysocki
@ 2007-11-20 15:51 ` Pavel Emelyanov
2007-11-20 21:52 ` Eric W. Biederman
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Pavel Emelyanov @ 2007-11-20 15:51 UTC
To: Rafael J. Wysocki, Pavel Machek, Eric W. Biederman; +Cc: kernel list, netdev
Rafael J. Wysocki wrote:
> On Monday, 19 of November 2007, Pavel Machek wrote:
>> Hi!
>>
>> I think that this worked before:
>>
>> root@amd:/proc# find . -name "timer_info"
>> find: WARNING: Hard link count is wrong for ./net: this may be a bug
>> in your filesystem driver. Automatically turning on find's -noleaf
>> option. Earlier results may have failed to include directories that
>> should have been searched.
>> root@amd:/proc#
>
> I'm seeing that too.
I have even better things with 2.6.24-rc3 ;)
# cd /proc/net
# ls ..
ls: reading directory ..: Not a directory
and this
# cd /proc
# find
...
./net
find: . changed during execution of find
# find net
find: net changed during execution of find
# find net/
<this works ok however>
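The "changed during execution of find" message appears to come from a safety check. After descending into a directory, find verifies that the directory it landed in has the same device and inode numbers as the entry it stat()ed beforehand. The sketch below approximates that check; it is not findutils' code, and `dir_identity_stable` is a hypothetical name. If looking up "net" by name and then stat()ing "." after chdir() return different inodes, as seems plausible with the shadow /proc/net dentry, this check fails.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Approximation of find's descent check: stat the directory by name,
 * chdir() into it, stat "." and require that st_dev/st_ino still match.
 * Returns 1 if they match, 0 if they differ, -1 on error.
 * Note: this changes the calling process's working directory. */
static int dir_identity_stable(const char *path)
{
    struct stat before, after;

    assert(path != NULL);
    if (lstat(path, &before) != 0 || !S_ISDIR(before.st_mode))
        return -1;
    if (chdir(path) != 0)
        return -1;
    if (stat(".", &after) != 0)
        return -1;
    return before.st_dev == after.st_dev && before.st_ino == after.st_ino;
}
```

Because the helper calls chdir(), a real tool would restore its previous working directory afterwards, for example with fchdir() on a saved descriptor.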
Moreover, a program that opens /proc/net and dumps the /proc/self/fd
files produces the following:
# cd /
# a.out /proc/net
...
lr-x------ 1 root root 64 Nov 20 18:02 3 -> /proc/net/net (deleted)
...
# cd /proc/net
# a.out .
...
lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net/net (deleted)
...
# a.out ..
...
lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
...
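The source of the a.out test program was not posted. The sketch below shows what such a program might do, assuming it simply opens its argument and reads back the /proc/self/fd symlink for the new descriptor. The helper name `fd_link_target` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open `path`, then readlink() /proc/self/fd/<fd> to see which name the
 * kernel reports for the open file. The result is NUL-terminated in
 * `out` (silently truncated if it does not fit). Returns the length of
 * the result, or -1 on error. */
static ssize_t fd_link_target(const char *path, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char link[64];
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, out, outlen - 1); /* readlink() does not NUL-terminate */
    close(fd);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```

A test program would call this on argv[1] and print the result. On the affected kernel, the target reported for /proc/net was /proc/net/net (deleted), as shown above.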
This all is somehow related to the shadow proc files.
E.g. the first problem (with -ENOTDIR) is due to the fact that the shadow
/proc/net dentry doesn't implement the .readdir method:
static const struct file_operations proc_net_dir_operations = {
.read = generic_read_dir,
};
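Why a missing .readdir shows up as "Not a directory": in the 2.6-era VFS, vfs_readdir() refused to list a file whose file_operations had no readdir method and returned -ENOTDIR. generic_read_dir(), by contrast, only handles plain read() on a directory and returns -EISDIR. The code below is a simplified user-space model of that dispatch, not kernel code. All `model_*` names are illustrative stand-ins for struct file_operations and vfs_readdir().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative stand-in for struct file_operations. */
struct model_file_ops {
    long (*read)(void);   /* stand-in for .read = generic_read_dir */
    int (*readdir)(void); /* left NULL in proc_net_dir_operations */
};

static long model_generic_read_dir(void) { return -EISDIR; }
static int model_proc_readdir(void) { return 0; }

/* Mirrors the VFS check: no readdir method means -ENOTDIR, which
 * ls reports as "Not a directory". */
static int model_vfs_readdir(const struct model_file_ops *ops)
{
    if (ops == NULL || ops->readdir == NULL)
        return -ENOTDIR;
    return ops->readdir();
}

/* Shaped like the table quoted above: only .read is set. */
static const struct model_file_ops shadow_net_ops = {
    .read = model_generic_read_dir,
};

/* A table that can actually be listed. */
static const struct model_file_ops listable_ops = {
    .read = model_generic_read_dir,
    .readdir = model_proc_readdir,
};
```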
And I haven't managed to find out why the rest of the problems
occur...
Eric, do you have fixes for it?
> Rafael
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 15:51 ` Pavel Emelyanov
@ 2007-11-20 21:52 ` Eric W. Biederman
2007-11-20 21:59 ` Ingo Molnar
2007-11-21 1:19 ` Eric W. Biederman
2007-11-21 6:36 ` Eric W. Biederman
2 siblings, 1 reply; 22+ messages in thread
From: Eric W. Biederman @ 2007-11-20 21:52 UTC
To: Pavel Emelyanov; +Cc: Rafael J. Wysocki, Pavel Machek, kernel list, netdev
Pavel Emelyanov <xemul@openvz.org> writes:
> Rafael J. Wysocki wrote:
>> On Monday, 19 of November 2007, Pavel Machek wrote:
>>> Hi!
>>>
>>> I think that this worked before:
>>>
>>> root@amd:/proc# find . -name "timer_info"
>>> find: WARNING: Hard link count is wrong for ./net: this may be a bug
>>> in your filesystem driver. Automatically turning on find's -noleaf
>>> option. Earlier results may have failed to include directories that
>>> should have been searched.
>>> root@amd:/proc#
>>
>> I'm seeing that too.
>
> I have even better things with 2.6.24-rc3 ;)
>
> # cd /proc/net
> # ls ..
> ls: reading directory ..: Not a directory
Ok. That part is truly a bug.
Looks like you have tracked down the cause.
Grumble, you are getting the wrong .. :(
> and this
>
> # cd /proc
> # find
> ...
> ./net
> find: . changed during execution of find
> # find net
> find: net changed during execution of find
> # find net/
> <this works ok however>
>
> Moreover, a program that opens /proc/net and dumps the /proc/self/fd
> files produces the following:
>
> # cd /
> # a.out /proc/net
> ...
> lr-x------ 1 root root 64 Nov 20 18:02 3 -> /proc/net/net (deleted)
> ...
> # cd /proc/net
> # a.out .
> ...
> lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net/net (deleted)
> ...
> # a.out ..
> ...
> lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
> ...
Yes all of those are nasty. So much for my clever way of implementing
these things. Grr. Simple hacks that almost work!
> This all is somehow related to the shadow proc files.
> E.g. the first problem (with -ENOTDIR) is due to the fact that the shadow
> /proc/net dentry doesn't implement the .readdir method:
>
> static const struct file_operations proc_net_dir_operations = {
> .read = generic_read_dir,
> };
>
> And I haven't managed to find out why the rest of the problems
> occur...
>
> Eric, do you have fixes for it?
Not exactly. It is tricky. I have known there are issues but so far
the difficulty of a better solution has been higher than my annoyance
level with this problem.
A special solution for !CONFIG_NET_NS may be practical for 2.6.24.
The only way I know of to really solve this problem cleanly and
completely is to make /proc/net an explicit symlink to /proc/self/net
and make /proc/<pid>/net a magic mountpoint (à la NFS automounts) that
mounts a per-network-namespace filesystem. Al Viro wasn't too happy
when I suggested it (mostly because he was convinced such a solution
was likely to be full of races).
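For reference, current kernels expose /proc/net as a symbolic link whose target is self/net. The small user-space check below uses only lstat() and readlink() to tell that layout apart from a real directory such as the 2.6.24 shadow one. `proc_net_link` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if /proc/net is a symlink (its target is copied,
 * NUL-terminated, into `out`), 0 if it is something else (e.g. a
 * directory), or -1 on error. */
static int proc_net_link(char *out, size_t outlen)
{
    struct stat st;

    assert(out != NULL && outlen > 0);
    if (lstat("/proc/net", &st) != 0)
        return -1;
    if (!S_ISLNK(st.st_mode))
        return 0;
    ssize_t n = readlink("/proc/net", out, outlen - 1);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return 1;
}
```

With a symlink, each reader's path resolution walks through its own /proc/<pid>/net, so the dcache never has to serve different contents under a single /proc/net dentry.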
The half-assed clean solution is to ensure nothing under /proc/net
gets cached and that the dentry tree is built properly for the
current reader of /proc.
A third option is to fix .. in /proc/net, although I'm a bit
dubious that it will do more than fix a few symptoms of the
current solution.
Eric
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 21:52 ` Eric W. Biederman
@ 2007-11-20 21:59 ` Ingo Molnar
2007-11-20 22:17 ` Eric W. Biederman
0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2007-11-20 21:59 UTC
To: Eric W. Biederman
Cc: Pavel Emelyanov, Rafael J. Wysocki, Pavel Machek, kernel list,
netdev
* Eric W. Biederman <ebiederm@xmission.com> wrote:
> > lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
> > ...
>
> Yes all of those are nasty. So much for my clever way of implementing
> these things. Grr. Simple hacks that almost work!
btw., in case you feel inclined, i recently did some userspace coding
and found to my surprise that /proc/self points to the parent task, not
the thread itself (giving threads no real way to examine themselves). If
you are hacking in this area, would it be a big trouble to add something
like /proc/self-task/ or something like that? I had to use a raw gettid
syscall to figure out the TID to get to /proc/*/task/TID/sched
instrumentation info - which is quite a PITA.
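The workaround Ingo describes can be sketched as follows, assuming Linux and glibc. The code obtains the kernel thread id with the raw gettid system call (glibc only gained a gettid() wrapper in version 2.30) and builds a path under /proc/self/task/. The helper name `thread_proc_path` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Build "/proc/self/task/<TID>/<leaf>" for the calling thread.
 * Returns the path length, or -1 if it did not fit in `buf`. */
static int thread_proc_path(char *buf, size_t len, const char *leaf)
{
    assert(buf != NULL && leaf != NULL);
    long tid = syscall(SYS_gettid); /* raw syscall, no libc wrapper needed */
    int n = snprintf(buf, len, "/proc/self/task/%ld/%s", tid, leaf);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

In the main thread the TID equals the PID. In any other thread the path names that thread's own directory, which /proc/self alone does not.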
Ingo
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 21:59 ` Ingo Molnar
@ 2007-11-20 22:17 ` Eric W. Biederman
2007-11-20 22:35 ` Ingo Molnar
0 siblings, 1 reply; 22+ messages in thread
From: Eric W. Biederman @ 2007-11-20 22:17 UTC
To: Ingo Molnar
Cc: Pavel Emelyanov, Rafael J. Wysocki, Pavel Machek, kernel list,
netdev
Ingo Molnar <mingo@elte.hu> writes:
> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> > lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
>> > ...
>>
>> Yes all of those are nasty. So much for my clever way of implementing
>> these things. Grr. Simple hacks that almost work!
>
> btw., in case you feel inclined, i recently did some userspace coding
> and found to my surprise that /proc/self points to the parent task, not
> the thread itself (giving threads no real way to examine themselves). If
> you are hacking in this area, would it be a big trouble to add something
> like /proc/self-task/ or something like that? I had to use a raw gettid
> syscall to figure out the TID to get to /proc/*/task/TID/sched
> instrumentation info - which is quite a PITA.
Agreed. I have been debating with myself over the last couple of days
whether it is a bug that /proc/self uses the tgid and not the actual pid/tid
value.
If I can be convinced that posix threads don't care I will happily just
switch /proc/self, calling the current implementation a bug.
I think it is a bug; the real question is what the backwards
compatibility implications are. Do posix threads care?
It appears to me that either we need to fix /proc/self or we need
to add /proc/task-self and fix /proc/mounts to point at that.
In the normal case we share all of the same things, so I think it is
a don't-care, except that /proc/self/status | grep Pid returns the
tgid.
Hmm. I think I am just going to send Andrew a patch for 2.6.25 that
just fixes /proc/self. I just fail to see how using the tgid is correct.
The only cases where we could care seem to do the wrong thing when we use the
tgid.
Eric
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 22:17 ` Eric W. Biederman
@ 2007-11-20 22:35 ` Ingo Molnar
2007-11-20 22:54 ` Roland McGrath
0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2007-11-20 22:35 UTC
To: Eric W. Biederman
Cc: Pavel Emelyanov, Rafael J. Wysocki, Pavel Machek, kernel list,
netdev, Ulrich Drepper, Roland McGrath
these are all questions for Ulrich and Roland - Cc:-ed them.
* Eric W. Biederman <ebiederm@xmission.com> wrote:
> Ingo Molnar <mingo@elte.hu> writes:
>
> > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> >> > lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
> >> > ...
> >>
> >> Yes all of those are nasty. So much for my clever way of implementing
> >> these things. Grr. Simple hacks that almost work!
> >
> > btw., in case you feel inclined, i recently did some userspace coding
> > and found to my surprise that /proc/self points to the parent task, not
> > the thread itself (giving threads no real way to examine themselves). If
> > you are hacking in this area, would it be a big trouble to add something
> > like /proc/self-task/ or something like that? I had to use a raw gettid
> > syscall to figure out the TID to get to /proc/*/task/TID/sched
> > instrumentation info - which is quite a PITA.
>
> Agreed. I have been debating with myself over the last couple of days
> whether it is a bug that /proc/self uses the tgid and not the actual
> pid/tid value.
>
> If I can be convinced that posix threads don't care I will happily
> just switch /proc/self, calling the current implementation a bug.
>
> I think it is a bug; the real question is what the backwards
> compatibility implications are. Do posix threads care?
>
> It appears to me that either we need to fix /proc/self or we need to
> add /proc/task-self and fix /proc/mounts to point at that.
>
> In the normal case we share all of the same things, so I think it is a
> don't-care, except that /proc/self/status | grep Pid returns the
> tgid.
>
> Hmm. I think I am just going to send Andrew a patch for 2.6.25 that
> just fixes /proc/self. I just fail to see how using the tgid is
> correct. The only cases where we could care seem to do the wrong thing when
> we use the tgid.
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 22:35 ` Ingo Molnar
@ 2007-11-20 22:54 ` Roland McGrath
2007-11-20 23:01 ` Ingo Molnar
0 siblings, 1 reply; 22+ messages in thread
From: Roland McGrath @ 2007-11-20 22:54 UTC
To: Ingo Molnar
Cc: Eric W. Biederman, Pavel Emelyanov, Rafael J. Wysocki,
Pavel Machek, kernel list, netdev, Ulrich Drepper
When did /proc/self get changed to follow tgid instead of pid? glibc uses
/proc/self to refer to various things that are usually shared anyway (fd,
maps, cwd, exe), but I think the expectation has always been that this
refers to the same calling thread, not the group leader. E.g., if one
thread has changed uids so it no longer has access to the group leader's
/proc/PID/fd, its use of /proc/self/fd suddenly starts failing.
Thanks,
Roland
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 22:54 ` Roland McGrath
@ 2007-11-20 23:01 ` Ingo Molnar
2007-11-20 23:06 ` Guillaume Chazarain
0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2007-11-20 23:01 UTC
To: Roland McGrath
Cc: Eric W. Biederman, Pavel Emelyanov, Rafael J. Wysocki,
Pavel Machek, kernel list, netdev, Ulrich Drepper
* Roland McGrath <roland@redhat.com> wrote:
> When did /proc/self get changed to follow tgid instead of pid? glibc
> uses /proc/self to refer to various things that are usually shared
> anyway (fd, maps, cwd, exe), but I think the expectation has always
> been that this refers to the same calling thread, not the group
> leader. E.g., if one thread has changed uids so it no longer has
> access to the group leader's /proc/PID/fd, its use of
> /proc/self/fd suddenly starts failing.
i guess it was a v2.6.24 change, hence a regression that needs to be
fixed?
Ingo
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 23:01 ` Ingo Molnar
@ 2007-11-20 23:06 ` Guillaume Chazarain
2007-11-20 23:26 ` Roland McGrath
2007-11-20 23:43 ` Ingo Molnar
0 siblings, 2 replies; 22+ messages in thread
From: Guillaume Chazarain @ 2007-11-20 23:06 UTC
To: Ingo Molnar
Cc: Roland McGrath, Eric W. Biederman, Pavel Emelyanov,
Rafael J. Wysocki, Pavel Machek, kernel list, netdev,
Ulrich Drepper
On 11/21/07, Ingo Molnar <mingo@elte.hu> wrote:
> i guess it was a v2.6.24 change, hence a regression that needs to be
> fixed?
It seems to be
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=01660410
So, Linux 2.6.0-test6.
--
Guillaume
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 23:06 ` Guillaume Chazarain
@ 2007-11-20 23:26 ` Roland McGrath
2007-11-20 23:32 ` Ulrich Drepper
2007-11-20 23:43 ` Ingo Molnar
1 sibling, 1 reply; 22+ messages in thread
From: Roland McGrath @ 2007-11-20 23:26 UTC
To: Guillaume Chazarain
Cc: Ingo Molnar, Eric W. Biederman, Pavel Emelyanov,
Rafael J. Wysocki, Pavel Machek, kernel list, netdev,
Ulrich Drepper
Oh, it seems it has indeed been that way for a very long time, so I was
mistaken. It still seems a little odd to me. Ulrich can say definitively
whether the kind of concern I mentioned really matters one way or the other
for glibc.
Thanks,
Roland
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 23:26 ` Roland McGrath
@ 2007-11-20 23:32 ` Ulrich Drepper
2007-11-20 23:45 ` Ingo Molnar
2007-11-21 0:41 ` Eric W. Biederman
0 siblings, 2 replies; 22+ messages in thread
From: Ulrich Drepper @ 2007-11-20 23:32 UTC
To: Roland McGrath
Cc: Guillaume Chazarain, Ingo Molnar, Eric W. Biederman,
Pavel Emelyanov, Rafael J. Wysocki, Pavel Machek, kernel list,
netdev
Roland McGrath wrote:
> Oh, it seems it has indeed been that way for a very long time, so I was
> mistaken. It still seems a little odd to me. Ulrich can say definitively
> whether the kind of concern I mentioned really matters one way or the other
> for glibc.
glibc cannot survive (at least NPTL) if somebody uses funny CLONE_*
flags to separate various pieces of information, e.g., file descriptors.
So, all the information in each thread's /proc/self should be identical.
When the information is not the same, the current semantics seems to be
more useful. So I guess, no change is the way to go here.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 23:06 ` Guillaume Chazarain
2007-11-20 23:26 ` Roland McGrath
@ 2007-11-20 23:43 ` Ingo Molnar
1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2007-11-20 23:43 UTC
To: Guillaume Chazarain
Cc: Roland McGrath, Eric W. Biederman, Pavel Emelyanov,
Rafael J. Wysocki, Pavel Machek, kernel list, netdev,
Ulrich Drepper
* Guillaume Chazarain <guichaz@yahoo.fr> wrote:
> On 11/21/07, Ingo Molnar <mingo@elte.hu> wrote:
> > i guess it was a v2.6.24 change, hence a regression that needs to be
> > fixed?
>
> It seems to be
>
> http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=01660410
>
> So, Linux 2.6.0-test6.
grumble :-/ So i guess /proc/self-task it has to be.
Ingo
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 23:32 ` Ulrich Drepper
@ 2007-11-20 23:45 ` Ingo Molnar
2007-11-20 23:51 ` Roland McGrath
2007-11-21 0:41 ` Eric W. Biederman
1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2007-11-20 23:45 UTC
To: Ulrich Drepper
Cc: Roland McGrath, Guillaume Chazarain, Eric W. Biederman,
Pavel Emelyanov, Rafael J. Wysocki, Pavel Machek, kernel list,
netdev
* Ulrich Drepper <drepper@redhat.com> wrote:
> > Oh, it seems it has indeed been that way for a very long time, so I
> > was mistaken. It still seems a little odd to me. Ulrich can say
> > definitively whether the kind of concern I mentioned really matters
> > one way or the other for glibc.
>
> glibc cannot survive (at least NPTL) if somebody uses funny CLONE_*
> flags to separate various pieces of information, e.g., file
> descriptors.
> So, all the information in each thread's /proc/self should be
> identical.
>
> When the information is not the same, the current semantics seems to
> be more useful. So I guess, no change is the way to go here.
can you see any danger to providing a /proc/self_task/ link? (or can you
think of a better name/API/approach)
Ingo
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 23:45 ` Ingo Molnar
@ 2007-11-20 23:51 ` Roland McGrath
2007-11-21 0:47 ` Eric W. Biederman
2007-11-21 1:01 ` Rafael J. Wysocki
0 siblings, 2 replies; 22+ messages in thread
From: Roland McGrath @ 2007-11-20 23:51 UTC
To: Ingo Molnar
Cc: Ulrich Drepper, Guillaume Chazarain, Eric W. Biederman,
Pavel Emelyanov, Rafael J. Wysocki, Pavel Machek, kernel list,
netdev
> can you see any danger to providing a /proc/self_task/ link? (or can you
> think of a better name/API/approach)
That is a poor name to choose given /proc/self/task exists as something
else (just try writing a sentence comparing them and then read it aloud).
Probably /proc/self/task/self is what makes the most sense structurally.
I don't know if it matters to whatever use you are concerned with to have
two more steps in the lookup.
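For readers following the naming discussion: later kernels (Linux 3.17 onward) added a top-level /proc/thread-self link that resolves to "<tgid>/task/<tid>" for the calling thread. The minimal sketch below reads that link and checks it against getpid() and the raw gettid system call. The helper names are hypothetical, and on older kernels the link is simply absent.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Read the /proc/thread-self symlink into `out`.
 * Returns 0 on success, -1 if the link is missing or unreadable. */
static int read_thread_self(char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    ssize_t n = readlink("/proc/thread-self", out, outlen - 1);
    if (n < 0)
        return -1;
    out[n] = '\0'; /* readlink() does not NUL-terminate */
    return 0;
}

/* Returns 1 if /proc/thread-self names the calling thread, 0 if it names
 * something else, -1 if it cannot be read. Assumes /proc is mounted for
 * the caller's own PID namespace. */
static int thread_self_is_current(void)
{
    char got[64], want[64];
    if (read_thread_self(got, sizeof got) != 0)
        return -1;
    snprintf(want, sizeof want, "%ld/task/%ld",
             (long)getpid(), (long)syscall(SYS_gettid));
    return strcmp(got, want) == 0;
}
```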
Thanks,
Roland
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 23:32 ` Ulrich Drepper
2007-11-20 23:45 ` Ingo Molnar
@ 2007-11-21 0:41 ` Eric W. Biederman
1 sibling, 0 replies; 22+ messages in thread
From: Eric W. Biederman @ 2007-11-21 0:41 UTC
To: Ulrich Drepper
Cc: Roland McGrath, Guillaume Chazarain, Ingo Molnar, Pavel Emelyanov,
Rafael J. Wysocki, Pavel Machek, kernel list, netdev
Ulrich Drepper <drepper@redhat.com> writes:
> Roland McGrath wrote:
>> Oh, it seems it has indeed been that way for a very long time, so I was
>> mistaken. It still seems a little odd to me. Ulrich can say definitively
>> whether the kind of concern I mentioned really matters one way or the other
>> for glibc.
>
> glibc cannot survive (at least NPTL) if somebody uses funny CLONE_*
> flags to separate various pieces of information, e.g., file descriptors.
> So, all the information in each thread's /proc/self should be identical.
Which seems to confirm that glibc and native pthread can't care.
> When the information is not the same, the current semantics seems to be
> more useful. So I guess, no change is the way to go here.
Could you elaborate a bit on how the semantics of returning the
wrong information are more useful?
In particular, if a thread does the logical equivalent of
grep Pid: /proc/self/status, it always gets the tgid despite
having a different process id.
How can that possibly be useful or correct?
From the kernel side I really think the current semantics of /proc/self
in the context of threads is a bug and confusing. All of the kernel
developers' first reaction when this was pointed out was that this
is a regression.
If it is truly useful to user space we can preserve this API design
bug forever. I just want to make certain we are not being bug
compatible without a good reason.
Currently we have several kernel side bugs with threaded
programs because /proc/self does not do the intuitive thing. Unless
something has changed recently selinux will cause accesses by a
non-leader thread to fail when accessing files through /proc/self.
So far the more I look at the current /proc/self behavior the
more I am convinced it is broken, and useless. Please help me see
where it is useful, so we can justify keeping it.
Thanks,
Eric
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 23:51 ` Roland McGrath
@ 2007-11-21 0:47 ` Eric W. Biederman
2007-11-21 1:01 ` Rafael J. Wysocki
1 sibling, 0 replies; 22+ messages in thread
From: Eric W. Biederman @ 2007-11-21 0:47 UTC
To: Roland McGrath
Cc: Ingo Molnar, Ulrich Drepper, Guillaume Chazarain, Pavel Emelyanov,
Rafael J. Wysocki, Pavel Machek, kernel list, netdev
Roland McGrath <roland@redhat.com> writes:
>> can you see any danger to providing a /proc/self_task/ link? (or can you
>> think of a better name/API/approach)
>
> That is a poor name to choose given /proc/self/task exists as something
> else (just try writing a sentence comparing them and then read it aloud).
> Probably /proc/self/task/self is what makes the most sense structurally.
> I don't know if it matters to whatever use you are concerned with to have
> two more steps in the lookup.
Well the only case it could matter is if you aren't allowed to access
/proc/<tgid> which I think may actually be the current selinux behavior.
So if we can't fix /proc/self we need to introduce /proc/task-self at
the top level, just to be certain we don't run into weird cases like
that. Otherwise /proc/self/task/self sounds like a wonderful suggestion.
Eric
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 23:51 ` Roland McGrath
2007-11-21 0:47 ` Eric W. Biederman
@ 2007-11-21 1:01 ` Rafael J. Wysocki
1 sibling, 0 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2007-11-21 1:01 UTC
To: Roland McGrath
Cc: Ingo Molnar, Ulrich Drepper, Guillaume Chazarain,
Eric W. Biederman, Pavel Emelyanov, Pavel Machek, kernel list,
netdev
On Wednesday, 21 of November 2007, Roland McGrath wrote:
> > can you see any danger to providing a /proc/self_task/ link? (or can you
> > think of a better name/API/approach)
>
> That is a poor name to choose given /proc/self/task exists as something
> else (just try writing a sentence comparing them and then read it aloud).
> Probably /proc/self/task/self is what makes the most sense structurally.
> I don't know if it matters to whatever use you are concerned with to have
> two more steps in the lookup.
Hm, /proc/this_thread maybe?
Rafael
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 15:51 ` Pavel Emelyanov
2007-11-20 21:52 ` Eric W. Biederman
@ 2007-11-21 1:19 ` Eric W. Biederman
2007-11-21 6:36 ` Eric W. Biederman
2 siblings, 0 replies; 22+ messages in thread
From: Eric W. Biederman @ 2007-11-21 1:19 UTC
To: Pavel Emelyanov; +Cc: Rafael J. Wysocki, Pavel Machek, kernel list, netdev
Pavel Emelyanov <xemul@openvz.org> writes:
> Rafael J. Wysocki wrote:
>> On Monday, 19 of November 2007, Pavel Machek wrote:
>>> Hi!
>>>
>>> I think that this worked before:
>>>
>>> root@amd:/proc# find . -name "timer_info"
>>> find: WARNING: Hard link count is wrong for ./net: this may be a bug
>>> in your filesystem driver. Automatically turning on find's -noleaf
>>> option. Earlier results may have failed to include directories that
>>> should have been searched.
>>> root@amd:/proc#
>>
>> I'm seeing that too.
>
> I have even better things with 2.6.24-rc3 ;)
>
> # cd /proc/net
> # ls ..
> ls: reading directory ..: Not a directory
>
> and this
>
> # cd /proc
> # find
> ...
> ./net
> find: . changed during execution of find
> # find net
> find: net changed during execution of find
> # find net/
> <this works ok however>
>
> Moreover, a program that opens /proc/net and dumps the /proc/self/fd
> files produces the following:
>
> # cd /
> # a.out /proc/net
> ...
> lr-x------ 1 root root 64 Nov 20 18:02 3 -> /proc/net/net (deleted)
> ...
> # cd /proc/net
> # a.out .
> ...
> lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net/net (deleted)
> ...
> # a.out ..
> ...
> lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
> ...
>
> This all is somehow related to the shadow proc files.
> E.g. the first problem (with -ENOTDIR) is due to the fact that the shadow
> /proc/net dentry doesn't implement the .readdir method:
>
> static const struct file_operations proc_net_dir_operations = {
> .read = generic_read_dir,
> };
>
> And I haven't managed to find out why the rest of the problems
> occur...
>
> Eric, do you have fixes for it?
Duh. There is one other possible solution I forgot to mention and
at least as a first pass it should be relatively simple. Have the
mount of proc capture the network namespace. I'm not certain
if it is what we want long term but it should be simple and relatively
easy to implement.
I don't like capturing the network namespace when we mount proc but
it is easier than implementing /proc/self/net, which is the other
real alternative.
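The trade-off Eric describes can be illustrated with a toy model, which is not kernel code. With a dynamic /proc/net, the answer depends on which task is looking. If the namespace is captured at mount time, the answer depends only on the superblock. All `model_*` types and functions are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for a network namespace, a task and a proc superblock. */
struct model_net { int id; };
struct model_task { struct model_net *net_ns; };
struct model_proc_sb { struct model_net *net_ns; };

/* Record the mounting task's namespace once, at mount time. */
static void model_mount_proc(struct model_proc_sb *sb,
                             const struct model_task *mounter)
{
    assert(sb != 0 && mounter != 0);
    sb->net_ns = mounter->net_ns;
}

/* Dynamic scheme: /proc/net depends on who is looking. */
static struct model_net *lookup_net_dynamic(const struct model_task *reader)
{
    return reader->net_ns;
}

/* Captured scheme: /proc/net depends only on the mount. */
static struct model_net *lookup_net_captured(const struct model_proc_sb *sb,
                                             const struct model_task *reader)
{
    (void)reader; /* deliberately ignored */
    return sb->net_ns;
}
```

The captured variant gives one answer per mount, which is what would let ordinary dentry caching work again. The cost, as Eric notes, is that a task in a different network namespace sees the mounter's /proc/net unless it mounts its own proc.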
Eric
* Re: 2.6.24-rc3: find complains about /proc/net
[not found] ` <fa.QqYdKsBUWKSLLGXmxJCAtZxLYnE@ifi.uio.no>
@ 2007-11-21 1:21 ` Robert Hancock
2007-11-21 1:41 ` Eric W. Biederman
0 siblings, 1 reply; 22+ messages in thread
From: Robert Hancock @ 2007-11-21 1:21 UTC
To: Eric W. Biederman
Cc: Ulrich Drepper, Roland McGrath, Guillaume Chazarain, Ingo Molnar,
Pavel Emelyanov, Rafael J. Wysocki, Pavel Machek, kernel list,
netdev
Eric W. Biederman wrote:
> Could you elaborate a bit on how the semantics of returning the
> wrong information are more useful?
>
> In particular, if a thread does the logical equivalent of
> grep Pid: /proc/self/status, it always gets the tgid despite
> having a different process id.
The POSIX-defined userspace concept of a PID requires that all threads
appear to have the same PID. This is something that Linux didn't comply
with under the old LinuxThreads implementation and was finally fixed
with NPTL. This isn't a POSIX-defined interface, but I assume it's
trying to be consistent with getpid(), etc.
> How can that possibly be useful or correct?
>
> From the kernel side I really think the current semantics of /proc/self
> in the context of threads is a bug and confusing. All of the kernel
> developers' first reaction when this was pointed out was that this
> is a regression.
>
> If it is truly useful to user space we can preserve this API design
> bug forever. I just want to make certain we are not being bug
> compatible without a good reason.
>
> Currently we have several kernel side bugs with threaded
> programs because /proc/self does not do the intuitive thing. Unless
> something has changed recently selinux will cause accesses by a
> non-leader thread to fail when accessing files through /proc/self.
>
> So far the more I look at the current /proc/self behavior the
> more I am convinced it is broken, and useless. Please help me see
> where it is useful, so we can justify keeping it.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-21 1:21 ` 2.6.24-rc3: find complains about /proc/net Robert Hancock
@ 2007-11-21 1:41 ` Eric W. Biederman
0 siblings, 0 replies; 22+ messages in thread
From: Eric W. Biederman @ 2007-11-21 1:41 UTC
To: Robert Hancock
Cc: Ulrich Drepper, Roland McGrath, Guillaume Chazarain, Ingo Molnar,
Pavel Emelyanov, Rafael J. Wysocki, Pavel Machek, kernel list,
netdev
Robert Hancock <hancockr@shaw.ca> writes:
> Eric W. Biederman wrote:
>> Could you elaborate a bit on how the semantics of returning the
>> wrong information are more useful?
>>
>> In particular, if a thread does the logical equivalent of
>> grep Pid: /proc/self/status, it always gets the tgid despite
>> having a different process id.
>
> The POSIX-defined userspace concept of a PID requires that all threads appear to
> have the same PID. This is something that Linux didn't comply with under the old
> LinuxThreads implementation and was finally fixed with NPTL. This isn't a
> POSIX-defined interface, but I assume it's trying to be consistent with
> getpid(), etc.
Linux exports two fields in /proc/self/status:
Tgid: 32698
Pid: 32698
The tgid maps to the POSIX concept. The pid in this context is the
thread id.
So it seems broken to me to return the same thread id for different threads.
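The distinction Eric draws can be checked from user space by reading both fields. The minimal sketch below assumes the "Key:<tab>value" line format shown above. `status_field` and `self_status_pid_is_tgid` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return the numeric value of the "Key:" line in a /proc status file,
 * or -1 if the file or key is missing. */
static long status_field(const char *path, const char *key)
{
    assert(path != NULL && key != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[256];
    size_t klen = strlen(key);
    long val = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Require "Key:" exactly, so "Pid" does not match "PPid" or "TracerPid". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            sscanf(line + klen + 1, "%ld", &val);
            break;
        }
    }
    fclose(f);
    return val;
}

/* /proc/self resolves to the thread group, so its "Pid:" line reports the
 * tgid (== getpid()) even when read from a non-leader thread. */
static int self_status_pid_is_tgid(void)
{
    return status_field("/proc/self/status", "Pid") == (long)getpid();
}
```

Reading /proc/self/task/<tid>/status from a non-leader thread instead would report that thread's own id in the Pid: field.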
Eric
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-20 15:51 ` Pavel Emelyanov
2007-11-20 21:52 ` Eric W. Biederman
2007-11-21 1:19 ` Eric W. Biederman
@ 2007-11-21 6:36 ` Eric W. Biederman
2007-11-21 9:36 ` Pavel Emelyanov
2 siblings, 1 reply; 22+ messages in thread
From: Eric W. Biederman @ 2007-11-21 6:36 UTC
To: Pavel Emelyanov; +Cc: Rafael J. Wysocki, Pavel Machek, kernel list, netdev
Below is a preliminary patch. It solves the directory issue but it doesn't
play well with proc_mnt and proc_flush_task. It works by simply caching the
network namespace when we mount proc so we don't have to be fancy and dynamic.
Something for the discussion anyway.
I will start sorting out what makes sense tomorrow.
Eric
From f359fde2469ba8be2123a465e788a83c7e6831e9 Mon Sep 17 00:00:00 2001
From: Eric W. Biederman <ebiederm@xmission.com>
Date: Tue, 20 Nov 2007 19:36:05 -0700
Subject: [PATCH] proc: Fix /proc/net directory listings.
Having proc dynamically display the contents of /proc/net is
hard. So make life simpler by capturing the network namespace
when we mount proc and only displaying that network namespace.
---
fs/proc/base.c | 8 ++--
fs/proc/generic.c | 4 ++-
fs/proc/internal.h | 13 +++++++
fs/proc/proc_net.c | 89 ++++-------------------------------------------
fs/proc/root.c | 50 ++++++++++++++++++--------
include/linux/proc_fs.h | 4 ++
6 files changed, 66 insertions(+), 102 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index aeaf0d0..9d4f06a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2395,7 +2395,7 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, struct
if (tgid == ~0U)
goto out;
- ns = dentry->d_sb->s_fs_info;
+ ns = proc_sbi(dentry->d_sb)->pid_ns;
rcu_read_lock();
task = find_task_by_pid_ns(tgid, ns);
if (task)
@@ -2476,7 +2476,7 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
goto out;
}
- ns = filp->f_dentry->d_sb->s_fs_info;
+ ns = proc_sbi(filp->f_dentry->d_sb)->pid_ns;
tgid = filp->f_pos - TGID_OFFSET;
for (task = next_tgid(tgid, ns);
task;
@@ -2615,7 +2615,7 @@ static struct dentry *proc_task_lookup(struct inode *dir, struct dentry * dentry
if (tid == ~0U)
goto out;
- ns = dentry->d_sb->s_fs_info;
+ ns = proc_sbi(dentry->d_sb)->pid_ns;
rcu_read_lock();
task = find_task_by_pid_ns(tid, ns);
if (task)
@@ -2758,7 +2758,7 @@ static int proc_task_readdir(struct file * filp, void * dirent, filldir_t filldi
/* f_version caches the tgid value that the last readdir call couldn't
* return. lseek aka telldir automagically resets f_version to 0.
*/
- ns = filp->f_dentry->d_sb->s_fs_info;
+ ns = proc_sbi(filp->f_dentry->d_sb)->pid_ns;
tid = (int)filp->f_version;
filp->f_version = 0;
for (task = first_tid(leader, tid, pos - 2, ns);
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 1bdb624..b58f0ec 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -398,7 +398,9 @@ struct dentry *proc_lookup(struct inode * dir, struct dentry *dentry, struct nam
continue;
if (!memcmp(dentry->d_name.name, de->name, de->namelen)) {
unsigned int ino = de->low_ino;
-
+
+ if (de->shadow_proc)
+ de = de->shadow_proc(dentry->d_sb, de);
de_get(de);
spin_unlock(&proc_subdir_lock);
error = -EINVAL;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 1820eb2..a26f115 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -11,6 +11,18 @@
#include <linux/proc_fs.h>
+struct pid_namespace;
+struct net;
+struct proc_sb_info {
+ struct pid_namespace *pid_ns;
+ struct net *net_ns;
+};
+
+static inline struct proc_sb_info *proc_sbi(struct super_block *sb)
+{
+ return sb->s_fs_info;
+}
+
#ifdef CONFIG_PROC_SYSCTL
extern int proc_sys_init(void);
#else
@@ -78,3 +90,4 @@ static inline int proc_fd(struct inode *inode)
{
return PROC_I(inode)->fd;
}
+
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 131f9c6..8a82e29 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -50,89 +50,15 @@ struct net *get_proc_net(const struct inode *inode)
}
EXPORT_SYMBOL_GPL(get_proc_net);
-static struct proc_dir_entry *proc_net_shadow;
+static struct proc_dir_entry *shadow_pde;
-static struct dentry *proc_net_shadow_dentry(struct dentry *parent,
- struct proc_dir_entry *de)
+static struct proc_dir_entry *proc_net_shadow(struct super_block *sb,
+ struct proc_dir_entry *de)
{
- struct dentry *shadow = NULL;
- struct inode *inode;
- if (!de)
- goto out;
- de_get(de);
- inode = proc_get_inode(parent->d_inode->i_sb, de->low_ino, de);
- if (!inode)
- goto out_de_put;
- shadow = d_alloc_name(parent, de->name);
- if (!shadow)
- goto out_iput;
- shadow->d_op = parent->d_op; /* proc_dentry_operations */
- d_instantiate(shadow, inode);
-out:
- return shadow;
-out_iput:
- iput(inode);
-out_de_put:
- de_put(de);
- goto out;
-}
-
-static void *proc_net_follow_link(struct dentry *parent, struct nameidata *nd)
-{
- struct net *net = current->nsproxy->net_ns;
- struct dentry *shadow;
- shadow = proc_net_shadow_dentry(parent, net->proc_net);
- if (!shadow)
- return ERR_PTR(-ENOENT);
-
- dput(nd->dentry);
- /* My dentry count is 1 and that should be enough as the
- * shadow dentry is thrown away immediately.
- */
- nd->dentry = shadow;
- return NULL;
+ struct proc_sb_info *sbi = proc_sbi(sb);
+ return sbi->net_ns->proc_net;
}
-static struct dentry *proc_net_lookup(struct inode *dir, struct dentry *dentry,
- struct nameidata *nd)
-{
- struct net *net = current->nsproxy->net_ns;
- struct dentry *shadow;
-
- shadow = proc_net_shadow_dentry(nd->dentry, net->proc_net);
- if (!shadow)
- return ERR_PTR(-ENOENT);
-
- dput(nd->dentry);
- nd->dentry = shadow;
-
- return shadow->d_inode->i_op->lookup(shadow->d_inode, dentry, nd);
-}
-
-static int proc_net_setattr(struct dentry *dentry, struct iattr *iattr)
-{
- struct net *net = current->nsproxy->net_ns;
- struct dentry *shadow;
- int ret;
-
- shadow = proc_net_shadow_dentry(dentry->d_parent, net->proc_net);
- if (!shadow)
- return -ENOENT;
- ret = shadow->d_inode->i_op->setattr(shadow, iattr);
- dput(shadow);
- return ret;
-}
-
-static const struct file_operations proc_net_dir_operations = {
- .read = generic_read_dir,
-};
-
-static struct inode_operations proc_net_dir_inode_operations = {
- .follow_link = proc_net_follow_link,
- .lookup = proc_net_lookup,
- .setattr = proc_net_setattr,
-};
-
static __net_init int proc_net_ns_init(struct net *net)
{
struct proc_dir_entry *root, *netd, *net_statd;
@@ -185,9 +111,8 @@ static struct pernet_operations __net_initdata proc_net_ns_ops = {
int __init proc_net_init(void)
{
- proc_net_shadow = proc_mkdir("net", NULL);
- proc_net_shadow->proc_iops = &proc_net_dir_inode_operations;
- proc_net_shadow->proc_fops = &proc_net_dir_operations;
+ shadow_pde = proc_mkdir("net", NULL);
+ shadow_pde->shadow_proc = proc_net_shadow;
return register_pernet_subsys(&proc_net_ns_ops);
}
diff --git a/fs/proc/root.c b/fs/proc/root.c
index ec9cb3b..e60ac83 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -19,6 +19,7 @@
#include <linux/smp_lock.h>
#include <linux/mount.h>
#include <linux/pid_namespace.h>
+#include <net/net_namespace.h>
#include "internal.h"
@@ -26,15 +27,23 @@ struct proc_dir_entry *proc_bus, *proc_root_fs, *proc_root_driver;
static int proc_test_super(struct super_block *sb, void *data)
{
- return sb->s_fs_info == data;
+ struct proc_sb_info *sbi = proc_sbi(sb), *info = data;
+ return (sbi->pid_ns == info->pid_ns) &&
+ (sbi->net_ns == info->net_ns);
}
static int proc_set_super(struct super_block *sb, void *data)
{
- struct pid_namespace *ns;
-
- ns = (struct pid_namespace *)data;
- sb->s_fs_info = get_pid_ns(ns);
+
+ struct proc_sb_info *new, *info = data;
+
+ new = kzalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+ *new = *info;
+ get_pid_ns(new->pid_ns);
+ get_net(new->net_ns);
+ sb->s_fs_info = new;
return set_anon_super(sb, NULL);
}
@@ -43,7 +52,7 @@ static int proc_get_sb(struct file_system_type *fs_type,
{
int err;
struct super_block *sb;
- struct pid_namespace *ns;
+ struct proc_sb_info info, *sbi;
struct proc_inode *ei;
if (proc_mnt) {
@@ -57,12 +66,14 @@ static int proc_get_sb(struct file_system_type *fs_type,
ei->pid = find_get_pid(1);
}
+ info.pid_ns = current->nsproxy->pid_ns;
+ info.net_ns = current->nsproxy->net_ns;
if (flags & MS_KERNMOUNT)
- ns = (struct pid_namespace *)data;
+ sbi = data;
else
- ns = current->nsproxy->pid_ns;
+ sbi = &info;
- sb = sget(fs_type, proc_test_super, proc_set_super, ns);
+ sb = sget(fs_type, proc_test_super, proc_set_super, sbi);
if (IS_ERR(sb))
return PTR_ERR(sb);
@@ -78,12 +89,13 @@ static int proc_get_sb(struct file_system_type *fs_type,
ei = PROC_I(sb->s_root->d_inode);
if (!ei->pid) {
rcu_read_lock();
- ei->pid = get_pid(find_pid_ns(1, ns));
+ ei->pid = get_pid(find_pid_ns(1, sbi->pid_ns));
rcu_read_unlock();
}
sb->s_flags |= MS_ACTIVE;
- ns->proc_mnt = mnt;
+ if (!sbi->pid_ns->proc_mnt)
+ sbi->pid_ns->proc_mnt = mnt;
}
return simple_set_mnt(mnt, sb);
@@ -91,11 +103,13 @@ static int proc_get_sb(struct file_system_type *fs_type,
static void proc_kill_sb(struct super_block *sb)
{
- struct pid_namespace *ns;
+ struct proc_sb_info *sbi;
- ns = (struct pid_namespace *)sb->s_fs_info;
+ sbi = proc_sbi(sb);
kill_anon_super(sb);
- put_pid_ns(ns);
+ put_pid_ns(sbi->pid_ns);
+ put_net(sbi->net_ns);
+ kfree(sbi);
}
static struct file_system_type proc_fs_type = {
@@ -106,13 +120,16 @@ static struct file_system_type proc_fs_type = {
void __init proc_root_init(void)
{
+ struct proc_sb_info info;
int err = proc_init_inodecache();
if (err)
return;
err = register_filesystem(&proc_fs_type);
if (err)
return;
- proc_mnt = kern_mount_data(&proc_fs_type, &init_pid_ns);
+ info.pid_ns = &init_pid_ns;
+ info.net_ns = current->nsproxy->net_ns;
+ proc_mnt = kern_mount_data(&proc_fs_type, &info);
err = PTR_ERR(proc_mnt);
if (IS_ERR(proc_mnt)) {
unregister_filesystem(&proc_fs_type);
@@ -214,8 +231,11 @@ struct proc_dir_entry proc_root = {
int pid_ns_prepare_proc(struct pid_namespace *ns)
{
+ struct proc_sb_info info;
struct vfsmount *mnt;
+ info.pid_ns = ns;
+ info.net_ns = current->nsproxy->net_ns;
mnt = kern_mount_data(&proc_fs_type, ns);
if (IS_ERR(mnt))
return PTR_ERR(mnt);
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 2b3c1d8..c22c558 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -48,6 +48,9 @@ typedef int (read_proc_t)(char *page, char **start, off_t off,
typedef int (write_proc_t)(struct file *file, const char __user *buffer,
unsigned long count, void *data);
typedef int (get_info_t)(char *, char **, off_t, int);
+struct proc_dir_entry;
+typedef struct proc_dir_entry *(shadow_proc_t)(struct super_block *sb,
+ struct proc_dir_entry *pde);
struct proc_dir_entry {
unsigned int low_ino;
@@ -79,6 +82,7 @@ struct proc_dir_entry {
int pde_users; /* number of callers into module in progress */
spinlock_t pde_unload_lock; /* proc_fops checks and pde_users bumps */
struct completion *pde_unload_completion;
+ shadow_proc_t *shadow_proc;
};
struct kcore_list {
--
1.5.3.rc6.17.g1911
* Re: 2.6.24-rc3: find complains about /proc/net
2007-11-21 6:36 ` Eric W. Biederman
@ 2007-11-21 9:36 ` Pavel Emelyanov
0 siblings, 0 replies; 22+ messages in thread
From: Pavel Emelyanov @ 2007-11-21 9:36 UTC
To: Eric W. Biederman; +Cc: Rafael J. Wysocki, Pavel Machek, kernel list, netdev
Eric W. Biederman wrote:
> Below is a preliminary patch. It solves the directory issue but it doesn't
> play well with proc_mnt and proc_flush_task. It works by simply caching the
> network namespace when we mount proc so we don't have to be fancy and dynamic.
Nice... Where should we apply this patch?
Thread overview: 22+ messages
[not found] <fa.zy7JwM3jsOSgOCtqK2+rvFfdGjQ@ifi.uio.no>
[not found] ` <fa.Zx9jkdx74KRPk1qghLrg9BCvfFU@ifi.uio.no>
[not found] ` <fa.1TKmo5fKBZfHOQYq1bH4uMxOQek@ifi.uio.no>
[not found] ` <fa.fjJG0rd93RGzZ4PSv/glscvAI0A@ifi.uio.no>
[not found] ` <fa.7XLWa+gAWL3Q6I3O+hiS4UfcWpM@ifi.uio.no>
[not found] ` <fa.QqYdKsBUWKSLLGXmxJCAtZxLYnE@ifi.uio.no>
2007-11-21 1:21 ` 2.6.24-rc3: find complains about /proc/net Robert Hancock
2007-11-21 1:41 ` Eric W. Biederman
[not found] <20071119191000.GA1560@elf.ucw.cz>
2007-11-19 22:04 ` Rafael J. Wysocki
2007-11-20 15:51 ` Pavel Emelyanov
2007-11-20 21:52 ` Eric W. Biederman
2007-11-20 21:59 ` Ingo Molnar
2007-11-20 22:17 ` Eric W. Biederman
2007-11-20 22:35 ` Ingo Molnar
2007-11-20 22:54 ` Roland McGrath
2007-11-20 23:01 ` Ingo Molnar
2007-11-20 23:06 ` Guillaume Chazarain
2007-11-20 23:26 ` Roland McGrath
2007-11-20 23:32 ` Ulrich Drepper
2007-11-20 23:45 ` Ingo Molnar
2007-11-20 23:51 ` Roland McGrath
2007-11-21 0:47 ` Eric W. Biederman
2007-11-21 1:01 ` Rafael J. Wysocki
2007-11-21 0:41 ` Eric W. Biederman
2007-11-20 23:43 ` Ingo Molnar
2007-11-21 1:19 ` Eric W. Biederman
2007-11-21 6:36 ` Eric W. Biederman
2007-11-21 9:36 ` Pavel Emelyanov