* /proc/pid/fd/ shows strange mode when executed via sudo. @ 2012-05-02 13:40 Tetsuo Handa 2012-05-03 15:42 ` Serge Hallyn 0 siblings, 1 reply; 20+ messages in thread From: Tetsuo Handa @ 2012-05-02 13:40 UTC (permalink / raw) To: linux-fsdevel; +Cc: linux-security-module I noticed a strange difference in /proc/pid/fd/ directory when a command is executed via /usr/bin/sudo. Say, there are three files in some directory. (In my environment, /tmp/ is a plain ext4 partition.) # touch /tmp/1 # touch /tmp/2 # touch /tmp/3 # ls -l /tmp/? -rw-r--r-- 1 root root 0 May 2 21:48 /tmp/1 -rw-r--r-- 1 root root 0 May 2 21:48 /tmp/2 -rw-r--r-- 1 root root 0 May 2 21:48 /tmp/3 Try to read one of them using "tail -f" from one terminal. # tail -f /tmp/1 Show /proc/pid/fd/ from another terminal. # ls -l /proc/`pidof tail`/fd/ total 0 lrwx------ 1 root root 64 May 2 21:54 0 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:54 1 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:54 2 -> /dev/pts/0 lr-x------ 1 root root 64 May 2 21:54 3 -> /tmp/1 lr-x------ 1 root root 64 May 2 21:54 4 -> anon_inode:inotify Quit the "tail -f". Try to read two of them using "tail -f". # tail -f /tmp/1 /tmp/2 Show /proc/pid/fd/ from another terminal. # ls -l /proc/`pidof tail`/fd/ total 0 lrwx------ 1 root root 64 May 2 21:54 0 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:54 1 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:54 2 -> /dev/pts/0 lr-x------ 1 root root 64 May 2 21:54 3 -> /tmp/1 lr-x------ 1 root root 64 May 2 21:54 4 -> /tmp/2 lr-x------ 1 root root 64 May 2 21:54 5 -> anon_inode:inotify Quit the "tail -f". Try to read three of them using "tail -f". # tail -f /tmp/1 /tmp/2 /tmp/3 Show /proc/pid/fd/ from another terminal. # ls -l /proc/`pidof tail`/fd/ total 0 lrwx------ 1 root root 64 May 2 21:55 0 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:55 1 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:55 2 -> /dev/pts/0 lr-x------ 1 root root 64 May 2 21:55 3 -> /tmp/1 lr-x------ 1 root root 64 May 2 21:55 4 -> /tmp/2 lr-x------ 1 root root 64 May 2 21:55 5 -> /tmp/3 lr-x------ 1 root root 64 May 2 21:55 6 -> anon_inode:inotify Quit the "tail -f". You see, they are all fine. However, the output is different when executed via /usr/bin/sudo . Try to read one of them using "sudo tail -f" from one terminal. # sudo tail -f /tmp/1 Show /proc/pid/fd/ from another terminal. # ls -l /proc/`pidof tail`/fd/ total 0 lrwx------ 1 root root 64 May 2 21:55 0 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:55 1 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:55 2 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:55 3 -> /tmp/1 lrwx------ 1 root root 64 May 2 21:55 4 -> anon_inode:inotify Quit the "tail -f". Try to read two of them using "sudo tail -f". # sudo tail -f /tmp/1 /tmp/2 Show /proc/pid/fd/ from another terminal. # ls -l /proc/`pidof tail`/fd/ total 0 lrwx------ 1 root root 64 May 2 21:56 0 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:56 1 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:56 2 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:56 3 -> /tmp/1 lrwx------ 1 root root 64 May 2 21:56 4 -> /tmp/2 lr-x------ 1 root root 64 May 2 21:56 5 -> anon_inode:inotify Quit the "tail -f". Try to read three of them using "sudo tail -f". # sudo tail -f /tmp/1 /tmp/2 /tmp/3 Show /proc/pid/fd/ from another terminal. # ls -l /proc/`pidof tail`/fd/ total 0 lrwx------ 1 root root 64 May 2 21:56 0 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:56 1 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:56 2 -> /dev/pts/0 lrwx------ 1 root root 64 May 2 21:56 3 -> /tmp/1 lrwx------ 1 root root 64 May 2 21:56 4 -> /tmp/2 lr-x------ 1 root root 64 May 2 21:56 5 -> /tmp/3 lr-x------ 1 root root 64 May 2 21:56 6 -> anon_inode:inotify Quit the "tail -f". You see, when executed via /usr/bin/sudo , fd == 3 and fd == 4 are reported as "lrwx------" whereas fd >= 5 are reported as "lr-x------". # strace -f -e open sudo tail -f /tmp/1 /tmp/2 /tmp/3 shows that /usr/bin/tail is opening /tmp/1 /tmp/2 /tmp/3 as O_RDONLY. /usr/bin/sudo can't set w bit before /usr/bin/tail opens them with r bit. I wonder from where the w bit came... Above result was obtained using kernel 3.2.0-24-generic-pae (3.2.0-24.37) on Ubuntu 12.04, but below result (similar but not identical) was obtained using vanilla 3.4-rc5 kernel on CentOS 6.2. -- (normal case. normal result.) # tail -f /tmp/1 /tmp/2 # ls -l /proc/`pidof tail`/fd/ total 0 lrwx------ 1 root root 64 May 2 21:04 0 -> /dev/pts/2 lrwx------ 1 root root 64 May 2 21:04 1 -> /dev/pts/2 lrwx------ 1 root root 64 May 2 21:04 2 -> /dev/pts/2 lr-x------ 1 root root 64 May 2 21:04 3 -> /tmp/1 lr-x------ 1 root root 64 May 2 21:04 4 -> /tmp/2 lr-x------ 1 root root 64 May 2 21:04 5 -> anon_inode:inotify -- (sudo case. only fd == 3 got w bit.) # sudo tail -f /tmp/1 /tmp/2 # ls -l /proc/`pidof tail`/fd/ total 0 lrwx------ 1 root root 64 May 2 21:05 0 -> /dev/pts/2 lrwx------ 1 root root 64 May 2 21:05 1 -> /dev/pts/2 lrwx------ 1 root root 64 May 2 21:05 2 -> /dev/pts/2 lrwx------ 1 root root 64 May 2 21:05 3 -> /tmp/1 lr-x------ 1 root root 64 May 2 21:05 4 -> /tmp/2 lr-x------ 1 root root 64 May 2 21:05 5 -> anon_inode:inotify -- (normal case. normal result.) # tail -f /tmp/1 /tmp/2 /tmp/3 # ls -l /proc/`pidof tail`/fd/ total 0 lrwx------ 1 root root 64 May 2 21:07 0 -> /dev/pts/2 lrwx------ 1 root root 64 May 2 21:07 1 -> /dev/pts/2 lrwx------ 1 root root 64 May 2 21:07 2 -> /dev/pts/2 lr-x------ 1 root root 64 May 2 21:07 3 -> /tmp/1 lr-x------ 1 root root 64 May 2 21:07 4 -> /tmp/2 lr-x------ 1 root root 64 May 2 21:07 5 -> /tmp/3 lr-x------ 1 root root 64 May 2 21:07 6 -> anon_inode:inotify -- (sudo case. fd == 3 and fd == 6 got w bit.) # sudo tail -f /tmp/1 /tmp/2 /tmp/3 # ls -l /proc/`pidof tail`/fd/ total 0 lrwx------ 1 root root 64 May 2 21:07 0 -> /dev/pts/2 lrwx------ 1 root root 64 May 2 21:07 1 -> /dev/pts/2 lrwx------ 1 root root 64 May 2 21:07 2 -> /dev/pts/2 lrwx------ 1 root root 64 May 2 21:07 3 -> /tmp/1 lr-x------ 1 root root 64 May 2 21:07 4 -> /tmp/2 lr-x------ 1 root root 64 May 2 21:07 5 -> /tmp/3 lrwx------ 1 root root 64 May 2 21:07 6 -> anon_inode:inotify I guess something is wrong. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-02 13:40 /proc/pid/fd/ shows strange mode when executed via sudo Tetsuo Handa @ 2012-05-03 15:42 ` Serge Hallyn 2012-05-03 16:25 ` Tetsuo Handa 0 siblings, 1 reply; 20+ messages in thread From: Serge Hallyn @ 2012-05-03 15:42 UTC (permalink / raw) To: Tetsuo Handa; +Cc: linux-fsdevel, linux-security-module, keescook Quoting Tetsuo Handa (penguin-kernel@I-love.SAKURA.ne.jp): > I noticed a strange difference in /proc/pid/fd/ directory > when a command is executed via /usr/bin/sudo. > > Say, there are three files in some directory. > (In my environment, /tmp/ is a plain ext4 partition.) > > # touch /tmp/1 > # touch /tmp/2 > # touch /tmp/3 > # ls -l /tmp/? > -rw-r--r-- 1 root root 0 May 2 21:48 /tmp/1 > -rw-r--r-- 1 root root 0 May 2 21:48 /tmp/2 > -rw-r--r-- 1 root root 0 May 2 21:48 /tmp/3 > > Try to read one of them using "tail -f" from one terminal. > > # tail -f /tmp/1 > > Show /proc/pid/fd/ from another terminal. > > # ls -l /proc/`pidof tail`/fd/ > total 0 > lrwx------ 1 root root 64 May 2 21:54 0 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:54 1 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:54 2 -> /dev/pts/0 > lr-x------ 1 root root 64 May 2 21:54 3 -> /tmp/1 > lr-x------ 1 root root 64 May 2 21:54 4 -> anon_inode:inotify > > Quit the "tail -f". Try to read two of them using "tail -f". > > # tail -f /tmp/1 /tmp/2 > > Show /proc/pid/fd/ from another terminal. > > # ls -l /proc/`pidof tail`/fd/ > total 0 > lrwx------ 1 root root 64 May 2 21:54 0 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:54 1 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:54 2 -> /dev/pts/0 > lr-x------ 1 root root 64 May 2 21:54 3 -> /tmp/1 > lr-x------ 1 root root 64 May 2 21:54 4 -> /tmp/2 > lr-x------ 1 root root 64 May 2 21:54 5 -> anon_inode:inotify > > Quit the "tail -f". Try to read three of them using "tail -f". > > # tail -f /tmp/1 /tmp/2 /tmp/3 > > Show /proc/pid/fd/ from another terminal. > > # ls -l /proc/`pidof tail`/fd/ > total 0 > lrwx------ 1 root root 64 May 2 21:55 0 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:55 1 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:55 2 -> /dev/pts/0 > lr-x------ 1 root root 64 May 2 21:55 3 -> /tmp/1 > lr-x------ 1 root root 64 May 2 21:55 4 -> /tmp/2 > lr-x------ 1 root root 64 May 2 21:55 5 -> /tmp/3 > lr-x------ 1 root root 64 May 2 21:55 6 -> anon_inode:inotify > > Quit the "tail -f". You see, they are all fine. > > However, the output is different when executed via /usr/bin/sudo . > > Try to read one of them using "sudo tail -f" from one terminal. > > # sudo tail -f /tmp/1 > > Show /proc/pid/fd/ from another terminal. > > # ls -l /proc/`pidof tail`/fd/ > total 0 > lrwx------ 1 root root 64 May 2 21:55 0 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:55 1 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:55 2 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:55 3 -> /tmp/1 > lrwx------ 1 root root 64 May 2 21:55 4 -> anon_inode:inotify > > Quit the "tail -f". Try to read two of them using "sudo tail -f". > > # sudo tail -f /tmp/1 /tmp/2 > > Show /proc/pid/fd/ from another terminal. > > # ls -l /proc/`pidof tail`/fd/ > total 0 > lrwx------ 1 root root 64 May 2 21:56 0 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:56 1 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:56 2 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:56 3 -> /tmp/1 > lrwx------ 1 root root 64 May 2 21:56 4 -> /tmp/2 > lr-x------ 1 root root 64 May 2 21:56 5 -> anon_inode:inotify > > Quit the "tail -f". Try to read three of them using "sudo tail -f". > > # sudo tail -f /tmp/1 /tmp/2 /tmp/3 > > Show /proc/pid/fd/ from another terminal. > > # ls -l /proc/`pidof tail`/fd/ > total 0 > lrwx------ 1 root root 64 May 2 21:56 0 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:56 1 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:56 2 -> /dev/pts/0 > lrwx------ 1 root root 64 May 2 21:56 3 -> /tmp/1 > lrwx------ 1 root root 64 May 2 21:56 4 -> /tmp/2 > lr-x------ 1 root root 64 May 2 21:56 5 -> /tmp/3 > lr-x------ 1 root root 64 May 2 21:56 6 -> anon_inode:inotify > > Quit the "tail -f". > > You see, when executed via /usr/bin/sudo , fd == 3 and fd == 4 are reported as > "lrwx------" whereas fd >= 5 are reported as "lr-x------". > > # strace -f -e open sudo tail -f /tmp/1 /tmp/2 /tmp/3 > > shows that /usr/bin/tail is opening /tmp/1 /tmp/2 /tmp/3 as O_RDONLY. > /usr/bin/sudo can't set w bit before /usr/bin/tail opens them with r bit. > I wonder from where the w bit came... Note that if you do sudo strace -f -e open tail -f /tmp/{1,2,3,4} then the fds are not opened with write perms. But if you do as you did, strace -f -e open sudo tail -f /tmp/1 /tmp/2 /tmp/3 they are. Interesting. The same thing also happens for me with tmpfs, and with a debian sid ec2 instance running 2.6.32-5-xen-amd64. > Above result was obtained using kernel 3.2.0-24-generic-pae (3.2.0-24.37) on > Ubuntu 12.04, but below result (similar but not identical) was obtained using > vanilla 3.4-rc5 kernel on CentOS 6.2. > > -- (normal case. normal result.) > # tail -f /tmp/1 /tmp/2 > > # ls -l /proc/`pidof tail`/fd/ > total 0 > lrwx------ 1 root root 64 May 2 21:04 0 -> /dev/pts/2 > lrwx------ 1 root root 64 May 2 21:04 1 -> /dev/pts/2 > lrwx------ 1 root root 64 May 2 21:04 2 -> /dev/pts/2 > lr-x------ 1 root root 64 May 2 21:04 3 -> /tmp/1 > lr-x------ 1 root root 64 May 2 21:04 4 -> /tmp/2 > lr-x------ 1 root root 64 May 2 21:04 5 -> anon_inode:inotify > -- (sudo case. only fd == 3 got w bit.) > # sudo tail -f /tmp/1 /tmp/2 > > # ls -l /proc/`pidof tail`/fd/ > total 0 > lrwx------ 1 root root 64 May 2 21:05 0 -> /dev/pts/2 > lrwx------ 1 root root 64 May 2 21:05 1 -> /dev/pts/2 > lrwx------ 1 root root 64 May 2 21:05 2 -> /dev/pts/2 > lrwx------ 1 root root 64 May 2 21:05 3 -> /tmp/1 > lr-x------ 1 root root 64 May 2 21:05 4 -> /tmp/2 > lr-x------ 1 root root 64 May 2 21:05 5 -> anon_inode:inotify > -- (normal case. normal result.) > # tail -f /tmp/1 /tmp/2 /tmp/3 > > # ls -l /proc/`pidof tail`/fd/ > total 0 > lrwx------ 1 root root 64 May 2 21:07 0 -> /dev/pts/2 > lrwx------ 1 root root 64 May 2 21:07 1 -> /dev/pts/2 > lrwx------ 1 root root 64 May 2 21:07 2 -> /dev/pts/2 > lr-x------ 1 root root 64 May 2 21:07 3 -> /tmp/1 > lr-x------ 1 root root 64 May 2 21:07 4 -> /tmp/2 > lr-x------ 1 root root 64 May 2 21:07 5 -> /tmp/3 > lr-x------ 1 root root 64 May 2 21:07 6 -> anon_inode:inotify > -- (sudo case. fd == 3 and fd == 6 got w bit.) > # sudo tail -f /tmp/1 /tmp/2 /tmp/3 > > # ls -l /proc/`pidof tail`/fd/ > total 0 > lrwx------ 1 root root 64 May 2 21:07 0 -> /dev/pts/2 > lrwx------ 1 root root 64 May 2 21:07 1 -> /dev/pts/2 > lrwx------ 1 root root 64 May 2 21:07 2 -> /dev/pts/2 > lrwx------ 1 root root 64 May 2 21:07 3 -> /tmp/1 > lr-x------ 1 root root 64 May 2 21:07 4 -> /tmp/2 > lr-x------ 1 root root 64 May 2 21:07 5 -> /tmp/3 > lrwx------ 1 root root 64 May 2 21:07 6 -> anon_inode:inotify > > I guess something is wrong. > -- > To unsubscribe from this list: send the line "unsubscribe linux-security-module" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-03 15:42 ` Serge Hallyn @ 2012-05-03 16:25 ` Tetsuo Handa 2012-05-18 2:39 ` Tetsuo Handa 0 siblings, 1 reply; 20+ messages in thread From: Tetsuo Handa @ 2012-05-03 16:25 UTC (permalink / raw) To: serge.hallyn; +Cc: linux-fsdevel, linux-security-module, keescook Serge Hallyn wrote: > Note that if you do > > sudo strace -f -e open tail -f /tmp/{1,2,3,4} > > then the fds are not opened with write perms. But if you do as you did, > > strace -f -e open sudo tail -f /tmp/1 /tmp/2 /tmp/3 > > they are. Interesting. > They are not opened with write perms, for vfs_write() rejects write requests with -EBADF. ----- test.c start ----- #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #define O_LARGEFILE 00100000 int main(int argc, char *argv[]) { int fd1 = open("/tmp/1", O_RDONLY | O_LARGEFILE); int fd2 = open("/tmp/2", O_RDONLY | O_LARGEFILE); int fd3 = open("/tmp/3", O_RDONLY | O_LARGEFILE); int fd4 = open("/tmp/4", O_RDONLY | O_LARGEFILE); printf("fd1=%d fd2=%d fd3=%d fd4=%d\n", fd1, fd2, fd3, fd4); printf("write(fd1)=%d write(fd2)=%d write(fd3)=%d write(fd4)=%d\n", write(fd1, "", 1), write(fd2, "", 1), write(fd3, "", 1), write(fd4, "", 1)); getchar(); return 0; } ----- test.c end ----- # sudo ./a.out fd1=3 fd2=4 fd3=5 fd4=6 write(fd1)=-1 write(fd2)=-1 write(fd3)=-1 write(fd4)=-1 Also, /proc/pid/fdinfo/ shows correct info. So, the problem is nothing but that the /proc/pid/fd/ is showing strange mode. But I can't imagine what can make /proc/pid/fd/ wrong when /proc/pid/fdinfo/ is correct... > The same thing also happens for me with tmpfs, and with a debian sid ec2 > instance running 2.6.32-5-xen-amd64. 2.6.32-220.13.1.el6 has this problem but 2.6.18-308.4.1.el5 does not have this problem. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-03 16:25 ` Tetsuo Handa @ 2012-05-18 2:39 ` Tetsuo Handa 2012-05-18 9:27 ` Tetsuo Handa 0 siblings, 1 reply; 20+ messages in thread From: Tetsuo Handa @ 2012-05-18 2:39 UTC (permalink / raw) To: linux-fsdevel; +Cc: linux-security-module, serge.hallyn, keescook It turned out that /usr/bin/sudo is using /proc/self/fd/ for closing already opened files. I made a simple demo program that can reproduce this regression. ---------- test.c start ---------- #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <stdlib.h> #include <dirent.h> #include <string.h> static void opentest(void) { FILE *fp = fopen("/dev/tty", "a"); int i; char buffer[1024]; memset(buffer, 0, sizeof(buffer)); for (i = 0; i < 5; i++) { struct stat buf; int fd = open("/proc/self/exe", O_RDONLY); if (fd == EOF) break; snprintf(buffer, sizeof(buffer) - 1, "/proc/self/fd/%u", fd); if (lstat(buffer, &buf)) continue; if ((buf.st_mode & 0700) == 0700) { char buffer2[1024]; memset(buffer2, 0, sizeof(buffer2)); readlink(buffer, buffer2, sizeof(buffer2) - 1); fprintf(fp, "%s -> %s \n", buffer, buffer2); } } } int main(int argc, char *argv[]) { DIR *dirp = (argc > 1) ? opendir("/proc/self/fd") : NULL; if (dirp) { struct dirent *dent; fprintf(stderr, "closefrom with /proc/self/fd/\n"); while ((dent = readdir(dirp)) != NULL) { int fd; if (sscanf(dent->d_name, "%u", &fd) == 1 && fd != dirfd(dirp)) close(fd); } closedir(dirp); } else { int fd; fprintf(stderr, "closefrom without /proc/self/fd/\n"); for (fd = 0; fd < 1024; fd++) close(fd); } opentest(); return 0; } ---------- test.c end ---------- [root@ccsecurity tmp]# ./a.out 1 closefrom with /proc/self/fd/ /proc/self/fd/1 -> /tmp/a.out /proc/self/fd/2 -> /tmp/a.out [root@ccsecurity tmp]# ./a.out closefrom without /proc/self/fd/ [root@ccsecurity tmp]# I tried on three kernels. 2.6.18-308.4.1.el5 : OK 2.6.26-2-686 (2.6.26-26lenny4) : NG 2.6.32-220.17.1.el6 : NG This regression seems to be introduced between 2.6.19 and 2.6.26. This regression seems to involve opendir()/closedir() usage. Regards. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 2:39 ` Tetsuo Handa @ 2012-05-18 9:27 ` Tetsuo Handa 2012-05-18 16:08 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Tetsuo Handa @ 2012-05-18 9:27 UTC (permalink / raw) To: ebiederm; +Cc: linux-fsdevel, akpm, torvalds Hello. I noticed that /proc/self/fd/ returns wrong information to lstat() when files are open()ed after already opened files have been close()d in accordance with entries returned from opendir("/proc/self/fd"). Bisection reached to below commit added between 2.6.18 and 2.6.19-rc1. git bisect start '3f2e05e90e0846c42626e3d272454f26be34a1bc' 'e9ff3990f08e9a0c2839cc22808b01732ea5b3e4' '1651e14e28a2d9f446018ef522882e0709a2ce4f' '43fa1adb9334bf4585cd53144eb5911488f85bc7' '609d7fa9565c754428d2520cac2accc9052e1245' 'ee0b3e671baff681d69fbf0db33b47603c0a8280' 'cf342e52e3117391868fb4bd900ce772a27a5a1a' 'v2.6.18' 'v2.6.17' 'v2.6.16' 'v2.6.15' 'v2.6.14' 'v2.6.13' 'v2.6.12' git bisect good 5f4c6bc1f369f20807a8e753c2308d1629478c61 git bisect bad 5e6b3f42edc20e988b186fbfb9eec174294222ea git bisect good 801199ce805a2412bbcd9bfe213092ec656013dd git bisect bad 61a28784028e6d55755e4d0f39bee8d9bf2ee8d9 git bisect good 444ceed8d186631fdded5e3f24dc20b93d0d3fda Below commit replaced filldir() with proc_pident_fill_cache(). Eric, would you check (though no need to be urgent)? commit 61a28784028e6d55755e4d0f39bee8d9bf2ee8d9 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Mon Oct 2 02:18:49 2006 -0700 [PATCH] proc: Remove the hard coded inode numbers The hard coded inode numbers in proc currently limit its maintainability, its flexibility, and what can be done with the rest of system. /proc limits pid-max to 32768 on 32 bit systems it limits fd-max to 32768 on all systems, and placing the pid in the inode number really gets in the way of implementing subdirectories of per process information. Ever since people started adding to the middle of the file type enumeration we haven't been maintaing the historical inode numbers, all we have really succeeded in doing is keeping the pid in the proc inode number. The pid is already available in the directory name so no information is lost removing it from the inode number. So if something in user space cares if we remove the inode number from the /proc inode it is almost certainly broken. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> :040000 040000 1bf9577e5bab5f938e780de96ab79003523688ec 46f890c72c75bfb6ecaee8c985f5127bcf82eef3 M fs Tetsuo Handa wrote: > It turned out that /usr/bin/sudo is using /proc/self/fd/ for closing already > opened files. I made a simple demo program that can reproduce this regression. > > ---------- test.c start ---------- > #include <stdio.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <fcntl.h> > #include <unistd.h> > #include <stdlib.h> > #include <dirent.h> > #include <string.h> > > static void opentest(void) > { > FILE *fp = fopen("/dev/tty", "a"); > int i; > char buffer[1024]; > memset(buffer, 0, sizeof(buffer)); > for (i = 0; i < 5; i++) { > struct stat buf; > int fd = open("/proc/self/exe", O_RDONLY); > if (fd == EOF) > break; > snprintf(buffer, sizeof(buffer) - 1, "/proc/self/fd/%u", fd); > if (lstat(buffer, &buf)) > continue; > if ((buf.st_mode & 0700) == 0700) { > char buffer2[1024]; > memset(buffer2, 0, sizeof(buffer2)); > readlink(buffer, buffer2, sizeof(buffer2) - 1); > fprintf(fp, "%s -> %s \n", buffer, buffer2); > } > } > } > > int main(int argc, char *argv[]) > { > DIR *dirp = (argc > 1) ? opendir("/proc/self/fd") : NULL; > if (dirp) { > struct dirent *dent; > fprintf(stderr, "closefrom with /proc/self/fd/\n"); > while ((dent = readdir(dirp)) != NULL) { > int fd; > if (sscanf(dent->d_name, "%u", &fd) == 1 && > fd != dirfd(dirp)) > close(fd); > } > closedir(dirp); > } else { > int fd; > fprintf(stderr, "closefrom without /proc/self/fd/\n"); > for (fd = 0; fd < 1024; fd++) > close(fd); > } > opentest(); > return 0; > } > ---------- test.c end ---------- > > [root@ccsecurity tmp]# ./a.out 1 > closefrom with /proc/self/fd/ > /proc/self/fd/1 -> /tmp/a.out > /proc/self/fd/2 -> /tmp/a.out > [root@ccsecurity tmp]# ./a.out > closefrom without /proc/self/fd/ > [root@ccsecurity tmp]# > > I tried on three kernels. > > 2.6.18-308.4.1.el5 : OK > 2.6.26-2-686 (2.6.26-26lenny4) : NG > 2.6.32-220.17.1.el6 : NG > > This regression seems to be introduced between 2.6.19 and 2.6.26. > This regression seems to involve opendir()/closedir() usage. > > Regards. > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 9:27 ` Tetsuo Handa @ 2012-05-18 16:08 ` Linus Torvalds 2012-05-18 16:25 ` Linus Torvalds 2012-05-18 18:08 ` Al Viro 0 siblings, 2 replies; 20+ messages in thread From: Linus Torvalds @ 2012-05-18 16:08 UTC (permalink / raw) To: Tetsuo Handa, Al Viro; +Cc: ebiederm, linux-fsdevel, akpm [-- Attachment #1: Type: text/plain, Size: 3218 bytes --] [ Added Al explicitly, although I guess he sees the fsdevel posts too ] On Fri, May 18, 2012 at 2:27 AM, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > I noticed that /proc/self/fd/ returns wrong information to lstat() when files > are open()ed after already opened files have been close()d in accordance with > entries returned from opendir("/proc/self/fd"). So if I understand this correctly, the wrong information is otherwise fine, except the link has the S_IWUSR bit set. Correct? The code that determines the mode of the link is proc_fd_instantiate(), which does inode->i_mode = S_IFLNK; /* * We are not taking a ref to the file structure, so we must * hold ->file_lock. */ spin_lock(&files->file_lock); file = fcheck_files(files, fd); if (!file) goto out_unlock; if (file->f_mode & FMODE_READ) inode->i_mode |= S_IRUSR | S_IXUSR; if (file->f_mode & FMODE_WRITE) inode->i_mode |= S_IWUSR | S_IXUSR; spin_unlock(&files->file_lock); and what *seems* to happen is that your test program basically triggers a situation where the old /proc/self/fd/<x> dentry has not been replaced, so it contains the previous one that was writable (because you used to have a writable fd there before you closed it). The "readdir()" you have probably only matters because it instantiates the dentries in /proc/self/fd, and then the "close()" just doesn't invalidate the cached dentry - and neither does the lookup(). In fact, I can show the issue much more simply with this program, no readdir() required: --- test.c --- #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <string.h> static void show_stat(const char *mode, int fd) { char buffer[100]; struct stat st; snprintf(buffer, sizeof(buffer), "/proc/self/fd/%u", fd); if (!lstat(buffer, &st)) printf("st_mode = %06o (%s)\n", st.st_mode, mode); } int main(int argc, char *argv[]) { int fdrw = open("/dev/null", O_RDWR); int fdro = open("/dev/null", O_RDONLY); show_stat("read-write", fdrw); show_stat("read-only", fdro); dup2(fdro, fdrw); show_stat("read-only", fdrw); show_stat("read-only", fdro); return 0; } --- end test.c --- and running it results in: [torvalds@i5 ~]$ ./a.out st_mode = 120700 (read-write) st_mode = 120500 (read-only) st_mode = 120700 (read-only) st_mode = 120500 (read-only) notice how the "dup2()" that closed the read-write fd and replaced it with the read-only fd still shows 0700 in the lstat information. The good news is that we still do check the inode permission when actually following the link and doing the open(), so you cannot actually open a read-only file using that link. So it's a "stale information" thing, and ugly, but not a security issue. I would suggest just moving the i_mode initialization from proc_fd_instantiate() into the revalidate function that we already have, and that already fixes up i_uid/i_gid etc. Attached is a TOTALLY UNTESTED patch that does this, and actually seems to simplify things in the process. Al, Eric? Linus [-- Attachment #2: patch.diff --] [-- Type: application/octet-stream, Size: 2527 bytes --] fs/proc/base.c | 43 ++++++++++++++----------------------------- 1 file changed, 14 insertions(+), 29 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 1c8b280146d7..7d6ad98631f2 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1799,10 +1799,15 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd) if (task) { files = get_files_struct(task); if (files) { + struct file *file; rcu_read_lock(); - if (fcheck_files(files, fd)) { + file = fcheck_files(files, fd); + if (file) { + unsigned i_mode, f_mode = file->f_mode; + rcu_read_unlock(); put_files_struct(files); + if (task_dumpable(task)) { rcu_read_lock(); cred = __task_cred(task); @@ -1813,7 +1818,14 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd) inode->i_uid = 0; inode->i_gid = 0; } - inode->i_mode &= ~(S_ISUID | S_ISGID); + + i_mode = S_IFLNK; + if (f_mode & FMODE_READ) + i_mode |= S_IRUSR | S_IXUSR; + if (f_mode & FMODE_WRITE) + i_mode |= S_IWUSR | S_IXUSR; + inode->i_mode = i_mode; + security_task_to_inode(task, inode); put_task_struct(task); return 1; @@ -1837,8 +1849,6 @@ static struct dentry *proc_fd_instantiate(struct inode *dir, struct dentry *dentry, struct task_struct *task, const void *ptr) { unsigned fd = *(const unsigned *)ptr; - struct file *file; - struct files_struct *files; struct inode *inode; struct proc_inode *ei; struct dentry *error = ERR_PTR(-ENOENT); @@ -1848,25 +1858,6 @@ static struct dentry *proc_fd_instantiate(struct inode *dir, goto out; ei = PROC_I(inode); ei->fd = fd; - files = get_files_struct(task); - if (!files) - goto out_iput; - inode->i_mode = S_IFLNK; - - /* - * We are not taking a ref to the file structure, so we must - * hold ->file_lock. - */ - spin_lock(&files->file_lock); - file = fcheck_files(files, fd); - if (!file) - goto out_unlock; - if (file->f_mode & FMODE_READ) - inode->i_mode |= S_IRUSR | S_IXUSR; - if (file->f_mode & FMODE_WRITE) - inode->i_mode |= S_IWUSR | S_IXUSR; - spin_unlock(&files->file_lock); - put_files_struct(files); inode->i_op = &proc_pid_link_inode_operations; inode->i_size = 64; @@ -1879,12 +1870,6 @@ static struct dentry *proc_fd_instantiate(struct inode *dir, out: return error; -out_unlock: - spin_unlock(&files->file_lock); - put_files_struct(files); -out_iput: - iput(inode); - goto out; } static struct dentry *proc_lookupfd_common(struct inode *dir, ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 16:08 ` Linus Torvalds @ 2012-05-18 16:25 ` Linus Torvalds 2012-05-18 19:55 ` Eric W. Biederman 2012-05-18 18:08 ` Al Viro 1 sibling, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2012-05-18 16:25 UTC (permalink / raw) To: Tetsuo Handa, Al Viro; +Cc: ebiederm, linux-fsdevel, akpm On Fri, May 18, 2012 at 9:08 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I would suggest just moving the i_mode initialization from > proc_fd_instantiate() into the revalidate function that we already > have, and that already fixes up i_uid/i_gid etc. Attached is a TOTALLY > UNTESTED patch that does this, and actually seems to simplify things > in the process. Ok, so it's now "tested" in the sense that it works for me, and fixes both my and your test-cases. Which doesn't mean that it is bug-free of course, but it does seem to be a sane patch that actually cleans things up. I'll delay committing it in case somebody hollers, but I think I'll mark it for stable too since this issue seems age-old, and the fix looks good. Ack/nak? Can anybody find anything wrong in that patch? Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 16:25 ` Linus Torvalds @ 2012-05-18 19:55 ` Eric W. Biederman 0 siblings, 0 replies; 20+ messages in thread From: Eric W. Biederman @ 2012-05-18 19:55 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tetsuo Handa, Al Viro, linux-fsdevel, akpm Linus Torvalds <torvalds@linux-foundation.org> writes: > On Fri, May 18, 2012 at 9:08 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> I would suggest just moving the i_mode initialization from >> proc_fd_instantiate() into the revalidate function that we already >> have, and that already fixes up i_uid/i_gid etc. Attached is a TOTALLY >> UNTESTED patch that does this, and actually seems to simplify things >> in the process. > > Ok, so it's now "tested" in the sense that it works for me, and fixes > both my and your test-cases. > > Which doesn't mean that it is bug-free of course, but it does seem to > be a sane patch that actually cleans things up. > > I'll delay committing it in case somebody hollers, but I think I'll > mark it for stable too since this issue seems age-old, and the fix > looks good. > > Ack/nak? Can anybody find anything wrong in that patch? Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> It looks reasonable. I am a tad leery of the d_add without setting inode->i_mode but the dcache lookup will have to call revalidate before we access the inode so I don't expect a problem there. Eric ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 16:08 ` Linus Torvalds 2012-05-18 16:25 ` Linus Torvalds @ 2012-05-18 18:08 ` Al Viro 2012-05-18 18:18 ` Linus Torvalds 1 sibling, 1 reply; 20+ messages in thread From: Al Viro @ 2012-05-18 18:08 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm > I would suggest just moving the i_mode initialization from > proc_fd_instantiate() into the revalidate function that we already > have, and that already fixes up i_uid/i_gid etc. Attached is a TOTALLY > UNTESTED patch that does this, and actually seems to simplify things > in the process. I think this is bogus. We don't give a fuck about *any* of those fields for symlinks; the only problem here is that default ->getattr() uses them to fill ->st_mode et.al. The same goes for ->i_uid/->i_gid. So how about simply adding ->getattr() for those guys? And to hell with assignments in that ->d_revalidate() instance... ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 18:08 ` Al Viro @ 2012-05-18 18:18 ` Linus Torvalds 2012-05-18 18:23 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2012-05-18 18:18 UTC (permalink / raw) To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm On Fri, May 18, 2012 at 11:08 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > I think this is bogus. We don't give a fuck about *any* of those fields > for symlinks; the only problem here is that default ->getattr() uses > them to fill ->st_mode et.al. The same goes for ->i_uid/->i_gid. > So how about simply adding ->getattr() for those guys? And to hell > with assignments in that ->d_revalidate() instance... Doing it at getattr() time does sound good. Let me try to cook something up. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 18:18 ` Linus Torvalds @ 2012-05-18 18:23 ` Linus Torvalds 2012-05-18 18:45 ` Al Viro 0 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2012-05-18 18:23 UTC (permalink / raw) To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm On Fri, May 18, 2012 at 11:18 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Doing it at getattr() time does sound good. > > Let me try to cook something up. Ugh. It's a much bigger patch, because we share the inode operations with other cases too. So I think that would fall under the "further cleanup" heading, and I'm not going to do it now. Not with 3.4 imminent. But if you were to do such a cleanup later... Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 18:23 ` Linus Torvalds @ 2012-05-18 18:45 ` Al Viro 2012-05-18 18:55 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Al Viro @ 2012-05-18 18:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm On Fri, May 18, 2012 at 11:23:27AM -0700, Linus Torvalds wrote: > On Fri, May 18, 2012 at 11:18 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > Doing it at getattr() time does sound good. > > > > Let me try to cook something up. > > Ugh. It's a much bigger patch, because we share the inode operations > with other cases too. > > So I think that would fall under the "further cleanup" heading, and > I'm not going to do it now. Not with 3.4 imminent. > > But if you were to do such a cleanup later... It's actually not big. Completely untested minimal variant (without touching ->d_revalidate()): diff --git a/fs/proc/base.c b/fs/proc/base.c index 1c8b280..b2a8b8f 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1491,6 +1491,44 @@ static const struct inode_operations proc_pid_link_inode_operations = { .setattr = proc_setattr, }; +static int proc_fd_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) +{ + struct inode *inode = dentry->d_inode; + struct files_struct *files; + struct task_struct *task; + struct file *file; + fmode_t mode = 0; + + generic_fillattr(inode, stat); + + rcu_read_lock(); + task = pid_task(proc_pid(inode), PIDTYPE_PID); + files = task ? get_files_struct(task) : NULL; + file = files ? fcheck_files(files, PROC_I(inode)->fd) : NULL; + if (file) + mode = file->f_mode; + rcu_read_unlock(); + + if (files) + put_files_struct(files); + if (!file) + return -ENOENT; + + if (mode & FMODE_READ) + stat->mode |= S_IRUSR | S_IXUSR; + if (mode & FMODE_WRITE) + stat->mode |= S_IWUSR | S_IXUSR; + stat->size = 64; + return 0; +} + +static const struct inode_operations proc_pid_fd_inode_operations = { + .readlink = proc_pid_readlink, + .follow_link = proc_pid_follow_link, + .setattr = proc_setattr, + .getattr = proc_fd_getattr, +}; + /* building an inode */ @@ -1851,25 +1889,16 @@ static struct dentry *proc_fd_instantiate(struct inode *dir, files = get_files_struct(task); if (!files) goto out_iput; - inode->i_mode = S_IFLNK; - /* - * We are not taking a ref to the file structure, so we must - * hold ->file_lock. - */ - spin_lock(&files->file_lock); + rcu_read_lock(); file = fcheck_files(files, fd); - if (!file) - goto out_unlock; - if (file->f_mode & FMODE_READ) - inode->i_mode |= S_IRUSR | S_IXUSR; - if (file->f_mode & FMODE_WRITE) - inode->i_mode |= S_IWUSR | S_IXUSR; - spin_unlock(&files->file_lock); + rcu_read_unlock(); put_files_struct(files); + if (!file) + goto out_iput; - inode->i_op = &proc_pid_link_inode_operations; - inode->i_size = 64; + inode->i_mode = S_IFLNK; + inode->i_op = &proc_pid_fd_inode_operations; ei->op.proc_get_link = proc_fd_link; d_set_d_op(dentry, &tid_fd_dentry_operations); d_add(dentry, inode); @@ -1879,9 +1908,6 @@ static struct dentry *proc_fd_instantiate(struct inode *dir, out: return error; -out_unlock: - spin_unlock(&files->file_lock); - put_files_struct(files); out_iput: iput(inode); goto out; ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 18:45 ` Al Viro @ 2012-05-18 18:55 ` Linus Torvalds 2012-05-18 19:10 ` Al Viro 0 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2012-05-18 18:55 UTC (permalink / raw) To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm On Fri, May 18, 2012 at 11:45 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > It's actually not big. Completely untested minimal variant (without > touching ->d_revalidate()): Get rid of all the now unnecessary files stuff and error handling in proc_fd_instantiate() too (it's unnecessary - it only ends up being re-done in the revalidate function anyway), and I'll take it. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 18:55 ` Linus Torvalds @ 2012-05-18 19:10 ` Al Viro 2012-05-18 20:49 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Al Viro @ 2012-05-18 19:10 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm On Fri, May 18, 2012 at 11:55:21AM -0700, Linus Torvalds wrote: > On Fri, May 18, 2012 at 11:45 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > It's actually not big. ?Completely untested minimal variant (without > > touching ->d_revalidate()): > > Get rid of all the now unnecessary files stuff and error handling in > proc_fd_instantiate() too (it's unnecessary - it only ends up being > re-done in the revalidate function anyway), and I'll take it. Umm... I'd rather do it other way - do that test *before* bothering with allocating an inode. IOW, use it as early cutoff, with tid_fd_revalidate() called in the end closing any races about file getting closed while we'd been allocating an inode, etc. IOW, how about this? Note that it's _still_ completely untested wrt weird LSM stuff. I simply don't have tomoyo/apparmor/whatnot setups to test on, so I'd rather see an ACK from the LSM people. diff --git a/fs/proc/base.c b/fs/proc/base.c index 1c8b280..e663a04 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1491,6 +1491,44 @@ static const struct inode_operations proc_pid_link_inode_operations = { .setattr = proc_setattr, }; +static int proc_fd_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) +{ + struct inode *inode = dentry->d_inode; + struct files_struct *files; + struct task_struct *task; + struct file *file; + fmode_t mode = 0; + + generic_fillattr(inode, stat); + + rcu_read_lock(); + task = pid_task(proc_pid(inode), PIDTYPE_PID); + files = task ? get_files_struct(task) : NULL; + file = files ? fcheck_files(files, PROC_I(inode)->fd) : NULL; + if (file) + mode = file->f_mode; + rcu_read_unlock(); + + if (files) + put_files_struct(files); + if (!file) + return -ENOENT; + + if (mode & FMODE_READ) + stat->mode |= S_IRUSR | S_IXUSR; + if (mode & FMODE_WRITE) + stat->mode |= S_IWUSR | S_IXUSR; + stat->size = 64; + return 0; +} + +static const struct inode_operations proc_pid_fd_inode_operations = { + .readlink = proc_pid_readlink, + .follow_link = proc_pid_follow_link, + .setattr = proc_setattr, + .getattr = proc_fd_getattr, +}; + /* building an inode */ @@ -1837,54 +1875,38 @@ static struct dentry *proc_fd_instantiate(struct inode *dir, struct dentry *dentry, struct task_struct *task, const void *ptr) { unsigned fd = *(const unsigned *)ptr; - struct file *file; - struct files_struct *files; + struct file *file = NULL; + struct files_struct *files = get_files_struct(task); struct inode *inode; struct proc_inode *ei; - struct dentry *error = ERR_PTR(-ENOENT); + + if (files) { + rcu_read_lock(); + file = fcheck_files(files, fd); + rcu_read_unlock(); + put_files_struct(files); + } + + if (!file) + return ERR_PTR(-ENOENT); inode = proc_pid_make_inode(dir->i_sb, task); if (!inode) - goto out; + return ERR_PTR(-ENOMEM); ei = PROC_I(inode); ei->fd = fd; - files = get_files_struct(task); - if (!files) - goto out_iput; inode->i_mode = S_IFLNK; - - /* - * We are not taking a ref to the file structure, so we must - * hold ->file_lock. - */ - spin_lock(&files->file_lock); - file = fcheck_files(files, fd); - if (!file) - goto out_unlock; - if (file->f_mode & FMODE_READ) - inode->i_mode |= S_IRUSR | S_IXUSR; - if (file->f_mode & FMODE_WRITE) - inode->i_mode |= S_IWUSR | S_IXUSR; - spin_unlock(&files->file_lock); - put_files_struct(files); - - inode->i_op = &proc_pid_link_inode_operations; - inode->i_size = 64; + inode->i_op = &proc_pid_fd_inode_operations; ei->op.proc_get_link = proc_fd_link; d_set_d_op(dentry, &tid_fd_dentry_operations); d_add(dentry, inode); - /* Close the race of the process dying before we return the dentry */ - if (tid_fd_revalidate(dentry, NULL)) - error = NULL; - - out: - return error; -out_unlock: - spin_unlock(&files->file_lock); - put_files_struct(files); -out_iput: - iput(inode); - goto out; + /* + * Close the race of the process dying or file getting closed + * before we return the dentry + */ + if (!tid_fd_revalidate(dentry, NULL)) + return ERR_PTR(-ENOENT); + return NULL; } static struct dentry *proc_lookupfd_common(struct inode *dir, ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 19:10 ` Al Viro @ 2012-05-18 20:49 ` Linus Torvalds 2012-05-18 21:23 ` Al Viro 0 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2012-05-18 20:49 UTC (permalink / raw) To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm On Fri, May 18, 2012 at 12:10 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Umm... I'd rather do it other way - do that test *before* bothering > with allocating an inode. IOW, use it as early cutoff, with > tid_fd_revalidate() called in the end closing any races about > file getting closed while we'd been allocating an inode, etc. I think you're optimizing the wrong case, and adding code to do so. The ENOENT case never happens in practice - there's no sane situation where you'd look up a non-existend /proc/xyz/fd/X file. So rather than "optimize" the case where you don't need an inode allocation, you're just wasting time doing the file lookup twice, and adding more code. Also, thinking some more about it: while I do agree that doing things at getattr() time is a really clean approach, we still do need the dentry revalidation for the existence check. And at that point it's actually pretty much free to just update the inode information, since we had to do the file lookup anyway. So I'm actually starting to think that I prefer my original simpler patch after all. It handles the case we care about, and doesn't add any unnecessary code. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 20:49 ` Linus Torvalds @ 2012-05-18 21:23 ` Al Viro 2012-05-18 21:26 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Al Viro @ 2012-05-18 21:23 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm On Fri, May 18, 2012 at 01:49:18PM -0700, Linus Torvalds wrote: > I think you're optimizing the wrong case, and adding code to do so. > > The ENOENT case never happens in practice - there's no sane situation > where you'd look up a non-existend /proc/xyz/fd/X file. I would agree if we only called that on lookup. We also do that on readdir(), for every descriptor in range 0..files->max_fds-1. So if you have sufficiently sparse set of descriptors, it will be called a _lot_. Moreover, that's the usual path for calling it, exactly because we do getdents before trying to open/lstat/anything else. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 21:23 ` Al Viro @ 2012-05-18 21:26 ` Linus Torvalds 2012-05-18 21:32 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2012-05-18 21:26 UTC (permalink / raw) To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm On Fri, May 18, 2012 at 2:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > I would agree if we only called that on lookup. We also do that on > readdir(), for every descriptor in range 0..files->max_fds-1. So > if you have sufficiently sparse set of descriptors, it will be > called a _lot_. Moreover, that's the usual path for calling it, > exactly because we do getdents before trying to open/lstat/anything > else. Ahh, good catch. However, wouldn't it be better to move the check you added to the readdir() path, then, so that we don't do it unnecessarily for lookups.. I'll check how that looks. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 21:26 ` Linus Torvalds @ 2012-05-18 21:32 ` Linus Torvalds 2012-05-18 22:29 ` Al Viro 0 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2012-05-18 21:32 UTC (permalink / raw) To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm On Fri, May 18, 2012 at 2:26 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > However, wouldn't it be better to move the check you added to the > readdir() path, then, so that we don't do it unnecessarily for > lookups.. > > I'll check how that looks. Actually, we already do that. It's the fcheck_files() check in proc_readfd_common(), no? So I don't think we actually call that proc_fd_instantiate() function normally with out-of-range numbers. Afaik, only a (very rare) race, or an insane lookup of non-existent names would do it, neither of which looks like anything we should worry about. Am I missing some other path? Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 21:32 ` Linus Torvalds @ 2012-05-18 22:29 ` Al Viro 2012-05-19 7:08 ` Tetsuo Handa 0 siblings, 1 reply; 20+ messages in thread From: Al Viro @ 2012-05-18 22:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm On Fri, May 18, 2012 at 02:32:58PM -0700, Linus Torvalds wrote: > On Fri, May 18, 2012 at 2:26 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > However, wouldn't it be better to move the check you added to the > > readdir() path, then, so that we don't do it unnecessarily for > > lookups.. > > > > I'll check how that looks. > > Actually, we already do that. It's the fcheck_files() check in > proc_readfd_common(), no? Missed that. Yes, you are right - there's no point keeping that in proc_fd_instantiate(). OK, messing with fcheck_files() in there dropped now... diff --git a/fs/proc/base.c b/fs/proc/base.c index 1c8b280..3986db1 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1491,6 +1491,44 @@ static const struct inode_operations proc_pid_link_inode_operations = { .setattr = proc_setattr, }; +static int proc_fd_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) +{ + struct inode *inode = dentry->d_inode; + struct files_struct *files; + struct task_struct *task; + struct file *file; + fmode_t mode = 0; + + generic_fillattr(inode, stat); + + rcu_read_lock(); + task = pid_task(proc_pid(inode), PIDTYPE_PID); + files = task ? get_files_struct(task) : NULL; + file = files ? fcheck_files(files, PROC_I(inode)->fd) : NULL; + if (file) + mode = file->f_mode; + rcu_read_unlock(); + + if (files) + put_files_struct(files); + if (!file) + return -ENOENT; + + if (mode & FMODE_READ) + stat->mode |= S_IRUSR | S_IXUSR; + if (mode & FMODE_WRITE) + stat->mode |= S_IWUSR | S_IXUSR; + stat->size = 64; + return 0; +} + +static const struct inode_operations proc_pid_fd_inode_operations = { + .readlink = proc_pid_readlink, + .follow_link = proc_pid_follow_link, + .setattr = proc_setattr, + .getattr = proc_fd_getattr, +}; + /* building an inode */ @@ -1837,54 +1875,26 @@ static struct dentry *proc_fd_instantiate(struct inode *dir, struct dentry *dentry, struct task_struct *task, const void *ptr) { unsigned fd = *(const unsigned *)ptr; - struct file *file; - struct files_struct *files; struct inode *inode; struct proc_inode *ei; - struct dentry *error = ERR_PTR(-ENOENT); inode = proc_pid_make_inode(dir->i_sb, task); if (!inode) - goto out; + return ERR_PTR(-ENOMEM); ei = PROC_I(inode); ei->fd = fd; - files = get_files_struct(task); - if (!files) - goto out_iput; inode->i_mode = S_IFLNK; - - /* - * We are not taking a ref to the file structure, so we must - * hold ->file_lock. - */ - spin_lock(&files->file_lock); - file = fcheck_files(files, fd); - if (!file) - goto out_unlock; - if (file->f_mode & FMODE_READ) - inode->i_mode |= S_IRUSR | S_IXUSR; - if (file->f_mode & FMODE_WRITE) - inode->i_mode |= S_IWUSR | S_IXUSR; - spin_unlock(&files->file_lock); - put_files_struct(files); - - inode->i_op = &proc_pid_link_inode_operations; - inode->i_size = 64; + inode->i_op = &proc_pid_fd_inode_operations; ei->op.proc_get_link = proc_fd_link; d_set_d_op(dentry, &tid_fd_dentry_operations); d_add(dentry, inode); - /* Close the race of the process dying before we return the dentry */ - if (tid_fd_revalidate(dentry, NULL)) - error = NULL; - - out: - return error; -out_unlock: - spin_unlock(&files->file_lock); - put_files_struct(files); -out_iput: - iput(inode); - goto out; + /* + * Close the race of the process dying or file getting closed + * before we return the dentry + */ + if (!tid_fd_revalidate(dentry, NULL)) + return ERR_PTR(-ENOENT); + return NULL; } static struct dentry *proc_lookupfd_common(struct inode *dir, ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: /proc/pid/fd/ shows strange mode when executed via sudo. 2012-05-18 22:29 ` Al Viro @ 2012-05-19 7:08 ` Tetsuo Handa 0 siblings, 0 replies; 20+ messages in thread From: Tetsuo Handa @ 2012-05-19 7:08 UTC (permalink / raw) To: viro, torvalds; +Cc: ebiederm, linux-fsdevel, akpm Al Viro wrote: > On Fri, May 18, 2012 at 02:32:58PM -0700, Linus Torvalds wrote: > > On Fri, May 18, 2012 at 2:26 PM, Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > > > > > > However, wouldn't it be better to move the check you added to the > > > readdir() path, then, so that we don't do it unnecessarily for > > > lookups.. > > > > > > I'll check how that looks. > > > > Actually, we already do that. It's the fcheck_files() check in > > proc_readfd_common(), no? > > Missed that. Yes, you are right - there's no point keeping that > in proc_fd_instantiate(). OK, messing with fcheck_files() in there > dropped now... OK. As of commit 30a08bf2d "proc: move fd symlink i_mode calculations into tid_fd_revalidate()", things are working as expected. Thank you. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2012-05-19 7:08 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-05-02 13:40 /proc/pid/fd/ shows strange mode when executed via sudo Tetsuo Handa 2012-05-03 15:42 ` Serge Hallyn 2012-05-03 16:25 ` Tetsuo Handa 2012-05-18 2:39 ` Tetsuo Handa 2012-05-18 9:27 ` Tetsuo Handa 2012-05-18 16:08 ` Linus Torvalds 2012-05-18 16:25 ` Linus Torvalds 2012-05-18 19:55 ` Eric W. Biederman 2012-05-18 18:08 ` Al Viro 2012-05-18 18:18 ` Linus Torvalds 2012-05-18 18:23 ` Linus Torvalds 2012-05-18 18:45 ` Al Viro 2012-05-18 18:55 ` Linus Torvalds 2012-05-18 19:10 ` Al Viro 2012-05-18 20:49 ` Linus Torvalds 2012-05-18 21:23 ` Al Viro 2012-05-18 21:26 ` Linus Torvalds 2012-05-18 21:32 ` Linus Torvalds 2012-05-18 22:29 ` Al Viro 2012-05-19 7:08 ` Tetsuo Handa
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).