* [PATCH 0/3] *at syscalls: Intro @ 2005-12-15 22:49 Ulrich Drepper 2005-12-16 0:32 ` Jeff Garzik 2005-12-16 1:13 ` Nicholas Miell 0 siblings, 2 replies; 10+ messages in thread From: Ulrich Drepper @ 2005-12-15 22:49 UTC (permalink / raw) To: linux-kernel; +Cc: akpm, torvalds Here is a series of patches which introduce in total 11 new system calls which take a file descriptor/filename pair instead of a single file name. These functions, openat etc, have been discussed on numerous occasions. They are needed to implement race-free filesystem traversal, they are necessary to implement a virtual per-thread current working directory (think multi-threaded backup software), etc. We have in glibc today implementations of the interfaces which use the /proc/self/fd magic. But this code is rather expensive. Here are some results (similar to what Jim Meyering posted before): The test creates a deep directory hierarchy on a tmpfs filesystem. Then rm -fr is used to remove all directories. Without syscall support I get this: real 0m31.921s user 0m0.688s sys 0m31.234s With syscall support the results are much better: real 0m20.699s user 0m0.536s sys 0m20.149s The implemenation is really small: arch/i386/kernel/syscall_table.S | 11 ++ arch/x86_64/ia32/ia32entry.S | 11 ++ fs/compat.c | 48 +++++++++-- fs/exec.c | 2 fs/namei.c | 167 ++++++++++++++++++++++++++++++--------- fs/open.c | 60 +++++++++++--- fs/stat.c | 54 ++++++++++-- include/asm-i386/unistd.h | 13 ++- include/asm-x86_64/ia32_unistd.h | 13 ++- include/asm-x86_64/unistd.h | 24 +++++ include/linux/fcntl.h | 7 + include/linux/fs.h | 7 + include/linux/namei.h | 7 - include/linux/time.h | 2 14 files changed, 355 insertions(+), 71 deletions(-) I've split the patch in three parts. The first part is the actual code change. It mostly consists of passing down an additional parameter with a file descriptor and add wrapper functions which pass down the default parameter AT_FDCWD. Three new constants are defined in <linux/fcntl.h> which must correspond to the values already in use in glibc. In a few cases I've modified some code which would not necesarily need changing but the change makes it a bit more efficient in presence of the wrapper functions. The real change needed is the additional else-clause in what is now do_path_lookup. That's it. The other two patches contain the syscall definitions for x86 and x86-64. I've tested the code on x86-64, including the ia32 compat code. Because there is no architecture specific change all should work well on other archs once the syscalls are added. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] *at syscalls: Intro 2005-12-15 22:49 [PATCH 0/3] *at syscalls: Intro Ulrich Drepper @ 2005-12-16 0:32 ` Jeff Garzik 2005-12-16 1:29 ` Ulrich Drepper 2005-12-16 11:32 ` Jim Meyering 2005-12-16 1:13 ` Nicholas Miell 1 sibling, 2 replies; 10+ messages in thread From: Jeff Garzik @ 2005-12-16 0:32 UTC (permalink / raw) To: Ulrich Drepper; +Cc: linux-kernel, akpm, torvalds Ulrich Drepper wrote: > Here is a series of patches which introduce in total 11 new system calls > which take a file descriptor/filename pair instead of a single file name. > These functions, openat etc, have been discussed on numerous occasions. > They are needed to implement race-free filesystem traversal, they are > necessary to implement a virtual per-thread current working directory > (think multi-threaded backup software), etc. > > We have in glibc today implementations of the interfaces which use the > /proc/self/fd magic. But this code is rather expensive. Here are some > results (similar to what Jim Meyering posted before): > > The test creates a deep directory hierarchy on a tmpfs filesystem. > Then rm -fr is used to remove all directories. Without syscall support > I get this: > > real 0m31.921s > user 0m0.688s > sys 0m31.234s > > With syscall support the results are much better: > > real 0m20.699s > user 0m0.536s > sys 0m20.149s > > > The implemenation is really small: > > arch/i386/kernel/syscall_table.S | 11 ++ > arch/x86_64/ia32/ia32entry.S | 11 ++ > fs/compat.c | 48 +++++++++-- > fs/exec.c | 2 > fs/namei.c | 167 ++++++++++++++++++++++++++++++--------- > fs/open.c | 60 +++++++++++--- > fs/stat.c | 54 ++++++++++-- > include/asm-i386/unistd.h | 13 ++- > include/asm-x86_64/ia32_unistd.h | 13 ++- > include/asm-x86_64/unistd.h | 24 +++++ > include/linux/fcntl.h | 7 + > include/linux/fs.h | 7 + > include/linux/namei.h | 7 - > include/linux/time.h | 2 > 14 files changed, 355 insertions(+), 71 deletions(-) > > I've split the patch in three parts. The first part is the actual > code change. It mostly consists of passing down an additional parameter > with a file descriptor and add wrapper functions which pass down the > default parameter AT_FDCWD. Three new constants are defined in > <linux/fcntl.h> which must correspond to the values already in use > in glibc. In a few cases I've modified some code which would not > necesarily need changing but the change makes it a bit more efficient > in presence of the wrapper functions. > > The real change needed is the additional else-clause in what is now > do_path_lookup. That's it. Definitely gets my ACK, for my own motivations: I want to create race-free cp(1)/mv(1)/rm(1) utilities for my posixutils project. It would be nice to add the additional argument to path_lookup(), rather than calling do_path_lookup() everywhere. Jeff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] *at syscalls: Intro 2005-12-16 0:32 ` Jeff Garzik @ 2005-12-16 1:29 ` Ulrich Drepper 2005-12-16 1:39 ` Jeff Garzik 2005-12-16 11:32 ` Jim Meyering 1 sibling, 1 reply; 10+ messages in thread From: Ulrich Drepper @ 2005-12-16 1:29 UTC (permalink / raw) To: Jeff Garzik; +Cc: linux-kernel, akpm, torvalds Jeff Garzik wrote: > It would be nice to add the additional argument to path_lookup(), rather > than calling do_path_lookup() everywhere. path_lookup is unfortunately exported. If breaking the ABI is no issue I'll change it. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] *at syscalls: Intro 2005-12-16 1:29 ` Ulrich Drepper @ 2005-12-16 1:39 ` Jeff Garzik 0 siblings, 0 replies; 10+ messages in thread From: Jeff Garzik @ 2005-12-16 1:39 UTC (permalink / raw) To: Ulrich Drepper; +Cc: linux-kernel, akpm, torvalds Ulrich Drepper wrote: > Jeff Garzik wrote: > >> It would be nice to add the additional argument to path_lookup(), >> rather than calling do_path_lookup() everywhere. > > > path_lookup is unfortunately exported. If breaking the ABI is no issue > I'll change it. Yep, I saw it was exported. I would prefer to change the API, but my preference doesn't carry much weight: I have no idea if there are any important out-of-tree users or not. Jeff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] *at syscalls: Intro 2005-12-16 0:32 ` Jeff Garzik 2005-12-16 1:29 ` Ulrich Drepper @ 2005-12-16 11:32 ` Jim Meyering 2005-12-16 16:24 ` Ulrich Drepper 1 sibling, 1 reply; 10+ messages in thread From: Jim Meyering @ 2005-12-16 11:32 UTC (permalink / raw) To: Jeff Garzik; +Cc: Ulrich Drepper, linux-kernel, akpm, torvalds FYI, the rm in coreutils-cvs is finally thread-safe and race-free, when using openat et al. On 12/16/05, Jeff Garzik <jgarzik@pobox.com> wrote: > Definitely gets my ACK, for my own motivations: I want to create > race-free cp(1)/mv(1)/rm(1) utilities for my posixutils project. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] *at syscalls: Intro 2005-12-16 11:32 ` Jim Meyering @ 2005-12-16 16:24 ` Ulrich Drepper 2005-12-16 16:36 ` Jakub Jelinek 2005-12-16 19:59 ` Jim Meyering 0 siblings, 2 replies; 10+ messages in thread From: Ulrich Drepper @ 2005-12-16 16:24 UTC (permalink / raw) To: Jim Meyering; +Cc: Jeff Garzik, linux-kernel, akpm, torvalds Jim Meyering wrote: > FYI, the rm in coreutils-cvs is finally thread-safe and race-free, > when using openat et al. Actually, Jim, I doubt it. There is one more race which cannot be solved with the existing interfaces. I want to tackle this next, after these changes are decided on. The problem is directory creation and then populating it. As in cp -r and any backup tool. You currently have to use (at best) mkdirat(fd, "some-dir", 0666); dfd = openat(fd, "some-dir", O_RDONLY); What is needed is a way to create a new directory and return a file descriptor for it. I was thinking about using dfd = openat(fd, "some-dir", O_RDONLY|O_DIRECTORY|O_CREAT, S_IFDIR|0666) where the combination of using O_DIRECTORY, O_RDONLY, O_CREAT, and the S_IFDIR flag can be recognized. This is a configuration which cannot be used successfully in current code. Should probably also work with open(). -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] *at syscalls: Intro 2005-12-16 16:24 ` Ulrich Drepper @ 2005-12-16 16:36 ` Jakub Jelinek 2005-12-16 19:59 ` Jim Meyering 1 sibling, 0 replies; 10+ messages in thread From: Jakub Jelinek @ 2005-12-16 16:36 UTC (permalink / raw) To: Ulrich Drepper; +Cc: Jim Meyering, Jeff Garzik, linux-kernel, akpm, torvalds On Fri, Dec 16, 2005 at 08:24:53AM -0800, Ulrich Drepper wrote: > Jim Meyering wrote: > >FYI, the rm in coreutils-cvs is finally thread-safe and race-free, > >when using openat et al. > > Actually, Jim, I doubt it. There is one more race which cannot be > solved with the existing interfaces. I want to tackle this next, after > these changes are decided on. > > The problem is directory creation and then populating it. As in cp -r > and any backup tool. You currently have to use (at best) > > mkdirat(fd, "some-dir", 0666); > dfd = openat(fd, "some-dir", O_RDONLY); > > What is needed is a way to create a new directory and return a file > descriptor for it. > > I was thinking about using > > dfd = openat(fd, "some-dir", O_RDONLY|O_DIRECTORY|O_CREAT, S_IFDIR|0666) > > where the combination of using O_DIRECTORY, O_RDONLY, O_CREAT, and the > S_IFDIR flag can be recognized. This is a configuration which cannot be > used successfully in current code. Should probably also work with open(). At least for prelink I'd also like to be able to atomically read/modify/write a file, but the Solaris renameat doesn't allow that. What I do is open the original file, read it etc., then create a temporary file in the same dir and as the final step I need to rename it over the original file, but I'd like to do that only if nobody replaced the original file in the mean time. So a renameat variant that would have olddirfd, oldname, oldfd, newdirfd, newname, newfd arguments would be handy. It would do the same thing as renameat (olddirfd, oldname, newdirfd, newname), but would fail if oldfd resp. newfd no longer corresponds to the olddirfd/oldname resp. newdirfd/newname files. Jakub ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] *at syscalls: Intro 2005-12-16 16:24 ` Ulrich Drepper 2005-12-16 16:36 ` Jakub Jelinek @ 2005-12-16 19:59 ` Jim Meyering 1 sibling, 0 replies; 10+ messages in thread From: Jim Meyering @ 2005-12-16 19:59 UTC (permalink / raw) To: Ulrich Drepper; +Cc: Jeff Garzik, linux-kernel, akpm, torvalds On 12/16/05, Ulrich Drepper <drepper@redhat.com> wrote: > Jim Meyering wrote: > > FYI, the rm in coreutils-cvs is finally thread-safe and race-free, > > when using openat et al. > > Actually, Jim, I doubt it. You seem to have misread. I mentioned only `rm'. I know full well that the other dir-traversing tools are not as robust. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] *at syscalls: Intro 2005-12-15 22:49 [PATCH 0/3] *at syscalls: Intro Ulrich Drepper 2005-12-16 0:32 ` Jeff Garzik @ 2005-12-16 1:13 ` Nicholas Miell 2005-12-16 17:51 ` Linus Torvalds 1 sibling, 1 reply; 10+ messages in thread From: Nicholas Miell @ 2005-12-16 1:13 UTC (permalink / raw) To: Ulrich Drepper; +Cc: linux-kernel, akpm, torvalds On Thu, 2005-12-15 at 17:49 -0500, Ulrich Drepper wrote: > Here is a series of patches which introduce in total 11 new system calls > which take a file descriptor/filename pair instead of a single file name. > These functions, openat etc, have been discussed on numerous occasions. > They are needed to implement race-free filesystem traversal, they are > necessary to implement a virtual per-thread current working directory > (think multi-threaded backup software), etc. > Actually, that last part is false (or maybe just misleading). You can create threads without CLONE_FS to get a per-thread cwd/chroot/umask, no "virtual" required. Don't take this as an objection to implementation of the *at() syscalls in Linux, though; rather, look at is as a request for the addition of int pthread_attr_setfssharing_np(pthread_attr_t *attr, int share) and int pthread_attr_getfssharing_np(pthread_attr_t *attr) to glibc. -- Nicholas Miell <nmiell@comcast.net> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] *at syscalls: Intro 2005-12-16 1:13 ` Nicholas Miell @ 2005-12-16 17:51 ` Linus Torvalds 0 siblings, 0 replies; 10+ messages in thread From: Linus Torvalds @ 2005-12-16 17:51 UTC (permalink / raw) To: Nicholas Miell; +Cc: Ulrich Drepper, linux-kernel, akpm On Thu, 15 Dec 2005, Nicholas Miell wrote: > > Don't take this as an objection to implementation of the *at() syscalls > in Linux, though; rather, look at is as a request for the addition of > int pthread_attr_setfssharing_np(pthread_attr_t *attr, int share) and > int pthread_attr_getfssharing_np(pthread_attr_t *attr) to glibc. I don't think separate fs/files makes sense with pthreads. Why? Signals. Signals are shared in a pthread environment, so signal handlers can happen in any thread. If you don't share the filesystem thing, your signal handlers had better not do any file operations. You'd better not try to use the old "send a byte down a pipe" trick to wake somebody else up, because the thread you may take the signal in may not _have_ that pipe fd open. Now, opening files in a signal handler may be unusual enough that having separate cwd/root (!CLONE_FS as opposed to !CLONE_FILES) might be more commonly usable, but it's still potentially asking for some really funky behaviour. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2005-12-16 19:59 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-12-15 22:49 [PATCH 0/3] *at syscalls: Intro Ulrich Drepper 2005-12-16 0:32 ` Jeff Garzik 2005-12-16 1:29 ` Ulrich Drepper 2005-12-16 1:39 ` Jeff Garzik 2005-12-16 11:32 ` Jim Meyering 2005-12-16 16:24 ` Ulrich Drepper 2005-12-16 16:36 ` Jakub Jelinek 2005-12-16 19:59 ` Jim Meyering 2005-12-16 1:13 ` Nicholas Miell 2005-12-16 17:51 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox