* [GIT PULL] overlay filesystem v25 @ 2014-10-23 23:25 Miklos Szeredi 2014-10-24 2:20 ` Al Viro 0 siblings, 1 reply; 16+ messages in thread From: Miklos Szeredi @ 2014-10-23 23:25 UTC (permalink / raw) To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel, linux-kernel, linux-unionfs Linus, Please pull git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25 Plenty of bugs fixed relative to the previous version, thanks to Al Viro's review and is now officially be BugFree(TM). Also passes the unionmount-testsuite by David Howells. Thanks, Miklos --- Andy Whitcroft (1): overlayfs: add statfs support Erez Zadok (1): overlayfs: implement show_options Miklos Szeredi (11): vfs: add i_op->dentry_open() vfs: export do_splice_direct() to modules vfs: export __inode_permission() to modules vfs: introduce clone_private_mount() vfs: export check_sticky() vfs: add whiteout support vfs: add RENAME_WHITEOUT ext4: support RENAME_WHITEOUT shmem: support RENAME_WHITEOUT overlay filesystem fs: limit filesystem stacking depth Neil Brown (1): overlay: overlay filesystem documentation --- Documentation/filesystems/Locking | 2 + Documentation/filesystems/overlayfs.txt | 198 +++++++ Documentation/filesystems/vfs.txt | 7 + MAINTAINERS | 7 + fs/Kconfig | 1 + fs/Makefile | 1 + fs/btrfs/ioctl.c | 20 +- fs/ecryptfs/main.c | 7 + fs/ext4/namei.c | 95 +++- fs/internal.h | 7 - fs/namei.c | 41 +- fs/namespace.c | 27 + fs/open.c | 23 +- fs/overlayfs/Kconfig | 10 + fs/overlayfs/Makefile | 7 + fs/overlayfs/copy_up.c | 414 ++++++++++++++ fs/overlayfs/dir.c | 921 ++++++++++++++++++++++++++++++++ fs/overlayfs/inode.c | 425 +++++++++++++++ fs/overlayfs/overlayfs.h | 191 +++++++ fs/overlayfs/readdir.c | 587 ++++++++++++++++++++ fs/overlayfs/super.c | 796 +++++++++++++++++++++++++++ fs/splice.c | 1 + include/linux/fs.h | 39 ++ include/linux/mount.h | 3 + include/uapi/linux/fs.h | 1 + mm/shmem.c | 36 +- 26 files changed, 3809 insertions(+), 58 deletions(-) create mode 100644 Documentation/filesystems/overlayfs.txt create mode 100644 fs/overlayfs/Kconfig create mode 100644 fs/overlayfs/Makefile create mode 100644 fs/overlayfs/copy_up.c create mode 100644 fs/overlayfs/dir.c create mode 100644 fs/overlayfs/inode.c create mode 100644 fs/overlayfs/overlayfs.h create mode 100644 fs/overlayfs/readdir.c create mode 100644 fs/overlayfs/super.c ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-23 23:25 [GIT PULL] overlay filesystem v25 Miklos Szeredi @ 2014-10-24 2:20 ` Al Viro 2014-10-24 3:24 ` Al Viro 0 siblings, 1 reply; 16+ messages in thread From: Al Viro @ 2014-10-24 2:20 UTC (permalink / raw) To: Miklos Szeredi; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel, linux-unionfs On Fri, Oct 24, 2014 at 01:25:39AM +0200, Miklos Szeredi wrote: > Linus, > > Please pull > > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25 > > Plenty of bugs fixed relative to the previous version, thanks to Al Viro's > review and is now officially be BugFree(TM). Also passes the > unionmount-testsuite by David Howells. *blink* Why the hell do you hold ->i_mutex across the entire opening of underlying directory? All you need is to serialize one assignment; the side that loses the race will simply fput() what it opened... Oh, well - that goes under "weird pessimisations, easy to fix in followups"... ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-24 2:20 ` Al Viro @ 2014-10-24 3:24 ` Al Viro 2014-10-24 7:24 ` Miklos Szeredi 2014-10-24 7:28 ` Geert Uytterhoeven 0 siblings, 2 replies; 16+ messages in thread From: Al Viro @ 2014-10-24 3:24 UTC (permalink / raw) To: Linus Torvalds; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, linux-unionfs On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote: > Why the hell do you hold ->i_mutex across the entire opening of underlying > directory? All you need is to serialize one assignment; the side that loses > the race will simply fput() what it opened... > > Oh, well - that goes under "weird pessimisations, easy to fix in followups"... OK, pulled into vfs.git, followups in question added. Also there: fix for a long-standing leak in d_splice_alias() failure exits. Guys, could you check that current vfs.git#for-linus survives your local tests? Seems to survive here; if I don't hear of any problems by tomorrow morning, to Linus it goes... FWIW, for that pull request stats would be Shortlog: Al Viro (5): fix inode leaks on d_splice_alias() failure exits overlayfs: don't hold ->i_mutex over opening the real directory overlayfs: make ovl_cache_entry->name an array instead of pointer overlayfs: embed root into overlay_readdir_data overlayfs: embed middle into overlay_readdir_data Andy Whitcroft (1): overlayfs: add statfs support Erez Zadok (1): overlayfs: implement show_options Miklos Szeredi (11): vfs: add i_op->dentry_open() vfs: export do_splice_direct() to modules vfs: export __inode_permission() to modules vfs: introduce clone_private_mount() vfs: export check_sticky() vfs: add whiteout support vfs: add RENAME_WHITEOUT ext4: support RENAME_WHITEOUT shmem: support RENAME_WHITEOUT overlay filesystem fs: limit filesystem stacking depth Neil Brown (1): overlay: overlay filesystem documentation Diffstat: Documentation/filesystems/Locking | 2 + Documentation/filesystems/overlayfs.txt | 198 +++++++ Documentation/filesystems/vfs.txt | 7 + MAINTAINERS | 7 + fs/Kconfig | 1 + fs/Makefile | 1 + fs/btrfs/ioctl.c | 20 +- fs/dcache.c | 2 + fs/ecryptfs/main.c | 7 + fs/ext4/namei.c | 95 +++- fs/internal.h | 7 - fs/namei.c | 41 +- fs/namespace.c | 27 + fs/open.c | 23 +- fs/overlayfs/Kconfig | 10 + fs/overlayfs/Makefile | 7 + fs/overlayfs/copy_up.c | 414 ++++++++++++++ fs/overlayfs/dir.c | 921 +++++++++++++++++++++++++++++++ fs/overlayfs/inode.c | 425 ++++++++++++++ fs/overlayfs/overlayfs.h | 191 +++++++ fs/overlayfs/readdir.c | 589 ++++++++++++++++++++ fs/overlayfs/super.c | 796 ++++++++++++++++++++++++++ fs/splice.c | 1 + include/linux/fs.h | 39 ++ include/linux/mount.h | 3 + include/uapi/linux/fs.h | 1 + mm/shmem.c | 36 +- 27 files changed, 3813 insertions(+), 58 deletions(-) create mode 100644 Documentation/filesystems/overlayfs.txt create mode 100644 fs/overlayfs/Kconfig create mode 100644 fs/overlayfs/Makefile create mode 100644 fs/overlayfs/copy_up.c create mode 100644 fs/overlayfs/dir.c create mode 100644 fs/overlayfs/inode.c create mode 100644 fs/overlayfs/overlayfs.h create mode 100644 fs/overlayfs/readdir.c create mode 100644 fs/overlayfs/super.c ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-24 3:24 ` Al Viro @ 2014-10-24 7:24 ` Miklos Szeredi 2014-10-25 8:18 ` Al Viro 2014-10-24 7:28 ` Geert Uytterhoeven 1 sibling, 1 reply; 16+ messages in thread From: Miklos Szeredi @ 2014-10-24 7:24 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs On Fri, Oct 24, 2014 at 5:24 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote: >> Why the hell do you hold ->i_mutex across the entire opening of underlying >> directory? All you need is to serialize one assignment; the side that loses >> the race will simply fput() what it opened... The reason I didn't do your "fix" is that it - adds more lines than it takes, - I wasn't sure at all if the lockless access is actually correct without the ACCESS_ONCE and all the memory barrier magic that might be necessary on weird architectures. The rest of the changes look OK. Thanks, Miklos ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-24 7:24 ` Miklos Szeredi @ 2014-10-25 8:18 ` Al Viro 2014-10-25 9:53 ` Miklos Szeredi 0 siblings, 1 reply; 16+ messages in thread From: Al Viro @ 2014-10-25 8:18 UTC (permalink / raw) To: Miklos Szeredi Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote: > The reason I didn't do your "fix" is that it > > - adds more lines than it takes, > > - I wasn't sure at all if the lockless access is actually correct > without the ACCESS_ONCE and all the memory barrier magic that might be > necessary on weird architectures. _What_ lockless accesses? There is an extremely embarrassing bug in that commit, all right, but it has nothing to do with barriers... All barrier-related issues are taken care of by ovl_path_upper() (and without that you'd have tons of worse problems). Fetching ->upperfile outside of ->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker grab ->i_mutex and find out that it has already been taken care of. In which case we fput() what we'd opened and move on (fput() under ->i_mutex is fine - it's going to be delayed until return from syscall anyway). There was a very dumb braino in there; fixed, force-pushed, passes unionmount tests, no regressions on LTP syscall ones and xfstests. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-25 8:18 ` Al Viro @ 2014-10-25 9:53 ` Miklos Szeredi 2014-10-25 17:06 ` Al Viro 0 siblings, 1 reply; 16+ messages in thread From: Miklos Szeredi @ 2014-10-25 9:53 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs On Sat, Oct 25, 2014 at 10:18 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote: > >> The reason I didn't do your "fix" is that it >> >> - adds more lines than it takes, >> >> - I wasn't sure at all if the lockless access is actually correct >> without the ACCESS_ONCE and all the memory barrier magic that might be >> necessary on weird architectures. > > _What_ lockless accesses? There is an extremely embarrassing bug in that > commit, all right, but it has nothing to do with barriers... All > barrier-related issues are taken care of by ovl_path_upper() (and without > that you'd have tons of worse problems). Fetching ->upperfile outside of > ->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker > grab ->i_mutex and find out that it has already been taken care of. > In which case we fput() what we'd opened and move on (fput() under > ->i_mutex is fine - it's going to be delayed until return from syscall > anyway). Yes, but it's not about race with copy-up (which the ovl_path_upper() protects against), but race of two fsync calls with each other. If there's no synchronization between them, then that od->upperfile does indeed count as lockless access, no matter that the assignment was done under lock. Thanks, Miklos ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-25 9:53 ` Miklos Szeredi @ 2014-10-25 17:06 ` Al Viro 2014-10-27 8:06 ` Miklos Szeredi 0 siblings, 1 reply; 16+ messages in thread From: Al Viro @ 2014-10-25 17:06 UTC (permalink / raw) To: Miklos Szeredi Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote: > Yes, but it's not about race with copy-up (which the ovl_path_upper() > protects against), but race of two fsync calls with each other. If > there's no synchronization between them, then that od->upperfile does > indeed count as lockless access, no matter that the assignment was > done under lock. p = global; if (!p) { // outside of lock p = alloc(); grab lock if (!global) { global = p; } else { destroy(p); p = global; } drop lock } is a very common pattern, especially if you look for cases when lock is a spinlock and allocation is blocking (in those cases you'll often see destroy() part done after dropping the lock; that's where what I fucked up in what I'd originally pushed. And it wasn't even needed - fput() under ->i_mutex is OK...) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-25 17:06 ` Al Viro @ 2014-10-27 8:06 ` Miklos Szeredi 2014-10-27 15:59 ` Paul E. McKenney 0 siblings, 1 reply; 16+ messages in thread From: Miklos Szeredi @ 2014-10-27 8:06 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs, Paul E. McKenney [Paul McKenney added to CC] On Sat, Oct 25, 2014 at 7:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote: > >> Yes, but it's not about race with copy-up (which the ovl_path_upper() >> protects against), but race of two fsync calls with each other. If >> there's no synchronization between them, then that od->upperfile does >> indeed count as lockless access, no matter that the assignment was >> done under lock. > > p = global; > if (!p) { // outside of lock > p = alloc(); > grab lock > if (!global) { > global = p; > } else { > destroy(p); > p = global; > } > drop lock > } > is a very common pattern, especially if you look for cases when lock is > a spinlock and allocation is blocking (in those cases you'll often see > destroy() part done after dropping the lock; that's where what I fucked up in > what I'd originally pushed. And it wasn't even needed - fput() under > ->i_mutex is OK...) Being a very common pattern does not automatically make it correct... My understanding of these issues is very limited, but it's not clear to me what will order initialization of members of p with the storing of p into global. E.g. we start out with global == NULL and p->foo == 0. CPU1: p->foo = 1 grab lock if (!global) global = p CPU1: p = global if (p) q = p->foo Is it guaranteed that the above sequence (as is, without any barriers or ACCESS_ONCE() other than the lock acquisition) will result in q == 1 if p != NULL? Thanks, Miklos ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-27 8:06 ` Miklos Szeredi @ 2014-10-27 15:59 ` Paul E. McKenney 2014-10-27 17:28 ` Al Viro 0 siblings, 1 reply; 16+ messages in thread From: Paul E. McKenney @ 2014-10-27 15:59 UTC (permalink / raw) To: Miklos Szeredi Cc: Al Viro, Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs On Mon, Oct 27, 2014 at 09:06:54AM +0100, Miklos Szeredi wrote: > [Paul McKenney added to CC] > > On Sat, Oct 25, 2014 at 7:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote: > > > >> Yes, but it's not about race with copy-up (which the ovl_path_upper() > >> protects against), but race of two fsync calls with each other. If > >> there's no synchronization between them, then that od->upperfile does > >> indeed count as lockless access, no matter that the assignment was > >> done under lock. > > > > p = global; > > if (!p) { // outside of lock > > p = alloc(); > > grab lock > > if (!global) { > > global = p; > > } else { > > destroy(p); > > p = global; > > } > > drop lock > > } > > is a very common pattern, especially if you look for cases when lock is > > a spinlock and allocation is blocking (in those cases you'll often see > > destroy() part done after dropping the lock; that's where what I fucked up in > > what I'd originally pushed. And it wasn't even needed - fput() under > > ->i_mutex is OK...) > > Being a very common pattern does not automatically make it correct... > > My understanding of these issues is very limited, but it's not clear > to me what will order initialization of members of p with the storing > of p into global. E.g. we start out with global == NULL and p->foo == > 0. > > CPU1: > p->foo = 1 > grab lock > if (!global) > global = p > > CPU1: If it is all the same to you, I will call this CPU2 to distinguish it from the first CPU1. ;-) > p = global > if (p) > q = p->foo > > Is it guaranteed that the above sequence (as is, without any barriers > or ACCESS_ONCE() other than the lock acquisition) will result in q == > 1 if p != NULL? Indeed, life is hard here. Keep in mind that lock acquisition is not guaranteed to prevent prior operations from being reordered into the critical section, possibly as follows: CPU1: grab lock if (!global) global = p; /* Assume all of CPU2's accesses happen here. */ p->foo = 1; This clearly allows CPU2 to execute as follows: CPU2: p = global; /* gets p */ if (p) /* non-NULL */ q = p->foo; /* might not be 1 */ Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's accesses can be misordered. You need rcu_dereference() or the combination of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue. As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for more info. So no, there is no guarantee. I am assuming that the lock grabbed by CPU1 guards all assignments to "global", otherwise the code needs further help. I am further assuming that the memory pointed to by CPU1's "p" is inaccessible to any other CPU, as in CPU1 just allocated the memory. Otherwise, the assignment "p->foo = 1" is questionable. And finally, I am assuming that p->foo stays constant once it has been made accessible to readers. But the following should work: CPU1: p->foo = 1; /* Assumes p is local. */ smp_mb__before_spinlock(); grab lock if (!global) /* Assumes lock protects all assignments to global. */ global = p; CPU2: p = rcu_dereference(global); if (p) q = p->foo; /* Assumes p->foo constant once visible to readers. */ /* Also assumes p and q are local. */ If the assumptions called out in the comments do not hold, you at least need ACCESS_ONCE(), and perhaps even more synchronization. For more info on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/. Thanx, Paul ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-27 15:59 ` Paul E. McKenney @ 2014-10-27 17:28 ` Al Viro 2014-10-27 17:36 ` Paul E. McKenney 0 siblings, 1 reply; 16+ messages in thread From: Al Viro @ 2014-10-27 17:28 UTC (permalink / raw) To: Paul E. McKenney Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote: > Indeed, life is hard here. Keep in mind that lock acquisition is not > guaranteed to prevent prior operations from being reordered into the > critical section, possibly as follows: > > CPU1: > grab lock > if (!global) > global = p; > /* Assume all of CPU2's accesses happen here. */ > p->foo = 1; A bit of context: p is a local pointer to struct file here and alloc is opening it... > This clearly allows CPU2 to execute as follows: > > CPU2: > p = global; /* gets p */ > if (p) /* non-NULL */ > q = p->foo; /* might not be 1 */ > > Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's > accesses can be misordered. You need rcu_dereference() or the combination > of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue. > As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for > more info. > > So no, there is no guarantee. I am assuming that the lock grabbed by > CPU1 guards all assignments to "global", otherwise the code needs further > help. I am further assuming that the memory pointed to by CPU1's "p" > is inaccessible to any other CPU, as in CPU1 just allocated the memory. > Otherwise, the assignment "p->foo = 1" is questionable. And finally, > I am assuming that p->foo stays constant once it has been made > accessible to readers. > > But the following should work: > > CPU1: > p->foo = 1; /* Assumes p is local. */ > smp_mb__before_spinlock(); > grab lock > if (!global) /* Assumes lock protects all assignments to global. */ > global = p; > > CPU2: > p = rcu_dereference(global); > if (p) > q = p->foo; /* Assumes p->foo constant once visible to readers. */ > /* Also assumes p and q are local. */ > > If the assumptions called out in the comments do not hold, you at least > need ACCESS_ONCE(), and perhaps even more synchronization. For more info > on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/. First of all, this "p->foo = 1" is a shorthand for initialization of struct file done by opening a file. What you are saying is that it can leak past the point where we stick a pointer to freshly opened file into a place where another CPU can see it, but not past the barrier in dropping the lock, right? And you are suggesting rcu_dereference() as a way to bring the required barriers in. Ouch... The names are really bad, but there's another fun issue - rcu_dereference brings in sparse noise. Wouldn't direct use of smp_read_barrier_depends() be cleaner, anyway? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-27 17:28 ` Al Viro @ 2014-10-27 17:36 ` Paul E. McKenney 2014-10-28 1:12 ` Al Viro 0 siblings, 1 reply; 16+ messages in thread From: Paul E. McKenney @ 2014-10-27 17:36 UTC (permalink / raw) To: Al Viro Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs On Mon, Oct 27, 2014 at 05:28:01PM +0000, Al Viro wrote: > On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote: > > > Indeed, life is hard here. Keep in mind that lock acquisition is not > > guaranteed to prevent prior operations from being reordered into the > > critical section, possibly as follows: > > > > CPU1: > > grab lock > > if (!global) > > global = p; > > /* Assume all of CPU2's accesses happen here. */ > > p->foo = 1; > > A bit of context: p is a local pointer to struct file here and alloc is > opening it... OK, good to know. ;-) > > This clearly allows CPU2 to execute as follows: > > > > CPU2: > > p = global; /* gets p */ > > if (p) /* non-NULL */ > > q = p->foo; /* might not be 1 */ > > > > Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's > > accesses can be misordered. You need rcu_dereference() or the combination > > of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue. > > As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for > > more info. > > > > So no, there is no guarantee. I am assuming that the lock grabbed by > > CPU1 guards all assignments to "global", otherwise the code needs further > > help. I am further assuming that the memory pointed to by CPU1's "p" > > is inaccessible to any other CPU, as in CPU1 just allocated the memory. > > Otherwise, the assignment "p->foo = 1" is questionable. And finally, > > I am assuming that p->foo stays constant once it has been made > > accessible to readers. > > > > But the following should work: > > > > CPU1: > > p->foo = 1; /* Assumes p is local. */ > > smp_mb__before_spinlock(); > > grab lock > > if (!global) /* Assumes lock protects all assignments to global. */ > > global = p; > > > > CPU2: > > p = rcu_dereference(global); > > if (p) > > q = p->foo; /* Assumes p->foo constant once visible to readers. */ > > /* Also assumes p and q are local. */ > > > > If the assumptions called out in the comments do not hold, you at least > > need ACCESS_ONCE(), and perhaps even more synchronization. For more info > > on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/. > > First of all, this "p->foo = 1" is a shorthand for initialization of > struct file done by opening a file. What you are saying is that it > can leak past the point where we stick a pointer to freshly opened > file into a place where another CPU can see it, but not past the > barrier in dropping the lock, right? Exactly! I should also add that smp_mb__before_spinlock() implies smp_wmb(), but nothing more. But that is OK because smp_wmb() suffices in this case. > And you are suggesting rcu_dereference() as a way to bring the required > barriers in. Ouch... The names are really bad, but there's another > fun issue - rcu_dereference brings in sparse noise. Wouldn't direct use > of smp_read_barrier_depends() be cleaner, anyway? Code making direct use of smp_read_barrier_depends() is harder to read, in my experience, but good point on the sparse noise. Maybe a new lockless_dereference() primitive? Maybe something like the following? (Untested, probably does not even build.) #define lockless_dereference(p) \ ({ \ typeof(*p) *_________p1 = ACCESS_ONCE(p); \ smp_read_barrier_depends(); /* Dependency order vs. p above. */ \ _________p1; \ }) Thanx, Paul ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-27 17:36 ` Paul E. McKenney @ 2014-10-28 1:12 ` Al Viro 2014-10-28 4:11 ` Paul E. McKenney 0 siblings, 1 reply; 16+ messages in thread From: Al Viro @ 2014-10-28 1:12 UTC (permalink / raw) To: Paul E. McKenney Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote: > Code making direct use of smp_read_barrier_depends() is harder to read, > in my experience, but good point on the sparse noise. Maybe a new > lockless_dereference() primitive? Maybe something like the following? > (Untested, probably does not even build.) > > #define lockless_dereference(p) \ > ({ \ > typeof(*p) *_________p1 = ACCESS_ONCE(p); \ > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \ > _________p1; \ > }) Hmm... Where would you prefer to put it? rcupdate.h? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-28 1:12 ` Al Viro @ 2014-10-28 4:11 ` Paul E. McKenney 2014-10-28 22:55 ` Al Viro 0 siblings, 1 reply; 16+ messages in thread From: Paul E. McKenney @ 2014-10-28 4:11 UTC (permalink / raw) To: Al Viro Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote: > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote: > > Code making direct use of smp_read_barrier_depends() is harder to read, > > in my experience, but good point on the sparse noise. Maybe a new > > lockless_dereference() primitive? Maybe something like the following? > > (Untested, probably does not even build.) > > > > #define lockless_dereference(p) \ > > ({ \ > > typeof(*p) *_________p1 = ACCESS_ONCE(p); \ > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \ > > _________p1; \ > > }) > > Hmm... Where would you prefer to put it? rcupdate.h? Good a place as any, I guess. Please see patch below. Left to myself, I would send this along for the next merge window, but please let me know if you would like it sooner. Thanx, Paul ------------------------------------------------------------------------ rcu: Provide counterpart to rcu_dereference() for non-RCU situations Although rcu_dereference() and friends can be used in situations where object lifetimes are being managed by something other than RCU, the resulting sparse and lockdep-RCU noise can be annoying. This commit therefore supplies a lockless_dereference(), which provides the protection for dereferences without the RCU-related debugging noise. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 9575c2d403b5..ed4f5939a452 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -617,6 +617,21 @@ static inline void rcu_preempt_sleep_check(void) #define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v) /** + * lockless_dereference() - safely load a pointer for later dereference + * @p: The pointer to load + * + * Similar to rcu_dereference(), but for situations where the pointed-to + * object's lifetime is managed by something other than RCU. That + * "something other" might be reference counting or simple immortality. + */ +#define lockless_dereference(p) \ +({ \ + typeof(p) _________p1 = ACCESS_ONCE(p); \ + smp_read_barrier_depends(); /* Dependency order vs. p above. */ \ + (_________p1); \ +}) + +/** * rcu_assign_pointer() - assign to RCU-protected pointer * @p: pointer to assign to * @v: value to assign (publish) ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-28 4:11 ` Paul E. McKenney @ 2014-10-28 22:55 ` Al Viro 2014-10-28 23:25 ` Paul E. McKenney 0 siblings, 1 reply; 16+ messages in thread From: Al Viro @ 2014-10-28 22:55 UTC (permalink / raw) To: Paul E. McKenney Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote: > On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote: > > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote: > > > Code making direct use of smp_read_barrier_depends() is harder to read, > > > in my experience, but good point on the sparse noise. Maybe a new > > > lockless_dereference() primitive? Maybe something like the following? > > > (Untested, probably does not even build.) > > > > > > #define lockless_dereference(p) \ > > > ({ \ > > > typeof(*p) *_________p1 = ACCESS_ONCE(p); \ > > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \ > > > _________p1; \ > > > }) > > > > Hmm... Where would you prefer to put it? rcupdate.h? > > Good a place as any, I guess. Please see patch below. Left to myself, > I would send this along for the next merge window, but please let me > know if you would like it sooner. It's needed sooner, unfortunately. Guys, could you take a look at vfs.git#for-linus and comment? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-28 22:55 ` Al Viro @ 2014-10-28 23:25 ` Paul E. McKenney 0 siblings, 0 replies; 16+ messages in thread From: Paul E. McKenney @ 2014-10-28 23:25 UTC (permalink / raw) To: Al Viro Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs On Tue, Oct 28, 2014 at 10:55:13PM +0000, Al Viro wrote: > On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote: > > On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote: > > > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote: > > > > Code making direct use of smp_read_barrier_depends() is harder to read, > > > > in my experience, but good point on the sparse noise. Maybe a new > > > > lockless_dereference() primitive? Maybe something like the following? > > > > (Untested, probably does not even build.) > > > > > > > > #define lockless_dereference(p) \ > > > > ({ \ > > > > typeof(*p) *_________p1 = ACCESS_ONCE(p); \ > > > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \ > > > > _________p1; \ > > > > }) > > > > > > Hmm... Where would you prefer to put it? rcupdate.h? > > > > Good a place as any, I guess. Please see patch below. Left to myself, > > I would send this along for the next merge window, but please let me > > know if you would like it sooner. > > It's needed sooner, unfortunately. Guys, could you take a look at > vfs.git#for-linus and comment? The version in your tree looks good. I will drop my commit in favor of yours. The use of lockless_dereference() and smp_mb__before_spinlock() in d45f00ae43 (overlayfs: barriers for opening upper-layer directory) looks fine to me. Only nit is lack of space between "=" and lockless_dereference(). Thanx, Paul ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25 2014-10-24 3:24 ` Al Viro 2014-10-24 7:24 ` Miklos Szeredi @ 2014-10-24 7:28 ` Geert Uytterhoeven 1 sibling, 0 replies; 16+ messages in thread From: Geert Uytterhoeven @ 2014-10-24 7:28 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Miklos Szeredi, Linux FS Devel, linux-kernel@vger.kernel.org, linux-unionfs, Stephen Rothwell Hi Al, On Fri, Oct 24, 2014 at 5:24 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote: >> Why the hell do you hold ->i_mutex across the entire opening of underlying >> directory? All you need is to serialize one assignment; the side that loses >> the race will simply fput() what it opened... >> >> Oh, well - that goes under "weird pessimisations, easy to fix in followups"... > > OK, pulled into vfs.git, followups in question added. Also there: fix for > a long-standing leak in d_splice_alias() failure exits. Guys, could you > check that current vfs.git#for-linus survives your local tests? Seems to > survive here; if I don't hear of any problems by tomorrow morning, to Linus > it goes... FWIW, for that pull request stats would be I think you forgot to merge your for-linus branch into for-next? Anyway, as Stephen announced there will be no linux-next release today, I'm afraid you'll have to delay your pull request to Tuesday. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2014-10-28 23:25 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-10-23 23:25 [GIT PULL] overlay filesystem v25 Miklos Szeredi 2014-10-24 2:20 ` Al Viro 2014-10-24 3:24 ` Al Viro 2014-10-24 7:24 ` Miklos Szeredi 2014-10-25 8:18 ` Al Viro 2014-10-25 9:53 ` Miklos Szeredi 2014-10-25 17:06 ` Al Viro 2014-10-27 8:06 ` Miklos Szeredi 2014-10-27 15:59 ` Paul E. McKenney 2014-10-27 17:28 ` Al Viro 2014-10-27 17:36 ` Paul E. McKenney 2014-10-28 1:12 ` Al Viro 2014-10-28 4:11 ` Paul E. McKenney 2014-10-28 22:55 ` Al Viro 2014-10-28 23:25 ` Paul E. McKenney 2014-10-24 7:28 ` Geert Uytterhoeven
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).