* [GIT PULL] overlay filesystem v25
@ 2014-10-23 23:25 Miklos Szeredi
2014-10-24 2:20 ` Al Viro
0 siblings, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2014-10-23 23:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel, linux-kernel, linux-unionfs
Linus,
Please pull
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25
Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
review and is now officially be BugFree(TM). Also passes the
unionmount-testsuite by David Howells.
Thanks,
Miklos
---
Andy Whitcroft (1):
overlayfs: add statfs support
Erez Zadok (1):
overlayfs: implement show_options
Miklos Szeredi (11):
vfs: add i_op->dentry_open()
vfs: export do_splice_direct() to modules
vfs: export __inode_permission() to modules
vfs: introduce clone_private_mount()
vfs: export check_sticky()
vfs: add whiteout support
vfs: add RENAME_WHITEOUT
ext4: support RENAME_WHITEOUT
shmem: support RENAME_WHITEOUT
overlay filesystem
fs: limit filesystem stacking depth
Neil Brown (1):
overlay: overlay filesystem documentation
---
Documentation/filesystems/Locking | 2 +
Documentation/filesystems/overlayfs.txt | 198 +++++++
Documentation/filesystems/vfs.txt | 7 +
MAINTAINERS | 7 +
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/btrfs/ioctl.c | 20 +-
fs/ecryptfs/main.c | 7 +
fs/ext4/namei.c | 95 +++-
fs/internal.h | 7 -
fs/namei.c | 41 +-
fs/namespace.c | 27 +
fs/open.c | 23 +-
fs/overlayfs/Kconfig | 10 +
fs/overlayfs/Makefile | 7 +
fs/overlayfs/copy_up.c | 414 ++++++++++++++
fs/overlayfs/dir.c | 921 ++++++++++++++++++++++++++++++++
fs/overlayfs/inode.c | 425 +++++++++++++++
fs/overlayfs/overlayfs.h | 191 +++++++
fs/overlayfs/readdir.c | 587 ++++++++++++++++++++
fs/overlayfs/super.c | 796 +++++++++++++++++++++++++++
fs/splice.c | 1 +
include/linux/fs.h | 39 ++
include/linux/mount.h | 3 +
include/uapi/linux/fs.h | 1 +
mm/shmem.c | 36 +-
26 files changed, 3809 insertions(+), 58 deletions(-)
create mode 100644 Documentation/filesystems/overlayfs.txt
create mode 100644 fs/overlayfs/Kconfig
create mode 100644 fs/overlayfs/Makefile
create mode 100644 fs/overlayfs/copy_up.c
create mode 100644 fs/overlayfs/dir.c
create mode 100644 fs/overlayfs/inode.c
create mode 100644 fs/overlayfs/overlayfs.h
create mode 100644 fs/overlayfs/readdir.c
create mode 100644 fs/overlayfs/super.c
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-23 23:25 [GIT PULL] overlay filesystem v25 Miklos Szeredi
@ 2014-10-24 2:20 ` Al Viro
2014-10-24 3:24 ` Al Viro
0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-24 2:20 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel, linux-unionfs
On Fri, Oct 24, 2014 at 01:25:39AM +0200, Miklos Szeredi wrote:
> Linus,
>
> Please pull
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25
>
> Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
> review and is now officially be BugFree(TM). Also passes the
> unionmount-testsuite by David Howells.
*blink*
Why the hell do you hold ->i_mutex across the entire opening of underlying
directory? All you need is to serialize one assignment; the side that loses
the race will simply fput() what it opened...
Oh, well - that goes under "weird pessimisations, easy to fix in followups"...
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-24 2:20 ` Al Viro
@ 2014-10-24 3:24 ` Al Viro
2014-10-24 7:24 ` Miklos Szeredi
2014-10-24 7:28 ` Geert Uytterhoeven
0 siblings, 2 replies; 16+ messages in thread
From: Al Viro @ 2014-10-24 3:24 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, linux-unionfs
On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
> Why the hell do you hold ->i_mutex across the entire opening of underlying
> directory? All you need is to serialize one assignment; the side that loses
> the race will simply fput() what it opened...
>
> Oh, well - that goes under "weird pessimisations, easy to fix in followups"...
OK, pulled into vfs.git, followups in question added. Also there: fix for
a long-standing leak in d_splice_alias() failure exits. Guys, could you
check that current vfs.git#for-linus survives your local tests? Seems to
survive here; if I don't hear of any problems by tomorrow morning, to Linus
it goes... FWIW, for that pull request stats would be
Shortlog:
Al Viro (5):
fix inode leaks on d_splice_alias() failure exits
overlayfs: don't hold ->i_mutex over opening the real directory
overlayfs: make ovl_cache_entry->name an array instead of pointer
overlayfs: embed root into overlay_readdir_data
overlayfs: embed middle into overlay_readdir_data
Andy Whitcroft (1):
overlayfs: add statfs support
Erez Zadok (1):
overlayfs: implement show_options
Miklos Szeredi (11):
vfs: add i_op->dentry_open()
vfs: export do_splice_direct() to modules
vfs: export __inode_permission() to modules
vfs: introduce clone_private_mount()
vfs: export check_sticky()
vfs: add whiteout support
vfs: add RENAME_WHITEOUT
ext4: support RENAME_WHITEOUT
shmem: support RENAME_WHITEOUT
overlay filesystem
fs: limit filesystem stacking depth
Neil Brown (1):
overlay: overlay filesystem documentation
Diffstat:
Documentation/filesystems/Locking | 2 +
Documentation/filesystems/overlayfs.txt | 198 +++++++
Documentation/filesystems/vfs.txt | 7 +
MAINTAINERS | 7 +
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/btrfs/ioctl.c | 20 +-
fs/dcache.c | 2 +
fs/ecryptfs/main.c | 7 +
fs/ext4/namei.c | 95 +++-
fs/internal.h | 7 -
fs/namei.c | 41 +-
fs/namespace.c | 27 +
fs/open.c | 23 +-
fs/overlayfs/Kconfig | 10 +
fs/overlayfs/Makefile | 7 +
fs/overlayfs/copy_up.c | 414 ++++++++++++++
fs/overlayfs/dir.c | 921 +++++++++++++++++++++++++++++++
fs/overlayfs/inode.c | 425 ++++++++++++++
fs/overlayfs/overlayfs.h | 191 +++++++
fs/overlayfs/readdir.c | 589 ++++++++++++++++++++
fs/overlayfs/super.c | 796 ++++++++++++++++++++++++++
fs/splice.c | 1 +
include/linux/fs.h | 39 ++
include/linux/mount.h | 3 +
include/uapi/linux/fs.h | 1 +
mm/shmem.c | 36 +-
27 files changed, 3813 insertions(+), 58 deletions(-)
create mode 100644 Documentation/filesystems/overlayfs.txt
create mode 100644 fs/overlayfs/Kconfig
create mode 100644 fs/overlayfs/Makefile
create mode 100644 fs/overlayfs/copy_up.c
create mode 100644 fs/overlayfs/dir.c
create mode 100644 fs/overlayfs/inode.c
create mode 100644 fs/overlayfs/overlayfs.h
create mode 100644 fs/overlayfs/readdir.c
create mode 100644 fs/overlayfs/super.c
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-24 3:24 ` Al Viro
@ 2014-10-24 7:24 ` Miklos Szeredi
2014-10-25 8:18 ` Al Viro
2014-10-24 7:28 ` Geert Uytterhoeven
1 sibling, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2014-10-24 7:24 UTC (permalink / raw)
To: Al Viro; +Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs
On Fri, Oct 24, 2014 at 5:24 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
>> Why the hell do you hold ->i_mutex across the entire opening of underlying
>> directory? All you need is to serialize one assignment; the side that loses
>> the race will simply fput() what it opened...
The reason I didn't do your "fix" is that it
- adds more lines than it takes,
- I wasn't sure at all if the lockless access is actually correct
without the ACCESS_ONCE and all the memory barrier magic that might be
necessary on weird architectures.
The rest of the changes look OK.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-24 3:24 ` Al Viro
2014-10-24 7:24 ` Miklos Szeredi
@ 2014-10-24 7:28 ` Geert Uytterhoeven
1 sibling, 0 replies; 16+ messages in thread
From: Geert Uytterhoeven @ 2014-10-24 7:28 UTC (permalink / raw)
To: Al Viro
Cc: Linus Torvalds, Miklos Szeredi, Linux FS Devel,
linux-kernel@vger.kernel.org, linux-unionfs, Stephen Rothwell
Hi Al,
On Fri, Oct 24, 2014 at 5:24 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
>> Why the hell do you hold ->i_mutex across the entire opening of underlying
>> directory? All you need is to serialize one assignment; the side that loses
>> the race will simply fput() what it opened...
>>
>> Oh, well - that goes under "weird pessimisations, easy to fix in followups"...
>
> OK, pulled into vfs.git, followups in question added. Also there: fix for
> a long-standing leak in d_splice_alias() failure exits. Guys, could you
> check that current vfs.git#for-linus survives your local tests? Seems to
> survive here; if I don't hear of any problems by tomorrow morning, to Linus
> it goes... FWIW, for that pull request stats would be
I think you forgot to merge your for-linus branch into for-next?
Anyway, as Stephen announced there will be no linux-next release
today, I'm afraid you'll have to delay your pull request to Tuesday.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-24 7:24 ` Miklos Szeredi
@ 2014-10-25 8:18 ` Al Viro
2014-10-25 9:53 ` Miklos Szeredi
0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-25 8:18 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs
On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:
> The reason I didn't do your "fix" is that it
>
> - adds more lines than it takes,
>
> - I wasn't sure at all if the lockless access is actually correct
> without the ACCESS_ONCE and all the memory barrier magic that might be
> necessary on weird architectures.
_What_ lockless accesses? There is an extremely embarrassing bug in that
commit, all right, but it has nothing to do with barriers... All
barrier-related issues are taken care of by ovl_path_upper() (and without
that you'd have tons of worse problems). Fetching ->upperfile outside of
->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
grab ->i_mutex and find out that it has already been taken care of.
In which case we fput() what we'd opened and move on (fput() under
->i_mutex is fine - it's going to be delayed until return from syscall
anyway).
There was a very dumb braino in there; fixed, force-pushed, passes unionmount
tests, no regressions on LTP syscall ones and xfstests.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-25 8:18 ` Al Viro
@ 2014-10-25 9:53 ` Miklos Szeredi
2014-10-25 17:06 ` Al Viro
0 siblings, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2014-10-25 9:53 UTC (permalink / raw)
To: Al Viro; +Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs
On Sat, Oct 25, 2014 at 10:18 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:
>
>> The reason I didn't do your "fix" is that it
>>
>> - adds more lines than it takes,
>>
>> - I wasn't sure at all if the lockless access is actually correct
>> without the ACCESS_ONCE and all the memory barrier magic that might be
>> necessary on weird architectures.
>
> _What_ lockless accesses? There is an extremely embarrassing bug in that
> commit, all right, but it has nothing to do with barriers... All
> barrier-related issues are taken care of by ovl_path_upper() (and without
> that you'd have tons of worse problems). Fetching ->upperfile outside of
> ->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
> grab ->i_mutex and find out that it has already been taken care of.
> In which case we fput() what we'd opened and move on (fput() under
> ->i_mutex is fine - it's going to be delayed until return from syscall
> anyway).
Yes, but it's not about race with copy-up (which the ovl_path_upper()
protects against), but race of two fsync calls with each other. If
there's no synchronization between them, then that od->upperfile does
indeed count as lockless access, no matter that the assignment was
done under lock.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-25 9:53 ` Miklos Szeredi
@ 2014-10-25 17:06 ` Al Viro
2014-10-27 8:06 ` Miklos Szeredi
0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-25 17:06 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs
On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
> Yes, but it's not about race with copy-up (which the ovl_path_upper()
> protects against), but race of two fsync calls with each other. If
> there's no synchronization between them, then that od->upperfile does
> indeed count as lockless access, no matter that the assignment was
> done under lock.
p = global;
if (!p) { // outside of lock
p = alloc();
grab lock
if (!global) {
global = p;
} else {
destroy(p);
p = global;
}
drop lock
}
is a very common pattern, especially if you look for cases when lock is
a spinlock and allocation is blocking (in those cases you'll often see
destroy() part done after dropping the lock; that's where what I fucked up in
what I'd originally pushed. And it wasn't even needed - fput() under
->i_mutex is OK...)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-25 17:06 ` Al Viro
@ 2014-10-27 8:06 ` Miklos Szeredi
2014-10-27 15:59 ` Paul E. McKenney
0 siblings, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2014-10-27 8:06 UTC (permalink / raw)
To: Al Viro
Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs,
Paul E. McKenney
[Paul McKenney added to CC]
On Sat, Oct 25, 2014 at 7:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
>
>> Yes, but it's not about race with copy-up (which the ovl_path_upper()
>> protects against), but race of two fsync calls with each other. If
>> there's no synchronization between them, then that od->upperfile does
>> indeed count as lockless access, no matter that the assignment was
>> done under lock.
>
> p = global;
> if (!p) { // outside of lock
> p = alloc();
> grab lock
> if (!global) {
> global = p;
> } else {
> destroy(p);
> p = global;
> }
> drop lock
> }
> is a very common pattern, especially if you look for cases when lock is
> a spinlock and allocation is blocking (in those cases you'll often see
> destroy() part done after dropping the lock; that's where what I fucked up in
> what I'd originally pushed. And it wasn't even needed - fput() under
> ->i_mutex is OK...)
Being a very common pattern does not automatically make it correct...
My understanding of these issues is very limited, but it's not clear
to me what will order initialization of members of p with the storing
of p into global. E.g. we start out with global == NULL and p->foo ==
0.
CPU1:
p->foo = 1
grab lock
if (!global)
global = p
CPU1:
p = global
if (p)
q = p->foo
Is it guaranteed that the above sequence (as is, without any barriers
or ACCESS_ONCE() other than the lock acquisition) will result in q ==
1 if p != NULL?
Thanks,
Miklos
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-27 8:06 ` Miklos Szeredi
@ 2014-10-27 15:59 ` Paul E. McKenney
2014-10-27 17:28 ` Al Viro
0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2014-10-27 15:59 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Al Viro, Linus Torvalds, Linux-Fsdevel, Kernel Mailing List,
linux-unionfs
On Mon, Oct 27, 2014 at 09:06:54AM +0100, Miklos Szeredi wrote:
> [Paul McKenney added to CC]
>
> On Sat, Oct 25, 2014 at 7:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
> >
> >> Yes, but it's not about race with copy-up (which the ovl_path_upper()
> >> protects against), but race of two fsync calls with each other. If
> >> there's no synchronization between them, then that od->upperfile does
> >> indeed count as lockless access, no matter that the assignment was
> >> done under lock.
> >
> > p = global;
> > if (!p) { // outside of lock
> > p = alloc();
> > grab lock
> > if (!global) {
> > global = p;
> > } else {
> > destroy(p);
> > p = global;
> > }
> > drop lock
> > }
> > is a very common pattern, especially if you look for cases when lock is
> > a spinlock and allocation is blocking (in those cases you'll often see
> > destroy() part done after dropping the lock; that's where what I fucked up in
> > what I'd originally pushed. And it wasn't even needed - fput() under
> > ->i_mutex is OK...)
>
> Being a very common pattern does not automatically make it correct...
>
> My understanding of these issues is very limited, but it's not clear
> to me what will order initialization of members of p with the storing
> of p into global. E.g. we start out with global == NULL and p->foo ==
> 0.
>
> CPU1:
> p->foo = 1
> grab lock
> if (!global)
> global = p
>
> CPU1:
If it is all the same to you, I will call this CPU2 to distinguish it from
the first CPU1. ;-)
> p = global
> if (p)
> q = p->foo
>
> Is it guaranteed that the above sequence (as is, without any barriers
> or ACCESS_ONCE() other than the lock acquisition) will result in q ==
> 1 if p != NULL?
Indeed, life is hard here. Keep in mind that lock acquisition is not
guaranteed to prevent prior operations from being reordered into the
critical section, possibly as follows:
CPU1:
grab lock
if (!global)
global = p;
/* Assume all of CPU2's accesses happen here. */
p->foo = 1;
This clearly allows CPU2 to execute as follows:
CPU2:
p = global; /* gets p */
if (p) /* non-NULL */
q = p->foo; /* might not be 1 */
Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
accesses can be misordered. You need rcu_dereference() or the combination
of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
more info.
So no, there is no guarantee. I am assuming that the lock grabbed by
CPU1 guards all assignments to "global", otherwise the code needs further
help. I am further assuming that the memory pointed to by CPU1's "p"
is inaccessible to any other CPU, as in CPU1 just allocated the memory.
Otherwise, the assignment "p->foo = 1" is questionable. And finally,
I am assuming that p->foo stays constant once it has been made
accessible to readers.
But the following should work:
CPU1:
p->foo = 1; /* Assumes p is local. */
smp_mb__before_spinlock();
grab lock
if (!global) /* Assumes lock protects all assignments to global. */
global = p;
CPU2:
p = rcu_dereference(global);
if (p)
q = p->foo; /* Assumes p->foo constant once visible to readers. */
/* Also assumes p and q are local. */
If the assumptions called out in the comments do not hold, you at least
need ACCESS_ONCE(), and perhaps even more synchronization. For more info
on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.
Thanx, Paul
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-27 15:59 ` Paul E. McKenney
@ 2014-10-27 17:28 ` Al Viro
2014-10-27 17:36 ` Paul E. McKenney
0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-27 17:28 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
Kernel Mailing List, linux-unionfs
On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:
> Indeed, life is hard here. Keep in mind that lock acquisition is not
> guaranteed to prevent prior operations from being reordered into the
> critical section, possibly as follows:
>
> CPU1:
> grab lock
> if (!global)
> global = p;
> /* Assume all of CPU2's accesses happen here. */
> p->foo = 1;
A bit of context: p is a local pointer to struct file here and alloc is
opening it...
> This clearly allows CPU2 to execute as follows:
>
> CPU2:
> p = global; /* gets p */
> if (p) /* non-NULL */
> q = p->foo; /* might not be 1 */
>
> Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
> accesses can be misordered. You need rcu_dereference() or the combination
> of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
> As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
> more info.
>
> So no, there is no guarantee. I am assuming that the lock grabbed by
> CPU1 guards all assignments to "global", otherwise the code needs further
> help. I am further assuming that the memory pointed to by CPU1's "p"
> is inaccessible to any other CPU, as in CPU1 just allocated the memory.
> Otherwise, the assignment "p->foo = 1" is questionable. And finally,
> I am assuming that p->foo stays constant once it has been made
> accessible to readers.
>
> But the following should work:
>
> CPU1:
> p->foo = 1; /* Assumes p is local. */
> smp_mb__before_spinlock();
> grab lock
> if (!global) /* Assumes lock protects all assignments to global. */
> global = p;
>
> CPU2:
> p = rcu_dereference(global);
> if (p)
> q = p->foo; /* Assumes p->foo constant once visible to readers. */
> /* Also assumes p and q are local. */
>
> If the assumptions called out in the comments do not hold, you at least
> need ACCESS_ONCE(), and perhaps even more synchronization. For more info
> on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.
First of all, this "p->foo = 1" is a shorthand for initialization of
struct file done by opening a file. What you are saying is that it
can leak past the point where we stick a pointer to freshly opened
file into a place where another CPU can see it, but not past the
barrier in dropping the lock, right?
And you are suggesting rcu_dereference() as a way to bring the required
barriers in. Ouch... The names are really bad, but there's another
fun issue - rcu_dereference brings in sparse noise. Wouldn't direct use
of smp_read_barrier_depends() be cleaner, anyway?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-27 17:28 ` Al Viro
@ 2014-10-27 17:36 ` Paul E. McKenney
2014-10-28 1:12 ` Al Viro
0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2014-10-27 17:36 UTC (permalink / raw)
To: Al Viro
Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
Kernel Mailing List, linux-unionfs
On Mon, Oct 27, 2014 at 05:28:01PM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:
>
> > Indeed, life is hard here. Keep in mind that lock acquisition is not
> > guaranteed to prevent prior operations from being reordered into the
> > critical section, possibly as follows:
> >
> > CPU1:
> > grab lock
> > if (!global)
> > global = p;
> > /* Assume all of CPU2's accesses happen here. */
> > p->foo = 1;
>
> A bit of context: p is a local pointer to struct file here and alloc is
> opening it...
OK, good to know. ;-)
> > This clearly allows CPU2 to execute as follows:
> >
> > CPU2:
> > p = global; /* gets p */
> > if (p) /* non-NULL */
> > q = p->foo; /* might not be 1 */
> >
> > Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
> > accesses can be misordered. You need rcu_dereference() or the combination
> > of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
> > As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
> > more info.
> >
> > So no, there is no guarantee. I am assuming that the lock grabbed by
> > CPU1 guards all assignments to "global", otherwise the code needs further
> > help. I am further assuming that the memory pointed to by CPU1's "p"
> > is inaccessible to any other CPU, as in CPU1 just allocated the memory.
> > Otherwise, the assignment "p->foo = 1" is questionable. And finally,
> > I am assuming that p->foo stays constant once it has been made
> > accessible to readers.
> >
> > But the following should work:
> >
> > CPU1:
> > p->foo = 1; /* Assumes p is local. */
> > smp_mb__before_spinlock();
> > grab lock
> > if (!global) /* Assumes lock protects all assignments to global. */
> > global = p;
> >
> > CPU2:
> > p = rcu_dereference(global);
> > if (p)
> > q = p->foo; /* Assumes p->foo constant once visible to readers. */
> > /* Also assumes p and q are local. */
> >
> > If the assumptions called out in the comments do not hold, you at least
> > need ACCESS_ONCE(), and perhaps even more synchronization. For more info
> > on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.
>
> First of all, this "p->foo = 1" is a shorthand for initialization of
> struct file done by opening a file. What you are saying is that it
> can leak past the point where we stick a pointer to freshly opened
> file into a place where another CPU can see it, but not past the
> barrier in dropping the lock, right?
Exactly!
I should also add that smp_mb__before_spinlock() implies smp_wmb(), but
nothing more. But that is OK because smp_wmb() suffices in this case.
> And you are suggesting rcu_dereference() as a way to bring the required
> barriers in. Ouch... The names are really bad, but there's another
> fun issue - rcu_dereference brings in sparse noise. Wouldn't direct use
> of smp_read_barrier_depends() be cleaner, anyway?
Code making direct use of smp_read_barrier_depends() is harder to read,
in my experience, but good point on the sparse noise. Maybe a new
lockless_dereference() primitive? Maybe something like the following?
(Untested, probably does not even build.)
#define lockless_dereference(p) \
({ \
typeof(*p) *_________p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
_________p1; \
})
Thanx, Paul
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-27 17:36 ` Paul E. McKenney
@ 2014-10-28 1:12 ` Al Viro
2014-10-28 4:11 ` Paul E. McKenney
0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-28 1:12 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
Kernel Mailing List, linux-unionfs
On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> Code making direct use of smp_read_barrier_depends() is harder to read,
> in my experience, but good point on the sparse noise. Maybe a new
> lockless_dereference() primitive? Maybe something like the following?
> (Untested, probably does not even build.)
>
> #define lockless_dereference(p) \
> ({ \
> typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> _________p1; \
> })
Hmm... Where would you prefer to put it? rcupdate.h?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-28 1:12 ` Al Viro
@ 2014-10-28 4:11 ` Paul E. McKenney
2014-10-28 22:55 ` Al Viro
0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2014-10-28 4:11 UTC (permalink / raw)
To: Al Viro
Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
Kernel Mailing List, linux-unionfs
On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > Code making direct use of smp_read_barrier_depends() is harder to read,
> > in my experience, but good point on the sparse noise. Maybe a new
> > lockless_dereference() primitive? Maybe something like the following?
> > (Untested, probably does not even build.)
> >
> > #define lockless_dereference(p) \
> > ({ \
> > typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > _________p1; \
> > })
>
> Hmm... Where would you prefer to put it? rcupdate.h?
Good a place as any, I guess. Please see patch below. Left to myself,
I would send this along for the next merge window, but please let me
know if you would like it sooner.
Thanx, Paul
------------------------------------------------------------------------
rcu: Provide counterpart to rcu_dereference() for non-RCU situations
Although rcu_dereference() and friends can be used in situations where
object lifetimes are being managed by something other than RCU, the
resulting sparse and lockdep-RCU noise can be annoying. This commit
therefore supplies a lockless_dereference(), which provides the
protection for dereferences without the RCU-related debugging noise.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9575c2d403b5..ed4f5939a452 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -617,6 +617,21 @@ static inline void rcu_preempt_sleep_check(void)
#define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
/**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU. That
+ * "something other" might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+ typeof(p) _________p1 = ACCESS_ONCE(p); \
+ smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+ (_________p1); \
+})
+
+/**
* rcu_assign_pointer() - assign to RCU-protected pointer
* @p: pointer to assign to
* @v: value to assign (publish)
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-28 4:11 ` Paul E. McKenney
@ 2014-10-28 22:55 ` Al Viro
2014-10-28 23:25 ` Paul E. McKenney
0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-28 22:55 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
Kernel Mailing List, linux-unionfs
On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > > Code making direct use of smp_read_barrier_depends() is harder to read,
> > > in my experience, but good point on the sparse noise. Maybe a new
> > > lockless_dereference() primitive? Maybe something like the following?
> > > (Untested, probably does not even build.)
> > >
> > > #define lockless_dereference(p) \
> > > ({ \
> > > typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > _________p1; \
> > > })
> >
> > Hmm... Where would you prefer to put it? rcupdate.h?
>
> Good a place as any, I guess. Please see patch below. Left to myself,
> I would send this along for the next merge window, but please let me
> know if you would like it sooner.
It's needed sooner, unfortunately. Guys, could you take a look at
vfs.git#for-linus and comment?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [GIT PULL] overlay filesystem v25
2014-10-28 22:55 ` Al Viro
@ 2014-10-28 23:25 ` Paul E. McKenney
0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2014-10-28 23:25 UTC (permalink / raw)
To: Al Viro
Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
Kernel Mailing List, linux-unionfs
On Tue, Oct 28, 2014 at 10:55:13PM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> > > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > > > Code making direct use of smp_read_barrier_depends() is harder to read,
> > > > in my experience, but good point on the sparse noise. Maybe a new
> > > > lockless_dereference() primitive? Maybe something like the following?
> > > > (Untested, probably does not even build.)
> > > >
> > > > #define lockless_dereference(p) \
> > > > ({ \
> > > > typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > > _________p1; \
> > > > })
> > >
> > > Hmm... Where would you prefer to put it? rcupdate.h?
> >
> > Good a place as any, I guess. Please see patch below. Left to myself,
> > I would send this along for the next merge window, but please let me
> > know if you would like it sooner.
>
> It's needed sooner, unfortunately. Guys, could you take a look at
> vfs.git#for-linus and comment?
The version in your tree looks good. I will drop my commit in favor
of yours.
The use of lockless_dereference() and smp_mb__before_spinlock() in
d45f00ae43 (overlayfs: barriers for opening upper-layer directory)
looks fine to me. Only nit is lack of space between "=" and
lockless_dereference().
Thanx, Paul
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2014-10-28 23:25 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-23 23:25 [GIT PULL] overlay filesystem v25 Miklos Szeredi
2014-10-24 2:20 ` Al Viro
2014-10-24 3:24 ` Al Viro
2014-10-24 7:24 ` Miklos Szeredi
2014-10-25 8:18 ` Al Viro
2014-10-25 9:53 ` Miklos Szeredi
2014-10-25 17:06 ` Al Viro
2014-10-27 8:06 ` Miklos Szeredi
2014-10-27 15:59 ` Paul E. McKenney
2014-10-27 17:28 ` Al Viro
2014-10-27 17:36 ` Paul E. McKenney
2014-10-28 1:12 ` Al Viro
2014-10-28 4:11 ` Paul E. McKenney
2014-10-28 22:55 ` Al Viro
2014-10-28 23:25 ` Paul E. McKenney
2014-10-24 7:28 ` Geert Uytterhoeven
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).