[GIT PULL] overlay filesystem v25

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [GIT PULL] overlay filesystem v25
@ 2014-10-23 23:25 Miklos Szeredi
  2014-10-24  2:20 ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2014-10-23 23:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel, linux-kernel, linux-unionfs

Linus,

Please pull

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25

Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
review and is now officially be BugFree(TM).  Also passes the
unionmount-testsuite by David Howells.

Thanks,
Miklos

---
Andy Whitcroft (1):
      overlayfs: add statfs support

Erez Zadok (1):
      overlayfs: implement show_options

Miklos Szeredi (11):
      vfs: add i_op->dentry_open()
      vfs: export do_splice_direct() to modules
      vfs: export __inode_permission() to modules
      vfs: introduce clone_private_mount()
      vfs: export check_sticky()
      vfs: add whiteout support
      vfs: add RENAME_WHITEOUT
      ext4: support RENAME_WHITEOUT
      shmem: support RENAME_WHITEOUT
      overlay filesystem
      fs: limit filesystem stacking depth

Neil Brown (1):
      overlay: overlay filesystem documentation

---
 Documentation/filesystems/Locking       |   2 +
 Documentation/filesystems/overlayfs.txt | 198 +++++++
 Documentation/filesystems/vfs.txt       |   7 +
 MAINTAINERS                             |   7 +
 fs/Kconfig                              |   1 +
 fs/Makefile                             |   1 +
 fs/btrfs/ioctl.c                        |  20 +-
 fs/ecryptfs/main.c                      |   7 +
 fs/ext4/namei.c                         |  95 +++-
 fs/internal.h                           |   7 -
 fs/namei.c                              |  41 +-
 fs/namespace.c                          |  27 +
 fs/open.c                               |  23 +-
 fs/overlayfs/Kconfig                    |  10 +
 fs/overlayfs/Makefile                   |   7 +
 fs/overlayfs/copy_up.c                  | 414 ++++++++++++++
 fs/overlayfs/dir.c                      | 921 ++++++++++++++++++++++++++++++++
 fs/overlayfs/inode.c                    | 425 +++++++++++++++
 fs/overlayfs/overlayfs.h                | 191 +++++++
 fs/overlayfs/readdir.c                  | 587 ++++++++++++++++++++
 fs/overlayfs/super.c                    | 796 +++++++++++++++++++++++++++
 fs/splice.c                             |   1 +
 include/linux/fs.h                      |  39 ++
 include/linux/mount.h                   |   3 +
 include/uapi/linux/fs.h                 |   1 +
 mm/shmem.c                              |  36 +-
 26 files changed, 3809 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/filesystems/overlayfs.txt
 create mode 100644 fs/overlayfs/Kconfig
 create mode 100644 fs/overlayfs/Makefile
 create mode 100644 fs/overlayfs/copy_up.c
 create mode 100644 fs/overlayfs/dir.c
 create mode 100644 fs/overlayfs/inode.c
 create mode 100644 fs/overlayfs/overlayfs.h
 create mode 100644 fs/overlayfs/readdir.c
 create mode 100644 fs/overlayfs/super.c


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-23 23:25 [GIT PULL] overlay filesystem v25 Miklos Szeredi
@ 2014-10-24  2:20 ` Al Viro
  2014-10-24  3:24   ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-24  2:20 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel, linux-unionfs

On Fri, Oct 24, 2014 at 01:25:39AM +0200, Miklos Szeredi wrote:
> Linus,
> 
> Please pull
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25
> 
> Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
> review and is now officially be BugFree(TM).  Also passes the
> unionmount-testsuite by David Howells.

*blink*

Why the hell do you hold ->i_mutex across the entire opening of underlying
directory?  All you need is to serialize one assignment; the side that loses
the race will simply fput() what it opened...

Oh, well - that goes under "weird pessimisations, easy to fix in followups"...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-24  2:20 ` Al Viro
@ 2014-10-24  3:24   ` Al Viro
  2014-10-24  7:24     ` Miklos Szeredi
  2014-10-24  7:28     ` Geert Uytterhoeven
  0 siblings, 2 replies; 16+ messages in thread
From: Al Viro @ 2014-10-24  3:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, linux-unionfs

On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
> Why the hell do you hold ->i_mutex across the entire opening of underlying
> directory?  All you need is to serialize one assignment; the side that loses
> the race will simply fput() what it opened...
> 
> Oh, well - that goes under "weird pessimisations, easy to fix in followups"...

OK, pulled into vfs.git, followups in question added.  Also there: fix for
a long-standing leak in d_splice_alias() failure exits.  Guys, could you
check that current vfs.git#for-linus survives your local tests?  Seems to
survive here; if I don't hear of any problems by tomorrow morning, to Linus
it goes...  FWIW, for that pull request stats would be

Shortlog:
Al Viro (5):
      fix inode leaks on d_splice_alias() failure exits
      overlayfs: don't hold ->i_mutex over opening the real directory
      overlayfs: make ovl_cache_entry->name an array instead of pointer
      overlayfs: embed root into overlay_readdir_data
      overlayfs: embed middle into overlay_readdir_data

Andy Whitcroft (1):
      overlayfs: add statfs support

Erez Zadok (1):
      overlayfs: implement show_options

Miklos Szeredi (11):
      vfs: add i_op->dentry_open()
      vfs: export do_splice_direct() to modules
      vfs: export __inode_permission() to modules
      vfs: introduce clone_private_mount()
      vfs: export check_sticky()
      vfs: add whiteout support
      vfs: add RENAME_WHITEOUT
      ext4: support RENAME_WHITEOUT
      shmem: support RENAME_WHITEOUT
      overlay filesystem
      fs: limit filesystem stacking depth

Neil Brown (1):
      overlay: overlay filesystem documentation

Diffstat:
 Documentation/filesystems/Locking       |    2 +
 Documentation/filesystems/overlayfs.txt |  198 +++++++
 Documentation/filesystems/vfs.txt       |    7 +
 MAINTAINERS                             |    7 +
 fs/Kconfig                              |    1 +
 fs/Makefile                             |    1 +
 fs/btrfs/ioctl.c                        |   20 +-
 fs/dcache.c                             |    2 +
 fs/ecryptfs/main.c                      |    7 +
 fs/ext4/namei.c                         |   95 +++-
 fs/internal.h                           |    7 -
 fs/namei.c                              |   41 +-
 fs/namespace.c                          |   27 +
 fs/open.c                               |   23 +-
 fs/overlayfs/Kconfig                    |   10 +
 fs/overlayfs/Makefile                   |    7 +
 fs/overlayfs/copy_up.c                  |  414 ++++++++++++++
 fs/overlayfs/dir.c                      |  921 +++++++++++++++++++++++++++++++
 fs/overlayfs/inode.c                    |  425 ++++++++++++++
 fs/overlayfs/overlayfs.h                |  191 +++++++
 fs/overlayfs/readdir.c                  |  589 ++++++++++++++++++++
 fs/overlayfs/super.c                    |  796 ++++++++++++++++++++++++++
 fs/splice.c                             |    1 +
 include/linux/fs.h                      |   39 ++
 include/linux/mount.h                   |    3 +
 include/uapi/linux/fs.h                 |    1 +
 mm/shmem.c                              |   36 +-
 27 files changed, 3813 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/filesystems/overlayfs.txt
 create mode 100644 fs/overlayfs/Kconfig
 create mode 100644 fs/overlayfs/Makefile
 create mode 100644 fs/overlayfs/copy_up.c
 create mode 100644 fs/overlayfs/dir.c
 create mode 100644 fs/overlayfs/inode.c
 create mode 100644 fs/overlayfs/overlayfs.h
 create mode 100644 fs/overlayfs/readdir.c
 create mode 100644 fs/overlayfs/super.c

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-24  3:24   ` Al Viro
@ 2014-10-24  7:24     ` Miklos Szeredi
  2014-10-25  8:18       ` Al Viro
  2014-10-24  7:28     ` Geert Uytterhoeven
  1 sibling, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2014-10-24  7:24 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs

On Fri, Oct 24, 2014 at 5:24 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
>> Why the hell do you hold ->i_mutex across the entire opening of underlying
>> directory?  All you need is to serialize one assignment; the side that loses
>> the race will simply fput() what it opened...

The reason I didn't do your "fix" is that it

 - adds more lines than it takes,

 - I wasn't sure at all if the lockless access is actually correct
without the ACCESS_ONCE and all the memory barrier magic that might be
necessary on weird architectures.

The rest of the changes look OK.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-24  7:24     ` Miklos Szeredi
@ 2014-10-25  8:18       ` Al Viro
  2014-10-25  9:53         ` Miklos Szeredi
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-25  8:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs

On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:

> The reason I didn't do your "fix" is that it
> 
>  - adds more lines than it takes,
> 
>  - I wasn't sure at all if the lockless access is actually correct
> without the ACCESS_ONCE and all the memory barrier magic that might be
>  necessary on weird architectures.

_What_ lockless accesses?  There is an extremely embarrassing bug in that
commit, all right, but it has nothing to do with barriers...  All
barrier-related issues are taken care of by ovl_path_upper() (and without
that you'd have tons of worse problems).  Fetching ->upperfile outside of
->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
grab ->i_mutex and find out that it has already been taken care of.
In which case we fput() what we'd opened and move on (fput() under
->i_mutex is fine - it's going to be delayed until return from syscall
anyway).

There was a very dumb braino in there; fixed, force-pushed, passes unionmount
tests, no regressions on LTP syscall ones and xfstests.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-25  8:18       ` Al Viro
@ 2014-10-25  9:53         ` Miklos Szeredi
  2014-10-25 17:06           ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2014-10-25  9:53 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs

On Sat, Oct 25, 2014 at 10:18 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:
>
>> The reason I didn't do your "fix" is that it
>>
>>  - adds more lines than it takes,
>>
>>  - I wasn't sure at all if the lockless access is actually correct
>> without the ACCESS_ONCE and all the memory barrier magic that might be
>>  necessary on weird architectures.
>
> _What_ lockless accesses?  There is an extremely embarrassing bug in that
> commit, all right, but it has nothing to do with barriers...  All
> barrier-related issues are taken care of by ovl_path_upper() (and without
> that you'd have tons of worse problems).  Fetching ->upperfile outside of
> ->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
> grab ->i_mutex and find out that it has already been taken care of.
> In which case we fput() what we'd opened and move on (fput() under
> ->i_mutex is fine - it's going to be delayed until return from syscall
> anyway).

Yes, but it's not about race with copy-up (which the ovl_path_upper()
protects against), but race of two fsync calls with each other.  If
there's no synchronization between them, then that od->upperfile does
indeed count as lockless access, no matter that the assignment was
done under lock.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-25  9:53         ` Miklos Szeredi
@ 2014-10-25 17:06           ` Al Viro
  2014-10-27  8:06             ` Miklos Szeredi
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-25 17:06 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs

On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:

> Yes, but it's not about race with copy-up (which the ovl_path_upper()
> protects against), but race of two fsync calls with each other.  If
> there's no synchronization between them, then that od->upperfile does
> indeed count as lockless access, no matter that the assignment was
> done under lock.

	p = global;
	if (!p) {	// outside of lock
		p = alloc();
		grab lock
		if (!global) {
			global = p;
		} else {
			destroy(p);
			p = global;
		}
		drop lock
	}
is a very common pattern, especially if you look for cases when lock is
a spinlock and allocation is blocking (in those cases you'll often see
destroy() part done after dropping the lock; that's where what I fucked up in
what I'd originally pushed.  And it wasn't even needed - fput() under
->i_mutex is OK...)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-25 17:06           ` Al Viro
@ 2014-10-27  8:06             ` Miklos Szeredi
  2014-10-27 15:59               ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Miklos Szeredi @ 2014-10-27  8:06 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Linux-Fsdevel, Kernel Mailing List, linux-unionfs,
	Paul E. McKenney

[Paul McKenney added to CC]

On Sat, Oct 25, 2014 at 7:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
>
>> Yes, but it's not about race with copy-up (which the ovl_path_upper()
>> protects against), but race of two fsync calls with each other.  If
>> there's no synchronization between them, then that od->upperfile does
>> indeed count as lockless access, no matter that the assignment was
>> done under lock.
>
>         p = global;
>         if (!p) {       // outside of lock
>                 p = alloc();
>                 grab lock
>                 if (!global) {
>                         global = p;
>                 } else {
>                         destroy(p);
>                         p = global;
>                 }
>                 drop lock
>         }
> is a very common pattern, especially if you look for cases when lock is
> a spinlock and allocation is blocking (in those cases you'll often see
> destroy() part done after dropping the lock; that's where what I fucked up in
> what I'd originally pushed.  And it wasn't even needed - fput() under
> ->i_mutex is OK...)

Being a very common pattern does not automatically make it correct...

My understanding of these issues is very limited, but it's not clear
to me what will order initialization of members of p with the storing
of p into global.  E.g. we start out with global == NULL and p->foo ==
0.

CPU1:
  p->foo = 1
  grab lock
  if (!global)
      global = p

CPU1:
  p = global
  if (p)
     q = p->foo

Is it guaranteed that the above sequence (as is, without any barriers
or ACCESS_ONCE() other than the lock acquisition) will result in q ==
1 if p != NULL?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-27  8:06             ` Miklos Szeredi
@ 2014-10-27 15:59               ` Paul E. McKenney
  2014-10-27 17:28                 ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2014-10-27 15:59 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, Linus Torvalds, Linux-Fsdevel, Kernel Mailing List,
	linux-unionfs

On Mon, Oct 27, 2014 at 09:06:54AM +0100, Miklos Szeredi wrote:
> [Paul McKenney added to CC]
> 
> On Sat, Oct 25, 2014 at 7:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
> >
> >> Yes, but it's not about race with copy-up (which the ovl_path_upper()
> >> protects against), but race of two fsync calls with each other.  If
> >> there's no synchronization between them, then that od->upperfile does
> >> indeed count as lockless access, no matter that the assignment was
> >> done under lock.
> >
> >         p = global;
> >         if (!p) {       // outside of lock
> >                 p = alloc();
> >                 grab lock
> >                 if (!global) {
> >                         global = p;
> >                 } else {
> >                         destroy(p);
> >                         p = global;
> >                 }
> >                 drop lock
> >         }
> > is a very common pattern, especially if you look for cases when lock is
> > a spinlock and allocation is blocking (in those cases you'll often see
> > destroy() part done after dropping the lock; that's where what I fucked up in
> > what I'd originally pushed.  And it wasn't even needed - fput() under
> > ->i_mutex is OK...)
> 
> Being a very common pattern does not automatically make it correct...
> 
> My understanding of these issues is very limited, but it's not clear
> to me what will order initialization of members of p with the storing
> of p into global.  E.g. we start out with global == NULL and p->foo ==
> 0.
> 
> CPU1:
>   p->foo = 1
>   grab lock
>   if (!global)
>       global = p
> 
> CPU1:

If it is all the same to you, I will call this CPU2 to distinguish it from
the first CPU1.  ;-)

>   p = global
>   if (p)
>      q = p->foo
> 
> Is it guaranteed that the above sequence (as is, without any barriers
> or ACCESS_ONCE() other than the lock acquisition) will result in q ==
> 1 if p != NULL?

Indeed, life is hard here.  Keep in mind that lock acquisition is not
guaranteed to prevent prior operations from being reordered into the
critical section, possibly as follows:

	CPU1:
	  grab lock
	  if (!global)
	      global = p;
	  /* Assume all of CPU2's accesses happen here. */
	  p->foo = 1;

This clearly allows CPU2 to execute as follows:

	CPU2:
	  p = global; /* gets p */
	  if (p) /* non-NULL */
	  	q = p->foo; /* might not be 1 */

Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
accesses can be misordered.  You need rcu_dereference() or the combination
of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
more info.

So no, there is no guarantee.  I am assuming that the lock grabbed by
CPU1 guards all assignments to "global", otherwise the code needs further
help.  I am further assuming that the memory pointed to by CPU1's "p"
is inaccessible to any other CPU, as in CPU1 just allocated the memory.
Otherwise, the assignment "p->foo = 1" is questionable.  And finally,
I am assuming that p->foo stays constant once it has been made
accessible to readers.

But the following should work:

	CPU1:
	  p->foo = 1;  /* Assumes p is local. */
	  smp_mb__before_spinlock();
	  grab lock
	  if (!global)  /* Assumes lock protects all assignments to global. */
	      global = p;

	CPU2:
	  p = rcu_dereference(global);
	  if (p)
	     q = p->foo; /* Assumes p->foo constant once visible to readers. */
	     		 /* Also assumes p and q are local. */

If the assumptions called out in the comments do not hold, you at least
need ACCESS_ONCE(), and perhaps even more synchronization.  For more info
on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.

								Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-27 15:59               ` Paul E. McKenney
@ 2014-10-27 17:28                 ` Al Viro
  2014-10-27 17:36                   ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-27 17:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
	Kernel Mailing List, linux-unionfs

On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:

> Indeed, life is hard here.  Keep in mind that lock acquisition is not
> guaranteed to prevent prior operations from being reordered into the
> critical section, possibly as follows:
> 
> 	CPU1:
> 	  grab lock
> 	  if (!global)
> 	      global = p;
> 	  /* Assume all of CPU2's accesses happen here. */
> 	  p->foo = 1;

A bit of context: p is a local pointer to struct file here and alloc is
opening it...

> This clearly allows CPU2 to execute as follows:
> 
> 	CPU2:
> 	  p = global; /* gets p */
> 	  if (p) /* non-NULL */
> 	  	q = p->foo; /* might not be 1 */
> 
> Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
> accesses can be misordered.  You need rcu_dereference() or the combination
> of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
> As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
> more info.
> 
> So no, there is no guarantee.  I am assuming that the lock grabbed by
> CPU1 guards all assignments to "global", otherwise the code needs further
> help.  I am further assuming that the memory pointed to by CPU1's "p"
> is inaccessible to any other CPU, as in CPU1 just allocated the memory.
> Otherwise, the assignment "p->foo = 1" is questionable.  And finally,
> I am assuming that p->foo stays constant once it has been made
> accessible to readers.
> 
> But the following should work:
> 
> 	CPU1:
> 	  p->foo = 1;  /* Assumes p is local. */
> 	  smp_mb__before_spinlock();
> 	  grab lock
> 	  if (!global)  /* Assumes lock protects all assignments to global. */
> 	      global = p;
> 
> 	CPU2:
> 	  p = rcu_dereference(global);
> 	  if (p)
> 	     q = p->foo; /* Assumes p->foo constant once visible to readers. */
> 	     		 /* Also assumes p and q are local. */
> 
> If the assumptions called out in the comments do not hold, you at least
> need ACCESS_ONCE(), and perhaps even more synchronization.  For more info
> on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.

First of all, this "p->foo = 1" is a shorthand for initialization of
struct file done by opening a file.  What you are saying is that it
can leak past the point where we stick a pointer to freshly opened
file into a place where another CPU can see it, but not past the
barrier in dropping the lock, right?

And you are suggesting rcu_dereference() as a way to bring the required
barriers in.  Ouch...  The names are really bad, but there's another
fun issue - rcu_dereference brings in sparse noise.  Wouldn't direct use
of smp_read_barrier_depends() be cleaner, anyway?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-27 17:28                 ` Al Viro
@ 2014-10-27 17:36                   ` Paul E. McKenney
  2014-10-28  1:12                     ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2014-10-27 17:36 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
	Kernel Mailing List, linux-unionfs

On Mon, Oct 27, 2014 at 05:28:01PM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:
> 
> > Indeed, life is hard here.  Keep in mind that lock acquisition is not
> > guaranteed to prevent prior operations from being reordered into the
> > critical section, possibly as follows:
> > 
> > 	CPU1:
> > 	  grab lock
> > 	  if (!global)
> > 	      global = p;
> > 	  /* Assume all of CPU2's accesses happen here. */
> > 	  p->foo = 1;
> 
> A bit of context: p is a local pointer to struct file here and alloc is
> opening it...

OK, good to know.  ;-)

> > This clearly allows CPU2 to execute as follows:
> > 
> > 	CPU2:
> > 	  p = global; /* gets p */
> > 	  if (p) /* non-NULL */
> > 	  	q = p->foo; /* might not be 1 */
> > 
> > Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
> > accesses can be misordered.  You need rcu_dereference() or the combination
> > of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
> > As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
> > more info.
> > 
> > So no, there is no guarantee.  I am assuming that the lock grabbed by
> > CPU1 guards all assignments to "global", otherwise the code needs further
> > help.  I am further assuming that the memory pointed to by CPU1's "p"
> > is inaccessible to any other CPU, as in CPU1 just allocated the memory.
> > Otherwise, the assignment "p->foo = 1" is questionable.  And finally,
> > I am assuming that p->foo stays constant once it has been made
> > accessible to readers.
> > 
> > But the following should work:
> > 
> > 	CPU1:
> > 	  p->foo = 1;  /* Assumes p is local. */
> > 	  smp_mb__before_spinlock();
> > 	  grab lock
> > 	  if (!global)  /* Assumes lock protects all assignments to global. */
> > 	      global = p;
> > 
> > 	CPU2:
> > 	  p = rcu_dereference(global);
> > 	  if (p)
> > 	     q = p->foo; /* Assumes p->foo constant once visible to readers. */
> > 	     		 /* Also assumes p and q are local. */
> > 
> > If the assumptions called out in the comments do not hold, you at least
> > need ACCESS_ONCE(), and perhaps even more synchronization.  For more info
> > on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.
> 
> First of all, this "p->foo = 1" is a shorthand for initialization of
> struct file done by opening a file.  What you are saying is that it
> can leak past the point where we stick a pointer to freshly opened
> file into a place where another CPU can see it, but not past the
> barrier in dropping the lock, right?

Exactly!

I should also add that smp_mb__before_spinlock() implies smp_wmb(), but
nothing more.  But that is OK because smp_wmb() suffices in this case.

> And you are suggesting rcu_dereference() as a way to bring the required
> barriers in.  Ouch...  The names are really bad, but there's another
> fun issue - rcu_dereference brings in sparse noise.  Wouldn't direct use
> of smp_read_barrier_depends() be cleaner, anyway?

Code making direct use of smp_read_barrier_depends() is harder to read,
in my experience, but good point on the sparse noise.  Maybe a new
lockless_dereference() primitive?  Maybe something like the following?
(Untested, probably does not even build.)

#define lockless_dereference(p) \
({ \
	typeof(*p) *_________p1 = ACCESS_ONCE(p); \
	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
	_________p1; \
})

							Thanx, Paul


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-27 17:36                   ` Paul E. McKenney
@ 2014-10-28  1:12                     ` Al Viro
  2014-10-28  4:11                       ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-28  1:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
	Kernel Mailing List, linux-unionfs

On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> Code making direct use of smp_read_barrier_depends() is harder to read,
> in my experience, but good point on the sparse noise.  Maybe a new
> lockless_dereference() primitive?  Maybe something like the following?
> (Untested, probably does not even build.)
> 
> #define lockless_dereference(p) \
> ({ \
> 	typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> 	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> 	_________p1; \
> })

Hmm...  Where would you prefer to put it?  rcupdate.h?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-28  1:12                     ` Al Viro
@ 2014-10-28  4:11                       ` Paul E. McKenney
  2014-10-28 22:55                         ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2014-10-28  4:11 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
	Kernel Mailing List, linux-unionfs

On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > Code making direct use of smp_read_barrier_depends() is harder to read,
> > in my experience, but good point on the sparse noise.  Maybe a new
> > lockless_dereference() primitive?  Maybe something like the following?
> > (Untested, probably does not even build.)
> > 
> > #define lockless_dereference(p) \
> > ({ \
> > 	typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > 	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > 	_________p1; \
> > })
> 
> Hmm...  Where would you prefer to put it?  rcupdate.h?

Good a place as any, I guess.  Please see patch below.  Left to myself,
I would send this along for the next merge window, but please let me
know if you would like it sooner.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Provide counterpart to rcu_dereference() for non-RCU situations

Although rcu_dereference() and friends can be used in situations where
object lifetimes are being managed by something other than RCU, the
resulting sparse and lockdep-RCU noise can be annoying.  This commit
therefore supplies a lockless_dereference(), which provides the
protection for dereferences without the RCU-related debugging noise.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9575c2d403b5..ed4f5939a452 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -617,6 +617,21 @@ static inline void rcu_preempt_sleep_check(void)
 #define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
 
 /**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU.  That
+ * "something other" might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+	typeof(p) _________p1 = ACCESS_ONCE(p); \
+	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+	(_________p1); \
+})
+
+/**
  * rcu_assign_pointer() - assign to RCU-protected pointer
  * @p: pointer to assign to
  * @v: value to assign (publish)

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-28  4:11                       ` Paul E. McKenney
@ 2014-10-28 22:55                         ` Al Viro
  2014-10-28 23:25                           ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2014-10-28 22:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
	Kernel Mailing List, linux-unionfs

On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > > Code making direct use of smp_read_barrier_depends() is harder to read,
> > > in my experience, but good point on the sparse noise.  Maybe a new
> > > lockless_dereference() primitive?  Maybe something like the following?
> > > (Untested, probably does not even build.)
> > > 
> > > #define lockless_dereference(p) \
> > > ({ \
> > > 	typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > > 	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > 	_________p1; \
> > > })
> > 
> > Hmm...  Where would you prefer to put it?  rcupdate.h?
> 
> Good a place as any, I guess.  Please see patch below.  Left to myself,
> I would send this along for the next merge window, but please let me
> know if you would like it sooner.

It's needed sooner, unfortunately.  Guys, could you take a look at
vfs.git#for-linus and comment?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-28 22:55                         ` Al Viro
@ 2014-10-28 23:25                           ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2014-10-28 23:25 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, Linus Torvalds, Linux-Fsdevel,
	Kernel Mailing List, linux-unionfs

On Tue, Oct 28, 2014 at 10:55:13PM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> > > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > > > Code making direct use of smp_read_barrier_depends() is harder to read,
> > > > in my experience, but good point on the sparse noise.  Maybe a new
> > > > lockless_dereference() primitive?  Maybe something like the following?
> > > > (Untested, probably does not even build.)
> > > > 
> > > > #define lockless_dereference(p) \
> > > > ({ \
> > > > 	typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > > > 	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > > 	_________p1; \
> > > > })
> > > 
> > > Hmm...  Where would you prefer to put it?  rcupdate.h?
> > 
> > Good a place as any, I guess.  Please see patch below.  Left to myself,
> > I would send this along for the next merge window, but please let me
> > know if you would like it sooner.
> 
> It's needed sooner, unfortunately.  Guys, could you take a look at
> vfs.git#for-linus and comment?

The version in your tree looks good.  I will drop my commit in favor
of yours.

The use of lockless_dereference() and smp_mb__before_spinlock() in
d45f00ae43 (overlayfs: barriers for opening upper-layer directory)
looks fine to me.  Only nit is lack of space between "=" and
lockless_dereference().

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [GIT PULL] overlay filesystem v25
  2014-10-24  3:24   ` Al Viro
  2014-10-24  7:24     ` Miklos Szeredi
@ 2014-10-24  7:28     ` Geert Uytterhoeven
  1 sibling, 0 replies; 16+ messages in thread
From: Geert Uytterhoeven @ 2014-10-24  7:28 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Miklos Szeredi, Linux FS Devel,
	linux-kernel@vger.kernel.org, linux-unionfs, Stephen Rothwell

Hi Al,

On Fri, Oct 24, 2014 at 5:24 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
>> Why the hell do you hold ->i_mutex across the entire opening of underlying
>> directory?  All you need is to serialize one assignment; the side that loses
>> the race will simply fput() what it opened...
>>
>> Oh, well - that goes under "weird pessimisations, easy to fix in followups"...
>
> OK, pulled into vfs.git, followups in question added.  Also there: fix for
> a long-standing leak in d_splice_alias() failure exits.  Guys, could you
> check that current vfs.git#for-linus survives your local tests?  Seems to
> survive here; if I don't hear of any problems by tomorrow morning, to Linus
> it goes...  FWIW, for that pull request stats would be

I think you forgot to merge your for-linus branch into for-next?

Anyway, as Stephen announced there will be no linux-next release
today, I'm afraid you'll have to delay your pull request to Tuesday.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-10-28 23:25 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-23 23:25 [GIT PULL] overlay filesystem v25 Miklos Szeredi
2014-10-24  2:20 ` Al Viro
2014-10-24  3:24   ` Al Viro
2014-10-24  7:24     ` Miklos Szeredi
2014-10-25  8:18       ` Al Viro
2014-10-25  9:53         ` Miklos Szeredi
2014-10-25 17:06           ` Al Viro
2014-10-27  8:06             ` Miklos Szeredi
2014-10-27 15:59               ` Paul E. McKenney
2014-10-27 17:28                 ` Al Viro
2014-10-27 17:36                   ` Paul E. McKenney
2014-10-28  1:12                     ` Al Viro
2014-10-28  4:11                       ` Paul E. McKenney
2014-10-28 22:55                         ` Al Viro
2014-10-28 23:25                           ` Paul E. McKenney
2014-10-24  7:28     ` Geert Uytterhoeven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).