Stable inode numbers

linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Stable inode numbers
@ 2016-07-21 23:33 Vito Caputo
  2016-07-22 12:15 ` Miklos Szeredi
  0 siblings, 1 reply; 9+ messages in thread
From: Vito Caputo @ 2016-07-21 23:33 UTC (permalink / raw)
  To: linux-unionfs

Hello list,

We're receiving bug reports from users simply untarring archives onto overlayfs
mounts (docker + overlayfs) where tar fails with "Directory renamed before its
status could be extracted" errors.

This is triggered in GNU tar when the inode number of an ancestor directory
changes unexpectedly in the deferred application of ancestral metadata.

It's not consistently reproducible, but in a slightly memory-constrained vm it
is very easy to reproduce.

What appears to be happening is the tar extraction activity is churning through
the kernel's cache so a subsequent lookup of the directory in question returns
a new inode number.  One can easily simulate this using `stat` and `echo 2 >
/proc/sys/vm/drop_caches` to observe the overlayfs inode number change across
the cache drop.

The seemingly erratic nature of this failure is particularly frustrating for
users, as it gives the impression things _should_ work and just randomly aren't
due to a bug.  Scrutiny of the overlayfs implementation reveals that unstable
inode numbers is intentional, and spurious failures of applications which
notice intersecting relevant cache evictions should be expected.

Are there any plans or strategies for changing this situation in overlayfs?

I can think of a few potential solutions, but they all tend to revolve around
persistently allocating inode numbers on first lookup, which of course requires
both space and time overlayfs currently elides.

Cheers,
Vito Caputo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Stable inode numbers
  2016-07-21 23:33 Stable inode numbers Vito Caputo
@ 2016-07-22 12:15 ` Miklos Szeredi
  2016-07-22 20:55   ` Vito Caputo
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2016-07-22 12:15 UTC (permalink / raw)
  To: Vito Caputo; +Cc: linux-unionfs

On Thu, Jul 21, 2016 at 04:33:38PM -0700, Vito Caputo wrote:
> Hello list,
> 
> We're receiving bug reports from users simply untarring archives onto overlayfs
> mounts (docker + overlayfs) where tar fails with "Directory renamed before its
> status could be extracted" errors.
> 
> This is triggered in GNU tar when the inode number of an ancestor directory
> changes unexpectedly in the deferred application of ancestral metadata.
> 
> It's not consistently reproducible, but in a slightly memory-constrained vm it
> is very easy to reproduce.
> 
> What appears to be happening is the tar extraction activity is churning through
> the kernel's cache so a subsequent lookup of the directory in question returns
> a new inode number.  One can easily simulate this using `stat` and `echo 2 >
> /proc/sys/vm/drop_caches` to observe the overlayfs inode number change across
> the cache drop.
> 
> The seemingly erratic nature of this failure is particularly frustrating for
> users, as it gives the impression things _should_ work and just randomly aren't
> due to a bug.  Scrutiny of the overlayfs implementation reveals that unstable
> inode numbers is intentional, and spurious failures of applications which
> notice intersecting relevant cache evictions should be expected.
> 
> Are there any plans or strategies for changing this situation in overlayfs?
> 
> I can think of a few potential solutions, but they all tend to revolve around
> persistently allocating inode numbers on first lookup, which of course requires
> both space and time overlayfs currently elides.

Hmm, I haven't thought about this problem yet.

Maybe we are better off returning the underlying dev/ino in stat (which is
persistent long term, but not persistent during copy-up).  This would be
cheating a bit, since stat would then return the same numbers for the overlay
directory and the underlying directory, which can be different (unlike
non-directories, which overlayfs returns verbatim).

Can you please try the following patch.  It passes unionmount-testsuite with the
layering check disabled.

Thanks,
Miklos
---

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index ddda948109d3..f913b7b73e9d 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -147,9 +147,6 @@ static int ovl_dir_getattr(struct vfsmount *mnt, struct dentry *dentry,
 	if (err)
 		return err;
 
-	stat->dev = dentry->d_sb->s_dev;
-	stat->ino = dentry->d_inode->i_ino;
-
 	/*
 	 * It's probably not worth it to count subdirs to get the
 	 * correct link count.  nlink=1 seems to pacify 'find' and

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Stable inode numbers
  2016-07-22 12:15 ` Miklos Szeredi
@ 2016-07-22 20:55   ` Vito Caputo
  2016-07-25 11:34     ` Miklos Szeredi
  0 siblings, 1 reply; 9+ messages in thread
From: Vito Caputo @ 2016-07-22 20:55 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

On Fri, Jul 22, 2016 at 5:15 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Jul 21, 2016 at 04:33:38PM -0700, Vito Caputo wrote:
>> Hello list,
>>
>> We're receiving bug reports from users simply untarring archives onto overlayfs
>> mounts (docker + overlayfs) where tar fails with "Directory renamed before its
>> status could be extracted" errors.
>>
>> This is triggered in GNU tar when the inode number of an ancestor directory
>> changes unexpectedly in the deferred application of ancestral metadata.
>>
>> It's not consistently reproducible, but in a slightly memory-constrained vm it
>> is very easy to reproduce.
>>
>> What appears to be happening is the tar extraction activity is churning through
>> the kernel's cache so a subsequent lookup of the directory in question returns
>> a new inode number.  One can easily simulate this using `stat` and `echo 2 >
>> /proc/sys/vm/drop_caches` to observe the overlayfs inode number change across
>> the cache drop.
>>
>> The seemingly erratic nature of this failure is particularly frustrating for
>> users, as it gives the impression things _should_ work and just randomly aren't
>> due to a bug.  Scrutiny of the overlayfs implementation reveals that unstable
>> inode numbers is intentional, and spurious failures of applications which
>> notice intersecting relevant cache evictions should be expected.
>>
>> Are there any plans or strategies for changing this situation in overlayfs?
>>
>> I can think of a few potential solutions, but they all tend to revolve around
>> persistently allocating inode numbers on first lookup, which of course requires
>> both space and time overlayfs currently elides.
>
> Hmm, I haven't thought about this problem yet.
>
> Maybe we are better off returning the underlying dev/ino in stat (which is
> persistent long term, but not persistent during copy-up).  This would be
> cheating a bit, since stat would then return the same numbers for the overlay
> directory and the underlying directory, which can be different (unlike
> non-directories, which overlayfs returns verbatim).
>
> Can you please try the following patch.  It passes unionmount-testsuite with the
> layering check disabled.
>

The patch eliminates the errors from tar in the test I've been using
to reproduce the user-reported issues.  However, I'm doubtful of it
being a satisfactory general solution because the inode number of a
pre-existing directory which undergoes copy-up spuriously changes.

There's probably a scenario where this still upsets tar when
extracting over an partial tree pre-existing in the lower layer,
adding names to existing directories, causing their inode numbers to
change.

The relevant code where tar detects these shenanigans may be seen here:
http://git.savannah.gnu.org/cgit/tar.git/tree/src/extract.c#n867

Perhaps we can come up with a better, more general solution without
adding substantial complexity or overhead?

Thanks,
Vito Caputo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Stable inode numbers
  2016-07-22 20:55   ` Vito Caputo
@ 2016-07-25 11:34     ` Miklos Szeredi
  2016-07-25 21:25       ` Vito Caputo
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2016-07-25 11:34 UTC (permalink / raw)
  To: Vito Caputo; +Cc: linux-unionfs

On Fri, Jul 22, 2016 at 01:55:56PM -0700, Vito Caputo wrote:

> The patch eliminates the errors from tar in the test I've been using
> to reproduce the user-reported issues.  However, I'm doubtful of it
> being a satisfactory general solution because the inode number of a
> pre-existing directory which undergoes copy-up spuriously changes.
> 
> There's probably a scenario where this still upsets tar when
> extracting over an partial tree pre-existing in the lower layer,
> adding names to existing directories, causing their inode numbers to
> change.
> 
> The relevant code where tar detects these shenanigans may be seen here:
> http://git.savannah.gnu.org/cgit/tar.git/tree/src/extract.c#n867
> 
> Perhaps we can come up with a better, more general solution without
> adding substantial complexity or overhead?

Here's another try.  It's simple enough, as to overhead, please let me know if
you are comfortable with this.

Thanks,
Miklos

---
From: Miklos Szeredi <mszeredi@redhat.com>
Subject: ovl: optionally copy up directory on getattr

This is to make directory dev/ino stable.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/dir.c       |   15 +++++++++++++--
 fs/overlayfs/overlayfs.h |    6 ++++++
 fs/overlayfs/super.c     |   19 +++++++++++++++++++
 3 files changed, 38 insertions(+), 2 deletions(-)

--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -139,6 +139,15 @@ static int ovl_dir_getattr(struct vfsmou
 	enum ovl_path_type type;
 	struct path realpath;
 	const struct cred *old_cred;
+	bool devino_from_ovl = true;
+
+	if (ovl_copy_up_type(dentry) == OVL_COPY_UP_ON_GETATTR) {
+		err = ovl_copy_up(dentry);
+		if (err)
+			return err;
+
+		devino_from_ovl = false;
+	}
 
 	type = ovl_path_real(dentry, &realpath);
 	old_cred = ovl_override_creds(dentry->d_sb);
@@ -147,8 +156,10 @@ static int ovl_dir_getattr(struct vfsmou
 	if (err)
 		return err;
 
-	stat->dev = dentry->d_sb->s_dev;
-	stat->ino = dentry->d_inode->i_ino;
+	if (devino_from_ovl) {
+		stat->dev = dentry->d_sb->s_dev;
+		stat->ino = dentry->d_inode->i_ino;
+	}
 
 	/*
 	 * It's probably not worth it to count subdirs to get the
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -24,6 +24,11 @@ enum ovl_path_type {
 	(OVL_TYPE_MERGE(type) || !OVL_TYPE_UPPER(type))
 
 
+enum ovl_copy_up_type {
+	OVL_COPY_UP_ON_MODIF,
+	OVL_COPY_UP_ON_GETATTR,
+};
+
 #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay"
 #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX ".opaque"
 
@@ -172,6 +177,7 @@ struct file *ovl_path_open(struct path *
 
 struct dentry *ovl_upper_create(struct dentry *upperdir, struct dentry *dentry,
 				struct kstat *stat, const char *link);
+enum ovl_copy_up_type ovl_copy_up_type(struct dentry *dentry);
 
 /* readdir.c */
 extern const struct file_operations ovl_dir_operations;
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -31,6 +31,7 @@ struct ovl_config {
 	char *upperdir;
 	char *workdir;
 	bool default_permissions;
+	enum ovl_copy_up_type dir_copy_up_type;
 };
 
 /* private information held for overlayfs's superblock */
@@ -203,6 +204,16 @@ struct dentry *ovl_workdir(struct dentry
 	return ofs->workdir;
 }
 
+enum ovl_copy_up_type ovl_copy_up_type(struct dentry *dentry)
+{
+	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
+
+	if (d_is_dir(dentry))
+		return ofs->config.dir_copy_up_type;
+
+	return OVL_COPY_UP_ON_MODIF;
+}
+
 bool ovl_dentry_is_opaque(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
@@ -681,6 +692,8 @@ static int ovl_show_options(struct seq_f
 	}
 	if (ufs->config.default_permissions)
 		seq_puts(m, ",default_permissions");
+	if (ufs->config.dir_copy_up_type == OVL_COPY_UP_ON_GETATTR)
+		seq_puts(m, ",dir_copy_up_on=getattr");
 	return 0;
 }
 
@@ -707,6 +720,7 @@ enum {
 	OPT_UPPERDIR,
 	OPT_WORKDIR,
 	OPT_DEFAULT_PERMISSIONS,
+	OPT_DIR_COPY_UP_ON_GETATTR,
 	OPT_ERR,
 };
 
@@ -715,6 +729,7 @@ static const match_table_t ovl_tokens =
 	{OPT_UPPERDIR,			"upperdir=%s"},
 	{OPT_WORKDIR,			"workdir=%s"},
 	{OPT_DEFAULT_PERMISSIONS,	"default_permissions"},
+	{OPT_DIR_COPY_UP_ON_GETATTR,	"dir_copy_up_on=getattr"},
 	{OPT_ERR,			NULL}
 };
 
@@ -779,6 +794,10 @@ static int ovl_parse_opt(char *opt, stru
 			config->default_permissions = true;
 			break;
 
+		case OPT_DIR_COPY_UP_ON_GETATTR:
+			config->dir_copy_up_type = OVL_COPY_UP_ON_GETATTR;
+			break;
+
 		default:
 			pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
 			return -EINVAL;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Stable inode numbers
  2016-07-25 11:34     ` Miklos Szeredi
@ 2016-07-25 21:25       ` Vito Caputo
  2016-07-26  1:51         ` J. R. Okajima
  2016-07-26 10:02         ` Miklos Szeredi
  0 siblings, 2 replies; 9+ messages in thread
From: Vito Caputo @ 2016-07-25 21:25 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs

I think this strategy is an improvement, but I'm a bit apprehensive
about how specific it is to this particular style of untar-like
directory metadata preservation failure in only providing stability to
directory inode numbers.

Additionally it introduces a userspace-visible mount option, for
something which /feels/ like a stop-gap kludge to make some things
work better in the short-term.  Maybe it's acceptable, since a more
general solution could be implemented later which ignores the added
mount option without conflict.

As overlayfs usage increases in the "enterprise" via containers I
suspect we'll be seeing substantial support load from unsuspecting
users tripping over quirks like this and at some point there's a
threshold where user confidence is lost.  That is the lens I view
overlayfs problems like these through, hence I'd prefer a more general
solution with stable inode numbers making overlayfs behavior more
consistent with the filesystems backing it.

It's difficult for me to define what is "good enough" for overlayfs
correctness, and this situation seems like it might be one of those
epitomizing the saying "perfect is the enemy of good".

Thanks,
Vito Caputo

On Mon, Jul 25, 2016 at 4:34 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Jul 22, 2016 at 01:55:56PM -0700, Vito Caputo wrote:
>
>> The patch eliminates the errors from tar in the test I've been using
>> to reproduce the user-reported issues.  However, I'm doubtful of it
>> being a satisfactory general solution because the inode number of a
>> pre-existing directory which undergoes copy-up spuriously changes.
>>
>> There's probably a scenario where this still upsets tar when
>> extracting over an partial tree pre-existing in the lower layer,
>> adding names to existing directories, causing their inode numbers to
>> change.
>>
>> The relevant code where tar detects these shenanigans may be seen here:
>> http://git.savannah.gnu.org/cgit/tar.git/tree/src/extract.c#n867
>>
>> Perhaps we can come up with a better, more general solution without
>> adding substantial complexity or overhead?
>
> Here's another try.  It's simple enough, as to overhead, please let me know if
> you are comfortable with this.
>
> Thanks,
> Miklos
>
> ---
> From: Miklos Szeredi <mszeredi@redhat.com>
> Subject: ovl: optionally copy up directory on getattr
>
> This is to make directory dev/ino stable.
>
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
>  fs/overlayfs/dir.c       |   15 +++++++++++++--
>  fs/overlayfs/overlayfs.h |    6 ++++++
>  fs/overlayfs/super.c     |   19 +++++++++++++++++++
>  3 files changed, 38 insertions(+), 2 deletions(-)
>
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -139,6 +139,15 @@ static int ovl_dir_getattr(struct vfsmou
>         enum ovl_path_type type;
>         struct path realpath;
>         const struct cred *old_cred;
> +       bool devino_from_ovl = true;
> +
> +       if (ovl_copy_up_type(dentry) == OVL_COPY_UP_ON_GETATTR) {
> +               err = ovl_copy_up(dentry);
> +               if (err)
> +                       return err;
> +
> +               devino_from_ovl = false;
> +       }
>
>         type = ovl_path_real(dentry, &realpath);
>         old_cred = ovl_override_creds(dentry->d_sb);
> @@ -147,8 +156,10 @@ static int ovl_dir_getattr(struct vfsmou
>         if (err)
>                 return err;
>
> -       stat->dev = dentry->d_sb->s_dev;
> -       stat->ino = dentry->d_inode->i_ino;
> +       if (devino_from_ovl) {
> +               stat->dev = dentry->d_sb->s_dev;
> +               stat->ino = dentry->d_inode->i_ino;
> +       }
>
>         /*
>          * It's probably not worth it to count subdirs to get the
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -24,6 +24,11 @@ enum ovl_path_type {
>         (OVL_TYPE_MERGE(type) || !OVL_TYPE_UPPER(type))
>
>
> +enum ovl_copy_up_type {
> +       OVL_COPY_UP_ON_MODIF,
> +       OVL_COPY_UP_ON_GETATTR,
> +};
> +
>  #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay"
>  #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX ".opaque"
>
> @@ -172,6 +177,7 @@ struct file *ovl_path_open(struct path *
>
>  struct dentry *ovl_upper_create(struct dentry *upperdir, struct dentry *dentry,
>                                 struct kstat *stat, const char *link);
> +enum ovl_copy_up_type ovl_copy_up_type(struct dentry *dentry);
>
>  /* readdir.c */
>  extern const struct file_operations ovl_dir_operations;
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -31,6 +31,7 @@ struct ovl_config {
>         char *upperdir;
>         char *workdir;
>         bool default_permissions;
> +       enum ovl_copy_up_type dir_copy_up_type;
>  };
>
>  /* private information held for overlayfs's superblock */
> @@ -203,6 +204,16 @@ struct dentry *ovl_workdir(struct dentry
>         return ofs->workdir;
>  }
>
> +enum ovl_copy_up_type ovl_copy_up_type(struct dentry *dentry)
> +{
> +       struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
> +
> +       if (d_is_dir(dentry))
> +               return ofs->config.dir_copy_up_type;
> +
> +       return OVL_COPY_UP_ON_MODIF;
> +}
> +
>  bool ovl_dentry_is_opaque(struct dentry *dentry)
>  {
>         struct ovl_entry *oe = dentry->d_fsdata;
> @@ -681,6 +692,8 @@ static int ovl_show_options(struct seq_f
>         }
>         if (ufs->config.default_permissions)
>                 seq_puts(m, ",default_permissions");
> +       if (ufs->config.dir_copy_up_type == OVL_COPY_UP_ON_GETATTR)
> +               seq_puts(m, ",dir_copy_up_on=getattr");
>         return 0;
>  }
>
> @@ -707,6 +720,7 @@ enum {
>         OPT_UPPERDIR,
>         OPT_WORKDIR,
>         OPT_DEFAULT_PERMISSIONS,
> +       OPT_DIR_COPY_UP_ON_GETATTR,
>         OPT_ERR,
>  };
>
> @@ -715,6 +729,7 @@ static const match_table_t ovl_tokens =
>         {OPT_UPPERDIR,                  "upperdir=%s"},
>         {OPT_WORKDIR,                   "workdir=%s"},
>         {OPT_DEFAULT_PERMISSIONS,       "default_permissions"},
> +       {OPT_DIR_COPY_UP_ON_GETATTR,    "dir_copy_up_on=getattr"},
>         {OPT_ERR,                       NULL}
>  };
>
> @@ -779,6 +794,10 @@ static int ovl_parse_opt(char *opt, stru
>                         config->default_permissions = true;
>                         break;
>
> +               case OPT_DIR_COPY_UP_ON_GETATTR:
> +                       config->dir_copy_up_type = OVL_COPY_UP_ON_GETATTR;
> +                       break;
> +
>                 default:
>                         pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
>                         return -EINVAL;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Stable inode numbers
  2016-07-25 21:25       ` Vito Caputo
@ 2016-07-26  1:51         ` J. R. Okajima
  2016-07-26 10:02         ` Miklos Szeredi
  1 sibling, 0 replies; 9+ messages in thread
From: J. R. Okajima @ 2016-07-26  1:51 UTC (permalink / raw)
  To: Vito Caputo; +Cc: Miklos Szeredi, linux-unionfs

Vito Caputo:
> ...  That is the lens I view
> overlayfs problems like these through, hence I'd prefer a more general
> solution with stable inode numbers making overlayfs behavior more
> consistent with the filesystems backing it.

You may want to try aufs instead of overlayfs since aufs has a such
feature called XINO (external inode number). It is working well for over
a decade, but some users set 'noxino' mount option which makes aufs not
to use XINO. Because XINO consumes some disk space, and if XINO file is
placed on tmpfs, then it means consuming memory.

J. R. Okajima

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Stable inode numbers
  2016-07-25 21:25       ` Vito Caputo
  2016-07-26  1:51         ` J. R. Okajima
@ 2016-07-26 10:02         ` Miklos Szeredi
  2016-07-26 11:09           ` Miklos Szeredi
  1 sibling, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2016-07-26 10:02 UTC (permalink / raw)
  To: Vito Caputo; +Cc: linux-unionfs@vger.kernel.org

On Mon, Jul 25, 2016 at 11:25 PM, Vito Caputo <vito.caputo@coreos.com> wrote:
> I think this strategy is an improvement, but I'm a bit apprehensive
> about how specific it is to this particular style of untar-like
> directory metadata preservation failure in only providing stability to
> directory inode numbers.
>
> Additionally it introduces a userspace-visible mount option, for
> something which /feels/ like a stop-gap kludge to make some things
> work better in the short-term.  Maybe it's acceptable, since a more
> general solution could be implemented later which ignores the added
> mount option without conflict.

Not sure that it's a kludge.  AFAIR union-mounts copy up directory on
lookup for simplicity of implementation.  Overlayfs doesn't do that
and it turns out to be not a big problem.  But I don't have any input
into the performance impact of doing this way or that.   If it turns
out to be unacceptable then we can think about some other solution
(possibly something out-of-band, like aufs does).

> As overlayfs usage increases in the "enterprise" via containers I
> suspect we'll be seeing substantial support load from unsuspecting
> users tripping over quirks like this and at some point there's a
> threshold where user confidence is lost.

Too true.

>  That is the lens I view
> overlayfs problems like these through, hence I'd prefer a more general
> solution with stable inode numbers making overlayfs behavior more
> consistent with the filesystems backing it.

Perhaps this needs to be the default then, turning it off to get the
more space efficient but less conformant version.  That risks being a
regression for some, but hey, overlayfs is still somewhat
experimental.

> It's difficult for me to define what is "good enough" for overlayfs
> correctness, and this situation seems like it might be one of those
> epitomizing the saying "perfect is the enemy of good".

If it breaks real apps, then it's not good.  We need to fix it one way
or the other.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Stable inode numbers
  2016-07-26 10:02         ` Miklos Szeredi
@ 2016-07-26 11:09           ` Miklos Szeredi
  2016-08-16  0:03             ` Vito Caputo
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2016-07-26 11:09 UTC (permalink / raw)
  To: Vito Caputo; +Cc: linux-unionfs@vger.kernel.org

On Tue, Jul 26, 2016 at 12:02:13PM +0200, Miklos Szeredi wrote:
> On Mon, Jul 25, 2016 at 11:25 PM, Vito Caputo <vito.caputo@coreos.com> wrote:
> > I think this strategy is an improvement, but I'm a bit apprehensive
> > about how specific it is to this particular style of untar-like
> > directory metadata preservation failure in only providing stability to
> > directory inode numbers.
> >
> > Additionally it introduces a userspace-visible mount option, for
> > something which /feels/ like a stop-gap kludge to make some things
> > work better in the short-term.  Maybe it's acceptable, since a more
> > general solution could be implemented later which ignores the added
> > mount option without conflict.

Here's another patch that achieves stability without compromising
performance/disk usage.

Again not perfect, but perhaps good enough?

Thanks,
Miklos

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 12bcd07b9e32..5b7de7d489e0 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -143,13 +143,24 @@ static int ovl_dir_getattr(struct vfsmount *mnt, struct dentry *dentry,
 	type = ovl_path_real(dentry, &realpath);
 	old_cred = ovl_override_creds(dentry->d_sb);
 	err = vfs_getattr(&realpath, stat);
+	/*
+	 * Use lower ino/dev for merged dir, so they are stable across copy up.
+	 */
+	if (!err && OVL_TYPE_MERGE(type) && OVL_TYPE_UPPER(type)) {
+		struct path lowerpath;
+		struct kstat lowerstat;
+
+		ovl_path_lower(dentry, &lowerpath);
+		err = vfs_getattr(&lowerpath, &lowerstat);
+		if (!err) {
+			stat->dev = lowerstat.dev;
+			stat->ino = lowerstat.ino;
+		}
+	}
 	revert_creds(old_cred);
 	if (err)
 		return err;
 
-	stat->dev = dentry->d_sb->s_dev;
-	stat->ino = dentry->d_inode->i_ino;
-
 	/*
 	 * It's probably not worth it to count subdirs to get the
 	 * correct link count.  nlink=1 seems to pacify 'find' and

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Stable inode numbers
  2016-07-26 11:09           ` Miklos Szeredi
@ 2016-08-16  0:03             ` Vito Caputo
  0 siblings, 0 replies; 9+ messages in thread
From: Vito Caputo @ 2016-08-16  0:03 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-unionfs@vger.kernel.org

Miklos,

This most recent patch looks ok to me, and it works fine in my limited testing.

Thanks,
Vito Caputo

On Tue, Jul 26, 2016 at 4:09 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Tue, Jul 26, 2016 at 12:02:13PM +0200, Miklos Szeredi wrote:
>> On Mon, Jul 25, 2016 at 11:25 PM, Vito Caputo <vito.caputo@coreos.com> wrote:
>> > I think this strategy is an improvement, but I'm a bit apprehensive
>> > about how specific it is to this particular style of untar-like
>> > directory metadata preservation failure in only providing stability to
>> > directory inode numbers.
>> >
>> > Additionally it introduces a userspace-visible mount option, for
>> > something which /feels/ like a stop-gap kludge to make some things
>> > work better in the short-term.  Maybe it's acceptable, since a more
>> > general solution could be implemented later which ignores the added
>> > mount option without conflict.
>
> Here's another patch that achieves stability without compromising
> performance/disk usage.
>
> Again not perfect, but perhaps good enough?
>
> Thanks,
> Miklos
>
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 12bcd07b9e32..5b7de7d489e0 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -143,13 +143,24 @@ static int ovl_dir_getattr(struct vfsmount *mnt, struct dentry *dentry,
>         type = ovl_path_real(dentry, &realpath);
>         old_cred = ovl_override_creds(dentry->d_sb);
>         err = vfs_getattr(&realpath, stat);
> +       /*
> +        * Use lower ino/dev for merged dir, so they are stable across copy up.
> +        */
> +       if (!err && OVL_TYPE_MERGE(type) && OVL_TYPE_UPPER(type)) {
> +               struct path lowerpath;
> +               struct kstat lowerstat;
> +
> +               ovl_path_lower(dentry, &lowerpath);
> +               err = vfs_getattr(&lowerpath, &lowerstat);
> +               if (!err) {
> +                       stat->dev = lowerstat.dev;
> +                       stat->ino = lowerstat.ino;
> +               }
> +       }
>         revert_creds(old_cred);
>         if (err)
>                 return err;
>
> -       stat->dev = dentry->d_sb->s_dev;
> -       stat->ino = dentry->d_inode->i_ino;
> -
>         /*
>          * It's probably not worth it to count subdirs to get the
>          * correct link count.  nlink=1 seems to pacify 'find' and

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-08-16  0:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-21 23:33 Stable inode numbers Vito Caputo
2016-07-22 12:15 ` Miklos Szeredi
2016-07-22 20:55   ` Vito Caputo
2016-07-25 11:34     ` Miklos Szeredi
2016-07-25 21:25       ` Vito Caputo
2016-07-26  1:51         ` J. R. Okajima
2016-07-26 10:02         ` Miklos Szeredi
2016-07-26 11:09           ` Miklos Szeredi
2016-08-16  0:03             ` Vito Caputo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).