* Re: Overlayfs reverse mapping
@ 2017-04-19 11:34 Amir Goldstein
2017-04-19 13:15 ` Miklos Szeredi
0 siblings, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2017-04-19 11:34 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-unionfs@vger.kernel.org
On Fri, Apr 7, 2017 at 4:03 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Apr 7, 2017 at 12:47 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> Come to think of it, NFS export of regular files doesn't need to
>> follow renames at all:
>> - The handle for a regular file is always the handle for the real
>> lower or upper inode
>> - To decode a handle, create an O_TMPFILE style overlay dentry, which
>> is not linked
>> to any path in overlay, but has the _upperdentry/lowerstack setup
>
> I don't think nfs will allow such a scheme. NFS3 server is stateless,
> which means there's no open/close in the protocol. Hence we can't
> copy-up on open(O_WR*) and return a different file handle for writing.
> If client looks up a file currently on lower and we return file handle
> based on lower file, then we must be able to decode that handle after
> the file has been copied up and even after rename. And this must work
> reliably even if the overlay dentry is no longer in the dcache.
>
> So there's no option, other than to have a reverse mapping somewhere.
>
> One more idea: do it out-of-band (e.g. under workdir) but do it as a
> plain directory tree shadowing the lower trees that contains the
> forward redirect information. It spares us the implementation of the
> database, since the filesystem does it for us. Yes, it can get out of
> sync with the overlay, but so can any mapping in-band or out-of-band.
>
Miklos,
I started looking into reverse mapping (to complement the stable inode work).
I am contemplating a scheme similar to your suggestion, but instead of
a shadow directory tree, index forward redirects by stable inode number
and store upper and lower redirects as xattr, e.g.:
workdir/inodes/#100
contains xattrs:
overlay.fh - fh of lower inode (whose ino is 100)
overlay.redirect - path of lower inode (so fh and stale ino can be
fixed after copying layers)
overlay.upper - fh of upper inode
overlay.path - path (*) of upper (so upper fh can be fixed after copying layers)
(*) upper path can get out of sync with upper fh in case
of crash during rename. on mount, if upper fh is valid,
path should be fixed.
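[Editor's sketch] To make the proposed layout concrete, here is a minimal
Python model of one index entry. It is only an illustration under stated
assumptions: the class IndexEntry, the helper entry_path() and the sample
values are hypothetical, and the entry is modelled in memory rather than
written as real xattrs. Only the "#<ino>" naming and the four xattr names
come from the proposal above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IndexEntry:
    """Hypothetical in-memory model of workdir/inodes/#<ino>."""
    lower_ino: int
    lower_fh: bytes                    # file handle of the lower inode
    lower_path: str                    # path of the lower inode
    upper_fh: Optional[bytes] = None   # set once the file is copied up
    upper_path: Optional[str] = None   # may lag behind upper_fh after a crash

    def xattrs(self) -> Dict[str, bytes]:
        """Return the xattr name -> value map the proposal describes."""
        out = {
            "overlay.fh": self.lower_fh,
            "overlay.redirect": self.lower_path.encode(),
        }
        if self.upper_fh is not None:
            out["overlay.upper"] = self.upper_fh
        if self.upper_path is not None:
            out["overlay.path"] = self.upper_path.encode()
        return out


def entry_path(workdir: str, ino: int) -> str:
    """Name the index entry after the stable inode number."""
    return f"{workdir}/inodes/#{ino}"
```

Before copy-up only overlay.fh and overlay.redirect would be present; the
two upper xattrs appear once the file has an upper inode.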
I am still trying to see if for non-dirs it makes sense for inodes/#100
to be a hardlink to upper inode.
It saves the indirection to upper fh and hardlinks would be preserved
with rsync and tar.
This would require linking temp to inodes/#100 before linking it to
upper dir and we would need to turn inodes/#100 into a whiteout
(or truncate it) on ovl_evict_inode() operation.
We need the "inode whiteout" around for a quick indication
that the NFS handle (of overlay inode 100) became stale.
We can also deduce that the handle became stale by looking up
the lower path in overlay and getting a negative dentry or one
with lower dentry that doesn't point to inode 100, but the
"inode whiteout" make this case easier to handle.
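[Editor's sketch] The decode path implied above can be modelled roughly as
follows. This is a toy illustration, not kernel code: the dictionaries
standing in for the inode index and the lower layer, the WHITEOUT marker
and decode_handle() are all hypothetical names. It shows why the "inode
whiteout" gives a quick stale answer even though the lower inode still
exists.

```python
import errno
from typing import Dict, Tuple

# Hypothetical marker for an index entry turned into an "inode whiteout".
WHITEOUT = object()


def decode_handle(index: Dict[int, object],
                  lower: Dict[int, str],
                  ino: int) -> Tuple[str, str]:
    """Resolve an overlay inode number to (layer, path) or raise ESTALE.

    index maps ino -> upper path (copied up) or WHITEOUT (evicted).
    lower maps ino -> lower path for inodes present in the lower layer.
    """
    entry = index.get(ino)
    if entry is WHITEOUT:
        # Quick indication: the overlay inode is gone, no path lookup needed.
        raise OSError(errno.ESTALE, "stale file handle")
    if entry is not None:
        # Copied up: follow the forward redirect to the upper inode.
        return ("upper", str(entry))
    if ino in lower:
        # Never copied up: the handle still refers to the lower inode.
        return ("lower", lower[ino])
    raise OSError(errno.ESTALE, "stale file handle")
```

Without the whiteout, the third branch would wrongly succeed for a file
that was removed from the overlay while its lower inode is still around,
unless the lower path is looked up in the overlay first.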
If you see any major flaw in this design please shout.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Overlayfs reverse mapping
2017-04-19 11:34 Overlayfs reverse mapping Amir Goldstein
@ 2017-04-19 13:15 ` Miklos Szeredi
2017-04-20 9:22 ` Amir Goldstein
0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2017-04-19 13:15 UTC (permalink / raw)
To: Amir Goldstein; +Cc: linux-unionfs@vger.kernel.org
On Wed, Apr 19, 2017 at 1:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> I am contemplating a scheme similar to your suggestion, but instead of
> a shadow directory tree, index forward redirects by stable inode number
> and store upper and lower redirects as xattr, e.g.:
>
> workdir/inodes/#100
Interesting idea.
> contains xattrs:
> overlay.fh - fh of lower inode (whose ino is 100)
> overlay.redirect - path of lower inode (so fh and stale ino can be
> fixed after copying layers)
> overlay.upper - fh of upper inode
> overlay.path - path (*) of upper (so upper fh can be fixed after copying layers)
I get ".redirect" and ".upper" but not the others. In fact the lower
fh can actually be used to generate the name of the file, so we don't
have to encode lower ino separately into the overlay fh.
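[Editor's sketch] One way to read the suggestion above is to derive the
index entry name directly from the bytes of the lower file handle. The
hex encoding and the function names below are assumptions made for
illustration; the thread does not specify an encoding.

```python
def index_name_from_fh(lower_fh: bytes) -> str:
    """Derive an index file name from a lower file handle (hex, assumed)."""
    return lower_fh.hex()


def fh_from_index_name(name: str) -> bytes:
    """Recover the lower file handle from the index file name."""
    return bytes.fromhex(name)
```

Because the name round-trips to the handle, neither the lower ino nor a
separate overlay.fh xattr has to be stored on the entry.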
And I don't think doing ".path" is worth it. There's no use case I
can imagine that would need this for NFS export. As for hard link
consistency, if we really want to do it, we can just detect the stale
".upper" pointer and do it the horribly slow way. But it's so
unlikely to happen that I don't think anybody would care either way.
So let's just not care for now.
> I am still trying to see if for non-dirs it makes sense for inodes/#100
> to be a hardlink to upper inode.
> It saves the indirection to upper fh and hardlinks would be preserved
> with rsync and tar.
And makes ".upper" and ".path" superfluous. Pity, we can't hard link
directories ;)
> This would require linking temp to inodes/#100 before linking it to
> upper dir and we would need to turn inodes/#100 into a whiteout
> (or truncate it) on ovl_evict_inode() operation.
>
> We need the "inode whiteout" around for a quick indication
> that the NFS handle (of overlay inode 100) became stale.
Yes. Not sure how much of a problem these would be for "rm -rf" type
of use cases. Real whiteouts get removed when parent is removed, but
these would stay around forever. But it's certainly the simplest way
to do it.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Overlayfs reverse mapping
2017-04-19 13:15 ` Miklos Szeredi
@ 2017-04-20 9:22 ` Amir Goldstein
2017-04-21 13:24 ` Miklos Szeredi
0 siblings, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2017-04-20 9:22 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-unionfs@vger.kernel.org
On Wed, Apr 19, 2017 at 4:15 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Apr 19, 2017 at 1:34 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> I am contemplating a scheme similar to your suggestion, but instead of
>> a shadow directory tree, index forward redirects by stable inode number
>> and store upper and lower redirects as xattr, e.g.:
>>
>> workdir/inodes/#100
>
> Interesting idea.
I wanted to run another idea by you.
Instead of storing the overlay inode index under workdir/inodes,
store it under upperdir/lost+found.
Crazy? think about it..
- User applications won't touch that dir, because it is already reserved
for use by mkfs/fsck tools.
- It is actually going to be used by a future fsck.overlay tool, to reindex
after layers migration.
- Taking the analogy to traditional lost+found even further, the
files found in traditional lost+found are also named #<ino> and
when you open them you get an fd to the inode.
In case of hardlinked #<ino> (non-dir) you will actually get the
same behavior as the traditional lost+found, except that inodes
could have nlink > 1. directory inode entries could appear as
empty dir to user or they could be stored as whiteouts, so user
doesn't see them.
- This keeps workdir content volatile (and not only workdir/work)
it may matter because container managers would pack upper
dir when committing changes to an image, but they would not
pack workdir (I think).
>
>> contains xattrs:
>> overlay.fh - fh of lower inode (whose ino is 100)
>> overlay.redirect - path of lower inode (so fh and stale ino can be
>> fixed after copying layers)
>> overlay.upper - fh of upper inode
>> overlay.path - path (*) of upper (so upper fh can be fixed after copying layers)
>
> I get ".redirect" and ".upper" but not the others. In fact the lower
> fh can actually be used to generate the name of the file, so we don't
> have to encode lower ino separately into the overlay fh.
>
Forgot to CC the list when I wrote that you are right.
We don't need overlay.fh on the inode entry and
overlay.path may be needed for a later "commit before migrate"
tool, but don't need to worry about that now.
Amir.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Overlayfs reverse mapping
2017-04-20 9:22 ` Amir Goldstein
@ 2017-04-21 13:24 ` Miklos Szeredi
2017-04-21 18:44 ` J. R. Okajima
0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2017-04-21 13:24 UTC (permalink / raw)
To: Amir Goldstein; +Cc: linux-unionfs@vger.kernel.org
On Thu, Apr 20, 2017 at 11:22 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> I wanted to run another idea by you.
> Instead of storing the overlay inode index under workdir/inodes,
> store it under upperdir/lost+found.
>
> Crazy? think about it..
> - User applications won't touch that dir, because it is already reserved
> for use by mkfs/fsck tools.
OTOH lost+found is just a plain old directory that is handled by
userspace tools only. No kernel fs driver treats /lost+found
specially. So putting this in upperdir would be a namespace pollution
issue and generally we'd like to avoid that. Aufs has lived with that
sort of thing, and probably it wouldn't be a problem in practice. But
I think that if we can avoid namespace confusion then we should.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Overlayfs reverse mapping
2017-04-21 13:24 ` Miklos Szeredi
@ 2017-04-21 18:44 ` J. R. Okajima
2017-04-21 19:04 ` Amir Goldstein
0 siblings, 1 reply; 8+ messages in thread
From: J. R. Okajima @ 2017-04-21 18:44 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Amir Goldstein, linux-unionfs@vger.kernel.org
Miklos Szeredi:
> OTOH lost+found is just a plain old directory that is handled by
> userspace tools only. No kernel fs driver treats /lost+found
> specially. So putting this in upperdir would be a namespace pollution
> issue and generally we'd like to avoid that. Aufs has lived with that
> sort of thing, and probably it wouldn't be a problem in practice. But
??
As far as I know, aufs doesn't handle lost+found specially, but the
layer fs does. What is the "namespace pollution" you mean?
J. R. Okajima
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Overlayfs reverse mapping
2017-04-21 18:44 ` J. R. Okajima
@ 2017-04-21 19:04 ` Amir Goldstein
2017-04-21 19:34 ` J. R. Okajima
0 siblings, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2017-04-21 19:04 UTC (permalink / raw)
To: J. R. Okajima; +Cc: Miklos Szeredi, linux-unionfs@vger.kernel.org
On Fri, Apr 21, 2017 at 9:44 PM, J. R. Okajima <hooanon05g@gmail.com> wrote:
> Miklos Szeredi:
>> OTOH lost+found is just a plain old directory that is handled by
>> userspace tools only. No kernel fs driver treats /lost+found
>> specially. So putting this in upperdir would be a namespace pollution
>> issue and generally we'd like to avoid that. Aufs has lived with that
>> sort of thing, and probably it wouldn't be a problem in practice. But
>
> ??
> As far as I know, aufs doesn't handle lost+found specially, but the
> layer fs does. What is the "namespace pollution" you mean?
>
That would be names of files reserved for use by fs that user
cannot use (i.e. .wh. files). Miklos did not mean that aufs handles
lost+found, but that it lives without a problem with reserved filenames.
Anyway, I do agree that namespace pollution should be avoided
if possible.
Cheers,
Amir.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Overlayfs reverse mapping
2017-04-21 19:04 ` Amir Goldstein
@ 2017-04-21 19:34 ` J. R. Okajima
2017-04-21 19:42 ` Amir Goldstein
0 siblings, 1 reply; 8+ messages in thread
From: J. R. Okajima @ 2017-04-21 19:34 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Miklos Szeredi, linux-unionfs@vger.kernel.org
Amir Goldstein:
> That would be names of files reserved for use by fs that user
> cannot use (i.e. .wh. files). Miklos did not mean that aufs handles
> lost+found, but that it lives without a problem with reserved filenames.
Ah, reserving a filename prefix ".wh.". I see.
Thanx for the explanation.
By the way, putting extra things under lost+found is not a good idea I
think.
Have you ever tried restoring files manually from lost+found?
When an administrator meets such a case, he will dig through lost+found
and copy the files one by one, guessing the filenames. If he is not sure
what a file is, he may leave it under lost+found.
In such a case, the entries related to overlayfs will be left behind, and
the administrator will spend some sleepless nights thinking "What are
these files? How can I restore them?"
J. R. Okajima
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Overlayfs reverse mapping
2017-04-21 19:34 ` J. R. Okajima
@ 2017-04-21 19:42 ` Amir Goldstein
0 siblings, 0 replies; 8+ messages in thread
From: Amir Goldstein @ 2017-04-21 19:42 UTC (permalink / raw)
To: J. R. Okajima; +Cc: Miklos Szeredi, linux-unionfs@vger.kernel.org
On Fri, Apr 21, 2017 at 10:34 PM, J. R. Okajima <hooanon05g@gmail.com> wrote:
> Amir Goldstein:
>> That would be names of files reserved for use by fs that user
>> cannot use (i.e. .wh. files). Miklos did not mean that aufs handles
>> lost+found, but that it lives without a problem with reserved filenames.
>
> Ah, reserving a filename prefix ".wh.". I see.
> Thanx for the explanation.
>
> By the way, putting extra things under lost+found is not a good idea I
> think.
> Have you ever tried restoring files manually from lost+found?
> When an administrator meets such a case, he will dig through lost+found
> and copy the files one by one, guessing the filenames. If he is not sure
> what a file is, he may leave it under lost+found.
> In such a case, the entries related to overlayfs will be left behind, and
> the administrator will spend some sleepless nights thinking "What are
> these files? How can I restore them?"
>
>
Yeh, it's not a good idea. I needed to hear that out loud ;-)
^ permalink raw reply [flat|nested] 8+ messages in thread