From: Dave Chinner <david@fromorbit.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
linux-xfs <linux-xfs@vger.kernel.org>,
overlayfs <linux-unionfs@vger.kernel.org>
Subject: Re: Question about XFS_MAXINUMBER
Date: Sun, 18 Mar 2018 08:28:14 +1100 [thread overview]
Message-ID: <20180317212814.GI7000@dastard> (raw)
In-Reply-To: <CAOQ4uxhubJhspfj+x7yRkTJ4rUXyhZoF0x3kjuFyv-0=Lxrctw@mail.gmail.com> <CAOQ4uxh0H8mbbAZt0MEdLGRY-8Aqg2c9UFM56-KL2b90fLYCFQ@mail.gmail.com>
On Sat, Mar 17, 2018 at 09:56:19AM +0200, Amir Goldstein wrote:
> On Sat, Mar 17, 2018 at 7:40 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > On Fri, Mar 16, 2018 at 11:24 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> On Fri, Mar 16, 2018 at 04:05:22PM +0200, Amir Goldstein wrote:
> >>> Hi guys,
> >>>
> >>> I am trying to get a lower bound for unused inode number MSB on
> >>> a mounted xfs super block, so I can publish it on struct super_block.
> >>
> >> Sorry, what?
> >>
> >> The inode number is owned by the filesystem - nobody should be
> >> touching it or making assumptions they can screw with it in any way.
> >>
>
> Let me clarify with the simplest example:
>
> With overlay of 2 layers, lower and upper on 2 different xfs fs
> assuming that stat(2) from xfs will not be using the 63 MSB:
>
> On stat(2) of an overlay upper inode we want to return:
> st_dev = <overlay anon bdev>
> st_ino = <real upper st_ino>
>
> On stat(2) of an overlay lower inode we want to return:
> st_dev = <overlay anon bdev>
> st_ino = <real lower st_ino> | 1 << 63
>
> Now for ext4 this is always safe to do and we find that automatically
> due to the fact that ext4 uses the default encode_fh generic 32bit
> inode encoding.
>
> For xfs this should also be safe, but we don't want to whitelist xfs
> by name/magic, so we want xfs to publish the max amount of bits
> exposed to user with stat(2)/getdents(3).
>
> Recently, I became aware of an nfsd use case that also looks
> at inode->i_ino, so we may want to also be able to assume
> max_ino_bits also applies to inode->i_ino, but if you tell us to
> stay clear of inode->i_ino, then we can always use stat.st_ino.
>
> Thanks,
> Amir.
>
On Sat, Mar 17, 2018 at 10:24:39AM +0200, Amir Goldstein wrote:
> On Sat, Mar 17, 2018 at 10:04 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Sat, Mar 17, 2018 at 06:40:23AM +0100, Miklos Szeredi wrote:
> [...]
> >> I ask, because we've thought long and hard about what to do for
> >> multiplexing inum space in overlayfs, and found no other sane options.
> >> Ideas welcome, of course.
> >
> > Why do you need to "multiplex" the inum space? perhaps you'd do
> > better to start with a description of why you want to play games
> > with inode numbers, rather than just posting a patch to steal bits
> > from other filesytem inode number spaces....
> >
>
> I think this patch perhaps explains best what we want to do:
> https://marc.info/?l=linux-unionfs&m=151007386219743&w=2
>
> I had already given a simple example in an earlier response.
So, I'll quote that here:
> > > On stat(2) of an overlay upper inode we want to return:
> > > st_dev = <overlay anon bdev>
> > > st_ino = <real upper st_ino>
> > >
> > > On stat(2) of an overlay lower inode we want to return:
> > > st_dev = <overlay anon bdev>
> > > st_ino = <real lower st_ino> | 1 << 63
This makes no sense to me - this implies the inode number changes on
copy-up, and ....
> As the the "why" question, we have several requirements for
> overlay inode numbers:
> 1. st_ino is persistent
> 2. st_ino/st_dev pair is unique in the system
> 3. st_ino is consistent with d_ino
> 4. st_ino doesn't change on copy up
> 5. st_dev is uniform across all overlay inodes
.... this means requierment #4 isn't met, even on the same
filesystem.
IOWs, if overlay has already met #4 on the same filesystem, then
there is a persistent mapping between lower and upper inodes (Req.
#1) that maps the upper inode # to the lower inode #. That has to be
overlay information, because the underlying filesystem doesn't store
it. And because the lower inode/dev is unique, then req. 2 is met,
too.
FWIW, req 5 is badly worded - st_dev is uniform across all inodes in
a single overlay filesystem, not all overlay inodes.
> With upstream overlayfs we meet all requirements above for
> the case of all underlying layers on the same fs, by using a real
> underlying inode st_ino and the overlay st_dev.
Yeah, that's what I thought. So why can't you do exactly the same
thing for different underlying filesystems? You've already got a
mapping between upper and lower inode numbers, why can't that map
across different superblocks? Why do you need special "inode number
bits" exposed to userspace to identify upper->lower inode
mappings that overlay should already have a persistent mapping
mechanism for?
> With the 'xino' patch set [1], we can meet all requirements above
> also for the case of underlying layers on different fs, by multiplpexing
> the inum space, as long as we know about unused high ino bits.
Your example makes no sense to me - I don't see how adding extra
bits to the lower inode number allows you to meet requirement #4,
not why presenting "st_ino = <real upper st_ino>" for inodes that
have been copied up iis being done because that violates requirement
#4....
> The ovl-xino branch already has the xfs patch (not yet posted) to publish
> max_ino_bits.
That has no explanation of why you need to screw with inode number
bits, either. It's all mechanism, and there's zero explanation of
what problem it solves.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2018-03-17 21:28 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-16 14:05 Question about XFS_MAXINUMBER Amir Goldstein
2018-03-16 17:59 ` Amir Goldstein
2018-03-16 22:24 ` Dave Chinner
2018-03-17 5:40 ` Miklos Szeredi
2018-03-17 7:56 ` Amir Goldstein
2018-03-17 21:28 ` Dave Chinner [this message]
2018-03-18 6:21 ` Amir Goldstein
2018-03-18 23:02 ` Dave Chinner
2018-03-19 4:03 ` Amir Goldstein
2018-03-19 8:42 ` Miklos Szeredi
2018-03-20 1:47 ` Dave Chinner
2018-03-20 6:29 ` Amir Goldstein
2018-03-20 8:04 ` Ian Kent
2018-03-20 8:57 ` Amir Goldstein
2018-03-20 10:18 ` Ian Kent
2018-03-20 9:20 ` Miklos Szeredi
2018-03-20 13:08 ` Dave Chinner
2018-03-20 9:32 ` Miklos Szeredi
2018-03-17 8:04 ` Dave Chinner
2018-03-17 8:24 ` Amir Goldstein
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180317212814.GI7000@dastard \
--to=david@fromorbit.com \
--cc=amir73il@gmail.com \
--cc=darrick.wong@oracle.com \
--cc=linux-unionfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox