public inbox for linux-unionfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	overlayfs <linux-unionfs@vger.kernel.org>
Subject: Re: Question about XFS_MAXINUMBER
Date: Sun, 18 Mar 2018 08:28:14 +1100	[thread overview]
Message-ID: <20180317212814.GI7000@dastard> (raw)
In-Reply-To: <CAOQ4uxhubJhspfj+x7yRkTJ4rUXyhZoF0x3kjuFyv-0=Lxrctw@mail.gmail.com> <CAOQ4uxh0H8mbbAZt0MEdLGRY-8Aqg2c9UFM56-KL2b90fLYCFQ@mail.gmail.com>

On Sat, Mar 17, 2018 at 09:56:19AM +0200, Amir Goldstein wrote:
> On Sat, Mar 17, 2018 at 7:40 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > On Fri, Mar 16, 2018 at 11:24 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> On Fri, Mar 16, 2018 at 04:05:22PM +0200, Amir Goldstein wrote:
> >>> Hi guys,
> >>>
> >>> I am trying to get a lower bound for unused inode number MSB on
> >>> a mounted xfs super block, so I can publish it on struct super_block.
> >>
> >> Sorry, what?
> >>
> >> The inode number is owned by the filesystem - nobody should be
> >> touching it or making assumptions they can screw with it in any way.
> >>
> 
> Let me clarify with the simplest example:
> 
> With overlay of 2 layers, lower and upper on 2 different xfs fs
> assuming that stat(2) from xfs will not be using the 63 MSB:
> 
> On stat(2) of an overlay upper inode we want to return:
>   st_dev = <overlay anon bdev>
>   st_ino = <real upper st_ino>
> 
> On stat(2) of an overlay lower inode we want to return:
>   st_dev = <overlay anon bdev>
>   st_ino = <real lower st_ino> | 1 << 63
> 
> Now for ext4 this is always safe to do and we find that automatically
> due to the fact that ext4 uses the default encode_fh generic 32bit
> inode encoding.
> 
> For xfs this should also be safe, but we don't want to whitelist xfs
> by name/magic, so we want xfs to publish the max amount of bits
> exposed to user with stat(2)/getdents(3).
> 
> Recently, I became aware of an nfsd use case that also looks
> at inode->i_ino, so we may want to also be able to assume
> max_ino_bits also applies to inode->i_ino, but if you tell us to
> stay clear of inode->i_ino, then we can always use stat.st_ino.
> 
> Thanks,
> Amir.
> 

On Sat, Mar 17, 2018 at 10:24:39AM +0200, Amir Goldstein wrote:
> On Sat, Mar 17, 2018 at 10:04 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Sat, Mar 17, 2018 at 06:40:23AM +0100, Miklos Szeredi wrote:
> [...]
> >> I ask, because we've thought long and hard about what to do for
> >> multiplexing inum space in overlayfs, and found no other sane options.
> >> Ideas welcome, of course.
> >
> > Why do you need to "multiplex" the inum space? perhaps you'd do
> > better to start with a description of why you want to play games
> > with inode numbers, rather than just posting a patch to steal bits
> > from other filesytem inode number spaces....
> >
> 
> I think this patch perhaps explains best what we want to do:
> https://marc.info/?l=linux-unionfs&m=151007386219743&w=2
> 
> I had already given a simple example in an earlier response.

So, I'll quote that here:

> > > On stat(2) of an overlay upper inode we want to return:
> > >   st_dev = <overlay anon bdev>
> > >   st_ino = <real upper st_ino>
> > > 
> > > On stat(2) of an overlay lower inode we want to return:
> > >   st_dev = <overlay anon bdev>
> > >   st_ino = <real lower st_ino> | 1 << 63

This makes no sense to me - this implies the inode number changes on
copy-up, and ....

> As the the "why" question, we have several requirements for
> overlay inode numbers:
> 1. st_ino is persistent
> 2. st_ino/st_dev pair is unique in the system
> 3. st_ino is consistent with d_ino
> 4. st_ino doesn't change on copy up
> 5. st_dev is uniform across all overlay inodes

.... this means requierment #4 isn't met, even on the same
filesystem.

IOWs, if overlay has already met #4 on the same filesystem, then
there is a persistent mapping between lower and upper inodes (Req.
#1) that maps the upper inode # to the lower inode #. That has to be
overlay information, because the underlying filesystem doesn't store
it. And because the lower inode/dev is unique, then req. 2 is met,
too.

FWIW, req 5 is badly worded - st_dev is uniform across all inodes in
a single overlay filesystem, not all overlay inodes.

> With upstream overlayfs we meet all requirements above for
> the case of all underlying layers on the same fs, by using a real
> underlying inode st_ino and the overlay st_dev.

Yeah, that's what I thought. So why can't you do exactly the same
thing for different underlying filesystems? You've already got a
mapping between upper and lower inode numbers, why can't that map
across different superblocks? Why do you need special "inode number
bits" exposed to userspace to identify upper->lower inode
mappings that overlay should already have a persistent mapping
mechanism for?

> With the 'xino' patch set [1], we can meet all requirements above
> also for the case of underlying layers on different fs, by multiplpexing
> the inum space, as long as we know about unused high ino bits.

Your example makes no sense to me - I don't see how adding extra
bits to the lower inode number allows you to meet requirement #4,
not why presenting "st_ino = <real upper st_ino>" for inodes that
have been copied up iis being done because that violates requirement
#4....

> The ovl-xino branch already has the xfs patch (not yet posted) to publish
> max_ino_bits.

That has no explanation of why you need to screw with inode number
bits, either. It's all mechanism, and there's zero explanation of
what problem it solves.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2018-03-17 21:28 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-16 14:05 Question about XFS_MAXINUMBER Amir Goldstein
2018-03-16 17:59 ` Amir Goldstein
2018-03-16 22:24 ` Dave Chinner
2018-03-17  5:40   ` Miklos Szeredi
2018-03-17  7:56     ` Amir Goldstein
2018-03-17 21:28       ` Dave Chinner [this message]
2018-03-18  6:21         ` Amir Goldstein
2018-03-18 23:02           ` Dave Chinner
2018-03-19  4:03             ` Amir Goldstein
2018-03-19  8:42               ` Miklos Szeredi
2018-03-20  1:47               ` Dave Chinner
2018-03-20  6:29                 ` Amir Goldstein
2018-03-20  8:04                   ` Ian Kent
2018-03-20  8:57                     ` Amir Goldstein
2018-03-20 10:18                       ` Ian Kent
2018-03-20  9:20                     ` Miklos Szeredi
2018-03-20 13:08                   ` Dave Chinner
2018-03-20  9:32                 ` Miklos Szeredi
2018-03-17  8:04     ` Dave Chinner
2018-03-17  8:24       ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180317212814.GI7000@dastard \
    --to=david@fromorbit.com \
    --cc=amir73il@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox