Re: Question about XFS_MAXINUMBER

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	overlayfs <linux-unionfs@vger.kernel.org>
Subject: Re: Question about XFS_MAXINUMBER
Date: Sun, 18 Mar 2018 08:28:14 +1100	[thread overview]
Message-ID: <20180317212814.GI7000@dastard> (raw)
In-Reply-To: <CAOQ4uxhubJhspfj+x7yRkTJ4rUXyhZoF0x3kjuFyv-0=Lxrctw@mail.gmail.com> <CAOQ4uxh0H8mbbAZt0MEdLGRY-8Aqg2c9UFM56-KL2b90fLYCFQ@mail.gmail.com>

On Sat, Mar 17, 2018 at 09:56:19AM +0200, Amir Goldstein wrote:
> On Sat, Mar 17, 2018 at 7:40 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > On Fri, Mar 16, 2018 at 11:24 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> On Fri, Mar 16, 2018 at 04:05:22PM +0200, Amir Goldstein wrote:
> >>> Hi guys,
> >>>
> >>> I am trying to get a lower bound for unused inode number MSB on
> >>> a mounted xfs super block, so I can publish it on struct super_block.
> >>
> >> Sorry, what?
> >>
> >> The inode number is owned by the filesystem - nobody should be
> >> touching it or making assumptions they can screw with it in any way.
> >>
> 
> Let me clarify with the simplest example:
> 
> With overlay of 2 layers, lower and upper on 2 different xfs fs
> assuming that stat(2) from xfs will not be using the 63 MSB:
> 
> On stat(2) of an overlay upper inode we want to return:
>   st_dev = <overlay anon bdev>
>   st_ino = <real upper st_ino>
> 
> On stat(2) of an overlay lower inode we want to return:
>   st_dev = <overlay anon bdev>
>   st_ino = <real lower st_ino> | 1 << 63
> 
> Now for ext4 this is always safe to do and we find that automatically
> due to the fact that ext4 uses the default encode_fh generic 32bit
> inode encoding.
> 
> For xfs this should also be safe, but we don't want to whitelist xfs
> by name/magic, so we want xfs to publish the max amount of bits
> exposed to user with stat(2)/getdents(3).
> 
> Recently, I became aware of an nfsd use case that also looks
> at inode->i_ino, so we may want to also be able to assume
> max_ino_bits also applies to inode->i_ino, but if you tell us to
> stay clear of inode->i_ino, then we can always use stat.st_ino.
> 
> Thanks,
> Amir.
> 

On Sat, Mar 17, 2018 at 10:24:39AM +0200, Amir Goldstein wrote:
> On Sat, Mar 17, 2018 at 10:04 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Sat, Mar 17, 2018 at 06:40:23AM +0100, Miklos Szeredi wrote:
> [...]
> >> I ask, because we've thought long and hard about what to do for
> >> multiplexing inum space in overlayfs, and found no other sane options.
> >> Ideas welcome, of course.
> >
> > Why do you need to "multiplex" the inum space? perhaps you'd do
> > better to start with a description of why you want to play games
> > with inode numbers, rather than just posting a patch to steal bits
> > from other filesytem inode number spaces....
> >
> 
> I think this patch perhaps explains best what we want to do:
> https://marc.info/?l=linux-unionfs&m=151007386219743&w=2
> 
> I had already given a simple example in an earlier response.

So, I'll quote that here:

> > > On stat(2) of an overlay upper inode we want to return:
> > >   st_dev = <overlay anon bdev>
> > >   st_ino = <real upper st_ino>
> > > 
> > > On stat(2) of an overlay lower inode we want to return:
> > >   st_dev = <overlay anon bdev>
> > >   st_ino = <real lower st_ino> | 1 << 63

This makes no sense to me - this implies the inode number changes on
copy-up, and ....

> As the the "why" question, we have several requirements for
> overlay inode numbers:
> 1. st_ino is persistent
> 2. st_ino/st_dev pair is unique in the system
> 3. st_ino is consistent with d_ino
> 4. st_ino doesn't change on copy up
> 5. st_dev is uniform across all overlay inodes

.... this means requierment #4 isn't met, even on the same
filesystem.

IOWs, if overlay has already met #4 on the same filesystem, then
there is a persistent mapping between lower and upper inodes (Req.
#1) that maps the upper inode # to the lower inode #. That has to be
overlay information, because the underlying filesystem doesn't store
it. And because the lower inode/dev is unique, then req. 2 is met,
too.

FWIW, req 5 is badly worded - st_dev is uniform across all inodes in
a single overlay filesystem, not all overlay inodes.

> With upstream overlayfs we meet all requirements above for
> the case of all underlying layers on the same fs, by using a real
> underlying inode st_ino and the overlay st_dev.

Yeah, that's what I thought. So why can't you do exactly the same
thing for different underlying filesystems? You've already got a
mapping between upper and lower inode numbers, why can't that map
across different superblocks? Why do you need special "inode number
bits" exposed to userspace to identify upper->lower inode
mappings that overlay should already have a persistent mapping
mechanism for?

> With the 'xino' patch set [1], we can meet all requirements above
> also for the case of underlying layers on different fs, by multiplpexing
> the inum space, as long as we know about unused high ino bits.

Your example makes no sense to me - I don't see how adding extra
bits to the lower inode number allows you to meet requirement #4,
not why presenting "st_ino = <real upper st_ino>" for inodes that
have been copied up iis being done because that violates requirement
#4....

> The ovl-xino branch already has the xfs patch (not yet posted) to publish
> max_ino_bits.

That has no explanation of why you need to screw with inode number
bits, either. It's all mechanism, and there's zero explanation of
what problem it solves.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2018-03-17 21:28 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-16 14:05 Question about XFS_MAXINUMBER Amir Goldstein
2018-03-16 17:59 ` Amir Goldstein
2018-03-16 22:24 ` Dave Chinner
2018-03-17  5:40   ` Miklos Szeredi
2018-03-17  7:56     ` Amir Goldstein
2018-03-17 21:28       ` Dave Chinner [this message]
2018-03-18  6:21         ` Amir Goldstein
2018-03-18 23:02           ` Dave Chinner
2018-03-19  4:03             ` Amir Goldstein
2018-03-19  8:42               ` Miklos Szeredi
2018-03-20  1:47               ` Dave Chinner
2018-03-20  6:29                 ` Amir Goldstein
2018-03-20  8:04                   ` Ian Kent
2018-03-20  8:57                     ` Amir Goldstein
2018-03-20 10:18                       ` Ian Kent
2018-03-20  9:20                     ` Miklos Szeredi
2018-03-20 13:08                   ` Dave Chinner
2018-03-20  9:32                 ` Miklos Szeredi
2018-03-17  8:04     ` Dave Chinner
2018-03-17  8:24       ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180317212814.GI7000@dastard \
    --to=david@fromorbit.com \
    --cc=amir73il@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.