linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Alex Elder <elder-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Linux Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Ben Myers <bpm-sJ/iWh9BUns@public.gmane.org>,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH RFC 10/12] userns: Convert xfs to use kuid/kgid/kprojid where appropriate
Date: Thu, 14 Feb 2013 13:19:08 +1100
Message-ID: <20130214021908.GJ26694@dastard>
In-Reply-To: <87obfoxetf.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

On Wed, Feb 13, 2013 at 10:13:16AM -0800, Eric W. Biederman wrote:
> Joel Becker <jlbec-aKy9MeLSZ9dg9hUCZPvPmw@public.gmane.org> writes:
> 
> > On Wed, Nov 21, 2012 at 10:55:24AM +1100, Dave Chinner wrote:
> >> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> >> > index 2778258..3656b88 100644
> >> > --- a/fs/xfs/xfs_inode.c
> >> > +++ b/fs/xfs/xfs_inode.c
> >> > @@ -570,11 +570,12 @@ xfs_dinode_from_disk(
> >> >  	to->di_version = from ->di_version;
> >> >  	to->di_format = from->di_format;
> >> >  	to->di_onlink = be16_to_cpu(from->di_onlink);
> >> > -	to->di_uid = be32_to_cpu(from->di_uid);
> >> > -	to->di_gid = be32_to_cpu(from->di_gid);
> >> > +	to->di_uid = make_kuid(&init_user_ns, be32_to_cpu(from->di_uid));
> >> > +	to->di_gid = make_kgid(&init_user_ns, be32_to_cpu(from->di_gid));
> >> 
> >> You can't do this, because the incore inode structure is written
> >> directly to the log. This is effectively an on-disk format change.
> >
> > 	Yeah, I don't get this either.  Over in ocfs2, you do the
> > correct thing, translating at the boundary from ocfs2_dinode to struct
> > inode.
> 
> This is the boundary.

It is *a* boundary. It is the in-core disk inode to on disk inode
boundary (i.e. struct xfs_icdinode to struct xfs_dinode).
Namespaces don't belong at this boundary - this is internal XFS
stuff that nothing from the VFS should be interacting with. The
structure of XFS is roughly:

	userspace
	---------
	   VFS
	---------
	 VFS/XFS	<<<<<< here is where you need to modify
	interface
	---------
	core XFS
	---------
	XFS/disk	<<<<<< here is where you actually modified
	interface
	---------
	 storage


IOWs, the boundary you are looking for is the VFS/XFS boundary (i.e.
struct inode to struct xfs_icdinode). That is, the namespace aware
uid/gid is in the struct inode, and the flattened 32 bit values are
in the struct xfs_icdinode. The struct inode and the struct
xfs_icdinode are both embedded in the struct xfs_inode, so we just
have to translate between the two internal structures at the right
point in time.

Hence for namespaces to work correctly, anything that is currently
using current_fs*id() for uid/gid comparison needs to be converted
to use the VFS inode values (i.e. VFS_I(ip)->i_*id). For values
written to the xfs inode, the VFS uid/gid needs to be flattened to
a 32bit value.

These flattened values are needed during inode allocation (for
initial on-disk values) and when creating dquots associated with
the new inodes. You should be able to derive them from
current_fs*id(), right? Then when changing uid/gid via .setattr, we
can flatten the namespace aware VFS uid/gid into the XFS incore
icdinode (i.e. ip->i_d.di_*id) via the same method. Conversion from
XFS on-disk to namespace aware VFS uid/gid then occurs when
initialising the VFS inode from the XFS inode (i.e. in
xfs_setup_inode() like I previously suggested).

This keeps namespace aware uid/gid up at the VFS layer and
conversion at the VFS/XFS boundaries in the XFS code, and everything
should work fine.
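
As a rough illustration of that flow, here is a userspace model in C.
It is a sketch under stated assumptions, not kernel code: every
model_* name is hypothetical, the structures only mimic the shape of
struct inode, struct xfs_icdinode and struct xfs_inode, and the plain
copies stand in for from_kuid()/make_kuid() against init_user_ns.

```c
/* Userspace model only: these types mimic, but are not, the kernel's. */
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's kuid_t wrapper type. */
typedef struct { uint32_t val; } model_kuid_t;

/* Models struct xfs_icdinode: flat 32 bit value, logged as-is. */
struct model_icdinode { uint32_t di_uid; };

/* Models struct inode: holds the namespace aware value. */
struct model_vfs_inode { model_kuid_t i_uid; };

/* Models struct xfs_inode, which embeds both of the above. */
struct model_xfs_inode {
	struct model_vfs_inode i_vnode;
	struct model_icdinode i_d;
};

/* VFS -> XFS: flatten, as would happen at allocation or setattr time. */
static void model_flatten_uid(struct model_xfs_inode *ip, model_kuid_t uid)
{
	assert(ip);
	ip->i_vnode.i_uid = uid;
	/* Assumption: stands in for from_kuid(&init_user_ns, uid). */
	ip->i_d.di_uid = uid.val;
}

/* XFS -> VFS: expand, as would happen when setting up the VFS inode. */
static void model_setup_inode(struct model_xfs_inode *ip)
{
	assert(ip);
	/* Assumption: stands in for make_kuid(&init_user_ns, di_uid). */
	ip->i_vnode.i_uid.val = ip->i_d.di_uid;
}
```

The point of the model is that di_uid never changes type, so whatever
is written to the log or to disk keeps its existing format, and only
the two helpers know about the namespace aware representation.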

> The crazy thing is that is that xfs appears to
> directly write their incore inode structure into their journal. 

Off topic, but it's actually a very sane thing to do. It's called
logical object logging, as opposed to physical logging like ext3/4
and ocfs2 use. XFS uses a combination of logical logging
(superblock, dquots, inodes) and physical logging (via buffers).

Logical logging decouples in-memory object modification from buffer
IO and ensures the buffer is not a single point of serialisation
when multiple objects share a single buffer. Hence we can read/write
an inode buffer and concurrently modify inodes in memory from that
buffer at the same time.  i.e. we only need buffers for IO, not for
ongoing modifications.

This decoupling allows XFS to use large buffers for inodes and so
minimise IO for reading and/or writing inodes.  Further, we can also
easily serialise logged, in-memory modifications for all objects in
a single backing buffer with only minor interruption to ongoing
modifications. It also allows us to use simple fire-and-forget
writeback semantics for metadata.
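
To make the decoupling concrete, here is a toy userspace model in C.
It is only a sketch: all model_* names and sizes are hypothetical and
nothing here reflects the real XFS log format or locking. Logging
copies the in-memory object into the log, and the shared backing
buffer is only touched at writeback time.

```c
/* Toy model of logical object logging; not the XFS implementation. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_INODES_PER_BUF 4	/* hypothetical: inodes sharing a buffer */
#define MODEL_LOG_SLOTS      8	/* hypothetical: capacity of the toy log */

/* In-memory inode core; the same bytes go to the log and the buffer. */
struct model_inode_core { uint32_t uid; uint32_t nlink; };

/* One backing buffer shared by several inodes. */
struct model_inode_buf { struct model_inode_core slot[MODEL_INODES_PER_BUF]; };

struct model_log_item { int slot; struct model_inode_core copy; };
struct model_log { struct model_log_item item[MODEL_LOG_SLOTS]; int used; };

/* Log the object itself; the backing buffer is not needed or touched. */
static int model_log_inode(struct model_log *log, int slot,
			   const struct model_inode_core *core)
{
	if (log->used >= MODEL_LOG_SLOTS)
		return -1;	/* toy log is full */
	log->item[log->used].slot = slot;
	log->item[log->used].copy = *core;
	log->used++;
	return 0;
}

/* Writeback: the only place the in-memory state reaches the buffer. */
static void model_flush_inode(struct model_inode_buf *buf, int slot,
			      const struct model_inode_core *core)
{
	assert(slot >= 0 && slot < MODEL_INODES_PER_BUF);
	memcpy(&buf->slot[slot], core, sizeof(*core));
}
```

In this model two inodes in the same buffer can be modified and
logged independently, because neither path takes the buffer until
model_flush_inode() runs.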

IOWs, the use of logical logging techniques vastly improves
concurrency and scalability over the physical logging methods other
filesystems use. Call it crazy if you want, but I generally find
most people say this simply because they don't understand why XFS
does what it does....

> I had
> missed the journal reference the first time through and simply assumed
> since this is where the disk inode to the incore inode coversion
> happened that the weird scary comment in the xfs header file was wrong.

Comments in XFS, especially weird scary ones, are rarely wrong. Some
of them might have been there for close on 20 years, but they are
our documentation for all the weird, scary stuff that XFS does.  I
rely on them being correct, so it's something I always pay attention
to during code review. IOWs, when we add, modify or remove something
weird and scary, the comments are updated appropriately so we'll
know why the code is doing something weird and scary in another 20
years' time. ;)

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
