Re: [PATCH] user namespaces: bump idmap limits

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Christian Brauner <christian.brauner@canonical.com>
To: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>,
	ebiederm@xmission.com, stgraber@ubuntu.com,
	linux-kernel@vger.kernel.org, tycho@tycho.ws
Subject: Re: [PATCH] user namespaces: bump idmap limits
Date: Wed, 4 Oct 2017 16:45:25 +0200	[thread overview]
Message-ID: <20171004144524.6ndtzacgrkvxnxaq@gmail.com> (raw)
In-Reply-To: <20171004142857.GA30220@mail.hallyn.com>

On Wed, Oct 04, 2017 at 09:28:57AM -0500, Serge Hallyn wrote:
> Quoting Christian Brauner (christian.brauner@ubuntu.com):
> > We have quite some use cases where users already run into the current limit for
> > {g,u}id mappings. Consider a user requesting us to map everything but 999, and
> > 1001 for a given range of 1000000000 with a sub{g,u}id layout of:
> > 
> > some-user:100000:1000000000
> > some-user:999:1
> > some-user:1000:1
> > some-user:1001:1
> > some-user:1002:1
> > 
> > This translates to:
> > 
> > MAPPING-TYPE CONTAINER HOST        RANGE
> > uid           999          999         1
> > uid          1001         1001         1
> > uid             0      1000000       999
> > uid          1000      1001000         1
> > uid          1002      1001002 999998998
> > 
> > gid           999          999         1
> > gid          1001         1001         1
> > gid             0      1000000       999
> > gid          1000      1001000         1
> > gid          1002      1001002 999998998
> > 
> > which is already the current limit.
> > 
> > Design Notes:
> > As discussed at LPC simply bumping the number of limits is not going to work
> > since this would mean that struct uid_gid_map won't fit into a single cache-line
> > anymore thereby regressing performance for the base-cases. The same problem
> > seems to arise when using a single pointer. So the idea is to keep the base
> > cases (0-3 mappings) directly in struct uid_gid_map so they fit into a single
> > cache-line of 64 byte. For the two removed mappings we place three pointers in
> > the struct that mock the behavior of traditional filesystems:
> > 1. a direct pointer to a struct uid_gid_extent of 5 mappings of 60 bytes
> > 2. an indirect pointer to an array of 64 byte of direct pointers to struct
> >    uid_gid_extent of 5 mappings a 60 bytes each
> > 3. a double indirect pointer to an array of 64 bytes of indirect pointers each
> >    to an array of 64 bytes of direct pointers (and so on)
> > Fixing a pointer size of 8 byte this gives us 3 + 5 + (8 * 5) + (8 * (8 * 5)) =
> > 368 mappings which should really be enough. The idea of this approach is to
> > always have each extent of idmaps (struct uid_gid_extent) be 60 bytes (5 * (4 +
> > 4 + 4) and thus 4 bytes smaller than the size of a single cache line. This
> > should only see a (i.e. linear) performance impact caused by iterating through
> > the idmappings in a for-loop. Note that the base cases shouldn't see any
> > performance degradation which is the most important part.
> 
> Sounds like a good plan.
> 
> > Performance Testing:
> > When Eric introduced the extent-based struct uid_gid_map approach he measured
> > the performanc impact of his idmap changes:
> > 
> > > My benchmark consisted of going to single user mode where nothing else was
> > > running. On an ext4 filesystem opening 1,000,000 files and looping through all
> > > of the files 1000 times and calling fstat on the individuals files.  This was
> > > to ensure I was benchmarking stat times where the inodes were in the kernels
> > > cache, but the inode values were not in the processors cache. My results:
> > 
> > > v3.4-rc1:         ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
> > > v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
> > > v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)
> > 
> > I used an identical approach on my laptop. Here's a thorough description of what
> > I did. I built three kernels and used an additional "control" kernel:
> > 
> > 1. v4.14-rc2-vanilla (unmodified v4.14-rc2)
> > 2. v4.14-rc2-userns+ (v4.14-rc2 with my new user namespace idmap limits patch)
> > 3. v4.14-rc2-userns- (v4.14-rc2 without my new user namespace idmap limits patch)
> 
> ^ you mean *withYou your patch but with CONFIG_USER_NS=n ?

Yes, exactly. Sorry, that was unclear here.

> 
> > 4. v4.12.0-12-generic (v4.12.0-12 standard Ubuntu kernel)
> 
> ^ Just curious, why did you include this?  To show that other factors have a much
> larger impact?  This does not include your patch, right?

Basically I wanted something which I didn't compile and see if the numbers
somehow line-up. In terms of experimentation you could think of this as a second
"control condition".

> 
> > 
> > I booted into single user mode (systemd rescue target in newspeak) and used an
> > ext4 filesystem to open/create 1,000,000 files. Then I looped through all of the
> > files calling fstat() on each of them 1000 times and calculated the mean fstat()
> > time for a single file. (The test program can be found below.)
> > 
> > For kernels v4.14-rc2-vanilla, v4.12.0-12-generic I tested the following cases:
> >   0 mappings
> >   1 mapping
> >   2 mappings
> >   3 mappings
> >   5 mappings
> > 
> > For kernel v4.4-rc2-userns+ I tested:
> >     0 mappings
> >     1 mapping
> >     2 mappings
> >     3 mappings
> >     5 mappings
> >    10 mappings
> >    50 mappings
> >   100 mappings
> >   200 mappings
> >   300 mappings
> > 
> > Here are the results:
> > 
> > - v4.14-rc2-vanilla (unmodified v4.14-rc2)
> >   # no unshare:                  312 ns
> >   unshare -U # write 0 mappings: 307 ns
> >   unshare -U # write 1 mappings: 328 ns
> >   unshare -U # write 2 mappings: 328 ns
> >   unshare -U # write 3 mappings: 328 ns
> >   unshare -U # write 5 mappings: 338 ns
> > 
> > - v4.14-rc2-userns+ (v4.14-rc2 with my new user namespace idmap limits patch)
> >   # no unshare:                     158 ns
> >   unshare -U # write   0 mappings:  158 ns
> >   unshare -U # write   1 mappings:  164 ns
> >   unshare -U # write   2 mappings:  170 ns
> >   unshare -U # write   3 mappings:  175 ns
> >   unshare -U # write   5 mappings:  187 ns
> >   unshare -U # write  10 mappings:  218 ns
> >   unshare -U # write  50 mappings:  528 ns
> >   unshare -U # write 100 mappings:  980 ns
> >   unshare -U # write 200 mappings: 1880 ns
> >   unshare -U # write 300 mappings: 2760 ns
> > 
> > - v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
> >   # no unshare: 161 ns
> > 
> > - 4.12.0-12-generic Ubuntu Kernel:
> >   # no unshare:                  328 ns
> >   unshare -U # write 0 mappings: 327 ns
> >   unshare -U # write 1 mappings: 328 ns
> >   unshare -U # write 2 mappings: 328 ns
> >   unshare -U # write 3 mappings: 328 ns
> >   unshare -U # write 5 mappings: 338 ns
> > 
> 
> ^ This is really weird.  Why does Ubuntu kernel have near-constant (horrible)
> time?

I actually think - even in single user mode - with the same number of processes
running and so on - that there's a lot of fluctuation going on. That's why I ran
the tests multiple times. It might also depend on compilation since I compiled
the three kernels myself and just downloaded the binaries for the ubuntu kernel.
The tests clearly show that there's an increase with the number of mappings
which is what I expected.

> 
> > I've tested this multiple times and the numbers hold up. All v4.14-rc2 kernels
> > were built on the same machine with the same .config, the same options and a
> > simple call to make -j 11 bindeb-pkg. The 4.12 kernel was simply installed from
> > the Ubuntu archives.
> > 
> > The most import part seems to me that my idmap patches don't regress performance
> > for the base-cases. I'd actually only consider 0 and 1 mapping to be the proper
> 
> Agreed.  Now personally I probably would have kept 4 direct pointers then make
> the 5+ case hurt more, but I'm not saying that's the right thing.

Yeah, I thought about that as well but my goal was to basically ramp up the
number of mappings into the hundreds to settle this "once and for all". I
actually don't expect us to go any higher than this. Tbh, users that have a
requirement to have many mappings should be prepared to take the performance
hit. Also, I think that the direct pointers won't necessarily give you more
speed since - I'd guess - that the slowdown simply comes from the number of
iterations through the map you have to do and not necessarily from cache misses.
But I might be thinking nonsense here. Thanks!

> 
> (haven't looked at the patch itself yet)
> 
> thanks,
> -serge

next prev parent reply	other threads:[~2017-10-04 14:45 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-04 11:20 [PATCH] user namespaces: bump idmap limits Christian Brauner
2017-10-04 14:28 ` Serge E. Hallyn
2017-10-04 14:45   ` Christian Brauner [this message]
2017-10-11 20:30 ` Serge E. Hallyn
2017-10-11 20:43   ` Eric W. Biederman
2017-10-12 14:09     ` Christian Brauner
2017-10-12 17:08       ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171004144524.6ndtzacgrkvxnxaq@gmail.com \
    --to=christian.brauner@canonical.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=serge@hallyn.com \
    --cc=stgraber@ubuntu.com \
    --cc=tycho@tycho.ws \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.