From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Serge E. Hallyn"
Subject: Re: [REVIEW][PATCH 3/3] vfs: Fix a regression in mounting proc
Date: Fri, 29 Nov 2013 14:56:27 +0000
Message-ID: <20131129145627.GA32692@mail.hallyn.com>
References: <87k3g5gnuv.fsf@xmission.com>
 <20131126181043.GA25492@mail.hallyn.com>
 <87siui1z1g.fsf_-_@xmission.com>
 <87pppmzoin.fsf_-_@xmission.com>
 <20131127161300.GA24773@redhat.com>
 <871u21oeyr.fsf@xmission.com>
 <20131127194722.GA32673@redhat.com>
 <87iovdmxl7.fsf@xmission.com>
 <87wqjtlic3.fsf@xmission.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
Content-Disposition: inline
In-Reply-To:
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Andy Lutomirski
Cc: Aditya Kali, "Eric W. Biederman", Containers, Oleg Nesterov, Linux FS Devel
List-Id: containers.vger.kernel.org

Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> On Wed, Nov 27, 2013 at 12:07 PM, Eric W. Biederman
> wrote:
> > ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) writes:
> >
> >> Oleg Nesterov writes:
> >>
> >>> Just to avoid the possible confusion, let me repeat that the fix itsef
> >>> looks "obviously fine" to me, "i_nlink != 2" looks obviously wrong.
> >>>
> >>> I am not arguing with this patch, I am just trying to understand this
> >>> logic.
> >>>
> >>> On 11/27, Eric W. Biederman wrote:
> >>>>
> >>>> [... snip ...]
> >>>
> >>> Thanks a lot.
> >>>
> >>>> For the real concern about jail environments where proc and sysfs are
> >>>> not mounted at all a fs_visible check is all that is really required,
> >>>
> >>> this is what I can't understand...
> >>>
> >>> Lets ignore the implementation details. Suppose that proc was never
> >>> mounted. Then "mount -t proc" should fail after CLONE_NEWUSER | NEWNS?
> >>
> >> Yes.
> >
> > Well strictly speaking it should fail after CLONE_NEWUSER | NEWNS | NEWPID.
> > If proc was never mounted.
> >
> > Fresh mounts of proc are not allowed unless you have also created the
> > pid namespace. With just CLONE_NEWUSER | NEWNS you are limited to bind
> > mounts.
> >
> > Has this cleared up the confusion?
> >
> > Eric
> >
>
> This is all obnoxiously complicated. I wonder if we can do (a lot)
> better by allowing a "pid-only" variant of proc to be mounted. It
> should contain:
>
> - All the pid directories
> - /proc/self, /proc/net, and /proc/mounts (but possibly not
> /proc/PID/net -- that's a weird interface IMO and isn't really related
> to the pid)
> - keys key-users (wtf is up with that interface, though -- those
> files are way too magical)
> - cpuinfo, version, and maybe other informational things (crypto?)
> - loadavg, perhaps
>
> I wonder it would be possible to boot a reasonable container with a
> heavily limited /proc like that.

Should be possible. And heck, maybe some of the values could then
be virtualized :) cmdline could point to the container init's
cmdline; cpuinfo and loadavg and meminfo be filtered through
cgroupfs.

-serge
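
For reference, the rule Eric states in the quoted exchange can be written down as a small decision function. This is only a sketch of the rule as I read it from the thread, not kernel code: the names proc_mount_allowed, proc_mount_kind and the proc_visible parameter are made up for illustration, and the function assumes the caller has already done CLONE_NEWUSER | CLONE_NEWNS. The flag values are the ones from the Linux UAPI headers, restated so the snippet builds on its own.

```c
#include <assert.h>   /* only needed for the assert()-style checks described below */
#include <stdbool.h>

/* Namespace flag values as defined in the Linux UAPI headers, restated
 * here so the sketch does not depend on <sched.h> and _GNU_SOURCE. */
#ifndef CLONE_NEWNS
#define CLONE_NEWNS   0x00020000
#endif
#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000
#endif
#ifndef CLONE_NEWPID
#define CLONE_NEWPID  0x20000000
#endif

/* Hypothetical: the two ways a task might try to get a /proc. */
enum proc_mount_kind {
    PROC_MOUNT_FRESH,   /* "mount -t proc": a new instance of proc */
    PROC_MOUNT_BIND     /* bind mount of an already mounted proc   */
};

/*
 * Model of the rule from the thread, for a task that has already
 * unshared CLONE_NEWUSER | CLONE_NEWNS (assumed, not checked here).
 *
 *   ns_flags     - namespaces the task created
 *   proc_visible - whether proc was mounted where the task could see it
 *                  (stands in for the fs_visible check; an assumption)
 */
static bool proc_mount_allowed(unsigned long ns_flags, bool proc_visible,
                               enum proc_mount_kind kind)
{
    /* Jail case: proc was never mounted, so there is nothing to bind
     * and a fresh mount is refused by the visibility check. */
    if (!proc_visible)
        return false;

    /* With only NEWUSER | NEWNS, bind mounts are all that is left. */
    if (kind == PROC_MOUNT_BIND)
        return true;

    /* A fresh mount additionally needs a pid namespace of our own. */
    return (ns_flags & CLONE_NEWPID) != 0;
}
```

With these definitions, proc_mount_allowed(CLONE_NEWUSER | CLONE_NEWNS, true, PROC_MOUNT_FRESH) is false while the same flags with PROC_MOUNT_BIND give true, and adding CLONE_NEWPID makes the fresh mount succeed only as long as proc_visible is true.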