From: ebiederm@xmission.com (Eric W. Biederman)
To: Gao feng
Cc: Andy Lutomirski, Linux Containers, "Serge E. Hallyn", Linux FS Devel, "linux-kernel\@vger.kernel.org"
Subject: Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted
Date: Thu, 14 Nov 2013 20:54:10 -0800
Message-ID: <87txfexo25.fsf@xmission.com>
In-Reply-To: <528575EC.2030309@cn.fujitsu.com> (Gao feng's message of "Fri, 15 Nov 2013 09:16:28 +0800")
References: <878uzmhkqg.fsf@xmission.com> <52749663.2000701@cn.fujitsu.com> <527C4D88.10907@cn.fujitsu.com> <87k3gigmgj.fsf@xmission.com> <5283299B.8080702@cn.fujitsu.com> <5284AF90.7060506@cn.fujitsu.com> <528575EC.2030309@cn.fujitsu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Gao feng writes:

> On 11/15/2013 12:54 AM, Andy Lutomirski wrote:
>> On Thu, Nov 14, 2013 at 3:10 AM, Gao feng wrote:
>>> On 11/13/2013 03:26 PM, Gao feng wrote:
>>>> On 11/09/2013 01:42 PM, Eric W. Biederman wrote:
>>>>> Right now I would rather not have the empty directory exception than
>>>>> remove this code.
>>>>>
>>>>> The test is a little trickier to write than it might otherwise be
>>>>> because /proc and /sys tend to be slightly imperfect filesystems.
>>>>>
>>>>> I think the only way to really test that is to call readdir on the
>>>>> directory itself :( I don't like that thought.
>>>>>
>>>>> I don't know what I was thinking when I wrote that test but I definitely
>>>>> goofed up. Grr!
>>>>>
>>>>> I can certainly filter out any directory with nlink > 2. That would be
>>>>> an easy partial step forward.
>>>>>
>>>>> The real question though is how do I detect directories it is safe to
>>>>> mount on where there will not be files in them. I can't call iterate
>>>>> with the namespace_lock held so things are a bit tricky.
>>>>>
>>>>
>>>> I know this problem is not easy to be resolved. why not let the user
>>>> make the decision? maybe we can introduce a new mount option MS_LOCK,
>>>> if user wants to use mount to hide something, he should use mount with
>>>> option MS_LOCK. so the unpriviged user can't umount this filesystem and
>>>> fail to mount the filesystem if one of it's child mount is mounted with
>>>> MS_LOCK option otherwise he use MS_REC too.
>>>>
>>>
>>> Something like this.
>>>
>>> From 437f33ea366623c7a9d557b2e84cae424876a44f Mon Sep 17 00:00:00 2001
>>> From: Gao feng
>>> Date: Wed, 13 Nov 2013 16:06:46 +0800
>>> Subject: [PATCH] userns: introduce new mount option MS_LOCK
>>>
>>> After commit 5ff9d8a65ce80efb509ce4e8051394e9ed2cd942
>>> vfs: Lock in place mounts from more privileged users,
>>> in userns, the mounts of child mntns which copied from
>>> parent mntns is locked and user has no rights to umount/move
>>> them, it's too strict.
>>>
>>> The core purpose of above commit is trying to prevent
>>> unprivileged user from accessing files hidden by mount.
>>> This patch introduces a new mount option MS_LOCK, this
>>> gives user the capable to mount filesystem as the type
>>> of lock if he wants to use mount to hide something.
>>>
>>
>> This is bad -- if something was secure in old kernels, it needs to
>> stay secure. If you had MS_NOT_A_LOCK, that would be okay, but it
>> might not solve your problem.
>>
>
> what you mean old kernels here? I saw patch "vfs: Lock in place mounts from more privileged users"
> is merged into upstream in linux 3.12-rc1, this is not very old. I think there
> are not many userspace processes rely on this feature.

Sort of true. Most people aren't that silly.

This feature was added to defend against a theoretical attack that you
can use with mount namespaces.

In particular the scenario we are concerned with is:

Suppose there are two filesystems a and b that look like:

a:/usr/
a:/usr/my_very_secret_file
a:/dev/
a:/etc/
a:/lib/

b:/bin/
b:/etc/
b:/games/
b:/include/
b:/lib/
b:/lib32/
b:/local/
b:/sbin/
b:/share/
b:/src/

And filesystem b is mounted on a:/usr, hiding a:/usr/my_very_secret_file.

So the filesystem looks like:

/usr/
/usr/bin/
/usr/etc/
/usr/games/
/usr/include/
/usr/lib/
/usr/lib32/
/usr/local/
/usr/sbin/
/usr/share/
/usr/src/
/dev/
/etc/
/lib/

Without locking mounts into place, an unprivileged user can clone the
mount namespace, do "umount /usr", and read /usr/my_very_secret_file.

Most systems don't hide sensitive things with mounts, but it is very
possible, and guarding against it is fairly cheap and easy. And while a
little annoying, it should not be a large impediment to unprivileged
users of the user namespace, because pivot_root still works.

This thread started out talking about bugs in fs_fully_visible. Those
bugs are fixable and I aim to get to them shortly. At the very least I
can lie and test for nlink <= 2, which fixes the regression in mounting
proc. Then I can write the fun version that takes references and drops
locks so it can call the internal version of readdir to see if a
directory is actually empty.

But the principle remains the same: we really don't want to reveal
anything that is hidden under a mount, on purpose or by mistake. That
way we don't have to think about those things from a security point of
view, which makes everyone's life easier.

Eric
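
For illustration only, the two-step check described above (a cheap nlink test first, then an actual readdir) can be sketched in userspace C. This is not the kernel's fs_fully_visible code and does not deal with namespace_lock; the helper names dir_may_be_empty and dir_is_empty are hypothetical, and the sketch assumes a traditional filesystem where a directory's link count is 2 plus one per subdirectory.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Cheap pre-check (hypothetical helper). On traditional Unix filesystems
 * a directory has nlink == 2 ("." plus its entry in the parent) and gains
 * one link per subdirectory, so nlink > 2 means it has subdirectories.
 * nlink <= 2 does NOT prove the directory is empty: regular files do not
 * raise the count, and some filesystems always report 1.
 */
static int dir_may_be_empty(const struct stat *st)
{
    return st->st_nlink <= 2;
}

/*
 * Full check (hypothetical helper): returns 1 if path is an empty
 * directory, 0 if it has entries, -1 on error or if it is not a directory.
 */
static int dir_is_empty(const char *path)
{
    struct stat st;
    struct dirent *de;
    DIR *dir;
    int empty = 1;

    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
        return -1;

    /* Fast path: a link count above 2 already rules out "empty". */
    if (!dir_may_be_empty(&st))
        return 0;

    /* Slow path: actually read the directory, skipping "." and "..". */
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0) {
            empty = 0;
            break;
        }
    }
    closedir(dir);
    return empty;
}
```

The nlink test alone is the "lie" version: it rejects directories with subdirectories but would accept one that holds only regular files. Only the readdir pass can tell whether a directory is really empty.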