From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
Subject: Re: [PATCH] [RFC] mnt: restrict a number of "struct mnt"
Date: Thu, 20 Jun 2013 18:04:28 -0700
Message-ID: <87y5a4l1er.fsf@xmission.com>
References: <1371457498-27241-1-git-send-email-avagin@openvz.org>
 <878v284iif.fsf@xmission.com> <20130619213532.GA31165@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain
Cc: Alexander Viro, linux-fsdevel@vger.kernel.org, LKML,
 "Serge E. Hallyn", Andrew Morton, Ingo Molnar, Kees Cook, Mel Gorman,
 Rik van Riel
To: Andrey Wagin
Return-path:
In-Reply-To: <20130619213532.GA31165@gmail.com> (Andrey Wagin's message of
 "Thu, 20 Jun 2013 01:35:32 +0400")
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Andrey Wagin writes:

> On Tue, Jun 18, 2013 at 02:56:51AM +0400, Andrey Wagin wrote:
>> 2013/6/17 Eric W. Biederman:
>> > So for anyone seriously worried about this kind of thing in general we
>> > already have the memory control group, which is quite capable of
>> > limiting this kind of thing,
>>
>> > and it limits all memory allocations not just mount.
>>
>> And that is problem, we can't to limit a particular slab. Let's
>> imagine a real container with 4Gb of RAM. What is a kernel memory
>> limit resonable for it? I setup 64 Mb (it may be not enough for real
>> CT, but it's enough to make host inaccessible for some minutes).
>>
>> $ mkdir /sys/fs/cgroup/memory/test
>> $ echo $((64 << 20)) > /sys/fs/cgroup/memory/test/memory.kmem.limit_in_bytes
>> $ unshare -m
>> $ echo $$ > /sys/fs/cgroup/memory/test/tasks
>> $ mount --make-rprivate /
>> $ mount -t tmpfs xxx /mnt
>> $ mount --make-shared /mnt
>> $ time bash -c 'set -m; for i in `seq 30`; do mount --bind /mnt
>> `mktemp -d /mnt/test.XXXXXX` & done; for i in `seq 30`; do wait;
>> done'
>> real 0m23.141s
>> user 0m0.016s
>> sys 0m22.881s
>>
>> While the last script is working, nobody can't to read /proc/mounts or
>> mount something.
>> I don't think that users from other containers will
>> be glad. This problem is not so significant in compared with umounting
>> of this tree.
>>
>> $ strace -T umount -l /mnt
>> umount("/mnt", MNT_DETACH) = 0 <548.898244>
>> The host is inaccessible, it writes messages about soft lockup in
>> kernel log and eats 100% cpu.
>
> Eric, do you agree that
> * It is a problem
> * Currently we don't have a mechanism to prevent this problem
> * We need to find a way to prevent this problem

Ugh. I knew mount propagation was annoying semantically but I had not
realized the implementation was quite so bad.

This doesn't happen in normal operation to normal folks, even people
using mount namespaces in containers. So I don't think this is
something where we need to rush in a fix at the last moment to prevent
the entire world from melting down.

I do think it is worth looking at.

Which kernel were you testing? I haven't gotten as far as looking too
closely, but I just noticed that Al Viro has been busy rewriting the
locking of this. So if you aren't testing at least 3.10-rcX you
probably need to retest.

My thoughts would be: improve the locking as much as possible, and if
that is not enough, keep a measure of how many mounts will be affected,
at least for the umount and possibly for the umount -l case. Then just
don't allow the complexity to exceed some limit, so we know things will
happen in a timely manner.

Eric
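
The "measure how many mounts will be affected and refuse past a limit"
idea above can be illustrated with a rough userspace model. This is
only a sketch and not kernel code. It assumes that each bind mount of a
shared mount onto a directory inside itself doubles the number of
mounts, which matches the exponential cost of the reproducer but is not
stated in this thread. MOUNT_LIMIT, mounts_after_binds and within_limit
are hypothetical names made up for the illustration.

```shell
#!/bin/sh
# Sketch only: a userspace model of the "limit the complexity" idea.
# The doubling rule and MOUNT_LIMIT are assumptions for illustration,
# not kernel behaviour confirmed in this thread.

MOUNT_LIMIT=100000   # hypothetical cap on mounts touched by one operation

# mounts_after_binds N: modelled number of mounts after N bind mounts of
# a shared mount into itself, assuming every bind doubles the count.
mounts_after_binds() {
    n=$1
    total=1
    while [ "$n" -gt 0 ]; do
        total=$((total * 2))
        n=$((n - 1))
    done
    echo "$total"
}

# within_limit COUNT: succeed only if COUNT does not exceed MOUNT_LIMIT.
within_limit() {
    [ "$1" -le "$MOUNT_LIMIT" ]
}
```

Under this model, within_limit "$(mounts_after_binds 16)" succeeds and
the same check with 17 binds fails, so an operation like the 30-bind
loop would be refused long before it could tie up the host.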