Re: [PATCH v3 2/2] sysctl: handle overflow for file-max

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrea Arcangeli <aarcange@redhat.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christian Brauner <christian@brauner.io>,
	keescook@chromium.org, linux-kernel@vger.kernel.org,
	ebiederm@xmission.com, mcgrof@kernel.org,
	akpm@linux-foundation.org, joe.lawrence@redhat.com,
	longman@redhat.com, linux@dominikbrodowski.net,
	adobriyan@gmail.com, linux-api@vger.kernel.org,
	Miklos Szeredi <miklos@szeredi.hu>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH v3 2/2] sysctl: handle overflow for file-max
Date: Thu, 18 Oct 2018 17:58:42 -0400	[thread overview]
Message-ID: <20181018215842.GE20140@redhat.com> (raw)
In-Reply-To: <20181017003548.GA32577@ZenIV.linux.org.uk>

Hi Al,

On Wed, Oct 17, 2018 at 01:35:48AM +0100, Al Viro wrote:
> On Wed, Oct 17, 2018 at 12:33:22AM +0200, Christian Brauner wrote:
> > Currently, when writing
> > 
> > echo 18446744073709551616 > /proc/sys/fs/file-max
> > 
> > /proc/sys/fs/file-max will overflow and be set to 0. That quickly
> > crashes the system.
> > This commit sets the max and min value for file-max and returns -EINVAL
> > when a long int is exceeded. Any higher value cannot currently be used as
> > the percpu counters are long ints and not unsigned integers. This behavior
> > also aligns with other tuneables that return -EINVAL when their range is
> > exceeded. See e.g. [1], [2] and others.
> 
> Mostly sane, but...  get_max_files() users are bloody odd.  The one in
> file-max limit reporting looks like a half-arsed attempt in "[PATCH] fix
> file counting".  The one in af_unix.c, though...  I don't remember how
> that check had come to be - IIRC that was a strange fallout of a thread
> with me, Andrea and ANK involved, circa 1999, but I don't remember details;
> Andrea, any memories?  It might be worth reconsidering...  The change in
> question is in 2.2.4pre6; what do we use unix_nr_socks for?  We try to
> limit the number of PF_UNIX socks by 2 * max_files, but max_files can be
> huge *and* non-constant (i.e. it can decrease).  What's more, unix_tot_inflight
> is unsigned int and max_files might exceed 2^31 just fine since "fs: allow
> for more than 2^31 files" back in 2010...  Something's fishy there...

Feels like a lifetime ago :), but looking into I remembered some of
it. That thread was about some instability in unix sockets for an
unrelated bug in the garbage collector. While reviewing your fix, I
probably incidentally found a resource exhaustion problem in doing a
connect();close() loop on a listening stream af_unix. I found an
exploit somewhere in my home dated in 99 in ls -l. Then ANK found
another resource exhaustion by sending datagram sockets, which I also
found an exploit for in my home.

ANK pointed out that a connect syscall allocates two sockets, one to
be accepted by the listening process, the other is the connect
itself. That must be the explanation of the "*2".

The "max_files*2" is probably a patch was from you (which was not
overflowing back then), in attempt to fix the garbage collector issue
which initially looked like resource exhaustion.

I may have suggested to check sk_max_ack_backlog and fail connect() in
such case to solve the resource exhaustion, but my proposal was
obviously broken because connect() would return an error when the
backlog was full and I suppose I didn't implement anything like
unix_wait_for_peer. So I guess (not 100% sure) the get_max_files()*2
check stayed, not because of the bug in the garbage collector that was
fixed independently, but as a stop gap measure for the
connect();close() loop resource exhaustion.

I tried the exploit that does a connect();close() in a loop and it
gracefully hangs in unix_wait_for_peer() after sk_max_ack_backlog
connects.

Out of curiosity I tried also the dgram exploit and it hangs in
sock_alloc_send_pskb with sk_wmem_alloc_get(sk) < sk->sk_sndbuf
check. The file*2 limit couldn't have helped that one anyway.

If I set /proc/sys/net/core/somaxconn to 1000000 the exploit works
fine again and the connect;close loop again allocates infinite amount
of kernel RAM into a tiny RSS process and it triggered OOM (there was
no OOM killer in v2.2 I suppose). By default it's 128. There's also
sysctl_max_dgram_qlen for dgram that on Android is set to 600 (by
default 10).

I tend to think these resources are now capped by other means (notably
somaxconn, sysctl_max_dgram_qlen, sk_wmem_alloc_get) and unix_nr_socks
can be dropped. Or if that atomic counter is still needed it's not for
a trivial exploit anymore than just does listen(); SIGSTOP() from one
process and a connect();close() loop in another process. It'd require
more than a listening socket and heavily forking or a large increase
on the max number of file descriptors (a privileged op) to do a ton of
listens, but forking has its own memory footprint in userland too. At
the very least it should be a per-cpu counter synced to the atomic
global after a threshold.

The other reason for dropping is that it wasn't ideal that the trivial
exploit could still allocated max_files*2 SYN skbs with a loop of
connect;close, max_files*2 is too much already so I suppose it was
only a stop-gap measure to begin with.

Thanks,
Andrea

next prev parent reply	other threads:[~2018-10-18 21:58 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-16 22:33 [PATCH v3 0/2] sysctl: handle overflow for file-max Christian Brauner
2018-10-16 22:33 ` [PATCH v3 1/2] sysctl: handle overflow in proc_get_long Christian Brauner
2018-10-16 23:45   ` Eric W. Biederman
2018-10-17  0:24     ` Christian Brauner
2018-10-17  2:19       ` Kees Cook
2018-10-16 22:33 ` [PATCH v3 2/2] sysctl: handle overflow for file-max Christian Brauner
2018-10-17  0:35   ` Al Viro
2018-10-17  9:57     ` Christian Brauner
2018-10-18 21:58     ` Andrea Arcangeli [this message]
2018-10-16 22:36 ` [PATCH v3 0/2] " Kees Cook
2018-10-29 14:58 ` Christian Brauner
2018-10-29 21:44   ` Kees Cook
2018-12-09 16:40     ` Christian Brauner
2018-12-10 17:51       ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181018215842.GE20140@redhat.com \
    --to=aarcange@redhat.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=christian@brauner.io \
    --cc=ebiederm@xmission.com \
    --cc=eric.dumazet@gmail.com \
    --cc=joe.lawrence@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@dominikbrodowski.net \
    --cc=longman@redhat.com \
    --cc=mcgrof@kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.