From: "Stéphane Graber" <stgraber@ubuntu.com>
To: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
David Miller <davem@davemloft.net>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
Patrick McHardy <kaber@trash.net>,
Stephen Hemminger <stephen@networkplumber.org>,
Cong Wang <cwang@twopensource.com>,
Stefan Bader <stefan.bader@canonical.com>,
chris.j.arges@canonical.com,
Serge Hallyn <serge.hallyn@canonical.com>,
containers@lists.linuxfoundation.org
Subject: Re: [Patch net-next] net: make neigh tables per netns
Date: Tue, 4 Nov 2014 10:49:58 -0500
Message-ID: <20141104154958.GC19513@dakara>
In-Reply-To: <1404154474.14692.136223169.48BE9C85@webmail.messagingengine.com>
On Mon, Jun 30, 2014 at 08:54:34PM +0200, Hannes Frederic Sowa wrote:
> Hi,
>
> On Mon, Jun 30, 2014, at 20:15, Jesper Dangaard Brouer wrote:
> >
> > On Fri, 27 Jun 2014 22:12:52 -0700 ebiederm@xmission.com (Eric W.
> > Biederman) wrote:
> > > Cong Wang <xiyou.wangcong@gmail.com> writes:
> > > > On Thu, Jun 26, 2014 at 3:44 PM, David Miller <davem@davemloft.net> wrote:
> > > >>
> > [...]
> > > >
> > > > Hmm, I did overlook the potential DOS problem. But hold on, isn't
> > > > IP fragments have the same problem? The fragment queues are per
> > > > netns, and the thresh is per netns as well, we will eventually have
> > > > memory pressure as well.
> > >
> > > Interesting. It does look like ip fragments are susceptible that way.
> >
> > For IP fragments we have per netns mem-limit and LRU-list, but all
> > netns share the same hash table, which have its own DoS potential.
> >
> > And argh! - we have a hardcoded INETFRAGS_MAXDEPTH=128, which can be
> > used for (slow) DoS of IP frags if enough netns are created.
> >
> > https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/tree/net/ipv4/inet_fragment.c#n344
> >
> > Introduced by commit 5a3da1fe9 ("inet: limit length of fragment queue
> > hash table bucket lists").
>
> Sure, but we need that, otherwise even a single netns can get exploited
> up to a remotely triggered lockup of the box - e.g.
> https://gist.github.com/hannes/5116331 - on some smaller machines.
> INETFRAGS_MAXDEPTH is a property of the hashtable and walking a chain
> with more than 128 elements is just crazy.
>
> Also, for me making this user configurable doesn't seem to provide a
> benefit.
>
> Sure, it does introduce some kind of unfairness between the namespaces,
> but so does all code which overcommits shared resources.
>
> Bye,
> Hannes
Hello,
As a way to test this issue and show how easy it is to DoS a machine by
filling the IPv6 neighborhood table, I've written this small example:
https://dl.stgraber.org/ipv6-dos.c
This can be run as a nobody user on any kernel with user namespaces enabled.
What it does is unshare a new user namespace and then a new network
namespace inside it. It then creates a veth pair, assigns 4000 IPv6
addresses on the first interface of the pair, then forks, unshares
another network namespace, moves the second interface of the pair in
there and assigns another 4000 IPv6 addresses.
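The first step described above (a user namespace, then a network namespace inside it) can be sketched roughly as follows. This is not code from ipv6-dos.c: the helper name is hypothetical, and the veth and address setup is left out. It uses the raw unshare system call; glibc also offers an unshare() wrapper in <sched.h>.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/sched.h>   /* CLONE_NEWUSER, CLONE_NEWNET */
#include <sys/syscall.h>   /* SYS_unshare */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper, not taken from ipv6-dos.c.
 * Unshares a user namespace first, which an unprivileged user may do on
 * kernels with user namespaces enabled, then a network namespace owned by
 * that user namespace. Returns 0 on success or a negative errno value. */
int enter_user_then_net_namespace(void)
{
    if (syscall(SYS_unshare, CLONE_NEWUSER) == -1)
        return -errno;
    if (syscall(SYS_unshare, CLONE_NEWNET) == -1)
        return -errno;
    return 0;
}
```

On success the caller holds full capabilities inside the new user namespace, which is what allows it to create devices and assign addresses in the new network namespace. On kernels or sandboxes that forbid unprivileged user namespaces the helper simply returns a negative errno value.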
At that point, you have two interfaces, one in the first network
namespace and the other in the second network namespace, each with 4000
IPv6 addresses. The tool will then start a simple TCP server in one of the
namespaces and, in the other, open 4000 connections, each using a
different source and destination address.
The result is 4000 open connections, in theory requiring 8000 IPv6
neighborhood table entries.
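The arithmetic behind that figure, as a small sketch. The value 1024 is an assumption: it is the usual default of the net.ipv6.neigh.default.gc_thresh3 sysctl, which is presumably the limit under discussion, although the message does not name it. The function names are made up for illustration.

```c
/* Assumption: usual default of net.ipv6.neigh.default.gc_thresh3, the hard
 * cap on neighbour entries. The message only speaks of a "global limit". */
#define ASSUMED_GC_THRESH3 1024UL

/* Every connection uses its own source and destination address, so each
 * side has to resolve one new peer: two table entries per connection. */
unsigned long neigh_entries_needed(unsigned long connections)
{
    return 2UL * connections;
}

/* Returns 1 if the requested number of entries does not fit under the
 * limit, 0 otherwise. */
int exceeds_limit(unsigned long entries, unsigned long limit)
{
    return entries > limit;
}
```

With 4000 connections, neigh_entries_needed() gives the 8000 entries mentioned above, well past the assumed default.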
Once the tool is done attempting to open that many connections, any
attempt to connect to a host in a directly connected IPv6 subnet
(so requiring a new neighborhood table entry) will fail with EINVAL.
While the global limit can indeed be bumped, so can the number of
connections established by this tool. I don't believe a global limit
influenced by the number of namespaces would help here either, since
whatever the resulting global limit ends up being, the tool can be
changed to establish $global_limit+1 connections.
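To see what the limit currently is on a given machine, a sketch along these lines can read it. The assumption here is that the global limit is the gc_thresh3 sysctl exposed at /proc/sys/net/ipv6/neigh/default/gc_thresh3; the helper name is hypothetical.

```c
#include <stdio.h>

/* Reads a single decimal integer from a sysctl-style file, for example
 * /proc/sys/net/ipv6/neigh/default/gc_thresh3 (assumed to be the limit
 * discussed in this thread).
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_sysctl_long(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Whatever value this returns, the point made above still holds: the tool only needs to be told to open more connections than that.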
I'm mostly a userspace guy and don't really know the details of the
kernel implementation, but considering that device creation and adding
addresses is now possible by any unprivileged user, having the limit of
neighborhood entries be per-interface rather than global would make
sense to me.
Hopefully this helped clarify the problem we've been seeing lately.
--
Stéphane Graber
Ubuntu developer
http://www.canonical.com