Subject: Re: [PATCH RFC bpf-next 1/6] bpf: Hooks for sys_bind
To: Alexei Starovoitov, davem@davemloft.net
Cc: daniel@iogearbox.net, netdev@vger.kernel.org, kernel-team@fb.com
References: <20180314033934.3502167-1-ast@kernel.org> <20180314033934.3502167-2-ast@kernel.org>
From: Eric Dumazet
Message-ID: <77f77631-f8ad-dc0c-94ce-ec561d4c10f9@gmail.com>
Date: Tue, 13 Mar 2018 23:21:08 -0700
In-Reply-To: <20180314033934.3502167-2-ast@kernel.org>

On 03/13/2018 08:39 PM, Alexei Starovoitov wrote:
> From: Andrey Ignatov
>
> == The problem ==
>
> There is a use-case when all processes inside a cgroup should use one
> single IP address on a host that has multiple IPs configured. Those
> processes should use the IP for both ingress and egress, for TCP and UDP
> traffic. So TCP/UDP servers should be bound to that IP to accept
> incoming connections on it, and TCP/UDP clients should make outgoing
> connections from that IP. It should not require changing application
> code since it's often not possible.
>
> Currently it's solved by intercepting glibc wrappers around syscalls
> such as `bind(2)` and `connect(2)`. It's done by a shared library that
> is preloaded for every process in a cgroup so that whenever a TCP/UDP
> server calls `bind(2)`, the library replaces the IP in sockaddr before
> passing arguments to the syscall.
> When an application calls `connect(2)`, the
> library transparently binds the local end of the connection to that IP
> (`bind(2)` with `IP_BIND_ADDRESS_NO_PORT` to avoid a performance penalty).
>
> The shared library approach is fragile though, e.g.:
> * some applications clear env vars (incl. `LD_PRELOAD`);
> * `/etc/ld.so.preload` doesn't help since some applications are linked
>   with option `-z nodefaultlib`;
> * other applications don't use glibc and there is nothing to intercept.
>
> == The solution ==
>
> The patch provides a much more reliable in-kernel solution for the 1st
> part of the problem: binding TCP/UDP servers on the desired IP. It does
> not depend on the application environment and implementation details
> (whether glibc is used or not).
>

If I understand correctly, strace(1) will not show the real IP/port
(after modification by eBPF)?

What about SELinux and other LSMs?

We now have network namespaces for full isolation. Soon ILA will come.

The argument that it is not convenient (or even possible) to change the
application, or to use modern isolation, is quite strange, considering
the added burden/complexity/bloat to the kernel.

The post hook for sys_bind is clearly a failure of the model, since
releasing the port might already be too late: another thread might fail
to get it during a non-zero time window.

It seems this is exactly the case where a netns would be the correct
answer.

If you want to provide an alternate port allocation strategy, it would
be better to provide a correct eBPF hook.
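The `IP_BIND_ADDRESS_NO_PORT` step in the quoted description can be sketched as follows. This is an illustration under stated assumptions, not code from the patch or from the preload library: it assumes Linux and IPv4, and `bind_src_no_port()` and `local_port()` are hypothetical helper names. The socket is bound to the source IP with port 0 and the option set, so the kernel records the address but defers choosing a local port until `connect(2)`. Deferring the choice lets `connect(2)` pick a port that only has to be unique for the full 4-tuple, instead of reserving one at `bind(2)` time against every possible destination.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_BIND_ADDRESS_NO_PORT
#define IP_BIND_ADDRESS_NO_PORT 24 /* value from <linux/in.h> */
#endif

/* Hypothetical helper: create a TCP socket whose local address is pinned
 * to src_ip while local port selection is deferred until connect(2).
 * Returns the fd, or -1 on error with errno set. */
static int bind_src_no_port(const char *src_ip)
{
    struct sockaddr_in sa;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Ask the kernel not to reserve a port at bind() time. */
    if (setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT,
                   &one, sizeof(one)) < 0)
        goto fail;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(0); /* port 0: connect() will choose one */
    if (inet_pton(AF_INET, src_ip, &sa.sin_addr) != 1) {
        errno = EINVAL;
        goto fail;
    }
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        goto fail;
    return fd;

fail:
    {
        int saved = errno; /* close() may clobber errno */
        close(fd);
        errno = saved;
    }
    return -1;
}

/* Hypothetical helper: local port of fd in host byte order, -1 on error. */
static int local_port(int fd)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);

    if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
        return -1;
    return ntohs(sa.sin_port);
}
```

For example, `bind_src_no_port("127.0.0.1")` returns a socket whose `local_port()` is still 0; only after a successful `connect(2)` does the kernel assign a port.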
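The window described above for a post-bind hook can be reproduced in a single process as a rough sketch. Socket `a` stands in for a bind that a hypothetical post hook will later reject: until the hook releases the port (modelled here as `close(a)`), a competing `bind(2)` on the same address and port fails with `EADDRINUSE`, and only afterwards does it succeed. Assumptions: Linux, IPv4 loopback, TCP sockets without `SO_REUSEADDR`; `try_bind_loopback()`, `bound_port()` and `race_window_demo()` are hypothetical names, and a real race would involve two threads rather than this sequential simulation.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a new TCP socket to 127.0.0.1:port
 * (port 0 lets the kernel choose). Returns the fd on success; on failure
 * returns -1 and stores errno in *err. SO_REUSEADDR is not set. */
static int try_bind_loopback(unsigned short port, int *err)
{
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        *err = errno;
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        *err = errno;
        close(fd);
        return -1;
    }
    *err = 0;
    return fd;
}

/* Hypothetical helper: local port of a bound socket, host byte order
 * (0 on error). */
static unsigned short bound_port(int fd)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);

    if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
        return 0;
    return ntohs(sa.sin_port);
}

/* Returns 1 if a competing bind fails with EADDRINUSE while the first
 * socket still holds the port and succeeds once it is released. */
static int race_window_demo(void)
{
    int err, ok = 0;
    int a = try_bind_loopback(0, &err); /* bind a post hook will reject */

    if (a < 0)
        return 0;
    unsigned short p = bound_port(a);
    int b = try_bind_loopback(p, &err); /* competing bind inside the window */

    if (b < 0 && err == EADDRINUSE) {
        close(a);                           /* post hook releases the port */
        int c = try_bind_loopback(p, &err); /* competing bind after it */
        if (c >= 0) {
            ok = 1;
            close(c);
        }
    } else {
        if (b >= 0)
            close(b);
        close(a);
    }
    return ok;
}
```

The point of the simulation is that the failure of the competing bind is visible to the other thread and cannot be undone by the later release, which is why a hook that runs before the port is reserved avoids the problem.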