From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932594AbbJPQS5 (ORCPT <rfc822;w@1wt.eu>);
	Fri, 16 Oct 2015 12:18:57 -0400
Received: from mail-pa0-f42.google.com ([209.85.220.42]:36473 "EHLO
	mail-pa0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751720AbbJPQSr (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Fri, 16 Oct 2015 12:18:47 -0400
Subject: Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs
To: Hannes Frederic Sowa <hannes@stressinduktion.org>,
        Daniel Borkmann <daniel@iogearbox.net>, davem@davemloft.net
References: <cover.1444956943.git.daniel@iogearbox.net>
 <ab1fceb2d68876d89bb2ebb3d2b45486d3cf2388.1444956943.git.daniel@iogearbox.net>
 <1444991103.2861759.411876897.42C807BD@webmail.messagingengine.com>
Cc: viro@ZenIV.linux.org.uk, ebiederm@xmission.com, tgraf@suug.ch,
        netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
        Alexei Starovoitov <ast@kernel.org>
From: Alexei Starovoitov <ast@plumgrid.com>
Message-ID: <56212366.4000307@plumgrid.com>
Date: Fri, 16 Oct 2015 09:18:46 -0700
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0)
 Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <1444991103.2861759.411876897.42C807BD@webmail.messagingengine.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/16/15 3:25 AM, Hannes Frederic Sowa wrote:
> Namespaces at some point dealt with the same problem, they nowadays use
> bind mounts of/proc/$$/ns/* to some place in the file hierarchy to keep
> the namespace alive. This at least allows someone to build up its own
> hierarchy with normal unix tools and not hidden inside a C-program. For
> filedescriptors we already have/proc/$$/fd/* but it seems that doesn't
> work out of the box nowadays.

bind mounting of /proc/../fd was initially proposed by Andy and we've
looked at it thoroughly, but after discussion with Eric it became
apparent that it doesn't fit here. At the end we need shell tools
to access maps.
Also I think you missed the hierarchy in this patch set _is_ built with
normal 'mkdir' and files are removed with 'rm'.
The only thing that C does is BPF_PIN_FD of fd that was received from
bpf syscall. That's way cleaner api than doing bind mount from C
program.
We've considered letting open() of the file return bpf specific
anon-inode, but decided to reserve that for other more natural file
operations. Therefore BPF_NEW_FD is needed.

> I don't know in terms of how many objects bpf should be able to handle
> and if such a bind-mount based solution would work, I guess not.

We definitely missed you at the last plumbers where it was discussed :)

> In my opinion I still favor a user space approach.

that's not acceptable for tracing use cases. No daemons allowed.

> Subsystems which use
> ebpf in a way that no user space program needs to be running to control
> them would need to export the fds by itself. E.g. something like
> sysfs/kobject for tc? The hierarchy would then be in control of the
> subsystem which could also create a proper naming hierarchy or maybe
> even use an already given one. Do most other eBPF users really need to
> persist file descriptors somewhere without user space control and pick
> them up later?

I think it's way cleaner to have one way of solving it (like this patch
does) instead of asking every subsystem to solve it differently.
We've also looked at sysfs and it's ugly when it comes to removing,
since the user cannot use normal 'rm'.