From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932594AbbJPQS5 (ORCPT ); Fri, 16 Oct 2015 12:18:57 -0400
Received: from mail-pa0-f42.google.com ([209.85.220.42]:36473 "EHLO mail-pa0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751720AbbJPQSr (ORCPT ); Fri, 16 Oct 2015 12:18:47 -0400
Subject: Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs
To: Hannes Frederic Sowa , Daniel Borkmann , davem@davemloft.net
References: <1444991103.2861759.411876897.42C807BD@webmail.messagingengine.com>
Cc: viro@ZenIV.linux.org.uk, ebiederm@xmission.com, tgraf@suug.ch, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Alexei Starovoitov
From: Alexei Starovoitov
Message-ID: <56212366.4000307@plumgrid.com>
Date: Fri, 16 Oct 2015 09:18:46 -0700
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <1444991103.2861759.411876897.42C807BD@webmail.messagingengine.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/16/15 3:25 AM, Hannes Frederic Sowa wrote:
> Namespaces at some point dealt with the same problem, they nowadays use
> bind mounts of /proc/$$/ns/* to some place in the file hierarchy to keep
> the namespace alive. This at least allows someone to build up its own
> hierarchy with normal unix tools and not hidden inside a C-program. For
> filedescriptors we already have /proc/$$/fd/* but it seems that doesn't
> work out of the box nowadays.

bind mounting of /proc/../fd was initially proposed by Andy and we've
looked at it thoroughly, but after discussion with Eric it became
apparent that it doesn't fit here. At the end we need shell tools to
access maps.
Also I think you missed the hierarchy in this patch set _is_ built with
normal 'mkdir' and files are removed with 'rm'.
The only thing that C does is BPF_PIN_FD of fd that was received from
bpf syscall. That's way cleaner api than doing bind mount from C program.
We've considered letting open() of the file return bpf specific
anon-inode, but decided to reserve that for other more natural file
operations. Therefore BPF_NEW_FD is needed.

> I don't know in terms of how many objects bpf should be able to handle
> and if such a bind-mount based solution would work, I guess not.

We definitely missed you at the last plumbers where it was discussed :)

> In my opinion I still favor a user space approach.

that's not acceptable for tracing use cases. No daemons allowed.

> Subsystems which use
> ebpf in a way that no user space program needs to be running to control
> them would need to export the fds by itself. E.g. something like
> sysfs/kobject for tc? The hierarchy would then be in control of the
> subsystem which could also create a proper naming hierarchy or maybe
> even use an already given one. Do most other eBPF users really need to
> persist file descriptors somewhere without user space control and pick
> them up later?

I think it's way cleaner to have one way of solving it (like this patch
does) instead of asking every subsystem to solve it differently.
We've also looked at sysfs and it's ugly when it comes to removing,
since the user cannot use normal 'rm'.
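
For reference, the C side of pinning can be sketched roughly as below. This is only an illustration under assumptions: it uses the BPF_OBJ_PIN and BPF_OBJ_GET commands found in mainline <linux/bpf.h> as stand-ins for the BPF_PIN_FD and BPF_NEW_FD names used in this revision of the patch set, and the helper names pin_fd() and get_pinned_fd() are made up for the example. The directory holding the pinned file is assumed to live on a mounted bpf filesystem and to have been created with plain 'mkdir'.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/bpf.h>

/*
 * Hypothetical helper: pin an fd obtained earlier from the bpf syscall
 * (a map or a program) at 'path' inside a mounted bpf filesystem.
 * Returns 0 on success, -1 with errno set on failure.
 * Assumption: BPF_OBJ_PIN (mainline name) stands in for BPF_PIN_FD.
 */
static int pin_fd(int fd, const char *path)
{
	union bpf_attr attr;

	assert(path != NULL);
	memset(&attr, 0, sizeof(attr));
	attr.pathname = (uint64_t)(uintptr_t)path;
	attr.bpf_fd = fd;

	return (int)syscall(__NR_bpf, BPF_OBJ_PIN, &attr, sizeof(attr));
}

/*
 * Hypothetical helper: obtain a fresh fd for an object pinned at 'path'.
 * Returns the new fd, or -1 with errno set on failure.
 * Assumption: BPF_OBJ_GET (mainline name) stands in for BPF_NEW_FD.
 */
static int get_pinned_fd(const char *path)
{
	union bpf_attr attr;

	assert(path != NULL);
	memset(&attr, 0, sizeof(attr));
	attr.pathname = (uint64_t)(uintptr_t)path;

	return (int)syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}
```

A loader would call pin_fd() once with the fd it got from the bpf syscall and may then exit; a later process calls get_pinned_fd() on the same path to pick the object up again. Creating the directory and removing the pinned file need no C at all, which is the point made above about 'mkdir' and 'rm'.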