From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C993CC433FF for ; Thu, 8 Aug 2019 21:17:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 59794214C6 for ; Thu, 8 Aug 2019 21:17:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="K9hwg7mW" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2403886AbfHHVRY (ORCPT ); Thu, 8 Aug 2019 17:17:24 -0400 Received: from mail-qk1-f196.google.com ([209.85.222.196]:40198 "EHLO mail-qk1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725535AbfHHVRY (ORCPT ); Thu, 8 Aug 2019 17:17:24 -0400 Received: by mail-qk1-f196.google.com with SMTP id s145so70069239qke.7 for ; Thu, 08 Aug 2019 14:17:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=T0f/LSfyr9pfDYVJW+FXComVlRgi5P9vcJVkKlRWKbI=; b=K9hwg7mWJUhEJo2FXfSJRx/SznAUhpOIEQO1b3x4ttDWPEPqz7QluB7FYdg3pxS80+ MzBtOl9paEA1ri7WRGkohl0PF5IPauAlgjU0huljRxgBw7FZOx2Z7Wa9HT5BBtVz/gHo ss2nUt9NXLlksGvtjrVJ7GwSDbAbi0wX4dOZXovFwf1L3zuX0N7Vg6mibMHdkrm/cEsK YWOMRFo+QQlRhzisgDCx6X8HW2UlSppXGUUjjuwQujo8KHm0suWi6sBZehEuhOSYxvzj UA45IS+8Apaj2pjQRz6hsKjrNmal554KrE/F+iQK1O1bJDQB/u9yPN0fAyLnFtNf61p+ FR4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=T0f/LSfyr9pfDYVJW+FXComVlRgi5P9vcJVkKlRWKbI=; b=ofVqudsKeQ7XqL0Trtw29Bg8rnVsKbUM0x9eqZCFEWgqjs5z7fqANmf6gTbIkNasqR o18GlpnbPKHnPRuN95yw4AwiugTXDDPwJKzNzD8+S7+tsyfTVyI9dbMM+UuL/FsH4QkO DJw3DpqxHUYJG3NTmrxA2VuV3BCU04bsVM/OFpO1UzrQfGEGLeBqZwM8ZSKNOnn6SQAL Y8uJWNa96B+QbNNx/xdygq/acJjMj4YSVi3rxCP1ltfGiDwxCSVWysWhEM4Op2qZOHEg ZG9cyxI7M9yNo1xpzIhMwG2l8OM9sDMar05uNVdyM7nO7EC02NFGBbH1tdvfo7Y8clUV Yh/w== X-Gm-Message-State: APjAAAXAiLkbrQlMFBlh2yJ9cI/oBGz2KpT3mM1AEUVpvMASawKCYjUw PoecSP33+U3VRUhMMQWxm+dHm+kMGgOKUw== X-Google-Smtp-Source: APXvYqzA1C2MhDg8sHxiMyu8o9wFGLgzeikBs1RtolFGfeYiIpqvUQjsKTxx5+jO6kGkYoIbZLBT0Q== X-Received: by 2002:a37:6109:: with SMTP id v9mr9310486qkb.432.1565299041262; Thu, 08 Aug 2019 14:17:21 -0700 (PDT) Received: from dev00 ([190.162.109.53]) by smtp.gmail.com with ESMTPSA id r4sm61268975qta.93.2019.08.08.14.17.19 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 08 Aug 2019 14:17:20 -0700 (PDT) Date: Thu, 8 Aug 2019 17:17:16 -0400 From: Carlos Antonio Neira Bustos To: Yonghong Song Cc: Y Song , "netdev@vger.kernel.org" , "ebiederm@xmission.com" , "brouer@redhat.com" , "quentin.monnet@netronome.com" Subject: Re: [PATCH v5 bpf-next] BPF: helpers: New helper to obtain namespacedata from current task Message-ID: <20190808211714.taet5fjr6q43na5i@dev00> References: <20190808012240.htbgpv2mhktvig5h@dev00> <96c7ea2e-7acf-e81a-61dc-a4d4562c736a@fb.com> <20190808174848.poybtaagg5ctle7t@dev00> <5d4c856b.1c69fb81.2aa4f.32dd@mx.google.com> <1c24077d-ed17-86b6-8d3f-81994105f302@fb.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1c24077d-ed17-86b6-8d3f-81994105f302@fb.com> User-Agent: NeoMutt/20170113 (1.7.2) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Thanks a lot, Yonghong. I'll fix and split up the patch. Thanks again for your help. Bests On Thu, Aug 08, 2019 at 08:47:14PM +0000, Yonghong Song wrote: > > > On 8/8/19 1:26 PM, carlos antonio neira bustos wrote: > > Hi Yonghong, > > > > I’m sorry, just to be sure, I’m just missing the error codes from > > filename_lookup() right ?. > > From kernel functionality point of view. Yes, I am talking about > error codes returned by filename_lookup(). > For example, if CONFIG_PID_NS or CONFIG_NAMESPACES is not > defined in the config, the path "/proc/self/ns/pid" will not exist, > the error code will return. It may be -ENOTDIR > if CONFIG_NAMESPACES not defined or -ECHILD if CONFIG_PID_NS > is not defined. Please double check. > > Please do follow the advice in > > https://lore.kernel.org/netdev/20190808174848.poybtaagg5ctle7t@dev00/T/#t > to break the single patch to multiple patches. > > I only reviewed the kernel code. Will review tools/ code > in the next properly-formatted (broken-up) commits. > > Also, please also cc commits to bpf mailing list at > bpf@vger.kernel.org > > > > > Bests > > > > Maybe some other error codes in filename_lookup() function? > > > > > + * > > > > > + *                      If unable to get the inode from > > /proc/self/ns/pid an error code > > > > > + *                      will be returned. > > > > *From: *Y Song > > *Sent: *08 August 2019 15:44 > > *To: *Carlos Antonio Neira Bustos > > *Cc: *Yonghong Song ; netdev@vger.kernel.org > > ; ebiederm@xmission.com > > ; brouer@redhat.com > > ; quentin.monnet@netronome.com > > > > *Subject: *Re: [PATCH v5 bpf-next] BPF: helpers: New helper to obtain > > namespacedata from current task > > > > On Thu, Aug 8, 2019 at 10:52 AM Carlos Antonio Neira Bustos > > > > wrote: > > > > > > > > > > Yonghong, > > > > > > > > > > I have modified the patch following your feedback. > > > > > Let me know if I'm missing something. > > > > Yes, I have some other requests about formating. > > > > https://lore.kernel.org/netdev/20190808174848.poybtaagg5ctle7t@dev00/T/#t > > > > Could you address it as well? > > > > > > > > > > Bests > > > > > > > > > > From 70f8d5584700c9cfc82c006901d8ee9595c53f15 Mon Sep 17 00:00:00 2001 > > > > > From: Carlos > > > > > Date: Wed, 7 Aug 2019 20:04:30 -0400 > > > > > Subject: [PATCH] [PATCH v6 bpf-next] BPF: New helper to obtain > > namespace data > > > > >  from current task > > > > > > > > > > This helper obtains the active namespace from current and returns > > pid, tgid, > > > > > device and namespace id as seen from that namespace, allowing to > > instrument > > > > > a process inside a container. > > > > > Device is read from /proc/self/ns/pid, as in the future it's possible > > that > > > > > different pid_ns files may belong to different devices, according > > > > > to the discussion between Eric Biederman and Yonghong in 2017 linux > > plumbers > > > > > conference. > > > > > Currently bpf_get_current_pid_tgid(), is used to do pid filtering in > > bcc's > > > > > scripts but this helper returns the pid as seen by the root namespace > > which is > > > > > fine when a bcc script is not executed inside a container. > > > > > When the process of interest is inside a container, pid filtering > > will not work > > > > > if bpf_get_current_pid_tgid() is used. This helper addresses this > > limitation > > > > > returning the pid as it's seen by the current namespace where the > > script is > > > > > executing. > > > > > > > > > > This helper has the same use cases as bpf_get_current_pid_tgid() as > > it can be > > > > > used to do pid filtering even inside a container. > > > > > > > > > > For example a bcc script using bpf_get_current_pid_tgid() > > (tools/funccount.py): > > > > > > > > > >         u32 pid = bpf_get_current_pid_tgid() >> 32; > > > > >         if (pid != ) > > > > >                 return 0; > > > > > Could be modified to use bpf_get_current_pidns_info() as follows: > > > > > > > > > >         struct bpf_pidns pidns; > > > > >         bpf_get_current_pidns_info(&pidns, sizeof(struct bpf_pidns)); > > > > >         u32 pid = pidns.tgid; > > > > >         u32 nsid = pidns.nsid; > > > > >         if ((pid != ) && (nsid != > > )) > > > > >                 return 0; > > > > > > > > > > To find out the name PID namespace id of a process, you could use > > this command: > > > > > > > > > > $ ps -h -o pidns -p > > > > > > > > > > Or this other command: > > > > > > > > > > $ ls -Li /proc//ns/pid > > > > > > > > > > Signed-off-by: Carlos Neira > > > > > --- > > > > >  fs/internal.h                                      |   2 - > > > > >  fs/namei.c                                         |   1 - > > > > >  include/linux/bpf.h                                |   1 + > > > > >  include/linux/namei.h                              |   4 + > > > > >  include/uapi/linux/bpf.h                           |  27 +++- > > > > >  kernel/bpf/core.c                                  |   1 + > > > > >  kernel/bpf/helpers.c                               |  64 ++++++++++ > > > > >  kernel/trace/bpf_trace.c                           |   2 + > > > > >  samples/bpf/Makefile                               |   3 + > > > > >  samples/bpf/trace_ns_info_user.c                   |  35 ++++++ > > > > >  samples/bpf/trace_ns_info_user_kern.c              |  44 +++++++ > > > > >  tools/include/uapi/linux/bpf.h                     |  27 +++- > > > > >  tools/testing/selftests/bpf/Makefile               |   2 +- > > > > >  tools/testing/selftests/bpf/bpf_helpers.h          |   3 + > > > > >  .../testing/selftests/bpf/progs/test_pidns_kern.c  |  51 ++++++++ > > > > >  tools/testing/selftests/bpf/test_pidns.c           | 138 > > +++++++++++++++++++++ > > > > >  16 files changed, 399 insertions(+), 6 deletions(-) > > > > >  create mode 100644 samples/bpf/trace_ns_info_user.c > > > > >  create mode 100644 samples/bpf/trace_ns_info_user_kern.c > > > > >  create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c > > > > >  create mode 100644 tools/testing/selftests/bpf/test_pidns.c > > > > > > > > > > diff --git a/fs/internal.h b/fs/internal.h > > > > > index 315fcd8d237c..6647e15dd419 100644 > > > > > --- a/fs/internal.h > > > > > +++ b/fs/internal.h > > > > > @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc); > > > > >  /* > > > > >   * namei.c > > > > >   */ > > > > > -extern int filename_lookup(int dfd, struct filename *name, unsigned > > flags, > > > > > -                          struct path *path, struct path *root); > > > > >  extern int user_path_mountpoint_at(int, const char __user *, > > unsigned int, struct path *); > > > > >  extern int vfs_path_lookup(struct dentry *, struct vfsmount *, > > > > >                            const char *, unsigned int, struct path *); > > > > > diff --git a/fs/namei.c b/fs/namei.c > > > > > index 209c51a5226c..a89fc72a4a10 100644 > > > > > --- a/fs/namei.c > > > > > +++ b/fs/namei.c > > > > > @@ -19,7 +19,6 @@ > > > > >  #include > > > > >  #include > > > > >  #include > > > > > -#include > > > > >  #include > > > > >  #include > > > > >  #include > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > > > index f9a506147c8a..e4adf5e05afd 100644 > > > > > --- a/include/linux/bpf.h > > > > > +++ b/include/linux/bpf.h > > > > > @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto > > bpf_get_local_storage_proto; > > > > >  extern const struct bpf_func_proto bpf_strtol_proto; > > > > >  extern const struct bpf_func_proto bpf_strtoul_proto; > > > > >  extern const struct bpf_func_proto bpf_tcp_sock_proto; > > > > > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto; > > > > > > > > > >  /* Shared helpers among cBPF and eBPF. */ > > > > >  void bpf_user_rnd_init_once(void); > > > > > diff --git a/include/linux/namei.h b/include/linux/namei.h > > > > > index 9138b4471dbf..b45c8b6f7cb4 100644 > > > > > --- a/include/linux/namei.h > > > > > +++ b/include/linux/namei.h > > > > > @@ -6,6 +6,7 @@ > > > > >  #include > > > > >  #include > > > > >  #include > > > > > +#include > > > > > > > > > >  enum { MAX_NESTED_LINKS = 8 }; > > > > > > > > > > @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct > > dentry *); > > > > > > > > > >  extern void nd_jump_link(struct path *path); > > > > > > > > > > +extern int filename_lookup(int dfd, struct filename *name, unsigned > > flags, > > > > > +                          struct path *path, struct path *root); > > > > > + > > > > >  static inline void nd_terminate_link(void *name, size_t len, size_t > > maxlen) > > > > >  { > > > > >         ((char *) name)[min(len, maxlen)] = '\0'; > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > > index 4393bd4b2419..b0d4869fb860 100644 > > > > > --- a/include/uapi/linux/bpf.h > > > > > +++ b/include/uapi/linux/bpf.h > > > > > @@ -2741,6 +2741,24 @@ union bpf_attr { > > > > >   *             **-EOPNOTSUPP** kernel configuration does not enable > > SYN cookies > > > > >   * > > > > >   *             **-EPROTONOSUPPORT** IP packet version is not 4 or 6 > > > > > + * > > > > > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 > > size_of_pidns) > > > > > + *     Description > > > > > + *             Copies into *pidns* pid, namespace id and tgid as > > seen by the > > > > > + *             current namespace and also device from /proc/self/ns/pid. > > > > > + *             *size_of_pidns* must be the size of *pidns* > > > > > + * > > > > > + *             This helper is used when pid filtering is needed inside a > > > > > + *             container as bpf_get_current_tgid() helper returns > > always the > > > > > + *             pid id as seen by the root namespace. > > > > > + *     Return > > > > > + *             0 on success > > > > > + * > > > > > + *             **-EINVAL** if *size_of_pidns* is not valid or unable > > to get ns, pid > > > > > + *             or tgid of the current task. > > > > > + * > > > > > + *             **-ENOMEM**  if allocation fails. > > > > > + * > > > > >   */ > > > > >  #define __BPF_FUNC_MAPPER(FN)          \ > > > > >         FN(unspec),                     \ > > > > > @@ -2853,7 +2871,8 @@ union bpf_attr { > > > > >         FN(sk_storage_get),             \ > > > > >         FN(sk_storage_delete),          \ > > > > >         FN(send_signal),                \ > > > > > -       FN(tcp_gen_syncookie), > > > > > +       FN(tcp_gen_syncookie),          \ > > > > > +       FN(get_current_pidns_info), > > > > > > > > > >  /* integer value in 'imm' field of BPF_CALL instruction selects > > which helper > > > > >   * function eBPF program intends to call > > > > > @@ -3604,4 +3623,10 @@ struct bpf_sockopt { > > > > >         __s32   retval; > > > > >  }; > > > > > > > > > > +struct bpf_pidns_info { > > > > > +       __u32 dev; > > > > > +       __u32 nsid; > > > > > +       __u32 tgid; > > > > > +       __u32 pid; > > > > > +}; > > > > >  #endif /* _UAPI__LINUX_BPF_H__ */ > > > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > > > > index 8191a7db2777..3159f2a0188c 100644 > > > > > --- a/kernel/bpf/core.c > > > > > +++ b/kernel/bpf/core.c > > > > > @@ -2038,6 +2038,7 @@ const struct bpf_func_proto > > bpf_get_current_uid_gid_proto __weak; > > > > >  const struct bpf_func_proto bpf_get_current_comm_proto __weak; > > > > >  const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak; > > > > >  const struct bpf_func_proto bpf_get_local_storage_proto __weak; > > > > > +const struct bpf_func_proto bpf_get_current_pidns_info __weak; > > > > > > > > > >  const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void) > > > > >  { > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > > > > index 5e28718928ca..41fbf1f28a48 100644 > > > > > --- a/kernel/bpf/helpers.c > > > > > +++ b/kernel/bpf/helpers.c > > > > > @@ -11,6 +11,12 @@ > > > > >  #include > > > > >  #include > > > > >  #include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > + > > > > > > > > > >  #include "../../lib/kstrtox.h" > > > > > > > > > > @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, > > void *dst, void *src, > > > > >         preempt_enable(); > > > > >  } > > > > > > > > > > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, > > pidns_info, u32, > > > > > +        size) > > > > > +{ > > > > > +       const char *pidns_path = "/proc/self/ns/pid"; > > > > > +       struct pid_namespace *pidns = NULL; > > > > > +       struct filename *tmp = NULL; > > > > > +       struct inode *inode; > > > > > +       struct path kp; > > > > > +       pid_t tgid = 0; > > > > > +       pid_t pid = 0; > > > > > +       int ret; > > > > > +       int len; > > > > > + > > > > > +       if (unlikely(size != sizeof(struct bpf_pidns_info))) > > > > > +               return -EINVAL; > > > > > +       pidns = task_active_pid_ns(current); > > > > > +       if (unlikely(!pidns)) > > > > > +               goto clear; > > > > > +       pidns_info->nsid =  pidns->ns.inum; > > > > > +       pid = task_pid_nr_ns(current, pidns); > > > > > +       if (unlikely(!pid)) > > > > > +               goto clear; > > > > > +       tgid = task_tgid_nr_ns(current, pidns); > > > > > +       if (unlikely(!tgid)) > > > > > +               goto clear; > > > > > +       pidns_info->tgid = (u32) tgid; > > > > > +       pidns_info->pid = (u32) pid; > > > > > +       tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC); > > > > > +       if (unlikely(!tmp)) { > > > > > +               memset((void *)pidns_info, 0, (size_t) size); > > > > > +               return -ENOMEM; > > > > > +       } > > > > > +       len = strlen(pidns_path) + 1; > > > > > +       memcpy((char *)tmp->name, pidns_path, len); > > > > > +       tmp->uptr = NULL; > > > > > +       tmp->aname = NULL; > > > > > +       tmp->refcnt = 1; > > > > > +       ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL); > > > > > +       if (ret) { > > > > > +               memset((void *)pidns_info, 0, (size_t) size); > > > > > +               return ret; > > > > > +       } > > > > > +       inode = d_backing_inode(kp.dentry); > > > > > +       pidns_info->dev = inode->i_sb->s_dev; > > > > > +       return 0; > > > > > +clear: > > > > > +       memset((void *)pidns_info, 0, (size_t) size); > > > > > +       return -EINVAL; > > > > > +} > > > > > + > > > > > +const struct bpf_func_proto bpf_get_current_pidns_info_proto = { > > > > > +       .func           = bpf_get_current_pidns_info, > > > > > +       .gpl_only       = false, > > > > > +       .ret_type       = RET_INTEGER, > > > > > +       .arg1_type      = ARG_PTR_TO_UNINIT_MEM, > > > > > +       .arg2_type      = ARG_CONST_SIZE, > > > > > +}; > > > > > + > > > > >  #ifdef CONFIG_CGROUPS > > > > >  BPF_CALL_0(bpf_get_current_cgroup_id) > > > > >  { > > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > > > > index ca1255d14576..5e1dc22765a5 100644 > > > > > --- a/kernel/trace/bpf_trace.c > > > > > +++ b/kernel/trace/bpf_trace.c > > > > > @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, > > const struct bpf_prog *prog) > > > > >  #endif > > > > >         case BPF_FUNC_send_signal: > > > > >                 return &bpf_send_signal_proto; > > > > > +       case BPF_FUNC_get_current_pidns_info: > > > > > +               return &bpf_get_current_pidns_info_proto; > > > > >         default: > > > > >                 return NULL; > > > > >         } > > > > > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile > > > > > index 1d9be26b4edd..238453ff27d2 100644 > > > > > --- a/samples/bpf/Makefile > > > > > +++ b/samples/bpf/Makefile > > > > > @@ -53,6 +53,7 @@ hostprogs-y += task_fd_query > > > > >  hostprogs-y += xdp_sample_pkts > > > > >  hostprogs-y += ibumad > > > > >  hostprogs-y += hbm > > > > > +hostprogs-y += trace_ns_info > > > > > > > > > >  # Libbpf dependencies > > > > >  LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a > > > > > @@ -109,6 +110,7 @@ task_fd_query-objs := bpf_load.o > > task_fd_query_user.o $(TRACE_HELPERS) > > > > >  xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS) > > > > >  ibumad-objs := bpf_load.o ibumad_user.o $(TRACE_HELPERS) > > > > >  hbm-objs := bpf_load.o hbm.o $(CGROUP_HELPERS) > > > > > +trace_ns_info-objs := bpf_load.o trace_ns_info_user.o > > > > > > > > > >  # Tell kbuild to always build the programs > > > > >  always := $(hostprogs-y) > > > > > @@ -170,6 +172,7 @@ always += xdp_sample_pkts_kern.o > > > > >  always += ibumad_kern.o > > > > >  always += hbm_out_kern.o > > > > >  always += hbm_edt_kern.o > > > > > +always += trace_ns_info_user_kern.o > > > > > > > > > >  KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include > > > > >  KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/bpf/ > > > > > diff --git a/samples/bpf/trace_ns_info_user.c > > b/samples/bpf/trace_ns_info_user.c > > > > > new file mode 100644 > > > > > index 000000000000..e06d08db6f30 > > > > > --- /dev/null > > > > > +++ b/samples/bpf/trace_ns_info_user.c > > > > > @@ -0,0 +1,35 @@ > > > > > +// SPDX-License-Identifier: GPL-2.0 > > > > > +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com > > > > > + * > > > > > + * This program is free software; you can redistribute it and/or > > > > > + * modify it under the terms of version 2 of the GNU General Public > > > > > + * License as published by the Free Software Foundation. > > > > > + */ > > > > > + > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include "bpf/libbpf.h" > > > > > +#include "bpf_load.h" > > > > > + > > > > > +/* This code was taken verbatim from tracex1_user.c, it's used > > > > > + * to exercize bpf_get_current_pidns_info() helper call. > > > > > + */ > > > > > +int main(int ac, char **argv) > > > > > +{ > > > > > +       FILE *f; > > > > > +       char filename[256]; > > > > > + > > > > > +       snprintf(filename, sizeof(filename), "%s_user_kern.o", argv[0]); > > > > > +       printf("loading %s\n", filename); > > > > > + > > > > > +       if (load_bpf_file(filename)) { > > > > > +               printf("%s", bpf_log_buf); > > > > > +               return 1; > > > > > +       } > > > > > + > > > > > +       f = popen("taskset 1 ping  localhost", "r"); > > > > > +       (void) f; > > > > > +       read_trace_pipe(); > > > > > +       return 0; > > > > > +} > > > > > diff --git a/samples/bpf/trace_ns_info_user_kern.c > > b/samples/bpf/trace_ns_info_user_kern.c > > > > > new file mode 100644 > > > > > index 000000000000..96675e02b707 > > > > > --- /dev/null > > > > > +++ b/samples/bpf/trace_ns_info_user_kern.c > > > > > @@ -0,0 +1,44 @@ > > > > > +// SPDX-License-Identifier: GPL-2.0 > > > > > +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com > > > > > + * > > > > > + * This program is free software; you can redistribute it and/or > > > > > + * modify it under the terms of version 2 of the GNU General Public > > > > > + * License as published by the Free Software Foundation. > > > > > + */ > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include "bpf_helpers.h" > > > > > + > > > > > +typedef __u64 u64; > > > > > +typedef __u32 u32; > > > > > + > > > > > + > > > > > +/* kprobe is NOT a stable ABI > > > > > + * kernel functions can be removed, renamed or completely change > > semantics. > > > > > + * Number of arguments and their positions can change, etc. > > > > > + * In such case this bpf+kprobe example will no longer be meaningful > > > > > + */ > > > > > + > > > > > +/* This will call bpf_get_current_pidns_info() to display pid and ns > > values > > > > > + * as seen by the current namespace, on the far left you will see > > the pid as > > > > > + * seen as by the root namespace. > > > > > + */ > > > > > + > > > > > +SEC("kprobe/__netif_receive_skb_core") > > > > > +int bpf_prog1(struct pt_regs *ctx) > > > > > +{ > > > > > +       char fmt[] = "nsid:%u, dev: %u,  pid:%u\n"; > > > > > +       struct bpf_pidns_info nsinfo; > > > > > +       int ok = 0; > > > > > + > > > > > +       ok = bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)); > > > > > +       if (ok == 0) > > > > > +               bpf_trace_printk(fmt, sizeof(fmt), (u32)nsinfo.nsid, > > > > > +                                (u32) nsinfo.dev, (u32)nsinfo.pid); > > > > > + > > > > > +       return 0; > > > > > +} > > > > > +char _license[] SEC("license") = "GPL"; > > > > > +u32 _version SEC("version") = LINUX_VERSION_CODE; > > > > > diff --git a/tools/include/uapi/linux/bpf.h > > b/tools/include/uapi/linux/bpf.h > > > > > index 4393bd4b2419..b0d4869fb860 100644 > > > > > --- a/tools/include/uapi/linux/bpf.h > > > > > +++ b/tools/include/uapi/linux/bpf.h > > > > > @@ -2741,6 +2741,24 @@ union bpf_attr { > > > > >   *             **-EOPNOTSUPP** kernel configuration does not enable > > SYN cookies > > > > >   * > > > > >   *             **-EPROTONOSUPPORT** IP packet version is not 4 or 6 > > > > > + * > > > > > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 > > size_of_pidns) > > > > > + *     Description > > > > > + *             Copies into *pidns* pid, namespace id and tgid as > > seen by the > > > > > + *             current namespace and also device from /proc/self/ns/pid. > > > > > + *             *size_of_pidns* must be the size of *pidns* > > > > > + * > > > > > + *             This helper is used when pid filtering is needed inside a > > > > > + *             container as bpf_get_current_tgid() helper returns > > always the > > > > > + *             pid id as seen by the root namespace. > > > > > + *     Return > > > > > + *             0 on success > > > > > + * > > > > > + *             **-EINVAL** if *size_of_pidns* is not valid or unable > > to get ns, pid > > > > > + *             or tgid of the current task. > > > > > + * > > > > > + *             **-ENOMEM**  if allocation fails. > > > > > + * > > > > >   */ > > > > >  #define __BPF_FUNC_MAPPER(FN)          \ > > > > >         FN(unspec),                     \ > > > > > @@ -2853,7 +2871,8 @@ union bpf_attr { > > > > >         FN(sk_storage_get),             \ > > > > >         FN(sk_storage_delete),          \ > > > > >         FN(send_signal),                \ > > > > > -       FN(tcp_gen_syncookie), > > > > > +       FN(tcp_gen_syncookie),          \ > > > > > +       FN(get_current_pidns_info), > > > > > > > > > >  /* integer value in 'imm' field of BPF_CALL instruction selects > > which helper > > > > >   * function eBPF program intends to call > > > > > @@ -3604,4 +3623,10 @@ struct bpf_sockopt { > > > > >         __s32   retval; > > > > >  }; > > > > > > > > > > +struct bpf_pidns_info { > > > > > +       __u32 dev; > > > > > +       __u32 nsid; > > > > > +       __u32 tgid; > > > > > +       __u32 pid; > > > > > +}; > > > > >  #endif /* _UAPI__LINUX_BPF_H__ */ > > > > > diff --git a/tools/testing/selftests/bpf/Makefile > > b/tools/testing/selftests/bpf/Makefile > > > > > index 3bd0f4a0336a..1f97b571b581 100644 > > > > > --- a/tools/testing/selftests/bpf/Makefile > > > > > +++ b/tools/testing/selftests/bpf/Makefile > > > > > @@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps > > test_lru_map test_lpm_map test > > > > >         test_cgroup_storage test_select_reuseport test_section_names \ > > > > >         test_netcnt test_tcpnotify_user test_sock_fields test_sysctl > > test_hashmap \ > > > > >         test_btf_dump test_cgroup_attach xdping test_sockopt > > test_sockopt_sk \ > > > > > -       test_sockopt_multi test_tcp_rtt > > > > > +       test_sockopt_multi test_tcp_rtt test_pidns > > > > > > > > > >  BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c))) > > > > >  TEST_GEN_FILES = $(BPF_OBJ_FILES) > > > > > diff --git a/tools/testing/selftests/bpf/bpf_helpers.h > > b/tools/testing/selftests/bpf/bpf_helpers.h > > > > > index 120aa86c58d3..c96795a9d983 100644 > > > > > --- a/tools/testing/selftests/bpf/bpf_helpers.h > > > > > +++ b/tools/testing/selftests/bpf/bpf_helpers.h > > > > > @@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = > > (void *)BPF_FUNC_send_signal; > > > > >  static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip, > > > > >                                           int ip_len, void *tcp, int > > tcp_len) = > > > > >         (void *) BPF_FUNC_tcp_gen_syncookie; > > > > > +static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info *buf, > > > > > +                                        unsigned int buf_size) = > > > > > +       (void *) BPF_FUNC_get_current_pidns_info; > > > > > > > > > >  /* llvm builtin functions that eBPF C program may use to > > > > >   * emit BPF_LD_ABS and BPF_LD_IND instructions > > > > > diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c > > b/tools/testing/selftests/bpf/progs/test_pidns_kern.c > > > > > new file mode 100644 > > > > > index 000000000000..e1d2facfa762 > > > > > --- /dev/null > > > > > +++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c > > > > > @@ -0,0 +1,51 @@ > > > > > +// SPDX-License-Identifier: GPL-2.0 > > > > > +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com > > > > > + * > > > > > + * This program is free software; you can redistribute it and/or > > > > > + * modify it under the terms of version 2 of the GNU General Public > > > > > + * License as published by the Free Software Foundation. > > > > > + */ > > > > > + > > > > > +#include > > > > > +#include > > > > > +#include "bpf_helpers.h" > > > > > + > > > > > +struct bpf_map_def SEC("maps") nsidmap = { > > > > > +       .type = BPF_MAP_TYPE_ARRAY, > > > > > +       .key_size = sizeof(__u32), > > > > > +       .value_size = sizeof(__u32), > > > > > +       .max_entries = 1, > > > > > +}; > > > > > + > > > > > +struct bpf_map_def SEC("maps") pidmap = { > > > > > +       .type = BPF_MAP_TYPE_ARRAY, > > > > > +       .key_size = sizeof(__u32), > > > > > +       .value_size = sizeof(__u32), > > > > > +       .max_entries = 1, > > > > > +}; > > > > > + > > > > > +SEC("tracepoint/syscalls/sys_enter_nanosleep") > > > > > +int trace(void *ctx) > > > > > +{ > > > > > +       struct bpf_pidns_info nsinfo; > > > > > +       __u32 key = 0, *expected_pid, *val; > > > > > +       char fmt[] = "ERROR nspid:%d\n"; > > > > > + > > > > > +       if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo))) > > > > > +               return -EINVAL; > > > > > + > > > > > +       expected_pid = bpf_map_lookup_elem(&pidmap, &key); > > > > > + > > > > > + > > > > > +       if (!expected_pid || *expected_pid != nsinfo.pid) > > > > > +               return 0; > > > > > + > > > > > +       val = bpf_map_lookup_elem(&nsidmap, &key); > > > > > +       if (val) > > > > > +               *val = nsinfo.nsid; > > > > > + > > > > > +       return 0; > > > > > +} > > > > > + > > > > > +char _license[] SEC("license") = "GPL"; > > > > > +__u32 _version SEC("version") = 1; > > > > > diff --git a/tools/testing/selftests/bpf/test_pidns.c > > b/tools/testing/selftests/bpf/test_pidns.c > > > > > new file mode 100644 > > > > > index 000000000000..a7254055f294 > > > > > --- /dev/null > > > > > +++ b/tools/testing/selftests/bpf/test_pidns.c > > > > > @@ -0,0 +1,138 @@ > > > > > +// SPDX-License-Identifier: GPL-2.0 > > > > > +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com > > > > > + * > > > > > + * This program is free software; you can redistribute it and/or > > > > > + * modify it under the terms of version 2 of the GNU General Public > > > > > + * License as published by the Free Software Foundation. > > > > > + */ > > > > > + > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > +#include > > > > > + > > > > > +#include > > > > > +#include > > > > > +#include > > > > > + > > > > > +#include "cgroup_helpers.h" > > > > > +#include "bpf_rlimit.h" > > > > > + > > > > > +#define CHECK(condition, tag, format...) ({            \ > > > > > +       int __ret = !!(condition);                      \ > > > > > +       if (__ret) {                                    \ > > > > > +               printf("%s:FAIL:%s ", __func__, tag);   \ > > > > > +               printf(format);                         \ > > > > > +       } else {                                        \ > > > > > +               printf("%s:PASS:%s\n", __func__, tag);  \ > > > > > +       }                                               \ > > > > > +       __ret;                                          \ > > > > > +}) > > > > > + > > > > > +static int bpf_find_map(const char *test, struct bpf_object *obj, > > > > > +                       const char *name) > > > > > +{ > > > > > +       struct bpf_map *map; > > > > > + > > > > > +       map = bpf_object__find_map_by_name(obj, name); > > > > > +       if (!map) > > > > > +               return -1; > > > > > +       return bpf_map__fd(map); > > > > > +} > > > > > + > > > > > + > > > > > +int main(int argc, char **argv) > > > > > +{ > > > > > +       const char *probe_name = "syscalls/sys_enter_nanosleep"; > > > > > +       const char *file = "test_pidns_kern.o"; > > > > > +       int err, bytes, efd, prog_fd, pmu_fd; > > > > > +       int pidmap_fd, nsidmap_fd; > > > > > +       struct perf_event_attr attr = {}; > > > > > +       struct bpf_object *obj; > > > > > +       __u32 knsid = 0; > > > > > +       __u32 key = 0, pid; > > > > > +       int exit_code = 1; > > > > > +       struct stat st; > > > > > +       char buf[256]; > > > > > + > > > > > +       err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, > > &prog_fd); > > > > > +       if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno)) > > > > > +               goto cleanup_cgroup_env; > > > > > + > > > > > +       nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap"); > > > > > +       if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n", > > > > > +                 nsidmap_fd, errno)) > > > > > +               goto close_prog; > > > > > + > > > > > +       pidmap_fd = bpf_find_map(__func__, obj, "pidmap"); > > > > > +       if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n", > > > > > +                 pidmap_fd, errno)) > > > > > +               goto close_prog; > > > > > + > > > > > +       pid = getpid(); > > > > > +       bpf_map_update_elem(pidmap_fd, &key, &pid, 0); > > > > > + > > > > > +       snprintf(buf, sizeof(buf), > > > > > +                "/sys/kernel/debug/tracing/events/%s/id", probe_name); > > > > > +       efd = open(buf, O_RDONLY, 0); > > > > > +       if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno)) > > > > > +               goto close_prog; > > > > > +       bytes = read(efd, buf, sizeof(buf)); > > > > > +       close(efd); > > > > > +       if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read", > > > > > +                 "bytes %d errno %d\n", bytes, errno)) > > > > > +               goto close_prog; > > > > > + > > > > > +       attr.config = strtol(buf, NULL, 0); > > > > > +       attr.type = PERF_TYPE_TRACEPOINT; > > > > > +       attr.sample_type = PERF_SAMPLE_RAW; > > > > > +       attr.sample_period = 1; > > > > > +       attr.wakeup_events = 1; > > > > > + > > > > > +       pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, > > -1, 0); > > > > > +       if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", > > pmu_fd, > > > > > +                 errno)) > > > > > +               goto close_prog; > > > > > + > > > > > +       err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0); > > > > > +       if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err, > > > > > +                 errno)) > > > > > +               goto close_pmu; > > > > > + > > > > > +       err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); > > > > > +       if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", > > err, > > > > > +                 errno)) > > > > > +               goto close_pmu; > > > > > + > > > > > +       /* trigger some syscalls */ > > > > > +       sleep(1); > > > > > + > > > > > +       err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid); > > > > > +       if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", > > err, errno)) > > > > > +               goto close_pmu; > > > > > + > > > > > +       if (stat("/proc/self/ns/pid", &st)) > > > > > +               goto close_pmu; > > > > > + > > > > > +       if (CHECK(knsid != (__u32) st.st_ino, "compare_namespace_id", > > > > > +                 "kern knsid %u user unsid %u\n", knsid, (__u32) > > st.st_ino)) > > > > > +               goto close_pmu; > > > > > + > > > > > +       exit_code = 0; > > > > > +       printf("%s:PASS\n", argv[0]); > > > > > + > > > > > +close_pmu: > > > > > +       close(pmu_fd); > > > > > +close_prog: > > > > > +       bpf_object__close(obj); > > > > > +cleanup_cgroup_env: > > > > > +       return exit_code; > > > > > +} > > > > > -- > > > > > 2.11.0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Aug 08, 2019 at 05:09:51AM +0000, Yonghong Song wrote: > > > > > > > > > > > > > > > > > > On 8/7/19 6:22 PM, Carlos Antonio Neira Bustos wrote: > > > > > > > The code has been modified to avoid syscalls that could sleep. > > > > > > > Please let me know if any other modification is needed. > > > > > > > > > > > > > >  From be0384c0fa209a78c1567936e8db4e35b9a7c0f8 Mon Sep 17 > > 00:00:00 2001 > > > > > > > From: Carlos > > > > > > > Date: Wed, 7 Aug 2019 20:04:30 -0400 > > > > > > > Subject: [PATCH] [PATCH v5 bpf-next] BPF: New helper to obtain > > namespace data > > > > > > >   from current task > > > > > > > > > > > > > > This helper obtains the active namespace from current and returns > > pid, tgid, > > > > > > > device and namespace id as seen from that namespace, allowing to > > instrument > > > > > > > a process inside a container. > > > > > > > Device is read from /proc/self/ns/pid, as in the future it's > > possible that > > > > > > > different pid_ns files may belong to different devices, according > > > > > > > to the discussion between Eric Biederman and Yonghong in 2017 > > linux plumbers > > > > > > > conference. > > > > > > > Currently bpf_get_current_pid_tgid(), is used to do pid filtering > > in bcc's > > > > > > > scripts but this helper returns the pid as seen by the root > > namespace which is > > > > > > > fine when a bcc script is not executed inside a container. > > > > > > > When the process of interest is inside a container, pid filtering > > will not work > > > > > > > if bpf_get_current_pid_tgid() is used. This helper addresses this > > limitation > > > > > > > returning the pid as it's seen by the current namespace where the > > script is > > > > > > > executing. > > > > > > > > > > > > > > This helper has the same use cases as bpf_get_current_pid_tgid() > > as it can be > > > > > > > used to do pid filtering even inside a container. > > > > > > > > > > > > > > For example a bcc script using bpf_get_current_pid_tgid() > > (tools/funccount.py): > > > > > > > > > > > > > >          u32 pid = bpf_get_current_pid_tgid() >> 32; > > > > > > >          if (pid != ) > > > > > > >                  return 0; > > > > > > > Could be modified to use bpf_get_current_pidns_info() as follows: > > > > > > > > > > > > > >          struct bpf_pidns pidns; > > > > > > >          bpf_get_current_pidns_info(&pidns, sizeof(struct > > bpf_pidns)); > > > > > > >          u32 pid = pidns.tgid; > > > > > > >          u32 nsid = pidns.nsid; > > > > > > >          if ((pid != ) && (nsid != > > )) > > > > > > >                  return 0; > > > > > > > > > > > > > > To find out the name PID namespace id of a process, you could use > > this command: > > > > > > > > > > > > > > $ ps -h -o pidns -p > > > > > > > > > > > > > > Or this other command: > > > > > > > > > > > > > > $ ls -Li /proc//ns/pid > > > > > > > > > > > > > > Signed-off-by: Carlos Neira > > > > > > > --- > > > > > > >   fs/namei.c                                         |   2 +- > > > > > > >   include/linux/bpf.h                                |   1 + > > > > > > >   include/linux/namei.h                              |   4 + > > > > > > >   include/uapi/linux/bpf.h                           |  29 ++++- > > > > > > >   kernel/bpf/core.c                                  |   1 + > > > > > > >   kernel/bpf/helpers.c                               |  78 > > ++++++++++++ > > > > > > >   kernel/trace/bpf_trace.c                           |   2 + > > > > > > >   samples/bpf/Makefile                               |   3 + > > > > > > >   samples/bpf/trace_ns_info_user.c                   |  35 ++++++ > > > > > > >   samples/bpf/trace_ns_info_user_kern.c              |  44 +++++++ > > > > > > >   tools/include/uapi/linux/bpf.h                     |  29 ++++- > > > > > > >   tools/testing/selftests/bpf/Makefile               |   2 +- > > > > > > >   tools/testing/selftests/bpf/bpf_helpers.h          |   3 + > > > > > > >   .../testing/selftests/bpf/progs/test_pidns_kern.c  |  51 ++++++++ > > > > > > >   tools/testing/selftests/bpf/test_pidns.c           | 138 > > +++++++++++++++++++++ > > > > > > >   15 files changed, 418 insertions(+), 4 deletions(-) > > > > > > >   create mode 100644 samples/bpf/trace_ns_info_user.c > > > > > > >   create mode 100644 samples/bpf/trace_ns_info_user_kern.c > > > > > > >   create mode 100644 > > tools/testing/selftests/bpf/progs/test_pidns_kern.c > > > > > > >   create mode 100644 tools/testing/selftests/bpf/test_pidns.c > > > > > > > > > > > > > > diff --git a/fs/namei.c b/fs/namei.c > > > > > > > index 209c51a5226c..d1eca36972d2 100644 > > > > > > > --- a/fs/namei.c > > > > > > > +++ b/fs/namei.c > > > > > > > @@ -19,7 +19,6 @@ > > > > > > >   #include > > > > > > >   #include > > > > > > >   #include > > > > > > > -#include > > > > > > >   #include > > > > > > >   #include > > > > > > >   #include > > > > > > > @@ -2355,6 +2354,7 @@ int filename_lookup(int dfd, struct > > filename *name, unsigned flags, > > > > > > >     putname(name); > > > > > > >     return retval; > > > > > > >   } > > > > > > > +EXPORT_SYMBOL(filename_lookup); > > > > > > > > > > > > No need to export symbols. bpf uses it and bpf is in the core, not in > > > > > > modules. > > > > > > > > > > > > > > > > > > > >   /* Returns 0 and nd will be valid on success; Retuns error, > > otherwise. */ > > > > > > >   static int path_parentat(struct nameidata *nd, unsigned flags, > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > > > > > index f9a506147c8a..e4adf5e05afd 100644 > > > > > > > --- a/include/linux/bpf.h > > > > > > > +++ b/include/linux/bpf.h > > > > > > > @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto > > bpf_get_local_storage_proto; > > > > > > >   extern const struct bpf_func_proto bpf_strtol_proto; > > > > > > >   extern const struct bpf_func_proto bpf_strtoul_proto; > > > > > > >   extern const struct bpf_func_proto bpf_tcp_sock_proto; > > > > > > > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto; > > > > > > > > > > > > > >   /* Shared helpers among cBPF and eBPF. */ > > > > > > >   void bpf_user_rnd_init_once(void); > > > > > > > diff --git a/include/linux/namei.h b/include/linux/namei.h > > > > > > > index 9138b4471dbf..2c24e8c71d46 100644 > > > > > > > --- a/include/linux/namei.h > > > > > > > +++ b/include/linux/namei.h > > > > > > > @@ -6,6 +6,7 @@ > > > > > > >   #include > > > > > > >   #include > > > > > > >   #include > > > > > > > +#include > > > > > > > > > > > > > >   enum { MAX_NESTED_LINKS = 8 }; > > > > > > > > > > > > > > @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, > > struct dentry *); > > > > > > > > > > > > > >   extern void nd_jump_link(struct path *path); > > > > > > > > > > > > > > +extern int filename_lookup(int dfd, struct filename *name, > > unsigned int flags, > > > > > > > +               struct path *path, struct path *root); > > > > > > > > > > > > The previous definition in fs/internal.h should be removed. > > > > > > > > > > > > > + > > > > > > >   static inline void nd_terminate_link(void *name, size_t len, > > size_t maxlen) > > > > > > >   { > > > > > > >     ((char *) name)[min(len, maxlen)] = '\0'; > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > > > > index 4393bd4b2419..6f601f7106e2 100644 > > > > > > > --- a/include/uapi/linux/bpf.h > > > > > > > +++ b/include/uapi/linux/bpf.h > > > > > > > @@ -2741,6 +2741,26 @@ union bpf_attr { > > > > > > >    *                **-EOPNOTSUPP** kernel configuration does not > > enable SYN cookies > > > > > > >    * > > > > > > >    *                **-EPROTONOSUPPORT** IP packet version is not > > 4 or 6 > > > > > > > + * > > > > > > > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, > > u32 size_of_pidns) > > > > > > > + * Description > > > > > > > + *         Copies into *pidns* pid, namespace id and tgid as > > seen by the > > > > > > > + *         current namespace and also device from /proc/self/ns/pid. > > > > > > > + *         *size_of_pidns* must be the size of *pidns* > > > > > > > + * > > > > > > > + *         This helper is used when pid filtering is needed inside a > > > > > > > + *         container as bpf_get_current_tgid() helper returns > > always the > > > > > > > + *         pid id as seen by the root namespace. > > > > > > > + * Return > > > > > > > + *         0 on success > > > > > > > + * > > > > > > > + *         **-EINVAL**  if unable to get ns, pid or tgid of > > current task. > > > > > > > + *         Or if size_of_pidns is not valid. > > > > > > > > > > > > Maybe reword by following the code sequence. > > > > > >     if *size_of_pidns* is not valid or unable to get ns, pid or tgid of > > > > > >     the current task. > > > > > > > > > > > > > + * > > > > > > > + *         **-ENOMEM**  if allocation fails. > > > > > > > > > > > > Maybe some other error codes in filename_lookup() function? > > > > > > > > > > > > > + * > > > > > > > + *         If unable to get the inode from /proc/self/ns/pid an > > error code > > > > > > > + *         will be returned. > > > > > > > > > > > > You do not need this. The description of error code cases should > > cover this. > > > > > > > > > > > > >    */ > > > > > > >   #define __BPF_FUNC_MAPPER(FN)             \ > > > > > > >     FN(unspec),                     \ > > > > > > > @@ -2853,7 +2873,8 @@ union bpf_attr { > > > > > > >     FN(sk_storage_get),             \ > > > > > > >     FN(sk_storage_delete),          \ > > > > > > >     FN(send_signal),                \ > > > > > > > -   FN(tcp_gen_syncookie), > > > > > > > +   FN(tcp_gen_syncookie),          \ > > > > > > > +   FN(get_current_pidns_info), > > > > > > > > > > > > > >   /* integer value in 'imm' field of BPF_CALL instruction selects > > which helper > > > > > > >    * function eBPF program intends to call > > > > > > > @@ -3604,4 +3625,10 @@ struct bpf_sockopt { > > > > > > >     __s32   retval; > > > > > > >   }; > > > > > > > > > > > > > > +struct bpf_pidns_info { > > > > > > > +   __u32 dev; > > > > > > > +   __u32 nsid; > > > > > > > +   __u32 tgid; > > > > > > > +   __u32 pid; > > > > > > > +}; > > > > > > >   #endif /* _UAPI__LINUX_BPF_H__ */ > > > > > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > > > > > > index 8191a7db2777..3159f2a0188c 100644 > > > > > > > --- a/kernel/bpf/core.c > > > > > > > +++ b/kernel/bpf/core.c > > > > > > > @@ -2038,6 +2038,7 @@ const struct bpf_func_proto > > bpf_get_current_uid_gid_proto __weak; > > > > > > >   const struct bpf_func_proto bpf_get_current_comm_proto __weak; > > > > > > >   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak; > > > > > > >   const struct bpf_func_proto bpf_get_local_storage_proto __weak; > > > > > > > +const struct bpf_func_proto bpf_get_current_pidns_info __weak; > > > > > > > > > > > > > >   const struct bpf_func_proto * __weak > > bpf_get_trace_printk_proto(void) > > > > > > >   { > > > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > > > > > > index 5e28718928ca..571f24077db2 100644 > > > > > > > --- a/kernel/bpf/helpers.c > > > > > > > +++ b/kernel/bpf/helpers.c > > > > > > > @@ -11,6 +11,12 @@ > > > > > > >   #include > > > > > > >   #include > > > > > > >   #include > > > > > > > +#include > > > > > > > +#include > > > > > > > +#include > > > > > > > +#include > > > > > > > +#include > > > > > > > + > > > > > > > > > > > > > >   #include "../../lib/kstrtox.h" > > > > > > > > > > > > > > @@ -312,6 +318,78 @@ void copy_map_value_locked(struct bpf_map > > *map, void *dst, void *src, > > > > > > >     preempt_enable(); > > > > > > >   } > > > > > > > > > > > > > > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, > > pidns_info, u32, > > > > > > > +    size) > > > > > > > +{ > > > > > > > +   const char *name = "/proc/self/ns/pid"; > > > > > > > > > > > > maybe rename this variable to pidns_path? > > > > > > > > > > > > > +   struct pid_namespace *pidns = NULL; > > > > > > > +   struct filename *tmp = NULL; > > > > > > > > > > > > Maybe rename this variable to name? > > > > > > > > > > > > > +   int len = strlen(name) + 1; > > > > > > > > > > > > We can delay this assignment later until it is needed. > > > > > > > > > > > > > +   struct inode *inode; > > > > > > > +   struct path kp; > > > > > > > +   pid_t tgid = 0; > > > > > > > +   pid_t pid = 0; > > > > > > > +   int ret; > > > > > > > + > > > > > > > +   if (unlikely(size != sizeof(struct bpf_pidns_info))) > > > > > > > +           return -EINVAL; > > > > > > > + > > > > > > > +   pidns = task_active_pid_ns(current); > > > > > > > + > > > > > > > > > > > > we can save an empty line here. > > > > > > > > > > > > > +   if (unlikely(!pidns)) > > > > > > > +           goto clear; > > > > > > > + > > > > > > > +   pidns_info->nsid =  pidns->ns.inum; > > > > > > > +   pid = task_pid_nr_ns(current, pidns); > > > > > > > + > > > > > > > > > > > > We can save an empty line here. > > > > > > > > > > > > > +   if (unlikely(!pid)) > > > > > > > +           goto clear; > > > > > > > + > > > > > > > +   tgid = task_tgid_nr_ns(current, pidns); > > > > > > > + > > > > > > ditto. save an empty line. > > > > > > > +   if (unlikely(!tgid)) > > > > > > > +           goto clear; > > > > > > > + > > > > > > > +   pidns_info->tgid = (u32) tgid; > > > > > > > +   pidns_info->pid = (u32) pid; > > > > > > > + > > > > > > > +   tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC); > > > > > > > +   if (unlikely(!tmp)) { > > > > > > > +           memset((void *)pidns_info, 0, (size_t) size); > > > > > > > +           return -ENOMEM; > > > > > > > +   } > > > > > > > + > > > > > > > +   memcpy((char *)tmp->name, name, len); > > > > > > > +   tmp->uptr = NULL; > > > > > > > +   tmp->aname = NULL; > > > > > > > +   tmp->refcnt = 1; > > > > > > > + > > > > > > ditto. save an empty line. > > > > > > > +   ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL); > > > > > > > + > > > > > > ditto. save an empty line. > > > > > > > +   if (ret) { > > > > > > > +           memset((void *)pidns_info, 0, (size_t) size); > > > > > > > +           return ret; > > > > > > > +   } > > > > > > > + > > > > > > > +   inode = d_backing_inode(kp.dentry); > > > > > > > +   pidns_info->dev = inode->i_sb->s_dev; > > > > > > > + > > > > > > > +   return 0; > > > > > > > + > > > > > > > +clear: > > > > > > > +   memset((void *)pidns_info, 0, (size_t) size); > > > > > > > + > > > > > > save an empty line. > > > > > > > +   return -EINVAL; > > > > > > > +} > > > > > > > + > > > > > > > +const struct bpf_func_proto bpf_get_current_pidns_info_proto = { > > > > > > > +   .func   = bpf_get_current_pidns_info, > > > > > > make the "= " aligned with others? > > > > > > > +   .gpl_only       = false, > > > > > > > +   .ret_type       = RET_INTEGER, > > > > > > > +   .arg1_type      = ARG_PTR_TO_UNINIT_MEM, > > > > > > > +   .arg2_type      = ARG_CONST_SIZE, > > > > > > > +}; > > > > > > > + > > > > > > >   #ifdef CONFIG_CGROUPS > > > > > > >   BPF_CALL_0(bpf_get_current_cgroup_id) > > > > > > >   { > > > > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > > > > > > index ca1255d14576..5e1dc22765a5 100644 > > > > > > > --- a/kernel/trace/bpf_trace.c > > > > > > > +++ b/kernel/trace/bpf_trace.c > > > > > > > @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, > > const struct bpf_prog *prog) > > > > > > >   #endif > > > > > > >     case BPF_FUNC_send_signal: > > > > > > >             return &bpf_send_signal_proto; > > > > > > > +   case BPF_FUNC_get_current_pidns_info: > > > > > > > +           return &bpf_get_current_pidns_info_proto; > > > > > > >     default: > > > > > > >             return NULL; > > > > > > >     } > > > > > > > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile > > > > > > > index 1d9be26b4edd..238453ff27d2 100644 > > > > > > > --- a/samples/bpf/Makefile > > > > > > > +++ b/samples/bpf/Makefile > > > > > > > @@ -53,6 +53,7 @@ hostprogs-y += task_fd_query > > > > > > >   hostprogs-y += xdp_sample_pkts > > > > > > >   hostprogs-y += ibumad > > > > > > >   hostprogs-y += hbm > > > > > > > +hostprogs-y += trace_ns_info > > > > > > [...] > >