BPF List
 help / color / mirror / Atom feed
From: Yonghong Song <yhs@fb.com>
To: Daniel Xu <dxu@dxuuu.xyz>, <bpf@vger.kernel.org>
Cc: <cneirabustos@gmail.com>, <ebiederm@xmission.com>, <blez@fb.com>
Subject: Re: Extending bpf_get_ns_current_pid_tgid()
Date: Thu, 12 Nov 2020 16:27:02 -0800	[thread overview]
Message-ID: <3fb902d7-a88c-f9ed-03f0-460f7bd552fa@fb.com> (raw)
In-Reply-To: <C71MVCBQMCPF.1CCKLFRTGYD0D@maharaja>



On 11/12/20 2:20 PM, Daniel Xu wrote:
> Hi,
> 
> I'm looking at the current implementation of
> bpf_get_ns_current_pid_tgid() and the helper seems to be a bit overly
> restricting to me. Specifically the following line:
> 
>      if (!ns_match(&pidns->ns, (dev_t)dev, ino))
>              goto clear;
> 
> Why bail if the inode # does not match? IIUC from the old discussions,
> it was b/c in the future pidns files might belong to different devices.
> It's not clear to me (possibly b/c I'm missing something) why the inode
> has to match as well.

Yes, pidns file might belong to different devices in theory so we need
to match dev as well.

The inode number needs to match so we can ensure user indeed wants to
get the *current pidns* tgid/pid.

(dev, ino) input expressed user intention. Without this, in no-process
context, it will be hard to interpret the results.

> 
> Would it be possible to instead have the helper return the pid/tgid of
> the current task as viewed _from_ the `dev`/`ino` pidns? If the current
> task is hidden from the `dev`/`ino` pidns, then return -ENOENT. The use
> case is for bpftrace symbolize stacks when run inside a container. For
> example:
> 
>      (in-container)# bpftrace -e 'profile:hz:99 { print(ustack) }'

I think you try to propose something like below:
   - user provides dev/ino
   - the helper will try to go through all pidns'es (not just active
     one), if any match pidns match, returns tgid/pid in that pidns,
     otherwise, returns -ENOENT.

The current helper is
    bpf_get_ns_current_pid_tgid
you want
    bpf_get_ns_pid_tgid

I think it is possible, you need to check
    pid->numbers[pid_level].ns
for all pid levels. You need to get a reference count for the namespace
to ensure valid result.

This may work for root inode, but for container inode, it may have 
issues. For example,
   container 1: create, inode 2
   container 1 removed
   container 2: create, inode 2
If you use inode 2, depending on timing you may accidentally targetting
wrong container.

I think you can workaround the issue without this helper. See below.

> 
> This currently does not work b/c bpftrace will generate a prog that gets
> the root pidns pid, pack it with the stackid, and pass it up to
> userspace. But b/c bpftrace is running inside the container, the root
> pidns pid is invalid and symbolization fails.

bpftrace can generate a program takes dev/inode as input parameters in
map. The bpftrace will supply dev/inode value, by query the current 
system/container, and then run the program.

> 
> What would be nice is if bpftrace could generate a prog that gets the
> current pid as viewed from bpftrace's pidns. Then symbolization would
> work.

Despite the above workaround, what you really need is although it is 
running on container, you want to get stack trace interpreted with
root pid/tgid for symbolization purpose? But you can already achieve
this with bpf_get_pid_tgid()?

> 
> Thanks,
> Daniel
> 

  reply	other threads:[~2020-11-13  0:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-12 22:20 Extending bpf_get_ns_current_pid_tgid() Daniel Xu
2020-11-13  0:27 ` Yonghong Song [this message]
2020-11-13  0:57   ` Daniel Xu
2020-11-13  4:13     ` Yonghong Song
2020-11-13 12:04       ` Blaise Sanouillet
2020-11-13 16:57         ` Yonghong Song
     [not found]         ` <CACiB22i6d2skkJJa7uwVRrYy7dtYOxmLgFwzjtieW4BFn2tzLw@mail.gmail.com>
2020-11-13 16:59           ` Yonghong Song
     [not found]             ` <CACiB22iU3zk4Row=wAween=rSvHJ7j7M5T2KbyFk38arzEwQpQ@mail.gmail.com>
2021-06-16  9:54               ` Blaise Sanouillet
2021-06-16 17:02               ` Yonghong Song
2021-06-16 21:44                 ` Carlos Neira
2021-06-17  2:49                   ` Yonghong Song
2021-06-17 10:59                     ` Blaise Sanouillet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3fb902d7-a88c-f9ed-03f0-460f7bd552fa@fb.com \
    --to=yhs@fb.com \
    --cc=blez@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=cneirabustos@gmail.com \
    --cc=dxu@dxuuu.xyz \
    --cc=ebiederm@xmission.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox