linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH bpf-next v10 1/2] bpf: add new helper get_file_path for mapping a file descriptor to a pathname
       [not found] ` <e8b1281b7405eb4b6c1f094169e6efd2c8cc95da.1574162990.git.ethercflow@gmail.com>
@ 2019-11-23  3:18   ` Alexei Starovoitov
  2019-11-23  4:43     ` Al Viro
  2019-11-23  4:51     ` Al Viro
  0 siblings, 2 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2019-11-23  3:18 UTC (permalink / raw)
  To: Wenbo Zhang
  Cc: bpf, ast, daniel, yhs, andrii.nakryiko, netdev, viro,
	linux-fsdevel

On Tue, Nov 19, 2019 at 08:27:37AM -0500, Wenbo Zhang wrote:
> When people want to identify which file system files are being opened,
> read, and written to, they can use this helper with file descriptor as
> input to achieve this goal. Other pseudo filesystems are also supported.
> 
> This requirement is mainly discussed here:
> 
>   https://github.com/iovisor/bcc/issues/237
> 
> v9->v10: addressed Andrii's feedback
> - send this patch together with the patch selftests as one patch series
> 
> v8->v9:
> - format helper description
> 
> v7->v8: addressed Alexei's feedback
> - use fget_raw instead of fdget_raw, as fdget_raw is only used inside fs/
> - ensure we're in user context which is safe fot the help to run
> - filter unmountable pseudo filesystem, because they don't have real path
> - supplement the description of this helper function
> 
> v6->v7:
> - fix missing signed-off-by line
> 
> v5->v6: addressed Andrii's feedback
> - avoid unnecessary goto end by having two explicit returns
> 
> v4->v5: addressed Andrii and Daniel's feedback
> - rename bpf_fd2path to bpf_get_file_path to be consistent with other
> helper's names
> - when fdget_raw fails, set ret to -EBADF instead of -EINVAL
> - remove fdput from fdget_raw's error path
> - use IS_ERR instead of IS_ERR_OR_NULL as d_path ether returns a pointer
> into the buffer or an error code if the path was too long
> - modify the normal path's return value to return copied string length
> including NUL
> - update this helper description's Return bits.
> 
> v3->v4: addressed Daniel's feedback
> - fix missing fdput()
> - move fd2path from kernel/bpf/trace.c to kernel/trace/bpf_trace.c
> - move fd2path's test code to another patch
> - add comment to explain why use fdget_raw instead of fdget
> 
> v2->v3: addressed Yonghong's feedback
> - remove unnecessary LOCKDOWN_BPF_READ
> - refactor error handling section for enhanced readability
> - provide a test case in tools/testing/selftests/bpf
> 
> v1->v2: addressed Daniel's feedback
> - fix backward compatibility
> - add this helper description
> - fix signed-off name
> 
> Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
...
> +BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
> +{
> +	struct file *f;
> +	char *p;
> +	int ret = -EBADF;
> +
> +	/* Ensure we're in user context which is safe for the helper to
> +	 * run. This helper has no business in a kthread.
> +	 */
> +	if (unlikely(in_interrupt() ||
> +		     current->flags & (PF_KTHREAD | PF_EXITING)))
> +		return -EPERM;
> +
> +	/* Use fget_raw instead of fget to support O_PATH, and it doesn't
> +	 * have any sleepable code, so it's ok to be here.
> +	 */
> +	f = fget_raw(fd);
> +	if (!f)
> +		goto error;
> +
> +	/* For unmountable pseudo filesystem, it seems to have no meaning
> +	 * to get their fake paths as they don't have path, and to be no
> +	 * way to validate this function pointer can be always safe to call
> +	 * in the current context.
> +	 */
> +	if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname)
> +		return -EINVAL;
> +
> +	/* After filter unmountable pseudo filesytem, d_path won't call
> +	 * dentry->d_op->d_name(), the normally path doesn't have any
> +	 * sleepable code, and despite it uses the current macro to get
> +	 * fs_struct (current->fs), we've already ensured we're in user
> +	 * context, so it's ok to be here.
> +	 */
> +	p = d_path(&f->f_path, dst, size);
> +	if (IS_ERR(p)) {
> +		ret = PTR_ERR(p);
> +		fput(f);
> +		goto error;
> +	}
> +
> +	ret = strlen(p);
> +	memmove(dst, p, ret);
> +	dst[ret++] = '\0';
> +	fput(f);
> +	return ret;
> +
> +error:
> +	memset(dst, '0', size);
> +	return ret;
> +}

Al,

could you please review about code whether it's doing enough checks to be
called safely from preempt_disabled region?

It's been under review for many weeks and looks good from bpf pov. Essentially
tracing folks need easy way to convert FD to full path name. This feature
request first came in 2015.

Thanks!


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf-next v10 1/2] bpf: add new helper get_file_path for mapping a file descriptor to a pathname
  2019-11-23  3:18   ` [PATCH bpf-next v10 1/2] bpf: add new helper get_file_path for mapping a file descriptor to a pathname Alexei Starovoitov
@ 2019-11-23  4:43     ` Al Viro
  2019-11-23  4:51     ` Al Viro
  1 sibling, 0 replies; 7+ messages in thread
From: Al Viro @ 2019-11-23  4:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Wenbo Zhang, bpf, ast, daniel, yhs, andrii.nakryiko, netdev,
	linux-fsdevel

On Fri, Nov 22, 2019 at 07:18:28PM -0800, Alexei Starovoitov wrote:
> > +	/* After filter unmountable pseudo filesytem, d_path won't call
> > +	 * dentry->d_op->d_name(), the normally path doesn't have any
> > +	 * sleepable code, and despite it uses the current macro to get
> > +	 * fs_struct (current->fs), we've already ensured we're in user
> > +	 * context, so it's ok to be here.
> > +	 */
> > +	p = d_path(&f->f_path, dst, size);
> > +	if (IS_ERR(p)) {
> > +		ret = PTR_ERR(p);
> > +		fput(f);
> > +		goto error;
> > +	}
> > +
> > +	ret = strlen(p);
> > +	memmove(dst, p, ret);
> > +	dst[ret++] = '\0';
> > +	fput(f);
> > +	return ret;
> > +
> > +error:
> > +	memset(dst, '0', size);
> > +	return ret;
> > +}
> 
> Al,
> 
> could you please review about code whether it's doing enough checks to be
> called safely from preempt_disabled region?

Depends.  Which context is it running in?  In particular, which
locks might be already held?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf-next v10 1/2] bpf: add new helper get_file_path for mapping a file descriptor to a pathname
  2019-11-23  3:18   ` [PATCH bpf-next v10 1/2] bpf: add new helper get_file_path for mapping a file descriptor to a pathname Alexei Starovoitov
  2019-11-23  4:43     ` Al Viro
@ 2019-11-23  4:51     ` Al Viro
  2019-11-23  5:19       ` Alexei Starovoitov
  1 sibling, 1 reply; 7+ messages in thread
From: Al Viro @ 2019-11-23  4:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Wenbo Zhang, bpf, ast, daniel, yhs, andrii.nakryiko, netdev,
	linux-fsdevel

On Fri, Nov 22, 2019 at 07:18:28PM -0800, Alexei Starovoitov wrote:
> > +	f = fget_raw(fd);
> > +	if (!f)
> > +		goto error;
> > +
> > +	/* For unmountable pseudo filesystem, it seems to have no meaning
> > +	 * to get their fake paths as they don't have path, and to be no
> > +	 * way to validate this function pointer can be always safe to call
> > +	 * in the current context.
> > +	 */
> > +	if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname)
> > +		return -EINVAL;

An obvious leak here, BTW.

Anyway, what could that be used for?  I mean, if you want to check
something about syscall arguments, that's an unfixably racy way to go.
Descriptor table can be a shared data structure, and two consequent
fdget() on the same number can bloody well yield completely unrelated
struct file references.

IOW, anything that does descriptor -> struct file * translation more than
once is an instant TOCTOU suspect.  In this particular case, the function
will produce a pathname of something that was once reachable via descriptor
with this number; quite possibly never before that function had been called
_and_ not once after it has returned.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf-next v10 1/2] bpf: add new helper get_file_path for mapping a file descriptor to a pathname
  2019-11-23  4:51     ` Al Viro
@ 2019-11-23  5:19       ` Alexei Starovoitov
  2019-11-23  5:35         ` Al Viro
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2019-11-23  5:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Wenbo Zhang, bpf, ast, daniel, yhs, andrii.nakryiko, netdev,
	linux-fsdevel

On Sat, Nov 23, 2019 at 04:51:51AM +0000, Al Viro wrote:
> On Fri, Nov 22, 2019 at 07:18:28PM -0800, Alexei Starovoitov wrote:
> > > +	f = fget_raw(fd);
> > > +	if (!f)
> > > +		goto error;
> > > +
> > > +	/* For unmountable pseudo filesystem, it seems to have no meaning
> > > +	 * to get their fake paths as they don't have path, and to be no
> > > +	 * way to validate this function pointer can be always safe to call
> > > +	 * in the current context.
> > > +	 */
> > > +	if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname)
> > > +		return -EINVAL;
> 
> An obvious leak here, BTW.

ohh. right.

> Depends.  Which context is it running in?  In particular, which
> locks might be already held?

hard to tell. It will be run out of bpf prog that attaches to kprobe or
tracepoint. What is the concern about locking?
d_path() doesn't take any locks and doesn't depend on any locks. Above 'if'
checks that plain d_path() is used and not some specilized callback with
unknown logic.

> Anyway, what could that be used for?  I mean, if you want to check
> something about syscall arguments, that's an unfixably racy way to go.
> Descriptor table can be a shared data structure, and two consequent
> fdget() on the same number can bloody well yield completely unrelated
> struct file references.

yes. It is racy. There are no guarantees on correctness of FD.
The program can pass arbitrary integer into this helper.

> IOW, anything that does descriptor -> struct file * translation more than
> once is an instant TOCTOU suspect.  In this particular case, the function
> will produce a pathname of something that was once reachable via descriptor
> with this number; quite possibly never before that function had been called
> _and_ not once after it has returned.

Right. TOCTOU is not a concern here. It's tracing. It's ok for full path to be
'one time deal'. Right now people use bpf_probe_read() to replicate what
d_path() does.
See https://github.com/iovisor/bcc/issues/237#issuecomment-547564661
It sort of works, but calling d_path() is simpler and more accurate.
The key thing that bpf helpers need to make sure is that regardless of
how they're called and what integer is passed in as an FD the helper
must not crash or lockup the kernel or cause it to misbehave.
Hence above in_interrupt() and other checks to limit the context.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf-next v10 1/2] bpf: add new helper get_file_path for mapping a file descriptor to a pathname
  2019-11-23  5:19       ` Alexei Starovoitov
@ 2019-11-23  5:35         ` Al Viro
  2019-11-23  6:04           ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2019-11-23  5:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Wenbo Zhang, bpf, ast, daniel, yhs, andrii.nakryiko, netdev,
	linux-fsdevel

On Fri, Nov 22, 2019 at 09:19:21PM -0800, Alexei Starovoitov wrote:

> hard to tell. It will be run out of bpf prog that attaches to kprobe or
> tracepoint. What is the concern about locking?
> d_path() doesn't take any locks and doesn't depend on any locks. Above 'if'
> checks that plain d_path() is used and not some specilized callback with
> unknown logic.

It sure as hell does.  It might end up taking rename_lock and/or mount_lock
spinlock components.  It'll try not to, but if the first pass ends up with
seqlock mismatch, it will just grab the spinlock the second time around.

> > with this number; quite possibly never before that function had been called
> > _and_ not once after it has returned.
> 
> Right. TOCTOU is not a concern here. It's tracing. It's ok for full path to be
> 'one time deal'.

It might very well be a full path of something completely unrelated to what
the syscall ends up operating upon.  It's not that the file might've been
moved; it might be a different file.  IOW, results of that tracing might be
misleading.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf-next v10 1/2] bpf: add new helper get_file_path for mapping a file descriptor to a pathname
  2019-11-23  5:35         ` Al Viro
@ 2019-11-23  6:04           ` Alexei Starovoitov
  2019-12-13 19:51             ` Brendan Gregg
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2019-11-23  6:04 UTC (permalink / raw)
  To: Al Viro
  Cc: Wenbo Zhang, bpf, ast, daniel, yhs, andrii.nakryiko, netdev,
	linux-fsdevel

On Sat, Nov 23, 2019 at 05:35:14AM +0000, Al Viro wrote:
> On Fri, Nov 22, 2019 at 09:19:21PM -0800, Alexei Starovoitov wrote:
> 
> > hard to tell. It will be run out of bpf prog that attaches to kprobe or
> > tracepoint. What is the concern about locking?
> > d_path() doesn't take any locks and doesn't depend on any locks. Above 'if'
> > checks that plain d_path() is used and not some specilized callback with
> > unknown logic.
> 
> It sure as hell does.  It might end up taking rename_lock and/or mount_lock
> spinlock components.  It'll try not to, but if the first pass ends up with
> seqlock mismatch, it will just grab the spinlock the second time around.

ohh. got it. I missed _or_lock() part in there.
The need_seqretry() logic is tricky. afaics there is no way for the checks
outside of prepend_path() to prevent spin_lock to happen. And adding a flag to
prepend_path() to return early if retry is needed is too ugly. So this helper
won't be safe to be run out of kprobe. But if we allow it for tracepoints only
it should be ok. I think. There are no tracepoints in inner guts of vfs and I
don't think they will ever be. So running in tracepoint->bpf_prog->d_path we
will be sure that rename_lock+mount_lock can be safely spinlocked. Am I missing
something?

> > > with this number; quite possibly never before that function had been called
> > > _and_ not once after it has returned.
> > 
> > Right. TOCTOU is not a concern here. It's tracing. It's ok for full path to be
> > 'one time deal'.
> 
> It might very well be a full path of something completely unrelated to what
> the syscall ends up operating upon.  It's not that the file might've been
> moved; it might be a different file.  IOW, results of that tracing might be
> misleading.

That is correct. Tracing is fine with such limitation. Still better than probe_read.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf-next v10 1/2] bpf: add new helper get_file_path for mapping a file descriptor to a pathname
  2019-11-23  6:04           ` Alexei Starovoitov
@ 2019-12-13 19:51             ` Brendan Gregg
  0 siblings, 0 replies; 7+ messages in thread
From: Brendan Gregg @ 2019-12-13 19:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Al Viro, Wenbo Zhang, bpf, ast, Daniel Borkmann, Yonghong Song,
	andrii.nakryiko, netdev, linux-fsdevel

On Fri, Nov 22, 2019 at 10:05 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Nov 23, 2019 at 05:35:14AM +0000, Al Viro wrote:
> > On Fri, Nov 22, 2019 at 09:19:21PM -0800, Alexei Starovoitov wrote:
> >
> > > hard to tell. It will be run out of bpf prog that attaches to kprobe or
> > > tracepoint. What is the concern about locking?
> > > d_path() doesn't take any locks and doesn't depend on any locks. Above 'if'
> > > checks that plain d_path() is used and not some specilized callback with
> > > unknown logic.
> >
> > It sure as hell does.  It might end up taking rename_lock and/or mount_lock
> > spinlock components.  It'll try not to, but if the first pass ends up with
> > seqlock mismatch, it will just grab the spinlock the second time around.
>
> ohh. got it. I missed _or_lock() part in there.
> The need_seqretry() logic is tricky. afaics there is no way for the checks
> outside of prepend_path() to prevent spin_lock to happen. And adding a flag to
> prepend_path() to return early if retry is needed is too ugly. So this helper
> won't be safe to be run out of kprobe. But if we allow it for tracepoints only
> it should be ok. I think. There are no tracepoints in inner guts of vfs and I
> don't think they will ever be. So running in tracepoint->bpf_prog->d_path we
> will be sure that rename_lock+mount_lock can be safely spinlocked. Am I missing
> something?

It seems rather restrictive to only allow tracepoints (especially
without VFS tracepoints), although I'll use it to improve my syscall
tracepoint tools, so I'd be happy to see this merged even with that
restriction.

Just a thought: if *buffer is in BPF memory, can prepend_path() check
it's memory location and not try to grab the lock based on that? This
would be to avoid adding a flag.

>
> > > > with this number; quite possibly never before that function had been called
> > > > _and_ not once after it has returned.
> > >
> > > Right. TOCTOU is not a concern here. It's tracing. It's ok for full path to be
> > > 'one time deal'.
> >
> > It might very well be a full path of something completely unrelated to what
> > the syscall ends up operating upon.  It's not that the file might've been
> > moved; it might be a different file.  IOW, results of that tracing might be
> > misleading.
>
> That is correct. Tracing is fine with such limitation. Still better than probe_read.
>

+1

Tracing is observability tools and we document these caveats, and this
won't be the first time I've published tools where the printed path
may not be the one you think (e.g., the case of hard links.)

Brendan

-- 
Brendan Gregg, Senior Performance Architect, Netflix

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-12-13 20:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <cover.1574162990.git.ethercflow@gmail.com>
     [not found] ` <e8b1281b7405eb4b6c1f094169e6efd2c8cc95da.1574162990.git.ethercflow@gmail.com>
2019-11-23  3:18   ` [PATCH bpf-next v10 1/2] bpf: add new helper get_file_path for mapping a file descriptor to a pathname Alexei Starovoitov
2019-11-23  4:43     ` Al Viro
2019-11-23  4:51     ` Al Viro
2019-11-23  5:19       ` Alexei Starovoitov
2019-11-23  5:35         ` Al Viro
2019-11-23  6:04           ` Alexei Starovoitov
2019-12-13 19:51             ` Brendan Gregg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).