public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <ast@plumgrid.com>
To: Kaixu Xia <xiakaixu@huawei.com>,
	davem@davemloft.net, acme@kernel.org, mingo@redhat.com,
	a.p.zijlstra@chello.nl, masami.hiramatsu.pt@hitachi.com,
	jolsa@kernel.org
Cc: wangnan0@huawei.com, linux-kernel@vger.kernel.org,
	pi3orama@163.com, hekuang@huawei.com
Subject: Re: [PATCH v2 3/5] bpf: Save the pointer to struct perf_event to map
Date: Wed, 22 Jul 2015 18:08:24 -0700	[thread overview]
Message-ID: <55B03E88.1000303@plumgrid.com> (raw)
In-Reply-To: <1437552572-84748-4-git-send-email-xiakaixu@huawei.com>

On 7/22/15 1:09 AM, Kaixu Xia wrote:
> The user space event FDs from perf_event_open() syscall are
> converted to the pointer to struct perf_event and stored in map.
...
> +static int replace_map_with_perf_event(void *value)
> +{
> +	struct perf_event *event;
> +	u32 fd;
> +
> +	fd = *(u32 *)value;
> +
> +	event = perf_event_get(fd);
> +	if (IS_ERR(event))
> +		return PTR_ERR(event);
> +
> +	/* limit the event type to PERF_TYPE_RAW
> +	 * and PERF_TYPE_HARDWARE
> +	 */
> +	if (event->attr.type != PERF_TYPE_RAW &&
> +	    event->attr.type != PERF_TYPE_HARDWARE)
> +		return -EINVAL;
> +
> +	memcpy(value, &event, sizeof(struct perf_event *));
> +
> +	return 0;
> +}
> +
> +static bool check_map_perf_event_stored(struct bpf_map *map, void *key)
> +{
> +	void *event;
> +	bool is_stored = false;
> +
> +	rcu_read_lock();
> +	event = array_map_lookup_elem(map, key);
> +	if (event && (*(unsigned long *)event))
> +		is_stored = true;
> +	rcu_read_unlock();
> +
> +	return is_stored;
> +}
> +
> +/* only called from syscall */
> +static int perf_event_array_map_update_elem(struct bpf_map *map, void *key,
> +					    void *value, u64 map_flags)
> +{
> +	/* check if the value is already stored */
> +	if (check_map_perf_event_stored(map, key))
> +		return -EINVAL;
> +
> +	if (replace_map_with_perf_event(value))
> +		return -EBADF;

the above three functions are broken due to races.
update_elem can be called by different user space processes.
Please see how prog_array solves it via xchg()

> +	if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
> +		if (!map->ops->map_traverse_elem)
> +			return -EPERM;
> +
> +		rcu_read_lock();
> +		if (map->ops->map_traverse_elem(bpf_map_perf_event_put, map) < 0) {
> +			rcu_read_unlock();
> +			return -EINVAL;
> +		}
> +		rcu_read_unlock();

as I mentioned as part of patch 2 the above seems unnecessary.

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index d3dae34..14a9924 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8574,6 +8574,32 @@ void perf_event_delayed_put(struct task_struct *task)
>   		WARN_ON_ONCE(task->perf_event_ctxp[ctxn]);
>   }
>
> +struct perf_event *perf_event_get(unsigned int fd)
> +{
> +	struct perf_event *event;
> +	struct fd f;
> +
> +	f = fdget(fd);
> +
> +	if (!f.file)
> +		return ERR_PTR(-EBADF);
> +
> +	if (f.file->f_op != &perf_fops) {
> +		fdput(f);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	event = f.file->private_data;
> +
> +	if (!atomic_long_inc_not_zero(&event->refcount)) {

I don't understand necessity of '_not_zero' suffix. why?
How it can possibly race to zero here if we have an fd?

> +		fdput(f);
> +		return ERR_PTR(-ENOENT);
> +	}
> +
> +	fdput(f);
> +	return event;
> +}


  reply	other threads:[~2015-07-23  1:08 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-22  8:09 [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Kaixu Xia
2015-07-22  8:09 ` [PATCH v2 1/5] bpf: Add new bpf map type to store the pointer to struct perf_event Kaixu Xia
2015-07-23  0:48   ` Alexei Starovoitov
2015-07-23  1:08     ` Wangnan (F)
2015-07-23  1:39       ` Alexei Starovoitov
2015-07-22  8:09 ` [PATCH v2 2/5] bpf: Add function map->ops->map_traverse_elem() to traverse map elems Kaixu Xia
2015-07-23  1:00   ` Alexei Starovoitov
2015-07-22  8:09 ` [PATCH v2 3/5] bpf: Save the pointer to struct perf_event to map Kaixu Xia
2015-07-23  1:08   ` Alexei Starovoitov [this message]
2015-07-22  8:09 ` [PATCH v2 4/5] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter Kaixu Xia
2015-07-23  1:14   ` Alexei Starovoitov
2015-07-23  2:12     ` xiakaixu
2015-07-23  2:22       ` Alexei Starovoitov
2015-07-23  2:39         ` xiakaixu
2015-07-22  8:09 ` [PATCH v2 5/5] samples/bpf: example of get selected PMU counter value Kaixu Xia
2015-07-23  1:16   ` Alexei Starovoitov
2015-07-23 23:33 ` [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Daniel Borkmann
2015-07-25  2:14   ` xiakaixu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55B03E88.1000303@plumgrid.com \
    --to=ast@plumgrid.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@kernel.org \
    --cc=davem@davemloft.net \
    --cc=hekuang@huawei.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=masami.hiramatsu.pt@hitachi.com \
    --cc=mingo@redhat.com \
    --cc=pi3orama@163.com \
    --cc=wangnan0@huawei.com \
    --cc=xiakaixu@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox