From: Yonghong Song <yhs@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Song Liu <songliubraving@fb.com>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
	"kpsingh@chromium.org" <kpsingh@chromium.org>,
	Kernel Team <Kernel-team@fb.com>,
	"haoluo@google.com" <haoluo@google.com>
Subject: Re: [PATCH bpf-next 4/4] bpf: runqslower: use task local storage
Date: Mon, 11 Jan 2021 23:33:42 -0800
Message-ID: <8d9983c4-2842-e2f8-94ce-1676977bb720@fb.com>
In-Reply-To: <CAEf4BzZivGBmDbUxfiDwAC3aFoTWNfyWaiZRA4Vu16ZT9kzE8A@mail.gmail.com>



On 1/11/21 11:14 PM, Andrii Nakryiko wrote:
> On Mon, Jan 11, 2021 at 7:24 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 1/11/21 2:54 PM, Song Liu wrote:
>>>
>>>
>>>> On Jan 11, 2021, at 9:49 AM, Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 1/8/21 3:19 PM, Song Liu wrote:
>>>>> Replace hashtab with task local storage in runqslower. This improves the
>>>>> performance of these BPF programs. The following table summarizes average
>>>>> runtime of these programs, in nanoseconds:
>>>>>                             task-local   hash-prealloc   hash-no-prealloc
>>>>> handle__sched_wakeup             125             340               3124
>>>>> handle__sched_wakeup_new        2812            1510               2998
>>>>> handle__sched_switch             151             208                991
>>>>> Note that, task local storage gives better performance than hashtab for
>>>>> handle__sched_wakeup and handle__sched_switch. On the other hand, for
>>>>> handle__sched_wakeup_new, task local storage is slower than hashtab with
>>>>> prealloc. This is because handle__sched_wakeup_new accesses the data for
>>>>> the first time, so it has to allocate the data for task local storage.
>>>>> Once the initial allocation is done, subsequent accesses, as those in
>>>>> handle__sched_wakeup, are much faster with task local storage. If we
>>>>> disable hashtab prealloc, task local storage is much faster for all 3
>>>>> functions.
>>>>> Signed-off-by: Song Liu <songliubraving@fb.com>
>>>>> ---
>>>>>    tools/bpf/runqslower/runqslower.bpf.c | 26 +++++++++++++++-----------
>>>>>    1 file changed, 15 insertions(+), 11 deletions(-)
>>>>> diff --git a/tools/bpf/runqslower/runqslower.bpf.c b/tools/bpf/runqslower/runqslower.bpf.c
>>>>> index 1f18a409f0443..c4de4179a0a17 100644
>>>>> --- a/tools/bpf/runqslower/runqslower.bpf.c
>>>>> +++ b/tools/bpf/runqslower/runqslower.bpf.c
>>>>> @@ -11,9 +11,9 @@ const volatile __u64 min_us = 0;
>>>>>  const volatile pid_t targ_pid = 0;
>>>>>
>>>>>  struct {
>>>>> -        __uint(type, BPF_MAP_TYPE_HASH);
>>>>> -        __uint(max_entries, 10240);
>>>>> -        __type(key, u32);
>>>>> +        __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
>>>>> +        __uint(map_flags, BPF_F_NO_PREALLOC);
>>>>> +        __type(key, int);
>>>>>          __type(value, u64);
>>>>>  } start SEC(".maps");
>>>>>
>>>>> @@ -25,15 +25,19 @@ struct {
>>>>>
>>>>>  /* record enqueue timestamp */
>>>>>  __always_inline
>>>>> -static int trace_enqueue(u32 tgid, u32 pid)
>>>>> +static int trace_enqueue(struct task_struct *t)
>>>>>  {
>>>>> -        u64 ts;
>>>>> +        u32 pid = t->pid;
>>>>> +        u64 ts, *ptr;
>>>>>
>>>>>          if (!pid || (targ_pid && targ_pid != pid))
>>>>>                  return 0;
>>>>>
>>>>>          ts = bpf_ktime_get_ns();
>>>>> -        bpf_map_update_elem(&start, &pid, &ts, 0);
>>>>> +        ptr = bpf_task_storage_get(&start, t, 0,
>>>>> +                                   BPF_LOCAL_STORAGE_GET_F_CREATE);
>>>>> +        if (ptr)
>>>>> +                *ptr = ts;
>>>>>          return 0;
>>>>>  }
>>>>>
>>>>> @@ -43,7 +47,7 @@ int handle__sched_wakeup(u64 *ctx)
>>>>>          /* TP_PROTO(struct task_struct *p) */
>>>>>          struct task_struct *p = (void *)ctx[0];
>>>>>
>>>>> -        return trace_enqueue(p->tgid, p->pid);
>>>>> +        return trace_enqueue(p);
>>>>>  }
>>>>>
>>>>>  SEC("tp_btf/sched_wakeup_new")
>>>>> @@ -52,7 +56,7 @@ int handle__sched_wakeup_new(u64 *ctx)
>>>>>          /* TP_PROTO(struct task_struct *p) */
>>>>>          struct task_struct *p = (void *)ctx[0];
>>>>>
>>>>> -        return trace_enqueue(p->tgid, p->pid);
>>>>> +        return trace_enqueue(p);
>>>>>  }
>>>>>
>>>>>  SEC("tp_btf/sched_switch")
>>>>> @@ -70,12 +74,12 @@ int handle__sched_switch(u64 *ctx)
>>>>>
>>>>>          /* ivcsw: treat like an enqueue event and store timestamp */
>>>>>          if (prev->state == TASK_RUNNING)
>>>>> -                trace_enqueue(prev->tgid, prev->pid);
>>>>> +                trace_enqueue(prev);
>>>>>
>>>>>          pid = next->pid;
>>>>>
>>>>>          /* fetch timestamp and calculate delta */
>>>>> -        tsp = bpf_map_lookup_elem(&start, &pid);
>>>>> +        tsp = bpf_task_storage_get(&start, next, 0, 0);
>>>>>          if (!tsp)
>>>>>                  return 0;   /* missed enqueue */
>>>>
>>>> Previously, hash table may overflow so we may have missed enqueue.
>>>> Here with task local storage, is it possible to add additional pid
>>>> filtering in the beginning of handle__sched_switch such that
>>>> missed enqueue here can be treated as an error?
>>>
>>> IIUC, hashtab overflow is not the only reason of missed enqueue. If the
>>> wakeup (which calls trace_enqueue) happens before runqslower starts, we
>>> may still get missed enqueue in sched_switch, no?
>>
>> the wakeup won't happen before runqslower starts since runqslower needs
>> to start to do attachment first and then trace_enqueue() can run.
> 
> I think Song is right. Given wakeup and sched_switch need to be
> matched, depending at which exact time we attach BPF programs, we can
> end up missing wakeup, but not missing sched_switch, no? So it's not
> an error.

The current approach works fine. What I suggested is to tighten
sched_switch so that it only handles targ_pid. The wakeup path (which
does the enqueue) stays more relaxed than sched_switch, so that the
task local storage is always created for targ_pid regardless of
attachment timing. I think it should work, but we have to experiment
to see the actual results...
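
To make the cases under discussion easier to reason about, here is a
rough user-space C model of the two hooks. It is only a sketch under
stated assumptions: it is not BPF code and not part of the patch.
struct task, has_start, handle_switch() and enum switch_result are
hypothetical names made up for illustration, and the early pid check
in handle_switch() is my assumption of what the extra filtering could
look like. has_start stands in for "task local storage exists for this
task".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a task plus its task local storage slot. */
struct task {
	uint32_t pid;
	bool has_start;     /* models "storage was created for this task" */
	uint64_t start_ns;  /* models the stored enqueue timestamp */
};

/* Hypothetical outcome of the sched_switch side. */
enum switch_result {
	SWITCH_SKIPPED,         /* filtered out by the pid check */
	SWITCH_MISSED_ENQUEUE,  /* passed the filter, but no timestamp stored */
	SWITCH_MEASURED,        /* delta computed */
};

static uint32_t targ_pid; /* 0 means "trace every task", as in runqslower */

/* Models trace_enqueue(): same pid filter as the quoted patch. */
static void trace_enqueue(struct task *t, uint64_t now_ns)
{
	if (!t->pid || (targ_pid && targ_pid != t->pid))
		return;
	t->has_start = true;   /* models BPF_LOCAL_STORAGE_GET_F_CREATE */
	t->start_ns = now_ns;
}

/* Models the lookup in handle__sched_switch() with an assumed early filter. */
static enum switch_result handle_switch(const struct task *next,
					 uint64_t now_ns, uint64_t *delta_ns)
{
	if (!next->pid || (targ_pid && targ_pid != next->pid))
		return SWITCH_SKIPPED;
	if (!next->has_start)
		return SWITCH_MISSED_ENQUEUE;
	*delta_ns = now_ns - next->start_ns;
	return SWITCH_MEASURED;
}
```

In this model a switch for the target task that is seen before any
traced wakeup still ends in SWITCH_MISSED_ENQUEUE, which is the
attach-timing case Andrii and Song describe. Whether that can be
treated as an error in the real programs is what the experiment would
have to show.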

> 
>>
>> For the current implementation trace_enqueue() will happen for any non-0
>> pid before setting test_progs tgid, and will happen for any non-0 and
>> test_progs tgid if it is set, so this should be okay if we do filtering
>> in handle__sched_switch. Maybe you can do an experiment to prove whether
>> my point is correct or not.
>>
>>>
>>> Thanks,
>>> Song
>>>
