Re: [PATCH bpf-next v9 1/4] bpf: add bpf_get_cpu_time_counter kfunc

From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Thomas Gleixner <tglx@linutronix.de>,
	Vadim Fedorenko <vadfed@meta.com>, Borislav Petkov <bp@alien8.de>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	Mykola Lysenko <mykolal@fb.com>
Cc: x86@kernel.org, bpf@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Martin KaFai Lau <martin.lau@linux.dev>
Subject: Re: [PATCH bpf-next v9 1/4] bpf: add bpf_get_cpu_time_counter kfunc
Date: Sun, 1 Dec 2024 17:45:43 +0000
Message-ID: <aec0acb2-9232-43da-856d-3ba88d0461e2@linux.dev>
In-Reply-To: <87a5dfwoyo.ffs@tglx>

On 01.12.2024 12:46, Thomas Gleixner wrote:
> On Fri, Nov 22 2024 at 16:58, Vadim Fedorenko wrote:
>> diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
>> index de0f9e5f9f73..a549aea25f5f 100644
>> --- a/arch/x86/net/bpf_jit_comp32.c
>> +++ b/arch/x86/net/bpf_jit_comp32.c
>> @@ -2094,6 +2094,13 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
>>   			if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
>>   				int err;
>>   
>> +				if (imm32 == BPF_CALL_IMM(bpf_get_cpu_time_counter)) {
>> +					if (cpu_feature_enabled(X86_FEATURE_LFENCE_RDTSC))
>> +						EMIT3(0x0F, 0xAE, 0xE8);
>> +					EMIT2(0x0F, 0x31);
> 
> What guarantees that RDTSC is supported by the CPU?

Well, technically it may be a problem on x86_32, because there are x86-compatible
platforms which don't have RDTSC. But they are at least 16 years old, and I'm
not sure we expose the vDSO on such platforms.
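
For reference, RDTSC and RDTSCP availability is reported through CPUID feature
bits. Below is a minimal userspace sketch of the bit tests, not kernel code: the
helper names and macros are made up for illustration, and the caller is assumed
to have obtained the EDX values from CPUID already. In the kernel, the
corresponding check would go through the X86_FEATURE_TSC feature flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:EDX bit 4 reports RDTSC support. */
#define CPUID_1_EDX_TSC		(1u << 4)
/* CPUID.80000001H:EDX bit 27 reports RDTSCP support. */
#define CPUID_EXT1_EDX_RDTSCP	(1u << 27)

/* Hypothetical helper: cpuid_1_edx is EDX as returned by CPUID leaf 1. */
static bool cpu_has_rdtsc(uint32_t cpuid_1_edx)
{
	return (cpuid_1_edx & CPUID_1_EDX_TSC) != 0;
}

/* Hypothetical helper: cpuid_ext1_edx is EDX from CPUID leaf 0x80000001. */
static bool cpu_has_rdtscp(uint32_t cpuid_ext1_edx)
{
	return (cpuid_ext1_edx & CPUID_EXT1_EDX_RDTSCP) != 0;
}
```

A JIT that emits RDTSC unconditionally skips the equivalent of
cpu_has_rdtsc(), which is the gap the question above points at.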

> 
> Aside of that, if you want the read to be ordered, then you need to take
> RDTSCP into account too.

Yes, we have already had this discussion. RDTSCP has the same ordering
guarantees as "LFENCE; RDTSC" according to the programming manuals. But it also
provides a "cookie" value, which is not used in this case and just clobbers
ECX. To avoid additional register manipulation, I used the LFENCE option.
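
To make the byte sequences concrete, here is a minimal standalone sketch of what
gets emitted. It is not the kernel's EMIT2()/EMIT3() macros: emit_ordered_rdtsc()
and emit_rdtscp() are hypothetical helpers that write into a plain buffer, and
the lfence_rdtsc flag stands in for the
cpu_feature_enabled(X86_FEATURE_LFENCE_RDTSC) check in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: append the counter read to buf and return the
 * number of bytes written. buf must have room for at least 5 bytes.
 */
static size_t emit_ordered_rdtsc(uint8_t *buf, bool lfence_rdtsc)
{
	size_t n = 0;

	if (lfence_rdtsc) {
		/* lfence: 0F AE E8 */
		buf[n++] = 0x0F;
		buf[n++] = 0xAE;
		buf[n++] = 0xE8;
	}
	/* rdtsc: 0F 31, result in EDX:EAX, ECX is left untouched */
	buf[n++] = 0x0F;
	buf[n++] = 0x31;
	return n;
}

/*
 * The alternative: rdtscp (0F 01 F9). It also writes IA32_TSC_AUX to ECX,
 * so a JIT using it has to treat ECX as clobbered.
 */
static size_t emit_rdtscp(uint8_t *buf)
{
	buf[0] = 0x0F;
	buf[1] = 0x01;
	buf[2] = 0xF9;
	return 3;
}
```

The RDTSCP form is two bytes shorter, but the extra ECX write is what would
force the additional register handling mentioned above.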

>> +#if IS_ENABLED(CONFIG_GENERIC_GETTIMEOFDAY)
>> +__bpf_kfunc u64 bpf_get_cpu_time_counter(void)
>> +{
>> +	const struct vdso_data *vd = __arch_get_k_vdso_data();
>> +
>> +	vd = &vd[CS_RAW];
>> +
>> +	/* CS_RAW clock_mode translates to VDSO_CLOCKMODE_TSC on x86 and
> 
> How so?
> 
> vd->clock_mode is not guaranteed to be VDSO_CLOCKMODE_TSC or
> VDSO_CLOCKMODE_ARCHTIMER. CS_RAW is the access to the raw (uncorrected)
> time of the current clocksource. If the clock mode is not matching, then
> you cannot access it.

That's more about x86 and virtualization options. In the end it all comes down
to reading the TSC value, and since we JIT this call anyway, the function will
never be executed on x86. Other architectures (well, apart from MIPS) don't care
about vd->clock_mode at all, and we don't provide kfuncs for architectures
without a JIT.

For MIPS, I think I can ifdef these new kfuncs so that they are built only when
CONFIG_CSRC_R4K is not defined.

I'm going to create a patchset to implement arch-specific replacements for all
architectures supported by the BPF JIT, so in the end this call will effectively
never be executed.

> 
>> +	 * to VDSO_CLOCKMODE_ARCHTIMER on aarch64/risc-v. We cannot use
>> +	 * vd->clock_mode directly because it brings possible access to
>> +	 * pages visible by user-space only via vDSO.
> 
> How so? vd->clock_mode is kernel visible.

vd->clock_mode is kernel visible, but the compiler cannot optimize out the code
which accesses user-space pages unless I provide a constant value here.

> 
>>         * But the constant value
>> +	 * of 1 is exactly what we need - it works for any architecture and
>> +	 * translates to reading of HW timecounter regardles of architecture.
> 
> It does not. Care to look at MIPS?

Yes, MIPS is quite specific here. But again, the goal is to have a JIT
implementation for all architectures, so this function will never actually be
called this way.

> 
>> +	 * We still have to provide vdso_data for some architectures to avoid
>> +	 * NULL pointer dereference.
>> +	 */
>> +	return __arch_get_hw_counter(1, vd);
> 
> This is outright dangerous. __arch_get_hw_counter() is for VDSO usage
> and not for in kernel usage. What guarantees you that the architecture
> specific implementation does not need access to user only mappings.
> 
> Aside of that what guarantees that '1' is what you want and stays that
> way forever? It's already broken on MIPS.

I can ifdef out the MIPS case until we have a JIT implementation for it (reading
the HW counter there is pretty straightforward).

> 
> Thanks,
> 
>          tglx
