From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-wm1-f47.google.com (mail-wm1-f47.google.com [209.85.128.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0DC642D3EC1 for ; Wed, 26 Nov 2025 08:29:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.47 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764145789; cv=none; b=nHeZSGrrDblWX1Ft6KtoMaAcINdFKDZ+mb9d7WoT4TUa9LaKq/hLt82Wb2gP4JCZ6gF9g6xpWUhZUjd4k3uvCtkJp171QqoqFRBUyFfu58dHI/0x54t/ab3qoobYHFsbX9tI+Fe2qbnrPcXNPh1biAXI6H6kx15X2I1fyTaMINI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764145789; c=relaxed/simple; bh=GFnpp+Wn8hvMjBAq5BQjznKICmYat4XMDiShPx6NwE8=; h=From:Date:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=if6iEQFENanFEuBdWTGLWiXPALZpeJpkxj91L4enR2cJgYl33hdURGX5WS8JinoJNFRAIe8gBmRrfi7J85pTOcDUHD42d1zfRph73VNKTT36DJ8+tJRi7mc1erVenUxHGCJNUBrVz7l5sEmOQuj2mKY311wKSRfqsyllxq6sox4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=VCcYAjFk; arc=none smtp.client-ip=209.85.128.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="VCcYAjFk" Received: by mail-wm1-f47.google.com with SMTP id 5b1f17b1804b1-477b1cc8fb4so36499685e9.1 for ; Wed, 26 Nov 2025 00:29:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1764145786; x=1764750586; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:date:from:from:to :cc:subject:date:message-id:reply-to; bh=axJwRzJfcZn97VqPk57RHFfW0fq/xUaxkOSSTIHVd0U=; b=VCcYAjFkTnyIZ+AcjY28RMrSNaxPWRh/kfj/z05KX0wgULZitpWbJCJ7Xz02mytvd/ dyE/N9T3YJQTLtSFLsPmIt7ICJz1/z9H94cagzv/F9kDITM7/F76os5YbKZRhVFrIA24 ESCUQ/DIo5DuUycmg7y8UtT04oFjUet+A3k639awFppQJ5xfl5ky5SO1u7MNP0uwAipW qrg8rLkPvQZEWAjKQFRA6YcytKFw2HWdUL2Pl8F/JteicEDkl+pqBSuW+ERObg/gpqT5 tvT+HCDCiQV0WP40uC+VXyLGKROzqkbTCpaNyydGZ59sQjtBnKTQrHy5TUj2Ro8nlbcg 0UdA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1764145786; x=1764750586; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:date:from:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=axJwRzJfcZn97VqPk57RHFfW0fq/xUaxkOSSTIHVd0U=; b=M2ep9Xh+xY0FYlePm09swCh3kUwwXhnUIfQ4pYaTKwP+4Mn7ANz4LS1PQx2r8+8GYn 49oDgjLBnozQZPjxJfnvyj3s8GSGSq2egIUpJmdJWeCABShR7vQVeCf3Bz7viut4PwcE U9AgFrwmW+u3lo0c9wUShbaPKyB/SXzFtKvpzlXFNBuiZs11huNzhQZItZP1nkJFNHdC FR3xXZYf1Q5CdetJmtRGmDKKawJty2J4KUbTfBIgcGsDd/C0RvZZZrU/5KWDhKJcBavA dNtnIaRRZByBBPu6g8cSNsbcocNCltiafmbeYI1EYFr9H8OE5bAiCfqgJ6FHdafErCi6 InKg== X-Forwarded-Encrypted: i=1; AJvYcCWEpHdnN49GWaHODLq1+Q4hVuwZBbwivbfx1NtwSlwfbXls5XjfbexmaKMRlA2V0d8rnAztzwCo2fUo3xg=@vger.kernel.org X-Gm-Message-State: AOJu0YylLoLUQ8qOTZySGAq8XN8Rw9Di7MhBnPEmLPjS9c+WRA8F/jLm xQK5mmzGdAnw+vTRsXhxBCme8X/e6Q8h5iG2gEX/S626mTc7dQkaWI3M X-Gm-Gg: ASbGncvZNt+mzDC7sN7dBxYQuZjbWw5pOulBCMt10rpWLH0URUzkHTihqbzqE8GbET2 k0z4YhMqDkFuvM9wrW7avfJOZcl0buaHckseHNAdtmyG/7jylwL9kGaQUMKLdZMJY8IpWDe57qb M6eig5TR1lduqd4nPSw/60r0ApCR/WJBtIoK35SFhYffmxPS2PaxIcoyY8mBAuVy52XYN5IbmdW o2QWsfaDHVd4CsQ6VN9UQDGhZF1sCdya2oVP+Sj+LvKrVNpcg0fvNb7/3dEntQvtB+lJZbuXZMJ hMcignI3BypF+k+xAj49v7zd5cB3hXXaqqIYy1pAwnWEv5tnnYwh/puEu778mXXk48uVFGGcYBW br4vcEvJsUR4GvhmrdpRxTD4nrm/RErTwCZoXkA5JiwWqZfLcM4j7oaXm2i4N X-Google-Smtp-Source: AGHT+IEfJ/MzYl1IbGxFr5UX2AR0AI31EEtJQe3vfUY7glHzJ5dBJPC16eg0eTPzR1Bx8kAnwqirXQ== X-Received: by 2002:a05:600c:1c19:b0:477:7bca:8b3c with SMTP id 5b1f17b1804b1-47904b10379mr68289925e9.19.1764145786076; Wed, 26 Nov 2025 00:29:46 -0800 (PST) Received: from krava ([2a02:8308:a00c:e200::b44f]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-47906cb1f60sm34068405e9.1.2025.11.26.00.29.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 26 Nov 2025 00:29:45 -0800 (PST) From: Jiri Olsa X-Google-Original-From: Jiri Olsa Date: Wed, 26 Nov 2025 09:29:43 +0100 To: Andrii Nakryiko Cc: Andrii Nakryiko , bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Song Liu , Yonghong Song , John Fastabend Subject: Re: [PATCH bpf-next 1/4] selftests/bpf: Emit nop,nop5 instructions for x86_64 usdt probe Message-ID: References: <20251117083551.517393-1-jolsa@kernel.org> <20251117083551.517393-2-jolsa@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Mon, Nov 24, 2025 at 09:29:01AM -0800, Andrii Nakryiko wrote: > On Mon, Nov 17, 2025 at 12:36 AM Jiri Olsa wrote: > > > > We can currently optimize uprobes on top of nop5 instructions, > > so application can define USDT_NOP to nop5 and use USDT macro > > to define optimized usdt probes. > > Thanks for working on this and sorry for the delay, I've been > travelling last week. > > > > > This works fine on new kernels, but could have performance penalty > > on older kernels, that do not have the support to optimize and to > > emulate nop5 instruction. > > > > execution of the usdt probe on top of nop: > > - nop -> trigger usdt -> emulate nop -> continue > > > > execution of the usdt probe on top of nop5: > > - nop5 -> trigger usdt -> single step nop5 -> continue > > > > Note the 'single step nop5' as the source of performance regression. > > nit: I get what you are saying, but I don't think the above > explanation is actually as clear as it could be. Try to simplify the > reasoning maybe by saying that until Linux vX.Y kerne's uprobe > implementation treated nop5 as an instruction that needs to be > single-stepped. Newer kernels, on the other hand, can handle nop5 > very-very fast (when uprobe is installed on top of them). Which > creates a dilemma where we want nop5 on new kernels, nop1 on old ones, > but we can't know upfront which kernel we'll run on. And thus the > whole patch set that's trying to have the cake and eat it too ;) ok, will try ;-) > > > > > To workaround that we change the USDT macro to emit nop,nop5 for > > the probe (instead of default nop) and make record of that in > > USDT record (more on that below). > > > > This can be detected by application (libbpf) and it can place the > > uprobe either on nop or nop5 based on the optimization support in > > the kernel. > > > > We make record of using the nop,nop5 instructions in the USDT ELF > > note data. > > > > Current elf note format is as follows: > > > > namesz (4B) | descsz (4B) | type (4B) | name | desc > > > > And current usdt record (with "stapsdt" name) placed in the note's > > desc data look like: > > > > loc_addr | 8 bytes > > base_addr | 8 bytes > > sema_addr | 8 bytes > > provider | zero terminated string > > name | zero terminated string > > args | zero terminated string > > > > None of the tested parsers (bpftrace-bcc, libbpf) checked that the args > > zero terminated byte is the actual end of the 'desc' data. As Andrii > > suggested we could use this and place extra zero byte right there as an > > indication for the parser we use the nop,nop5 instructions. > > > > It's bit tricky, but the other way would be to introduce new elf note type > > or note name and change all existing parsers to recognize it. With the change > > above the existing parsers would still recognize such usdt probes. > > ... and use safer (performance-wise) nop1 as uprobe target. > > We can treat this extra zero as a backwards-compatible extension of > provider+name+args section. If we ever need to have some extra flags > or extra information (e.g., argument names or whatnot), we can treat > this as either a fourth string or think about this as a single-byte > length prefix for a fixed binary extra information that should follow > (right now it's zero, so forward-compatible). For now just extra zero > is the least amount of work but good enough to solve the problem, > while being extendable for the future. ok, makes sense > > > > > Note we do not emit this extra byte if app defined its own nop through > > USDT_NOP macro. > > > > Suggested-by: Andrii Nakryiko > > Signed-off-by: Jiri Olsa > > --- > > tools/testing/selftests/bpf/usdt.h | 17 +++++++++++++++++ > > 1 file changed, 17 insertions(+) > > > > diff --git a/tools/testing/selftests/bpf/usdt.h b/tools/testing/selftests/bpf/usdt.h > > index 549d1f774810..57fa2902136c 100644 > > --- a/tools/testing/selftests/bpf/usdt.h > > +++ b/tools/testing/selftests/bpf/usdt.h > > @@ -312,9 +312,16 @@ struct usdt_sema { volatile unsigned short active; }; > > #ifndef USDT_NOP > > #if defined(__ia64__) || defined(__s390__) || defined(__s390x__) > > #define USDT_NOP nop 0 > > +#elif defined(__x86_64__) > > +#define USDT_NOP .byte 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x0 /* nop, nop5 */ > > #else > > #define USDT_NOP nop > > #endif > > +#else > > +/* > > + * User define its own nop instruction, do not emit extra note data. > > + */ > > +#define __usdt_asm_extra > > I'd guard this with ifndef, just in case user do want custom USDT_NOP > while emitting that extra zero (e.g., if they have nop1 + nop5 + some > extra they need for logging or whatever). ok > > > #endif /* USDT_NOP */ > > > > /* > > @@ -403,6 +410,15 @@ struct usdt_sema { volatile unsigned short active; }; > > __asm__ __volatile__ ("" :: "m" (sema)); > > #endif > > > > +#ifndef __usdt_asm_extra > > +#ifdef __x86_64__ > > +#define __usdt_asm_extra \ > > + __usdt_asm1( .ascii "\0") > > nit: keep it single line > > > btw, the source of truth for usdt.h is at Github, please send a proper > PR with these change there, and then we can just sync upstream version > into selftests? yes, will do that thanks, jirka