public inbox for dwarves@vger.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Eduard Zingerman <eddyz87@gmail.com>,
	Alan Maguire <alan.maguire@oracle.com>,
	Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>,
	dwarves@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	bpf@vger.kernel.org, Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, Song Liu <song@kernel.org>
Subject: Re: [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching
Date: Mon, 11 Nov 2024 10:51:05 -0800	[thread overview]
Message-ID: <762f7d67-4d4f-4d94-a2c3-6f6b11913c8d@linux.dev> (raw)
In-Reply-To: <31dea31e6f75916fdc078d433263daa6bb0bffdc.camel@gmail.com>




On 11/10/24 3:38 AM, Eduard Zingerman wrote:
> On Fri, 2024-11-08 at 10:05 -0800, Yonghong Song wrote:
>
> [...]
>
>> For one of internal 6.11 kernel, there are 62498 functions in BTF and
>> perf_event_read() is not there. With this patch, there are 61552 functions
>> in BTF and perf_event_read() is included.
> Hi Yonghong,
>
> I checked this patch using my local kernel build and do see a
> difference in generated funcs: 47756 vs 47721 funcs with and without.
> Looking at DWARF for the newly added functions, the do indeed have
> DW_OP_entry_value(expected_reg) in their list of locations.
>
>> Reported-by: Song Liu <song@kernel.org>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   dwarf_loader.c | 81 +++++++++++++++++++++++++++++++++++---------------
>>   1 file changed, 57 insertions(+), 24 deletions(-)
>>
>> diff --git a/dwarf_loader.c b/dwarf_loader.c
>> index e0b8c11..1fe44bc 100644
>> --- a/dwarf_loader.c
>> +++ b/dwarf_loader.c
>> @@ -1169,34 +1169,67 @@ static bool check_dwarf_locations(Dwarf_Attribute *attr, struct parameter *parm,
>>   		return false;
>>   
>>   #if _ELFUTILS_PREREQ(0, 157)
>> -	/* dwarf_getlocations() handles location lists; here we are
>> -	 * only interested in the first expr.
>> -	 */
>> -	if (dwarf_getlocations(attr, 0, &base, &start, &end,
>> -			       &loc.expr, &loc.exprlen) > 0 &&
>> -		loc.exprlen != 0) {
>> -		expr = loc.expr;
>> -
>> -		switch (expr->atom) {
>> -		case DW_OP_reg0 ... DW_OP_reg31:
>> -			/* mark parameters that use an unexpected
>> -			 * register to hold a parameter; these will
>> -			 * be problematic for users of BTF as they
>> -			 * violate expectations about register
>> -			 * contents.
>> +	bool reg_matched = false, reg_unmatched = false, first_expr_reg = false, ret = false;
>> +	ptrdiff_t offset = 0;
>> +	int loc_num = -1;
>> +
>> +	while ((offset = dwarf_getlocations(attr, offset, &base, &start, &end, &loc.expr, &loc.exprlen)) > 0 &&
>> +	       loc.exprlen != 0) {
>> +		ret = true;
>> +		loc_num++;
>> +
>> +		for (int i = 0; i < loc.exprlen; i++) {
> I'm not sure if you need to iterate every expression in the list.
> The list is a stack program, so each expression is a command.

For this particular case, we do not need to iterate every expression in
the list. For a location like
     [0xffffffff812c6323, 0xffffffff812c63fe): DW_OP_GNU_entry_value(DW_OP_reg5 RDI), DW_OP_stack_value
We only cares the DW_OP_GNU_entry_value(DW_OP_reg5 RDI), so check the *first*
entry of each location is enough.

>
>> +			Dwarf_Attribute entry_attr;
>> +			Dwarf_Op *entry_ops;
>> +			size_t entry_len;
>> +
>> +			expr = &loc.expr[i];
>> +			switch (expr->atom) {
>> +			case DW_OP_reg0 ... DW_OP_reg31:
>> +				/* first location, first expression */
>> +				if (loc_num == 0 && i == 0) {
>> +					if (expected_reg >= 0) {
>> +						if (expected_reg == expr->atom) {
>> +							reg_matched = true;
> reg_matched is never really used in conditionals, as everywhere it is
> set to 'true' the 'return true' follows.

Right, there is no need to have 'reg_matches' variable.

>
> [...]
>
>> +	if (reg_unmatched)
>> +		parm->unexpected_reg = 1;
>> +	else if (ret && !first_expr_reg)
>> +		parm->optimized = 1;
> It is a bit unfortunate, that parm->optimized is now set in two functions.
> What do you think about the simplification as at the end of this email?

My original intention is to have *clear* separate for two cases,
_ELFUTILS_PREREQ(0, 157) or not. But if we intends to only visit
the first attr of every location, the things indeed can be simplified.

>
> Also, it appears there is some bug either in pahole or in libdw's
> implementation of dwarf_getlocation(). When I try both your patch-set
> and my variant there is a segfault once in a while:
>
>    $ for i in $(seq 1 100); \
>      do echo "---> $i"; \
>         pahole -j --skip_encoding_btf_inconsistent_proto -J --btf_encode_detached=/dev/null vmlinux ; \
>      done
>    ---> 1
>    ...
>    ---> 71
>    Segmentation fault (core dumped)
>    ...
>
> The segfault happens only when -j (multiple threads) is passed.
> If pahole is built with sanitizers
> (passing -DCMAKE_C_FLAGS="-fsanitize=undefined,address")
> the stack trace looks as follows:
>
> AddressSanitizer:DEADLYSIGNAL
> =================================================================
> ==2360650==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x7f20bcb29200 bp 0x7f208f7fc110 sp 0x7f208f7fc0b8 T18)
> ==2360650==The signal is caused by a READ memory access.
> ==2360650==Hint: address points to the zero page.
>      #0 0x7f20bcb29200 in maybe_split_for_insert /usr/src/debug/glibc-2.39-22.fc40.x86_64/misc/tsearch.c:228
>      #1 0x7f20bcb29465 in __GI___tsearch /usr/src/debug/glibc-2.39-22.fc40.x86_64/misc/tsearch.c:358
>      #2 0x7f20bcb29465 in __GI___tsearch /usr/src/debug/glibc-2.39-22.fc40.x86_64/misc/tsearch.c:290
>      #3 0x7f20bd489a51 in tsearch.part.0 (/lib64/libasan.so.8+0x89a51) (BuildId: a4ad7eb954b390cf00f07fa10952988a41d9fc7a)
>      #4 0x7f20bdb07601 in __libdw_intern_expression (/lib64/libdw.so.1+0x35601) (BuildId: b06d0f436023c584e2d618f94f530d9e22671078)
>      #5 0x7f20bdb09c70 in dwarf_getlocation (/lib64/libdw.so.1+0x37c70) (BuildId: b06d0f436023c584e2d618f94f530d9e22671078)
>      #6 0x4912c4 in parameter__new (/home/eddy/work/dwarves-fork/build/pahole+0x4912c4) (BuildId: 2cc4f329c9ca6d293d2044fae936e6a4b0ffa85a)
>      #7 0x495bfa in die__process_function (/home/eddy/work/dwarves-fork/build/pahole+0x495bfa) (BuildId: 2cc4f329c9ca6d293d2044fae936e6a4b0ffa85a)
>      #8 0x498671 in __die__process_tag (/home/eddy/work/dwarves-fork/build/pahole+0x498671) (BuildId: 2cc4f329c9ca6d293d2044fae936e6a4b0ffa85a)
>      #9 0x49c692 in die__process_unit (/home/eddy/work/dwarves-fork/build/pahole+0x49c692) (BuildId: 2cc4f329c9ca6d293d2044fae936e6a4b0ffa85a)
>      #10 0x49cb5f in die__process (/home/eddy/work/dwarves-fork/build/pahole+0x49cb5f) (BuildId: 2cc4f329c9ca6d293d2044fae936e6a4b0ffa85a)
>      #11 0x4a3767 in dwarf_cus__process_cu (/home/eddy/work/dwarves-fork/build/pahole+0x4a3767) (BuildId: 2cc4f329c9ca6d293d2044fae936e6a4b0ffa85a)
>      #12 0x4a3f8f in dwarf_cus__process_cu_thread (/home/eddy/work/dwarves-fork/build/pahole+0x4a3f8f) (BuildId: 2cc4f329c9ca6d293d2044fae936e6a4b0ffa85a)
>      #13 0x7f20bd45df95 in asan_thread_start(void*) (/lib64/libasan.so.8+0x5df95) (BuildId: a4ad7eb954b390cf00f07fa10952988a41d9fc7a)
>      #14 0x7f20bcaa66d6 in start_thread /usr/src/debug/glibc-2.39-22.fc40.x86_64/nptl/pthread_create.c:447
>      #15 0x7f20bcb2a60b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

I didn't notice this one during my test. But as later discussion suggested,
this is something we want to fix.

>
> ---
>
> diff --git a/dwarf_loader.c b/dwarf_loader.c
> index ec8641b..c5c2298 100644
> --- a/dwarf_loader.c
> +++ b/dwarf_loader.c
> @@ -1157,16 +1157,83 @@ static struct template_parameter_pack *template_parameter_pack__new(Dwarf_Die *d
>   	return pack;
>   }
>   
> +static ptrdiff_t __dwarf_getlocations(Dwarf_Attribute *attr,
> +				      ptrdiff_t offset, Dwarf_Addr *basep,
> +				      Dwarf_Addr *startp, Dwarf_Addr *endp,
> +				      Dwarf_Op **expr, size_t *exprlen)
> +{
> +#if _ELFUTILS_PREREQ(0, 157)
> +	return dwarf_getlocations(attr, offset, basep, startp, endp, expr, exprlen);
> +#else
> +	int err;
> +
> +	if (offset)
> +		return 0;
> +
> +	err = dwarf_getlocation(attr, expr, exprlen);
> +	return err < 0 ? err : 1;
> +#endif
> +}
> +
> +/* For DW_AT_location 'attr':
> + * - if first location is DW_OP_regXX with expected number, returns the register;
> + * - if location DW_OP_entry_value(DW_OP_regXX) is in the list, returns the register;
> + * - if first location is DW_OP_regXX, returns the register;
> + * - otherwise returns -1.
> + */
> +static int param_reg_at_entry(Dwarf_Attribute *attr, int expected_reg)
> +{
> +	Dwarf_Op *expr, *entry_ops, *first_expr = NULL;
> +	Dwarf_Addr base, start, end;
> +	Dwarf_Attribute entry_attr;
> +	size_t exprlen, entry_len;
> +	ptrdiff_t offset = 0;
> +        int loc_num = -1;
> +
> +        while ((offset = __dwarf_getlocations(attr, offset, &base, &start, &end, &expr, &exprlen)) > 0) {
> +		loc_num++;
> +
> +		/* Convert expression list (XX DW_OP_stack_value) -> (XX).
> +		 * DW_OP_stack_value instructs interpreter to pop current value from
> +		 * DWARF expression evaluation stack, and thus is not important here.
> +		 */
> +		if (exprlen > 1 && expr[exprlen - 1].atom == DW_OP_stack_value)
> +			exprlen--;
> +
> +		if (exprlen != 1)
> +			continue;
> +
> +		switch (expr->atom) {
> +		/* match DW_OP_regXX at first location */
> +		case DW_OP_reg0 ... DW_OP_reg31:
> +			if (loc_num == 0) {
> +				if(expr->atom == expected_reg)
> +					return expr->atom;
> +				first_expr = expr;
> +			}
> +		/* match DW_OP_entry_value(DW_OP_regXX) at any location */
> +		case DW_OP_entry_value:
> +		case DW_OP_GNU_entry_value:
> +			if (dwarf_getlocation_attr (attr, expr, &entry_attr) == 0 &&
> +			    dwarf_getlocation (&entry_attr, &entry_ops, &entry_len) == 0 &&
> +			    entry_len == 1)
> +				return entry_ops->atom;
> +			break;
> +		}
> +	}
> +	if (first_expr)
> +		return first_expr->atom;
> +	return -1;
> +}
> +
>   static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu,
>   					struct conf_load *conf, int param_idx)
>   {
>   	struct parameter *parm = tag__alloc(cu, sizeof(*parm));
>   
>   	if (parm != NULL) {
> -		Dwarf_Addr base, start, end;
>   		bool has_const_value;
>   		Dwarf_Attribute attr;
> -		struct location loc;
>   
>   		tag__init(&parm->tag, cu, die);
>   		parm->name = attr_string(die, DW_AT_name, conf);
> @@ -1208,35 +1275,21 @@ static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu,
>   		 */
>   		has_const_value = dwarf_attr(die, DW_AT_const_value, &attr) != NULL;
>   		parm->has_loc = dwarf_attr(die, DW_AT_location, &attr) != NULL;
> -		/* dwarf_getlocations() handles location lists; here we are
> -		 * only interested in the first expr.
> -		 */
> -		if (parm->has_loc &&
> -#if _ELFUTILS_PREREQ(0, 157)
> -		    dwarf_getlocations(&attr, 0, &base, &start, &end,
> -				       &loc.expr, &loc.exprlen) > 0 &&
> -#else
> -		    dwarf_getlocation(&attr, &loc.expr, &loc.exprlen) == 0 &&
> -#endif
> -			loc.exprlen != 0) {
> +
> +		if (parm->has_loc) {
>   			int expected_reg = cu->register_params[param_idx];
> -			Dwarf_Op *expr = loc.expr;
> +			int actual_reg = param_reg_at_entry(&attr, expected_reg);
>   
> -			switch (expr->atom) {
> -			case DW_OP_reg0 ... DW_OP_reg31:
> +			if (actual_reg < 0)
> +				parm->optimized = 1;
> +			else if (expected_reg >= 0 && expected_reg != actual_reg)
>   				/* mark parameters that use an unexpected
>   				 * register to hold a parameter; these will
>   				 * be problematic for users of BTF as they
>   				 * violate expectations about register
>   				 * contents.
>   				 */
> -				if (expected_reg >= 0 && expected_reg != expr->atom)
> -					parm->unexpected_reg = 1;
> -				break;
> -			default:
> -				parm->optimized = 1;
> -				break;
> -			}
> +				parm->unexpected_reg = 1;
>   		} else if (has_const_value) {
>   			parm->optimized = 1;
>   		}

Thanks. With only visited the first attr for location list, we can 
indeed have less changes compared to original code.



  parent reply	other threads:[~2024-11-11 18:51 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-08 18:05 [PATCH dwarves 0/3] Check DW_OP_[GNU_]entry_value for possible parameter matching Yonghong Song
2024-11-08 18:05 ` [PATCH dwarves 1/3] dwarf_loader: Refactor function parameter__new() Yonghong Song
2024-11-11 11:26   ` Alan Maguire
2024-11-08 18:05 ` [PATCH dwarves 2/3] dwarf_loader: Refactor function check_dwarf_locations() Yonghong Song
2024-11-08 18:05 ` [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching Yonghong Song
2024-11-10 11:38   ` Eduard Zingerman
2024-11-11  8:01     ` Eduard Zingerman
2024-11-11 12:36       ` Jiri Olsa
2024-11-11 13:42         ` Arnaldo Carvalho de Melo
2024-11-15 16:26           ` elfutils thread-safety (Was: [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching) Mark Wielaard
2024-11-11 18:51     ` Yonghong Song [this message]
2024-11-11  9:54   ` [PATCH dwarves 3/3] dwarf_loader: Check DW_OP_[GNU_]entry_value for possible parameter matching Jiri Olsa
2024-11-11 18:54     ` Yonghong Song
2024-11-11 15:39   ` Alan Maguire
2024-11-12  1:51     ` Yonghong Song
2024-11-12 16:56       ` Alan Maguire
2024-11-12 17:07         ` Yonghong Song
2024-11-12 18:33           ` Alan Maguire
2024-11-12 18:51             ` Yonghong Song
2024-11-12 19:10             ` Arnaldo Carvalho de Melo
2024-11-13 17:33               ` Alan Maguire
2024-11-13 18:27                 ` Yonghong Song
2024-11-14 12:16                   ` Alan Maguire
2024-11-14 16:29                     ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=762f7d67-4d4f-4d94-a2c3-6f6b11913c8d@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=alan.maguire@oracle.com \
    --cc=andrii@kernel.org \
    --cc=arnaldo.melo@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dwarves@vger.kernel.org \
    --cc=eddyz87@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=song@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox