All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Hao Luo <haoluo@google.com>, Andrii Nakryiko <andrii@kernel.org>
Cc: dwarves@vger.kernel.org, bpf@vger.kernel.org, kernel-team@fb.com,
	ast@kernel.org, Andrii Nakryiko <andriin@fb.com>,
	Oleg Rombakh <olegrom@google.com>
Subject: Re: [PATCH v2 dwarves 5/8] btf_encoder: revamp how per-CPU variables are encoded
Date: Fri, 9 Oct 2020 12:53:36 -0300	[thread overview]
Message-ID: <20201009155336.GC322246@kernel.org> (raw)
In-Reply-To: <20201008234000.740660-6-andrii@kernel.org>

Em Thu, Oct 08, 2020 at 04:39:57PM -0700, Andrii Nakryiko escreveu:
> From: Andrii Nakryiko <andriin@fb.com>
> 
> Right now to encode per-CPU variables in BTF, pahole iterates complete vmlinux
> symbol table for each CU. There are 2500 CUs for a typical kernel image.
> Overall, to encode 287 per-CPU variables pahole spends more than 10% of its CPU
> budget, this is incredibly wasteful.

You forgot to add Hao to the Cc list, Hao, can you take a look? I'm
tentatively applying it to my local branch, but would like to hear from
you since you wrote this code.

- Arnaldo

> 
> This patch revamps how this is done. Now it pre-processes symbol table once
> before any of per-CU processing starts. It remembers each per-CPU variable
> symbol, including its address, size, and name. Then during processing each CU,
> binary search is used to correlate DWARF variable with per-CPU symbols and
> figure out if variable belongs to per-CPU data section. If the match is found,
> BTF_KIND_VAR is emitted and var_secinfo is recorded, just like before. At the
> very end, after all CUs are processed, BTF_KIND_DATASEC is emitted with sorted
> variables.
> 
> This change makes per-CPU variables generation overhead pretty negligible and
> returns back about 10% of CPU usage.
> 
> Performance counter stats for './pahole -J /home/andriin/linux-build/default/vmlinux':
> 
> BEFORE:
>       19.160149000 seconds user
>        1.304873000 seconds sys
> 
>          24,114.05 msec task-clock                #    0.999 CPUs utilized
>                 83      context-switches          #    0.003 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>            622,417      page-faults               #    0.026 M/sec
>     72,897,315,125      cycles                    #    3.023 GHz                      (25.02%)
>    127,807,316,959      instructions              #    1.75  insn per cycle           (25.01%)
>     29,087,179,117      branches                  # 1206.234 M/sec                    (25.01%)
>        464,105,921      branch-misses             #    1.60% of all branches          (25.01%)
>     30,252,119,368      L1-dcache-loads           # 1254.543 M/sec                    (25.01%)
>      1,156,336,207      L1-dcache-load-misses     #    3.82% of all L1-dcache hits    (25.05%)
>        343,373,503      LLC-loads                 #   14.240 M/sec                    (25.02%)
>         12,044,977      LLC-load-misses           #    3.51% of all LL-cache hits     (25.01%)
> 
>       24.136198321 seconds time elapsed
> 
>       22.729693000 seconds user
>        1.384859000 seconds sys
> 
> AFTER:
>       16.781455000 seconds user
>        1.343956000 seconds sys
> 
>          23,398.77 msec task-clock                #    1.000 CPUs utilized
>                 86      context-switches          #    0.004 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>            622,420      page-faults               #    0.027 M/sec
>     68,395,641,468      cycles                    #    2.923 GHz                      (25.05%)
>    114,241,327,034      instructions              #    1.67  insn per cycle           (25.01%)
>     26,330,711,718      branches                  # 1125.303 M/sec                    (25.01%)
>        465,926,869      branch-misses             #    1.77% of all branches          (25.00%)
>     24,662,984,772      L1-dcache-loads           # 1054.029 M/sec                    (25.00%)
>      1,054,052,064      L1-dcache-load-misses     #    4.27% of all L1-dcache hits    (25.00%)
>        340,970,622      LLC-loads                 #   14.572 M/sec                    (25.00%)
>         16,032,297      LLC-load-misses           #    4.70% of all LL-cache hits     (25.03%)
> 
>       23.402259654 seconds time elapsed
> 
>       21.916437000 seconds user
>        1.482826000 seconds sys
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  btf_encoder.c | 248 +++++++++++++++++++++++++++++---------------------
>  libbtf.c      |   6 +-
>  libbtf.h      |   1 +
>  3 files changed, 148 insertions(+), 107 deletions(-)
> 
> diff --git a/btf_encoder.c b/btf_encoder.c
> index 2e5df03e040f..2a6455be4c52 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -17,6 +17,7 @@
>  #include "btf_encoder.h"
>  
>  #include <ctype.h> /* for isalpha() and isalnum() */
> +#include <stdlib.h> /* for qsort() and bsearch() */
>  #include <inttypes.h>
>  
>  /*
> @@ -53,18 +54,18 @@ static bool btf_name_valid(const char *p)
>  	return !*p;
>  }
>  
> -static void dump_invalid_symbol(const char *msg, const char *sym, const char *cu,
> +static void dump_invalid_symbol(const char *msg, const char *sym,
>  				int verbose, bool force)
>  {
>  	if (force) {
>  		if (verbose)
> -			fprintf(stderr, "PAHOLE: Warning: %s, ignored (sym: '%s', cu: '%s').\n",
> -				msg, sym, cu);
> +			fprintf(stderr, "PAHOLE: Warning: %s, ignored (sym: '%s').\n",
> +				msg, sym);
>  		return;
>  	}
>  
> -	fprintf(stderr, "PAHOLE: Error: %s (sym: '%s', cu: '%s').\n", msg, sym, cu);
> -	fprintf(stderr, "PAHOLE: Error: Use '-j' or '--force' to ignore such symbols and force emit the btf.\n");
> +	fprintf(stderr, "PAHOLE: Error: %s (sym: '%s').\n", msg, sym);
> +	fprintf(stderr, "PAHOLE: Error: Use '--btf_encode_force' to ignore such symbols and force emit the btf.\n");
>  }
>  
>  extern struct debug_fmt_ops *dwarves__active_loader;
> @@ -202,6 +203,9 @@ int btf_encoder__encode()
>  {
>  	int err;
>  
> +	if (gobuffer__size(&btfe->percpu_secinfo) != 0)
> +		btf_elf__add_datasec_type(btfe, PERCPU_SECTION, &btfe->percpu_secinfo);
> +
>  	err = btf_elf__encode(btfe, 0);
>  	btf_elf__delete(btfe);
>  	btfe = NULL;
> @@ -209,24 +213,117 @@ int btf_encoder__encode()
>  	return err;
>  }
>  
> -#define HASHADDR__BITS 8
> -#define HASHADDR__SIZE (1UL << HASHADDR__BITS)
> -#define hashaddr__fn(key) hash_64(key, HASHADDR__BITS)
> +#define MAX_PERCPU_VAR_CNT 4096
> +
> +struct var_info {
> +	uint64_t addr;
> +	uint32_t sz;
> +	const char *name;
> +};
> +
> +static struct var_info percpu_vars[MAX_PERCPU_VAR_CNT];
> +static int percpu_var_cnt;
> +
> +static int percpu_var_cmp(const void *_a, const void *_b)
> +{
> +	const struct var_info *a = _a;
> +	const struct var_info *b = _b;
> +
> +	if (a->addr == b->addr)
> +		return 0;
> +	return a->addr < b->addr ? -1 : 1;
> +}
> +
> +static bool percpu_var_exists(uint64_t addr, uint32_t *sz, const char **name)
> +{
> +	const struct var_info *p;
> +	struct var_info key = { .addr = addr };
> +
> +	p = bsearch(&key, percpu_vars, percpu_var_cnt,
> +		    sizeof(percpu_vars[0]), percpu_var_cmp);
> +
> +	if (!p)
> +		return false;
> +
> +	*sz = p->sz;
> +	*name = p->name;
> +	return true;
> +}
>  
> -static struct variable *hashaddr__find_variable(const struct hlist_head hashtable[],
> -						const uint64_t addr)
> +static int find_all_percpu_vars(struct btf_elf *btfe)
>  {
> -	struct variable *variable;
> -	struct hlist_node *pos;
> -	uint16_t bucket = hashaddr__fn(addr);
> -	const struct hlist_head *head = &hashtable[bucket];
> -
> -	hlist_for_each_entry(variable, pos, head, tool_hnode) {
> -		if (variable->ip.addr == addr)
> -			return variable;
> +	uint32_t core_id;
> +	GElf_Sym sym;
> +
> +	/* cache variables' addresses, preparing for searching in symtab. */
> +	percpu_var_cnt = 0;
> +
> +	/* search within symtab for percpu variables */
> +	elf_symtab__for_each_symbol(btfe->symtab, core_id, sym) {
> +		const char *sym_name;
> +		uint64_t addr;
> +		uint32_t size;
> +
> +		/* compare a symbol's shndx to determine if it's a percpu variable */
> +		if (elf_sym__section(&sym) != btfe->percpu_shndx)
> +			continue;
> +		if (elf_sym__type(&sym) != STT_OBJECT)
> +			continue;
> +
> +		addr = elf_sym__value(&sym);
> +		/*
> +		 * Store only those symbols that have allocated space in the percpu section.
> +		 * This excludes the following three types of symbols:
> +		 *
> +		 *  1. __ADDRESSABLE(sym), which are forcely emitted as symbols.
> +		 *  2. __UNIQUE_ID(prefix), which are introduced to generate unique ids.
> +		 *  3. __exitcall(fn), functions which are labeled as exit calls.
> +		 *
> +		 * In addition, the variables defined using DEFINE_PERCPU_FIRST are
> +		 * also not included, which currently includes:
> +		 *
> +		 *  1. fixed_percpu_data
> +		 */
> +		if (!addr)
> +			continue;
> +
> +		sym_name = elf_sym__name(&sym, btfe->symtab);
> +		if (!btf_name_valid(sym_name)) {
> +			dump_invalid_symbol("Found symbol of invalid name when encoding btf",
> +					    sym_name, btf_elf__verbose, btf_elf__force);
> +			if (btf_elf__force)
> +				continue;
> +			return -1;
> +		}
> +		size = elf_sym__size(&sym);
> +		if (!size) {
> +			dump_invalid_symbol("Found symbol of zero size when encoding btf",
> +					    sym_name, btf_elf__verbose, btf_elf__force);
> +			if (btf_elf__force)
> +				continue;
> +			return -1;
> +		}
> +
> +		if (btf_elf__verbose)
> +			printf("Found per-CPU symbol '%s' at address 0x%lx\n", sym_name, addr);
> +
> +		if (percpu_var_cnt == MAX_PERCPU_VAR_CNT) {
> +			fprintf(stderr, "Reached the limit of per-CPU variables: %d\n",
> +				MAX_PERCPU_VAR_CNT);
> +			return -1;
> +		}
> +		percpu_vars[percpu_var_cnt].addr = addr;
> +		percpu_vars[percpu_var_cnt].sz = size;
> +		percpu_vars[percpu_var_cnt].name = sym_name;
> +		percpu_var_cnt++;
>  	}
>  
> -	return NULL;
> +	if (percpu_var_cnt)
> +		qsort(percpu_vars, percpu_var_cnt, sizeof(percpu_vars[0]), percpu_var_cmp);
> +
> +	if (btf_elf__verbose)
> +		printf("Found %d per-CPU variables!\n", percpu_var_cnt);
> +	return 0;
>  }
>  
>  int cu__encode_btf(struct cu *cu, int verbose, bool force,
> @@ -234,13 +331,10 @@ int cu__encode_btf(struct cu *cu, int verbose, bool force,
>  {
>  	uint32_t type_id_off;
>  	uint32_t core_id;
> +	struct variable *var;
>  	struct function *fn;
>  	struct tag *pos;
>  	int err = 0;
> -	struct hlist_head hash_addr[HASHADDR__SIZE];
> -	struct variable *var;
> -	bool has_global_var = false;
> -	GElf_Sym sym;
>  
>  	if (btfe && strcmp(btfe->filename, cu->filename)) {
>  		err = btf_encoder__encode();
> @@ -257,6 +351,9 @@ int cu__encode_btf(struct cu *cu, int verbose, bool force,
>  		if (!btfe)
>  			return -1;
>  
> +		if (!skip_encoding_vars && find_all_percpu_vars(btfe))
> +			goto out;
> +
>  		has_index_type = false;
>  		need_index_type = false;
>  		array_index_id = 0;
> @@ -278,6 +375,7 @@ int cu__encode_btf(struct cu *cu, int verbose, bool force,
>  	}
>  
>  	btf_elf__verbose = verbose;
> +	btf_elf__force = force;
>  	type_id_off = btf__get_nr_types(btfe->btf);
>  
>  	cu__for_each_type(cu, core_id, pos) {
> @@ -325,12 +423,11 @@ int cu__encode_btf(struct cu *cu, int verbose, bool force,
>  	if (verbose)
>  		printf("search cu '%s' for percpu global variables.\n", cu->name);
>  
> -	/* cache variables' addresses, preparing for searching in symtab. */
> -	for (core_id = 0; core_id < HASHADDR__SIZE; ++core_id)
> -		INIT_HLIST_HEAD(&hash_addr[core_id]);
> -
>  	cu__for_each_variable(cu, core_id, pos) {
> -		struct hlist_head *head;
> +		uint32_t size, type, linkage, offset;
> +		const char *name;
> +		uint64_t addr;
> +		int id;
>  
>  		var = tag__variable(pos);
>  		if (var->declaration && !var->spec)
> @@ -338,89 +435,37 @@ int cu__encode_btf(struct cu *cu, int verbose, bool force,
>  		/* percpu variables are allocated in global space */
>  		if (variable__scope(var) != VSCOPE_GLOBAL && !var->spec)
>  			continue;
> -		has_global_var = true;
> -		head = &hash_addr[hashaddr__fn(var->ip.addr)];
> -		hlist_add_head(&var->tool_hnode, head);
> -	}
> -	if (!has_global_var) {
> -		if (verbose)
> -			printf("cu has no global variable defined, skip.\n");
> -		goto out;
> -	}
> -
> -	/* search within symtab for percpu variables */
> -	elf_symtab__for_each_symbol(btfe->symtab, core_id, sym) {
> -		uint32_t linkage, type, size, offset;
> -		int32_t btf_var_id, btf_var_secinfo_id;
> -		uint64_t addr;
> -		const char *sym_name;
> -
> -		/* compare a symbol's shndx to determine if it's a percpu variable */
> -		if (elf_sym__section(&sym) != btfe->percpu_shndx)
> -			continue;
> -		if (elf_sym__type(&sym) != STT_OBJECT)
> -			continue;
>  
> -		addr = elf_sym__value(&sym);
> -		/*
> -		 * Store only those symbols that have allocated space in the percpu section.
> -		 * This excludes the following three types of symbols:
> -		 *
> -		 *  1. __ADDRESSABLE(sym), which are forcely emitted as symbols.
> -		 *  2. __UNIQUE_ID(prefix), which are introduced to generate unique ids.
> -		 *  3. __exitcall(fn), functions which are labeled as exit calls.
> -		 *
> -		 * In addition, the variables defined using DEFINE_PERCPU_FIRST are
> -		 * also not included, which currently includes:
> -		 *
> -		 *  1. fixed_percpu_data
> -		 */
> -		if (!addr)
> -			continue;
> -		var = hashaddr__find_variable(hash_addr, addr);
> -		if (var == NULL)
> -			continue;
> +		/* addr has to be recorded before we follow spec */
> +		addr = var->ip.addr;
>  		if (var->spec)
>  			var = var->spec;
>  
> -		sym_name = elf_sym__name(&sym, btfe->symtab);
> -		if (!btf_name_valid(sym_name)) {
> -			dump_invalid_symbol("Found symbol of invalid name when encoding btf",
> -					    sym_name, cu->name, verbose, force);
> -			if (force)
> -				continue;
> -			err = -1;
> -			break;
> -		}
>  		if (var->ip.tag.type == 0) {
> -			dump_invalid_symbol("Found symbol of void type when encoding btf",
> -					    sym_name, cu->name, verbose, force);
> -			if (force)
> -				continue;
> -			err = -1;
> -			break;
> -		}
> -		type = type_id_off + var->ip.tag.type;
> -		size = elf_sym__size(&sym);
> -		if (!size) {
> -			dump_invalid_symbol("Found symbol of zero size when encoding btf",
> -					    sym_name, cu->name, verbose, force);
> +			fprintf(stderr, "error: found variable in CU '%s' that has void type\n",
> +				cu->name);
>  			if (force)
>  				continue;
>  			err = -1;
>  			break;
>  		}
>  
> -		if (verbose)
> -			printf("symbol '%s' of address 0x%lx encoded\n",
> -			       sym_name, addr);
> +		type = var->ip.tag.type + type_id_off;
> +		linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : BTF_VAR_STATIC;
> +		if (!percpu_var_exists(addr, &size, &name))
> +			continue; /* not a per-CPU variable */
> +
> +		if (btf_elf__verbose) {
> +			printf("Variable '%s' from CU '%s' at address 0x%lx encoded\n",
> +			       name, cu->name, addr);
> +		}
>  
>  		/* add a BTF_KIND_VAR in btfe->types */
> -		linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : BTF_VAR_STATIC;
> -		btf_var_id = btf_elf__add_var_type(btfe, type, sym_name, linkage);
> -		if (btf_var_id < 0) {
> +		id = btf_elf__add_var_type(btfe, type, name, linkage);
> +		if (id < 0) {
>  			err = -1;
> -			printf("error: failed to encode variable '%s'\n", sym_name);
> +			fprintf(stderr, "error: failed to encode variable '%s' at addr 0x%lx\n",
> +			        name, addr);
>  			break;
>  		}
>  
> @@ -428,13 +473,12 @@ int cu__encode_btf(struct cu *cu, int verbose, bool force,
>  		 * add a BTF_VAR_SECINFO in btfe->percpu_secinfo, which will be added into
>  		 * btfe->types later when we add BTF_VAR_DATASEC.
>  		 */
> -		type = btf_var_id;
>  		offset = addr - btfe->percpu_base_addr;
> -		btf_var_secinfo_id = btf_elf__add_var_secinfo(&btfe->percpu_secinfo,
> -							      type, offset, size);
> -		if (btf_var_secinfo_id < 0) {
> +		id = btf_elf__add_var_secinfo(&btfe->percpu_secinfo, id, offset, size);
> +		if (id < 0) {
>  			err = -1;
> -			printf("error: failed to encode var secinfo '%s'\n", sym_name);
> +			fprintf(stderr, "error: failed to encode section info for variable '%s' at addr 0x%lx\n",
> +			        name, addr);
>  			break;
>  		}
>  	}
> diff --git a/libbtf.c b/libbtf.c
> index 0467f1f2a596..27aa3e5a986e 100644
> --- a/libbtf.c
> +++ b/libbtf.c
> @@ -28,6 +28,7 @@
>  #include "elf_symtab.h"
>  
>  uint8_t btf_elf__verbose;
> +uint8_t btf_elf__force;
>  
>  static int btf_var_secinfo_cmp(const void *a, const void *b)
>  {
> @@ -62,7 +63,6 @@ int btf_elf__load(struct btf_elf *btfe)
>  	return 0;
>  }
>  
> -
>  struct btf_elf *btf_elf__new(const char *filename, Elf *elf)
>  {
>  	struct btf_elf *btfe = zalloc(sizeof(*btfe));
> @@ -771,10 +771,6 @@ int btf_elf__encode(struct btf_elf *btfe, uint8_t flags)
>  {
>  	struct btf *btf = btfe->btf;
>  
> -	if (gobuffer__size(&btfe->percpu_secinfo) != 0)
> -		btf_elf__add_datasec_type(btfe, PERCPU_SECTION,
> -					  &btfe->percpu_secinfo);
> -
>  	/* Empty file, nothing to do, so... done! */
>  	if (btf__get_nr_types(btf) == 0)
>  		return 0;
> diff --git a/libbtf.h b/libbtf.h
> index 9b3d396da31f..887b5bc55c8e 100644
> --- a/libbtf.h
> +++ b/libbtf.h
> @@ -30,6 +30,7 @@ struct btf_elf {
>  };
>  
>  extern uint8_t btf_elf__verbose;
> +extern uint8_t btf_elf__force;
>  #define btf_elf__verbose_log(fmt, ...) { if (btf_elf__verbose) printf(fmt, __VA_ARGS__); }
>  
>  #define PERCPU_SECTION ".data..percpu"
> -- 
> 2.24.1
> 

-- 

- Arnaldo

  reply	other threads:[~2020-10-09 15:53 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-08 23:39 [PATCH v2 dwarves 0/8] Switch BTF loading and encoding to libbpf APIs Andrii Nakryiko
2020-10-08 23:39 ` [PATCH v2 dwarves 1/8] btf_loader: use libbpf to load BTF Andrii Nakryiko
2020-10-09 14:53   ` Arnaldo Carvalho de Melo
2020-10-09 17:49     ` Andrii Nakryiko
2020-10-08 23:39 ` [PATCH v2 dwarves 2/8] btf_encoder: use libbpf APIs to encode BTF type info Andrii Nakryiko
2020-10-08 23:39 ` [PATCH v2 dwarves 3/8] btf_encoder: fix emitting __ARRAY_SIZE_TYPE__ as index range type Andrii Nakryiko
2020-10-08 23:39 ` [PATCH v2 dwarves 4/8] btf_encoder: discard CUs after BTF encoding Andrii Nakryiko
2020-10-09 15:47   ` Arnaldo Carvalho de Melo
2020-10-08 23:39 ` [PATCH v2 dwarves 5/8] btf_encoder: revamp how per-CPU variables are encoded Andrii Nakryiko
2020-10-09 15:53   ` Arnaldo Carvalho de Melo [this message]
2020-10-08 23:39 ` [PATCH v2 dwarves 6/8] dwarf_loader: increase the size of lookup hash map Andrii Nakryiko
2020-10-08 23:39 ` [PATCH v2 dwarves 7/8] strings: use BTF's string APIs for strings management Andrii Nakryiko
2020-10-08 23:40 ` [PATCH v2 dwarves 8/8] btf_encoder: support cross-compiled ELF binaries with different endianness Andrii Nakryiko
2020-10-09 16:22 ` [PATCH v2 dwarves 0/8] Switch BTF loading and encoding to libbpf APIs Arnaldo Carvalho de Melo
2020-10-09 17:59   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201009155336.GC322246@kernel.org \
    --to=acme@kernel.org \
    --cc=andrii@kernel.org \
    --cc=andriin@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=dwarves@vger.kernel.org \
    --cc=haoluo@google.com \
    --cc=kernel-team@fb.com \
    --cc=olegrom@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.