Re: [PATCH v7 00/11] kallsyms: Optimizes the performance of lookup symbols

Live Patching
 help / color / mirror / Atom feed

From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>,
	Jiri Kosina <jikos@kernel.org>, Miroslav Benes <mbenes@suse.cz>,
	Petr Mladek <pmladek@suse.com>,
	Joe Lawrence <joe.lawrence@redhat.com>,
	<live-patching@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>, Jiri Olsa <jolsa@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	<linux-modules@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Ingo Molnar" <mingo@redhat.com>
Subject: Re: [PATCH v7 00/11] kallsyms: Optimizes the performance of lookup symbols
Date: Mon, 31 Oct 2022 12:55:49 +0800	[thread overview]
Message-ID: <b7215b83-11ab-6db6-bd7f-9729725eaaeb@huawei.com> (raw)
In-Reply-To: <712fae84-aadc-7d29-f311-a3352bab6346@huawei.com>



On 2022/10/29 16:10, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/10/27 14:27, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/10/27 11:26, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2022/10/27 3:03, Luis Chamberlain wrote:
>>>> On Wed, Oct 26, 2022 at 02:44:36PM +0800, Leizhen (ThunderTown) wrote:
>>>>> On 2022/10/26 1:53, Luis Chamberlain wrote:
>>>>>> This answers how we don't use a hash table, the question was *should* we
>>>>>> use one?
>>>>>
>>>>> I'm not the original author, and I can only answer now based on my understanding. Maybe
>>>>> the original author didn't think of the hash method, or he has weighed it out.
>>>>>
>>>>> Hash is a good solution if only performance is required and memory overhead is not
>>>>> considered. Using hash will increase the memory size by up to "4 * kallsyms_num_syms +
>>>>> 4 * ARRAY_SIZE(hashtable)" bytes, kallsyms_num_syms is about 1-2 million.
>>
>> Sorry, 1-2 million ==> 0.1~0.2 million
>>
>>>>>
>>>>> Because I don't know what hash algorithm will be used, the cost of generating the
>>>>> hash value corresponding to the symbol name is unknown now. But I think it's gonna
>>>>> be small. But it definitely needs a simpler algorithm, the tool needs to implement
>>>>> the same hash algorithm.
>>>>
>>>> For instance, you can look at evaluating if alloc_large_system_hash() would help.
>>>
> 
> The following three hash algorithms are compared. The kernel is compiled by defconfig
> on arm64.
> 
> |---------------------------------------------------------------------------------------|
> |                           |             hash &= HASH_TABLE_SIZE - 1                   |
> |                           |             number of conflicts >= 1000                   |
> |---------------------------------------------------------------------------------------|
> |   ARRAY_SIZE(hash_table)  |       crc16        | jhash_one_at_a_time | string hash_32 |
> |---------------------------------------------------------------------------------------|
> |                           |     345b: 3905     |     0d40: 1013      |   4a4c: 6548   |
> |                           |     35bb: 1016     |     35ce: 6549      |   883a: 1015   |
> |        0x10000            |     385b: 6548     |     4440: 19126     |   d05f: 19129  |
> |                           |     f0ba: 19127    |     7ebe: 3916      |   ecda: 3903   |
> |---------------------------------------------------------------------------------------|
> |                           |      0ba: 19168    |      440: 19165     |    05f: 19170  |
> |                           |      45b: 3955     |      5ce: 6577      |    83a: 1066   |
> |        0x1000             |      5bb: 1069     |      d40: 1052      |    a4c: 6609   |
> |                           |      85b: 6582     |      ebe: 3938      |    cda: 3924   |
> |---------------------------------------------------------------------------------------|
> 
> Based on the above test results, I conclude that:
> 1. For the worst-case scenario, the three algorithms are not much different. But the kernel
>    only implements crc16 and string hash_32. The latter is not processed byte-by-byte, so
>    it is coupled with byte order and sizeof(long). So crc16 is the best choice.
> 2. For the worst-case scenario, there are almost 19K strings are mapped to the same hash
>    value，just over 1/10 of the total. And with my current compression-then-comparison
>    approach, it's 25-30 times faster. So there's still a need for my current approach, and
>    they can be combined.
>    if (nr_conflicts(key) >= CONST_N) {
>        newname = compress(name);
>        for_each_name_in_slot(key): compare(new_name)
>    } else {
>        for_each_name_in_slot(key): compare(name)
>    }
> 
>    Above CONST_N can be roughly calculated:
>    time_of_compress(name) + N * time_of_compare(new_name) <= N * time_of_compare(name)
> 3. For the worst-case scenario, there is no obvious difference between ARRAY_SIZE(hash_table)
>    0x10000 and 0x1000. So ARRAY_SIZE(hash_table)=0x1000 is enough.
>    Statistic information:
>    |------------------------------------------------------|
>    |   nr_conflicts(key)  |     ARRAY_SIZE(hash_table)    |
>    |------------------------------------------------------|
>    |         <= ?         |     0x1000    |    0x10000    |
>    |------------------------------------------------------|
>    |             0        |         0     |      7821     |
>    |            20        |        19     |     57375     |
>    |            40        |      2419     |       124     |
>    |            60        |      1343     |        70     |
>    |            80        |       149     |        73     |
>    |           100        |        38     |        49     |
>    |           200        |       108     |        16     |
>    |           400        |        14     |         2     |
>    |           600        |         2     |         2     |
>    |           800        |         0     |         0     |
>    |          1000        |         0     |         0     |
>    |        100000        |         4     |         4     |
>    |------------------------------------------------------|
> 
> 
> Also, I re-calculated:
> Using hash will increase the memory size by up to "6 * kallsyms_num_syms + 4 * ARRAY_SIZE(hashtable)"
>                                                    |---- What I said earlier was 4
> The increased size is close to 1 MB if CONFIG_KALLSYMS_ALL=y.
> 
> Hi, Luis:
>   For the reasons of the above-mentioned second conclusion. And except for patches 4-6,
> even if only the hash method is used, other patches and option "--lto-clang" in patch 6/11
> are also needed. If you don't mind, I'd like to use hash at the next stage. The current
> patch set is already huge.

I just had an update in response to David Laight's email. The hash solution is like
a centrist. It doesn't seem very feasible.

Now, we need to make a decision. Choose one of the two:
1. Continue with my current approach. Improve the average performance of
   kallsyms_lookup_name() by 20 to 30 times. The memory overhead is increased by:
   arm64 (defconfig):
     73.5KiB and 4.0% if CONFIG_KALLSYMS_ALL=y.
     19.8KiB and 2.8% if CONFIG_KALLSYMS_ALL=n.
   x86 (defconfig):
     49.0KiB and 3.0% if CONFIG_KALLSYMS_ALL=y.
     16.8KiB and 2.3% if CONFIG_KALLSYMS_ALL=n.
2. Sort names, binary search (The static function causes duplicate names. Additional work is required）
   2^18=262144, only up to 18 symbol expansions and comparisons are required.
   The performance is definitely excellent, although I haven't tested it yet.
   The memory overhead is increased by: 6 * kallsyms_num_syms
   arm64 (defconfig):
       1MiB if CONFIG_KALLSYMS_ALL=y.
     362KiB if CONFIG_KALLSYMS_ALL=n.
   x86 (defconfig):
     770KiB if CONFIG_KALLSYMS_ALL=y.
     356KiB if CONFIG_KALLSYMS_ALL=n.




> 
> 
>>> OK, I found the right hash function. In this way, the tool does not need to consider
>>> the byte order.
>>
>> https://en.wikipedia.org/wiki/Jenkins_hash_function
>>
>> Let's go with jenkins_one_at_a_time_hash(), which looks simpler and doesn't even
>> have to think about sizeof(long). It seems to be closest to our current needs.
>>
>> uint32_t jenkins_one_at_a_time_hash(const uint8_t* key, size_t length) {
>> 	size_t i = 0;
>> 	uint32_t hash = 0;
>>
>> 	while (i != length) {
>> 		hash += key[i++];
>> 		hash += hash << 10;
>> 		hash ^= hash >> 6;
>> 	}
>> 	hash += hash << 3;
>> 	hash ^= hash >> 11;
>> 	hash += hash << 15;
>>
>> 	return hash;
>> }
>>
>>>
>>> include/linux/stringhash.h
>>>
>>> /*
>>>  * Version 1: one byte at a time.  Example of use:
>>>  *
>>>  * unsigned long hash = init_name_hash;
>>>  * while (*p)
>>>  *      hash = partial_name_hash(tolower(*p++), hash);
>>>  * hash = end_name_hash(hash);
>>>
>>>
>>>>
>>>>   Luis
>>>> .
>>>>
>>>
>>
> 

-- 
Regards,
  Zhen Lei

next prev parent reply	other threads:[~2022-10-31  4:56 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-17  6:49 [PATCH v7 00/11] kallsyms: Optimizes the performance of lookup symbols Zhen Lei
2022-10-17  6:49 ` [PATCH v7 01/11] scripts/kallsyms: rename build_initial_tok_table() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 02/11] scripts/kallsyms: don't compress symbol types Zhen Lei
2022-10-17  6:49 ` [PATCH v7 03/11] scripts/kallsyms: remove helper sym_name() and cleanup Zhen Lei
2022-10-17  6:49 ` [PATCH v7 04/11] kallsyms: Add helper kallsyms_compress_symbol_name() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 05/11] kallsyms: Improve the performance of kallsyms_lookup_name() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 06/11] kallsyms: Improve the performance of kallsyms_lookup_name() when CONFIG_LTO_CLANG=y Zhen Lei
2022-10-17  6:49 ` [PATCH v7 07/11] kallsyms: Add helper kallsyms_on_each_match_symbol() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 08/11] livepatch: Use kallsyms_on_each_match_symbol() to improve performance Zhen Lei
2022-10-17  6:49 ` [PATCH v7 09/11] livepatch: Improve the search performance of module_kallsyms_on_each_symbol() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 10/11] kallsyms: Delete an unused parameter related to kallsyms_on_each_symbol() Zhen Lei
2022-10-17  6:49 ` [PATCH v7 11/11] kallsyms: Add self-test facility Zhen Lei
     [not found]   ` <202210181636.S8XlpSMd-lkp@intel.com>
2022-10-18  9:11     ` Leizhen (ThunderTown)
     [not found]   ` <202210181740.PAAHM5dR-lkp@intel.com>
2022-10-19  8:39     ` Leizhen (ThunderTown)
2022-10-19 12:01 ` [PATCH v7 00/11] kallsyms: Optimizes the performance of lookup symbols Luis Chamberlain
2022-10-19 14:11   ` Leizhen (ThunderTown)
2022-10-25 17:53     ` Luis Chamberlain
2022-10-26  6:44       ` Leizhen (ThunderTown)
2022-10-26 19:03         ` Luis Chamberlain
2022-10-27  3:26           ` Leizhen (ThunderTown)
2022-10-27  6:27             ` Leizhen (ThunderTown)
2022-10-29  8:10               ` Leizhen (ThunderTown)
2022-10-29 12:49                 ` David Laight
2022-10-31  2:55                   ` Leizhen (ThunderTown)
2022-10-31  4:55                 ` Leizhen (ThunderTown) [this message]
2022-10-31 15:04                   ` Leizhen (ThunderTown)
2022-11-02  9:18                     ` Leizhen (ThunderTown)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b7215b83-11ab-6db6-bd7f-9729725eaaeb@huawei.com \
    --to=thunder.leizhen@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=ast@kernel.org \
    --cc=jikos@kernel.org \
    --cc=joe.lawrence@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=jpoimboe@kernel.org \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-modules@vger.kernel.org \
    --cc=live-patching@vger.kernel.org \
    --cc=masahiroy@kernel.org \
    --cc=mbenes@suse.cz \
    --cc=mcgrof@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox