From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-ej1-f44.google.com (mail-ej1-f44.google.com [209.85.218.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C919513D638 for ; Thu, 5 Sep 2024 22:33:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.44 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725575587; cv=none; b=Oyl/6wEcMoSB1F3fisyKU0ULIWD4ZvFU5MzzpZi01GQZEuN75S6+0TAVl8zwApO+bM+VxkrTPbxSJcqzWWUekTyO5KOjadEcT4zEGVDnF0wn4C919XzDPfSwzb+RAqfetZK5BvVGiNDJuESytZC/sMBIMqFJy2cvIyupEQbXFsY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725575587; c=relaxed/simple; bh=3vhJp0gafXSxj7qTeKBhXvIj6lC2WvRDfJr5ePsuRmQ=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=LwWDEWMzzAj8XnjwwJFCBmRB2Q38Y9PJfzh1gScMHuasckRyWrn/upqc5yhwXuVUPAdCx/DzIJ6ZRynF9XLnxra4MEWCPhgOyQs3Sklqb5fzBE3bOYhu477QyyxMgxouv/4QbAuctZJT1gGIcClFJM/zWEhh93/U+3jymlCgEfA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gewDuuRF; arc=none smtp.client-ip=209.85.218.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gewDuuRF" Received: by mail-ej1-f44.google.com with SMTP id a640c23a62f3a-a86e5e9ff05so171654066b.1 for ; Thu, 05 Sep 2024 15:33:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725575584; x=1726180384; darn=vger.kernel.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=Kr6KEjzL0wHorbPM+3um6VKRgvbYN5MldEnXoiOdpQY=; b=gewDuuRFZrKO5gvfSCUDo6T1+ecqQKituwAi6U4KN3Iw3nAIvBr8CaG1Njl7MmJHkj tIJ6Ei40prVLfnMoWv/WnawFwnHhdhR8a8w9alXxpX++ouAEvLxpmllkt2Y6rA09MzWB hJKA12T50lp+HUn+vHMwgMiMNhqKBj38Jq5M5mcZjFkqHExOkU0yXfEuvS/gIKFXeTXY h76U67B8f3Ml+usCzb1JIUkT2HCUt0sVBlPImDF9J8y/EPn/jTLk52Yydr0DyxHaeW/0 pNQ7kEvWl3YvSTaC8z/pjCoa/toliTA6riTKiIBatGId8ji/FI0WqcBmHYY2CY56vWe0 1/tw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725575584; x=1726180384; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Kr6KEjzL0wHorbPM+3um6VKRgvbYN5MldEnXoiOdpQY=; b=blR30R175JwJe++T5taP4GX7QM/ZzO7J1XKQgNItSW5LtqIizCZpzRCChAZjWPHcaa eRTy1+qw7t45GaWeoAd9/dZEm5J4njIujvWpEpX6UIADQCYSJnwTC4oLfMGRusZFAOwt d1J8gfi5HeJIJZ7JO52KeF1BMbUhuAyEufHRAVCIs6XimvJSHds9UC20m4ZaBwOtCDGz z1nqQd+Aa77lmRB9iny+7+LjgRZQwV7kxSItdjupj1sN7nTcPYa1vCGaPiDwBbC1qdg4 glSi/hlU2vc319ceY0vIwcYY/0sneC6PCB/PEytPCw5gDvkxpvLbHjcJaAxAhIcF/Uxx lfJw== X-Gm-Message-State: AOJu0YzwGrTn/uEkLmsMl8tsZEkAA502zMg3nWPzJq2qN+DhZkSNTpZd NOPkgPNmSKeNjstuLhDZjDkccI1jvMUh9Izc7RVufRsQgSi71SeW X-Google-Smtp-Source: AGHT+IGd861SMy6tZKoNUTkzRrj2GoQ91hhAeXOuj1xCDBJul6j8XFsaXHynKe0L5lOfc/anju3ubw== X-Received: by 2002:a17:907:7421:b0:a8a:8d81:97a8 with SMTP id a640c23a62f3a-a8a8d81aa63mr2849366b.1.1725575583154; Thu, 05 Sep 2024 15:33:03 -0700 (PDT) Received: from [192.168.0.24] (walt-20-b2-v4wan-167837-cust573.vm13.cable.virginm.net. [80.2.18.62]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a8a7e68a345sm71680566b.121.2024.09.05.15.33.02 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 05 Sep 2024 15:33:02 -0700 (PDT) Message-ID: Date: Thu, 5 Sep 2024 23:33:01 +0100 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH bpf-next] bpftool: improve btf c dump sorting stability To: Andrii Nakryiko Cc: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org, daniel@iogearbox.net, kafai@meta.com, kernel-team@meta.com, Mykyta Yatsenko References: <20240905201206.648932-1-mykyta.yatsenko5@gmail.com> Content-Language: en-US From: Mykyta Yatsenko In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 05/09/2024 21:31, Andrii Nakryiko wrote: > On Thu, Sep 5, 2024 at 1:12 PM Mykyta Yatsenko > wrote: >> From: Mykyta Yatsenko >> >> Existing algorithm for BTF C dump sorting uses only types and names of >> the structs and unions for ordering. As dump contains structs with the >> same names but different contents, relative to each other ordering of >> those structs will be accidental. >> This patch addresses this problem by introducing a new sorting field >> that contains hash of the struct/union field names and types to >> disambiguate comparison of the non-unique named structs. >> > Did you check how stable generated vmlinux.h now is when just > rebuilding kernel with no actual changes? Does it still have some > variation? It's stable with this change, no variation at all (I tested up to about 5 times in a row rebuilding kernel and generating vmlinux.h), though, I would not be surprised if some edge case for same-named structs is not covered. It may get triggered in future by some change in kernel structures. > LGTM, but let's keep the btf_type_sort_name() as is? See below. > >> Signed-off-by: Mykyta Yatsenko >> --- >> tools/bpf/bpftool/btf.c | 93 ++++++++++++++++++++++++++++++++++------- >> 1 file changed, 79 insertions(+), 14 deletions(-) >> >> diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c >> index 3b57ba095ab6..0e7151bfc3d5 100644 >> --- a/tools/bpf/bpftool/btf.c >> +++ b/tools/bpf/bpftool/btf.c >> @@ -50,6 +50,7 @@ struct sort_datum { >> int type_rank; >> const char *sort_name; >> const char *own_name; >> + __u64 disambig_hash; >> }; >> >> static const char *btf_int_enc_str(__u8 encoding) >> @@ -557,17 +558,6 @@ static const char *btf_type_sort_name(const struct btf *btf, __u32 index, bool f >> const struct btf_type *t = btf__type_by_id(btf, index); >> >> switch (btf_kind(t)) { >> - case BTF_KIND_ENUM: >> - case BTF_KIND_ENUM64: { >> - int name_off = t->name_off; >> - >> - if (!from_ref && !name_off && btf_vlen(t)) >> - name_off = btf_kind(t) == BTF_KIND_ENUM64 ? >> - btf_enum64(t)->name_off : >> - btf_enum(t)->name_off; >> - >> - return btf__name_by_offset(btf, name_off); >> - } > Why remove this? anonymous enums will usually still have some > meaningful prefix for each enumerator value, and so sorting based on > those prefixes (effective) still seems useful for more logical > ordering Removed this just to simplify code a little bit. With the hash function we use for disambiguation same-named enums, enums with less number of elements would be pushed higher which is decent sorting as well. I see why we would prefer to cluster enums with similar prefixes together. I'll revert this removal in v2. > > pw-bot: cr > >> case BTF_KIND_ARRAY: >> return btf_type_sort_name(btf, btf_array(t)->type, true); >> case BTF_KIND_TYPE_TAG: >> @@ -584,20 +574,94 @@ static const char *btf_type_sort_name(const struct btf *btf, __u32 index, bool f >> return NULL; >> } >> >> +static __u64 hasher(__u64 hash, __u64 val) >> +{ >> + return hash * 31 + val; >> +} >> + >> +static __u64 btf_name_hasher(__u64 hash, const struct btf *btf, __u32 name_off) >> +{ >> + if (!name_off) >> + return hash; >> + >> + return hasher(hash, str_hash(btf__name_by_offset(btf, name_off))); >> +} >> + >> +static __u64 btf_type_disambig_hash(const struct btf *btf, __u32 index, bool include_members) > nit: s/index/id/ ? Ack > >> +{ >> + const struct btf_type *t = btf__type_by_id(btf, index); >> + int i; >> + size_t hash = 0; >> + >> + hash = btf_name_hasher(hash, btf, t->name_off); >> + >> + switch (btf_kind(t)) { >> + case BTF_KIND_ENUM: >> + case BTF_KIND_ENUM64: >> + for (i = 0; i < btf_vlen(t); i++) { >> + __u32 name_off = btf_is_enum(t) ? >> + btf_enum(t)[i].name_off : >> + btf_enum64(t)[i].name_off; >> + >> + hash = btf_name_hasher(hash, btf, name_off); >> + } >> + break; >> + case BTF_KIND_STRUCT: >> + case BTF_KIND_UNION: >> + if (!include_members) >> + break; >> + for (i = 0; i < btf_vlen(t); i++) { >> + const struct btf_member *m = btf_members(t) + i; >> + >> + hash = btf_name_hasher(hash, btf, m->name_off); >> + /* resolve field type's name and hash it as well */ >> + hash = hasher(hash, btf_type_disambig_hash(btf, m->type, false)); >> + } >> + break; >> + case BTF_KIND_TYPE_TAG: >> + case BTF_KIND_CONST: >> + case BTF_KIND_PTR: >> + case BTF_KIND_VOLATILE: >> + case BTF_KIND_RESTRICT: >> + case BTF_KIND_TYPEDEF: >> + case BTF_KIND_DECL_TAG: >> + hash = hasher(hash, btf_type_disambig_hash(btf, t->type, include_members)); >> + break; >> + case BTF_KIND_ARRAY: { >> + struct btf_array *arr = btf_array(t); >> + >> + hash = hasher(hash, arr->nelems); >> + hash = hasher(hash, btf_type_disambig_hash(btf, arr->type, include_members)); >> + break; >> + } >> + default: >> + break; >> + } >> + return hash; >> +} >> + >> static int btf_type_compare(const void *left, const void *right) >> { >> const struct sort_datum *d1 = (const struct sort_datum *)left; >> const struct sort_datum *d2 = (const struct sort_datum *)right; >> int r; >> >> - if (d1->type_rank != d2->type_rank) >> - return d1->type_rank < d2->type_rank ? -1 : 1; >> + r = d1->type_rank - d2->type_rank; >> + if (r) >> + return r; >> >> r = strcmp(d1->sort_name, d2->sort_name); >> if (r) >> return r; >> >> - return strcmp(d1->own_name, d2->own_name); >> + r = strcmp(d1->own_name, d2->own_name); >> + if (r) >> + return r; > when I was playing with this code I had stong desire to do something > like below to cut down on visual noise > > r = d1->type_rank - d2->type_rank; > r = r ?: strcmp(d1->sort_name, d2->sort_name); > r = r ?: strcmp(d1->own_name, d2->own_name); > if (r) > return r; > > WDYT? Looks a little hard to parse for me (maybe a skill issue). Early exits help to disregard context when reading the code (not sure if this makes sense), so I prefer them. Essentially this is the same thing as theexisting code, but more compact, makes sense, I'll include it in v2. >> + >> + if (d1->disambig_hash != d2->disambig_hash) >> + return d1->disambig_hash < d2->disambig_hash ? -1 : 1; >> + >> + return d1->index - d2->index; >> } >> >> static struct sort_datum *sort_btf_c(const struct btf *btf) >> @@ -618,6 +682,7 @@ static struct sort_datum *sort_btf_c(const struct btf *btf) >> d->type_rank = btf_type_rank(btf, i, false); >> d->sort_name = btf_type_sort_name(btf, i, false); >> d->own_name = btf__name_by_offset(btf, t->name_off); >> + d->disambig_hash = btf_type_disambig_hash(btf, i, true); >> } >> >> qsort(datums, n, sizeof(struct sort_datum), btf_type_compare); >> -- >> 2.46.0 >>