From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12E89C6FA8F for ; Tue, 29 Aug 2023 22:05:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233624AbjH2WE7 (ORCPT ); Tue, 29 Aug 2023 18:04:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49494 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231401AbjH2WEi (ORCPT ); Tue, 29 Aug 2023 18:04:38 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8BBC99C for ; Tue, 29 Aug 2023 15:04:35 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-c8f360a07a2so5653072276.2 for ; Tue, 29 Aug 2023 15:04:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1693346675; x=1693951475; darn=vger.kernel.org; h=cc:to:from:subject:message-id:mime-version:in-reply-to:date:from:to :cc:subject:date:message-id:reply-to; bh=gcIUAUfR2U+VLnjXuO7jr4+Okcha6SYoSJD9Ez73JCE=; b=DeY8gKM8zE70mM1Z4sbl2/ljf4VUetD0iU7sn2fVO9cA07JIeQ8ZM2G+nI/OIwVW0h BdZu2N52/s/vnX+7Wv04MEObq6nKLIpmJmU8cAg6zwotcxRo8SbCvpBb0P2dD0RgZyeb aOwQu1VL44S5OUGPCvr5RAt1rk+QHrAQVbCYIZmHZVgKPOkZoQ2+3DDBc1ShXRLw/d2O k4LiOyseXOHFv2NNzCBkRMk9YjweKysDZiJJNZ3hEW+BW+c52z4YEekJzxOIST7k0BGw IVwcTL/JwYvr/zgMJnIlY9qtNJg0X/imzv1neW+aBhV/v4gSi+1l/qOfgTi8tPpAgIF5 LmPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693346675; x=1693951475; h=cc:to:from:subject:message-id:mime-version:in-reply-to:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=gcIUAUfR2U+VLnjXuO7jr4+Okcha6SYoSJD9Ez73JCE=; b=gtby03b4h/ye0AC2grIvMbJNmvNaxBbsBIKt13KBeJLl3UhAChrqjGPAORjuyG05Le 5YYIh1nIRM0I3MIc11RPW9LfiQyuzi3wozYNZHo5TReF/YymJ22MQNtGGg0Xd1PxxFt1 tLBIcS5bul8qEzjlDyVarSH34ca0szvDkKbBXX1R+eabBhGbDPL8eBcQAbOcWvT0Y1s1 SIIh17+9xsAvL2xkqmlEvIU2HLJGZLgtogDQntioSXud5/2hekccQD1dVd5FSEi2mvkR gY+HYl10BnrM7amSSUCubIwMH1APcjg+CgaNC9k0A2o2GFuCxdtHXs6n/LgcFO/RsGwY kznQ== X-Gm-Message-State: AOJu0Yw6gIWryPCxB3c/QzbeLEDznV0MexTXduuZ2UEttvpVdPT3ujB4 KCzO/7N98w6omd7G8ONKF4CD7fqzt3on68MBIiNZ X-Google-Smtp-Source: AGHT+IF1Q0StQb2ALaD2lYaPBTy1WN+7vRlmKaAZGQOO0s0cUro+PLKQoCpA6oOvYHNBL0b0TQhvtbT49afKlMVzbjTs X-Received: from jonathantanmy0.svl.corp.google.com ([2620:15c:2d3:204:ec07:4d48:eb22:d87c]) (user=jonathantanmy job=sendgmr) by 2002:a25:dc91:0:b0:d74:6a19:a595 with SMTP id y139-20020a25dc91000000b00d746a19a595mr11276ybe.4.1693346674788; Tue, 29 Aug 2023 15:04:34 -0700 (PDT) Date: Tue, 29 Aug 2023 15:04:32 -0700 In-Reply-To: Mime-Version: 1.0 X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: <20230829220432.558674-1-jonathantanmy@google.com> Subject: Re: [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters From: Jonathan Tan To: Taylor Blau Cc: Jonathan Tan , git@vger.kernel.org, Derrick Stolee , Junio C Hamano Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Here are answers to your questions, but I'll also reply to the latest version stating that I'm OK with this patch set being merged (with my reasons). Taylor Blau writes: > Apologies for not quite grokking this, but I am still somewhat confused. > > Suppose we applied something like your suggestion on top of this patch, > like so: > > --- 8< --- > diff --git a/bloom.c b/bloom.c > index 739fa093ba..ee81976342 100644 > --- a/bloom.c > +++ b/bloom.c > @@ -253,16 +253,16 @@ static void init_truncated_large_filter(struct bloom_filter *filter, > struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c) > { > struct bloom_filter *filter; > - int hash_version; > + struct bloom_filter_settings *settings; > > filter = get_or_compute_bloom_filter(r, c, 0, NULL, NULL); > if (!filter) > return NULL; > > prepare_repo_settings(r); > - hash_version = r->settings.commit_graph_changed_paths_version; > + settings = get_bloom_filter_settings(r); > > - if (!(hash_version == -1 || hash_version == filter->version)) > + if (filter->version != (settings->hash_version == 2 ? 2 : 1)) > return NULL; /* unusable filter */ > return filter; > } > --- >8 --- > > We have a handful of failing tests, notably including t4216.1, which > tries to read settings->hash_version, but fails since settings is NULL. > I believe this happens when we have a computed Bloom filter prepared for > some commit, but have not yet written a commit graph containing that (or > any) Bloom filter. Ah, I discovered this independently too. I think it happens when we write version 2 Bloom filters to a repo that has no Bloom filters currently. So, r->settings.commit_graph_changed_paths_version is set to 2, but settings is NULL. I think the solution has to be a combination of both (use commit_graph_changed_paths_version, but if it is -1, check get_bloom_filter_settings()). But having said that, as I said above, we can go with what you have currently. > But I think we're talking about different things here. The point of > get_bloom_filter() as a wrapper around get_or_compute_bloom_filter() is > to only return Bloom filters that are readable under the configuration > commitGraph.changedPathsVersion. But this is my contention - sometimes commitGraph.changedPathsVersion is -1, so we need to find out which version is readable from elsewhere. > We have a handle on what version was used to compute Bloom filters in > each layer of the graph by reading its "version" field, which is written > via load_bloom_filter_from_graph(). > > So we want to return a non-NULL filter exactly when we (a) have a Bloom > filter computed for the given commit, and (b) its version matches the > version specified in commitGraph.chagnedPathsVersion. Yes, except add "or autodetected from the first commit graph file that we read if nothing is specified in commitGraph.changedPathsVersion" to the end. > Are you saying that we need to do the normalization ahead of time so > that we don't emit filters that have different hash versions when > working in a commit-graph chain where each layer may use different Bloom > filter settings? By "emit" do you mean write filters to disk? If yes, I'm worried about reading them, not writing them - for example, when running "git log" with a path. I am worried precisely about the commit-graph chain in which different layers have different Bloom settings, yes.