From: Dirk Gouders
To: Junio C Hamano
Cc: git list
Subject: Re: [PATCH 1/1] Documentation/user-manual.txt: example for generating object hashes
In-Reply-To: (Junio C. Hamano's message of "Thu, 29 Feb 2024 13:37:46 -0800")
Date: Thu, 29 Feb 2024 23:35:35 +0100
X-Mailing-List: git@vger.kernel.org

Junio C Hamano writes:

> Dirk Gouders writes:
>
>> If someone spends the time to work through the documentation, the
>> subject "hashes" can lead to contradictions:
>>
>> The README of the initial commit states hashes are generated from
>> compressed data (which changed very soon), whereas
>> Documentation/user-manual.txt says they are generated from original
>> data.
>>
>> Don't give doubts a chance: clarify this and present a simple example
>> on how object hashes can be generated manually.
>
> I'd rather not to waste readers' attention to historical wart.

Yes, but -- I should have mentioned it -- the document itself suggests
reading the initial commit.  But I don't mean to argue about that;
perhaps I dug too deep into the details.

>> @@ -4095,6 +4095,39 @@ that is used to name the object is the hash of the original data
>> plus this header, so `sha1sum` 'file' does not match the object name
>> for 'file'.
>
> The paragraph above (part of it is hidden before the hunk) clearly
> states what the naming rules are.  We hash the original and then
> compress.
> If I use an implementation of Git that drives the zlib at
> compression level 1, and if you clone from my repository with
> another implementation of Git whose zlib is driven at compression
> level 9, our .git/objects/01/2345...90 files may not be identical,
> but when uncompressed they should store the same contents, so "hash
> then compress" is the only sensible choice that is not affected by
> the compression to give stable names to objects.

Thank you for that detail.

>> +Starting with the initial commit, hashing was done on the compressed
>> +data and the file README of that commit explicitely states this:
>> +
>> +"The SHA1 hash is always the hash of the _compressed_ object, not the
>> +original one."
>> +
>> +This changed soon after that with commit
>> +d98b46f8d9a3 (Do SHA1 hash _before_ compression.).  Unfortunately, the
>> +commit message doesn't provide the detailed reasoning.
>
> These three are about Git development history, which by itself may
> be of interest for some people, but the main target audience of the
> user-manual is probably different from them.  They may be interested
> to learn how Git works, but it is only to feel that they understand
> how the "magic" things Git does, like "a cryptographic hash of
> contents is enough to uniquely identify the contents being tracked",
> works well to trust their precious contents [*].
>
> Side note:
> https://lore.kernel.org/git/Pine.LNX.4.58.0504200144260.6467@ppc970.osdl.org/
> explains the reason behind the change to those who did not find
> it obvious.
>
> FYI, another "breaking" change we did earlier in the history of the
> project was to update the sort order of paths in tree objects.  We
> do not need to confuse readers by talking about the original and
> updated sort order.  The only thing they need, when they want to get
> the feeling that they understand how things work, is the description
> of how things work in the version of Git they have ready access to.
> Historical mistakes we made, corrections we made and why, are
> certainly of interest but not for the target audience of this
> document.

Again thank you, very interesting reading.

> On the other hand, ...
>
>> +The following is a short example that demonstrates how hashes can be
>> +generated manually:
>> +
>> +Let's asume a small text file with the content "Hello git.\n"
>> +-------------------------------------------------
>> +$ cat > hello.txt <<EOF
>> +Hello git.
>> +EOF
>> +-------------------------------------------------
>> +
>> +We can now manually generate the hash `git` would use for this file:
>> +
>> +- The object we want the hash for is of type "blob" and its size is
>> + 11 bytes.
>> +
>> +- Prepend the object header to the file content and feed this to
>> + sha1sum(1):
>> +
>> +-------------------------------------------------
>> +$ printf "blob 11\0" | cat - hello.txt | sha1sum
>> +7217614ba6e5f4e7db2edaa2cdf5fb5ee4358b57 .
>> +-------------------------------------------------
>> +
>
> ... something like the above (modulo coding style) would be a useful
> addition to help those who want to convince themselves they
> understand how (some parts of) Git works under the hood, and I think
> it would be a welcome addition to some subset of such readers (the
> rest of the world may feel it is way too much detail, though).
>
> I would draw the line between this one and a similar description and
> demonstration of historical mistakes, which is not as relevant as
> how things work in the current system.  In other words, to me, it is
> OK to dig a bit deep to show how the current scheme works but it is
> way too much to do the same for versions of the system that do not
> exist anymore.
>
> But others may draw the line differently and consider even the above
> a bit too much detail, which is a position I would also accept.
>
> Thanks.
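
To convince myself of the "hash then compress" point, here is a minimal
sketch in Python.  It is only an illustration and not part of the patch.
It assumes a SHA-1 repository and a blob object; the helper names
git_blob_name and loose_object_bytes are made up for this example, and
the compression level is passed in by hand to mimic two differently
configured implementations.

```python
import hashlib
import zlib


def git_blob_name(content: bytes) -> str:
    """Return the SHA-1 object name for a blob holding `content`."""
    # The header is "<type> <size in bytes>" followed by a NUL byte;
    # the name is the hash of header + original (uncompressed) data.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_bytes(content: bytes, level: int) -> bytes:
    """Compress header + content with zlib at the given level.

    Hypothetical helper: `level` stands in for whatever compression
    level a particular Git implementation happens to use.
    """
    header = b"blob %d\0" % len(content)
    return zlib.compress(header + content, level)


content = b"Hello git.\n"  # the same 11 bytes as hello.txt in the patch
name = git_blob_name(content)

fast = loose_object_bytes(content, 1)  # "my" repository, level 1
best = loose_object_bytes(content, 9)  # "your" clone, level 9

print("object name:", name)
print("compressed bytes differ:", fast != best)
print("uncompressed bytes equal:",
      zlib.decompress(fast) == zlib.decompress(best))
```

The name printed for this input can be compared with the sha1sum output
quoted in the patch.  The two compressed byte strings differ while the
uncompressed bytes, and therefore the object name, stay the same, which
is why a name derived from the compressed form would not be stable.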