From: Sasha Levin <sashal@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: apw@canonical.com, conor@kernel.org, corbet@lwn.net,
dwaipayanray1@gmail.com, geert+renesas@glider.be,
gitster@pobox.com, horms@kernel.org, joe@perches.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux@leemhuis.info, lukas.bulwahn@gmail.com,
miguel.ojeda.sandonis@gmail.com, niklas.soderlund@corigine.com,
willy@infradead.org, workflows@vger.kernel.org, kees@kernel.org
Subject: Re: [PATCH] git-disambiguate: disambiguate shorthand git ids
Date: Thu, 26 Dec 2024 19:55:39 -0500 [thread overview]
Message-ID: <Z237CwC_YKhoZubs@lappy> (raw)
In-Reply-To: <CAHk-=wiPNpQ=zmGvPoZRMFZ7a7mm2pzSoOsuQPdDRaSF7gbw9w@mail.gmail.com>
On Thu, Dec 26, 2024 at 03:33:36PM -0800, Linus Torvalds wrote:
>On Thu, 26 Dec 2024 at 14:33, Sasha Levin <sashal@kernel.org> wrote:
>>
>> Which means that folks should be able to use a fairly short abbreviated
>> commit IDs in messages, specially for commits with a long subject line.
>
>So I don't think we should take this as a way to make *shorter* IDs,
>just as a way to accept historical short IDs.
>
>Also, I absolutely detest how you made this be all about "short IDs".
>
>As mentioned in my very original email on this matter, the actual REAL
>AND PRESENT issue isn't ambiguous IDs. We don't really have them.
What got me worried originally is the example Kees provided which
creates a collision of a 12-character abbreviated commit ID:
$ git log 1da177e4c3f4
error: short object ID 1da177e4c3 is ambiguous
hint: The candidates are:
hint: 1da177e4c3f41 commit 2005-04-16 - Linux-2.6.12-rc2
hint: 1da177e4c3f47 commit 2024-12-14 - docs: git SHA prefixes are for humans
When I tested it locally, my scripts were broken, our CVE scripts were
broken, and it is quite the PITA to address after the fact (think of all
the "Fixes: 1da177e4c3f4 [...]" lines we have in the log).
So sure, we don't have a collision right now, but going from 0 to 1 is
going to be painful.
Are we going to be actively watching for someone trying to sneak in a
commit like that, or should we just handle that case properly?
>What we *do* have is "wrong IDs". We have a ton of those.
>
>Look here, you can get a list of suspiciously wrong SHA1s with
>something like this:
>
> git log |
> egrep '[0-9a-f]{9,40} \(".*"\)' |
> sed 's/.*[^0-9a-f]\([0-9a-f]\{9,40\}\)[^0-9a-f].*/\1/' |
> sort -u > hexes
>
>which generates a list of things that look like commit IDs (ok,
>there's probably smarter ways) in our git logs.
>
>Now, *some* of those won't actually be commit IDs at all, they'll just
>be random hex numbers the above finds.
>
>But most of them will indeed be references to other commits.
>
>Then you try to find the bogus ones by doing something like
>
> cat hexes |
> while read i; do git show $i >& /dev/null || echo "No $i SHA"; done
>
>and you will get a lot ot hits.
>
>A *LOT*.
I ended up with this fun thing:
git log --pretty=format:'%B' | grep -E '^Fixes:[[:space:]][0-9a-fA-F]{8,40}' | while read -r line; do
[[ $line =~ ^Fixes:[[:space:]]([0-9a-fA-F]{8,40})(.*) ]] && \
! git rev-parse --quiet --verify "${BASH_REMATCH[1]}^{commit}" >/dev/null 2>&1 && \
{ r=$(./scripts/git-disambiguate.sh --force "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"); \
[[ -n $r ]] && echo "Bad: $line" && echo "Good: Fixes: $(git ol "$r")"; }
Which looks for these wrong IDs in "Fixes:" tags and tries to run them
through git-disambiguate.sh:
Bad: Fixes: d71e629bed5b ("ARC: build: disallow invalid PAE40 + 4K page config")
Good: Fixes: 8871331b1769 ("ARC: build: disallow invalid PAE40 + 4K page config")
Bad: Fixes: 7e654ab7da03 ("cifs: during remount, make sure passwords are in sync")
Good: Fixes: 0f0e35790295 ("cifs: during remount, make sure passwords are in sync")
Bad: Fixes: ee650b3820f3 ("apparmor: properly handle cx/px lookup failure for complain")
Good: Fixes: db93ca15e5ae ("apparmor: properly handle cx/px lookup failure for complain")
[...]
>Look, I didn't check very many of them. Mainly because it gets *so*
>many hits, and I get bored very easily.
>
>But I did check a handful, just to get a feel for things.
>
>And yes, some of them were random hex numbers unrelated to actual git
>IDs, but most were really supposed to be git IDs. Except they weren't
>- or at least not from the mainline tree.
>
>For example, look at commit daa07cbc9ae3 ("KVM: x86: fix L1TF's MMIO
>GFN calculation") which references one of those nonexistent commit
>IDs:
>
> Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits
>in non-present/reserved SPTEs")
>
>and I have no idea where that bogus commit ID comes from. Maybe it's a
>rebase. Maybe it's from a stable tree. But it sure doesn't exist in
>mainline.
This one is indeed in the 4.18 stable tree.
>What *does* exist is commit 28a1f3ac1d0c ("kvm: x86: Set highest
>physical address bits in non-present/reserved SPTEs"), which I found
>by just doing that
>
> git log --grep='kvm: x86: Set highest physical address bits in
>non-present/reserved SPTEs'
>
>and my point is that this is really not about "disambiguating short
>SHA1 IDs". Because those "ambiguous" SHA1's to a very close
>approximation simply DO NOT EXIST.
>
>But the completely wrong ones? They are plentiful.
Is the concern mostly around the term "disambiguate"? Would
git-sanitize.sh work better?
--
Thanks,
Sasha
next prev parent reply other threads:[~2024-12-27 0:55 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-26 22:05 [PATCH] git-disambiguate: disambiguate shorthand git ids Sasha Levin
2024-12-26 22:32 ` Sasha Levin
2024-12-26 23:33 ` Linus Torvalds
2024-12-27 0:55 ` Sasha Levin [this message]
2024-12-27 1:50 ` Linus Torvalds
2024-12-27 16:19 ` Sasha Levin
2024-12-27 18:33 ` Geert Uytterhoeven
2024-12-27 19:07 ` Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z237CwC_YKhoZubs@lappy \
--to=sashal@kernel.org \
--cc=apw@canonical.com \
--cc=conor@kernel.org \
--cc=corbet@lwn.net \
--cc=dwaipayanray1@gmail.com \
--cc=geert+renesas@glider.be \
--cc=gitster@pobox.com \
--cc=horms@kernel.org \
--cc=joe@perches.com \
--cc=kees@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@leemhuis.info \
--cc=lukas.bulwahn@gmail.com \
--cc=miguel.ojeda.sandonis@gmail.com \
--cc=niklas.soderlund@corigine.com \
--cc=torvalds@linux-foundation.org \
--cc=willy@infradead.org \
--cc=workflows@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).