[PATCH v2 0/2] Align and increase git commit ID abbreviation guidelines and checks

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/2] Align and increase git commit ID abbreviation guidelines and checks
@ 2024-12-05 18:16 Geert Uytterhoeven
  2024-12-05 18:16 ` [PATCH v2 1/2] Align " Geert Uytterhoeven
  2024-12-05 18:16 ` [PATCH v2 2/2] Increase minimum git commit ID abbreviation to 16 characters Geert Uytterhoeven
  0 siblings, 2 replies; 11+ messages in thread
From: Geert Uytterhoeven @ 2024-12-05 18:16 UTC (permalink / raw)
  To: Dwaipayan Ray, Lukas Bulwahn, Joe Perches, Jonathan Corbet,
	Thorsten Leemhuis, Andy Whitcroft, Niklas Söderlund,
	Simon Horman, Conor Dooley, Miguel Ojeda, Linus Torvalds
  Cc: Junio C Hamano, workflows, linux-doc, linux-kernel,
	Geert Uytterhoeven

	Hi all,

This patch series:
  - Aligns the documentation and checks to settle on a mininum of 12
    characters for git commit ID abbreviations, so larger abbreviations
    are no longer flagged as a warning by checkpatch.pl,
  - Increase the minimum recommended abbreviation to 16 characters.

The Linux kernel repository is growing fast.  Running
git-unique-abbrev[1] on a tree containing v6.13-rc1 and all stable
releases gives:

    13021585 objects
     4: 13021585 / 65536
     5: 13021519 / 1048507
     6: 7028382 / 3064402
     7: 616474 / 305777
     8: 39566 / 19775
     9: 2452 / 1226
    10: 186 / 93
    11: 12 / 6
    12: 0 / 0
    21cf4d54d3c702ac20c6747fa6d4f64dee07dd11
    21cf4d54d3ced8a3e752030e483d72997721076d
    8a048bbf89528d45c604aed68f7e0f0ef957067d
    8a048bbf895b1359e4a33b779ea6d7386cfe4de2
    c8db0fc796454553647e11a055eed4e46676e3ed
    c8db0fc7964a2c9394c17722f30e4f1420aaa8e0
    d3ac4e475103c4364ecb47a6a55c114d7c42a014
    d3ac4e47510ec0753ebe1e418a334ad202784aa8
    d597639e2036f04f0226761e2d818b31f2db7820
    d597639e203a100156501df8a0756fd09573e2de
    ef91b6e893a00d903400f8e1303efc4d52b710af
    ef91b6e893afc4c4ca488453ea9f19ced5fa5861

13021585 is still smaller than sqrt(16^12) = 16777216, but the safety
margin is getting smaller.  E.g. my main work repo already contains +19M
objects.  Hence the Birthday Paradox states that collisions of
12-character commit IDs are imminent.

Fortunately, git is smart: when the number of configured characters for
abbreviations (using core.abbrev, or --abbrev) is too small, git
automatically prints a larger hash.  Obviously this only takes into
account the repository of the creator, and not the (possibly much
larger) repository of the receiver.
Due to this, patches with 13-char Fixes tags have already been seen in
the wild[2].  Unfortunately such patches are currently flagged by
checkpatch.pl (and sometimes even rejected), despite some parts of the
documentation stating that "at least 12 characters" is fine.  FTR, I've
been using 16-characters commit IDs for quite a while.

Hence the first patch settles on "at least 12 chars" everywhere.
The second patch increases the minimum to 16 characters, to reduce the
risk of conflicts, now and in the near future.

Note that we standardized on 12 chars in commit d311cd44545f2f69
("checkpatch: add test for commit id formatting style in commit log") in
v3.17, when the repo had just surpassed 4M objects.  Going to 16 chars
should provide enough headroom until after my official retirement ;-)

Note that I did not update Documentation/translations/.

Changes compared to v1[3]:
  - Rebase on top of commit 2f07b652384969f5 ("checkpatch: always parse
    orig_commit in fixes tag") in v6.13-rc1,
  - Update documentation, too,
  - New patch to increase the minimum to 16 characters.

Thanks for your comments!

[1] https://blog.cuviper.com/2013/11/10/how-short-can-git-abbreviate/
[2] https://lore.kernel.org/all/20241009-tamale-revisit-7d2606c5fdf3@spud
[3] "[PATCH] checkpatch: Also accept commit ids with 13-40 chars of sha1"
    https://lore.kernel.org/all/62f82b0308de05f5aab913392049af15d53c777d.1701804489.git.geert+renesas@glider.be/

Geert Uytterhoeven (2):
  Align git commit ID abbreviation guidelines and checks
  Increase minimum git commit ID abbreviation to 16 characters

 Documentation/dev-tools/checkpatch.rst        |  2 +-
 Documentation/doc-guide/sphinx.rst            |  4 +--
 Documentation/process/5.Posting.rst           |  2 +-
 .../process/handling-regressions.rst          |  6 ++--
 Documentation/process/maintainer-tip.rst      |  6 ++--
 Documentation/process/submitting-patches.rst  | 18 ++++++------
 scripts/checkpatch.pl                         | 28 +++++++++----------
 7 files changed, 33 insertions(+), 33 deletions(-)

-- 
2.34.1

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/2] Align git commit ID abbreviation guidelines and checks
  2024-12-05 18:16 [PATCH v2 0/2] Align and increase git commit ID abbreviation guidelines and checks Geert Uytterhoeven
@ 2024-12-05 18:16 ` Geert Uytterhoeven
  2024-12-30 18:43   ` Jonathan Corbet
  2024-12-05 18:16 ` [PATCH v2 2/2] Increase minimum git commit ID abbreviation to 16 characters Geert Uytterhoeven
  1 sibling, 1 reply; 11+ messages in thread
From: Geert Uytterhoeven @ 2024-12-05 18:16 UTC (permalink / raw)
  To: Dwaipayan Ray, Lukas Bulwahn, Joe Perches, Jonathan Corbet,
	Thorsten Leemhuis, Andy Whitcroft, Niklas Söderlund,
	Simon Horman, Conor Dooley, Miguel Ojeda, Linus Torvalds
  Cc: Junio C Hamano, workflows, linux-doc, linux-kernel,
	Geert Uytterhoeven

The guidelines for git commit ID abbreviation are inconsistent: some
places state to use 12 characters exactly, while other places recommend
12 characters or more.  The same issue is present in the checkpatch.pl
script.

E.g. Documentation/dev-tools/checkpatch.rst says:

  **GIT_COMMIT_ID**
    The proper way to reference a commit id is:
    commit <12+ chars of sha1> ("<title line>")

However, scripts/checkpatch.pl has two different checks: one warning
check accepting 12 characters exactly:

    # Check Fixes: styles is correct
    Please use correct Fixes: style 'Fixes: <12 chars of sha1> (\"<title line>\")'

and a second error check accepting 12-40 characters:

    # Check for git id commit length and improperly formed commit descriptions
    # A correctly formed commit description is:
    #    commit <SHA-1 hash length 12+ chars> ("Complete commit subject")
    Please use git commit description style 'commit <12+ chars of sha1>

Hence patches containing commit IDs with more than 12 characters are
flagged by checkpatch, and sometimes rejected by maintainers or
reviewers.  This is becoming more important with the growth of the
repository, as git may decide to use more characters in case of local
conflicts.

Fix this by settling on at least 12 characters, in both the
documentation and in the checkpatch.pl script.

Fixes: bd17e036b495bebb ("checkpatch: warn for non-standard fixes tag style")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
v2:
  - Rebase on top of commit 2f07b652384969f5 ("checkpatch: always parse
    orig_commit in fixes tag") in v6.13-rc1,
  - Update documentation, too.
---
 Documentation/process/maintainer-tip.rst     | 2 +-
 Documentation/process/submitting-patches.rst | 8 ++++----
 scripts/checkpatch.pl                        | 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/process/maintainer-tip.rst b/Documentation/process/maintainer-tip.rst
index e374b67b3277ac54..41d5855700cd4f83 100644
--- a/Documentation/process/maintainer-tip.rst
+++ b/Documentation/process/maintainer-tip.rst
@@ -270,7 +270,7 @@ Ordering of commit tags
 To have a uniform view of the commit tags, the tip maintainers use the
 following tag ordering scheme:
 
- - Fixes: 12char-SHA1 ("sub/sys: Original subject line")
+ - Fixes: 12+char-SHA1 ("sub/sys: Original subject line")
 
    A Fixes tag should be added even for changes which do not need to be
    backported to stable kernels, i.e. when addressing a recently introduced
diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
index 1518bd57adab501f..f3508b5aa4ebab96 100644
--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -143,10 +143,10 @@ also track such tags and take certain actions. Private bug trackers and
 invalid URLs are forbidden.
 
 If your patch fixes a bug in a specific commit, e.g. you found an issue using
-``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
-the SHA-1 ID, and the one line summary.  Do not split the tag across multiple
-lines, tags are exempt from the "wrap at 75 columns" rule in order to simplify
-parsing scripts.  For example::
+``git bisect``, please use the 'Fixes:' tag with at least the first 12
+characters of the SHA-1 ID, and the one line summary.  Do not split the tag
+across multiple lines, tags are exempt from the "wrap at 75 columns" rule in
+order to simplify parsing scripts.  For example::
 
 	Fixes: 54a4f0239f2e ("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed")
 
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index dbb9c3c6fe30f906..5b57f0306a50046d 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3230,7 +3230,7 @@ sub process {
 			my $tag_case = not ($tag eq "Fixes:");
 			my $tag_space = not ($line =~ /^fixes:? [0-9a-f]{5,40} ($balanced_parens)/i);
 
-			my $id_length = not ($orig_commit =~ /^[0-9a-f]{12}$/i);
+			my $id_length = not ($orig_commit =~ /^[0-9a-f]{12,40}$/i);
 			my $id_case = not ($orig_commit !~ /[A-F]/);
 
 			my $id = "0123456789ab";
@@ -3240,7 +3240,7 @@ sub process {
 			if ($ctitle ne $title || $tag_case || $tag_space ||
 			    $id_length || $id_case || !$title_has_quotes) {
 				if (WARN("BAD_FIXES_TAG",
-				     "Please use correct Fixes: style 'Fixes: <12 chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
+				     "Please use correct Fixes: style 'Fixes: <12+ chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
 				    $fix) {
 					$fixed[$fixlinenr] = "Fixes: $cid (\"$ctitle\")";
 				}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/2] Increase minimum git commit ID abbreviation to 16 characters
  2024-12-05 18:16 [PATCH v2 0/2] Align and increase git commit ID abbreviation guidelines and checks Geert Uytterhoeven
  2024-12-05 18:16 ` [PATCH v2 1/2] Align " Geert Uytterhoeven
@ 2024-12-05 18:16 ` Geert Uytterhoeven
  2024-12-05 19:19   ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: Geert Uytterhoeven @ 2024-12-05 18:16 UTC (permalink / raw)
  To: Dwaipayan Ray, Lukas Bulwahn, Joe Perches, Jonathan Corbet,
	Thorsten Leemhuis, Andy Whitcroft, Niklas Söderlund,
	Simon Horman, Conor Dooley, Miguel Ojeda, Linus Torvalds
  Cc: Junio C Hamano, workflows, linux-doc, linux-kernel,
	Geert Uytterhoeven

As of v6.13-rc1, a git repository with all upstream and stable versions
of the Linux kernel sources contains more than 13 million objects.
Local development trees contain many more, approaching or surpassing
sqrt(16^12) = 16777216 objects.

Hence according to the Birthday Paradox, collisions of 12-chararacter
git commit IDs are imminent, or already happening.  Indeed, patches with
13-character Fixes-tags have already been seen in the wild, due to git
automatically increasing the size of the abbreviation when needed.
Note that this need is based on the current repository of the creator,
not on the (future) repository of the receiver.

Decrease the probability of collisions by increasing the recommended
minimum number of characters from 12 to 16.  Update the guidelines and
the examples in the documentation, and all checks in the checkpatch.pl
script.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
v2:
  - New.
---
 Documentation/dev-tools/checkpatch.rst        |  2 +-
 Documentation/doc-guide/sphinx.rst            |  4 +--
 Documentation/process/5.Posting.rst           |  2 +-
 .../process/handling-regressions.rst          |  6 ++--
 Documentation/process/maintainer-tip.rst      |  6 ++--
 Documentation/process/submitting-patches.rst  | 12 ++++----
 scripts/checkpatch.pl                         | 28 +++++++++----------
 7 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/Documentation/dev-tools/checkpatch.rst b/Documentation/dev-tools/checkpatch.rst
index abb3ff6820766ee0..592812c6eabad43a 100644
--- a/Documentation/dev-tools/checkpatch.rst
+++ b/Documentation/dev-tools/checkpatch.rst
@@ -599,7 +599,7 @@ Commit message
 
   **GIT_COMMIT_ID**
     The proper way to reference a commit id is:
-    commit <12+ chars of sha1> ("<title line>")
+    commit <16+ chars of sha1> ("<title line>")
 
     An example may be::
 
diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst
index 8081ebfe48bc045f..7e4ea14686e107be 100644
--- a/Documentation/doc-guide/sphinx.rst
+++ b/Documentation/doc-guide/sphinx.rst
@@ -441,8 +441,8 @@ Referencing commits
 References to git commits are automatically hyperlinked given that they are
 written in one of these formats::
 
-    commit 72bf4f1767f0
-    commit 72bf4f1767f0 ("net: do not leave an empty skb in write queue")
+    commit 72bf4f1767f03869
+    commit 72bf4f1767f03869 ("net: do not leave an empty skb in write queue")
 
 .. _sphinx_kfigure:
 
diff --git a/Documentation/process/5.Posting.rst b/Documentation/process/5.Posting.rst
index b3eff03ea2491c88..d396d0f051b9c1e3 100644
--- a/Documentation/process/5.Posting.rst
+++ b/Documentation/process/5.Posting.rst
@@ -199,7 +199,7 @@ document; what follows here is a brief summary.
 One tag is used to refer to earlier commits which introduced problems fixed by
 the patch::
 
-	Fixes: 1f2e3d4c5b6a ("The first line of the commit specified by the first 12 characters of its SHA-1 ID")
+	Fixes: 1f2e3d4c5b6a7980 ("The first line of the commit specified by the first 16 characters of its SHA-1 ID")
 
 Another tag is used for linking web pages with additional backgrounds or
 details, for example an earlier discussion which leads to the patch or a
diff --git a/Documentation/process/handling-regressions.rst b/Documentation/process/handling-regressions.rst
index 1f5ab49c48a480d2..cde4c663b379f567 100644
--- a/Documentation/process/handling-regressions.rst
+++ b/Documentation/process/handling-regressions.rst
@@ -31,7 +31,7 @@ The important bits (aka "The TL;DR")
       list in CC) containing a paragraph like the following, which tells regzbot
       when the issue started to happen::
 
-       #regzbot ^introduced: 1f2e3d4c5b6a
+       #regzbot ^introduced: 1f2e3d4c5b6a7980
 
     * When forwarding reports from a bug tracker to the regressions list (see
       above), include a paragraph like the following::
@@ -82,7 +82,7 @@ When doing either, consider making the Linux kernel regression tracking bot
 "regzbot" immediately start tracking the issue:
 
  * For mailed reports, check if the reporter included a "regzbot command" like
-   ``#regzbot introduced: 1f2e3d4c5b6a``. If not, send a reply (with the
+   ``#regzbot introduced: 1f2e3d4c5b6a7980``. If not, send a reply (with the
    regressions list in CC) with a paragraph like the following:::
 
        #regzbot ^introduced: v5.13..v5.14-rc1
@@ -100,7 +100,7 @@ When doing either, consider making the Linux kernel regression tracking bot
  * When forwarding a regression reported to a bug tracker, include a paragraph
    with these regzbot commands::
 
-       #regzbot introduced: 1f2e3d4c5b6a
+       #regzbot introduced: 1f2e3d4c5b6a7980
        #regzbot from: Some N. Ice Human <some.human@example.com>
        #regzbot monitor: http://some.bugtracker.example.com/ticket?id=123456789
 
diff --git a/Documentation/process/maintainer-tip.rst b/Documentation/process/maintainer-tip.rst
index 41d5855700cd4f83..12edc8e06367e3f3 100644
--- a/Documentation/process/maintainer-tip.rst
+++ b/Documentation/process/maintainer-tip.rst
@@ -270,7 +270,7 @@ Ordering of commit tags
 To have a uniform view of the commit tags, the tip maintainers use the
 following tag ordering scheme:
 
- - Fixes: 12+char-SHA1 ("sub/sys: Original subject line")
+ - Fixes: 16+char-SHA1 ("sub/sys: Original subject line")
 
    A Fixes tag should be added even for changes which do not need to be
    backported to stable kernels, i.e. when addressing a recently introduced
@@ -284,7 +284,7 @@ following tag ordering scheme:
 
      Commit
 
-       abcdef012345678 ("x86/xxx: Replace foo with bar")
+       abcdef0123456789 ("x86/xxx: Replace foo with bar")
 
      left an unused instance of variable foo around. Remove it.
 
@@ -295,7 +295,7 @@ following tag ordering scheme:
      The recent replacement of foo with bar left an unused instance of
      variable foo around. Remove it.
 
-     Fixes: abcdef012345678 ("x86/xxx: Replace foo with bar")
+     Fixes: abcdef0123456789 ("x86/xxx: Replace foo with bar")
      Signed-off-by: J.Dev <j.dev@mail>
 
    The latter puts the information about the patch into the focus and
diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
index f3508b5aa4ebab96..4c8e0f9c8fbbd83c 100644
--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -106,7 +106,7 @@ Example::
 	platform_set_drvdata(), but left the variable "dev" unused,
 	delete it.
 
-You should also be sure to use at least the first twelve characters of the
+You should also be sure to use at least the first sixteen characters of the
 SHA-1 ID.  The kernel repository holds a *lot* of objects, making
 collisions with shorter IDs a real possibility.  Bear in mind that, even if
 there is no collision with your six-character ID now, that condition may
@@ -143,25 +143,25 @@ also track such tags and take certain actions. Private bug trackers and
 invalid URLs are forbidden.
 
 If your patch fixes a bug in a specific commit, e.g. you found an issue using
-``git bisect``, please use the 'Fixes:' tag with at least the first 12
+``git bisect``, please use the 'Fixes:' tag with at least the first 16
 characters of the SHA-1 ID, and the one line summary.  Do not split the tag
 across multiple lines, tags are exempt from the "wrap at 75 columns" rule in
 order to simplify parsing scripts.  For example::
 
-	Fixes: 54a4f0239f2e ("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed")
+	Fixes: 54a4f0239f2e98bc ("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed")
 
 The following ``git config`` settings can be used to add a pretty format for
 outputting the above style in the ``git log`` or ``git show`` commands::
 
 	[core]
-		abbrev = 12
+		abbrev = 16
 	[pretty]
 		fixes = Fixes: %h (\"%s\")
 
 An example call::
 
-	$ git log -1 --pretty=fixes 54a4f0239f2e
-	Fixes: 54a4f0239f2e ("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed")
+	$ git log -1 --pretty=fixes 54a4f0239f2e98bc
+	Fixes: 54a4f0239f2e98bc ("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed")
 
 .. _split_changes:
 
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 5b57f0306a50046d..80cde0aa1a3115e9 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1246,13 +1246,13 @@ sub git_commit_info {
 #		git rev-list --remotes | grep -i "^$1" |
 #		while read line ; do
 #		    git log --format='%H %s' -1 $line |
-#		    echo "commit $(cut -c 1-12,41-)"
+#		    echo "commit $(cut -c 1-16,41-)"
 #		done
 	} elsif ($lines[0] =~ /^fatal: ambiguous argument '$commit': unknown revision or path not in the working tree\./ ||
 		 $lines[0] =~ /^fatal: bad object $commit/) {
 		$id = undef;
 	} else {
-		$id = substr($lines[0], 0, 12);
+		$id = substr($lines[0], 0, 16);
 		$desc = substr($lines[0], 41);
 	}
 
@@ -1320,7 +1320,7 @@ for my $filename (@ARGV) {
 	if ($filename eq '-') {
 		$vname = 'Your patch';
 	} elsif ($git) {
-		$vname = "Commit " . substr($filename, 0, 12) . ' ("' . $git_commits{$filename} . '")';
+		$vname = "Commit " . substr($filename, 0, 16) . ' ("' . $git_commits{$filename} . '")';
 	} else {
 		$vname = $filename;
 	}
@@ -3230,17 +3230,17 @@ sub process {
 			my $tag_case = not ($tag eq "Fixes:");
 			my $tag_space = not ($line =~ /^fixes:? [0-9a-f]{5,40} ($balanced_parens)/i);
 
-			my $id_length = not ($orig_commit =~ /^[0-9a-f]{12,40}$/i);
+			my $id_length = not ($orig_commit =~ /^[0-9a-f]{16,40}$/i);
 			my $id_case = not ($orig_commit !~ /[A-F]/);
 
-			my $id = "0123456789ab";
+			my $id = "0123456789abcdef";
 			my ($cid, $ctitle) = git_commit_info($orig_commit, $id,
 							     $title);
 
 			if ($ctitle ne $title || $tag_case || $tag_space ||
 			    $id_length || $id_case || !$title_has_quotes) {
 				if (WARN("BAD_FIXES_TAG",
-				     "Please use correct Fixes: style 'Fixes: <12+ chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
+				     "Please use correct Fixes: style 'Fixes: <16+ chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
 				    $fix) {
 					$fixed[$fixlinenr] = "Fixes: $cid (\"$ctitle\")";
 				}
@@ -3330,7 +3330,7 @@ sub process {
 
 # Check for git id commit length and improperly formed commit descriptions
 # A correctly formed commit description is:
-#    commit <SHA-1 hash length 12+ chars> ("Complete commit subject")
+#    commit <SHA-1 hash length 16+ chars> ("Complete commit subject")
 # with the commit subject '("' prefix and '")' suffix
 # This is a fairly compilicated block as it tests for what appears to be
 # bare SHA-1 hash with  minimum length of 5.  It also avoids several types of
@@ -3343,16 +3343,16 @@ sub process {
 		    $line !~ /^This reverts commit [0-9a-f]{7,40}/ &&
 		    (($line =~ /\bcommit\s+[0-9a-f]{5,}\b/i ||
 		      ($line =~ /\bcommit\s*$/i && defined($rawlines[$linenr]) && $rawlines[$linenr] =~ /^\s*[0-9a-f]{5,}\b/i)) ||
-		     ($line =~ /(?:\s|^)[0-9a-f]{12,40}(?:[\s"'\(\[]|$)/i &&
-		      $line !~ /[\<\[][0-9a-f]{12,40}[\>\]]/i &&
-		      $line !~ /\bfixes:\s*[0-9a-f]{12,40}/i))) {
+		     ($line =~ /(?:\s|^)[0-9a-f]{16,40}(?:[\s"'\(\[]|$)/i &&
+		      $line !~ /[\<\[][0-9a-f]{16,40}[\>\]]/i &&
+		      $line !~ /\bfixes:\s*[0-9a-f]{16,40}/i))) {
 			my $init_char = "c";
 			my $orig_commit = "";
 			my $short = 1;
 			my $long = 0;
 			my $case = 1;
 			my $space = 1;
-			my $id = '0123456789ab';
+			my $id = '0123456789abcdef';
 			my $orig_desc = "commit description";
 			my $description = "";
 			my $herectx = $herecurr;
@@ -3383,11 +3383,11 @@ sub process {
 			if ($input =~ /\b(c)ommit\s+([0-9a-f]{5,})\b/i) {
 				$init_char = $1;
 				$orig_commit = lc($2);
-				$short = 0 if ($input =~ /\bcommit\s+[0-9a-f]{12,40}/i);
+				$short = 0 if ($input =~ /\bcommit\s+[0-9a-f]{16,40}/i);
 				$long = 1 if ($input =~ /\bcommit\s+[0-9a-f]{41,}/i);
 				$space = 0 if ($input =~ /\bcommit [0-9a-f]/i);
 				$case = 0 if ($input =~ /\b[Cc]ommit\s+[0-9a-f]{5,40}[^A-F]/);
-			} elsif ($input =~ /\b([0-9a-f]{12,40})\b/i) {
+			} elsif ($input =~ /\b([0-9a-f]{16,40})\b/i) {
 				$orig_commit = lc($1);
 			}
 
@@ -3398,7 +3398,7 @@ sub process {
 			    ($short || $long || $space || $case || ($orig_desc ne $description) || !$has_quotes) &&
 			    $last_git_commit_id_linenr != $linenr - 1) {
 				ERROR("GIT_COMMIT_ID",
-				      "Please use git commit description style 'commit <12+ chars of sha1> (\"<title line>\")' - ie: '${init_char}ommit $id (\"$description\")'\n" . $herectx);
+				      "Please use git commit description style 'commit <16+ chars of sha1> (\"<title line>\")' - ie: '${init_char}ommit $id (\"$description\")'\n" . $herectx);
 			}
 			#don't report the next line if this line ends in commit and the sha1 hash is the next line
 			$last_git_commit_id_linenr = $linenr if ($line =~ /\bcommit\s*$/i);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] Increase minimum git commit ID abbreviation to 16 characters
  2024-12-05 18:16 ` [PATCH v2 2/2] Increase minimum git commit ID abbreviation to 16 characters Geert Uytterhoeven
@ 2024-12-05 19:19   ` Linus Torvalds
  2024-12-13 19:41     ` Geert Uytterhoeven
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Linus Torvalds @ 2024-12-05 19:19 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Dwaipayan Ray, Lukas Bulwahn, Joe Perches, Jonathan Corbet,
	Thorsten Leemhuis, Andy Whitcroft, Niklas Söderlund,
	Simon Horman, Conor Dooley, Miguel Ojeda, Junio C Hamano,
	workflows, linux-doc, linux-kernel

On Thu, 5 Dec 2024 at 10:16, Geert Uytterhoeven <geert+renesas@glider.be> wrote:
>
> Hence according to the Birthday Paradox, collisions of 12-chararacter
> git commit IDs are imminent, or already happening.

Note that ambiguous commit IDs are not even remotely as scary as this implies.

Yes, the current kernel tree has over ten million objects, and when
you look at stable trees etc, you can easily see more.

But commits are only a fraction (about 1/8th) of the total objects. My
tree is at about 1.3M commits, so we're basically an order of
magnitude off the point where collisions start being an issue wrt
commit IDs.

Can you find collisions by looking at all objects? Yes. Git will do
that for you, and tell you their types. But to take one recent
example, let's do the 6.12 commit:
adc218676eef25575469234709c2d87185ca223a. To get an ambiguous ID, you
have to go down to 6 characters, and even then git will tell you
there's only one object that is a commit, ie

   $ git show adc218

results in

  error: short object ID adc218 is ambiguous
  hint: The candidates are:
  hint:   adc218676eef commit 2024-11-17 - Linux 6.12
  hint:   adc2184009c5 blob

so right now you have a collision in six digits for that commit, but
even then it's actually still entirely unambiguous once you know
you're talking about a commit.

Are there worse cases? Yup. With just 7 characters, you get commits
like 95b861a that actually have three ambiguous commit IDs. And you
still get ambiguous results with 9 characters.

With 10 characters, there are no collisions. So the "we're an order of
magnitude off" seems about right - you get slightly more than one
order of magnitude for each two digits.

And remember: we're an order of magnitude off *AFTER 20 YEARS OF GIT HISTORY*.

Furthermore, the "in the future" argument is bogus. Yes, there will be
more commits in the future, but it's not going to suddenly make old
SHA ID's somehow more ambiguous, since you can also take history into
account - and when quoting the short format it should always be
accompanied by the first line of the commit message too.

Why do I care? Because long git commit IDs are actually detrimental to
legibility. I try to make commit messages legible, and that very much
is the *point* of the short format. It's for people, not machinery.

Yes, the basic git machinery doesn't do object type disambiguation
(and if you do "git show", you can give it blob IDs etc, so git itself
may not know about the proper type to use disambiguate at all). And
git also doesn't know about the whole "we also put the first line of
the commit message" thing.

But honestly, I'm claiming that something like

    Fixes: 48bcda684823 ("tracing: Remove definition of trace_*_rcuidle()")

(to pick a random recent commit) is completely unambiguous for the
intended audience, and will remain so forever within the context that
it is in.

And I think the "intended audience" here is important. 12 characters
is already line noise, and causes occasional odd line wrapping (you
don't see that in things like the "Fixes:" tags, but you do see it in
the better commit messages that refer to the commits they fix).

I think we should accept that it's not the full SHA1, and also accept
what that really means.

Final note: personally, I find that the SHA1 - shortened or not - is
often *less* descriptive than the shortlog, for the simple reason that
rebasing happens, and people refer to other commits with stale commit
IDs. That's an issue that I personally hit regularly, and it has a
fairly simple solution in the form of

    git log --grep="..one-liner goes here.."

and my point here is that if you rely too much on the SHA1, your
workflow is *ALREADY* broken, and it has nothing to do with the
shortening.

Put another way: if you have particular tooling that you worry about,
I think you should look at the tooling. You can find real examples of
much shorted commit IDs in the kernel, and real examples of the MUCH
MORE REAL issue of wrong commit ID's right now.

See for example:

   0a1336c8c935 ("IB/ipath: Fix IRQ for PCI Express HCAs")

which refers to commit 51f65ebc ("IB/ipath - program intconfig
register using new HT irq hook"), which is still perfectly unique, but
then look at

   2e61c646edfa ("mlx4_core: Use mmiowb() to avoid firmware commands
getting jumbled up")

which refers to commit 66547550 ("IB/mthca: Use mmiowb() to avoid
firmware commands getting jumbled up"). That commit doesn't exist at
all - it's not ambiguous due to being short, it's ambiguous due to
being *wrong* (presumably due to a rebase)(.

The real commit ID? 76d7cc0345a0. Easily found using the
human-readable shortlog,

So here's the meat of the argument: you are barking up the wrong tree.
We have real and present issues that have been going on since at least
2007, and they have *nothing* to do with the short SHA1s.

I don't want to make the short SHA1's worse, when the real and present
problems are elsewhere.

Make the tools deal with the cases we already have, and you'll find
that the shortening is a complete non-issue.

                Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] Increase minimum git commit ID abbreviation to 16 characters
  2024-12-05 19:19   ` Linus Torvalds
@ 2024-12-13 19:41     ` Geert Uytterhoeven
  2024-12-14 16:03     ` Matthew Wilcox
  2024-12-18 23:36     ` [RFC] git-disambiguate: disambiguate shorthand git ids Sasha Levin
  2 siblings, 0 replies; 11+ messages in thread
From: Geert Uytterhoeven @ 2024-12-13 19:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dwaipayan Ray, Lukas Bulwahn, Joe Perches, Jonathan Corbet,
	Thorsten Leemhuis, Andy Whitcroft, Niklas Söderlund,
	Simon Horman, Conor Dooley, Miguel Ojeda, Junio C Hamano,
	workflows, linux-doc, linux-kernel

Hi Linus,

On Thu, Dec 5, 2024 at 8:19 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, 5 Dec 2024 at 10:16, Geert Uytterhoeven <geert+renesas@glider.be> wrote:
> > Hence according to the Birthday Paradox, collisions of 12-chararacter
> > git commit IDs are imminent, or already happening.
>
> Note that ambiguous commit IDs are not even remotely as scary as this implies.
>
> Yes, the current kernel tree has over ten million objects, and when
> you look at stable trees etc, you can easily see more.
>
> But commits are only a fraction (about 1/8th) of the total objects. My
> tree is at about 1.3M commits, so we're basically an order of
> magnitude off the point where collisions start being an issue wrt
> commit IDs.
>
> Can you find collisions by looking at all objects? Yes. Git will do
> that for you, and tell you their types. But to take one recent
> example, let's do the 6.12 commit:
> adc218676eef25575469234709c2d87185ca223a. To get an ambiguous ID, you
> have to go down to 6 characters, and even then git will tell you
> there's only one object that is a commit, ie
>
>    $ git show adc218
>
> results in
>
>   error: short object ID adc218 is ambiguous
>   hint: The candidates are:
>   hint:   adc218676eef commit 2024-11-17 - Linux 6.12
>   hint:   adc2184009c5 blob
>
> so right now you have a collision in six digits for that commit, but
> even then it's actually still entirely unambiguous once you know
> you're talking about a commit.

That's true for the basic command line tools...

> Make the tools deal with the cases we already have, and you'll find
> that the shortening is a complete non-issue.

FTR, cgit can use some improvements, as
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=adc218
just tells you "Bad object id: adc218".

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] Increase minimum git commit ID abbreviation to 16 characters
  2024-12-05 19:19   ` Linus Torvalds
  2024-12-13 19:41     ` Geert Uytterhoeven
@ 2024-12-14 16:03     ` Matthew Wilcox
  2024-12-14 16:31       ` Linus Torvalds
  2024-12-18 23:36     ` [RFC] git-disambiguate: disambiguate shorthand git ids Sasha Levin
  2 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2024-12-14 16:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Geert Uytterhoeven, Dwaipayan Ray, Lukas Bulwahn, Joe Perches,
	Jonathan Corbet, Thorsten Leemhuis, Andy Whitcroft,
	Niklas Söderlund, Simon Horman, Conor Dooley, Miguel Ojeda,
	Junio C Hamano, workflows, linux-doc, linux-kernel

On Thu, Dec 05, 2024 at 11:19:18AM -0800, Linus Torvalds wrote:
> Why do I care? Because long git commit IDs are actually detrimental to
> legibility. I try to make commit messages legible, and that very much
> is the *point* of the short format. It's for people, not machinery.
> 
> Yes, the basic git machinery doesn't do object type disambiguation
> (and if you do "git show", you can give it blob IDs etc, so git itself
> may not know about the proper type to use disambiguate at all). And
> git also doesn't know about the whole "we also put the first line of
> the commit message" thing.
> 
> But honestly, I'm claiming that something like
> 
>     Fixes: 48bcda684823 ("tracing: Remove definition of trace_*_rcuidle()")

I have wondered about using a different encoding for the sha1.
Classic Ascii85 encoding is no good; it uses characters like '"\<
which interact poorly with every shell.  RFC1924 is somewhat better,
but still uses characters that interact poorly with shell.

Base36 (ie 0-9a-z) would take 10 characters to represent as many bits
as 12 characters in base16.  Base62 (0-9a-zA-Z) gives us 8 characters
to represent _almost_ 48 bits.  We could do Base64 (RFC 4648) which
uses + and /, and is common enough.  Good enough to be worth the
additional complexity?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] Increase minimum git commit ID abbreviation to 16 characters
  2024-12-14 16:03     ` Matthew Wilcox
@ 2024-12-14 16:31       ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2024-12-14 16:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Geert Uytterhoeven, Dwaipayan Ray, Lukas Bulwahn, Joe Perches,
	Jonathan Corbet, Thorsten Leemhuis, Andy Whitcroft,
	Niklas Söderlund, Simon Horman, Conor Dooley, Miguel Ojeda,
	Junio C Hamano, workflows, linux-doc, linux-kernel

On Sat, 14 Dec 2024 at 08:03, Matthew Wilcox <willy@infradead.org> wrote:
>
> I have wondered about using a different encoding for the sha1.
> Classic Ascii85 encoding is no good; it uses characters like '"\<
> which interact poorly with every shell.  RFC1924 is somewhat better,
> but still uses characters that interact poorly with shell.

I suspect that the pain would much outweigh the gain. You'd need to
teach all tools about the new format, and you'd also need to add some
additional format specifying character just to make it unambiguous
*which* format you use, since if you just extend the character set
you'll have lots of hashes that could be either.

And you could disambiguate by testing both and seeing which one works
better, but at that point, you're much better off disambiguating the
current regular hex format by being a bit smarter about the objects.

Using base36 doesn't add enough bits to then make up for such a
disambiguation character in practice (ie 11 characters vs 12 - not
really noticeable).

base62 would be better, but christ does *that* really result in an
unreadable jumble. At that point I'd rather see 16-character hex than
the complete line noise that is base62.

Also, I bet people would start looking for shorthand formats that
spell rude words.  You are kind of limited with hex, and sometimes
that's an advantage.

                 Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC] git-disambiguate: disambiguate shorthand git ids
  2024-12-05 19:19   ` Linus Torvalds
  2024-12-13 19:41     ` Geert Uytterhoeven
  2024-12-14 16:03     ` Matthew Wilcox
@ 2024-12-18 23:36     ` Sasha Levin
  2024-12-19  1:41       ` Kees Cook
  2 siblings, 1 reply; 11+ messages in thread
From: Sasha Levin @ 2024-12-18 23:36 UTC (permalink / raw)
  To: torvalds
  Cc: apw, conor, corbet, dwaipayanray1, geert+renesas, gitster, horms,
	joe, linux-doc, linux-kernel, linux, lukas.bulwahn,
	miguel.ojeda.sandonis, niklas.soderlund, workflows, Sasha Levin

Sometimes long commit hashes can be ambiguous even when providing
several digits from its abbreviation.

Add a script that resolves such ambiguity by also considering the
commit subject line.

This also allows users to use shorter commit ID prefixes than normally
required, since we can correctly identify the intended commit using the
subject line as additional context.

In force mode (--force), you can even omit a valid commit ID prefix
entirely - the script will try to find a commit matching just the
subject line.

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 scripts/git-disambiguate.sh | 163 ++++++++++++++++++++++++++++++++++++
 1 file changed, 163 insertions(+)
 create mode 100755 scripts/git-disambiguate.sh

diff --git a/scripts/git-disambiguate.sh b/scripts/git-disambiguate.sh
new file mode 100755
index 000000000000..86063dd0fd2c
--- /dev/null
+++ b/scripts/git-disambiguate.sh
@@ -0,0 +1,163 @@
+#!/bin/bash
+
+usage() {
+	echo "Usage: $(basename "$0") [--selftest] [--force] <commit-id> [commit-subject]"
+	echo "Disambiguates a short git commit ID to its full SHA-1 hash."
+	echo ""
+	echo "Arguments:"
+	echo "  --selftest	  Run self-tests"
+	echo "  --force		Try to find commit by subject if ID lookup fails"
+	echo "  commit-id	   Short git commit ID to disambiguate"
+	echo "  commit-subject  Optional commit subject to help disambiguate between multiple matches"
+	exit 1
+}
+
+git_full_id() {
+	local force=0
+	if [ "$1" = "--force" ]; then
+		force=1
+		shift
+	fi
+
+	# Split input into commit ID and subject
+	local input="$*"
+	local commit_id="${input%% *}"
+	local subject=""
+	
+	# Extract subject if present (everything after the first space)
+	if [[ "$input" == *" "* ]]; then
+		subject="${input#* }"
+		# Strip the ("...") quotes if present
+		subject="${subject#*(\"}"
+		subject="${subject%\")*}"
+	fi
+
+	# Get all possible matching commit IDs
+	local matches
+	readarray -t matches < <(git rev-parse --disambiguate="$commit_id" 2>/dev/null)
+	
+	# Return immediately if we have exactly one match
+	if [ ${#matches[@]} -eq 1 ]; then
+		echo "${matches[0]}"
+		return 0
+	fi
+	
+	# If no matches and not in force mode, return failure
+	if [ ${#matches[@]} -eq 0 ] && [ $force -eq 0 ]; then
+		return 1
+	fi
+	
+	# If we have a subject, try to find a match with that subject
+	if [ -n "$subject" ]; then
+		# In force mode, search all commits if no ID matches found
+		if [ ${#matches[@]} -eq 0 ]; then
+			local match
+			match=$(git log --format="%H %s" --grep="$subject" --fixed-strings | grep -F -m 1 " $subject" | cut -d' ' -f1)
+			if [ -n "$match" ]; then
+				echo "$match"
+				return 0
+			fi
+		else
+			# Normal subject matching for existing matches
+			for match in "${matches[@]}"; do
+				if [ "$(git log -1 --format="%s" "$match")" = "$subject" ]; then
+					echo "$match"
+					return 0
+				fi
+			done
+		fi
+	fi
+	
+	# No match found
+	return 1
+}
+
+run_selftest() {
+	local test_cases=(
+		'00250b5 ("MAINTAINERS: add new Rockchip SoC list")'
+		'0037727 ("KVM: selftests: Convert xen_shinfo_test away from VCPU_ID")'
+		'ffef737 ("net/tls: Fix skb memory leak when running kTLS traffic")'
+		'12345678'  # Non-existent commit
+		'12345 ("I'\''m a dummy commit")'  # Valid prefix but wrong subject
+		'--force 99999999 ("net/tls: Fix skb memory leak when running kTLS traffic")'  # Force mode with non-existent ID but valid subject
+	)
+	
+	local expected=(
+		"00250b529313d6262bb0ebbd6bdf0a88c809f6f0"
+		"0037727b3989c3fe1929c89a9a1dfe289ad86f58"
+		"ffef737fd0372ca462b5be3e7a592a8929a82752"
+		""  # Expect empty output for non-existent commit
+		""  # Expect empty output for wrong subject
+		"ffef737fd0372ca462b5be3e7a592a8929a82752"  # Should find commit by subject in force mode
+	)
+	
+	local expected_exit_codes=(
+		0
+		0
+		0
+		1  # Expect failure for non-existent commit
+		1  # Expect failure for wrong subject
+		0  # Should succeed in force mode
+	)
+	
+	local failed=0
+	
+	echo "Running self-tests..."
+	for i in "${!test_cases[@]}"; do
+		# Capture both output and exit code
+		local result
+		result=$(git_full_id ${test_cases[$i]})  # Removed quotes to allow --force to be parsed
+		local exit_code=$?
+		
+		# Check both output and exit code
+		if [ "$result" != "${expected[$i]}" ] || [ $exit_code != ${expected_exit_codes[$i]} ]; then
+			echo "Test case $((i+1)) FAILED"
+			echo "Input: ${test_cases[$i]}"
+			echo "Expected output: '${expected[$i]}'"
+			echo "Got output: '$result'"
+			echo "Expected exit code: ${expected_exit_codes[$i]}"
+			echo "Got exit code: $exit_code"
+			failed=1
+		else
+			echo "Test case $((i+1)) PASSED"
+		fi
+	done
+	
+	if [ $failed -eq 0 ]; then
+		echo "All tests passed!"
+		exit 0
+	else
+		echo "Some tests failed!"
+		exit 1
+	fi
+}
+
+# Check for selftest
+if [ "$1" = "--selftest" ]; then
+	run_selftest
+	exit $?
+fi
+
+# Handle --force flag
+force=""
+if [ "$1" = "--force" ]; then
+	force="--force"
+	shift
+fi
+
+# Verify arguments
+if [ $# -eq 0 ]; then
+	usage
+fi
+
+# Skip validation in force mode
+if [ -z "$force" ]; then
+	# Validate that the first argument matches at least one git commit
+	if [ $(git rev-parse --disambiguate="$1" 2>/dev/null | wc -l) -eq 0 ]; then
+		echo "Error: '$1' does not match any git commit"
+		exit 1
+	fi
+fi
+
+git_full_id $force "$@"
+exit $?
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC] git-disambiguate: disambiguate shorthand git ids
  2024-12-18 23:36     ` [RFC] git-disambiguate: disambiguate shorthand git ids Sasha Levin
@ 2024-12-19  1:41       ` Kees Cook
  2024-12-19 20:35         ` Sasha Levin
  0 siblings, 1 reply; 11+ messages in thread
From: Kees Cook @ 2024-12-19  1:41 UTC (permalink / raw)
  To: Sasha Levin
  Cc: torvalds, apw, conor, corbet, dwaipayanray1, geert+renesas,
	gitster, horms, joe, linux-doc, linux-kernel, linux,
	lukas.bulwahn, miguel.ojeda.sandonis, niklas.soderlund, workflows

On Wed, Dec 18, 2024 at 06:36:13PM -0500, Sasha Levin wrote:
> Sometimes long commit hashes can be ambiguous even when providing
> several digits from its abbreviation.

For testing, please see:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=dev/collide/v6.13-rc2/12-char

>  scripts/git-disambiguate.sh | 163 ++++++++++++++++++++++++++++++++++++

sfr has a bunch of logic in his "check_fixes" script that we might want
to consolidate into here. I have an updated copy here:
https://github.com/kees/kernel-tools/blob/trunk/helpers/check_fixes

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] git-disambiguate: disambiguate shorthand git ids
  2024-12-19  1:41       ` Kees Cook
@ 2024-12-19 20:35         ` Sasha Levin
  0 siblings, 0 replies; 11+ messages in thread
From: Sasha Levin @ 2024-12-19 20:35 UTC (permalink / raw)
  To: Kees Cook
  Cc: torvalds, apw, conor, corbet, dwaipayanray1, geert+renesas,
	gitster, horms, joe, linux-doc, linux-kernel, linux,
	lukas.bulwahn, miguel.ojeda.sandonis, niklas.soderlund, workflows

On Wed, Dec 18, 2024 at 05:41:58PM -0800, Kees Cook wrote:
>On Wed, Dec 18, 2024 at 06:36:13PM -0500, Sasha Levin wrote:
>> Sometimes long commit hashes can be ambiguous even when providing
>> several digits from its abbreviation.
>
>For testing, please see:
>https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=dev/collide/v6.13-rc2/12-char
>
>>  scripts/git-disambiguate.sh | 163 ++++++++++++++++++++++++++++++++++++
>
>sfr has a bunch of logic in his "check_fixes" script that we might want
>to consolidate into here. I have an updated copy here:
>https://github.com/kees/kernel-tools/blob/trunk/helpers/check_fixes

Thanks! I'll look into it.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] Align git commit ID abbreviation guidelines and checks
  2024-12-05 18:16 ` [PATCH v2 1/2] Align " Geert Uytterhoeven
@ 2024-12-30 18:43   ` Jonathan Corbet
  0 siblings, 0 replies; 11+ messages in thread
From: Jonathan Corbet @ 2024-12-30 18:43 UTC (permalink / raw)
  To: Geert Uytterhoeven, Dwaipayan Ray, Lukas Bulwahn, Joe Perches,
	Thorsten Leemhuis, Andy Whitcroft, Niklas Söderlund,
	Simon Horman, Conor Dooley, Miguel Ojeda, Linus Torvalds
  Cc: Junio C Hamano, workflows, linux-doc, linux-kernel,
	Geert Uytterhoeven

Geert Uytterhoeven <geert+renesas@glider.be> writes:

> The guidelines for git commit ID abbreviation are inconsistent: some
> places state to use 12 characters exactly, while other places recommend
> 12 characters or more.  The same issue is present in the checkpatch.pl
> script.
>
> E.g. Documentation/dev-tools/checkpatch.rst says:
>
>   **GIT_COMMIT_ID**
>     The proper way to reference a commit id is:
>     commit <12+ chars of sha1> ("<title line>")
>
> However, scripts/checkpatch.pl has two different checks: one warning
> check accepting 12 characters exactly:
>
>     # Check Fixes: styles is correct
>     Please use correct Fixes: style 'Fixes: <12 chars of sha1> (\"<title line>\")'
>
> and a second error check accepting 12-40 characters:
>
>     # Check for git id commit length and improperly formed commit descriptions
>     # A correctly formed commit description is:
>     #    commit <SHA-1 hash length 12+ chars> ("Complete commit subject")
>     Please use git commit description style 'commit <12+ chars of sha1>
>
> Hence patches containing commit IDs with more than 12 characters are
> flagged by checkpatch, and sometimes rejected by maintainers or
> reviewers.  This is becoming more important with the growth of the
> repository, as git may decide to use more characters in case of local
> conflicts.
>
> Fix this by settling on at least 12 characters, in both the
> documentation and in the checkpatch.pl script.
>
> Fixes: bd17e036b495bebb ("checkpatch: warn for non-standard fixes tag style")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
> v2:
>   - Rebase on top of commit 2f07b652384969f5 ("checkpatch: always parse
>     orig_commit in fixes tag") in v6.13-rc1,
>   - Update documentation, too.
> ---
>  Documentation/process/maintainer-tip.rst     | 2 +-
>  Documentation/process/submitting-patches.rst | 8 ++++----
>  scripts/checkpatch.pl                        | 4 ++--
>  3 files changed, 7 insertions(+), 7 deletions(-)

So, while the other patch in this series raised some eyebrows, nobody
has complained about this one.  Consistency and clarity are good, so
I've applied this one, thanks.

jon

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-12-30 18:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-05 18:16 [PATCH v2 0/2] Align and increase git commit ID abbreviation guidelines and checks Geert Uytterhoeven
2024-12-05 18:16 ` [PATCH v2 1/2] Align " Geert Uytterhoeven
2024-12-30 18:43   ` Jonathan Corbet
2024-12-05 18:16 ` [PATCH v2 2/2] Increase minimum git commit ID abbreviation to 16 characters Geert Uytterhoeven
2024-12-05 19:19   ` Linus Torvalds
2024-12-13 19:41     ` Geert Uytterhoeven
2024-12-14 16:03     ` Matthew Wilcox
2024-12-14 16:31       ` Linus Torvalds
2024-12-18 23:36     ` [RFC] git-disambiguate: disambiguate shorthand git ids Sasha Levin
2024-12-19  1:41       ` Kees Cook
2024-12-19 20:35         ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).