* [PATCH v2 0/4] technical docs in make build [not found] <https://lore.kernel.org/git/bcb3b3a3-bb13-4808-9363-442b5f9be05f@ramsayjones.plus.com/> @ 2025-10-02 22:12 ` Ramsay Jones 2025-10-02 22:12 ` [PATCH v2 1/4] doc: add some missing technical documents Ramsay Jones ` (4 more replies) 0 siblings, 5 replies; 25+ messages in thread From: Ramsay Jones @ 2025-10-02 22:12 UTC (permalink / raw) To: GIT Mailing-list Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano, Ramsay Jones OK, so I have recently developed an intense dislike of both asciidoc and asciidoctor. :) Changes in v2: - Actual commit messages - (almost) total re-write of patches #2 and #3 - removed the RFC from patches #2->#4 I have not included a range-diff, because it doesn't show anything interesting/readable with or without a large --creation-factor! There are two issues I am aware of: - mis-formatting of monospaced text containing an '{' character mentioned in the original cover letter below. I have not found a fix for this, but there are other examples in patch #3! - breakage of two html links representing URLS pointing to emails at 'lore.kernel.org'. I don't think it is a coincidence that it is only these two references that contain a reserved character; a '+' in the first (see known bugs 7) and two (separate) '=' characters in the second (mail ref [13]). I tried %encoding them, but that didn't make any difference. There are probably other formatting issues that I am not aware of! Original cover letter: I have been trying to get back to the 'misc build updates (part #3)' patches, so that I can send them to the list, but I have not been able to find a spare minute for quite some time. :( However, this sub-sequence of patches hangs together as a single theme and I need help to finish them up! (asciidoc is not my forte). The first patch adds some technical documents to the Makefile build which are already part of the meson build. 
In particular, the following are built by meson, but not by the Makefile: commit-graph.adoc directory-rename-detection.adoc packfile-uri.adoc remembering-renames.adoc repository-version.adoc rerere.adoc sparse-checkout.adoc sparse-index.adoc Although I am not convinced that some of these files were ever meant to be formatted by asciidoc, I have assumed that is the case for the purposes of this patch series. (otherwise, we should remove them from the meson build and rename the files instead). When I attempt to build the html docs, with patch #1 applied, on Linux: $ make html >out-doc 2>&1 $ grep SyntaxWarning out-doc | head -n1 <unknown>:1: SyntaxWarning: invalid escape sequence '\S' $ grep SyntaxWarning out-doc | wc -l 524 $ $ asciidoc --version asciidoc 10.2.0 $ python3 --version Python 3.12.3 $ This is caused by the python version I am using, which was recently changed (in version 3.12) to issue the SyntaxWarning when a 'non-raw' string contains some escape sequences (here \S). [some versions prior to 3.12 used to issue a deprecation warning]. This is a known issue, see e.g. [0], which has been addressed by a patch [1], and as seen in [2] has been included in a new version 10.2.1 of asciidoc. 
[0] https://trac.macports.org/ticket/70039
[1] https://github.com/asciidoc-py/asciidoc-py/pull/267
[2] https://github.com/asciidoc-py/asciidoc-py/commits/main/

[cygwin does not have this problem, because the python version is 3.9.16]

So, ignoring that issue, we still see some warnings from asciidoc:

    $ grep WARNING out-doc
    asciidoc: WARNING: remembering-renames.adoc: line 13: list item index: expected 1 got 0
    asciidoc: WARNING: remembering-renames.adoc: line 15: list item index: expected 2 got 1
    asciidoc: WARNING: remembering-renames.adoc: line 17: list item index: expected 3 got 2
    asciidoc: WARNING: remembering-renames.adoc: line 20: list item index: expected 4 got 3
    asciidoc: WARNING: remembering-renames.adoc: line 23: list item index: expected 5 got 4
    asciidoc: WARNING: remembering-renames.adoc: line 25: list item index: expected 6 got 5
    asciidoc: WARNING: remembering-renames.adoc: line 29: list item index: expected 7 got 6
    asciidoc: WARNING: remembering-renames.adoc: line 31: list item index: expected 8 got 7
    asciidoc: WARNING: remembering-renames.adoc: line 33: list item index: expected 9 got 8
    asciidoc: WARNING: remembering-renames.adoc: line 38: section title out of sequence: expected level 1, got level 2
    asciidoc: WARNING: sparse-checkout.adoc: line 17: section title out of sequence: expected level 1, got level 2
    asciidoc: WARNING: sparse-checkout.adoc: line 928: list item index: expected 1 got 0
    asciidoc: WARNING: sparse-checkout.adoc: line 931: list item index: expected 2 got 1
    asciidoc: WARNING: sparse-checkout.adoc: line 951: list item index: expected 3 got 2
    asciidoc: WARNING: sparse-checkout.adoc: line 974: list item index: expected 4 got 3
    asciidoc: WARNING: sparse-checkout.adoc: line 980: list item index: expected 5 got 4
    asciidoc: WARNING: sparse-checkout.adoc: line 1033: list item index: expected 6 got 5
    asciidoc: WARNING: sparse-checkout.adoc: line 1049: list item index: expected 7 got 6
    $

I also tried asciidoctor, just for fun:

    $ asciidoctor
--version Asciidoctor 2.0.20 [https://asciidoctor.org] Runtime Environment (ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux-gnu]) (lc:UTF-8 fs:UTF-8 in:UTF-8 ex:UTF-8) $ $ make USE_ASCIIDOCTOR=1 html >out-doctor 2>&1 $ grep WARNING out-doctor asciidoctor: WARNING: remembering-renames.adoc: line 13: list item index: expected 1, got 0 asciidoctor: WARNING: remembering-renames.adoc: line 15: list item index: expected 2, got 1 asciidoctor: WARNING: remembering-renames.adoc: line 17: list item index: expected 3, got 2 asciidoctor: WARNING: remembering-renames.adoc: line 20: list item index: expected 4, got 3 asciidoctor: WARNING: remembering-renames.adoc: line 23: list item index: expected 5, got 4 asciidoctor: WARNING: remembering-renames.adoc: line 25: list item index: expected 6, got 5 asciidoctor: WARNING: remembering-renames.adoc: line 29: list item index: expected 7, got 6 asciidoctor: WARNING: remembering-renames.adoc: line 31: list item index: expected 8, got 7 asciidoctor: WARNING: remembering-renames.adoc: line 33: list item index: expected 9, got 8 asciidoctor: WARNING: remembering-renames.adoc: line 38: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 94: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 141: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 142: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 184: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 185: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 257: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 288: section title out of 
sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 289: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 290: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 397: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 424: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 485: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 486: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 487: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 17: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 95: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 258: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 303: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 316: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 545: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 612: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 752: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 824: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 895: section title out of sequence: expected 
level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 923: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 928: list item index: expected 1, got 0 asciidoctor: WARNING: sparse-checkout.adoc: line 931: list item index: expected 2, got 1 asciidoctor: WARNING: sparse-checkout.adoc: line 951: list item index: expected 3, got 2 asciidoctor: WARNING: sparse-checkout.adoc: line 974: list item index: expected 4, got 3 asciidoctor: WARNING: sparse-checkout.adoc: line 980: list item index: expected 5, got 4 asciidoctor: WARNING: sparse-checkout.adoc: line 1033: list item index: expected 6, got 5 asciidoctor: WARNING: sparse-checkout.adoc: line 1049: list item index: expected 7, got 6 asciidoctor: WARNING: sparse-checkout.adoc: line 1053: section title out of sequence: expected level 1, got level 2 $ You can see that asciidoc only complains about the first 'section title out of sequence', whereas asciidoctor complains about them all. [asciidoctor also reports: Note: namesp. cut : stripped namespace before processing Git User Manual] Patch #2 was a nightmare which I really gave up on! :) An early attempt involved renumbering the 'outline list' at the top from 0->8 to 1->9 (I thought there was a way to start numbering at zero, but I lost a lot of time trying to do so, without any success). So, of course I 'just' tried global search/replace in vim to do the renumbering (backwards). This was a complete disaster (of course), which I 'fixed' many many times. (Not everything which is numbered is a section, there are 'cases' as well). In the end, I just disabled the 'outline' list, by removing the period on the numbers (again '0\. Assumptions' should have worked, but didn't) and fixing up the section titles without renumbering them. Note that asciidoctor mis-formats the 'ascii branch diagrams', which asciidoc formats correctly. I think there are other formatting problems left. 
In patch #3, the formatting changes are confined to the section titles and renumbering the 'known bugs' from 0->6 to 1->7. (I think I noticed some sub-sub lists which are not formatted correctly, but I don't seem to be able to see them now ...). In patch #4, most of the formatting changes relate to section titles, but I could not fix some inline text formatting starting at 'File Layouts' (within 'Commit-Graph Chains') with text that is monospaced with `` but also contains an '{' character. For example: `$OBJDIR/info/commit-graphs/graph-{hash}.graph` is monospaced (blue colour with asciidoc) up until the {hash}.graph which does not have any formatting. (It is not so noticeable with asciidoctor because the formatting consists of a *very* subtle gray background to the text which, to my eyes anyway, is almost not visible). I have tried several suggestions from an on-line asciidoc syntax cheatsheet such as: `$OBJDIR/info/commit-graphs/graph-\{hash\}.graph` `+$OBJDIR/info/commit-graphs/graph-{hash}.graph+` but nothing worked. Note that there are many similar instances of this problem (including just `{hash}`). Note also that asciidoctor did not render the second diagram correctly (the one in 'Merging commit-graph files'), but asciidoc was just fine. The remaining documents: directory-rename-detection.adoc packfile-uri.adoc repository-version.adoc rerere.adoc sparse-index.adoc all appear to be formatted correctly. So, I really need help with the asciidoc formatting, in patches #2->#4, which I am marking as RFC. Having said that, these patches represent an improvement over the existing documents in terms of formatting (just not by much!). Any help fixing up these patches would be much appreciated. :) Thanks. 
ATB,
Ramsay Jones

Ramsay Jones (4):
  doc: add some missing technical documents
  doc: remembering-renames.adoc: fix asciidoc warnings
  doc: sparse-checkout.adoc: fix asciidoc warnings
  doc: commit-graph.adoc: fix up some formatting

 Documentation/Makefile                        |   8 +
 Documentation/technical/commit-graph.adoc     |  29 +-
 .../technical/remembering-renames.adoc        | 120 +--
 Documentation/technical/sparse-checkout.adoc  | 704 ++++++++++--------
 4 files changed, 481 insertions(+), 380 deletions(-)

--
2.51.0

^ permalink raw reply [flat|nested] 25+ messages in thread
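For readers who want to see the SyntaxWarning from the cover letter without running asciidoc, here is a minimal Python sketch. It is not taken from asciidoc: the helper name `escape_warnings` and the two sample source strings are made up for illustration. It compiles a tiny snippet containing a non-raw '\S' and records whatever warning the interpreter issues. The warning category and the exact message text depend on the Python version (SyntaxWarning from 3.12, DeprecationWarning in earlier versions).

```python
import warnings


def escape_warnings(source: str) -> list[str]:
    """Compile `source` and return the text of every warning it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record warnings even if seen before
        compile(source, "<unknown>", "exec")
    return [str(w.message) for w in caught]


# Hypothetical one-line program whose string literal is not raw, so the
# backslash followed by S is an invalid escape sequence.
non_raw = "pattern = '\\S+'"
# The same program with a raw string literal compiles without a warning.
raw = "pattern = r'\\S+'"

print(escape_warnings(non_raw))
print(escape_warnings(raw))
```

In this sketch, marking the pattern as a raw string silences the warning. Whether the asciidoc fix referenced in [1] takes the same form is not stated in this thread.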
* [PATCH v2 1/4] doc: add some missing technical documents 2025-10-02 22:12 ` [PATCH v2 0/4] technical docs in make build Ramsay Jones @ 2025-10-02 22:12 ` Ramsay Jones 2025-10-08 6:45 ` Patrick Steinhardt 2025-10-02 22:12 ` [PATCH v2 2/4] doc: remembering-renames.adoc: fix asciidoc warnings Ramsay Jones ` (3 subsequent siblings) 4 siblings, 1 reply; 25+ messages in thread From: Ramsay Jones @ 2025-10-02 22:12 UTC (permalink / raw) To: GIT Mailing-list Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano, Ramsay Jones Commit bcf7edee09 ("meson: generate articles", 2024-12-27) added the generation of the 'howto' and 'technical' documents to the meson build. At this time those documents had a '*.txt' file extension, but they were renamed with an '*.adoc' extension by commit 1f010d6bdf ("doc: use .adoc extension for AsciiDoc files", 2025-01-20), for the most part. For the meson build, commit 87eccc3a81 ("meson: fix building technical and howto docs", 2025-03-02) fixed the meson.build files, which had not been updated when the files were renamed. However, the 'Documentation/Makefile' has not been updated to include all of the recently added technical documents. In particular, the following are built by meson, but not by the Makefile: commit-graph.adoc directory-rename-detection.adoc packfile-uri.adoc remembering-renames.adoc repository-version.adoc rerere.adoc sparse-checkout.adoc sparse-index.adoc In order to ensure that both build systems format the same technical documents, add the above documents to the TECH_DOCS variable in the Documentation/Makefile. 
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> --- Documentation/Makefile | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/Documentation/Makefile b/Documentation/Makefile index 6fb83d0c6e..a3fbd29744 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -119,18 +119,26 @@ TECH_DOCS += ToolsForGit TECH_DOCS += technical/bitmap-format TECH_DOCS += technical/build-systems TECH_DOCS += technical/bundle-uri +TECH_DOCS += technical/commit-graph +TECH_DOCS += technical/directory-rename-detection TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/long-running-process-protocol TECH_DOCS += technical/multi-pack-index +TECH_DOCS += technical/packfile-uri TECH_DOCS += technical/pack-heuristics TECH_DOCS += technical/parallel-checkout TECH_DOCS += technical/partial-clone TECH_DOCS += technical/platform-support TECH_DOCS += technical/racy-git TECH_DOCS += technical/reftable +TECH_DOCS += technical/remembering-renames +TECH_DOCS += technical/repository-version +TECH_DOCS += technical/rerere TECH_DOCS += technical/scalar TECH_DOCS += technical/send-pack-pipeline TECH_DOCS += technical/shallow +TECH_DOCS += technical/sparse-checkout +TECH_DOCS += technical/sparse-index TECH_DOCS += technical/trivial-merge TECH_DOCS += technical/unit-tests SP_ARTICLES += $(TECH_DOCS) -- 2.51.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/4] doc: add some missing technical documents 2025-10-02 22:12 ` [PATCH v2 1/4] doc: add some missing technical documents Ramsay Jones @ 2025-10-08 6:45 ` Patrick Steinhardt 2025-10-08 19:00 ` Junio C Hamano 2025-10-08 21:56 ` Ramsay Jones 0 siblings, 2 replies; 25+ messages in thread From: Patrick Steinhardt @ 2025-10-08 6:45 UTC (permalink / raw) To: Ramsay Jones Cc: GIT Mailing-list, Elijah Newren, Derrick Stolee, Junio C Hamano On Thu, Oct 02, 2025 at 11:12:13PM +0100, Ramsay Jones wrote: > Commit bcf7edee09 ("meson: generate articles", 2024-12-27) added the > generation of the 'howto' and 'technical' documents to the meson build. > At this time those documents had a '*.txt' file extension, but they were > renamed with an '*.adoc' extension by commit 1f010d6bdf ("doc: use .adoc > extension for AsciiDoc files", 2025-01-20), for the most part. For the > meson build, commit 87eccc3a81 ("meson: fix building technical and howto > docs", 2025-03-02) fixed the meson.build files, which had not been > updated when the files were renamed. > > However, the 'Documentation/Makefile' has not been updated to include > all of the recently added technical documents. In particular, the > following are built by meson, but not by the Makefile: > > commit-graph.adoc > directory-rename-detection.adoc > packfile-uri.adoc > remembering-renames.adoc > repository-version.adoc > rerere.adoc > sparse-checkout.adoc > sparse-index.adoc > > In order to ensure that both build systems format the same technical > documents, add the above documents to the TECH_DOCS variable in the > Documentation/Makefile. I was wondering whether we also want to have a change like the following: diff --git a/Documentation/Makefile b/Documentation/Makefile index 6fb83d0c6e..666b0b6fbd 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -524,15 +524,20 @@ lint-docs-manpages: lint-docs-meson: @# awk acts up when trying to match single quotes, so we use \047 instead. 
@mkdir -p tmp-meson-diff && \ - awk "/^manpages = {$$/ {flag=1 ; next } /^}$$/ { flag=0 } flag { gsub(/^ \047/, \"\"); gsub(/\047 : [157],\$$/, \"\"); print }" meson.build | \ + { \ + awk "/^manpages = {$$/ {flag=1 ; next } /^}$$/ { flag=0 } flag { gsub(/^ \047/, \"\"); gsub(/\047 : [157],\$$/, \"\"); print }" meson.build && \ + awk "/^articles = \[$$/ {flag=1 ; next } /^\]$$/ { flag=0 } flag { gsub(/^ \047/, \"\"); gsub(/\047,$$/, \"\"); print }" technical/meson.build; \ + } | \ grep -v -e '#' -e '^$$' | \ sort >tmp-meson-diff/meson.adoc && \ - ls git*.adoc scalar.adoc | \ + ls git*.adoc scalar.adoc technical/*.adoc | \ + xargs -n1 basename | \ grep -v -e git-bisect-lk2009.adoc \ -e git-pack-redundant.adoc \ -e git-tools.adoc \ -e git-whatchanged.adoc \ - >tmp-meson-diff/actual.adoc && \ + -e api-.*.adoc | \ + sort >tmp-meson-diff/actual.adoc && \ if ! cmp tmp-meson-diff/meson.adoc tmp-meson-diff/actual.adoc; then \ echo "Meson man pages differ from actual man pages:"; \ diff -u tmp-meson-diff/meson.adoc tmp-meson-diff/actual.adoc; \ This builds on our existing linting rule and would catch any discrepancy in man pages that we have in "Documentation/technical/" that isn't listed in Meson. This check isn't quite complete, there's two things missing: - We have an equivalent check in "Documentation/meson.build" that we might want to extend to also cover articles. - We don't have a check to ensure that our Makefile and Meson are in sync. 
But regardless of that, the above check surfaces one more missing article: $ make lint-docs-meson GEN doc.dep make: *** Deleting file 'doc.dep' tmp-meson-diff/meson.adoc tmp-meson-diff/actual.adoc differ: byte 3877, line 206 Meson man pages differ from actual man pages: --- tmp-meson-diff/meson.adoc 2025-10-08 08:42:49.864991169 +0200 +++ tmp-meson-diff/actual.adoc 2025-10-08 08:42:50.072988794 +0200 @@ -203,6 +203,7 @@ git-worktree.adoc git-write-tree.adoc hash-function-transition.adoc +large-object-promisors.adoc long-running-process-protocol.adoc multi-pack-index.adoc packfile-uri.adoc make: *** [Makefile:526: lint-docs-meson] Error 1 Patrick ^ permalink raw reply related [flat|nested] 25+ messages in thread
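The second missing piece listed above (a check that the Makefile and Meson agree) could look roughly like the sketch below. This is only an illustration in Python, not a proposed patch: the function names and the sample inputs are hypothetical. It assumes `TECH_DOCS += technical/<name>` lines as shown in patch 1/4, and an `articles = [ ... ]` list with one quoted `<name>.adoc` entry per line, as the awk expression in the diff above suggests.

```python
import re


def makefile_tech_docs(makefile_text: str) -> set[str]:
    """Return names from lines like 'TECH_DOCS += technical/commit-graph'."""
    return set(
        re.findall(r"^TECH_DOCS \+= technical/(\S+)$", makefile_text, re.MULTILINE)
    )


def meson_articles(meson_text: str) -> set[str]:
    """Return names listed between 'articles = [' and ']', minus '.adoc'."""
    names = set()
    inside = False
    for line in meson_text.splitlines():
        stripped = line.strip()
        if stripped == "articles = [":
            inside = True
        elif stripped == "]":
            inside = False
        elif inside:
            entry = re.fullmatch(r"'(.+)\.adoc',", stripped)
            if entry:
                names.add(entry.group(1))
    return names


# Made-up sample inputs; the real files contain many more entries.
sample_makefile = (
    "TECH_DOCS += technical/commit-graph\n"
    "TECH_DOCS += technical/rerere\n"
)
sample_meson = (
    "articles = [\n"
    "  'commit-graph.adoc',\n"
    "  'rerere.adoc',\n"
    "  'sparse-index.adoc',\n"
    "]\n"
)

only_in_meson = meson_articles(sample_meson) - makefile_tech_docs(sample_makefile)
only_in_makefile = makefile_tech_docs(sample_makefile) - meson_articles(sample_meson)
print(sorted(only_in_meson), sorted(only_in_makefile))
```

A real check would read the two files from disk and fail when either difference is non-empty. Working on strings keeps the sketch self-contained.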
* Re: [PATCH v2 1/4] doc: add some missing technical documents
  2025-10-08 6:45 ` Patrick Steinhardt
@ 2025-10-08 19:00   ` Junio C Hamano
  2025-10-08 22:01     ` Ramsay Jones
  2025-10-08 21:56   ` Ramsay Jones
  1 sibling, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2025-10-08 19:00 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Ramsay Jones, GIT Mailing-list, Elijah Newren, Derrick Stolee

Patrick Steinhardt <ps@pks.im> writes:

> This builds on our existing linting rule and would catch any discrepancy
> in man pages that we have in "Documentation/technical/" that isn't
> listed in Meson.

Yeah, I remember the existing check helping me spot potential issues
in a series or two.

> But regardless of that, the above check surfaces one more missing
> article:
>
>     $ make lint-docs-meson
>         GEN doc.dep
>     make: *** Deleting file 'doc.dep'
>     tmp-meson-diff/meson.adoc tmp-meson-diff/actual.adoc differ: byte 3877, line 206
>     Meson man pages differ from actual man pages:
>     --- tmp-meson-diff/meson.adoc 2025-10-08 08:42:49.864991169 +0200
>     +++ tmp-meson-diff/actual.adoc 2025-10-08 08:42:50.072988794 +0200
>     @@ -203,6 +203,7 @@
>      git-worktree.adoc
>      git-write-tree.adoc
>      hash-function-transition.adoc
>     +large-object-promisors.adoc
>      long-running-process-protocol.adoc
>      multi-pack-index.adoc
>      packfile-uri.adoc
>     make: *** [Makefile:526: lint-docs-meson] Error 1

Good. I'll expect Ramsay will handle this one in v3?

Thanks.
* Re: [PATCH v2 1/4] doc: add some missing technical documents
  2025-10-08 19:00 ` Junio C Hamano
@ 2025-10-08 22:01   ` Ramsay Jones
  2025-10-08 22:33     ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Ramsay Jones @ 2025-10-08 22:01 UTC (permalink / raw)
  To: Junio C Hamano, Patrick Steinhardt
  Cc: GIT Mailing-list, Elijah Newren, Derrick Stolee

On 08/10/2025 8:00 pm, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
>> This builds on our existing linting rule and would catch any discrepancy
>> in man pages that we have in "Documentation/technical/" that isn't
>> listed in Meson.
>
> Yeah, I remember the existing check helping me spot potential issues
> in a series or two.
>
>> But regardless of that, the above check surfaces one more missing
>> article:
>>
>>     $ make lint-docs-meson
>>         GEN doc.dep
>>     make: *** Deleting file 'doc.dep'
>>     tmp-meson-diff/meson.adoc tmp-meson-diff/actual.adoc differ: byte 3877, line 206
>>     Meson man pages differ from actual man pages:
>>     --- tmp-meson-diff/meson.adoc 2025-10-08 08:42:49.864991169 +0200
>>     +++ tmp-meson-diff/actual.adoc 2025-10-08 08:42:50.072988794 +0200
>>     @@ -203,6 +203,7 @@
>>      git-worktree.adoc
>>      git-write-tree.adoc
>>      hash-function-transition.adoc
>>     +large-object-promisors.adoc
>>      long-running-process-protocol.adoc
>>      multi-pack-index.adoc
>>      packfile-uri.adoc
>>     make: *** [Makefile:526: lint-docs-meson] Error 1
>
> Good. I'll expect Ramsay will handle this one in v3?

OK, will do.

Since patch #1 is already in 'next', do I effectively create a new
patch series out of patches #2->#4, plus this new patch?

ATB,
Ramsay Jones
* Re: [PATCH v2 1/4] doc: add some missing technical documents
  2025-10-08 22:01 ` Ramsay Jones
@ 2025-10-08 22:33   ` Junio C Hamano
  0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2025-10-08 22:33 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: Patrick Steinhardt, GIT Mailing-list, Elijah Newren, Derrick Stolee

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

> Since patch #1 is already in 'next', do I effectively create a new
> patch series out of patches #2->#4, plus this new patch?

That would be great. Your original was one patch with 3 RFC patches
on top, so they are queued separately already on two different topic
branches.
* Re: [PATCH v2 1/4] doc: add some missing technical documents
  2025-10-08 6:45 ` Patrick Steinhardt
  2025-10-08 19:00   ` Junio C Hamano
@ 2025-10-08 21:56   ` Ramsay Jones
  1 sibling, 0 replies; 25+ messages in thread
From: Ramsay Jones @ 2025-10-08 21:56 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: GIT Mailing-list, Elijah Newren, Derrick Stolee, Junio C Hamano

On 08/10/2025 7:45 am, Patrick Steinhardt wrote:
> On Thu, Oct 02, 2025 at 11:12:13PM +0100, Ramsay Jones wrote:

[snip]

> This builds on our existing linting rule and would catch any discrepancy
> in man pages that we have in "Documentation/technical/" that isn't
> listed in Meson.
>
> This check isn't quite complete, there's two things missing:
>
>   - We have an equivalent check in "Documentation/meson.build" that we
>     might want to extend to also cover articles.
>
>   - We don't have a check to ensure that our Makefile and Meson are in
>     sync.
>
> But regardless of that, the above check surfaces one more missing
> article:
>
>     $ make lint-docs-meson
>         GEN doc.dep
>     make: *** Deleting file 'doc.dep'
>     tmp-meson-diff/meson.adoc tmp-meson-diff/actual.adoc differ: byte 3877, line 206
>     Meson man pages differ from actual man pages:
>     --- tmp-meson-diff/meson.adoc 2025-10-08 08:42:49.864991169 +0200
>     +++ tmp-meson-diff/actual.adoc 2025-10-08 08:42:50.072988794 +0200
>     @@ -203,6 +203,7 @@
>      git-worktree.adoc
>      git-write-tree.adoc
>      hash-function-transition.adoc
>     +large-object-promisors.adoc
>      long-running-process-protocol.adoc
>      multi-pack-index.adoc
>      packfile-uri.adoc
>     make: *** [Makefile:526: lint-docs-meson] Error 1

So, it has already paid for itself!

Thanks.

ATB,
Ramsay Jones
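As an aside on the renumbering that the cover letter describes for patch #2 (outline and section numbers moved from 0->8 to 1->9, where repeated search/replace went wrong): a substitution that visits each number exactly once avoids both the ordering problem and the accidental renumbering of 'cases'. The sketch below is a hypothetical Python helper, not what was used to produce the patches. The function name `shift_sections` and the two regular expressions are assumptions, and they only recognise the forms visible in this thread: a number at the start of a title or outline line, '#N', and 'section N' / 'sections N-M'.

```python
import re

# Explicit references to a section: "#3", "section 8", "sections 2-5".
REFERENCE = re.compile(r"(#|\bsections? )(\d+)(?:-(\d+))?")
# A number that starts a line: "=== 0. Assumptions ===" or "  0. Assumptions".
NUMBERED = re.compile(r"^([ \t]*(?:=+ )?)(\d+)(\. )", re.MULTILINE)


def shift_sections(text: str, offset: int = 1) -> str:
    """Add `offset` to every recognised section number, each exactly once."""

    def bump_reference(match):
        first = str(int(match.group(2)) + offset)
        # Only ranges such as "2-5" have a second number to shift.
        last = f"-{int(match.group(3)) + offset}" if match.group(3) else ""
        return match.group(1) + first + last

    def bump_numbered(match):
        return match.group(1) + str(int(match.group(2)) + offset) + match.group(3)

    # The two patterns match disjoint text, so no number is shifted twice.
    return NUMBERED.sub(bump_numbered, REFERENCE.sub(bump_reference, text))


# Made-up sample in the style of remembering-renames.adoc.
sample = (
    "=== 4. A detailed description of the counter-examples to #3. ===\n"
    "We already noted in section 3 that case 2 is special; see sections 2-5.\n"
)
print(shift_sections(sample))
```

Note that "case 2" is left alone because it is not written as a section reference. A wrapped paragraph line that happens to begin with "N. " would still be renumbered, so the result needs to be reviewed by eye.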
* [PATCH v2 2/4] doc: remembering-renames.adoc: fix asciidoc warnings
  2025-10-02 22:12 ` [PATCH v2 0/4] technical docs in make build Ramsay Jones
  2025-10-02 22:12   ` [PATCH v2 1/4] doc: add some missing technical documents Ramsay Jones
@ 2025-10-02 22:12   ` Ramsay Jones
  2025-10-08 3:51     ` Elijah Newren
  2025-10-02 22:12   ` [PATCH v2 3/4] doc: sparse-checkout.adoc: " Ramsay Jones
  ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Ramsay Jones @ 2025-10-02 22:12 UTC (permalink / raw)
  To: GIT Mailing-list
  Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano, Ramsay Jones

Both asciidoc and asciidoctor issue warnings about 'list item index:
expected n got n-1' for n=1->9 on lines 13, 15, 17, 20, 23, 25, 29,
31 and 33. In asciidoc, numbered lists must start at one, whereas this
file has a list starting at zero.

Also, asciidoc and asciidoctor warn about 'section title out of sequence:
expected level 1, got level 2' on line 38. (asciidoc only complains about
the first instance of this, while asciidoctor complains about them all,
on lines 94, 141, 142, 184, 185, 257, 288, 289, 290, 397, 424, 485, 486
and 487). These warnings stem from the section titles not being correctly
nested within a document/chapter title.

In order to address the first set of warnings, simply renumber the list
from one to nine, rather than zero to eight. This also requires altering
the text which refers to the section numbers, including other section
titles.

In order to address the second set of warnings, change the section title
syntax from '=== title ===' to '== title ==', effectively reducing the
nesting level of the title by one. Also, some of the titles are given
over multiple lines (they are very long), with a title '===' prefix on
each line. This leads to them being treated as separate sections with no
body text (as you can see from the line numbers given for the asciidoctor
warnings, above). So, for these titles, turn them into a single (long)
line of text.
In addition to the warnings, address some other formatting issues: - the ascii branch diagrams didn't format correctly on asciidoctor so include them in a literal block. - several blocks of text were intended to be formatted 'as is' but were not included in a literal block. - in section 8, format the (A)->(D) in the text description as a literal with `` marks, since (C) is rendered as a copyright symbol in html otherwise. - in section 9, a sub-list of two items is not formatted as such. change the '*' introducer to '**' to correct the sub-list format. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> --- .../technical/remembering-renames.adoc | 120 ++++++++++++------ 1 file changed, 78 insertions(+), 42 deletions(-) diff --git a/Documentation/technical/remembering-renames.adoc b/Documentation/technical/remembering-renames.adoc index 73f41761e2..6155f36c72 100644 --- a/Documentation/technical/remembering-renames.adoc +++ b/Documentation/technical/remembering-renames.adoc @@ -10,32 +10,32 @@ history as an optimization, assuming all merges are automatic and clean Outline: - 0. Assumptions + 1. Assumptions - 1. How rebasing and cherry-picking work + 2. How rebasing and cherry-picking work - 2. Why the renames on MERGE_SIDE1 in any given pick are *always* a + 3. Why the renames on MERGE_SIDE1 in any given pick are *always* a superset of the renames on MERGE_SIDE1 for the next pick. - 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also + 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also a rename on MERGE_SIDE1 for the next pick - 4. A detailed description of the counter-examples to #3. + 5. A detailed description of the counter-examples to #4. - 5. Why the special cases in #4 are still fully reasonable to use to pair + 6. Why the special cases in #5 are still fully reasonable to use to pair up files for three-way content merging in the merge machinery, and why they do not affect the correctness of the merge. - 6. 
Interaction with skipping of "irrelevant" renames + 7. Interaction with skipping of "irrelevant" renames - 7. Additional items that need to be cached + 8. Additional items that need to be cached - 8. How directory rename detection interacts with the above and why this + 9. How directory rename detection interacts with the above and why this optimization is still safe even if merge.directoryRenames is set to "true". -=== 0. Assumptions === +== 1. Assumptions == There are two assumptions that will hold throughout this document: @@ -44,8 +44,8 @@ There are two assumptions that will hold throughout this document: * All merges are fully automatic -and a third that will hold in sections 2-5 for simplicity, that I'll later -address in section 8: +and a third that will hold in sections 3-6 for simplicity, that I'll later +address in section 9: * No directory renames occur @@ -77,9 +77,9 @@ conflicts that the user needs to resolve), the cache of renames is not stored on disk, and thus is thrown away as soon as the rebase or cherry pick stops for the user to resolve the operation. -The third assumption makes sections 2-5 simpler, and allows people to +The third assumption makes sections 3-6 simpler, and allows people to understand the basics of why this optimization is safe and effective, and -then I can go back and address the specifics in section 8. It is probably +then I can go back and address the specifics in section 9. It is probably also worth noting that if directory renames do occur, then the default of merge.directoryRenames being set to "conflict" means that the operation will stop for users to resolve the conflicts and the cache will be thrown @@ -88,22 +88,26 @@ reason we need to address directory renames specifically, is that some users will have set merge.directoryRenames to "true" to allow the merges to continue to proceed automatically. 
The optimization is still safe with this config setting, but we have to discuss a few more cases to show why; -this discussion is deferred until section 8. +this discussion is deferred until section 9. -=== 1. How rebasing and cherry-picking work === +== 2. How rebasing and cherry-picking work == Consider the following setup (from the git-rebase manpage): +------------ A---B---C topic / D---E---F---G main +------------ After rebasing or cherry-picking topic onto main, this will appear as: +------------ A'--B'--C' topic / D---E---F---G main +------------ The way the commits A', B', and C' are created is through a series of merges, where rebase or cherry-pick sequentially uses each of the three @@ -111,6 +115,7 @@ A-B-C commits in a special merge operation. Let's label the three commits in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For this picture, the three commits for each of the three merges would be: +.... To create A': MERGE_BASE: E MERGE_SIDE1: G @@ -125,6 +130,7 @@ To create C': MERGE_BASE: B MERGE_SIDE1: B' MERGE_SIDE2: C +.... Sometimes, folks are surprised that these three-way merges are done. It can be useful in understanding these three-way merges to view them in a @@ -138,8 +144,7 @@ Conceptually the two statements above are the same as a three-way merge of B, B', and C, at least the parts before you decide to record a commit. -=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a === -=== superset of the renames on MERGE_SIDE1 for the next pick. === +== 3. Why the renames on MERGE_SIDE1 in any given pick are always a superset of the renames on MERGE_SIDE1 for the next pick. == The merge machinery uses the filenames it is fed from MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different @@ -156,6 +161,7 @@ filename under one of three conditions: First, let's remember what commits are involved in the first and second picks of the cherry-pick or rebase sequence: +.... 
To create A': MERGE_BASE: E MERGE_SIDE1: G @@ -165,6 +171,7 @@ To create B': MERGE_BASE: A MERGE_SIDE1: A' MERGE_SIDE2: B +.... So, in particular, we need to show that the renames between E and G are a superset of those between A and A'. @@ -181,11 +188,11 @@ are a subset of those between E and G. Equivalently, all renames between E and G are a superset of those between A and A'. -=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ === -=== always also a rename on MERGE_SIDE1 for the next pick. === +== 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also a rename on MERGE_SIDE1 for the next pick. == Let's again look at the first two picks: +.... To create A': MERGE_BASE: E MERGE_SIDE1: G @@ -195,17 +202,25 @@ To create B': MERGE_BASE: A MERGE_SIDE1: A' MERGE_SIDE2: B +.... Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e. any given rename from E to G. Let's use the filenames 'oldfile' and 'newfile' for demonstration purposes. That first pick will function as follows; when the rename is detected, the merge machinery will do a three-way content merge of the following: + +.... E:oldfile G:newfile A:oldfile +.... + and produce a new result: + +.... A':newfile +.... Note above that I've assumed that E->A did not rename oldfile. If that side did rename, then we most likely have a rename/rename(1to2) conflict @@ -254,19 +269,21 @@ were detected as renames, A:oldfile and A':newfile should also be detectable as renames almost always. -=== 4. A detailed description of the counter-examples to #3. === +== 5. A detailed description of the counter-examples to #4. == -We already noted in section 3 that rename/rename(1to1) (i.e. both sides +We already noted in section 4 that rename/rename(1to1) (i.e. both sides renaming a file the same way) was one counter-example. 
The more interesting bit, though, is why did we need to use the "almost" qualifier when stating that A:oldfile and A':newfile are "almost" always detectable as renames? -Let's repeat an earlier point that section 3 made: +Let's repeat an earlier point that section 4 made: +.... A':newfile was created by applying the changes between E:oldfile and G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were <50% of the size of E:oldfile. +.... If those changes that were <50% of the size of E:oldfile are also <50% of the size of A:oldfile, then A:oldfile and A':newfile will be detectable as @@ -276,18 +293,21 @@ still somehow merge cleanly), then traditional rename detection would not detect A:oldfile and A':newfile as renames. Here's an example where that can happen: + * E:oldfile had 20 lines * G:newfile added 10 new lines at the beginning of the file * A:oldfile kept the first 3 lines of the file, and deleted all the rest + then + +.... => A':newfile would have 13 lines, 3 of which matches those in A:oldfile. -E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and -A':newfile would not be. + E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and + A':newfile would not be. +.... -=== 5. Why the special cases in #4 are still fully reasonable to use to === -=== pair up files for three-way content merging in the merge machinery, === -=== and why they do not affect the correctness of the merge. === +== 6. Why the special cases in #5 are still fully reasonable to use to pair up files for three-way content merging in the merge machinery, and why they do not affect the correctness of the merge. == In the rename/rename(1to1) case, A:newfile and A':newfile are not renames since they use the *same* filename. However, files with the same filename @@ -295,14 +315,14 @@ are obviously fine to pair up for three-way content merging (the merge machinery has never employed break detection). 
The interesting counter-example case is thus not the rename/rename(1to1) case, but the case where A did not rename oldfile. That was the case that we spent most of -the time discussing in sections 3 and 4. The remainder of this section +the time discussing in sections 4 and 5. The remainder of this section will be devoted to that case as well. So, even if A:oldfile and A':newfile aren't detectable as renames, why is it still reasonable to pair them up for three-way content merging in the merge machinery? There are multiple reasons: - * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile + * As noted in sections 4 and 5, the diff between A:oldfile and A':newfile is *exactly* the same as the diff between E:oldfile and G:newfile. The latter pair were detected as renames, so it seems unlikely to surprise users for us to treat A:oldfile and A':newfile as renames. @@ -394,7 +414,7 @@ cases 1 and 3 seem to provide as good or better behavior with the optimization than without. -=== 6. Interaction with skipping of "irrelevant" renames === +== 7. Interaction with skipping of "irrelevant" renames == Previous optimizations involved skipping rename detection for paths considered to be "irrelevant". See for example the following commits: @@ -421,24 +441,27 @@ detection -- though we can limit it to the paths for which we have not already detected renames. -=== 7. Additional items that need to be cached === +== 8. Additional items that need to be cached == It turns out we have to cache more than just renames; we also cache: +.... A) non-renames (i.e. unpaired deletes) B) counts of renames within directories C) sources that were marked as RELEVANT_LOCATION, but which were downgraded to RELEVANT_NO_MORE D) the toplevel trees involved in the merge +.... 
These are all stored in struct rename_info, and respectively appear in + * cached_pairs (along side actual renames, just with a value of NULL) * dir_rename_counts * cached_irrelevant * merge_trees -The reason for (A) comes from the irrelevant renames skipping -optimization discussed in section 6. The fact that irrelevant renames +The reason for `(A)` comes from the irrelevant renames skipping +optimization discussed in section 7. The fact that irrelevant renames are skipped means we only get a subset of the potential renames detected and subsequent commits may need to run rename detection on the upstream side on a subset of the remaining renames (to get the @@ -447,23 +470,24 @@ deletes are involved in rename detection too, we don't want to repeatedly check that those paths remain unpaired on the upstream side with every commit we are transplanting. -The reason for (B) is that diffcore_rename_extended() is what +The reason for `(B)` is that diffcore_rename_extended() is what generates the counts of renames by directory which is needed in directory rename detection, and if we don't run diffcore_rename_extended() again then we need to have the output from it, including dir_rename_counts, from the previous run. -The reason for (C) is that merge-ort's tree traversal will again think +The reason for `(C)` is that merge-ort's tree traversal will again think those paths are relevant (marking them as RELEVANT_LOCATION), but the fact that they were downgraded to RELEVANT_NO_MORE means that dir_rename_counts already has the information we need for directory rename detection. (A path which becomes RELEVANT_CONTENT in a subsequent commit will be removed from cached_irrelevant.) -The reason for (D) is that is how we determine whether the remember +The reason for `(D)` is that is how we determine whether the remember renames optimization can be used. In particular, remembering that our sequence of merges looks like: +.... 
Merge 1: MERGE_BASE: E MERGE_SIDE1: G @@ -475,6 +499,7 @@ sequence of merges looks like: MERGE_SIDE1: A' MERGE_SIDE2: B => Creates B' +.... It is the fact that the trees A and A' appear both in Merge 1 and in Merge 2, with A as a parent of A' that allows this optimization. So @@ -482,12 +507,11 @@ we store the trees to compare with what we are asked to merge next time. -=== 8. How directory rename detection interacts with the above and === -=== why this optimization is still safe even if === -=== merge.directoryRenames is set to "true". === +== 9. How directory rename detection interacts with the above and why this optimization is still safe even if merge.directoryRenames is set to "true". == As noted in the assumptions section: +.... """ ...if directory renames do occur, then the default of merge.directoryRenames being set to "conflict" means that the operation @@ -497,11 +521,13 @@ As noted in the assumptions section: is that some users will have set merge.directoryRenames to "true" to allow the merges to continue to proceed automatically. """ +.... Let's remember that we need to look at how any given pick affects the next one. So let's again use the first two picks from the diagram in section one: +.... First pick does this three-way merge: MERGE_BASE: E MERGE_SIDE1: G @@ -513,6 +539,7 @@ one: MERGE_SIDE1: A' MERGE_SIDE2: B => creates B' +.... Now, directory rename detection exists so that if one side of history renames a directory, and the other side adds a new file to the old @@ -545,7 +572,7 @@ while considering all of these cases: concerned; see the assumptions section). Two interesting sub-notes about these counts: - * If we need to perform rename-detection again on the given side (e.g. + ** If we need to perform rename-detection again on the given side (e.g. some paths are relevant for rename detection that weren't before), then we clear dir_rename_counts and recompute it, making use of cached_pairs. 
The reason it is important to do this is optimizations @@ -556,7 +583,7 @@ while considering all of these cases: easiest way to "fix up" dir_rename_counts in such cases is to just recompute it. - * If we prune rename/rename(1to1) entries from the cache, then we also + ** If we prune rename/rename(1to1) entries from the cache, then we also need to update dir_rename_counts to decrement the counts for the involved directory and any relevant parent directories (to undo what update_dir_rename_counts() in diffcore-rename.c incremented when the @@ -578,6 +605,7 @@ in order: Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir +.... This case looks like this: MERGE_BASE: E, Has olddir/ @@ -595,10 +623,13 @@ Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile Given the cached rename noted above, the second merge can proceed as expected without needing to perform rename detection from A -> A'. +.... Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir +.... This case looks like this: + MERGE_BASE: E oldfile, olddir/ MERGE_SIDE1: G oldfile, olddir/ -> newdir/ MERGE_SIDE2: A oldfile -> olddir/newfile @@ -617,9 +648,11 @@ Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir Given the cached rename noted above, the second merge can proceed as expected without needing to perform rename detection from A -> A'. +.... Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir +.... This case looks like this: MERGE_BASE: E, Has olddir/ @@ -635,9 +668,11 @@ Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir In this case, with the optimization, note that after the first commit there were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed. But the second merge didn't need any renames so this is fine. +.... Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir +.... 
This case looks like this: MERGE_BASE: E, Has olddir/ @@ -658,6 +693,7 @@ Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir Given the cached rename noted above, the second merge can proceed as expected without needing to perform rename detection from A -> A'. +.... Finally, I'll just note here that interactions with the skip-irrelevant-renames optimization means we sometimes don't detect -- 2.51.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/4] doc: remembering-renames.adoc: fix asciidoc warnings 2025-10-02 22:12 ` [PATCH v2 2/4] doc: remembering-renames.adoc: fix asciidoc warnings Ramsay Jones @ 2025-10-08 3:51 ` Elijah Newren 2025-10-08 21:38 ` Ramsay Jones 0 siblings, 1 reply; 25+ messages in thread From: Elijah Newren @ 2025-10-08 3:51 UTC (permalink / raw) To: Ramsay Jones Cc: GIT Mailing-list, Patrick Steinhardt, Derrick Stolee, Junio C Hamano On Thu, Oct 2, 2025 at 3:13 PM Ramsay Jones <ramsay@ramsayjones.plus.com> wrote: > > Both asciidoc and asciidoctor issue warnings about 'list item index: > expected n got n-1' for n=1->9 on lines 13, 15, 17, 20, 23, 25, 29, > 31 and 33. In asciidoc, numbered lists must start at one, whereas this > file has a list starting at zero. Also, asciidoc and asciidoctor warn > about 'section title out of sequence: expected level 1, got level 2' > on line 38. (asciidoc only complains about the first instance of this, > while asciidoctor complains about them all, on lines 94, 141, 142, > 184, 185, 257, 288, 289, 290, 397, 424, 485, 486 and 487). These > warnings stem from the section titles not being correctly nested within > a document/chapter title. > > In order to address the first set of warnings, simply renumber the list > from one to nine, rather than zero to eight. This also requires altering > the text which refers to the section numbers, including other section > titles. > > In order to address the second set of warnings, change the section title > syntax from '=== title ===' to '== title ==', effectively reducing the > nesting level of the title by one. Also, some of the titles are given > over multiple lines (they are very long), with a title '===' prefix > on each line. This leads to them being treated as separate sections > with no body text (as you can see from the line numbers given for the > asciidoctor warnings, above). So, for these titles, turn them into a > single (long) line of text.
> > In addition to the warnings, address some other formatting issues: > > - the ascii branch diagrams didn't format correctly on asciidoctor > so include them in a literal block. > - several blocks of text were intended to be formatted 'as is' but > were not included in a literal block. > - in section 8, format the (A)->(D) in the text description as a > literal with `` marks, since (C) is rendered as a copyright > symbol in html otherwise. > - in section 9, a sub-list of two items is not formatted as such. > change the '*' introducer to '**' to correct the sub-list format. Sorry to put you through all this work. I had no idea the stuff under Documentation/technical/ was ever meant to be run through asciidoc/asciidoctor. The .txt ending didn't hint at anything like this; I mean, sure lots of other files were put through those, but I assumed this directory was just stuff for other Git developers... > -=== 0. Assumptions === > +== 1. Assumptions == It doesn't like '===' but is fine with '=='? I'm a bit surprised. If it was about nesting, wouldn't '==' also complain since there is no '=' headers anywhere. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/4] doc: remembering-renames.adoc: fix asciidoc warnings 2025-10-08 3:51 ` Elijah Newren @ 2025-10-08 21:38 ` Ramsay Jones 0 siblings, 0 replies; 25+ messages in thread From: Ramsay Jones @ 2025-10-08 21:38 UTC (permalink / raw) To: Elijah Newren Cc: GIT Mailing-list, Patrick Steinhardt, Derrick Stolee, Junio C Hamano On 08/10/2025 4:51 am, Elijah Newren wrote: > On Thu, Oct 2, 2025 at 3:13 PM Ramsay Jones <ramsay@ramsayjones.plus.com> wrote: >> >> Both asciidoc and asciidoctor issue warnings about 'list item index: >> expected n got n-1' for n=1->9 on lines 13, 15, 17, 20, 23, 25, 29, >> 31 and 33. In asciidoc, numbered lists must start at one, whereas this >> file has a list starting at zero. Also, asciidoc and asciidoctor warn >> about 'section title out of sequence: expected level 1, got level 2' >> on line 38. (asciidoc only complains about the first instance of this, >> while asciidoctor complains about them all, on lines 94, 141, 142, >> 184, 185, 257, 288, 289, 290, 397, 424, 485, 486 and 487). These >> warnings stem from the section titles not being correctly nested within >> a document/chapter title. >> >> In order to address the first set of warnings, simply renumber the list >> from one to nine, rather than zero to eight. This also requires altering >> the text which refers to the section numbers, including other section >> titles. >> >> In order to address the second set of warnings, change the section title >> syntax from '=== title ===' to '== title ==', effectively reducing the >> nesting level of the title by one. Also, some of the titles are given >> over multiple lines (they are very long), with a title '===' prefix >> on each line. This leads to them being treated as separate sections >> with no body text (as you can see from the line numbers given for the >> asciidoctor warnings, above). So, for these titles, turn them into a >> single (long) line of text.
>> >> In addition to the warnings, address some other formatting issues: >> >> - the ascii branch diagrams didn't format correctly on asciidoctor >> so include them in a literal block. >> - several blocks of text were intended to be formatted 'as is' but >> were not included in a literal block. >> - in section 8, format the (A)->(D) in the text description as a >> literal with `` marks, since (C) is rendered as a copyright >> symbol in html otherwise. >> - in section 9, a sub-list of two items is not formatted as such. >> change the '*' introducer to '**' to correct the sub-list format. > > Sorry to put you through all this work. I had no idea the stuff under > Documentation/technical/ was ever meant to be run through > asciidoc/asciidoctor. The .txt ending didn't hint at anything like > this; I mean, sure lots of other files were put through those, but I > assumed this directory was just stuff for other Git developers... As I mentioned in my cover letter, I didn't think these documents were ever meant to be submitted to asciidoc(tor) either, but had to assume that the current policy required it; so, I had to show willing ... :) If it was not already obvious, until this patch series I had managed to completely avoid any knowledge of 'asciidoc standard markup' (which appears to be anything but standard)! >> -=== 0. Assumptions === >> +== 1. Assumptions == > > It doesn't like '===' but is fine with '=='? I'm a bit surprised. If > it was about nesting, wouldn't '==' also complain since there is no > '=' headers anywhere. > Yep, '=' is a level 0 header, but the asciidoc message said 'expected level 1, got level 2', so I just dropped it down one level and asciidoc(tor) was happy! Thanks. ATB, Ramsay Jones ^ permalink raw reply [flat|nested] 25+ messages in thread
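The level arithmetic discussed in the exchange above ('=' is level 0, '==' is level 1, '===' is level 2) can be illustrated with a rough sketch. This is not how asciidoc or asciidoctor is implemented; the function name `check_title_levels`, the regular expression and the `start_level` parameter are all hypothetical, and the sketch assumes that a level 0 document title is already in effect and that only single-line titles are used.

```python
import re

# Single-line AsciiDoc section title: one to six '=' characters, a space, the
# title text, and an optional trailing run of '=' characters.
# (Assumption: two-line "underlined" titles are ignored by this sketch.)
TITLE_RE = re.compile(r"^(={1,6}) +(\S.*?)(?: +=+)?$")


def check_title_levels(lines, start_level=0):
    """Return (line_number, expected_level, got_level) for out-of-sequence titles.

    The level is the number of leading '=' characters minus one. A title is
    flagged when it is more than one level deeper than the enclosing section.
    """
    problems = []
    current = start_level
    for number, line in enumerate(lines, start=1):
        match = TITLE_RE.match(line)
        if not match:
            continue
        level = len(match.group(1)) - 1
        if level > current + 1:
            problems.append((number, current + 1, level))
            continue  # keep the old nesting level, so later titles are flagged too
        current = level
    return problems


before = [
    "=== 0. Assumptions ===",
    "body text",
    "=== 1. How rebasing and cherry-picking work ===",
]
after = [
    "== 1. Assumptions ==",
    "body text",
    "== 2. How rebasing and cherry-picking work ==",
]

print(check_title_levels(before))  # both '===' titles are reported
print(check_title_levels(after))   # nothing is reported
```

Under these assumptions, every '===' title directly below the document title is reported as "expected level 1, got level 2", and a long title split over several '===' lines would count as several separate titles, which is consistent with the line numbers listed for the asciidoctor warnings.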
* [PATCH v2 3/4] doc: sparse-checkout.adoc: fix asciidoc warnings 2025-10-02 22:12 ` [PATCH v2 0/4] technical docs in make build Ramsay Jones 2025-10-02 22:12 ` [PATCH v2 1/4] doc: add some missing technical documents Ramsay Jones 2025-10-02 22:12 ` [PATCH v2 2/4] doc: remembering-renames.adoc: fix asciidoc warnings Ramsay Jones @ 2025-10-02 22:12 ` Ramsay Jones 2025-10-07 12:20 ` Kristoffer Haugsbakk 2025-10-08 3:57 ` Elijah Newren 2025-10-02 22:12 ` [PATCH v2 4/4] doc: commit-graph.adoc: fix up some formatting Ramsay Jones 2025-10-02 22:38 ` [PATCH v2 0/4] technical docs in make build Ramsay Jones 4 siblings, 2 replies; 25+ messages in thread From: Ramsay Jones @ 2025-10-02 22:12 UTC (permalink / raw) To: GIT Mailing-list Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano, Ramsay Jones Both asciidoc and asciidoctor issue warnings about 'list item index: expected n got n-1' for n=1->7 on lines 928, 931, 951, 974, 980, 1033 and 1049. In asciidoc, numbered lists must start at one, whereas this file has a list starting at zero. Also, asciidoc and asciidoctor warn about 'section title out of sequence: expected level 1, got level 2' on line 17. (asciidoc only complains about the first instance of this, while asciidoctor complains about them all, on lines 95, 258, 303, 316, 545, 612, 752, 824, 895, 923 and 1053). These warnings stem from the section titles not being correctly nested within a document/chapter title. In order to address the first set of warnings, simply renumber the list from one to seven, rather than zero to six. Fortunately, this does not require altering additional text, since the enumeration of 'Known Bugs' is not referred to anywhere else in the document. In order to address the second set of warnings, change the section title syntax from '=== title ===' to '== title ==', effectively reducing the nesting level of the title by one.
Also, some apparent (sub-)titles are not marked up with sub-title syntax, so add some '=== ' prefix(s) to the relevant headings. In addition to the warnings, address some other formatting issues: - the use of heavily nested unordered lists is not reflected in the output (making the file totally unreadable) because each level of nesting requires a different syntax. (i.e. replace '*' with '**' for the second level, '*' with '***' for the third level, etc.) - make use of literal blocks and manual indentation to get asciidoc and asciidoctor to display even remotely similar output. - make use of labelled lists, in some places, to get a similar looking output to the input, for both asciidoc and asciidoctor. - replace the trailing space in: `git grep ${SEARCH_TERM} OLDREV ` otherwise the entire line in which that appears is removed from the output. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> --- Documentation/technical/sparse-checkout.adoc | 704 ++++++++++--------- 1 file changed, 376 insertions(+), 328 deletions(-) diff --git a/Documentation/technical/sparse-checkout.adoc b/Documentation/technical/sparse-checkout.adoc index 0f750ef3e3..3fa8e53655 100644 --- a/Documentation/technical/sparse-checkout.adoc +++ b/Documentation/technical/sparse-checkout.adoc @@ -14,37 +14,41 @@ Table of contents: * Reference Emails -=== Terminology === +== Terminology == -cone mode: one of two modes for specifying the desired subset of files +*`cone mode`*:: + one of two modes for specifying the desired subset of files in a sparse-checkout. In cone-mode, the user specifies directories (getting both everything under that directory as well as everything in leading directories), while in non-cone mode, the user specifies gitignore-style patterns. Controlled by the --[no-]cone option to sparse-checkout init|set. 
-SKIP_WORKTREE: When tracked files do not match the sparse specification and +*`SKIP_WORKTREE`*:: + When tracked files do not match the sparse specification and are removed from the working tree, the file in the index is marked with a SKIP_WORKTREE bit. Note that if a tracked file has the SKIP_WORKTREE bit set but the file is later written by the user to the working tree anyway, the SKIP_WORKTREE bit will be cleared at the beginning of any subsequent Git operation. - - Most sparse checkout users are unaware of this implementation - detail, and the term should generally be avoided in user-facing - descriptions and command flags. Unfortunately, prior to the - `sparse-checkout` subcommand this low-level detail was exposed, - and as of time of writing, is still exposed in various places. - -sparse-checkout: a subcommand in git used to reduce the files present in ++ +Most sparse checkout users are unaware of this implementation +detail, and the term should generally be avoided in user-facing +descriptions and command flags. Unfortunately, prior to the +`sparse-checkout` subcommand this low-level detail was exposed, +and as of time of writing, is still exposed in various places. + +*`sparse-checkout`*:: + a subcommand in git used to reduce the files present in the working tree to a subset of all tracked files. Also, the name of the file in the $GIT_DIR/info directory used to track the sparsity patterns corresponding to the user's desired subset. -sparse cone: see cone mode +*`sparse cone`*:: see cone mode -sparse directory: An entry in the index corresponding to a directory, which +*`sparse directory`*:: + An entry in the index corresponding to a directory, which appears in the index instead of all the files under that directory that would normally appear. See also sparse-index. Something that can cause confusion is that the "sparse directory" does NOT match @@ -52,7 +56,8 @@ sparse directory: An entry in the index corresponding to a directory, which working tree. 
May be renamed in the future (e.g. to "skipped directory"). -sparse index: A special mode for sparse-checkout that also makes the +*`sparse index`*:: + A special mode for sparse-checkout that also makes the index sparse by recording a directory entry in lieu of all the files underneath that directory (thus making that a "skipped directory" which unfortunately has also been called a "sparse @@ -60,7 +65,8 @@ sparse index: A special mode for sparse-checkout that also makes the directories. Controlled by the --[no-]sparse-index option to init|set|reapply. -sparsity patterns: patterns from $GIT_DIR/info/sparse-checkout used to +*`sparsity patterns`*:: + patterns from $GIT_DIR/info/sparse-checkout used to define the set of files of interest. A warning: It is easy to over-use this term (or the shortened "patterns" term), for two reasons: (1) users in cone mode specify directories rather than @@ -70,7 +76,8 @@ sparsity patterns: patterns from $GIT_DIR/info/sparse-checkout used to transiently differ in the working tree or index from the sparsity patterns (see "Sparse specification vs. sparsity patterns"). -sparse specification: The set of paths in the user's area of focus. This +*`sparse specification`*:: + The set of paths in the user's area of focus. This is typically just the tracked files that match the sparsity patterns, but the sparse specification can temporarily differ and include additional files. (See also "Sparse specification @@ -87,12 +94,13 @@ sparse specification: The set of paths in the user's area of focus. This * If working with the index and the working copy, the sparse specification is the union of the paths from above. -vivifying: When a command restores a tracked file to the working tree (and +*`vivifying`*:: + When a command restores a tracked file to the working tree (and hopefully also clears the SKIP_WORKTREE bit in the index for that file), this is referred to as "vivifying" the file. 
-=== Purpose of sparse-checkouts === +== Purpose of sparse-checkouts == sparse-checkouts exist to allow users to work with a subset of their files. @@ -120,14 +128,12 @@ those usecases, sparse-checkouts can modify different subcommands in over a half dozen different ways. Let's start by considering the high level usecases: - A) Users are _only_ interested in the sparse portion of the repo - - A*) Users are _only_ interested in the sparse portion of the repo - that they have downloaded so far - - B) Users want a sparse working tree, but are working in a larger whole - - C) sparse-checkout is a behind-the-scenes implementation detail allowing +[horizontal] +A):: Users are _only_ interested in the sparse portion of the repo +A*):: Users are _only_ interested in the sparse portion of the repo + that they have downloaded so far +B):: Users want a sparse working tree, but are working in a larger whole +C):: sparse-checkout is a behind-the-scenes implementation detail allowing Git to work with a specially crafted in-house virtual file system; users are actually working with a "full" working tree that is lazily populated, and sparse-checkout helps with the lazy population @@ -136,7 +142,7 @@ usecases: It may be worth explaining each of these in a bit more detail: - (Behavior A) Users are _only_ interested in the sparse portion of the repo +=== (Behavior A) Users are _only_ interested in the sparse portion of the repo These folks might know there are other things in the repository, but don't care. They are uninterested in other parts of the repository, and @@ -163,8 +169,7 @@ side-effects of various other commands (such as the printed diffstat after a merge or pull) can lead to worries about local repository size growing unnecessarily[10]. 
- (Behavior A*) Users are _only_ interested in the sparse portion of the repo - that they have downloaded so far (a variant on the first usecase) +=== (Behavior A*) Users are _only_ interested in the sparse portion of the repo that they have downloaded so far (a variant on the first usecase) This variant is driven by folks who using partial clones together with sparse checkouts and do disconnected development (so far sounding like a @@ -173,15 +178,14 @@ reason for yet another variant is that downloading even just the blobs through history within their sparse specification may be too much, so they only download some. They would still like operations to succeed without network connectivity, though, so things like `git log -S${SEARCH_TERM} -p` -or `git grep ${SEARCH_TERM} OLDREV ` would need to be prepared to provide +or `git grep ${SEARCH_TERM} OLDREV` would need to be prepared to provide partial results that depend on what happens to have been downloaded. This variant could be viewed as Behavior A with the sparse specification for history querying operations modified from "sparsity patterns" to "sparsity patterns limited to the blobs we have already downloaded". - (Behavior B) Users want a sparse working tree, but are working in a - larger whole +=== (Behavior B) Users want a sparse working tree, but are working in a larger whole Stolee described this usecase this way[11]: @@ -229,8 +233,7 @@ those expensive checks when interacting with the working copy, and may prefer getting "unrelated" results from their history queries over having slow commands. - (Behavior C) sparse-checkout is an implementational detail supporting a - special VFS. +=== (Behavior C) sparse-checkout is an implementational detail supporting a special VFS. 
This usecase goes slightly against the traditional definition of sparse-checkout in that it actually tries to present a full or dense @@ -255,13 +258,13 @@ will perceive the checkout as dense, and commands should thus behave as if all files are present. -=== Usecases of primary concern === +== Usecases of primary concern == Most of the rest of this document will focus on Behavior A and Behavior B. Some notes about the other two cases and why we are not focusing on them: - (Behavior A*) +=== (Behavior A*) Supporting this usecase is estimated to be difficult and a lot of work. There are no plans to implement it currently, but it may be a potential @@ -275,7 +278,7 @@ valid for this usecase, with the only exception being that it redefines the sparse specification to restrict it to already-downloaded blobs. The hard part is in making commands capable of respecting that modified definition. - (Behavior C) +=== (Behavior C) This usecase violates some of the early sparse-checkout documented assumptions (since files marked as SKIP_WORKTREE will be displayed to users @@ -300,20 +303,20 @@ Behavior C do not assume they are part of the Behavior B camp and propose patches that break things for the real Behavior B folks. -=== Oversimplified mental models === +== Oversimplified mental models == An oversimplification of the differences in the above behaviors is: - Behavior A: Restrict worktree and history operations to sparse specification - Behavior B: Restrict worktree operations to sparse specification; have any - history operations work across all files - Behavior C: Do not restrict either worktree or history operations to the - sparse specification...with the exception of branch checkouts or - switches which avoid writing files that will match the index so - they can later lazily be populated instead. 
+(Behavior A):: Restrict worktree and history operations to sparse specification +(Behavior B):: Restrict worktree operations to sparse specification; have any + history operations work across all files +(Behavior C):: Do not restrict either worktree or history operations to the + sparse specification...with the exception of branch checkouts or + switches which avoid writing files that will match the index so + they can later lazily be populated instead. -=== Desired behavior === +== Desired behavior == As noted previously, despite the simple idea of just working with a subset of files, there are a range of different behavioral changes that need to be @@ -326,37 +329,38 @@ understanding these differences can be beneficial. * Commands behaving the same regardless of high-level use-case - * commands that only look at files within the sparsity specification + ** commands that only look at files within the sparsity specification - * diff (without --cached or REVISION arguments) - * grep (without --cached or REVISION arguments) - * diff-files + *** diff (without --cached or REVISION arguments) + *** grep (without --cached or REVISION arguments) + *** diff-files - * commands that restore files to the working tree that match sparsity + ** commands that restore files to the working tree that match sparsity patterns, and remove unmodified files that don't match those patterns: - * switch - * checkout (the switch-like half) - * read-tree - * reset --hard + *** switch + *** checkout (the switch-like half) + *** read-tree + *** reset --hard - * commands that write conflicted files to the working tree, but otherwise + ** commands that write conflicted files to the working tree, but otherwise will omit writing files to the working tree that do not match the sparsity patterns: - * merge - * rebase - * cherry-pick - * revert + *** merge + *** rebase + *** cherry-pick + *** revert - * `am` and `apply --cached` should probably be in this section but + *** `am` and `apply --cached` 
should probably be in this section but are buggy (see the "Known bugs" section below) The behavior for these commands somewhat depends upon the merge strategy being used: - * `ort` behaves as described above - * `octopus` and `resolve` will always vivify any file changed in the merge + + *** `ort` behaves as described above + *** `octopus` and `resolve` will always vivify any file changed in the merge relative to the first parent, which is rather suboptimal. It is also important to note that these commands WILL update the index @@ -372,21 +376,21 @@ understanding these differences can be beneficial. specification and the sparsity patterns (much like the commands in the previous section). - * commands that always ignore sparsity since commits must be full-tree + ** commands that always ignore sparsity since commits must be full-tree - * archive - * bundle - * commit - * format-patch - * fast-export - * fast-import - * commit-tree + *** archive + *** bundle + *** commit + *** format-patch + *** fast-export + *** fast-import + *** commit-tree - * commands that write any modified file to the working tree (conflicted + ** commands that write any modified file to the working tree (conflicted or not, and whether those paths match sparsity patterns or not): - * stash - * apply (without `--index` or `--cached`) + *** stash + *** apply (without `--index` or `--cached`) * Commands that may slightly differ for behavior A vs. behavior B: @@ -394,19 +398,20 @@ understanding these differences can be beneficial. behaviors, but may differ in verbosity and types of warning and error messages. - * commands that make modifications to which files are tracked: - * add - * rm - * mv - * update-index + ** commands that make modifications to which files are tracked: + + *** add + *** rm + *** mv + *** update-index The fact that files can move between the 'tracked' and 'untracked' categories means some commands will have to treat untracked files differently. 
But if we have to treat untracked files differently, then additional commands may also need changes: - * status - * clean + *** status + *** clean In particular, `status` may need to report any untracked files outside the sparsity specification as an erroneous condition (especially to @@ -420,9 +425,10 @@ understanding these differences can be beneficial. may need to ignore the sparse specification by its nature. Also, its current --[no-]ignore-skip-worktree-entries default is totally bogus. - * commands for manually tweaking paths in both the index and the working tree - * `restore` - * the restore-like half of `checkout` + ** commands for manually tweaking paths in both the index and the working tree + + *** `restore` + *** the restore-like half of `checkout` These commands should be similar to add/rm/mv in that they should only operate on the sparse specification by default, and require a @@ -433,18 +439,19 @@ understanding these differences can be beneficial. * Commands that significantly differ for behavior A vs. behavior B: - * commands that query history - * diff (with --cached or REVISION arguments) - * grep (with --cached or REVISION arguments) - * show (when given commit arguments) - * blame (only matters when one or more -C flags are passed) - * and annotate - * log - * whatchanged (may not exist anymore) - * ls-files - * diff-index - * diff-tree - * ls-tree + ** commands that query history + + *** diff (with --cached or REVISION arguments) + *** grep (with --cached or REVISION arguments) + *** show (when given commit arguments) + *** blame (only matters when one or more -C flags are passed) + **** and annotate + *** log + *** whatchanged (may not exist anymore) + *** ls-files + *** diff-index + *** diff-tree + *** ls-tree Note: for log and whatchanged, revision walking logic is unaffected but displaying of patches is affected by scoping the command to the @@ -458,91 +465,91 @@ understanding these differences can be beneficial. 
* Commands I don't know how to classify - * range-diff + ** range-diff Is this like `log` or `format-patch`? - * cherry + ** cherry See range-diff * Commands unaffected by sparse-checkouts - * shortlog - * show-branch - * rev-list - * bisect - - * branch - * describe - * fetch - * gc - * init - * maintenance - * notes - * pull (merge & rebase have the necessary changes) - * push - * submodule - * tag - - * config - * filter-branch (works in separate checkout without sparse-checkout setup) - * pack-refs - * prune - * remote - * repack - * replace - - * bugreport - * count-objects - * fsck - * gitweb - * help - * instaweb - * merge-tree (doesn't touch worktree or index, and merges always compute full-tree) - * rerere - * verify-commit - * verify-tag - - * commit-graph - * hash-object - * index-pack - * mktag - * mktree - * multi-pack-index - * pack-objects - * prune-packed - * symbolic-ref - * unpack-objects - * update-ref - * write-tree (operates on index, possibly optimized to use sparse dir entries) - - * for-each-ref - * get-tar-commit-id - * ls-remote - * merge-base (merges are computed full tree, so merge base should be too) - * name-rev - * pack-redundant - * rev-parse - * show-index - * show-ref - * unpack-file - * var - * verify-pack - - * <Everything under 'Interacting with Others' in 'git help --all'> - * <Everything under 'Low-level...Syncing' in 'git help --all'> - * <Everything under 'Low-level...Internal Helpers' in 'git help --all'> - * <Everything under 'External commands' in 'git help --all'> + ** shortlog + ** show-branch + ** rev-list + ** bisect + + ** branch + ** describe + ** fetch + ** gc + ** init + ** maintenance + ** notes + ** pull (merge & rebase have the necessary changes) + ** push + ** submodule + ** tag + + ** config + ** filter-branch (works in separate checkout without sparse-checkout setup) + ** pack-refs + ** prune + ** remote + ** repack + ** replace + + ** bugreport + ** count-objects + ** fsck + ** gitweb + ** help + ** 
instaweb + ** merge-tree (doesn't touch worktree or index, and merges always compute full-tree) + ** rerere + ** verify-commit + ** verify-tag + + ** commit-graph + ** hash-object + ** index-pack + ** mktag + ** mktree + ** multi-pack-index + ** pack-objects + ** prune-packed + ** symbolic-ref + ** unpack-objects + ** update-ref + ** write-tree (operates on index, possibly optimized to use sparse dir entries) + + ** for-each-ref + ** get-tar-commit-id + ** ls-remote + ** merge-base (merges are computed full tree, so merge base should be too) + ** name-rev + ** pack-redundant + ** rev-parse + ** show-index + ** show-ref + ** unpack-file + ** var + ** verify-pack + + ** <Everything under 'Interacting with Others' in 'git help --all'> + ** <Everything under 'Low-level...Syncing' in 'git help --all'> + ** <Everything under 'Low-level...Internal Helpers' in 'git help --all'> + ** <Everything under 'External commands' in 'git help --all'> * Commands that might be affected, but who cares? - * merge-file - * merge-index - * gitk? + ** merge-file + ** merge-index + ** gitk? -=== Behavior classes === +== Behavior classes == From the above there are a few classes of behavior: @@ -573,18 +580,19 @@ From the above there are a few classes of behavior: Commands in this class generally behave like the "restrict" class, except that: - (1) they will ignore the sparse specification and write files with - conflicts to the working tree (thus temporarily expanding the - sparse specification to include such files.) - (2) they are grouped with commands which move to a new commit, since - they often create a commit and then move to it, even though we - know there are many exceptions to moving to the new commit. (For - example, the user may rebase a commit that becomes empty, or have - a cherry-pick which conflicts, or a user could run `merge - --no-commit`, and we also view `apply --index` kind of like `am - --no-commit`.) 
As such, these commands can make changes to index - files outside the sparse specification, though they'll mark such - files with SKIP_WORKTREE. + + (1) they will ignore the sparse specification and write files with + conflicts to the working tree (thus temporarily expanding the + sparse specification to include such files.) + (2) they are grouped with commands which move to a new commit, since + they often create a commit and then move to it, even though we + know there are many exceptions to moving to the new commit. (For + example, the user may rebase a commit that becomes empty, or have + a cherry-pick which conflicts, or a user could run `merge + --no-commit`, and we also view `apply --index` kind of like `am + --no-commit`.) As such, these commands can make changes to index + files outside the sparse specification, though they'll mark such + files with SKIP_WORKTREE. * "restrict also specially applied to untracked files" @@ -609,37 +617,39 @@ From the above there are a few classes of behavior: specification. -=== Subcommand-dependent defaults === +== Subcommand-dependent defaults == Note that we have different defaults depending on the command for the desired behavior : * Commands defaulting to "restrict": - * diff-files - * diff (without --cached or REVISION arguments) - * grep (without --cached or REVISION arguments) - * switch - * checkout (the switch-like half) - * reset (<commit>) - - * restore - * checkout (the restore-like half) - * checkout-index - * reset (with pathspec) + + ** diff-files + ** diff (without --cached or REVISION arguments) + ** grep (without --cached or REVISION arguments) + ** switch + ** checkout (the switch-like half) + ** reset (<commit>) + + ** restore + ** checkout (the restore-like half) + ** checkout-index + ** reset (with pathspec) This behavior makes sense; these interact with the working tree. 
* Commands defaulting to "restrict modulo conflicts": - * merge - * rebase - * cherry-pick - * revert - * am - * apply --index (which is kind of like an `am --no-commit`) + ** merge + ** rebase + ** cherry-pick + ** revert + + ** am + ** apply --index (which is kind of like an `am --no-commit`) - * read-tree (especially with -m or -u; is kind of like a --no-commit merge) - * reset (<tree-ish>, due to similarity to read-tree) + ** read-tree (especially with -m or -u; is kind of like a --no-commit merge) + ** reset (<tree-ish>, due to similarity to read-tree) These also interact with the working tree, but require slightly different behavior either so that (a) conflicts can be resolved or (b) @@ -648,16 +658,17 @@ desired behavior : (See also the "Known bugs" section below regarding `am` and `apply`) * Commands defaulting to "no restrict": - * archive - * bundle - * commit - * format-patch - * fast-export - * fast-import - * commit-tree - * stash - * apply (without `--index`) + ** archive + ** bundle + ** commit + ** format-patch + ** fast-export + ** fast-import + ** commit-tree + + ** stash + ** apply (without `--index`) These have completely different defaults and perhaps deserve the most detailed explanation: @@ -679,53 +690,59 @@ desired behavior : sparse specification then we'll lose changes from the user. * Commands defaulting to "restrict also specially applied to untracked files": - * add - * rm - * mv - * update-index - * status - * clean (?) - - Our original implementation for the first three of these commands was - "no restrict", but it had some severe usability issues: - * `git add <somefile>` if honored and outside the sparse - specification, can result in the file randomly disappearing later - when some subsequent command is run (since various commands - automatically clean up unmodified files outside the sparse - specification). - * `git rm '*.jpg'` could very negatively surprise users if it deletes - files outside the range of the user's interest. 
- * `git mv` has similar surprises when moving into or out of the cone, - so best to restrict by default - - So, we switched `add` and `rm` to default to "restrict", which made - usability problems much less severe and less frequent, but we still got - complaints because commands like: - git add <file-outside-sparse-specification> - git rm <file-outside-sparse-specification> - would silently do nothing. We should instead print an error in those - cases to get usability right. - - update-index needs to be updated to match, and status and maybe clean - also need to be updated to specially handle untracked paths. - - There may be a difference in here between behavior A and behavior B in - terms of verboseness of errors or additional warnings. + + ** add + ** rm + ** mv + ** update-index + ** status + ** clean (?) + +.... + Our original implementation for the first three of these commands was + "no restrict", but it had some severe usability issues: + + * `git add <somefile>` if honored and outside the sparse + specification, can result in the file randomly disappearing later + when some subsequent command is run (since various commands + automatically clean up unmodified files outside the sparse + specification). + * `git rm '*.jpg'` could very negatively surprise users if it deletes + files outside the range of the user's interest. + * `git mv` has similar surprises when moving into or out of the cone, + so best to restrict by default + + So, we switched `add` and `rm` to default to "restrict", which made + usability problems much less severe and less frequent, but we still got + complaints because commands like: + + git add <file-outside-sparse-specification> + git rm <file-outside-sparse-specification> + + would silently do nothing. We should instead print an error in those + cases to get usability right. + + update-index needs to be updated to match, and status and maybe clean + also need to be updated to specially handle untracked paths. 
+ + There may be a difference in here between behavior A and behavior B in + terms of verboseness of errors or additional warnings. +.... * Commands falling under "restrict or no restrict dependent upon behavior A vs. behavior B" - * diff (with --cached or REVISION arguments) - * grep (with --cached or REVISION arguments) - * show (when given commit arguments) - * blame (only matters when one or more -C flags passed) - * and annotate - * log - * and variants: shortlog, gitk, show-branch, whatchanged, rev-list - * ls-files - * diff-index - * diff-tree - * ls-tree + ** diff (with --cached or REVISION arguments) + ** grep (with --cached or REVISION arguments) + ** show (when given commit arguments) + ** blame (only matters when one or more -C flags passed) + *** and annotate + ** log + *** and variants: shortlog, gitk, show-branch, whatchanged, rev-list + ** ls-files + ** diff-index + ** diff-tree + ** ls-tree For now, we default to behavior B for these, which want a default of "no restrict". @@ -749,7 +766,7 @@ desired behavior : implemented. -=== Sparse specification vs. sparsity patterns === +== Sparse specification vs. sparsity patterns == In a well-behaved situation, the sparse specification is given directly by the $GIT_DIR/info/sparse-checkout file. However, it can transiently @@ -821,45 +838,48 @@ under behavior B index operations are lumped with history and tend to operate full-tree. -=== Implementation Questions === - - * Do the options --scope={sparse,all} sound good to others? Are there better - options? 
- * Names in use, or appearing in patches, or previously suggested: - * --sparse/--dense - * --ignore-skip-worktree-bits - * --ignore-skip-worktree-entries - * --ignore-sparsity - * --[no-]restrict-to-sparse-paths - * --full-tree/--sparse-tree - * --[no-]restrict - * --scope={sparse,all} - * --focus/--unfocus - * --limit/--unlimited - * Rationale making me lean slightly towards --scope={sparse,all}: - * We want a name that works for many commands, so we need a name that +== Implementation Questions == + + * Do the options --scope={sparse,all} sound good to others? Are there better options? + + ** Names in use, or appearing in patches, or previously suggested: + + *** --sparse/--dense + *** --ignore-skip-worktree-bits + *** --ignore-skip-worktree-entries + *** --ignore-sparsity + *** --[no-]restrict-to-sparse-paths + *** --full-tree/--sparse-tree + *** --[no-]restrict + *** --scope={sparse,all} + *** --focus/--unfocus + *** --limit/--unlimited + + ** Rationale making me lean slightly towards --scope={sparse,all}: + + *** We want a name that works for many commands, so we need a name that does not conflict - * We know that we have more than two possible usecases, so it is best + *** We know that we have more than two possible usecases, so it is best to avoid a flag that appears to be binary. - * --scope={sparse,all} isn't overly long and seems relatively + *** --scope={sparse,all} isn't overly long and seems relatively explanatory - * `--sparse`, as used in add/rm/mv, is totally backwards for + *** `--sparse`, as used in add/rm/mv, is totally backwards for grep/log/etc. Changing the meaning of `--sparse` for these commands would fix the backwardness, but possibly break existing scripts. Using a new name pairing would allow us to treat `--sparse` in these commands as a deprecated alias. 
- * There is a different `--sparse`/`--dense` pair for commands using + *** There is a different `--sparse`/`--dense` pair for commands using revision machinery, so using that naming might cause confusion - * There is also a `--sparse` in both pack-objects and show-branch, which + *** There is also a `--sparse` in both pack-objects and show-branch, which don't conflict but do suggest that `--sparse` is overloaded - * The name --ignore-skip-worktree-bits is a double negative, is + *** The name --ignore-skip-worktree-bits is a double negative, is quite a mouthful, refers to an implementation detail that many users may not be familiar with, and we'd need a negation for it which would probably be even more ridiculously long. (But we can make --ignore-skip-worktree-bits a deprecated alias for --no-restrict.) - * If a config option is added (sparse.scope?) what should the values and + ** If a config option is added (sparse.scope?) what should the values and description be? "sparse" (behavior A), "worktree-sparse-history-dense" (behavior B), "dense" (behavior C)? There's a risk of confusion, because even for Behaviors A and B we want some commands to be @@ -868,19 +888,20 @@ operate full-tree. the primary difference we are focusing is just the history-querying commands (log/diff/grep). Previous config suggestion here: [13] - * Is `--no-expand` a good alias for ls-files's `--sparse` option? + ** Is `--no-expand` a good alias for ls-files's `--sparse` option? (`--sparse` does not map to either `--scope=sparse` or `--scope=all`, because in non-cone mode it does nothing and in cone-mode it shows the sparse directory entries which are technically outside the sparse specification) - * Under Behavior A: - * Does ls-files' `--no-expand` override the default `--scope=all`, or + ** Under Behavior A: + + *** Does ls-files' `--no-expand` override the default `--scope=all`, or does it need an extra flag? - * Does ls-files' `-t` option imply `--scope=all`? 
- * Does update-index's `--[no-]skip-worktree` option imply `--scope=all`? + *** Does ls-files' `-t` option imply `--scope=all`? + *** Does update-index's `--[no-]skip-worktree` option imply `--scope=all`? - * sparse-checkout: once behavior A is fully implemented, should we take + ** sparse-checkout: once behavior A is fully implemented, should we take an interim measure to ease people into switching the default? Namely, if folks are not already in a sparse checkout, then require `sparse-checkout init/set` to take a @@ -892,7 +913,7 @@ operate full-tree. is seamless for them. -=== Implementation Goals/Plans === +== Implementation Goals/Plans == * Get buy-in on this document in general. @@ -910,25 +931,26 @@ operate full-tree. request that they not trigger this bug." flag * Flags & Config - * Make `--sparse` in add/rm/mv a deprecated alias for `--scope=all` - * Make `--ignore-skip-worktree-bits` in checkout-index/checkout/restore + + ** Make `--sparse` in add/rm/mv a deprecated alias for `--scope=all` + ** Make `--ignore-skip-worktree-bits` in checkout-index/checkout/restore a deprecated aliases for `--scope=all` - * Create config option (sparse.scope?), tie it to the "Cliff notes" + ** Create config option (sparse.scope?), tie it to the "Cliff notes" overview - * Add --scope=sparse (and --scope=all) flag to each of the history querying + ** Add --scope=sparse (and --scope=all) flag to each of the history querying commands. IMPORTANT: make sure diff machinery changes don't mess with format-patch, fast-export, etc. -=== Known bugs === +== Known bugs == This list used to be a lot longer (see e.g. [1,2,3,4,5,6,7,8,9]), but we've been working on it. -0. Behavior A is not well supported in Git. (Behavior B didn't used to +1. Behavior A is not well supported in Git. (Behavior B didn't used to be either, but was the easier of the two to implement.) -1. am and apply: +2. 
am and apply: apply, without `--index` or `--cached`, relies on files being present in the working copy, and also writes to them unconditionally. As @@ -948,7 +970,7 @@ been working on it. files and then complain that those vivified files would be overwritten by merge. -2. reset --hard: +3. reset --hard: reset --hard provides confusing error message (works correctly, but misleads the user into believing it didn't): @@ -971,13 +993,13 @@ been working on it. `git reset --hard` DID remove addme from the index and the working tree, contrary to the error message, but in line with how reset --hard should behave. -3. read-tree +4. read-tree `read-tree` doesn't apply the 'SKIP_WORKTREE' bit to *any* of the entries it reads into the index, resulting in all your files suddenly appearing to be "deleted". -4. Checkout, restore: +5. Checkout, restore: These command do not handle path & revision arguments appropriately: @@ -1030,7 +1052,7 @@ been working on it. S tracked H tracked-but-maybe-skipped -5. checkout and restore --staged, continued: +6. checkout and restore --staged, continued: These commands do not correctly scope operations to the sparse specification, and make it worse by not setting important SKIP_WORKTREE @@ -1046,56 +1068,82 @@ been working on it. the sparse specification, but then it will be important to set the SKIP_WORKTREE bits appropriately. -6. Performance issues; see: - https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/ +7. 
Performance issues; see: + + https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/ -=== Reference Emails === +== Reference Emails == Emails that detail various bugs we've had in sparse-checkout: -[1] (Original descriptions of behavior A & behavior B) - https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ -[2] (Fix stash applications in sparse checkouts; bugs from behavioral differences) - https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/ -[3] (Present-despite-skipped entries) - https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/ -[4] (Clone --no-checkout interaction) - https://lore.kernel.org/git/pull.801.v2.git.git.1591324899170.gitgitgadget@gmail.com/ (clone --no-checkout) -[5] (The need for update_sparsity() and avoiding `read-tree -mu HEAD`) - https://lore.kernel.org/git/3a1f084641eb47515b5a41ed4409a36128913309.1585270142.git.gitgitgadget@gmail.com/ -[6] (SKIP_WORKTREE is advisory, not mandatory) - https://lore.kernel.org/git/844306c3e86ef67591cc086decb2b760e7d710a3.1585270142.git.gitgitgadget@gmail.com/ -[7] (`worktree add` should copy sparsity settings from current worktree) - https://lore.kernel.org/git/c51cb3714e7b1d2f8c9370fe87eca9984ff4859f.1644269584.git.gitgitgadget@gmail.com/ -[8] (Avoid negative surprises in add, rm, and mv) - https://lore.kernel.org/git/cover.1617914011.git.matheus.bernardino@usp.br/ - https://lore.kernel.org/git/pull.1018.v4.git.1632497954.gitgitgadget@gmail.com/ -[9] (Move from out-of-cone to in-cone) - https://lore.kernel.org/git/20220630023737.473690-6-shaoxuan.yuan02@gmail.com/ - https://lore.kernel.org/git/20220630023737.473690-4-shaoxuan.yuan02@gmail.com/ -[10] (Unnecessarily downloading objects outside sparse specification) - https://lore.kernel.org/git/CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com/ - -[11] 
(Stolee's comments on high-level usecases) - https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/ +[1] (Original descriptions of behavior A & behavior B): + +https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ + +[2] (Fix stash applications in sparse checkouts; bugs from behavioral differences): + +https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/ + +[3] (Present-despite-skipped entries): + +https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/ + +[4] (Clone --no-checkout interaction): + +https://lore.kernel.org/git/pull.801.v2.git.git.1591324899170.gitgitgadget@gmail.com/ (clone --no-checkout) + +[5] (The need for update_sparsity() and avoiding `read-tree -mu HEAD`): + +https://lore.kernel.org/git/3a1f084641eb47515b5a41ed4409a36128913309.1585270142.git.gitgitgadget@gmail.com/ + +[6] (SKIP_WORKTREE is advisory, not mandatory): + +https://lore.kernel.org/git/844306c3e86ef67591cc086decb2b760e7d710a3.1585270142.git.gitgitgadget@gmail.com/ + +[7] (`worktree add` should copy sparsity settings from current worktree): + +https://lore.kernel.org/git/c51cb3714e7b1d2f8c9370fe87eca9984ff4859f.1644269584.git.gitgitgadget@gmail.com/ + +[8] (Avoid negative surprises in add, rm, and mv): + + * https://lore.kernel.org/git/cover.1617914011.git.matheus.bernardino@usp.br/ + * https://lore.kernel.org/git/pull.1018.v4.git.1632497954.gitgitgadget@gmail.com/ + +[9] (Move from out-of-cone to in-cone): + + * https://lore.kernel.org/git/20220630023737.473690-6-shaoxuan.yuan02@gmail.com/ + * https://lore.kernel.org/git/20220630023737.473690-4-shaoxuan.yuan02@gmail.com/ + +[10] (Unnecessarily downloading objects outside sparse specification): + +https://lore.kernel.org/git/CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com/ + +[11] (Stolee's comments on high-level usecases): + 
+https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/ [12] Others commenting on eventually switching default to behavior A: + * https://lore.kernel.org/git/xmqqh719pcoo.fsf@gitster.g/ * https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ * https://lore.kernel.org/git/a86af661-cf58-a4e5-0214-a67d3a794d7e@github.com/ -[13] Previous config name suggestion and description - * https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/ +[13] Previous config name suggestion and description: + + https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/ [14] Tangential issue: switch to cone mode as default sparse specification mechanism: - https://lore.kernel.org/git/a1b68fd6126eb341ef3637bb93fedad4309b36d0.1650594746.git.gitgitgadget@gmail.com/ + +https://lore.kernel.org/git/a1b68fd6126eb341ef3637bb93fedad4309b36d0.1650594746.git.gitgitgadget@gmail.com/ [15] Lengthy email on grep behavior, covering what should be searched: - * https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/ + +https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/ [16] Email explaining sparsity patterns vs. SKIP_WORKTREE and history operations, search for the parenthetical comment starting "We do not check". - https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/ + +https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/ [17] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/ -- 2.51.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 3/4] doc: sparse-checkout.adoc: fix asciidoc warnings
  2025-10-02 22:12 ` [PATCH v2 3/4] doc: sparse-checkout.adoc: " Ramsay Jones
@ 2025-10-07 12:20 ` Kristoffer Haugsbakk
  2025-10-07 22:17 ` Ramsay Jones
  2025-10-08 3:57 ` Elijah Newren
  1 sibling, 1 reply; 25+ messages in thread
From: Kristoffer Haugsbakk @ 2025-10-07 12:20 UTC
To: Ramsay Jones, GIT Mailing-list
Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano

On Fri, Oct 3, 2025, at 00:12, Ramsay Jones wrote:
>[snip]
>
> In order to address the first set of warnings, simply renumber the list
> from one to severn, rather than zero to six. Fortunately, this does not

s/severn/seven/

>[snip]

* Re: [PATCH v2 3/4] doc: sparse-checkout.adoc: fix asciidoc warnings
  2025-10-07 12:20 ` Kristoffer Haugsbakk
@ 2025-10-07 22:17 ` Ramsay Jones
  0 siblings, 0 replies; 25+ messages in thread
From: Ramsay Jones @ 2025-10-07 22:17 UTC
To: Kristoffer Haugsbakk, GIT Mailing-list
Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano

On 07/10/2025 1:20 pm, Kristoffer Haugsbakk wrote:
> On Fri, Oct 3, 2025, at 00:12, Ramsay Jones wrote:
>> [snip]
>>
>> In order to address the first set of warnings, simply renumber the list
>> from one to severn, rather than zero to six. Fortunately, this does not
>
> s/severn/seven/

Thanks. I have updated locally, while (hopefully) waiting for more
feedback.

ATB,
Ramsay Jones

* Re: [PATCH v2 3/4] doc: sparse-checkout.adoc: fix asciidoc warnings 2025-10-02 22:12 ` [PATCH v2 3/4] doc: sparse-checkout.adoc: " Ramsay Jones 2025-10-07 12:20 ` Kristoffer Haugsbakk @ 2025-10-08 3:57 ` Elijah Newren 2025-10-08 21:54 ` Ramsay Jones 1 sibling, 1 reply; 25+ messages in thread From: Elijah Newren @ 2025-10-08 3:57 UTC (permalink / raw) To: Ramsay Jones Cc: GIT Mailing-list, Patrick Steinhardt, Derrick Stolee, Junio C Hamano On Thu, Oct 2, 2025 at 3:13 PM Ramsay Jones <ramsay@ramsayjones.plus.com> wrote: > > Both asciidoc and asciidoctor issue warnings about 'list item index: > expected n got n-1' for n=1->7 on lines 928, 931, 951, 974, 980, 1033 > and 1049. In asciidoc, numbered lists must start at one, whereas this > file has a list starting at zero. Also, asciidoc and asciidoctor warn > about 'section title out of sequence: expected level 1, got level 2' > on line 17. (asciidoc only complains about the first instance of this, > while asciidoctor complains about them all, on lines 95, 258, 303, 316, > 545, 612, 752, 824, 895, 923 and 1053). These warnings stem from the > section titles not being correctly nested within a document/chapter > title. > > In order to address the first set of warnings, simply renumber the list > from one to severn, rather than zero to six. Fortunately, this does not > require altering additional text, since the enumeration of 'Known Bugs' > is not referred to anywhere else in the document. > > In order to address the second set of warnings, change the section title > syntax from '=== title ===' to '== title ==', effectively reducing the > nesting level of the title by one. Also, some apparent (sub-)titles are > not marked up with sub-title syntax, so add some '=== ' prefix(s) to the > relevant headings. Kinda surprising; if it's complaining about lack of title nesting, I'd think you'd need a '= title =' somewhere before using '== title =='. 
Maybe it's fine with skipping one nesting level, but skipping two is where the problem starts? No idea. > In addition to the warnings, address some other formatting issues: > > - the use of heavily nested unordered lists is not reflected in the > output (making the file totally unreadable) because each level of > nesting requires a different syntax. (i.e. replace '*' with '**' > for the second level, '*' with '***' for the third level, etc.) > - make use of literal blocks and manual indentation to get asciidoc > and asciidoctor to display even remotely similar output. > - make use of labelled lists, in some places, to get a similar looking > output to the input, for both asciidoc and asciidoctor. > - replace the trailing space in: `git grep ${SEARCH_TERM} OLDREV ` > otherwise the entire line in which that appears is removed from > the output. Again, sorry for putting you through all this; I had assumed Documentation/technical/ was stuff meant for other Git developers to see and didn't need to be typeset with asciidoc or asciidoctor and had never attempted to run the documents I added there under either. Someone else renamed them to .adoc... I skimmed through the document, and it all looked like typesetting changes which don't impair the readability of the source text, so seems fine to me. (Same with the previous patch) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 3/4] doc: sparse-checkout.adoc: fix asciidoc warnings 2025-10-08 3:57 ` Elijah Newren @ 2025-10-08 21:54 ` Ramsay Jones 0 siblings, 0 replies; 25+ messages in thread From: Ramsay Jones @ 2025-10-08 21:54 UTC (permalink / raw) To: Elijah Newren Cc: GIT Mailing-list, Patrick Steinhardt, Derrick Stolee, Junio C Hamano On 08/10/2025 4:57 am, Elijah Newren wrote: > On Thu, Oct 2, 2025 at 3:13 PM Ramsay Jones <ramsay@ramsayjones.plus.com> wrote: >> >> Both asciidoc and asciidoctor issue warnings about 'list item index: >> expected n got n-1' for n=1->7 on lines 928, 931, 951, 974, 980, 1033 >> and 1049. In asciidoc, numbered lists must start at one, whereas this >> file has a list starting at zero. Also, asciidoc and asciidoctor warn >> about 'section title out of sequence: expected level 1, got level 2' >> on line 17. (asciidoc only complains about the first instance of this, >> while asciidoctor complains about them all, on lines 95, 258, 303, 316, >> 545, 612, 752, 824, 895, 923 and 1053). These warnings stem from the >> section titles not being correctly nested within a document/chapter >> title. >> >> In order to address the first set of warnings, simply renumber the list >> from one to severn, rather than zero to six. Fortunately, this does not >> require altering additional text, since the enumeration of 'Known Bugs' >> is not referred to anywhere else in the document. >> >> In order to address the second set of warnings, change the section title >> syntax from '=== title ===' to '== title ==', effectively reducing the >> nesting level of the title by one. Also, some apparent (sub-)titles are >> not marked up with sub-title syntax, so add some '=== ' prefix(s) to the >> relevant headings. > > Kinda surprising; if it's complaining about lack of title nesting, I'd > think you'd need a '= title =' somewhere before using '== title =='. > Maybe jumping skipping one nesting level it's fine with, but skipping > two is where the problem starts? No idea. 
I have no idea either! see previous email. > >> In addition to the warnings, address some other formatting issues: >> >> - the use of heavily nested unordered lists is not reflected in the >> output (making the file totally unreadable) because each level of >> nesting requires a different syntax. (i.e. replace '*' with '**' >> for the second level, '*' with '***' for the third level, etc.) >> - make use of literal blocks and manual indentation to get asciidoc >> and asciidoctor to display even remotely similar output. >> - make use of labelled lists, in some places, to get a similar looking >> output to the input, for both asciidoc and asciidoctor. >> - replace the trailing space in: `git grep ${SEARCH_TERM} OLDREV ` >> otherwise the entire line in which that appears is removed from >> the output. > > Again, sorry for putting you through all this; I had assumed > Documentation/technical/ was stuff meant for other Git developers to > see and didn't need to be typeset with asciidoc or asciidoctor and had > never attempted to run the documents I added there under either. > Someone else renamed them to .adoc... No problem. I already floated the idea of renaming these files to .txt and removing them from the meson build (in my cover letter), but I had to assume that it was now the policy for these docs to be formatted. I was very conscious of me butchering your documents (and Derrick's) to make an attempt to fix-up the formatting. It was quite frustrating to find that asciidoc and asciidoctor don't agree on how that should be done ... (frequently). :( [I was hopeful that an asciidoc guru would help me fix the two remaining problems (that I know about) - fingers crossed!] > I skimmed through the document, and it all looked like typesetting > changes which don't impair the readability of the source text, so > seems fine to me. (Same with the previous patch) I hoped that would be the case, but I must say that I think you are being very generous! ;) Thanks. 
ATB, Ramsay Jones ^ permalink raw reply [flat|nested] 25+ messages in thread
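The renumbering discussed in this sub-thread (a list that starts at zero shifted to start at one) could be sketched roughly as below. This is only an illustration in Python, not what the patch actually used; the helper name `renumber_from_one` and the sample lines are made up for the example.

```python
import re

# Matches an explicitly numbered list item such as "0. Assumptions".
# Lines without a leading "<digits>." are left alone.
ITEM = re.compile(r"^(\s*)(\d+)\.(\s+)")


def renumber_from_one(lines):
    """Add one to every explicit list-item number (hypothetical helper)."""
    out = []
    for line in lines:
        m = ITEM.match(line)
        if m:
            indent, number, gap = m.groups()
            line = f"{indent}{int(number) + 1}.{gap}{line[m.end():]}"
        out.append(line)
    return out


sample = ["0. first bug", "   continuation text", "1. second bug"]
for line in renumber_from_one(sample):
    print(line)
```

Note that this only touches the list items themselves. Any prose that refers to an item by number (a "#3" in a later section title, say) would still need fixing by hand, which is exactly the part that makes a blind search/replace risky.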
* [PATCH v2 4/4] doc: commit-graph.adoc: fix up some formatting 2025-10-02 22:12 ` [PATCH v2 0/4] technical docs in make build Ramsay Jones ` (2 preceding siblings ...) 2025-10-02 22:12 ` [PATCH v2 3/4] doc: sparse-checkout.adoc: " Ramsay Jones @ 2025-10-02 22:12 ` Ramsay Jones 2025-10-02 22:38 ` [PATCH v2 0/4] technical docs in make build Ramsay Jones 4 siblings, 0 replies; 25+ messages in thread From: Ramsay Jones @ 2025-10-02 22:12 UTC (permalink / raw) To: GIT Mailing-list Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano, Ramsay Jones The formatting markup syntax used in this document (markdown?) is not interpreted correctly by asciidoc or asciidoctor. The main problem is the use of a '## ' prefix markup for some sub-headings, along with the use of '```' code markup and some missing literal blocks. In order to improve the (html) document formatting: - replace the '## ' prefix sub-title syntax with the '~~' underlining syntax for the relevant sub-headings. - replace the '```' code markup, which causes asciidoc(tor) to simply remove the marked up text, with a literal block '----' markup. - the second ascii diagram, in the 'Merging commit-graph files' section, is not rendered correctly by asciidoctor (asciidoc is fine) so enclose it in a '....' block. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> --- Documentation/technical/commit-graph.adoc | 29 +++++++++++++++-------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/Documentation/technical/commit-graph.adoc b/Documentation/technical/commit-graph.adoc index 2c26e95e51..a259d1567b 100644 --- a/Documentation/technical/commit-graph.adoc +++ b/Documentation/technical/commit-graph.adoc @@ -39,6 +39,7 @@ A consumer may load the following info for a commit from the graph: Values 1-4 satisfy the requirements of parse_commit_gently(). There are two definitions of generation number: + 1. Corrected committer dates (generation number v2) 2. 
Topological levels (generation number v1) @@ -158,7 +159,8 @@ number of commits in the full history. By creating a "chain" of commit-graphs, we enable fast writes of new commit data without rewriting the entire commit history -- at least, most of the time. -## File Layout +File Layout +~~~~~~~~~~~ A commit-graph chain uses multiple files, and we use a fixed naming convention to organize these files. Each commit-graph file has a name @@ -170,11 +172,11 @@ hashes for the files in order from "lowest" to "highest". For example, if the `commit-graph-chain` file contains the lines -``` +---- {hash0} {hash1} {hash2} -``` +---- then the commit-graph chain looks like the following diagram: @@ -213,7 +215,8 @@ specifying the hashes of all files in the lower layers. In the above example, `graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains `{hash0}` and `{hash1}`. -## Merging commit-graph files +Merging commit-graph files +~~~~~~~~~~~~~~~~~~~~~~~~~~ If we only added a new commit-graph file on every write, we would run into a linear search problem through many commit-graph files. Instead, we use a merge @@ -225,6 +228,7 @@ is determined by the merge strategy that the files should collapse to the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}` file. +.... +---------------------+ | | | (new commits) | @@ -250,6 +254,7 @@ file. | | | | +-----------------------+ +.... During this process, the commits to write are combined, sorted and we write the contents to a temporary file, all while holding a `commit-graph-chain.lock` @@ -257,14 +262,15 @@ lock-file. When the file is flushed, we rename it to `graph-{hash3}` according to the computed `{hash3}`. Finally, we write the new chain data to `commit-graph-chain.lock`: -``` +---- {hash3} {hash0} -``` +---- We then close the lock-file. 
-## Merge Strategy +Merge Strategy +~~~~~~~~~~~~~~ When writing a set of commits that do not exist in the commit-graph stack of height N, we default to creating a new file at level N + 1. We then decide to @@ -289,7 +295,8 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum number of commits) could be extracted into config settings for full flexibility. -## Handling Mixed Generation Number Chains +Handling Mixed Generation Number Chains +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ With the introduction of generation number v2 and generation data chunk, the following scenario is possible: @@ -318,7 +325,8 @@ have corrected commit dates when written by compatible versions of Git. Thus, rewriting split commit-graph as a single file (`--split=replace`) creates a single layer with corrected commit dates. -## Deleting graph-{hash} files +Deleting graph-\{hash\} files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ After a new tip file is written, some `graph-{hash}` files may no longer be part of a chain. It is important to remove these files from disk, eventually. @@ -333,7 +341,8 @@ files whose modified times are older than a given expiry window. This window defaults to zero, but can be changed using command-line arguments or a config setting. -## Chains across multiple object directories +Chains across multiple object directories +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In a repo with alternates, we look for the `commit-graph-chain` file starting in the local object directory and then in each alternate. The first file that -- 2.51.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
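For what it's worth, the two mechanical substitutions in this patch (a '## ' prefix turned into a '~' underlined title, and a '```' fence turned into a '----' delimiter) could be sketched as follows. This is a rough Python illustration under the assumption that the markup is as regular as it is in commit-graph.adoc; the function name `convert_markup` is invented, and it does not attempt the brace escaping that the 'Deleting graph-\{hash\} files' title needed.

```python
def convert_markup(lines, marker="## ", underline="~"):
    """Rewrite '## title' lines and ``` fences (hypothetical helper)."""
    out = []
    for line in lines:
        if line.startswith(marker):
            # Two-line title: the text, then an underline of equal length.
            title = line[len(marker):].rstrip()
            out.append(title)
            out.append(underline * len(title))
        elif line.strip() == "```":
            # Replace the markdown fence with a '----' block delimiter.
            out.append("----")
        else:
            out.append(line)
    return out


sample = ["## File Layout", "", "```", "{hash0}", "```"]
for line in convert_markup(sample):
    print(line)
```

The '....' block around the second diagram is not covered here, since picking out which text is a diagram is a judgment call rather than a pattern.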
* Re: [PATCH v2 0/4] technical docs in make build 2025-10-02 22:12 ` [PATCH v2 0/4] technical docs in make build Ramsay Jones ` (3 preceding siblings ...) 2025-10-02 22:12 ` [PATCH v2 4/4] doc: commit-graph.adoc: fix up some formatting Ramsay Jones @ 2025-10-02 22:38 ` Ramsay Jones 2025-10-16 20:02 ` [PATCH v3 " Ramsay Jones 4 siblings, 1 reply; 25+ messages in thread From: Ramsay Jones @ 2025-10-02 22:38 UTC (permalink / raw) To: GIT Mailing-list Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano On 02/10/2025 11:12 pm, Ramsay Jones wrote: > OK, so I have recently developed an intense dislike of both asciidoc > and asciidoctor. :) > Heh, sorry about this, but I messed up the threading (again). This time, for some unknown reason I pasted the 'lore.kernel.org' URL for the v1 cover letter, rather than the message-ID: <bcb3b3a3-bb13-4808-9363-442b5f9be05f@ramsayjones.plus.com> I shouldn't be allowed to operate 'git send-email' after dark! :) ATB, Ramsay Jones ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v3 0/4] technical docs in make build 2025-10-02 22:38 ` [PATCH v2 0/4] technical docs in make build Ramsay Jones @ 2025-10-16 20:02 ` Ramsay Jones 2025-10-16 20:02 ` [PATCH v3 1/4] doc: remembering-renames.adoc: fix asciidoc warnings Ramsay Jones ` (4 more replies) 0 siblings, 5 replies; 25+ messages in thread From: Ramsay Jones @ 2025-10-16 20:02 UTC (permalink / raw) To: GIT Mailing-list Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano, Ramsay Jones Changes in v3: - old patch #1 discarded since it was separated into its own branch ('rj/doc-missing-technical-docs' in next) - tyop in patch #2 (old patch #3) - new patch #4 A range diff against v2 is given below. Note that the two remaining problems (see v2 below) have not been addressed but, even without a solution, these patches represent a good improvement. ;) (I am still hopeful that an asciidoc guru will turn up!) NOTE: this series is based on the v2-version of the patch #1, which in turn is based on commit 6ad8021821 ("The fifth batch", 2025-08-29). v2 cover letter: OK, so I have recently developed an intense dislike of both asciidoc and asciidoctor. :) Changes in v2: - Actual commit messages - (almost) total re-write of patches #2 and #3 - removed the RFC from patches #2->#4 I have not included a range-diff, because it doesn't show anything interesting/readable with or without a large --creation-factor! There are two issues I am aware of: - mis-formatting of monospaced text containing an '{' character mentioned in the original cover letter below. I have not found a fix for this, but there are other examples in patch #3! - breakage of two html links representing URLS pointing to emails at 'lore.kernel.org'. I don't think it is a coincidence that it is only these two references that contain a reserved character; a '+' in the first (see known bugs 7) and two (separate) '=' characters in the second (mail ref [13]). I tried %encoding them, but that didn't make any difference. 
There are probably other formatting issues that I am not aware of! Original cover letter: I have been trying to get back to the 'misc build updates (part #3)' patches, so that I can send them to the list, but I have not been able to find a spare minute for quite some time. :( However, this sub-sequence of patches hangs together as a single theme and I need help to finish them up! (asciidoc is not my forte). The first patch adds some technical documents to the Makefile build which are already part of the meson build. In particular, the following are built by meson, but not by the Makefile: commit-graph.adoc directory-rename-detection.adoc packfile-uri.adoc remembering-renames.adoc repository-version.adoc rerere.adoc sparse-checkout.adoc sparse-index.adoc Although I am not convinced that some of these files were ever meant to be formatted by asciidoc, I have assumed that is the case for the purposes of this patch series. (otherwise, we should remove them from the meson build and rename the files instead). When I attempt to build the html docs, with patch #1 applied, on Linux: $ make html >out-doc 2>&1 $ grep SyntaxWarning out-doc | head -n1 <unknown>:1: SyntaxWarning: invalid escape sequence '\S' $ grep SyntaxWarning out-doc | wc -l 524 $ $ asciidoc --version asciidoc 10.2.0 $ python3 --version Python 3.12.3 $ This is caused by the python version I am using, which was recently changed (in version 3.12) to issue the SyntaxWarning when a 'non-raw' string contains some escape sequences (here \S). [some versions prior to 3.12 used to issue a deprecation warning]. This is a known issue, see e.g. [0], which has been addressed by a patch [1], and as seen in [2] has been included in a new version 10.2.1 of asciidoc. 
[0] https://trac.macports.org/ticket/70039 [1] https://github.com/asciidoc-py/asciidoc-py/pull/267 [2] https://github.com/asciidoc-py/asciidoc-py/commits/main/ [cygwin does not have this problem, because the python version is 3.9.16] So, ignoring that issue, we still see some warnings from asciidoc: $ grep WARNING out-doc asciidoc: WARNING: remembering-renames.adoc: line 13: list item index: expected 1 got 0 asciidoc: WARNING: remembering-renames.adoc: line 15: list item index: expected 2 got 1 asciidoc: WARNING: remembering-renames.adoc: line 17: list item index: expected 3 got 2 asciidoc: WARNING: remembering-renames.adoc: line 20: list item index: expected 4 got 3 asciidoc: WARNING: remembering-renames.adoc: line 23: list item index: expected 5 got 4 asciidoc: WARNING: remembering-renames.adoc: line 25: list item index: expected 6 got 5 asciidoc: WARNING: remembering-renames.adoc: line 29: list item index: expected 7 got 6 asciidoc: WARNING: remembering-renames.adoc: line 31: list item index: expected 8 got 7 asciidoc: WARNING: remembering-renames.adoc: line 33: list item index: expected 9 got 8 asciidoc: WARNING: remembering-renames.adoc: line 38: section title out of sequence: expected level 1, got level 2 asciidoc: WARNING: sparse-checkout.adoc: line 17: section title out of sequence: expected level 1, got level 2 asciidoc: WARNING: sparse-checkout.adoc: line 928: list item index: expected 1 got 0 asciidoc: WARNING: sparse-checkout.adoc: line 931: list item index: expected 2 got 1 asciidoc: WARNING: sparse-checkout.adoc: line 951: list item index: expected 3 got 2 asciidoc: WARNING: sparse-checkout.adoc: line 974: list item index: expected 4 got 3 asciidoc: WARNING: sparse-checkout.adoc: line 980: list item index: expected 5 got 4 asciidoc: WARNING: sparse-checkout.adoc: line 1033: list item index: expected 6 got 5 asciidoc: WARNING: sparse-checkout.adoc: line 1049: list item index: expected 7 got 6 $ I also tried asciidoctor, just for fun: $ asciidoctor
--version Asciidoctor 2.0.20 [https://asciidoctor.org] Runtime Environment (ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux-gnu]) (lc:UTF-8 fs:UTF-8 in:UTF-8 ex:UTF-8) $ $ make USE_ASCIIDOCTOR=1 html >out-doctor 2>&1 $ grep WARNING out-doctor asciidoctor: WARNING: remembering-renames.adoc: line 13: list item index: expected 1, got 0 asciidoctor: WARNING: remembering-renames.adoc: line 15: list item index: expected 2, got 1 asciidoctor: WARNING: remembering-renames.adoc: line 17: list item index: expected 3, got 2 asciidoctor: WARNING: remembering-renames.adoc: line 20: list item index: expected 4, got 3 asciidoctor: WARNING: remembering-renames.adoc: line 23: list item index: expected 5, got 4 asciidoctor: WARNING: remembering-renames.adoc: line 25: list item index: expected 6, got 5 asciidoctor: WARNING: remembering-renames.adoc: line 29: list item index: expected 7, got 6 asciidoctor: WARNING: remembering-renames.adoc: line 31: list item index: expected 8, got 7 asciidoctor: WARNING: remembering-renames.adoc: line 33: list item index: expected 9, got 8 asciidoctor: WARNING: remembering-renames.adoc: line 38: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 94: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 141: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 142: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 184: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 185: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 257: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 288: section title out of 
sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 289: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 290: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 397: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 424: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 485: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 486: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: remembering-renames.adoc: line 487: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 17: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 95: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 258: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 303: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 316: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 545: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 612: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 752: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 824: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 895: section title out of sequence: expected 
level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 923: section title out of sequence: expected level 1, got level 2 asciidoctor: WARNING: sparse-checkout.adoc: line 928: list item index: expected 1, got 0 asciidoctor: WARNING: sparse-checkout.adoc: line 931: list item index: expected 2, got 1 asciidoctor: WARNING: sparse-checkout.adoc: line 951: list item index: expected 3, got 2 asciidoctor: WARNING: sparse-checkout.adoc: line 974: list item index: expected 4, got 3 asciidoctor: WARNING: sparse-checkout.adoc: line 980: list item index: expected 5, got 4 asciidoctor: WARNING: sparse-checkout.adoc: line 1033: list item index: expected 6, got 5 asciidoctor: WARNING: sparse-checkout.adoc: line 1049: list item index: expected 7, got 6 asciidoctor: WARNING: sparse-checkout.adoc: line 1053: section title out of sequence: expected level 1, got level 2 $ You can see that asciidoc only complains about the first 'section title out of sequence', whereas asciidoctor complains about them all. [asciidoctor also reports: Note: namesp. cut : stripped namespace before processing Git User Manual] Patch #2 was a nightmare which I really gave up on! :) An early attempt involved renumbering the 'outline list' at the top from 0->8 to 1->9 (I thought there was a way to start numbering at zero, but I lost a lot of time trying to do so, without any success). So, of course I 'just' tried global search/replace in vim to do the renumbering (backwards). This was a complete disaster (of course), which I 'fixed' many many times. (Not everything which is numbered is a section, there are 'cases' as well). In the end, I just disabled the 'outline' list, by removing the period on the numbers (again '0\. Assumptions' should have worked, but didn't) and fixing up the section titles without renumbering them. Note that asciidoctor mis-formats the 'ascii branch diagrams', which asciidoc formats correctly. I think there are other formatting problems left. 
In patch #3, the formatting changes are confined to the section titles and renumbering the 'known bugs' from 0->6 to 1->7. (I think I noticed some sub-sub lists which are not formatted correctly, but I don't seem to be able to see them now ...). In patch #4, most of the formatting changes relate to section titles, but I could not fix some inline text formatting starting at 'File Layouts' (within 'Commit-Graph Chains') with text that is monospaced with `` but also contains an '{' character. For example: `$OBJDIR/info/commit-graphs/graph-{hash}.graph` is monospaced (blue colour with asciidoc) up until the {hash}.graph which does not have any formatting. (It is not so noticeable with asciidoctor because the formatting consists of a *very* subtle gray background to the text which, to my eyes anyway, is almost not visible). I have tried several suggestions from an on-line asciidoc syntax cheatsheet such as: `$OBJDIR/info/commit-graphs/graph-\{hash\}.graph` `+$OBJDIR/info/commit-graphs/graph-{hash}.graph+` but nothing worked. Note that there are many similar instances of this problem (including just `{hash}`). Note also that asciidoctor did not render the second diagram correctly (the one in 'Merging commit-graph files'), but asciidoc was just fine. The remaining documents: directory-rename-detection.adoc packfile-uri.adoc repository-version.adoc rerere.adoc sparse-index.adoc all appear to be formatted correctly. So, I really need help with the asciidoc formatting, in patches #2->#4, which I am marking as RFC. Having said that, these patches represent an improvement over the existing documents in terms of formatting (just not by much!). Any help fixing up these patches would be much appreciated. :) Thanks. 
ATB, Ramsay Jones Ramsay Jones (4): doc: remembering-renames.adoc: fix asciidoc warnings doc: sparse-checkout.adoc: fix asciidoc warnings doc: commit-graph.adoc: fix up some formatting doc: add large-object-promisors.adoc to the docs build Documentation/Makefile | 1 + Documentation/technical/commit-graph.adoc | 29 +- .../technical/large-object-promisors.adoc | 64 +- Documentation/technical/meson.build | 1 + .../technical/remembering-renames.adoc | 120 +-- Documentation/technical/sparse-checkout.adoc | 704 ++++++++++-------- 6 files changed, 507 insertions(+), 412 deletions(-) range-diff against v2: 1: f1e3b36cad < -: ---------- doc: add some missing technical documents 2: fd923c16fa = 1: d61b4d2958 doc: remembering-renames.adoc: fix asciidoc warnings 3: 1e5882a2d5 ! 2: 0cd1524c27 doc: sparse-checkout.adoc: fix asciidoc warnings @@ Commit message title. In order to address the first set of warnings, simply renumber the list - from one to severn, rather than zero to six. Fortunately, this does not + from one to seven, rather than zero to six. Fortunately, this does not require altering additional text, since the enumeration of 'Known Bugs' is not referred to anywhere else in the document. 4: c8e31e35b7 = 3: f29e225263 doc: commit-graph.adoc: fix up some formatting -: ---------- > 4: 3c1effbbb6 doc: add large-object-promisors.adoc to the docs build -- 2.51.0 ^ permalink raw reply [flat|nested] 25+ messages in thread
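With this many warnings in the build logs quoted above, a per-file tally can be easier to read than raw `grep WARNING` output. A minimal sketch in Python, assuming the warning lines have the 'tool: WARNING: file: line N: message' shape shown in the cover letter; the `summarize` name is invented for this example.

```python
import re
from collections import Counter

# Shape assumed from the log excerpts in the cover letter, e.g.
#   asciidoc: WARNING: sparse-checkout.adoc: line 928: list item index: ...
WARNING = re.compile(
    r"^(asciidoc|asciidoctor): WARNING: (\S+): line (\d+): (.*)$"
)


def summarize(log_lines):
    """Count warnings per (tool, file, kind); hypothetical helper."""
    counts = Counter()
    for line in log_lines:
        m = WARNING.match(line.strip())
        if not m:
            continue  # not a warning line (build noise, prompts, ...)
        tool, filename, _lineno, message = m.groups()
        # The text before the first ':' names the kind of warning.
        kind = message.split(":", 1)[0]
        counts[(tool, filename, kind)] += 1
    return counts


log = [
    "asciidoc: WARNING: sparse-checkout.adoc: line 928: list item index: expected 1 got 0",
    "asciidoc: WARNING: sparse-checkout.adoc: line 931: list item index: expected 2 got 1",
    "asciidoctor: WARNING: sparse-checkout.adoc: line 17: section title out of sequence: expected level 1, got level 2",
]
for key, count in sorted(summarize(log).items()):
    print(count, *key)
```

Feeding it the contents of out-doc and out-doctor (read with `open(...).read().splitlines()`) should make the asciidoc versus asciidoctor difference in 'section title out of sequence' counts easy to spot.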
* [PATCH v3 1/4] doc: remembering-renames.adoc: fix asciidoc warnings 2025-10-16 20:02 ` [PATCH v3 " Ramsay Jones @ 2025-10-16 20:02 ` Ramsay Jones 2025-10-16 20:02 ` [PATCH v3 2/4] doc: sparse-checkout.adoc: " Ramsay Jones ` (3 subsequent siblings) 4 siblings, 0 replies; 25+ messages in thread From: Ramsay Jones @ 2025-10-16 20:02 UTC (permalink / raw) To: GIT Mailing-list Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano, Ramsay Jones Both asciidoc and asciidoctor issue warnings about 'list item index: expected n got n-1' for n=1->9 on lines 13, 15, 17, 20, 23, 25, 29, 31 and 33. In asciidoc, numbered lists must start at one, whereas this file has a list starting at zero. Also, asciidoc and asciidoctor warn about 'section title out of sequence: expected level 1, got level 2' on line 38. (asciidoc only complains about the first instance of this, while asciidoctor complains about them all, on lines 94, 141, 142, 184, 185, 257, 288, 289, 290, 397, 424, 485, 486 and 487). These warnings stem from the section titles not being correctly nested within a document/chapter title. In order to address the first set of warnings, simply renumber the list from one to nine, rather than zero to eight. This also requires altering the text which refers to the section numbers, including other section titles. In order to address the second set of warnings, change the section title syntax from '=== title ===' to '== title ==', effectively reducing the nesting level of the title by one. Also, some of the titles are given over multiple lines (they are very long), with a title '===' prefix on each line. This leads to them being treated as separate sections with no body text (as you can see from the line numbers given for the asciidoctor warnings, above). So, for these titles, turn them into a single (long) line of text.
In addition to the warnings, address some other formatting issues: - the ascii branch diagrams didn't format correctly on asciidoctor so include them in a literal block. - several blocks of text were intended to be formatted 'as is' but were not included in a literal block. - in section 8, format the (A)->(D) in the text description as a literal with `` marks, since (C) is rendered as a copyright symbol in html otherwise. - in section 9, a sub-list of two items is not formatted as such. change the '*' introducer to '**' to correct the sub-list format. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> --- .../technical/remembering-renames.adoc | 120 ++++++++++++------ 1 file changed, 78 insertions(+), 42 deletions(-) diff --git a/Documentation/technical/remembering-renames.adoc b/Documentation/technical/remembering-renames.adoc index 73f41761e2..6155f36c72 100644 --- a/Documentation/technical/remembering-renames.adoc +++ b/Documentation/technical/remembering-renames.adoc @@ -10,32 +10,32 @@ history as an optimization, assuming all merges are automatic and clean Outline: - 0. Assumptions + 1. Assumptions - 1. How rebasing and cherry-picking work + 2. How rebasing and cherry-picking work - 2. Why the renames on MERGE_SIDE1 in any given pick are *always* a + 3. Why the renames on MERGE_SIDE1 in any given pick are *always* a superset of the renames on MERGE_SIDE1 for the next pick. - 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also + 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also a rename on MERGE_SIDE1 for the next pick - 4. A detailed description of the counter-examples to #3. + 5. A detailed description of the counter-examples to #4. - 5. Why the special cases in #4 are still fully reasonable to use to pair + 6. Why the special cases in #5 are still fully reasonable to use to pair up files for three-way content merging in the merge machinery, and why they do not affect the correctness of the merge. - 6. 
Interaction with skipping of "irrelevant" renames + 7. Interaction with skipping of "irrelevant" renames - 7. Additional items that need to be cached + 8. Additional items that need to be cached - 8. How directory rename detection interacts with the above and why this + 9. How directory rename detection interacts with the above and why this optimization is still safe even if merge.directoryRenames is set to "true". -=== 0. Assumptions === +== 1. Assumptions == There are two assumptions that will hold throughout this document: @@ -44,8 +44,8 @@ There are two assumptions that will hold throughout this document: * All merges are fully automatic -and a third that will hold in sections 2-5 for simplicity, that I'll later -address in section 8: +and a third that will hold in sections 3-6 for simplicity, that I'll later +address in section 9: * No directory renames occur @@ -77,9 +77,9 @@ conflicts that the user needs to resolve), the cache of renames is not stored on disk, and thus is thrown away as soon as the rebase or cherry pick stops for the user to resolve the operation. -The third assumption makes sections 2-5 simpler, and allows people to +The third assumption makes sections 3-6 simpler, and allows people to understand the basics of why this optimization is safe and effective, and -then I can go back and address the specifics in section 8. It is probably +then I can go back and address the specifics in section 9. It is probably also worth noting that if directory renames do occur, then the default of merge.directoryRenames being set to "conflict" means that the operation will stop for users to resolve the conflicts and the cache will be thrown @@ -88,22 +88,26 @@ reason we need to address directory renames specifically, is that some users will have set merge.directoryRenames to "true" to allow the merges to continue to proceed automatically. 
The optimization is still safe with this config setting, but we have to discuss a few more cases to show why; -this discussion is deferred until section 8. +this discussion is deferred until section 9. -=== 1. How rebasing and cherry-picking work === +== 2. How rebasing and cherry-picking work == Consider the following setup (from the git-rebase manpage): +------------ A---B---C topic / D---E---F---G main +------------ After rebasing or cherry-picking topic onto main, this will appear as: +------------ A'--B'--C' topic / D---E---F---G main +------------ The way the commits A', B', and C' are created is through a series of merges, where rebase or cherry-pick sequentially uses each of the three @@ -111,6 +115,7 @@ A-B-C commits in a special merge operation. Let's label the three commits in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For this picture, the three commits for each of the three merges would be: +.... To create A': MERGE_BASE: E MERGE_SIDE1: G @@ -125,6 +130,7 @@ To create C': MERGE_BASE: B MERGE_SIDE1: B' MERGE_SIDE2: C +.... Sometimes, folks are surprised that these three-way merges are done. It can be useful in understanding these three-way merges to view them in a @@ -138,8 +144,7 @@ Conceptually the two statements above are the same as a three-way merge of B, B', and C, at least the parts before you decide to record a commit. -=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a === -=== superset of the renames on MERGE_SIDE1 for the next pick. === +== 3. Why the renames on MERGE_SIDE1 in any given pick are always a superset of the renames on MERGE_SIDE1 for the next pick. == The merge machinery uses the filenames it is fed from MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different @@ -156,6 +161,7 @@ filename under one of three conditions: First, let's remember what commits are involved in the first and second picks of the cherry-pick or rebase sequence: +.... 
To create A': MERGE_BASE: E MERGE_SIDE1: G @@ -165,6 +171,7 @@ To create B': MERGE_BASE: A MERGE_SIDE1: A' MERGE_SIDE2: B +.... So, in particular, we need to show that the renames between E and G are a superset of those between A and A'. @@ -181,11 +188,11 @@ are a subset of those between E and G. Equivalently, all renames between E and G are a superset of those between A and A'. -=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ === -=== always also a rename on MERGE_SIDE1 for the next pick. === +== 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also a rename on MERGE_SIDE1 for the next pick. == Let's again look at the first two picks: +.... To create A': MERGE_BASE: E MERGE_SIDE1: G @@ -195,17 +202,25 @@ To create B': MERGE_BASE: A MERGE_SIDE1: A' MERGE_SIDE2: B +.... Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e. any given rename from E to G. Let's use the filenames 'oldfile' and 'newfile' for demonstration purposes. That first pick will function as follows; when the rename is detected, the merge machinery will do a three-way content merge of the following: + +.... E:oldfile G:newfile A:oldfile +.... + and produce a new result: + +.... A':newfile +.... Note above that I've assumed that E->A did not rename oldfile. If that side did rename, then we most likely have a rename/rename(1to2) conflict @@ -254,19 +269,21 @@ were detected as renames, A:oldfile and A':newfile should also be detectable as renames almost always. -=== 4. A detailed description of the counter-examples to #3. === +== 5. A detailed description of the counter-examples to #4. == -We already noted in section 3 that rename/rename(1to1) (i.e. both sides +We already noted in section 4 that rename/rename(1to1) (i.e. both sides renaming a file the same way) was one counter-example. 
The more interesting bit, though, is why did we need to use the "almost" qualifier when stating that A:oldfile and A':newfile are "almost" always detectable as renames? -Let's repeat an earlier point that section 3 made: +Let's repeat an earlier point that section 4 made: +.... A':newfile was created by applying the changes between E:oldfile and G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were <50% of the size of E:oldfile. +.... If those changes that were <50% of the size of E:oldfile are also <50% of the size of A:oldfile, then A:oldfile and A':newfile will be detectable as @@ -276,18 +293,21 @@ still somehow merge cleanly), then traditional rename detection would not detect A:oldfile and A':newfile as renames. Here's an example where that can happen: + * E:oldfile had 20 lines * G:newfile added 10 new lines at the beginning of the file * A:oldfile kept the first 3 lines of the file, and deleted all the rest + then + +.... => A':newfile would have 13 lines, 3 of which matches those in A:oldfile. -E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and -A':newfile would not be. + E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and + A':newfile would not be. +.... -=== 5. Why the special cases in #4 are still fully reasonable to use to === -=== pair up files for three-way content merging in the merge machinery, === -=== and why they do not affect the correctness of the merge. === +== 6. Why the special cases in #5 are still fully reasonable to use to pair up files for three-way content merging in the merge machinery, and why they do not affect the correctness of the merge. == In the rename/rename(1to1) case, A:newfile and A':newfile are not renames since they use the *same* filename. However, files with the same filename @@ -295,14 +315,14 @@ are obviously fine to pair up for three-way content merging (the merge machinery has never employed break detection). 
The interesting counter-example case is thus not the rename/rename(1to1) case, but the case where A did not rename oldfile. That was the case that we spent most of -the time discussing in sections 3 and 4. The remainder of this section +the time discussing in sections 4 and 5. The remainder of this section will be devoted to that case as well. So, even if A:oldfile and A':newfile aren't detectable as renames, why is it still reasonable to pair them up for three-way content merging in the merge machinery? There are multiple reasons: - * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile + * As noted in sections 4 and 5, the diff between A:oldfile and A':newfile is *exactly* the same as the diff between E:oldfile and G:newfile. The latter pair were detected as renames, so it seems unlikely to surprise users for us to treat A:oldfile and A':newfile as renames. @@ -394,7 +414,7 @@ cases 1 and 3 seem to provide as good or better behavior with the optimization than without. -=== 6. Interaction with skipping of "irrelevant" renames === +== 7. Interaction with skipping of "irrelevant" renames == Previous optimizations involved skipping rename detection for paths considered to be "irrelevant". See for example the following commits: @@ -421,24 +441,27 @@ detection -- though we can limit it to the paths for which we have not already detected renames. -=== 7. Additional items that need to be cached === +== 8. Additional items that need to be cached == It turns out we have to cache more than just renames; we also cache: +.... A) non-renames (i.e. unpaired deletes) B) counts of renames within directories C) sources that were marked as RELEVANT_LOCATION, but which were downgraded to RELEVANT_NO_MORE D) the toplevel trees involved in the merge +.... 
These are all stored in struct rename_info, and respectively appear in + * cached_pairs (along side actual renames, just with a value of NULL) * dir_rename_counts * cached_irrelevant * merge_trees -The reason for (A) comes from the irrelevant renames skipping -optimization discussed in section 6. The fact that irrelevant renames +The reason for `(A)` comes from the irrelevant renames skipping +optimization discussed in section 7. The fact that irrelevant renames are skipped means we only get a subset of the potential renames detected and subsequent commits may need to run rename detection on the upstream side on a subset of the remaining renames (to get the @@ -447,23 +470,24 @@ deletes are involved in rename detection too, we don't want to repeatedly check that those paths remain unpaired on the upstream side with every commit we are transplanting. -The reason for (B) is that diffcore_rename_extended() is what +The reason for `(B)` is that diffcore_rename_extended() is what generates the counts of renames by directory which is needed in directory rename detection, and if we don't run diffcore_rename_extended() again then we need to have the output from it, including dir_rename_counts, from the previous run. -The reason for (C) is that merge-ort's tree traversal will again think +The reason for `(C)` is that merge-ort's tree traversal will again think those paths are relevant (marking them as RELEVANT_LOCATION), but the fact that they were downgraded to RELEVANT_NO_MORE means that dir_rename_counts already has the information we need for directory rename detection. (A path which becomes RELEVANT_CONTENT in a subsequent commit will be removed from cached_irrelevant.) -The reason for (D) is that is how we determine whether the remember +The reason for `(D)` is that is how we determine whether the remember renames optimization can be used. In particular, remembering that our sequence of merges looks like: +.... 
Merge 1: MERGE_BASE: E MERGE_SIDE1: G @@ -475,6 +499,7 @@ sequence of merges looks like: MERGE_SIDE1: A' MERGE_SIDE2: B => Creates B' +.... It is the fact that the trees A and A' appear both in Merge 1 and in Merge 2, with A as a parent of A' that allows this optimization. So @@ -482,12 +507,11 @@ we store the trees to compare with what we are asked to merge next time. -=== 8. How directory rename detection interacts with the above and === -=== why this optimization is still safe even if === -=== merge.directoryRenames is set to "true". === +== 9. How directory rename detection interacts with the above and why this optimization is still safe even if merge.directoryRenames is set to "true". == As noted in the assumptions section: +.... """ ...if directory renames do occur, then the default of merge.directoryRenames being set to "conflict" means that the operation @@ -497,11 +521,13 @@ As noted in the assumptions section: is that some users will have set merge.directoryRenames to "true" to allow the merges to continue to proceed automatically. """ +.... Let's remember that we need to look at how any given pick affects the next one. So let's again use the first two picks from the diagram in section one: +.... First pick does this three-way merge: MERGE_BASE: E MERGE_SIDE1: G @@ -513,6 +539,7 @@ one: MERGE_SIDE1: A' MERGE_SIDE2: B => creates B' +.... Now, directory rename detection exists so that if one side of history renames a directory, and the other side adds a new file to the old @@ -545,7 +572,7 @@ while considering all of these cases: concerned; see the assumptions section). Two interesting sub-notes about these counts: - * If we need to perform rename-detection again on the given side (e.g. + ** If we need to perform rename-detection again on the given side (e.g. some paths are relevant for rename detection that weren't before), then we clear dir_rename_counts and recompute it, making use of cached_pairs. 
The reason it is important to do this is optimizations @@ -556,7 +583,7 @@ while considering all of these cases: easiest way to "fix up" dir_rename_counts in such cases is to just recompute it. - * If we prune rename/rename(1to1) entries from the cache, then we also + ** If we prune rename/rename(1to1) entries from the cache, then we also need to update dir_rename_counts to decrement the counts for the involved directory and any relevant parent directories (to undo what update_dir_rename_counts() in diffcore-rename.c incremented when the @@ -578,6 +605,7 @@ in order: Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir +.... This case looks like this: MERGE_BASE: E, Has olddir/ @@ -595,10 +623,13 @@ Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile Given the cached rename noted above, the second merge can proceed as expected without needing to perform rename detection from A -> A'. +.... Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir +.... This case looks like this: + MERGE_BASE: E oldfile, olddir/ MERGE_SIDE1: G oldfile, olddir/ -> newdir/ MERGE_SIDE2: A oldfile -> olddir/newfile @@ -617,9 +648,11 @@ Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir Given the cached rename noted above, the second merge can proceed as expected without needing to perform rename detection from A -> A'. +.... Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir +.... This case looks like this: MERGE_BASE: E, Has olddir/ @@ -635,9 +668,11 @@ Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir In this case, with the optimization, note that after the first commit there were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed. But the second merge didn't need any renames so this is fine. +.... Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir +.... 
This case looks like this: MERGE_BASE: E, Has olddir/ @@ -658,6 +693,7 @@ Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir Given the cached rename noted above, the second merge can proceed as expected without needing to perform rename detection from A -> A'. +.... Finally, I'll just note here that interactions with the skip-irrelevant-renames optimization means we sometimes don't detect -- 2.51.0
* [PATCH v3 2/4] doc: sparse-checkout.adoc: fix asciidoc warnings 2025-10-16 20:02 ` [PATCH v3 " Ramsay Jones 2025-10-16 20:02 ` [PATCH v3 1/4] doc: remembering-renames.adoc: fix asciidoc warnings Ramsay Jones @ 2025-10-16 20:02 ` Ramsay Jones 2025-10-16 20:03 ` [PATCH v3 3/4] doc: commit-graph.adoc: fix up some formatting Ramsay Jones ` (2 subsequent siblings) 4 siblings, 0 replies; 25+ messages in thread From: Ramsay Jones @ 2025-10-16 20:02 UTC (permalink / raw) To: GIT Mailing-list Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano, Ramsay Jones Both asciidoc and asciidoctor issue warnings about 'list item index: expected n got n-1' for n=1->7 on lines 928, 931, 951, 974, 980, 1033 and 1049. In asciidoc, numbered lists must start at one, whereas this file has a list starting at zero. Also, asciidoc and asciidoctor warn about 'section title out of sequence: expected level 1, got level 2' on line 17. (asciidoc only complains about the first instance of this, while asciidoctor complains about them all, on lines 95, 258, 303, 316, 545, 612, 752, 824, 895, 923 and 1053). These warnings stem from the section titles not being correctly nested within a document/chapter title. In order to address the first set of warnings, simply renumber the list from one to seven, rather than zero to six. Fortunately, this does not require altering additional text, since the enumeration of 'Known Bugs' is not referred to anywhere else in the document. In order to address the second set of warnings, change the section title syntax from '=== title ===' to '== title ==', effectively reducing the nesting level of the title by one. Also, some apparent (sub-)titles are not marked up with sub-title syntax, so add some '=== ' prefix(s) to the relevant headings. 
In addition to the warnings, address some other formatting issues: - the use of heavily nested unordered lists is not reflected in the output (making the file totally unreadable) because each level of nesting requires a different syntax. (i.e. replace '*' with '**' for the second level, '*' with '***' for the third level, etc.) - make use of literal blocks and manual indentation to get asciidoc and asciidoctor to display even remotely similar output. - make use of labelled lists, in some places, to get a similar looking output to the input, for both asciidoc and asciidoctor. - replace the trailing space in: `git grep ${SEARCH_TERM} OLDREV ` otherwise the entire line in which that appears is removed from the output. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> --- Documentation/technical/sparse-checkout.adoc | 704 ++++++++++--------- 1 file changed, 376 insertions(+), 328 deletions(-) diff --git a/Documentation/technical/sparse-checkout.adoc b/Documentation/technical/sparse-checkout.adoc index 0f750ef3e3..3fa8e53655 100644 --- a/Documentation/technical/sparse-checkout.adoc +++ b/Documentation/technical/sparse-checkout.adoc @@ -14,37 +14,41 @@ Table of contents: * Reference Emails -=== Terminology === +== Terminology == -cone mode: one of two modes for specifying the desired subset of files +*`cone mode`*:: + one of two modes for specifying the desired subset of files in a sparse-checkout. In cone-mode, the user specifies directories (getting both everything under that directory as well as everything in leading directories), while in non-cone mode, the user specifies gitignore-style patterns. Controlled by the --[no-]cone option to sparse-checkout init|set. -SKIP_WORKTREE: When tracked files do not match the sparse specification and +*`SKIP_WORKTREE`*:: + When tracked files do not match the sparse specification and are removed from the working tree, the file in the index is marked with a SKIP_WORKTREE bit. 
Note that if a tracked file has the SKIP_WORKTREE bit set but the file is later written by the user to the working tree anyway, the SKIP_WORKTREE bit will be cleared at the beginning of any subsequent Git operation. - - Most sparse checkout users are unaware of this implementation - detail, and the term should generally be avoided in user-facing - descriptions and command flags. Unfortunately, prior to the - `sparse-checkout` subcommand this low-level detail was exposed, - and as of time of writing, is still exposed in various places. - -sparse-checkout: a subcommand in git used to reduce the files present in ++ +Most sparse checkout users are unaware of this implementation +detail, and the term should generally be avoided in user-facing +descriptions and command flags. Unfortunately, prior to the +`sparse-checkout` subcommand this low-level detail was exposed, +and as of time of writing, is still exposed in various places. + +*`sparse-checkout`*:: + a subcommand in git used to reduce the files present in the working tree to a subset of all tracked files. Also, the name of the file in the $GIT_DIR/info directory used to track the sparsity patterns corresponding to the user's desired subset. -sparse cone: see cone mode +*`sparse cone`*:: see cone mode -sparse directory: An entry in the index corresponding to a directory, which +*`sparse directory`*:: + An entry in the index corresponding to a directory, which appears in the index instead of all the files under that directory that would normally appear. See also sparse-index. Something that can cause confusion is that the "sparse directory" does NOT match @@ -52,7 +56,8 @@ sparse directory: An entry in the index corresponding to a directory, which working tree. May be renamed in the future (e.g. to "skipped directory"). 
-sparse index: A special mode for sparse-checkout that also makes the +*`sparse index`*:: + A special mode for sparse-checkout that also makes the index sparse by recording a directory entry in lieu of all the files underneath that directory (thus making that a "skipped directory" which unfortunately has also been called a "sparse @@ -60,7 +65,8 @@ sparse index: A special mode for sparse-checkout that also makes the directories. Controlled by the --[no-]sparse-index option to init|set|reapply. -sparsity patterns: patterns from $GIT_DIR/info/sparse-checkout used to +*`sparsity patterns`*:: + patterns from $GIT_DIR/info/sparse-checkout used to define the set of files of interest. A warning: It is easy to over-use this term (or the shortened "patterns" term), for two reasons: (1) users in cone mode specify directories rather than @@ -70,7 +76,8 @@ sparsity patterns: patterns from $GIT_DIR/info/sparse-checkout used to transiently differ in the working tree or index from the sparsity patterns (see "Sparse specification vs. sparsity patterns"). -sparse specification: The set of paths in the user's area of focus. This +*`sparse specification`*:: + The set of paths in the user's area of focus. This is typically just the tracked files that match the sparsity patterns, but the sparse specification can temporarily differ and include additional files. (See also "Sparse specification @@ -87,12 +94,13 @@ sparse specification: The set of paths in the user's area of focus. This * If working with the index and the working copy, the sparse specification is the union of the paths from above. -vivifying: When a command restores a tracked file to the working tree (and +*`vivifying`*:: + When a command restores a tracked file to the working tree (and hopefully also clears the SKIP_WORKTREE bit in the index for that file), this is referred to as "vivifying" the file. 
-=== Purpose of sparse-checkouts === +== Purpose of sparse-checkouts == sparse-checkouts exist to allow users to work with a subset of their files. @@ -120,14 +128,12 @@ those usecases, sparse-checkouts can modify different subcommands in over a half dozen different ways. Let's start by considering the high level usecases: - A) Users are _only_ interested in the sparse portion of the repo - - A*) Users are _only_ interested in the sparse portion of the repo - that they have downloaded so far - - B) Users want a sparse working tree, but are working in a larger whole - - C) sparse-checkout is a behind-the-scenes implementation detail allowing +[horizontal] +A):: Users are _only_ interested in the sparse portion of the repo +A*):: Users are _only_ interested in the sparse portion of the repo + that they have downloaded so far +B):: Users want a sparse working tree, but are working in a larger whole +C):: sparse-checkout is a behind-the-scenes implementation detail allowing Git to work with a specially crafted in-house virtual file system; users are actually working with a "full" working tree that is lazily populated, and sparse-checkout helps with the lazy population @@ -136,7 +142,7 @@ usecases: It may be worth explaining each of these in a bit more detail: - (Behavior A) Users are _only_ interested in the sparse portion of the repo +=== (Behavior A) Users are _only_ interested in the sparse portion of the repo These folks might know there are other things in the repository, but don't care. They are uninterested in other parts of the repository, and @@ -163,8 +169,7 @@ side-effects of various other commands (such as the printed diffstat after a merge or pull) can lead to worries about local repository size growing unnecessarily[10]. 
- (Behavior A*) Users are _only_ interested in the sparse portion of the repo - that they have downloaded so far (a variant on the first usecase) +=== (Behavior A*) Users are _only_ interested in the sparse portion of the repo that they have downloaded so far (a variant on the first usecase) This variant is driven by folks who using partial clones together with sparse checkouts and do disconnected development (so far sounding like a @@ -173,15 +178,14 @@ reason for yet another variant is that downloading even just the blobs through history within their sparse specification may be too much, so they only download some. They would still like operations to succeed without network connectivity, though, so things like `git log -S${SEARCH_TERM} -p` -or `git grep ${SEARCH_TERM} OLDREV ` would need to be prepared to provide +or `git grep ${SEARCH_TERM} OLDREV` would need to be prepared to provide partial results that depend on what happens to have been downloaded. This variant could be viewed as Behavior A with the sparse specification for history querying operations modified from "sparsity patterns" to "sparsity patterns limited to the blobs we have already downloaded". - (Behavior B) Users want a sparse working tree, but are working in a - larger whole +=== (Behavior B) Users want a sparse working tree, but are working in a larger whole Stolee described this usecase this way[11]: @@ -229,8 +233,7 @@ those expensive checks when interacting with the working copy, and may prefer getting "unrelated" results from their history queries over having slow commands. - (Behavior C) sparse-checkout is an implementational detail supporting a - special VFS. +=== (Behavior C) sparse-checkout is an implementational detail supporting a special VFS. 
This usecase goes slightly against the traditional definition of sparse-checkout in that it actually tries to present a full or dense @@ -255,13 +258,13 @@ will perceive the checkout as dense, and commands should thus behave as if all files are present. -=== Usecases of primary concern === +== Usecases of primary concern == Most of the rest of this document will focus on Behavior A and Behavior B. Some notes about the other two cases and why we are not focusing on them: - (Behavior A*) +=== (Behavior A*) Supporting this usecase is estimated to be difficult and a lot of work. There are no plans to implement it currently, but it may be a potential @@ -275,7 +278,7 @@ valid for this usecase, with the only exception being that it redefines the sparse specification to restrict it to already-downloaded blobs. The hard part is in making commands capable of respecting that modified definition. - (Behavior C) +=== (Behavior C) This usecase violates some of the early sparse-checkout documented assumptions (since files marked as SKIP_WORKTREE will be displayed to users @@ -300,20 +303,20 @@ Behavior C do not assume they are part of the Behavior B camp and propose patches that break things for the real Behavior B folks. -=== Oversimplified mental models === +== Oversimplified mental models == An oversimplification of the differences in the above behaviors is: - Behavior A: Restrict worktree and history operations to sparse specification - Behavior B: Restrict worktree operations to sparse specification; have any - history operations work across all files - Behavior C: Do not restrict either worktree or history operations to the - sparse specification...with the exception of branch checkouts or - switches which avoid writing files that will match the index so - they can later lazily be populated instead. 
+(Behavior A):: Restrict worktree and history operations to sparse specification +(Behavior B):: Restrict worktree operations to sparse specification; have any + history operations work across all files +(Behavior C):: Do not restrict either worktree or history operations to the + sparse specification...with the exception of branch checkouts or + switches which avoid writing files that will match the index so + they can later lazily be populated instead. -=== Desired behavior === +== Desired behavior == As noted previously, despite the simple idea of just working with a subset of files, there are a range of different behavioral changes that need to be @@ -326,37 +329,38 @@ understanding these differences can be beneficial. * Commands behaving the same regardless of high-level use-case - * commands that only look at files within the sparsity specification + ** commands that only look at files within the sparsity specification - * diff (without --cached or REVISION arguments) - * grep (without --cached or REVISION arguments) - * diff-files + *** diff (without --cached or REVISION arguments) + *** grep (without --cached or REVISION arguments) + *** diff-files - * commands that restore files to the working tree that match sparsity + ** commands that restore files to the working tree that match sparsity patterns, and remove unmodified files that don't match those patterns: - * switch - * checkout (the switch-like half) - * read-tree - * reset --hard + *** switch + *** checkout (the switch-like half) + *** read-tree + *** reset --hard - * commands that write conflicted files to the working tree, but otherwise + ** commands that write conflicted files to the working tree, but otherwise will omit writing files to the working tree that do not match the sparsity patterns: - * merge - * rebase - * cherry-pick - * revert + *** merge + *** rebase + *** cherry-pick + *** revert - * `am` and `apply --cached` should probably be in this section but + *** `am` and `apply --cached` 
should probably be in this section but are buggy (see the "Known bugs" section below) The behavior for these commands somewhat depends upon the merge strategy being used: - * `ort` behaves as described above - * `octopus` and `resolve` will always vivify any file changed in the merge + + *** `ort` behaves as described above + *** `octopus` and `resolve` will always vivify any file changed in the merge relative to the first parent, which is rather suboptimal. It is also important to note that these commands WILL update the index @@ -372,21 +376,21 @@ understanding these differences can be beneficial. specification and the sparsity patterns (much like the commands in the previous section). - * commands that always ignore sparsity since commits must be full-tree + ** commands that always ignore sparsity since commits must be full-tree - * archive - * bundle - * commit - * format-patch - * fast-export - * fast-import - * commit-tree + *** archive + *** bundle + *** commit + *** format-patch + *** fast-export + *** fast-import + *** commit-tree - * commands that write any modified file to the working tree (conflicted + ** commands that write any modified file to the working tree (conflicted or not, and whether those paths match sparsity patterns or not): - * stash - * apply (without `--index` or `--cached`) + *** stash + *** apply (without `--index` or `--cached`) * Commands that may slightly differ for behavior A vs. behavior B: @@ -394,19 +398,20 @@ understanding these differences can be beneficial. behaviors, but may differ in verbosity and types of warning and error messages. - * commands that make modifications to which files are tracked: - * add - * rm - * mv - * update-index + ** commands that make modifications to which files are tracked: + + *** add + *** rm + *** mv + *** update-index The fact that files can move between the 'tracked' and 'untracked' categories means some commands will have to treat untracked files differently. 
But if we have to treat untracked files differently, then additional commands may also need changes: - * status - * clean + *** status + *** clean In particular, `status` may need to report any untracked files outside the sparsity specification as an erroneous condition (especially to @@ -420,9 +425,10 @@ understanding these differences can be beneficial. may need to ignore the sparse specification by its nature. Also, its current --[no-]ignore-skip-worktree-entries default is totally bogus. - * commands for manually tweaking paths in both the index and the working tree - * `restore` - * the restore-like half of `checkout` + ** commands for manually tweaking paths in both the index and the working tree + + *** `restore` + *** the restore-like half of `checkout` These commands should be similar to add/rm/mv in that they should only operate on the sparse specification by default, and require a @@ -433,18 +439,19 @@ understanding these differences can be beneficial. * Commands that significantly differ for behavior A vs. behavior B: - * commands that query history - * diff (with --cached or REVISION arguments) - * grep (with --cached or REVISION arguments) - * show (when given commit arguments) - * blame (only matters when one or more -C flags are passed) - * and annotate - * log - * whatchanged (may not exist anymore) - * ls-files - * diff-index - * diff-tree - * ls-tree + ** commands that query history + + *** diff (with --cached or REVISION arguments) + *** grep (with --cached or REVISION arguments) + *** show (when given commit arguments) + *** blame (only matters when one or more -C flags are passed) + **** and annotate + *** log + *** whatchanged (may not exist anymore) + *** ls-files + *** diff-index + *** diff-tree + *** ls-tree Note: for log and whatchanged, revision walking logic is unaffected but displaying of patches is affected by scoping the command to the @@ -458,91 +465,91 @@ understanding these differences can be beneficial. 
* Commands I don't know how to classify - * range-diff + ** range-diff Is this like `log` or `format-patch`? - * cherry + ** cherry See range-diff * Commands unaffected by sparse-checkouts - * shortlog - * show-branch - * rev-list - * bisect - - * branch - * describe - * fetch - * gc - * init - * maintenance - * notes - * pull (merge & rebase have the necessary changes) - * push - * submodule - * tag - - * config - * filter-branch (works in separate checkout without sparse-checkout setup) - * pack-refs - * prune - * remote - * repack - * replace - - * bugreport - * count-objects - * fsck - * gitweb - * help - * instaweb - * merge-tree (doesn't touch worktree or index, and merges always compute full-tree) - * rerere - * verify-commit - * verify-tag - - * commit-graph - * hash-object - * index-pack - * mktag - * mktree - * multi-pack-index - * pack-objects - * prune-packed - * symbolic-ref - * unpack-objects - * update-ref - * write-tree (operates on index, possibly optimized to use sparse dir entries) - - * for-each-ref - * get-tar-commit-id - * ls-remote - * merge-base (merges are computed full tree, so merge base should be too) - * name-rev - * pack-redundant - * rev-parse - * show-index - * show-ref - * unpack-file - * var - * verify-pack - - * <Everything under 'Interacting with Others' in 'git help --all'> - * <Everything under 'Low-level...Syncing' in 'git help --all'> - * <Everything under 'Low-level...Internal Helpers' in 'git help --all'> - * <Everything under 'External commands' in 'git help --all'> + ** shortlog + ** show-branch + ** rev-list + ** bisect + + ** branch + ** describe + ** fetch + ** gc + ** init + ** maintenance + ** notes + ** pull (merge & rebase have the necessary changes) + ** push + ** submodule + ** tag + + ** config + ** filter-branch (works in separate checkout without sparse-checkout setup) + ** pack-refs + ** prune + ** remote + ** repack + ** replace + + ** bugreport + ** count-objects + ** fsck + ** gitweb + ** help + ** 
instaweb + ** merge-tree (doesn't touch worktree or index, and merges always compute full-tree) + ** rerere + ** verify-commit + ** verify-tag + + ** commit-graph + ** hash-object + ** index-pack + ** mktag + ** mktree + ** multi-pack-index + ** pack-objects + ** prune-packed + ** symbolic-ref + ** unpack-objects + ** update-ref + ** write-tree (operates on index, possibly optimized to use sparse dir entries) + + ** for-each-ref + ** get-tar-commit-id + ** ls-remote + ** merge-base (merges are computed full tree, so merge base should be too) + ** name-rev + ** pack-redundant + ** rev-parse + ** show-index + ** show-ref + ** unpack-file + ** var + ** verify-pack + + ** <Everything under 'Interacting with Others' in 'git help --all'> + ** <Everything under 'Low-level...Syncing' in 'git help --all'> + ** <Everything under 'Low-level...Internal Helpers' in 'git help --all'> + ** <Everything under 'External commands' in 'git help --all'> * Commands that might be affected, but who cares? - * merge-file - * merge-index - * gitk? + ** merge-file + ** merge-index + ** gitk? -=== Behavior classes === +== Behavior classes == From the above there are a few classes of behavior: @@ -573,18 +580,19 @@ From the above there are a few classes of behavior: Commands in this class generally behave like the "restrict" class, except that: - (1) they will ignore the sparse specification and write files with - conflicts to the working tree (thus temporarily expanding the - sparse specification to include such files.) - (2) they are grouped with commands which move to a new commit, since - they often create a commit and then move to it, even though we - know there are many exceptions to moving to the new commit. (For - example, the user may rebase a commit that becomes empty, or have - a cherry-pick which conflicts, or a user could run `merge - --no-commit`, and we also view `apply --index` kind of like `am - --no-commit`.) 
As such, these commands can make changes to index - files outside the sparse specification, though they'll mark such - files with SKIP_WORKTREE. + + (1) they will ignore the sparse specification and write files with + conflicts to the working tree (thus temporarily expanding the + sparse specification to include such files.) + (2) they are grouped with commands which move to a new commit, since + they often create a commit and then move to it, even though we + know there are many exceptions to moving to the new commit. (For + example, the user may rebase a commit that becomes empty, or have + a cherry-pick which conflicts, or a user could run `merge + --no-commit`, and we also view `apply --index` kind of like `am + --no-commit`.) As such, these commands can make changes to index + files outside the sparse specification, though they'll mark such + files with SKIP_WORKTREE. * "restrict also specially applied to untracked files" @@ -609,37 +617,39 @@ From the above there are a few classes of behavior: specification. -=== Subcommand-dependent defaults === +== Subcommand-dependent defaults == Note that we have different defaults depending on the command for the desired behavior : * Commands defaulting to "restrict": - * diff-files - * diff (without --cached or REVISION arguments) - * grep (without --cached or REVISION arguments) - * switch - * checkout (the switch-like half) - * reset (<commit>) - - * restore - * checkout (the restore-like half) - * checkout-index - * reset (with pathspec) + + ** diff-files + ** diff (without --cached or REVISION arguments) + ** grep (without --cached or REVISION arguments) + ** switch + ** checkout (the switch-like half) + ** reset (<commit>) + + ** restore + ** checkout (the restore-like half) + ** checkout-index + ** reset (with pathspec) This behavior makes sense; these interact with the working tree. 
* Commands defaulting to "restrict modulo conflicts": - * merge - * rebase - * cherry-pick - * revert - * am - * apply --index (which is kind of like an `am --no-commit`) + ** merge + ** rebase + ** cherry-pick + ** revert + + ** am + ** apply --index (which is kind of like an `am --no-commit`) - * read-tree (especially with -m or -u; is kind of like a --no-commit merge) - * reset (<tree-ish>, due to similarity to read-tree) + ** read-tree (especially with -m or -u; is kind of like a --no-commit merge) + ** reset (<tree-ish>, due to similarity to read-tree) These also interact with the working tree, but require slightly different behavior either so that (a) conflicts can be resolved or (b) @@ -648,16 +658,17 @@ desired behavior : (See also the "Known bugs" section below regarding `am` and `apply`) * Commands defaulting to "no restrict": - * archive - * bundle - * commit - * format-patch - * fast-export - * fast-import - * commit-tree - * stash - * apply (without `--index`) + ** archive + ** bundle + ** commit + ** format-patch + ** fast-export + ** fast-import + ** commit-tree + + ** stash + ** apply (without `--index`) These have completely different defaults and perhaps deserve the most detailed explanation: @@ -679,53 +690,59 @@ desired behavior : sparse specification then we'll lose changes from the user. * Commands defaulting to "restrict also specially applied to untracked files": - * add - * rm - * mv - * update-index - * status - * clean (?) - - Our original implementation for the first three of these commands was - "no restrict", but it had some severe usability issues: - * `git add <somefile>` if honored and outside the sparse - specification, can result in the file randomly disappearing later - when some subsequent command is run (since various commands - automatically clean up unmodified files outside the sparse - specification). - * `git rm '*.jpg'` could very negatively surprise users if it deletes - files outside the range of the user's interest. 
- * `git mv` has similar surprises when moving into or out of the cone, - so best to restrict by default - - So, we switched `add` and `rm` to default to "restrict", which made - usability problems much less severe and less frequent, but we still got - complaints because commands like: - git add <file-outside-sparse-specification> - git rm <file-outside-sparse-specification> - would silently do nothing. We should instead print an error in those - cases to get usability right. - - update-index needs to be updated to match, and status and maybe clean - also need to be updated to specially handle untracked paths. - - There may be a difference in here between behavior A and behavior B in - terms of verboseness of errors or additional warnings. + + ** add + ** rm + ** mv + ** update-index + ** status + ** clean (?) + +.... + Our original implementation for the first three of these commands was + "no restrict", but it had some severe usability issues: + + * `git add <somefile>` if honored and outside the sparse + specification, can result in the file randomly disappearing later + when some subsequent command is run (since various commands + automatically clean up unmodified files outside the sparse + specification). + * `git rm '*.jpg'` could very negatively surprise users if it deletes + files outside the range of the user's interest. + * `git mv` has similar surprises when moving into or out of the cone, + so best to restrict by default + + So, we switched `add` and `rm` to default to "restrict", which made + usability problems much less severe and less frequent, but we still got + complaints because commands like: + + git add <file-outside-sparse-specification> + git rm <file-outside-sparse-specification> + + would silently do nothing. We should instead print an error in those + cases to get usability right. + + update-index needs to be updated to match, and status and maybe clean + also need to be updated to specially handle untracked paths. 
+ + There may be a difference in here between behavior A and behavior B in + terms of verboseness of errors or additional warnings. +.... * Commands falling under "restrict or no restrict dependent upon behavior A vs. behavior B" - * diff (with --cached or REVISION arguments) - * grep (with --cached or REVISION arguments) - * show (when given commit arguments) - * blame (only matters when one or more -C flags passed) - * and annotate - * log - * and variants: shortlog, gitk, show-branch, whatchanged, rev-list - * ls-files - * diff-index - * diff-tree - * ls-tree + ** diff (with --cached or REVISION arguments) + ** grep (with --cached or REVISION arguments) + ** show (when given commit arguments) + ** blame (only matters when one or more -C flags passed) + *** and annotate + ** log + *** and variants: shortlog, gitk, show-branch, whatchanged, rev-list + ** ls-files + ** diff-index + ** diff-tree + ** ls-tree For now, we default to behavior B for these, which want a default of "no restrict". @@ -749,7 +766,7 @@ desired behavior : implemented. -=== Sparse specification vs. sparsity patterns === +== Sparse specification vs. sparsity patterns == In a well-behaved situation, the sparse specification is given directly by the $GIT_DIR/info/sparse-checkout file. However, it can transiently @@ -821,45 +838,48 @@ under behavior B index operations are lumped with history and tend to operate full-tree. -=== Implementation Questions === - - * Do the options --scope={sparse,all} sound good to others? Are there better - options? 
- * Names in use, or appearing in patches, or previously suggested: - * --sparse/--dense - * --ignore-skip-worktree-bits - * --ignore-skip-worktree-entries - * --ignore-sparsity - * --[no-]restrict-to-sparse-paths - * --full-tree/--sparse-tree - * --[no-]restrict - * --scope={sparse,all} - * --focus/--unfocus - * --limit/--unlimited - * Rationale making me lean slightly towards --scope={sparse,all}: - * We want a name that works for many commands, so we need a name that +== Implementation Questions == + + * Do the options --scope={sparse,all} sound good to others? Are there better options? + + ** Names in use, or appearing in patches, or previously suggested: + + *** --sparse/--dense + *** --ignore-skip-worktree-bits + *** --ignore-skip-worktree-entries + *** --ignore-sparsity + *** --[no-]restrict-to-sparse-paths + *** --full-tree/--sparse-tree + *** --[no-]restrict + *** --scope={sparse,all} + *** --focus/--unfocus + *** --limit/--unlimited + + ** Rationale making me lean slightly towards --scope={sparse,all}: + + *** We want a name that works for many commands, so we need a name that does not conflict - * We know that we have more than two possible usecases, so it is best + *** We know that we have more than two possible usecases, so it is best to avoid a flag that appears to be binary. - * --scope={sparse,all} isn't overly long and seems relatively + *** --scope={sparse,all} isn't overly long and seems relatively explanatory - * `--sparse`, as used in add/rm/mv, is totally backwards for + *** `--sparse`, as used in add/rm/mv, is totally backwards for grep/log/etc. Changing the meaning of `--sparse` for these commands would fix the backwardness, but possibly break existing scripts. Using a new name pairing would allow us to treat `--sparse` in these commands as a deprecated alias. 
- * There is a different `--sparse`/`--dense` pair for commands using + *** There is a different `--sparse`/`--dense` pair for commands using revision machinery, so using that naming might cause confusion - * There is also a `--sparse` in both pack-objects and show-branch, which + *** There is also a `--sparse` in both pack-objects and show-branch, which don't conflict but do suggest that `--sparse` is overloaded - * The name --ignore-skip-worktree-bits is a double negative, is + *** The name --ignore-skip-worktree-bits is a double negative, is quite a mouthful, refers to an implementation detail that many users may not be familiar with, and we'd need a negation for it which would probably be even more ridiculously long. (But we can make --ignore-skip-worktree-bits a deprecated alias for --no-restrict.) - * If a config option is added (sparse.scope?) what should the values and + ** If a config option is added (sparse.scope?) what should the values and description be? "sparse" (behavior A), "worktree-sparse-history-dense" (behavior B), "dense" (behavior C)? There's a risk of confusion, because even for Behaviors A and B we want some commands to be @@ -868,19 +888,20 @@ operate full-tree. the primary difference we are focusing is just the history-querying commands (log/diff/grep). Previous config suggestion here: [13] - * Is `--no-expand` a good alias for ls-files's `--sparse` option? + ** Is `--no-expand` a good alias for ls-files's `--sparse` option? (`--sparse` does not map to either `--scope=sparse` or `--scope=all`, because in non-cone mode it does nothing and in cone-mode it shows the sparse directory entries which are technically outside the sparse specification) - * Under Behavior A: - * Does ls-files' `--no-expand` override the default `--scope=all`, or + ** Under Behavior A: + + *** Does ls-files' `--no-expand` override the default `--scope=all`, or does it need an extra flag? - * Does ls-files' `-t` option imply `--scope=all`? 
- * Does update-index's `--[no-]skip-worktree` option imply `--scope=all`? + *** Does ls-files' `-t` option imply `--scope=all`? + *** Does update-index's `--[no-]skip-worktree` option imply `--scope=all`? - * sparse-checkout: once behavior A is fully implemented, should we take + ** sparse-checkout: once behavior A is fully implemented, should we take an interim measure to ease people into switching the default? Namely, if folks are not already in a sparse checkout, then require `sparse-checkout init/set` to take a @@ -892,7 +913,7 @@ operate full-tree. is seamless for them. -=== Implementation Goals/Plans === +== Implementation Goals/Plans == * Get buy-in on this document in general. @@ -910,25 +931,26 @@ operate full-tree. request that they not trigger this bug." flag * Flags & Config - * Make `--sparse` in add/rm/mv a deprecated alias for `--scope=all` - * Make `--ignore-skip-worktree-bits` in checkout-index/checkout/restore + + ** Make `--sparse` in add/rm/mv a deprecated alias for `--scope=all` + ** Make `--ignore-skip-worktree-bits` in checkout-index/checkout/restore a deprecated aliases for `--scope=all` - * Create config option (sparse.scope?), tie it to the "Cliff notes" + ** Create config option (sparse.scope?), tie it to the "Cliff notes" overview - * Add --scope=sparse (and --scope=all) flag to each of the history querying + ** Add --scope=sparse (and --scope=all) flag to each of the history querying commands. IMPORTANT: make sure diff machinery changes don't mess with format-patch, fast-export, etc. -=== Known bugs === +== Known bugs == This list used to be a lot longer (see e.g. [1,2,3,4,5,6,7,8,9]), but we've been working on it. -0. Behavior A is not well supported in Git. (Behavior B didn't used to +1. Behavior A is not well supported in Git. (Behavior B didn't used to be either, but was the easier of the two to implement.) -1. am and apply: +2. 
am and apply: apply, without `--index` or `--cached`, relies on files being present in the working copy, and also writes to them unconditionally. As @@ -948,7 +970,7 @@ been working on it. files and then complain that those vivified files would be overwritten by merge. -2. reset --hard: +3. reset --hard: reset --hard provides confusing error message (works correctly, but misleads the user into believing it didn't): @@ -971,13 +993,13 @@ been working on it. `git reset --hard` DID remove addme from the index and the working tree, contrary to the error message, but in line with how reset --hard should behave. -3. read-tree +4. read-tree `read-tree` doesn't apply the 'SKIP_WORKTREE' bit to *any* of the entries it reads into the index, resulting in all your files suddenly appearing to be "deleted". -4. Checkout, restore: +5. Checkout, restore: These command do not handle path & revision arguments appropriately: @@ -1030,7 +1052,7 @@ been working on it. S tracked H tracked-but-maybe-skipped -5. checkout and restore --staged, continued: +6. checkout and restore --staged, continued: These commands do not correctly scope operations to the sparse specification, and make it worse by not setting important SKIP_WORKTREE @@ -1046,56 +1068,82 @@ been working on it. the sparse specification, but then it will be important to set the SKIP_WORKTREE bits appropriately. -6. Performance issues; see: - https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/ +7. 
Performance issues; see: + + https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/ -=== Reference Emails === +== Reference Emails == Emails that detail various bugs we've had in sparse-checkout: -[1] (Original descriptions of behavior A & behavior B) - https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ -[2] (Fix stash applications in sparse checkouts; bugs from behavioral differences) - https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/ -[3] (Present-despite-skipped entries) - https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/ -[4] (Clone --no-checkout interaction) - https://lore.kernel.org/git/pull.801.v2.git.git.1591324899170.gitgitgadget@gmail.com/ (clone --no-checkout) -[5] (The need for update_sparsity() and avoiding `read-tree -mu HEAD`) - https://lore.kernel.org/git/3a1f084641eb47515b5a41ed4409a36128913309.1585270142.git.gitgitgadget@gmail.com/ -[6] (SKIP_WORKTREE is advisory, not mandatory) - https://lore.kernel.org/git/844306c3e86ef67591cc086decb2b760e7d710a3.1585270142.git.gitgitgadget@gmail.com/ -[7] (`worktree add` should copy sparsity settings from current worktree) - https://lore.kernel.org/git/c51cb3714e7b1d2f8c9370fe87eca9984ff4859f.1644269584.git.gitgitgadget@gmail.com/ -[8] (Avoid negative surprises in add, rm, and mv) - https://lore.kernel.org/git/cover.1617914011.git.matheus.bernardino@usp.br/ - https://lore.kernel.org/git/pull.1018.v4.git.1632497954.gitgitgadget@gmail.com/ -[9] (Move from out-of-cone to in-cone) - https://lore.kernel.org/git/20220630023737.473690-6-shaoxuan.yuan02@gmail.com/ - https://lore.kernel.org/git/20220630023737.473690-4-shaoxuan.yuan02@gmail.com/ -[10] (Unnecessarily downloading objects outside sparse specification) - https://lore.kernel.org/git/CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com/ - -[11] 
(Stolee's comments on high-level usecases) - https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/ +[1] (Original descriptions of behavior A & behavior B): + +https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ + +[2] (Fix stash applications in sparse checkouts; bugs from behavioral differences): + +https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/ + +[3] (Present-despite-skipped entries): + +https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/ + +[4] (Clone --no-checkout interaction): + +https://lore.kernel.org/git/pull.801.v2.git.git.1591324899170.gitgitgadget@gmail.com/ (clone --no-checkout) + +[5] (The need for update_sparsity() and avoiding `read-tree -mu HEAD`): + +https://lore.kernel.org/git/3a1f084641eb47515b5a41ed4409a36128913309.1585270142.git.gitgitgadget@gmail.com/ + +[6] (SKIP_WORKTREE is advisory, not mandatory): + +https://lore.kernel.org/git/844306c3e86ef67591cc086decb2b760e7d710a3.1585270142.git.gitgitgadget@gmail.com/ + +[7] (`worktree add` should copy sparsity settings from current worktree): + +https://lore.kernel.org/git/c51cb3714e7b1d2f8c9370fe87eca9984ff4859f.1644269584.git.gitgitgadget@gmail.com/ + +[8] (Avoid negative surprises in add, rm, and mv): + + * https://lore.kernel.org/git/cover.1617914011.git.matheus.bernardino@usp.br/ + * https://lore.kernel.org/git/pull.1018.v4.git.1632497954.gitgitgadget@gmail.com/ + +[9] (Move from out-of-cone to in-cone): + + * https://lore.kernel.org/git/20220630023737.473690-6-shaoxuan.yuan02@gmail.com/ + * https://lore.kernel.org/git/20220630023737.473690-4-shaoxuan.yuan02@gmail.com/ + +[10] (Unnecessarily downloading objects outside sparse specification): + +https://lore.kernel.org/git/CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com/ + +[11] (Stolee's comments on high-level usecases): + 
+https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/ [12] Others commenting on eventually switching default to behavior A: + * https://lore.kernel.org/git/xmqqh719pcoo.fsf@gitster.g/ * https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ * https://lore.kernel.org/git/a86af661-cf58-a4e5-0214-a67d3a794d7e@github.com/ -[13] Previous config name suggestion and description - * https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/ +[13] Previous config name suggestion and description: + + https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/ [14] Tangential issue: switch to cone mode as default sparse specification mechanism: - https://lore.kernel.org/git/a1b68fd6126eb341ef3637bb93fedad4309b36d0.1650594746.git.gitgitgadget@gmail.com/ + +https://lore.kernel.org/git/a1b68fd6126eb341ef3637bb93fedad4309b36d0.1650594746.git.gitgitgadget@gmail.com/ [15] Lengthy email on grep behavior, covering what should be searched: - * https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/ + +https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/ [16] Email explaining sparsity patterns vs. SKIP_WORKTREE and history operations, search for the parenthetical comment starting "We do not check". - https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/ + +https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/ [17] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/ -- 2.51.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
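Most of the list changes in the diff above follow one mechanical pattern: a nested item that used to rely on indentation with a single `*` marker is rewritten so that the marker length itself gives the depth (`**`, `***`, `****`). The sketch below illustrates that pattern only. `nest_bullets` is a hypothetical helper, not something from the patch or from git's build, and nothing in the thread says the patch was produced by a script. It assumes the input uses spaces for indentation and that every list item starts with `* `.

```python
def nest_bullets(lines):
    """Rewrite indentation-nested '* ' items as depth-marked items.

    Hypothetical illustration: depth is inferred from a stack of the
    indentation widths seen so far, so the input may indent by any
    consistent amount per level.
    """
    stack = []  # indentation widths of the currently open list levels
    out = []
    for line in lines:
        stripped = line.lstrip(" ")
        if not stripped.startswith("* "):
            # Not a list item: pass it through unchanged.
            out.append(line)
            continue
        indent = len(line) - len(stripped)
        # Close any deeper levels that this item has left.
        while stack and stack[-1] > indent:
            stack.pop()
        # Open a new level if this item is indented further than the last.
        if not stack or stack[-1] < indent:
            stack.append(indent)
        # One asterisk per open level, followed by the original item text.
        out.append("*" * len(stack) + stripped[1:])
    return out


if __name__ == "__main__":
    sample = [
        "* Commands that may slightly differ",
        "  * commands that make modifications to which files are tracked:",
        "    * add",
        "    * rm",
    ]
    print("\n".join(nest_bullets(sample)))
```

The real patch also adds blank lines before some nested lists and wraps one passage in a `....` literal block, which this sketch does not attempt.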
* [PATCH v3 3/4] doc: commit-graph.adoc: fix up some formatting
  2025-10-16 20:02 ` [PATCH v3 " Ramsay Jones
  2025-10-16 20:02   ` [PATCH v3 1/4] doc: remembering-renames.adoc: fix asciidoc warnings Ramsay Jones
  2025-10-16 20:02   ` [PATCH v3 2/4] doc: sparse-checkout.adoc: " Ramsay Jones
@ 2025-10-16 20:03   ` Ramsay Jones
  2025-10-16 20:03   ` [PATCH v3 4/4] doc: add large-object-promisors.adoc to the docs build Ramsay Jones
  2025-10-23 19:33   ` [PATCH v3 0/4] technical docs in make build Junio C Hamano
  4 siblings, 0 replies; 25+ messages in thread
From: Ramsay Jones @ 2025-10-16 20:03 UTC (permalink / raw)
  To: GIT Mailing-list
  Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano,
	Ramsay Jones

The formatting markup syntax used in this document (markdown?) is not
interpreted correctly by asciidoc or asciidoctor. The main problem is
the use of a '## ' prefix markup for some sub-headings, along with the
use of '```' code markup and some missing literal blocks.

In order to improve the (html) document formatting:

  - replace the '## ' prefix sub-title syntax with the '~~' underlining
    syntax for the relevant sub-headings.

  - replace the '```' code markup, which causes asciidoc(tor) to simply
    remove the marked up text, with a literal block '----' markup.

  - the second ascii diagram, in the 'Merging commit-graph files'
    section, is not rendered correctly by asciidoctor (asciidoc is
    fine) so enclose it in a '....' block.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
---
 Documentation/technical/commit-graph.adoc | 29 +++++++++++++++--------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/Documentation/technical/commit-graph.adoc b/Documentation/technical/commit-graph.adoc
index 2c26e95e51..a259d1567b 100644
--- a/Documentation/technical/commit-graph.adoc
+++ b/Documentation/technical/commit-graph.adoc
@@ -39,6 +39,7 @@ A consumer may load the following info for a commit from the graph:
 Values 1-4 satisfy the requirements of parse_commit_gently().
 
 There are two definitions of generation number:
+
 1. Corrected committer dates (generation number v2)
 2. Topological levels (generation number v1)
 
@@ -158,7 +159,8 @@ number of commits in the full history. By creating a "chain" of commit-graphs,
 we enable fast writes of new commit data without rewriting the entire commit
 history -- at least, most of the time.
 
-## File Layout
+File Layout
+~~~~~~~~~~~
 
 A commit-graph chain uses multiple files, and we use a fixed naming
 convention to organize these files. Each commit-graph file has a name
@@ -170,11 +172,11 @@ hashes for the files in order from "lowest" to "highest".
 
 For example, if the `commit-graph-chain` file contains the lines
 
-```
+----
 	{hash0}
 	{hash1}
 	{hash2}
-```
+----
 
 then the commit-graph chain looks like the following diagram:
 
@@ -213,7 +215,8 @@ specifying the hashes of all files in the lower layers. In the above example,
 `graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
 `{hash0}` and `{hash1}`.
 
-## Merging commit-graph files
+Merging commit-graph files
+~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 If we only added a new commit-graph file on every write, we would run into a
 linear search problem through many commit-graph files. Instead, we use a merge
@@ -225,6 +228,7 @@ is determined by the merge strategy that the files should collapse to
 the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
 file.
 
+....
 +---------------------+
 | |
 | (new commits) |
@@ -250,6 +254,7 @@ file.
 | |
 | |
 +-----------------------+
+....
 
 During this process, the commits to write are combined, sorted and we write the
 contents to a temporary file, all while holding a `commit-graph-chain.lock`
@@ -257,14 +262,15 @@ lock-file. When the file is flushed, we rename it to `graph-{hash3}`
 according to the computed `{hash3}`. Finally, we write the new chain data to
 `commit-graph-chain.lock`:
 
-```
+----
 	{hash3}
 	{hash0}
-```
+----
 
 We then close the lock-file.
 
-## Merge Strategy
+Merge Strategy
+~~~~~~~~~~~~~~
 
 When writing a set of commits that do not exist in the commit-graph stack of
 height N, we default to creating a new file at level N + 1. We then decide to
@@ -289,7 +295,8 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum
 number of commits) could be extracted into config settings for full
 flexibility.
 
-## Handling Mixed Generation Number Chains
+Handling Mixed Generation Number Chains
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 With the introduction of generation number v2 and generation data chunk, the
 following scenario is possible:
@@ -318,7 +325,8 @@ have corrected commit dates when written by compatible versions of Git. Thus,
 rewriting split commit-graph as a single file (`--split=replace`) creates a
 single layer with corrected commit dates.
 
-## Deleting graph-{hash} files
+Deleting graph-\{hash\} files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 After a new tip file is written, some `graph-{hash}` files may no longer
 be part of a chain. It is important to remove these files from disk, eventually.
@@ -333,7 +341,8 @@ files whose modified times are older than a given expiry window. This window
 defaults to zero, but can be changed using command-line arguments or a config
 setting.
 
-## Chains across multiple object directories
+Chains across multiple object directories
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 In a repo with alternates, we look for the `commit-graph-chain` file starting
 in the local object directory and then in each alternate. The first file that
-- 
2.51.0

^ permalink raw reply related [flat|nested] 25+ messages in thread
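The two substitutions described in the commit message above ('## ' headings to '~' underlined titles, and '```' fences to '----' literal blocks) are mechanical, so they can be sketched as a short script. This is only an illustration under stated assumptions: `convert_markup` is a hypothetical name, the patch itself gives no indication of having been generated this way, and the brace escaping simply mirrors the one heading in the diff that contains `{hash}`.

```python
import re

FENCE = "`" * 3  # the markdown code-fence marker being replaced


def convert_markup(text: str) -> str:
    """Hypothetical helper: rewrite markdown-style markup as AsciiDoc.

    - '## Title' becomes 'Title' followed by a '~' underline of equal length
    - a line consisting only of the fence marker becomes '----'
    """
    out = []
    for line in text.splitlines():
        heading = re.match(r"^## (.+)$", line)
        if heading:
            # Escape braces so they are not read as attribute references,
            # as the patch does for the 'graph-{hash}' heading.
            title = heading.group(1).replace("{", r"\{").replace("}", r"\}")
            out.append(title)
            out.append("~" * len(title))
        elif line.strip() == FENCE:
            out.append("----")
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "\n".join(["## File Layout", "", FENCE, "{hash0}", FENCE])
    print(convert_markup(sample))
```

The third change in the patch, wrapping the second ASCII diagram in a `....` block, depends on knowing where the diagram starts and ends, so it is left out of this sketch.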
* [PATCH v3 4/4] doc: add large-object-promisors.adoc to the docs build
  2025-10-16 20:02 ` [PATCH v3 " Ramsay Jones
                     ` (2 preceding siblings ...)
  2025-10-16 20:03   ` [PATCH v3 3/4] doc: commit-graph.adoc: fix up some formatting Ramsay Jones
@ 2025-10-16 20:03   ` Ramsay Jones
  2025-10-17 16:37     ` Ramsay Jones
  2025-10-23 19:33   ` [PATCH v3 0/4] technical docs in make build Junio C Hamano
  4 siblings, 1 reply; 25+ messages in thread
From: Ramsay Jones @ 2025-10-16 20:03 UTC (permalink / raw)
  To: GIT Mailing-list
  Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano,
      Ramsay Jones

Commit 5040f9f164 ("doc: add technical design doc for large object
promisors", 2025-02-18) added the large object promisors document
as a technical document (with a '.txt' extension). The merge commit
2c6fd30198 ("Merge branch 'cc/lop-remote'", 2025-03-05) seems to
have renamed the file with an '.adoc' extension.

Despite the '.adoc' extension, this document was not being formatted
by asciidoc(tor) as part of the docs build. In order to do so, add
the document to the make and meson build files.

Having added the document to the build, asciidoc and asciidoctor find
(slightly different) problems with the syntax of the input document.

The first set of warnings (only issued by asciidoc) relate to some
'section title out of sequence: expected level 3, got level 4'. This
document uses 'setext' style of section headers, using a series of
underline characters, where the character used denotes the level of
the title. From document title to level 5 (see [1]), these characters
are =, -, ~, ^, +. This does not seem to fit the error message, which
implies that those characters denote levels 0 -> 4. Replacing the headings
underlined with '+' by the '^' character eliminates these warnings.

The second set of warnings (only issued by asciidoctor) relate to some
headings which seem to use both arabic and roman numerals as part of
a single 'list' sequence. This elicited either 'unterminated listing
block' or (for example) 'list item index: expected I, got II' warnings.
In order not to mix arabic and roman numerals, remove the numeral from
the '0) Non goals' heading. Similarly, the remaining roman numeral
entries had the ')' removed and turned into regular headings with I, II,
III ... at the beginning.

[1] https://asciidoctor.org/docs/asciidoc-recommended-practices/

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
---
 Documentation/Makefile                        |  1 +
 .../technical/large-object-promisors.adoc     | 64 +++++++++----------
 Documentation/technical/meson.build           |  1 +
 3 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index a3fbd29744..a3ba25e659 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -122,6 +122,7 @@ TECH_DOCS += technical/bundle-uri
 TECH_DOCS += technical/commit-graph
 TECH_DOCS += technical/directory-rename-detection
 TECH_DOCS += technical/hash-function-transition
+TECH_DOCS += technical/large-object-promisors
 TECH_DOCS += technical/long-running-process-protocol
 TECH_DOCS += technical/multi-pack-index
 TECH_DOCS += technical/packfile-uri
diff --git a/Documentation/technical/large-object-promisors.adoc b/Documentation/technical/large-object-promisors.adoc
index dea8dafa66..2aa815e023 100644
--- a/Documentation/technical/large-object-promisors.adoc
+++ b/Documentation/technical/large-object-promisors.adoc
@@ -34,8 +34,8 @@ a new object representation for large blobs as discussed in:
 
 https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/
 
-0) Non goals
-------------
+Non goals
+---------
 
 - We will not discuss those client side improvements here, as they
   would require changes in different parts of Git than this effort.
@@ -90,8 +90,8 @@ later in this document:
   even more to host content with larger blobs or more large blobs
   than currently.
 
-I) Issues with the current situation
-------------------------------------
+I Issues with the current situation
+-----------------------------------
 
 - Some statistics made on GitLab repos have shown that more than 75%
   of the disk space is used by blobs that are larger than 1MB and
@@ -138,8 +138,8 @@ I) Issues with the current situation
   complaining that these tools require significant effort to set up,
   learn and use correctly.
 
-II) Main features of the "Large Object Promisors" solution
-----------------------------------------------------------
+II Main features of the "Large Object Promisors" solution
+---------------------------------------------------------
 
 The main features below should give a rough overview of how the
 solution may work. Details about needed elements can be found in
@@ -166,7 +166,7 @@ format. They should be used along with main remotes that contain the
 other objects.
 
 Note 1
-++++++
+^^^^^^
 
 To clarify, a LOP is a normal promisor remote, except that:
 
@@ -178,7 +178,7 @@ To clarify, a LOP is a normal promisor remote, except that:
   itself.
 
 Note 2
-++++++
+^^^^^^
 
 Git already makes it possible for a main remote to also be a promisor
 remote storing both regular objects and large blobs for a client that
@@ -186,13 +186,13 @@ clones from it with a filter on blob size. But here we explicitly want
 to avoid that.
 
 Rationale
-+++++++++
+^^^^^^^^^
 
 LOPs aim to be good at handling large blobs while main remotes are
 already good at handling other objects.
 
 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
 
 Git already has support for multiple promisor remotes, see
 link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation].
@@ -213,19 +213,19 @@ remote helper (see linkgit:gitremote-helpers[7]) which makes the
 underlying object storage appear like a remote to Git.
 
 Note
-++++
+^^^^
 
 A LOP can be a promisor remote accessed using a remote helper by
 both some clients and the main remote.
 
 Rationale
-+++++++++
+^^^^^^^^^
 
 This looks like the simplest way to create LOPs that can cheaply
 handle many large blobs.
 
 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
 
 Remote helpers are quite easy to write as shell scripts, but it might
 be more efficient and maintainable to write them using other languages
@@ -247,7 +247,7 @@ The underlying object storage that a LOP uses could also serve as
 storage for large files handled by Git LFS.
 
 Rationale
-+++++++++
+^^^^^^^^^
 
 This would simplify the server side if it wants to both use a LOP and
 act as a Git LFS server.
@@ -259,7 +259,7 @@ On the server side, a main remote should have a way to offload to a
 LOP all its blobs with a size over a configurable threshold.
 
 Rationale
-+++++++++
+^^^^^^^^^
 
 This makes it easy to set things up and to clean things up. For
 example, an admin could use this to manually convert a repo not using
@@ -268,7 +268,7 @@ some users would sometimes push large blobs, a cron job could use this
 to regularly make sure the large blobs are moved to the LOP.
 
 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
 
 Using something based on `git repack --filter=...` to separate the
 blobs we want to offload from the other Git objects could be a good
@@ -284,13 +284,13 @@ should have ways to prevent oversize blobs to be fetched, and also
 perhaps pushed, into it.
 
 Rationale
-+++++++++
+^^^^^^^^^
 
 A main remote containing many oversize blobs would defeat the purpose
 of LOPs.
 
 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
 
 The way to offload to a LOP discussed in 4) above can be used to
 regularly offload oversize blobs. About preventing oversize blobs from
@@ -326,18 +326,18 @@ large blobs directly from the LOP and the server would not need to
 fetch those blobs from the LOP to be able to serve the client.
 
 Note
-++++
+^^^^
 
 For fetches instead of clones, a protocol negotiation might not always
 happen, see the "What about fetches?" FAQ entry below for details.
 
 Rationale
-+++++++++
+^^^^^^^^^
 
 Security, configurability and efficiency of setting things up.
 
 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
 
 A "promisor-remote" protocol v2 capability looks like a good way to
 implement this. The way the client and server use this capability
@@ -356,7 +356,7 @@ the client should be able to offload some large blobs it has fetched,
 but might not need anymore, to the LOP.
 
 Note
-++++
+^^^^
 
 It might depend on the context if it should be OK or not for clients
 to offload large blobs they have created, instead of fetched, directly
@@ -367,13 +367,13 @@ This should be discussed and refined when we get closer to
 implementing this feature.
 
 Rationale
-+++++++++
+^^^^^^^^^
 
 On the client, the easiest way to deal with unneeded large blobs is to
 offload them.
 
 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
 
 This is very similar to what 4) above is about, except on the client
 side instead of the server side. So a good solution to 4) could likely
@@ -385,8 +385,8 @@ when cloning (see 6) above). Also if the large blobs were fetched from
 a LOP, it is likely, and can easily be confirmed, that the LOP still
 has them, so that they can just be removed from the client.
 
-III) Benefits of using LOPs
----------------------------
+III Benefits of using LOPs
+--------------------------
 
 Many benefits are related to the issues discussed in "I) Issues with
 the current situation" above:
@@ -406,8 +406,8 @@ the current situation" above:
 
 - Reduced storage needs on the client side.
 
-IV) FAQ
--------
+IV FAQ
+------
 
 What about using multiple LOPs on the server and client side?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -533,7 +533,7 @@ some objects it already knows about but doesn't have because they are
 on a promisor remote.
 
 Regular fetch
-+++++++++++++
+^^^^^^^^^^^^^
 
 In a regular fetch, the client will contact the main remote and a
 protocol negotiation will happen between them. It's a good thing that
@@ -551,7 +551,7 @@ new fetch will happen in the same way as the previous clone or fetch,
 using, or not using, the same LOP(s) as last time.
 
 "Backfill" or "lazy" fetch
-++++++++++++++++++++++++++
+^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 When there is a backfill fetch, the client doesn't necessarily contact
 the main remote first. It will try to fetch from its promisor remotes
@@ -576,8 +576,8 @@ from the client when it fetches from them. The client could get the
 token when performing a protocol negotiation with the main remote (see
 section II.6 above).
 
-V) Future improvements
-----------------------
+V Future improvements
+---------------------
 
 It is expected that at the beginning using LOPs will be mostly worth
 it either in a corporate context where the Git version that clients
diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build
index a13aafcfbb..34b5ebe5c3 100644
--- a/Documentation/technical/meson.build
+++ b/Documentation/technical/meson.build
@@ -13,6 +13,7 @@ articles = [
   'commit-graph.adoc',
   'directory-rename-detection.adoc',
   'hash-function-transition.adoc',
+  'large-object-promisors.adoc',
   'long-running-process-protocol.adoc',
   'multi-pack-index.adoc',
   'packfile-uri.adoc',
-- 
2.51.0

^ permalink raw reply related [flat|nested] 25+ messages in thread
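The level check behind the first set of warnings in the patch above can be sketched in Python. This is an illustration only, not asciidoc's own parser: the mapping of the underline characters =, -, ~, ^, + to levels 0 through 4 is an assumption read off the warning text, and the helper names (find_titles, out_of_sequence, sample_doc) and the sample headings are hypothetical.

```python
# Assumed mapping of two-line ("setext") title underline characters to
# levels 0..4, as implied by the warning quoted in the commit message.
UNDERLINE_LEVELS = {"=": 0, "-": 1, "~": 2, "^": 3, "+": 4}


def find_titles(lines):
    """Yield (line_number, title, level) for each two-line title found."""
    for i in range(1, len(lines)):
        title, underline = lines[i - 1].rstrip(), lines[i].rstrip()
        if not title or len(underline) < 2:
            continue
        char = underline[0]
        if char not in UNDERLINE_LEVELS or underline != char * len(underline):
            continue
        # Accept the pair only when the underline is about as long as the title.
        if abs(len(underline) - len(title)) <= 2:
            yield i, title, UNDERLINE_LEVELS[char]  # i is the title's 1-based line


def out_of_sequence(lines):
    """Return (line, title, expected_level, got_level) for titles that skip a level."""
    problems = []
    previous = 0
    for lineno, title, level in find_titles(lines):
        if level > previous + 1:
            problems.append((lineno, title, previous + 1, level))
        previous = level
    return problems


def sample_doc(note_char):
    """Build a tiny hypothetical document whose last title uses note_char."""
    lines = []
    for text, char in [("Document title", "="), ("II Main features", "-"),
                       ("A feature heading", "~"), ("Note 1", note_char)]:
        lines += [text, char * len(text), ""]
    return lines


if __name__ == "__main__":
    for char in "+^":
        print(char, out_of_sequence(sample_doc(char)))
```

With '+' under "Note 1" the sketch reports one title where level 3 was expected and level 4 was found; with '^' it reports none, which mirrors the effect of the replacement made in the patch.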
* Re: [PATCH v3 4/4] doc: add large-object-promisors.adoc to the docs build
  2025-10-16 20:03   ` [PATCH v3 4/4] doc: add large-object-promisors.adoc to the docs build Ramsay Jones
@ 2025-10-17 16:37     ` Ramsay Jones
  0 siblings, 0 replies; 25+ messages in thread
From: Ramsay Jones @ 2025-10-17 16:37 UTC (permalink / raw)
  To: GIT Mailing-list
  Cc: Patrick Steinhardt, Elijah Newren, Derrick Stolee, Junio C Hamano,
      Christian Couder

Sorry, I meant to add Christian on CC:, since he wrote this document
in commit 5040f9f164, but I totally forgot. :(

Sorry about that.

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/4] technical docs in make build
  2025-10-16 20:02 ` [PATCH v3 " Ramsay Jones
                     ` (3 preceding siblings ...)
  2025-10-16 20:03   ` [PATCH v3 4/4] doc: add large-object-promisors.adoc to the docs build Ramsay Jones
@ 2025-10-23 19:33   ` Junio C Hamano
  2025-10-23 20:06     ` Ramsay Jones
  4 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2025-10-23 19:33 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: GIT Mailing-list, Patrick Steinhardt, Elijah Newren, Derrick Stolee

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

> Changes in v3:
>
>   - old patch #1 discarded since it was separated into its own branch
>     ('rj/doc-missing-technical-docs' in next)
>   - tyop in patch #2 (old patch #3)
>   - new patch #4
>
> A range diff against v2 is given below.
>
> Note that the two remaining problems (see v2 below) have not been
> addressed but, even without a solution, these patches represent a
> good improvement. ;) (I am still hopeful that an asciidoc guru will
> turn up!)
>
> NOTE: this series is based on the v2-version of the patch #1, which
> in turn is based on commit 6ad8021821 ("The fifth batch", 2025-08-29).

Let's merge this iteration down and if there are things that still
need working, do them on top.

Thanks.

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/4] technical docs in make build
  2025-10-23 19:33   ` [PATCH v3 0/4] technical docs in make build Junio C Hamano
@ 2025-10-23 20:06     ` Ramsay Jones
  0 siblings, 0 replies; 25+ messages in thread
From: Ramsay Jones @ 2025-10-23 20:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: GIT Mailing-list, Patrick Steinhardt, Elijah Newren, Derrick Stolee

On 23/10/2025 8:33 pm, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
>
>> Changes in v3:
>>
>>   - old patch #1 discarded since it was separated into its own branch
>>     ('rj/doc-missing-technical-docs' in next)
>>   - tyop in patch #2 (old patch #3)
>>   - new patch #4
>>
>> A range diff against v2 is given below.
>>
>> Note that the two remaining problems (see v2 below) have not been
>> addressed but, even without a solution, these patches represent a
>> good improvement. ;) (I am still hopeful that an asciidoc guru will
>> turn up!)
>>
>> NOTE: this series is based on the v2-version of the patch #1, which
>> in turn is based on commit 6ad8021821 ("The fifth batch", 2025-08-29).
>
> Let's merge this iteration down and if there are things that still
> need working, do them on top.
>
> Thanks.

That sounds good to me. Thanks!

ATB,
Ramsay Jones

^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread

Thread overview: 25+ messages
[not found] <https://lore.kernel.org/git/bcb3b3a3-bb13-4808-9363-442b5f9be05f@ramsayjones.plus.com/>
2025-10-02 22:12 ` [PATCH v2 0/4] technical docs in make build Ramsay Jones
2025-10-02 22:12 ` [PATCH v2 1/4] doc: add some missing technical documents Ramsay Jones
2025-10-08 6:45 ` Patrick Steinhardt
2025-10-08 19:00 ` Junio C Hamano
2025-10-08 22:01 ` Ramsay Jones
2025-10-08 22:33 ` Junio C Hamano
2025-10-08 21:56 ` Ramsay Jones
2025-10-02 22:12 ` [PATCH v2 2/4] doc: remembering-renames.adoc: fix asciidoc warnings Ramsay Jones
2025-10-08 3:51 ` Elijah Newren
2025-10-08 21:38 ` Ramsay Jones
2025-10-02 22:12 ` [PATCH v2 3/4] doc: sparse-checkout.adoc: " Ramsay Jones
2025-10-07 12:20 ` Kristoffer Haugsbakk
2025-10-07 22:17 ` Ramsay Jones
2025-10-08 3:57 ` Elijah Newren
2025-10-08 21:54 ` Ramsay Jones
2025-10-02 22:12 ` [PATCH v2 4/4] doc: commit-graph.adoc: fix up some formatting Ramsay Jones
2025-10-02 22:38 ` [PATCH v2 0/4] technical docs in make build Ramsay Jones
2025-10-16 20:02 ` [PATCH v3 " Ramsay Jones
2025-10-16 20:02 ` [PATCH v3 1/4] doc: remembering-renames.adoc: fix asciidoc warnings Ramsay Jones
2025-10-16 20:02 ` [PATCH v3 2/4] doc: sparse-checkout.adoc: " Ramsay Jones
2025-10-16 20:03 ` [PATCH v3 3/4] doc: commit-graph.adoc: fix up some formatting Ramsay Jones
2025-10-16 20:03 ` [PATCH v3 4/4] doc: add large-object-promisors.adoc to the docs build Ramsay Jones
2025-10-17 16:37 ` Ramsay Jones
2025-10-23 19:33 ` [PATCH v3 0/4] technical docs in make build Junio C Hamano
2025-10-23 20:06 ` Ramsay Jones