* [PATCH] diff: "lisp" userdiff_driver
@ 2025-11-15 10:17 Scott L. Burson via GitGitGadget
2025-11-15 17:06 ` Johannes Sixt
2025-11-27 2:38 ` [PATCH v2 0/2] userdiff: extend Scheme support to cover other Lisp dialects Scott L. Burson via GitGitGadget
0 siblings, 2 replies; 15+ messages in thread
From: Scott L. Burson via GitGitGadget @ 2025-11-15 10:17 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Johannes Sixt,
Ævar Arnfjörð Bjarmason, Jaydeep P Das,
Scott L. Burson, Scott L. Burson
From: "Scott L. Burson" <Scott@sympoiesis.com>
The "scheme" driver doesn't quite work for Common Lisp. This driver
is very generic and should work for almost any dialect of Lisp,
including Common Lisp.
Signed-off-by: Scott L. Burson <Scott@sympoiesis.com>
---
diff: "lisp" userdiff_driver
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2000%2Fslburson%2Flisp-userdiff_driver-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2000/slburson/lisp-userdiff_driver-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2000
userdiff.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/userdiff.c b/userdiff.c
index fe710a68bf..e127b4a1f1 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -249,6 +249,14 @@ PATTERNS("kotlin",
"|[.][0-9][0-9_]*([Ee][-+]?[0-9]+)?[fFlLuU]?"
/* unary and binary operators */
"|[-+*/<>%&^|=!]==?|--|\\+\\+|<<=|>>=|&&|\\|\\||->|\\.\\*|!!|[?:.][.:]"),
+PATTERNS("lisp",
+ /* Either an unindented left paren, or a slightly indented line
+ * starting with "(def" */
+ "^((\\(|:space:{1,2}\\(def).*)$",
+ /* Common Lisp symbol syntax allows arbitrary strings between vertical bars */
+ "\\|([^\\\\]|\\\\\\\\|\\\\\\|)*\\|"
+ /* All other words are delimited by spaces or parentheses/brackets/braces */
+ "|([^][(){} \t])+"),
PATTERNS("markdown",
"^ {0,3}#{1,6}[ \t].*",
/* -- */
base-commit: fd372d9b1a69a01a676398882bbe3840bf51fe72
--
gitgitgadget
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] diff: "lisp" userdiff_driver 2025-11-15 10:17 [PATCH] diff: "lisp" userdiff_driver Scott L. Burson via GitGitGadget @ 2025-11-15 17:06 ` Johannes Sixt 2025-11-15 23:32 ` Scott L. Burson 2025-11-16 5:30 ` Junio C Hamano 2025-11-27 2:38 ` [PATCH v2 0/2] userdiff: extend Scheme support to cover other Lisp dialects Scott L. Burson via GitGitGadget 1 sibling, 2 replies; 15+ messages in thread From: Johannes Sixt @ 2025-11-15 17:06 UTC (permalink / raw) To: Scott L. Burson Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Jaydeep P Das, Atharva Raykar, git, Scott L. Burson via GitGitGadget [Cc the author of the Scheme driver] Am 15.11.25 um 11:17 schrieb Scott L. Burson via GitGitGadget: > From: "Scott L. Burson" <Scott@sympoiesis.com> > > The "scheme" driver doesn't quite work for Common Lisp. This driver > is very generic and should work for almost any dialect of Lisp, > including Common Lisp. "Doesn't quite work" is unfortunately a description of the problem that does not help a reviewer decide whether this change is justified. Please add a lot more details why the Scheme driver is unsuitable for Lisp and why a new driver is needed. It is customary to mark changes to the drivers in the subject line with "userdiff:". Have a look at `git log userdiff.c`. It would be appreciated to stay away from nerdy tokens like "userdiff_driver" when the change can be summarized in plain English language. > > Signed-off-by: Scott L. Burson <Scott@sympoiesis.com> > --- > diff: "lisp" userdiff_driver > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2000%2Fslburson%2Flisp-userdiff_driver-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2000/slburson/lisp-userdiff_driver-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/2000 > > userdiff.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/userdiff.c b/userdiff.c > index fe710a68bf..e127b4a1f1 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -249,6 +249,14 @@ PATTERNS("kotlin", > "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+)?[fFlLuU]?" > /* unary and binary operators */ > "|[-+*/<>%&^|=!]==?|--|\\+\\+|<<=|>>=|&&|\\|\\||->|\\.\\*|!!|[?:.][.:]"), > +PATTERNS("lisp", > + /* Either an unindented left paren, or a slightly indented line > + * starting with "(def" */ > + "^((\\(|:space:{1,2}\\(def).*)$", Compared to the Scheme driver, this regular expression is - more restrictive because it does not permit arbitrary indentation; - less restrictive because it permits everything that begins with "(def". What would happen if this regular expression were added to the Scheme driver? Would it pick up additional and unwanted hunk headers is typical Scheme code? Just so you know: The string literal for hunk headers can contain "\n" to separate regular expressions. For a given line, they are attempted to match in order until the first match is found. > + /* Common Lisp symbol syntax allows arbitrary strings between vertical bars */ > + "\\|([^\\\\]|\\\\\\\\|\\\\\\|)*\\|" The Scheme driver has an similar description of this word token, but it has only half as many backslashes. Is the difference necessary? Isn't actually one or the other incorrect? (I did not try to understand what this version here does.) > + /* All other words are delimited by spaces or parentheses/brackets/braces */ > + "|([^][(){} \t])+"), This one looks identical to the same line in Scheme with a slightly altered comment. In summary, I feel like this could be unified with the Scheme driver, except when awkward false positive hunk headers in existing Scheme code are to be expected. That's a question that I cannot answer. -- Hannes ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] diff: "lisp" userdiff_driver 2025-11-15 17:06 ` Johannes Sixt @ 2025-11-15 23:32 ` Scott L. Burson 2025-11-20 16:47 ` D. Ben Knoble 2025-11-16 5:30 ` Junio C Hamano 1 sibling, 1 reply; 15+ messages in thread From: Scott L. Burson @ 2025-11-15 23:32 UTC (permalink / raw) To: Johannes Sixt Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Jaydeep P Das, Atharva Raykar, git, Scott L. Burson via GitGitGadget On Sat, Nov 15, 2025 at 9:06 AM Johannes Sixt <j6t@kdbg.org> wrote: > > [Cc the author of the Scheme driver] > > Am 15.11.25 um 11:17 schrieb Scott L. Burson via GitGitGadget: > > From: "Scott L. Burson" <Scott@sympoiesis.com> > > Please > add a lot more details why the Scheme driver is unsuitable for Lisp and > why a new driver is needed. Here is text I propose for the commit message: ---- Common Lisp has top-level forms 'defun' and 'deftype' that are not matched by the current Scheme pattern. Also, it is more common when defining user macros intended as top-level forms to prefix their names with "def" instead of "define"; such forms are also not matched. And some such forms don't even begin with "def". On the other hand, it is an established formatting convention in the Lisp community that only top-level forms start at the left margin. So matching any unindented line starting with an open parenthesis is an acceptable heuristic; false positives will be rare. However, there are also cases where notionally top-level forms are grouped together within some containing form. At least in the Common Lisp community, it is conventional to indent these by two spaces, or sometimes one. But matching just an open parenthesis indented by two spaces would be too broad; so the pattern added by this commit requires an indented form to start with "(def". It is believed that this strikes a good balance between potential false positives and false negatives. ---- I discussed the pattern with some other experienced Common Lisp developers on a mailing list, and this is what I settled on after incorporating their feedback. > It is customary to mark changes to the drivers in the subject line with > "userdiff:". Have a look at `git log userdiff.c`. It would be > appreciated to stay away from nerdy tokens like "userdiff_driver" when > the change can be summarized in plain English language. Will do. > > > > Signed-off-by: Scott L. Burson <Scott@sympoiesis.com> > > --- > > diff: "lisp" userdiff_driver > > > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2000%2Fslburson%2Flisp-userdiff_driver-v1 > > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2000/slburson/lisp-userdiff_driver-v1 > > Pull-Request: https://github.com/gitgitgadget/git/pull/2000 > > > > userdiff.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/userdiff.c b/userdiff.c > > index fe710a68bf..e127b4a1f1 100644 > > --- a/userdiff.c > > +++ b/userdiff.c > > @@ -249,6 +249,14 @@ PATTERNS("kotlin", > > "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+)?[fFlLuU]?" > > /* unary and binary operators */ > > "|[-+*/<>%&^|=!]==?|--|\\+\\+|<<=|>>=|&&|\\|\\||->|\\.\\*|!!|[?:.][.:]"), > > +PATTERNS("lisp", > > + /* Either an unindented left paren, or a slightly indented line > > + * starting with "(def" */ > > + "^((\\(|:space:{1,2}\\(def).*)$", > > Compared to the Scheme driver, this regular expression is > > - more restrictive because it does not permit arbitrary indentation; > > - less restrictive because it permits everything that begins with "(def". > > What would happen if this regular expression were added to the Scheme > driver? Would it pick up additional and unwanted hunk headers is typical > Scheme code? That is a good question. I don't think so, but I don't work in Scheme. I see that you have CC'ed Atharva Raykar; let's see whether he would have any objection. I would point out that Scheme is a dialect of Lisp, not the other way around. (Lisp is unusual in being a family of languages, rather than a single language.) And having a separate "lisp" driver might aid discoverability. But I understand: Scheme got their driver in first, and you have to fight against the tendency of the driver list to grow unboundedly. Ooh, that reminds me: if we do decide to add a "lisp" driver, I'll also need to add it to 'Documentation/gitattributes.adoc'. > The string literal for hunk headers can contain "\n" Noted. > > + /* Common Lisp symbol syntax allows arbitrary strings between vertical bars */ > > + "\\|([^\\\\]|\\\\\\\\|\\\\\\|)*\\|" > > The Scheme driver has an similar description of this word token, but it > has only half as many backslashes. Is the difference necessary? Isn't > actually one or the other incorrect? (I did not try to understand what > this version here does.) It's not important, but technically, Common Lisp allows an escaped backslash between vertical bars, but the R7RS formal grammar does not. However, I just tried Chicken Scheme, which claims to be at least partially R7RS compliant, and it does accept the escaped backslash. I am left to conclude that Scheme implementors think that the omission of the escaped backslash from the R7RS formal grammar is an oversight (I think so too). Of course, no one would actually write a symbol name with an escaped backslash in it unless they were submitting to an obfuscated Lisp contest. So we are really being pedantic here. Still, may as well allow it. Atharva, any comments? -- Scott ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] diff: "lisp" userdiff_driver 2025-11-15 23:32 ` Scott L. Burson @ 2025-11-20 16:47 ` D. Ben Knoble 2025-11-27 2:10 ` Scott L. Burson 0 siblings, 1 reply; 15+ messages in thread From: D. Ben Knoble @ 2025-11-20 16:47 UTC (permalink / raw) To: Scott L. Burson Cc: Johannes Sixt, Junio C Hamano, Ævar Arnfjörð Bjarmason, Jaydeep P Das, Atharva Raykar, git, Scott L. Burson via GitGitGadget On Sat, Nov 15, 2025 at 6:33 PM Scott L. Burson <Scott@sympoiesis.com> wrote: > > > > > > Signed-off-by: Scott L. Burson <Scott@sympoiesis.com> > > > --- > > > diff: "lisp" userdiff_driver > > > > > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2000%2Fslburson%2Flisp-userdiff_driver-v1 > > > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2000/slburson/lisp-userdiff_driver-v1 > > > Pull-Request: https://github.com/gitgitgadget/git/pull/2000 > > > > > > userdiff.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/userdiff.c b/userdiff.c > > > index fe710a68bf..e127b4a1f1 100644 > > > --- a/userdiff.c > > > +++ b/userdiff.c > > > @@ -249,6 +249,14 @@ PATTERNS("kotlin", > > > "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+)?[fFlLuU]?" > > > /* unary and binary operators */ > > > "|[-+*/<>%&^|=!]==?|--|\\+\\+|<<=|>>=|&&|\\|\\||->|\\.\\*|!!|[?:.][.:]"), > > > +PATTERNS("lisp", > > > + /* Either an unindented left paren, or a slightly indented line > > > + * starting with "(def" */ > > > + "^((\\(|:space:{1,2}\\(def).*)$", > > > > Compared to the Scheme driver, this regular expression is > > > > - more restrictive because it does not permit arbitrary indentation; > > > > - less restrictive because it permits everything that begins with "(def". > > > > What would happen if this regular expression were added to the Scheme > > driver? Would it pick up additional and unwanted hunk headers is typical > > Scheme code? > > That is a good question. I don't think so, but I don't work in Scheme. > I see that you have CC'ed Atharva Raykar; let's see whether he would > have any objection. > > I would point out that Scheme is a dialect of Lisp, not the other way > around. (Lisp is unusual in being a family of languages, rather than a > single language.) And having a separate "lisp" driver might aid > discoverability. Without "going there," I think there are enough differences to warrant a different driver. (OTOH, I have sometimes wanted to teach the Scheme driver that most "def" things are probably definitions.) Our indentation is less rigid in that indented forms may be more deeply nested than only one or two spaces (and we of course have more definition forms than "only things starting with def"), and I don't understand the downthread desire to not permit tabs. As for Scheme community, I'll suggest asking on the Racket channels (Discourse is probably best if you want a mailing-list-like discussion?) -- D. Ben Knoble ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] diff: "lisp" userdiff_driver 2025-11-20 16:47 ` D. Ben Knoble @ 2025-11-27 2:10 ` Scott L. Burson 0 siblings, 0 replies; 15+ messages in thread From: Scott L. Burson @ 2025-11-27 2:10 UTC (permalink / raw) To: D. Ben Knoble Cc: Johannes Sixt, Junio C Hamano, Ævar Arnfjörð Bjarmason, Jaydeep P Das, Atharva Raykar, git, Scott L. Burson via GitGitGadget On Thu, Nov 20, 2025 at 8:47 AM D. Ben Knoble <ben.knoble@gmail.com> wrote: > Without "going there," I think there are enough differences to warrant > a different driver. (OTOH, I have sometimes wanted to teach the Scheme > driver that most "def" things are probably definitions.) Our > indentation is less rigid in that indented forms may be more deeply > nested than only one or two spaces (and we of course have more > definition forms than "only things starting with def"), and I don't > understand the downthread desire to not permit tabs. The proposal on the table is not to replace the current Scheme regexp with the Lisp one, but rather to disjoin them, so as to match any line that either one would match. So you don't need to worry about false negatives. The thing about tabs was that, the way I had initially written the regexp, it would have matched a line starting with one or two tabs, or a tab and a space -- but no other whitespace -- followed by "(def". This was not my intention. I was just trying to match lines starting with a space or two and "(def". Again, this is in addition to the lines currently being matched by the Scheme regexp. > As for Scheme community, I'll suggest asking on the Racket channels > (Discourse is probably best if you want a mailing-list-like > discussion?) I have asked on Reddit under /r/scheme, and got only the same objection that you offered ("define" etc. forms more deeply nested), with the same resolution (disjoining the new pattern to the existing one). Reddit says my post got some 4300 views. That seems like an adequate sample to me, but I can try elsewhere if you think it's important. I have an updated version of the patch ready; I will submit it shortly. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] diff: "lisp" userdiff_driver 2025-11-15 17:06 ` Johannes Sixt 2025-11-15 23:32 ` Scott L. Burson @ 2025-11-16 5:30 ` Junio C Hamano 2025-11-17 23:23 ` Scott L. Burson 1 sibling, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2025-11-16 5:30 UTC (permalink / raw) To: Johannes Sixt Cc: Scott L. Burson, Ævar Arnfjörð Bjarmason, Jaydeep P Das, Atharva Raykar, git, Scott L. Burson via GitGitGadget Johannes Sixt <j6t@kdbg.org> writes: >> + /* Either an unindented left paren, or a slightly indented line >> + * starting with "(def" */ >> + "^((\\(|:space:{1,2}\\(def).*)$", > > Compared to the Scheme driver, this regular expression is > > - more restrictive because it does not permit arbitrary indentation; > > - less restrictive because it permits everything that begins with "(def". > > What would happen if this regular expression were added to the Scheme > driver? Would it pick up additional and unwanted hunk headers is typical > Scheme code? As we generally assume that the file being edited is syntactically sound, even if one lisp variant understands "(deffoo" and others do not, it should be generally fine for the pattern to say something like "at the beginning of the line, optionally following a few spaces, four-letter sequence '(def' is likely to be the beginning of a function definition", as long as there is some convention that user defined functions and macros, unless they are to behave similarly to "(defun", would not be named so confusingly to start with d-e-f. It would be nice if a single set of rules can cover what existing scheme patterns cover, Emacs lisp, and Common lisp. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] diff: "lisp" userdiff_driver 2025-11-16 5:30 ` Junio C Hamano @ 2025-11-17 23:23 ` Scott L. Burson 2025-11-18 4:38 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Scott L. Burson @ 2025-11-17 23:23 UTC (permalink / raw) To: Junio C Hamano Cc: Johannes Sixt, Ævar Arnfjörð Bjarmason, Jaydeep P Das, Atharva Raykar, git, Scott L. Burson via GitGitGadget On Sat, Nov 15, 2025 at 9:30 PM Junio C Hamano <gitster@pobox.com> wrote: > > Johannes Sixt <j6t@kdbg.org> writes: > > >> + /* Either an unindented left paren, or a slightly indented line > >> + * starting with "(def" */ > >> + "^((\\(|:space:{1,2}\\(def).*)$", > > > > Compared to the Scheme driver, this regular expression is > > > > - more restrictive because it does not permit arbitrary indentation; > > > > - less restrictive because it permits everything that begins with "(def". > > > > What would happen if this regular expression were added to the Scheme > > driver? Would it pick up additional and unwanted hunk headers is typical > > Scheme code? Hmm, we haven't heard from Atharva. I'll try asking around in the Scheme community. The regex I proposed has a bug. The use of the Posix character class is incorrect, because that class includes tabs. I will replace it with a literal space. Also, many Lisps, including Common Lisp in its default configuration, are case-insensitive, and at least in the 1970s, it wasn't completely unheard-of to write Lisp code in uppercase; I'll change the entry to use 'IPATTERN'. > As we generally assume that the file being edited is syntactically > sound, even if one lisp variant understands "(deffoo" and others do > not, it should be generally fine for the pattern to say something > like "at the beginning of the line, optionally following a few > spaces, four-letter sequence '(def' is likely to be the beginning of > a function definition", as long as there is some convention that > user defined functions and macros, unless they are to behave > similarly to "(defun", would not be named so confusingly to start > with d-e-f. Agreed, but this is not the most important point. The greater potential for false positives comes from the rule (in my proposal) that a left parenthesis in column 0 is taken as indicating a top-level definition, without even looking at the following characters. Although Lisp dialects certainly vary, I have not seen one in which standard indentation practice does not indent internal expressions; certainly, Lisp mode in Emacs indents them. And, I think the rule really does need to be that broad, because top-level forms don't always begin with "def"; indeed, one can put any executable expression at top level in a source file to perform load-time initializations. It's only when there is some indentation that I think the regex needs to require a word beginning with "def". > It would be nice if a single set of rules can cover what existing > scheme patterns cover, Emacs lisp, and Common lisp. Agreed. I do think it would be a little better for non-Scheme users if the single driver were named "lisp" instead of "scheme". Renaming the driver out from under the Scheme community, though, seems like it would be unfriendly, even after a deprecation period. One solution would be to add an aliasing mechanism to the driver table. Perhaps there would be other use cases for it. If you would consider a patch along these lines, I can code it up. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] diff: "lisp" userdiff_driver 2025-11-17 23:23 ` Scott L. Burson @ 2025-11-18 4:38 ` Junio C Hamano 0 siblings, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2025-11-18 4:38 UTC (permalink / raw) To: Scott L. Burson Cc: Johannes Sixt, Jaydeep P Das, Atharva Raykar, git, Ævar Arnfjörð Bjarmason, Scott L. Burson via GitGitGadget "Scott L. Burson" <Scott@sympoiesis.com> writes: > ... The greater > potential for false positives comes from the rule (in my proposal) > that a left parenthesis in column 0 is taken as indicating a top-level > definition, without even looking at the following characters. I didn't respond to that part as I didn't know if you were serious or joking ;-). > Although Lisp dialects certainly vary, I have not seen one in which > standard indentation practice does not indent internal expressions; > certainly, Lisp mode in Emacs indents them. And, I think the rule > really does need to be that broad, because top-level forms don't > always begin with "def"; indeed, one can put any executable expression > at top level in a source file to perform load-time initializations. Exactly, but the more important question is are they considered as the beginning of an important, and sematically distinct, block, just like the beginning of a function is. I am somewhat negative to the "anything not indented is a beginning of a significant group", as I do not know how well it meshes with the "(defXX is a beginning of a function", when they are used together. > One solution would be to add an aliasing mechanism to the > driver table. Perhaps there would be other use cases for it. If you > would consider a patch along these lines, I can code it up. It is not a particularly interesting part of the problem, simply because as the first approximation, we can just advertise "you can mark your lisp files as 'scheme'". A more interesting issue is if we can indeed come up with such a superset of patterns that can cover all Lisp variants that matter. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 0/2] userdiff: extend Scheme support to cover other Lisp dialects 2025-11-15 10:17 [PATCH] diff: "lisp" userdiff_driver Scott L. Burson via GitGitGadget 2025-11-15 17:06 ` Johannes Sixt @ 2025-11-27 2:38 ` Scott L. Burson via GitGitGadget 2025-11-27 2:38 ` [PATCH v2 1/2] diff: "lisp" userdiff_driver Scott L. Burson via GitGitGadget 2025-11-27 2:38 ` [PATCH v2 2/2] merge with Scheme regexp; fix bugs Scott L. Burson via GitGitGadget 1 sibling, 2 replies; 15+ messages in thread From: Scott L. Burson via GitGitGadget @ 2025-11-27 2:38 UTC (permalink / raw) To: git Cc: Junio C Hamano, Johannes Sixt, Ævar Arnfjörð Bjarmason, Jaydeep P Das, D. Ben Knoble, Scott L. Burson Common Lisp, Emacs Lisp, and other dialects have some top-level forms, most importantly 'defun', that are not matched by the current Scheme pattern. Also, it is common in these dialects, when defining user macros intended as top-level forms, to prefix their names with "def" instead of "define"; such forms are also not currently matched. Some such forms don't even begin with "def". On the other hand, it is an established formatting convention in the Lisp community that only top-level forms start at the left margin. So matching any unindented line starting with an open parenthesis is an acceptable heuristic; false positives will be rare. However, there are also cases where notionally top-level forms are grouped together within some containing form. At least in the Common Lisp community, it is conventional to indent these by two spaces, or sometimes one. But matching just an open parenthesis indented by two spaces would be too broad; so the pattern added by this commit requires an indented form to start with "(def". It is believed that this strikes a good balance between potential false positives and false negatives. This commit disjoins a regexp employing these heuristics to the existing Scheme regexp, so it will still match everything that it did previously. Changes since v1: * unified with Scheme driver * fixed whitespace bug (tabs were allowed incorrectly) * made "(def" case-insensitive * improved commit summary line * improved commit description Scott L. Burson (2): diff: "lisp" userdiff_driver merge with Scheme regexp; fix bugs userdiff.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) base-commit: fd372d9b1a69a01a676398882bbe3840bf51fe72 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2000%2Fslburson%2Flisp-userdiff_driver-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2000/slburson/lisp-userdiff_driver-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/2000 Range-diff vs v1: 1: da99bb0bcd = 1: da99bb0bcd diff: "lisp" userdiff_driver -: ---------- > 2: 86315aa3e3 merge with Scheme regexp; fix bugs -- gitgitgadget ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 1/2] diff: "lisp" userdiff_driver 2025-11-27 2:38 ` [PATCH v2 0/2] userdiff: extend Scheme support to cover other Lisp dialects Scott L. Burson via GitGitGadget @ 2025-11-27 2:38 ` Scott L. Burson via GitGitGadget 2025-11-27 10:32 ` Scott L. Burson 2025-11-27 2:38 ` [PATCH v2 2/2] merge with Scheme regexp; fix bugs Scott L. Burson via GitGitGadget 1 sibling, 1 reply; 15+ messages in thread From: Scott L. Burson via GitGitGadget @ 2025-11-27 2:38 UTC (permalink / raw) To: git Cc: Junio C Hamano, Johannes Sixt, Ævar Arnfjörð Bjarmason, Jaydeep P Das, D. Ben Knoble, Scott L. Burson, Scott L. Burson From: "Scott L. Burson" <Scott@sympoiesis.com> The "scheme" driver doesn't quite work for Common Lisp. This driver is very generic and should work for almost any dialect of Lisp, including Common Lisp. Signed-off-by: Scott L. Burson <Scott@sympoiesis.com> --- userdiff.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/userdiff.c b/userdiff.c index fe710a68bf..e127b4a1f1 100644 --- a/userdiff.c +++ b/userdiff.c @@ -249,6 +249,14 @@ PATTERNS("kotlin", "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+)?[fFlLuU]?" /* unary and binary operators */ "|[-+*/<>%&^|=!]==?|--|\\+\\+|<<=|>>=|&&|\\|\\||->|\\.\\*|!!|[?:.][.:]"), +PATTERNS("lisp", + /* Either an unindented left paren, or a slightly indented line + * starting with "(def" */ + "^((\\(|:space:{1,2}\\(def).*)$", + /* Common Lisp symbol syntax allows arbitrary strings between vertical bars */ + "\\|([^\\\\]|\\\\\\\\|\\\\\\|)*\\|" + /* All other words are delimited by spaces or parentheses/brackets/braces */ + "|([^][(){} \t])+"), PATTERNS("markdown", "^ {0,3}#{1,6}[ \t].*", /* -- */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] diff: "lisp" userdiff_driver 2025-11-27 2:38 ` [PATCH v2 1/2] diff: "lisp" userdiff_driver Scott L. Burson via GitGitGadget @ 2025-11-27 10:32 ` Scott L. Burson 2025-11-27 10:51 ` Johannes Sixt 0 siblings, 1 reply; 15+ messages in thread From: Scott L. Burson @ 2025-11-27 10:32 UTC (permalink / raw) To: Scott L. Burson via GitGitGadget Cc: git, Junio C Hamano, Johannes Sixt, Ævar Arnfjörð Bjarmason, Jaydeep P Das, D. Ben Knoble On Wed, Nov 26, 2025, 6:38 PM Scott L. Burson via GitGitGadget <gitgitgadget@gmail.com> wrote: [duplicate of first message] Hmm. Somehow I have screwed things up so that GitGitGadget sends out only the first commit, not including the changes in the second, nor using the PR title and the text in the description. Sorry for the noise. Is there an easy fix, or do I need to make a new PR? -- Scott ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] diff: "lisp" userdiff_driver 2025-11-27 10:32 ` Scott L. Burson @ 2025-11-27 10:51 ` Johannes Sixt 0 siblings, 0 replies; 15+ messages in thread From: Johannes Sixt @ 2025-11-27 10:51 UTC (permalink / raw) To: Scott L. Burson, Scott L. Burson via GitGitGadget Cc: git, Junio C Hamano, Ævar Arnfjörð Bjarmason, Jaydeep P Das, D. Ben Knoble Am 27.11.25 um 11:32 schrieb Scott L. Burson: > On Wed, Nov 26, 2025, 6:38 PM Scott L. Burson via GitGitGadget > <gitgitgadget@gmail.com> wrote: > [duplicate of first message] > > Hmm. Somehow I have screwed things up so that GitGitGadget sends out > only the first commit, not including the changes in the second, nor > using the PR title and the text in the description. Sorry for the > noise. Is there an easy fix, or do I need to make a new PR? I think all went as expected as far as GGG is concerend. You made a new commit on top of the one in the earlier round, and then asked GGG to submit the PR to the mailing list. GGG made a patch series with a cover letter and the two patches. That's expected, because the first patch hasn't been integrated in upstream Git, yet. If you intended to send just the second patch (as a fixup of the first one), then GGG cannot help you, not even if you make another pull request. But you shouldn't have sent a fixup commit anyway, but that is a matter I'll address in a separate message. -- Hannes ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 2/2] merge with Scheme regexp; fix bugs 2025-11-27 2:38 ` [PATCH v2 0/2] userdiff: extend Scheme support to cover other Lisp dialects Scott L. Burson via GitGitGadget 2025-11-27 2:38 ` [PATCH v2 1/2] diff: "lisp" userdiff_driver Scott L. Burson via GitGitGadget @ 2025-11-27 2:38 ` Scott L. Burson via GitGitGadget 2025-11-27 16:09 ` Johannes Sixt 1 sibling, 1 reply; 15+ messages in thread From: Scott L. Burson via GitGitGadget @ 2025-11-27 2:38 UTC (permalink / raw) To: git Cc: Junio C Hamano, Johannes Sixt, Ævar Arnfjörð Bjarmason, Jaydeep P Das, D. Ben Knoble, Scott L. Burson, Scott L. Burson From: "Scott L. Burson" <Scott@sympoiesis.com> This commit merges (by disjoining) the new generic Lisp regexp into the existing Scheme regexp. It also fixes two bugs: the new regexp was unintentionally allowing tabs, and the matching of "(def" should be case-insensitive. Signed-off-by: Scott L. Burson <Scott@sympoiesis.com> --- userdiff.c | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/userdiff.c b/userdiff.c index e127b4a1f1..b67dfddbef 100644 --- a/userdiff.c +++ b/userdiff.c @@ -249,14 +249,6 @@ PATTERNS("kotlin", "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+)?[fFlLuU]?" /* unary and binary operators */ "|[-+*/<>%&^|=!]==?|--|\\+\\+|<<=|>>=|&&|\\|\\||->|\\.\\*|!!|[?:.][.:]"), -PATTERNS("lisp", - /* Either an unindented left paren, or a slightly indented line - * starting with "(def" */ - "^((\\(|:space:{1,2}\\(def).*)$", - /* Common Lisp symbol syntax allows arbitrary strings between vertical bars */ - "\\|([^\\\\]|\\\\\\\\|\\\\\\|)*\\|" - /* All other words are delimited by spaces or parentheses/brackets/braces */ - "|([^][(){} \t])+"), PATTERNS("markdown", "^ {0,3}#{1,6}[ \t].*", /* -- */ @@ -352,14 +344,21 @@ PATTERNS("rust", "|[0-9][0-9_a-fA-Fiosuxz]*(\\.([0-9]*[eE][+-]?)?[0-9_fF]*)?" "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"), PATTERNS("scheme", - "^[\t ]*(\\(((define|def(struct|syntax|class|method|rules|record|proto|alias)?)[-*/ \t]|(library|module|struct|class)[*+ \t]).*)$", + /* A possibly indented left paren followed by a Scheme keyword. */ + "^[\t ]*(\\(((define|def(struct|syntax|class|method|rules|record|proto|alias)?)[-*/ \t]|(library|module|struct|class)[*+ \t]).*)$\n" + /* + * For other Lisp dialects: either an unindented left paren, or a + * slightly indented line starting with "(def". + */ + "^((\\(| {1,2}\\([Dd][Ee][Ff]).*)$", /* - * R7RS valid identifiers include any sequence enclosed - * within vertical lines having no backslashes + * The union of R7RS and Common Lisp symbol syntax: allows arbitrary + * strings between vertical bars, including escaped backslashes and + * vertical bars. */ - "\\|([^\\\\]*)\\|" + "\\|([^\\\\]|\\\\\\\\|\\\\\\|)*\\|" /* All other words should be delimited by spaces or parentheses */ - "|([^][)(}{[ \t])+"), + "|([^][)(}{ \t])+"), PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$", "\\\\[a-zA-Z@]+|\\\\.|([a-zA-Z0-9]|[^\x01-\x7f])+"), { .name = "default", .binary = -1 }, -- gitgitgadget ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/2] merge with Scheme regexp; fix bugs 2025-11-27 2:38 ` [PATCH v2 2/2] merge with Scheme regexp; fix bugs Scott L. Burson via GitGitGadget @ 2025-11-27 16:09 ` Johannes Sixt 2025-12-02 10:27 ` Johannes Sixt 0 siblings, 1 reply; 15+ messages in thread From: Johannes Sixt @ 2025-11-27 16:09 UTC (permalink / raw) To: Scott L. Burson via GitGitGadget Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Jaydeep P Das, D. Ben Knoble, Scott L. Burson, git Am 27.11.25 um 03:38 schrieb Scott L. Burson via GitGitGadget: > From: "Scott L. Burson" <Scott@sympoiesis.com> > > This commit merges (by disjoining) the new generic Lisp regexp into > the existing Scheme regexp. It also fixes two bugs: the new regexp > was unintentionally allowing tabs, and the matching of "(def" should > be case-insensitive. > > Signed-off-by: Scott L. Burson <Scott@sympoiesis.com> > --- > userdiff.c | 25 ++++++++++++------------- > 1 file changed, 12 insertions(+), 13 deletions(-) > > diff --git a/userdiff.c b/userdiff.c > index e127b4a1f1..b67dfddbef 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -249,14 +249,6 @@ PATTERNS("kotlin", > "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+)?[fFlLuU]?" > /* unary and binary operators */ > "|[-+*/<>%&^|=!]==?|--|\\+\\+|<<=|>>=|&&|\\|\\||->|\\.\\*|!!|[?:.][.:]"), > -PATTERNS("lisp", > - /* Either an unindented left paren, or a slightly indented line > - * starting with "(def" */ > - "^((\\(|:space:{1,2}\\(def).*)$", > - /* Common Lisp symbol syntax allows arbitrary strings between vertical bars */ > - "\\|([^\\\\]|\\\\\\\\|\\\\\\|)*\\|" > - /* All other words are delimited by spaces or parentheses/brackets/braces */ > - "|([^][(){} \t])+"), > PATTERNS("markdown", > "^ {0,3}#{1,6}[ \t].*", > /* -- */ You made this commit a fixup commit of the commit from the first round. This isn't desirable as long as the earlier patch has not been integrated in "next", yet. You should have squashed the commits into one. The cover letter gives a really good justification for this change and should be the commit's message (with its subject line, ie, the PR title). However, don't write "This commit does X", but write "Do X" instead: you give someone an order to change the code. (Also, after squashing there is no bug to fix anymore, of course.) > @@ -352,14 +344,21 @@ PATTERNS("rust", > "|[0-9][0-9_a-fA-Fiosuxz]*(\\.([0-9]*[eE][+-]?)?[0-9_fF]*)?" > "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"), > PATTERNS("scheme", > - "^[\t ]*(\\(((define|def(struct|syntax|class|method|rules|record|proto|alias)?)[-*/ \t]|(library|module|struct|class)[*+ \t]).*)$", > + /* A possibly indented left paren followed by a Scheme keyword. */ > + "^[\t ]*(\\(((define|def(struct|syntax|class|method|rules|record|proto|alias)?)[-*/ \t]|(library|module|struct|class)[*+ \t]).*)$\n" Mental note how this RE is nested: [\t ]*( \(( ( define|def( struct|syntax|class|method |rules|record|proto|alias )? )[-*/ \t] | ( library|module|struct|class )[*+ \t] ).* )$ > + /* > + * For other Lisp dialects: either an unindented left paren, or a > + * slightly indented line starting with "(def". > + */ > + "^((\\(| {1,2}\\([Dd][Ee][Ff]).*)$", Here you are adding a very generous new pattern, the opening parenthesis without indentation. This will not only apply to "other Lisp dialects", as the comment says, but also Scheme code and will produce new matches. It does not change the test cases in t/t4018/scheme-*, because all have additional matches later. As such it would possibly be more honest to extract it out into its own (first) pattern and marked as applying to all dialects: /* * An unindented opening parenthesis identifies a top-level * structure in all Lisp dialects. */ "^(\\(.*)$\n", Note that the Scheme pattern excludes the indentation from the capture. You may want to do so here, too (and simplify "one or two spaces" like this): "^ ?(\\([Dd][Ee][Ff].*)$", Would it be possible to have test cases of Lisp code that is not covered by the Scheme pattern? > /* > - * R7RS valid identifiers include any sequence enclosed > - * within vertical lines having no backslashes > + * The union of R7RS and Common Lisp symbol syntax: allows arbitrary > + * strings between vertical bars, including escaped backslashes and > + * vertical bars. > */ > - "\\|([^\\\\]*)\\|" > + "\\|([^\\\\]|\\\\\\\\|\\\\\\|)*\\|" Without the C quoting we have \|([^\\]|\\\\|\\\|)*\| So, this is everthing from | up to the next |, except that \| does not stop scanning and \\ is also considered so that \\| is not regarded as \ followed by \|. Good. > /* All other words should be delimited by spaces or parentheses */ > - "|([^][)(}{[ \t])+"), > + "|([^][)(}{ \t])+"), Here we have a single bracket expression. The removed opening [ does not begin a new one, but is a duplicated character. Good. -- Hannes ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/2] merge with Scheme regexp; fix bugs 2025-11-27 16:09 ` Johannes Sixt @ 2025-12-02 10:27 ` Johannes Sixt 0 siblings, 0 replies; 15+ messages in thread From: Johannes Sixt @ 2025-12-02 10:27 UTC (permalink / raw) To: Scott L. Burson via GitGitGadget Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Jaydeep P Das, D. Ben Knoble, Scott L. Burson, git Am 27.11.25 um 17:09 schrieb Johannes Sixt: > Am 27.11.25 um 03:38 schrieb Scott L. Burson via GitGitGadget: >> /* >> - * R7RS valid identifiers include any sequence enclosed >> - * within vertical lines having no backslashes >> + * The union of R7RS and Common Lisp symbol syntax: allows arbitrary >> + * strings between vertical bars, including escaped backslashes and >> + * vertical bars. >> */ >> - "\\|([^\\\\]*)\\|" >> + "\\|([^\\\\]|\\\\\\\\|\\\\\\|)*\\|" > > Without the C quoting we have > > \|([^\\]|\\\\|\\\|)*\| > > So, this is everthing from | up to the next |, except that \| does not > stop scanning and \\ is also considered so that \\| is not regarded as \ > followed by \|. Good. Actually, no. Regular expressions don't choose the first match if a different alternative gives a longer match in total. For example, for the change - (let ((|one two| |three four|))) + (let ((|1 two| |three four|))) we get to see the word diff (let (([-|one two| |three four|-]{+|1 two| |three four|+}))) but the desired result is (let (([-|one two|-]{+|1 two|+} |three four|))) The problem isn't new with the proposed change, but if we change the RE, we could fix this at the same time. I think it helps to include | in the bracket expression. It may be worth its own patch that also adds a test in t/t4034/scheme/. The worddiff test case is a bit too sloppy. I've tightened it in the patch below. You may want to make it the first of your series. (If you do, don't forget to apply your sign-off when you cherry-pick the commit.) It is also available here: https://github.com/j6t/git/tree/userdiff-scheme (commit 8f6cb42a02cc). ----- 8< ----- userdiff: tighten word-diff test case of the scheme driver The scheme driver separates identifiers only at parentheses of all sorts and whitespace, except that vertical bars act as brackets that enclose an identifier. The test case attempts to demonstrate the vertical bars with a change from 'some-text' to '|a greeting|'. However, this misses the goal because the same word coloring would be applied if '|a greeting|' were parsed as two words. Have an identifier between vertical bars with a space in both the pre- and the post-image and change only one side of the space to show that the single word exists between the vertical bars. Also add cases that change parentheses of all kinds in a sequence of parentheses to show that they are their own word each. Signed-off-by: Johannes Sixt <j6t@kdbg.org> --- t/t4034/scheme/expect | 5 +++-- t/t4034/scheme/post | 1 + t/t4034/scheme/pre | 3 ++- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/t/t4034/scheme/expect b/t/t4034/scheme/expect index 496cd5de8c..138abe9f56 100644 --- a/t/t4034/scheme/expect +++ b/t/t4034/scheme/expect @@ -2,10 +2,11 @@ <BOLD>index 74b6605..63b6ac4 100644<RESET> <BOLD>--- a/pre<RESET> <BOLD>+++ b/post<RESET> -<CYAN>@@ -1,6 +1,6 @@<RESET> +<CYAN>@@ -1,7 +1,7 @@<RESET> (define (<RED>myfunc a b<RESET><GREEN>my-func first second<RESET>) ; This is a <RED>really<RESET><GREEN>(moderately)<RESET> cool function. (<RED>this\place<RESET><GREEN>that\place<RESET> (+ 3 4)) - (define <RED>some-text<RESET><GREEN>|a greeting|<RESET> "hello") + (define <RED>|the greeting|<RESET><GREEN>|a greeting|<RESET> "hello") + ({<RED>}<RESET>(([<RED>]<RESET>(func-n)<RED>[<RESET>]))<RED>{<RESET>}) (let ((c (<RED>+ a b<RESET><GREEN>add1 first<RESET>))) (format "one more than the total is %d" (<RED>add1<RESET><GREEN>+<RESET> c <GREEN>second<RESET>)))) diff --git a/t/t4034/scheme/post b/t/t4034/scheme/post index 63b6ac4f87..0e3bab101d 100644 --- a/t/t4034/scheme/post +++ b/t/t4034/scheme/post @@ -2,5 +2,6 @@ ; This is a (moderately) cool function. (that\place (+ 3 4)) (define |a greeting| "hello") + ({(([(func-n)]))}) (let ((c (add1 first))) (format "one more than the total is %d" (+ c second)))) diff --git a/t/t4034/scheme/pre b/t/t4034/scheme/pre index 74b6605357..03d77c7c43 100644 --- a/t/t4034/scheme/pre +++ b/t/t4034/scheme/pre @@ -1,6 +1,7 @@ (define (myfunc a b) ; This is a really cool function. (this\place (+ 3 4)) - (define some-text "hello") + (define |the greeting| "hello") + ({}(([](func-n)[])){}) (let ((c (+ a b))) (format "one more than the total is %d" (add1 c)))) -- 2.52.0.rc0.206.g6c0125c11f ^ permalink raw reply related [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-12-02 10:27 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-15 10:17 [PATCH] diff: "lisp" userdiff_driver Scott L. Burson via GitGitGadget 2025-11-15 17:06 ` Johannes Sixt 2025-11-15 23:32 ` Scott L. Burson 2025-11-20 16:47 ` D. Ben Knoble 2025-11-27 2:10 ` Scott L. Burson 2025-11-16 5:30 ` Junio C Hamano 2025-11-17 23:23 ` Scott L. Burson 2025-11-18 4:38 ` Junio C Hamano 2025-11-27 2:38 ` [PATCH v2 0/2] userdiff: extend Scheme support to cover other Lisp dialects Scott L. Burson via GitGitGadget 2025-11-27 2:38 ` [PATCH v2 1/2] diff: "lisp" userdiff_driver Scott L. Burson via GitGitGadget 2025-11-27 10:32 ` Scott L. Burson 2025-11-27 10:51 ` Johannes Sixt 2025-11-27 2:38 ` [PATCH v2 2/2] merge with Scheme regexp; fix bugs Scott L. Burson via GitGitGadget 2025-11-27 16:09 ` Johannes Sixt 2025-12-02 10:27 ` Johannes Sixt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).