git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Extending whitespace checks
@ 2024-11-24  2:25 Junio C Hamano
  2024-11-24 21:41 ` Bence Ferdinandy
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Junio C Hamano @ 2024-11-24  2:25 UTC (permalink / raw)
  To: git

We have, via the attributes subsystem, a way to choose from a set of
predefined whitespace rules so that "git diff" can notice that you
are adding trailing whitespaces to your newly written lines, or you
are indenting a newly introduced line in a Python script with a HT.
This can be used, for example, in pre-commit hook to reject an
attempt to introduce whitespace-damaging changes to the codebase.

Which is great.

I am wondering what we can do to add a different kind of checks to
help file types with fixed format by extending the same mechanism,
or the checks I have in mind are too different from the whitespace
checks and shoehorning it into the existing mechanism does not make
sense.  The particular check I have an immediate need for is for a
filetype with lines, each has exactly 4 fields separated with HT in
between, so the check would ask "does each line have exactly 3 HT on
it?"  It would be extended to verify CSV files with fixed number of
fields (but the validator needs to be aware of the quoting rules for
comma in a value in fields).

I guess the best I could do (outside Git) is

 - write such a validator that can take one line of input and say
   "this line comforms to the rule".

 - add, via .gitattribute, my own attribute to allow me to mark
   the files that these rules apply.  Git does not do anything
   special for this attribute (remember, I said "outside Git").

 - in pre-commit hook, run "git diff ':(attr:myattr)'" to grab
   changes in these files with special formats, and have the
   line-by-line validator (above) check the new lines.

to make sure bad lines would not slip into the history, but it would
be really nice if I can trigger the check as part of "git diff --check",
which means it would be more ideal if we can do this "inside" Git.

Perhaps we could introduce a mechansim that allows me to do the
following:

 - An attribute, like whitespace=..., specifies what line-validation
   function to use to vet each new line introduced to a file with
   the attribute.

 - A line-validation function can be dynamically loaded/linked
   (here, we'd need ".gitattribute specifies the logical meaning,
   while .git/config and friends maps the 'logical meaning' to a
   specific implementation suitable for the platform" separation,
   similar to what we use for smudge/clean filters).  Perhaps this
   would be a good testbed for use of dll, written even in a foreign
   language like Rust?

 - In the diff machinery, where a '+' line is checked for whitespace
   anomalies in the existing code, add code to call the dynamically
   loaded line-validation function when applicable.

 - Profit?

Hmm?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Extending whitespace checks
  2024-11-24  2:25 Extending whitespace checks Junio C Hamano
@ 2024-11-24 21:41 ` Bence Ferdinandy
  2024-11-24 21:58   ` Kristoffer Haugsbakk
  2024-11-25  3:03   ` Junio C Hamano
  2024-11-25 22:04 ` Jacob Keller
  2024-11-27 15:04 ` Jeff King
  2 siblings, 2 replies; 9+ messages in thread
From: Bence Ferdinandy @ 2024-11-24 21:41 UTC (permalink / raw)
  To: Junio C Hamano, git


On Sun Nov 24, 2024 at 03:25, Junio C Hamano <gitster@pobox.com> wrote:
> We have, via the attributes subsystem, a way to choose from a set of
> predefined whitespace rules so that "git diff" can notice that you
> are adding trailing whitespaces to your newly written lines, or you
> are indenting a newly introduced line in a Python script with a HT.
> This can be used, for example, in pre-commit hook to reject an
> attempt to introduce whitespace-damaging changes to the codebase.
>
> Which is great.
>
> I am wondering what we can do to add a different kind of checks to
> help file types with fixed format by extending the same mechanism,
> or the checks I have in mind are too different from the whitespace
> checks and shoehorning it into the existing mechanism does not make
> sense.  The particular check I have an immediate need for is for a
> filetype with lines, each has exactly 4 fields separated with HT in
> between, so the check would ask "does each line have exactly 3 HT on
> it?"  It would be extended to verify CSV files with fixed number of
> fields (but the validator needs to be aware of the quoting rules for
> comma in a value in fields).
>
> I guess the best I could do (outside Git) is
>
>  - write such a validator that can take one line of input and say
>    "this line comforms to the rule".
>
>  - add, via .gitattribute, my own attribute to allow me to mark
>    the files that these rules apply.  Git does not do anything
>    special for this attribute (remember, I said "outside Git").
>
>  - in pre-commit hook, run "git diff ':(attr:myattr)'" to grab
>    changes in these files with special formats, and have the
>    line-by-line validator (above) check the new lines.
>
> to make sure bad lines would not slip into the history, but it would
> be really nice if I can trigger the check as part of "git diff --check",
> which means it would be more ideal if we can do this "inside" Git.
>
> Perhaps we could introduce a mechansim that allows me to do the
> following:
>
>  - An attribute, like whitespace=..., specifies what line-validation
>    function to use to vet each new line introduced to a file with
>    the attribute.
>
>  - A line-validation function can be dynamically loaded/linked
>    (here, we'd need ".gitattribute specifies the logical meaning,
>    while .git/config and friends maps the 'logical meaning' to a
>    specific implementation suitable for the platform" separation,
>    similar to what we use for smudge/clean filters).  Perhaps this
>    would be a good testbed for use of dll, written even in a foreign
>    language like Rust?
>
>  - In the diff machinery, where a '+' line is checked for whitespace
>    anomalies in the existing code, add code to call the dynamically
>    loaded line-validation function when applicable.
>
>  - Profit?
>
> Hmm?

This might be a tangent, but since enhancing whitespace checking was mentioned,
I'd thought I note here:  `git log --check` running in the CI did not catch the
white space errors in this patch (see the last hunk):

https://lore.kernel.org/git/20241121225757.3877852-4-bence@ferdinandy.com/

although it would have been certainly nice. I'm not sure if --check could
already catch this actually, or if it would be easy/possible to have something
general enough that does catch it.


Best,
Bence

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Extending whitespace checks
  2024-11-24 21:41 ` Bence Ferdinandy
@ 2024-11-24 21:58   ` Kristoffer Haugsbakk
  2024-12-01  2:51     ` A bughunter
  2024-11-25  3:03   ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Kristoffer Haugsbakk @ 2024-11-24 21:58 UTC (permalink / raw)
  To: Bence Ferdinandy, Junio C Hamano, git

On Sun, Nov 24, 2024, at 22:41, Bence Ferdinandy wrote:
> This might be a tangent, but since enhancing whitespace checking was mentioned,
> I'd thought I note here:  `git log --check` running in the CI did not catch the
> white space errors in this patch (see the last hunk):
>
> https://lore.kernel.org/git/20241121225757.3877852-4-bence@ferdinandy.com/
>
> although it would have been certainly nice. I'm not sure if --check could
> already catch this actually, or if it would be easy/possible to have something
> general enough that does catch it.

It looks like you indented some lines with spaces instead of tabs. It
doesn’t look like a “whitespace error” in the `--check` sense as I
understand it.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Extending whitespace checks
  2024-11-24 21:41 ` Bence Ferdinandy
  2024-11-24 21:58   ` Kristoffer Haugsbakk
@ 2024-11-25  3:03   ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2024-11-25  3:03 UTC (permalink / raw)
  To: Bence Ferdinandy; +Cc: git

"Bence Ferdinandy" <bence@ferdinandy.com> writes:

> This might be a tangent, but since enhancing whitespace checking was mentioned,

Since Git was mentioned, let me talk about something that is related
to Git, even though it is completely unrelated to the discussion you
are trying to start.  Which may distract and bury whatever you
wanted to discuss in the list traffic noise, but I do not care ;-)

Don't do that, please.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Extending whitespace checks
  2024-11-24  2:25 Extending whitespace checks Junio C Hamano
  2024-11-24 21:41 ` Bence Ferdinandy
@ 2024-11-25 22:04 ` Jacob Keller
  2024-11-27 15:04 ` Jeff King
  2 siblings, 0 replies; 9+ messages in thread
From: Jacob Keller @ 2024-11-25 22:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sat, Nov 23, 2024 at 6:25 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> We have, via the attributes subsystem, a way to choose from a set of
> predefined whitespace rules so that "git diff" can notice that you
> are adding trailing whitespaces to your newly written lines, or you
> are indenting a newly introduced line in a Python script with a HT.
> This can be used, for example, in pre-commit hook to reject an
> attempt to introduce whitespace-damaging changes to the codebase.
>
> Which is great.
>
> I am wondering what we can do to add a different kind of checks to
> help file types with fixed format by extending the same mechanism,
> or the checks I have in mind are too different from the whitespace
> checks and shoehorning it into the existing mechanism does not make
> sense.  The particular check I have an immediate need for is for a
> filetype with lines, each has exactly 4 fields separated with HT in
> between, so the check would ask "does each line have exactly 3 HT on
> it?"  It would be extended to verify CSV files with fixed number of
> fields (but the validator needs to be aware of the quoting rules for
> comma in a value in fields).
>
> I guess the best I could do (outside Git) is
>
>  - write such a validator that can take one line of input and say
>    "this line comforms to the rule".
>
>  - add, via .gitattribute, my own attribute to allow me to mark
>    the files that these rules apply.  Git does not do anything
>    special for this attribute (remember, I said "outside Git").
>
>  - in pre-commit hook, run "git diff ':(attr:myattr)'" to grab
>    changes in these files with special formats, and have the
>    line-by-line validator (above) check the new lines.
>
> to make sure bad lines would not slip into the history, but it would
> be really nice if I can trigger the check as part of "git diff --check",
> which means it would be more ideal if we can do this "inside" Git.
>
> Perhaps we could introduce a mechansim that allows me to do the
> following:
>
>  - An attribute, like whitespace=..., specifies what line-validation
>    function to use to vet each new line introduced to a file with
>    the attribute.
>
>  - A line-validation function can be dynamically loaded/linked
>    (here, we'd need ".gitattribute specifies the logical meaning,
>    while .git/config and friends maps the 'logical meaning' to a
>    specific implementation suitable for the platform" separation,
>    similar to what we use for smudge/clean filters).  Perhaps this
>    would be a good testbed for use of dll, written even in a foreign
>    language like Rust?
>
>  - In the diff machinery, where a '+' line is checked for whitespace
>    anomalies in the existing code, add code to call the dynamically
>    loaded line-validation function when applicable.
>
>  - Profit?
>

I like the idea of an extensible check mechanism with an API. I can
think of a couple of other places where such a check could be useful
to ensure formatting. I do think this is slightly more general than
whitespace checking.. The concept seems reasonable to me tho.

> Hmm?
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Extending whitespace checks
  2024-11-24  2:25 Extending whitespace checks Junio C Hamano
  2024-11-24 21:41 ` Bence Ferdinandy
  2024-11-25 22:04 ` Jacob Keller
@ 2024-11-27 15:04 ` Jeff King
  2024-11-27 23:53   ` Junio C Hamano
  2 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2024-11-27 15:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sun, Nov 24, 2024 at 11:25:21AM +0900, Junio C Hamano wrote:

> I am wondering what we can do to add a different kind of checks to
> help file types with fixed format by extending the same mechanism,
> or the checks I have in mind are too different from the whitespace
> checks and shoehorning it into the existing mechanism does not make
> sense.  The particular check I have an immediate need for is for a
> filetype with lines, each has exactly 4 fields separated with HT in
> between, so the check would ask "does each line have exactly 3 HT on
> it?"  It would be extended to verify CSV files with fixed number of
> fields (but the validator needs to be aware of the quoting rules for
> comma in a value in fields).

Coming from a devil's advocate position: what makes these CSV format
checks any different than syntax checks we get from a compiler? Or for
that matter, the result of running "make test"?

I.e., why implement a complex system for single-line verification
plugins when you'd be left with the much larger problem of evaluating
whole-tree states. And once you have solutions for that (like using
branches to separate unverified work and then merging it once it has
passed checks), then simple things like line syntax are easy to call
there.

Now you could argue that the existing whitespace checks are similarly
redundant. Rather than having "apply" complain about whitespace errors,
you could just check them as part of "make test".

The reasons I can think of for doing something like this are:

  - catching problems earlier is almost always less work for the user

  - for things that _are_ line-oriented, looking at individual diff
    lines lets you focus on problems being added, without worrying about
    existing violations in the final state. OTOH, that's not foolproof;
    if you modify a line with an existing whitespace problem without
    fixing it, "diff --check" will still complain.

So I'm not necessarily against it. But it seems like a very deep rabbit
hole to start adding in shared-library line validators, because I think
it ends in "now compile this before I agree to apply the patch". And I
think Git's model has mostly been the opposite: make it cheap and
private to branch and make changes (including applying patches) so that
you can inspect the state before deciding whether and how to publish.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Extending whitespace checks
  2024-11-27 15:04 ` Jeff King
@ 2024-11-27 23:53   ` Junio C Hamano
  2024-12-01 22:31     ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2024-11-27 23:53 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> But it seems like a very deep rabbit hole to start adding in
> shared-library line validators, because I think it ends in "now
> compile this before I agree to apply the patch".

I am not sure I understand your conclusion.  Who is telling that to
whom?  Somebody sends a patch that creates a file that requires a
special validator and the maintainer gives the validator and tells
the contributor to go use it to make sure their addition passses
before resubmitting?

I was hoping that the ability to add extra validators is more of an
enabler (than requirement and hindrance) for those who choose to be
extra careful.  It is similar to CFLAGS in our Makefile that allows
you to use options to enable more strict compiler warnings than what
other developers usually use, to notice certain class of problems
others may miss.

Shared-libraries and plug-ins remain to be solution in search of
problem at least for this project.  I do not really need CSV comma
counter, but I thought it may give a good excuse for those who want
to play with Rust and other stuff ;-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Extending whitespace checks
  2024-11-24 21:58   ` Kristoffer Haugsbakk
@ 2024-12-01  2:51     ` A bughunter
  0 siblings, 0 replies; 9+ messages in thread
From: A bughunter @ 2024-12-01  2:51 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 539 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

 " This might be a tangent, but since enhancing whitespace checking was mentioned, "shoehorning": If it ain't broke don't fix it. Prudence indeed. Especially if you are messing with something having what you called " landmines". 
-----BEGIN PGP SIGNATURE-----
Version: ProtonMail

wnUEARYKACcFgmdLz0MJkKkWZTlQrvKZFiEEZlQIBcAycZ2lO9z2qRZlOVCu
8pkAAIROAP46l2AgKrh/FKvSwUHxC6XdqMYODOi7zTMYy7Annv72nQEAn6sQ
rtlEGS/xcb1BMKaMUWS5zYxrdi3hrwFsbasCQwU=
=Fwc6
-----END PGP SIGNATURE-----

[-- Attachment #2: publickey - A_bughunter@proton.me - 0x66540805.asc --]
[-- Type: application/pgp-keys, Size: 653 bytes --]

[-- Attachment #3: publickey - A_bughunter@proton.me - 0x66540805.asc.sig --]
[-- Type: application/pgp-signature, Size: 119 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Extending whitespace checks
  2024-11-27 23:53   ` Junio C Hamano
@ 2024-12-01 22:31     ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2024-12-01 22:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Nov 28, 2024 at 08:53:06AM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > But it seems like a very deep rabbit hole to start adding in
> > shared-library line validators, because I think it ends in "now
> > compile this before I agree to apply the patch".
> 
> I am not sure I understand your conclusion.  Who is telling that to
> whom?  Somebody sends a patch that creates a file that requires a
> special validator and the maintainer gives the validator and tells
> the contributor to go use it to make sure their addition passses
> before resubmitting?
> 
> I was hoping that the ability to add extra validators is more of an
> enabler (than requirement and hindrance) for those who choose to be
> extra careful.  It is similar to CFLAGS in our Makefile that allows
> you to use options to enable more strict compiler warnings than what
> other developers usually use, to notice certain class of problems
> others may miss.

Yes, we who introduce the mechanism to create plug-ins do not have to
worry about writing those plug-ins ourselves. But we do have to maintain
the plug-in interface, and respond to complaints when it is not rich
enough to do what people want to do. So I was merely pessimistically
foreseeing where this may end up. ;)

Of course...

> Shared-libraries and plug-ins remain to be solution in search of
> problem at least for this project.  I do not really need CSV comma
> counter, but I thought it may give a good excuse for those who want
> to play with Rust and other stuff ;-)

...if playing with the plug-in interface is the point, none of that may
matter. :)

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-12-01 22:31 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-24  2:25 Extending whitespace checks Junio C Hamano
2024-11-24 21:41 ` Bence Ferdinandy
2024-11-24 21:58   ` Kristoffer Haugsbakk
2024-12-01  2:51     ` A bughunter
2024-11-25  3:03   ` Junio C Hamano
2024-11-25 22:04 ` Jacob Keller
2024-11-27 15:04 ` Jeff King
2024-11-27 23:53   ` Junio C Hamano
2024-12-01 22:31     ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).