Git development

Git development
 help / color / mirror / Atom feed

* Re: [RFC PATCH v3 1/1] unit tests: Add a project plan document
From: Linus Arver @ 2023-06-29 19:42 UTC (permalink / raw)
  To: Josh Steadmon, git
  Cc: calvinwan, szeder.dev, phillip.wood123, chooglen, avarab, gitster,
	sandals
In-Reply-To: <8afdb215d7e10ca16a2ce8226b4127b3d8a2d971.1686352386.git.steadmon@google.com>

Hello,

Josh Steadmon <steadmon@google.com> writes:

> In our current testing environment, we spend a significant amount of
> effort crafting end-to-end tests for error conditions that could easily
> be captured by unit tests (or we simply forgo some hard-to-setup and
> rare error conditions).Describe what we hope to accomplish by

I see a minor typo (no space before the word "Describe").

> +=== Comparison
> +
> +[format="csv",options="header",width="75%"]
> +|=====
> +Framework,"TAP support","Diagnostic output","Parallel execution","Vendorable / ubiquitous","Maintainable / extensible","Major platform support","Lazy test planning","Runtime- skippable tests","Scheduling / re-running",Mocks,"Signal & exception handling","Coverage reports"
> +https://lore.kernel.org/git/c902a166-98ce-afba-93f2-ea6027557176@gmail.com/[Custom Git impl.],[lime-background]#True#,[lime-background]#True#,?,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,?,?,[red-background]#False#,?,?
> +https://cmocka.org/[cmocka],[lime-background]#True#,[lime-background]#True#,?,[red-background]#False#,[yellow-background]#Partial#,[yellow-background]#Partial#,[yellow-background]#Partial#,?,?,[lime-background]#True#,?,?
> +https://libcheck.github.io/check/[Check],[lime-background]#True#,[lime-background]#True#,?,[red-background]#False#,[yellow-background]#Partial#,[lime-background]#True#,[yellow-background]#Partial#,?,?,[red-background]#False#,?,?
> +https://github.com/rra/c-tap-harness/[C TAP],[lime-background]#True#,[red-background]#False#,?,[lime-background]#True#,[yellow-background]#Partial#,[yellow-background]#Partial#,[yellow-background]#Partial#,?,?,[red-background]#False#,?,?
> +https://github.com/silentbicycle/greatest[Greatest],[yellow-background]#Partial#,?,?,[lime-background]#True#,[yellow-background]#Partial#,?,[yellow-background]#Partial#,?,?,[red-background]#False#,?,?
> +https://github.com/Snaipe/Criterion[Criterion],[lime-background]#True#,?,?,[red-background]#False#,?,[lime-background]#True#,?,?,?,[red-background]#False#,?,?
> +https://github.com/zorgnax/libtap[libtap],[lime-background]#True#,?,?,?,?,?,?,?,?,?,?,?
> +https://nemequ.github.io/munit/[µnit],?,?,?,?,?,?,?,?,?,?,?,?
> +https://github.com/google/cmockery[cmockery],?,?,?,?,?,?,?,?,?,[lime-background]#True#,?,?
> +https://github.com/lpabon/cmockery2[cmockery2],?,?,?,?,?,?,?,?,?,[lime-background]#True#,?,?
> +https://github.com/ThrowTheSwitch/Unity[Unity],?,?,?,?,?,?,?,?,?,?,?,?
> +https://github.com/siu/minunit[minunit],?,?,?,?,?,?,?,?,?,?,?,?
> +https://cunit.sourceforge.net/[CUnit],?,?,?,?,?,?,?,?,?,?,?,?
> +https://www.kindahl.net/mytap/doc/index.html[MyTAP],[lime-background]#True#,?,?,?,?,?,?,?,?,?,?,?
> +|=====

This table is a little hard to read. Do you have your patch on GitHub or
somewhere else where this table is rendered with HTML?

It would help to explain each of the answers that are filled in
with the word "Partial", to better understand why it is the case. I
suspect this might get a little verbose, in which case I suggest just
giving each framework its own heading.

The column names here are slightly different from the headings used
under "Desired features"; I suggest making them the same.

Also, how about grouping some of these together? For example "Diagnostic
output" and "Coverage reports" feel like they could be grouped under
"Output formats". Here's one way to group these:

    1. Output formats

    TAP support
    Diagnostic output
    Coverage reports

    2. Cost of adoption

    Vendorable / ubiquitous
    Maintainable / extensible
    Major platform support

    3. Performance flexibility

    Parallel execution
    Lazy test planning
    Runtime-skippable tests
    Scheduling / re-running

    4. Developer experience

    Mocks
    Signal & exception handling

I can think of some other metrics to add to the comparison, namely:

    1. Age (how old is the framework)
    2. Size in KLOC (thousands of lines of code)
    3. Adoption rate (which notable C projects already use this framework?)
    4. Project health (how active are its developers?)

I think for 3 and 4, we could probably mine some data out of GitHub
itself.

Lastly it would be helpful if we can mark some of these categories as
must-haves. For example would lack of "Major platform support" alone
disqualify a test framework? This would help fill in the empty bits in
the comparison table because we could skip looking too deeply into a
framework if it fails to meet a must-have requirement.

Thanks,
Linus

^ permalink raw reply

* Re: [RFC PATCH v3 1/1] unit tests: Add a project plan document
From: Josh Steadmon @ 2023-06-29 20:48 UTC (permalink / raw)
  To: Linus Arver
  Cc: git, calvinwan, szeder.dev, phillip.wood123, chooglen, avarab,
	gitster, sandals
In-Reply-To: <owlybkgy837j.fsf@fine.c.googlers.com>

On 2023.06.29 12:42, Linus Arver wrote:
> Hello,
> 
> Josh Steadmon <steadmon@google.com> writes:
> 
> > In our current testing environment, we spend a significant amount of
> > effort crafting end-to-end tests for error conditions that could easily
> > be captured by unit tests (or we simply forgo some hard-to-setup and
> > rare error conditions).Describe what we hope to accomplish by
> 
> I see a minor typo (no space before the word "Describe").

Thanks, fixed for V4.

> > +=== Comparison
> > +
> > +[format="csv",options="header",width="75%"]
> > +|=====
> > +Framework,"TAP support","Diagnostic output","Parallel execution","Vendorable / ubiquitous","Maintainable / extensible","Major platform support","Lazy test planning","Runtime- skippable tests","Scheduling / re-running",Mocks,"Signal & exception handling","Coverage reports"
> > +https://lore.kernel.org/git/c902a166-98ce-afba-93f2-ea6027557176@gmail.com/[Custom Git impl.],[lime-background]#True#,[lime-background]#True#,?,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,?,?,[red-background]#False#,?,?
> > +https://cmocka.org/[cmocka],[lime-background]#True#,[lime-background]#True#,?,[red-background]#False#,[yellow-background]#Partial#,[yellow-background]#Partial#,[yellow-background]#Partial#,?,?,[lime-background]#True#,?,?
> > +https://libcheck.github.io/check/[Check],[lime-background]#True#,[lime-background]#True#,?,[red-background]#False#,[yellow-background]#Partial#,[lime-background]#True#,[yellow-background]#Partial#,?,?,[red-background]#False#,?,?
> > +https://github.com/rra/c-tap-harness/[C TAP],[lime-background]#True#,[red-background]#False#,?,[lime-background]#True#,[yellow-background]#Partial#,[yellow-background]#Partial#,[yellow-background]#Partial#,?,?,[red-background]#False#,?,?
> > +https://github.com/silentbicycle/greatest[Greatest],[yellow-background]#Partial#,?,?,[lime-background]#True#,[yellow-background]#Partial#,?,[yellow-background]#Partial#,?,?,[red-background]#False#,?,?
> > +https://github.com/Snaipe/Criterion[Criterion],[lime-background]#True#,?,?,[red-background]#False#,?,[lime-background]#True#,?,?,?,[red-background]#False#,?,?
> > +https://github.com/zorgnax/libtap[libtap],[lime-background]#True#,?,?,?,?,?,?,?,?,?,?,?
> > +https://nemequ.github.io/munit/[µnit],?,?,?,?,?,?,?,?,?,?,?,?
> > +https://github.com/google/cmockery[cmockery],?,?,?,?,?,?,?,?,?,[lime-background]#True#,?,?
> > +https://github.com/lpabon/cmockery2[cmockery2],?,?,?,?,?,?,?,?,?,[lime-background]#True#,?,?
> > +https://github.com/ThrowTheSwitch/Unity[Unity],?,?,?,?,?,?,?,?,?,?,?,?
> > +https://github.com/siu/minunit[minunit],?,?,?,?,?,?,?,?,?,?,?,?
> > +https://cunit.sourceforge.net/[CUnit],?,?,?,?,?,?,?,?,?,?,?,?
> > +https://www.kindahl.net/mytap/doc/index.html[MyTAP],[lime-background]#True#,?,?,?,?,?,?,?,?,?,?,?
> > +|=====
> 
> This table is a little hard to read. Do you have your patch on GitHub or
> somewhere else where this table is rendered with HTML?

Yes, I've pushed a WIP of this to:
https://github.com/steadmon/git/blob/unit-tests-asciidoc/Documentation/technical/unit-tests.adoc

However, this doesn't render the color coding in the table, so you may
also want to just build it locally:
`make -C Documentation technical/unit-tests.html`

> It would help to explain each of the answers that are filled in
> with the word "Partial", to better understand why it is the case. I
> suspect this might get a little verbose, in which case I suggest just
> giving each framework its own heading.

Yeah that is coming in V4.

> The column names here are slightly different from the headings used
> under "Desired features"; I suggest making them the same.

Fixed for V4.

> Also, how about grouping some of these together? For example "Diagnostic
> output" and "Coverage reports" feel like they could be grouped under
> "Output formats". Here's one way to group these:
> 
>     1. Output formats
> 
>     TAP support
>     Diagnostic output
>     Coverage reports
> 
>     2. Cost of adoption
> 
>     Vendorable / ubiquitous
>     Maintainable / extensible
>     Major platform support
> 
>     3. Performance flexibility
> 
>     Parallel execution
>     Lazy test planning
>     Runtime-skippable tests
>     Scheduling / re-running
> 
>     4. Developer experience
> 
>     Mocks
>     Signal & exception handling

I didn't state it outright, but they're roughly but not perfectly
ordered by priority. Of course, other people may prioritize these
differently, and I'm not set on this ordering either. Grouping by
category does seem more useful.


> I can think of some other metrics to add to the comparison, namely:
> 
>     1. Age (how old is the framework)
>     2. Size in KLOC (thousands of lines of code)
>     3. Adoption rate (which notable C projects already use this framework?)
>     4. Project health (how active are its developers?)
> 
> I think for 3 and 4, we could probably mine some data out of GitHub
> itself.

Interesting, I'll see about adding some of these.


> Lastly it would be helpful if we can mark some of these categories as
> must-haves. For example would lack of "Major platform support" alone
> disqualify a test framework? This would help fill in the empty bits in
> the comparison table because we could skip looking too deeply into a
> framework if it fails to meet a must-have requirement.

Yeah, right now I think supporting TAP is the only non-negotiable one,
but I'll add a discussion about priorities.

> Thanks,
> Linus

Thanks for the review!

^ permalink raw reply

* Re: SHA256 support not experimental, or?
From: Junio C Hamano @ 2023-06-29 20:56 UTC (permalink / raw)
  To: Adam Majer; +Cc: git
In-Reply-To: <65148753-7071-d5d5-3b4e-bad020e6ab63@zombino.com>

Adam Majer <adamm@zombino.com> writes:

> So maybe my question should be reworded to "is sha256 still considered
> early stage, for testing purposes only with possible data-loss or can
> it be relied on for actual long lived repositories?"

My understanding is that they are in a happy place where they are
just as usable as SHA-1 based repositories have been.  As we have
well-worked out interoperability design but no implementation, it
may have to change once we discover something missing in the design,
though.  But without such clarification, you already know the answer
to the above question in the message you are responding to.  Having
a migration path means "possible data-los" is not in the picture.

> The scary wording should be removed
> though, as currently it sounds like "data loss incoming and it's your
> fault" if one chooses sha256

Good.

THanks.

^ permalink raw reply

* Re: [PATCH] docs: include "trace.h" in MyFirstObjectWalk.txt
From: Junio C Hamano @ 2023-06-29 21:13 UTC (permalink / raw)
  To: Vinayak Dev, Elijah Newren; +Cc: nasamuffin, git
In-Reply-To: <20230629185238.58961-1-vinayakdev.sci@gmail.com>

Vinayak Dev <vinayakdev.sci@gmail.com> writes:

[jc: including Elijah, who owns a few topics of the recent past that
shuffled header files, to recipients].

> In Documentation/MyFirstObjectWalk.txt, the function
> trace_printf() is used to enable trace output.
> However, the file "trace.h" is not included, which
> produces an error when the code from the tutorial is
> compiled, with the compiler complaining that the 
> function is not defined before usage. Therefore, add
> an include for "trace.h" in the tutorial.
>
> Signed-off-by: Vinayak Dev <vinayakdev.sci@gmail.com>
> ---
>  Documentation/MyFirstObjectWalk.txt | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
> index eee513e86f..c3a23eb100 100644
> --- a/Documentation/MyFirstObjectWalk.txt
> +++ b/Documentation/MyFirstObjectWalk.txt
> @@ -41,6 +41,7 @@ Open up a new file `builtin/walken.c` and set up the command handler:
>   */
>  
>  #include "builtin.h"
> +#include "trace.h"
>  
>  int cmd_walken(int argc, const char **argv, const char *prefix)
>  {

OK.

> @@ -49,12 +50,13 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
>  }
>  ----
>  
> -NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
> -off at runtime. For the purposes of this tutorial, we will write `walken` as
> -though it is intended for use as a "plumbing" command: that is, a command which
> -is used primarily in scripts, rather than interactively by humans (a "porcelain"
> -command). So we will send our debug output to `trace_printf()` instead. When
> -running, enable trace output by setting the environment variable `GIT_TRACE`.
> +NOTE: `trace_printf()`, defined in `trace.h`, differs from `printf()` in
> +that it can be turned on or off at runtime. For the purposes of this
> +tutorial, we will write `walken` as though it is intended for use as
> +a "plumbing" command: that is, a command which is used primarily in
> +scripts, rather than interactively by humans (a "porcelain" command).
> +So we will send our debug output to `trace_printf()` instead.
> +When running, enable trace output by setting the environment variable `GIT_TRACE`.

All of the above may be good currently, but as soon as somebody does
another round of header shuffling, we risk a very similar breakage.
It is a good time to think about ways to avoid that, while the pain
is fresh in our mind.

One "cop out" thing we can do to limit the damage may be to loosen
the text in the "NOTE:", so that it does *NOT* mention exact header
files the API functions appear and let the readers learn from the
source themselves, with "git grep" helping their way.  Or tone down
"defined in X" somehow to hint that these details may change.

More effective things that would involve higher implementation cost
(but will reduce maintenance cost) would be to actually make sure
that those who update the API will notice that they are breaking
these samples when they develop their series.  

In https://lore.kernel.org/git/xmqq1qhu9ifp.fsf@gitster.g/, I've
floated some strawman ideas but people may be able to invent better
ways.

^ permalink raw reply

* Re: SHA256 support not experimental, or?
From: brian m. carlson @ 2023-06-29 21:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Adam Majer, git
In-Reply-To: <xmqqmt0iajww.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 1187 bytes --]

On 2023-06-29 at 05:59:11, Junio C Hamano wrote:
> Adam Majer <adamm@zombino.com> writes:
> 
> > Is sha256 still considered experimental or can it be assumed to be stable?
> 
> I do not think we would officially label SHA-256 support as "stable"
> until we have good interoperability with SHA-1 repositories, but the
> expectation is that we will make reasonable effort to keep migration
> path for the current SHA-256 repositories, even if it turns out that
> its on-disk format need to be updated, to keep the end-user data safe.

I don't think that's a good position to have.  I'm not working on
interop more than incidentally at the moment, and to my knowledge,
nobody else is, either.  Absent me having substantially more free time
or having my employer pay me to work on it, it is probably not
happening.

We desperately do want people to move away from SHA-1 to SHA-256, and as
soon as there's tooling and forges to do so, we should encourage them to
do so.  Just because people can't interop existing SHA-1 repositories
doesn't mean people can't or shouldn't build new SHA-256 repositories.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply

* Re: [RFC PATCH v3 1/1] unit tests: Add a project plan document
From: Junio C Hamano @ 2023-06-29 21:21 UTC (permalink / raw)
  To: Linus Arver
  Cc: Josh Steadmon, git, calvinwan, szeder.dev, phillip.wood123,
	chooglen, avarab, sandals
In-Reply-To: <owlybkgy837j.fsf@fine.c.googlers.com>

Linus Arver <linusa@google.com> writes:

>> ...
>> +https://github.com/ThrowTheSwitch/Unity[Unity],?,?,?,?,?,?,?,?,?,?,?,?
>> +https://github.com/siu/minunit[minunit],?,?,?,?,?,?,?,?,?,?,?,?
>> +https://cunit.sourceforge.net/[CUnit],?,?,?,?,?,?,?,?,?,?,?,?
>> +https://www.kindahl.net/mytap/doc/index.html[MyTAP],[lime-background]#True#,?,?,?,?,?,?,?,?,?,?,?
>> +|=====
>
> This table is a little hard to read. Do you have your patch on GitHub or
> somewhere else where this table is rendered with HTML?

Great suggestion (veiled in a question).

> It would help to explain each of the answers that are filled in
> with the word "Partial", to better understand why it is the case. I
> suspect this might get a little verbose, in which case I suggest just
> giving each framework its own heading.
>
> The column names here are slightly different from the headings used
> under "Desired features"; I suggest making them the same.
>
> Also, how about grouping some of these together? For example "Diagnostic
> output" and "Coverage reports" feel like they could be grouped under
> "Output formats". Here's one way to group these:
>
>     1. Output formats
>
>     TAP support
>     Diagnostic output
>     Coverage reports
>
>     2. Cost of adoption
>
>     Vendorable / ubiquitous
>     Maintainable / extensible
>     Major platform support
>
>     3. Performance flexibility
>
>     Parallel execution
>     Lazy test planning
>     Runtime-skippable tests
>     Scheduling / re-running
>
>     4. Developer experience
>
>     Mocks
>     Signal & exception handling
>
> I can think of some other metrics to add to the comparison, namely:
>
>     1. Age (how old is the framework)
>     2. Size in KLOC (thousands of lines of code)
>     3. Adoption rate (which notable C projects already use this framework?)
>     4. Project health (how active are its developers?)
>
> I think for 3 and 4, we could probably mine some data out of GitHub
> itself.

Great additions (if we are mere users do we care much about #2,
though?).

^ permalink raw reply

* Re: SHA256 support not experimental, or?
From: Junio C Hamano @ 2023-06-29 22:22 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Adam Majer, git
In-Reply-To: <ZJ303bm+VAvp5nyV@tapette.crustytoothpaste.net>

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2023-06-29 at 05:59:11, Junio C Hamano wrote:
>> Adam Majer <adamm@zombino.com> writes:
>> 
>> > Is sha256 still considered experimental or can it be assumed to be stable?
>> 
>> I do not think we would officially label SHA-256 support as "stable"
>> until we have good interoperability with SHA-1 repositories, but the
>> expectation is that we will make reasonable effort to keep migration
>> path for the current SHA-256 repositories, even if it turns out that
>> its on-disk format need to be updated, to keep the end-user data safe.
>
> I don't think that's a good position to have.
> We desperately do want people to move away from SHA-1 to SHA-256, and as
> soon as there's tooling and forges to do so, we should encourage them to
> do so.

I agree that it is good to ensure that SHA-256 support is good
enough to start new projects with.

> Just because people can't interop existing SHA-1 repositories
> doesn't mean people can't or shouldn't build new SHA-256 repositories.

True, and our messaging should avoid scaring them away from doing
so.  But isn't the lack of interoperability one of the reasons why
GitHub and Gitlab do not yet offer choice of the hash?  There
certainly is a chicken-and-egg problem here.

^ permalink raw reply

* Re: Documentation/MyFirstObjectWalk: add #include "trace.h" to use trace_printf()
From: Emily Shaffer @ 2023-06-29 23:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Vinayak Dev, git
In-Reply-To: <xmqq1qhu9ifp.fsf@gitster.g>

On Thu, Jun 29, 2023 at 12:28 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Emily Shaffer <nasamuffin@google.com> writes:
>
> > Yeah, it's almost certainly stale in MyFirstObjectWalk - there was
> > very recently a patch to clean up some headers which probably were
> > implicitly including trace.h when I wrote this walkthrough.
>
> We are lucky that we have folks like Vinayak who tried out the
> examples and then bothered to spend time reporting the failure
> discovered.  What does it take, however, for us to have a bit more
> automated way to prevent such a breakage that comes from API
> changes?  Is it feasible, for example, to add a test that extracts
> code snippets from the MyFirstObjectWalk document and try to build
> the result?  Alternatively, we can ship such a set of sample source
> files somewhere in our tree (e.g. contrib/examples?) and have such
> a test try to build using the current set of source files, but then
> we need a mechansim to ensure that the sample source files will not
> go out of sync with the document.

Yeah, I remember we talked about this when MyFirstContribution and
MyFirstObjectWalk went in, but never made much headway. I do very much
like the idea of keeping the reference source in contrib/ as a set of
patches, maybe along with a script to apply them (or a readme with the
right `git am` invocation), and then checking that they still build.
Checking that against the contents of the document is trickier,
though, like you mentioned. Hm.

I'm interested in figuring out how to do this, but not likely to have
a lot of development time available to do it. Maybe I can take a day
here or there to poke at it, but if someone else is interested and
beats me to it, I will not be disappointed. :)

>
> Thoughts?
>
> Thanks.
>

^ permalink raw reply

* Re: Documentation/MyFirstObjectWalk: add #include "trace.h" to use trace_printf()
From: Emily Shaffer @ 2023-06-29 23:09 UTC (permalink / raw)
  To: Vinayak Dev; +Cc: git
In-Reply-To: <CADE8Naonm+bW_jVvJKmnfZWQyX=0-QVSHxpSaHs1qo+5DsCiPQ@mail.gmail.com>

On Thu, Jun 29, 2023 at 11:45 AM Vinayak Dev <vinayakdev.sci@gmail.com> wrote:
>
> > On Thu, Jun 29, 2023 at 9:33 AM Emily Shaffer <nasamuffin@google.com> wrote:
>
> Hi, thanks for replying!
>
> > > Yeah, it's almost certainly stale in MyFirstObjectWalk - there was
> > > very recently a patch to clean up some headers which probably were
> > > implicitly including trace.h when I wrote this walkthrough. Patches
> > > totally welcome - and if you were working from the reference code in
> > > https://github.com/nasamuffin/git/tree/myfirstrevwalk
> >
> > bah, wrong link, the tutorial points to branch `revwalk` instead of
> > `myfirstrevwalk`, but the offer stands :)
> >
> > > and it's on your
> > > way to rebase and fix that too, I'm happy to update my branch
> > > accordingly too. (If you weren't, don't worry about doing the extra
> > > work, though.)
>
> Sure will! But do you mean open a PR on your fork? I have the patch ready,
> and would be very happy to do so, if it is accepted!

Yeah, I think there are two things to fix:

First, a patch to Documentation/technical/MyFirstObjectWalk.txt fixing
the snippets there. (I thought that was what you were offering to
patch in your original mail, I may have been mistaken.)

Second, optionally, a rebased-and-fixed-and-your-attribution-added
branch of the reference impl that I can force-push to nasamuffin/git.
The more I think on it, I don't think the PR will help, since I will
want to just force-push that whole branch so the commit order still
functions as a learning tool. So if you have it even in a branch on
your GitHub or GitLab fork, or a series of patches you'd want to mail
to me, any of those are fine and I'll go ahead and rewrite my branch.

>
> Thanks a lot!
> Vinayak

^ permalink raw reply

* Re: [RFC PATCH v3 1/1] unit tests: Add a project plan document
From: Linus Arver @ 2023-06-30  0:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Josh Steadmon, git, calvinwan, szeder.dev, phillip.wood123,
	chooglen, avarab, sandals
In-Reply-To: <xmqqedlu7yn2.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

>> I can think of some other metrics to add to the comparison, namely:
>>
>>     1. Age (how old is the framework)
>>     2. Size in KLOC (thousands of lines of code)
>>     3. Adoption rate (which notable C projects already use this framework?)
>>     4. Project health (how active are its developers?)
>>
>> I think for 3 and 4, we could probably mine some data out of GitHub
>> itself.
>
> Great additions (if we are mere users do we care much about #2,
> though?).

Sorry, I forgot to add why I think these metrics are useful: I think
they give some signal about how much influence/respect the framework has
in our industry, with the assumption that the influence/respect
positively correlates with how "good" (sound architecture, well-written,
easy to use, simple to understand, etc) the framework is. For the
frameworks hosted in GitHub, perhaps the number of GitHub Stars is a
better estimate for measuring influence/respect.

That said, I think #2 (measuring KLOC) would still be useful to know
(and is easy enough with tools like tokei [1]), mainly for the scenario
where the framework becomes abandonware. Certainly, a framework with a
lower KLOC count would have a lower maintenance burden if we ever need
to step in to help maintain the framework ourselves. To me this is one
reason why I like the idea of using Phillip Wood's framework [2]
(granted, it is currently only a proof of concept).

[1] https://github.com/XAMPPRocky/tokei
[2] https://lore.kernel.org/git/c902a166-98ce-afba-93f2-ea6027557176@gmail.com/

^ permalink raw reply

* Re: [RFC PATCH v2] khash_test.c : add unit tests
From: Glen Choo @ 2023-06-30  0:53 UTC (permalink / raw)
  To: Siddharth Singh, git
In-Reply-To: <20230627132026.3204186-1-siddhartth@google.com>

Siddharth Singh <siddhartth@google.com> writes:

> - rewrote the code keeping coding style guidelines in mind.

I noticed several style issues in the patch. I'll avoid commenting on
those here so that we can focus on the other, big questions.

> I'm also very new to writing unit tests in C and git codebase in general, so I'll appreciate constructive feedback and discussion..

Okay. I think this is a good chance for the ML to decide on what sorts
of unit test practices we want too.

> What are the unit test cases to include in khash.h?
> What are the other types of tests that would be useful for khash.h?

For a third party library, I mostly agree with upthread commenters that
testing the correctness of the _library_ code isn't that valuable on its
own, but unit tests are also useful for demonstrating how the code
should be used. In a lot of projects, unit tests are a pseudo-official
demonstration of how an interface should be used, and that's sometimes
better than seeing the code in production, because unit tests are
typically a lot simpler and easier to reason about.

As an aside, this is something that a unit test framework will do _much
better_ than test-tool, because it makes it easy to read and write these
idtiomatic demos without worrying about how the test-tool executable
will be used.

I think that if we consider these two factors together (correctness and
developer education), the fact that we don't have great documentation of
khash.h in-tree, and that the external docs aren't that comprehensive
either, I think it would be useful to exercise each of the khash
'functions' to show how they work.

As Taylor mentioned upthread, I think the most valuable thing to test is
the way our code uses khash.h (not the library code itself). IOW:

  KHASH_INIT(oid_set, struct object_id, int, 0, oidhash_by_value, oideq_by_value)
  KHASH_INIT(oid_map, struct object_id, void *, 1, oidhash_by_value, oideq_by_value)
  KHASH_INIT(oid_pos, struct object_id, int, 1, oidhash_by_value, oideq_by_value)

So we would focus on oidhash_by_value and oideq_by_value in particular,
since that's provided by us. It would be very useful to verify that
these functions are doing exactly what we think they should be doing,
and we could do this with _every_ khash function to really explore the
API surface.

In fact... I suspect that they _not_ doing what we think they should,
or at the very least, might be displaying some very unintuitive
behavior. object_id contains both the hash value _and_ the hash
algorithm that generated it. If we have the same hash value, but
generated by different algorithms, are they considered the same equal or
not? To me, the intuitive answer is "no", but oidhash_by_value() only
considers the hash value and ignores the algorithm altogether! I think
that testing collisions between SHA1 and SHA256 values would be very
useful.

> +/*
> +  These tests are for base assumptions; if they fails, library is broken
> +*/
> +int hash_string_succeeds() {
> +  const char *str = "test_string";
> +  khint_t value = __ac_X31_hash_string(str);
> +  return value == 0xf1a6305e;
> +}
> +
> +int hash_string_with_non_alphabets_succeeds() {
> +  const char *str = "test_string #2";
> +  khint_t value = __ac_X31_hash_string(str);
> +  return value == 0xfa970771;
> +}

For a unit test, it is a best practice to test the external interface
that callers are depending on, and to avoid implementation
details/internals. If we do the latter, we end up with brittle tests
that break on non-important changes. E.g. we definitely should not be
testing the exact value of the hash function. The khash library
maintainer could decide to change the hash function, and we should not
be broken by that.

For khash.h, it appears that that anything prepended with "__" is meant
to be internal, and everything else is external. E.g.
__ac_X31_hash_string is certainly not meant to be used directly by end
users., especially since there's a human-friendly macro for it:

  #define kh_str_hash_func(key) __ac_X31_hash_string(key)

However, because something is external, doesn't mean we need to be
testing its implementation - we still shouldn't be testing the
return value of kh_str_hash_func(). We may want to test certain
_properties_ of the external function (e.g. does it provide good
collision resistance?), but in this case, I think we can trust that what
the library author wrote is a good default.

> +KHASH_INIT(str, const char *, int *, 1, kh_str_hash_func, kh_str_hash_equal);

As mentioned earlier, I think testing oid_[set|map|pos] would be much
more useful, and we should try to demonstrate each of the
macros/functions and how they interact, e.g. if I kh_put_*, I should
expect to be able to kh_get_* afterwards.

> +
> +int initialized_hashmap_pointer_not_null() {
> +  kh_str_t *hashmap = kh_init_str();
> +
> +  if(hashmap != NULL){
> +    free(hashmap);
> +    return 1;
> +  }
> +  return 0;
> +}

I don't think this is a useful test. It would be extremely weird for the
return value to be NULL in regular usage.

> +int put_key_into_hashmap_once_succeeds() {
> +  int ret, value;
> +  khint_t pos;
> +  kh_str_t *hashmap = kh_init_str();
> +  pos = kh_put_str(hashmap, "test_key", &ret);
> +  if(!ret)
> +    return 0;
> +  return pos != 0;
> +}

I don't think it makes sense to only check the return value, we should
check that it had the intended effect - that we can read back the value.
There is a later test that does this, so I'd prefer we remove this.

Comment on the test framework, not the test: I think it makes sense to
assert that "ret" is nonzero, but this "if..return" way of asserting
makes it impossible to tell what has broken. It also adds a lot of
cognitive overhead when trying to parse the assertion. I'll echo Phillip
Wood's comments and say that I think this is definitely not a framework
I want to use.

> +int put_same_key_into_hashmap_twice_fails() {
> +  int first_return_value, second_return_value, value;
> +  khint_t pos;
> +  kh_str_t *hashmap = kh_init_str();
> +  kh_put_str(hashmap, "test_key", &first_return_value);
> +  kh_put_str(hashmap, "test_key", &second_return_value);
> +  return !second_return_value;
> +}

This test seems reasonable.

> +int put_value_into_hashmap_once_succeeds() {
> +  int ret, value = 14;
> +  khint_t pos;
> +  kh_str_t *hashmap = kh_init_str();
> +  pos = kh_put_str(hashmap, "test_key", &ret);
> +  if (!ret)
> +    return 0;
> +  kh_key(hashmap, pos) = xstrdup("test_key");
> +  kh_value(hashmap, pos) = &value;
> +  return *kh_value(hashmap, pos) == value;
> +}

Checking the inserted value makes sense to me I don't really understand
khash.h, but is this the recommended way to get values back? kh_value()
looks like just a small macro around an internal array, so we are just
checking that *&value == value. I would have expected that we would call
kh_get to get "pos", then call kh_value on that.

Also, do we have to kh_put_str("some_key") and then immediately set it
again with kh_key(pos)? That seems odd, and I don't see other callers
doing that all the time.

Since this is the second time I see "if (!ret)", I should ask whether
this is meant as an assertion (checking the correctness of the code), or
for flow control (the code might return either 0 or 1, and I need to
handle both). If it is the former, I think it's unnecessary to
assert on it multiple times. The latter does not belong in a unit test -
we should already _know_ what we want to be returned. Btw, it is a huge
flaw in the test framework that both look exactly the same, so I'll say
again that I really don't like it.

> +int put_same_value_into_hashmap_twice_fails() {
> +  int first_return_value, second_return_value, value = 14;
> +  khint_t pos;
> +  kh_str_t *hashmap = kh_init_str();
> +  pos = kh_put_str(hashmap, "test_key", &first_return_value);
> +  if (!first_return_value)
> +    return 0;
> +  kh_key(hashmap, pos) = xstrdup("test_key");
> +  kh_value(hashmap, pos) = &value;
> +  kh_put_str(hashmap, "test_key", &second_return_value);
> +  return !second_return_value;
> +}

I don't see how this is different from the previous test that tested for
collisions.

> +int get_existing_kv_from_hashmap_succeeds() {
> +  int ret, value =14;
> +  int expected = value;
> +  khint_t pos;
> +  kh_str_t *hashmap = kh_init_str();
> +  pos = kh_put_str(hashmap, "test_key", &ret);
> +  if (!ret)
> +    return 0;
> +  kh_key(hashmap, pos) = xstrdup("test_key");
> +  kh_value(hashmap, pos) = &value;
> +  return *kh_value(hashmap, kh_get_str(hashmap, "test_key")) == expected;
> +}

Isn't this identical to put_value_into_hashmap_once_succeeds()?

Also, this assertion is harder to read than necessary because we have
both "expected" and "value" - the different names suggest that they are
different, but they're really the same. Let's just use one of them.

> +int get_nonexisting_kv_from_hashmap_fails() {
> +  int value = 14;
> +  khint_t pos;
> +  kh_str_t *hashmap = kh_init_str();
> +  return !kh_get_str(hashmap, "test_key");
> +}

Looks reasonable.

> +int deletion_from_hashmap_sets_flag() {
> +  int ret, value = 14;
> +  khint_t pos;
> +  kh_str_t *hashmap = kh_init_str();
> +  pos = kh_put_str(hashmap, "test_key", &ret);
> +  if (!ret)
> +    return 0;
> +  if(!kh_exist(hashmap, pos))
> +    return 0;
> +  kh_key(hashmap, pos) = xstrdup("test_key");
> +  kh_value(hashmap, pos) = &value;
> +  kh_del_str(hashmap, pos);
> +  return !kh_exist(hashmap, pos);
> +}

What is the "flag" being set? If it is the khash-internal flag that
marks that an entry is deleted, that's an unimportant implementation
detail - the important thing is that the value is deleted. Let's just
rename this to "deletion_works" or something.

> +
> +int deletion_from_hashmap_updates_size() {
> +  int ret, value = 14, current_size;
> +  khint_t pos;
> +  kh_str_t *hashmap = kh_init_str();
> +  pos = kh_put_str(hashmap, "test_key", &ret);
> +  if (!ret)
> +    return 0;
> +  kh_key(hashmap, pos) = xstrdup("test_key");
> +  kh_value(hashmap, pos) = &value;
> +  current_size = hashmap->size;
> +  kh_del_str(hashmap, pos);
> +  return hashmap->size + 1 == current_size;
> +}

The interface is kh_size(), so let's use that instead of referencing the
.size member correctly.

> +int deletion_from_hashmap_does_not_update_kv() {
> +  int ret, value = 14, current_size;
> +  khint_t pos;
> +  kh_str_t *hashmap = kh_init_str();
> +  pos = kh_put_str(hashmap, "test_key", &ret);
> +  if (!ret)
> +    return 0;
> +  kh_key(hashmap, pos) = xstrdup("test_key");
> +  kh_value(hashmap, pos) = &value;
> +  kh_del_str(hashmap, pos);
> +  return !strcmp(kh_key(hashmap, pos),"test_key");
> +}

Interesting! If we kh_del_(pos) but read it back with kh_key(), we can
still read it back. This has some implications on how khash should be
used - we should be extremely careful if we pass around a returned value
from kh_get_ or kh_put_ because we don't know if it's deleted or not.
Instead, we should prefer kh_get_(). Very useful demonstration.

> +int update_value_after_deletion_succeeds() {
> +  int ret, value1 = 14, value2 = 15;
> +  khint_t pos1, pos2;
> +  kh_str_t *hashmap = kh_init_str();
> +  // adding the kv to the hashmap
> +  pos1 = kh_put_str(hashmap, "test_key", &ret);
> +  if (!ret)
> +    return 0;
> +  kh_key(hashmap, pos1) = xstrdup("test_key");
> +  kh_value(hashmap, pos1) = &value1;
> +  // delete the kv from the hashmap
> +  kh_del_str(hashmap, pos1);
> +  // adding the same key with different value
> +  pos2 = kh_put_str(hashmap, "test_key", &ret);
> +  if (!ret)
> +    return 0;
> +  kh_key(hashmap, pos2) = xstrdup("test_key");
> +  kh_value(hashmap, pos2) = &value2;
> +  // check if the value is different
> +  return *kh_value(hashmap, kh_get_str(hashmap, "test_key")) == value2;
> +}

Looks reasonable.

> +int hashmap_size_matches_number_of_added_elements() {
> +  int ret, value1 = 14, value2 = 15, value3 = 16;
> +  khint_t pos1, pos2, pos3;
> +  kh_str_t *hashmap = kh_init_str();
> +  pos1 = kh_put_str(hashmap, "test_key1", &ret);
> +  if (!ret)
> +    return 0;
> +  kh_key(hashmap, pos1) = xstrdup("test_key1");
> +  kh_value(hashmap, pos1) = &value1;
> +  pos2 = kh_put_str(hashmap, "test_key2", &ret);
> +  if (!ret)
> +    return 0;
> +  kh_key(hashmap, pos2) = xstrdup("test_key2");
> +  kh_value(hashmap, pos2) = &value2;
> +  pos3 = kh_put_str(hashmap, "test_key3", &ret);
> +  if (!ret)
> +    return 0;
> +  kh_key(hashmap, pos3) = xstrdup("test_key3");
> +  kh_value(hashmap, pos3) = &value3;
> +  return kh_size(hashmap) == 3;
> +}

Looks reasonable.

I would also exercise kh_end, kh_begin, kh_clear, kh_foreach and
kh_foreach_value

kh_destroy might be useful to test too, in case there are oddities
there, but it's probably not that interesting

kh_n_buckets doesn't seem useful to test - it looks too specific to the
impkementation. kh_resize looks like an implementation detail that we
shouldn't test either.

^ permalink raw reply

* Re: [RFC PATCH v2] khash_test.c : add unit tests
From: Glen Choo @ 2023-06-30  1:05 UTC (permalink / raw)
  To: Siddharth Singh, git
In-Reply-To: <20230627132026.3204186-1-siddhartth@google.com>

Some comments on mechanical issues...

It seems that you dropped the other participants from your reply.

  https://git-scm.com/docs/MyFirstContribution#v2-git-send-email

Do CC them as it helps them keep track of discussions they were in. I
think they will not mind if you send a new message to this thread, CC
them, and explain that you didn't mean to omit them.

Siddharth Singh <siddhartth@google.com> writes:

> Signed-off-by: Siddharth Singh <siddhartth@google.com>
> ---
> Since v1
> - rewrote the code keeping coding style guidelines in mind.
> - refactored tests to improve their implementation..
> - added a few more tests that cause collisions in the hashmap.
> In the v1 patch, there were concerns that new tests were unnecessary because of upstream tests. However, I believe it would be beneficial to have tests based on the khash implementation in the git codebase because the existing tests[1] for khash do not use the same code for khash as the git codebase. 
> E.g. in khash.h file of “attractivechaos/klib/khash.h” [2]. There's an implementation for `KHASH_MAP_INIT_INT64` which isn’t defined in “git/git/khash.h”[3]. So, there’s a possibility that the khash.h in a different repository might move forward and end up being different from out implementation, so writing tests for “git/git/khash.h” would ensure that our tests are relevant to the actual usage of the library.
> While my colleagues are currently investigating whether C Tap harness is the best way to test libified code, this RFC is intended to discuss the content of unit tests and what other tests would be useful for khash.h. I am unsure of what tests will be valuable in the future, so I would appreciate your thoughts on the matter. Many test cases are easier to construct in C TAP harness than in test-tool. For example, In C TAP harness, non-string values can be used directly in hash maps. However, in test-tool, non-string values must be passed as an argument to a shell executable, parsed into the correct type, and then operated on.
> I'm also very new to writing unit tests in C and git codebase in general, so I'll appreciate constructive feedback and discussion..

Do rewrap the line to 80 characters, otherwise it is quite hard to read
on most text-based clients. You can preview your patches with "git
send-email --annotate" and see if they are a reasonable column width.

> diff --git a/Makefile b/Makefile
> index 660c72d72e..d3ad2fa23c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -3847,11 +3847,13 @@ $(UNIT_TEST_RUNNER): t/runtests.c
>  
>  UNIT_TEST_PROGS += t/unit-tests/unit-test-t
>  UNIT_TEST_PROGS += t/unit-tests/strbuf-t
> +UNIT_TEST_PROGS += t/unit-tests/khash-t
>  
>  $(UNIT_TEST_PROGS): $(CTAP_OBJS) $(LIBS)
>  	$(QUIET)mkdir -p t/unit-tests
>  	$(QUIET_CC)$(CC) -It -o t/unit-tests/unit-test-t t/unit-test.c $(CTAP_OBJS)
>  	$(QUIET_CC)$(CC) -It -o t/unit-tests/strbuf-t t/strbuf-test.c -DSKIP_COMMON_MAIN common-main.c $(CTAP_OBJS) $(LIBS)

Many reviewers like to apply the patches for review. It's typically
assumed that patches are based on 'master', but this looks like it is
not. That's okay, though you should call it out. If it is based on
something not in Junio's tree (and thus a reviewer can't apply your
patch), it would be appreciated if you share this commit via a Git fork,
e.g. GitHub.

> +/*
> +  These tests are for base assumptions; if they fails, library is broken

Missing *.

> +int hash_string_succeeds() {
> +  const char *str = "test_string";

We use tabs, not spaces. I think "make style" catches this.

> +int initialized_hashmap_pointer_not_null() {
> +  kh_str_t *hashmap = kh_init_str();
> +
> +  if(hashmap != NULL){

"if ", not "if". "make style" will catch this.

> +int update_value_after_deletion_succeeds() {
> +  int ret, value1 = 14, value2 = 15;
> +  khint_t pos1, pos2;
> +  kh_str_t *hashmap = kh_init_str();
> +  // adding the kv to the hashmap

Comments use /* */, not //.

^ permalink raw reply

* Re: SHA256 support not experimental, or?
From: brian m. carlson @ 2023-06-30  1:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Adam Majer, git
In-Reply-To: <xmqqa5wh9adg.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 1191 bytes --]

On 2023-06-29 at 22:22:51, Junio C Hamano wrote:
> True, and our messaging should avoid scaring them away from doing
> so.  But isn't the lack of interoperability one of the reasons why
> GitHub and Gitlab do not yet offer choice of the hash?  There
> certainly is a chicken-and-egg problem here.

There are a lot of necessary changes for a forge to adopt SHA-256.  For
example, at GitHub, we have a single null OID constant in some code that
has to be addressed, libgit2 has to be taught about SHA-256 or removed,
and UI changes need to be done to accommodate the larger IDs.  I'm
sure that GitLab has very similar situations, as do all of the other
forges.  After all, think about the extensive number of patches that
went into Git itself to get us there.  Everyone has made all of those
same assumptions in their forges.

I'm certain that whether or not interoperability were available would
not influence the forges' desire to support SHA-256.  It's simply a lot
of work to fix all of those spots that need it and requires a lot of
communication and discussions across teams, all of which takes time.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply

* Re: Documentation/MyFirstObjectWalk: add #include "trace.h" to use trace_printf()
From: Junio C Hamano @ 2023-06-30  2:21 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Vinayak Dev, git
In-Reply-To: <CAJoAoZnaU7WsCpnZY0Pvjg4_GJMZADF0FNC9fNZK56ShX2JO+g@mail.gmail.com>

Emily Shaffer <nasamuffin@google.com> writes:

> ... I do very much
> like the idea of keeping the reference source in contrib/ as a set of
> patches, maybe along with a script to apply them (or a readme with the
> right `git am` invocation), and then checking that they still build.
> Checking that against the contents of the document is trickier,
> though, like you mentioned. Hm.

An approach to avoid two things (i.e. sample source and the code
snippet in the documentation) going out of sync is not to have two
of them in the first place.  If we give up readability of the MFOW
document in its source form, you may be able to arrange the code
snippet to be incorporated by the build test and documentation at
the same time.

^ permalink raw reply

* Re: Documentation/MyFirstObjectWalk: add #include "trace.h" to use trace_printf()
From: Vinayak Dev @ 2023-06-30  5:48 UTC (permalink / raw)
  To: Emily Shaffer, Junio C Hamano, git
In-Reply-To: <CAJoAoZ=X9hwZZ9eN2X=g04k2E6=wZsY1WEKFydMreNJKM3Mzng@mail.gmail.com>

On Fri, 30 Jun 2023 at 04:39, Emily Shaffer <nasamuffin@google.com> wrote:
> Yeah, I think there are two things to fix:
>
> First, a patch to Documentation/technical/MyFirstObjectWalk.txt fixing
> the snippets there. (I thought that was what you were offering to
> patch in your original mail, I may have been mistaken.)
>
> Second, optionally, a rebased-and-fixed-and-your-attribution-added
> branch of the reference impl that I can force-push to nasamuffin/git.
> The more I think on it, I don't think the PR will help, since I will
> want to just force-push that whole branch so the commit order still
> functions as a learning tool. So if you have it even in a branch on
> your GitHub or GitLab fork, or a series of patches you'd want to mail
> to me, any of those are fine and I'll go ahead and rewrite my branch.

I will be pushing MyFirstObjectWalk's implementation to my Github fork,
but I might need a day's time to do that(I don't want to leave behind
any mistakes).
You are right, a PR surely does not seem to be the best way to do this. As soon
as I finish(shouldn't take too much time), I will reply with a link to
the branch.
Would that be alright?

Thanks a lot!
Vinayak

^ permalink raw reply

* Re: [PATCH] docs: include "trace.h" in MyFirstObjectWalk.txt
From: Vinayak Dev @ 2023-06-30  6:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Elijah Newren, nasamuffin, git
In-Reply-To: <xmqqjzvm7z13.fsf@gitster.g>

On Fri, 30 Jun 2023 at 02:43, Junio C Hamano <gitster@pobox.com> wrote:
>

Hi, thanks for replying!

> All of the above may be good currently, but as soon as somebody does
> another round of header shuffling, we risk a very similar breakage.
> It is a good time to think about ways to avoid that, while the pain
> is fresh in our mind.
>
> One "cop out" thing we can do to limit the damage may be to loosen
> the text in the "NOTE:", so that it does *NOT* mention exact header
> files the API functions appear and let the readers learn from the
> source themselves, with "git grep" helping their way.  Or tone down
> "defined in X" somehow to hint that these details may change.

That is a good suggestion, but wouldn't the same argument apply to
including trace.h itself? It makes it necessary for the code to work
that any API changes must involve changing the included headers.
Either way, I would be happy to fix my mistake. Should I send out a V2?

> More effective things that would involve higher implementation cost
> (but will reduce maintenance cost) would be to actually make sure
> that those who update the API will notice that they are breaking
> these samples when they develop their series.
>
> In https://lore.kernel.org/git/xmqq1qhu9ifp.fsf@gitster.g/, I've
> floated some strawman ideas but people may be able to invent better
> ways.

I do want to be a part of the discussion and the solution, so if I am
able to find a reasonable way(or to implement a solution), I would
inform everyone for sure.

Thanks a lot!
Vinayak

^ permalink raw reply

* Re: [RFC PATCH 0/8] Introduce Git Standard Library
From: Linus Arver @ 2023-06-30  7:01 UTC (permalink / raw)
  To: Calvin Wan, git; +Cc: Calvin Wan, nasamuffin, chooglen, johnathantanmy
In-Reply-To: <20230627195251.1973421-1-calvinwan@google.com>

Hello Calvin,

Calvin Wan <calvinwan@google.com> writes:
> With our current method of building Git, we can imagine the dependency
> graph as such:
>
>         Git
>          /\
>         /  \
>        /    \
>   libgit.a   ext deps
>
> In libifying parts of Git, we want to shrink the dependency graph to
> only the minimal set of dependencies, so libraries should not use
> libgit.a. Instead, it would look like:
>
>                 Git
>                 /\
>                /  \
>               /    \
>           libgit.a  ext deps
>              /\
>             /  \
>            /    \
> object-store.a  (other lib)
>       |        /
>       |       /
>       |      /
>  config.a   / 
>       |    /
>       |   /
>       |  /
> git-std-lib.a
>
> Instead of containing all of the objects in Git, libgit.a would contain
> objects that are not built by libraries it links against. Consequently,
> if someone wanted their own custom build of Git with their own custom
> implementation of the object store, they would only have to swap out
> object-store.a rather than do a hard fork of Git.

What about the case where someone wants to build program Foo which just
pulls in some bits of Git? For example, I am thinking of trailer.[ch]
which could be refactored to expose a public API. Then the Foo program
could pull this public trailer manipulation API in as a library
dependency (so that Foo can parse trailers in commit messages without
re-implementing that logic in Foo's own codebase). With the proposed Git
Standard Library (GSL) model above, would my Foo program also have to
pull in GSL? If so, isn't this onerous because of the additional bloat?
The Foo developers just want the banana, not the gorilla holding the
banana in the jungle, so to speak.

> Rationale behind Git Standard Library
> ================
>
> The rationale behind Git Standard Library essentially is the result of
> two observations within the Git codebase: every file includes
> git-compat-util.h which defines functions in a couple of different
> files, and wrapper.c + usage.c have difficult-to-separate circular
> dependencies with each other and other files.
>
> Ubiquity of git-compat-util.h and circular dependencies
> ========
>
> Every file in the Git codebase includes git-compat-util.h. It serves as
> "a compatibility aid that isolates the knowledge of platform specific
> inclusion order and what feature macros to define before including which
> system header" (Junio[5]). Since every file includes git-compat-util.h, and
> git-compat-util.h includes wrapper.h and usage.h, it would make sense
> for wrapper.c and usage.c to be a part of the root library. They have
> difficult to separate circular dependencies with each other so they

s/difficult to separate/difficult-to-separate

> can't be independent libraries. Wrapper.c has dependencies on parse.c,
> abspath.c, strbuf.c, which in turn also have dependencies on usage.c and
> wrapper.c -- more circular dependencies. 
>
> Tradeoff between swappability and refactoring
> ========
>
> From the above dependency graph, we can see that git-std-lib.a could be
> many smaller libraries rather than a singular library. So why choose a
> singular library when multiple libraries can be individually easier to
> swap and are more modular? A singular library requires less work to
> separate out circular dependencies within itself so it becomes a
> tradeoff question between work and reward. While there may be a point in
> the future where a file like usage.c would want its own library so that
> someone can have custom die() or error(), the work required to refactor
> out the circular dependencies in some files would be enormous due to
> their ubiquity so therefore I believe it is not worth the tradeoff
> currently. Additionally, we can in the future choose to do this refactor
> and change the API for the library if there becomes enough of a reason
> to do so (remember we are avoiding promising stability of the interfaces
> of those libraries).

Would getting us down the currently proposed path make it even more
difficult to do this refactor? If so, I think it's worth mentioning.

> Reuse of compatibility functions in git-compat-util.h
> ========
>
> Most functions defined in git-compat-util.h are implemented in compat/
> and have dependencies limited to strbuf.h and wrapper.h so they can be
> easily included in git-std-lib.a, which as a root dependency means that
> higher level libraries do not have to worry about compatibility files in
> compat/. The rest of the functions defined in git-compat-util.h are
> implemented in top level files and, in this patch set, are hidden behind
> an #ifdef if their implementation is not in git-std-lib.a.
>
> Rationale summary
> ========
>
> The Git Standard Library allows us to get the libification ball rolling
> with other libraries in Git (such as Glen's removal of global state from
> config iteration[6] prepares a config library). By not spending many
> more months attempting to refactor difficult circular dependencies and
> instead spending that time getting to a state where we can test out
> swapping a library out such as config or object store, we can prove the
> viability of Git libification on a much faster time scale. Additionally
> the code cleanups that have happened so far have been minor and
> beneficial for the codebase. It is probable that making large movements
> would negatively affect code clarity.

It sounds like the circular dependencies are so difficult to untangle that they
are the primary motivation behind grouping these tightly-coupled libraries
together into the Git Standard Library (GSL) banner. Still, I think it would
help reviewers if you explain what tradeoffs we are making by accepting the
circular dependencies as they are instead of untangling them. Conversely, if we
assume that there are no circular dependencies, what kind of benefits do we get
when designing the GSL from this (improved) position? Would there be little to
no additional benefits? If so, then I think it would be easier to support the
current approach (as removing the circularities would not give us significant
advantages for libification).

> Git Standard Library boundary
> ================
>
> While I have described above some useful heuristics for identifying
> potential candidates for git-std-lib.a, a standard library should not
> have a shaky definition for what belongs in it.
>
>  - Low-level files (aka operates only on other primitive types) that are
>    used everywhere within the codebase (wrapper.c, usage.c, strbuf.c)
>    - Dependencies that are low-level and widely used
>      (abspath.c, date.c, hex-ll.c, parse.c, utf8.c)
>  - low-level git/* files with functions defined in git-compat-util.h
>    (ctype.c)
>  - compat/*

I'm confused. Is the list above an example of a shaky definition, or the
opposite? IOW, do you mean that the list above should be the initial set
of content to include in the GSL? Or _not_ to include?

> Series structure
> ================
>
> While my strbuf and git-compat-util series can stand alone, they also
> function as preparatory patches for this series. There are more cleanup
> patches in this series, but since most of them have marginal benefits
> probably not worth the churn on its own, I decided not to split them
> into a separate series like with strbuf and git-compat-util. As an RFC,
> I am looking for comments on whether the rationale behind git-std-lib
> makes sense as well as whether there are better ways to build and enable
> git-std-lib in patch 7, specifically regarding Makefile rules and the
> usage of ifdef's to stub out certain functions and headers. 

If the cleanups are independent I think it would be simpler to put them
in a separate series.

In general, I think the doc would make a stronger case if it expanded
the discussions around alternative approaches to the one proposed, with
the reasons why they were rejected.

Minor nits:
- Documentation/technical/git-std-lib.txt: (style) prefer "we" over "I" ("we
  believe" instead of "I believe").
- There are some "\ No newline at end of file" warnings in this series.

Thanks,
Linus

^ permalink raw reply

* Re: SHA256 support not experimental, or?
From: Patrick Steinhardt @ 2023-06-30  9:31 UTC (permalink / raw)
  To: brian m. carlson, Junio C Hamano, Adam Majer, git
In-Reply-To: <ZJ4uKYIZMxi3DHo3@tapette.crustytoothpaste.net>

[-- Attachment #1: Type: text/plain, Size: 2516 bytes --]

On Fri, Jun 30, 2023 at 01:21:45AM +0000, brian m. carlson wrote:
> On 2023-06-29 at 22:22:51, Junio C Hamano wrote:
> > True, and our messaging should avoid scaring them away from doing
> > so.  But isn't the lack of interoperability one of the reasons why
> > GitHub and Gitlab do not yet offer choice of the hash?  There
> > certainly is a chicken-and-egg problem here.
> 
> There are a lot of necessary changes for a forge to adopt SHA-256.  For
> example, at GitHub, we have a single null OID constant in some code that
> has to be addressed, libgit2 has to be taught about SHA-256 or removed,
> and UI changes need to be done to accommodate the larger IDs.  I'm
> sure that GitLab has very similar situations, as do all of the other
> forges.  After all, think about the extensive number of patches that
> went into Git itself to get us there.  Everyone has made all of those
> same assumptions in their forges.

Indeed, supporting SHA256 is a major effort on our side at GitLab. Most
of the work isn't really adapting our production code, but it's rather
that tons of tests were written with seed repositories and hardcoded
object hashes. Converting all of that isn't all that hard in the general
case, but it's a tedious job.

In the Gitaly team we have already started to put significant time into
this problem and are slowly chipping away at it. We are at a state where
most of our codebase works with SHA256 alright, and we in fact continue
down that road as a low-priority side project where we convert a handful
of tests every release.

> I'm certain that whether or not interoperability were available would
> not influence the forges' desire to support SHA-256.  It's simply a lot
> of work to fix all of those spots that need it and requires a lot of
> communication and discussions across teams, all of which takes time.

True as well. Even though Gitaly will likely be SHA256-ready in the not
too distant future, that doesn't mean that GitLab as a whole is. The
frontend will need investments as well, and there's likely a long tail
of other stuff that needs to be done that I ain't yet got on my radar
right now.

In any case I'm fully supportive of relaxing the current warning. Except
for the recently discussed edge case where cloning empty repositories
didn't create a SHA256 repository I have found the SHA256 code to be
stable and working as advertised. We should caution people that many
services will not work with SHA256 yet though.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [RFC PATCH 1/2] Add C TAP harness
From: Phillip Wood @ 2023-06-30  9:48 UTC (permalink / raw)
  To: Oswald Buddenhagen, phillip.wood
  Cc: Linus Arver, git, Ævar Arnfjörð Bjarmason,
	Josh Steadmon, Calvin Wan
In-Reply-To: <ZJ0cIKrSVmwMy6F4@ugly>

On 29/06/2023 06:52, Oswald Buddenhagen wrote:
> On Mon, Jun 26, 2023 at 02:15:39PM +0100, Phillip Wood wrote:
>> On 21/06/2023 16:57, Linus Arver wrote:
>>> - Make the 'TEST' macro accept the test description first. Or, keep the
>>>    'TEST' macro but also name a new macro 'IT' that accepts the
>>>    description first, to encourage usage that reads in a
>>>    behavior-driven-development (BDD) style, like 'IT("should accept 
>>> foo",
>>>    t_bar(...))'. I find some test descriptions easier to write this way.
>>
>> The test description is a printf style format string followed by 
>> arguments. This allows parameterized tests to include the parameter 
>> values in the description to aid debugging but it means the test 
>> function must be the first parameter. We could have IT("should accept 
>> %d", t(), i) but that would be a bit weird.
>>
> with some minor preprocessor magic [1], you could make that
> 
>    IT(("should accept %d", i), t(i))
> 
> which would be somewhat more noisy, but arguably even somewhat clearer.
> notably,
> 
>    IT("should accept foo", t())
> 
> would still work with the same macro.
> 
> [1] https://stackoverflow.com/a/62984543/3685191

Thanks, I'd not come across that trick before. As you say it is a it 
noisy though.

> somewhat on a tangent: it's also possible to overload macros on argument 
> count [2], which may also come in handy.
> 
> [2] https://stackoverflow.com/a/24028231/3685191

When I was writing my original reply to Linus I did wonder if we could 
count the arguments. I didn't pursue it as I don't really want to create 
a dozen different macros for different argument counts. I think TEST() 
is understandable by anyone reading the code whereas IT() seems a bit 
odd unless one is used to BDD.

Best Wishes

Phillip


^ permalink raw reply

* Re: [RFC PATCH 7/8] git-std-lib: introduce git standard library
From: Phillip Wood @ 2023-06-30 10:00 UTC (permalink / raw)
  To: Calvin Wan, phillip.wood; +Cc: git, nasamuffin, chooglen, Jonathan Tan
In-Reply-To: <CAFySSZBMng9nEdCkuT5+fc6rfFgaFfU2E0NP3=jUQC1yRcUE6Q@mail.gmail.com>

Hi Calvin

On 28/06/2023 22:15, Calvin Wan wrote:
>> On 27/06/2023 20:52, Calvin Wan wrote:
>>> The Git Standard Library intends to serve as the foundational library
>>> and root dependency that other libraries in Git will be built off of.
>>> That is to say, suppose we have libraries X and Y; a user that wants to
>>> use X and Y would need to include X, Y, and this Git Standard Library.
>>
>> I think having a library of commonly used functions and structures is a
>> good idea. While I appreciate that we don't want to include everything
>> I'm surprised to see it does not include things like "hashmap.c" and
>> "string-list.c" that will be required by the config library as well as
>> other code in "libgit.a". I don't think we want "libgitconfig.a" and
>> "libgit.a" to both contain a copy of "hashmap.o" and "string-list.o"
> 
> I chose not to include hashmap and string-list in git-std-lib.a in the
> first pass since they can exist as libraries built on top of
> git-std-lib.a. There is no harm starting off with more libraries than
> fewer besides having something like the config library be dependent on
> lib-hashmap.a, lib-string-list.a, and git-std-lib.a rather than only
> git-std-lib.a. They can always be added into git-std-lib.a in the
> future. That being said, I do find it extremely unlikely that someone
> would want to swap out the implementation for hashmap or string-list
> so it is also very reasonable to include them into git-std-lib.a

Finding the right boundary for git-std-lib is a bit of a judgement call. 
We certainly could have separate libraries for things like hashmap, 
string-list, strvec, strmap and wildmatch but there is some overhead 
adding each one to the Makefile. I think their use is common enough that 
it would be continent to have them in git-std-lib but we can always add 
them later.

>>> diff --git a/Makefile b/Makefile
>>> index e9ad9f9ef1..255bd10b82 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -2162,6 +2162,11 @@ ifdef FSMONITOR_OS_SETTINGS
>>>        COMPAT_OBJS += compat/fsmonitor/fsm-path-utils-$(FSMONITOR_OS_SETTINGS).o
>>>    endif
>>>
>>> +ifdef GIT_STD_LIB
>>> +     BASIC_CFLAGS += -DGIT_STD_LIB
>>> +     BASIC_CFLAGS += -DNO_GETTEXT
>>
>> I can see other projects may want to build git-std-lib without gettext
>> support but if we're going to use git-std-lib within git it needs to be
>> able to be built with that support. The same goes for the trace
>> functions that you are redefining in usage.h
> 
> Taking a closer look at gettext.[ch], I believe I can also include it
> into git-std-lib.a with a couple of minor changes.

That's great

> I'm currently
> thinking about how the trace functions should interact with
> git-std-lib.a since Victoria had similar comments on patch 1. I'll
> reply to that thread when I come up with an answer.

One thought I had was to have a compile time flag so someone building 
git-std-lib for an external project could build it with

	make git-std-lib NO_TRACE2=YesPlease

and then we'd either compile against a stub version of trace2 that does 
nothing or use some #define magic to get rid of the calls if that is not 
too invasive.

Best Wishes

Phillip


^ permalink raw reply

* Re: SHA256 support not experimental, or?
From: Adam Majer @ 2023-06-30 11:25 UTC (permalink / raw)
  To: Patrick Steinhardt, brian m. carlson, Junio C Hamano, git
In-Reply-To: <vt3cizczmwbcpgktwrkr3jbiwhee37rt7m243hnkzxik7gt4m2@d2upsqoxtlgc>

On 6/30/23 11:31, Patrick Steinhardt wrote:
> Indeed, supporting SHA256 is a major effort on our side at GitLab. Most
> of the work isn't really adapting our production code, but it's rather
> that tons of tests were written with seed repositories and hardcoded
> object hashes. Converting all of that isn't all that hard in the general
> case, but it's a tedious job.

Hi!

This actually reminds me of a funny story from my side.

Earlier this year, I was testing various frontends and how they would 
handle SHA256 repositories. All of them failed, not surprising. I even 
managed to lock myself out of Gitlab by importing a SHA256 private repo 
into my home project -- every time this project became visible, it would 
result in Error 500 from the UI. Today (few weeks ago), this appears to 
be fixed -- the UI is just broken, so you can't see anything in sha256 
repository, but at least I was able to delete the project.

The repository was correctly imported and I could clone from gitlab, so 
the problem is mostly "just" UI. :-)

The most likely frontend we'll use for our internal project is Gitea. 
The sha256 support is in progress

https://github.com/go-gitea/gitea/pull/23894

 From the size of this patch, you can see how ingrained SHA1 assumption 
was. Most of the patch is just to remove the hardcoded elements, 
including hardcoded SHA1 empty-tree hashes and assumption that 20 bytes 
is enough to hold a hash. And I didn't even add sha256 test cases...

But I have to say that in at least one occasion, people are bringing up 
the experimental nature of git's sha256 support (per current wording) as 
reason not to make their tools sha256 compliant.

> In any case I'm fully supportive of relaxing the current warning. Except
> for the recently discussed edge case where cloning empty repositories
> didn't create a SHA256 repository I have found the SHA256 code to be
> stable and working as advertised. We should caution people that many
> services will not work with SHA256 yet though.

That is exactly true. But this is also chicken-egg problem. Services are 
not adapted for sha256 repositories because there is simply no demand 
for them. Only when people will start using sha256 repos, will there be 
some demand generated.

- Adam

^ permalink raw reply

* Re: SHA256 support not experimental, or?
From: Patrick Steinhardt @ 2023-06-30 11:38 UTC (permalink / raw)
  To: Adam Majer; +Cc: brian m. carlson, Junio C Hamano, git
In-Reply-To: <5880fe56-aa98-64ce-4d91-ca078d3a7354@zombino.com>

[-- Attachment #1: Type: text/plain, Size: 3981 bytes --]

On Fri, Jun 30, 2023 at 01:25:06PM +0200, Adam Majer wrote:
> On 6/30/23 11:31, Patrick Steinhardt wrote:
> > Indeed, supporting SHA256 is a major effort on our side at GitLab. Most
> > of the work isn't really adapting our production code, but it's rather
> > that tons of tests were written with seed repositories and hardcoded
> > object hashes. Converting all of that isn't all that hard in the general
> > case, but it's a tedious job.
> 
> Hi!
> 
> This actually reminds me of a funny story from my side.
> 
> Earlier this year, I was testing various frontends and how they would handle
> SHA256 repositories. All of them failed, not surprising. I even managed to
> lock myself out of Gitlab by importing a SHA256 private repo into my home
> project -- every time this project became visible, it would result in Error
> 500 from the UI. Today (few weeks ago), this appears to be fixed -- the UI
> is just broken, so you can't see anything in sha256 repository, but at least
> I was able to delete the project.

Yeah, thinks gradually start to work. It's kind of satisfying to see how
more and more things start to fall into place.

> The repository was correctly imported and I could clone from gitlab, so the
> problem is mostly "just" UI. :-)

The UI is a significantly broken right now, mostly because the request
routing logic still has a maximum object ID length of 40 characters
hardcoded. So indeed, most of the stuff in the UI doesn't work unless
you do a few changes in the frontend. I should probably just create the
merge request to fix these as I already have those changes available
locally anyway.

But there's other parts that are in the Gitaly backend that don't yet
work. There's some RPCs that parse object IDs, but still use the
hardcoded SHA1 hash. Updating them is trivial, but as mentioned updating
their tests is tedious work.

> The most likely frontend we'll use for our internal project is Gitea. The
> sha256 support is in progress
> 
> https://github.com/go-gitea/gitea/pull/23894
> 
> From the size of this patch, you can see how ingrained SHA1 assumption was.
> Most of the patch is just to remove the hardcoded elements, including
> hardcoded SHA1 empty-tree hashes and assumption that 20 bytes is enough to
> hold a hash. And I didn't even add sha256 test cases...

I guess most projects that started a long time a go made the same error
of taking SHA1 for granted, so they didn't bother writing neither the
production code nor the tests with swapping out the object format in
mind. I guess we've learned our lesson here, which also means that the
next transition (if there ever will be one) should go a lot faster as
the codebases should be prepared then.

> But I have to say that in at least one occasion, people are bringing up the
> experimental nature of git's sha256 support (per current wording) as reason
> not to make their tools sha256 compliant.

Yeah, it's this chicken-and-egg problem. Things are experimental as most
tools ain't got support, but because most things ain't got support we
never get any testers and thus are stuck in that state.

> > In any case I'm fully supportive of relaxing the current warning. Except
> > for the recently discussed edge case where cloning empty repositories
> > didn't create a SHA256 repository I have found the SHA256 code to be
> > stable and working as advertised. We should caution people that many
> > services will not work with SHA256 yet though.
> 
> That is exactly true. But this is also chicken-egg problem. Services are not
> adapted for sha256 repositories because there is simply no demand for them.
> Only when people will start using sha256 repos, will there be some demand
> generated.

Yup, and that is why I have been pushing for SHA256 support internally
at GitLab for quite a while -- our efforts here started almost exactly a
year ago, but has gained more steam in recent months.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: SHA256 support not experimental, or?
From: Son Luong Ngoc @ 2023-06-30 12:20 UTC (permalink / raw)
  To: Adam Majer; +Cc: Patrick Steinhardt, brian m. carlson, Junio C Hamano, git
In-Reply-To: <5880fe56-aa98-64ce-4d91-ca078d3a7354@zombino.com>

Hi,

> On 30 Jun 2023, at 13:25, Adam Majer <adamm@zombino.com> wrote:...
> On 6/30/23 11:31, Patrick Steinhardt wrote:
...
> > In any case I'm fully supportive of relaxing the current warning. Except
> > for the recently discussed edge case where cloning empty repositories
> > didn't create a SHA256 repository I have found the SHA256 code to be
> > stable and working as advertised. We should caution people that many
> > services will not work with SHA256 yet though.
>
> That is exactly true. But this is also chicken-egg problem. Services are not adapted for sha256 repositories because there is simply no demand for them. Only when people will start using sha256 repos, will there be some demand generated.

FWIW, in the Bazel ecosystem where SHA256 is very popular, there has
been an increasing appetite for FUSE file system to lazily fetch contents
of a git repository.

Build tools such as Bazel would often need to hash the content of the
source files to build a dependency graph.  And in a FUSE setup, it would
be ideal if the FUSE server could supply the hash via an xattr, so that
FUSE client does not need to fetch the whole file content and only the
metadata.

Most tools in this space (Bazel, Buck2) are using SHA256 and are exploring
faster hash such as Blake3, Aegis, KangarooTwelve for larger file
support.  As these matured build tools gains popularity, so will the usage
of SHA256 (and newer hash algorithm).

Another point I think might help motivate different forges to
move would be switching from the object's hash to digest (hash and
file size).  The additional file size information would help tremendously
in predicting compute resources when serving files of a repository.

So I think Git would simply need a bit more time for these related
ecosystems to reach a critical mass and help fuel the transition to a
<new-hasher>.

> - Adam

Regards,
Son Luong.

References:

- https://buck2.build/docs/rfcs/drafts/digest-kinds/#use-cases
- https://github.com/bazelbuild/bazel/pull/18784

^ permalink raw reply

* Re: [RFC PATCH v3 1/1] unit tests: Add a project plan document
From: Phillip Wood @ 2023-06-30 14:07 UTC (permalink / raw)
  To: Josh Steadmon, git
  Cc: calvinwan, szeder.dev, chooglen, avarab, gitster, sandals
In-Reply-To: <8afdb215d7e10ca16a2ce8226b4127b3d8a2d971.1686352386.git.steadmon@google.com>

Hi Josh

Thanks for putting this together, I think it is really helpful to have a 
comparison of the various options. Sorry for the slow reply, I was off 
the list for a couple of weeks.

On 10/06/2023 00:25, Josh Steadmon wrote:
> diff --git a/Documentation/technical/unit-tests.txt b/Documentation/technical/unit-tests.txt
> new file mode 100644
> index 0000000000..dac8062a43
> --- /dev/null
> +++ b/Documentation/technical/unit-tests.txt
> @@ -0,0 +1,141 @@
> += Unit Testing

I've deleted the sections I agree with to avoid quoting parts that are 
not relevant to my comments.

> +== Definitions
> +
> +For the purposes of this document, we'll use *test framework* to refer to
> +projects that support writing test cases and running tests within the context
> +of a single executable. *Test harness* will refer to projects that manage
> +running multiple executables (each of which may contain multiple test cases) and
> +aggregating their results.

Thanks for adding this, it is really helpful to have definitions for 
what we mean by "test framework" and "test harness" within the git 
project. It might be worth mentioning somewhere that we already use 
prove as a test harness when running our integration tests.

> +In reality, these terms are not strictly defined, and many of the projects
> +discussed below contain features from both categories.

> +
> +== Choosing a framework & harness
> +
> +=== Desired features
> +
> [...]
> +==== Parallel execution
> +
> +Ideally, we will build up a significant collection of unit tests cases, most
> +likely split across multiple executables. It will be necessary to run these
> +tests in parallel to enable fast develop-test-debug cycles.

This is a good point, though I think it is really a property of the 
harness rather than the framework so we might want to indicate in the 
table whether a framework provides parallelism itself or relies on the 
harness providing it.

 > [...]
> +==== Major platform support
> +
> +At a bare minimum, unit-testing must work on Linux, MacOS, and Windows.

I think we'd want to be able to run unit tests on *BSD and NonStop as 
well, especially as I think some of the platform dependent code probably 
lends itself to being unit tested. I suspect a framework that covers 
Linux and MacOS would probably run on those platforms as well (I don't 
think NonStop has complete POSIX support but it is hard to imagine a 
test framework doing anything very exotic)

> [...]
> +==== Mock support
> +
> +Unit test authors may wish to test code that interacts with objects that may be
> +inconvenient to handle in a test (e.g. interacting with a network service).
> +Mocking allows test authors to provide a fake implementation of these objects
> +for more convenient tests.

Do we have any idea what sort of thing we're likely to want to mock and 
what we want that support to look like?

> +==== Signal & exception handling
> +
> +The test framework must fail gracefully when test cases are themselves buggy or
> +when they are interrupted by signals during runtime.

I had assumed that it would be enough for the test harness to detect if 
a test executable was killed by a signal or exited early due to a bug in 
the test script. That requires the framework to have robust support for 
lazy test plans but I'm not sure that we need it to catch and recover 
from things like SIGSEGV.

> +==== Coverage reports
> +
> +It may be convenient to generate coverage reports when running unit tests
> +(although it may be possible to accomplish this regardless of test framework /
> +harness support).

I agree this would be useful, though perhaps we should build it on our 
existing gcov usage.

Related to this do we want timing reports from the harness or the framework?

> +
> +=== Comparison
> +
> +[format="csv",options="header",width="75%"]
> +|=====
> +Framework,"TAP support","Diagnostic output","Parallel execution","Vendorable / ubiquitous","Maintainable / extensible","Major platform support","Lazy test planning","Runtime- skippable tests","Scheduling / re-running",Mocks,"Signal & exception handling","Coverage reports"
> +https://lore.kernel.org/git/c902a166-98ce-afba-93f2-ea6027557176@gmail.com/[Custom Git impl.],[lime-background]#True#,[lime-background]#True#,?,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,?,?,[red-background]#False#,?,?
> +https://cmocka.org/[cmocka],[lime-background]#True#,[lime-background]#True#,?,[red-background]#False#,[yellow-background]#Partial#,[yellow-background]#Partial#,[yellow-background]#Partial#,?,?,[lime-background]#True#,?,?
> +https://libcheck.github.io/check/[Check],[lime-background]#True#,[lime-background]#True#,?,[red-background]#False#,[yellow-background]#Partial#,[lime-background]#True#,[yellow-background]#Partial#,?,?,[red-background]#False#,?,?
> +https://github.com/rra/c-tap-harness/[C TAP],[lime-background]#True#,[red-background]#False#,?,[lime-background]#True#,[yellow-background]#Partial#,[yellow-background]#Partial#,[yellow-background]#Partial#,?,?,[red-background]#False#,?,?
> +https://github.com/silentbicycle/greatest[Greatest],[yellow-background]#Partial#,?,?,[lime-background]#True#,[yellow-background]#Partial#,?,[yellow-background]#Partial#,?,?,[red-background]#False#,?,?
> +https://github.com/Snaipe/Criterion[Criterion],[lime-background]#True#,?,?,[red-background]#False#,?,[lime-background]#True#,?,?,?,[red-background]#False#,?,?
> +https://github.com/zorgnax/libtap[libtap],[lime-background]#True#,?,?,?,?,?,?,?,?,?,?,?
> +https://nemequ.github.io/munit/[µnit],?,?,?,?,?,?,?,?,?,?,?,?
> +https://github.com/google/cmockery[cmockery],?,?,?,?,?,?,?,?,?,[lime-background]#True#,?,?
> +https://github.com/lpabon/cmockery2[cmockery2],?,?,?,?,?,?,?,?,?,[lime-background]#True#,?,?
> +https://github.com/ThrowTheSwitch/Unity[Unity],?,?,?,?,?,?,?,?,?,?,?,?
> +https://github.com/siu/minunit[minunit],?,?,?,?,?,?,?,?,?,?,?,?
> +https://cunit.sourceforge.net/[CUnit],?,?,?,?,?,?,?,?,?,?,?,?
> +https://www.kindahl.net/mytap/doc/index.html[MyTAP],[lime-background]#True#,?,?,?,?,?,?,?,?,?,?,?
> +|=====

Thanks for going through these projects, hopefully we can use this 
information to make a decision on a framework soon.

Best Wishes

Phillip

^ permalink raw reply

* Re: "git commit -m" bug
From: Konstantin Ryabitsev @ 2023-06-30 14:17 UTC (permalink / raw)
  To: Ruben Ticehurst-James; +Cc: git, git-security
In-Reply-To: <5C7CF0D9-4C6F-4207-BA4B-8AC9B472BD75@apple.com>

On Fri, Jun 30, 2023 at 02:30:55PM +0100, 'Ruben Ticehurst-James' via Git Security wrote:
> Hello I am reporting a (potential) bug in git: 
> 
> when I use triple exclamation marks in a commit message (for an example):
> 
> git commit -m “WORKING!!! WOOOO” (command above was git add .)
> 
> or 
> 
> git commit -m "Checking !!! Git” (command above was ls)
> 
> It will instead copy over the last command I used. The two above commands produce this output: 

This is done by your shell (bash). For example, try this:

echo "hello"
echo "hello!!"

-K

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox