git.vger.kernel.org archive mirror
* Pretty output in JSON format
@ 2024-09-24 21:52 Ron Ziroby Romero
  2024-09-24 22:06 ` brian m. carlson
  0 siblings, 1 reply; 8+ messages in thread
From: Ron Ziroby Romero @ 2024-09-24 21:52 UTC (permalink / raw)
  To: git

Howdy git folk,

I want to revive the discussion on JSON output. I see a discussion in
2021 about it, but it didn't come to a resolution. That discussion was
talking about adding a --json flag. I have a slightly different
approach.

I see online that many people have tried to make various hacks to
convert git output into JSON, but they all lack completeness,
especially with log messages with arbitrary text. I believe the best
and most correct way to get JSON output from git is to add it as a new
format to the pretty option. Then, it would be easy to pipe the output
into something like jq to parse the JSON.  Trying to convert git's
output into JSON is a losing proposition. You've already lost some of
the context of the output by getting it out of the git program itself.
A pretty option would provide a standard way to get correct JSON
output, with git's code handling the weird corner cases.

What do y'all think?

Cheers,
Ron Ziroby Romero

* Re: Pretty output in JSON format
  2024-09-24 21:52 Pretty output in JSON format Ron Ziroby Romero
@ 2024-09-24 22:06 ` brian m. carlson
  2024-09-25 18:45   ` Sean Allred
  0 siblings, 1 reply; 8+ messages in thread
From: brian m. carlson @ 2024-09-24 22:06 UTC (permalink / raw)
  To: Ron Ziroby Romero; +Cc: git

On 2024-09-24 at 21:52:35, Ron Ziroby Romero wrote:
> Howdy git folk,
> 
> I want to revive the discussion on JSON output. I see a discussion in
> 2021 about it, but it didn't come to a resolution. That discussion was
> talking about adding a --json flag. I have a slightly different
> approach.
> 
> I see online that many people have tried to make various hacks to
> convert git output into JSON, but they all lack completeness,
> especially with log messages with arbitrary text. I believe the best
> and most correct way to get JSON output from git is to add it as a new
> format to the pretty option. Then, it would be easy to pipe the output
> into something like jq to parse the JSON.  Trying to convert git's
> output into JSON is a losing proposition. You've already lost some of
> the context of the output by getting it out of the git program itself.
> A pretty option would provide a standard way to get correct JSON
> output, with git's code handling the weird corner cases.
> 
> What do y'all think?

I think this is ultimately a bad idea.  JSON requires that the output be
UTF-8, but Git processes a large amount of data, including file names,
ref names, commit messages, author and committer identities, diff
output, and other file contents, that are not restricted to UTF-8.  In
fact, despite my recommendation, the trace2 JSON output simply outputs
invalid UTF-8, which just doesn't work in nearly any tool, if it
encounters such data.  We shouldn't add more broken-by-default
functionality.
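
The failure mode described above can be sketched in a few lines of
Python. This is only an illustration, not anything Git does: the
Latin-1 author name and the helper to_json_strict() are hypothetical.
It shows that such bytes can neither be decoded as UTF-8 for a JSON
string nor be passed through unchanged without making the document
unparseable.

```python
import json

# Hypothetical author name stored as Latin-1 bytes rather than UTF-8.
raw_author = "José".encode("latin-1")  # b'Jos\xe9'


def to_json_strict(raw: bytes) -> str:
    """Serialize the field as JSON; raises if the bytes are not UTF-8."""
    return json.dumps({"author": raw.decode("utf-8")})


# Option 1: decode strictly. This fails on the non-UTF-8 byte.
try:
    to_json_strict(raw_author)
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Option 2: write the bytes through unchanged. The result is not valid
# UTF-8, so a conforming parser rejects the whole document.
passthrough = b'{"author": "' + raw_author + b'"}'
try:
    json.loads(passthrough)
    passthrough_ok = True
except ValueError:  # UnicodeDecodeError is a subclass of ValueError
    passthrough_ok = False

print(strict_ok, passthrough_ok)
```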

However, if you were interested in CBOR output, which isn't
human-readable but is capable of handling byte strings, then I don't see
a problem.  CBOR is used in FIDO2 and a variety of other protocols and
is interoperable, so it should be a fine choice here.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA


* Re: Pretty output in JSON format
  2024-09-24 22:06 ` brian m. carlson
@ 2024-09-25 18:45   ` Sean Allred
  2024-09-26 21:04     ` brian m. carlson
  0 siblings, 1 reply; 8+ messages in thread
From: Sean Allred @ 2024-09-25 18:45 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Ron Ziroby Romero, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2024-09-24 at 21:52:35, Ron Ziroby Romero wrote:
>> What do y'all think?
>
> I think this is ultimately a bad idea.  JSON requires that the output be
> UTF-8, but Git processes a large amount of data, including file names,
> ref names, commit messages, author and committer identities, diff
> output, and other file contents, that are not restricted to UTF-8.

This strikes me with a little bit of 'perfect as the enemy of good'
here. I'm sure there are ways to signal an encoding failure. I would,
however, caution against trying to provide diff output in JSON. That
just seems... odd. Maybe base64 it first? (I don't know -- I just
struggle to see the use-case here.)

> However, if you were interested in CBOR output, which isn't
> human-readable but is capable of handling byte strings, then I don't
> see a problem. CBOR is used in FIDO2 and a variety of other protocols
> and is interoperable, so it should be a fine choice here.

CBOR would certainly solve the byte stream problem, but I think it would
primarily be only useful for 'serious' toolsmiths that need to handle
wildly unpredictable data. For most uses, JSON would get the job done.

>> What do y'all think?

As with all things, I'd suggest you draw up a more formal proposal of
exactly how this would work, and then that proposal can be discussed.
How would you use this option? What would its behavior be? What's in
scope? What's _not_ in scope? :-)

-- 
Sean Allred

* Re: Pretty output in JSON format
  2024-09-25 18:45   ` Sean Allred
@ 2024-09-26 21:04     ` brian m. carlson
  2024-09-27  6:49       ` Ron Ziroby Romero
       [not found]       ` <CANgJU+Xs-sQgAOCPL-5skaZGq7eHmhg0MaFGDr8N57=CK67iog@mail.gmail.com>
  0 siblings, 2 replies; 8+ messages in thread
From: brian m. carlson @ 2024-09-26 21:04 UTC (permalink / raw)
  To: Sean Allred; +Cc: Ron Ziroby Romero, git

On 2024-09-25 at 18:45:54, Sean Allred wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > On 2024-09-24 at 21:52:35, Ron Ziroby Romero wrote:
> >> What do y'all think?
> >
> > I think this is ultimately a bad idea.  JSON requires that the output be
> > UTF-8, but Git processes a large amount of data, including file names,
> > ref names, commit messages, author and committer identities, diff
> > output, and other file contents, that are not restricted to UTF-8.
> 
> This strikes me with a little bit of 'perfect as the enemy of good'
> here. I'm sure there are ways to signal an encoding failure. I would,
> however, caution against trying to provide diff output in JSON. That
> just seems... odd. Maybe base64 it first? (I don't know -- I just
> struggle to see the use-case here.)

I understand JSON output would be useful, but it's also not useful to
randomly fail to do git for-each-ref (for example) because someone has a
non-UTF-8 ref, or to fail to do a git log because of encoding problems
(which absolutely is a problem in the Linux kernel tree).  "It works
most of the time, but seemingly randomly fails" is not a good user
experience, and I'm opposed to adding serialization formats that do
that.  (For that reason, just-send-bytes that produces invalid JSON on
occasion is also unacceptable.)

If we always base64-encoded or percent-encoded the things that aren't
guaranteed to be UTF-8, then we could well create JSON.  However, that
makes working with the data structure in most scripting languages a pain
since there's no automatic decoding of this data.  In strongly typed
languages like Rust, it's possible to do this decoding with no problem,
but I expect that's not most users who'd want this feature.
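
A minimal Python sketch of the percent-encoding variant, to make the
trade-off concrete. The field name "ref" and the helper encode_field()
are assumptions for illustration only. The document is always valid
JSON, but the consumer has to know which fields are encoded and undo
the encoding by hand.

```python
import json
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Hypothetical ref name whose last byte is Latin-1, not UTF-8.
raw_ref = b"refs/heads/caf\xe9"


def encode_field(raw: bytes) -> str:
    """Percent-encode a byte string so it is always plain ASCII text."""
    return quote_from_bytes(raw, safe="/")


# Producer side: every byte-valued field is encoded before serializing.
encoded = encode_field(raw_ref)
document = json.dumps({"ref": encoded})

# Consumer side: parsing succeeds, but the original bytes only come back
# after an explicit decoding step that the JSON parser does not perform.
parsed = json.loads(document)
decoded = unquote_to_bytes(parsed["ref"])

print(encoded)  # refs/heads/caf%E9
print(decoded == raw_ref)  # True
```

A literal "%" in the input is itself encoded as "%25", so the mapping
stays reversible.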
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA


* Re: Pretty output in JSON format
  2024-09-26 21:04     ` brian m. carlson
@ 2024-09-27  6:49       ` Ron Ziroby Romero
       [not found]       ` <CANgJU+Xs-sQgAOCPL-5skaZGq7eHmhg0MaFGDr8N57=CK67iog@mail.gmail.com>
  1 sibling, 0 replies; 8+ messages in thread
From: Ron Ziroby Romero @ 2024-09-27  6:49 UTC (permalink / raw)
  To: brian m. carlson, Sean Allred, Ron Ziroby Romero, git

On Thu, 26 Sept 2024 at 22:04, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2024-09-25 at 18:45:54, Sean Allred wrote:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >
> > > On 2024-09-24 at 21:52:35, Ron Ziroby Romero wrote:
> > >> What do y'all think?
> > >
> > > I think this is ultimately a bad idea.  JSON requires that the output be
> > > UTF-8, but Git processes a large amount of data, including file names,
> > > ref names, commit messages, author and committer identities, diff
> > > output, and other file contents, that are not restricted to UTF-8.
> >
> > This strikes me with a little bit of 'perfect as the enemy of good'
> > here. I'm sure there are ways to signal an encoding failure. I would,
> > however, caution against trying to provide diff output in JSON. That
> > just seems... odd. Maybe base64 it first? (I don't know -- I just
> > struggle to see the use-case here.)
>
> I understand JSON output would be useful, but it's also not useful to
> randomly fail to do git for-each-ref (for example) because someone has a
> non-UTF-8 ref, or to fail to do a git log because of encoding problems
> (which absolutely is a problem in the Linux kernel tree).  "It works
> most of the time, but seemingly randomly fails" is not a good user
> experience, and I'm opposed to adding serialization formats that do
> that.  (For that reason, just-send-bytes that produces invalid JSON on
> occasion is also unacceptable.)
>
> If we always base64-encoded or percent-encoded the things that aren't
> guaranteed to be UTF-8, then we could well create JSON.  However, that
> makes working with the data structure in most scripting languages a pain
> since there's no automatic decoding of this data.  In strongly typed
> languages like Rust, it's possible to do this decoding with no problem,
> but I expect that's not most users who'd want this feature.

I do plan on percent-encoding all non-UTF-8 data.  It sounds like a
good way to check this feature would be to call "git log
--pretty=json" on the Linux kernel and ensure we get a valid, though
massive, UTF-8 JSON file. (Not as an automated test, but as a way to
check that we've covered everything. Any stumbling blocks should be
put into an automated test.) The use case I'm thinking of is piping
data to jq to process it.
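
A rough sketch of what such a check could look like in Python, assuming
the whole output is captured as one byte string. The function name
is_valid_utf8_json() and the sample inputs are hypothetical; no JSON
format exists in git yet, so the samples are hand-written.

```python
import json


def is_valid_utf8_json(data: bytes) -> bool:
    """Return True if data is strict UTF-8 and parses with Python's json module."""
    try:
        text = data.decode("utf-8")  # strict decoding, no replacement characters
        json.loads(text)
    except ValueError:
        # Both UnicodeDecodeError and json.JSONDecodeError derive from ValueError.
        return False
    return True


# Hand-written samples: one with a percent-encoded byte, one with a raw
# Latin-1 byte that makes the document invalid UTF-8.
good = b'[{"subject": "caf%E9"}]'
bad = b'[{"subject": "caf\xe9"}]'

print(is_valid_utf8_json(good))  # True
print(is_valid_utf8_json(bad))   # False
```

In practice the bytes would come from the command's standard output
rather than from literals.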

CBOR output seems useful, but I see it as a follow-up project. JSON
output would be more beneficial to more people, so I feel we should
tackle it first.

> >> What do y'all think?
> As with all things, I'd suggest you draw up a more formal proposal of
> exactly how this would work, and then that proposal can be discussed.
> How would you use this option? What would its behavior be? What's in
> scope? What's _not_ in scope? :-)

OK, I'll start working on a more formal proposal.

--
Ron Ziroby Romero

* Re: Pretty output in JSON format
       [not found]         ` <CAGW8g7=xK0S-i_Ekfwwo_NjMbngO_5m4LERtWRhSCgA0vf+ZAg@mail.gmail.com>
@ 2025-07-31 20:19           ` Ron Ziroby Romero
  0 siblings, 0 replies; 8+ messages in thread
From: Ron Ziroby Romero @ 2025-07-31 20:19 UTC (permalink / raw)
  To: demerphq; +Cc: brian m. carlson, Sean Allred, git

> On Fri, 27 Sept 2024, 10:30 demerphq, <demerphq@gmail.com> wrote:
>>
>> On Thu, 26 Sept 2024 at 23:04, brian m. carlson <sandals@crustytoothpaste.net> wrote:
>>>
>>> On 2024-09-25 at 18:45:54, Sean Allred wrote:
>>> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>>> >
>>> > > On 2024-09-24 at 21:52:35, Ron Ziroby Romero wrote:
>>> > >> What do y'all think?
>>> > >
>>> > > I think this is ultimately a bad idea.  JSON requires that the output be
>>> > > UTF-8, but Git processes a large amount of data, including file names,
>>> > > ref names, commit messages, author and committer identities, diff
>>> > > output, and other file contents, that are not restricted to UTF-8.
>>> >
>>> > This strikes me with a little bit of 'perfect as the enemy of good'
>>> > here. I'm sure there are ways to signal an encoding failure. I would,
>>> > however, caution against trying to provide diff output in JSON. That
>>> > just seems... odd. Maybe base64 it first? (I don't know -- I just
>>> > struggle to see the use-case here.)
>>>
>>> I understand JSON output would be useful, but it's also not useful to
>>> randomly fail to do git for-each-ref (for example) because someone has a
>>> non-UTF-8 ref, or to fail to do a git log because of encoding problems
>>
>>
>> I don't really follow your argument, and I find it weird how you are talking about a specific encoding of Unicode instead of Unicode itself.
>>
>> It is possible to represent every binary string as Unicode encoded as UTF-8 (or any of the UTF encodings). It may not be bytewise equivalent with the original, but why should that matter? There are a set of clear rules for doing the required transformations, and there is a huge body of tooling to do so. As long as you know the target encoding, you should be able to round trip data properly.
>>
>> IMO CBOR would just complicate what should be a relatively simple problem to solve.
>>
>> cheers,
>> Yves
>>
>>
>>
>> --
>> perl -Mre=debug -e "/just|another|perl|hacker/"



Hi. I've been working with the code and trying to figure out how to do
this. I've also started work on a formal proposal. Two things have
come up that I wanted to discuss:

First, I'm questioning my approach of hacking pretty.c with a series
of 'if json' blocks. Would it be better to make a new file, json-log.c,
and divorce myself from the pretty flow entirely? This would also go
hand in hand with changing from "--pretty=json" to simply "--json".

Second, I see that someone is adding a --json flag to git status[1]. I
figure that argues for git log to use the --json flag. I don't think
that affects me other than making the case for this JSON output.

## References

[1] Patrick Steinhardt, “Re: [PATCH] diff: add --json output format,”
message to git@vger.kernel.org, July 29, 2025.
https://public-inbox.org/git/pull.1937.git.1753856826464.gitgitgadget@gmail.com/

Thanks,
Ziroby Ron Romero

* Re: Pretty output in JSON format
       [not found]       ` <CANgJU+Xs-sQgAOCPL-5skaZGq7eHmhg0MaFGDr8N57=CK67iog@mail.gmail.com>
       [not found]         ` <CAGW8g7=xK0S-i_Ekfwwo_NjMbngO_5m4LERtWRhSCgA0vf+ZAg@mail.gmail.com>
@ 2025-08-04 20:39         ` Ron Ziroby Romero
  2025-08-04 21:19           ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Ron Ziroby Romero @ 2025-08-04 20:39 UTC (permalink / raw)
  To: demerphq; +Cc: brian m. carlson, Sean Allred, git

On Fri, 27 Sept 2024 at 10:30, demerphq <demerphq@gmail.com> wrote:
>
> On Thu, 26 Sept 2024 at 23:04, brian m. carlson <sandals@crustytoothpaste.net> wrote:
>>
>> On 2024-09-25 at 18:45:54, Sean Allred wrote:
>> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> >
>> > > On 2024-09-24 at 21:52:35, Ron Ziroby Romero wrote:
>> > >> What do y'all think?
>> > >
>> > > I think this is ultimately a bad idea.  JSON requires that the output be
>> > > UTF-8, but Git processes a large amount of data, including file names,
>> > > ref names, commit messages, author and committer identities, diff
>> > > output, and other file contents, that are not restricted to UTF-8.
>> >
>> > This strikes me with a little bit of 'perfect as the enemy of good'
>> > here. I'm sure there are ways to signal an encoding failure. I would,
>> > however, caution against trying to provide diff output in JSON. That
>> > just seems... odd. Maybe base64 it first? (I don't know -- I just
>> > struggle to see the use-case here.)
>>
>> I understand JSON output would be useful, but it's also not useful to
>> randomly fail to do git for-each-ref (for example) because someone has a
>> non-UTF-8 ref, or to fail to do a git log because of encoding problems
>
>
> I don't really follow your argument, and I find it weird how you are talking about a specific encoding of Unicode instead of Unicode itself.
>
> It is possible to represent every binary string as Unicode encoded as UTF-8 (or any of the UTF encodings). It may not be bytewise equivalent with the original, but why should that matter? There are a set of clear rules for doing the required transformations, and there is a huge body of tooling to do so. As long as you know the target encoding, you should be able to round trip data properly.
>
> IMO CBOR would just complicate what should be a relatively simple problem to solve.
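
One possible reading of the point quoted above, sketched in Python: map
each byte to the code point with the same number (the Latin-1 mapping),
serialize that text as JSON, and reverse the mapping on the way back.
This is an assumed convention chosen for illustration, not necessarily
the transformation Yves had in mind, and both sides have to agree on it.

```python
import json

# Every possible byte value, including ones that are invalid as UTF-8.
raw = bytes(range(256))

# Producer: byte N becomes code point U+00NN, then ordinary JSON escaping.
as_text = raw.decode("latin-1")
document = json.dumps({"blob": as_text})  # ensure_ascii=True by default

# Consumer: parse, then reverse the mapping to recover the bytes.
restored = json.loads(document)["blob"].encode("latin-1")
print(restored == raw)  # True

# The cost: text that was already UTF-8 is no longer readable as-is,
# because its bytes are reinterpreted one at a time.
mangled = "é".encode("utf-8").decode("latin-1")
print(mangled)  # Ã©
```

The round trip works, but the JSON string is not bytewise equivalent to
the original, so a consumer that skips the reverse step sees mangled
text for anything outside ASCII.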

Hi. I've been working with the code and trying to figure out how to do
this. I've also started work on a formal proposal. Two things have
come up that I wanted to discuss:

First, I'm questioning my approach of hacking pretty.c with a series
of 'if json' blocks. Would it be better to make a new file,
json-log.c, and divorce myself from the pretty flow entirely? This
would also go hand in hand with changing from "--pretty=json" to
simply "--json".

Second, I see that someone is adding a --json flag to git status[1]. I
figure that argues for git log to use the --json flag. I don't think
that affects me other than making the case for this JSON output.

## References

[1] Patrick Steinhardt, “Re: [PATCH] diff: add --json output format,”
message to git@vger.kernel.org, July 29, 2025.
https://public-inbox.org/git/pull.1937.git.1753856826464.gitgitgadget@gmail.com/

>
> cheers,
> Yves
>
>
>
> --
> perl -Mre=debug -e "/just|another|perl|hacker/"

Cheers,
Ziroby Ron Romero

* Re: Pretty output in JSON format
  2025-08-04 20:39         ` Ron Ziroby Romero
@ 2025-08-04 21:19           ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2025-08-04 21:19 UTC (permalink / raw)
  To: Ron Ziroby Romero; +Cc: demerphq, brian m. carlson, Sean Allred, git

Ron Ziroby Romero <ziroby@gmail.com> writes:

> First, I'm questioning my approach of hacking pretty.c with a series
> of 'if json' blocks. Would it be better to make a new file,
> json-log.c, and divorce myself from the pretty flow entirely?

The same question as in the other thread applies: why JSON?

If the objective is to give a parseable output for machines to
robustly read, then I do not think you want to use any of the
infrastructure laid by and for the pretty_print_commit() function,
whose purpose is quite the opposite, like squeezing inter-paragraph
spaces, trimming trailing whitespace, indenting even an empty line
by 4 spaces, etc.
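
To see why display-oriented cleanup is a poor base for machine-readable
output, here is a toy Python approximation of the three transformations
listed above. It is not git's code and toy_pretty() is a made-up name;
the point is only that two different messages can collapse into the
same rendering, so the original cannot be recovered from it.

```python
def toy_pretty(message: str) -> str:
    """Toy display cleanup: trim, squeeze blank runs, indent by 4 spaces."""
    out = []
    previous_blank = False
    for line in message.split("\n"):
        line = line.rstrip()  # trim trailing whitespace
        if not line:
            if previous_blank:
                continue  # squeeze consecutive blank lines into one
            previous_blank = True
        else:
            previous_blank = False
        out.append("    " + line)  # indent every line, even an empty one
    return "\n".join(out)


# Two distinct commit messages (hypothetical) ...
message_a = "subject\n\n\nbody  \n"
message_b = "subject\n\nbody\n"

# ... that become indistinguishable after the cleanup.
print(message_a != message_b)  # True
print(toy_pretty(message_a) == toy_pretty(message_b))  # True
```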

> Second, I see that someone is adding a --json flag to git status[1]. I
> figure that argues for git log to use the --json flag. I don't think
> that affects me other than making the case for this JSON output.

Please don't.

That other thread is getting discouraged from introducing a new
option just for a single new format.  Unfortunately "status" does
not have the --format={short,long,...} so we need to add one new
option to allow new formats to be added in a more generic way, but
once that is done, the next new format would not have to add a new
option.  Compared to it, "log" already has --pretty={...}, so we do
not have to add --json just for this single format, which makes us
luckier than the other thread.
