From: Jakub Narebski <jnareb@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
Sverre Rabbelier <srabbelier@gmail.com>,
Julian Phillips <julian@quantumfyre.co.uk>,
git@vger.kernel.org, Eric Raymond <esr@thyrsus.com>
Subject: Re: [RFC/PATCH v2 0/4] A new library for plumbing output
Date: Thu, 15 Apr 2010 11:07:32 +0200
Message-ID: <201004151107.33892.jnareb@gmail.com>
In-Reply-To: <20100415065700.GA27542@coredump.intra.peff.net>
On Thu, 15 April 2010, Jeff King wrote:
> On Wed, Apr 14, 2010 at 02:34:01PM -0700, Junio C Hamano wrote:
> > Jakub Narebski <jnareb@gmail.com> writes:
> >
> > > Well, this whole idea started with the fact, that "git status --short"
> > > was hard (or impossible) to parse unambiguously by scripts[1], and even
> > > "git status --porcelain -z"[2] is not that easy to parse[3].
> >
> > And you apparently seem to agree with that claim, but I don't. I think
> > Jeff (who did the --porcelain stuff; by the way, why did we lose him from
> > Cc list?) has already said that he is open to an update.
>
> I haven't seen any evidence that status --porcelain (or its -z form) is
> impossible to parse unambiguously. I don't even think it's that hard,
> but it certainly could be easier. But more importantly, from looking at
> the output it's not necessarily _obvious_ how to parse it correctly
> (e.g., whitespace as value and as field separator, syntax of "-z"
> depends on semantics of field contents).
Well, IMVHO the output of "git status --short" / "git status --porcelain"
(without '-z') is very hard to parse. Even assuming that filenames are
quoted whenever there is any ambiguity (which also means that a filename
which could be mistaken for a quoted one must itself be quoted), the
separator between source and destination filename in the case of rename
detection is " -> " (if I understand it correctly), and none of ' '
(SPC), '-' or '>' is replaced by an escape sequence. This means that one
needs to detect where a quoted filename begins and where it ends, either
by parsing character by character, taking into account quoting and
escaping (e.g. '\\', '\"' etc.), or by using a 'balanced quote' regexp like
the one from Text::Balanced, e.g.: (?:\"(?:[^\\\"]*(?:\\.[^\\\"]*)*)\")
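To make the 'balanced quote' approach concrete, here is a minimal sketch in
Python. It assumes, as above, that any filename which would be ambiguous is
double-quoted with backslash escapes; the names QUOTED, ENTRY and split_names
are made up for this illustration and are not part of git. It only splits the
filename part of a line (what follows the status columns) and leaves the
tokens quoted.

```python
import re

# One filename token: either a double-quoted string with backslash escapes
# (the 'balanced quote' pattern), or an unquoted run of characters.
QUOTED = r'"[^\\"]*(?:\\.[^\\"]*)*"'
ENTRY = re.compile(
    r'(?P<src>' + QUOTED + r'|.+?)'           # first filename
    r'(?: -> (?P<dst>' + QUOTED + r'|.+))?'   # optional rename target
)

def split_names(names):
    """Split the filename part of a status line into (src, dst or None).

    Assumption: a filename that itself contains ' -> ' is quoted.
    The returned tokens are NOT unquoted.
    """
    m = ENTRY.fullmatch(names)
    if m is None:
        raise ValueError("cannot parse: %r" % names)
    return m.group("src"), m.group("dst")

print(split_names('old.c -> new.c'))
print(split_names('"a -> b" -> c'))
```

Note that the quoted alternative has to be tried first; otherwise a " -> "
inside a quoted filename would be taken for the separator.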
What was the reason behind choosing " -> " as the separator between the pair[1]
of filenames in a rename? Why not use the default "git diff --stat" format,
i.e. 'arch/{i386 => x86}/Makefile', for "git status --short", which is
meant for the end user, and for "git status --porcelain" the same format
as the raw diff format, i.e. with TAB as separator between filenames,
and a filename quoted if it contains TAB (TAB is then replaced by '\t'
and does not appear in the filename, therefore you can split on TAB)?
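For comparison, a sketch of how little is needed when TAB is the separator
and never appears unescaped inside a filename. This is only an illustration:
unquote and split_fields are hypothetical helper names, and the unquoting
handles just a few simple backslash escapes (octal escapes for non-ASCII
bytes are deliberately left out).

```python
import re

# Simple backslash escapes handled by this sketch (assumption: no octal).
_SIMPLE = {"t": "\t", "n": "\n", '"': '"', "\\": "\\"}
_ESCAPE = re.compile(r"\\(.)")

def unquote(name):
    """Undo double-quote quoting; unquoted names are returned unchanged."""
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        return _ESCAPE.sub(lambda m: _SIMPLE.get(m.group(1), m.group(1)),
                           name[1:-1])
    return name

def split_fields(line):
    """Split a TAB-separated line, then unquote each field."""
    return [unquote(field) for field in line.split("\t")]

# The second field is a quoted filename containing an escaped TAB.
print(split_fields('R100\t"dir\\twith tab"\tnew name'))
```

Splitting comes first and unquoting second; no quote-aware scanning is
required, because a literal TAB can only ever be a separator.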
IMVHO the "git status --porcelain -z" format is not easy to parse either.
(The same can be said for the "git diff --raw -z" output format.) You
can't just split on the record separator; you have to take the status
into account to know whether a record has two filenames or one.
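A sketch of what "taking the status into account" means for a parser. The
record layout used here ("XY", a space, a path, NUL, and a second
NUL-terminated path when the status contains 'R' or 'C') is an assumption
based on my reading of the git-status documentation; parse_status_z is a
made-up name, and the sketch deliberately does not say which of the two
paths is the source.

```python
def parse_status_z(data):
    """Parse NUL-separated status output into (status, path, other) tuples.

    Assumed layout: 'XY path\\0', followed by one more 'path\\0' when the
    two-letter status contains 'R' (rename) or 'C' (copy).
    """
    tokens = data.split("\0")
    if tokens and tokens[-1] == "":
        tokens.pop()                    # drop the empty tail after the last NUL
    entries = []
    i = 0
    while i < len(tokens):
        status, path = tokens[i][:2], tokens[i][3:]
        i += 1
        other = None
        if "R" in status or "C" in status:
            other = tokens[i]           # the next token belongs to this entry
            i += 1
        entries.append((status, path, other))
    return entries

print(parse_status_z(" M a.c\0R  new.c\0old.c\0?? x y\0"))
```

The point is that the tokens cannot be grouped into records without first
looking inside each one.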
[1] A question: we have the working area version, the index version, and
the HEAD version of a file. Isn't it possible for *each* of them to have
a different filename? What about the case of a rename/rename merge
conflict?
>
> The approach I proposed was to leave it be and document it a bit better.
> Adding some format that is close but subtly different is just going to
> lead to more confusion.
Well, the proposed '-Z' output format, in the OFS="\0", ORS="\0\0"
variant, would be very easy to parse. If I understand it correctly,
it is also one of the available formats in outputification^W in this series.
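A sketch of why such a format would be easy to consume: two plain splits
and no knowledge of the field contents. The field layout in the example
(status, then filenames) is invented for illustration, parse_records is a
hypothetical name, and the sketch assumes that no field is ever empty, since
an empty field would put "\0\0" inside a record.

```python
def parse_records(data, ofs="\0", ors="\0\0"):
    """Split output into records on ORS, then each record into fields on OFS.

    Assumption: fields are never empty, so ORS cannot occur inside a record.
    """
    records = data.split(ors)
    if records and records[-1] == "":
        records.pop()                   # drop the empty tail after the last ORS
    return [record.split(ofs) for record in records]

print(parse_records("R\0old\0new\0\0M\0file\0\0"))
```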
>
> But since Julian was willing to do the JSON work, I think that is a much
> nicer approach. It's not subtly different; it's very different and way
> easier to read and parse. And I'm really happy with the way he has
> structured the code to handle multiple output formats. It keeps the code
> much cleaner, and it should silence any "but YAML is better than JSON is
> better than XML" debates.
I really like this outputification ;-) too.
Although if possible I'd like to have it wrapped in utility macros,
like parseopt, so one does not need to write output_str / output_int
etc.... but currently it is a very, very vague sketch of an idea, rather
than a realized concept.
>
> Even with Julian's patches, we should still better document the regular
> and "-z" forms. Eric promised to send some patches this week; I'm hoping
> he is still interested in doing so after seeing a better solution arise.
> :)
--
Jakub Narebski
Poland