git.vger.kernel.org archive mirror
* Parsing diff --git lines
From: Simon Fraser @ 2008-03-09  1:48 UTC
  To: git

I'm working on a GUI for git, and I'd like to be able to provide
some diff navigation tools. That requires that I can find the
file chunks in a diff, and parse out the file names.

However, I don't see a reliable way to identify the two files
from a "diff --git" line. Here's a (deliberately pathological)
example:

diff --git a/a / b/file with spaces.txt b/a / b/file with spaces.txt

In this case, the repository contains directories called "a " and
" b", and the file name itself contains spaces.

What would make this possible would be either to always quote
file paths containing spaces, or use a character other than
a space (e.g. a \t) between the two file names.

Thanks
Simon



* Re: Parsing diff --git lines
From: Linus Torvalds @ 2008-03-09  4:04 UTC
  To: Simon Fraser; +Cc: git



On Sat, 8 Mar 2008, Simon Fraser wrote:
> 
> However, I don't see a reliable way to identify the two files
> from a "diff --git" line. Here's a (deliberately pathological)
> example:

See how "git-apply" does it.

The rule is:

 - if the filenames are different, you should ignore the filenames on the
   "diff --git" line, and trust the ones on the "rename from/to" lines
   (which are unambiguous because they only have one filename per line)

 - if the filenames aren't different, then you can unambiguously know how 
   to parse it by simply making sure they are the same.

So to take your example:

> diff --git a/a / b/file with spaces.txt b/a / b/file with spaces.txt

Here, you can *know* that the filename is "a / b/file with spaces.txt", 
because it must match the pattern "a/$filename b/$filename", and no other 
split at a space would ever do that!
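
A minimal Python sketch of that same-name rule, assuming the default
"a/" and "b/" prefixes and unquoted names. The function name is
hypothetical, and this is not what git-apply itself does (its C code
also handles quoted names and other prefixes). Because both halves
must be equal, the separating space can only be at one position:

```python
from typing import Optional

PREFIX = "diff --git "


def same_name_from_header(line: str) -> Optional[str]:
    """Return PATH if line is 'diff --git a/PATH b/PATH', else None.

    None means the two sides differ (e.g. a rename) or the line is not
    an unquoted header using the default a/ and b/ prefixes.
    """
    if not line.startswith(PREFIX):
        return None
    rest = line[len(PREFIX):].rstrip("\n")
    # "a/" + PATH + " " + "b/" + PATH is 2 * len(PATH) + 5 characters,
    # so the length fixes where the separating space has to be.
    if len(rest) < 7 or (len(rest) - 5) % 2 != 0:
        return None
    n = (len(rest) - 5) // 2
    left, sep, right = rest[:n + 2], rest[n + 2], rest[n + 3:]
    if sep != " " or not left.startswith("a/") or not right.startswith("b/"):
        return None
    path = left[2:]
    return path if path == right[2:] else None
```

For the pathological line quoted above this returns
"a / b/file with spaces.txt"; for a line whose two sides differ it
returns None, and the names have to come from elsewhere.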

See in particular "git_header_name()" in builtin-apply.c. The comments
at the top of the function and in the middle of it reflect the above
rule:

/*
 * This is to extract the same name that appears on "diff --git"
 * line.  We do not find and return anything if it is a rename
 * patch, and it is OK because we will find the name elsewhere.
 * We need to reliably find name only when it is mode-change only,
 * creation or deletion of an empty file.  In any of these cases,
 * both sides are the same name under a/ and b/ respectively.
 */
...
        /*
         * Accept a name only if it shows up twice, exactly the same
         * form.
         */

> What would make this possible would be either to always quote
> file paths containing spaces, or use a character other than
> a space (e.g. a \t) between the two file names.

Well, git didn't originally ever quote filenames at all, and I actually
wanted the "diff --git" line to look as much like a regular "diff -urN"
line as possible (which has spaces between the names).

These days, we could quote, but hey, anybody who parses diff lines needs 
to be able to handle the non-quoted form *anyway*, so quoting doesn't 
really help anybody in the end.

			Linus


* Re: Parsing diff --git lines
From: Jakub Narebski @ 2008-03-09  8:25 UTC
  To: Linus Torvalds; +Cc: Simon Fraser, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sat, 8 Mar 2008, Simon Fraser wrote:
> > 
> > However, I don't see a reliable way to identify the two files
> > from a "diff --git" line. Here's a (deliberately pathological)
> > example:
> 
> See how "git-apply" does it.
> 
> The rule is:
> 
>  - if the filenames are different, you should ignore the filenames on the
>    "diff --git" line, and trust the ones on the "rename from/to" lines
>    (which are unambiguous because they only have one filename per line)
> 
>  - if the filenames aren't different, then you can unambiguously know how 
>    to parse it by simply making sure they are the same.

By the way, the default pre-commit hook behaves a bit strangely on
files whose names contain spaces[*1*] (due to GNU-diff-uism):

*
* You have some suspicious patch lines:
*
* In  b
* trailing whitespace (line 1)
 b:1:++ b/ b	

Foootnote:
==========
[*1*] If a filename contains '"', '\' or a control character,
      it is quoted.
-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: Parsing diff --git lines
From: Johannes Schindelin @ 2008-03-09 20:45 UTC
  To: Jakub Narebski; +Cc: git

Hi,

On Sun, 9 Mar 2008, Jakub Narebski wrote:

> Foootnote:

Alternatively, you can write "Bigfootnote" or "Sasquatchnote".

Ciao,
Dscho
