From: "Jakub Narębski" <jnareb@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Junio C Hamano <gitster@pobox.com>
Cc: Shawn Pearce <spearce@spearce.org>,
Jarrad Hope <me@jarradhope.com>, git <git@vger.kernel.org>
Subject: Re: Tackling Git Limitations with Singular Large Line-seperated Plaintext files
Date: Mon, 30 Jun 2014 14:56:04 +0200
Message-ID: <53B15E64.9030005@gmail.com>
In-Reply-To: <CA+55aFx6vFyZvpyQot_3Ym7wsCZ06abjNx_hEKkza-N856jMnw@mail.gmail.com>
Linus Torvalds wrote:
> On Fri, Jun 27, 2014 at 10:48 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Even though the original question mentioned "delta discovery", I
>> think what was being asked is not "delta" in the Git sense (which
>> your answer is about) but is "can we diff two long sequences of text
>> (that happens to consist of only 4-letter alphabet but that is a
>> irrelevant detail) without holding both in-core in their entirety?",
>> which is a more relevant question/desire from the application point
>> of view.
>
> .. even there, there's another issue. With enough memory, the diff
> itself should be fairly reasonable to do, but we do not have any sane
> *format* for diffing those kinds of things.
>
> The regular textual diff is line-based, and is not amenable to
> comparing two long lines. You'll just get a diff that says "the two
> really long lines are different".
>
> The binary diff option should work, but it is a horrible output
> format, and not very helpful. It contains all the relevant data ("copy
> this chunk from here to here"), but it's then shown in a binary
> encoding that isn't really all that useful if you want to say "what
> are the differences between these two chromosomes".
There is also the word-based textual diff, --word-diff[=<mode>],
and I think one can abuse --word-diff-regex=<regex> to get a
character-based diff... or maybe not, since <regex> specifies
word characters, not words or word separators.
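
To make the idea of a character-based diff of two long lines concrete, here is a minimal sketch in Python. It does not use Git at all: it relies on the standard-library difflib module, and the function name char_diff and the sample sequences are made up for illustration. The [-...-] / {+...+} markers only imitate the look of --word-diff output. As far as I can tell from the git-diff documentation, every match of <regex> is treated as a word, so --word-diff-regex=. should give a comparable per-character result, but I have not verified that on inputs of this kind.

```python
from difflib import SequenceMatcher


def char_diff(old: str, new: str) -> str:
    """Render a character-level diff of two strings.

    Removed runs are wrapped in [-...-] and added runs in {+...+},
    imitating the look of `git diff --word-diff`.
    """
    # autojunk=False matters for this kind of input: for sequences of
    # 200 or more items, difflib's default heuristic treats very
    # frequent items as junk, and with a 4-letter alphabet every
    # character is "very frequent".
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(old[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + old[i1:i2] + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + new[j1:j2] + "+}")
    return "".join(out)


if __name__ == "__main__":
    # Two short, made-up sequences standing in for "really long lines".
    print(char_diff("GATTACAGATTACA", "GATTACCGATTAA"))
```

Note that this holds both strings in memory and SequenceMatcher can take quadratic time, so it only illustrates the output format, not a way to compare whole chromosomes.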
--
Jakub Narębski
Thread overview: 10+ messages
2014-06-27 8:45 Tackling Git Limitations with Singular Large Line-seperated Plaintext files Jarrad Hope
2014-06-27 15:45 ` Shawn Pearce
2014-06-27 17:48 ` Junio C Hamano
2014-06-27 19:38 ` Linus Torvalds
2014-06-27 19:47 ` Linus Torvalds
2014-06-27 19:55 ` Jason Pyeron
2014-06-27 20:13 ` Linus Torvalds
2014-06-28 6:51 ` Jarrad Hope
2014-06-30 12:56 ` Jakub Narębski [this message]
2014-08-10 21:45 ` Øyvind A. Holm