From: Junio C Hamano <gitster@pobox.com>
To: Shawn Pearce <spearce@spearce.org>
Cc: Jarrad Hope <me@jarradhope.com>, git <git@vger.kernel.org>
Subject: Re: Tackling Git Limitations with Singular Large Line-separated Plaintext files
Date: Fri, 27 Jun 2014 10:48:49 -0700
Message-ID: <xmqqegya2qgu.fsf@gitster.dls.corp.google.com>
In-Reply-To: <CAJo=hJtJCy96SRYmOxEpEMoEVcaegv0SCG0_AH2u0=bSrHZi_A@mail.gmail.com> (Shawn Pearce's message of "Fri, 27 Jun 2014 08:45:02 -0700")

Shawn Pearce <spearce@spearce.org> writes:

> Git does source code well. I don't know enough to judge if DNA/RNA
> sequence storage is similar enough to source code to benefit from
> things like `git log -p` showing deltas over time, or if some other
> algorithm would be more effective.
>
>> From my understanding the largest problem revolves around git's delta
>> discovery method, holding 2 files in memory at once. Is there a
>> reason this could not be adapted to page/chunk the data in a sliding
>> window fashion?
>
> During delta discovery Git holds like 11 files in memory at once....

Even though the original question mentioned "delta discovery", I
think what was being asked is not "delta" in the Git sense (which
your answer is about) but is "can we diff two long sequences of text
(that happen to consist of only a 4-letter alphabet, but that is an
irrelevant detail) without holding both in-core in their entirety?",
which is a more relevant question/desire from the application point
of view.

"Is there a reason this could not be adapted?"  No, there is no
particular reason why this "could not".  I think the only reason we
do only in-core diffs is that "adapting to page/chunk" hasn't been
on anybody's high-priority list of itches to scratch.
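A minimal sketch of what "adapting to page/chunk" could look like for a line-oriented diff, assuming Python's standard `difflib` as a stand-in for Git's own diff machinery (this is not how Git works today; the names `read_windows`, `windowed_diff` and the `window` parameter are hypothetical). Both inputs are read lazily in fixed-size windows of lines and each pair of windows is diffed independently, so memory is bounded by two windows instead of two whole files. The trade-off is that an insertion or deletion shifts the alignment of every later window, so the result is a valid edit script but not a minimal one; a real implementation would need to resynchronize windows after such a change.

```python
import difflib
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Opcode = Tuple[str, int, int, int, int]


def read_windows(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Hypothetical helper: yield successive lists of at most `size` lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def windowed_diff(a: Iterable[str], b: Iterable[str], window: int = 1000) -> Iterator[Opcode]:
    """Diff two line streams window by window (approximate, bounded memory).

    Yields difflib-style (tag, a_start, a_end, b_start, b_end) opcodes for
    non-equal regions, using absolute 0-based line numbers in each input.
    """
    a_off = b_off = 0
    wa = read_windows(a, window)
    wb = read_windows(b, window)
    while True:
        # An exhausted side contributes an empty window.
        ca = next(wa, [])
        cb = next(wb, [])
        if not ca and not cb:
            return
        # autojunk=False: do not let difflib ignore frequently repeated lines.
        sm = difflib.SequenceMatcher(a=ca, b=cb, autojunk=False)
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag != "equal":
                yield tag, a_off + i1, a_off + i2, b_off + j1, b_off + j2
        a_off += len(ca)
        b_off += len(cb)


if __name__ == "__main__":
    old = ["A\n", "C\n", "G\n", "T\n"]
    new = ["A\n", "C\n", "C\n", "T\n"]
    for op in windowed_diff(old, new, window=2):
        print(op)  # prints ('replace', 2, 3, 2, 3)
```

Because file objects iterate line by line, the same function can be fed two open files directly, e.g. `windowed_diff(open("old.txt"), open("new.txt"), window=10000)`, without reading either file fully into memory.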
