* storing pre-computed fine-grained diffs
From: ecloud @ 2012-03-07 23:41 UTC
To: git
The thing about git, as well as every version control system I have known so
far that stores diffs, is that computing the diff means post-analyzing a
saved file. That is, you use any editor you like, and after making a whole
batch of changes you manually commit to the repository, and the diff
algorithm figures out what you changed. Some information is already lost at
that point: the order in which you made the changes, and what the logical
chunks actually were.

But what if there were an editor that could save each individual change as a
separate version? You put the cursor at one point in the file and type some
text; then you click elsewhere, and the editor does a "git commit this-file"
automatically. Then you select some other text and delete it, and it commits
again. In that case it would be nice to avoid doing the diff at all, because
the editor already knows exactly what the change was.

Would it be possible to store these fine-grained changes directly in a
packfile, efficiently? Or would it require a different storage format? I know
the diff algorithm used is already much smarter than a line-by-line diff, but
is the storage format capable of representing changes over ranges of
characters without the "extra context" that line-by-line diffs usually have?
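A minimal sketch of what such an editor "already knows", assuming (hypothetically) that it logs every action as the replacement of a byte range. The `Edit` record and the `apply_edit`/`replay` helpers are illustrative names, not anything git provides:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical editor action: replace buf[offset:offset + deleted] with inserted."""
    offset: int      # byte offset where the change starts
    deleted: int     # number of bytes removed at that offset
    inserted: bytes  # bytes typed at that offset (may be empty)


def apply_edit(buf: bytes, e: Edit) -> bytes:
    """Apply one edit; only the changed range is described, no surrounding context."""
    if e.offset < 0 or e.deleted < 0 or e.offset + e.deleted > len(buf):
        raise ValueError(f"edit {e} does not fit a {len(buf)}-byte buffer")
    return buf[:e.offset] + e.inserted + buf[e.offset + e.deleted:]


def replay(buf: bytes, edits: list[Edit]) -> list[bytes]:
    """Apply edits in order, returning every version (one per automatic commit)."""
    versions = [buf]
    for e in edits:
        buf = apply_edit(buf, e)
        versions.append(buf)
    return versions


history = replay(b"hello world\n", [
    Edit(5, 0, b","),    # click after "hello" and type a comma
    Edit(7, 5, b"git"),  # select "world" and type over it
])
print(history[-1])  # b'hello, git\n'
```

Each record is just (offset, length, text), so nothing like the context lines of a unified diff is needed to describe the change.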
--
View this message in context: http://git.661346.n2.nabble.com/storing-pre-computed-fine-grained-diffs-tp7353466p7353466.html
Sent from the git mailing list archive at Nabble.com.
* Re: storing pre-computed fine-grained diffs
From: David Barr @ 2012-03-08 1:04 UTC
To: ecloud; +Cc: git
Hi Shawn,
On Thu, Mar 8, 2012 at 10:41 AM, ecloud <shawn.t.rutledge@gmail.com> wrote:
> The thing about git, as well as all version control systems I have known so
> far which store diffs, is that computing the diff means post-analyzing a
> saved file. That is, you use any editor you like, and after making a whole
> batch of changes you manually commit to the repository, and the diff
> algorithm figures out what you changed. Some information is already lost
> about what order you made the changes and what the logical chunks actually
> were. But what if there was an editor that could save each individual
> change as a separate version? You put the cursor at one point in the file,
> and type some text; then you click elsewhere, and the editor does a "git
> commit this-file" automatically.
I'm going to skip discussing whether this approach is desirable.
All further comment will assume that it is worthwhile.
> Then you select some other text and delete
> it, and it does a commit again. It would be nice in that case to avoid
> doing the diff at all, because the editor already knows exactly what the
> change was.
This is just a performance consideration. Two relevant facts are that
the delta computation in git is at its fastest when the difference is
small and that SHA1 computation imposes a per-change cost proportional
to the length of the blob.
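To make the SHA-1 point concrete: git names a blob by hashing a short header followed by the entire file contents, which is what `git hash-object` computes for an unfiltered file. A minimal Python sketch (the function name `git_blob_id` is my own):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Blob object id: SHA-1 over b'blob <size>' + NUL, followed by ALL of the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# A one-byte edit to a 1 MB file still means hashing the full 1 MB again.
before = b"a" * 1_000_000
after = before[:500_000] + b"b" + before[500_001:]
print(git_blob_id(before) != git_blob_id(after))  # True
```

With one commit per keystroke, the hashing cost therefore grows with file size even when the stored delta is tiny.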
> Would it be possible to store these fine-grained changes
> directly in a packfile, efficiently? Or would it require a different
> storage format? I know the diff algorithm used is already much smarter than
> a line-by-line diff, but is the storage format capable of representing
> changes over ranges of characters without "extra context" like the
> line-by-line diffs usually have?
The current pack format for git has quite an efficient delta
representation. It uses a byte-wise binary diff, so there is no
context in the physical representation. It is possible to build a pack
as described using the fast-import interface, and it would be
straightforward to write an editor backend that persisted via
fast-import. (Assuming that you are hacking on an editor.)
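To illustrate why there is no context: a pack delta is, as I understand git's pack-format documentation, a pair of size headers followed by "copy N bytes from offset O of the base" and "insert these literal bytes" instructions. Given an edit the editor already knows, a delta can be emitted without any diffing: copy the prefix, insert the new text, copy the suffix. This is a sketch of that encoding, not git's own code; all function names are hypothetical. As far as I know fast-import does not accept such deltas directly (it takes full blob contents and deltifies them itself), so this only shows what the stored representation looks like.

```python
def encode_size(n: int) -> bytes:
    """Delta header size: 7 bits per byte, least significant first, MSB = more follow."""
    out = bytearray()
    while True:
        low, n = n & 0x7F, n >> 7
        out.append(low | 0x80 if n else low)
        if not n:
            return bytes(out)


def read_size(delta: bytes, i: int) -> tuple[int, int]:
    n = shift = 0
    while True:
        b = delta[i]
        i += 1
        n |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return n, i


def copy_op(offset: int, size: int) -> bytes:
    """Copy: bit 7 set; bits 0-3 flag which offset bytes follow, bits 4-6 which size bytes."""
    cmd, args = 0x80, bytearray()
    for k in range(4):
        byte = (offset >> (8 * k)) & 0xFF
        if byte:
            cmd |= 1 << k
            args.append(byte)
    for k in range(3):
        byte = (size >> (8 * k)) & 0xFF
        if byte:
            cmd |= 0x10 << k
            args.append(byte)
    return bytes([cmd]) + bytes(args)


def delta_for_edit(source: bytes, offset: int, deleted: int, inserted: bytes) -> bytes:
    """Delta for one known edit, without diffing: copy prefix, insert text, copy suffix."""
    if offset < 0 or deleted < 0 or offset + deleted > len(source):
        raise ValueError("edit out of range")
    out = bytearray(encode_size(len(source)))
    out += encode_size(len(source) - deleted + len(inserted))

    def copy_range(start: int, end: int) -> None:
        while start < end:                   # a copy's size field holds at most 0xFFFFFF
            n = min(end - start, 0xFFFFFF)
            out.extend(copy_op(start, n))
            start += n

    copy_range(0, offset)                    # unchanged prefix
    for i in range(0, len(inserted), 127):   # insert opcode = literal length, 1..127
        chunk = inserted[i:i + 127]
        out.append(len(chunk))
        out.extend(chunk)
    copy_range(offset + deleted, len(source))  # unchanged suffix
    return bytes(out)


def apply_delta(source: bytes, delta: bytes) -> bytes:
    """Rebuild the target from base + delta (used here to check the encoder)."""
    src_size, i = read_size(delta, 0)
    tgt_size, i = read_size(delta, i)
    if src_size != len(source):
        raise ValueError("delta was made against a different base")
    out = bytearray()
    while i < len(delta):
        cmd = delta[i]
        i += 1
        if cmd & 0x80:                       # copy from the base object
            off = size = 0
            for k in range(4):
                if cmd & (1 << k):
                    off |= delta[i] << (8 * k)
                    i += 1
            for k in range(3):
                if cmd & (0x10 << k):
                    size |= delta[i] << (8 * k)
                    i += 1
            out += source[off:off + (size or 0x10000)]  # size 0 means 0x10000
        elif cmd:                            # insert cmd literal bytes
            out += delta[i:i + cmd]
            i += cmd
        else:
            raise ValueError("opcode 0 is reserved")
    if len(out) != tgt_size:
        raise ValueError("result size mismatch")
    return bytes(out)


src = b"hello world\n"
d = delta_for_edit(src, 6, 5, b"git")        # replace "world" with "git"
print(len(d), apply_delta(src, d))           # 11 b'hello git\n'
```

The delta for a single edit is a handful of bytes plus the inserted text, whatever the size of the file.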
--
David Barr