git.vger.kernel.org archive mirror
From: "Andres G. Aragoneses" <knocte@gmail.com>
To: git@vger.kernel.org
Subject: RFC: reverse history tree, for faster & background clones
Date: Fri, 12 Jun 2015 13:26:42 +0200
Message-ID: <mlefli$h6v$1@ger.gmane.org>

Hello git devs,

I'm toying with an idea for an improvement I would like to work on, but 
I'm not sure whether it would be desirable enough to be merged in the 
end, so I'm requesting your opinions before I start working on it.

AFAIU git stores the contents of a repo as a sequence of patches in the 
.git metadata folder. So then let's look at an example to illustrate my 
point more easily.

Repo foo contains the following 2 commits:

1 file, first commit, with the content:
+First Line
+Second Line
+Third Line

2nd and last commit:
  First Line
  Second Line
-Third Line
+Last Line

Simple enough, right?

But, what if we decided to store it backwards in the metadata?

So first commit would be:
1 file, first commit, with the content:
+First Line
+Second Line
+Last Line

2nd commit:
  First Line
  Second Line
-Last Line
+Third Line
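
To make the two layouts above concrete, here is a toy sketch in Python. 
It only models my mental picture of "one full snapshot plus patches"; 
it is not meant to describe how git actually stores things, and all the 
names in it (make_delta, apply_delta, forward_store, reverse_store) are 
made up for illustration. It also assumes the file keeps the same number 
of lines between commits, to keep the patch format trivial.

```python
# Toy model only: hypothetical names, NOT git's real storage format.

def make_delta(src, dst):
    """Record the changed lines that turn src into dst (equal-length files only)."""
    assert len(src) == len(dst)
    return [(i, old, new) for i, (old, new) in enumerate(zip(src, dst)) if old != new]

def apply_delta(base, delta):
    """Apply a delta produced by make_delta to a copy of base."""
    out = list(base)
    for i, old, new in delta:
        assert out[i] == old  # the patch must match the content it is applied to
        out[i] = new
    return out

first = ["First Line", "Second Line", "Third Line"]
last = ["First Line", "Second Line", "Last Line"]

# Forward layout: full first commit, then a patch towards the newest commit.
forward_store = (first, [make_delta(first, last)])
# Reverse layout: full newest commit, then a patch back towards the first commit.
reverse_store = (last, [make_delta(last, first)])

def newest_from_forward(store):
    """Reading the newest commit means replaying every patch."""
    content, deltas = store
    for delta in deltas:
        content = apply_delta(content, delta)
    return content

def newest_from_reverse(store):
    """Reading the newest commit needs no patch at all."""
    return list(store[0])

def oldest_from_reverse(store):
    """Reading the first commit now means replaying every patch instead."""
    content, deltas = store
    for delta in deltas:
        content = apply_delta(content, delta)
    return content
```

In this model newest_from_reverse() never touches a patch, which is the 
property the idea relies on, while the cost moves to oldest_from_reverse().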


This would bring some advantages, as far as I understand:

1. `git clone --depth 1` would be way faster, without the need for 
on-demand compression of packfiles on the server side. Correct me if I'm 
wrong.
2. `git clone` could offer a "fast operation, complete in the 
background" mode that downloads the newest snapshot of the repo first, 
so that the user can start working in their working directory very 
quickly, while a "background job" keeps retrieving the rest of the 
history data.
3. Any more advantages you see?


I'm aware that this would also have downsides, but IMHO the benefits 
would outweigh them. The ones I see:
1. Every time a commit is made, a big change to the history-metadata 
tree would need to happen. (But this is essentially equivalent to 
enabling an INDEX in a DB: you make WRITES more expensive in order to 
improve the speed of READS.)
2. Locking issues? I imagine that rewriting the metadata on every commit 
would keep it locked for longer, leaving a wider window for lock 
contention, but I'm not an expert in this, so please expand.
3. Any more downsides you see?
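
To illustrate downside 1, here is a small toy sketch in Python of what a 
commit would cost under the reverse layout. Again, this only models the 
"full newest snapshot plus backward patches" picture and says nothing 
about git's real on-disk format; ReverseStore, line_delta and 
full_rewrites are hypothetical names, and files are assumed to keep the 
same number of lines.

```python
# Toy model only: hypothetical names, NOT git's real storage format.

def line_delta(src, dst):
    """Lines of dst that differ from src, as (index, dst_line) pairs."""
    return [(i, d) for i, (s, d) in enumerate(zip(src, dst)) if s != d]

class ReverseStore:
    def __init__(self, snapshot):
        self.tip = list(snapshot)  # full content of the newest commit
        self.back_deltas = []      # newest-first patches towards older commits
        self.full_rewrites = 0     # how many times the full snapshot was replaced

    def commit(self, new_snapshot):
        # The previous tip is demoted to a patch against the new tip...
        self.back_deltas.insert(0, line_delta(new_snapshot, self.tip))
        # ...and the stored full snapshot has to be rewritten on every commit.
        self.tip = list(new_snapshot)
        self.full_rewrites += 1

store = ReverseStore(["First Line", "Second Line", "Third Line"])
store.commit(["First Line", "Second Line", "Last Line"])
```

Every commit() replaces the full snapshot and pushes one more backward 
patch, which is the extra write work (and the longer critical section) 
mentioned in the downsides above.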


I would be glad for any feedback you have. Thanks, and have a great day!

   Andrés
