[(not so) random thoughts] using git as its own caching tool

git.vger.kernel.org archive mirror

* [(not so) random thoughts] using git as its own caching tool
@ 2007-12-12  0:38 Pierre Habouzit
  2007-12-12  6:51 ` Mike Hommey
  2007-12-12 15:35 ` Andreas Ericsson
  0 siblings, 2 replies; 6+ messages in thread
From: Pierre Habouzit @ 2007-12-12  0:38 UTC
  To: Git ML


  This is an idea I have had for quite some time, and I wonder why it's
not used in git tools as a general rule.

  The idea is simple. The git object database has two features that are
very interesting for this discussion: its delta-compressed cache, which
is _very_ efficient, and the reflog.

  I wonder if it would be possible to write some git porcelains (and a
builtin API too) that would be more "map" oriented. I mean, we could use
a reference as a pointer to a given tree that would be the map (where
keys have a path form, which is nice). When I say that, I'm thinking
about git-svn, which even with the recent improvements to its .rev_db's
still eats a lot of space with the unhandled.log _and_ the indexes it
stores for _each_ svn branch/tag. This way, instead of many:
    foo/index
    foo/.rev_map.6ef976f9-4de5-0310-a40d-91cae572ec18
    foo/unhandled.log
we would just have a special refs/db/git-svn/foo reference that would
point to a tree with three blobs in it: index, rev_map.xxxx,
unhandled.log (or probably index would even be a tree, but that's
another matter). This way, all the unhandled.log files, which share a
lot of common content, would be nicely compressed by the
delta-compression algorithms, with negligible overhead (git-svn is
_very_ slow because of svn anyway, so we don't really care if it needs
to read a blob's contents instead of opening a flat file).
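
  A minimal sketch of the map idea, assuming stock plumbing only
(git hash-object, git mktree, git update-ref, git cat-file) driven from
Python. The helper names git(), store_map() and load_entry() are
hypothetical, the blob contents are placeholders, and none of this is
how git-svn actually stores its state:

```python
import subprocess


def git(repo, *args, data=None):
    """Run a git plumbing command inside `repo` and return stdout as bytes."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        input=data, stdout=subprocess.PIPE, check=True,
    )
    return result.stdout


def store_map(repo, ref, entries):
    """Write `entries` (name -> bytes) as blobs in one tree, point `ref` at it."""
    lines = []
    for name, content in sorted(entries.items()):
        # hash-object -w stores the blob in the object database.
        blob = git(repo, "hash-object", "-w", "--stdin", data=content)
        lines.append(f"100644 blob {blob.decode().strip()}\t{name}\n")
    # mktree reads "<mode> <type> <id>\t<name>" lines and writes a tree.
    tree = git(repo, "mktree", data="".join(lines).encode()).decode().strip()
    # The ref points directly at the tree; it lives outside refs/heads.
    git(repo, "update-ref", ref, tree)
    return tree


def load_entry(repo, ref, name):
    """Read one key of the map back, e.g. refs/db/git-svn/foo:unhandled.log."""
    return git(repo, "cat-file", "blob", f"{ref}:{name}")
```

Calling store_map(repo, "refs/db/git-svn/foo", {...}) again with updated
contents simply moves the reference to a new tree; the old blobs stay in
the object database until they are pruned.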

  Another nifty usage we could have is memoization databases that don't
require a specific tool to expire them, but use the reflog expiration
for that. I remember that we discussed, quite some time ago, the idea of
annotating objects. We could use such annotations to link some objects
to memoized data under a different namespace for each caching scheme
involved, with one reference per namespace whose reflog would hold each
of the linked objects created over time for caching. Good candidates for
this are the rr-cache and git-annotate/blame caching. Of course, that
would require writing a tool that removes weak annotations pointing to
objects that don't exist anymore. We could also cache the
rename/copies/… detection results, and make those really, really cheap
to use[0].
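
  To make the namespace idea concrete, here is a toy in-memory model,
not a git implementation. MemoStore and its methods are hypothetical
names; expire() only stands in for what reflog expiration would do, and
a real version would keep the entries as git objects:

```python
from dataclasses import dataclass, field


@dataclass
class MemoStore:
    """Toy model: one cache per namespace, keyed by the annotated object id."""
    namespaces: dict = field(default_factory=dict)

    def lookup(self, namespace, object_id, compute, now):
        """Return the memoized result for object_id, computing it on a miss."""
        cache = self.namespaces.setdefault(namespace, {})
        if object_id not in cache:
            cache[object_id] = (compute(object_id), now)
        return cache[object_id][0]

    def expire(self, namespace, cutoff):
        """Stand-in for reflog expiry: drop entries recorded before cutoff."""
        cache = self.namespaces.get(namespace, {})
        for object_id in [k for k, (_, t) in cache.items() if t < cutoff]:
            del cache[object_id]

    def prune_dangling(self, existing_ids):
        """Remove weak annotations whose target object no longer exists."""
        for cache in self.namespaces.values():
            for object_id in [k for k in cache if k not in existing_ids]:
                del cache[object_id]
```

Switching from a namespace such as "rename-v1" to "rename-v2" starts
from an empty cache, which is the effect described in [0]: the old
results are never consulted again and can simply expire.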

  I know that some will say something about hammers, problems and nails,
but it would allow us to develop quite efficient tools with a generic
and easy-to-use API that could directly benefit from the infrastructure
that already exists in git. I mean, it's silly to write yet another
cache expirer when you have the reflog. Or, to speak about git-svn
again, it could even version its state per branch the way I propose:
it would end up using less disk than it does now, with the immediate
gain that it would be fully clone-able[1] (which would be a _really_
nice feature).

  So am I having crazy thoughts, and should I throw my crack-pipe away?
Or do parts of this mumbling make any sense to someone?

PS: It's late, and I'm tired, hence my English is probably very clumsy,
    and I hope I'm understandable enough. I'd be glad to rephrase parts
    that need it.

  [0] and if the copy/rename/… detection algorithm gets smarter, we just
      need to change its memoization namespace to throw the old cache
      away at once.

  [1] and the really nice part here is that even if you don't create one
      new step per svn revision you import but do macro-steps with
      hundreds of svn revisions at a time, the merge of two different
      git-svn states of two clones of the _same_ svn repository will
      have a trivial exact merge: the one that knows the biggest svn
      revision is the new state to use.
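
  The merge rule in [1] is small enough to write down. A sketch,
assuming each versioned git-svn state records the highest svn revision
it has imported; SvnState, its fields and merge_states() are
hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SvnState:
    """Hypothetical snapshot of the git-svn metadata for one branch."""
    max_svn_revision: int
    tree_id: str  # id of the tree holding index, rev_map, unhandled.log


def merge_states(ours, theirs):
    """Trivial merge: the snapshot that knows the highest svn revision wins."""
    if ours.max_svn_revision >= theirs.max_svn_revision:
        return ours
    return theirs
```

This only holds because both clones import the _same_ svn repository,
so the state that is further along already contains everything the
other one knows.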
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org


* Re: [(not so) random thoughts] using git as its own caching tool
  2007-12-12  0:38 [(not so) random thoughts] using git as its own caching tool Pierre Habouzit
@ 2007-12-12  6:51 ` Mike Hommey
  2007-12-12 15:35 ` Andreas Ericsson
  1 sibling, 0 replies; 6+ messages in thread
From: Mike Hommey @ 2007-12-12  6:51 UTC
  To: Pierre Habouzit, Git ML

On Wed, Dec 12, 2007 at 01:38:13AM +0100, Pierre Habouzit wrote:
>   So am I having crazy thoughts and should I throw my crack-pipe away ?
> Or does parts of this mumbling makes any sense to someone ?

I love this idea.

Mike


* Re: [(not so) random thoughts] using git as its own caching tool
  2007-12-12  0:38 [(not so) random thoughts] using git as its own caching tool Pierre Habouzit
  2007-12-12  6:51 ` Mike Hommey
@ 2007-12-12 15:35 ` Andreas Ericsson
  2007-12-12 15:48   ` Mike Hommey
  2007-12-12 16:27   ` Pierre Habouzit
  1 sibling, 2 replies; 6+ messages in thread
From: Andreas Ericsson @ 2007-12-12 15:35 UTC
  To: Pierre Habouzit, Git ML

Pierre Habouzit wrote:
>   That's an idea I have for quite some time, and I wonder why it's not
> used in git tools as a general rule.
> 
>   This idea is simple, git objects database has two (for this
> discussion) very interesting features: its delta compressed cached that
> is _very_ efficient, and the reflog.
> 
>   I wonder if that would be possible to write some git porcelains (and
> builtin API too) that would be more "map" oriented. I mean, we could use
> a reference as a pointer to a given tree that would be the map (where
> keys have a path form, which is nice). When I say that, I'm thinking
> about git-svn, that even with the recent improvements of its .rev_db's
> still eats a lot of space with the unhandled.log _and_ the indexes it
> stores for _each_ svn branch/tag. This way, instead of many:
>     foo/index
>     foo/.rev_map.6ef976f9-4de5-0310-a40d-91cae572ec18
>     foo/unhandled.log
> we would just have a special refs/db/git-svn/foo reference that would be
> a tree with three blobs in it: index, rev_map.xxxx, unhandled.log.  (or
> probably index would even be a tree but that's another matter). This
> way, all the unhandled.log that share a lot of common content would be
> nicely compressed by the delta-compression algorithms, with a negligible
> overhead (git-svn is _very_ slow because of svn anyways, we don't really
> care if it needs to get a blob contents instead opening a flat file).
> 
> 
>   Another nifty usage we could have is memoization databases that don't
> require a specific tool to expire them, but use the reflog expiration
> for that. I remember that we discussed quite some time ago, the idea of
> annotating objects. We could use such annotations to link some objects
> to memoized datas under different namespaces for each caching scheme
> involved, and with one reference per namespace that will have in its
> reflog each of the linked objects created over time for caching. Good
> candidates to use that are the rr-cache, or git-annotate/blame caching.
> Of course that would need to write a tool that removes weak annotations
> that point to objects that don't exist anymore. We could also cache the
> rename/copies/… detection results, and make those really really cheap to
> use[0].
> 
> 
>   I know that some will say something about hammers, problems and nails,
> though it would allow to develop quite efficient tools with a generic
> and easy to use API, that could directly benefit from already existing
> infrastructure in git. I mean it's silly to write yet-another cache
> expirer when you have the reflog. Or to speak about git-svn again, it
> could even version its state per branch the way I propose, it'll end up
> using less disk that what it does now, with the immediate gain that it
> would be fully clone-able[1] (which would be a _really_ nice feature).
> 
> 
>   So am I having crazy thoughts and should I throw my crack-pipe away ?
> Or does parts of this mumbling makes any sense to someone ?
> 

A bit of both ;-)

I like the idea of using the git object store, because that certainly
has an API that can't be done away with by user config. The reflog
and its expiration mechanism are subject to human control though, and
not everyone even has reflogs enabled. I don't for some repos where
I know I'll create a thousand-and-one loose objects by rebasing,
--amend'ing and otherwise fiddling with history rewrites.

Having a tool that works on some repos but not on others because it
relies on me living with an auto-gc after pretty much every operation
would be very tiresome indeed.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: [(not so) random thoughts] using git as its own caching tool
  2007-12-12 15:35 ` Andreas Ericsson
@ 2007-12-12 15:48   ` Mike Hommey
  2007-12-12 16:03     ` Andreas Ericsson
  2007-12-12 16:27   ` Pierre Habouzit
  1 sibling, 1 reply; 6+ messages in thread
From: Mike Hommey @ 2007-12-12 15:48 UTC
  To: Andreas Ericsson; +Cc: Pierre Habouzit, Git ML

On Wed, Dec 12, 2007 at 04:35:19PM +0100, Andreas Ericsson <ae@op5.se> wrote:
> A bit of both ;-)
> 
> I like the idea to use the git object store, because that certainly
> has an API that can't be done away with by user config. The reflog
> and its expiration mechanism is subject to human control though, and
> everyone doesn't even have them enabled. I don't for some repos where
> I know I'll create a thousand-and-one loose objects by rebasing,
> --amend'ing and otherwise fiddling with history rewrites.
> 
> Having a tool that works on some repos but not on others because it
> relies on me living with an auto-gc after pretty much every operation
> would be very tiresome indeed.

There is already a tool that relies on reflogs: stash.

Mike


* Re: [(not so) random thoughts] using git as its own caching tool
  2007-12-12 15:48   ` Mike Hommey
@ 2007-12-12 16:03     ` Andreas Ericsson
  0 siblings, 0 replies; 6+ messages in thread
From: Andreas Ericsson @ 2007-12-12 16:03 UTC
  To: Mike Hommey; +Cc: Pierre Habouzit, Git ML

Mike Hommey wrote:
> On Wed, Dec 12, 2007 at 04:35:19PM +0100, Andreas Ericsson <ae@op5.se> wrote:
>> A bit of both ;-)
>>
>> I like the idea to use the git object store, because that certainly
>> has an API that can't be done away with by user config. The reflog
>> and its expiration mechanism is subject to human control though, and
>> everyone doesn't even have them enabled. I don't for some repos where
>> I know I'll create a thousand-and-one loose objects by rebasing,
>> --amend'ing and otherwise fiddling with history rewrites.
>>
>> Having a tool that works on some repos but not on others because it
>> relies on me living with an auto-gc after pretty much every operation
>> would be very tiresome indeed.
> 
> There is already a tool that relies on reflogs: stash.
> 

No, "git stash save" works anyway. It's when you want to use multiple
stashes that it becomes tricky, but even that works if you're willing
to put some effort into it (although I don't use stash a lot, and not
at all in the very rebase-heavy repos).

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: [(not so) random thoughts] using git as its own caching tool
  2007-12-12 15:35 ` Andreas Ericsson
  2007-12-12 15:48   ` Mike Hommey
@ 2007-12-12 16:27   ` Pierre Habouzit
  1 sibling, 0 replies; 6+ messages in thread
From: Pierre Habouzit @ 2007-12-12 16:27 UTC
  To: Andreas Ericsson; +Cc: Git ML


On Wed, Dec 12, 2007 at 03:35:19PM +0000, Andreas Ericsson wrote:
> Pierre Habouzit wrote:
> >  So am I having crazy thoughts and should I throw my crack-pipe away ?
> >Or does parts of this mumbling makes any sense to someone ?
> 
> A bit of both ;-)
> 
> I like the idea to use the git object store, because that certainly
> has an API that can't be done away with by user config. The reflog
> and its expiration mechanism is subject to human control though, and
> everyone doesn't even have them enabled. I don't for some repos where
> I know I'll create a thousand-and-one loose objects by rebasing,
> --amend'ing and otherwise fiddling with history rewrites.
> 
> Having a tool that works on some repos but not on others because it
> relies on me living with an auto-gc after pretty much every operation
> would be very tiresome indeed.

  Well, if you disable the reflog on some repositories, those commands
will just be slow, but they would still work.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org
