git as a versioned filesystem

* git as a versioned filesystem
From: Scott Wiersdorf @ 2009-10-02 16:49 UTC (permalink / raw)
  To: git

Hi all,

First off, I'm *not* using git as a typical VCS on the front-end; I'm
using it for dist control on the back-end. I'm also fairly new to git
(about a week and a half into it).

The Scene
=========

For source control, we're using CVS (migrating off of it, btw--I only
have limited influence around here). We build our software and then
have the developers scp/rsync/untar their builds onto a
*master disk image*.

This master disk image is disted via rsync to a few thousand servers
to keep them all up to date and in sync, etc. This works mostly fine
and I can't really change this system.

The Problem
===========

Our problem has been that occasionally bad stuff gets put in the
master image and we have no easy way to revert it or to allow the QA
team to cherry-pick/revert changes to that master image.

The Solution
============

Git seems like the perfect tool for this, but I'm still not sure how
to adapt it to our situation. I'm building a tool that uses git to let
the developers commit their binary changes to this master image into
the git repository, which hopefully will allow me to offer the QA team
some ability to cherry-pick updates or revert regressions and make a
clean dist image from week to week.

The Question
============

What I need to know from y'all is: is there a better way, a more
git-like way, to accomplish this? Here's the model I *want* to follow:

-----a----b--T1-------c--------d-e---f------g [master]
               \   (a)  \
                ----|----c'---                [B1]

Here is branch B1 created from the master at some point in time T1. On
branch B1, I revert commit (a) and cherry-pick commit (c):

  git checkout master
  git branch B1
  git checkout B1
  git revert a
  git cherry-pick c

At this point, B1 is our "perfect image" and we're ready to dist it. I
check it out elsewhere and rsync it, etc. Wonderful.
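
A minimal, self-contained sketch of the B1 workflow described above, run
against a throwaway repository. The file names, the tags (a, b, T1, c) and
the temporary directories are assumptions made for illustration, not part
of the original setup. `git archive` is shown as one possible way to get a
clean tree (no .git directory) to rsync from.

```shell
set -eu

# Throwaway repository standing in for the master disk image (hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in the thread
git config user.name "Example"
git config user.email "example@example.invalid"

# One file per commit; each commit is tagged with its letter from the diagram.
for name in a b c; do
    echo "$name" > "$name.bin"
    git add "$name.bin"
    git commit -q -m "add $name"
    git tag "$name"
done
git tag T1 b                              # T1 sits right after (b)

# Branch B1 from T1, drop (a), and pull in (c).
git checkout -q -b B1 T1
git revert --no-edit a >/dev/null
git cherry-pick c >/dev/null

# Export B1 as plain files, ready to be rsynced.
image=$(mktemp -d)
git archive B1 | tar -xf - -C "$image"
ls "$image"
```

The exported directory holds only the files of B1's final tree, so (a)'s
file is absent and (c)'s file is present.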

But now it's a few weeks later and we're ready to do another
dist. What I *want* to do is create a *copy* of branch B1 to give the
release manager a reference point for him to bring things up to
date. What is the best way to do that?

If I branch off of B1, now I have the burden of doing a whole lot of
cherry-picks and having a challenging time getting things back in sync:

-----a----b--T1-------c--------d-e---f------g [master]
               \   (a)  \         \   \
                ----|----c'---     \   \      [B1]
                               \    \   \
                                -----e'--f'---[B2]

Ugh. Now B2 is kind of a mess. If I rebase it on master, I'll get (d)
and maybe (a) again, which I don't want. [side question: unless
there's a way to rebase on master but still exclude
commits... possible?]. B3 and B4 are going to look even worse and the
risk of drifting so far away from the master is unappealing.

Ideally I'd want each week's release to come directly from the master,
kind of the flying-fish approach:

                               ----e'--f'---  [B2]
                             /    /   /
-----a----b--T1-------c--------d-e---f------g [master]
               \   (a)  \
                ----|----c'---                [B1]

The problem with this is that now B2 contains (a), so I'll have to
revert that again--which I can do happily--but I just wonder if
there's a better way: whether it's possible to simply *copy* branch B1
to B2 without making B2 a branch off of B1.

In the absence of a git-branch-copy, is there something that would
help me do set intersection and subtraction between branches?
Something like this:

  git log B1
  ... bunch of commit ids ...
  git log B2
  ... bunch of commit ids ...

  ## find the intersection(B1, B2)

  ## revert all the things missing in B1 from B2

  ## now B2 is the same as B1--assuming git is idempotent (is it?)

  ## is there a way besides rebase to clean out a revert as if it never
  ## happened? I suppose I could branch again and repeat this as
  ## needed.

Am I even thinking about this correctly?
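
The set operations asked about here map fairly directly onto git's
revision-range syntax. A sketch on a throwaway repository (file names and
tags are illustrative assumptions): `B1..B2` is the set difference "in B2
but not in B1", `git merge-base` gives the newest commit both branches
share, and `git cherry` compares by patch content, so a cherry-picked copy
is recognised even though its commit id differs.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

# master: a, b (= T1), c, d -- one file per commit, tagged by letter.
for name in a b c d; do
    echo "$name" > "$name.bin"
    git add "$name.bin"
    git commit -q -m "add $name"
    git tag "$name"
done
git tag T1 b

# B1: forked at T1, with (a) reverted and (c) cherry-picked.
git checkout -q -b B1 T1
git revert --no-edit a >/dev/null
git cherry-pick c >/dev/null

# B2: a plain branch at the tip of master.
git branch B2 master

git merge-base B1 B2            # newest common commit (here: T1)
git log --format=%s B1..B2      # in B2 but not in B1
git log --format=%s B2..B1      # in B1 but not in B2
git cherry -v B2 B1             # "-" marks B1 commits whose patch is already in B2
```

In the `git cherry` output, the cherry-picked copy of (c) is marked "-"
and the revert of (a) is marked "+", i.e. only the revert is genuinely
missing from B2.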

Keep in mind that these commits are not source code commits; they're
file system changes of all kinds: updated binaries and libraries, new
directory trees, removed directory trees, etc. It's much closer to a
package manager in spirit than a VCS.

I feel like I'm missing something grand in git-rev-list or git-log or
git-bisect or some other tool that will make all my troubles
disappear. I've read an awful lot of the man pages, but am still very
new to git and I'm certain I've missed some subtleties.

Any ideas? I'm not even sure I'm asking the right questions. I'll
accept any advice on this subject.

Scott
-- 
Scott Wiersdorf
<scott@perlcode.org>


* Re: git as a versioned filesystem
From: Avery Pennarun @ 2009-10-02 18:11 UTC (permalink / raw)
  To: Scott Wiersdorf, git

On Fri, Oct 2, 2009 at 12:49 PM, Scott Wiersdorf <scott@perlcode.org> wrote:
> Git seems like the perfect tool for this, but I'm still not sure how
> to adapt it to our situation. I'm building a tool that uses git to let
> the developers commit their binary changes to this master image into
> the git repository, which hopefully will allow me to offer the QA team
> some ability to cherry-pick updates or revert regressions and make a
> clean dist image from week to week.

Beware that git performs rather badly on binary files, especially huge
ones, which it tries to load entirely into RAM.  It also keeps every
revision of every file that was ever committed (and every user who
checks it out needs to download the whole thing), so your giant binary
repository is going to get very big, very fast.

I've looked into using git for this kind of situation myself.  It's
close, but not quite there (for my purposes anyway).  It basically
just needs some optimizations and some improved support for "shallow
clones."
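
As a small illustration of the shallow clones mentioned above, here is a
sketch using `git clone --depth 1` against a throwaway repository. The
repository contents and paths are made up for the example, and it says
nothing about how well this scales to a real multi-gigabyte image.

```shell
set -eu

# Throwaway "upstream" repository with three revisions of one file.
src=$(mktemp -d)
git -C "$src" init -q
git -C "$src" config user.name "Example"
git -C "$src" config user.email "example@example.invalid"
for rev in 1 2 3; do
    echo "build $rev" > "$src/image.bin"
    git -C "$src" add image.bin
    git -C "$src" commit -q -m "build $rev"
done

# A plain local path would ignore --depth, so go through a file:// URL.
dst=$(mktemp -d)
git clone -q --depth 1 "file://$src" "$dst"

git -C "$src" rev-list --count HEAD   # commits in the full history
git -C "$dst" rev-list --count HEAD   # commits visible in the shallow clone
```

The shallow clone carries only the newest commit, so older revisions of
the file are not transferred.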

But on to your actual question:

> But now it's a few weeks later and we're ready to do another
> dist. What I *want* to do is create a *copy* of branch B1 to give the
> release manager a reference point for him to bring things up to
> date. What is the best way to do that?
>
> If I branch off of B1, now I have the burden of doing a whole lot of
> cherry-picks and having a challenging time getting things back in sync:
>
> -----a----b--T1-------c--------d-e---f------g [master]
>               \   (a)  \         \   \
>                ----|----c'---     \   \      [B1]
>                               \    \   \
>                                -----e'--f'---[B2]
>
> Ugh. Now B2 is kind of a mess. If I rebase it on master, I'll get (d)
> and maybe (a) again, which I don't want. [side question: unless
> there's a way to rebase on master but still exclude
> commits... possible?]. B3 and B4 are going to look even worse and the
> risk of drifting so far away from the master is unappealing.

If you rebase your "release" changes onto current master, you'll get
the revert-a patch applied, so (a) will still be gone.  Rebase will
also probably be smart enough to throw away c', since it's identical
to (c).  You will indeed end up with the unwanted (d), but you can
just revert that in B2.
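
A sketch of that sequence on a throwaway repository, assuming one file per
commit and tags named after the letters in the diagram (both are
illustrative choices). B2 starts as a copy of B1, gets rebased onto master,
and then has (d) reverted; B1 itself is left untouched.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

# master: a, b (= T1), c, d
for name in a b c d; do
    echo "$name" > "$name.bin"
    git add "$name.bin"
    git commit -q -m "add $name"
    git tag "$name"
done
git tag T1 b

# Last release: B1 = T1 + revert (a) + cherry-picked (c).
git checkout -q -b B1 T1
git revert --no-edit a >/dev/null
git cherry-pick c >/dev/null

# New release: copy B1 to B2, then replay B2's own commits onto master.
git checkout -q -b B2 B1
git rebase master >/dev/null 2>&1

# master brought in the unwanted (d); take it out again on B2 only.
git revert --no-edit d >/dev/null

git log --format=%s master..B2   # what B2 adds on top of master
git ls-files                     # resulting image contents
```

After the rebase, the only commits B2 has beyond master are the two
reverts: the copy of (c) is not replayed because master already contains
an equivalent change.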

> Ideally I'd want each week's release to come directly from the master,
> kind of the flying-fish approach:
>
>                               ----e'--f'---  [B2]
>                             /    /   /
> -----a----b--T1-------c--------d-e---f------g [master]
>               \   (a)  \
>                ----|----c'---                [B1]
>
> The problem with this is that now B2 contains (a), so I'll have to
> revert that again--which I can do happily--but I just wonder if
> there's a better way. If it's possible to simply *copy* branch B1 to
> B2 without making B2 a branch off of B1.

"revert-a" is a patch on its own.  Git doesn't think of reverting (a)
as anything special; it's just a change that happens to reverse what
(a) does.  So if you rebase B1 onto master, it will get copied.  It
sounds like rebase will produce exactly the results you're looking for
here.

Now, that said, this release process seems extremely suspicious to me.

To summarize what I'm hearing: you have a 'master' branch that people
put stuff into, but which doesn't actually work correctly.  At the
last minute before a release, you make a new branch, drop out the
stuff that doesn't work, and put it into production.

This sounds problematic.  If (a) and (d) don't work, why are they in
master at all?  Git makes branching really easy: get people to put
their not-quite-working features into a different branch, and let the
release manager merge those branches into master when they're actually
ready.

If you do that, you'll always be releasing straight out of master, and
your life will be a lot simpler.  And if you "merge --squash" from the
feature branches into master, you can throw away the interim versions
of the feature branches, which should help keep your repository from
growing so quickly with tons of binary file revisions that never even
got released.
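
A rough sketch of that feature-branch flow, with a made-up file name and
branch name. Interim builds are committed on `feature`; master only ever
receives the single squashed result.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo "v0" > tool.bin
git add tool.bin
git commit -q -m "initial image"

# Interim builds live on a feature branch, not on master.
git checkout -q -b feature
for rev in 1 2 3; do
    echo "v$rev" > tool.bin
    git commit -q -a -m "interim build $rev"
done

# The release manager folds the finished feature into master as one commit.
git checkout -q master
git merge --squash feature >/dev/null
git commit -q -m "tool: release build"

# The interim history is no longer needed once the squash is in.
git branch -q -D feature

git log --format=%s master
```

Once the feature branch is deleted, the interim versions are no longer
reachable from any branch, so git is free to discard them when it
eventually garbage-collects.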

>  ## is there way besides rebase to clean out a revert as if it never
>  ## happened? I suppose I could branch again and repeat this as
>  ## needed.

You probably want "git rebase -i", which lets you drop the revert
commit from the branch's history.
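
Making a commit disappear from a branch "as if it never happened" is a
history rewrite. Here is a minimal non-interactive sketch using
`git rebase --onto`, on a hypothetical branch `B` with a tag `undo`
marking the revert to be dropped (both names are assumptions for the
example). Deleting the corresponding line in the `git rebase -i` to-do
list achieves the same thing by hand.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

for name in a b; do
    echo "$name" > "$name.bin"
    git add "$name.bin"
    git commit -q -m "add $name"
    git tag "$name"
done

# Branch B: a revert of (a), followed by one more commit.
git checkout -q -b B
git revert --no-edit a >/dev/null
git tag undo                      # remember which commit we regret
echo "c" > c.bin
git add c.bin
git commit -q -m "add c"

# Replay everything after the revert onto the revert's parent,
# which leaves the revert itself out of B's history.
git rebase -q --onto undo^ undo B

git log --format=%s B
git ls-files
```

Since this rewrites B, it is only comfortable on branches nobody else has
already based work on.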

Have fun,

Avery
