git.vger.kernel.org archive mirror
* help moving boost.org to git
@ 2010-07-05 14:16 Eric Niebler
  2010-07-05 14:48 ` Erik Faye-Lund
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Eric Niebler @ 2010-07-05 14:16 UTC (permalink / raw)
  To: git

I have a question about the best approach to take for refactoring a
large svn project into git. The project, boost.org, is a collection of
C++ libraries (>100) that are mostly independent. (There may be
cross-library dependencies, but we plan to handle that at a higher
level.) After the move to git, we'd like each library to be in its own
git repository. Boost can then be a stitching-together of these, using
submodules or something (opinions welcome). It's an old project with
lots of history that we don't want to lose. The naive approach of simply
forking into N repositories for the N libraries and deleting the
unwanted files in each is unworkable because we'll end up with all the
history duplicated everywhere ... >100 repositories, each larger than 100 MB.

So, what are the options? Can I somehow delete from each repository the
history that is irrelevant? Is there some feature of git I don't know
about that can solve this problem for us?

(Caveat: I'm new to git and still getting up to speed. An acceptable
answer is: go off and learn about feature X and come back to us.)

At boost, we've already discussed a few possible approaches. Feel free
to comment and/or criticize any of the solutions suggested here:

  http://github.com/ryppl/ryppl/issues#issue/4

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com

* Re: help moving boost.org to git
  2010-07-05 14:16 help moving boost.org to git Eric Niebler
@ 2010-07-05 14:48 ` Erik Faye-Lund
  2010-07-05 14:48 ` Johannes Sixt
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Erik Faye-Lund @ 2010-07-05 14:48 UTC (permalink / raw)
  To: Eric Niebler; +Cc: git

On Mon, Jul 5, 2010 at 4:16 PM, Eric Niebler <eric@boostpro.com> wrote:
> I have a question about the best approach to take for refactoring a
> large svn project into git. The project, boost.org, is a collection of
> C++ libraries (>100) that are mostly independent. (There may be
> cross-library dependencies, but we plan to handle that at a higher
> level.) After the move to git, we'd like each library to be in its own
> git repository. Boost can then be a stitching-together of these, using
> submodules or something (opinions welcome). It's an old project with
> lots of history that we don't want to lose. The naive approach of simply
> forking into N repositories for the N libraries and deleting the
> unwanted files in each is unworkable because we'll end up with all the
> history duplicated everywhere ... >100 repositories, each larger than 100 MB.
>
> So, what are the options? Can I somehow delete from each repository the
> history that is irrelevant? Is there some feature of git I don't know
> about that can solve this problem for us?
>

You're probably looking for git-filter-branch. Its --subdirectory-filter
option rewrites history so that a given subdirectory becomes the project
root, keeping only the commits that touch it. If the project isn't split
into subdirectories, you can use the --tree-filter option to remove
specific files from every commit instead.

See http://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html
for details.
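
A minimal sketch of the --subdirectory-filter approach, run against a
throwaway repository. The layout (libs/foo, libs/bar) and file names are
made up for illustration and are not Boost's actual tree; the
FILTER_BRANCH_SQUELCH_WARNING variable only matters on newer git
versions, which otherwise print a warning and pause.

```shell
#!/bin/sh
# Sketch: extract one library's history with git filter-branch.
# Hypothetical layout: libs/foo and libs/bar inside one repository.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
git init -q "$work/all"
cd "$work/all"

mkdir -p libs/foo libs/bar
echo 'foo v1' > libs/foo/foo.hpp
git add .
git commit -q -m 'foo: initial'
echo 'bar v1' > libs/bar/bar.hpp
git add .
git commit -q -m 'bar: initial'
echo 'foo v2' > libs/foo/foo.hpp
git add .
git commit -q -m 'foo: update'

# filter-branch rewrites history in place, so work on a clone.
git clone -q "$work/all" "$work/foo"
cd "$work/foo"
git filter-branch --subdirectory-filter libs/foo HEAD >/dev/null 2>&1

# The rewritten branch has libs/foo as its root and no 'bar' commit.
foo_commits=$(git rev-list --count HEAD)
foo_files=$(git ls-tree -r --name-only HEAD)
echo "commits: $foo_commits"
echo "files:   $foo_files"
```

Repeating this once per library, each time in a fresh clone, would give
one small repository per library.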

-- 
Erik "kusma" Faye-Lund

* Re: help moving boost.org to git
  2010-07-05 14:16 help moving boost.org to git Eric Niebler
  2010-07-05 14:48 ` Erik Faye-Lund
@ 2010-07-05 14:48 ` Johannes Sixt
  2010-07-05 17:51   ` Eric Niebler
  2010-07-06 15:06   ` Raja R Harinath
  2010-07-05 22:04 ` Finn Arne Gangstad
  2010-07-06  0:16 ` Greg Troxel
  3 siblings, 2 replies; 19+ messages in thread
From: Johannes Sixt @ 2010-07-05 14:48 UTC (permalink / raw)
  To: Eric Niebler; +Cc: git

Am 7/5/2010 16:16, schrieb Eric Niebler:
> I have a question about the best approach to take for refactoring a
> large svn project into git. The project, boost.org, is a collection of
> C++ libraries (>100) that are mostly independent. (There may be
> cross-library dependencies, but we plan to handle that at a higher
> level.) After the move to git, we'd like each library to be in its own
> git repository.

You could use svn2git: http://gitorious.org/svn2git
KDE uses it to split its SVN repository into pieces. The tool is driven by
a "ruleset" that specifies SVN subdirectories and revision numbers that
make up a module.

-- 
"Atomic objects are neither active nor radioactive." --
Programming Languages -- C++, Final Committee Draft (Doc.N3092)

* Re: help moving boost.org to git
  2010-07-05 14:48 ` Johannes Sixt
@ 2010-07-05 17:51   ` Eric Niebler
  2010-07-05 18:43     ` Sverre Rabbelier
  2010-07-06 15:06   ` Raja R Harinath
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Niebler @ 2010-07-05 17:51 UTC (permalink / raw)
  To: git

On 7/5/2010 10:48 AM, Johannes Sixt wrote:
> Am 7/5/2010 16:16, schrieb Eric Niebler:
>> I have a question about the best approach to take for refactoring a
>> large svn project into git. The project, boost.org, is a collection of
>> C++ libraries (>100) that are mostly independent. (There may be
>> cross-library dependencies, but we plan to handle that at a higher
>> level.) After the move to git, we'd like each library to be in its own
>> git repository.
> 
> You could use svn2git: http://gitorious.org/svn2git
> KDE uses it to split its SVN repository into pieces. The tool is driven by
> a "ruleset" that specifies SVN subdirectories and revision numbers that
> make up a module.

I'm off to learn about filter-branch, tree-filter and svn2git. Thanks
for the suggestions. More questions to come, I'm sure.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com

* Re: help moving boost.org to git
  2010-07-05 17:51   ` Eric Niebler
@ 2010-07-05 18:43     ` Sverre Rabbelier
  0 siblings, 0 replies; 19+ messages in thread
From: Sverre Rabbelier @ 2010-07-05 18:43 UTC (permalink / raw)
  To: Eric Niebler; +Cc: git, Avery Pennarun

Heya,

On Mon, Jul 5, 2010 at 19:51, Eric Niebler <eric@boostpro.com> wrote:

> I'm off to learn about filter-branch, tree-filter and svn2git. Thanks
> for the suggestions. More questions to come, I'm sure.

Also have a look at git-subtree.

-- 
Cheers,

Sverre Rabbelier

* Re: help moving boost.org to git
  2010-07-05 14:16 help moving boost.org to git Eric Niebler
  2010-07-05 14:48 ` Erik Faye-Lund
  2010-07-05 14:48 ` Johannes Sixt
@ 2010-07-05 22:04 ` Finn Arne Gangstad
  2010-07-05 23:11   ` Eric Niebler
  2010-07-06  0:16 ` Greg Troxel
  3 siblings, 1 reply; 19+ messages in thread
From: Finn Arne Gangstad @ 2010-07-05 22:04 UTC (permalink / raw)
  To: Eric Niebler; +Cc: git

On Mon, Jul 05, 2010 at 10:16:36AM -0400, Eric Niebler wrote:
> I have a question about the best approach to take for refactoring a
> large svn project into git. The project, boost.org, is a collection of
> C++ libraries (>100) that are mostly independent. (There may be
> cross-library dependencies, but we plan to handle that at a higher
> level.) After the move to git, we'd like each library to be in its own
> git repository. Boost can then be a stitching-together of these, using
> submodules or something (opinions welcome). It's an old project with
> lots of history that we don't want to lose. The naive approach of simply
> forking into N repositories for the N libraries and deleting the
> unwanted files in each is unworkable because we'll end up with all the
> history duplicated everywhere ... >100 repositories, each larger than 100 MB.

If the libraries are not independent (i.e. some commits are across
multiple libraries), submodules will give you some interesting
challenges to put it mildly.

The current boost 1.43 is 29344 files, is this all there is? This
should fit easily into a single repository. The Linux kernel is much
larger, and that is sort of the canonical single repo git project. I
_strongly_ recommend that you go for a single repo if you can make it
work.

If you manage to create a single git repo with the history you want,
it is trivial to split out separate repositories of subdirectories
later (and those repos will then be comparatively small). git subtree
allegedly automates this process more or less (I have not used it, but
have heard good things about it). What about having a single "master
repository", and then using subtree to create single-library repos for
the library developers if they want a smaller repo to play around in?

> So, what are the options? Can I somehow delete from each repository the
> history that is irrelevant? Is there some feature of git I don't know
> about that can solve this problem for us?

How do you define "irrelevant"? Do you only require enough history for
git annotate/blame to give correct results?  Or does this only refer
to multiple repositories sharing the same ancient history?

> At boost, we've already discussed a few possible approaches. Feel free
> to comment and/or criticize any of the solutions suggested here:
> 
>   http://github.com/ryppl/ryppl/issues#issue/4

It is unclear from the discussion if you will change to git, or use
git in addition to svn? This will have some impact on how to go about
this.

- Finn Arne

* Re: help moving boost.org to git
  2010-07-05 22:04 ` Finn Arne Gangstad
@ 2010-07-05 23:11   ` Eric Niebler
  2010-07-05 23:32     ` Avery Pennarun
  2010-07-06  1:46     ` Dave Abrahams
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Niebler @ 2010-07-05 23:11 UTC (permalink / raw)
  To: git

On 7/5/2010 6:04 PM, Finn Arne Gangstad wrote:
> On Mon, Jul 05, 2010 at 10:16:36AM -0400, Eric Niebler wrote:
>> I have a question about the best approach to take for refactoring a
>> large svn project into git. The project, boost.org, is a collection of
>> C++ libraries (>100) that are mostly independent. (There may be
>> cross-library dependencies, but we plan to handle that at a higher
>> level.) After the move to git, we'd like each library to be in its own
>> git repository. Boost can then be a stitching-together of these, using
>> submodules or something (opinions welcome). It's an old project with
>> lots of history that we don't want to lose. The naive approach of simply
>> forking into N repositories for the N libraries and deleting the
>> unwanted files in each is unworkable because we'll end up with all the
>> history duplicated everywhere ... >100 repositories, each larger than 100 MB.
> 
> If the libraries are not independent (i.e. some commits are across
> multiple libraries), submodules will give you some interesting
> challenges to put it mildly.

You have correctly assessed the situation. There *are* cross-library
commits in our history. What are the implications of this for
modularization?

> The current boost 1.43 is 29344 files, is this all there is? 

Yes.

> This
> should fit easily into a single repository. The Linux kernel is much
> larger, and that is sort of the canonical single repo git project. I
> _strongly_ recommend that you go for a single repo if you can make it
> work.

It does fit into one repo, but that doesn't meet our needs for the
future. Users want to install and build library X and its dependencies,
not all of boost. This is increasingly becoming a problem as boost
grows. Imagine if a perl programmer had to download all of CPAN to use
or hack on any one perl module. Or if contributing to CPAN meant getting
the whole shebang, history and all. I'm sure even in the Linux kernel,
not *every* third-party driver is maintained in the master git repo.

We are aiming to make boost a clearing-house for C++ libraries (like
CPAN, or PyPI for Python), turning the official boost distribution into
little more than a well-tested collection of the libraries that have
passed our peer-review and regression test process.

In fact, the modularization has already been done, and work is well
underway on the infrastructure to support dependency tracking. But the
modularization is not history-preserving and needs to be redone.

> If you manage to create a single git repo with the history you want,
> it is trivial to split out separate repositories of subdirectories
> later (and those repos will then be comparatively small). git subtree
> allegedly automates this process more or less (I have not used it, but
> have heard good things about it). What about having a single "master
> repository", and then using subtree to create single-library repos for
> the library developers if they want a smaller repo to play around in?

This sounds like it might be ok, but I need to research it.

>> So, what are the options? Can I somehow delete from each repository the
>> history that is irrelevant? Is there some feature of git I don't know
>> about that can solve this problem for us?
> 
> How do you define "irrelevant"? Do you only require enough history for
> git annotate/blame to give correct results?  Or does this only refer
> to multiple repositories sharing the same ancient history?

If multiple repositories share the same ancient history, wouldn't that
give git annotate/blame enough information? Sorry, git newbie here.

>> At boost, we've already discussed a few possible approaches. Feel free
>> to comment and/or criticize any of the solutions suggested here:
>>
>>   http://github.com/ryppl/ryppl/issues#issue/4
> 
> It is unclear from the discussion if you will change to git, or use
> git in addition to svn? This will have some impact on how to go about
> this.

The plan is to move to git. However, we don't expect this to happen
overnight, so a way to continue to pull changes from a svn mirror while
the new git repositories are being set up would be ideal.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com

* Re: help moving boost.org to git
  2010-07-05 23:11   ` Eric Niebler
@ 2010-07-05 23:32     ` Avery Pennarun
  2010-07-06  0:16       ` Eric Niebler
  2010-07-06  1:46     ` Dave Abrahams
  1 sibling, 1 reply; 19+ messages in thread
From: Avery Pennarun @ 2010-07-05 23:32 UTC (permalink / raw)
  To: Eric Niebler; +Cc: git

(note: on this mailing list, you shouldn't drop names from the cc:
line when replying to a thread)

On Mon, Jul 5, 2010 at 7:11 PM, Eric Niebler <eric@boostpro.com> wrote:
> On 7/5/2010 6:04 PM, Finn Arne Gangstad wrote:
>> This
>> should fit easily into a single repository. The Linux kernel is much
>> larger, and that is sort of the canonical single repo git project. I
>> _strongly_ recommend that you go for a single repo if you can make it
>> work.
>
> It does fit into one repo, but that doesn't meet our needs for the
> future. Users want to install and build library X and its dependencies,
> not all of boost. This is increasingly becoming a problem as boost
> grows. Imagine if a perl programmer had to download all of CPAN to use
> or hack on any one perl module. Or if contributing to CPAN meant getting
> the whole shebang, history and all. I'm sure even in the Linux kernel,
> not *every* third-party driver is maintained in the master git repo.

Actually, that's mostly not true; there are a few third-party drivers
that don't make it into the core Linux repo, but that's mostly because
they haven't been accepted by the kernel maintainers for whatever
reason (often quality or duplication, I guess).  The goal for the vast
majority of Linux drivers is indeed to get merged into the Linux core.

...and it works pretty well, all things considered.  It's certainly
not the only way to do it for every project, but it's actually a
pretty good way.  The kernel repo history runs to hundreds of megs
nowadays, but on a modern Internet connection that's not a big deal.
And then you never have to worry about downloading more modules later.
 You also never have versioning problems.

> We are aiming to make boost a clearing-house for C++ libraries (like
> CPAN, or PyPI for Python), turning the official boost distribution into
> little more than a well-tested collection of the libraries that have
> passed our peer-review and regression test process.

Of course you will want to have some kind of really excellent
versioned dependency fetching system (exactly like CPAN or PyPI or
ruby gems) if you want this to be nice.  git's submodules stuff is
almost certainly not going to add any features you need/want.  On the
other hand, cloning a separate git repo is pretty easy to write your
CPAN-like script around.

> In fact, the modularization has already been done, and work is well
> underway on the infrastructure to support dependency tracking. But the
> modularization is not history-preserving and needs to be redone.

If your code doesn't move too many files around, then splitting out
the history is pretty easy with git-subtree (a tool I wrote that's not
part of git):

   git subtree split --prefix=path/to/subdir

And you get a new history for just that subdir.  That might do exactly
what you want.  It also works iteratively, so you can export your
history from svn, then re-export the changes as they occur over time.
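
A rough sketch of that split-and-re-export cycle, assuming git-subtree
is installed and using a hypothetical libs/foo directory and a
hypothetical branch name foo-only. The prefix is given relative to the
top of the repository.

```shell
#!/bin/sh
# Sketch: split one subdirectory's history with git-subtree and publish
# it as a standalone repository. Paths and names are hypothetical.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/all"
cd "$work/all"

mkdir -p libs/foo libs/bar
echo 'foo v1' > libs/foo/foo.hpp
git add .
git commit -q -m 'foo: initial'
echo 'bar v1' > libs/bar/bar.hpp
git add .
git commit -q -m 'bar: initial'

# First export: synthesize a history for libs/foo on branch foo-only.
git subtree split --prefix=libs/foo -b foo-only >/dev/null 2>&1

# Publish the synthetic branch into a fresh bare repository.
git init -q --bare "$work/foo.git"
git push -q "$work/foo.git" foo-only:refs/heads/master

# Later: new commits arrive in the big repository, so split again.
# The second split extends the same branch, and the push fast-forwards.
echo 'foo v2' > libs/foo/foo.hpp
git add .
git commit -q -m 'foo: update'
git subtree split --prefix=libs/foo -b foo-only >/dev/null 2>&1
git push -q "$work/foo.git" foo-only:refs/heads/master

split_commits=$(git --git-dir="$work/foo.git" rev-list --count master)
split_files=$(git --git-dir="$work/foo.git" ls-tree -r --name-only master)
echo "commits: $split_commits"
echo "files:   $split_files"
```

Because the split is reproducible, the second run regenerates the same
commits as the first and then adds the new one on top.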

>>> So, what are the options? Can I somehow delete from each repository the
>>> history that is irrelevant? Is there some feature of git I don't know
>>> about that can solve this problem for us?
>>
>> How do you define "irrelevant"? Do you only require enough history for
>> git annotate/blame to give correct results?  Or does this only refer
>> to multiple repositories sharing the same ancient history?
>
> If multiple repositories share the same ancient history, wouldn't that
> give git annotate/blame enough information? Sorry, git newbie here.

Yes, it would.  But how much of the ancient history do you want?  If
you want all of it, you don't save any space in your repo.

> The plan is to move to git. However, we don't expect this to happen
> overnight, so a way to continue to pull changes from a svn mirror while
> the new git repositories are being set up would be ideal.

This isn't too hard to do; you just need some scripts around git-svn
and git-subtree (or whatever tool you use to do the splitting).  We've
done this at work for a couple of years now and it's working fine.

The confusing part is taking *submissions* back through both channels.
 If you value your sanity, you probably want to only allow submissions
back via svn while you're running the two in parallel; but that makes
git's added features a lot less useful, so you probably want to run in
parallel for only a short time.

Have fun,

Avery

* Re: help moving boost.org to git
  2010-07-05 23:32     ` Avery Pennarun
@ 2010-07-06  0:16       ` Eric Niebler
  2010-07-06 17:27         ` Avery Pennarun
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Niebler @ 2010-07-06  0:16 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: git

On 7/5/2010 7:32 PM, Avery Pennarun wrote:
> (note: on this mailing list, you shouldn't drop names from the cc:
> line when replying to a thread)

Noted, thanks.

> On Mon, Jul 5, 2010 at 7:11 PM, Eric Niebler <eric@boostpro.com> wrote:
>> On 7/5/2010 6:04 PM, Finn Arne Gangstad wrote:
>>> This
>>> should fit easily into a single repository. The Linux kernel is much
>>> larger, and that is sort of the canonical single repo git project. I
>>> _strongly_ recommend that you go for a single repo if you can make it
>>> work.
>>
>> It does fit into one repo, but that doesn't meet our needs for the
>> future. Users want to install and build library X and its dependencies,
>> not all of boost. This is increasingly becoming a problem as boost
>> grows. Imagine if a perl programmer had to download all of CPAN to use
>> or hack on any one perl module. Or if contributing to CPAN meant getting
>> the whole shebang, history and all. I'm sure even in the Linux kernel,
>> not *every* third-party driver is maintained in the master git repo.
> 
> Actually, that's mostly not true; there are a few third-party drivers
> that don't make it into the core Linux repo
<snip discussion showing my ignorance of Linux's repository structure>

Thanks for the correction. The CPAN/PyPI analogy is still apt.

>> We are aiming to make boost a clearing-house for C++ libraries (like
>> CPAN, or PyPI for Python), turning the official boost distribution into
>> little more than a well-tested collection of the libraries that have
>> passed our peer-review and regression test process.
> 
> Of course you will want to have some kind of really excellent
> versioned dependency fetching system (exactly like CPAN or PyPI or
> ruby gems) if you want this to be nice.  git's submodules stuff is
> almost certainly not going to add any features you need/want.  On the
> other hand, cloning a separate git repo is pretty easy to write your
> CPAN-like script around.

Indeed, we are stealing the work of the python guys. Pip does most of
what we want. They've graciously been accepting our patches so it
happily clones git repos in order to satisfy dependencies now. It is
some kind of really excellent! :-)

>> In fact, the modularization has already been done, and work is well
>> underway on the infrastructure to support dependency tracking. But the
>> modularization is not history-preserving and needs to be redone.
> 
> If your code doesn't move too many files around, then splitting out
> the history is pretty easy with git-subtree (a tool I wrote that's not
> part of git):
> 
>    git subtree split --prefix=path/to/subdir
> 
> And you get a new history for just that subdir.  That might do exactly
> what you want.  It also works iteratively, so you can export your
> history from svn, then re-export the changes as they occur over time.

This looks like it here:

  http://github.com/apenwarr/git-subtree

I'll have to read the docs. Thanks for the tip.

>>>> So, what are the options? Can I somehow delete from each repository the
>>>> history that is irrelevant? Is there some feature of git I don't know
>>>> about that can solve this problem for us?
>>>
>>> How do you define "irrelevant"? Do you only require enough history for
>>> git annotate/blame to give correct results?  Or does this only refer
>>> to multiple repositories sharing the same ancient history?
>>
>> If multiple repositories share the same ancient history, wouldn't that
>> give git annotate/blame enough information? Sorry, git newbie here.
> 
> Yes, it would.  But how much of the ancient history do you want?  If
> you want all of it, you don't save any space in your repo.

Repos, plural. We'd save space because the history wouldn't be
duplicated in each one. Right? Or else I'm confused and this is something
that will become clear after I understand what git subtree does.

Right now, the other boost developers are pushing for a solution that
uses grafts. I'm fuzzy on what they are exactly, but it seems that we'd
freeze a svn mirror and have anybody interested in history put grafts in
their local repository pointing back at the mirror. I don't know enough
yet to say what the pros/cons of this approach might be wrt git subtree.
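
For readers equally fuzzy on grafts, here is a small sketch of the idea,
under some loud assumptions: it uses `git replace --graft`, the newer
mechanism that superseded the .git/info/grafts file, and the two
repositories ("history" standing in for the frozen mirror, "foo" for a
new library repository) are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: attach old history to a fresh repository with a local graft.
# Repository names and contents are hypothetical.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Frozen "history" repository standing in for the svn mirror.
git init -q "$work/history"
cd "$work/history"
echo 'v1' > foo.hpp
git add .
git commit -q -m 'old: v1'
echo 'v2' > foo.hpp
git add .
git commit -q -m 'old: v2'
old_tip=$(git rev-parse HEAD)

# New library repository that starts from a snapshot, without history.
git init -q "$work/foo"
cd "$work/foo"
echo 'v2' > foo.hpp
git add .
git commit -q -m 'import snapshot'
new_root=$(git rev-parse HEAD)
echo 'v3' > foo.hpp
git add .
git commit -q -m 'new: v3'

before=$(git rev-list --count HEAD)

# Anyone who wants the old history fetches it and grafts it locally:
# the root of the new history is given the old tip as its parent.
git fetch -q "$work/history" '+refs/heads/*:refs/remotes/history/*'
git replace --graft "$new_root" "$old_tip"

after=$(git rev-list --count HEAD)
echo "commits visible before graft: $before, after: $after"
```

The graft lives only in the local repository (under refs/replace), so
clones stay small unless a developer opts in to the old history.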

>> The plan is to move to git. However, we don't expect this to happen
>> overnight, so a way to continue to pull changes from a svn mirror while
>> the new git repositories are being set up would be ideal.
> 
> This isn't too hard to do; you just need some scripts around git-svn
> and git-subtree (or whatever tool you use to do the splitting).  We've
> done this at work for a couple of years now and it's working fine.

Cool.

> The confusing part is taking *submissions* back through both channels.
> If you value your sanity, you probably want to only allow submissions
> back via svn while you're running the two in parallel; but that makes
> git's added features a lot less useful, so you probably want to run in
> parallel for only a short time.

Oh my! I don't think we'd open the git repositories for changes until
after we close down svn. This problem is hard enough.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com

* Re: help moving boost.org to git
  2010-07-05 14:16 help moving boost.org to git Eric Niebler
                   ` (2 preceding siblings ...)
  2010-07-05 22:04 ` Finn Arne Gangstad
@ 2010-07-06  0:16 ` Greg Troxel
  2010-07-06  0:25   ` Eric Niebler
  3 siblings, 1 reply; 19+ messages in thread
From: Greg Troxel @ 2010-07-06  0:16 UTC (permalink / raw)
  To: Eric Niebler; +Cc: git

You have found the core issue with svn/git: svn allows you to have a
large repo with everything (and atomic commits across it) and to have
users check out parts of the repo separately.  git does not, because the
svn separate checkouts model only works with a remote repository that
you don't keep a copy of.  With git, cloning the repo gets you the whole
thing.

One thought is that you may want to separate how you organize boost
sources in git and how you release them.  It's possible to have a single
git repo for all libraries and have atomic commits but then create
distfiles for each library separately.
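
One way this could look, sketched with `git archive` on a toy
repository. The libs/foo and libs/bar layout and the dist/ output
directory are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: one repository, one atomic commit, separate per-library tarballs.
# Layout and output paths are hypothetical.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/all"
cd "$work/all"

mkdir -p libs/foo libs/bar
echo 'foo' > libs/foo/foo.hpp
echo 'bar' > libs/bar/bar.hpp
git add .
git commit -q -m 'foo, bar: one atomic commit across libraries'

mkdir "$work/dist"
for lib in libs/*; do
    name=$(basename "$lib")
    # HEAD:<path> names that subtree, so each archive holds one library.
    git archive --format=tar --prefix="$name/" "HEAD:$lib" \
        > "$work/dist/$name.tar"
done

# List the regular files in foo's tarball (directory entries filtered out).
foo_listing=$(tar -tf "$work/dist/foo.tar" | grep -v '/$')
echo "$foo_listing"
```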

git becomes a bit slow when repositories get really large (although
other tools are not any better - the problem is in the sheer number of
vnode ops necessary for the semantics).  I have a repo with all of
NetBSD's "src", "xsrc" and "pkgsrc" in it, and "git status" can take
several seconds because it is calling stat on 230K files.  With only 23K
files, things should be ok.

My advice (which is not really about git) is to figure out whether you
want:

  A) a set of interrelated libraries on which you will allow atomic
  commits that change interfaces/usage in multiple libraries

or

  B) a set of independent libraries which have commits to separate
  libraries, and for which you insist that each library have an API and
  ABI compatibility story, so that even when upgraded other libraries can
  continue to use it.


For A, you probably want one git repo, much as you have one svn repo
now.  For B, multiple git repos are the right answer.

* Re: help moving boost.org to git
  2010-07-06  0:16 ` Greg Troxel
@ 2010-07-06  0:25   ` Eric Niebler
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Niebler @ 2010-07-06  0:25 UTC (permalink / raw)
  To: Greg Troxel; +Cc: git

On 7/5/2010 8:16 PM, Greg Troxel wrote:
> 
> You have found the core issue with svn/git: svn allows you to have a
> large repo with everything (and atomic commits across it) and to have
> users check out parts of the repo separately.  git does not, because the
> svn separate checkouts model only works with a remote repository that
> you don't keep a copy of.  With git, cloning the repo gets you the whole
> thing.

Makes sense.

> One thought is that you may want to separate how you organize boost
> sources in git and how you release them.  It's possible to have a single
> git repo for all libraries and have atomic commits but then create
> distfiles for each library separately.
> 
> git becomes a bit slow when ...
<snip>

It can't get any worse than svn. We haven't run into any perf problems
with git yet. That's not our primary concern.

> My advice (which is not really about git) is to figure out whether you
> want:
> 
>   A) a set of interrelated libraries on which you will allow atomic
>   commits that change interfaces/usage in multiple libraries
> 
> or
> 
>   B) a set of independent libraries which have commits to separate
>   libraries, and for which you insist that each library have an API and
>   ABI compatibility story, so that even when upgraded other libraries can
>   continue to use it.
> 
> 
> For A, you probably want one git repo, much as you have one svn repo
> now.  For B, multiple git repos are the right answer.

I'll take B FTW! :-) The idea is to open up and distribute C++ library
development. Versioned dependency tracking will be handled at a higher
level with per-project metadata and a tool (pip) that resolves
dependencies. API compatibility is handled with peer review and
regression testing. ABI compatibility is not an issue because we're
distributing source code.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com

* Re: help moving boost.org to git
  2010-07-05 23:11   ` Eric Niebler
  2010-07-05 23:32     ` Avery Pennarun
@ 2010-07-06  1:46     ` Dave Abrahams
  2010-07-06  8:51       ` Jakub Narebski
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Abrahams @ 2010-07-06  1:46 UTC (permalink / raw)
  To: git

Eric Niebler <eric <at> boostpro.com> writes:

> We are aiming to make boost a clearing-house for C++ libraries (like
> CPAN, or PyPI for Python), 

Clarification: that's our goal for Ryppl, not Boost.

> turning the official boost distribution into
> little more than a well-tested collection of the libraries that have
> passed our peer-review and regression test process.

Exactly right.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

* Re: help moving boost.org to git
  2010-07-06  1:46     ` Dave Abrahams
@ 2010-07-06  8:51       ` Jakub Narebski
  2010-07-06 10:34         ` David Abrahams
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Narebski @ 2010-07-06  8:51 UTC (permalink / raw)
  To: Dave Abrahams; +Cc: git, Eric Niebler

Dave Abrahams <dave@boostpro.com> writes:

> Eric Niebler <eric <at> boostpro.com> writes:
> 
> > We are aiming to make boost a clearing-house for C++ libraries (like
> > CPAN, or PyPI for Python), 
> 
> Clarification: that's our goal for Ryppl, not Boost.

By the way, could you please add information about Ryppl to Git Wiki?
https://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools

Thanks in advance
-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: help moving boost.org to git
  2010-07-06  8:51       ` Jakub Narebski
@ 2010-07-06 10:34         ` David Abrahams
  0 siblings, 0 replies; 19+ messages in thread
From: David Abrahams @ 2010-07-06 10:34 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Eric Niebler

At Tue, 06 Jul 2010 01:51:00 -0700 (PDT),
Jakub Narebski wrote:
> 
> Dave Abrahams <dave@boostpro.com> writes:
> 
> > Eric Niebler <eric <at> boostpro.com> writes:
> > 
> > > We are aiming to make boost a clearing-house for C++ libraries (like
> > > CPAN, or PyPI for Python), 
> > 
> > Clarification: that's our goal for Ryppl, not Boost.
> 
> By the way, could you please add information about Ryppl to Git Wiki?
> https://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools

We'd be happy to, but: have you read the document at ryppl.org, and do
you really think it's appropriate?  Even though ryppl is still very
alpha?

[BTW, I can't reach that page right now; it's just forever “waiting
for reply...”]

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

* Re: help moving boost.org to git
  2010-07-05 14:48 ` Johannes Sixt
  2010-07-05 17:51   ` Eric Niebler
@ 2010-07-06 15:06   ` Raja R Harinath
  1 sibling, 0 replies; 19+ messages in thread
From: Raja R Harinath @ 2010-07-06 15:06 UTC (permalink / raw)
  To: git

Hi,

Johannes Sixt <j.sixt@viscovery.net> writes:

> Am 7/5/2010 16:16, schrieb Eric Niebler:
>> I have a question about the best approach to take for refactoring a
>> large svn project into git. The project, boost.org, is a collection of
>> C++ libraries (>100) that are mostly independent. (There may be
>> cross-library dependencies, but we plan to handle that at a higher
>> level.) After the move to git, we'd like each library to be in its own
>> git repository.
>
> You could use svn2git: http://gitorious.org/svn2git
> KDE uses it to split its SVN repository into pieces. The tool is driven by
> a "ruleset" that specifies SVN subdirectories and revision numbers that
> make up a module.

I'm also involved in moving a large SVN project to git (the mono
project).  I have found and fixed several issues with svn2git; my
fixes are in:

  git://gitorious.org/~harinath/svn2git/rrh-svn2git.git

- Hari


* Re: help moving boost.org to git
  2010-07-06  0:16       ` Eric Niebler
@ 2010-07-06 17:27         ` Avery Pennarun
  2010-07-06 18:00           ` Eric Niebler
  0 siblings, 1 reply; 19+ messages in thread
From: Avery Pennarun @ 2010-07-06 17:27 UTC (permalink / raw)
  To: Eric Niebler; +Cc: git

On Mon, Jul 5, 2010 at 8:16 PM, Eric Niebler <eric@boostpro.com> wrote:
> On 7/5/2010 7:32 PM, Avery Pennarun wrote:
>> Eric Niebler wrote:
>>> If multiple repositories share the same ancient history, wouldn't that
>>> give git annotate/blame enough information? Sorry, git newbie here.
>>
>> Yes, it would.  But how much of the ancient history do you want?  If
>> you want all of it, you don't save any space in your repo.
>
> Repos, plural. We'd save space because the history wouldn't be
> duplicated in each one. Right? Or else I'm confused and this is something
> that will become clear after I understand what git subtree does.

The statement "multiple repositories share the same ancient history"
above is the part that's confusing.  If you use a tool like
git-subtree or git-filter-branch, you're actually generating a "new
history" based on the original history.  The "new history" obviously
contains fewer files than the original, which would take less space.
But if you want multiple repositories to "share the same ancient
history" you can't rewrite it, and thus you aren't saving any space in
any one repo.
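
To illustrate why a rewritten history cannot be shared with the
original, here is a minimal Python sketch.  It is a toy model, not
git's real object format: the commit_id() helper, the file names and
the "libs/a/" prefix are all made up for illustration.  The only
property it borrows from git is that a commit's ID depends on its
content and on the IDs of its parents.

```python
import hashlib

def commit_id(tree, parents, message):
    """Toy stand-in for a commit hash: depends on content AND parent IDs."""
    payload = "\n".join([",".join(sorted(tree)), ",".join(parents), message])
    return hashlib.sha1(payload.encode()).hexdigest()

def build_history(snapshots):
    """Chain snapshots into commits; return the commit IDs, oldest first."""
    ids = []
    for files, message in snapshots:
        parents = ids[-1:]  # the previous commit, or no parent for the root
        ids.append(commit_id(files, parents, message))
    return ids

def filter_history(snapshots, prefix):
    """Keep only paths under `prefix`, roughly as a subdirectory filter would."""
    return [([f for f in files if f.startswith(prefix)], msg)
            for files, msg in snapshots]

# Hypothetical project: one top-level build file and two libraries.
original = [
    (["Jamroot", "libs/a/x.hpp"], "add a"),
    (["Jamroot", "libs/a/x.hpp", "libs/b/y.hpp"], "add b"),
]

full_ids = build_history(original)
a_ids = build_history(filter_history(original, "libs/a/"))

# Commits that the full history and the rewritten one have in common.
shared = set(full_ids) & set(a_ids)
```

In this model `shared` comes out empty: once a tree changes, that
commit and every descendant get a new ID, so the smaller history is a
separate history rather than a subset of the old one.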

I'm assuming you want to rewrite history to save space (since that's
what this thread is about).  And git annotate/blame will work as long
as your rewritten history contains all the files you care about in
that repo.

> Right now, the other boost developers are pushing for a solution that
> uses grafts. I'm fuzzy on what they are exactly, but it seems that we'd
> freeze a svn mirror and have anybody interested in history put grafts in
> their local repository pointing back at the mirror. I don't know enough
> yet to say what the pros/cons of this approach might be wrt git subtree.

The primary advantage of grafts is that you can do something easy
*right now* and then fix it all up later.  eg. if you screw up your
history extraction and do it better later, you can just re-graft it
and you're done.

A secondary advantage of grafts is that cloning the "primary"
repository will be tiny since it doesn't have much ancient history.

A disadvantage of grafts is that each user has to deal with grafts in
his cloned repo, and unless he does, things like 'git log' and 'git
blame' won't show anything from the grafted history.  Supposedly 'git
replace' was designed to help with this issue, but I've never used it
so I don't know for sure.
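
As a rough picture of what a graft does, consider the following Python
sketch.  It is only a model under stated assumptions: the commit names
("new-root", "svn-tip", and so on) are hypothetical, and log() is a
made-up helper that follows first parents only.  The idea it mirrors is
that a graft entry (a line in .git/info/grafts naming a commit followed
by its replacement parents) overrides the recorded parents in one local
repository without changing any commit.

```python
def log(commit, parents, grafts=None):
    """Walk first-parent history from `commit`, honouring graft overrides."""
    grafts = grafts or {}
    seen = []
    while commit is not None:
        seen.append(commit)
        # A graft replaces the recorded parent list for that commit.
        plist = grafts.get(commit, parents.get(commit, []))
        commit = plist[0] if plist else None
    return seen

# "new-root" is the first commit of the fresh git repository;
# "svn-tip" is the last commit of the frozen svn mirror.
parents = {
    "new-2": ["new-root"],
    "new-root": [],
    "svn-tip": ["svn-1"],
    "svn-1": [],
}

without_graft = log("new-2", parents)
with_graft = log("new-2", parents, grafts={"new-root": ["svn-tip"]})
```

Without the override the walk stops at "new-root"; with it, the walk
continues into the old svn-derived commits.  Because the override lives
outside the commits themselves, each user has to set it up in his own
clone, which is the disadvantage described above.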

And of course, grafts don't actually do any history rewriting for you.
 You could split out a subtree's history and then graft it on, but the
splitting process is still the same as it would be without grafts.
The alternative would be to *not* rewrite history, just keep the
entire history of the whole project in one place, and graft it on if
you really need it.  That's actually pretty clean (and accurately
reflects exactly what *really happened*, which is a nice feature to
have in a vcs history), but you'll then never have a single repo of
just one subproject with the entire history of that subproject.  The
latter turns out to not actually be very important in practice, so you
might want to do it.

>> The confusing part is taking *submissions* back through both channels.
>> If you value your sanity, you probably want to only allow submissions
>> back via svn while you're running the two in parallel; but that makes
>> git's added features a lot less useful, so you probably want to run in
>> parallel for only a short time.
>
> Oh my! I don't think we'd open the git repositories for changes until
> after we close down svn. This problem is hard enough.

It can be done, and I've done it :)  But you're wise to avoid that situation.

Have fun,

Avery


* Re: help moving boost.org to git
  2010-07-06 17:27         ` Avery Pennarun
@ 2010-07-06 18:00           ` Eric Niebler
  2010-07-06 18:13             ` Avery Pennarun
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Niebler @ 2010-07-06 18:00 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: git

On 7/6/2010 1:27 PM, Avery Pennarun wrote:
> On Mon, Jul 5, 2010 at 8:16 PM, Eric Niebler <eric@boostpro.com> wrote:
>> On 7/5/2010 7:32 PM, Avery Pennarun wrote:
>>> Eric Niebler wrote:
>>>> If multiple repositories share the same ancient history, wouldn't that
>>>> give git annotate/blame enough information? Sorry, git newbie here.
>>>
>>> Yes, it would.  But how much of the ancient history do you want?  If
>>> you want all of it, you don't save any space in your repo.
>>
>> Repos, plural. We'd save space because the history wouldn't be
>> duplicated in each one. Right? Or else I'm confused and this is something
>> that will become clear after I understand what git subtree does.
> 
> The statement "multiple repositories share the same ancient history"
> above is the part that's confusing.  If you use a tool like
> git-subtree or git-filter-branch, you're actually generating a "new
> history" based on the original history.  The "new history" obviously
> contains fewer files than the original, which would take less space.
> But if you want multiple repositories to "share the same ancient
> history" you can't rewrite it, and thus you aren't saving any space in
> any one repo.

I think I have reached understanding! Thank you. It *would* save if I
pull down, say, 100 of these new repos+ancient history because git would
just store the ancient history locally once. I'm also guessing git is
smart enough to avoid /downloading/ the ancient history 100x.

> I'm assuming you want to rewrite history to save space (since that's
> what this thread is about).  And git annotate/blame will work as long
> as your rewritten history contains all the files you care about in
> that repo.

Right. I now understand that, too.

>> Right now, the other boost developers are pushing for a solution that
>> uses grafts. I'm fuzzy on what they are exactly, but it seems that we'd
>> freeze a svn mirror and have anybody interested in history put grafts in
>> their local repository pointing back at the mirror. I don't know enough
>> yet to say what the pros/cons of this approach might be wrt git subtree.
> 
> The primary advantage of grafts is that you can do something easy
> *right now* and then fix it all up later.  eg. if you screw up your
> history extraction and do it better later, you can just re-graft it
> and you're done.

How does one screw up the history extraction, if one is not doing any
fancy history rewriting (in this scenario)? Be there dragons?

> A secondary advantage of grafts is that cloning the "primary"
> repository will be tiny since it doesn't have much ancient history.

Right. Only those who ask for it will pay for it. And only developers
will have need of it, and not all developers at that.

> A disadvantage of grafts is that each user has to deal with grafts in
> his cloned repo, and unless he does, things like 'git log' and 'git
> blame' won't show anything from the grafted history.  Supposedly 'git
> replace' was designed to help with this issue, but I've never used it
> so I don't know for sure.

I'll add it to the list of things to learn about.

> And of course, grafts don't actually do any history rewriting for you.
> You could split out a subtree's history and then graft it on, but the
> splitting process is still the same as it would be without grafts.
> The alternative would be to *not* rewrite history, just keep the
> entire history of the whole project in one place, and graft it on if
> you really need it.  That's actually pretty clean (and accurately
> reflects exactly what *really happened*, which is a nice feature to
> have in a vcs history), but you'll then never have a single repo of
> just one subproject with the entire history of that subproject.  The
> latter turns out to not actually be very important in practice, so you
> might want to do it.

That's starting to sound pretty good.

Thanks,

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com


* Re: help moving boost.org to git
  2010-07-06 18:00           ` Eric Niebler
@ 2010-07-06 18:13             ` Avery Pennarun
  2010-07-06 18:29               ` Eric Niebler
  0 siblings, 1 reply; 19+ messages in thread
From: Avery Pennarun @ 2010-07-06 18:13 UTC (permalink / raw)
  To: Eric Niebler; +Cc: git

On Tue, Jul 6, 2010 at 2:00 PM, Eric Niebler <eric@boostpro.com> wrote:
> On 7/6/2010 1:27 PM, Avery Pennarun wrote:
>> The primary advantage of grafts is that you can do something easy
>> *right now* and then fix it all up later.  eg. if you screw up your
>> history extraction and do it better later, you can just re-graft it
>> and you're done.
>
> How does one screw up the history extraction, if one is not doing any
> fancy history rewriting (in this scenario)? Be there dragons?

Well, "rewriting history" necessarily involves changing things about
the permanent record.  Every time you change things, you have a
potential to change them incorrectly.  So in general, not rewriting is
less error-prone than rewriting :)

Specifically, with a tool like git-subtree, it only really works if a
particular subproject has always existed in the same subdir of your
repo since it started.  If the subdir was ever renamed, or if some of
the files were previously part of one subdir but then moved around,
git-subtree doesn't (currently) know how to deal with that.
git-filter-branch can do anything you want, but you have to teach it
how, which is obviously even *more* error prone.
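
A small Python sketch of the rename problem, for illustration only.
The history, the paths and the split() helper are hypothetical, and
this is not how git-subtree or git-filter-branch are implemented; it
just shows what a purely path-based split keeps and drops.

```python
def split(history, prefixes):
    """Return messages of commits touching a path under any of `prefixes`."""
    return [msg for msg, paths in history
            if any(p.startswith(prefixes) for p in paths)]

# A library that began life inside another one and was later moved.
history = [
    ("start sublib inside parent library", ["libs/parent/sublib/a.hpp"]),
    ("fix sublib bug", ["libs/parent/sublib/a.hpp"]),
    ("spin sublib off into its own library", ["libs/sublib/a.hpp"]),
    ("add feature", ["libs/sublib/a.hpp"]),
]

# Splitting on the current directory alone loses everything before the move.
naive = split(history, ("libs/sublib/",))

# Keeping the full history means telling the tool about the old location too.
taught = split(history, ("libs/sublib/", "libs/parent/sublib/"))
```

Here `naive` holds only the last two commits, while `taught` holds all
four.  Every extra rule of this kind is something you have to get right
by hand, which is where the errors creep in.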

Things are also a little messy if you have some kind of top-level
directory with build infrastructure shared by all the subdirs.  Does
the top-level Makefile have a list of the subdirs it needs to build?
If so, there's no way to extract only a subset of true history that
will still build correctly - it'll be looking for directories that you
explicitly removed.  You could update the Makefiles programmatically
in every single revision, but that's starting to get extremely
messy... and your history stops representing what *real life* really
looked like at the time.

If your subdirs haven't been moving around (which sounds like it
might be the case for you), and you don't have any top-level files
that you care about, rewriting might turn out to be straightforward.
You could also make the decision on a subdir-by-subdir basis, I guess.

Have fun,

Avery


* Re: help moving boost.org to git
  2010-07-06 18:13             ` Avery Pennarun
@ 2010-07-06 18:29               ` Eric Niebler
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Niebler @ 2010-07-06 18:29 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: git

On 7/6/2010 2:13 PM, Avery Pennarun wrote:
<snip>
> Specifically, with a tool like git-subtree, it only really works if a
> particular subproject has always existed in the same subdir of your
> repo since it started.  If the subdir was ever renamed, or if some of
> the files were previously part of one subdir but then moved around,
> git-subtree doesn't (currently) know how to deal with that.

Bah! Yes, directories have moved around in our svn repo. :-( In
particular, we've had cases where libraries in boost began life as
sub-projects of a different library and then got spun off.

> git-filter-branch can do anything you want, but you have to teach it
> how, which is obviously even *more* error prone.

I can only imagine.

> Things are also a little messy if you have some kind of top-level
> directory with build infrastructure shared by all the subdirs.  Does
> the top-level Makefile have a list of the subdirs it needs to build?

Bah! Yes, the build, the docs and the test infrastructure all currently
share files across our submodules-to-be. Surely other projects have
encountered this problem before, right? (KDE, I'm looking in your
direction.)

> If so, there's no way to extract only a subset of true history that
> will still build correctly - it'll be looking for directories that you
> explicitly removed.  You could update the Makefiles programmatically
> in every single revision, but that's starting to get extremely
> messy... and your history stops representing what *real life* really
> looked like at the time.

I see what you mean.

> If your subdirs haven't been moving around (which sounds like it
> might be the case for you), and you don't have any top-level files
> that you care about, rewriting might turn out to be straightforward.
> You could also make the decision on a subdir-by-subdir basis, I guess.

More evidence that the fancy filter/branch/subtree/svn2git/whatever
utilities aren't going to get us where we'd like to be. A simple
conversion and grafts look like the only workable approach.

> Have fun,

Having heaps! Thanks,

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com


Thread overview: 19+ messages
2010-07-05 14:16 help moving boost.org to git Eric Niebler
2010-07-05 14:48 ` Erik Faye-Lund
2010-07-05 14:48 ` Johannes Sixt
2010-07-05 17:51   ` Eric Niebler
2010-07-05 18:43     ` Sverre Rabbelier
2010-07-06 15:06   ` Raja R Harinath
2010-07-05 22:04 ` Finn Arne Gangstad
2010-07-05 23:11   ` Eric Niebler
2010-07-05 23:32     ` Avery Pennarun
2010-07-06  0:16       ` Eric Niebler
2010-07-06 17:27         ` Avery Pennarun
2010-07-06 18:00           ` Eric Niebler
2010-07-06 18:13             ` Avery Pennarun
2010-07-06 18:29               ` Eric Niebler
2010-07-06  1:46     ` Dave Abrahams
2010-07-06  8:51       ` Jakub Narebski
2010-07-06 10:34         ` David Abrahams
2010-07-06  0:16 ` Greg Troxel
2010-07-06  0:25   ` Eric Niebler