* help moving boost.org to git
From: Eric Niebler @ 2010-07-05 14:16 UTC
To: git
4 replies; 19+ messages in thread

I have a question about the best approach to take for refactoring a
large svn project into git. The project, boost.org, is a collection of
C++ libraries (>100) that are mostly independent. (There may be
cross-library dependencies, but we plan to handle that at a higher
level.) After the move to git, we'd like each library to be in its own
git repository. Boost can then be a stitching-together of these, using
submodules or something (opinions welcome). It's an old project with
lots of history that we don't want to lose. The naive approach of simply
forking into N repositories for the N libraries and deleting the
unwanted files in each is unworkable because we'll end up with all the
history duplicated everywhere ... >100 repositories, each larger than 100Mb.

So, what are the options? Can I somehow delete from each repository the
history that is irrelevant? Is there some feature of git I don't know
about that can solve this problem for us?

(Caveat: I'm new to git and still getting up to speed. An acceptable
answer is: go off and learn about feature X and come back to us.)

At boost, we've already discussed a few possible approaches. Feel free
to comment and/or criticize any of the solutions suggested here:

http://github.com/ryppl/ryppl/issues#issue/4

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

* Re: help moving boost.org to git
From: Erik Faye-Lund @ 2010-07-05 14:48 UTC
To: Eric Niebler; +Cc: git

On Mon, Jul 5, 2010 at 4:16 PM, Eric Niebler <eric@boostpro.com> wrote:
> I have a question about the best approach to take for refactoring a
> large svn project into git. The project, boost.org, is a collection of
> C++ libraries (>100) that are mostly independent. (There may be
> cross-library dependencies, but we plan to handle that at a higher
> level.) After the move to git, we'd like each library to be in its own
> git repository. Boost can then be a stitching-together of these, using
> submodules or something (opinions welcome). It's an old project with
> lots of history that we don't want to lose. The naive approach of simply
> forking into N repositories for the N libraries and deleting the
> unwanted files in each is unworkable because we'll end up with all the
> history duplicated everywhere ... >100 repositories, each larger than 100Mb.
>
> So, what are the options? Can I somehow delete from each repository the
> history that is irrelevant? Is there some feature of git I don't know
> about that can solve this problem for us?
>

You're probably looking for git-filter-branch. This tool can be used
with the --subdirectory-filter option to filter out a specific
subdirectory to its own branch. Or if the project isn't split into
subdirectories, you can use the --tree-filter option to filter
specific files if you want.

See http://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html
for details

--
Erik "kusma" Faye-Lund
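
[Editorial sketch] To make the effect of a subdirectory filter concrete, here is a toy model in Python. It is not git and does not call git; it only mimics the idea that commits which never touched the chosen subdirectory vanish, and the surviving paths are re-rooted at that subdirectory. The `Commit` type, the `subdirectory_filter` function and the `libs/regex` / `libs/spirit` paths are all hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Commit(NamedTuple):
    message: str
    paths: Tuple[str, ...]  # files touched by this commit (toy model)


def subdirectory_filter(history: List[Commit], prefix: str) -> List[Commit]:
    """Keep only commits touching `prefix` and re-root their paths at it."""
    prefix = prefix.rstrip("/") + "/"
    rewritten = []
    for commit in history:
        kept = tuple(p[len(prefix):] for p in commit.paths if p.startswith(prefix))
        if kept:  # commits that never touched the subdirectory disappear
            rewritten.append(Commit(commit.message, kept))
    return rewritten


# Hypothetical history of a combined repository.
history = [
    Commit("add regex", ("libs/regex/regex.hpp",)),
    Commit("add spirit", ("libs/spirit/spirit.hpp",)),
    Commit("cross-library fix", ("libs/regex/regex.hpp", "libs/spirit/spirit.hpp")),
]

regex_only = subdirectory_filter(history, "libs/regex")
for commit in regex_only:
    print(commit.message, commit.paths)
```

Note how the cross-library commit survives but loses the part that touched the other library, which is one reason cross-library commits complicate a split.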

* Re: help moving boost.org to git
From: Johannes Sixt @ 2010-07-05 14:48 UTC
To: Eric Niebler; +Cc: git

Am 7/5/2010 16:16, schrieb Eric Niebler:
> I have a question about the best approach to take for refactoring a
> large svn project into git. The project, boost.org, is a collection of
> C++ libraries (>100) that are mostly independent. (There may be
> cross-library dependencies, but we plan to handle that at a higher
> level.) After the move to git, we'd like each library to be in its own
> git repository.

You could use svn2git: http://gitorious.org/svn2git
KDE uses it to split its SVN repository into pieces. The tool is driven by
a "ruleset" that specifies SVN subdirectories and revision numbers that
make up a module.

--
"Atomic objects are neither active nor radioactive." --
Programming Languages -- C++, Final Committee Draft (Doc.N3092)

* Re: help moving boost.org to git
From: Eric Niebler @ 2010-07-05 17:51 UTC
To: git

On 7/5/2010 10:48 AM, Johannes Sixt wrote:
> Am 7/5/2010 16:16, schrieb Eric Niebler:
>> I have a question about the best approach to take for refactoring a
>> large svn project into git. The project, boost.org, is a collection of
>> C++ libraries (>100) that are mostly independent. (There may be
>> cross-library dependencies, but we plan to handle that at a higher
>> level.) After the move to git, we'd like each library to be in its own
>> git repository.
>
> You could use svn2git: http://gitorious.org/svn2git
> KDE uses it to split its SVN repository into pieces. The tool is driven by
> a "ruleset" that specifies SVN subdirectories and revision numbers that
> make up a module.

I'm off to learn about filter-branch, tree-filter and svn2git. Thanks
for the suggestions. More questions to come, I'm sure.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

* Re: help moving boost.org to git
From: Sverre Rabbelier @ 2010-07-05 18:43 UTC
To: Eric Niebler; +Cc: git, Avery Pennarun

Heya,

On Mon, Jul 5, 2010 at 19:51, Eric Niebler <eric@boostpro.com> wrote:
> I'm off to learn about filter-branch, tree-filter and svn2git. Thanks
> for the suggestions. More questions to come, I'm sure.

Also have a look at git-subtree.

--
Cheers,

Sverre Rabbelier

* Re: help moving boost.org to git
From: Raja R Harinath @ 2010-07-06 15:06 UTC
To: git

Hi,

Johannes Sixt <j.sixt@viscovery.net> writes:

> Am 7/5/2010 16:16, schrieb Eric Niebler:
>> I have a question about the best approach to take for refactoring a
>> large svn project into git. The project, boost.org, is a collection of
>> C++ libraries (>100) that are mostly independent. (There may be
>> cross-library dependencies, but we plan to handle that at a higher
>> level.) After the move to git, we'd like each library to be in its own
>> git repository.
>
> You could use svn2git: http://gitorious.org/svn2git
> KDE uses it to split its SVN repository into pieces. The tool is driven by
> a "ruleset" that specifies SVN subdirectories and revision numbers that
> make up a module.

I'm also involved in moving a large SVN project to git (the mono
project). I have found and fixed several issues with svn2git

  git://gitorious.org/~harinath/svn2git/rrh-svn2git.git

- Hari

* Re: help moving boost.org to git
From: Finn Arne Gangstad @ 2010-07-05 22:04 UTC
To: Eric Niebler; +Cc: git

On Mon, Jul 05, 2010 at 10:16:36AM -0400, Eric Niebler wrote:
> I have a question about the best approach to take for refactoring a
> large svn project into git. The project, boost.org, is a collection of
> C++ libraries (>100) that are mostly independent. (There may be
> cross-library dependencies, but we plan to handle that at a higher
> level.) After the move to git, we'd like each library to be in its own
> git repository. Boost can then be a stitching-together of these, using
> submodules or something (opinions welcome). It's an old project with
> lots of history that we don't want to lose. The naive approach of simply
> forking into N repositories for the N libraries and deleting the
> unwanted files in each is unworkable because we'll end up with all the
> history duplicated everywhere ... >100 repositories, each larger than 100Mb.

If the libraries are not independent (i.e. some commits are across
multiple libraries), submodules will give you some interesting
challenges to put it mildly.

The current boost 1.43 is 29344 files, is this all there is? This
should fit easily into a single repository. The Linux kernel is much
larger, and that is sort of the canonical single repo git project. I
_strongly_ recommend that you go for a single repo if you can make it
work.

If you manage to create a single git repo with the history you want,
it is trivial to split out separate repositories of subdirectories
later (and those repos will then be comparatively small). git subtree
allegedly automates this process more or less (I have not used it, but
have heard good things about it). What about having a single "master
repository", and then using subtree to create single-library repos for
the library developers if they want a smaller repo to play around in?

> So, what are the options? Can I somehow delete from each repository the
> history that is irrelevant? Is there some feature of git I don't know
> about that can solve this problem for us?

How do you define "irrelevant"? Do you only require enough history for
git annotate/blame to give correct results? Or does this only refer
to multiple repositories sharing the same ancient history?

> At boost, we've already discussed a few possible approaches. Feel free
> to comment and/or criticize any of the solutions suggested here:
>
> http://github.com/ryppl/ryppl/issues#issue/4

It is unclear from the discussion if you will change to git, or use
git in addition to svn? This will have some impact on how to go about
this.

- Finn Arne

* Re: help moving boost.org to git
From: Eric Niebler @ 2010-07-05 23:11 UTC
To: git

On 7/5/2010 6:04 PM, Finn Arne Gangstad wrote:
> On Mon, Jul 05, 2010 at 10:16:36AM -0400, Eric Niebler wrote:
>> I have a question about the best approach to take for refactoring a
>> large svn project into git. The project, boost.org, is a collection of
>> C++ libraries (>100) that are mostly independent. (There may be
>> cross-library dependencies, but we plan to handle that at a higher
>> level.) After the move to git, we'd like each library to be in its own
>> git repository. Boost can then be a stitching-together of these, using
>> submodules or something (opinions welcome). It's an old project with
>> lots of history that we don't want to lose. The naive approach of simply
>> forking into N repositories for the N libraries and deleting the
>> unwanted files in each is unworkable because we'll end up with all the
>> history duplicated everywhere ... >100 repositories, each larger than 100Mb.
>
> If the libraries are not independent (i.e. some commits are across
> multiple libraries), submodules will give you some interesting
> challenges to put it mildly.

You have correctly assessed the situation. There *are* cross-library
commits in our history. What are the implications of this for
modularization?

> The current boost 1.43 is 29344 files, is this all there is?

Yes.

> This
> should fit easily into a single repository. The Linux kernel is much
> larger, and that is sort of the canonical single repo git project. I
> _strongly_ recommend that you go for a single repo if you can make it
> work.

It does fit into one repo, but that doesn't meet our needs for the
future. Users want to install and build library X and its dependencies,
not all of boost. This is increasingly becoming a problem as boost
grows. Imagine if a perl programmer had to download all of CPAN to use
or hack on any one perl module. Or if contributing to CPAN meant getting
the whole shebang, history and all. I'm sure even in the Linux kernel,
not *every* third-party driver is maintained in the master git repo.

We are aiming to make boost a clearing-house for C++ libraries (like
CPAN, or PyPi for python), turning the official boost distribution into
little more than a well-tested collection of the libraries that have
passed our peer-review and regression test process.

In fact, the modularization has already been done, and work is well
underway on the infrastructure to support dependency tracking. But the
modularization is not history-preserving and needs to be redone.

> If you manage to create a single git repo with the history you want,
> it is trivial to split out separate repositories of subdirectories
> later (and those repos will then be comparatively small). git subtree
> allegedly automates this process more or less (I have not used it, but
> have heard good things about it). What about having a single "master
> repository", and then using subtree to create single-library repos for
> the library developers if they want a smaller repo to play around in?

This sounds like it might be ok, but I need to research it.

>> So, what are the options? Can I somehow delete from each repository the
>> history that is irrelevant? Is there some feature of git I don't know
>> about that can solve this problem for us?
>
> How do you define "irrelevant"? Do you only require enough history for
> git annotate/blame to give correct results? Or does this only refer
> to multiple repositories sharing the same ancient history?

If multiple repositories share the same ancient history, wouldn't that
give git annotate/blame enough information? Sorry, git newbie here.

>> At boost, we've already discussed a few possible approaches. Feel free
>> to comment and/or criticize any of the solutions suggested here:
>>
>> http://github.com/ryppl/ryppl/issues#issue/4
>
> It is unclear from the discussion if you will change to git, or use
> git in addition to svn? This will have some impact on how to go about
> this.

The plan is to move to git. However, we don't expect this to happen
overnight, so a way to continue to pull changes from a svn mirror while
the new git repositories are being set up would be ideal.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

* Re: help moving boost.org to git
From: Avery Pennarun @ 2010-07-05 23:32 UTC
To: Eric Niebler; +Cc: git

(note: on this mailing list, you shouldn't drop names from the cc:
line when replying to a thread)

On Mon, Jul 5, 2010 at 7:11 PM, Eric Niebler <eric@boostpro.com> wrote:
> On 7/5/2010 6:04 PM, Finn Arne Gangstad wrote:
>> This
>> should fit easily into a single repository. The Linux kernel is much
>> larger, and that is sort of the canonical single repo git project. I
>> _strongly_ recommend that you go for a single repo if you can make it
>> work.
>
> It does fit into one repo, but that doesn't meet our needs for the
> future. Users want to install and build library X and its dependencies,
> not all of boost. This is increasingly becoming a problem as boost
> grows. Imagine if a perl programmer had to download all of CPAN to use
> or hack on any one perl module. Or if contributing to CPAN meant getting
> the whole shebang, history and all. I'm sure even in the Linux kernel,
> not *every* third-party driver is maintained in the master git repo.

Actually, that's mostly not true; there are a few third-party drivers
that don't make it into the core Linux repo, but that's mostly because
they haven't been accepted by the kernel maintainers for whatever
reason (often quality or duplication, I guess). The goal for the vast
majority of Linux drivers is indeed to get merged into the Linux core.

...and it works pretty well, all things considered. It's certainly
not the only way to do it for every project, but it's actually a
pretty good way. The kernel repo history runs to hundreds of megs
nowadays, but on a modern Internet connection that's not a big deal.
And then you never have to worry about downloading more modules later.
You also never have versioning problems.

> We are aiming to make boost a clearing-house for C++ libraries (like
> CPAN, or PyPi for python), turning the official boost distribution into
> little more than a well-tested collection of the libraries that have
> passed our peer-review and regression test process.

Of course you will want to have some kind of really excellent
versioned dependency fetching system (exactly like CPAN or PyPi or
ruby gems) if you want this to be nice. git's submodules stuff is
almost certainly not going to add any features you need/want. On the
other hand, cloning a separate git repo is pretty easy to write your
CPAN-like script around.

> In fact, the modularization has already been done, and work is well
> underway on the infrastructure to support dependency tracking. But the
> modularization is not history-preserving and needs to be redone.

If your code doesn't move too many files around, then splitting out
the history is pretty easy with git-subtree (a tool I wrote that's not
part of git):

    git subtree split --prefix=/path/to/subdir

And you get a new history for just that subdir. That might do exactly
what you want. It also works iteratively, so you can export your
history from svn, then re-export the changes as they occur over time.

>>> So, what are the options? Can I somehow delete from each repository the
>>> history that is irrelevant? Is there some feature of git I don't know
>>> about that can solve this problem for us?
>>
>> How do you define "irrelevant"? Do you only require enough history for
>> git annotate/blame to give correct results? Or does this only refer
>> to multiple repositories sharing the same ancient history?
>
> If multiple repositories share the same ancient history, wouldn't that
> give git annotate/blame enough information? Sorry, git newbie here.

Yes, it would. But how much of the ancient history do you want? If
you want all of it, you don't save any space in your repo.

> The plan is to move to git. However, we don't expect this to happen
> overnight, so a way to continue to pull changes from a svn mirror while
> the new git repositories are being set up would be ideal.

This isn't too hard to do; you just need some scripts around git-svn
and git-subtree (or whatever tool you use to do the splitting). We've
done this at work for a couple of years now and it's working fine.

The confusing part is taking *submissions* back through both channels.
If you value your sanity, you probably want to only allow submissions
back via svn while you're running the two in parallel; but that makes
git's added features a lot less useful, so you probably want to run in
parallel for only a short time.

Have fun,

Avery

* Re: help moving boost.org to git
From: Eric Niebler @ 2010-07-06 0:16 UTC
To: Avery Pennarun; +Cc: git

On 7/5/2010 7:32 PM, Avery Pennarun wrote:
> (note: on this mailing list, you shouldn't drop names from the cc:
> line when replying to a thread)

Noted, thanks.

> On Mon, Jul 5, 2010 at 7:11 PM, Eric Niebler <eric@boostpro.com> wrote:
>> On 7/5/2010 6:04 PM, Finn Arne Gangstad wrote:
>>> This
>>> should fit easily into a single repository. The Linux kernel is much
>>> larger, and that is sort of the canonical single repo git project. I
>>> _strongly_ recommend that you go for a single repo if you can make it
>>> work.
>>
>> It does fit into one repo, but that doesn't meet our needs for the
>> future. Users want to install and build library X and its dependencies,
>> not all of boost. This is increasingly becoming a problem as boost
>> grows. Imagine if a perl programmer had to download all of CPAN to use
>> or hack on any one perl module. Or if contributing to CPAN meant getting
>> the whole shebang, history and all. I'm sure even in the Linux kernel,
>> not *every* third-party driver is maintained in the master git repo.
>
> Actually, that's mostly not true; there are a few third-party drivers
> that don't make it into the core Linux repo
<snip discussion showing my ignorance of Linux's repository structure>

Thanks for the correction. The CPAN/PyPi analogy is still apt.

>> We are aiming to make boost a clearing-house for C++ libraries (like
>> CPAN, or PyPi for python), turning the official boost distribution into
>> little more than a well-tested collection of the libraries that have
>> passed our peer-review and regression test process.
>
> Of course you will want to have some kind of really excellent
> versioned dependency fetching system (exactly like CPAN or PyPi or
> ruby gems) if you want this to be nice. git's submodules stuff is
> almost certainly not going to add any features you need/want. On the
> other hand, cloning a separate git repo is pretty easy to write your
> CPAN-like script around.

Indeed, we are stealing the work of the python guys. Pip does most of
what we want. They've graciously been accepting our patches so it
happily clones git repos in order to satisfy dependencies now. It is
some kind of really excellent! :-)

>> In fact, the modularization has already been done, and work is well
>> underway on the infrastructure to support dependency tracking. But the
>> modularization is not history-preserving and needs to be redone.
>
> If your code doesn't move too many files around, then splitting out
> the history is pretty easy with git-subtree (a tool I wrote that's not
> part of git):
>
>     git subtree split --prefix=/path/to/subdir
>
> And you get a new history for just that subdir. That might do exactly
> what you want. It also works iteratively, so you can export your
> history from svn, then re-export the changes as they occur over time.

This looks like it here:

http://github.com/apenwarr/git-subtree

I'll have to read the docs. Thanks for the tip.

>>>> So, what are the options? Can I somehow delete from each repository the
>>>> history that is irrelevant? Is there some feature of git I don't know
>>>> about that can solve this problem for us?
>>>
>>> How do you define "irrelevant"? Do you only require enough history for
>>> git annotate/blame to give correct results? Or does this only refer
>>> to multiple repositories sharing the same ancient history?
>>
>> If multiple repositories share the same ancient history, wouldn't that
>> give git annotate/blame enough information? Sorry, git newbie here.
>
> Yes, it would. But how much of the ancient history do you want? If
> you want all of it, you don't save any space in your repo.

Repos, plural. We'd save space because the history wouldn't be
duplicated in each one. Right? Or else I'm confused and this is
something that will become clear after I understand what git subtree
does.

Right now, the other boost developers are pushing for a solution that
uses grafts. I'm fuzzy on what they are exactly, but it seems that we'd
freeze a svn mirror and have anybody interested in history put grafts in
their local repository pointing back at the mirror. I don't know enough
yet to say what the pros/cons of this approach might be wrt git subtree.

>> The plan is to move to git. However, we don't expect this to happen
>> overnight, so a way to continue to pull changes from a svn mirror while
>> the new git repositories are being set up would be ideal.
>
> This isn't too hard to do; you just need some scripts around git-svn
> and git-subtree (or whatever tool you use to do the splitting). We've
> done this at work for a couple of years now and it's working fine.

Cool.

> The confusing part is taking *submissions* back through both channels.
> If you value your sanity, you probably want to only allow submissions
> back via svn while you're running the two in parallel; but that makes
> git's added features a lot less useful, so you probably want to run in
> parallel for only a short time.

Oh my! I don't think we'd open the git repositories for changes until
after we close down svn. This problem is hard enough.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

* Re: help moving boost.org to git
From: Avery Pennarun @ 2010-07-06 17:27 UTC
To: Eric Niebler; +Cc: git

On Mon, Jul 5, 2010 at 8:16 PM, Eric Niebler <eric@boostpro.com> wrote:
> On 7/5/2010 7:32 PM, Avery Pennarun wrote:
>> Eric Niebler wrote:
>>> If multiple repositories share the same ancient history, wouldn't that
>>> give git annotate/blame enough information? Sorry, git newbie here.
>>
>> Yes, it would. But how much of the ancient history do you want? If
>> you want all of it, you don't save any space in your repo.
>
> Repos, plural. We'd save space because the history wouldn't be
> duplicated in each one. Right? Or else I'm confused and this is
> something that will become clear after I understand what git subtree
> does.

The statement "multiple repositories share the same ancient history"
above is the part that's confusing. If you use a tool like
git-subtree or git-filter-branch, you're actually generating a "new
history" based on the original history. The "new history" obviously
contains fewer files than the original, which would take less space.
But if you want multiple repositories to "share the same ancient
history" you can't rewrite it, and thus you aren't saving any space in
any one repo.

I'm assuming you want to rewrite history to save space (since that's
what this thread is about). And git annotate/blame will work as long
as your rewritten history contains all the files you care about in
that repo.

> Right now, the other boost developers are pushing for a solution that
> uses grafts. I'm fuzzy on what they are exactly, but it seems that we'd
> freeze a svn mirror and have anybody interested in history put grafts in
> their local repository pointing back at the mirror. I don't know enough
> yet to say what the pros/cons of this approach might be wrt git subtree.

The primary advantage of grafts is that you can do something easy
*right now* and then fix it all up later. eg. if you screw up your
history extraction and do it better later, you can just re-graft it
and you're done.

A secondary advantage of grafts is that cloning the "primary"
repository will be tiny since it doesn't have much ancient history.

A disadvantage of grafts is that each user has to deal with grafts in
his cloned repo, and unless he does, things like 'git log' and 'git
blame' won't show anything from the grafted history. Supposedly 'git
replace' was designed to help with this issue, but I've never used it
so I don't know for sure.

And of course, grafts don't actually do any history rewriting for you.
You could split out a subtree's history and then graft it on, but the
splitting process is still the same as it would be without grafts.
The alternative would be to *not* rewrite history, just keep the
entire history of the whole project in one place, and graft it on if
you really need it. That's actually pretty clean (and accurately
reflects exactly what *really happened*, which is a nice feature to
have in a vcs history), but you'll then never have a single repo of
just one subproject with the entire history of that subproject. That
latter turns out to not actually be very important in practice, so you
might want to do it.

>> The confusing part is taking *submissions* back through both channels.
>> If you value your sanity, you probably want to only allow submissions
>> back via svn while you're running the two in parallel; but that makes
>> git's added features a lot less useful, so you probably want to run in
>> parallel for only a short time.
>
> Oh my! I don't think we'd open the git repositories for changes until
> after we close down svn. This problem is hard enough.

It can be done, and I've done it :) But you're wise to avoid that situation.

Have fun,

Avery
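
[Editorial sketch] The graft trade-off described above can be pictured with a small toy model in Python. This is not how git stores anything; it only illustrates the idea that a graft is a local override of a commit's recorded parent, so a history walk stops at the root of the small repository unless the user has installed the graft. The commit IDs (`old-*`, `new-*`) and the `log` helper are hypothetical.

```python
from typing import Dict, List, Optional


def log(start: str,
        parents: Dict[str, Optional[str]],
        grafts: Optional[Dict[str, str]] = None) -> List[str]:
    """Walk single-parent history from `start`, honouring local graft overrides."""
    grafts = grafts or {}
    seen = []
    commit: Optional[str] = start
    while commit is not None:
        seen.append(commit)
        # A graft, if present, replaces the parent recorded in the commit.
        commit = grafts.get(commit, parents.get(commit))
    return seen


# Hypothetical IDs: "old-*" live in the frozen svn mirror,
# "new-*" in the small per-library repository.
parents = {
    "old-1": None,
    "old-2": "old-1",
    "new-1": None,     # root of the small repository: no ancestry recorded
    "new-2": "new-1",
}
grafts = {"new-1": "old-2"}  # local override: treat old-2 as new-1's parent

without_graft = log("new-2", parents)
with_graft = log("new-2", parents, grafts)
print(without_graft)
print(with_graft)
```

In this model only a user who has both the old commits and the graft entry sees the full chain, which mirrors the disadvantage described in the message above.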
* Re: help moving boost.org to git
From: Eric Niebler @ 2010-07-06 18:00 UTC
To: Avery Pennarun; +Cc: git

On 7/6/2010 1:27 PM, Avery Pennarun wrote:
> On Mon, Jul 5, 2010 at 8:16 PM, Eric Niebler <eric@boostpro.com> wrote:
>> On 7/5/2010 7:32 PM, Avery Pennarun wrote:
>>> Eric Niebler wrote:
>>>> If multiple repositories share the same ancient history, wouldn't that
>>>> give git annotate/blame enough information? Sorry, git newbie here.
>>>
>>> Yes, it would. But how much of the ancient history do you want? If
>>> you want all of it, you don't save any space in your repo.
>>
>> Repos, plural. We'd save space because the history wouldn't be
>> duplicated in each one. Right? Or else I'm confused and this is
>> something that will become clear after I understand what git subtree
>> does.
>
> The statement "multiple repositories share the same ancient history"
> above is the part that's confusing. If you use a tool like
> git-subtree or git-filter-branch, you're actually generating a "new
> history" based on the original history. The "new history" obviously
> contains fewer files than the original, which would take less space.
> But if you want multiple repositories to "share the same ancient
> history" you can't rewrite it, and thus you aren't saving any space in
> any one repo.

I think I have reached understanding! Thank you. It *would* save space
if I pull down, say, 100 of these new repos+ancient history because git
would just store the ancient history locally once. I'm also guessing git
is smart enough to avoid /downloading/ the ancient history 100x.

> I'm assuming you want to rewrite history to save space (since that's
> what this thread is about). And git annotate/blame will work as long
> as your rewritten history contains all the files you care about in
> that repo.

Right. I now understand that, too.

>> Right now, the other boost developers are pushing for a solution that
>> uses grafts. I'm fuzzy on what they are exactly, but it seems that we'd
>> freeze a svn mirror and have anybody interested in history put grafts in
>> their local repository pointing back at the mirror. I don't know enough
>> yet to say what the pros/cons of this approach might be wrt git subtree.
>
> The primary advantage of grafts is that you can do something easy
> *right now* and then fix it all up later. e.g. if you screw up your
> history extraction and do it better later, you can just re-graft it
> and you're done.

How does one screw up the history extraction, if one is not doing any
fancy history rewriting (in this scenario)? Be there dragons?

> A secondary advantage of grafts is that cloning the "primary"
> repository will be tiny since it doesn't have much ancient history.

Right. Only those who ask for it will pay for it. And only developers
will have need of it, and not all developers at that.

> A disadvantage of grafts is that each user has to deal with grafts in
> his cloned repo, and unless he does, things like 'git log' and 'git
> blame' won't show anything from the grafted history. Supposedly 'git
> replace' was designed to help with this issue, but I've never used it
> so I don't know for sure.

I'll add it to the list of things to learn about.

> And of course, grafts don't actually do any history rewriting for you.
> You could split out a subtree's history and then graft it on, but the
> splitting process is still the same as it would be without grafts.
> The alternative would be to *not* rewrite history, just keep the
> entire history of the whole project in one place, and graft it on if
> you really need it. That's actually pretty clean (and accurately
> reflects exactly what *really happened*, which is a nice feature to
> have in a vcs history), but you'll then never have a single repo of
> just one subproject with the entire history of that subproject. That
> latter turns out to not actually be very important in practice, so you
> might want to do it.

That's starting to sound pretty good.

Thanks,

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
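
The graft idea discussed in this message can be pictured with a small sketch. This is a toy model in Python, not git's implementation: commits are plain strings, the ids (`new1`, `old2`, ...) are hypothetical, and the only detail taken from git is the grafts file layout, one line per commit of the form `<commit> <parent> [<parent> ...]` in `.git/info/grafts`. It assumes the frozen mirror's commits have already been fetched into the local repository.

```python
# Toy model of git grafts: a graft overrides the parents recorded for a
# commit without rewriting the commit itself. Not git's implementation.

def parse_grafts(text):
    """Parse lines of the form '<commit> <parent> [<parent> ...]'."""
    grafts = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        commit, *parents = fields
        grafts[commit] = parents
    return grafts


def history(tip, recorded_parents, grafts=None):
    """Return every commit reachable from tip, in depth-first order."""
    grafts = grafts or {}
    seen, order, stack = set(), [], [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        # A graft entry, when present, wins over the recorded parents.
        parents = grafts.get(commit, recorded_parents.get(commit, []))
        stack.extend(reversed(parents))
    return order


# Hypothetical ids: "new1" is the root of the freshly converted repo,
# "old1" and "old2" come from the frozen mirror of the old history.
recorded = {
    "new2": ["new1"],
    "new1": [],
    "old2": ["old1"],
    "old1": [],
}

without_graft = history("new2", recorded)
with_graft = history("new2", recorded, parse_grafts("new1 old2\n"))

print(without_graft)
print(with_graft)
```

Without the graft line the walk stops at the new root; with it, the same walk continues into the old commits, which is why log and blame only see the ancient history in clones where the graft has been set up.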
* Re: help moving boost.org to git
From: Avery Pennarun @ 2010-07-06 18:13 UTC
To: Eric Niebler; +Cc: git

On Tue, Jul 6, 2010 at 2:00 PM, Eric Niebler <eric@boostpro.com> wrote:
> On 7/6/2010 1:27 PM, Avery Pennarun wrote:
>> The primary advantage of grafts is that you can do something easy
>> *right now* and then fix it all up later. e.g. if you screw up your
>> history extraction and do it better later, you can just re-graft it
>> and you're done.
>
> How does one screw up the history extraction, if one is not doing any
> fancy history rewriting (in this scenario)? Be there dragons?

Well, "rewriting history" necessarily involves changing things about the
permanent record. Every time you change things, you have the potential
to change them incorrectly. So in general, not rewriting is less
error-prone than rewriting :)

Specifically, with a tool like git-subtree, it only really works if a
particular subproject has always existed in the same subdir of your
repo since it started. If the subdir was ever renamed, or if some of
the files were previously part of one subdir but then moved around,
git-subtree doesn't (currently) know how to deal with that.

git-filter-branch can do anything you want, but you have to teach it
how, which is obviously even *more* error prone.

Things are also a little messy if you have some kind of top-level
directory with build infrastructure shared by all the subdirs. Does
the top-level Makefile have a list of the subdirs it needs to build?
If so, there's no way to extract only a subset of true history that
will still build correctly - it'll be looking for directories that you
explicitly removed. You could update the Makefiles programmatically
in every single revision, but that's starting to get extremely
messy... and your history stops representing what *real life* really
looked like at the time.

If your subdirs haven't been moving around (which sounds like it
might be the case for you), and you don't have any top-level files
that you care about, rewriting might turn out to be straightforward.
You could also make the decision on a subdir-by-subdir basis, I guess.

Have fun,

Avery
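
The point about subdirectories that moved can be illustrated with a toy model. The sketch below is plain Python, not git-subtree or git-filter-branch: each commit is a dict listing the paths it touched, and every path and message is hypothetical. It only shows what a naive "keep one path prefix" extraction does when a subproject used to live somewhere else.

```python
# Toy model of extracting one subdirectory's history by path prefix.
# Not git-subtree or git-filter-branch; all paths are hypothetical.

def split_subdir(commits, prefix):
    """Keep only files under prefix and drop commits that touch none."""
    prefix = prefix.rstrip("/") + "/"
    rewritten = []
    for commit in commits:
        kept = sorted(
            path[len(prefix):]
            for path in commit["files"]
            if path.startswith(prefix)
        )
        if kept:
            rewritten.append({"msg": commit["msg"], "files": kept})
    return rewritten


commits = [  # oldest first
    {"msg": "add child inside parent", "files": ["libs/parent/child/child.hpp"]},
    {"msg": "fix parent bug", "files": ["libs/parent/parent.hpp"]},
    {"msg": "spin child off", "files": ["libs/child/child.hpp"]},
    {"msg": "update top-level build", "files": ["Makefile"]},
]

child_only = split_subdir(commits, "libs/child")
for commit in child_only:
    print(commit["msg"], commit["files"])
```

In this model the extracted history for `libs/child` starts at "spin child off": the earlier commit, made while the code still lived under `libs/parent/child`, is silently lost, and so is the top-level `Makefile` change. Recovering either would mean teaching the filter about the old location by hand.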
* Re: help moving boost.org to git
From: Eric Niebler @ 2010-07-06 18:29 UTC
To: Avery Pennarun; +Cc: git

On 7/6/2010 2:13 PM, Avery Pennarun wrote:
<snip>
> Specifically, with a tool like git-subtree, it only really works if a
> particular subproject has always existed in the same subdir of your
> repo since it started. If the subdir was ever renamed, or if some of
> the files were previously part of one subdir but then moved around,
> git-subtree doesn't (currently) know how to deal with that.

Bah! Yes, directories have moved around in our svn repo. :-( In
particular, we've had cases where libraries in boost began life as
sub-projects of a different library and then got spun off.

> git-filter-branch can do anything you want, but you have to teach it
> how, which is obviously even *more* error prone.

I can only imagine.

> Things are also a little messy if you have some kind of top-level
> directory with build infrastructure shared by all the subdirs. Does
> the top-level Makefile have a list of the subdirs it needs to build?

Bah! Yes, the build, the docs and the test infrastructure all currently
share files across our submodules-to-be. Surely other projects have
encountered this problem before, right? (KDE, I'm looking in your
direction.)

> If so, there's no way to extract only a subset of true history that
> will still build correctly - it'll be looking for directories that you
> explicitly removed. You could update the Makefiles programmatically
> in every single revision, but that's starting to get extremely
> messy... and your history stops representing what *real life* really
> looked like at the time.

I see what you mean.

> If your subdirs haven't been moving around (which sounds like it
> might be the case for you), and you don't have any top-level files
> that you care about, rewriting might turn out to be straightforward.
> You could also make the decision on a subdir-by-subdir basis, I guess.

More evidence that the fancy filter/branch/subtree/svn2git/whatever
utilities aren't going to get us where we'd like to be. A simple
conversion and grafts look like the only workable approach.

> Have fun,

Having heaps! Thanks,

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
* Re: help moving boost.org to git
From: Dave Abrahams @ 2010-07-06 1:46 UTC
To: git

Eric Niebler <eric <at> boostpro.com> writes:

> We are aiming to make boost a clearing-house for C++ libraries (like
> CPAN, or PyPI for python),

Clarification: that's our goal for Ryppl, not Boost.

> turning the official boost distribution into
> little more than a well-tested collection of the libraries that have
> passed our peer-review and regression test process.

Exactly right.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
* Re: help moving boost.org to git
From: Jakub Narebski @ 2010-07-06 8:51 UTC
To: Dave Abrahams; +Cc: git, Eric Niebler

Dave Abrahams <dave@boostpro.com> writes:

> Eric Niebler <eric <at> boostpro.com> writes:
>
> > We are aiming to make boost a clearing-house for C++ libraries (like
> > CPAN, or PyPI for python),
>
> Clarification: that's our goal for Ryppl, not Boost.

By the way, could you please add information about Ryppl to Git Wiki?
https://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools

Thanks in advance

--
Jakub Narebski
Poland
ShadeHawk on #git
* Re: help moving boost.org to git
From: David Abrahams @ 2010-07-06 10:34 UTC
To: Jakub Narebski; +Cc: git, Eric Niebler

At Tue, 06 Jul 2010 01:51:00 -0700 (PDT),
Jakub Narebski wrote:
>
> Dave Abrahams <dave@boostpro.com> writes:
>
> > Eric Niebler <eric <at> boostpro.com> writes:
> >
> > > We are aiming to make boost a clearing-house for C++ libraries (like
> > > CPAN, or PyPI for python),
> >
> > Clarification: that's our goal for Ryppl, not Boost.
>
> By the way, could you please add information about Ryppl to Git Wiki?
> https://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools

We'd be happy to, but: have you read the document at ryppl.org, and do
you really think it's appropriate? Even though ryppl is still very
alpha?

[BTW, I can't reach that page right now; it's just forever “waiting
for reply...”]

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
* Re: help moving boost.org to git
From: Greg Troxel @ 2010-07-06 0:16 UTC
To: Eric Niebler; +Cc: git

You have found the core issue with svn/git: svn allows you to have a
large repo with everything (and atomic commits across it) and to have
users check out parts of the repo separately. git does not, because the
svn separate checkouts model only works with a remote repository that
you don't keep a copy of. With git, cloning the repo gets you the whole
thing.

One thought is that you may want to separate how you organize boost
sources in git and how you release them. It's possible to have a single
git repo for all libraries and have atomic commits but then create
distfiles for each library separately.

git becomes a bit slow when repositories get really large (although
other tools are not any better - the problem is in the sheer number of
vnode ops necessary for the semantics). I have a repo with all of
NetBSD's "src", "xsrc" and "pkgsrc" in it, and "git status" can take
several seconds because it is calling stat on 230K files. With only 23K
files, things should be ok.

My advice (which is not really about git) is to figure out whether you
want:

  A) a set of interrelated libraries on which you will allow atomic
  commits that change interfaces/usage in multiple libraries

or

  B) a set of independent libraries which have commits to separate
  libraries, and for which you insist that each library have an API and
  ABI compatibility story, so that even when upgraded other libraries
  can continue to use it.

For A, you probably want one git repo, much as you have one svn repo
now. For B, multiple git repos are the right answer.
* Re: help moving boost.org to git
From: Eric Niebler @ 2010-07-06 0:25 UTC
To: Greg Troxel; +Cc: git

On 7/5/2010 8:16 PM, Greg Troxel wrote:
>
> You have found the core issue with svn/git: svn allows you to have a
> large repo with everything (and atomic commits across it) and to have
> users check out parts of the repo separately. git does not, because the
> svn separate checkouts model only works with a remote repository that
> you don't keep a copy of. With git, cloning the repo gets you the whole
> thing.

Makes sense.

> One thought is that you may want to separate how you organize boost
> sources in git and how you release them. It's possible to have a single
> git repo for all libraries and have atomic commits but then create
> distfiles for each library separately.
>
> git becomes a bit slow when ...
<snip>

It can't get any worse than svn. We haven't run into any perf problems
with git yet. That's not our primary concern.

> My advice (which is not really about git) is to figure out whether you
> want:
>
> A) a set of interrelated libraries on which you will allow atomic
> commits that change interfaces/usage in multiple libraries
>
> or
>
> B) a set of independent libraries which have commits to separate
> libraries, and for which you insist that each library have an API and
> ABI compatibility story, so that even when upgraded other libraries can
> continue to use it.
>
>
> For A, you probably want one git repo, much as you have one svn repo
> now. For B, multiple git repos are the right answer.

I'll take B FTW! :-) The idea is to open up and distribute C++ library
development. Versioned dependency tracking will be handled at a higher
level with per-project metadata and a tool (pip) that resolves
dependencies. API compatibility is handled with peer review and
regression testing. ABI compatibility is not an issue because we're
distributing source code.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com