* question (possibly) on git subtree/submodules
From: Maurizio Vitale @ 2010-07-23 14:00 UTC
To: git

I'm new to git and have read the recent thread on subtree support.
I'm not sure they (or git submodules) offer what I'm looking for.
Here's the scenario:
  - I have a large monolithic code base, all in my repository (i.e.
    I don't need to link in external repositories, which is what I
    understand submodules offer)
  - I'd like to be able to clone only a small fraction of the
    repository (say an arbitrary directory or even a single file)
    in order to make small changes
  - these directories are not known when the full repository is set
    up.
  - commits to the part I've checked out should show in the history
    of any clone that includes the part, up to the full repository
  - ideally, I should be able to incrementally clone portions (e.g.
    I've checked out path/dir_A and realize I need to modify
    path/dir_B as well).
    These additional clones should be in whatever branch I switched
    to after the initial checkouts.

Assuming the above makes any sense (in general or in git), is there
anything in git that would help me do what I'm looking for?
Thanks,

Maurizio
* Re: question (possibly) on git subtree/submodules
From: Chris Packham @ 2010-07-23 16:56 UTC
To: maurizio.vitale; +Cc: Maurizio Vitale, git

Hi,

On 23/07/10 07:00, Maurizio Vitale wrote:
>
> I'm new to git and have read the recent thread on subtree support.
> I'm not sure they (or git submodules) offer what I'm looking for.
> Here's the scenario:
>   - I have a large monolithic code base, all in my repository (i.e.
>     I don't need to link in external repositories, which is what I
>     understand submodules offer)
>   - I'd like to be able to clone only a small fraction of the
>     repository (say an arbitrary directory or even a single file)
>     in order to make small changes
>   - these directories are not known when the full repository is set
>     up.
>   - commits to the part I've checked out should show in the history
>     of any clone that includes the part, up to the full repository
>   - ideally, I should be able to incrementally clone portions (e.g.
>     I've checked out path/dir_A and realize I need to modify
>     path/dir_B as well).
>     These additional clones should be in whatever branch I switched
>     to after the initial checkouts.
>
> Assuming the above makes any sense (in general or in git), is there
> anything in git that would help me do what I'm looking for?
> Thanks,
>
> Maurizio

The short answer is no. Nothing git has currently will let you clone a
subset of files. Shallow clones exist if you want all the code and the
last X changes. The reason for this is git, like other DVCSes, tracks
_changes_ rather than _files_; this is something that took me a while to
get my head around when I was learning git.
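A minimal sketch of the shallow clone Chris mentions, assuming a throwaway local repository (the names upstream, shallow and file.txt are made up for illustration). It shows that a depth-1 clone has every file of the latest commit but only one commit of history, which is why it does not answer the "subset of files" question. The file:// URL is used because git ignores --depth for clones made from a plain local path.

```shell
#!/bin/sh
# Sketch: a shallow clone has all files of the tip commit but only recent
# history. All repository and file names here are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# A throwaway "upstream" repository with five commits to one file.
git init -q upstream
cd upstream
git config user.name demo
git config user.email demo@example.com
for i in 1 2 3 4 5; do
  echo "revision $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done
cd ..

# --depth is ignored for plain-path local clones, hence the file:// URL.
git clone -q --depth 1 "file://$work/upstream" shallow

full_count=$(git -C upstream rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "upstream: $full_count commits; shallow clone: $shallow_count commit"
```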
The best advice I've seen is to actually take your repository and use
git filter-branch to create several smaller repositories (or, depending
on how much history you want to keep, just create new repos). You can
then use submodules or subtrees to stitch these back together into a
super project, to which you can add the smaller repositories as needed
(note: I have never used subtrees, so I'm not 100% sure that what I'm
saying applies to them). We use this model with submodules at $dayjob
and it works quite well for us.
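A rough sketch of the filter-branch step, under stated assumptions: a toy monolithic repository with hypothetical directories path/dir_A and path/dir_B, and the extraction done on a throwaway clone because filter-branch rewrites history in place. --subdirectory-filter keeps only the commits that touch the directory and makes that directory the new project root. Setting FILTER_BRANCH_SQUELCH_WARNING=1 only silences the warning that recent git versions print before running filter-branch.

```shell
#!/bin/sh
# Sketch: split one directory's history out of a monolithic repository.
# Directory and repository names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

git init -q mono
cd mono
git config user.name demo
git config user.email demo@example.com
mkdir -p path/dir_A path/dir_B
echo a1 > path/dir_A/a.txt
echo b1 > path/dir_B/b.txt
git add . && git commit -q -m "initial import"
echo b2 >> path/dir_B/b.txt
git commit -q -am "touch dir_B only"
echo a2 >> path/dir_A/a.txt
git commit -q -am "touch dir_A only"
cd ..

# Work on a copy, since filter-branch rewrites the branch it is given.
git clone -q mono dir_A-only
cd dir_A-only
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --subdirectory-filter path/dir_A HEAD >/dev/null 2>&1

files=$(git ls-tree --name-only HEAD)     # dir_A's contents are now the root
commits=$(git rev-list --count HEAD)      # the dir_B-only commit is gone
echo "root of dir_A-only: $files; commits kept: $commits"
```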
* Re: question (possibly) on git subtree/submodules
From: Jonathan Nieder @ 2010-07-23 17:18 UTC
To: Chris Packham; +Cc: maurizio.vitale, Maurizio Vitale, git

Chris Packham wrote:

> The short answer is no. Nothing git has currently will let you clone a
> subset of files. Shallow clones exist if you want all the code and the
> last X changes. The reason for this is git, like other DVCSes, tracks
> _changes_ rather than _files_; this is something that took me a while to
> get my head around when I was learning git.

Not quite as cut-and-dried as it may sound, I think. Internally git
compresses blobs (and other objects) by comparing them to other ones,
but I do not think that is what you are talking about, and I do not
see what that has to do with partial clones. In fact, the main reason
I can see that partial clones (in the sense of getting all metadata
but not all blobs) are not implemented is that no one has written code
for it yet.

Here is a thread on related work[1]. Maybe someone else can find a
more pertinent link.

> The best advice I've seen is to actually take your repository and use
> git filter-branch to create several smaller repositories

Right, and this is what “git subtree” excels at. It provides an
alternative interface and implementation for “git filter-branch
--subdirectory-filter”.

Hope that helps,
Jonathan

[1] http://thread.gmane.org/gmane.comp.version-control.git/73117/focus=73935
* Re: question (possibly) on git subtree/submodules
From: Chris Packham @ 2010-07-23 18:35 UTC
To: Jonathan Nieder; +Cc: git

On 23/07/10 10:18, Jonathan Nieder wrote:
> Chris Packham wrote:
>
>> The short answer is no. Nothing git has currently will let you clone a
>> subset of files. Shallow clones exist if you want all the code and the
>> last X changes. The reason for this is git, like other DVCSes, tracks
>> _changes_ rather than _files_; this is something that took me a while to
>> get my head around when I was learning git.
>
> Not quite as cut-and-dried as it may sound, I think. Internally git
> compresses blobs (and other objects) by comparing them to other ones,
> but I do not think that is what you are talking about, and I do not
> see what that has to do with partial clones. In fact, the main reason
> I can see that partial clones (in the sense of getting all metadata
> but not all blobs) are not implemented is that no one has written code
> for it yet.
>
> Here is a thread on related work[1]. Maybe someone else can find a
> more pertinent link.
>

OK, I think I must have read too much into the "tracks changes" part;
thanks for pointing it out.
* Re: question (possibly) on git subtree/submodules
From: Alex @ 2010-07-27 10:56 UTC
To: git

Chris Packham <judge.packham <at> gmail.com> writes:

> The short answer is no. Nothing git has currently will let you clone a
> subset of files.

Isn't that what 'sparse checkout' does?
(http://www.kernel.org/pub/software/scm/git/docs/git-read-tree.html#_sparse_checkout)

Alex
* Re: question (possibly) on git subtree/submodules
From: Jakub Narebski @ 2010-07-27 12:48 UTC
To: Alex; +Cc: git

Alex <ajb44.geo@yahoo.com> writes:

> Chris Packham <judge.packham <at> gmail.com> writes:
>
> > The short answer is no. Nothing git has currently will let you clone a
> > subset of files.
>
> Isn't that what 'sparse checkout' does?
> (http://www.kernel.org/pub/software/scm/git/docs/git-read-tree.html#_sparse_checkout)

No, 'sparse checkout' is only about checkout, i.e. the working area.
You still have all objects in the repository; only part of the tree
(part of the project / repository) is not checked out, i.e. not present
on disk as files.

--
Jakub Narebski
Poland
ShadeHawk on #git
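To make Jakub's point concrete, here is a sketch using a throwaway repository with hypothetical directories path/dir_A and path/dir_B. It turns on core.sparseCheckout (the mechanism described in the git-read-tree documentation Alex linked), keeps only path/dir_A, and applies the patterns with git read-tree -mu HEAD. Afterwards path/dir_B/b.txt is gone from disk, yet git cat-file still finds it, because the object store was never reduced.

```shell
#!/bin/sh
# Sketch: sparse checkout limits the working tree, not the object store.
# Directory names are hypothetical; patterns use the non-cone syntax.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.name demo
git config user.email demo@example.com
mkdir -p path/dir_A path/dir_B
echo a > path/dir_A/a.txt
echo b > path/dir_B/b.txt
git add . && git commit -q -m "two directories"

# Keep only path/dir_A in the working tree and re-apply the patterns.
git config core.sparseCheckout true
mkdir -p .git/info
echo "/path/dir_A/" > .git/info/sparse-checkout
git read-tree -mu HEAD

[ -f path/dir_A/a.txt ] && a_on_disk=yes || a_on_disk=no
[ -e path/dir_B/b.txt ] && b_on_disk=yes || b_on_disk=no
# The blob for path/dir_B/b.txt is still reachable from HEAD.
git cat-file -e HEAD:path/dir_B/b.txt && b_in_repo=yes || b_in_repo=no
echo "dir_A on disk: $a_on_disk; dir_B on disk: $b_on_disk; dir_B in repo: $b_in_repo"
```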
* RFC: Sparse checkout improvements (was: Re: question (possibly) on git subtree/submodules)
From: Marc Branchaud @ 2010-07-27 14:24 UTC
To: Jakub Narebski; +Cc: Alex, git

On 10-07-27 08:48 AM, Jakub Narebski wrote:
> Alex <ajb44.geo@yahoo.com> writes:
>
>> Chris Packham <judge.packham <at> gmail.com> writes:
>>
>>> The short answer is no. Nothing git has currently will let you clone a
>>> subset of files.
>>
>> Isn't that what 'sparse checkout' does?
>> (http://www.kernel.org/pub/software/scm/git/docs/git-read-tree.html#_sparse_checkout)
>
> No, 'sparse checkout' is only about checkout, i.e. the working area.
> You still have all objects in the repository; only part of the tree
> (part of the project / repository) is not checked out, i.e. not present
> on disk as files.

There's no such thing as a "sparse fetch", but you can do something like

    git clone -n git://there/foo.git
    cd foo

then

    git checkout origin/<branch> -- <paths...>

or

    git config core.sparseCheckout true
    [ Add paths to .git/info/sparse-checkout ]
    git checkout <branch>

but it's fairly inconvenient for day-to-day work. Also, putting a
.git/info/sparse-checkout file in a public repo seems of limited use.

So IMHO the current sparse-checkout feature is pretty bare-bones and could
use some meat. Here are some thoughts:

* What's missing is a way to define named collections of paths
  ("sparse-sets?") in .git/info/sparse-checkout, so that you can conveniently
  checkout a particular subset of the working directory. It would also be nice
  to switch between different sparse-sets.

* It would also be good to have a way for a repo to define a default
  sparse-set, so that a clone would only checkout that default.

* I also think that core.sparseCheckout should be true by default, and git
  should impose no sparseness if .git/info/sparse-checkout is missing or empty.

Comments?

        M.
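Marc's second recipe can be sketched end to end as follows, with some assumptions: a local repository named upstream stands in for git://there/foo.git, the directory names are hypothetical, and the patterns are applied with git read-tree -mu HEAD rather than git checkout <branch>. The script also walks through the incremental case from the original question (adding dir_B after dir_A) and the return to a full checkout by writing the "/*" pattern before turning core.sparseCheckout off, as the git-read-tree documentation describes.

```shell
#!/bin/sh
# Sketch: clone without checkout, check out one directory, widen the
# sparse set later, then return to a full checkout. Names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for git://there/foo.git: a local repository with three directories.
git init -q upstream
cd upstream
git config user.name demo
git config user.email demo@example.com
mkdir dir_A dir_B dir_C
for d in dir_A dir_B dir_C; do echo "$d" > "$d/file.txt"; done
git add . && git commit -q -m "three directories"
cd ..

git clone -q -n upstream foo     # -n: fetch everything, check out nothing
cd foo
git config core.sparseCheckout true
mkdir -p .git/info
echo "/dir_A/" > .git/info/sparse-checkout
git read-tree -mu HEAD
step1=$(ls | tr '\n' ' ')

# Later: dir_B is needed too, so widen the pattern list and re-apply it.
echo "/dir_B/" >> .git/info/sparse-checkout
git read-tree -mu HEAD
step2=$(ls | tr '\n' ' ')

# Back to a full checkout: match everything, re-apply, then disable.
echo "/*" > .git/info/sparse-checkout
git read-tree -mu HEAD
git config core.sparseCheckout false
step3=$(ls | tr '\n' ' ')

echo "after dir_A: $step1"
echo "after dir_B: $step2"
echo "full:        $step3"
```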
* Re: RFC: Sparse checkout improvements (was: Re: question (possibly) on git subtree/submodules)
From: skillzero @ 2010-07-27 16:55 UTC
To: Marc Branchaud; +Cc: Jakub Narebski, Alex, git

On Tue, Jul 27, 2010 at 7:24 AM, Marc Branchaud <marcnarc@xiplink.com> wrote:

> * What's missing is a way to define named collections of paths
> ("sparse-sets?") in .git/info/sparse-checkout, so that you can conveniently
> checkout a particular subset of the working directory. It would also be nice
> to switch between different sparse-sets.

I pasted in a script I wrote to work with the sparse checkout feature.
I'm not a scripting expert, so it probably does things incorrectly.
It lets you create "modules" by adding sections to the .gitmodules file at
the root of the repository (or a file you specify). You can list them,
switch/checkout between them, or reset back to normal:

[module "MyApp1"]
    <path1>
    <path2>

$ git module list
MyApp1

$ git module checkout MyApp1

$ git module reset

> * It would also be good to have a way for a repo to define a default
> sparse-set, so that a clone would only checkout that default.

Yes, this would be nice. Ideally what I would like is for there to be
a clone option to specify a "module" (what I've been calling sparse
sets). A first step could just clone the full repository with -n, then
do 'git module checkout <module>' (what my other scripts do to prepare
archives). Ideally, it would do some work on the server side to only
send the objects needed for paths specified by the sparse set (but
still allow me to commit and later push changes back).
--
git-module script (email may mess up the spacing, causing things to not
line up, but you get the idea)

#!/usr/bin/perl
#
# git-module: manage named sets of sparse-checkout patterns ("modules").
# Save as an executable file named git-module somewhere on $PATH so that
# "git module <command>" runs it.

use strict;
use warnings;
use Getopt::Long qw(:config gnu_getopt);

my $gBranch      = "";
my $gHelp        = 0;
my $gModulesFile = "";
my $gModules     = {};   # module name => [ sparse-checkout patterns ]
my $gRecursive   = 0;

# Parse the command line.
if( @ARGV < 1 ) { Usage(); }
GetOptions(
    "b|branch=s"       => \$gBranch,
    "h|help"           => \$gHelp,
    "f|modules-file=s" => \$gModulesFile,
    "r|recursive!"     => \$gRecursive,
) or die( "\n" );
if( $gHelp ) { Usage(); }
if( @ARGV < 1 ) { die( "error: no command specified. See 'git module help'.\n" ); }

my $cmd = shift @ARGV;
if(    $cmd eq "checkout" ) { cmd_checkout(); }
elsif( $cmd eq "help" )     { Usage(); }
elsif( $cmd eq "list" )     { cmd_list(); }
elsif( $cmd eq "reset" )    { cmd_reset(); }
else { die( "error: unknown command '$cmd'. See 'git module help'.\n" ); }

#
# cmd_checkout: write a module's patterns and check out with them.
#
sub cmd_checkout
{
    ReadModulesFile();
    my $moduleName = shift @ARGV;
    if( !$moduleName ) { die( "error: no module name specified.\n" ); }
    if( !$gModules->{$moduleName} ) { die( "error: module '$moduleName' not found.\n" ); }

    # Enable sparse checkout and write this module's patterns.
    Run( "git config core.sparseCheckout true" );
    WriteSparsePatterns( @{ $gModules->{$moduleName} } );

    # Checkout using the new sparse patterns.
    Run( "git checkout $gBranch --" );
}

#
# cmd_list: list module names, or the patterns of one module.
#
sub cmd_list
{
    ReadModulesFile();
    my $moduleName = shift @ARGV;
    if( !defined( $moduleName ) || $moduleName eq "" )
    {
        foreach my $name ( sort( keys %{$gModules} ) )
        {
            print( "$name\n" );
            # With -r, also print each module's patterns.
            if( $gRecursive ) { print( "\t$_\n" ) for @{ $gModules->{$name} }; }
        }
    }
    else
    {
        if( !$gModules->{$moduleName} ) { die( "module '$moduleName' not found.\n" ); }
        print( "$_\n" ) for @{ $gModules->{$moduleName} };
    }
}

#
# cmd_reset: go back to a full (non-sparse) checkout.
#
sub cmd_reset
{
    # Write the special pattern "*" (everything) with sparse checkout on,
    # so the checkout clears the skip-worktree bits and restores all entries.
    Run( "git config core.sparseCheckout true" );
    WriteSparsePatterns( "*" );
    Run( "git checkout $gBranch --" );

    # Now sparse checkout can be disabled.
    Run( "git config core.sparseCheckout false" );
}

#
# Run: run a shell command, dying with its exit status if it fails.
#
sub Run
{
    my ( $command ) = @_;
    system( $command ) == 0 or die( "error: '$command' $?.\n" );
}

#
# WriteSparsePatterns: replace .git/info/sparse-checkout with the patterns.
#
sub WriteSparsePatterns
{
    my @patterns = @_;
    my $gitDir = `git rev-parse --git-dir`;
    if( $? != 0 ) { die( "error: can't find git repository $?.\n" ); }
    chomp( $gitDir );

    my $sparsePath = "$gitDir/info/sparse-checkout";
    open( my $fh, ">", $sparsePath ) or die( "error: can't open '$sparsePath'.\n" );
    print $fh "$_\n" for @patterns;
    close( $fh );
}

#
# ReadModulesFile: parse [module "name"] sections into $gModules.
#
sub ReadModulesFile
{
    my @lines = ();
    if( $gModulesFile eq "" )   # No file means read .gitmodules from the repo.
    {
        my $rev = ( $gBranch ne "" ) ? $gBranch : "HEAD";
        @lines = `git show $rev:.gitmodules`;
        if( $? != 0 ) { die( "error: read .gitmodules file failed: $?.\n" ); }
    }
    elsif( $gModulesFile eq "-" )   # "-" means read from stdin.
    {
        @lines = <STDIN>;
    }
    else
    {
        open( my $fh, "<", $gModulesFile ) or die( "error: can't open '$gModulesFile'.\n" );
        @lines = <$fh>;
        close( $fh );
    }
    chomp( @lines );

    my $isModule   = 0;
    my $moduleName = "";
    foreach my $line ( @lines )
    {
        $line =~ s/[\r\n]//g;   # Strip CRs and LFs.
        $line =~ s/^\s+//;      # Strip leading whitespace.
        $line =~ s/\s+$//;      # Strip trailing whitespace.

        if( $line =~ /^\[(.*?)\]/ )   # Section header.
        {
            my $section = $1;
            # Only [module "name"] sections define sparse sets.
            if( $section =~ /^\s*module\s*"(.*)"/ ) { $moduleName = $1; $isModule = 1; }
            else                                     { $isModule = 0; }
            next;
        }
        next if !$isModule;             # Skip entries outside module sections.
        next if $line =~ /^[;#]/;       # Skip lines beginning with ';' or '#'.
        next if length( $line ) == 0;   # Skip empty lines.
        push( @{ $gModules->{$moduleName} }, $line );
    }
}

#
# Usage
#
sub Usage
{
    print STDERR <<'EOF';
Usage: git-module [options] command [command options]

Options:
  -b/--branch <name>          Branch to use.
  -f/--modules-file <file>    Custom modules file to use.

Commands:
  checkout <name>             Check out a module.
  list [-r] [name]            List module(s). -r lists modules and patterns.
  reset                       Reset to a non-sparse checkout.

EOF
    exit( 1 );
}
* Re: RFC: Sparse checkout improvements
From: Marc Branchaud @ 2010-07-28 13:42 UTC
To: skillzero; +Cc: Jakub Narebski, Alex, git

On 10-07-27 12:55 PM, skillzero@gmail.com wrote:
> On Tue, Jul 27, 2010 at 7:24 AM, Marc Branchaud <marcnarc@xiplink.com> wrote:
>
>> * What's missing is a way to define named collections of paths
>> ("sparse-sets?") in .git/info/sparse-checkout, so that you can conveniently
>> checkout a particular subset of the working directory. It would also be nice
>> to switch between different sparse-sets.
>
> I pasted in a script I wrote to work with the sparse checkout feature.
> I'm not a scripting expert, so it probably does things incorrectly.
> It lets you create "modules" by adding sections to the .gitmodules file at
> the root of the repository (or a file you specify). You can list them,
> switch/checkout between them, or reset back to normal:

That script looks like a great proof-of-concept. I haven't tried it out
yet, but it seems to work along the lines of what I was thinking about.
I'd like to see most of this functionality folded into the standard git
commands, and maybe a new git-sparse command for managing sparse sets.

> [module "MyApp1"]
>     <path1>
>     <path2>
>
> $ git module list
> MyApp1
>
> $ git module checkout MyApp1
>
> $ git module reset
>
>> * It would also be good to have a way for a repo to define a default
>> sparse-set, so that a clone would only checkout that default.
>
> Yes, this would be nice. Ideally what I would like is for there to be
> a clone option to specify a "module" (what I've been calling sparse
> sets). A first step could just clone the full repository with -n, then
> do 'git module checkout <module>' (what my other scripts do to prepare
> archives).

I'd really prefer to see it as a configuration option for the remote
repository. Let the remote tell me what the initial sparse set should be.

> Ideally, it would do some work on the server side to only
> send the objects needed for paths specified by the sparse set (but
> still allow me to commit and later push changes back).

I'm less interested in sparse fetching, so I'll stay out of that side of
the conversation.

        M.