git.vger.kernel.org archive mirror
* updating only changed files source directory?
From: Han-Wen Nienhuys @ 2006-10-24  1:33 UTC (permalink / raw)
  To: git


Hello there,

I'm just starting out with GIT.  Initially, I want to experiment
with integrating it into our binary builder structure for LilyPond.

The binary builder roughly does this:

  1. get source code updates from a server to a single, local
     repository. This is currently a git repository that
     tracks our CVS server.

  2. copy latest commit from a branch to separate source directory.
     This copy should only update files that changed.

  3. Incrementally compile from that source directory

The binary builder does this for several branches and several
platforms of the project. Due to parallel compilation, it might even
be possible that different branches are being checked out
concurrently from a single repository.

For a VCS, this is slightly nonstandard use, as we don't do any work
in the working dir, we just compile from it, but have many working
directories.


I have some questions and remarks

* Is there a command analogous to git-clone for updating a repository?
Right now, I'm using a combination of

   git-http-fetch -a <branch>  <url>
   wget <url>/refs/heads/<branch>    ## dump to <myrepo>/refs/heads/<branch>

for all branches I want to know about.  I was looking for a command
that would update the heads of all branches.


* Why is the order of arguments in git-http-fetch inconsistent with the
order in git-fetch? In fetch, the repository comes first; in
http-fetch, it comes last.


* How do I update a source directory?

I can do the following

   git --git-dir <myrepo> read-tree <committish>

   cd <srcdir>
   git --git-dir <myrepo> checkout-index -a -f

Unfortunately, this touches all files, which messes up the timestamps,
triggering needless recompilation. How can I make checkout-index only
touch files that have changed? Or, alternatively, make checkout-index
preserve the timestamps of files that didn't change?

Of course, I can store the committish of the last version of the
srcdir, and apply the diff between the two to the source directory, but
that seems somewhat convoluted. Is there a better way?


* As far as I can see, there is no reason to have only one index in a
git repository. Why isn't it possible to specify an alternate
index-file with an option similar to --git-dir ?


-- 
  Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: updating only changed files source directory?
From: Shawn Pearce @ 2006-10-24  5:55 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: git

Han-Wen Nienhuys <hanwen@xs4all.nl> wrote:
> For a VCS, this is slightly nonstandard use, as we don't do any work
> in the working dir, we just compile from it, but have many working
> directories.

It's not a nonstandard use.  A lot of projects perform rolling builds
which trigger any time there are changes; very active projects
would always be building and thus would always want the VCS to
update only those files which actually changed, to minimize
the compile time.
 
> I have some questions and remarks
> 
> * Is there a command analogous to git-clone for updating a repository?
> Right now, I'm using a combination of

Yes, they're called git-fetch and git-pull.  Which leads us to...
 
>   git-http-fetch -a <branch>  <url>
>   wget <url>/refs/heads/<branch>    ## dump to <myrepo>/refs/heads/<branch>
> 
> for all branches I want to know about.  I was looking for a command
> that would update the heads of all branches.

Why not use git-fetch?

Create a file named 'origin' in .git/remotes and put in it the URL
you want to fetch from and the list of branches you want to download
and keep current.

Then downloading the changes to the build repository is as simple
as running `git-fetch` with no parameters (as it defaults to reading
the origin file).
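A minimal sketch of this setup, written with the `remote.origin.*` configuration keys that later Git versions use in place of a `.git/remotes/origin` file. A throwaway local repository stands in for the real upstream URL, and tracking only `master` is an assumption made for illustration:

```shell
#!/bin/sh
# Sketch: a build repository that keeps selected upstream branches current.
# "upstream" is a hypothetical local stand-in for the real URL.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=builder GIT_AUTHOR_EMAIL=builder@example.com
export GIT_COMMITTER_NAME=builder GIT_COMMITTER_EMAIL=builder@example.com

# Hypothetical upstream with one commit on master.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo hello > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" commit -q -m initial

# Build repository: record the URL and the branch(es) to keep current.
git init -q "$work/build"
git -C "$work/build" config remote.origin.url "$work/upstream"
git -C "$work/build" config --add remote.origin.fetch \
    '+refs/heads/master:refs/remotes/origin/master'

# With "origin" configured, a plain "git fetch" updates every listed branch.
git -C "$work/build" fetch -q
git -C "$work/build" rev-parse refs/remotes/origin/master
```

To keep more branches current, add one `remote.origin.fetch` line per branch, or use a single `+refs/heads/*:refs/remotes/origin/*` wildcard.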

> * How do I update a source directory?

Always keep the source directory on a branch that is not listed
in the .git/remotes/origin file.  This way the fetch will always
succeed, because it never has to update the branch you are on.

Then you can do after the fetch:

	git-reset --hard <committish>

and the source directory will be updated to <committish> (which
could just be a branch name of one of those branches you fetch,
or could be a full SHA1, or a tag, etc.).

The reset --hard process will only change the files that really have
to change.  This means it will run in linear time proportional to the
number of files needing to be updated; and only those files which are
different between the working directory and <committish> will have
new modification dates.  Therefore incremental rebuilds will work.
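The claim about modification times can be checked with a small script. This sketch (file names and tags are hypothetical) builds two commits that differ only in `changing.c`, moves the working directory from one to the other with `git reset --hard`, and lists the files whose modification time moved past a marker file:

```shell
#!/bin/sh
# Sketch: check which files "git reset --hard" actually rewrites.
# File names and tags are hypothetical.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=builder GIT_AUTHOR_EMAIL=builder@example.com
export GIT_COMMITTER_NAME=builder GIT_COMMITTER_EMAIL=builder@example.com

git init -q "$work/src"
cd "$work/src"
echo one > stable.c
echo one > changing.c
git add stable.c changing.c
git commit -q -m v1
git tag v1
echo two > changing.c
git commit -q -a -m v2
git tag v2

# Put the working directory at v1, then record "now" in a marker file.
git reset -q --hard v1
touch "$work/marker"
sleep 1   # make any later write strictly newer than the marker

# Move to v2; only changing.c differs between the two commits.
git reset -q --hard v2

# Files whose modification time is newer than the marker were rewritten.
rewritten=$(find . -path ./.git -prune -o -type f -newer "$work/marker" -print)
echo "rewritten: $rewritten"
```

A make-based build running after this would therefore see only `changing.c` as out of date.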
 
> * As far as I can see, there is no reason to have only one index in a
> git repository. Why isn't it possible to specify an alternate
> index-file with an option similar to --git-dir ?

The index is key to getting the fast update of the working directory.
You can change the index with the (rather undocumented) GIT_INDEX_FILE
environment variable.  I do this in a few tools I have written
around Git, but I don't do it very often.
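A sketch of one way to combine GIT_INDEX_FILE with the read-tree plumbing from the original question; this pairing is an editorial illustration, not something proposed in the thread. Each source directory gets its own index file, and `git read-tree -m -u <commit>` compares that index with the target tree and writes only the entries that differ. Directory names and tags are hypothetical, and `GIT_WORK_TREE` assumes a Git newer than the one discussed here:

```shell
#!/bin/sh
# Sketch: one repository feeding several source directories, each with a
# private index (GIT_INDEX_FILE). Directory names and tags are hypothetical.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=builder GIT_AUTHOR_EMAIL=builder@example.com
export GIT_COMMITTER_NAME=builder GIT_COMMITTER_EMAIL=builder@example.com

# Repository with two tagged commits; only b.c differs between them.
git init -q "$work/repo"
cd "$work/repo"
echo a > a.c
echo b > b.c
git add a.c b.c
git commit -q -m v1
git tag v1
echo b2 > b.c
git commit -q -a -m v2
git tag v2
cd "$work"

# update_srcdir DIR COMMIT (hypothetical helper): bring DIR to COMMIT using
# DIR.index as its private index; the repository's own index and HEAD
# are left alone.
update_srcdir() {
    mkdir -p "$1"
    GIT_DIR="$work/repo/.git" GIT_WORK_TREE="$1" GIT_INDEX_FILE="$1.index" \
        git read-tree -m -u "$2"
}

update_srcdir "$work/build-a" v1   # empty index: every file is written
update_srcdir "$work/build-b" v2   # a second, independent source directory
touch "$work/marker"
sleep 1
update_srcdir "$work/build-a" v2   # should rewrite b.c only
```

Because each directory's index remembers what was last checked out there, no separate record of the previous committish is needed.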

-- 
Shawn.


* Re: updating only changed files source directory?
From: Jakub Narebski @ 2006-10-24  7:48 UTC (permalink / raw)
  To: git

Han-Wen Nienhuys wrote:

> I have some questions and remarks

I see that you are using fairly low-level commands (plumbing commands)
 
>    git-http-fetch -a <branch>  <url>
>    wget <url>/refs/heads/<branch>    ## dump to <myrepo>/refs/heads/<branch>

instead of setting up the $GIT_DIR/remotes/origin file and using "git fetch".
BTW, "git fetch" will not update the branch you are on unless the
--update-head-ok option is used.

>    git --git-dir <myrepo> read-tree <committish>
> 
>    cd <srcdir>
>    git --git-dir <myrepo> checkout-index -a -f

instead of
     git --git-dir=<myrepo> checkout <branch>
(-f is "Force a re-read of everything")
 
> * As far as I can see, there is no reason to have only one index in a
> git repository. Why isn't it possible to specify an alternate
> index-file with an option similar to --git-dir ?

--git-dir is an alternative to setting GIT_DIR. You can use GIT_INDEX_FILE
to specify alternate index file. Documented in git(7), section
"ENVIRONMENT VARIABLES".
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: updating only changed files source directory?
From: Han-Wen Nienhuys @ 2006-10-24  9:50 UTC (permalink / raw)
  To: git

Jakub Narebski wrote:
> Han-Wen Nienhuys wrote:
> 
>> I have some questions and remarks
> 
> I see that you are using fairly low-level commands (plumbing commands)
>  
>>    git-http-fetch -a <branch>  <url>
>>    wget <url>/refs/heads/<branch>    ## dump to <myrepo>/refs/heads/<branch>
> 
> instead of setting up the $GIT_DIR/remotes/origin file and using "git fetch".
> BTW, "git fetch" will not update the branch you are on unless the
> --update-head-ok option is used.

I tried fetch, but was put off by the warnings because I didn't have
--update-head-ok. Using low-level commands is my way of making sure that
Git doesn't assume it needs to do anything intelligent.

>>    git --git-dir <myrepo> read-tree <committish>
>>
>>    cd <srcdir>
>>    git --git-dir <myrepo> checkout-index -a -f
> 
> instead of 
>      git --git-dir=<myrepo> checkout <branch>
> (-f is "Force a re-read of everything")

Yes, however,

   git checkout

changes the state of the repository, which is something I want to prevent.

>> * As far as I can see, there is no reason to have only one index in a
>> git repository. Why isn't it possible to specify an alternate
>> index-file with an option similar to --git-dir ?
> 
> --git-dir is an alternative to setting GIT_DIR. You can use GIT_INDEX_FILE
> to specify alternate index file. Documented in git(7), section
> "ENVIRONMENT VARIABLES".

Silly me, I overlooked it in the manpage. Note that it is standard to put
the environment section at the end of the manpage. Right now it's
somewhere in the middle.


-- 
  Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen


* Re: updating only changed files source directory?
From: Jakub Narebski @ 2006-10-24 10:13 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: git

Han-Wen Nienhuys wrote:
> Jakub Narebski wrote:
>> Han-Wen Nienhuys wrote:
>> 
>> I see that you are using fairly low-level commands (plumbing commands)
>>  
>>>    git-http-fetch -a <branch>  <url>
>>>    wget <url>/refs/heads/<branch>    ## dump to <myrepo>/refs/heads/<branch>
>> 
>> instead of setting up the $GIT_DIR/remotes/origin file and using "git fetch".
>> BTW, "git fetch" will not update the branch you are on unless the
>> --update-head-ok option is used.
> 
> I tried fetch, but was put off by the warnings because I didn't have 
> --update-head-ok. Using low-level commands is my way of making sure that
> Git doesn't assume it needs to do anything intelligent.

You can either keep an additional branch that is not a tracking branch
(you never fetch into it) and that you are always on, called for example
'check-out' (it can also serve the git-reset solution for checking out
files into an external directory), and use git-fetch without
--update-head-ok; or, if the repository is bare (has no working area),
use --update-head-ok.
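A sketch of the 'check-out' branch arrangement, assuming the old-style layout where fetch writes straight into a local branch head (`refs/heads/master` here). Because HEAD stays on `check-out`, which no fetch refspec names, `git fetch` needs no --update-head-ok; `git reset --hard` then moves `check-out` to the fetched commit. Repository paths are hypothetical:

```shell
#!/bin/sh
# Sketch of the 'check-out' branch idea. Paths are hypothetical.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=builder GIT_AUTHOR_EMAIL=builder@example.com
export GIT_COMMITTER_NAME=builder GIT_COMMITTER_EMAIL=builder@example.com

# Hypothetical upstream with a file recording its version.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo v1 > "$work/upstream/version"
git -C "$work/upstream" add version
git -C "$work/upstream" commit -q -m v1

# Build repository: fetch writes straight into the local branch "master"...
git init -q "$work/build"
cd "$work/build"
git config remote.origin.url "$work/upstream"
git config remote.origin.fetch '+refs/heads/master:refs/heads/master'
# ...while HEAD sits on "check-out", which no fetch refspec touches.
git symbolic-ref HEAD refs/heads/check-out

git fetch -q                 # allowed: the current branch is not updated
git reset -q --hard master   # check-out (and the files) now match master

# Upstream moves on; the update is the same two commands.
echo v2 > "$work/upstream/version"
git -C "$work/upstream" commit -q -a -m v2
git fetch -q
git reset -q --hard master
```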
 
>>>    git --git-dir <myrepo> read-tree <committish>
>>>
>>>    cd <srcdir>
>>>    git --git-dir <myrepo> checkout-index -a -f
>> 
>> instead of 
>>      git --git-dir=<myrepo> checkout <branch>
>> (-f is "Force a re-read of everything")

git-checkout-index(1):

       -f|--force
              forces overwrite of existing files

So you would probably get what you want if you drop '-f'.

> Yes, however,
> 
>    git checkout
> 
> changes the state of the repository, which is something I want to prevent.

Well, git-reset also changes the state of the repository, but it changes
only the branch we have created exactly for this purpose.
-- 
Jakub Narebski
Poland


* Re: updating only changed files source directory?
From: Daniel Barkalow @ 2006-10-24 19:12 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: git

On Tue, 24 Oct 2006, Han-Wen Nienhuys wrote:

> 
> Hello there,
> 
> I'm just starting out with GIT.  Initially, I want to experiment with
> integrating it into our binary builder structure for LilyPond.
> 
> The binary builder roughly does this:
> 
>  1. get source code updates from a server to a single, local
>     repository. This is currently a git repository that
>     tracks our CVS server.
> 
>  2. copy latest commit from a branch to separate source directory.
>     This copy should only update files that changed.
> 
>  3. Incrementally compile from that source directory

The terminology in the git world is, I think, a little different from what 
you expect. We call the thing that contains all of the tracked information 
(what you're calling the repository) the "object database"; what we call 
the "repository" is a bit different: it primarily keeps track of the heads 
of branches, in addition to either containing an object database or 
referencing an external one. So you need a repository for each source 
directory (because it keeps track of what commit is currently in the 
source directory), but it doesn't need to have its own complete object 
database, which is what you're trying to share between all of them.

You have a single repository with no source directory that contains the 
database and the heads according to the upstream source, and then each 
source directory has a repository that contains the head as far as you've 
built it in that directory. You fetch into the single bare repository 
from upstream, and then pull into each source directory from the bare 
repository; this will do the minimal update to the contents of the source 
directory automatically.

I think that you want to request a few git features:

 - support having a bare repository not on a branch, so that it can fetch 
   all heads from its upstream. You're not doing anything branch-specific 
   in the bare repository anyway, but git currently wants a valid HEAD to
   accept a path as containing a git repository

 - support getting an origin remote configuration with a bare repository

 - support cloning a branch of a repository, such that the clone's 
   "origin" is the upstream's chosen branch, not its "master".

 - support cloning without generating a "master" branch in the clone, and 
   instead starting on "origin"

Then you do:

git clone --bare --no-head --with-origin <upstream> REPOSITORY.git

for each branch:

  git clone --shared --branch=<branch> --no-master REPOSITORY.git <branch>

When you want to update:

GIT_DIR=REPOSITORY.git git fetch

for each branch:

 (cd <branch>; git pull; make)

Note that all of the features you need are in "clone", for setting things
up nicely automatically; if you arrange everything by hand just right, you
can already do the updating procedure I give.
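The `--no-head`, `--with-origin` and `--no-master` options above are proposals, not existing flags. As a sketch under that caveat, later Git versions can approximate the same layout: `git clone --mirror` creates a bare repository whose plain `git fetch` updates every head, and `git clone --shared --branch <branch>` creates a per-branch build directory that borrows its objects. Repository names are hypothetical, and `--ff-only` is an added safeguard:

```shell
#!/bin/sh
# Sketch of the layout above using options that exist in later Git:
# one bare mirror shared by per-branch build directories.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=builder GIT_AUTHOR_EMAIL=builder@example.com
export GIT_COMMITTER_NAME=builder GIT_COMMITTER_EMAIL=builder@example.com

# Hypothetical upstream with two branches, master and stable.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo 1 > VERSION
git add VERSION
git commit -q -m one
git branch stable
cd "$work"

# One-time setup: the bare mirror, then one shared clone per branch.
git clone -q --mirror "$work/upstream" "$work/REPOSITORY.git"
for b in master stable; do
    git clone -q --shared --branch "$b" "$work/REPOSITORY.git" "$work/$b"
done

# Upstream moves master forward.
(cd "$work/upstream" && echo 2 > VERSION && git commit -q -a -m two)

# Update: fetch into the mirror, then fast-forward each build directory.
git --git-dir="$work/REPOSITORY.git" fetch -q
for b in master stable; do
    (cd "$work/$b" && git pull -q --ff-only)   # a real setup would run make here
done
```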

	-Daniel


* Re: updating only changed files source directory?
From: Han-Wen Nienhuys @ 2006-10-25 11:58 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow wrote:
>> I'm just starting out with GIT.  Initially, I want to experiment with
>> integrating it into our binary builder structure for LilyPond.
>>
>> The binary builder roughly does this:
>>
>>  1. get source code updates from a server to a single, local
>>     repository. This is currently a git repository that
>>     tracks our CVS server.
>>
>>  2. copy latest commit from a branch to separate source directory.
>>     This copy should only update files that changed.
>>
>>  3. Incrementally compile from that source directory
> 
> The terminology in the git world is, I think, a little different from what 
> you expect. We call the thing that contains all of the tracked information 
> (what you're calling the repository) the "object database"; what we call 

yes, you hit the nail on the head.

> referencing an external one. So you need a repository for each source 
> directory (because it keeps track of what commit is currently in the 
> source directory), but it doesn't need to have its own complete object 
> database, which is what you're trying to share between all of them.

OK. This makes sense; thanks for this pointer.

How can I set the object database?  I found GIT_OBJECT_DIRECTORY, but 
can I write a config file entry for that?

> built it in that directory. You fetch into the single bare repository 
> from upstream, and then pull into each source directory from the bare 
> repository; this will do the minimal update to the contents of the source 
> directory automatically.

yes, this works. Thanks!

-- 


* Re: updating only changed files source directory?
From: Daniel Barkalow @ 2006-10-25 19:35 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: git

On Wed, 25 Oct 2006, Han-Wen Nienhuys wrote:

> How can I set the object database?  I found GIT_OBJECT_DIRECTORY, but can I
> write a config file entry for that?

If you clone with --shared, it'll do the right thing automatically, which 
is to have the clone's .git/objects/info/alternates be the objects 
directory of the bare repository.
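A small sketch of both points, with hypothetical paths: it clones a bare repository with --shared, reads the resulting `objects/info/alternates` file, then makes a commit in the clone and checks which object database holds it:

```shell
#!/bin/sh
# Sketch: what "git clone --shared" records, and where new objects go.
# Paths are hypothetical.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=builder GIT_AUTHOR_EMAIL=builder@example.com
export GIT_COMMITTER_NAME=builder GIT_COMMITTER_EMAIL=builder@example.com

# A bare repository with one commit, standing in for REPOSITORY.git.
git init -q "$work/seed"
git -C "$work/seed" commit -q --allow-empty -m seed
git clone -q --bare "$work/seed" "$work/REPOSITORY.git"

# The shared clone borrows objects through objects/info/alternates.
git clone -q --shared "$work/REPOSITORY.git" "$work/build"
alternates=$(cat "$work/build/.git/objects/info/alternates")
echo "alternates: $alternates"

# A commit made in the clone lands in the clone's own object database.
git -C "$work/build" commit -q --allow-empty -m local
local_commit=$(git -C "$work/build" rev-parse HEAD)
if git --git-dir="$work/REPOSITORY.git" cat-file -e "$local_commit" 2>/dev/null; then
    echo "unexpected: commit found in the bare repository"
else
    echo "commit $local_commit exists only in the clone"
fi
```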

(Note that any new objects you create in the clone go into the clone's own 
objects database. This shouldn't matter for you, unless your build system 
is tagging things or something, but if you end up doing development in a 
similarly structured system, it's worth knowing that this doesn't affect 
the bare repository at all.)

> yes, this works. Thanks!

No problem. :)

	-Daniel

