git.vger.kernel.org archive mirror
* Medium term dreams
@ 2008-09-01 23:19 Junio C Hamano
  2008-09-02  0:00 ` pack count on repo.or.cz [was "Medium term dreams"] Jeff King
  2008-09-04  5:33 ` Medium term dreams Mike Hommey
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2008-09-01 23:19 UTC (permalink / raw)
  To: git

Perhaps because it is also Linus's brainchild, git development has not
generally advanced by intelligent design but by organic evolution.  We
have worked without setting any grand, long term visions, but primarily
by gathering the fruits of individual developers' work to scratch their
own itches.  While I do not see that as a problem at all, it sometimes
may help to write down some medium to long term wishes, to put what we
haven't done (but should) in perspective.

This is such a wishlist, not a grand intelligent design in any way.

1. Pathspecs

There is a longstanding inconsistency in pathspec limiting done by
various families of commands.  We should unify them.

 * The pathspec limiter that originates from "git diff-tree" and is shared
   with "git log" is the most limited.  It only allows you to specify
   leading paths and limits by leading substring match (anchored at
   directory boundaries '/'); no globbing is supported.

 * The pathspec limiter used by "git ls-files -- $path" and shared by
   "git checkout [treeish] -- $path" is more flexible.  In addition to
   the leading directory match above, it does allow globbing.

   This unfortunately has at least two different implementations.  The
   one ls-files and checkout use is in dir.c; grep uses a different
   implementation that is more advanced and better suited for a caller
   that can benefit from being able to skip a whole subdirectory.

   The limiter used by for-each-ref is also of this kind; it has its own
   implementation (I do not think it is worth using the same code with
   other codepaths for this particular one, but I am mentioning it for
   completeness).

My gut feeling is that we should build a superset of these three
implementations, starting from the one used in grep and adding the
feature it lacks from the dir.c one (namely, the ability to mark which
pathspec had actually matched anything, so that the caller can say "you
gave me this pathspec but it did not match any --- typo?"), and make
the existing three pathspec limiter functions thin wrappers to it.

This will change semantics in that "git diff HEAD '*.c'" will now show
differences to all C files, not just the path with the three-letter
filename, asterisk-dot-C.  I do not think that deserves to be called
"breaking backward compatibility"; it is a 99% pure feature enhancement
that hurts only insane people who have filenames with shell
metacharacters in them.
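
To make the difference between the two matching styles concrete, here is a
minimal sketch in Python.  It is not git's code: the function names are made
up for illustration, and Python's fnmatch (whose '*' also matches '/') only
stands in for whatever glob semantics a unified limiter would settle on.  The
last helper illustrates the "which pathspec matched nothing" report mentioned
above.

```python
import fnmatch


def match_leading(pathspec: str, path: str) -> bool:
    """diff-tree/log style: exact path or leading directory, cut at '/'."""
    spec = pathspec.rstrip("/")
    return path == spec or path.startswith(spec + "/")


def match_glob(pathspec: str, path: str) -> bool:
    """ls-files style: leading-directory match, or else a glob match."""
    return match_leading(pathspec, path) or fnmatch.fnmatchcase(path, pathspec)


def unmatched(pathspecs, paths):
    """Return the pathspecs that matched no path at all (the 'typo?' case)."""
    return [s for s in pathspecs if not any(match_glob(s, p) for p in paths)]


# Hypothetical tree, including a file literally named "*.c".
paths = ["Makefile", "diff.c", "builtin/grep.c", "*.c", "Documentation/git.txt"]

print([p for p in paths if match_leading("*.c", p)])  # ['*.c']
print([p for p in paths if match_glob("*.c", p)])     # ['diff.c', 'builtin/grep.c', '*.c']
print(unmatched(["Documentation", "*.h"], paths))     # ['*.h']
```

With the leading-path matcher, '*.c' names only the oddly named file; with
the glob matcher it names every C file, which is the semantic change
described above.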
 
2. Submodules

Eventually, I'd like to remove gitk-git/ and git-gui/ directories, and use
the submodule mechanism to bind them at modules/{gitk,git-gui}.  So far we
have refrained from doing this, primarily because a repository that uses
submodules cannot be cloned by very old versions (before 1.5.2) of git,
and in principle, git.git's repository itself should be conservative
(e.g. for a long time I did not use packed refs in the public repositories
I can configure, since HTTP clients did not understand them).

Because this change (not the change to the software itself, but the
repository that houses the software, i.e. "git.git") would be conceptually
a big jump, the release that has these two directories as submodules might
need to be called 2.0.0.  Now that a big release is over, hopefully in a
year or so everybody will be using at least 1.6.0.

I've pushed a sample repository showing what a future git.git might look like to:

    http://repo.or.cz/w/git/split-submodule.git/

We may however need to polish the software side of submodule support so
that the development inside such a repository goes smoothly.  I think the
current submodule support is already good enough to get this started.
Namely:

 * I do not think it is a problem at all that many operations such as
   checkout/diff/fetch/push/bisect do not recurse into submodules, at
   least for the purpose of binding gitk/git-gui to git.git.

 * I do think the current design of the submodule system is very well
   suited for git.git's future use of submodules for gitk/git-gui. If
   somebody builds with "make NO_TCLTK", he most likely won't even want
   to "git submodule init/update" these two submodules.

 * I do not think switching branches across that magic commit (commits
   before it have gitk as a subdirectory and commits after it have gitk
   as a submodule) is a huge problem with the current software, even
   though after switching from new to old there will be a leftover,
   unused modules/ directory.

However, there still are some issues that the Porcelain level submodule
support could address to make things better.  For example, if we do not
move gitk-git => modules/gitk as I outlined above, switching across the
magic commit would become an issue; in particular, switching from new to
old would either lose the submodule repository, or refuse to switch
because doing so would lose the submodule repository.

We should support the usage to replace an existing subdirectory (possibly
but not necessarily merged with subtree strategy) with a submodule.  To do
this sanely, one solution we may want to consider is to keep the real
repository for each submodule somewhere in the superproject's $GIT_DIR/,
and use the "gitfile" to point at it from the submodule's working tree,
and "git submodule init" should be updated to use such a layout by
default.  Then checking out 1.6.0 when you have a clean checkout of 2.0.0
tree will not have to lose the submodule repository (we can notice that
the gitk submodule is clean, remove the whole subdirectory and check out
the copy from 1.6.0).

I already know about some breakage around the use of relative paths in
"gitfile" that needs to be addressed to implement the above; there may be
some other issues around that code, but that level of detail is outside
the scope of this message.
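
As a rough illustration of the layout described above, the sketch below
writes and reads back a "gitfile": a plain file named .git whose content is
"gitdir: <path>".  The helper names and the modules/gitk location under the
superproject's $GIT_DIR are assumptions made for the example, not an existing
convention, and it uses an absolute path only to sidestep the relative-path
breakage just mentioned.

```python
import os
import tempfile


def write_gitfile(worktree: str, gitdir: str) -> None:
    """Write a '.git' file in worktree that points at the real repository."""
    with open(os.path.join(worktree, ".git"), "w") as f:
        # Absolute path, to avoid depending on relative-path handling.
        f.write("gitdir: " + os.path.abspath(gitdir) + "\n")


def read_gitfile(worktree: str) -> str:
    """Return the repository path recorded in worktree/.git."""
    with open(os.path.join(worktree, ".git")) as f:
        line = f.readline().rstrip("\n")
    prefix = "gitdir: "
    if not line.startswith(prefix):
        raise ValueError("not a gitfile: %r" % line)
    return line[len(prefix):]


top = tempfile.mkdtemp()
# Hypothetical layout: the real repository lives inside the superproject's
# $GIT_DIR, and the submodule working tree only holds a pointer to it.
real_repo = os.path.join(top, ".git", "modules", "gitk")
worktree = os.path.join(top, "modules", "gitk")
os.makedirs(real_repo)
os.makedirs(worktree)

write_gitfile(worktree, real_repo)
print(read_gitfile(worktree) == os.path.abspath(real_repo))  # True
```

With such a layout, removing the submodule's working tree removes only the
pointer file and the checked-out files; the repository itself stays under the
superproject's $GIT_DIR.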

There may be some other issues I haven't thought about (the submodules
area has never been my bailiwick), and my feeling is that I most likely
will not be the primary person doing the polishing of submodule
Porcelain.  But I think the release date of 2.0.0 will be at least one
year after we solve these issues, when everybody is running such a
version.

* pack count on repo.or.cz [was "Medium term dreams"]
  2008-09-01 23:19 Medium term dreams Junio C Hamano
@ 2008-09-02  0:00 ` Jeff King
  2008-09-02  1:04   ` Petr Baudis
  2008-09-04  5:33 ` Medium term dreams Mike Hommey
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-09-02  0:00 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, git

On Mon, Sep 01, 2008 at 04:19:47PM -0700, Junio C Hamano wrote:

> I've pushed a sample repository showing what a future git.git might look like to:
> 
>     http://repo.or.cz/w/git/split-submodule.git/

Holy pack count, Batman! On a whim, I cloned this via http (by switching
/w/ to /r/). It took 7m13s. Cloning by git://, for comparison, took
1m31s. I pulled down 82 separate packs (and curiously, 277 separate .idx
files).

I know it is nice to keep the packs somewhat split for dumb transport
users (since they otherwise have to pull down the whole thing), but I
think it is coming here at the cost of first-time cloners (and yes,
obviously I should have used --reference to an existing git clone; but
for many that will not be an option).

It looks like the gc.autopacklimit defaults to 50, which would have
helped this. Pasky, is repo.or.cz not gc-ing? Or gc-ing with different
parameters? Or is this an artifact of the forking infrastructure (i.e.,
these packs are actually split across multiple modules)?

-Peff

* Re: pack count on repo.or.cz [was "Medium term dreams"]
  2008-09-02  0:00 ` pack count on repo.or.cz [was "Medium term dreams"] Jeff King
@ 2008-09-02  1:04   ` Petr Baudis
  2008-09-02  1:14     ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Petr Baudis @ 2008-09-02  1:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Mon, Sep 01, 2008 at 08:00:37PM -0400, Jeff King wrote:
> It looks like the gc.autopacklimit defaults to 50, which would have
> helped this. Pasky, is repo.or.cz not gc-ing? Or gc-ing with different
> parameters? Or is this an artifact of the forking infrastructure (i.e.,
> these packs are actually split across multiple modules)?

Unfortunately, I'm not aware of how to decrease the pack count with
current Git without losing _any_ objects. So yes, you could say that
this is an artifact of the forking infrastructure - we just can't afford
to lose objects.

I've been meaning to look into this for a few... years now, but there are
always too many more important things to do. If anyone wants to help
out, you're welcome to! :-)

Maybe I should just give up on the whole alternates idea. Unless the
referenced object store is practically static, it just does not seem
like a feasible thing to do with Git at all, and disk space is cheap.

-- 
				Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC.  -- Bill Gates

* Re: pack count on repo.or.cz [was "Medium term dreams"]
  2008-09-02  1:04   ` Petr Baudis
@ 2008-09-02  1:14     ` Jeff King
  2008-09-02  1:47       ` pack count on repo.or.cz Junio C Hamano
  2008-09-02 11:15       ` pack count on repo.or.cz [was "Medium term dreams"] Petr Baudis
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2008-09-02  1:14 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, git

On Tue, Sep 02, 2008 at 03:04:10AM +0200, Petr Baudis wrote:

> Unfortunately, I'm not aware of how to decrease the pack count with
> current Git without losing _any_ objects. So yes, you could say that
> this is an artifact of the forking infrastructure - we just can't afford
> to lose objects.

Hmm, I thought that was the point of adding the "-A" flag to git-repack.

Though an even simpler solution, since you control all of the repos, is
to just temporarily add references from the "parent" of the fork to
every ref of every forked child. Then do the repack in the parent, which
should then contain all of the objects for all of the children, delete
the temporary references, and prune in the children (who should see most
of their objects now in the parent).

The only downside I can think of is that anyone grabbing the parent's
packs via dumb transport will grab a few unnecessary objects for the
forks. But presumably they are not all that big, being forks (i.e.,
there will only be a few commits, and they will be delta'd anyway).

-Peff

* Re: pack count on repo.or.cz
  2008-09-02  1:14     ` Jeff King
@ 2008-09-02  1:47       ` Junio C Hamano
  2008-09-02  1:56         ` Jeff King
  2008-09-02 11:15       ` pack count on repo.or.cz [was "Medium term dreams"] Petr Baudis
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2008-09-02  1:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Petr Baudis, git

Jeff King <peff@peff.net> writes:

> Though an even simpler solution, since you control all of the repos, is
> to just temporarily add references from the "parent" of the fork to
> every ref of every forked child. Then do the repack in the parent, which
> should then contain all of the objects for all of the children, delete
> the temporary references, and prune in the children (who should see most
> of their objects now in the parent).

Hmm, I am slightly worried that doing so might defeat the whole point of
making the sample repository a separate one from git.git hosted there.

The reason it is not a branch in git.git is because I did not want to
contaminate the official git.git repository with commit objects in tree
objects (aka gitlinks); it would deny access to clients older than 1.5.2.

* Re: pack count on repo.or.cz
  2008-09-02  1:47       ` pack count on repo.or.cz Junio C Hamano
@ 2008-09-02  1:56         ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2008-09-02  1:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git

On Mon, Sep 01, 2008 at 06:47:36PM -0700, Junio C Hamano wrote:

> Hmm, I am slightly worried that doing so might defeat the whole point of
> making the sample repository a separate one from git.git hosted there.
> 
> The reason it is not a branch in git.git is because I did not want to
> contaminate the official git.git repository with commit objects in tree
> objects (aka gitlinks); it would deny access to clients older than 1.5.2.

Hrm. I had assumed they wouldn't be a problem because nothing actually
_referenced_ them, but I suppose it would break git-fsck on such
systems.

Anyway, I think "git repack -A" is a better solution (and presumably
there is some magic on the pruning end to make sure forks do the right
thing with loose objects).

-Peff

* Re: pack count on repo.or.cz [was "Medium term dreams"]
  2008-09-02  1:14     ` Jeff King
  2008-09-02  1:47       ` pack count on repo.or.cz Junio C Hamano
@ 2008-09-02 11:15       ` Petr Baudis
  2008-09-02 11:54         ` Jeff King
  1 sibling, 1 reply; 12+ messages in thread
From: Petr Baudis @ 2008-09-02 11:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Mon, Sep 01, 2008 at 09:14:33PM -0400, Jeff King wrote:
> On Tue, Sep 02, 2008 at 03:04:10AM +0200, Petr Baudis wrote:
> 
> > Unfortunately, I'm not aware of how to decrease the pack count with
> > current Git without losing _any_ objects. So yes, you could say that
> > this is an artifact of the forking infrastructure - we just can't afford
> > to lose objects.
> 
> Hmm, I thought that was the point of adding the "-A" flag to git-repack.

Ok, I did

	git repack -A -d

in repo.or.cz's git.git. What next? I have brand-new

	-rw-rw-r-- 1 root root  1314056 2008-09-02 13:07 pack-d19ca8b0cfd0e3357c475a3e96ce55b9f7195667.idx
	-rw-rw-r-- 1 root root 17344999 2008-09-02 13:07 pack-d19ca8b0cfd0e3357c475a3e96ce55b9f7195667.pack

but all the old packs too; git repack didn't delete anything,
git prune-packed seems to have no effect either.

> Though an even simpler solution, since you control all of the repos, is
> to just temporarily add references from the "parent" of the fork to
> every ref of every forked child. Then do the repack in the parent, which
> should then contain all of the objects for all of the children, delete
> the temporary references, and prune in the children (who should see most
> of their objects now in the parent).

So not just refs but also alternates? What if someone accesses the
repository at that moment? I would also need to make the symlinks quite
densely to avoid refs/forkee/-induced loops.

I might as well just use a common repository for all the forks then. But
this does not scale at all for dumb transports, does it?

-- 
				Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC.  -- Bill Gates

* Re: pack count on repo.or.cz [was "Medium term dreams"]
  2008-09-02 11:15       ` pack count on repo.or.cz [was "Medium term dreams"] Petr Baudis
@ 2008-09-02 11:54         ` Jeff King
  2008-09-02 13:08           ` Petr Baudis
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-09-02 11:54 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, git

On Tue, Sep 02, 2008 at 01:15:31PM +0200, Petr Baudis wrote:

> > Hmm, I thought that was the point of adding the "-A" flag to git-repack.
> 
> Ok, I did
> 
> 	git repack -A -d
> 
> in repo.or.cz's git.git. What next? I have brand-new
>
> 	-rw-rw-r-- 1 root root  1314056 2008-09-02 13:07 pack-d19ca8b0cfd0e3357c475a3e96ce55b9f7195667.idx
> 	-rw-rw-r-- 1 root root 17344999 2008-09-02 13:07 pack-d19ca8b0cfd0e3357c475a3e96ce55b9f7195667.pack
> 
> but all the old packs too; git repack didn't delete anything,
> git prune-packed seems to have no effect either.

Hmm. Certainly the new pack is as expected. And you may also have some
new loose objects. It's easier to see what's going on with:

  mkdir repo && cd repo
  git init
  git config core.logallrefupdates false
  echo content >file && git add file && git commit -m one
  echo changes >>file && git commit -a -m two
  git repack -a -d

at this point we have the one pack:

  $ find .git/objects -type f
  .git/objects/pack/pack-ba174ac1cf22ba99a92878c562483321baa93d38.pack
  .git/objects/pack/pack-ba174ac1cf22ba99a92878c562483321baa93d38.idx
  .git/objects/info/packs

and then we lose the reference and pack again:

  git reset --hard HEAD^
  git repack -A -d

and now we have our single-commit pack, with a few loose objects from
the other commit:

  $ find .git/objects -type f
  .git/objects/13/cb02d81aac9cedca2700fb56aeddeb984dc57b
  .git/objects/78/a66295f2d04f2dd4ea90b4b99a6de73ea4ac12
  .git/objects/fe/79de90b5f7d6d4b23dc858f861834e2a76af7b
  .git/objects/info/packs
  .git/objects/pack/pack-c9a76fe2b061890a18396b70ec3d6a638383046e.idx
  .git/objects/pack/pack-c9a76fe2b061890a18396b70ec3d6a638383046e.pack

So did you check for loose objects? That is what you should get if there
were any objects that would have been lost. If there aren't any new
loose objects, then there were no objects that would be lost.
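
If it helps, here is a small Python sketch for taking that count before and
after a repack.  It only walks the directory layout shown in the listings
above (two-hex-digit fan-out directories for loose objects, plus
objects/pack); the function name and the shortened file names in the demo
are made up.

```python
import os
import re
import tempfile


def count_objects(objects_dir: str) -> dict:
    """Count loose objects, packs and pack indexes under an objects directory."""
    counts = {"loose": 0, "pack": 0, "idx": 0}
    for name in os.listdir(objects_dir):
        sub = os.path.join(objects_dir, name)
        # Loose objects live in fan-out directories named 00..ff.
        if re.fullmatch(r"[0-9a-f]{2}", name) and os.path.isdir(sub):
            counts["loose"] += len(os.listdir(sub))
    pack_dir = os.path.join(objects_dir, "pack")
    if os.path.isdir(pack_dir):
        for name in os.listdir(pack_dir):
            if name.endswith(".pack"):
                counts["pack"] += 1
            elif name.endswith(".idx"):
                counts["idx"] += 1
    return counts


# Demo tree shaped like the second listing above (file names are fake).
demo = tempfile.mkdtemp()
for rel in ["13/cb02", "78/a662", "fe/79de", "info/packs",
            "pack/pack-c9a7.idx", "pack/pack-c9a7.pack"]:
    full = os.path.join(demo, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    open(full, "w").close()

print(count_objects(demo))  # {'loose': 3, 'pack': 1, 'idx': 1}
```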

As to why the other packs weren't pruned, I don't know. In my example,
you can see that the pruning happens as we expect. So either there is a
bug in git-prune-packed, or there is something we're not realizing.

> So not just refs but also alternates? What if someone accesses the
> repository at that moment? I would also need to make the symlinks quite
> densely to avoid refs/forkee/-induced loops.

I think you could probably do it safely with a symlink farm (and I sort
of assumed you were doing something like that already for pruning, but
perhaps you are just not pruning at all).

> I might as well just use a common repository for all the forks then. But
> this does not scale at all for dumb transports, does it?

It depends how different the forks are. :) But I think it is better if
we can avoid that.

-Peff

* Re: pack count on repo.or.cz [was "Medium term dreams"]
  2008-09-02 11:54         ` Jeff King
@ 2008-09-02 13:08           ` Petr Baudis
  2008-09-02 13:49             ` Johannes Sixt
  0 siblings, 1 reply; 12+ messages in thread
From: Petr Baudis @ 2008-09-02 13:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Tue, Sep 02, 2008 at 07:54:24AM -0400, Jeff King wrote:
> So did you check for loose objects? That is what you should get if there
> were any objects that would have been lost. If there aren't any new
> loose objects, then there were no objects that would be lost.

Before:

rover:/srv/git/git.git# l objects~/??/* | wc -l
3343
rover:/srv/git/git.git# l objects~/pack/* | wc -l
750

After:

rover:/srv/git/git.git# l objects/??/* | wc -l
2920
rover:/srv/git/git.git# l objects/pack/* | wc -l
590

> As to why the other packs weren't pruned, I don't know. In my example,
> you can see that the pruning happens as we expect. So either there is a
> bug in git-prune-packed, or there is something we're not realizing.

Well, that's my question here. :-)

If you're still interested in looking into this further, you can wget -r
or rsync over the git.git repository from repo.or.cz.

-- 
				Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC.  -- Bill Gates

* Re: pack count on repo.or.cz [was "Medium term dreams"]
  2008-09-02 13:08           ` Petr Baudis
@ 2008-09-02 13:49             ` Johannes Sixt
  2008-09-03 10:08               ` Petr Baudis
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Sixt @ 2008-09-02 13:49 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Jeff King, Junio C Hamano, git

Petr Baudis wrote:
>> As to why the other packs weren't pruned, I don't know. In my example,
>> you can see that the pruning happens as we expect. So either there is a
>> bug in git-prune-packed, or there is something we're not realizing.
> 
> Well, that's my question here. :-)

Does removing all the *.keep files help? ;)

-- Hannes
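
For context: "git repack -a -d" leaves alone any pack that has a .keep file
next to it, so such packs are never folded into the new pack or deleted.  A
minimal Python sketch for spotting them follows; the function name and the
demo file names are invented for the example.

```python
import os
import tempfile


def kept_packs(pack_dir: str) -> list:
    """Return the names of packs that have a matching .keep marker file."""
    names = set(os.listdir(pack_dir))
    kept = []
    for name in sorted(names):
        if name.endswith(".pack"):
            stem = name[: -len(".pack")]
            if stem + ".keep" in names:
                kept.append(name)
    return kept


# Demo directory: one pack protected by a .keep file, one ordinary pack.
demo = tempfile.mkdtemp()
for name in ["pack-aaa.pack", "pack-aaa.idx", "pack-aaa.keep",
             "pack-bbb.pack", "pack-bbb.idx"]:
    open(os.path.join(demo, name), "w").close()

print(kept_packs(demo))  # ['pack-aaa.pack']
```

Whether it is safe to remove a given .keep file depends on why it was
created, so the sketch only reports them.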

* Re: pack count on repo.or.cz [was "Medium term dreams"]
  2008-09-02 13:49             ` Johannes Sixt
@ 2008-09-03 10:08               ` Petr Baudis
  0 siblings, 0 replies; 12+ messages in thread
From: Petr Baudis @ 2008-09-03 10:08 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, Junio C Hamano, git

On Tue, Sep 02, 2008 at 03:49:32PM +0200, Johannes Sixt wrote:
> Petr Baudis wrote:
> >> As to why the other packs weren't pruned, I don't know. In my example,
> >> you can see that the pruning happens as we expect. So either there is a
> >> bug in git-prune-packed, or there is something we're not realizing.
> > 
> > Well, that's my question here. :-)
> 
> Does removing all the *.keep files help? ;)

Haha, good catch - thanks! :-) I wonder where they came from, though I
hazily remember some curious behaviour of older Git versions wrt. .keep
files, and they all seem to be old. Now I have a single nice pack
and fsck of all the forks still passes fine.

Thanks all, I will adopt this for all projects.

-- 
				Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC.  -- Bill Gates

* Re: Medium term dreams
  2008-09-01 23:19 Medium term dreams Junio C Hamano
  2008-09-02  0:00 ` pack count on repo.or.cz [was "Medium term dreams"] Jeff King
@ 2008-09-04  5:33 ` Mike Hommey
  1 sibling, 0 replies; 12+ messages in thread
From: Mike Hommey @ 2008-09-04  5:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Sep 01, 2008 at 04:19:47PM -0700, Junio C Hamano wrote:
> Perhaps because it is also Linus's brainchild, git development has not
> generally advanced by intelligent design but by organic evolution.  We
> have worked without setting any grand, long term visions, but primarily
> by gathering the fruits of individual developers' work to scratch their
> own itches.  While I do not see that as a problem at all, it sometimes
> may help to write down some medium to long term wishes, to put what we
> haven't done (but should) in perspective.
> 
> This is such a wishlist, not a grand intelligent design in any way.
> 
> 1. Pathspecs
> 2. Submodules

I'd add these two:

- Handle cvs/svn/whatever remotes as standard remotes.

I don't see why these remotes should have a different workflow, and why
there couldn't be hooks to do whatever is required to pull/push from
these remotes when git pull/push'ing. This might not be easy to implement,
but I think it is a worthwhile goal.

- Git-aware mergetool

There are various merge scenarios where using standard tools such as
those supported by current git-mergetool is not very helpful, and where
some basic git awareness might help the user resolve conflicts in more
natural ways.

Mike
