git.vger.kernel.org archive mirror
* Creating own hierarchies under $GITDIR/refs ?
@ 2014-02-02 10:37 David Kastrup
  2014-02-02 11:00 ` Duy Nguyen
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: David Kastrup @ 2014-02-02 10:37 UTC (permalink / raw)
  To: git


Hi,

in the context of an ongoing discussion on the Emacs developer list of
converting the Bzr repository of Emacs, one question (with different
approaches) is where to put the information regarding preexisting Bazaar
revision numbers and bug tracker ids: those are not present in the
current Git mirror.

Putting them in the commit messages would require a full history
rewrite, and if some are missed in the process, this cannot be fixed
afterwards.

So I mused: refs/heads contains branches, refs/tags contains tags.  The
respective information could probably be stored easily enough in refs/bzr
and refs/bugs; that way it would not pollute the "ordinary" tag and
branch spaces, which would render "git tag" and/or "git branch" output
mostly unusable.  I tested creating such a directory and entries and
indeed references like bzr/39005 then worked.
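
The experiment described above can be reproduced roughly as follows. This is
only a sketch in a throwaway repository: the demo identity, the temporary
directory and the use of "git update-ref" (rather than creating the files by
hand) are assumptions, and 39005 is just the number quoted in the message.

```shell
#!/bin/sh
# Sketch: create a ref under a custom hierarchy and resolve it by short name.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical identity so that committing works in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q "$tmp/repo"
cd "$tmp/repo"
git commit -q --allow-empty -m 'first commit'
head=$(git rev-parse HEAD)

# Point refs/bzr/39005 at the commit without touching refs/tags or refs/heads.
git update-ref refs/bzr/39005 "$head"

# The short name "bzr/39005" is found through the refs/<name> lookup rule.
resolved=$(git rev-parse --verify bzr/39005)
tags=$(git tag)    # stays empty: the new ref is not a tag
echo "bzr/39005 -> $resolved"
```

Because the ref lives outside refs/tags and refs/heads, neither "git tag" nor
"git branch" lists it.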

However, cloning from the repository did not copy those directories and
references, so without modification, this scheme would not work for
cloned repositories.

Are there some measures one can take/configure in the parent repository
such that (named or all) additional directories inside of $GITDIR/refs
would get cloned along with the rest?

It would definitely open viable options for dealing with mirrors and/or
repository migrations in general.

-- 
David Kastrup

* Re: Creating own hierarchies under $GITDIR/refs ?
  2014-02-02 10:37 Creating own hierarchies under $GITDIR/refs ? David Kastrup
@ 2014-02-02 11:00 ` Duy Nguyen
  2014-02-02 11:19   ` David Kastrup
  2014-02-02 11:04 ` Andreas Schwab
  2014-02-02 23:26 ` Jeff King
  2 siblings, 1 reply; 11+ messages in thread
From: Duy Nguyen @ 2014-02-02 11:00 UTC (permalink / raw)
  To: David Kastrup; +Cc: Git Mailing List

On Sun, Feb 2, 2014 at 5:37 PM, David Kastrup <dak@gnu.org> wrote:
> in the context of an ongoing discussion on the Emacs developer list of
> converting the Bzr repository of Emacs, one question (with different
> approaches) is where to put the information regarding preexisting Bazaar
> revision numbers and bug tracker ids: those are not present in the
> current Git mirror.
>
> Putting them in the commit messages would require a full history
> rewrite, and if some are missed in the process, this cannot be fixed
> afterwards.

What do you need them for? Perhaps putting everything in a file, maybe
sorted by SHA-1, would suffice? It should not be too hard to write a
script to map bug tracker id to a commit id. The file is for past
commits only. New commits can contain this information in their messages.
-- 
Duy

* Re: Creating own hierarchies under $GITDIR/refs ?
  2014-02-02 10:37 Creating own hierarchies under $GITDIR/refs ? David Kastrup
  2014-02-02 11:00 ` Duy Nguyen
@ 2014-02-02 11:04 ` Andreas Schwab
  2014-02-02 23:26 ` Jeff King
  2 siblings, 0 replies; 11+ messages in thread
From: Andreas Schwab @ 2014-02-02 11:04 UTC (permalink / raw)
  To: David Kastrup; +Cc: git

David Kastrup <dak@gnu.org> writes:

> Are there some measures one can take/configure in the parent repository
> such that (named or all) additional directories inside of $GITDIR/refs
> would get cloned along with the rest?

$ git config --add remote.origin.fetch '+refs/notes/*:refs/notes/*'

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: Creating own hierarchies under $GITDIR/refs ?
  2014-02-02 11:00 ` Duy Nguyen
@ 2014-02-02 11:19   ` David Kastrup
  2014-02-02 11:31     ` John Keeping
  2014-02-02 12:00     ` Duy Nguyen
  0 siblings, 2 replies; 11+ messages in thread
From: David Kastrup @ 2014-02-02 11:19 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> On Sun, Feb 2, 2014 at 5:37 PM, David Kastrup <dak@gnu.org> wrote:
>> in the context of an ongoing discussion on the Emacs developer list of
>> converting the Bzr repository of Emacs, one question (with different
>> approaches) is where to put the information regarding preexisting Bazaar
>> revision numbers and bug tracker ids: those are not present in the
>> current Git mirror.
>>
>> Putting them in the commit messages would require a full history
>> rewrite, and if some are missed in the process, this cannot be fixed
>> afterwards.
>
> What do you need them for?

Resolving references typically found in commit messages.  Also
establishing correlation to bug issue numbers.

> Perhaps putting everything in a file, maybe sorted by SHA-1, would
> suffice? It should not be too hard to write a script to map bug
> tracker id to a commit id.

We are not talking about "it should not be too hard".  We are talking
about "obvious and reliable enough to render a complete history rewrite
pointless".

> The file is for past commits only.

> New commits can contain this information in their messages.

If it's not forgotten.  Experience shows that things like issue numbers
have a tendency to be omitted, and then they stay missing.

At any rate, this is exactly the kind of stuff that tags are useful for,
except that using them for all that would render the "tag space"
overcrowded.

Rest assured that the "standard" answers have been beaten to death in
the Emacs developer list thread several times over.

So I'm more interested in getting actual answers dealing with the
question I have asked rather than suggestions for questions that would
be easier to answer.

Since Git has a working facility for references that is catered to do
exactly this kind of mapping and already _does_, it seems like a
convenient path to explore.

It apparently even already works with --decorate:

commit c92b1fb3ad8514f08fc4cec531211717955a5c29 (tag: release/2.19.1-1, origin/release/unstable, tag: refs/bzr/r15000)
Author: Phil Holmes <mail@philholmes.net>
Date:   Sun Jan 19 15:01:48 2014 +0000

    Release: update news.

-- 
David Kastrup

* Re: Creating own hierarchies under $GITDIR/refs ?
  2014-02-02 11:19   ` David Kastrup
@ 2014-02-02 11:31     ` John Keeping
  2014-02-02 11:42       ` David Kastrup
  2014-02-02 12:00     ` Duy Nguyen
  1 sibling, 1 reply; 11+ messages in thread
From: John Keeping @ 2014-02-02 11:31 UTC (permalink / raw)
  To: David Kastrup; +Cc: Duy Nguyen, Git Mailing List

On Sun, Feb 02, 2014 at 12:19:43PM +0100, David Kastrup wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
> 
> > The file is for past commits only.
> 
> > New commits can contain this information in their messages.
> 
> If it's not forgotten.  Experience shows that things like issue numbers
> have a tendency to be omitted, and then they stay missing.
> 
> At any rate, this is exactly the kind of stuff that tags are useful for,
> except that using them for all that would render the "tag space"
> overcrowded.

Actually, I would say this is exactly the sort of thing notes are for.

git.git uses them to map commits back to mailing list discussions:

    git fetch git://github.com/gitster/git +refs/notes/amlog:refs/notes/amlog &&
    git log --notes=amlog

See also notes.displayRef in git-config(1).

Notes aren't fetched by default, but it's not hard for those interested to
add a remote.*.fetch line to their config.

* Re: Creating own hierarchies under $GITDIR/refs ?
  2014-02-02 11:31     ` John Keeping
@ 2014-02-02 11:42       ` David Kastrup
  2014-02-02 12:24         ` John Keeping
  0 siblings, 1 reply; 11+ messages in thread
From: David Kastrup @ 2014-02-02 11:42 UTC (permalink / raw)
  To: John Keeping; +Cc: Duy Nguyen, Git Mailing List

John Keeping <john@keeping.me.uk> writes:

> On Sun, Feb 02, 2014 at 12:19:43PM +0100, David Kastrup wrote:
>> Duy Nguyen <pclouds@gmail.com> writes:
>> 
>> > The file is for past commits only.
>> 
>> > New commits can contain this information in their messages.
>> 
>> If it's not forgotten.  Experience shows that things like issue numbers
>> have a tendency to be omitted, and then they stay missing.
>> 
>> At any rate, this is exactly the kind of stuff that tags are useful for,
>> except that using them for all that would render the "tag space"
>> overcrowded.
>
> Actually, I would say this is exactly the sort of thing notes are for.
>
> git.git uses them to map commits back to mailing list discussions:

But that's the wrong direction.  What is needed in the Emacs case is
mapping the Bazaar reference numbers (and bug numbers) to commits.

While it is true that the history rewriting approach would not deliver
this either (short of git log --grep with suitable patterns), I was
looking for something less of a crutch here.

> Notes aren't fetched by default, but it's not hard for those interested
> to add a remote.*.fetch line to their config.

If we are talking about measures everybody has to actively take before
getting access to functionality, this does not cross the convenience
threshold making it a solution preferred over others.  But it's probably
feasible to configure a fetch line doing this that will get cloned when
first cloning a repository.  That's not too hot for people with existing
repositories, but since we are talking about a migration from Bazaar
anyway, Git users currently are so by choice and so might be more
willing to update their configuration if it helps with avoiding a fully
new clone.

-- 
David Kastrup

* Re: Creating own hierarchies under $GITDIR/refs ?
  2014-02-02 11:19   ` David Kastrup
  2014-02-02 11:31     ` John Keeping
@ 2014-02-02 12:00     ` Duy Nguyen
  2014-02-02 12:09       ` David Kastrup
  1 sibling, 1 reply; 11+ messages in thread
From: Duy Nguyen @ 2014-02-02 12:00 UTC (permalink / raw)
  To: David Kastrup; +Cc: Git Mailing List

On Sun, Feb 2, 2014 at 6:19 PM, David Kastrup <dak@gnu.org> wrote:
> Since Git has a working facility for references that is catered to do
> exactly this kind of mapping and already _does_, it seems like a
> convenient path to explore.

It will not scale. If you make those refs available for
cloning/fetching, all of them will be advertised first thing when git
starts negotiating. Imagine thousands of refs (and the number keeps
increasing) sent to the receiver at the beginning of every connection.
Something like "reverse git-notes" may transfer more efficiently. Or we
need to improve the git protocol to handle massive numbers of refs better,
something that's been discussed for a while without any outcome.
-- 
Duy

* Re: Creating own hierarchies under $GITDIR/refs ?
  2014-02-02 12:00     ` Duy Nguyen
@ 2014-02-02 12:09       ` David Kastrup
  0 siblings, 0 replies; 11+ messages in thread
From: David Kastrup @ 2014-02-02 12:09 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> On Sun, Feb 2, 2014 at 6:19 PM, David Kastrup <dak@gnu.org> wrote:
>> Since Git has a working facility for references that is catered to do
>> exactly this kind of mapping and already _does_, it seems like a
>> convenient path to explore.
>
> It will not scale. If you make those refs available for
> cloning/fetching, all of them will be advertised first thing when git
> starts negotiating. Imagine thousands of refs (and the number keeps
> increasing) sent to the receiver at the beginning of every connection.

In current LilyPond repository:
git tag|wc
    969     969   15161

In current Emacs mirror:
git tag|wc
   1202    1202   15729

In current Git repository:
git tag|wc
    498     498    4820

> Something like "reverse git-notes" may transfer more efficiently. Or
> we need to improve git protocol to handle massive refs better,
> something that's been discussed for a while without any outcome.

I think that even disregarding special use of references, _existing_
practice would already appear to warrant being able to deal with
thousands of refs in a reasonable manner.

It's a reasonable expectation to have a tag per (potentially
intermediate) release or release candidate.  For any project publishing
reproducible daily snapshots, the threshold of 1000 will get reached
within a few years.

Of course, it is relevant information to know that right _now_
references will not scale.  But that does not seem like a defensible
long-term perspective.

-- 
David Kastrup

* Re: Creating own hierarchies under $GITDIR/refs ?
  2014-02-02 11:42       ` David Kastrup
@ 2014-02-02 12:24         ` John Keeping
  2014-02-02 23:44           ` Jed Brown
  0 siblings, 1 reply; 11+ messages in thread
From: John Keeping @ 2014-02-02 12:24 UTC (permalink / raw)
  To: David Kastrup; +Cc: Duy Nguyen, Git Mailing List

On Sun, Feb 02, 2014 at 12:42:52PM +0100, David Kastrup wrote:
> John Keeping <john@keeping.me.uk> writes:
> 
> > On Sun, Feb 02, 2014 at 12:19:43PM +0100, David Kastrup wrote:
> >> Duy Nguyen <pclouds@gmail.com> writes:
> >> 
> >> > The file is for past commits only.
> >> 
> >> > New commits can contain this information in their messages.
> >> 
> >> If it's not forgotten.  Experience shows that things like issue numbers
> >> have a tendency to be omitted, and then they stay missing.
> >> 
> >> At any rate, this is exactly the kind of stuff that tags are useful for,
> >> except that using them for all that would render the "tag space"
> >> overcrowded.
> >
> > Actually, I would say this is exactly the sort of thing notes are for.
> >
> > git.git uses them to map commits back to mailing list discussions:
> 
> But that's the wrong direction.  What is needed in the Emacs case is
> mapping the Bazaar reference numbers (and bug numbers) to commits.

Ah, OK.  I hadn't quite read carefully enough.

I actually wonder if you could do this with notes and git-grep; for
example:

    git grep -l keeping.me.uk refs/notes/amlog |
    sed -e 's/.*://' -e 's!/!!g'

That should be relatively efficient since you're only looking at the
current notes tree.
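
Applied to the Bazaar case, the same pipeline might look like the sketch
below. The notes ref name "bzr" and the note text "bzr-revno: N" are
hypothetical choices made for this example, not something proposed in the
thread. The sed expressions strip the "refs/notes/bzr:" prefix and any fan-out
slashes, leaving the commit ID the note is attached to.

```shell
#!/bin/sh
# Sketch: reverse lookup (revision number -> commit) through a notes tree.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical identity so that committing works in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q "$tmp/repo"
cd "$tmp/repo"
git commit -q --allow-empty -m 'first commit'
git commit -q --allow-empty -m 'second commit'
target=$(git rev-parse HEAD~1)

# Attach one note per commit under refs/notes/bzr (assumed naming scheme).
git notes --ref=bzr add -m 'bzr-revno: 15000' "$target"
git notes --ref=bzr add -m 'bzr-revno: 15001' HEAD

# Each note is a blob in the notes tree whose path is the annotated commit
# ID, so grepping the tree and cleaning up the path yields that commit.
found=$(git grep -l 'bzr-revno: 15000' refs/notes/bzr |
        sed -e 's/.*://' -e 's!/!!g')
echo "revno 15000 -> $found"
```

The pattern is an ordinary grep pattern, so a shorter number such as 1500
would match both notes here; a real script would want to anchor the match.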

> While it is true that the history rewriting approach would not deliver
> this either (short of git log --grep with suitable patterns), I was
> looking for something less of a crutch here.
> 
> > Notes aren't fetched by default, but it's not hard for those interested
> > to add a remote.*.fetch line to their config.
> 
> If we are talking about measures everybody has to actively take before
> getting access to functionality, this does not cross the convenience
> threshold making it a solution preferred over others.  But it's probably
> feasible to configure a fetch line doing this that will get cloned when
> first cloning a repository.

I'm assuming you'll need some form of tool (at least a script) to
manipulate this feature; it wouldn't be too hard for that to set this up
the first time it's run.

* Re: Creating own hierarchies under $GITDIR/refs ?
  2014-02-02 10:37 Creating own hierarchies under $GITDIR/refs ? David Kastrup
  2014-02-02 11:00 ` Duy Nguyen
  2014-02-02 11:04 ` Andreas Schwab
@ 2014-02-02 23:26 ` Jeff King
  2 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2014-02-02 23:26 UTC (permalink / raw)
  To: David Kastrup; +Cc: git

On Sun, Feb 02, 2014 at 11:37:39AM +0100, David Kastrup wrote:

> So I mused: refs/heads contains branches, refs/tags contains tags.  The
> respective information would likely easily enough be stored in refs/bzr
> and refs/bugs and in that manner would not pollute the "ordinary" tag
> and branch spaces, rendering "git tag" and/or "git branch" output mostly
> unusable.  I tested creating such a directory and entries and indeed
> references like bzr/39005 then worked.

Yes. The names "refs/tags" and "refs/heads" are special by convention,
and there is no reason you cannot have other hierarchies (and indeed, we
already have "refs/notes" and "refs/remotes" as common hierarchies).

> However, cloning from the repository did not copy those directories and
> references, so without modification, this scheme would not work for
> cloned repositories.

Correct. Anyone who wants them will have to ask for them manually, like:

  git config --add remote.origin.fetch '+refs/bzr/*:refs/bzr/*'

after which any "git fetch" will retrieve them.
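
A minimal end-to-end sketch of this, using two throwaway local repositories
(the directory names, the demo identity and the ref name refs/bzr/1 are
assumptions made for illustration):

```shell
#!/bin/sh
# Sketch: a plain clone skips refs/bzr/*; an extra fetch refspec brings it in.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical identity so that committing works in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# "Server" side: one commit plus a ref in a custom hierarchy.
git init -q "$tmp/upstream"
git -C "$tmp/upstream" commit -q --allow-empty -m 'first commit'
git -C "$tmp/upstream" update-ref refs/bzr/1 HEAD

# Client side: the default refspec only covers refs/heads/*.
git clone -q "$tmp/upstream" "$tmp/clone"
cd "$tmp/clone"
if git rev-parse -q --verify refs/bzr/1 >/dev/null; then
    before=present
else
    before=missing
fi

# Ask for the extra hierarchy explicitly, then fetch again.
git config --add remote.origin.fetch '+refs/bzr/*:refs/bzr/*'
git fetch -q origin
if git rev-parse -q --verify refs/bzr/1 >/dev/null; then
    after=present
else
    after=missing
fi
echo "before: $before, after: $after"
```

The configuration change lives in the clone, which is why every interested
user has to make it once.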

> Are there some measures one can take/configure in the parent repository
> such that (named or all) additional directories inside of $GITDIR/refs
> would get cloned along with the rest?

No. It is up to the client to decide which parts of the ref namespace
they want to fetch. The server only advertises what it has, and the
client selects from that.


Others mentioned that refs were never really intended to scale to
one-per-commit. We serve some repositories with tens of thousands of
refs from GitHub, and it does work. On the backend, we even have some
repos in the hundreds of thousands (but these are not client facing).
Most of the pain points (like O(n^2) loops) have been ironed out, but
the two big ones are still:

  - server ref advertisement lists _all_ refs at the start of the
    conversation. So, e.g.,

        git fetch git://github.com/Homebrew/homebrew.git

    sends 2MB of advertisement just so a client can find out "nope,
    nothing to fetch".

  - the packed-refs storage is rather monolithic. Reading a value from
    it currently requires parsing the whole file. Likewise, deleting a
    ref requires rewriting the whole file.

So what you are proposing will work, but do note that there is a cost.
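
One rough way to get a feel for the advertisement cost is to count what
"git ls-remote" prints for a repository. The sketch below does this against
a throwaway local repository holding 50 refs under refs/bzr; the count of 50,
the directory name and the demo identity are arbitrary assumptions.

```shell
#!/bin/sh
# Sketch: count the refs a repository offers, in total and under refs/bzr/.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical identity so that committing works in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q "$tmp/upstream"
git -C "$tmp/upstream" commit -q --allow-empty -m 'first commit'

# Create refs/bzr/1 .. refs/bzr/50, all pointing at the same commit.
i=1
while [ "$i" -le 50 ]; do
    git -C "$tmp/upstream" update-ref "refs/bzr/$i" HEAD
    i=$((i + 1))
done

# Everything the repository lists (HEAD and the branch included) ...
advertised=$(git ls-remote "$tmp/upstream" | wc -l | tr -d ' ')
# ... versus only the entries printed for the custom hierarchy.
bzr_only=$(git ls-remote "$tmp/upstream" 'refs/bzr/*' | wc -l | tr -d ' ')
echo "all refs: $advertised, refs/bzr only: $bzr_only"
```

Each listed ref is one line of commit ID plus name, so the size of the
listing grows linearly with the number of refs.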

-Peff

* Re: Creating own hierarchies under $GITDIR/refs ?
  2014-02-02 12:24         ` John Keeping
@ 2014-02-02 23:44           ` Jed Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Jed Brown @ 2014-02-02 23:44 UTC (permalink / raw)
  To: John Keeping, David Kastrup; +Cc: Duy Nguyen, Git Mailing List

John Keeping <john@keeping.me.uk> writes:
> I actually wonder if you could do this with notes and git-grep; for
> example:
>
>     git grep -l keeping.me.uk refs/notes/amlog |
>     sed -e 's/.*://' -e 's!/!!g'
>
> That should be relatively efficient since you're only looking at the
> current notes tree.

I added notes handling to gitifyhg and would search it similarly to this.
Since gitifyhg is two-way, I could not modify the commits.  Later, when
we converted several repositories (up to 50k commits/80 MB), I appended

  Hg-commit: $Hg_commit_hash

to all the commit messages.  This way it shows up on the web interface,
users don't have to obtain the notes specially, and "git log --grep"
works naturally.  I think it's worth considering this simple solution;
existing Git users won't mind recloning once.
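
A small sketch of how such a trailer can be searched. The "Hg-commit:" key is
the one named above; the hash value, commit subjects and demo identity are
made up for the example.

```shell
#!/bin/sh
# Sketch: find a commit by an ID recorded as a trailer in its message.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical identity so that committing works in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q "$tmp/repo"
cd "$tmp/repo"
# The second -m becomes its own paragraph, i.e. the trailer line.
git commit -q --allow-empty -m 'Fix the frobnicator' \
    -m 'Hg-commit: 0123456789ab'
git commit -q --allow-empty -m 'Unrelated change'
wanted=$(git rev-parse HEAD~1)

# Look the commit up again by the foreign ID stored in its message.
found=$(git log --format=%H --grep='Hg-commit: 0123456789ab')
echo "0123456789ab -> $found"
```

This needs no extra configuration in clones, at the price of rewriting
history once during the conversion.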
