git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johan Herland <johan@herland.net>
To: git@vger.kernel.org
Cc: Junio C Hamano <junkio@cox.net>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 0/7] Introduce soft references (softrefs)
Date: Sat, 09 Jun 2007 20:19:13 +0200	[thread overview]
Message-ID: <200706092019.13185.johan@herland.net> (raw)
In-Reply-To: <200706040251.05286.johan@herland.net>

This patch series introduces soft references (softrefs); a mechanism for
declaring reachability between arbitrary (but existing) git objects.
Softrefs are meant to provide the mechanism for "reverse mapping" that
we determined was needed for tag objects (especially 'notes'). The patch
series also teaches git-mktag to create softrefs for all tag objects.

See the Discussion section in the git-softref manual page (patch #4/7) or
the comments in the header file (patch #1/7) for more details on the
design of softrefs.

I've added some informal performance data at the bottom of this mail [1].

Note that this patch series is incomplete in that the following things
have yet to be implemented:

1. Clone/fetch/push of softrefs

2. Packing of softrefs

3. General integration of softrefs into parts of git where they might be
   useful

4. Find appropriate value for MAX_UNSORTED_ENTRIES


There are also some questions connected to the above list of todos:

1. Just how should softrefs affect reachability? Should softrefs be
   used/followed in _all_ reachability computations? If not, which?

2. How should softrefs propagate. I suggest they are pretty much always
   propagated under clone/fetch/push. (Note that the softrefs merge
   algorithm in softrefs.c removes duplicates and softrefs between
   non-existing objects, so pre-filtering of the softrefs to be
   clones/fetched/pushed may not be necessary)

3. Where can softrefs be used to improve performance by replacing existing
   techniques?

4. How to best pack softrefs? Keeping them in the same pack as the objects
   they refer to seems to be a good idea, but more thought needs to be put
   into this before we can make an implementation

5. How to find _all_ (even unreachable) tag objects in repo for
   'git-softref --rebuild-tags'?

6. Optimization. Pretty much nothing has been done so far. Performance
   seems to be acceptable for now. Probably needs more testing to
   determine bottlenecks


NOTE: After the 7 patches, I will send an _optional_ patch
that changes the softrefs entries from text format (82 bytes per entry)
to binary format (40 bytes per entry). The patch is optional, because
I want the list to decide if we want the (marginal) speedup and
simplified code provided by the patch, or if we want to keep the
read-/maintainability of the text format. Currently I'm in favour of
keeping the text format, but I'm far from sure.


Finally, here's the shortlog: (This patch series of course goes on top of
the previous "Refactor the tag object" patch series, although there isn't
really that many dependencies between them):

Johan Herland (7):
      Softrefs: Add softrefs header file with API documentation
      Softrefs: Add implementation of softrefs API
      Softrefs: Add git-softref, a builtin command for adding, listing and administering softrefs
      Softrefs: Add manual page documenting git-softref and softrefs subsystem in general
      Softrefs: Add testcases for basic softrefs behaviour
      Softrefs: Administrivia associated with softrefs subsystem and git-softref builtin
      Teach git-mktag to register softrefs for all tag objects

 .gitignore                    |    1 +
 Documentation/cmd-list.perl   |    7 +-
 Documentation/git-softref.txt |  119 +++++++
 Makefile                      |    6 +-
 builtin-softref.c             |  167 ++++++++++
 builtin.h                     |    1 +
 git.c                         |    1 +
 mktag.c                       |   11 +-
 softrefs.c                    |  712 +++++++++++++++++++++++++++++++++++++++++
 softrefs.h                    |  188 +++++++++++
 t/t3050-softrefs.sh           |  314 ++++++++++++++++++
 11 files changed, 1521 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/git-softref.txt
 create mode 100644 builtin-softref.c
 create mode 100644 softrefs.c
 create mode 100644 softrefs.h
 create mode 100755 t/t3050-softrefs.sh


Have fun!

...Johan


[1] Informal performance measurements

I prepared a linux kernel repo (holding 57274 commits) with 10 tag objects,
and created softrefs from every commit to every tag object (572740 softrefs
in total). The resulting softrefs db was 46964680 bytes. The experiment was
done on a 32-bit Intel Pentium 4 (3 GHz w/HyperThreading) with 1 GB RAM:


========
Operations on unsorted softrefs:
(572740 (10 per commit) entries in random/unsorted order)
========

Listing all softrefs
(sequential reading of unsorted softrefs file)
--------
$ /usr/bin/time git softref --list > /dev/null
0.44user 0.02system 0:00.47elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps

Listing HEAD's softrefs
(sequential reading of unsorted softrefs file)
--------
$ /usr/bin/time git softref --list HEAD > /dev/null
0.11user 0.01system 0:00.14elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11790minor)pagefaults 0swaps

Sorting softrefs
--------
$ /usr/bin/time git softref --merge-unsorted
2.73user 4.97system 0:07.77elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+15833minor)pagefaults 0swaps

Sorting softrefs into existing sorted file
(throwing away duplicates)
--------
$ /usr/bin/time git softref --merge-unsorted
3.49user 5.12system 0:08.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27300minor)pagefaults 0swaps


========
Operations on sorted softrefs:
(572740 (10 per commit) entries in sorted order)
========

Listing all softrefs
(sequential reading of sorted softrefs file)
--------
$ /usr/bin/time git softref --list > /dev/null
0.43user 0.02system 0:00.48elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps

Listing HEAD's softrefs
(256-fanout followed by binary search in sorted softrefs file)
--------
$/usr/bin/time git softref --list HEAD > /dev/null
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+334minor)pagefaults 0swaps

Sorting softrefs
(no-op)
--------
$ /usr/bin/time git softref --merge-unsorted
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+312minor)pagefaults 0swaps


-- 
Johan Herland, <johan@herland.net>
www.herland.net

  parent reply	other threads:[~2007-06-09 18:19 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-04  0:51 Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Johan Herland
2007-06-04  0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
2007-06-04  0:52   ` [PATCH 1/6] Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header Johan Herland
2007-06-04  6:08     ` Matthias Lederhofer
2007-06-04  7:30       ` Johan Herland
2007-06-04  0:53   ` [PATCH 2/6] git-show: When showing tag objects with no tag name, show tag object's SHA1 instead of an empty string Johan Herland
2007-06-04  0:53   ` [PATCH 3/6] git-fsck: Do thorough verification of tag objects Johan Herland
2007-06-04  5:56     ` Matthias Lederhofer
2007-06-04  7:51       ` Johan Herland
2007-06-06  7:18         ` Junio C Hamano
2007-06-06  8:06           ` Johan Herland
2007-06-06  9:03             ` Junio C Hamano
2007-06-06  9:21               ` Junio C Hamano
2007-06-06 10:26                 ` Johan Herland
2007-06-06 10:35                   ` Junio C Hamano
2007-06-04  0:54   ` [PATCH 4/6] Documentation/git-mktag: Document the changes in tag object structure Johan Herland
2007-06-04  0:54   ` [PATCH 5/6] git-mktag tests: Fix and expand the mktag tests according to the new " Johan Herland
2007-06-04  0:54   ` [PATCH 6/6] Add fsck_verify_ref_to_tag_object() to verify that refname matches name stored in tag object Johan Herland
2007-06-04 20:32   ` [PATCH 0/6] Refactor the " Junio C Hamano
2007-06-07 22:13   ` [PATCH] Fix bug in tag parsing when thorough verification was in effect Johan Herland
2007-06-09 18:19 ` Johan Herland [this message]
2007-06-09 18:21   ` [PATCH 1/7] Softrefs: Add softrefs header file with API documentation Johan Herland
2007-06-10  6:58     ` Johannes Schindelin
2007-06-10  7:43       ` Junio C Hamano
2007-06-10  7:54         ` Johannes Schindelin
2007-06-10 14:00       ` Johan Herland
2007-06-10 14:27     ` Jakub Narebski
2007-06-10 14:45       ` [PATCH] Teach git-gc to merge unsorted softrefs Johan Herland
2007-06-09 18:22   ` [PATCH 2/7] Softrefs: Add implementation of softrefs API Johan Herland
2007-06-09 18:22   ` [PATCH 3/7] Softrefs: Add git-softref, a builtin command for adding, listing and administering softrefs Johan Herland
2007-06-09 18:23   ` [PATCH 4/7] Softrefs: Add manual page documenting git-softref and softrefs subsystem in general Johan Herland
2007-06-09 18:23   ` [PATCH 5/7] Softrefs: Add testcases for basic softrefs behaviour Johan Herland
2007-06-09 18:24   ` [PATCH 6/7] Softrefs: Administrivia associated with softrefs subsystem and git-softref builtin Johan Herland
2007-06-09 18:24   ` [PATCH 7/7] Teach git-mktag to register softrefs for all tag objects Johan Herland
2007-06-09 18:25   ` [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry) Johan Herland
2007-06-10  8:02     ` Johannes Schindelin
2007-06-10  8:30       ` Junio C Hamano
2007-06-10  9:46         ` Johannes Schindelin
2007-06-10 14:03       ` Johan Herland
2007-06-09 23:55   ` Comment on weak refs Junio C Hamano
2007-06-10  1:25     ` Johan Herland
2007-06-10  6:33       ` Johannes Schindelin
2007-06-10 13:41         ` Johan Herland
2007-06-10 14:09           ` Pierre Habouzit
2007-06-10 14:25             ` Pierre Habouzit
2007-06-10  9:03       ` Pierre Habouzit
2007-06-10 15:26     ` Jakub Narebski
2007-06-09 22:57 ` Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Steven Grimm
2007-06-09 23:16   ` Johan Herland
2007-06-10  8:29     ` Pierre Habouzit
2007-06-10 14:31       ` Johan Herland
2007-06-10 19:42         ` Steven Grimm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200706092019.13185.johan@herland.net \
    --to=johan@herland.net \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).