From: Johan Herland <johan@herland.net>
To: git@vger.kernel.org
Cc: Junio C Hamano <junkio@cox.net>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 0/7] Introduce soft references (softrefs)
Date: Sat, 09 Jun 2007 20:19:13 +0200 [thread overview]
Message-ID: <200706092019.13185.johan@herland.net> (raw)
In-Reply-To: <200706040251.05286.johan@herland.net>
This patch series introduces soft references (softrefs); a mechanism for
declaring reachability between arbitrary (but existing) git objects.
Softrefs are meant to provide the mechanism for "reverse mapping" that
we determined was needed for tag objects (especially 'notes'). The patch
series also teaches git-mktag to create softrefs for all tag objects.
See the Discussion section in the git-softref manual page (patch #4/7) or
the comments in the header file (patch #1/7) for more details on the
design of softrefs.
I've added some informal performance data at the bottom of this mail [1].
Note that this patch series is incomplete in that the following things
have yet to be implemented:
1. Clone/fetch/push of softrefs
2. Packing of softrefs
3. General integration of softrefs into parts of git where they might be
useful
4. Find appropriate value for MAX_UNSORTED_ENTRIES
There are also some questions connected to the above list of todos:
1. Just how should softrefs affect reachability? Should softrefs be
used/followed in _all_ reachability computations? If not, which?
2. How should softrefs propagate. I suggest they are pretty much always
propagated under clone/fetch/push. (Note that the softrefs merge
algorithm in softrefs.c removes duplicates and softrefs between
non-existing objects, so pre-filtering of the softrefs to be
clones/fetched/pushed may not be necessary)
3. Where can softrefs be used to improve performance by replacing existing
techniques?
4. How to best pack softrefs? Keeping them in the same pack as the objects
they refer to seems to be a good idea, but more thought needs to be put
into this before we can make an implementation
5. How to find _all_ (even unreachable) tag objects in repo for
'git-softref --rebuild-tags'?
6. Optimization. Pretty much nothing has been done so far. Performance
seems to be acceptable for now. Probably needs more testing to
determine bottlenecks
NOTE: After the 7 patches, I will send an _optional_ patch
that changes the softrefs entries from text format (82 bytes per entry)
to binary format (40 bytes per entry). The patch is optional, because
I want the list to decide if we want the (marginal) speedup and
simplified code provided by the patch, or if we want to keep the
read-/maintainability of the text format. Currently I'm in favour of
keeping the text format, but I'm far from sure.
Finally, here's the shortlog: (This patch series of course goes on top of
the previous "Refactor the tag object" patch series, although there isn't
really that many dependencies between them):
Johan Herland (7):
Softrefs: Add softrefs header file with API documentation
Softrefs: Add implementation of softrefs API
Softrefs: Add git-softref, a builtin command for adding, listing and administering softrefs
Softrefs: Add manual page documenting git-softref and softrefs subsystem in general
Softrefs: Add testcases for basic softrefs behaviour
Softrefs: Administrivia associated with softrefs subsystem and git-softref builtin
Teach git-mktag to register softrefs for all tag objects
.gitignore | 1 +
Documentation/cmd-list.perl | 7 +-
Documentation/git-softref.txt | 119 +++++++
Makefile | 6 +-
builtin-softref.c | 167 ++++++++++
builtin.h | 1 +
git.c | 1 +
mktag.c | 11 +-
softrefs.c | 712 +++++++++++++++++++++++++++++++++++++++++
softrefs.h | 188 +++++++++++
t/t3050-softrefs.sh | 314 ++++++++++++++++++
11 files changed, 1521 insertions(+), 6 deletions(-)
create mode 100644 Documentation/git-softref.txt
create mode 100644 builtin-softref.c
create mode 100644 softrefs.c
create mode 100644 softrefs.h
create mode 100755 t/t3050-softrefs.sh
Have fun!
...Johan
[1] Informal performance measurements
I prepared a linux kernel repo (holding 57274 commits) with 10 tag objects,
and created softrefs from every commit to every tag object (572740 softrefs
in total). The resulting softrefs db was 46964680 bytes. The experiment was
done on a 32-bit Intel Pentium 4 (3 GHz w/HyperThreading) with 1 GB RAM:
========
Operations on unsorted softrefs:
(572740 (10 per commit) entries in random/unsorted order)
========
Listing all softrefs
(sequential reading of unsorted softrefs file)
--------
$ /usr/bin/time git softref --list > /dev/null
0.44user 0.02system 0:00.47elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps
Listing HEAD's softrefs
(sequential reading of unsorted softrefs file)
--------
$ /usr/bin/time git softref --list HEAD > /dev/null
0.11user 0.01system 0:00.14elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11790minor)pagefaults 0swaps
Sorting softrefs
--------
$ /usr/bin/time git softref --merge-unsorted
2.73user 4.97system 0:07.77elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+15833minor)pagefaults 0swaps
Sorting softrefs into existing sorted file
(throwing away duplicates)
--------
$ /usr/bin/time git softref --merge-unsorted
3.49user 5.12system 0:08.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27300minor)pagefaults 0swaps
========
Operations on sorted softrefs:
(572740 (10 per commit) entries in sorted order)
========
Listing all softrefs
(sequential reading of sorted softrefs file)
--------
$ /usr/bin/time git softref --list > /dev/null
0.43user 0.02system 0:00.48elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps
Listing HEAD's softrefs
(256-fanout followed by binary search in sorted softrefs file)
--------
$/usr/bin/time git softref --list HEAD > /dev/null
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+334minor)pagefaults 0swaps
Sorting softrefs
(no-op)
--------
$ /usr/bin/time git softref --merge-unsorted
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+312minor)pagefaults 0swaps
--
Johan Herland, <johan@herland.net>
www.herland.net
next prev parent reply other threads:[~2007-06-09 18:19 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-04 0:51 Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Johan Herland
2007-06-04 0:51 ` [PATCH 0/6] Refactor the tag object Johan Herland
2007-06-04 0:52 ` [PATCH 1/6] Refactor git tag objects; make "tag" header optional; introduce new optional "keywords" header Johan Herland
2007-06-04 6:08 ` Matthias Lederhofer
2007-06-04 7:30 ` Johan Herland
2007-06-04 0:53 ` [PATCH 2/6] git-show: When showing tag objects with no tag name, show tag object's SHA1 instead of an empty string Johan Herland
2007-06-04 0:53 ` [PATCH 3/6] git-fsck: Do thorough verification of tag objects Johan Herland
2007-06-04 5:56 ` Matthias Lederhofer
2007-06-04 7:51 ` Johan Herland
2007-06-06 7:18 ` Junio C Hamano
2007-06-06 8:06 ` Johan Herland
2007-06-06 9:03 ` Junio C Hamano
2007-06-06 9:21 ` Junio C Hamano
2007-06-06 10:26 ` Johan Herland
2007-06-06 10:35 ` Junio C Hamano
2007-06-04 0:54 ` [PATCH 4/6] Documentation/git-mktag: Document the changes in tag object structure Johan Herland
2007-06-04 0:54 ` [PATCH 5/6] git-mktag tests: Fix and expand the mktag tests according to the new " Johan Herland
2007-06-04 0:54 ` [PATCH 6/6] Add fsck_verify_ref_to_tag_object() to verify that refname matches name stored in tag object Johan Herland
2007-06-04 20:32 ` [PATCH 0/6] Refactor the " Junio C Hamano
2007-06-07 22:13 ` [PATCH] Fix bug in tag parsing when thorough verification was in effect Johan Herland
2007-06-09 18:19 ` Johan Herland [this message]
2007-06-09 18:21 ` [PATCH 1/7] Softrefs: Add softrefs header file with API documentation Johan Herland
2007-06-10 6:58 ` Johannes Schindelin
2007-06-10 7:43 ` Junio C Hamano
2007-06-10 7:54 ` Johannes Schindelin
2007-06-10 14:00 ` Johan Herland
2007-06-10 14:27 ` Jakub Narebski
2007-06-10 14:45 ` [PATCH] Teach git-gc to merge unsorted softrefs Johan Herland
2007-06-09 18:22 ` [PATCH 2/7] Softrefs: Add implementation of softrefs API Johan Herland
2007-06-09 18:22 ` [PATCH 3/7] Softrefs: Add git-softref, a builtin command for adding, listing and administering softrefs Johan Herland
2007-06-09 18:23 ` [PATCH 4/7] Softrefs: Add manual page documenting git-softref and softrefs subsystem in general Johan Herland
2007-06-09 18:23 ` [PATCH 5/7] Softrefs: Add testcases for basic softrefs behaviour Johan Herland
2007-06-09 18:24 ` [PATCH 6/7] Softrefs: Administrivia associated with softrefs subsystem and git-softref builtin Johan Herland
2007-06-09 18:24 ` [PATCH 7/7] Teach git-mktag to register softrefs for all tag objects Johan Herland
2007-06-09 18:25 ` [PATCH] Change softrefs file format from text (82 bytes per entry) to binary (40 bytes per entry) Johan Herland
2007-06-10 8:02 ` Johannes Schindelin
2007-06-10 8:30 ` Junio C Hamano
2007-06-10 9:46 ` Johannes Schindelin
2007-06-10 14:03 ` Johan Herland
2007-06-09 23:55 ` Comment on weak refs Junio C Hamano
2007-06-10 1:25 ` Johan Herland
2007-06-10 6:33 ` Johannes Schindelin
2007-06-10 13:41 ` Johan Herland
2007-06-10 14:09 ` Pierre Habouzit
2007-06-10 14:25 ` Pierre Habouzit
2007-06-10 9:03 ` Pierre Habouzit
2007-06-10 15:26 ` Jakub Narebski
2007-06-09 22:57 ` Refactoring the tag object; Introducing soft references (softrefs); Git 'notes' (take 2) Steven Grimm
2007-06-09 23:16 ` Johan Herland
2007-06-10 8:29 ` Pierre Habouzit
2007-06-10 14:31 ` Johan Herland
2007-06-10 19:42 ` Steven Grimm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200706092019.13185.johan@herland.net \
--to=johan@herland.net \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).