From: Junio C Hamano <junkio@cox.net>
To: Linus Torvalds <torvalds@osdl.org>
Cc: "David S. Miller" <davem@davemloft.net>,
Git Mailing List <git@vger.kernel.org>,
Nicolas Pitre <nico@cam.org>, Chris Mason <mason@suse.com>
Subject: Re: kernel.org and GIT tree rebuilding
Date: Sun, 26 Jun 2005 11:39:41 -0700
Message-ID: <7vzmtdq7wy.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <Pine.LNX.4.58.0506260905200.19755@ppc970.osdl.org> (Linus Torvalds's message of "Sun, 26 Jun 2005 09:41:02 -0700 (PDT)")
>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
LT> I actually like this approach better than having delta-objects in the
LT> filesystem. Partly because the pack-file is self-contained, partly because
LT> it also solves the fs blocking issue, yet is still efficient to look up
LT> the results without having hardlinks etc to duplicate objects virtually.
LT> And when you do the packing by hand as an "archival" mechanism, it also
LT> doesn't have any of the downsides that Chris' packing approach had.
After analyzing what is involved in integrating packed GIT
into read_sha1_file() [*1*], I agree 100% with the above. I
mean no disrespect to what Nico has done (and I myself have
written some code to work with Nico's deltified objects while
doing the diff and pull fixes), but the code would be much
simpler if we did not have to worry about "delta" objects in
GIT_OBJECT_DIRECTORY.
My preference is to do things in this order:
(0) concatenate pack and idx files;
(1) teach read_sha1_file() to read from packed GIT;
(2) teach fsck-cache about packed GIT;
(3) have people with deltified repositories convert them back
    to undeltified form (I think git-pack-objects would barf on
    such a repository);
(4) drop "delta" objects from GIT_OBJECT_DIRECTORY; this means
    that git-deltafy-script and git-mkdelta have to go;
(5) tell git-*-pull about packed GIT.
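For step (1), the heart of reading from a pack is finding an
object's entry by its SHA-1. Here is a minimal sketch, assuming
the index has already been loaded into memory as an array of
(SHA-1, pack offset) pairs sorted by object name. The on-disk
idx layout is not reproduced, and struct idx_entry and
find_pack_offset() are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory index entry; the array is sorted by sha1. */
struct idx_entry {
	unsigned char sha1[20];
	unsigned long offset;	/* where the object starts in the pack */
};

/*
 * Binary search for sha1 among nr sorted entries.  On a hit, store
 * the pack offset in *offset and return 1; otherwise return 0.
 */
int find_pack_offset(const struct idx_entry *entries, size_t nr,
		     const unsigned char *sha1, unsigned long *offset)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mi = lo + (hi - lo) / 2;
		int cmp = memcmp(sha1, entries[mi].sha1, 20);

		if (!cmp) {
			*offset = entries[mi].offset;
			return 1;
		}
		if (cmp < 0)
			hi = mi;	/* wanted name sorts before entry mi */
		else
			lo = mi + 1;	/* wanted name sorts after entry mi */
	}
	return 0;
}
```

Because the entries are sorted, a lookup costs O(log n)
comparisons, and a miss could simply tell the caller to try the
next pack or the object directory.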
[Footnotes]
*1* Here is the analysis I did last night, still assuming that
we would support "delta" objects in GIT_OBJECT_DIRECTORY. The
"trickier" map_sha1_file() users almost all involve "delta"
objects, and that is why I prefer dropping them.
- Enhance the GIT_ALTERNATE_OBJECT_DIRECTORIES mechanism so that
  each of its components can be either a directory or a packed
  file.
- sha1_file.c::find_sha1_file() has to be enhanced to return not
  just a path (in the current "individual object file" case)
  but, for a packed object, a pointer to a structure that
  describes the packed file in the GIT_ALTERNATE_OBJECT_DIRECTORIES
  list, together with the offset of the entry in it.
- The change necessary to sha1_file.c::has_sha1_file() is
  minimal. An updated find_sha1_file() would say whether the
  object exists anyway, so has_sha1_file() can just return
  true/false as it does now.
- sha1_file.c::read_sha1_file() would be the primary piece that
  unpacks objects from the packed representation.
- sha1_file.c::map_sha1_file() is trickier. It has a handful of
  callers outside sha1_file.c for valid reasons, so we will need
  to audit those callers and have them fall back on
  read_sha1_file() as appropriate. Here is the result of my
  first pass:
  - (easy) sha1_delta_base() is used only to check whether an
    object is deltified and, if so, to get to its base object.
    We can just tell the caller that our object is not
    deltified when it resides in a packed file.
  - (easy) sha1_file_size() is used by diffcore to measure the
    expanded blob size. Although the implementation obviously
    has to be different, it would be trivial to find the size
    if the object resides in a packed file.
  - (easy) pack-objects.c::check_object() uses map_sha1_file() so
    that it can unpack just the beginning of the object to learn
    its type. We should be able to introduce a new interface
    (say, sha1_file.c::sha1_object_type()) for this sort of
    thing.
  - (harder) mkdelta.c::get_buffer(), object.c::parse_object()
    and delta.c::process_delta() are trickier, because they
    want to treat "delta" as a raw object (otherwise we would
    have just called read_sha1_file() instead of the
    map_sha1_file()/unpack_sha1_file() pair).
  - (harder) ssh-push.c::serve_object() also wants the raw
    representation to ship directly to the other end.
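The find_sha1_file()/has_sha1_file() change described above
might look roughly like the following. It is only a sketch:
struct pack_desc, struct object_location, find_object_location()
and has_object_anywhere() are hypothetical names, the loose-object
check is abstracted into a caller-supplied callback, and loose
objects are assumed to take precedence over packed ones:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one pack file in the alternates list. */
struct pack_desc {
	const unsigned char (*sha1)[20];	/* object names, sorted */
	const unsigned long *offset;		/* offset[i] locates sha1[i] */
	size_t nr;
};

enum object_where { OBJECT_MISSING, OBJECT_LOOSE, OBJECT_PACKED };

/* Either a loose-file path or a (pack, offset) pair. */
struct object_location {
	enum object_where where;
	const char *path;		/* set when OBJECT_LOOSE */
	const struct pack_desc *pack;	/* set when OBJECT_PACKED */
	unsigned long offset;		/* set when OBJECT_PACKED */
};

/* Hypothetical callback: path of the loose object file, or NULL. */
typedef const char *(*loose_path_fn)(const unsigned char *sha1);

struct object_location find_object_location(const unsigned char *sha1,
					    loose_path_fn loose_path,
					    const struct pack_desc *packs,
					    size_t nr_packs)
{
	struct object_location loc = { OBJECT_MISSING, NULL, NULL, 0 };
	size_t p, i;

	/* Assumed policy: an individual object file wins over a packed copy. */
	if (loose_path) {
		const char *path = loose_path(sha1);
		if (path) {
			loc.where = OBJECT_LOOSE;
			loc.path = path;
			return loc;
		}
	}
	for (p = 0; p < nr_packs; p++) {
		/* Linear scan keeps the sketch short; the names are
		 * sorted, so a binary search would work as well. */
		for (i = 0; i < packs[p].nr; i++) {
			if (!memcmp(packs[p].sha1[i], sha1, 20)) {
				loc.where = OBJECT_PACKED;
				loc.pack = &packs[p];
				loc.offset = packs[p].offset[i];
				return loc;
			}
		}
	}
	return loc;
}

/* has_sha1_file() analogue: only the yes/no answer matters. */
int has_object_anywhere(const unsigned char *sha1, loose_path_fn loose_path,
			const struct pack_desc *packs, size_t nr_packs)
{
	return find_object_location(sha1, loose_path, packs, nr_packs).where
		!= OBJECT_MISSING;
}
```

Returning one small struct keeps has_sha1_file() trivial, while
read_sha1_file() can switch on "where" to decide between opening
the individual object file and unpacking at the pack offset.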