From: Junio C Hamano <junkio@cox.net>
To: Linus Torvalds <torvalds@osdl.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	Git Mailing List <git@vger.kernel.org>,
	Nicolas Pitre <nico@cam.org>, Chris Mason <mason@suse.com>
Subject: Re: kernel.org and GIT tree rebuilding
Date: Sun, 26 Jun 2005 11:39:41 -0700
Message-ID: <7vzmtdq7wy.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <Pine.LNX.4.58.0506260905200.19755@ppc970.osdl.org> (Linus Torvalds's message of "Sun, 26 Jun 2005 09:41:02 -0700 (PDT)")

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> I actually like this approach better than having delta-objects in the
LT> filesystem. Partly because the pack-file is self-contained, partly because
LT> it also solves the fs blocking issue, yet is still efficient to look up
LT> the results without having hardlinks etc to duplicate objects virtually.  
LT> And when you do the packing by hand as an "archival" mechanism, it also
LT> doesn't have any of the downsides that Chris' packing approach had.

After analyzing what is involved in making packed GIT integrated
into read_sha1_file() [*1*], I agree 100% with the above.  I
mean no disrespect to what Nico has done (and I myself have done
some code to work with Nico's deltified objects when I did diffs
and pull fixes), but it would help the code very much if we do
not have to worry about "delta" objects in GIT_OBJECT_DIRECTORY.

My preference is to do things in this order:

 (0) concatenate pack and idx files;

 (1) teach read_sha1_file() to read from packed GIT;

 (2) teach fsck-cache about packed GIT;

 (3) have people with deltified repositories convert them back
     to undeltified (I think git-pack-objects would barf on such
     a repository);

 (4) drop "delta" objects from GIT_OBJECT_DIRECTORY; this means
     that git-deltafy-script and git-mkdelta have to go;

 (5) tell git-*-pull about packed GIT.


[Footnotes]

*1* Here is the analysis I did last night, still assuming that
we would support "delta" objects in GIT_OBJECT_DIRECTORY.  The
"trickier" map_sha1_file() users almost all involve "delta"
objects, and that is why I prefer dropping them.

 - Enhance GIT_ALTERNATE_OBJECT_DIRECTORIES mechanism so that
   each of its components can be either a directory or a packed
   file.

 - sha1_file.c::find_sha1_file() has to be enhanced to express
   not just path (in the current "individual object file"
   case) but a pointer to a structure that describes a packed
   file in the GIT_ALTERNATE_OBJECT_DIRECTORIES list with the
   offset for the entry.

 - The change necessary to sha1_file.c::has_sha1_file() is
   minimal.  find_sha1_file() updated along the above lines
   would say if the thing exists or not anyway, so it can just
   return true/false as it currently does pretty easily.

 - sha1_file.c::read_sha1_file() would be the primary piece to
   unpack from the packed representation.

 - sha1_file.c::map_sha1_file() is trickier.  It has a handful
   of callers outside sha1_file.c for valid reasons, so we will
   need to audit the callers and have them fall back on
   read_sha1_file() as appropriate.  Here is the result of my
   first pass:

   - (easy) sha1_delta_base() is used only to check whether an
     object is deltified and, if so, to get to the base object.
     We can just tell the caller our object is not deltified
     when it resides in a packed file.

   - (easy) sha1_file_size() is used by diffcore to measure the
     expanded blob size.  Although the implementation obviously
     has to be different, it would be trivial to find the size
     if the object resides in a packed file.

   - (easy) pack-objects.c::check_object() uses map_sha1_file()
     so that it can unpack a small part to get the type of the
     object.  We should be able to introduce a new interface
     (say, sha1_file.c::sha1_object_type()) for doing this sort
     of stuff.

   - (harder) mkdelta.c::get_buffer(), object.c::parse_object()
     and delta.c::process_delta() are trickier, because they
     want to treat "delta" as a raw object (otherwise we would
     have just done read_sha1_file() instead of
     map/unpack_sha1_file pair).

   - (harder) ssh-push.c::serve_object() also wants raw
     representation to directly ship to the other end.
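
As an illustration of the find_sha1_file() point above (a lookup
result that is either a path or a pack plus an offset), here is a
minimal sketch in C.  Every name in it (struct pack_entry,
struct packed_git, struct object_location, locate_object(),
has_object()) is hypothetical and chosen for this example only; it
is not taken from the git source, and the in-memory index layout
is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_RAWSZ 20

/* Hypothetical: one index entry mapping a SHA1 to an offset in a pack. */
struct pack_entry {
	unsigned char sha1[SHA1_RAWSZ];
	unsigned long offset;
};

/* Hypothetical: one packed file with its index held in memory. */
struct packed_git {
	const char *path;
	const struct pack_entry *entries;	/* assumed sorted by sha1 */
	size_t nr;
};

enum location_kind { LOC_MISSING, LOC_LOOSE, LOC_PACKED };

/* The result expresses "a path" or "a pack and an offset". */
struct object_location {
	enum location_kind kind;
	const char *loose_path;		/* set when kind == LOC_LOOSE */
	const struct packed_git *pack;	/* set when kind == LOC_PACKED */
	unsigned long offset;		/* set when kind == LOC_PACKED */
};

/* Binary search over one pack's sorted index. */
static const struct pack_entry *find_in_pack(const struct packed_git *p,
					     const unsigned char *sha1)
{
	size_t lo = 0, hi = p->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(p->entries[mid].sha1, sha1, SHA1_RAWSZ);

		if (!cmp)
			return &p->entries[mid];
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * loose_path stands in for the existing individual-file lookup:
 * the caller passes the path if a loose object file was found,
 * or NULL otherwise.  Packs are consulted only in the latter case.
 */
struct object_location locate_object(const char *loose_path,
				     const struct packed_git *packs,
				     size_t nr_packs,
				     const unsigned char *sha1)
{
	struct object_location loc = { LOC_MISSING, NULL, NULL, 0 };
	size_t i;

	assert(sha1 != NULL);
	if (loose_path) {
		loc.kind = LOC_LOOSE;
		loc.loose_path = loose_path;
		return loc;
	}
	for (i = 0; i < nr_packs; i++) {
		const struct pack_entry *e = find_in_pack(&packs[i], sha1);

		if (e) {
			loc.kind = LOC_PACKED;
			loc.pack = &packs[i];
			loc.offset = e->offset;
			return loc;
		}
	}
	return loc;
}

/* An existence check falls out of the lookup as a true/false answer. */
int has_object(const char *loose_path, const struct packed_git *packs,
	       size_t nr_packs, const unsigned char *sha1)
{
	return locate_object(loose_path, packs, nr_packs, sha1).kind
		!= LOC_MISSING;
}
```

With a result shaped like this, a reader in the style of
read_sha1_file() could branch on the kind field, while an
existence check only needs to know that the kind is not
LOC_MISSING.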
