git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: [PATCH] Don't repack existing objects in fast-import
Date: Fri, 20 Apr 2007 11:29:22 -0400	[thread overview]
Message-ID: <20070420152922.GA17701@spearce.org> (raw)

Some users of fast-import have been trying to use it to rewrite
commits and trees, an activity where the all of the relevant blobs
are already available from the existing packfiles.  In such a case
we don't want to repack a blob, even if the frontend application
has supplied us the raw data rather than a mark or a SHA-1 name.

I'm intentionally only checking the packfiles that existed when
fast-import started and am always ignoring all loose object files.

We ignore loose objects because fast-import tends to operate on a
very large number of objects in a very short timespan, and it is
usually creating new objects, not reusing existing ones.  In such
a situtation the majority of the objects will not be found in the
existing packfiles, nor will they be loose object files.  If the
frontend application really wants us to look at loose object files,
then they can just repack the repository before running fast-import.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---

 This is also now available in the master branch of my fastimport
 tree on repo.or.cz.

 fast-import.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index cdd629d..e3290df 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -904,6 +904,12 @@ static int store_object(
 	if (e->offset) {
 		duplicate_count_by_type[type]++;
 		return 1;
+	} else if (find_sha1_pack(sha1, packed_git)) {
+		e->type = type;
+		e->pack_id = MAX_PACK_ID;
+		e->offset = 1; /* just not zero! */
+		duplicate_count_by_type[type]++;
+		return 1;
 	}
 
 	if (last && last->data && last->depth < max_depth) {
@@ -2021,6 +2027,7 @@ static void import_marks(const char *input_file)
 			e = insert_object(sha1);
 			e->type = type;
 			e->pack_id = MAX_PACK_ID;
+			e->offset = 1; /* just not zero! */
 		}
 		insert_mark(mark, e);
 	}
@@ -2086,6 +2093,7 @@ int main(int argc, const char **argv)
 	if (i != argc)
 		usage(fast_import_usage);
 
+	prepare_packed_git();
 	start_packfile();
 	for (;;) {
 		read_next_command();
-- 
1.5.1.1.135.gf948

                 reply	other threads:[~2007-04-20 15:29 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070420152922.GA17701@spearce.org \
    --to=spearce@spearce.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).