From: Junio C Hamano <gitster@pobox.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] add: add --bulk to index all objects into a pack file
Date: Wed, 02 Oct 2013 23:43:45 -0700
Message-ID: <xmqqsiwin9b2.fsf@gitster.dls.corp.google.com>
In-Reply-To: <1380772811-15415-1-git-send-email-pclouds@gmail.com> ("Nguyễn Thái Ngọc Duy"'s message of "Thu, 3 Oct 2013 11:00:11 +0700")
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> The use case is
>
> tar -xzf bigproject.tar.gz
> cd bigproject
> git init
> git add .
> # git grep or something
Two obvious thoughts, and a half.
(1) This particular invocation of "git add" can easily detect that
it is run in a repository with no $GIT_INDEX_FILE yet, which is
the most typical case for a big initial import. It could even
ask if the current branch is unborn if you wanted to make the
heuristic more specific to this use case. Perhaps it would
make sense to automatically plug the bulk import machinery in
such a case without an option?
(2) Imagine performing a dry-run of update_files_in_cache() using a
different diff-files callback that is similar to the
update_callback() but that uses the lstat(2) data to see how
big an import this really is, instead of calling
add_file_to_index(), before actually registering the data to
the object database. If you benchmark how expensive it
is, you may find that such a scheme is a workable
auto-tuning mechanism to trigger this. Even if it were
moderately expensive, when combined with the heuristics above
for (1), it might be a worthwhile thing to do only when it is
likely to be an initial import.
(3) Is it always a good idea to send everything to a packfile on a
large addition, or are you often better off importing the
initial fileset as loose objects? If the latter, then the
option name "--bulk" may give users the wrong hint "if you are
doing a bulk-import, you are better off using this option".
This is a very logical extension to what was started at 568508e7
(bulk-checkin: replace fast-import based implementation,
2011-10-28), and I like it. I suspect "--bulk=<threshold>" might
be a better alternative than setting the threshold unconditionally
to zero, though.