git.vger.kernel.org archive mirror
From: Kousik Sanagavarapu <five231003@gmail.com>
To: git@vger.kernel.org
Cc: five231003@gmail.com
Subject: [GSoC][Project Idea] Refactor lazy-fetching in a partial clone
Date: Sun, 26 Mar 2023 22:38:01 +0530
Message-ID: <20230326170801.7955-1-five231003@gmail.com>

The term "object" below always means a blob or a tree since git doesn't
yet allow filters on commits or tags.

Whenever we check for or read an object that is missing in a partial
clone, we trigger a lazy-fetch if the global fetch_if_missing is set
to 1 and skip it if the global is 0. Currently, this global defaults
to 1, and commands which do not want to lazy-fetch (for example
index-pack, fetch-pack, rev-list) set it to 0 internally.

The goal of this project is to look into all the commands that run
with fetch_if_missing set to 1. In these commands, every missing
object triggers its own round trip: we connect, fetch that one object
and disconnect. Fetching each object individually like this leads to
a huge performance loss. The project would change these commands so
that objects are fetched in a batch. Once all of the commands know how
to fetch objects efficiently, we can change fetch_if_missing to
default to 0.
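
To make the cost difference concrete, here is a toy model in C of the
two strategies. It is only a sketch and does not use git's real code:
toy_remote, toy_store, toy_fetch(), read_all_lazily() and
read_all_batched() are hypothetical names made up for illustration,
and objects are identified by small array indices rather than object
IDs. The per-object reader opens one "connection" per missing object,
while the batched reader first collects everything that is missing and
then fetches it in a single request.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a promisor remote; it only counts work. */
struct toy_remote {
	int connections;  /* connect/fetch/disconnect cycles so far */
	int objects_sent; /* objects transferred so far */
};

/* Hypothetical local object store: present[i] is true if object i is local. */
struct toy_store {
	bool *present;
	size_t nr;
};

/* Fetch n objects over ONE connection; an empty request opens none. */
void toy_fetch(struct toy_remote *r, struct toy_store *s,
	       const size_t *ids, size_t n)
{
	size_t i;

	if (!n)
		return;
	r->connections++;
	for (i = 0; i < n; i++) {
		if (ids[i] < s->nr && !s->present[ids[i]]) {
			s->present[ids[i]] = true;
			r->objects_sent++;
		}
	}
}

/* Models the per-object behaviour: each missing object costs a connection. */
void read_all_lazily(struct toy_remote *r, struct toy_store *s,
		     const size_t *wanted, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!s->present[wanted[i]])
			toy_fetch(r, s, &wanted[i], 1);
}

/*
 * Models the batched behaviour: collect the missing objects first,
 * then fetch them together. "scratch" must have room for n entries.
 */
void read_all_batched(struct toy_remote *r, struct toy_store *s,
		      const size_t *wanted, size_t n, size_t *scratch)
{
	size_t i, nr_missing = 0;

	for (i = 0; i < n; i++)
		if (!s->present[wanted[i]])
			scratch[nr_missing++] = wanted[i];
	toy_fetch(r, s, scratch, nr_missing);
}
```

In this model, reading k missing objects costs k connections with
read_all_lazily() and at most one with read_all_batched(), which is
the saving the project is after.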

I think this can be implemented by looking for all the places in the
code where has_object_file*(), read_object_file() or any other function
which uses oid_object_info_extended() is called while fetch_if_missing
is set to 1, and changing each of them to either fetch efficiently or
not fetch at all. Each change should also come with the tests
appropriate to the family [1] it belongs to.

The above idea is the result of the discussion on the patch

  https://lore.kernel.org/git/20230225052439.27096-1-five231003@gmail.com/

Please let me know if it is doable as a project.

Thanks

[1] https://lore.kernel.org/git/20230311025906.4170554-1-jonathantanmy@google.com/
