git.vger.kernel.org archive mirror
From: <rsbecker@nexbridge.com>
To: "'Zitzmann, Christian'" <Christian.Zitzmann@vitesco.com>,
	<git@vger.kernel.org>
Subject: RE: Parallelism for submodule update
Date: Mon, 2 Jan 2023 11:54:24 -0500
Message-ID: <009801d91eca$e1646360$a42d2a20$@nexbridge.com>
In-Reply-To: <DB5PR02MB100691E6422F5E94228F0E0EC8AF79@DB5PR02MB10069.eurprd02.prod.outlook.com>



On January 2, 2023 11:45 AM Christian Zitzmann wrote:
>We have been using git for many years and rely heavily on submodules.
>
>When updating the submodules, only the fetching part is done in parallel (with
>config submodule.fetchJobs or --jobs), but the checkout is done sequentially.
>
>What I’ve noticed when cloning with
>- scalar clone --full-clone --recurse-submodules <URL> or
>- git clone --filter=blob:none --also-filter-submodules --recurse-submodules
><URL>
>
>We lose performance, as the blobs are fetched during the sequential
>checkout part instead of during the parallel part.
>
>Furthermore, without partial clone, network and hard disk are not always
>utilized well, as first the network is busy (fetch) and only then the hard disk
>(checkout).
>
>As the checkout part is local to the submodule (no shared resources to block), it
>would be great if we could move the checkout into the parallelized part.
>For example, by doing fetch and checkout (with blob fetching) in one step with
>run_processes_parallel_tr2.
>
>I expect that this would significantly improve performance, especially when using
>partial clones.
>
>Do you think this is possible? Am I missing anything?

Since this is a platform-specific request, if it happens, it should be a configuration switch that defaults to off. On my platform, the file system itself is fairly fast, but name service traversal and resolution (the work done inside the name service) is a performance problem. Doing the checkout/switch in parallel would actually be counterproductive in my case. So I would keep it off, but I get that other platforms could benefit.
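
To make the default-off idea concrete, here is a rough Python sketch of the two schedules being discussed. It is only an illustration, not how git implements submodule update (that lives in C around run_processes_parallel). The function name update_submodules, the parallel_checkout flag, and the (path, commit) input are all hypothetical, and the sketch assumes each submodule is already cloned with a remote named origin.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple

# A runner takes a submodule path and git arguments; injectable for dry runs.
Runner = Callable[[str, Sequence[str]], None]


def run_git(path: str, args: Sequence[str]) -> None:
    """Run `git -C <path> <args...>` and raise if the command fails."""
    subprocess.run(["git", "-C", path, *args], check=True)


def update_submodules(
    submodules: Sequence[Tuple[str, str]],  # (path, commit) pairs, hypothetical input
    jobs: int = 4,
    parallel_checkout: bool = False,  # hypothetical switch, off by default
    run: Runner = run_git,
) -> None:
    def fetch(sub: Tuple[str, str]) -> None:
        run(sub[0], ["fetch", "origin"])

    def checkout(sub: Tuple[str, str]) -> None:
        run(sub[0], ["checkout", "--detach", sub[1]])

    def fetch_then_checkout(sub: Tuple[str, str]) -> None:
        # Proposed behaviour: each worker finishes one submodule end to end.
        fetch(sub)
        checkout(sub)

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        if parallel_checkout:
            list(pool.map(fetch_then_checkout, submodules))
        else:
            # Behaviour described in the thread: only the fetch is parallel.
            list(pool.map(fetch, submodules))

    if not parallel_checkout:
        # Checkouts run one at a time, after every fetch has finished.
        for sub in submodules:
            checkout(sub)
```

With the flag left at its default, a platform like mine keeps the sequential checkout; a platform that benefits would opt in with parallel_checkout=True. Passing a different run callable allows a dry run without touching any repository.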

Regards,
Randall

--
Brief whoami: NonStop&UNIX developer since approximately
UNIX(421664400)
NonStop(211288444200000000)
-- In real life, I talk too much.




Thread overview: 4+ messages
2023-01-02 16:44 Parallelism for submodule update Zitzmann, Christian
2023-01-02 16:54 ` rsbecker [this message]
2023-01-13 10:49   ` Zitzmann, Christian
2023-01-19 21:39 ` Calvin Wan
