From: <rsbecker@nexbridge.com>
To: "'Zitzmann, Christian'" <Christian.Zitzmann@vitesco.com>,
	<git@vger.kernel.org>
Subject: RE: Parallelism for submodule update
Date: Mon, 2 Jan 2023 11:54:24 -0500
Message-ID: <009801d91eca$e1646360$a42d2a20$@nexbridge.com>
In-Reply-To: <DB5PR02MB100691E6422F5E94228F0E0EC8AF79@DB5PR02MB10069.eurprd02.prod.outlook.com>



>-----Original Message-----
>From: <Christian.Zitzmann@vitesco.com>
On January 2, 2023 11:45 AM Christian Zitzmann wrote:
>We have been using Git for many years and rely heavily on submodules.
>
>When updating the submodules, only the fetching part is done in parallel (with
>config submodule.fetchJobs or --jobs), but the checkout is done sequentially.
>
>What I've noticed when cloning with
>- scalar clone --full-clone --recurse-submodules <URL> or
>- git clone --filter=blob:none --also-filter-submodules --recurse-submodules
><URL>
>
>is that we lose performance, as the fetch of the blobs is done in the sequential
>checkout part instead of in the parallel part.
>
>Furthermore, even without partial clone, the network and the hard disk are not
>always well utilized, as first the network is used (fetch) and then the hard
>disk (checkout).
>
>As the checkout part is local to the submodule (no shared resources to block), it
>would be great if we could move the checkout into the parallelized part,
>for example by doing fetch and checkout (with blob fetching) in one step with
>run_processes_parallel_tr2.
>
>I expect that this would significantly improve performance, especially when
>using partial clones.
>
>Do you think this is possible? Am I missing anything?

Since this is a platform-specific request, if it is implemented, it should be a configuration switch that defaults to off. On my platform, the file system itself is fairly fast, but name service traversal and resolution are a performance problem. Doing the checkout/switch in parallel would actually be counterproductive in my case. So I would keep it off, but I understand that other platforms could benefit.
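
To make the idea of an opt-in switch concrete, here is a minimal sketch in Python rather than Git's C code. Everything in it is hypothetical: `update_submodules`, the `parallel_checkout` flag, and the `fetch`/`checkout` callables are illustrative names, not Git APIs or configuration keys. With the flag off (the default), only the fetch stage runs in parallel and the checkouts stay sequential; with it on, each worker fetches and then checks out one submodule.

```python
from concurrent.futures import ThreadPoolExecutor


def update_submodules(paths, fetch, checkout, jobs=4, parallel_checkout=False):
    """Update submodules; `fetch` and `checkout` are caller-supplied callables.

    Hypothetical sketch: `parallel_checkout` stands in for a configuration
    switch that defaults to off, as suggested in the reply above.
    """
    paths = list(paths)

    def fetch_then_checkout(path):
        # Both stages for one submodule run back to back in the same worker.
        fetch(path)
        checkout(path)

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        if parallel_checkout:
            # Opt-in behaviour: fetch and checkout are both parallelized.
            list(pool.map(fetch_then_checkout, paths))
        else:
            # Default behaviour: only the fetch stage is parallelized.
            list(pool.map(fetch, paths))

    if not parallel_checkout:
        # Checkouts run one at a time, in the order the paths were given.
        for path in paths:
            checkout(path)
```

In a real setup the two callables might wrap `git -C <path> fetch` and `git -C <path> checkout` via `subprocess`, but that wiring is left out here so the scheduling difference stays visible.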

Regards,
Randall

--
Brief whoami: NonStop&UNIX developer since approximately
UNIX(421664400)
NonStop(211288444200000000)
-- In real life, I talk too much.