From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: Common thread pool API in Git?
Date: Fri, 27 Sep 2019 18:14:21 -0400 [thread overview]
Message-ID: <20190927221420.GA31237@sigill.intra.peff.net> (raw)
In-Reply-To: <20190926003300.195781-1-jonathantanmy@google.com>
On Wed, Sep 25, 2019 at 05:33:00PM -0700, Jonathan Tan wrote:
> Does anyone have ideas or plans for this? I know that (at least) "grep"
> and "index-pack" have their own implementations, and it would be great
> to just have one that all code can use.
> [..]
> Searching reveals [2], but I don't think it ever made it into Git.
I think that's correct. I had a recollection of this being discussed in
the past, but I'm pretty sure I was thinking of the patches you already
linked to. We didn't take it because the proposed callers actually were
starting sub-processes anyway, so we ended up instead with
run_processes_parallel():
https://public-inbox.org/git/1442012994-20374-1-git-send-email-sbeller@google.com/
In case it's useful, here's the latest version of the threaded version
that was posted to the list (I think):
https://public-inbox.org/git/1440724495-708-5-git-send-email-sbeller@google.com/
> For those who want to know, this question was motivated by a big delta
> tree occurring in [1]. index-pack does parallelize delta resolution, but
> it cannot split up trees into threads: each delta base root can go into
> its own thread, but when a delta base root is processed, all deltas on
> that root (direct or indirect) is processed in the same thread. I took a
> look to see if this could be parallelized, but thought that it might be
> better to standardize on a thread pool implementation first.
This seems like a reasonable thing to want to do. It does mean that
threads have more potential to interact with each other. E.g., two might
use the same base now, so we'd have to make sure we don't throw away the
bytes of that base until both are done. As you pointed out to me
off-list, the bases are currently placed in a cache, but I think that
cache is currently per-thread. I guess we could "split" it and copy the
new base into a thread's cache when we give it work to do.
While thinking about this, an unrelated optimization occurred to me
(that I don't see us doing). Right now we put every base we resolve into
the cache. But if it only has one child delta, after we resolve that
delta, we can definitely remove the base (even if we're still resolving
deltas against that child). Or more generally, once we've resolved the
_final_ child that uses that base, we no longer need it. Here's an
illustration (where "<-" means "has delta child"):
A <- B <- C
<- D <- E <- F ...
<- G <- H ...
We'd visit each column in a depth-first way, looking for children of
each base we find. Once we've resolved C and D, we don't need to care
about the cache of B anymore. So freeing it puts more room in the cache
while processing E, F, etc.
And likewise, we can throw away D after finishing G. The strategy might
help more if we did a breadth-first traversal, but then you have the
opposite problem (instead of a deep tree busting your cache, a wide one
would).
Anyway, that's all pretty far off your original point. But it might be
worth thinking about it you have to touch this part of the code as part
of your threading work.
-Peff
prev parent reply other threads:[~2019-09-27 22:14 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-26 0:33 Common thread pool API in Git? Jonathan Tan
2019-09-26 0:35 ` Jonathan Tan
2019-09-27 22:14 ` Jeff King [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190927221420.GA31237@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).