From: Thomas Rast <trast@inf.ethz.ch>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: "Thomas Rast" <trast@student.ethz.ch>,
git@vger.kernel.org, "René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
"Junio C Hamano" <gitster@pobox.com>,
"Eric Herman" <eric@freesa.org>
Subject: Re: [POC PATCH 5/5] sha1_file: make the pack machinery thread-safe
Date: Tue, 10 Apr 2012 14:29:20 +0200
Message-ID: <8762d7zs67.fsf@thomas.inf.ethz.ch>
In-Reply-To: <CACsJy8AyphUD-vFwwgaW0eWG3ekgHA+tcAwV2zk5YGorkW0TzQ@mail.gmail.com> (Nguyen Thai Ngoc Duy's message of "Mon, 9 Apr 2012 21:43:56 +0700")

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On Fri, Dec 9, 2011 at 3:39 PM, Thomas Rast <trast@student.ethz.ch> wrote:
>> More precisely speaking, this pushes the locking down from
>> read_object() into bits of the pack machinery that cannot (yet) run in
>> parallel.
>>
>> There are several hacks here:
>>
>> a) prepare_packed_git() must be called before any parallel accesses
>> happen. It now unconditionally opens and maps all index files.
>>
>> b) similarly, prepare_replace_object() must be called before any
>> parallel read_sha1_file() happens
>>
>> This simplification lets us avoid locking outright to guard the index
>> accesses; locking is then mainly required for open_packed_git(),
>> [un]use_pack(), and such.
>>
>> The ultimate goal would of course be to let at least _some_ pack
>> accesses happen without any locking whatsoever. But grep already
>> benefits from it with a nice speed boost on non-worktree greps.
>
> (I'm running into multithread pack access problem in rev-list..)
>
> Why not put the global pointer "struct packed_git *packed_git" to
> "struct pack_context" and avoid locking entirely? Resource usage is
> like we run <n> different processes, I think, which is not too bad. We
> may want to share a few static pack_* variables such as
> pack_open_fds.. to avoid hitting system limits too fast.

I was hesitating to do that because I think it's not the best solution
yet.  At least for 64-bit systems, I thought of doing some or all of:

* opening/mapping the pack indexes immediately to avoid locking there
  (perhaps the POC already does this, I haven't looked again).  If you
  have many packs this isn't cheap because the index must be verified.

* mapping small packs immediately

* mapping "the" big pack immediately (many repos will have a huge pack
  from the initial clone)
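
The first bullet (map every index before any thread starts, so that
lookups need no lock) could look roughly like the sketch below.  It is
an illustration under assumptions, not git's code: `struct mapped_idx`,
`map_idx()`, `sum_bytes()` and `read_in_parallel()` are hypothetical
names, the file is treated as opaque bytes, and the index verification
mentioned above is left out entirely.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for one mapped index; not git's struct packed_git. */
struct mapped_idx {
	const unsigned char *data;
	size_t len;
};

/*
 * Map one file read-only.  Meant to be called from the main thread,
 * before any worker exists.  Returns 0 on success, -1 on error.
 */
static int map_idx(const char *path, struct mapped_idx *out)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return -1;
	out->data = p;
	out->len = (size_t)st.st_size;
	return 0;
}

struct worker_arg {
	const struct mapped_idx *idx; /* shared, read-only after setup */
	unsigned long sum;            /* written only by its own thread */
};

/* Workers only read the shared mapping, so they take no lock. */
static void *sum_bytes(void *p)
{
	struct worker_arg *arg = p;
	size_t i;

	arg->sum = 0;
	for (i = 0; i < arg->idx->len; i++)
		arg->sum += arg->idx->data[i];
	return NULL;
}

/*
 * Start nr workers over one mapping and join them.  Returns 0 and
 * stores the byte sum if every worker computed the same value.
 */
static int read_in_parallel(const struct mapped_idx *idx, int nr,
			    unsigned long *sum)
{
	pthread_t tid[MAX_WORKERS];
	struct worker_arg arg[MAX_WORKERS];
	int i, started = 0;

	assert(nr >= 1 && nr <= MAX_WORKERS);
	for (i = 0; i < nr; i++) {
		arg[i].idx = idx;
		if (pthread_create(&tid[i], NULL, sum_bytes, &arg[i]))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	if (started != nr)
		return -1;
	for (i = 1; i < nr; i++)
		if (arg[i].sum != arg[0].sum)
			return -1;
	*sum = arg[0].sum;
	return 0;
}
```

The only ordering requirement is that map_idx() finishes before the
first pthread_create(); after that the mapping is never modified, which
is what makes the unlocked reads safe in this sketch.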

Put another way, my current concern is that on 64-bit systems it's
incredibly easy to share (who cares about a few GBs of mmap()?), whereas
on 32-bit systems it probably matters much more, but there we also suffer
more from not sharing.
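
To make the 64-bit vs. 32-bit trade-off concrete, here is a small
sketch of a policy function that decides whether a pack gets mapped
whole up front or is left to windowed access.  Everything in it is an
assumption for illustration: `choose_policy()`, the enum, and both
budget constants are made up here and are not git's actual limits.

```c
#include <assert.h>
#include <stdint.h>

enum pack_map_policy { POLICY_EAGER, POLICY_WINDOWED };

/* Assumed address-space budgets, for illustration only. */
#define BUDGET_32BIT ((uint64_t)256 << 20) /* 256 MiB */
#define BUDGET_64BIT ((uint64_t)8 << 30)   /* 8 GiB */

/*
 * Decide how to map a pack of pack_bytes bytes when already_mapped
 * bytes are mapped.  pointer_bits would normally be sizeof(void *) * 8.
 */
static enum pack_map_policy choose_policy(unsigned pointer_bits,
					  uint64_t already_mapped,
					  uint64_t pack_bytes)
{
	uint64_t budget;

	assert(pointer_bits == 32 || pointer_bits == 64);
	budget = pointer_bits == 64 ? BUDGET_64BIT : BUDGET_32BIT;

	/* Written this way so the comparison cannot overflow. */
	if (already_mapped > budget || pack_bytes > budget - already_mapped)
		return POLICY_WINDOWED;
	return POLICY_EAGER;
}
```

With these assumed budgets a 2 GiB pack is mapped eagerly on a 64-bit
build and falls back to windows on a 32-bit one, which is the asymmetry
described above.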

Am I making any sense?

--
Thomas Rast
trast@{inf,student}.ethz.ch

Thread overview: 11+ messages
2011-12-09 8:39 [POC PATCH 0/5] Threaded loose object and pack access Thomas Rast
2011-12-09 8:39 ` [POC PATCH 1/5] Turn grep's use_threads into a global flag Thomas Rast
2011-12-09 8:39 ` [POC PATCH 2/5] grep: push locking into read_sha1_* Thomas Rast
2011-12-09 8:39 ` [POC PATCH 3/5] sha1_file_name_buf(): sha1_file_name in caller's buffer Thomas Rast
2011-12-09 8:39 ` [POC PATCH 4/5] sha1_file: stuff various pack reading variables into a struct Thomas Rast
2011-12-09 8:39 ` [POC PATCH 5/5] sha1_file: make the pack machinery thread-safe Thomas Rast
2012-04-09 14:43 ` Nguyen Thai Ngoc Duy
2012-04-10 12:29 ` Thomas Rast [this message]
2012-04-10 13:39 ` Nguyen Thai Ngoc Duy
2011-12-09 8:45 ` [POC PATCH 0/5] Threaded loose object and pack access Thomas Rast
2011-12-10 15:51 ` Nguyen Thai Ngoc Duy