git.vger.kernel.org archive mirror
From: Thomas Rast <trast@inf.ethz.ch>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: "Thomas Rast" <trast@student.ethz.ch>,
	git@vger.kernel.org, "René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Eric Herman" <eric@freesa.org>
Subject: Re: [POC PATCH 5/5] sha1_file: make the pack machinery thread-safe
Date: Tue, 10 Apr 2012 14:29:20 +0200
Message-ID: <8762d7zs67.fsf@thomas.inf.ethz.ch>
In-Reply-To: <CACsJy8AyphUD-vFwwgaW0eWG3ekgHA+tcAwV2zk5YGorkW0TzQ@mail.gmail.com> (Nguyen Thai Ngoc Duy's message of "Mon, 9 Apr 2012 21:43:56 +0700")

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On Fri, Dec 9, 2011 at 3:39 PM, Thomas Rast <trast@student.ethz.ch> wrote:
>> More precisely speaking, this pushes the locking down from
>> read_object() into bits of the pack machinery that cannot (yet) run in
>> parallel.
>>
>> There are several hacks here:
>>
>> a) prepare_packed_git() must be called before any parallel accesses
>>   happen.  It now unconditionally opens and maps all index files.
>>
>> b) similarly, prepare_replace_object() must be called before any
>>   parallel read_sha1_file() happens
>>
>> This simplification lets us avoid locking outright to guard the index
>> accesses; locking is then mainly required for open_packed_git(),
>> [un]use_pack(), and such.
>>
>> The ultimate goal would of course be to let at least _some_ pack
>> accesses happen without any locking whatsoever.  But grep already
>> benefits from it with a nice speed boost on non-worktree greps.
>
> (I'm running into multithread pack access problem in rev-list..)
>
> Why not put the global pointer "struct packed_git *packed_git" to
> "struct pack_context" and avoid locking entirely? Resource usage is
> like we run <n> different processes, I think, which is not too bad. We
> may want to share a few static pack_* variables such as
> pack_open_fds.. to avoid hitting system limits too fast.

I hesitated to do that because I don't think it's the best solution
yet.  At least for 64-bit systems, I thought of doing some or all of
the following:

* opening/mapping the pack indexes immediately to avoid locking there
  (perhaps the POC already does this, I haven't looked again).  If you
  have many packs this isn't cheap because the index must be verified.

* mapping small packs immediately

* mapping "the" big pack immediately (many repos will have a huge pack
  from the initial clone)
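
A minimal sketch of the first idea, not taken from the POC: map every
file read-only while still single-threaded, then let the workers read
the maps without any lock.  The names struct mapped_file,
map_file_readonly(), struct job, sum_bytes() and run_workers() are
hypothetical stand-ins, not git's own pack structures or functions, and
summing bytes merely stands in for real index lookups.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for a mapped pack index (not git's struct packed_git). */
struct mapped_file {
	const unsigned char *data;
	size_t size;
};

/* Map one file read-only; returns 0 on success, -1 on failure. */
static int map_file_readonly(const char *path, struct mapped_file *out)
{
	struct stat st;
	void *map;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return -1;
	}
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping remains valid after the descriptor is closed */
	if (map == MAP_FAILED) {
		perror("mmap");
		return -1;
	}
	out->data = map;
	out->size = (size_t)st.st_size;
	return 0;
}

/* Per-thread job: each worker only reads the shared maps and writes its own sum. */
struct job {
	const struct mapped_file *files;
	size_t nr;
	unsigned long sum;
};

static void *sum_bytes(void *arg)
{
	struct job *job = arg;
	size_t i, j;

	assert(job->files);
	job->sum = 0;
	for (i = 0; i < job->nr; i++)
		for (j = 0; j < job->files[i].size; j++)
			job->sum += job->files[i].data[j]; /* no lock: read-only data */
	return NULL;
}

/*
 * Start nr_threads workers over maps that were all set up beforehand.
 * Returns 0 on success, -1 if a thread could not be created.
 */
static int run_workers(const struct mapped_file *files, size_t nr,
		       struct job *jobs, int nr_threads)
{
	pthread_t threads[16];
	int i, started = 0, ret = 0;

	if (nr_threads > 16)
		return -1;
	for (i = 0; i < nr_threads; i++) {
		jobs[i].files = files;
		jobs[i].nr = nr;
		if (pthread_create(&threads[i], NULL, sum_bytes, &jobs[i])) {
			ret = -1;
			break;
		}
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	return ret;
}
```

The lock-free reads are safe only because all mapping happens before
the first pthread_create() and nothing remaps or unmaps while workers
run.  Anything that opens or drops pack windows on demand would still
need locking.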

Put another way, my current concern is that on 64-bit systems it's
incredibly easy to share (who cares about a few GBs of mmap()?), whereas
on 32-bit systems the cost of those mappings probably matters much more,
but there we also suffer more from not sharing.

Am I making any sense?

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
