From: Karsten Blees <karsten.blees@gmail.com>
To: Sebastian Schuberth <sschuberth@gmail.com>
Cc: Stefan Zager <szager@google.com>,
	git@vger.kernel.org,  msysGit <msysgit@googlegroups.com>
Subject: Re: Windows performance / threading file access
Date: Fri, 11 Oct 2013 02:51:28 +0200	[thread overview]
Message-ID: <52574B90.3070309@gmail.com> (raw)
In-Reply-To: <52570BC1.2040208@gmail.com>

On 10.10.2013 22:19, Sebastian Schuberth wrote:
> Please keep in mind to CC the msysgit mailing list for Windows-specific stuff. I'm also CC'ing Karsten who has worked on performance improvements for Git for Windows in the past.
> 

Thanks

> Thanks for bringing this up!
> 
> -- 
> Sebastian Schuberth
> 
> 
>> Hi folks,
>>
>> I don't follow the mailing list carefully, so forgive me if this has
>> been discussed before, but:
>>
>> I've noticed that when working with a very large repository using msys
>> git, the initial checkout of a cloned repository is excruciatingly
>> slow (80%+ of total clone time).  The root cause, I think, is that git
>> does all the file access serially, and that's really slow on Windows.
>>

What exactly do you mean by "excruciatingly slow"?

I just ran a few tests with a big repo (WebKit, ~2GB, ~200k files). A full checkout with git 1.8.4 on my SSD took 52s on Linux and 81s on Windows. Xcopy /s took ~4 minutes (so xcopy is much slower than git). On a 'real' HD (WD Caviar Green) the Windows checkout took ~9 minutes.

That's not so bad, I think, considering that we read from pack files and write both files and directory structures, so a lot of disk seeking is involved.

If your numbers are much slower, check for overeager virus scanners and possibly the infamous "User Account Control": on Vista/7 (8?), the luafv.sys driver slows things down on the system drive even with UAC turned off in Control Panel. The driver can be disabled with "sc config luafv start= disabled" followed by a reboot, and re-enabled with "sc config luafv start= auto".

>> Has anyone considered threading file access to speed this up?  In
>> particular, I've got my eye on this loop in unpack-trees.c:
>>

It's probably worth a try. However, in my experience, doing disk IO in parallel tends to slow things down because it causes more disk seeks.

I'd rather try to minimize seeks, e.g.:

* read the blob data for a block of cache_entries, then write out the files, repeat (this would require lots of memory, though)

* index->cache is typically sorted by name and pack files by size, right? Perhaps it's faster to iterate cache_entries by size so that we read the pack file sequentially (but then we'd write files/directories in random order...)
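
The two ideas above can be combined. What follows is only a minimal sketch in Python, not git code: `Entry`, `pack_offset`, `plan_blocks`, `read_order` and `write_order` are hypothetical names, and it assumes each entry's position in the pack file and its blob size are known up front. Entries are split into memory-bounded blocks in index order; within a block, reads are ordered by pack position and writes by path.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a cache_entry; not git's actual struct."""
    name: str         # path in the work tree
    pack_offset: int  # assumed position of the blob in the pack file
    size: int         # uncompressed blob size in bytes


def plan_blocks(entries: Iterable[Entry], max_bytes: int) -> Iterator[List[Entry]]:
    """Split entries (kept in index order) into blocks of at most max_bytes.

    An entry larger than max_bytes ends up in a block of its own.
    """
    block: List[Entry] = []
    used = 0
    for entry in entries:
        if block and used + entry.size > max_bytes:
            yield block
            block, used = [], 0
        block.append(entry)
        used += entry.size
    if block:
        yield block


def read_order(block: List[Entry]) -> List[Entry]:
    """Order reads by pack offset so the pack is scanned front to back."""
    return sorted(block, key=lambda e: e.pack_offset)


def write_order(block: List[Entry]) -> List[Entry]:
    """Order writes by path so files of one directory are created together."""
    return sorted(block, key=lambda e: e.name)


if __name__ == "__main__":
    index = [
        Entry("a/1.c", 900, 40),
        Entry("a/2.c", 100, 40),
        Entry("b/1.c", 500, 40),
        Entry("b/2.c", 300, 40),
    ]
    for block in plan_blocks(index, max_bytes=80):
        print("read :", [e.name for e in read_order(block)])
        print("write:", [e.name for e in write_order(block)])
```

The block size is the knob that trades memory for fewer seeks: a larger `max_bytes` means longer sequential runs over the pack but more blob data held in memory before anything is written. Whether this actually beats the serial loop would have to be measured.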


If you want to measure exactly which part of checkout eats the performance, check out this: https://github.com/kblees/git/commits/kb/performance-tracing-v3
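
To illustrate the general idea of per-phase measurement (this is a generic Python sketch, not the interface of the tracing branch linked above; `phase`, `totals` and `report` are hypothetical names), wall-clock time can be accumulated per named phase and then listed slowest first:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase name.
totals = defaultdict(float)


@contextmanager
def phase(name):
    """Add the elapsed time of the enclosed block to totals[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[name] += time.perf_counter() - start


def report():
    """Return (name, seconds) pairs, slowest phase first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    with phase("read blobs"):
        time.sleep(0.02)   # placeholder for reading from the pack
    with phase("write files"):
        time.sleep(0.01)   # placeholder for writing the work tree
    for name, seconds in report():
        print(f"{name}: {seconds:.3f}s")
```

Wrapping the read and the write side separately is what tells you whether pack access or work tree creation dominates.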

Bye,
Karsten

