From: Karsten Blees <karsten.blees@gmail.com>
To: Sebastian Schuberth <sschuberth@gmail.com>
Cc: Stefan Zager <szager@google.com>,
git@vger.kernel.org, msysGit <msysgit@googlegroups.com>
Subject: Re: Windows performance / threading file access
Date: Fri, 11 Oct 2013 02:51:28 +0200
Message-ID: <52574B90.3070309@gmail.com>
In-Reply-To: <52570BC1.2040208@gmail.com>
On 10.10.2013 22:19, Sebastian Schuberth wrote:
> Please keep in mind to CC the msysgit mailing list for Windows-specific stuff. I'm also CC'ing Karsten who has worked on performance improvements for Git for Windows in the past.
>
Thanks
> Thanks for bringing this up!
>
> --
> Sebastian Schuberth
>
>
>> Hi folks,
>>
>> I don't follow the mailing list carefully, so forgive me if this has
>> been discussed before, but:
>>
>> I've noticed that when working with a very large repository using msys
>> git, the initial checkout of a cloned repository is excruciatingly
>> slow (80%+ of total clone time). The root cause, I think, is that git
>> does all the file access serially, and that's really slow on Windows.
>>
What exactly do you mean by "excruciatingly slow"?
I just ran a few tests with a big repo (WebKit, ~2GB, ~200k files). A full checkout with git 1.8.4 on my SSD took 52s on Linux and 81s on Windows. Xcopy /s took ~4 minutes (so xcopy is much slower than git). On a 'real' HD (WD Caviar Green) the Windows checkout took ~9 minutes.
That's not so bad I think, considering that we read from pack files and write both files and directory structures, so there's a lot of disk seeking involved.
If your numbers are much slower, check for overeager virus scanners and possibly the infamous "User Account Control": on Vista/7 (8?), the luafv.sys driver slows things down on the system drive even with UAC turned off in the control panel. The driver can be disabled with "sc config luafv start= disabled" followed by a reboot, and re-enabled with "sc config luafv start= auto".
>> Has anyone considered threading file access to speed this up? In
>> particular, I've got my eye on this loop in unpack-trees.c:
>>
It's probably worth a try; however, in my experience, doing disk IO in parallel tends to slow things down due to more disk seeks.
I'd rather try to minimize seeks, e.g.:
* read the blob data for a block of cache_entries, then write out the files, repeat (this would require lots of memory, though)
* index->cache is typically sorted by name and pack files by size, right? Perhaps it's faster to iterate cache_entries by size so that we read the pack file sequentially (but then we'd write files/directories in random order...)
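A rough sketch of how the two ideas could be combined, purely for illustration: sort the work items by their position in the pack file so that reads are sequential, then process them in batches that fit a memory budget. Nothing here is git's actual code. struct work_item, pack_offset, plan_batches() and the budget parameter are all made-up names, and the sketch assumes the pack offset and inflated size of each blob are known up front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a checkout work item; NOT git's struct cache_entry. */
struct work_item {
	const char *path;   /* destination path in the work tree */
	size_t pack_offset; /* assumed: position of the blob in the pack file */
	size_t size;        /* assumed: inflated blob size in bytes */
};

/* qsort comparator: ascending pack offset, so reads go front to back. */
static int by_pack_offset(const void *a, const void *b)
{
	const struct work_item *x = a, *y = b;

	if (x->pack_offset < y->pack_offset)
		return -1;
	if (x->pack_offset > y->pack_offset)
		return 1;
	return 0;
}

/*
 * Return the exclusive end index of the batch that starts at `start`.
 * A batch holds as many consecutive items as fit in `budget` bytes, but
 * always at least one, so an oversized blob gets a batch of its own.
 */
static size_t next_batch_end(const struct work_item *items, size_t n,
			     size_t start, size_t budget)
{
	size_t used = 0, i = start;

	while (i < n) {
		if (i > start && (used > budget || items[i].size > budget - used))
			break;
		used += items[i].size;
		i++;
	}
	return i;
}

/*
 * Sort `items` in place by pack offset and walk them in memory-bounded
 * batches. Returns the number of batches. If `log` is non-NULL, the plan
 * is printed to it.
 */
static size_t plan_batches(struct work_item *items, size_t n, size_t budget,
			   FILE *log)
{
	size_t start = 0, batches = 0;

	assert(budget > 0);
	qsort(items, n, sizeof(*items), by_pack_offset);

	while (start < n) {
		size_t end = next_batch_end(items, n, start, budget);

		if (log)
			fprintf(log, "batch %lu: items %lu..%lu\n",
				(unsigned long)batches, (unsigned long)start,
				(unsigned long)(end - 1));
		/*
		 * A real implementation would read all blobs in [start, end)
		 * here, then write out the corresponding files.
		 */
		start = end;
		batches++;
	}
	return batches;
}
```

Calling plan_batches(items, n, budget, stdout) prints the planned batches. Whether this actually beats the current name-ordered loop would have to be measured, since the writes then hit directories in pack order rather than path order.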
If you want to measure exactly which part of checkout eats the performance, check out this: https://github.com/kblees/git/commits/kb/performance-tracing-v3
Bye,
Karsten