From: Andreas Ericsson <ae@op5.se>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Daniel Barkalow <barkalow@iabervon.org>,
raa.lkml@gmail.com, Johannes.Schindelin@gmx.de,
tsuna@lrde.epita.fr, git@vger.kernel.org, make-w32@gnu.org
Subject: Re: Switching from CVS to GIT
Date: Tue, 16 Oct 2007 07:14:56 +0200
Message-ID: <471448D0.6080200@op5.se>
In-Reply-To: <uodezisvg.fsf@gnu.org>
Eli Zaretskii wrote:
>> Date: Mon, 15 Oct 2007 20:45:02 -0400 (EDT)
>> From: Daniel Barkalow <barkalow@iabervon.org>
>> cc: Alex Riesen <raa.lkml@gmail.com>, Johannes.Schindelin@gmx.de, ae@op5.se,
>> tsuna@lrde.epita.fr, git@vger.kernel.org, make-w32@gnu.org
>>
>> I believe the hassle is that readdir doesn't necessarily report a README in
>> a directory which is supposed to have a README, when it has a readme
>> instead.
>
> Sorry I'm asking potentially stupid questions out of ignorance: why
> would you want readdir to return `README' when you have `readme'?
>
Because it might have been checked in as README, and since git is
case-sensitive, that is what it expects to find when it reads the
directory. If it's not there, users get to see

	removed: README
	untracked: readme

and there's really no easy way out of this one. Users on a
case-sensitive filesystem might be involved in the project too, so it
could be an intentional rename, but we can't know for sure. Just
clobbering the in-git file is wrong, but overwriting a file on disk
is wrong too. git tries hard never to lose any of the user's data.
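
To make the README/readme situation concrete, here is a minimal sketch
in Python. It is not how git does it; the helper name
case_only_mismatches and the two input lists are made up for
illustration. It pairs each tracked name that is missing on disk with
any on-disk name that differs only in case. Note that it can only flag
the pair; it cannot tell an intentional rename from a filesystem
artifact, which is exactly the problem.

```python
from collections import defaultdict


def case_only_mismatches(tracked, on_disk):
    """Pair tracked names missing on disk with on-disk names differing only in case."""
    tracked_set = set(tracked)
    disk_set = set(on_disk)
    missing = tracked_set - disk_set   # would show up as "removed"
    extra = disk_set - tracked_set     # would show up as "untracked"

    # Index the untracked names by their case-folded form.
    by_folded = defaultdict(list)
    for name in extra:
        by_folded[name.casefold()].append(name)

    pairs = []
    for name in sorted(missing):
        for candidate in sorted(by_folded.get(name.casefold(), [])):
            pairs.append((name, candidate))
    return pairs


if __name__ == "__main__":
    # Hypothetical index contents vs. what readdir reported.
    print(case_only_mismatches(["README", "Makefile"], ["readme", "Makefile"]))
```

A tool could use such pairs to warn the user instead of silently
picking one side, since either automatic choice risks losing data.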
>
>>>> - no acceptable level of performance in filesystem and VFS (readdir,
>>>> stat, open and read/write are annoyingly slow)
>>> With what libraries? Native `stat' and `readdir' are quite fast.
>>> Perhaps you mean the ported glibc (libgw32c), where `readdir' is
>>> indeed painfully slow, but then you don't need to use it.
>> We want getting stat info, using readdir to figure out what files exist,
>> for 106083 files in 1603 directories with a hot cache to take under 1s;
>> otherwise "git status" takes a noticeable amount of time with a medium-big
>> project, and we want people to be able to get info on what's changed
>> effectively instantly. My impression is that Windows' native stat and
>> readdir are plenty fast for what normal Windows programs want, but we
>> actually expect reasonable performance on an unreasonably-big
>> metadata-heavy input.
>
> If that's the issue, then it's not a good idea to call `stat' and
> `readdir' on Windows at all. `stat' is a single system call on Posix
> systems, while on Windows it usually needs to go out of its way
> calling half a dozen system services to gather the `struct stat' info.
> You need to call something like FindFirstFile, which can do the job of
> `stat' and `readdir' together (and of `fnmatch', if you need to filter
> only some files) in one go. I don't know whether this will scan 100K
> files under one second (maybe I will try it one of these days), but it
> will definitely be faster than `readdir'+`stat' by maybe as much as an
> order of magnitude.
>
To be honest though, there are so many places that do readdir+stat
that I don't think it'd be worth factoring out, especially since it
*works* on Windows. It's just slow, and only slow compared to various
unices. I *think* (correct me if I'm wrong) that git is still faster
than a whole bunch of other SCMs on Windows, but to someone used to
its performance on Linux, waiting several seconds to scan 10k files
just feels wrong.
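
For readers who want to see what "readdir and stat in one go" looks
like, here is a rough sketch in Python rather than C. As far as I
understand the Python documentation, os.scandir is built on
FindFirstFile/FindNextFile on Windows, and the stat data for entries
that are not symlinks comes back with the directory listing, so no
extra call per file is needed there. The function name scan_tree is
made up, and this says nothing about how fast git itself would be with
such a change.

```python
import os


def scan_tree(root):
    """Yield (path, size, mtime) for every regular file under root."""
    stack = [root]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    # On Windows this usually reuses data cached from the
                    # directory listing; on POSIX it may still cost one
                    # lstat() per file.
                    info = entry.stat(follow_symlinks=False)
                    yield entry.path, info.st_size, info.st_mtime


if __name__ == "__main__":
    count = sum(1 for _ in scan_tree("."))
    print("regular files found:", count)
```

Timing this against a plain listdir-then-stat loop on the same tree
would show how much the combined call buys on a given system.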
>> We also expect to be able to make a sequence of file system operations
>> such that programs starting at any time see the same database as the files
>> containing the database get restructured.
>
> Sorry, I don't understand this; please tell more about the operations,
> ``the same database'' issue (what database?)
The object database, located under .git/objects.
> and what do you mean by
> ``the files containing the database get restructured''.
>
/* I'm out on a limb here. Nicolas Pitre knows the git packfile format,
 * so perhaps he'll be kind enough to correct me if I'm wrong */
The mmap() stuff is primarily convenient when reading huge packfiles. As
far as I understand it, they're ordered by some sort of delta similarity
score, so mmap()'ing 100MiB or so of a certain packfile will most likely
mean we have a couple of thousand "connected" revisions in memory. That
database gets sort of restructured as the mmap()'ed chunk of memory
gets moved to read in the next couple of thousand revisions.

In all honesty, this doesn't matter much for already fully packed projects
unless they're significantly larger than the Linux kernel, since git is so
amazingly good at compressing large repos to a small size. Linux is ~180
MiB fully packed, and most developers' systems could just read() that
entire packfile into memory without much problem. But then again, no-one's
ever had problems supporting the "normal" cases.
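
As an illustration of the moving mmap() window described above, here is
a minimal sketch in Python. It is not git's actual packfile code; the
function name iter_windows and the default window size are assumptions
for the example. It maps one window of a file at a time and moves on to
the next, so only about one window's worth of data is mapped at any
moment.

```python
import mmap
import os


def iter_windows(path, window_size=100 * 1024 * 1024):
    """Yield (offset, data) pairs, mapping one window of the file at a time."""
    granularity = mmap.ALLOCATIONGRANULARITY
    # Offsets passed to mmap must be multiples of the allocation granularity,
    # so round the window size down to such a multiple.
    window_size = max(granularity, window_size - window_size % granularity)
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < file_size:
            length = min(window_size, file_size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as window:
                # Copy the bytes out so they stay valid after the window is
                # unmapped; real code would parse the mapping in place.
                yield offset, window[:]
            offset += length


if __name__ == "__main__":
    import sys
    for start, data in iter_windows(sys.argv[1]):
        print(start, len(data))
```

For a file that fits comfortably in memory, a single read() does the
same job with less ceremony, which is the point made above about the
"normal" cases.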
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231