* Show of hands, how many set USE_NSEC
@ 2008-08-08 16:34 Shawn O. Pearce
2008-08-08 16:55 ` Johannes Schindelin
2008-08-13 20:01 ` Robin Rosenberg
0 siblings, 2 replies; 7+ messages in thread
From: Shawn O. Pearce @ 2008-08-08 16:34 UTC (permalink / raw)
To: git
How many users really build their Git with USE_NSEC=1?
I'm suspecting a status issue in jgit caused by jgit honoring a
millisecond resolution on file modification timestamps, and the
underlying filesystem supporting at least a 1/2 second (or finer)
granularity, but C Git was built without USE_NSEC so it only honors
1 second granularity.
This can cause jgit to think a file is locally modified as the
mtime has data in the tv_nsec field, but C Git set that to 0 in
the index as USE_NSEC wasn't enabled at build time.
I'm trying to come up with a sane way for jgit to realize it should
truncate the milliseconds out of a timestamp before it comes to
the index record.
--
Shawn.
* Re: Show of hands, how many set USE_NSEC
2008-08-08 16:34 Show of hands, how many set USE_NSEC Shawn O. Pearce
@ 2008-08-08 16:55 ` Johannes Schindelin
2008-08-08 16:57 ` Shawn O. Pearce
2008-08-13 20:01 ` Robin Rosenberg
1 sibling, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2008-08-08 16:55 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
Hi,
On Fri, 8 Aug 2008, Shawn O. Pearce wrote:
> How many users really build their Git with USE_NSEC=1?
I don't.
> I'm trying to come up with a sane way for jgit to realize it should
> truncate the milliseconds out of a timestamp before it comes to the
> index record.
You could add a config variable. I hope that soon, we no longer need to
share the same index between C Git and JGit (I hope for a delta
pack implementation in Java...)
Ciao,
Dscho
* Re: Show of hands, how many set USE_NSEC
2008-08-08 16:55 ` Johannes Schindelin
@ 2008-08-08 16:57 ` Shawn O. Pearce
2008-08-08 17:42 ` Linus Torvalds
0 siblings, 1 reply; 7+ messages in thread
From: Shawn O. Pearce @ 2008-08-08 16:57 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Fri, 8 Aug 2008, Shawn O. Pearce wrote:
>
> > I'm trying to come up with a sane way for jgit to realize it should
> > truncate the milliseconds out of a timestamp before it comes to the
> > index record.
>
> You could add a config variable.
I was thinking of a gitconfig variable (e.g. jcore.usensec) to enable the tv_nsec
usage (ok, well milliseconds only) in that repository, or globally
(if in ~/.gitconfig).
I also thought about looking at the index records to see if the
tv_nsec fields were always 0. If all of them were 0 it would be a
good indication that the filesystem doesn't support that level of
granularity, or that whoever last wrote this index doesn't support
that level of granularity. But this is a very expensive operation
to perform, relatively speaking.
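The scan described above could look roughly like the following. This is only a sketch, assuming a version 2 DIRC file with the documented layout (12-byte header, then entries with 40 bytes of stat data, a 20-byte SHA-1, 2 bytes of flags, and a NUL-padded path); the function name `all_nsec_zero` is made up for illustration and is not taken from jgit or C Git.

```python
import struct

HEADER = struct.Struct(">4sII")        # signature, version, entry count
STAT_PREFIX = struct.Struct(">IIII")   # ctime sec, ctime nsec, mtime sec, mtime nsec
FIXED_ENTRY_SIZE = 62                  # 40 bytes stat + 20 bytes SHA-1 + 2 bytes flags


def all_nsec_zero(index_bytes: bytes) -> bool:
    """Return True if every entry has zero ctime and mtime nanoseconds.

    Hypothetical helper: handles only version 2 index files and ignores
    extensions and the trailing checksum.
    """
    signature, version, count = HEADER.unpack_from(index_bytes, 0)
    if signature != b"DIRC" or version != 2:
        raise ValueError("sketch only handles version 2 DIRC files")
    offset = HEADER.size
    for _ in range(count):
        _csec, cnsec, _msec, mnsec = STAT_PREFIX.unpack_from(index_bytes, offset)
        if cnsec or mnsec:
            return False
        (flags,) = struct.unpack_from(">H", index_bytes, offset + 60)
        name_len = flags & 0x0FFF
        if name_len == 0x0FFF:
            raise ValueError("paths of 4095 bytes or more are not handled")
        # Each entry is padded with 1 to 8 NUL bytes to a multiple of 8.
        offset += (FIXED_ENTRY_SIZE + name_len + 8) & ~7
    return True
```

Even in this simplified form every record has to be visited when all fields are zero, which is the cost being complained about.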
> I hope that soon, we no longer need to
> share the same index between C Git and JGit (I hope for a delta
> pack implementation in Java...)
I fail to see what the DIRC (.git/index) file format and its cache
of tv_sec/tv_nsec has to do with delta pack implementation in Java.
Or are you saying that you could stop using C Git in certain cases
if you had delta pack generation in Java?
Really I'd just like to scrap the entire DIRC file format and do
it over again. Having the flat namespace is nuts. Linus and I
really disagree here, and since I have never produced code for C
Git to replace it (and prove why it's better) I think he has me in
his kill file now. :)
--
Shawn.
* Re: Show of hands, how many set USE_NSEC
2008-08-08 16:57 ` Shawn O. Pearce
@ 2008-08-08 17:42 ` Linus Torvalds
2008-08-08 17:52 ` Shawn O. Pearce
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2008-08-08 17:42 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Johannes Schindelin, git
On Fri, 8 Aug 2008, Shawn O. Pearce wrote:
>
> Really I'd just like to scrap the entire DIRC file format and do
> it over again. Having the flat namespace is nuts. Linus and I
> really disagree here, and since I have never produced code for C
> Git to replace it (and prove why it's better) I think he has me in
> his kill file now. :)
I really disagree, because you have no clue about performance.
The flat file format is absolutely _critical_ for performance.
Try this:
    time cat .git/index > /dev/null
    time git ls-tree -r HEAD > /dev/null
which isn't quite fair (but see how it's unfair both ways later), but
gives you an idea of the cost of recursively reading lots of small files.
Notice how the latter is about an order-and-a-half slower?
Then try it with a cold cache. If you thought the "ls-tree" was unfair
because it used things like zlib and pack-files, realize that the ls-tree
actually almost certainly has _better_ IO patterns than doing individual
files. So when you do the cold-cache case, we're actually giving an unfair
advantage to ls-tree (assuming fully packed repo like I have).
And it still loses in a big big way. If it was actually one file per
directory, it would be _horrible_. You could kind of approximate the IO
patterns for that case by doing
    time find . -name don-t-exist > /dev/null
(which obviously uses "readdir()" instead of read(), but the IO patterns
should be similar).
The point is, the index file absolutely *HAS* to be a single file in order
to perform well in a big project. Otherwise there's no point, and you
might as well just use a git "tree" object for everything.
Now, if you talk about the _sorting_ order (as opposed to the flatness of
the file), I could probably agree. The sort order was probably a mistake.
That said, we're stuck with it. You can't change it without changing the
tree object format, so it's not just a "local index file" format issue.
Linus
* Re: Show of hands, how many set USE_NSEC
2008-08-08 17:42 ` Linus Torvalds
@ 2008-08-08 17:52 ` Shawn O. Pearce
2008-08-08 18:00 ` Linus Torvalds
0 siblings, 1 reply; 7+ messages in thread
From: Shawn O. Pearce @ 2008-08-08 17:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Johannes Schindelin, git
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, 8 Aug 2008, Shawn O. Pearce wrote:
> >
> > Really I'd just like to scrap the entire DIRC file format and do
> > it over again. Having the flat namespace is nuts.
>
> I really disagree, because you have no clue about performance.
Actually, I do have half a clue. I just haven't convinced you of
that yet. Something to aspire to.
> The flat file format is absolutely _critical_ for performance.
I know, I agree.
> The point is, the index file absolutely *HAS* to be a single file in order
> to perform well in a big project. Otherwise there's no point, and you
> might as well just use a git "tree" object for everything.
Yes, it _has_ to be a single file, at the root of the directory tree.
I agree with that design decision entirely.
That single file however does not need to be structured internally
the way it is.
> Now, if you talk about the _sorting_ order (as opposed to the flatness of
> the file), I could probably agree. The sort order was probably a mistake.
> That said, we're stuck with it. You can't change it without changing the
> tree object format, so it's not just a "local index file" format issue.
I was talking about something like the 'TREE' extension. If we
used a format such as:
    index_file:: tree
    tree::       entry_count sha1_id record*
    record::     mode pathlen path (tree | file)
    file::       sha1_id ctime mtime ....
And sorted the entries within each tree record by their path (or
path+mode in "Git style") then we wind up with all records in the
index file in the same order they are today, and we don't have the
big redundant path strings like "src/a.c", "src/b.c", "src/c.c".
This may actually create a _smaller_ index file, resulting in a
few less minor page faults as we read it in.
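As a rough illustration of the path redundancy argument, the sketch below compares how many path bytes a flat listing stores with how many a nested layout would store if each directory or file name were written once. Both function names are hypothetical, and the count deliberately ignores the per-tree entry_count and sha1_id overhead that the proposed format would add, so it says nothing definitive about the final file size.

```python
def flat_path_bytes(paths):
    """Bytes spent on path strings when every entry stores its full path."""
    return sum(len(p.encode()) for p in paths)


def nested_path_bytes(paths):
    """Bytes spent on path strings when each name is stored once per tree.

    Every distinct directory or file is counted by its last component only,
    which mimics records nested inside their parent tree record.
    """
    components = set()
    for p in paths:
        parts = p.split("/")
        for depth in range(1, len(parts) + 1):
            components.add(tuple(parts[:depth]))
    return sum(len(c[-1].encode()) for c in components)


paths = ["src/a.c", "src/b.c", "src/c.c"]
print(flat_path_bytes(paths), nested_path_bytes(paths))
```

For the three example paths the flat form stores "src/" three times, while the nested form stores "src" once and then only the file names.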
--
Shawn.
* Re: Show of hands, how many set USE_NSEC
2008-08-08 17:52 ` Shawn O. Pearce
@ 2008-08-08 18:00 ` Linus Torvalds
0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2008-08-08 18:00 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Johannes Schindelin, git
On Fri, 8 Aug 2008, Shawn O. Pearce wrote:
>
> That single file however does not need to be structured internally
> the way it is.
Ok, at that point I certainly agree.
As long as we're talking about a single flat file, I don't think it would
be all that painful to have a totally new index format. The original
format was (obviously) designed to be just mmap'ed and turned into a C
array with no real parsing.
But once we started building a separate index data structure with internal
structure _anyway_ (for the extended in-memory flags and the filename
hashing), a lot of the reasons for the original format kind of went away.
Linus
* Re: Show of hands, how many set USE_NSEC
2008-08-08 16:34 Show of hands, how many set USE_NSEC Shawn O. Pearce
2008-08-08 16:55 ` Johannes Schindelin
@ 2008-08-13 20:01 ` Robin Rosenberg
1 sibling, 0 replies; 7+ messages in thread
From: Robin Rosenberg @ 2008-08-13 20:01 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
On Friday 8 August 2008 18:34:55, Shawn O. Pearce wrote:
> How many users really build their Git with USE_NSEC=1?
>
> I'm suspecting a status issue in jgit caused by jgit honoring a
> millisecond resolution on file modification timestamps, and the
> underlying filesystem supporting at least a 1/2 second (or finer)
> granularity, but C Git was built without USE_NSEC so it only honors
> 1 second granularity.
>
> This can cause jgit to think a file is locally modified as the
> mtime has data in the tv_nsec field, but C Git set that to 0 in
> the index as USE_NSEC wasn't enabled at build time.
When jgit finds an index entry with zero nsec it ignores the subsecond
portion of the file timestamp when comparing.
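A minimal sketch of that comparison rule follows. It is not jgit's actual code: the `IndexTime` class and `is_modified` function are hypothetical names, only the mtime is considered (a real check would also look at size and other stat fields), and the millisecond comparison reflects the millisecond resolution mentioned earlier in the thread.

```python
from dataclasses import dataclass

NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_MILLI = 1_000_000


@dataclass(frozen=True)
class IndexTime:
    """Cached mtime of one index entry (hypothetical stand-in for the record)."""
    sec: int
    nsec: int


def is_modified(entry: IndexTime, file_mtime_ns: int) -> bool:
    """Compare a cached mtime with a file mtime given in nanoseconds."""
    file_sec, file_nsec = divmod(file_mtime_ns, NANOS_PER_SECOND)
    if entry.sec != file_sec:
        return True
    if entry.nsec == 0:
        # The writer probably did not record subsecond data (for example
        # C Git built without USE_NSEC), so compare whole seconds only.
        return False
    # Otherwise compare at millisecond resolution.
    return entry.nsec // NANOS_PER_MILLI != file_nsec // NANOS_PER_MILLI
```

One consequence of this rule is that an entry whose true mtime happens to fall exactly on a second boundary is also compared at one-second granularity.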
-- robin