* [JGIT] Questions about binary right shifting
@ 2009-02-05 14:37 Yann Simon
2009-02-05 15:21 ` Shawn O. Pearce
0 siblings, 1 reply; 3+ messages in thread
From: Yann Simon @ 2009-02-05 14:37 UTC (permalink / raw)
To: Robin Rosenberg, Shawn O. Pearce; +Cc: git
Hi,
I can see in the code that the signed right shifting is used.
Could it be a problem? Or do we manipulate only positive numbers?
diff --git a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
index 8eb4022..d70eca0 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
@@ -577,7 +577,7 @@ int findEntry(final byte[] p, final int pLen) {
int low = 0;
int high = entryCnt;
do {
- int mid = (low + high) >> 1;
+ int mid = (low + high) >>> 1;
final int cmp = cmp(p, pLen, sortedEntries[mid]);
if (cmp < 0)
high = mid;
And could we used right shifting to optimize a division per 2?
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java b/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java
index 0ecd04d..ff9e666 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java
@@ -136,7 +136,7 @@ private static final int binarySearch(final TreeEntry[] entries,
int high = entries.length;
int low = 0;
do {
- final int mid = (low + high) / 2;
+ final int mid = (low + high) >>> 1;
final int cmp = compareNames(entries[mid].getNameUTF8(), nameUTF8,
nameStart, nameEnd, TreeEntry.lastChar(entries[mid]), nameUTF8last);
if (cmp < 0)
Yann
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [JGIT] Questions about binary right shifting
2009-02-05 14:37 [JGIT] Questions about binary right shifting Yann Simon
@ 2009-02-05 15:21 ` Shawn O. Pearce
2009-02-05 20:32 ` Robin Rosenberg
0 siblings, 1 reply; 3+ messages in thread
From: Shawn O. Pearce @ 2009-02-05 15:21 UTC (permalink / raw)
To: Yann Simon; +Cc: Robin Rosenberg, git
Yann Simon <yann.simon.fr@gmail.com> wrote:
> I can see in the code that the signed right shifting is used.
> Could it be a problem? Or do we manipulate only positive numbers?
Heh. We shouldn't ever be dealing with a case where there are more
than 1 billion files in the index, and thus we should never see
the expression "low + high" wrap negative, and then try to divide
by 2 here.
Binary search on a 1 billion file index would hurt, badly. I'm not
sure what project even has 1 billion source files to checkout
on disk. Isn't that easily a 4 TB filesystem just for the ext2
inodes of the working tree files? :-)
Oh, and we can't even have an index with 1 billion entries in it
anyway. DirCache.readFrom allocates a byte[] of entryCnt * 62.
So our largest number of files permitted is 34,636,833, due to
the limit on byte[] maxing out at 2 GB. Thus high below cannot
ever be larger than 34 million, and we can't wrap.
This is also true in the Tree class, where we have the entire tree
data stream in a byte[]. Each entry needs at least 23 bytes, but
more if you include some sort of unique file name. That really puts
a limit on the number of unique names to the point that we cannot
have entries.length so large that the binary search would overflow.
Anyway, if you tie this up into a patch with a message I'll apply,
but I don't care enough to write the message myself given how
unlikely this is. We'd have to overhaul other code anyway before
we start to run into problems with these search functions.
> diff --git a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
> index 8eb4022..d70eca0 100644
> --- a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
> +++ b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
> @@ -577,7 +577,7 @@ int findEntry(final byte[] p, final int pLen) {
> int low = 0;
> int high = entryCnt;
> do {
> - int mid = (low + high) >> 1;
> + int mid = (low + high) >>> 1;
> final int cmp = cmp(p, pLen, sortedEntries[mid]);
> if (cmp < 0)
> high = mid;
>
> And could we used right shifting to optimize a division per 2?
>
> diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java b/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java
> index 0ecd04d..ff9e666 100644
> --- a/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java
> +++ b/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java
> @@ -136,7 +136,7 @@ private static final int binarySearch(final TreeEntry[] entries,
> int high = entries.length;
> int low = 0;
> do {
> - final int mid = (low + high) / 2;
> + final int mid = (low + high) >>> 1;
> final int cmp = compareNames(entries[mid].getNameUTF8(), nameUTF8,
> nameStart, nameEnd, TreeEntry.lastChar(entries[mid]), nameUTF8last);
> if (cmp < 0)
>
> Yann
--
Shawn.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [JGIT] Questions about binary right shifting
2009-02-05 15:21 ` Shawn O. Pearce
@ 2009-02-05 20:32 ` Robin Rosenberg
0 siblings, 0 replies; 3+ messages in thread
From: Robin Rosenberg @ 2009-02-05 20:32 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Yann Simon, git
torsdag 05 februari 2009 16:21:08 skrev Shawn O. Pearce:
> Yann Simon <yann.simon.fr@gmail.com> wrote:
> > I can see in the code that the signed right shifting is used.
> > Could it be a problem? Or do we manipulate only positive numbers?
>
> Heh. We shouldn't ever be dealing with a case where there are more
> than 1 billion files in the index, and thus we should never see
> the expression "low + high" wrap negative, and then try to divide
> by 2 here.
>
> Binary search on a 1 billion file index would hurt, badly. I'm not
Roughly three times the search time in the linux kernel repo. Not
that bad.
> sure what project even has 1 billion source files to checkout
> on disk. Isn't that easily a 4 TB filesystem just for the ext2
Not even KDE or Eclipse.
> inodes of the working tree files? :-)
> Oh, and we can't even have an index with 1 billion entries in it
> anyway. DirCache.readFrom allocates a byte[] of entryCnt * 62.
> So our largest number of files permitted is 34,636,833, due to
> the limit on byte[] maxing out at 2 GB. Thus high below cannot
> ever be larger than 34 million, and we can't wrap.
Assuming the whole index is in memory. But I guess we can discard
it as most competing systems choke on a lot lot less and for very
large indexes we get other problems with size. Binary search isn't
one of them. Having i packed index is probably the biggest issue
since it has to be completely rewritten when we add or remove
files. But, OK, we won't solve this unless we have a case. I think
our current challenge is dealing with 100k files real smoothly.
Patching this to make findbugs shut up is ok with me.
-- robin
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2009-02-05 20:33 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-05 14:37 [JGIT] Questions about binary right shifting Yann Simon
2009-02-05 15:21 ` Shawn O. Pearce
2009-02-05 20:32 ` Robin Rosenberg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).