git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Yann Simon <yann.simon.fr@gmail.com>
Cc: Robin Rosenberg <robin.rosenberg.lists@dewire.com>,
	git <git@vger.kernel.org>
Subject: Re: [JGIT] Questions about binary right shifting
Date: Thu, 5 Feb 2009 07:21:08 -0800	[thread overview]
Message-ID: <20090205152108.GE26880@spearce.org> (raw)
In-Reply-To: <498AF9B1.1060200@gmail.com>

Yann Simon <yann.simon.fr@gmail.com> wrote:
> I can see in the code that the signed right shifting is used.
> Could it be a problem? Or do we manipulate only positive numbers?

Heh.  We shouldn't ever be dealing with a case where there are more
than 1 billion files in the index, and thus we should never see
the expression "low + high" wrap negative, and then try to divide
by 2 here.

Binary search on a 1 billion file index would hurt, badly.  I'm not
sure what project even has 1 billion source files to checkout
on disk.  Isn't that easily a 4 TB filesystem just for the ext2
inodes of the working tree files?  :-)

Oh, and we can't even have an index with 1 billion entries in it
anyway.  DirCache.readFrom allocates a byte[] of entryCnt * 62.
So our largest number of files permitted is 34,636,833, due to
the limit on byte[] maxing out at 2 GB.  Thus high below cannot
ever be larger than 34 million, and we can't wrap.

This is also true in the Tree class, where we have the entire tree
data stream in a byte[].  Each entry needs at least 23 bytes, but
more if you include some sort of unique file name.  That really puts
a limit on the number of unique names to the point that we cannot
have entries.length so large that the binary search would overflow.

Anyway, if you tie this up into a patch with a message I'll apply,
but I don't care enough to write the message myself given how
unlikely this is.  We'd have to overhaul other code anyway before
we start to run into problems with these search functions.


> diff --git a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
> index 8eb4022..d70eca0 100644
> --- a/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
> +++ b/org.spearce.jgit/src/org/spearce/jgit/dircache/DirCache.java
> @@ -577,7 +577,7 @@ int findEntry(final byte[] p, final int pLen) {
>  		int low = 0;
>  		int high = entryCnt;
>  		do {
> -			int mid = (low + high) >> 1;
> +			int mid = (low + high) >>> 1;
>  			final int cmp = cmp(p, pLen, sortedEntries[mid]);
>  			if (cmp < 0)
>  				high = mid;
> 
> And could we used right shifting to optimize a division per 2?
> 
> diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java b/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java
> index 0ecd04d..ff9e666 100644
> --- a/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java
> +++ b/org.spearce.jgit/src/org/spearce/jgit/lib/Tree.java
> @@ -136,7 +136,7 @@ private static final int binarySearch(final TreeEntry[] entries,
>  		int high = entries.length;
>  		int low = 0;
>  		do {
> -			final int mid = (low + high) / 2;
> +			final int mid = (low + high) >>> 1;
>  			final int cmp = compareNames(entries[mid].getNameUTF8(), nameUTF8,
>  					nameStart, nameEnd, TreeEntry.lastChar(entries[mid]), nameUTF8last);
>  			if (cmp < 0)
> 
> Yann

-- 
Shawn.

  reply	other threads:[~2009-02-05 15:22 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-05 14:37 [JGIT] Questions about binary right shifting Yann Simon
2009-02-05 15:21 ` Shawn O. Pearce [this message]
2009-02-05 20:32   ` Robin Rosenberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090205152108.GE26880@spearce.org \
    --to=spearce@spearce.org \
    --cc=git@vger.kernel.org \
    --cc=robin.rosenberg.lists@dewire.com \
    --cc=yann.simon.fr@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).