All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Marek Zawirski <marek.zawirski@gmail.com>
Cc: robin.rosenberg@dewire.com, git@vger.kernel.org
Subject: Re: [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex
Date: Mon, 16 Jun 2008 00:06:36 -0400	[thread overview]
Message-ID: <20080616040635.GU11793@spearce.org> (raw)
In-Reply-To: <1213566349-25395-6-git-send-email-marek.zawirski@gmail.com>

Marek Zawirski <marek.zawirski@gmail.com> wrote:
> Let us quickly find ObjectId or next object for specified offset in a
> pack, in O(log n) time.
...
> diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackReverseIndex.java
...
> +	/**
> +	 * Object ids corresponding to {@link #offsets32} and {@link #offsets64}.
> +	 */
> +	private final int names[];

This could be smaller if it was an array of indexes into the index,
rather than the ObjectId itself.  Thus we need only 1 int per object
and not 5 ints per object.

But I see why you are doing it; PackIndex.MutableEntry exposes the
ObjectId and not the nth position of the object within the index.

> +	PackReverseIndex(final PackIndex index) {
> +		final long count = index.getObjectCount();
> +		if (count > Integer.MAX_VALUE)
> +			throw new IllegalArgumentException(
> +					"Huge indexes (> 2^31 entries) are not supported by jgit, yet");

For what its worth, this limit is actually:

	Integer.MAX_VALUE / Constants.OBJECT_ID_LENGTH / 4

as you store the entire ObjectId data for the index in a single int[]
array called names.  So you'll get an ArrayIndexOutOfBoundsException
or maybe OutOfMemoryError when you try to build names later on, and
never really hit this case here.

> +	ObjectId findObject(final long offset) {
> +		if (offset <= Integer.MAX_VALUE) {
> +			final int i32 = Arrays.binarySearch(offsets32, (int) offset);
> +			if (i32 < 0)
> +				return null;
> +			final int iNames = i32 * Constants.OBJECT_ID_LENGTH / 4;

This probably should be:

	final int iNames = i32 * (Constants.OBJECT_ID_LENGTH / 4);

as then we don't overflow the precision of int when i32 is large.

> +			return ObjectId.fromRaw(names, iNames);
> +		} else {
> +			final int i64 = Arrays.binarySearch(offsets64, offset);
> +			if (i64 < 0)
> +				return null;
> +			final int iNames = (i64 + offsets32.length)
> +					* Constants.OBJECT_ID_LENGTH / 4;

Again, watch out for overflow.

-- 
Shawn.

  parent reply	other threads:[~2008-06-16  4:07 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-15 21:45 [EGIT PATCH 00/20] PackWriter, first usable attempt Marek Zawirski
2008-06-15 21:45 ` [EGIT PATCH 01/20] Fix typo in PackIndexV2 Marek Zawirski
2008-06-15 21:45   ` [EGIT PATCH 02/20] Integer versions of copyRawTo() and fromRaw() in ObjectId Marek Zawirski
2008-06-15 21:45     ` [EGIT PATCH 03/20] Add openObjectInAllPacks() to Repository, exposing packed objects storage Marek Zawirski
2008-06-15 21:45       ` [EGIT PATCH 04/20] WindowedFile fragments copying: copyToStream() Marek Zawirski
2008-06-15 21:45         ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Marek Zawirski
2008-06-15 21:45           ` [EGIT PATCH 06/20] Tests for PackReverseIndex Marek Zawirski
2008-06-15 21:45             ` [EGIT PATCH 07/20] Refactor PackIndexV2 - extract binarySearchLevelTwo() Marek Zawirski
2008-06-15 21:45               ` [EGIT PATCH 08/20] CRC32 support for PackIndex Marek Zawirski
2008-06-15 21:45                 ` [EGIT PATCH 09/20] CRC32 PackIndex tests Marek Zawirski
2008-06-15 21:45                   ` [EGIT PATCH 10/20] Format PackedObjectLoader class Marek Zawirski
2008-06-15 21:45                     ` [EGIT PATCH 11/20] Format UnpackedObjectLoader class Marek Zawirski
2008-06-15 21:45                       ` [EGIT PATCH 12/20] Format DeltaOfsPackedObjectLoader class Marek Zawirski
2008-06-15 21:45                         ` [EGIT PATCH 13/20] Raw-data operations in ObjectLoaders and PackFile Marek Zawirski
2008-06-15 21:45                           ` [EGIT PATCH 14/20] Add hasRevSort() in RevWalk for faster sorting strategy checking Marek Zawirski
2008-06-15 21:45                             ` [EGIT PATCH 15/20] Refactor getRevSort() calls to hasRevSort() Marek Zawirski
2008-06-15 21:45                               ` [EGIT PATCH 16/20] Support for RevSort.BOUNDARY in ObjectWalk Marek Zawirski
2008-06-15 21:45                                 ` [EGIT PATCH 17/20] Rename confusing objects field " Marek Zawirski
2008-06-15 21:45                                   ` [EGIT PATCH 18/20] New CountingOutputStream class - stream decorator Marek Zawirski
2008-06-15 21:45                                     ` [EGIT PATCH 19/20] Simplified implementation of pack creation: PackWriter Marek Zawirski
2008-06-15 21:45                                       ` [EGIT PATCH 20/20] PackWriter test suite Marek Zawirski
2008-06-17 21:28                                       ` [EGIT PATCH 21/20] Make isBetterDeltaReuseLoader() static in PackWriter Marek Zawirski
2008-06-17 22:07                                         ` Robin Rosenberg
2008-06-19 16:26                                           ` Marek Zawirski
2008-06-16  4:06           ` Shawn O. Pearce [this message]
2008-06-16 16:27             ` [EGIT PATCH 05/20] Reverse pack index implementation: PackReverseIndex Marek Zawirski
2008-06-17  2:02               ` Shawn O. Pearce
2008-06-16  5:19 ` [EGIT PATCH 00/20] PackWriter, first usable attempt Shawn O. Pearce
2008-06-16 16:37   ` Marek Zawirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080616040635.GU11793@spearce.org \
    --to=spearce@spearce.org \
    --cc=git@vger.kernel.org \
    --cc=marek.zawirski@gmail.com \
    --cc=robin.rosenberg@dewire.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.