[JGIT PATCH 15/21] Specialized byte array output stream for large files

From: "Shawn O. Pearce" <spearce@spearce.org>
To: Robin Rosenberg <robin.rosenberg@dewire.com>,
	Marek Zawirski <marek.zawirski@gmail.com>
Cc: git@vger.kernel.org
Subject: [JGIT PATCH 15/21] Specialized byte array output stream for large files
Date: Sun, 29 Jun 2008 03:59:25 -0400
Message-ID: <1214726371-93520-16-git-send-email-spearce@spearce.org>
In-Reply-To: <1214726371-93520-15-git-send-email-spearce@spearce.org>

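Some transports may require that we know the total byte count (and
perhaps the MD5 checksum) of a pack file before we can send it to the
transport during a push operation.  Materializing the pack locally
prior to transfer can be somewhat costly, but a very small pack can
be held entirely in core.

TemporaryBuffer therefore buffers in memory first and shifts to a
temporary file on disk only once the output grows past an in-core
limit (1 MiB by default).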

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/util/TemporaryBuffer.java |  260 ++++++++++++++++++++
 1 files changed, 260 insertions(+), 0 deletions(-)
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java

diff --git a/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java b/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java
new file mode 100644
index 0000000..72bdbb1
--- /dev/null
+++ b/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java
@@ -0,0 +1,260 @@
+/*
+ * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org>
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.util;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.ArrayList;
+
+import org.spearce.jgit.lib.NullProgressMonitor;
+import org.spearce.jgit.lib.ProgressMonitor;
+
+/**
+ * A fully buffered output stream using local disk storage for large data.
+ * <p>
+ * Initially this output stream buffers to memory, like ByteArrayOutputStream
+ * might do, but it shifts to using an on disk temporary file if the output gets
+ * too large.
+ * <p>
+ * The content of this buffered stream may be sent to another OutputStream only
+ * after this stream has been properly closed by {@link #close()}.
+ */
+public class TemporaryBuffer extends OutputStream {
+	private static final int DEFAULT_IN_CORE_LIMIT = 1024 * 1024;
+
+	/** Chain of data, if we are still completely in-core; otherwise null. */
+	private ArrayList<Block> blocks;
+
+	/**
+	 * Maximum number of bytes we will permit storing in memory.
+	 * <p>
+	 * When this limit is reached the data will be shifted to a file on disk,
+	 * preventing the JVM heap from growing out of control.
+	 */
+	private int inCoreLimit;
+
+	/**
+	 * Location of our temporary file if we are on disk; otherwise null.
+	 * <p>
+	 * If we exceeded the {@link #inCoreLimit} we nulled out {@link #blocks} and
+	 * created this file instead. All output goes here through {@link #diskOut}.
+	 */
+	private File onDiskFile;
+
+	/** If writing to {@link #onDiskFile} this is a buffered stream to it. */
+	private OutputStream diskOut;
+
+	/** Create a new empty temporary buffer. */
+	public TemporaryBuffer() {
+		inCoreLimit = DEFAULT_IN_CORE_LIMIT;
+		blocks = new ArrayList<Block>(inCoreLimit / Block.SZ);
+		blocks.add(new Block());
+	}
+
+	@Override
+	public void write(final int b) throws IOException {
+		if (blocks == null) {
+			diskOut.write(b);
+			return;
+		}
+
+		Block s = last();
+		if (s.isFull()) {
+			if (reachedInCoreLimit()) {
+				diskOut.write(b);
+				return;
+			}
+
+			s = new Block();
+			blocks.add(s);
+		}
+		s.buffer[s.count++] = (byte) b;
+	}
+
+	@Override
+	public void write(final byte[] b, int off, int len) throws IOException {
+		if (blocks != null) {
+			while (len > 0) {
+				Block s = last();
+				if (s.isFull()) {
+					if (reachedInCoreLimit())
+						break;
+
+					s = new Block();
+					blocks.add(s);
+				}
+
+				final int n = Math.min(Block.SZ - s.count, len);
+				System.arraycopy(b, off, s.buffer, s.count, n);
+				s.count += n;
+				len -= n;
+				off += n;
+			}
+		}
+
+		if (len > 0)
+			diskOut.write(b, off, len);
+	}
+
+	private Block last() {
+		return blocks.get(blocks.size() - 1);
+	}
+
+	private boolean reachedInCoreLimit() throws IOException {
+		if (blocks.size() * Block.SZ < inCoreLimit)
+			return false;
+
+		onDiskFile = File.createTempFile("jgit_", ".buffer");
+		diskOut = new FileOutputStream(onDiskFile);
+
+		final Block last = blocks.remove(blocks.size() - 1);
+		for (final Block b : blocks)
+			diskOut.write(b.buffer, 0, b.count);
+		blocks = null;
+
+		diskOut = new BufferedOutputStream(diskOut, Block.SZ);
+		diskOut.write(last.buffer, 0, last.count);
+		return true;
+	}
+
+	public void close() throws IOException {
+		if (diskOut != null) {
+			try {
+				diskOut.close();
+			} finally {
+				diskOut = null;
+			}
+		}
+	}
+
+	/**
+	 * Obtain the length (in bytes) of the buffer.
+	 * <p>
+	 * The length is only accurate after {@link #close()} has been invoked.
+	 * 
+	 * @return total length of the buffer, in bytes.
+	 */
+	public long length() {
+		if (onDiskFile != null)
+			return onDiskFile.length();
+
+		final Block last = last();
+		return ((long) blocks.size()) * Block.SZ - (Block.SZ - last.count);
+	}
+
+	/**
+	 * Send this buffer to an output stream.
+	 * <p>
+	 * This method may only be invoked after {@link #close()} has completed
+	 * normally, to ensure all data is completely transferred.
+	 * 
+	 * @param os
+	 *            stream to send this buffer's complete content to.
+	 * @param pm
+	 *            if not null progress updates are sent here. Caller should
+	 *            initialize the task and the number of work units to
+	 *            <code>{@link #length()}/1024</code>.
+	 * @throws IOException
+	 *             an error occurred reading from a temporary file on the local
+	 *             system, or writing to the output stream.
+	 */
+	public void writeTo(final OutputStream os, ProgressMonitor pm)
+			throws IOException {
+		if (pm == null)
+			pm = new NullProgressMonitor();
+		if (blocks != null) {
+			// Everything is in core so we can stream directly to the output.
+			//
+			for (final Block b : blocks) {
+				os.write(b.buffer, 0, b.count);
+				pm.update(b.count / 1024);
+			}
+		} else {
+			// Reopen the temporary file and copy the contents.
+			//
+			final FileInputStream in = new FileInputStream(onDiskFile);
+			try {
+				int cnt;
+				final byte[] buf = new byte[Block.SZ];
+				while ((cnt = in.read(buf)) >= 0) {
+					os.write(buf, 0, cnt);
+					pm.update(cnt / 1024);
+				}
+			} finally {
+				in.close();
+			}
+		}
+	}
+
+	/** Clear this buffer so it has no data, and cannot be used again. */
+	public void destroy() {
+		blocks = null;
+
+		if (diskOut != null) {
+			try {
+				diskOut.close();
+			} catch (IOException err) {
+				// We shouldn't encounter an error closing the file.
+			} finally {
+				diskOut = null;
+			}
+		}
+
+		if (onDiskFile != null) {
+			if (!onDiskFile.delete())
+				onDiskFile.deleteOnExit();
+			onDiskFile = null;
+		}
+	}
+
+	private static class Block {
+		static final int SZ = 8 * 1024;
+
+		final byte[] buffer = new byte[SZ];
+
+		int count;
+
+		boolean isFull() {
+			return count == SZ;
+		}
+	}
+}
-- 
1.5.6.74.g8a5e
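
For readers who want to try the idea outside of jgit, the spill-to-disk
behaviour described in the class Javadoc can be sketched in a standalone
form. This is only a minimal sketch under stated assumptions, not the
TemporaryBuffer from the patch: the class name SpillBufferSketch, the
isOnDisk() accessor and the constructor-supplied limit are hypothetical,
there is no ProgressMonitor, and in-memory storage is delegated to
ByteArrayOutputStream rather than a chain of 8 KiB blocks. It also moves
to disk as soon as a write would exceed the limit, instead of filling
the last block first.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical, simplified illustration of a buffer that spills to disk. */
public class SpillBufferSketch extends OutputStream {
	private final int inCoreLimit;

	/** In-memory data; null once the content has been moved to disk. */
	private ByteArrayOutputStream core = new ByteArrayOutputStream();

	private File onDiskFile;

	private OutputStream diskOut;

	public SpillBufferSketch(final int inCoreLimit) {
		this.inCoreLimit = inCoreLimit;
	}

	@Override
	public void write(final int b) throws IOException {
		write(new byte[] { (byte) b }, 0, 1);
	}

	@Override
	public void write(final byte[] b, final int off, final int len)
			throws IOException {
		// Move everything to disk before the in-core limit is exceeded.
		if (core != null && core.size() + len > inCoreLimit)
			spill();
		if (core != null)
			core.write(b, off, len);
		else
			diskOut.write(b, off, len);
	}

	private void spill() throws IOException {
		onDiskFile = File.createTempFile("sketch_", ".buffer");
		diskOut = new FileOutputStream(onDiskFile);
		core.writeTo(diskOut); // copy what was buffered so far
		core = null;
	}

	public boolean isOnDisk() {
		return onDiskFile != null;
	}

	@Override
	public void close() throws IOException {
		if (diskOut != null) {
			try {
				diskOut.close();
			} finally {
				diskOut = null;
			}
		}
	}

	/** Only meaningful after close(), and before destroy(). */
	public long length() {
		return core != null ? core.size() : onDiskFile.length();
	}

	/** Copy the complete content to os; call only after close(). */
	public void writeTo(final OutputStream os) throws IOException {
		if (core != null) {
			core.writeTo(os);
			return;
		}
		final InputStream in = new FileInputStream(onDiskFile);
		try {
			final byte[] buf = new byte[8192];
			int n;
			while ((n = in.read(buf)) >= 0)
				os.write(buf, 0, n);
		} finally {
			in.close();
		}
	}

	/** Release memory and delete the temporary file, if one was created. */
	public void destroy() {
		core = null;
		try {
			close();
		} catch (IOException err) {
			// Nothing useful can be done here in a sketch.
		}
		if (onDiskFile != null) {
			if (!onDiskFile.delete())
				onDiskFile.deleteOnExit();
			onDiskFile = null;
		}
	}

	public static void main(final String[] args) throws IOException {
		final SpillBufferSketch buf = new SpillBufferSketch(16); // tiny limit
		buf.write(new byte[10], 0, 10); // 10 <= 16, stays in memory
		buf.write(new byte[10], 0, 10); // 20 > 16, content moves to disk
		buf.close();
		final ByteArrayOutputStream sink = new ByteArrayOutputStream();
		buf.writeTo(sink);
		System.out.println(buf.isOnDisk() + " " + buf.length() + " " + sink.size());
		buf.destroy();
	}
}
```

The call order in main() mirrors what the Javadoc in the patch asks of
callers: write, close(), then length() and writeTo(), and finally
destroy() to remove the temporary file.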

Thread overview: 27+ messages
2008-06-29  7:59 [JGIT PATCH 00/21] Push support over SFTP and (encrypted) Amazon S3 Shawn O. Pearce
2008-06-29  7:59 ` [JGIT PATCH 01/21] Remove unused index files when WalkFetchConnection closes Shawn O. Pearce
2008-06-29  7:59   ` [JGIT PATCH 02/21] Do not show URIish passwords in TransportExceptions Shawn O. Pearce
2008-06-29  7:59     ` [JGIT PATCH 03/21] Use PackedObjectInfo as a base class for PackWriter's ObjectToPack Shawn O. Pearce
2008-06-29  7:59       ` [JGIT PATCH 04/21] Refactor PackWriter to hold onto the sorted object list Shawn O. Pearce
2008-06-29  7:59         ` [JGIT PATCH 05/21] Save the pack checksum after computing it in PackWriter Shawn O. Pearce
2008-06-29  7:59           ` [JGIT PATCH 06/21] Allow PackIndexWriter to use any subclass of PackedObjectInfo Shawn O. Pearce
2008-06-29  7:59             ` [JGIT PATCH 07/21] Allow PackWriter to create a corresponding index file Shawn O. Pearce
2008-06-29  7:59               ` [JGIT PATCH 08/21] Allow PackWriter to prepare object list and compute name before writing Shawn O. Pearce
2008-06-29  7:59                 ` [JGIT PATCH 09/21] Remember how a Ref was read in from disk and created Shawn O. Pearce
2008-06-29  7:59                   ` [JGIT PATCH 10/21] Simplify walker transport ref advertisement setup Shawn O. Pearce
2008-06-29  7:59                     ` [JGIT PATCH 11/21] Indicate the protocol jgit doesn't support push over Shawn O. Pearce
2008-06-29  7:59                       ` [JGIT PATCH 12/21] WalkTransport must allow subclasses to implement openPush Shawn O. Pearce
2008-06-29  7:59                         ` [JGIT PATCH 13/21] Support push over the sftp:// dumb transport Shawn O. Pearce
2008-06-29  7:59                           ` [JGIT PATCH 14/21] Extract readPackedRefs from TransportSftp for reuse Shawn O. Pearce
2008-06-29  7:59                             ` Shawn O. Pearce [this message]
2008-06-29  7:59                               ` [JGIT PATCH 16/21] Add Robert Harder's public domain Base64 encoding utility Shawn O. Pearce
2008-06-29  7:59                                 ` [JGIT PATCH 17/21] Misc. documentation fixes to Base64 utility Shawn O. Pearce
2008-06-29  7:59                                   ` [JGIT PATCH 18/21] Extract the basic HTTP proxy support to its own class Shawn O. Pearce
2008-06-29  7:59                                     ` [JGIT PATCH 19/21] Create a really simple Amazon S3 REST client Shawn O. Pearce
2008-06-29  7:59                                       ` [JGIT PATCH 20/21] Add client side encryption to Amazon S3 client library Shawn O. Pearce
2008-06-29  7:59                                         ` [JGIT PATCH 21/21] Bidirectional protocol support for Amazon S3 Shawn O. Pearce
2008-06-29 13:51                                 ` [JGIT PATCH 16/21] Add Robert Harder's public domain Base64 encoding utility Robin Rosenberg
2008-06-29 18:06                                   ` Shawn O. Pearce
2008-06-29 13:51                   ` [JGIT PATCH 09/21] Remember how a Ref was read in from disk and created Robin Rosenberg
2008-06-29 14:17                     ` Johannes Schindelin
2008-06-29 18:00                       ` Shawn O. Pearce
