From: "Shawn O. Pearce" <spearce@spearce.org>
To: Robin Rosenberg <robin.rosenberg@dewire.com>,
Marek Zawirski <marek.zawirski@gmail.com>
Cc: git@vger.kernel.org
Subject: [JGIT PATCH 15/21] Specialized byte array output stream for large files
Date: Sun, 29 Jun 2008 03:59:25 -0400
Message-ID: <1214726371-93520-16-git-send-email-spearce@spearce.org>
In-Reply-To: <1214726371-93520-15-git-send-email-spearce@spearce.org>
Some transports may require that we know the total byte count (and
perhaps the MD5 checksum) of a pack file before we can send it to the
transport during a push operation. Materializing the pack locally
prior to transfer can be somewhat costly, but for very small packs
it can be done entirely in core.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
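For readers of the archive, here is a rough sketch of the usage pattern
the message above describes: buffer the output, close the buffer, read
the exact byte count, and only then stream the content to the transport.
SpillBufferSketch and all of its members are hypothetical names written
for illustration; this is not the code in this patch. The sketch spills
a ByteArrayOutputStream to a temp file once a byte limit would be
exceeded, whereas TemporaryBuffer below keeps a chain of 8 KiB blocks
and counts whole blocks against its limit.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical, simplified illustration of a spill-to-disk buffer. */
class SpillBufferSketch extends OutputStream {
	private final int inCoreLimit;

	/** In-memory data while under the limit; null once spilled. */
	private ByteArrayOutputStream core = new ByteArrayOutputStream();

	private File onDiskFile;

	private OutputStream diskOut;

	/** Running total of bytes accepted so far. */
	private long length;

	SpillBufferSketch(final int inCoreLimit) {
		this.inCoreLimit = inCoreLimit;
	}

	@Override
	public void write(final int b) throws IOException {
		write(new byte[] { (byte) b }, 0, 1);
	}

	@Override
	public void write(final byte[] b, final int off, final int len)
			throws IOException {
		// Move to disk before the in-memory copy would exceed the limit.
		if (core != null && core.size() + len > inCoreLimit)
			spillToDisk();
		if (core != null)
			core.write(b, off, len);
		else
			diskOut.write(b, off, len);
		length += len;
	}

	private void spillToDisk() throws IOException {
		onDiskFile = File.createTempFile("sketch_", ".buffer");
		diskOut = new BufferedOutputStream(new FileOutputStream(onDiskFile));
		core.writeTo(diskOut); // carry over what was buffered in memory
		core = null;
	}

	@Override
	public void close() throws IOException {
		if (diskOut != null) {
			diskOut.close(); // flushes buffered bytes to the temp file
			diskOut = null;
		}
	}

	long length() {
		return length;
	}

	boolean isOnDisk() {
		return onDiskFile != null;
	}

	/** Copy the buffered content to os; call only after close(). */
	void writeTo(final OutputStream os) throws IOException {
		if (core != null) {
			core.writeTo(os);
			return;
		}
		try (InputStream in = new FileInputStream(onDiskFile)) {
			final byte[] buf = new byte[8192];
			int cnt;
			while ((cnt = in.read(buf)) >= 0)
				os.write(buf, 0, cnt);
		}
	}

	/** Release memory and the temp file; the buffer is unusable afterwards. */
	void destroy() throws IOException {
		close();
		core = null;
		if (onDiskFile != null && !onDiskFile.delete())
			onDiskFile.deleteOnExit();
		onDiskFile = null;
	}

	public static void main(final String[] args) throws IOException {
		final SpillBufferSketch buf = new SpillBufferSketch(16);
		buf.write(new byte[40], 0, 40); // larger than the limit, so it spills
		buf.close();
		System.out.println(buf.length() + " bytes, on disk: " + buf.isOnDisk());
		buf.writeTo(new ByteArrayOutputStream());
		buf.destroy();
	}
}
```

A caller that needs the size up front would write the pack into the
buffer, call close(), send length() as the content length, and then
call writeTo() with the transport's stream, followed by destroy() to
drop any temp file.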
.../src/org/spearce/jgit/util/TemporaryBuffer.java | 260 ++++++++++++++++++++
1 files changed, 260 insertions(+), 0 deletions(-)
create mode 100644 org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java
diff --git a/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java b/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java
new file mode 100644
index 0000000..72bdbb1
--- /dev/null
+++ b/org.spearce.jgit/src/org/spearce/jgit/util/TemporaryBuffer.java
@@ -0,0 +1,260 @@
+/*
+ * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org>
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.util;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.ArrayList;
+
+import org.spearce.jgit.lib.NullProgressMonitor;
+import org.spearce.jgit.lib.ProgressMonitor;
+
+/**
+ * A fully buffered output stream using local disk storage for large data.
+ * <p>
+ * Initially this output stream buffers to memory, like ByteArrayOutputStream
+ * might do, but it shifts to using an on disk temporary file if the output gets
+ * too large.
+ * <p>
+ * The content of this buffered stream may be sent to another OutputStream only
+ * after this stream has been properly closed by {@link #close()}.
+ */
+public class TemporaryBuffer extends OutputStream {
+	private static final int DEFAULT_IN_CORE_LIMIT = 1024 * 1024;
+
+	/** Chain of data, if we are still completely in-core; otherwise null. */
+	private ArrayList<Block> blocks;
+
+	/**
+	 * Maximum number of bytes we will permit storing in memory.
+	 * <p>
+	 * When this limit is reached the data will be shifted to a file on disk,
+	 * preventing the JVM heap from growing out of control.
+	 */
+	private int inCoreLimit;
+
+	/**
+	 * Location of our temporary file if we are on disk; otherwise null.
+	 * <p>
+	 * If we exceeded the {@link #inCoreLimit} we nulled out {@link #blocks} and
+	 * created this file instead. All output goes here through {@link #diskOut}.
+	 */
+	private File onDiskFile;
+
+	/** If writing to {@link #onDiskFile} this is a buffered stream to it. */
+	private OutputStream diskOut;
+
+	/** Create a new empty temporary buffer. */
+	public TemporaryBuffer() {
+		inCoreLimit = DEFAULT_IN_CORE_LIMIT;
+		blocks = new ArrayList<Block>(inCoreLimit / Block.SZ);
+		blocks.add(new Block());
+	}
+
+	@Override
+	public void write(final int b) throws IOException {
+		if (blocks == null) {
+			diskOut.write(b);
+			return;
+		}
+
+		Block s = last();
+		if (s.isFull()) {
+			if (reachedInCoreLimit()) {
+				diskOut.write(b);
+				return;
+			}
+
+			s = new Block();
+			blocks.add(s);
+		}
+		s.buffer[s.count++] = (byte) b;
+	}
+
+	@Override
+	public void write(final byte[] b, int off, int len) throws IOException {
+		if (blocks != null) {
+			while (len > 0) {
+				Block s = last();
+				if (s.isFull()) {
+					if (reachedInCoreLimit())
+						break;
+
+					s = new Block();
+					blocks.add(s);
+				}
+
+				final int n = Math.min(Block.SZ - s.count, len);
+				System.arraycopy(b, off, s.buffer, s.count, n);
+				s.count += n;
+				len -= n;
+				off += n;
+			}
+		}
+
+		if (len > 0)
+			diskOut.write(b, off, len);
+	}
+
+	private Block last() {
+		return blocks.get(blocks.size() - 1);
+	}
+
+	private boolean reachedInCoreLimit() throws IOException {
+		if (blocks.size() * Block.SZ < inCoreLimit)
+			return false;
+
+		onDiskFile = File.createTempFile("jgit_", ".buffer");
+		diskOut = new FileOutputStream(onDiskFile);
+
+		final Block last = blocks.remove(blocks.size() - 1);
+		for (final Block b : blocks)
+			diskOut.write(b.buffer, 0, b.count);
+		blocks = null;
+
+		diskOut = new BufferedOutputStream(diskOut, Block.SZ);
+		diskOut.write(last.buffer, 0, last.count);
+		return true;
+	}
+
+	public void close() throws IOException {
+		if (diskOut != null) {
+			try {
+				diskOut.close();
+			} finally {
+				diskOut = null;
+			}
+		}
+	}
+
+	/**
+	 * Obtain the length (in bytes) of the buffer.
+	 * <p>
+	 * The length is only accurate after {@link #close()} has been invoked.
+	 *
+	 * @return total length of the buffer, in bytes.
+	 */
+	public long length() {
+		if (onDiskFile != null)
+			return onDiskFile.length();
+
+		final Block last = last();
+		return ((long) blocks.size()) * Block.SZ - (Block.SZ - last.count);
+	}
+
+	/**
+	 * Send this buffer to an output stream.
+	 * <p>
+	 * This method may only be invoked after {@link #close()} has completed
+	 * normally, to ensure all data is completely transferred.
+	 *
+	 * @param os
+	 *            stream to send this buffer's complete content to.
+	 * @param pm
+	 *            if not null progress updates are sent here. Caller should
+	 *            initialize the task and the number of work units to
+	 *            <code>{@link #length()}/1024</code>.
+	 * @throws IOException
+	 *             an error occurred reading from a temporary file on the local
+	 *             system, or writing to the output stream.
+	 */
+	public void writeTo(final OutputStream os, ProgressMonitor pm)
+			throws IOException {
+		if (pm == null)
+			pm = new NullProgressMonitor();
+		if (blocks != null) {
+			// Everything is in core so we can stream directly to the output.
+			//
+			for (final Block b : blocks) {
+				os.write(b.buffer, 0, b.count);
+				pm.update(b.count / 1024);
+			}
+		} else {
+			// Reopen the temporary file and copy the contents.
+			//
+			final FileInputStream in = new FileInputStream(onDiskFile);
+			try {
+				int cnt;
+				final byte[] buf = new byte[Block.SZ];
+				while ((cnt = in.read(buf)) >= 0) {
+					os.write(buf, 0, cnt);
+					pm.update(cnt / 1024);
+				}
+			} finally {
+				in.close();
+			}
+		}
+	}
+
+	/** Clear this buffer so it has no data, and cannot be used again. */
+	public void destroy() {
+		blocks = null;
+
+		if (diskOut != null) {
+			try {
+				diskOut.close();
+			} catch (IOException err) {
+				// We shouldn't encounter an error closing the file.
+			} finally {
+				diskOut = null;
+			}
+		}
+
+		if (onDiskFile != null) {
+			if (!onDiskFile.delete())
+				onDiskFile.deleteOnExit();
+			onDiskFile = null;
+		}
+	}
+
+	private static class Block {
+		static final int SZ = 8 * 1024;
+
+		final byte[] buffer = new byte[SZ];
+
+		int count;
+
+		boolean isFull() {
+			return count == SZ;
+		}
+	}
+}
--
1.5.6.74.g8a5e