git.vger.kernel.org archive mirror
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: [PATCH 11/17] Fully activate the sliding window pack access.
Date: Sat, 23 Dec 2006 02:34:28 -0500
Message-ID: <20061223073428.GL9837@spearce.org>
In-Reply-To: <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>

This finally turns on the sliding window behavior for packfile data
access by mapping limited size windows and chaining them under the
packed_git->windows list.
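As a rough illustration of where a freshly created window lands, here is a standalone C sketch of the offset and length arithmetic that use_pack() performs (in the sha1_file.c hunk below) before it calls mmap(). The struct window_span type and new_window_span() helper are hypothetical names. The sketch leaves out the mmap() call itself and the step that chains the new window onto packed_git->windows.

```c
#include <assert.h>

/* Hypothetical type: just the placement of a window, no mapping. */
struct window_span {
	unsigned long offset;	/* page-aligned start within the pack */
	unsigned long len;	/* number of bytes that would be mmap()ed */
};

/* Hypothetical helper mirroring the win->offset / win->len logic in
 * use_pack(): round the start down to a page boundary, then map to
 * the end of the pack but never more than one window's worth. */
static struct window_span new_window_span(unsigned long offset,
		unsigned long pack_size, unsigned long page_size,
		unsigned long window_size)
{
	struct window_span s;

	assert(page_size > 0 && offset < pack_size);
	/* mmap() file offsets must be page aligned, so round down. */
	s.offset = (offset / page_size) * page_size;
	/* Cap the length at the configured window size. */
	s.len = pack_size - s.offset;
	if (s.len > window_size)
		s.len = window_size;
	return s;
}
```

With a 4 KiB page and the default 32 MiB window, an offset of 100 into a 100 MB pack yields a window at offset 0 of length 32 MiB, while an offset close to the end of the pack yields a short window that stops at the pack's last byte.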

We consider a given byte offset to be within the window only if there
would be at least 20 bytes (one hash worth of data) accessible after
the requested offset.  This range selection relates to the contract
that use_pack() makes with its callers, allowing them to access
one hash or one object header without needing to call use_pack()
for every byte of data obtained.
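A minimal sketch of that rule, assuming a simplified window that records only its start offset and length. The struct win below is a hypothetical stand-in, not the real struct pack_window. The bytes_left() helper is also hypothetical; it computes the value use_pack() stores in *left, which can never drop below 20 for an offset that passes the check.

```c
#include <assert.h>

/* Hypothetical stand-in for the two pack_window fields in_window() reads. */
struct win {
	unsigned long offset;	/* start of the window within the pack */
	unsigned long len;	/* number of bytes mapped */
};

/* Same test as in_window() in the patch: an offset counts as inside
 * the window only if a full 20-byte hash after it is also inside. */
static int in_window(const struct win *w, unsigned long offset)
{
	return w->offset <= offset && offset + 20 <= w->offset + w->len;
}

/* Bytes a caller may read starting at offset, i.e. what use_pack()
 * reports through *left once offset is made window-relative. */
static unsigned long bytes_left(const struct win *w, unsigned long offset)
{
	assert(in_window(w, offset));
	return w->len - (offset - w->offset);
}
```

For a window covering bytes 4096 through 12287, offset 12268 is the last one accepted: exactly 20 bytes remain, enough for one hash. Offset 12269 is rejected, so use_pack() would look for, or map, a different window.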

In the worst case we will map the same page of data into memory
twice: once at the end of one window and again at the start of the
next window.  This duplicate page mapping happens only when an
object header or a delta base reference spans the end of a window,
and it is always limited to a single page, as no sane operating
system will ever have a page size smaller than a hash (20 bytes).
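The one-page bound for a header straddling a window boundary can be checked with a little arithmetic. The duplicated_bytes() helper below is hypothetical. It assumes the earlier window starts on a page boundary and has a length that is a whole number of pages, since the config code rounds core.packedGitWindowSize to pages. It also assumes the page size is at least 20 bytes.

```c
#include <assert.h>

/* Hypothetical helper: the existing window is [win_off, win_off + win_len)
 * and offset lies in its last 19 bytes, so in_window() rejects it.
 * Returns how many bytes the replacement window maps a second time. */
static unsigned long duplicated_bytes(unsigned long win_off,
		unsigned long win_len, unsigned long offset,
		unsigned long page_size)
{
	unsigned long win_end = win_off + win_len;
	unsigned long next_off;

	/* Assumptions stated above: sane page size, page-aligned window. */
	assert(page_size >= 20);
	assert(win_off % page_size == 0 && win_len % page_size == 0);
	assert(offset < win_end && offset + 20 > win_end);
	/* use_pack() starts the new window at the page containing offset. */
	next_off = (offset / page_size) * page_size;
	return win_end - next_off;
}
```

Because the rejected offset is less than 20 bytes from a page-aligned window end, the page containing it is always the window's final page. The overlap therefore equals exactly one page.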

I am assuming that possibly wasting one page of virtual address
space will perform better than the alternatives, which would be to
copy the object header or delta base reference into a temporary
buffer before parsing it, or to check the window range on every byte
during header parsing.  We may revisit this decision in the future,
since it is a gut-instinct call that has not been proven out by
experimental testing.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 Documentation/config.txt |   11 +++++++
 cache.h                  |    1 +
 config.c                 |   10 +++++++
 environment.c            |    1 +
 sha1_file.c              |   66 +++++++++++++++++++++++++++++++++++++---------
 5 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4e93066..f8775f1 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -118,6 +118,17 @@ core.legacyheaders::
 	database directly (where the "http://" and "rsync://" protocols
 	count as direct access).
 
+core.packedGitWindowSize::
+	Number of bytes of a pack file to map into memory in a
+	single mapping operation.  Larger window sizes may allow
+	your system to process a smaller number of large pack files
+	more quickly.  Smaller window sizes will negatively affect
+	performance due to increased calls to the operating system's
+	memory manager, but may improve performance when accessing
+	a large number of large pack files.  Default is 32 MiB,
+	which should be reasonable for all users/operating systems.
+	You probably do not need to adjust this value.
+
 core.packedGitLimit::
 	Maximum number of bytes to map simultaneously into memory
 	from pack files.  If Git needs to access more than this many
diff --git a/cache.h b/cache.h
index b294bbf..b7855ef 100644
--- a/cache.h
+++ b/cache.h
@@ -196,6 +196,7 @@ extern int warn_ambiguous_refs;
 extern int shared_repository;
 extern const char *apply_default_whitespace;
 extern int zlib_compression_level;
+extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
 
 #define GIT_REPO_VERSION 0
diff --git a/config.c b/config.c
index 1e79f09..a8ea063 100644
--- a/config.c
+++ b/config.c
@@ -298,6 +298,16 @@ int git_default_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.packedgitwindowsize")) {
+		int pgsz = getpagesize();
+		packed_git_window_size = git_config_int(var, value);
+		packed_git_window_size /= pgsz;
+		if (!packed_git_window_size)
+			packed_git_window_size = 1;
+		packed_git_window_size *= pgsz;
+		return 0;
+	}
+
 	if (!strcmp(var, "core.packedgitlimit")) {
 		packed_git_limit = git_config_int(var, value);
 		return 0;
diff --git a/environment.c b/environment.c
index 8a09df2..289fc84 100644
--- a/environment.c
+++ b/environment.c
@@ -22,6 +22,7 @@ char git_commit_encoding[MAX_ENCODING_LENGTH] = "utf-8";
 int shared_repository = PERM_UMASK;
 const char *apply_default_whitespace;
 int zlib_compression_level = Z_DEFAULT_COMPRESSION;
+size_t packed_git_window_size = 32 * 1024 * 1024;
 size_t packed_git_limit = 256 * 1024 * 1024;
 int pager_in_use;
 int pager_use_color = 1;
diff --git a/sha1_file.c b/sha1_file.c
index 49dd4b7..fab2ab0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -397,8 +397,9 @@ static char *find_sha1_file(const unsigned char *sha1, struct stat *st)
 	return NULL;
 }
 
-static int pack_used_ctr;
-static unsigned long pack_mapped;
+static unsigned int pack_used_ctr;
+static size_t pack_mapped;
+static size_t page_size;
 struct packed_git *packed_git;
 
 static int check_packed_git_idx(const char *path, unsigned long *idx_size_,
@@ -536,31 +537,70 @@ static void open_packed_git(struct packed_git *p)
 		die("packfile %s does not match index", p->pack_name);
 }
 
+static int in_window(struct pack_window *win, unsigned long offset)
+{
+	/* We must promise at least 20 bytes (one hash) after the
+	 * offset is available from this window, otherwise the offset
+	 * is not actually in this window and a different window (which
+	 * has that one hash excess) must be used.  This is to support
+	 * the object header and delta base parsing routines below.
+	 */
+	off_t win_off = win->offset;
+	return win_off <= offset
+		&& (offset + 20) <= (win_off + win->len);
+}
+
 unsigned char* use_pack(struct packed_git *p,
 		struct pack_window **w_cursor,
 		unsigned long offset,
 		unsigned int *left)
 {
-	struct pack_window *win = p->windows;
+	struct pack_window *win = *w_cursor;
 
 	if (p->pack_fd == -1)
 		open_packed_git(p);
-	if (!win) {
-		pack_mapped += p->pack_size;
-		while (packed_git_limit < pack_mapped && unuse_one_window())
-			; /* nothing */
-		win = xcalloc(1, sizeof(*win));
-		win->len = p->pack_size;
-		win->base = mmap(NULL, p->pack_size, PROT_READ, MAP_PRIVATE, p->pack_fd, 0);
-		if (win->base == MAP_FAILED)
-			die("packfile %s cannot be mapped.", p->pack_name);
-		p->windows = win;
+
+	/* Since packfiles end in a hash of their content and it's
+	 * pointless to ask for an offset into the middle of that
+	 * hash, and the in_window function above wouldn't match,
+	 * don't allow an offset too close to the end of the file.
+	 */
+	if (offset > (p->pack_size - 20))
+		die("offset beyond end of packfile (truncated pack?)");
+
+	if (!win || !in_window(win, offset)) {
+		if (win)
+			win->inuse_cnt--;
+		for (win = p->windows; win; win = win->next) {
+			if (in_window(win, offset))
+				break;
+		}
+		if (!win) {
+			if (!page_size)
+				page_size = getpagesize();
+			win = xcalloc(1, sizeof(*win));
+			win->offset = (offset / page_size) * page_size;
+			win->len = p->pack_size - win->offset;
+			if (win->len > packed_git_window_size)
+				win->len = packed_git_window_size;
+			pack_mapped += win->len;
+			while (packed_git_limit < pack_mapped && unuse_one_window())
+				; /* nothing */
+			win->base = mmap(NULL, win->len,
+				PROT_READ, MAP_PRIVATE,
+				p->pack_fd, win->offset);
+			if (win->base == MAP_FAILED)
+				die("packfile %s cannot be mapped.", p->pack_name);
+			win->next = p->windows;
+			p->windows = win;
+		}
 	}
 	if (win != *w_cursor) {
 		win->last_used = pack_used_ctr++;
 		win->inuse_cnt++;
 		*w_cursor = win;
 	}
+	offset -= win->offset;
 	if (left)
 		*left = win->len - offset;
 	return win->base + offset;
-- 
1.4.4.3.g87d8
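For reference, the core.packedGitWindowSize handling that this patch adds to git_default_config() can be restated as a standalone function. It rounds the configured value down to a whole number of pages, with a floor of one page. round_window_size() is a hypothetical name, and it takes a size_t and an explicit page size instead of calling git_config_int() and getpagesize(). Patch 13/17 in this series later raises the minimum to two pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical restatement of the core.packedgitwindowsize branch:
 * truncate to whole pages, but never go below a single page. */
static size_t round_window_size(size_t requested, size_t page_size)
{
	size_t pages;

	assert(page_size > 0);
	pages = requested / page_size;
	if (!pages)
		pages = 1;
	return pages * page_size;
}
```

With 4 KiB pages, the 32 MiB default is unchanged, a request for 10000 bytes becomes 8192, and anything under one page becomes 4096.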


Thread overview: 25+ messages
     [not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
2006-12-23  7:33 ` [PATCH 1/17] Replace unpack_entry_gently with unpack_entry Shawn O. Pearce
2006-12-23  7:33 ` [PATCH 2/17] Introduce new config option for mmap limit Shawn O. Pearce
2006-12-23  7:33 ` [PATCH 3/17] Refactor packed_git to prepare for sliding mmap windows Shawn O. Pearce
2006-12-23  7:33 ` [PATCH 4/17] Use off_t for index and pack file lengths Shawn O. Pearce
2006-12-23  7:33 ` [PATCH 5/17] Create read_or_die utility routine Shawn O. Pearce
2006-12-23  7:34 ` [PATCH 6/17] Refactor how we open pack files to prepare for multiple windows Shawn O. Pearce
2006-12-23  7:34 ` [PATCH 7/17] Replace use_packed_git with window cursors Shawn O. Pearce
2006-12-23  7:34 ` [PATCH 8/17] Loop over pack_windows when inflating/accessing data Shawn O. Pearce
2006-12-23  7:34 ` [PATCH 9/17] Document why header parsing won't exceed a window Shawn O. Pearce
2006-12-23  7:34 ` [PATCH 10/17] Unmap individual windows rather than entire files Shawn O. Pearce
2006-12-23  7:34 ` Shawn O. Pearce [this message]
2006-12-23 18:44   ` [PATCH 11/17] Fully activate the sliding window pack access Linus Torvalds
2006-12-23 19:34     ` Eric Blake
2006-12-24  0:58       ` Johannes Schindelin
2006-12-23 19:45     ` Junio C Hamano
2006-12-23 20:10       ` Linus Torvalds
2006-12-24  1:23         ` Johannes Schindelin
2006-12-24  2:23       ` Shawn Pearce
2006-12-24  2:35       ` Shawn Pearce
2006-12-23  7:34 ` [PATCH 12/17] Load core configuration in git-verify-pack Shawn O. Pearce
2006-12-23  7:34 ` [PATCH 13/17] Ensure core.packedGitWindowSize cannot be less than 2 pages Shawn O. Pearce
2006-12-23  7:34 ` [PATCH 14/17] Improve error message when packfile mmap fails Shawn O. Pearce
2006-12-23  7:34 ` [PATCH 15/17] Support unmapping windows on 'temporary' packfiles Shawn O. Pearce
2006-12-23  7:34 ` [PATCH 16/17] Create pack_report() as a debugging aid Shawn O. Pearce
2006-12-23  7:34 ` [PATCH 17/17] Test suite for sliding window mmap implementation Shawn O. Pearce
