git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Han-Wen Nienhuys <hanwenn@gmail.com>
Subject: [PATCH 4/4] reftable/blocksource: use mmap to read tables
Date: Mon, 8 Jan 2024 13:18:43 +0100	[thread overview]
Message-ID: <a23f38a80609a5c5a6931400ffd28a92b33838bb.1704714575.git.ps@pks.im> (raw)
In-Reply-To: <cover.1704714575.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 7265 bytes --]

The blocksource interface provides an interface to read blocks from a
reftable table. This interface is implemented using read(3P) calls on
the underlying file descriptor. While this works alright, this pattern
is very inefficient when repeatedly querying the reftable stack for one
or more refs. This inefficiency can mostly be attributed to the fact
that we often need to re-read the same blocks over and over again, and
every single time we need to call read(3P) again.

A natural fit in this context is to use mmap(3P) instead of read(3P),
which has a bunch of benefits:

  - We do not need to come up with a caching strategy for some of the
    blocks as this will be handled by the kernel already.

  - We can avoid the overhead of having to call into the read(3P)
    syscall repeatedly.

  - We do not need to allocate returned blocks repeatedly, but can
    instead hand out pointers into the mmapped region directly.

Using mmap comes with a significant drawback on Windows though, because
mmapped files cannot be deleted and neither is it possible to rename
files onto an mmapped file. But for one, the reftable library gracefully
handles the case where auto-compaction cannot delete a still-open stack
already and ignores any such errors. Also, `reftable_stack_clean()` will
prune stale tables which are not referenced by "tables.list" anymore so
that those files can eventually be pruned. And second, we never rewrite
already-rewritten stacks, so it does not matter that we cannot rename a
file over an mmaped file, either.

Another unfortunate property of mmap is that it is not supported by all
systems. But given that the size of reftables should typically be rather
limited (megabytes at most in the vast majority of repositories), we can
provide an easy fallback by just reading the complete table into memory
on such platforms. This is the same strategy that the "packed" backend
uses in case the platform does not provide mmap.

While this change doesn't significantly improve performance in the case
where we're seeking through stacks once (like e.g. git-for-each-ref(1)
would). But it does speed up usecases where there is lots of random
access to refs, e.g. when writing. The following benchmark demonstrates
these savings with git-update-ref(1) creating N refs in an otherwise
empty repository:

  Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~)
    Time (mean ± σ):       5.6 ms ±   0.2 ms    [User: 2.7 ms, System: 2.8 ms]
    Range (min … max):     5.1 ms …   6.0 ms    90 runs

  Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~)
    Time (mean ± σ):      15.1 ms ±   0.4 ms    [User: 7.1 ms, System: 8.0 ms]
    Range (min … max):    14.2 ms …  16.5 ms    71 runs

  Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~)
    Time (mean ± σ):     919.4 ms ±  11.2 ms    [User: 427.5 ms, System: 491.7 ms]
    Range (min … max):   908.1 ms … 936.6 ms    10 runs

  Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD)
    Time (mean ± σ):       5.5 ms ±   0.3 ms    [User: 2.4 ms, System: 3.1 ms]
    Range (min … max):     5.0 ms …   7.3 ms    89 runs

  Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD)
    Time (mean ± σ):      10.4 ms ±   0.3 ms    [User: 5.1 ms, System: 5.3 ms]
    Range (min … max):     9.7 ms …  11.0 ms    78 runs

  Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD)
    Time (mean ± σ):     483.5 ms ±   6.3 ms    [User: 227.8 ms, System: 255.6 ms]
    Range (min … max):   477.5 ms … 498.8 ms    10 runs

  Summary
    update-ref: create many refs (refcount = 1, revision = HEAD) ran
      1.01 ± 0.06 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~)
      1.89 ± 0.11 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
      2.73 ± 0.16 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
     87.55 ± 4.65 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
    166.48 ± 8.80 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)

Theoretically, we could also replicate the strategy of the "packed"
backend where small tables are read into memory instead of using mmap.
Benchmarks did not confirm that this has a performance benefit though.
---
 reftable/blocksource.c | 48 ++++++++++++++++++++++++++++++------------
 1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/reftable/blocksource.c b/reftable/blocksource.c
index a1ea304429..5d3f3d264c 100644
--- a/reftable/blocksource.c
+++ b/reftable/blocksource.c
@@ -13,6 +13,12 @@ license that can be found in the LICENSE file or at
 #include "reftable-blocksource.h"
 #include "reftable-error.h"
 
+#if defined(NO_MMAP)
+static int use_mmap = 0;
+#else
+static int use_mmap = 1;
+#endif
+
 static void strbuf_return_block(void *b, struct reftable_block *dest)
 {
 	if (dest->len)
@@ -78,6 +84,7 @@ struct reftable_block_source malloc_block_source(void)
 struct file_block_source {
 	int fd;
 	uint64_t size;
+	unsigned char *data;
 };
 
 static uint64_t file_size(void *b)
@@ -87,19 +94,23 @@ static uint64_t file_size(void *b)
 
 static void file_return_block(void *b, struct reftable_block *dest)
 {
-	if (dest->len)
-		memset(dest->data, 0xff, dest->len);
-	reftable_free(dest->data);
 }
 
-static void file_close(void *b)
+static void file_close(void *v)
 {
-	int fd = ((struct file_block_source *)b)->fd;
-	if (fd > 0) {
-		close(fd);
-		((struct file_block_source *)b)->fd = 0;
+	struct file_block_source *b = v;
+
+	if (b->fd >= 0) {
+		close(b->fd);
+		b->fd = -1;
 	}
 
+	if (use_mmap)
+		munmap(b->data, b->size);
+	else
+		reftable_free(b->data);
+	b->data = NULL;
+
 	reftable_free(b);
 }
 
@@ -108,9 +119,7 @@ static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
 {
 	struct file_block_source *b = v;
 	assert(off + size <= b->size);
-	dest->data = reftable_malloc(size);
-	if (pread_in_full(b->fd, dest->data, size, off) != size)
-		return -1;
+	dest->data = b->data + off;
 	dest->len = size;
 	return size;
 }
@@ -127,8 +136,10 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
 {
 	struct stat st = { 0 };
 	int err = 0;
-	int fd = open(name, O_RDONLY);
+	int fd;
 	struct file_block_source *p = NULL;
+
+	fd = open(name, O_RDONLY);
 	if (fd < 0) {
 		if (errno == ENOENT) {
 			return REFTABLE_NOT_EXIST_ERROR;
@@ -144,7 +155,18 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
 
 	p = reftable_calloc(sizeof(struct file_block_source));
 	p->size = st.st_size;
-	p->fd = fd;
+	if (use_mmap) {
+		p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+		p->fd = fd;
+	} else {
+		p->data = xmalloc(st.st_size);
+		if (read_in_full(fd, p->data, st.st_size) != st.st_size) {
+			close(fd);
+			return -1;
+		}
+		close(fd);
+		p->fd = -1;
+	}
 
 	assert(!bs->ops);
 	bs->ops = &file_vtable;
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2024-01-08 12:18 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-08 12:18 [PATCH 0/4] reftable: optimize I/O patterns Patrick Steinhardt
2024-01-08 12:18 ` [PATCH 1/4] reftable/stack: refactor stack reloading to have common exit path Patrick Steinhardt
2024-01-10 17:30   ` Junio C Hamano
2024-01-11  7:33     ` Patrick Steinhardt
2024-01-08 12:18 ` [PATCH 2/4] reftable/stack: refactor reloading to use file descriptor Patrick Steinhardt
2024-01-10 19:57   ` Junio C Hamano
2024-01-08 12:18 ` [PATCH 3/4] reftable/stack: use stat info to avoid re-reading stack list Patrick Steinhardt
2024-01-10 20:07   ` Junio C Hamano
2024-01-11  7:41     ` Patrick Steinhardt
2024-01-08 12:18 ` Patrick Steinhardt [this message]
2024-01-10 21:24   ` [PATCH 4/4] reftable/blocksource: use mmap to read tables Junio C Hamano
2024-01-11  9:21     ` Patrick Steinhardt
2024-01-11 10:06 ` [PATCH v2 0/5] reftable: optimize I/O patterns Patrick Steinhardt
2024-01-11 10:06   ` [PATCH v2 1/5] reftable/stack: refactor stack reloading to have common exit path Patrick Steinhardt
2024-01-11 10:06   ` [PATCH v2 2/5] reftable/stack: refactor reloading to use file descriptor Patrick Steinhardt
2024-01-14 10:14     ` Jeff King
2024-01-15 10:03       ` Patrick Steinhardt
2024-01-16 15:14         ` Jeff King
2024-01-16 16:44           ` Junio C Hamano
2024-01-11 10:06   ` [PATCH v2 3/5] reftable/stack: use stat info to avoid re-reading stack list Patrick Steinhardt
2024-01-11 10:06   ` [PATCH v2 4/5] reftable/blocksource: refactor code to match our coding style Patrick Steinhardt
2024-01-11 10:06   ` [PATCH v2 5/5] reftable/blocksource: use mmap to read tables Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a23f38a80609a5c5a6931400ffd28a92b33838bb.1704714575.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=hanwenn@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).