* [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff
2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
@ 2016-04-19 23:27 ` David Turner
2016-04-20 12:17 ` Johannes Schindelin
0 siblings, 1 reply; 7+ messages in thread
From: David Turner @ 2016-04-19 23:27 UTC (permalink / raw)
To: git, pclouds; +Cc: David Turner, Ramsay Jones
From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Instead of reading the index from disk and worrying about disk
corruption, the index is cached in memory (memory bit-flips happen
too, but hopefully less often). The result is faster read. Read time
is reduced by 70%.
The biggest gain is not having to verify the trailing SHA-1, which
takes lots of time especially on large index files. But this also
opens doors for further optimiztions:
- we could create an in-memory format that's essentially the memory
dump of the index to eliminate most of parsing/allocation
overhead. The mmap'd memory can be used straight away. Experiment
[1] shows we could reduce read time by 88%.
- we could cache non-index info such as name hash
Shared memory is done by storing files in a per-repository temporary
directory. This is more portable than shm (which requires
posix-realtime and has various quirks on OS X). It might even work on
Windows, although this has not been tested. The shared memory file's
name folows the template "shm-<object>-<SHA1>" where <SHA1> is the
trailing SHA-1 of the index file. <object> is "index" for cached index
files (and might later be "name-hash" for name-hash cache). If such
shared memory exists, it contains the same index content as on
disk. The content is already validated by the daemon and git won't
validate it again (except comparing the trailing SHA-1s).
We keep this daemon's logic as thin as possible. The "brain" stays in
git. So the daemon can read and validate stuff, but that's all it's
allowed to do. It does not add/create new information. It doesn't even
accept direct updates from git.
Git can poke the daemon via unix domain sockets to tell it to refresh
the index cache, or to keep it alive some more minutes. It can't give
any real index data directly to the daemon. Real data goes to disk
first, then the daemon reads and verifies it from there. Poking only
happens for $GIT_DIR/index, not temporary index files.
$GIT_DIR/index-helper.sock is the socket for the daemon process. The
daemon reads from the socket and executes commands.
Named pipes were considered for portability reasons, but then commands
that need replies from the daemon would have open their own pipes,
since a named pipe should only have one reader. Unix domain sockets
don't have this problem.
On webkit.git with index format v2, duplicating 8 times to 1.5m
entries and 236MB in size:
(vanilla) 0.50 s: read_index_from .git/index
(index-helper) 0.18 s: read_index_from .git/index
Interestingly with index v4, we get less out of index-helper. It makes
sense as v4 requires more processing after loading the index:
(vanilla) 0.37 s: read_index_from .git/index
(index-helper) 0.22 s: read_index_from .git/index
[1] http://thread.gmane.org/gmane.comp.version-control.git/247268/focus=248771
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
---
.gitignore | 1 +
Documentation/git-index-helper.txt | 47 +++++++
Makefile | 5 +
cache.h | 2 +
git-compat-util.h | 1 +
index-helper.c | 276 +++++++++++++++++++++++++++++++++++++
read-cache.c | 120 ++++++++++++++--
t/t7900-index-helper.sh | 23 ++++
8 files changed, 466 insertions(+), 9 deletions(-)
create mode 100644 Documentation/git-index-helper.txt
create mode 100644 index-helper.c
create mode 100755 t/t7900-index-helper.sh
diff --git a/.gitignore b/.gitignore
index 5087ce1..b92f122 100644
--- a/.gitignore
+++ b/.gitignore
@@ -71,6 +71,7 @@
/git-http-fetch
/git-http-push
/git-imap-send
+/git-index-helper
/git-index-pack
/git-init
/git-init-db
diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
new file mode 100644
index 0000000..77687c0
--- /dev/null
+++ b/Documentation/git-index-helper.txt
@@ -0,0 +1,47 @@
+git-index-helper(1)
+===================
+
+NAME
+----
+git-index-helper - A simple cache daemon for speeding up index file access
+
+SYNOPSIS
+--------
+[verse]
+'git index-helper' [options]
+
+DESCRIPTION
+-----------
+Keep the index file in memory for faster access. This daemon is per
+repository.
+
+OPTIONS
+-------
+
+--exit-after=<n>::
+ Exit if the cached index is not accessed for `<n>`
+ seconds. Specify 0 to wait forever. Default is 600.
+
+NOTES
+-----
+
+$GIT_DIR/index-helper.sock a Unix domain socket that the daemon reads
+commands from. The directory will also contain files named
+"shm-index-<SHA1>". These are used as backing stores for shared
+memory. Normally the daemon will clean up these files when it exits
+or when they are no longer relevant. But if it crashes, some objects
+could remain there and they can be safely deleted with "rm"
+command. The following commands are used to control the daemon:
+
+"refresh"::
+ Reread the index.
+
+"poke":
+ Let the daemon know the index is to be read. It keeps the
+ daemon alive longer, unless `--exit-after=0` is used.
+
+All commands and replies are terminated by a 0 byte.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 2742a69..c8be0e7 100644
--- a/Makefile
+++ b/Makefile
@@ -1433,6 +1433,10 @@ ifdef HAVE_DEV_TTY
BASIC_CFLAGS += -DHAVE_DEV_TTY
endif
+ifndef NO_MMAP
+ PROGRAM_OBJS += index-helper.o
+endif
+
ifdef DIR_HAS_BSD_GROUP_SEMANTICS
COMPAT_CFLAGS += -DDIR_HAS_BSD_GROUP_SEMANTICS
endif
@@ -2159,6 +2163,7 @@ GIT-BUILD-OPTIONS: FORCE
@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
+ @echo NO_MMAP=\''$(subst ','\'',$(subst ','\'',$(NO_MMAP)))'\' >>$@+
ifdef TEST_OUTPUT_DIRECTORY
@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@+
endif
diff --git a/cache.h b/cache.h
index 4180e2b..43fb314 100644
--- a/cache.h
+++ b/cache.h
@@ -334,6 +334,8 @@ struct index_state {
struct cache_time timestamp;
unsigned name_hash_initialized : 1,
keep_mmap : 1,
+ from_shm : 1,
+ to_shm : 1,
initialized : 1;
struct hashmap name_hash;
struct hashmap dir_hash;
diff --git a/git-compat-util.h b/git-compat-util.h
index c07e0c1..56945a7 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -513,6 +513,7 @@ static inline int ends_with(const char *str, const char *suffix)
#define PROT_READ 1
#define PROT_WRITE 2
#define MAP_PRIVATE 1
+#define MAP_SHARED 2
#endif
#define mmap git_mmap
diff --git a/index-helper.c b/index-helper.c
new file mode 100644
index 0000000..d64e49b
--- /dev/null
+++ b/index-helper.c
@@ -0,0 +1,276 @@
+#include "cache.h"
+#include "parse-options.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "exec_cmd.h"
+#include "split-index.h"
+#include "lockfile.h"
+#include "cache.h"
+#include "unix-socket.h"
+
+struct shm {
+ unsigned char sha1[20];
+ void *shm;
+ size_t size;
+};
+
+static struct shm shm_index;
+static struct shm shm_base_index;
+
+static void release_index_shm(struct shm *is)
+{
+ if (!is->shm)
+ return;
+ munmap(is->shm, is->size);
+ unlink(git_path("shm-index-%s", sha1_to_hex(is->sha1)));
+ is->shm = NULL;
+}
+
+static void cleanup_shm(void)
+{
+ release_index_shm(&shm_index);
+ release_index_shm(&shm_base_index);
+}
+
+static void cleanup(void)
+{
+ unlink(git_path("index-helper.sock"));
+ cleanup_shm();
+}
+
+static void cleanup_on_signal(int sig)
+{
+ /* We ignore sigpipes -- that's just a client being broken. */
+ if (sig == SIGPIPE)
+ return;
+ cleanup();
+ sigchain_pop(sig);
+ raise(sig);
+}
+
+static int shared_mmap_create(int file_flags, int file_mode, size_t size,
+ void **new_mmap, int mmap_prot, int mmap_flags,
+ const char *path)
+{
+ int fd = -1;
+ int ret = -1;
+
+ fd = open(path, file_flags, file_mode);
+
+ if (fd < 0)
+ goto done;
+
+ if (ftruncate(fd, size))
+ goto done;
+
+ *new_mmap = mmap(NULL, size, mmap_prot, mmap_flags, fd, 0);
+
+ if (*new_mmap == MAP_FAILED) {
+ *new_mmap = NULL;
+ goto done;
+ }
+ madvise(new_mmap, size, MADV_WILLNEED);
+
+ ret = 0;
+done:
+ if (fd > 0)
+ close(fd);
+ return ret;
+}
+
+static void share_index(struct index_state *istate, struct shm *is)
+{
+ void *new_mmap;
+ if (istate->mmap_size <= 20 ||
+ hashcmp(istate->sha1,
+ (unsigned char *)istate->mmap + istate->mmap_size - 20) ||
+ !hashcmp(istate->sha1, is->sha1) ||
+ shared_mmap_create(O_CREAT | O_EXCL | O_RDWR, 0700,
+ istate->mmap_size, &new_mmap,
+ PROT_READ | PROT_WRITE, MAP_SHARED,
+ git_path("shm-index-%s",
+ sha1_to_hex(istate->sha1))) < 0)
+ return;
+
+ release_index_shm(is);
+ is->size = istate->mmap_size;
+ is->shm = new_mmap;
+ hashcpy(is->sha1, istate->sha1);
+
+ memcpy(new_mmap, istate->mmap, istate->mmap_size - 20);
+
+ /*
+ * The trailing hash must be written last after everything is
+ * written. It's the indication that the shared memory is now
+ * ready.
+ * The memory barrier here matches read-cache.c:try_shm.
+ */
+ __sync_synchronize();
+
+ hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);
+}
+
+static void share_the_index(void)
+{
+ if (the_index.split_index && the_index.split_index->base)
+ share_index(the_index.split_index->base, &shm_base_index);
+ share_index(&the_index, &shm_index);
+ discard_index(&the_index);
+}
+
+static void set_socket_blocking_flag(int fd, int make_nonblocking)
+{
+ int flags;
+
+ flags = fcntl(fd, F_GETFL, NULL);
+
+ if (flags < 0)
+ die(_("fcntl failed"));
+
+ if (make_nonblocking)
+ flags |= O_NONBLOCK;
+ else
+ flags &= ~O_NONBLOCK;
+
+ if (fcntl(fd, F_SETFL, flags) < 0)
+ die(_("fcntl failed"));
+}
+
+static void refresh(void)
+{
+ discard_index(&the_index);
+ the_index.keep_mmap = 1;
+ the_index.to_shm = 1;
+ if (read_cache() < 0)
+ die(_("could not read index"));
+ share_the_index();
+}
+
+#ifndef NO_MMAP
+
+static void loop(int fd, int idle_in_seconds)
+{
+ struct timeval timeout;
+ struct timeval *timeout_p;
+
+ while (1) {
+ fd_set readfds;
+ int result, client_fd;
+ struct strbuf command = STRBUF_INIT;
+
+ /* need to reset timer in case select() decremented it */
+ if (idle_in_seconds) {
+ timeout.tv_usec = 0;
+ timeout.tv_sec = idle_in_seconds;
+ timeout_p = &timeout;
+ } else {
+ timeout_p = NULL;
+ }
+
+ /* Wait for a request */
+ FD_ZERO(&readfds);
+ FD_SET(fd, &readfds);
+ result = select(fd + 1, &readfds, NULL, NULL, timeout_p);
+ if (result < 0) {
+ if (errno == EINTR)
+ /*
+ * This can lead to an overlong keepalive,
+ * but that is better than a premature exit.
+ */
+ continue;
+ die_errno(_("select() failed"));
+ }
+ if (result == 0)
+ /* timeout */
+ break;
+
+ client_fd = accept(fd, NULL, NULL);
+ if (client_fd < 0)
+ /*
+ * An error here is unlikely -- it probably
+ * indicates that the connecting process has
+ * already dropped the connection.
+ */
+ continue;
+
+ /*
+ * Our connection to the client is blocking since a client
+ * can always be killed by SIGINT or similar.
+ */
+ set_socket_blocking_flag(client_fd, 0);
+
+ if (strbuf_getwholeline_fd(&command, client_fd, '\0') == 0) {
+ if (!strcmp(command.buf, "refresh")) {
+ refresh();
+ } else if (!strcmp(command.buf, "poke")) {
+ /*
+ * Just a poke to keep us
+ * alive, nothing to do.
+ */
+ } else {
+ warning("BUG: Bogus command %s", command.buf);
+ }
+ } else {
+ /*
+ * No command from client. Probably it's just
+ * a liveness check. Just close up.
+ */
+ }
+ close(client_fd);
+ strbuf_release(&command);
+ }
+
+ close(fd);
+}
+
+#else
+
+static void loop(int fd, int idle_in_seconds)
+{
+ die(_("index-helper is not supported on this platform"));
+}
+
+#endif
+
+static const char * const usage_text[] = {
+ N_("git index-helper [options]"),
+ NULL
+};
+
+int main(int argc, char **argv)
+{
+ const char *prefix;
+ int idle_in_seconds = 600;
+ int fd;
+ struct strbuf socket_path = STRBUF_INIT;
+ struct option options[] = {
+ OPT_INTEGER(0, "exit-after", &idle_in_seconds,
+ N_("exit if not used after some seconds")),
+ OPT_END()
+ };
+
+ git_extract_argv0_path(argv[0]);
+ git_setup_gettext();
+
+ if (argc == 2 && !strcmp(argv[1], "-h"))
+ usage_with_options(usage_text, options);
+
+ prefix = setup_git_directory();
+ if (parse_options(argc, (const char **)argv, prefix,
+ options, usage_text, 0))
+ die(_("too many arguments"));
+
+ atexit(cleanup);
+ sigchain_push_common(cleanup_on_signal);
+
+ strbuf_git_path(&socket_path, "index-helper.sock");
+
+ fd = unix_stream_listen(socket_path.buf);
+ if (fd < 0)
+ die_errno(_("could not set up index-helper socket"));
+
+ loop(fd, idle_in_seconds);
+
+ close(fd);
+ return 0;
+}
diff --git a/read-cache.c b/read-cache.c
index 7e387e9..af2101f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -18,6 +18,7 @@
#include "varint.h"
#include "split-index.h"
#include "utf8.h"
+#include "unix-socket.h"
static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
unsigned int options);
@@ -1541,6 +1542,95 @@ static void post_read_index_from(struct index_state *istate)
tweak_untracked_cache(istate);
}
+static int poke_daemon(struct index_state *istate,
+ const struct stat *st, int refresh_cache)
+{
+ int fd;
+ int ret = 0;
+ const char *socket_path;
+
+ /* if this is from index-helper, do not poke itself (recursively) */
+ if (istate->to_shm)
+ return 0;
+
+ socket_path = git_path("index-helper.sock");
+ if (!socket_path)
+ return -1;
+
+ fd = unix_stream_connect(socket_path);
+ if (refresh_cache) {
+ ret = write_in_full(fd, "refresh", 8) != 8;
+ } else {
+ ret = write_in_full(fd, "poke", 5) != 5;
+ }
+
+ close(fd);
+ return ret;
+}
+
+static int is_main_index(struct index_state *istate)
+{
+ return istate == &the_index ||
+ (the_index.split_index &&
+ istate == the_index.split_index->base);
+}
+
+/*
+ * Try to open and verify a cached shm index if available. Return 0 if
+ * succeeds (istate->mmap and istate->mmap_size are updated). Return
+ * negative otherwise.
+ */
+static int try_shm(struct index_state *istate)
+{
+ void *new_mmap = NULL;
+ size_t old_size = istate->mmap_size;
+ ssize_t new_size;
+ const unsigned char *sha1;
+ struct stat st;
+ int fd = -1;
+
+ if (!is_main_index(istate) ||
+ old_size <= 20 ||
+ stat(git_path("index-helper.sock"), &st))
+ return -1;
+ if (poke_daemon(istate, &st, 0))
+ return -1;
+ sha1 = (unsigned char *)istate->mmap + old_size - 20;
+
+ fd = open(git_path("shm-index-%s", sha1_to_hex(sha1)), O_RDONLY);
+ if (fd < 0)
+ goto fail;
+
+ if (fstat(fd, &st))
+ goto fail;
+
+ new_size = st.st_size;
+ new_mmap = mmap(NULL, new_size, PROT_READ, MAP_SHARED, fd, 0);
+ if (new_size <= 20 ||
+ hashcmp((unsigned char *)istate->mmap + old_size - 20,
+ (unsigned char *)new_mmap + new_size - 20)) {
+ if (new_mmap)
+ munmap(new_mmap, new_size);
+ goto fail;
+ }
+
+ /* The memory barrier here matches index-helper.c:share_index. */
+ __sync_synchronize();
+
+ munmap(istate->mmap, istate->mmap_size);
+ istate->mmap = new_mmap;
+ istate->mmap_size = new_size;
+ istate->from_shm = 1;
+ close(fd);
+ return 0;
+
+fail:
+ if (fd >= 0)
+ close(fd);
+ poke_daemon(istate, &st, 1);
+ return -1;
+}
+
/* remember to discard_cache() before reading a different cache! */
int do_read_index(struct index_state *istate, const char *path, int must_exist)
{
@@ -1555,6 +1645,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
if (istate->initialized)
return istate->cache_nr;
+ istate->from_shm = 0;
istate->timestamp.sec = 0;
istate->timestamp.nsec = 0;
fd = open(path, O_RDONLY);
@@ -1574,15 +1665,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (mmap == MAP_FAILED)
die_errno("unable to map index file");
- if (istate->keep_mmap) {
- istate->mmap = mmap;
- istate->mmap_size = mmap_size;
- }
close(fd);
- hdr = mmap;
- if (verify_hdr(hdr, mmap_size) < 0)
+ istate->mmap = mmap;
+ istate->mmap_size = mmap_size;
+ if (try_shm(istate) &&
+ verify_hdr(istate->mmap, istate->mmap_size) < 0)
goto unmap;
+ hdr = mmap = istate->mmap;
+ mmap_size = istate->mmap_size;
+ if (!istate->keep_mmap)
+ istate->mmap = NULL;
hashcpy(istate->sha1, (const unsigned char *)hdr + mmap_size - 20);
istate->version = ntohl(hdr->hdr_version);
@@ -1662,6 +1755,8 @@ int read_index_from(struct index_state *istate, const char *path)
else
split_index->base = xcalloc(1, sizeof(*split_index->base));
split_index->base->keep_mmap = istate->keep_mmap;
+ split_index->base->to_shm = istate->to_shm;
+ split_index->base->from_shm = istate->from_shm;
ret = do_read_index(split_index->base,
git_path("sharedindex.%s",
sha1_to_hex(split_index->base_sha1)), 1);
@@ -1712,6 +1807,8 @@ int discard_index(struct index_state *istate)
discard_split_index(istate);
free_untracked_cache(istate->untracked);
istate->untracked = NULL;
+ istate->from_shm = 0;
+ istate->to_shm = 0;
return 0;
}
@@ -2138,9 +2235,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
return ret;
assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
(COMMIT_LOCK | CLOSE_LOCK));
- if (flags & COMMIT_LOCK)
- return commit_locked_index(lock);
- else if (flags & CLOSE_LOCK)
+ if (flags & COMMIT_LOCK) {
+ struct stat st;
+ ret = commit_locked_index(lock);
+ if (!ret && is_main_index(istate) &&
+ !stat(git_path("index-helper.sock"), &st))
+ poke_daemon(istate, &st, 1);
+ return ret;
+ } else if (flags & CLOSE_LOCK)
return close_lock_file(lock);
else
return ret;
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
new file mode 100755
index 0000000..114c112
--- /dev/null
+++ b/t/t7900-index-helper.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+#
+# Copyright (c) 2016, Twitter, Inc
+#
+
+test_description='git-index-helper
+
+Testing git index-helper
+'
+
+. ./test-lib.sh
+
+test -n "$NO_MMAP" && {
+ skip_all='skipping index-helper tests: no mmap'
+ test_done
+}
+
+test_expect_success 'index-helper smoke test' '
+ git index-helper --exit-after 1 &&
+ test_path_is_missing .git/index-helper.sock
+'
+
+test_done
--
2.4.2.767.g62658d5-twtrsrc
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff
@ 2016-04-20 0:31 Duy Nguyen
2016-04-20 0:44 ` David Turner
2016-04-20 12:19 ` Johannes Schindelin
0 siblings, 2 replies; 7+ messages in thread
From: Duy Nguyen @ 2016-04-20 0:31 UTC (permalink / raw)
To: David Turner; +Cc: Git Mailing List, Ramsay Jones
On Wed, Apr 20, 2016 at 6:27 AM, David Turner <dturner@twopensource.com> wrote:
> Shared memory is done by storing files in a per-repository temporary
> directory. This is more portable than shm (which requires
> posix-realtime and has various quirks on OS X). It might even work on
> Windows, although this has not been tested.
There's another option, but I'm not sure if it's too clever/tricky to
do. Anyway, on *nix we can send file descriptors over unix socket [2],
then mmap them back to access content. On Windows, it looks like
DuplicateHandle [1] can do nearly the same thing. This keeps
everything in memory and we don't have to worry about cleaning up
shm-* files.
[1] http://lackingrhoticity.blogspot.com/2015/05/passing-fds-handles-between-processes.html
[2] http://www.normalesup.org/~george/comp/libancillary/
--
Duy
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff
2016-04-20 0:31 [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff Duy Nguyen
@ 2016-04-20 0:44 ` David Turner
2016-04-20 12:19 ` Johannes Schindelin
1 sibling, 0 replies; 7+ messages in thread
From: David Turner @ 2016-04-20 0:44 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Git Mailing List, Ramsay Jones
On Wed, 2016-04-20 at 07:31 +0700, Duy Nguyen wrote:
> On Wed, Apr 20, 2016 at 6:27 AM, David Turner <
> dturner@twopensource.com> wrote:
> > Shared memory is done by storing files in a per-repository
> > temporary
> > directory. This is more portable than shm (which requires
> > posix-realtime and has various quirks on OS X). It might even work
> > on
> > Windows, although this has not been tested.
>
> There's another option, but I'm not sure if it's too clever/tricky to
> do. Anyway, on *nix we can send file descriptors over unix socket
> [2],
> then mmap them back to access content. On Windows, it looks like
> DuplicateHandle [1] can do nearly the same thing. This keeps
> everything in memory and we don't have to worry about cleaning up
> shm-* files.
>
> [1] http://lackingrhoticity.blogspot.com/2015/05/passing-fds-handles-
> between-processes.html
> [2] http://www.normalesup.org/~george/comp/libancillary/
It's possibly a bit simpler for the index, although more complex
overall since we still need to write temp files for the watchman data.
Will consider/try.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff
2016-04-19 23:27 ` [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff David Turner
@ 2016-04-20 12:17 ` Johannes Schindelin
2016-04-20 12:31 ` Duy Nguyen
2016-04-20 19:38 ` David Turner
0 siblings, 2 replies; 7+ messages in thread
From: Johannes Schindelin @ 2016-04-20 12:17 UTC (permalink / raw)
To: David Turner; +Cc: git, pclouds, Ramsay Jones
Hi Dave,
(apologies in advance if I may bring up anything that has been discussed
in earlier iterations; I simply was too busy with the rebase--helper
project to even look.)
On Tue, 19 Apr 2016, David Turner wrote:
> Shared memory is done by storing files in a per-repository temporary
> directory. This is more portable than shm (which requires
> posix-realtime and has various quirks on OS X). It might even work on
> Windows, although this has not been tested. The shared memory file's
> name folows the template "shm-<object>-<SHA1>" where <SHA1> is the
s/folows/follows/
And: now that it is no longer shared memory, should we not do away with
the "shm-" prefix?
> trailing SHA-1 of the index file. <object> is "index" for cached index
> files (and might later be "name-hash" for name-hash cache). If such
> shared memory exists, it contains the same index content as on
> disk. The content is already validated by the daemon and git won't
> validate it again (except comparing the trailing SHA-1s).
Excellent. I was worrying about the penalty of validating.
> We keep this daemon's logic as thin as possible. The "brain" stays in
> git. So the daemon can read and validate stuff, but that's all it's
> allowed to do. It does not add/create new information. It doesn't even
> accept direct updates from git.
I like this design. For now. Later, I really think we should add the
ability to update the index via the index-helper. I am thinking about a
method similar to watchman where a separate process (that may use the
inotify syscall on Linux, or tap into the NTFS journal on Windows) tells
the index-helper specifically which paths to look at, so that nobody ever
has to look at any files that were not modified at all.
> + if (*new_mmap == MAP_FAILED) {
> + *new_mmap = NULL;
> + goto done;
Shouldn't we provide some sort of error message here?
> + }
> + madvise(new_mmap, size, MADV_WILLNEED);
I guess I'll need to add support for that to compat/win32mmap.c (most
likely using PrefetchVirtualMemory() when available, i.e. Windows >= 8, see
https://msdn.microsoft.com/en-us/library/windows/desktop/hh780543.aspx :-))
Likewise, I think it will be easy to use bidirectional named pipes on
Windows to accommodate for the lack of Unix domain sockets...
Ciao,
Dscho
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff
2016-04-20 0:31 [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff Duy Nguyen
2016-04-20 0:44 ` David Turner
@ 2016-04-20 12:19 ` Johannes Schindelin
1 sibling, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2016-04-20 12:19 UTC (permalink / raw)
To: Duy Nguyen; +Cc: David Turner, Git Mailing List, Ramsay Jones
Hi Duy,
On Wed, 20 Apr 2016, Duy Nguyen wrote:
> On Wed, Apr 20, 2016 at 6:27 AM, David Turner <dturner@twopensource.com> wrote:
> > Shared memory is done by storing files in a per-repository temporary
> > directory. This is more portable than shm (which requires
> > posix-realtime and has various quirks on OS X). It might even work on
> > Windows, although this has not been tested.
>
> There's another option, but I'm not sure if it's too clever/tricky to
> do. Anyway, on *nix we can send file descriptors over unix socket [2],
> then mmap them back to access content.
This sounds too clever to me ;-)
Ciao,
Dscho
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff
2016-04-20 12:17 ` Johannes Schindelin
@ 2016-04-20 12:31 ` Duy Nguyen
2016-04-20 19:38 ` David Turner
1 sibling, 0 replies; 7+ messages in thread
From: Duy Nguyen @ 2016-04-20 12:31 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: David Turner, Git Mailing List, Ramsay Jones
On Wed, Apr 20, 2016 at 7:17 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>> We keep this daemon's logic as thin as possible. The "brain" stays in
>> git. So the daemon can read and validate stuff, but that's all it's
>> allowed to do. It does not add/create new information. It doesn't even
>> accept direct updates from git.
>
> I like this design. For now. Later, I really think we should add the
> ability to update the index via the index-helper.
Later we do. For watchman, index-helper creates and shares a piece of
information retrieved from watchman, which has to be combined back to
the index by git process. But it's still not a _direct_ index update.
> I am thinking about a
> method similar to watchman where a separate process (that may use the
> inotify syscall on Linux, or tap into the NTFS journal on Windows) tells
> the index-helper specifically which paths to look at, so that nobody ever
> has to look at any files that were not modified at all.
Am I missing something here? I thought watchman could already run on
Windows. We benefit from watchman, which was originally written for
Mercurial. If there's a better way to do inotify equivalent for
Windows, is it possible to do it in watchman so Mercurial can benefit
from it too?
On Wed, Apr 20, 2016 at 7:19 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>> On Wed, Apr 20, 2016 at 6:27 AM, David Turner <dturner@twopensource.com> wrote:
>> > Shared memory is done by storing files in a per-repository temporary
>> > directory. This is more portable than shm (which requires
>> > posix-realtime and has various quirks on OS X). It might even work on
>> > Windows, although this has not been tested.
>>
>> There's another option, but I'm not sure if it's too clever/tricky to
>> do. Anyway, on *nix we can send file descriptors over unix socket [2],
>> then mmap them back to access content.
>
> This sounds too clever to me ;-)
Well. At least we have considered everything (that I'm aware of).
--
Duy
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff
2016-04-20 12:17 ` Johannes Schindelin
2016-04-20 12:31 ` Duy Nguyen
@ 2016-04-20 19:38 ` David Turner
1 sibling, 0 replies; 7+ messages in thread
From: David Turner @ 2016-04-20 19:38 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, pclouds, Ramsay Jones
On Wed, 2016-04-20 at 14:17 +0200, Johannes Schindelin wrote:
> Hi Dave,
>
> (apologies in advance if I may bring up anything that has been
> discussed
> in earlier iterations; I simply was too busy with the rebase--helper
> project to even look.)
>
> On Tue, 19 Apr 2016, David Turner wrote:
>
> > Shared memory is done by storing files in a per-repository
> > temporary
> > directory. This is more portable than shm (which requires
> > posix-realtime and has various quirks on OS X). It might even work
> > on
> > Windows, although this has not been tested. The shared memory
> > file's
> > name folows the template "shm-<object>-<SHA1>" where <SHA1> is the
>
> s/folows/follows/
Will fix, thanks.
> And: now that it is no longer shared memory, should we not do away
> with
> the "shm-" prefix?
Hm. It's intended to be shared via MAP_SHARED. So it is just a "disk
-backed" shared memory object instead of a POSIX shm object.
> > + if (*new_mmap == MAP_FAILED) {
> > + *new_mmap = NULL;
> > + goto done;
>
> Shouldn't we provide some sort of error message here?
We generally try to fail silently on index-helper failures -- the user
isn't necessarily expecting output at that point, and we can always
fall back to directly reading the index.
On the other hand, this error should pretty much never happen, so maybe
it's worth calling out?
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-04-20 19:39 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-20 0:31 [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff Duy Nguyen
2016-04-20 0:44 ` David Turner
2016-04-20 12:19 ` Johannes Schindelin
-- strict thread matches above, loose matches on Subject: below --
2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
2016-04-19 23:27 ` [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff David Turner
2016-04-20 12:17 ` Johannes Schindelin
2016-04-20 12:31 ` Duy Nguyen
2016-04-20 19:38 ` David Turner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).