git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gummerer <t.gummerer@gmail.com>
To: David Turner <dturner@twopensource.com>
Cc: git@vger.kernel.org, mhagger@alum.mit.edu
Subject: Re: [PATCH v3 19/20] refs: add LMDB refs backend
Date: Fri, 15 Jan 2016 14:33:17 +0100	[thread overview]
Message-ID: <20160115133317.GJ10612@hank> (raw)
In-Reply-To: <1452788777-24954-20-git-send-email-dturner@twopensource.com>

[I don't know too much about the refs code, but was still interested
in this patch series, so below are the few things that I noticed while
reading through it.]

On 01/14, David Turner wrote:
> Add a database backend for refs using LMDB.  This backend runs git
> for-each-ref about 30% faster than the files backend with fully-packed
> refs on a repo with ~120k refs.  It's also about 4x faster than using
> fully-unpacked refs.  In addition, and perhaps more importantly, it
> avoids case-conflict issues on OS X.
>
> LMDB has a few features that make it suitable for usage in git:
>
> 1. It is relatively lightweight; it requires only one header file, and
> the library code takes under 64k at runtime.
>
> 2. It is well-tested: it's been used in OpenLDAP for years.
>
> 3. It's very fast.  LMDB's benchmarks show that it is among
> the fastest key-value stores.
>
> 4. It has a relatively simple concurrency story; readers don't
> block writers and writers don't block readers.
>
> Ronnie Sahlberg's original version of this patchset used tdb.  The
> major disadvantage of tdb is that tdb is hard to build on OS X.  It's
> also not in homebrew.  So lmdb seemed simpler.
>
> To test this backend's correctness, I hacked test-lib.sh and
> test-lib-functions.sh to run all tests under the refs backend. Dozens
> of tests use manual ref/reflog reading/writing, or create submodules
> without passing --ref-storage to git init.  If those tests are
> changed to use the update-ref machinery or test-refs-lmdb-backend (or,
> in the case of packed-refs, corrupt refs, and dumb fetch tests, are
> skipped), the only remaining failing tests are the git-new-workdir
> tests and the gitweb tests.
>
> Signed-off-by: David Turner <dturner@twopensource.com>
> ---
>  .gitignore                                     |    1 +
>  Documentation/config.txt                       |    7 +
>  Documentation/git-clone.txt                    |    3 +-
>  Documentation/git-init.txt                     |    2 +-
>  Documentation/technical/refs-lmdb-backend.txt  |   52 +
>  Documentation/technical/repository-version.txt |    5 +
>  Makefile                                       |   12 +
>  config.c                                       |   29 +
>  configure.ac                                   |   33 +
>  contrib/workdir/git-new-workdir                |    3 +
>  refs.h                                         |    2 +
>  refs/lmdb-backend.c                            | 2051 ++++++++++++++++++++++++
>  setup.c                                        |   11 +-
>  test-refs-lmdb-backend.c                       |   64 +
>  transport.c                                    |    7 +-
>  15 files changed, 2275 insertions(+), 7 deletions(-)
>  create mode 100644 Documentation/technical/refs-lmdb-backend.txt
>  create mode 100644 refs/lmdb-backend.c
>  create mode 100644 test-refs-lmdb-backend.c
>

[...]

> diff --git a/Documentation/technical/refs-lmdb-backend.txt b/Documentation/technical/refs-lmdb-backend.txt
> new file mode 100644
> index 0000000..c328bfc
> --- /dev/null
> +++ b/Documentation/technical/refs-lmdb-backend.txt
> @@ -0,0 +1,52 @@
> +Notes on the LMDB refs backend
> +==============================
> +
> +Design:
> +------
> +
> +Refs and reflogs are stored in a lmdb database in .git/refdb.  All
> +keys and values are \0-terminated.
> +
> +Keys for refs are the name of the ref (e.g. refs/heads/master).
> +Values are the value of the ref, in hex
> +(e.g. 61f23eb0f81357c19fa91e2b8c6f3906c3a8f9b0).
> +
> +All per-worktree refs (refs/bisect/* and HEAD) are stored using
> +the traditional files-based backend.
> +
> +Reflogs are stored as a series of database entries.
> +
> +For non-empty reflogs, there is one entry per logged ref update.  The
> +key format is logs/[refname]\0[timestamp].  The timestamp is a 64-bit
> +unsigned integer number of nanoseconds since 1/1/1970, stored in
> +network byte order.  This means that reflog entries are
> +chronologically ordered.  Because LMDB is a btree database, we can
> +efficiently iterate over these keys.
> +
> +For an empty reflog, there is a "header" entry to show that a reflog
> +exists.  The header has the same format as an ordinary reflog, but with
> +a timeztamp of all zeros and an empty value.

s/timeztamp/timestamp/

> +
> +Reflog values are in the same format as the original files-based
> +reflog, including the trailing LF. The date in the reflog value
> +matches the date in the timestamp field.
> +
> +Weaknesses:
> +-----------
> +
> +The reflog format is somewhat inefficient: a binary format could store
> +reflog date/time information in somewhat less space.
> +
> +The rsync and file:// transports don't work yet, because they
> +don't use the refs API.
> +
> +git new-workdir is incompatible with the lmdb backend.  Fortunately,
> +git new-workdir is deprecated, and worktrees work fine.
> +
> +LMDB locks the entire database for write.  Any other writer waits
> +until the first writer is done before beginning.  Readers do not wait
> +for writers, and writers do not wait for readers.  The underlying
> +scheme is approximately MVCC; each reader's queries see the state of
> +the database as-of the time that the reader acquired its read lock.
> +This is not too far off from the files backend, which loads all refs
> +into memory when one is requested.

[...]

> diff --git a/contrib/workdir/git-new-workdir b/contrib/workdir/git-new-workdir
> index 888c34a..c035a43 100755
> --- a/contrib/workdir/git-new-workdir
> +++ b/contrib/workdir/git-new-workdir
> @@ -28,6 +28,9 @@ git_dir=$(cd "$orig_git" 2>/dev/null &&
>    git rev-parse --git-dir 2>/dev/null) ||
>    die "Not a git repository: \"$orig_git\""
>
> +
> +test "$(git config extensions.refstorage)" = "lmdb" && die "git-new-workdir is incompatible with the refs lmdb storage"
> +

Is it expected that other potential ref backends are compatible with
git-new-workdir?  Otherwise I think it would make more sense to
whitelist the files backend here instead of blacklisting the lmdb
backend, so we don't risk forgetting about this when adding another
backend.

>  case "$git_dir" in
>  .git)
>  	git_dir="$orig_git/.git"

[...]

> +static int lmdb_init_db(struct strbuf *err, int shared)
> +{
> +	/*
> +	 * To create a db, all we need to do is make a directory for
> +	 * it to live in; lmdb will do the rest.
> +	 */
> +
> +	if (!db_path)
> +		db_path = xstrdup(real_path(get_refdb_path(get_git_common_dir())));

I think we're leaking some memory from get_refdb_path() here.
get_refdb_path() uses strbuf_detach(), which according to its docstring
makes its caller take care of the memory of the returned string.
real_path() then uses strbuf_addstr() to add the string to its
internal strbuf, but leaves the string we get from get_refdb_path()
alone, so it leaks.

> +
> +	if (mkdir(db_path, 0775) && errno != EEXIST) {
> +		strbuf_addf(err, "%s", strerror(errno));
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +

[...]

> +/*
> + * Begin a transaction. Because only one transaction per thread is
> + * permitted, we use a global transaction object.  If a read-write
> + * transaction is presently already in-progress, and a read-only
> + * transaction is requested, the read-write transaction will be
> + * returned instead.  If a read-write transaction is requested and a
> + * read-only transaction is open, the read-only transaction will be
> + * closed.
> + *
> + * It is a bug to request a read-write transaction during another
> + * read-write transaction.
> + *
> + * As a result, it is unsafe to retain read-only transactions past the
> + * point where a read-write transaction might be needed.  For
> + * instance, any call that has callbacks outside this module must
> + * conclude all of its reads from the database before calling those
> + * callbacks, or must reacquire the transaction after its callbacks
> + * are completed.
> + */
> +int lmdb_transaction_begin_flags(struct strbuf *err, unsigned int flags)
> +{
> +	int ret;
> +	MDB_txn *txn;
> +	static size_t last_txnid = 0;
> +	int force_restart = 0;
> +	MDB_envinfo stat;
> +
> +	if (!db_path)
> +		db_path = xstrdup(real_path(get_refdb_path(get_git_common_dir())));

Same comment about leaking memory from get_refdb_path() as above applies.

> +	init_env(&env, db_path);
> +
> +	/*
> +	 * Since each transaction sees a consistent view of the db,
> +	 * downstream processes that write the db won't be seen in
> +	 * this transaction.  We can check if the last transaction id
> +	 * has changed since this read transaction was started, and if
> +	 * so, we want to reopen the transaction.
> +	 */
> +
> +	mdb_env_info(env, &stat);
> +	if (stat.me_last_txnid != last_txnid) {
> +		force_restart = 1;
> +		last_txnid = stat.me_last_txnid;
> +	}
> +
> +	if (!transaction.txn) {
> +		ret = mdb_txn_begin(env, NULL, flags, &txn);
> +		if (ret) {
> +			strbuf_addf(err, "mdb_txn_begin failed: %s",
> +				    mdb_strerror(ret));
> +			return -1;
> +		}
> +		ret = mdb_dbi_open(txn, NULL, 0, &transaction.dbi);
> +		if (ret) {
> +			strbuf_addf(err, "mdb_txn_open failed: %s",
> +				    mdb_strerror(ret));
> +			return -1;
> +		}
> +		transaction.txn = txn;
> +		transaction.flags = flags;
> +		return 0;
> +	}
> +
> +	if (transaction.flags == flags && !(flags & MDB_RDONLY))
> +		die("BUG: rw transaction started during another rw txn");
> +
> +	if (force_restart || (transaction.flags != flags && transaction.flags & MDB_RDONLY)) {
> +		/*
> +		 * RO -> RW, or forced restart due to possible changes
> +		 * from downstream processes.
> +		 */
> +		mdb_txn_abort(transaction.txn);
> +		ret = mdb_txn_begin(env, NULL, flags, &txn);
> +		if (ret) {
> +			strbuf_addf(err, "restarting txn: mdb_txn_begin failed: %s",
> +				    mdb_strerror(ret));
> +			return -1;
> +		}
> +		ret = mdb_dbi_open(txn, NULL, 0, &transaction.dbi);
> +		if (ret) {
> +			strbuf_addf(err, "mdb_txn_open failed: %s",
> +				    mdb_strerror(ret));
> +			return -1;
> +		}
> +		transaction.txn = txn;
> +		transaction.flags = flags;
> +	}
> +	/* RW -> RO just keeps the RW txn */
> +	return 0;
> +}

[...]

  reply	other threads:[~2016-01-15 13:32 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-14 16:25 [PATCH v3 00/20] refs backend rebase on pu David Turner
2016-01-14 16:25 ` [PATCH v3 01/20] refs: add a backend method structure with transaction functions David Turner
2016-01-14 16:25 ` [PATCH v3 02/20] refs: add methods for misc ref operations David Turner
2016-01-14 16:26 ` [PATCH v3 03/20] refs: add methods for the ref iterators David Turner
2016-01-14 16:26 ` [PATCH v3 04/20] refs: add do_for_each_per_worktree_ref David Turner
2016-01-14 16:26 ` [PATCH v3 05/20] refs: add methods for reflog David Turner
2016-01-14 16:26 ` [PATCH v3 06/20] refs: add method for initial ref transaction commit David Turner
2016-01-14 16:26 ` [PATCH v3 07/20] refs: add method for delete_refs David Turner
2016-01-14 16:26 ` [PATCH v3 08/20] refs: add methods to init refs db David Turner
2016-01-14 16:26 ` [PATCH v3 09/20] refs: add method to rename refs David Turner
2016-01-14 16:26 ` [PATCH v3 10/20] refs: make lock generic David Turner
2016-01-14 16:26 ` [PATCH v3 11/20] refs: move duplicate check to common code David Turner
2016-01-14 16:26 ` [PATCH v3 12/20] refs: allow log-only updates David Turner
2016-01-14 16:26 ` [PATCH v3 13/20] refs: resolve symbolic refs first David Turner
2016-02-04  7:37   ` Jeff King
2016-02-04 19:24     ` David Turner
2016-01-14 16:26 ` [PATCH v3 14/20] refs: always handle non-normal refs in files backend David Turner
2016-01-14 16:26 ` [PATCH v3 15/20] init: allow alternate backends to be set for new repos David Turner
2016-01-15 11:33   ` SZEDER Gábor
2016-01-15 12:51   ` Thomas Gummerer
2016-01-19 19:12     ` David Turner
2016-02-04  9:48   ` Duy Nguyen
2016-02-04 20:05     ` David Turner
2016-01-14 16:26 ` [PATCH v3 16/20] refs: check submodules ref storage config David Turner
2016-01-14 16:26 ` [PATCH v3 17/20] refs: allow ref backend to be set for clone David Turner
2016-01-15 11:32   ` SZEDER Gábor
2016-01-19 17:06     ` David Turner
2016-01-21  9:08       ` SZEDER Gábor
2016-01-14 16:26 ` [PATCH v3 18/20] svn: learn ref-storage argument David Turner
2016-01-15 11:34   ` SZEDER Gábor
2016-01-14 16:26 ` [PATCH v3 19/20] refs: add LMDB refs backend David Turner
2016-01-15 13:33   ` Thomas Gummerer [this message]
2016-01-19 18:55     ` David Turner
2016-02-04  9:58   ` Duy Nguyen
2016-02-04 19:33     ` David Turner
2016-01-14 16:26 ` [PATCH v3 20/20] refs: tests for lmdb backend David Turner
2016-02-02 20:08 ` [PATCH v3 00/20] refs backend rebase on pu David Turner
2016-02-02 22:13   ` Junio C Hamano
2016-02-04  0:09     ` Junio C Hamano
2016-02-04  1:12       ` David Turner
2016-02-04  1:54         ` Ramsay Jones
2016-02-04  2:58           ` Jeff King
2016-02-04 22:44             ` David Turner
2016-02-04 10:09   ` Duy Nguyen
2016-02-04 21:39     ` David Turner
2016-02-04 11:42   ` Duy Nguyen
2016-02-04 20:25     ` David Turner
2016-02-04 20:39       ` Ramsay Jones
2016-02-04 21:23         ` David Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160115133317.GJ10612@hank \
    --to=t.gummerer@gmail.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).