From: Duy Nguyen <pclouds@gmail.com>
To: David Turner <dturner@twopensource.com>
Cc: git mailing list <git@vger.kernel.org>,
mhagger@alum.mit.edu, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v5 25/27] refs: add LMDB refs storage backend
Date: Sat, 20 Feb 2016 09:58:27 +0700 [thread overview]
Message-ID: <20160220025826.GA9338@lanh> (raw)
In-Reply-To: <1455922186.7528.97.camel@twopensource.com>
> On Fri, 2016-02-19 at 09:54 +0700, Duy Nguyen wrote:
>> On Fri, Feb 19, 2016 at 3:23 AM, David Turner <
>> dturner@twopensource.com> wrote:
>> > > > +static int read_per_worktree_ref(const char *submodule, const
>> > > > char
>> > > > *refname,
>> > > > + struct MDB_val *val, int
>> > > > *needs_free)
>> > >
>> > > From what I read, I suspect these _per_worktree functions will be
>> > > identical for the next backend. Should we just hand over the job
>> > > for
>> > > files backend? For all entry points that may deal with per
>> > > -worktree
>> > > refs, e.g. lmdb_resolve_ref_unsafe, can we check ref_type() first
>> > > thing, if it's per-worktree we call
>> > > refs_be_files.resolve_ref_unsafe()
>> > > instead? It could even be done at frontend level,
>> > > e.g. refs.c:resolve_ref_unsafe().
>> > >
>> > > Though I may be talking rubbish here because I don't know how
>> > > whether
>> > > it has anything to do with transactions.
>> >
>> > The reason I did it this way is that some ref chains cross backend
>> > boundaries (e.g. HEAD -> refs/heads/master). But if we have other
>> > backends later, we could generalize.
>>
>> Crossing backends should go through frontend again, imo. But I don't
>> really know if it's efficient.
>
> It's pretty tricky to maintain state (e.g. count of symref redirects)
> across that barrier. So I'm not sure how to do it cleanly.
I notice files backend does pretty much the same thing. "files"
backend looks more like two backends combined in one, one is files,
the other is packed-refs. And it looks like we could solve it by
providing a lower level api, read_raw_ref() or something, that
retrieves the ref without any validation or link following. More on
this later.
>> > > I'm not sure I get this comment. D/F conflicts are no longer a
>> > > thing
>> > > for lmdb backend, right?
>> >
>> > I'm trying to avoid the lmdb backend creating a set of refs that
>> > the
>> > files backend can't handle. This would make collaboration with
>> > other
>> > versions of git more difficult.
>>
>> It already is. If you create refs "foo" and "FOO" on case sensitive
>> file system and clone it on a case-insensitive one, you face the same
>> problem. We may have an optional configuration knob to prevent
>> incompatibilities with files backend, but I think that should be done
>> (and enforced if necessary) outside backends.
>
> Sure, the current state isn't perfect, but why make it worse?
I see it from a different angle. The current state isn't perfect, but
we will be moving to a better future where "files" backend may
eventually be deprecated. Why hold back?
But this line of reasoning only works if we have a new backend capable
of replacing "files" without regressions or introducing new
dependency. Which is why I suggest a new backend [1] (or implement
Shawn's RefTree if it's proven as good with small repos)
I have no problem if you want to stay strictly compatible with "files"
though.
[1] http://thread.gmane.org/gmane.comp.version-control.git/285893/focus=286654
On Fri, Feb 19, 2016 at 05:49:46PM -0500, David Turner wrote:
> >
> > This code looks a lot like near the end of resolve_ref_1(). Maybe we
> > could share the code in refs/backend-common.c or something and call
> > here instead?
>
> Something like the following?
>
> commit aad6b84fd1869f6e1cf6ed15bcece0c2f6429e9d
> Author: David Turner <dturner@twopensource.com>
> Date: Thu Feb 18 17:09:29 2016 -0500
>
> refs: break out some functions from resolve_ref_1
>
> A bunch of resolve_ref_1 is not backend-specific, so we can
> break it out into separate internal functions that other
> backends can use.
...
>
> I'm not sure I like it, because it breaks out these weird tiny
> functions that take a lot of arguments. But maybe it's worth it? What
> do you think?
OK how about we keep resolve_ref_1() whole and split real backend code
out? Something like these three patches (only built, did not test). A
bit ugly with continue_symlink, but it's just demonstration.
commit ef46fcdc62ef89fd5260ca054cd1d98f9f2d7c2b
Author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Date: Sat Feb 20 09:18:45 2016 +0700
refs/files: move ref I/O code out of resolve_refs_1()
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 4bddfb3..f54f2ae 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1407,6 +1407,95 @@ static int resolve_missing_loose_ref(const char *refname,
}
}
+static const char *continue_normal_ref = "read_ref returns a normal ref";
+static const char *continue_symlink = "read_ref returns a symlink";
+
+/*
+ * Read a ref from backend. Returning any values except
+ * continue_normal_ref or continue_symlink ends resolve_ref_1()
+ * execution. If the return value is not NULL, sha1 and flags must be
+ * updated correctly. except REF_ISBROKEN which is set by
+ * resolve_ref_1().
+ *
+ * If continue_* is returned, sb_contents must contain the ref data.
+ */
+static const char *parse_ref(const char *refname,
+ int resolve_flags,
+ unsigned char *sha1,
+ int *flags,
+ struct strbuf *sb_path,
+ struct strbuf *sb_contents)
+{
+ const char *path;
+ struct stat st;
+ int fd;
+
+ strbuf_reset(sb_path);
+ strbuf_git_path(sb_path, "%s", refname);
+ path = sb_path->buf;
+
+ /*
+ * We might have to loop back here to avoid a race
+ * condition: first we lstat() the file, then we try
+ * to read it as a link or as a file. But if somebody
+ * changes the type of the file (file <-> directory
+ * <-> symlink) between the lstat() and reading, then
+ * we don't want to report that as an error but rather
+ * try again starting with the lstat().
+ */
+stat_ref:
+ if (lstat(path, &st) < 0) {
+ if (errno != ENOENT)
+ return NULL;
+ if (resolve_missing_loose_ref(refname, resolve_flags,
+ sha1, flags))
+ return NULL;
+ return refname;
+ }
+
+ /* Follow "normalized" - ie "refs/.." symlinks by hand */
+ if (S_ISLNK(st.st_mode)) {
+ strbuf_reset(sb_contents);
+ if (strbuf_readlink(sb_contents, path, 0) < 0) {
+ if (errno == ENOENT || errno == EINVAL)
+ /* inconsistent with lstat; retry */
+ goto stat_ref;
+ else
+ return NULL;
+ }
+ return continue_symlink;
+ }
+
+ /* Is it a directory? */
+ if (S_ISDIR(st.st_mode)) {
+ errno = EISDIR;
+ return NULL;
+ }
+
+ /*
+ * Anything else, just open it and try to use it as
+ * a ref
+ */
+ fd = open(path, O_RDONLY);
+ if (fd < 0) {
+ if (errno == ENOENT)
+ /* inconsistent with lstat; retry */
+ goto stat_ref;
+ else
+ return NULL;
+ }
+ strbuf_reset(sb_contents);
+ if (strbuf_read(sb_contents, fd, 256) < 0) {
+ int save_errno = errno;
+ close(fd);
+ errno = save_errno;
+ return NULL;
+ }
+ close(fd);
+
+ return continue_normal_ref;
+}
+
/* This function needs to return a meaningful errno on failure */
static const char *resolve_ref_1(const char *refname,
int resolve_flags,
@@ -1442,54 +1531,18 @@ static const char *resolve_ref_1(const char *refname,
bad_name = 1;
}
for (;;) {
- const char *path;
- struct stat st;
+ const char *ret;
char *buf;
- int fd;
if (--depth < 0) {
errno = ELOOP;
return NULL;
}
- strbuf_reset(sb_path);
- strbuf_git_path(sb_path, "%s", refname);
- path = sb_path->buf;
+ ret = parse_ref(refname, resolve_flags, sha1,
+ flags, sb_path, sb_contents);
- /*
- * We might have to loop back here to avoid a race
- * condition: first we lstat() the file, then we try
- * to read it as a link or as a file. But if somebody
- * changes the type of the file (file <-> directory
- * <-> symlink) between the lstat() and reading, then
- * we don't want to report that as an error but rather
- * try again starting with the lstat().
- */
- stat_ref:
- if (lstat(path, &st) < 0) {
- if (errno != ENOENT)
- return NULL;
- if (resolve_missing_loose_ref(refname, resolve_flags,
- sha1, flags))
- return NULL;
- if (bad_name) {
- hashclr(sha1);
- if (flags)
- *flags |= REF_ISBROKEN;
- }
- return refname;
- }
-
- /* Follow "normalized" - ie "refs/.." symlinks by hand */
- if (S_ISLNK(st.st_mode)) {
- strbuf_reset(sb_contents);
- if (strbuf_readlink(sb_contents, path, 0) < 0) {
- if (errno == ENOENT || errno == EINVAL)
- /* inconsistent with lstat; retry */
- goto stat_ref;
- else
- return NULL;
- }
+ if (ret == continue_symlink) {
if (starts_with(sb_contents->buf, "refs/") &&
!check_refname_format(sb_contents->buf, 0)) {
strbuf_swap(sb_refname, sb_contents);
@@ -1502,35 +1555,10 @@ static const char *resolve_ref_1(const char *refname,
}
continue;
}
- }
-
- /* Is it a directory? */
- if (S_ISDIR(st.st_mode)) {
- errno = EISDIR;
- return NULL;
- }
-
- /*
- * Anything else, just open it and try to use it as
- * a ref
- */
- fd = open(path, O_RDONLY);
- if (fd < 0) {
- if (errno == ENOENT)
- /* inconsistent with lstat; retry */
- goto stat_ref;
- else
- return NULL;
- }
- strbuf_reset(sb_contents);
- if (strbuf_read(sb_contents, fd, 256) < 0) {
- int save_errno = errno;
- close(fd);
- errno = save_errno;
- return NULL;
- }
- close(fd);
- strbuf_rtrim(sb_contents);
+ } else if (ret == refname)
+ break;
+ else if (ret != continue_normal_ref)
+ return ret;
/*
* Is it a symbolic ref?
@@ -1547,12 +1575,7 @@ static const char *resolve_ref_1(const char *refname,
errno = EINVAL;
return NULL;
}
- if (bad_name) {
- hashclr(sha1);
- if (flags)
- *flags |= REF_ISBROKEN;
- }
- return refname;
+ break;
}
if (flags)
*flags |= REF_ISSYMREF;
@@ -1578,6 +1601,13 @@ static const char *resolve_ref_1(const char *refname,
bad_name = 1;
}
}
+
+ if (bad_name) {
+ hashclr(sha1);
+ if (flags)
+ *flags |= REF_ISBROKEN;
+ }
+ return refname;
}
static const char *files_resolve_ref_unsafe(const char *refname,
After this resolve_ref_1() is backend independent. So we can make it
take parse_ref() as a function pointer instead.
commit 50d96b6f79b30b5ba17fa00ec3ee42845546a123
Author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Date: Sat Feb 20 09:22:03 2016 +0700
refs/files-backend.c: let resolve_refs_1() accept parse_ref as callback
diff --git a/refs/files-backend.c b/refs/files-backend.c
index f54f2ae..9b4de9f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1424,7 +1424,8 @@ static const char *parse_ref(const char *refname,
unsigned char *sha1,
int *flags,
struct strbuf *sb_path,
- struct strbuf *sb_contents)
+ struct strbuf *sb_contents,
+ void *cb_data)
{
const char *path;
struct stat st;
@@ -1497,13 +1498,21 @@ stat_ref:
}
/* This function needs to return a meaningful errno on failure */
-static const char *resolve_ref_1(const char *refname,
- int resolve_flags,
- unsigned char *sha1,
- int *flags,
- struct strbuf *sb_refname,
- struct strbuf *sb_path,
- struct strbuf *sb_contents)
+const char *resolve_ref_1(const char *refname,
+ int resolve_flags,
+ unsigned char *sha1,
+ int *flags,
+ struct strbuf *sb_refname,
+ struct strbuf *sb_path,
+ struct strbuf *sb_contents,
+ const char *(*parse_ref)(const char *refname,
+ int resolve_flags,
+ unsigned char *sha1,
+ int *flags,
+ struct strbuf *sb_path,
+ struct strbuf *sb_contents,
+ void *cb_data),
+ void *cb_data)
{
int depth = MAXDEPTH;
int bad_name = 0;
@@ -1540,7 +1549,8 @@ static const char *resolve_ref_1(const char *refname,
}
ret = parse_ref(refname, resolve_flags, sha1,
- flags, sb_path, sb_contents);
+ flags, sb_path, sb_contents,
+ cb_data);
if (ret == continue_symlink) {
if (starts_with(sb_contents->buf, "refs/") &&
@@ -1621,7 +1631,8 @@ static const char *files_resolve_ref_unsafe(const char *refname,
const char *ret;
ret = resolve_ref_1(refname, resolve_flags, sha1, flags,
- &sb_refname, &sb_path, &sb_contents);
+ &sb_refname, &sb_path, &sb_contents,
+ parse_ref, NULL);
strbuf_release(&sb_path);
strbuf_release(&sb_contents);
return ret;
And finally we can make lmdb use resolve_ref_1(). lmdb-specific code
is in the new retrieve_ref() function.
commit 62a5df3117c0f825bc26fd09dda29e713f94d743 (HEAD -> lmdb)
Author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Date: Sat Feb 20 09:33:01 2016 +0700
refs/lmdb-backend: use resolve_ref_1()
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 9b4de9f..44b7136 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1407,7 +1407,7 @@ static int resolve_missing_loose_ref(const char *refname,
}
}
-static const char *continue_normal_ref = "read_ref returns a normal ref";
+const char *continue_normal_ref = "read_ref returns a normal ref";
static const char *continue_symlink = "read_ref returns a symlink";
/*
diff --git a/refs/lmdb-backend.c b/refs/lmdb-backend.c
index 6c0d7fb..7f169bd 100644
--- a/refs/lmdb-backend.c
+++ b/refs/lmdb-backend.c
@@ -544,74 +544,64 @@ done:
return ret;
}
+extern const char *continue_normal_ref;
+
+static const char *retrieve_ref(const char *refname,
+ int resolve_flags,
+ unsigned char *sha1,
+ int *flags,
+ struct strbuf *sb_path,
+ struct strbuf *sb_contents,
+ void *cb_data)
+{
+ struct lmdb_transaction *transaction = cb_data;
+ MDB_val key, val;
+ int needs_free; /* dont care, leak */
+
+ key.mv_data = (void *)refname;
+ key.mv_size = strlen(refname) + 1;
+
+ val.mv_data = NULL;
+ val.mv_size = 0;
+
+ if (mdb_get_or_die(transaction, &key, &val, &needs_free)) {
+ struct strbuf err = STRBUF_INIT;
+
+ if (resolve_flags & RESOLVE_REF_READING)
+ return NULL;
+
+ if (verify_refname_available_txn(transaction,
+ refname, NULL, NULL, &err)) {
+ error("%s", err.buf);
+ strbuf_release(&err);
+ return NULL;
+ }
+
+ hashclr(sha1);
+ return refname;
+ }
+
+ strbuf_reset(sb_contents);
+ strbuf_add(sb_contents, val.mv_data, val.mv_size);
+ return continue_normal_ref;
+}
+
static const char *resolve_ref_unsafe_txn(struct lmdb_transaction *transaction,
const char *refname,
int resolve_flags,
unsigned char *sha1,
int *flags)
{
- int bad_name = 0;
- char *ref_data;
- struct MDB_val key, val;
- struct strbuf err = STRBUF_INIT;
- int needs_free = 0;
+ static struct strbuf sb_refname = STRBUF_INIT;
+ struct strbuf sb_contents = STRBUF_INIT;
+ struct strbuf sb_path = STRBUF_INIT;
const char *ret;
- val.mv_size = 0;
- val.mv_data = NULL;
-
- if (flags)
- *flags = 0;
-
- if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
- if (flags)
- *flags |= REF_BAD_NAME;
-
- if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
- !refname_is_safe(refname)) {
- errno = EINVAL;
- return NULL;
- }
- /*
- * dwim_ref() uses REF_ISBROKEN to distinguish between
- * missing refs and refs that were present but invalid,
- * to complain about the latter to stderr.
- *
- * We don't know whether the ref exists, so don't set
- * REF_ISBROKEN yet.
- */
- bad_name = 1;
- }
-
- key.mv_data = (void *)refname;
- key.mv_size = strlen(refname) + 1;
- if (mdb_get_or_die(transaction, &key, &val, &needs_free)) {
- if (bad_name) {
- hashclr(sha1);
- if (flags)
- *flags |= REF_ISBROKEN;
- }
-
- if (resolve_flags & RESOLVE_REF_READING)
- return NULL;
-
- if (verify_refname_available_txn(transaction, refname, NULL, NULL, &err)) {
- error("%s", err.buf);
- strbuf_release(&err);
- return NULL;
- }
-
- hashclr(sha1);
- return refname;
- }
-
- ref_data = val.mv_data;
- assert(ref_data[val.mv_size - 1] == 0);
-
- ret = parse_ref_data(transaction, refname, ref_data, sha1,
- resolve_flags, flags, bad_name);
- if (needs_free)
- free(ref_data);
+ ret = resolve_ref_1(refname, resolve_flags, sha1, flags,
+ &sb_refname, &sb_path, &sb_contents,
+ retrieve_ref, transaction);
+ strbuf_release(&sb_path);
+ strbuf_release(&sb_contents);
return ret;
}
lmdb-backend.c:retrieve_ref(), files-backend.c:parse_ref() can be made
part of ref api that, given a ref name, returns the ref raw data and
type. The frontend can decide what backend callback to use based on
refname, so retrieve_ref() in the end does not have to call
read_per_worktree_ref() internally anymore.
Hmm?
--
Duy
next prev parent reply other threads:[~2016-02-20 2:58 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-18 5:17 [PATCH v5 00/27] refs backends David Turner
2016-02-18 5:17 ` [PATCH v5 01/27] refs: Move head_ref{,_submodule} to the common code David Turner
2016-02-18 5:17 ` [PATCH v5 02/27] refs: move for_each_*ref* functions into " David Turner
2016-02-18 5:17 ` [PATCH v5 03/27] refs: add a backend method structure with transaction functions David Turner
2016-02-18 5:17 ` [PATCH v5 04/27] refs: add methods for misc ref operations David Turner
2016-02-18 5:17 ` [PATCH v5 05/27] refs: add method for do_for_each_ref David Turner
2016-02-18 5:17 ` [PATCH v5 06/27] refs: add do_for_each_per_worktree_ref David Turner
2016-02-18 5:17 ` [PATCH v5 07/27] refs: add methods for reflog David Turner
2016-02-18 5:17 ` [PATCH v5 08/27] refs: add method for initial ref transaction commit David Turner
2016-02-18 5:17 ` [PATCH v5 09/27] refs: add method for delete_refs David Turner
2016-02-18 5:17 ` [PATCH v5 10/27] refs: add methods to init refs db David Turner
2016-02-18 5:17 ` [PATCH v5 11/27] refs: add method to rename refs David Turner
2016-02-18 5:17 ` [PATCH v5 12/27] refs: forbid cross-backend ref renames David Turner
2016-02-20 4:30 ` Duy Nguyen
2016-02-24 20:48 ` David Turner
2016-02-18 5:17 ` [PATCH v5 13/27] refs: make lock generic David Turner
2016-02-18 5:17 ` [PATCH v5 14/27] refs: move duplicate check to common code David Turner
2016-02-18 5:17 ` [PATCH v5 15/27] refs: allow log-only updates David Turner
2016-02-18 5:17 ` [PATCH v5 16/27] refs: don't dereference on rename David Turner
2016-02-18 5:17 ` [PATCH v5 17/27] refs: on symref reflog expire, lock symref not referrent David Turner
2016-02-18 5:17 ` [PATCH v5 18/27] refs: resolve symbolic refs first David Turner
2016-02-18 5:17 ` [PATCH v5 19/27] refs: always handle non-normal refs in files backend David Turner
2016-02-18 5:17 ` [PATCH v5 20/27] init: allow alternate ref strorage to be set for new repos David Turner
2016-02-18 5:17 ` [PATCH v5 21/27] refs: check submodules' ref storage config David Turner
2016-02-18 5:17 ` [PATCH v5 22/27] clone: allow ref storage backend to be set for clone David Turner
2016-02-18 5:17 ` [PATCH v5 23/27] svn: learn ref-storage argument David Turner
2016-02-20 23:55 ` Eric Wong
2016-02-23 18:08 ` David Turner
2016-02-18 5:17 ` [PATCH v5 24/27] refs: add register_ref_storage_backends() David Turner
2016-02-18 5:17 ` [PATCH v5 25/27] refs: add LMDB refs storage backend David Turner
2016-02-18 8:50 ` Duy Nguyen
2016-02-18 20:23 ` David Turner
2016-02-18 21:15 ` Junio C Hamano
2016-02-19 2:54 ` Duy Nguyen
2016-02-19 19:10 ` David Turner
2016-02-20 13:14 ` Duy Nguyen
2016-02-24 20:41 ` David Turner
2016-02-20 21:32 ` Junio C Hamano
2016-02-19 22:49 ` David Turner
2016-02-19 23:08 ` Junio C Hamano
2016-02-20 2:58 ` Duy Nguyen [this message]
2016-02-24 20:43 ` David Turner
2016-02-25 10:07 ` Duy Nguyen
2016-02-20 8:59 ` Duy Nguyen
2016-02-24 20:37 ` David Turner
2016-02-25 10:12 ` Duy Nguyen
2016-02-25 20:05 ` [PATCH] refs: document transaction semantics David Turner
2016-02-25 20:10 ` David Turner
2016-02-25 20:34 ` Junio C Hamano
2016-02-25 20:50 ` David Turner
2016-02-18 5:17 ` [PATCH v5 26/27] refs: tests for lmdb backend David Turner
2016-02-18 5:17 ` [PATCH v5 27/27] tests: add ref-storage argument David Turner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160220025826.GA9338@lanh \
--to=pclouds@gmail.com \
--cc=dturner@twopensource.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mhagger@alum.mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.