* Re: [PATCH v2] Simplify handling of setup_git_directory_gently() failure cases.
From: Erin Dahlgren @ 2018-12-18 19:50 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Junio C Hamano
In-Reply-To: <nycvar.QRO.7.76.6.1812181332370.43@tvgsbejvaqbjf.bet>
Sorry Johannes for the repeat message, I failed to send to all.
On Tue, Dec 18, 2018 at 4:35 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Erin,
>
> On Sat, 15 Dec 2018, Erin Dahlgren wrote:
>
> > diff --git a/setup.c b/setup.c
> > index 1be5037..e1a9e17 100644
> > --- a/setup.c
> > +++ b/setup.c
> > @@ -831,16 +831,6 @@ static const char *setup_bare_git_dir(struct strbuf *cwd, int offset,
> > return NULL;
> > }
> >
> > -static const char *setup_nongit(const char *cwd, int *nongit_ok)
> > -{
> > - if (!nongit_ok)
> > - die(_("not a git repository (or any of the parent directories): %s"), DEFAULT_GIT_DIR_ENVIRONMENT);
> > - if (chdir(cwd))
> > - die_errno(_("cannot come back to cwd"));
> > - *nongit_ok = 1;
> > - return NULL;
> > -}
> > -
> > static dev_t get_device_or_die(const char *path, const char *prefix, int prefix_len)
> > {
> > struct stat buf;
> > @@ -1097,18 +1087,20 @@ const char *setup_git_directory_gently(int *nongit_ok)
> > prefix = setup_bare_git_dir(&cwd, dir.len, &repo_fmt, nongit_ok);
> > break;
> > case GIT_DIR_HIT_CEILING:
> > - prefix = setup_nongit(cwd.buf, nongit_ok);
> > - break;
> > + if (!nongit_ok)
> > + die(_("not a git repository (or any of the parent directories): %s"),
> > + DEFAULT_GIT_DIR_ENVIRONMENT);
>
> I am terribly sorry to bother you about formatting issues (in my mind, it
> is quite an annoying thing that we still have no fully automatic way to
> format Git's source code according to Git's preferred coding style, but
> there you go...): this `DEFAULT_GIT_DIR_ENVIRONMENT` should be aligned
> with the first parameter of `die()`, i.e.
>
> + if (!nongit_ok)
> + die(_("not a git repository (or any of the parent directories): %s"),
> + DEFAULT_GIT_DIR_ENVIRONMENT);
>
> > + *nongit_ok = 1;
> > + strbuf_release(&dir);
> > + return NULL;
> > case GIT_DIR_HIT_MOUNT_POINT:
> > - if (nongit_ok) {
> > - *nongit_ok = 1;
> > - strbuf_release(&cwd);
> > - strbuf_release(&dir);
> > - return NULL;
> > - }
> > - die(_("not a git repository (or any parent up to mount point %s)\n"
> > - "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set)."),
> > - dir.buf);
> > + if (!nongit_ok)
> > + die(_("not a git repository (or any parent up to mount point %s)\n"
> > + "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set)."),
> > + dir.buf);
>
> Likewise, `dir.buf` should be aligned with the `_` two lines above.
Hi Johannes,
No problem, I'll make those changes.
I'd be interested to hear the arguments against a "fully automatic way
to format Git's source code according to Git's preferred coding
style". If there aren't any then I'd be willing to take a crack at it.
Should we start a separate "discussion" thread?
>
> Otherwise I think this patch is good to go!
>
> Thank you,
> Johannes
>
> > + *nongit_ok = 1;
> > + strbuf_release(&dir);
> > + return NULL;
> > default:
> > BUG("unhandled setup_git_directory_1() result");
> > }
> > --
> > 2.7.4
> >
> >
^ permalink raw reply
* Re: commit-graph idea: warn when disabled for incompatible repositories
From: Derrick Stolee @ 2018-12-18 19:07 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Thomas Ferris Nicolaisen, git@vger.kernel.org
In-Reply-To: <87bm5ig0h3.fsf@evledraar.gmail.com>
On 12/18/2018 1:21 PM, Ævar Arnfjörð Bjarmason wrote:
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 871a56f1c5..702568b70d 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -662,9 +662,14 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
> if (pack_garbage.nr > 0)
> clean_pack_garbage();
>
> - if (gc_write_commit_graph)
> + if (gc_write_commit_graph) {
> + int verbose = !quiet && !daemonized;
> + if (verbose && !commit_graph_compatible(the_repository))
> + warning(_("The `gc.writeCommitGraph' setting is on, "
> + "but commit_graph_compatible() = false"));
> write_commit_graph_reachable(get_object_directory(), 0,
> - !quiet && !daemonized);
> + verbose);
> + }
I actually think this is the wrong place to put it. This will cause a
warning for someone with 'gc.writeCommitGraph' enabled and running GC on
a shallow clone.
I think the issue was someone running 'git commit-graph write' inside a
shallow clone that succeeds without doing anything.
Also, I bet you would hit this block if you run the test suite with
GIT_TEST_COMMIT_GRAPH=1.
Thanks,
-Stolee
> if (auto_gc && too_many_loose_objects())
> warning(_("There are too many unreachable loose objects; "
> diff --git a/commit-graph.c b/commit-graph.c
> index 40c855f185..60915bf9aa 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -61,7 +61,7 @@ static struct commit_graph *alloc_commit_graph(void)
>
> extern int read_replace_refs;
>
> -static int commit_graph_compatible(struct repository *r)
> +int commit_graph_compatible(struct repository *r)
> {
> if (!r->gitdir)
> return 0;
> diff --git a/commit-graph.h b/commit-graph.h
> index 9db40b4d3a..7c92d41a28 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -12,6 +12,8 @@ struct commit;
>
> char *get_commit_graph_filename(const char *obj_dir);
>
> +int commit_graph_compatible(struct repository *r);
> +
> /*
> * Given a commit struct, try to fill the commit struct info, including:
> * 1. tree object
This part looks correct, and necessary for the warning in
builtin/commit-graph.c
Thanks,
-Stolee
^ permalink raw reply
* Re: commit-graph idea: warn when disabled for incompatible repositories
From: Ævar Arnfjörð Bjarmason @ 2018-12-18 18:21 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Thomas Ferris Nicolaisen, git@vger.kernel.org
In-Reply-To: <b74bab07-dd77-5195-aaee-a324e9a6b824@gmail.com>
On Tue, Dec 18 2018, Derrick Stolee wrote:
> On 12/18/2018 9:22 AM, Thomas Ferris Nicolaisen wrote:
>> Hey there,
>
> Hi, Thomas!
>
>> Accidentally, my first use-case was a local copy of a big repository
>> (chromium source) that another developer had cloned shallow (I wasn't
>> even aware of their clone being shallow at that point).
>>
>> After spending a little time trying to figure out why no commit-graph
>> file was being created, I noticed that it worked just fine testing in
>> a fresh git repo.
>>
>> Then I discovered the .git/shallow file in the big repo. So I did
>> fetch --unshallow, and commit-graph started working. Taking a 20
>> second log --graph operation down to less than a second (wooo!).
>>
>> I saw some recent release notes that mentions that commit-graph is
>> disabled in incompatible repositories (graft, replace). I assume this
>> also be the case for shallow clones.
>
> The commit-graph feature is not designed to work well with these
> features, and the "easy" fix we have so far is to completely avoid the
> interaction. The tricky bits come in when the commit parents can
> "change" according to these other features. The commit-graph would
> need to change at the same time, and this is actually very difficult
> to get correct.
>
>> Here's the idea that may help others on the same path: Some warning
>> output when attempting to use commit-graph when it is disabled (either
>> by configuration or automatically).
>>
>> I think others that come across commit-graph may have tried such
>> tricks (like shallow clones) to better work with their repositories,
>> and it could be frustrating that commit-graph has no apparent effect.
>
> This is a good idea, and I would happily review a patch that added
> such a warning.
>
> The warning would want to be in builtin/commit-graph.c, and use the
> commit_graph_compatible() method from commit-graph.c. (You'll need to
> expose the method in the .h file.)
This patch will work. Consider it to have my SOB. Just needs tests, and
b.t.w. I noticed that if I s/warning/die/ in this patch the commit graph
& GC tests pass, pointing to a blind spot in the test suite.
diff --git a/builtin/gc.c b/builtin/gc.c
index 871a56f1c5..702568b70d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -662,9 +662,14 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
if (pack_garbage.nr > 0)
clean_pack_garbage();
- if (gc_write_commit_graph)
+ if (gc_write_commit_graph) {
+ int verbose = !quiet && !daemonized;
+ if (verbose && !commit_graph_compatible(the_repository))
+ warning(_("The `gc.writeCommitGraph' setting is on, "
+ "but commit_graph_compatible() = false"));
write_commit_graph_reachable(get_object_directory(), 0,
- !quiet && !daemonized);
+ verbose);
+ }
if (auto_gc && too_many_loose_objects())
warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..60915bf9aa 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -61,7 +61,7 @@ static struct commit_graph *alloc_commit_graph(void)
extern int read_replace_refs;
-static int commit_graph_compatible(struct repository *r)
+int commit_graph_compatible(struct repository *r)
{
if (!r->gitdir)
return 0;
diff --git a/commit-graph.h b/commit-graph.h
index 9db40b4d3a..7c92d41a28 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -12,6 +12,8 @@ struct commit;
char *get_commit_graph_filename(const char *obj_dir);
+int commit_graph_compatible(struct repository *r);
+
/*
* Given a commit struct, try to fill the commit struct info, including:
* 1. tree object
^ permalink raw reply related
* Re: Merge behavior with merge.conflictStyle diff3
From: Jeff King @ 2018-12-18 18:01 UTC (permalink / raw)
To: Adilson de Almeida Junior; +Cc: git
In-Reply-To: <CAKmxmfosaExDYNyoaNg2teHMzsu3JJ3bqiLhrAaUt0Jt473Phw@mail.gmail.com>
On Tue, Dec 18, 2018 at 03:34:20PM -0200, Adilson de Almeida Junior wrote:
> I guess, most times the second behavior is the expected: two conflict
> hunks, not only the divvergent pieces (this case, the second line) of
> them. By the doc:
>
> merge.conflictStyle
>
> Specify the style in which conflicted hunks are written out to
> working tree files upon merge. The default is "merge", which shows a
> [[[[[[[ conflict marker, changes made by one side, a ======= marker,
> changes made by the other side, and then a ]]]]]]] marker. An
> alternate style, "diff3", adds a ||||||| marker and the original text
> before the ======= marker.
>
> I replaced the 'lower than' and 'greater than' symbols by 'open n
> close square brackets' to avoid antivirus.
>
> Is this a bug, or something I missunderstood from git docs?
This is the expected behavior. There's some philosophical discussion
about the correct thing to do here in this thread (that message and its
replies, but the whole thread is an interesting read):
https://public-inbox.org/git/7vvc94p8hb.fsf@alter.siamese.dyndns.org/
Note that there is a patch in that thread to implement "zdiff3", which
does what you want. I've been rebasing it and running with it for
several years now. You can get my rebased version by fetching:
https://github.com/peff/git ukk/zdiff3
But note that I think it may have a subtle bug, as I have once or twice
over the years seen it cause a segfault.
-Peff
^ permalink raw reply
* Re: [PATCH v2] Simplify handling of setup_git_directory_gently() failure cases.
From: Jeff King @ 2018-12-18 17:54 UTC (permalink / raw)
To: Erin Dahlgren; +Cc: git, Johannes Schindelin, Junio C Hamano
In-Reply-To: <1544922308-740-1-git-send-email-eedahlgren@gmail.com>
On Sat, Dec 15, 2018 at 05:05:08PM -0800, Erin Dahlgren wrote:
> setup_git_directory_gently() expects two types of failures to
> discover a git directory (e.g. .git/):
> [...]
When I read your subject line, I'll admit to having a funny feeling in
the pit of my stomach. This setup code has historically been full of
subtle corner cases, and I expected a very tricky review.
However, your explanation was so well-written that it was a pleasure to
read. :)
> Before this change are two misleading additional behaviors:
>
> - GIT_DIR_HIT_CEILING: setup_nongit() changes to the cwd for no
> apparent reason. We never had the chance to change directories
> up to this point so chdir(current cwd) is pointless.
That makes sense. I think this is a holdover from when we used to walk
backwards via chdir(), prior to ce9b8aab5d (setup_git_directory_1():
avoid changing global state, 2017-03-13). Back then, we needed to
restore the state at this point, but now we don't.
> - GIT_DIR_HIT_MOUNT_POINT: strbuf_release() frees the buffer
> of a static struct strbuf (cwd). This is unnecessary because the
> struct is static so its buffer is always reachable. This is also
> misleading because nowhere else in the function is this buffer
> released.
Makes sense.
I do have one question, though:
> case GIT_DIR_HIT_CEILING:
> - prefix = setup_nongit(cwd.buf, nongit_ok);
> - break;
> + if (!nongit_ok)
> + die(_("not a git repository (or any of the parent directories): %s"),
>a + DEFAULT_GIT_DIR_ENVIRONMENT);
> + *nongit_ok = 1;
> + strbuf_release(&dir);
> + return NULL;
This case used to break out of the switch, and now returns early.
So we do not execute the later code which clears GIT_PREFIX_ENVIRONMENT,
and zeroes the fields in startup_info. Those probably don't matter in
most cases, but I wonder if there are some obscure ones where it might.
Might it make sense to make GIT_DIR_HIT_MOUNT_POINT more like
GIT_DIR_HIT_CEILING currently is, rather than the other way around?
I.e., something like:
case GIT_DIR_HIT_CEILING:
if (!nongit_ok)
die(...);
*nongit_ok = 1;
prefix = NULL;
break;
case GIT_DIR_HIT_MOUNT_POINT:
if (!nongit_ok)
die(...);
*nongit_ok = 1;
prefix = NULL;
break;
?
-Peff
^ permalink raw reply
* Re: [PATCH v4 0/3] Add commit-graph fuzzer and fix buffer overflow
From: Jeff King @ 2018-12-18 17:35 UTC (permalink / raw)
To: Josh Steadmon; +Cc: git, gitster, stolee, avarab, szeder.dev
In-Reply-To: <cover.1544729841.git.steadmon@google.com>
On Thu, Dec 13, 2018 at 11:43:55AM -0800, Josh Steadmon wrote:
> Add a new fuzz test for the commit graph and fix a buffer read-overflow
> that it discovered. Additionally, fix the Makefile instructions for
> building fuzzers.
>
> Changes since V3:
> * Improve portability of the new test functionality.
I thought there was some question about /dev/zero, which I think is
in this version (I don't actually know whether there are portability
issues or not, but somebody did mention it).
-Peff
^ permalink raw reply
* Merge behavior with merge.conflictStyle diff3
From: Adilson de Almeida Junior @ 2018-12-18 17:34 UTC (permalink / raw)
To: git
Hi,
I´m not sure if this is a bug or not.
These are the steps to reproduce it (git 2.17 at least):
- In a repo, with default settings (merge strategy, conflict style,
merge drivers, etc);
- Create a file 'test.xml', and add the following content to it:
[div]
A
[/div]
- Perform a git add and git commit, then, create a branch named branch1;
- Next, on branch1, edit the xml file and add a new div:
[div]
A
[/div]
[div]
B
[/div]
- Then, comit the changes, and after that return to branch master;
- So, perform a similar but slightly different change:
[div]
A
[/div]
[div]
C
[/div]
- Then commit it;
- Now, do a merge (git merge branch1);
When my conflictStyle is default (merge), the merged file becames:
[div]
[[[[[[[ HEAD
C
=======
B
]]]]]]] branch1
[/div]
But when the merge.conflictStyle is set to diff3, I get:
[[[[[[[ HEAD
[div]
C
[/div]
||||||| merged common ancestors
=======
[div]
B
[/div]
]]]]]]] branch1
I guess, most times the second behavior is the expected: two conflict
hunks, not only the divvergent pieces (this case, the second line) of
them. By the doc:
merge.conflictStyle
Specify the style in which conflicted hunks are written out to
working tree files upon merge. The default is "merge", which shows a
[[[[[[[ conflict marker, changes made by one side, a ======= marker,
changes made by the other side, and then a ]]]]]]] marker. An
alternate style, "diff3", adds a ||||||| marker and the original text
before the ======= marker.
I replaced the 'lower than' and 'greater than' symbols by 'open n
close square brackets' to avoid antivirus.
Is this a bug, or something I missunderstood from git docs?
Thanks,
Adilson de Almeida Jr
^ permalink raw reply
* Re: [PATCH v3 0/3]
From: Jeff King @ 2018-12-18 17:25 UTC (permalink / raw)
To: nbelakovski; +Cc: git, rafa.almas, gitster, avarab
In-Reply-To: <20181216215759.24011-1-nbelakovski@gmail.com>
On Sun, Dec 16, 2018 at 01:57:56PM -0800, nbelakovski@gmail.com wrote:
> Finally got around to submitting latest changes.
>
> I think this addresses all the feedback
>
> The atom now returns the worktree path instead of '+'
Thanks, I think patch 1 is definitely going in the right direction.
There are a few issues I found in its implementation, but they should be
easy to fix (and won't affect the other patches).
I don't really have a strong opinion on the behavior of the other
patches.
-Peff
^ permalink raw reply
* Re: [PATCH v3 1/3] ref-filter: add worktreepath atom
From: Jeff King @ 2018-12-18 17:22 UTC (permalink / raw)
To: nbelakovski; +Cc: git, rafa.almas, gitster, avarab
In-Reply-To: <20181216215759.24011-2-nbelakovski@gmail.com>
On Sun, Dec 16, 2018 at 01:57:57PM -0800, nbelakovski@gmail.com wrote:
> From: Nickolai Belakovski <nbelakovski@gmail.com>
>
> Add an atom proving the path of the linked worktree where this ref is
> checked out, if it is checked out in any linked worktrees, and empty
> string otherwise.
I stumbled over the word "proving" here. Maybe "showing" would be more
clear?
> diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
> index 901faef1bf..9590f7beab 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -209,6 +209,10 @@ symref::
> `:lstrip` and `:rstrip` options in the same way as `refname`
> above.
>
> +worktreepath::
> + The absolute path to the worktree in which the ref is checked
> + out, if it is checked out in any linked worktree. ' ' otherwise.
> +
Normally single-quotes are used in asciidoc to emphasize text, and the
quotes aren't passed through. Asciidoc (and asciidoctor) do seem to
render the literal quotes here, which is good. I wonder if it would be
more clear to just write it out, though, like:
...any linked worktree. Otherwise, replaced with a single space.
Also, why are we replacing it with a single space? Wouldn't the empty
string be more customary (and work with the other "if empty, then do
this" formatting options)?
> @@ -34,6 +36,8 @@ static struct ref_msg {
> "ahead %d, behind %d"
> };
>
> +static struct worktree ** worktrees;
Minor style nit: we put the "*" in a pointer declaration next to the
variable name, without intervening whitespace. Like:
static struct worktree **worktrees;
> @@ -75,6 +79,12 @@ static struct expand_data {
> struct object_info info;
> } oi, oi_deref;
>
> +struct reftoworktreeinfo_entry {
> + struct hashmap_entry ent; // must be the first member!
> + char * ref; // key into map
> + struct worktree * wt;
> +};
A few style nits:
- the "*" space thing from above (it's in other places below, too, but
I won't point out each)
- we prefer "/* */" comments, even for single-liners
- since we do all-lowercase identifiers, use more underscores to break
things up. E.g., ref_to_worktree_entry.
Here we store the refname as a separate variable, but then point to the
worktree itself to access wt->path. Why do we treat these differently?
I.e., I'd expect to see either:
1. Each entry holding a single worktree object, and using its head_ref
and path fields, like:
struct ref_to_worktree_entry {
struct hashmap_entry ent; /* must be first */
struct worktree *wt;
};
....
entry = xmalloc(sizeof(*entry));
entry->wt = wt;
hashmap_entry_init(entry, strhash(wt->head_ref));
...
strbuf_addstr(&out, result->wt->path);
2. Each entry containing just the bits it needs, like:
struct ref_to_worktree_entry {
struct hashmap_entry ent; /* must be first */
char *ref;
char *path;
};
...
/*
* We could use FLEXPTR_ALLOC_STR() here, but it doesn't actually
* support holding _two_ strings. Separate allocations probably
* aren't a huge deal here, since there are only a handful of
* worktrees.
*/
entry = xmalloc(sizeof(*entry));
entry->ref = wt->head_ref;
entry->path = wt->path;
hashmap_entry_init(entry, strhash(entry->ref));
...
strbuf_addstr(&out, result->path);
I think the first one is strictly preferable unless we're worried about
the lifetime of the "struct worktree" going away. I don't think that's
an issue, though; they are ours until we call free_worktrees().
> @@ -114,6 +124,7 @@ static struct used_atom {
> } objectname;
> struct refname_atom refname;
> char *head;
> + struct hashmap reftoworktreeinfo_map;
> } u;
> } *used_atom;
This uses one map for each %(worktree) we use. But won't they all be the
same? It would ideally be associated with the ref-filter. There's no
ref-filter context struct to hold this kind of data, just static globals
in ref-filter.c (including this used_atom struct!). That's something
we'll probably need to fix in the long run, but I think it would be
reasonable to just have:
static struct hashmap ref_to_worktree_map;
next to the declaration of used_atom_cnt, need_symref, etc. And then
those can all eventually get moved into a struct together.
> @@ -461,6 +497,7 @@ static struct {
> { "flag", SOURCE_NONE },
> { "HEAD", SOURCE_NONE, FIELD_STR, head_atom_parser },
> { "color", SOURCE_NONE, FIELD_STR, color_atom_parser },
> + { "worktreepath", SOURCE_NONE, FIELD_STR, worktree_atom_parser },
> { "align", SOURCE_NONE, FIELD_STR, align_atom_parser },
> { "end", SOURCE_NONE },
> { "if", SOURCE_NONE, FIELD_STR, if_atom_parser },
Marking as SOURCE_NONE makes sense.
> +static const char * get_worktree_info(const struct used_atom *atom, const struct ref_array_item *ref)
> +{
> + struct strbuf val = STRBUF_INIT;
> + struct reftoworktreeinfo_entry * entry;
> + struct reftoworktreeinfo_entry * lookup_result;
> +
> + FLEXPTR_ALLOC_STR(entry, ref, ref->refname);
> + hashmap_entry_init(entry, strhash(entry->ref));
> + lookup_result = hashmap_get(&(atom->u.reftoworktreeinfo_map), entry, NULL);
> + free(entry);
We shouldn't need to do an allocation just for a lookup. That's what the
extra "keydata" parameter is for in the comparison function. And I guess
this is what led you to have "char *ref" in the struct, rather than
reusing wt->head_ref (because you don't have a "struct worktree" here).
You should be able to do it like this:
struct hashmap_entry entry;
struct ref_to_worktree_entry *result;
hashmap_entry_init(entry, strhash(ref->refname));
result = hashmap_get(&ref_to_worktree_map, &entry, ref->refname));
...
and then your comparison function would look like this:
int ref_to_worktree_hashcmp(const void *data,
const void *entry,
const void *entry_or_key,
const void *keydata)
{
const struct ref_to_worktree_entry *a = entry;
const struct ref_to_worktree_entry *b = entry;
if (keydata)
return strcmp(a->wt->head_ref, keydata);
else
return strcmp(a->wt->head_ref, b->wt->head_ref);
}
If you're thinking that this API is totally confusing and hard to figure
out, I agree. It's optimized to avoid extra allocations. I wish we had a
better one for simple cases (especially string->string mappings like
this).
Speaking of comparison functions, I didn't see one in your patch. Don't
you need to pass one to hashmap_init?
> + if (lookup_result)
> + {
> + if (!strncmp(atom->name, "worktreepath", strlen(atom->name)))
> + strbuf_addstr(&val, lookup_result->wt->path);
> + }
> + else
> + strbuf_addstr(&val, " ");
What's this extra strncmp about? If we're _not_ a worktreepath atom,
we'd still do the lookup only to put nothing in the string?
I think we'd only call this function when populate_value() sees a
worktreepath atom, though:
> @@ -1537,6 +1596,10 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
>
> if (starts_with(name, "refname"))
> refname = get_refname(atom, ref);
> + else if (starts_with(name, "worktreepath")) {
> + v->s = get_worktree_info(atom, ref);
> + continue;
> + }
So it would be OK to drop the check of atom->name again inside
get_worktree_info().
> @@ -2013,7 +2076,14 @@ void ref_array_clear(struct ref_array *array)
> int i;
>
> for (i = 0; i < used_atom_cnt; i++)
> + {
> + if (!strncmp(used_atom[i].name, "worktreepath", strlen("worktreepath")))
> + {
> + hashmap_free(&(used_atom[i].u.reftoworktreeinfo_map), 1);
> + free_worktrees(worktrees);
> + }
And if we move the mapping out to a static global, then this only has to
be done once, not once per atom. In fact, I think this could double-free
"worktrees" with your current patch if you have two "%(worktree)"
placeholders, since "worktrees" already is a global.
> diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> index fc067ed672..add70a4c3e 100755
> --- a/t/t6302-for-each-ref-filter.sh
> +++ b/t/t6302-for-each-ref-filter.sh
> @@ -441,4 +441,19 @@ test_expect_success '--merged is incompatible with --no-merged' '
> test_must_fail git for-each-ref --merged HEAD --no-merged HEAD
> '
>
> +test_expect_success '"add" a worktree' '
> + mkdir worktree_dir &&
> + git worktree add -b master_worktree worktree_dir master
> +'
> +
> +test_expect_success 'validate worktree atom' '
> + cat >expect <<-\EOF &&
> + master: checked out in a worktree
> + master_worktree: checked out in a worktree
> + side: not checked out in a worktree
> + EOF
> + git for-each-ref --format="%(refname:short): %(if)%(worktreepath)%(then)checked out in a worktree%(else)not checked out in a worktree%(end)" refs/heads/ >actual &&
> + test_cmp expect actual
> +'
It's probably worth testing that the path we get is actually sane, too.
I.e., expect something more like:
cat >expect <<-\EOF
master: $PWD
master: $PWD/worktree
side: not checked out
EOF
git for-each-ref \
--format="%(refname:short): %(if)%(worktreepath)%(then)%(worktreepath)%(else)not checked %out%(end)
(I wish there was a way to avoid that really long line, but I don't
think there is).
-Peff
^ permalink raw reply
* Re: [PATCH] log: add %S option (like --source) to log --format
From: Issac Trotts @ 2018-12-18 17:14 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git, Noemi Mercado
In-Reply-To: <20181217155915.GA914@sigill.intra.peff.net>
Hi Peff, thanks for the feedback. I tried a variant of the command you
showed and it yielded a seg fault:
```
[ issactrotts ~/git ] ./git diff-tree -s --pretty=tformat:'%S %H %s' HEAD
Segmentation fault: 11
```
I'll see if I can track it down this evening.
Issac
On Mon, Dec 17, 2018 at 7:59 AM Jeff King <peff@peff.net> wrote:
>
> On Sun, Dec 16, 2018 at 10:25:14PM -0800, Issac Trotts wrote:
>
> > Make it possible to write for example
> >
> > git log --format="%H,%S"
> >
> > where the %S at the end is a new placeholder that prints out the ref
> > (tag/branch) for each commit.
>
> Seems like a reasonable thing to want.
>
> One curious thing about "--source" is that it requires cooperation from
> the actual traversal. So if you did:
>
> git rev-list | git diff-tree --format="%H %S"
>
> we don't have the %S information in the latter command. I think that's
> probably acceptable as long as it does something sane when we don't have
> that information (e.g., replace it with an empty string). It might also
> be worth calling out in the documentation.
>
> -Peff
^ permalink raw reply
* Re: [PATCHSET] git-reverse-trailer-xrefs: Reverse map cherry-picks and other cross-references
From: Tejun Heo @ 2018-12-18 16:51 UTC (permalink / raw)
To: Stefan Xenos; +Cc: Junio C Hamano, git, peff, kernel-team
In-Reply-To: <CAPL8ZiuxzdUtkRxALBZ=LjTaKQ6d85BmB9nTQ0JPLnVRo91j8Q@mail.gmail.com>
Hey, guys.
On Tue, Dec 18, 2018 at 08:48:35AM -0800, Stefan Xenos wrote:
> I've just uploaded a new evolve proposal that includes a spec for the
> "hiddenmetas" namespace, where we can store historical cherry-pick
> information.
Total noob question - where can I read that? Also, as long as I can
have the back references and follow to the existing cherry-picks, I
don't have any further requirements, so any working mechanism is great
by me.
Thanks.
--
tejun
^ permalink raw reply
* Re: [PATCHSET] git-reverse-trailer-xrefs: Reverse map cherry-picks and other cross-references
From: Stefan Xenos @ 2018-12-18 16:48 UTC (permalink / raw)
To: Tejun Heo; +Cc: Junio C Hamano, git, peff, kernel-team
In-Reply-To: <CAPL8ZiuNgv4s1w-FKAznV5A9MKxSxW40fTPrF0Xt8Ywy6UOsVQ@mail.gmail.com>
I've just uploaded a new evolve proposal that includes a spec for the
"hiddenmetas" namespace, where we can store historical cherry-pick
information.
On Tue, Dec 18, 2018 at 6:40 AM Stefan Xenos <sxenos@google.com> wrote:
>
> Heya, Tejun!
>
> Thanks for getting in touch. I agree with Junio that we probably
> shouldn't be tracking the same information in two ways if we can think
> of something that does both... so let's see if we can think of
> something! The evolve proposal suggests having a metas/ namespace to
> track ongoing changes is intended as a way to track the changes a user
> is actively working on. If we were to use it to retroactively store
> every historical cherry-pick in a large repository, I'm concerned it
> may get too cluttered to use for its originally-intended purpose. I'm
> also not sure how well things would perform with a huge number of
> refs. The remainder of the proposal (using metacommits to store the
> relationships) would work for the xref use-case, but we'd probably
> want to tweak the way we store the root refs in order to do a good job
> with the large number of inactive cherry-picks he probably wants. Any
> code that is looking for cross-reference metadata could look in both
> namespaces.
>
> Conversely, if we were to tweak the xrefs proposal by adding the
> obsolescence information that evolve needs, we'd be missing a place to
> store the user's ongoing changes, a way to push and pull specific
> changes, a way to describe alternate histories for a commit, and a
> mechanism for preventing the commits needed by evolve from getting
> garbage collected.
>
> All the problems with both approaches are solve-able, though.
>
> I spent a few hours discussing this with Stefan Beller last week and I
> think we've come up with a variation on the evolve proposal that would
> cover both our use-cases. Let me know what you think. To address the
> cluttering of the metas namespace, we could introduce a new
> "hiddenmetas" namespace. It would contain the same type of data
> recorded in the metas namespace, but it would normally be hidden from
> the user when they list their changes, etc. It would also be immune
> from the automatic deletions that are applied to the "metas"
> namespace. From your point of view, the critical bit is that it would
> contain information about cherry-picks. That would address the
> "user-visible clutter" problem. Utility methods that want to search
> for cherry-picks would iterate over both namespaces.
>
> For the performance problem, I think we could just live with it
> temporarily and wait for the currently-ongoing reftable work since the
> reftable proposal would address exactly this performance issue (by
> allowing us to store and manipulate large numbers of refs
> efficiently).
>
> A nice characteristic of this approach is that it would also address a
> problem with the evolve proposal that had concerned me: how to deal
> with the filter-branch command, which would have created pretty much
> the same problems regarding the large number of metadata refs for
> changes the user isn't actively working on.
>
> It might be nice to also consider and some alternatives, but I haven't
> had enough time to think up more of them (and I haven't digested the
> xrefs proposal sufficiently to suggest an enhanced version of it yet).
> If anyone else has ideas for combining the xrefs and evolve use-cases,
> having more alternatives to choose from is always better!
>
> If you're okay with the "hiddenmetas" approach in principle, I'll
> update the evolve proposal doc to include it. Once we work out how the
> storage format will work, we can coordinate our implementation work.
>
> - Stefan
>
>
>
>
> On Wed, Dec 12, 2018 at 7:46 PM Tejun Heo <tj@kernel.org> wrote:
> >
> > Hello, Junio, Stefan.
> >
> > On Thu, Dec 13, 2018 at 12:09:39PM +0900, Junio C Hamano wrote:
> > > Please do not take the above as "don't do notes/xref-; instead read
> > > from the 'meta commits'". I do not have a preference between the
> > > two proposed implementations. The important thing is that we won't
> > > end up with having to maintain two separate mechanisms that want to
> > > keep track of essentially the same class of information. FWIW I'd
> > > be perfectly fine if the unification goes the other way, as long as
> > > goals of both parties are met, and for that, I'd like to see you two
> > > work together, or at least be aware of what each other is doing and
> > > see if cross-pollination would result in a mutually better solution.
> >
> > So, from my POV, the only thing I want is being able to easily tell
> > which commit got cherry picked where. I really don't care how that's
> > achieved. Stefan, if there's anything I can do to push it forward,
> > let me know and if you see anything useful in my patchset, please feel
> > free to incorporate them in any way.
> >
> > Thanks.
> >
> > --
> > tejun
^ permalink raw reply
* [PATCH v3] technical doc: add a design doc for the evolve command
From: sxenos @ 2018-12-18 16:46 UTC (permalink / raw)
To: git; +Cc: sxenos
From: Stefan Xenos <sxenos@google.com>
This document describes what a change graph for
git would look like, the behavior of the evolve command,
and the changes planned for other commands.
Signed-off-by: Stefan Xenos <sxenos@google.com>
---
Documentation/technical/evolve.txt | 1034 ++++++++++++++++++++++++++++
1 file changed, 1034 insertions(+)
create mode 100644 Documentation/technical/evolve.txt
diff --git a/Documentation/technical/evolve.txt b/Documentation/technical/evolve.txt
new file mode 100644
index 0000000000..7967c73e5d
--- /dev/null
+++ b/Documentation/technical/evolve.txt
@@ -0,0 +1,1034 @@
+Evolve
+======
+
+Objective
+=========
+Create an "evolve" command to help users craft a high quality commit history.
+Users can improve commits one at a time and in any order, then run git evolve to
+rewrite their recent history to ensure everything is up-to-date. We track
+amendments to a commit over time in a change graph. Users can share their
+progress with others by exchanging their change graphs using the standard push,
+fetch, and format-patch commands.
+
+Status
+======
+This proposal has not been implemented yet.
+
+Background
+==========
+Imagine you have three sequential changes up for review and you receive feedback
+that requires editing all three changes. We'll define the word "change"
+formally later, but for the moment let's say that a change is a work-in-progress
+whose final version will be submitted as a commit in the future.
+
+While you're editing one change, more feedback arrives on one of the others.
+What do you do?
+
+The evolve command is a convenient way to work with chains of commits that are
+under review. Whenever you rebase or amend a commit, the repository remembers
+that the old commit is obsolete and has been replaced by the new one. Then, at
+some point in the future, you can run "git evolve" and the correct sequence of
+rebases will occur in the correct order such that no commit has an obsolete
+parent.
+
+Part of making the "evolve" command work involves tracking the edits to a commit
+over time, which is why we need an change graph. However, the change
+graph will also bring other benefits:
+
+- Users can view the history of a change directly (the sequence of amends and
+ rebases it has undergone, orthogonal to the history of the branch it is on).
+- It will be possible to quickly locate and list all the changes the user
+ currently has in progress.
+- It can be used as part of other high-level commands that combine or split
+ changes.
+- It can be used to decorate commits (in git log, gitk, etc) that are either
+ obsolete or are the tip of a work in progress.
+- By pushing and pulling the change graph, users can collaborate more
+ easily on changes-in-progress. This is better than pushing and pulling the
+ changes themselves since the change graph can be used to locate a more
+ specific merge base, allowing for better merges between different versions of
+ the same change.
+- It could be used to correctly rebase local changes and other local branches
+ after running git-filter-branch.
+- It can replace the change-id footer used by gerrit.
+
+Goals
+-----
+Legend: Goals marked with P0 are required. Goals marked with Pn should be
+attempted unless they interfere with goals marked with Pn-1.
+
+P0. All commands that modify commits (such as the normal commit --amend or
+ rebase command) should mark the old commit as being obsolete and replaced by
+ the new one. No additional commands should be required to keep the
+ change graph up-to-date.
+P0. Any commit that may be involved in a future evolve command should not be
+ garbage collected. Specifically:
+ - Commits that obsolete another should not be garbage collected until
+ user-specified conditions have occurred and the change has expired from
+ the reflog. User specified conditions for removing changes include:
+ - The user explicitly deleted the change.
+ - The change was merged into a specific branch.
+ - Commits that have been obsoleted by another should not be garbage
+ collected if any of their replacements are still being retained.
+P0. A commit can be obsoleted by more than one replacement (called divergence).
+P0. Must be able to resolve divergence (convergence).
+P1. Users should be able to share chains of obsolete changes in order to
+ collaborate on WIP changes.
+P2. Such sharing should be at the user’s option. That is, it should be possible
+ to directly share a change without also sharing the file states or commit
+ comments from the obsolete changes that led up to it, and the choice not to
+ share those commits should not require changing any commit hashes.
+P2. It should be possible to discard part or all of the change graph
+ without discarding the commits themselves that are already present in
+ branches and the reflog.
+P2. Provide sufficient information to replace gerrit's Change-Id footers.
+
+Similar technologies
+--------------------
+There are some other technologies that address the same end-user problem.
+
+Rebase -i can be used to solve the same problem, but users can't easily switch
+tasks midway through an interactive rebase or have more than one interactive
+rebase going on at the same time. It can't handle the case where you have
+multiple changes sharing the same parent when that parent needs to be rebased
+and won't let you collaborate with others on resolving a complicated interactive
+rebase. You can think of rebase -i as a top-down approach and the evolve command
+as the bottom-up approach to the same problem.
+
+Several patch queue managers have been built on top of git (such as topgit,
+stgit, and quilt). They address the same user need. However they also rely on
+state managed outside git that needs to be kept in sync. Such state can be
+easily damaged when running a git native command that is unaware of the patch
+queue. They also typically require an explicit initialization step to be done by
+the user which creates workflow problems.
+
+Mercurial implements a very similar feature in its EvolveExtension. The behavior
+of the evolve command itself is very similar, but the storage format for the
+change graph differs. In the case of mercurial, each change set can have one or
+more obsolescence markers that point to other changesets that they replace. This
+is similar to the "Commit Headers" approach considered in the other options
+appendix. The approach proposed here stores obsolescence information in a
+separate metacommit graph, which makes exchanging of obsolescence information
+optional.
+
+Mercurial's default behavior makes it easy to find and switch between
+non-obsolete changesets that aren't currently on any branch. We introduce the
+notion of a new ref namespace that enables a similar workflow via a different
+mechanism. Mercurial has the notion of changeset phases which isn't present
+in git and creates new ways for a changeset to diverge. Git doesn't need
+to deal with these issues, but it has to deal with picking an upstream branch as
+a target for rebases and protecting obsolescence information from GC. We also
+introduce some additional transformations (see obsolescence-over-cherry-pick,
+below) that aren't present in the mercurial implementation.
+
+Semi-related work
+-----------------
+There are other technologies that address different problems but have some
+similarities with this proposal.
+
+Replacements (refs/replace) are superficially similar to obsolescences in that
+they describe that one commit should be replaced by another. However, they
+differ in both how they are created and how they are intended to be used.
+Obsolescences are created automatically by the commands a user runs, and they
+describe the user’s intent to perform a future rebase. Obsolete commits still
+appear in branches, logs, etc like normal commits (possibly with an extra
+decoration that marks them as obsolete). Replacements are typically created
+explicitly by the user, they are meant to be kept around for a long time, and
+they describe a replacement to be applied at read-time rather than as the input
+to a future operation. When a replaced commit is queried, it is typically hidden
+and swapped out with its replacement as though the replacement has already
+occurred.
+
+Git-imerge is a project to help make complicated merges easier, particularly
+when merging or rebasing long chains of patches. It is not an alternative to
+the change graph, but its algorithm of applying smaller incremental merges
+could be used as part of the evolve algorithm in the future.
+
+Overview
+========
+We introduce the notion of “meta-commits” which describe how one commit was
+created from other commits. A branch of meta-commits is known as a change.
+Changes are created and updated automatically whenever a user runs a command
+that creates a commit. They are used for locating obsolete commits, providing a
+list of a user’s unsubmitted work in progress, and providing a stable name for
+each unsubmitted change.
+
+Users can exchange edit histories by pushing and fetching changes.
+
+New commands will be introduced for manipulating changes and resolving
+divergence between them. Existing commands that create commits will be updated
+to modify the meta-commit graph and create changes where necessary.
+
+Example usage
+-------------
+# First create three dependent changes
+$ echo foo>bar.txt && git add .
+$ git commit -m "This is a test"
+created change metas/this_is_a_test
+$ echo foo2>bar2.txt && git add .
+$ git commit -m "This is also a test"
+created change metas/this_is_also_a_test
+$ echo foo3>bar3.txt && git add .
+$ git commit -m "More testing"
+created change metas/more_testing
+
+# List all our changes in progress
+$ git change list
+metas/this_is_a_test
+metas/this_is_also_a_test
+* metas/more_testing
+metas/some_change_already_merged_upstream
+
+# Now modify the earliest change, using its stable name
+$ git reset --hard metas/this_is_a_test
+$ echo morefoo>>bar.txt && git add . && git commit --amend --no-edit
+
+# Use git-evolve to fix up any dependent changes
+$ git evolve
+rebasing metas/this_is_also_a_test onto metas/this_is_a_test
+rebasing metas/more_testing onto metas/this_is_also_a_test
+Done
+
+# Use git-obslog to view the history of the this_is_a_test change
+$ git log --obslog
+93f110 metas/this_is_a_test@{0} commit (amend): This is a test
+930219 metas/this_is_a_test@{1} commit: This is a test
+
+# Now create an unrelated change
+$ git reset --hard origin/master
+$ echo newchange>unrelated.txt && git add .
+$ git commit -m "Unrelated change"
+created change metas/unrelated_change
+
+# Fetch the latest code from origin/master and use git-evolve
+# to rebase all dependent changes.
+$ git fetch origin master
+$ git evolve origin/master
+deleting metas/some_change_already_merged_upstream
+rebasing metas/this_is_a_test onto origin/master
+rebasing metas/this_is_also_a_test onto metas/this_is_a_test
+rebasing metas/more_testing onto metas/this_is_also_a_test
+rebasing metas/unrelated_change onto origin/master
+Conflict detected! Resolve it and then use git evolve --continue to resume.
+
+# Sort out the conflict
+$ git mergetool
+$ git evolve --continue
+Done
+
+# Share the full history of edits for the this_is_a_test change
+# with a review server
+$ git push origin metas/this_is_a_test:refs/for/master
+# Share the lastest commit for “Unrelated change”, without history
+$ git push origin HEAD:refs/for/master
+
+Detailed design
+===============
+Obsolescence information is stored as a graph of meta-commits. A meta-commit is
+a specially-formatted merge commit that describes how one commit was created
+from others.
+
+Meta-commits look like this:
+
+$ git cat-file -p <example_meta_commit>
+tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
+parent aa7ce55545bf2c14bef48db91af1a74e2347539a
+parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
+parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
+author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
+committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
+parent-type c r o
+
+This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
+cherry-picking commit 7e1bbcd3”.
+
+The tree for meta-commits is always the empty tree whose hash matches
+4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git may
+attach other trees here. For forward-compatibility fsck should ignore such trees
+if found on future repository versions. Similarly, current versions of git
+should always fill in an empty commit comment and tools like fsck should ignore
+the content of the commit comment if present in a future repository version.
+This will allow future versions of git to add metadata to the meta-commit
+comments or tree without breaking forwards compatibility.
+
+Parent-type
+-----------
+The “parent-type” field in the commit header identifies a commit as a
+meta-commit and indicates the meaning for each of its parents. It is never
+present for normal commits. It contains a space-deliminated list of enum values
+whose order matches the order of the parents. Possible parent types are:
+
+- c: (content) the content parent identifies the commit that this meta-commit is
+ describing.
+- r: (replaced) indicates that this parent is made obsolete by the content
+ parent.
+- o: (origin) indicates that this parent was generated from the given commit.
+- a: (abandoned) used in place of a content parent for abandoned changes. Points
+ to the final content commit for the change at the time it was abandoned.
+
+There must be exactly one content or abandoned parent for each meta-commit and it is
+always the first parent. The content commit will always be a normal commit and not a
+meta-commit. However, future versions of git may create meta-commits for other
+meta-commits and the fsck tool must be aware of this for forwards compatibility.
+
+A meta-commit can have zero or more replaced parents. An amend operation creates
+a single replaced parent. A merge used to resolve divergence (see divergence,
+below) will create multiple replaced parents. A meta-commit may have no
+replaced parents if it describes a cherry-pick or squash merge that copies one
+or more commits but does not replace them.
+
+A meta-commit can have zero or more origin parents. A cherry-pick creates a
+single origin parent. Certain types of squash merge will create multiple origin
+parents. Origin parents don't directly cause their origin to become obsolete,
+but are used when computing blame or locating a merge base. The section
+on obsolescence over cherry-picks describes how the evolve command uses
+origin parents.
+
+A replaced parent or origin parent may be either a normal commit (indicating
+the oldest-known version of a change) or another meta-commit (for a change that
+has already been modified one or more times).
+
+The parent-type field needs to go after the committer field since git's rules
+for forwards-compatibility require that new fields to be at the end of the
+header. Putting a new field in the middle of the header would break fsck.
+
+The presence of an abandoned parent indicates that the change should be pruned
+by the evolve command, and removed from the repository's history. The abandoned
+parent points to the version of the change that should be restored if the user
+attempts to restore the change.
+
+Changes
+-------
+A branch of meta-commits describes how a commit was produced and what previous
+commits it is based on. It is also an identifier for a thing the user is
+currently working on. We refer to such a meta-branch as a change.
+
+Local changes are stored in the new refs/metas namespace. Remote changes are
+stored in the refs/remote/<remotename>/metas namespace.
+
+The list of changes in refs/metas is more than just a mechanism for the evolve
+command to locate obsolete commits. It is also a convenient list of all of a
+user’s work in progress and their current state - a list of things they’re
+likely to want to come back to.
+
+Strictly speaking, it is the presence of the branch in the refs/metas namespace
+that marks a branch as being a change, not the fact that it points to a
+metacommit. Metacommits are only created when a commit is amended or rebased, so
+in the case where a change points to a commit that has never been modified, the
+change points to that initial commit rather than a metacommit.
+
+Changes are also stored in the refs/hiddenmetas namespace. Hiddenmetas holds
+metadata for historical changes that are not currently in progress by the user.
+Commands like filter-branch and other bulk import commands create metadata in
+this namespace.
+
+Note that the changes in hiddenmetas get special treatment in several ways:
+
+- They are not cleaned up automatically once merged, since it is expected that
+ they refer to historical changes.
+- User commands that modify changes don't append to these changes as they would
+ to a change in refs/metas.
+- They are not displayed when the user lists their local changes.
+
+Obsolescence
+------------
+A commit is considered obsolete if it is reachable from the “replaces” edges
+anywhere in the history of a change and it isn’t the head of that change.
+Commits may be the content for 0 or more meta-commits. If the same commit
+appears in multiple changes, it is not obsolete if it is the head of any of
+those changes.
+
+Note that there is an exeption to this rule. The metas namespace takes
+precedence over the hiddenmetas namespace for the purpose of obsolescence. That
+is, if a change appears in a replaces edge of a change in the metas namespace,
+it is obsolete even if it also appears as the head of a change in the
+hiddenmetas namespace.
+
+This special case prevents the hiddenmetas namespace from creating divergence
+with the user's work in progress, and allows the user to resolve historical
+divergence by creating new changes in the metas namespace.
+
+Divergence
+----------
+From the user’s perspective, two changes are divergent if they both ask for
+different replacements to the same commit. More precisely, a target commit is
+considered divergent if there is more than one commit at the head of a change in
+refs/metas that leads to the target commit via an unbroken chain of “obsolete”
+parents.
+
+Much like a merge conflict, divergence is a situation that requires user
+intervention to resolve. The evolve command will stop when it encounters
+divergence and prompt the user to resolve the problem. Users can solve the
+problem in several ways:
+
+- Discard one of the changes (by deleting its change branch).
+- Merge the two changes (producing a single change branch).
+- Copy one of the changes (keep both commits, but one of them gets a new
+ metacommit appended to its history that is connected to its predecessor via an
+ origin edge rather than an obsolete edge. That new change no longer obsoletes
+ the original.)
+
+Obsolescence across cherry-picks
+--------------------------------
+By default the evolve command will treat cherry-picks and squash merges as being
+completely separate from the original. Further amendments to the original commit
+will have no effect on the cherry-picked copy. However, this behavior may not be
+desirable in all circumstances.
+
+The evolve command may at some point support an option to look for cases where
+the source of a cherry-pick or squash merge has itself been amended, and
+automatically apply that same change to the cherry-picked copy. In such cases,
+it would traverse origin edges rather than ignoring them, and would treat a
+commit with origin edges as being obsolete if any of its origins were obsolete.
+
+Garbage collection
+------------------
+For GC purposes, meta-commits are normal commits. Just as a commit causes its
+parents and tree to be retained, a meta-commit also causes its parents to be
+retained.
+
+Change creation
+---------------
+Changes are created automatically whenever the user runs a command like “commit”
+that has the semantics of creating a new change. They also move forward
+automatically even if they’re not checked out. For example, whenever the user
+runs a command like “commit --amend” that modifies a commit, all branches in
+refs/metas that pointed to the old commit move forward to point to its
+replacement instead. This also happens when the user is working from a detached
+head.
+
+This does not mean that every commit has a corresponding change. By default,
+changes only exist for recent locally-created commits. Users may explicitly pull
+changes from other users or keep their changes around for a long time, but
+either behavior requires a user to opt-in. Code review systems like gerrit may
+also choose to keep changes around forever.
+
+Note that the changes in refs/metas serve a dual function as both a way to
+identify obsolete changes and as a way for the user to keep track of their work
+in progress. If we were only concerned with identifying obsolete changes, it
+would be sufficient to create the change branch lazily the first time a commit
+is obsoleted. Addressing the second use - of refs/metas as a mechanism for
+keeping track of work in progress - is the reason for eagerly creating the
+change on first commit.
+
+Change naming
+-------------
+When a change is first created, the only requirement for its name is that it
+must be unique. Good names would also serve as useful mnemonics and be easy to
+type. For example, a short word from the commit message containing no numbers or
+special characters and that shows up with low frequency in other commit messages
+would make a good choice.
+
+Different users may prefer different heuristics for their change names. For this
+reason a new hook will be introduced to compute change names. Git will invoke
+the hook for all newly-created changes and will append a numeric suffix if the
+name isn’t unique. The default heuristics are not specified by this proposal and
+may change during implementation.
+
+Change deletion
+---------------
+Changes are normally only interesting to a user while a commit is still in
+development and under review. Once the commit has submitted wherever it is
+going, its change can be discarded.
+
+The normal way of deleting changes makes this easy to do - changes are deleted
+by the evolve command when it detects that the change is present in an upstream
+branch. It does this in two ways: if the latest commit in a change either shows
+up in the branch history or the change becomes empty after a rebase, it is
+considered merged and the change is discarded. In this context, an “upstream
+branch” is any branch passed in as the upstream argument of the evolve command.
+
+In case this sometimes deletes a useful change, such automatic deletions are
+recorded in the reflog allowing them to be easily recovered.
+
+Sharing changes
+---------------
+Change histories are shared by pushing or fetching meta-commits and change
+branches. This provides users with a lot of control of what to share and
+repository implementations with control over what to retain.
+
+Users that only want to share the content of a commit can do so by pushing the
+commit itself as they currently would. Users that want to share an edit history
+for the commit can push its change, which would point to a meta-commit rather
+than the commit itself if there is any history to share. Note that multiple
+changes can refer to the same commits, so it’s possible to construct and push a
+different history for the same commit in order to remove sensitive or irrelevant
+intermediate states.
+
+Imagine the user is working on a change “mychange” that is currently the latest
+commit on master, they have two ways to share it:
+
+# User shares just a commit without its history
+> git push origin master
+
+# User shares the full history of the commit to a review system
+> git push origin metas/mychange:refs/for/master
+
+# User fetches a collaborator’s modifications to their change
+> git fetch remotename metas/mychange
+# Which updates the ref remote/remotename/metas/mychange
+
+This will cause more intermediate states to be shared with the server than would
+have been shared previously. A review system like gerrit would need to keep
+track of which states had been explicitly pushed versus other intermediate
+states in order to de-emphasize (or hide) the extra intermediate states from the
+user interface.
+
+Merge-base
+----------
+Merge-base will be changed to search the meta-commit graph for common ancestors
+as well as the commit graph, and will generally prefer results from the
+meta-commit graph over the commit graph. Merge-base will consider meta-commits
+from all changes, and will traverse both origin and obsolete edges.
+
+The reason for this is that - when merging two versions of the same commit
+together - an earlier version of that same commit will usually be much more
+similar than their common parent. This should make the workflow of collaborating
+on unsubmitted patches as convenient as the workflow for collaborating in a
+topic branch by eliminating repeated merges.
+
+Configuration
+-------------
+The core.enableChanges configuration variable enables the creation and update
+of change branches. This is enabled by default.
+
+User interface
+--------------
+All git porcelain commands that create commits are classified as having one of
+four behaviors: modify, create, copy, or import. These behaviors are discussed
+in more detail below.
+
+Modify commands
+---------------
+Modification commands (commit --amend, rebase) will mark the old commit as
+obsolete by creating a new meta-commit that references the old one as a
+replaced parent. In the event that multiple changes point to the same commit,
+this is done independently for every such change.
+
+More specifically, modifications work like this:
+
+1. Locate all existing changes for which the old commit is the content for the
+ head of the change branch. If no such branch exists, create one that points
+ to the old commit. Changes that include this commit in their history but not
+ at their head are explicitly not included.
+2. For every such change, create a new meta-commit that references the new
+ commit as its content and references the old head of the change as a
+ replaced parent.
+3. Move the change branch forward to point to the new meta-commit.
+
+Copy commands
+-------------
+Copy commands (cherry-pick, merge --squash) create a new meta-commit that
+references the old commits as origin parents. Besides the fact that the new
+parents are tagged differently, copy commands work the same way as modify
+commands.
+
+Create commands
+---------------
+Creation commands (commit, merge) create a new commit and a new change that
+points to that commit. The do not create any meta-commits.
+
+Import commands
+---------------
+Import commands (fetch, pull) do not create any new meta-commits or changes
+unless that is specifically what they are importing. For example, the fetch
+command would update remote/origin/metas/change35 and fetch all referenced
+meta-commits if asked to do so directly, but it wouldn’t create any changes or
+meta-commits for commits discovered on the master branch when running “git fetch
+origin master”.
+
+Other commands
+--------------
+Some commands don’t fit cleanly into one of the above categories.
+
+Semantically, filter-branch should be treated as a modify command, but doing so
+is likely to create a lot of irrelevant clutter in the changes namespace and the
+large number of extra change refs may introduce performance problems. We
+recommend treating filter-branch as an import command initially, but making it
+behave more like a modify command in future follow-up work. One possible
+solution may be to treat commits that are part of existing changes as being
+modified but to avoid creating changes for other rewritten changes.
+
+Once the evolve command can handle obsolescence across cherry-picks, such
+cherry-picks will result in a hybrid move-and-copy operation. It will create
+cherry-picks that replace other cherry-picks, which will have both origin edges
+(pointing to the new source commit being picked) and obsolete edges (pointing to
+the previous cherry-pick being replaced).
+
+Evolve
+------
+The evolve command performs the correct sequence of rebases such that no change
+has an obsolete parent. The syntax looks like this:
+
+git evolve [--abort][--continue][--quit] [upstream…]
+
+It takes an optional list of upstream branches. All changes whose parent shows
+up in the history of one of the upstream branches will be rebased onto the
+upstream branch before resolving obsolete parents.
+
+Any change whose latest state is found in an upstream branch (or that ends up
+empty after rebase) will be deleted. This is the normal mechanism for deleting
+changes. Changes are created automatically on the first commit, and are deleted
+automatically when evolve determines that they’ve been merged upstream.
+
+Orphan commits are commits with obsolete parents. The evolve command then
+repeatedly rebases orphan commits with non-orphan parents until there are either
+no orphan commits left, a merge conflict is discovered, or a divergent parent is
+discovered.
+
+When evolve discovers divergence, it will first check if it can resolve the
+divergence automatically using one of its enabled transformations. Supported
+transformations are:
+
+- Check if the user has already merged the divergent changes in a follow-up
+ change. That is, look for an existing merge in a follow-up change where all
+ the parents are divergent versions of the same change. Squash that merge with
+ its parents and use the result as the resolution for the divergence.
+
+- Attempt to auto-merge all the divergent changes (disabled by default).
+
+Each of the transformations can be enabled or disabled by command line options.
+
+The --abort option returns all changes to the state they were in prior to
+invoking evolve, and the --quit option terminates the current evolution without
+changing the current state.
+
+If the working tree is dirty, evolve will attempt to stash the user's changes
+before applying the evolve and then reapply those changes afterward, in much
+the same way as rebase --autostash does.
+
+Checkout
+--------
+Running checkout on a change by name has the same effect as checking out a
+detached head pointing to the latest commit on that change-branch. There is no
+need to ever have HEAD point to a change since changes always move forward when
+necessary, no matter what branch the user has checked out
+
+Meta-commits themselves cannot be checked out by their hash.
+
+Reset
+-----
+Resetting a branch to a change by name is the same as resetting to the commit at
+that change’s head.
+
+Commit
+------
+Commit --amend gets modify semantics and will move existing changes forward. The
+normal form of commit gets create semantics and will create a new change.
+
+$ touch foo && git add . && git commit -m "foo" && git tag A
+$ touch bar && git add . && git commit -m "bar" && git tag B
+$ touch baz && git add . && git commit -m "baz" && git tag C
+
+This produces the following commits:
+A(tree=[foo])
+B(tree=[foo, bar], parent=A)
+C(tree=[foo, bar, baz], parent=B)
+
+...along with three changes:
+metas/foo = A
+metas/bar = B
+metas/baz = C
+
+Running commit --amend does the following:
+$ git checkout B
+$ touch zoom && git add . && git commit --amend -m "baz and zoom"
+$ git tag D
+
+Commits:
+A(tree=[foo])
+B(tree=[foo, bar], parent=A)
+C(tree=[foo, bar, baz], parent=B)
+D(tree=[foo, bar, zoom], parent=A)
+Dmeta(content=D, obsolete=B)
+
+Changes:
+metas/foo = A
+metas/bar = Dmeta
+metas/baz = C
+
+Merge
+-----
+Merge gets create, modify, or copy semantics based on what is being merged and
+the options being used.
+
+The --squash version of merge gets copy semantics (it produces a new change that
+is marked as a copy of all the original changes that were squashed into it).
+
+The “modify” version of merge replaces both of the original commits with the
+resulting merge commit. This is one of the standard mechanisms for resolving
+divergence. The parents of the merge commit are the parents of the two commits
+being merged. The resulting commit will not be a merge commit if both of the
+original commits had the same parent or if one was the parent of the other.
+
+The “create” version of merge creates a new change pointing to a merge commit
+that has both original commits as parents. The result is what merge produces now
+- a new merge commit. However, this version of merge doesn’t directly resolve
+divergence.
+
+To select between these two behaviors, merge gets new “--amend” and “--noamend”
+options which select between the “create” and “modify” behaviors respectively,
+with noamend being the default.
+
+For example, imagine we created two divergent changes like this:
+
+$ touch foo && git add . && git commit -m "foo" && git tag A
+$ touch bar && git add . && git commit -m "bar" && git tag B
+$ touch baz && git add . && git commit --amend -m "bar and baz"
+$ git tag C
+$ git checkout B
+$ touch bam && git add . && git commit --amend -m "bar and bam"
+$ git tag D
+
+At this point the commit graph looks like this:
+
+A(tree=[foo])
+B(tree=[bar], parent=A)
+C(tree=[bar, baz], parent=A)
+D(tree=[bar, bam], parent=A)
+Cmeta(content=C, obsoletes=B)
+Dmeta(content=D, obsoletes=B)
+
+There would be three active changes with heads pointing as follows:
+
+metas/changeA=A
+metas/changeB=Cmeta
+metas/changeB2=Dmeta
+
+ChangeB and changeB2 are divergent at this point. Lets consider what happens if
+perform each type of merge between changeB and changeB2.
+
+Merge example: Amend merge
+One way to resolve divergent changes is to use an amend merge. Recall that HEAD
+is currently pointing to D at this point.
+
+$ git merge --amend metas/changeB
+
+Here we’ve asked for an amend merge since we’re trying to resolve divergence
+between two versions of the same change. There are no conflicts so we end up
+with this:
+
+E(tree=[bar, baz, bam], parent=A)
+Emeta(content=E, obsoletes=[Cmeta, Dmeta])
+
+With the following branches:
+
+metas/changeA=A
+metas/changeB=Emeta
+metas/changeB2=Emeta
+
+Notice that the result of the “amend merge” is a replacement for C and D rather
+than a new commit with C and D as parents (as a normal merge would have
+produced). The parents of the amend merge are the parents of C and D which - in
+this case - is just A, so the result is not a merge commit. Also notice that
+changeB and changeB2 are now aliases for the same change.
+
+Merge example: Noamend merge
+Consider what would have happened if we’d used a noamend merge instead. Recall
+that HEAD was at D and our branches looked like this:
+
+metas/changeA=A
+metas/changeB=Cmeta
+metas/changeB2=Dmeta
+
+$ git merge --noamend metas/changeB
+
+That would produce the sort of merge we’d normally expect today:
+
+F(tree=[bar, baz, bam], parent=[C, D])
+
+And our changes would look like this:
+metas/changeA=A
+metas/changeB=Cmeta
+metas/changeB2=Dmeta
+metas/changeF=F
+
+In this case, changeB and changeB2 are still divergent and we’ve created a new
+change for our merge commit. However, this is just a temporary state. The next
+time we run the “evolve” command, it will discover the divergence but also
+discover the merge commit F that resolves it. Evolve will suggest converting F
+into an amend merge in order to resolve the divergence and will display the
+command for doing so.
+
+Rebase
+------
+In general the rebase command is treated as a modify command. When a change is
+rebased, the new commit replaces the original.
+
+Rebase --abort is special. Its intent is to restore git to the state it had
+prior to running rebase. It should move back any changes to point to the refs
+they had prior to running rebase and delete any new changes that were created as
+part of the rebase. To achieve this, rebase will save the state of all changes
+in refs/metas prior to running rebase and will restore the entire namespace
+after rebase completes (deleting any newly-created changes). Newly-created
+metacommits are left in place, but will have no effect until garbage collected
+since metacommits are only used if they are reachable from refs/metas.
+
+Change
+------
+The “change” command can be used to list, rename, reset or delete change. It has
+a number of subcommands.
+
+The "list" subcommand lists local changes. If given the -r argument, it lists
+remote changes.
+
+The "rename" subcommand renames a change, given its old and new name. If the old
+name is omitted and there is exactly one change pointing to the current HEAD,
+that change is renamed. If there are no changes pointing to the current HEAD,
+one is created with the given name.
+
+The "forget" subcommand deletes a change by deleting its ref from the metas/
+namespace. This is the normal way to delete extra aliases for a change if the
+change has more than one name. By default, this will refuse to delete the last
+alias for a change if there are any other changes that reference this change as
+a parent.
+
+The "update" subcommand adds a new state to a change. It uses the default
+algorithm for assigning change names. If the content commit is omitted, HEAD is
+used. If given the optional --force argument, it will overwrite any existing
+change of the same name. This latter form of "update" can be used to effectively
+reset changes.
+
+The "update" command can accept any number of --origin and --replace arguments.
+If any are present, the resulting change branch will point to a metacommit
+containing the given origin and replacement edges.
+
+The "replace" command records a replacement in the obsolescence graph, given a
+list of obsolete commits or metacommits followed by their replacement. This
+behaves like a normal "modify" command, except that the replacement is an
+existing commit. If an obsolete commit points to a metacommit, only a change
+branch pointing to exactly that metacommit moves forward. If an obsolete commit
+points to a normal commit, all change branches pointing to that commit move
+forward. If no change branches moved forward, a new change branch is created
+using the default name.
+
+The "abandon" command deletes a change using obsolescence markers. It marks the
+change as being obsolete and having been replaced by its parent. If given no
+arguments, it applies to the current commit. Running evolve will cause any
+abandoned changes to be removed from the branch. Any child changes will be
+reparented on top of the parent of the abandoned change. If the current change
+is abandoned, HEAD will move to point to its parent.
+
+The "restore" command restores a previously-abandoned change.
+
+The "prune" command deletes all obsolete changes and all changes that are
+present in the given branch. Note that such changes can be recovered from the
+reflog.
+
+Combined with the GC protection that is offered, this is intended to facilitate
+a workflow that relies on changes instead of branches. Users could choose to
+work with no local branches and use changes instead - both for mailing list and
+gerrit workflows.
+
+Log
+---
+When a commit is shown in git log that is part of a change, it is decorated with
+extra change information. If it is the head of a change, the name of the change
+is shown next to the list of branches. If it is obsolete, it is decorated with
+the text “obsolete, <n> commits behind <changename>”.
+
+Log gets a new --obslog argument indicating that the obsolescence graph should
+be followed instead of the commit graph. This also changes the default
+formatting options to make them more appropriate for viewing different
+iterations of the same commit.
+
+Pull
+----
+
+Pull gets an --evolve argument that will automatically attempt to run "evolve"
+on any affected branches after pulling.
+
+We also introduce an "evolve" enum value for the branch.<name>.rebase config
+value. When set, the evolve behavior will happen automatically for that branch
+after every pull even if the --evolve argument is not used.
+
+Next
+----
+
+The "next" command will reset HEAD to a non-obsolete commit that refers to this
+change as its parent. If there is more than one such change, the user will be
+prompted. If given the --evolve argument, the next commit will be evolved if
+necessary first.
+
+The "next" command can be thought of as the opposite of
+"git reset --hard HEAD^" in that it navigates to a child commit rather than a
+parent.
+
+Other options considered
+========================
+We considered several other options for storing the obsolescence graph. This
+section describes the other options and why they were rejected.
+
+Commit header
+-------------
+Add an “obsoletes” field to the commit header that points backwards from a
+commit to the previous commits it obsoletes.
+
+Pros:
+- Very simple
+- Easy to traverse from a commit to the previous commits it obsoletes.
+Cons:
+- Adds a cost to the storage format, even for commits where the change history
+ is uninteresting.
+- Unconditionally prevents the change history from being garbage collected.
+- Always causes the change history to be shared when pushing or pulling changes.
+
+Git notes
+---------
+Instead of storing obsolescence information in metacommits, the metacommit
+content could go in a new notes namespace - say refs/notes/metacommit. Each note
+would contain the list of obsolete and origin parents, and an automerger could
+be supplied to make it easy to merge the metacommit notes from different remotes.
+
+Pros:
+- Easy to locate all commits obsoleted by a given commit (since there would only
+ be one metacommit for any given commit).
+Cons:
+- Wrong GC behavior (obsolete commits wouldn’t automatically be retained by GC)
+ unless we introduced a special case for these kinds of notes.
+- No way to selectively share or pull the metacommits for one specific change.
+ It would be all-or-nothing, which would be expensive. This could be addressed
+ by changes to the protocol, but this would be invasive.
+- Requires custom auto-merging behavior on fetch.
+
+Tags
+----
+Put the content of the metacommit in a message attached to tag on the
+replacement commit. This is very similar to the git notes approach and has the
+same pros and cons.
+
+Simple forward references
+-------------------------
+Record an edge from an obsolete commit to its replacement in this form:
+
+refs/obsoletes/<A>
+
+pointing to commit <B> as an indication that B is the replacement for the
+obsolete commit A.
+
+Pros:
+- Protects <B> from being garbage collected.
+- Fast lookup for the evolve operation, without additional search structures
+ (“what is the replacement for <A>?” is very fast).
+
+Cons:
+- Can’t represent divergence (which is a P0 requirement).
+- Creates lots of refs (which can be inefficient)
+- Doesn’t provide a way to fetch only refs for a specific change.
+- The obslog command requires a search of all refs.
+
+Complex forward references
+--------------------------
+Record an edge from an obsolete commit to its replacement in this form:
+
+refs/obsoletes/<change_id>/obs<A>_<B>
+
+Pointing to commit <B> as an indication that B is the replacement for obsolete
+commit A.
+
+Pros:
+- Permits sharing and fetching refs for only a specific change.
+- Supports divergence
+- Protects <B> from being garbage collected.
+
+Cons:
+- Creates lots of refs, which is inefficient.
+- Doesn’t provide a good lookup structure for lookups in either direction.
+
+Backward references
+-------------------
+Record an edge from a replacement commit to the obsolete one in this form:
+
+refs/obsolescences/<B>
+
+Cons:
+- Doesn’t provide a way to resolve divergence (which is a P0 requirement).
+- Doesn’t protect <B> from being garbage collected (which could be fixed by
+ combining this with a refs/metas namespace, as in the metacommit variant).
+
+Obsolescences file
+------------------
+Create a custom file (or files) in .git recording obsolescences.
+
+Pros:
+- Can store exactly the information we want with exactly the performance we want
+ for all operations. For example, there could be a disk-based hashtable
+ permitting constant time lookups in either direction.
+
+Cons:
+- Handling GC, pushing, and pulling would all require custom solutions. GC
+ issues could be addressed with a repository format extension.
+
+Squash points
+-------------
+We create and update change branches in refs/metas them at the same time we
+would in the metacommit proposal. However, rather than pointing to a metacommit
+branch they point to normal commits and are treated as “squash points” - markers
+for sequences of commits intended to be squashed together on submission.
+
+Amends and rebases work differently than they do now. Rather than actually
+containing the desired state of a commit, they contain a delta from the previous
+version along with a squash point indicating that the preceding changes are
+intended to be squashed on submission. Specifically, amends would become new
+changes and rebases would become merge commits with the old commit and new
+parent as parents.
+
+When the changes are finally submitted, the squashes are executed, producing the
+final version of the commit.
+
+In addition to the squash points, git would maintain a set of “nosquash” tags
+for commits that were used as ancestors of a change that are not meant to be
+included in the squash.
+
+For example, if we have this commit graph:
+
+A(...)
+B(parent=A)
+C(parent=B)
+
+...and we amend B to produce D, we’d get:
+
+A(...)
+B(parent=A)
+C(parent=B)
+D(parent=B)
+
+...along with a new change branch indicating D should be squashed with its
+parents when submitted:
+
+metas/changeB = D
+metas/changeC = C
+
+We’d also create a nosquash tag for A indicating that A shouldn’t be included
+when changeB is squashed.
+
+If a user amends the change again, they’d get:
+
+A(...)
+B(parent=A)
+C(parent=B)
+D(parent=B)
+E(parent=D)
+
+metas/changeB = E
+metas/changeC = C
+
+Pros:
+- Good GC behavior.
+- Provides a natural way to share changes (they’re just normal branches).
+- Merge-base works automatically without special cases.
+- Rewriting the obslog would be easy using existing git commands.
+- No new data types needed.
+Cons:
+- No way to connect the squashed version of a change to the original, so no way
+ to automatically clean up old changes. This also means users lose all benefits
+ of the evolve command if they prematurely squash their commits. This may occur
+ if a user thinks a change is ready for submission, squashes it, and then later
+ discovers an additional change to make.
+- Histories would look very cluttered (users would see all previous edits to
+ their commit in the commit log, and all previous rebases would show up as
+ merges). Could be quite hard for users to tell what is going on. (Possible
+ fix: also implement a new smart log feature that displays the log as though
+ the squashes had occurred).
+- Need to change the current behavior of current commands (like amend and
+ rebase) in ways that will be unexpected to many users.
--
2.20.0.405.gbc1bbc6f85-goog
^ permalink raw reply related
* Re: Can git tell me which uncommitted files clash with the incoming changes?
From: Elijah Newren @ 2018-12-18 15:51 UTC (permalink / raw)
To: Jeff King; +Cc: Mark Kharitonov, Duy Nguyen, Git Mailing List
In-Reply-To: <20181218131429.GE30471@sigill.intra.peff.net>
On Tue, Dec 18, 2018 at 5:14 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Dec 17, 2018 at 05:50:31PM -0500, Mark Kharitonov wrote:
>
> > Guys, having git merge --dry-run would be great, but I am OK with git
> > merge for real as long as its output is parseable.
Don't rely on that. merge output has changed occasionally and will
likely change again in the future multiple times. The other solutions
provided by Peff and Duy are much better. If we need to add more
options to provide you with what you need, then that's a route we can
take, but I'll make no guarantees about merge output being stable and
parseable.
All that said...
> > However, somewhere in between git 2.18 and git 2.20 the output of
> > merge changed and now I do not know how to parse it.
> > it used to be something like that:
> >
> > bla bla bla
> > <tab>file name 1
> > <tab>file name 2
> > ...
> > bla bla bla
> >
> > But now, the files are output in one line and given that some files
> > may have spaces in the name I do not see how this can be parsed. If we
> > could have easily parseable output of merge, it would be enough for
> > me.
>
> Interesting. I don't see that behavior at all. E.g., running this:
>
...
> I see:
>
> error: The following untracked working tree files would be overwritten by merge:
> one
> three
> two
> Please move or remove them before you merge.
>
> I wonder if it has to do with Windows.
>
> If you can reproduce it at will, can you try bisecting between v2.18 and
> v2.20 to see which commit introduced the change?
>
> -Peff
I see the same as Peff, and I see no changes to
unpack_trees.c:display_error_msgs() since git 2.11 (not even in a
clone of git-for-windows), and as far as I can tell that function is
the place that prints all the files and adds the newlines, so I'm kind
of perplexed how you're seeing things print multiple files on a line.
Bisecting as Peff suggests would help but I'm curious if there's
something special about your setup needed to reproduce and which
changed since you were using 2.18 (e.g. always forcing output through
a pager and having a pager that doesn't understand LF and requires
CRLF to display things correctly??)
^ permalink raw reply
* Re: [PATCHSET] git-reverse-trailer-xrefs: Reverse map cherry-picks and other cross-references
From: Stefan Xenos @ 2018-12-18 14:40 UTC (permalink / raw)
To: tj; +Cc: Junio C Hamano, git, peff, kernel-team
In-Reply-To: <20181213034606.GS2509588@devbig004.ftw2.facebook.com>
Heya, Tejun!
Thanks for getting in touch. I agree with Junio that we probably
shouldn't be tracking the same information in two ways if we can think
of something that does both... so let's see if we can think of
something! The evolve proposal suggests having a metas/ namespace to
track ongoing changes is intended as a way to track the changes a user
is actively working on. If we were to use it to retroactively store
every historical cherry-pick in a large repository, I'm concerned it
may get too cluttered to use for its originally-intended purpose. I'm
also not sure how well things would perform with a huge number of
refs. The remainder of the proposal (using metacommits to store the
relationships) would work for the xref use-case, but we'd probably
want to tweak the way we store the root refs in order to do a good job
with the large number of inactive cherry-picks he probably wants. Any
code that is looking for cross-reference metadata could look in both
namespaces.
Conversely, if we were to tweak the xrefs proposal by adding the
obsolescence information that evolve needs, we'd be missing a place to
store the user's ongoing changes, a way to push and pull specific
changes, a way to describe alternate histories for a commit, and a
mechanism for preventing the commits needed by evolve from getting
garbage collected.
All the problems with both approaches are solve-able, though.
I spent a few hours discussing this with Stefan Beller last week and I
think we've come up with a variation on the evolve proposal that would
cover both our use-cases. Let me know what you think. To address the
cluttering of the metas namespace, we could introduce a new
"hiddenmetas" namespace. It would contain the same type of data
recorded in the metas namespace, but it would normally be hidden from
the user when they list their changes, etc. It would also be immune
from the automatic deletions that are applied to the "metas"
namespace. From your point of view, the critical bit is that it would
contain information about cherry-picks. That would address the
"user-visible clutter" problem. Utility methods that want to search
for cherry-picks would iterate over both namespaces.
For the performance problem, I think we could just live with it
temporarily and wait for the currently-ongoing reftable work since the
reftable proposal would address exactly this performance issue (by
allowing us to store and manipulate large numbers of refs
efficiently).
A nice characteristic of this approach is that it would also address a
problem with the evolve proposal that had concerned me: how to deal
with the filter-branch command, which would have created pretty much
the same problems regarding the large number of metadata refs for
changes the user isn't actively working on.
It might be nice to also consider and some alternatives, but I haven't
had enough time to think up more of them (and I haven't digested the
xrefs proposal sufficiently to suggest an enhanced version of it yet).
If anyone else has ideas for combining the xrefs and evolve use-cases,
having more alternatives to choose from is always better!
If you're okay with the "hiddenmetas" approach in principle, I'll
update the evolve proposal doc to include it. Once we work out how the
storage format will work, we can coordinate our implementation work.
- Stefan
On Wed, Dec 12, 2018 at 7:46 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello, Junio, Stefan.
>
> On Thu, Dec 13, 2018 at 12:09:39PM +0900, Junio C Hamano wrote:
> > Please do not take the above as "don't do notes/xref-; instead read
> > from the 'meta commits'". I do not have a preference between the
> > two proposed implementations. The important thing is that we won't
> > end up with having to maintain two separate mechanisms that want to
> > keep track of essentially the same class of information. FWIW I'd
> > be perfectly fine if the unification goes the other way, as long as
> > goals of both parties are met, and for that, I'd like to see you two
> > work together, or at least be aware of what each other is doing and
> > see if cross-pollination would result in a mutually better solution.
>
> So, from my POV, the only thing I want is being able to easily tell
> which commit got cherry picked where. I really don't care how that's
> achieved. Stefan, if there's anything I can do to push it forward,
> let me know and if you see anything useful in my patchset, please feel
> free to incorporate them in any way.
>
> Thanks.
>
> --
> tejun
^ permalink raw reply
* Re: commit-graph idea: warn when disabled for incompatible repositories
From: Derrick Stolee @ 2018-12-18 14:35 UTC (permalink / raw)
To: Thomas Ferris Nicolaisen, git@vger.kernel.org
In-Reply-To: <CAEcj5uXJC3Za0YCyazJi82JdF-tLCDs5OrzCwyD8Y155GnJa6Q@mail.gmail.com>
On 12/18/2018 9:22 AM, Thomas Ferris Nicolaisen wrote:
> Hey there,
Hi, Thomas!
> Accidentally, my first use-case was a local copy of a big repository
> (chromium source) that another developer had cloned shallow (I wasn't
> even aware of their clone being shallow at that point).
>
> After spending a little time trying to figure out why no commit-graph
> file was being created, I noticed that it worked just fine testing in
> a fresh git repo.
>
> Then I discovered the .git/shallow file in the big repo. So I did
> fetch --unshallow, and commit-graph started working. Taking a 20
> second log --graph operation down to less than a second (wooo!).
>
> I saw some recent release notes that mentions that commit-graph is
> disabled in incompatible repositories (graft, replace). I assume this
> also be the case for shallow clones.
The commit-graph feature is not designed to work well with these
features, and the "easy" fix we have so far is to completely avoid the
interaction. The tricky bits come in when the commit parents can
"change" according to these other features. The commit-graph would need
to change at the same time, and this is actually very difficult to get
correct.
> Here's the idea that may help others on the same path: Some warning
> output when attempting to use commit-graph when it is disabled (either
> by configuration or automatically).
>
> I think others that come across commit-graph may have tried such
> tricks (like shallow clones) to better work with their repositories,
> and it could be frustrating that commit-graph has no apparent effect.
This is a good idea, and I would happily review a patch that added such
a warning.
The warning would want to be in builtin/commit-graph.c, and use the
commit_graph_compatible() method from commit-graph.c. (You'll need to
expose the method in the .h file.)
Thanks!
-Stolee
^ permalink raw reply
* commit-graph idea: warn when disabled for incompatible repositories
From: Thomas Ferris Nicolaisen @ 2018-12-18 14:22 UTC (permalink / raw)
To: git@vger.kernel.org; +Cc: stolee
Hey there,
I recently started dabbling with commit-graph.
Accidentally, my first use-case was a local copy of a big repository
(chromium source) that another developer had cloned shallow (I wasn't
even aware of their clone being shallow at that point).
After spending a little time trying to figure out why no commit-graph
file was being created, I noticed that it worked just fine testing in
a fresh git repo.
Then I discovered the .git/shallow file in the big repo. So I did
fetch --unshallow, and commit-graph started working. Taking a 20
second log --graph operation down to less than a second (wooo!).
I saw some recent release notes that mentions that commit-graph is
disabled in incompatible repositories (graft, replace). I assume this
also be the case for shallow clones.
Here's the idea that may help others on the same path: Some warning
output when attempting to use commit-graph when it is disabled (either
by configuration or automatically).
I think others that come across commit-graph may have tried such
tricks (like shallow clones) to better work with their repositories,
and it could be frustrating that commit-graph has no apparent effect.
Apologies if this is a stupid/bad/old idea!
^ permalink raw reply
* Git hooks don't run while commiting in worktree via git-gui
From: Иван Могиш @ 2018-12-18 13:46 UTC (permalink / raw)
To: git
Cc: Дмитрий Яковлев
Hello.
There is a little difference in behavior when you are committing to
the worktree from CLI or some git client (tortoisegit/sourcetree) and
embedded git gui.
If you configure git hooks in your repository and then add a worktree
(via git worktree add), hooks from main repository works well both in
main directory and in worktree, if you are using CLI/third-party GUI.
Committing in the main directory via embedded git-gui works fine too,
hooks are running. But when you try to commit in the worktree
directory from git-gui, hooks don't work.
I think I've found the cause of this:
https://github.com/git/git/blob/master/git-gui/lib/commit.tcl#L238
variable fd equals {} because proc githook_read calls proc gitdir to
determine path to hooks.
https://github.com/git/git/blob/master/git-gui/git-gui.sh#L626
This proc use variable _gitdir for calculating result. This var equals
the result of executing git rev-parse --git-dir
https://github.com/git/git/blob/master/git-gui/git-gui.sh#L1245
So, the path to hooks for worktree is
path_to_main_repo/.git/worktrees/my_worktree/hooks, but there are no
hooks. Hooks are in path_to_main_repo/.git/hooks, from where they are
correctly (or not?) executed by git cli, while running from worktree
directory.
If we put hooks to path_to_main_repo/.git/worktrees/my_worktree/hooks
too, they will work both in git citool and CLI/third-party GUI. But
they will execute different files, and it may cause some problems.
--
Ivan Mogish
Support Engineer
Phone: +9115212057
^ permalink raw reply
* Re: Can git tell me which uncommitted files clash with the incoming changes?
From: Jeff King @ 2018-12-18 13:14 UTC (permalink / raw)
To: Mark Kharitonov; +Cc: Duy Nguyen, Elijah Newren, Git Mailing List
In-Reply-To: <CAG2YSPy85YtAv6m5WR4ZrsZ4TRzgcyrC4DNZnOONtFD6MsH=YQ@mail.gmail.com>
On Mon, Dec 17, 2018 at 05:50:31PM -0500, Mark Kharitonov wrote:
> Guys, having git merge --dry-run would be great, but I am OK with git
> merge for real as long as its output is parseable.
>
> However, somewhere in between git 2.18 and git 2.20 the output of
> merge changed and now I do not know how to parse it.
> it used to be something like that:
>
> bla bla bla
> <tab>file name 1
> <tab>file name 2
> ...
> bla bla bla
>
> But now, the files are output in one line and given that some files
> may have spaces in the name I do not see how this can be parsed. If we
> could have easily parseable output of merge, it would be enough for
> me.
Interesting. I don't see that behavior at all. E.g., running this:
-- >8 --
git init repo
cd repo
echo base >base; git add base; git commit -m base
for i in one two three; do
echo $i >$i
done
git add .
git commit -m master
git checkout -b other HEAD^
echo other >other; git add other; git commit -m other
for i in one two three; do
echo working-tree >$i
done
git pull . master
-- 8< --
I see:
error: The following untracked working tree files would be overwritten by merge:
one
three
two
Please move or remove them before you merge.
I wonder if it has to do with Windows.
If you can reproduce it at will, can you try bisecting between v2.18 and
v2.20 to see which commit introduced the change?
-Peff
^ permalink raw reply
* Re: [PATCH v2 8/8] tests: mark tests broken under GIT_TEST_PROTOCOL_VERSION=2
From: Ævar Arnfjörð Bjarmason @ 2018-12-18 13:10 UTC (permalink / raw)
To: Jeff King
Cc: Jonathan Nieder, git, Junio C Hamano, Brandon Williams,
Jonathan Tan
In-Reply-To: <20181218123646.GA30471@sigill.intra.peff.net>
On Tue, Dec 18 2018, Jeff King wrote:
> On Mon, Dec 17, 2018 at 03:14:52PM -0800, Jonathan Nieder wrote:
>
>> > IMHO those security guarantees there are overrated (due to delta
>> > guessing attacks, though things are not quite as bad if the attacker
>> > can't actually push to the repo).
>>
>> Do you have a proof of concept for delta guessing? My understanding
>> was that without using a broken hash (e.g. uncorrected SHA-1), it is
>> not feasible to carry out.
>
> I think we may be talking about two different things. I mean an attack
> where you want to know what is in object X, so you ask the server for
> object Y and tell it that you already have X. If the sender generates a
> delta against X, that tells you something about what's in X.
>
> For a pure read-only server, you're restricted to the Y's that are
> already in the repo. So how effective this is depends on what's in X,
> and what Y's are available.
>
> For a case where X is in a victim repo you cannot make arbitrary writes
> to, but you _can_ make the victim repo aware of other objects (e.g., by
> opening a pull request that creates a ref), then you can iteratively
> provide many Y's, improving your guess about X in each iteration.
>
> For a case where the victim repo has fully shared storage (GitHub, and
> probably other hosts; I'm not sure if it's available yet, but GitLab is
> clearly working on shared-storage too), you can probably skip all that
> and just push a ref pointing to X with an empty pack (Git just cares
> that it has all of the objects afterwards, not that you pushed them).
>
> None of those care about the quality of the hash (they do assume you
> know the hash of X already, but then so does fetching by object id).
>
> And no, I've never written a proof-of-concept for that. It would depend
> largely on the data you're trying to extract. E.g., if you think X
> contains "root:XXXXXX", then you might hope to ask for "root:AXXXXX",
> then "root:BXXXXX", etc. You know you've got a hit when the delta gets
> smaller. So the complexity for guessing N bytes becomes 256*N, rather
> than 256^N.
>
>> > But I agree that people do assume it's the case. I was certainly
>> > surprised by the v2 behavior, and I don't remember that aspect being
>> > discussed.
>>
>> IMHO it's a plain bug (either in implementation or documentation).
>
> Or both. :) The behavior and the documentation seem to agree.
>
>> [...]
>> >> I'm inclined to say that in the face of that "SECURITY" section we
>> >> should just:
>> >>
>> >> * Turn on uploadpack.allowReachableSHA1InWant for v0/v1 by
>> >> default. Make saying uploadpack.allowReachableSHA1InWant=false warn
>> >> with "this won't work, see SECURITY...".
>> >>
>> >> * The uploadpack.allowTipSHA1InWant setting will also be turned on by
>> >> default, and will be much faster, since it'll just degrade to
>> >> uploadpack.allowReachableSHA1InWant=true and we won't need any
>> >> reachability check. We'll also warn saying that setting it is
>> >> useless.
>> >
>> > No real argument from me. I have always thought those security
>> > guarantees were BS.
>>
>> This would make per-branch ACLs (as implemented both by Gerrit and
>> gitolite) an essentially useless feature, so please no.
>
> I think Ævar's argument is that those are providing a false sense of
> security now (at least for read permissions).
>
> Let me clarify my position a little.
>
> I won't claim the existing scheme provides _no_ value (especially the
> pure read-only case above is less dicey than the others). It's mostly
> that the protections are very hand-wavy. I don't trust them _now_, and I
> have little faith that other innocent-looking changes to the object
> negotiation and the packing code will not significantly weaken them.
> There's no security boundary expressed within Git's code, so there's a
> very high chance of information leaking accidentally. A trustable system
> would have boundaries built in from the ground up.
>
> Enough people seem to believe otherwise (i.e., that the hand-waving
> arguments are worth _something_) that I've never pushed to actually
> change the default behavior. But if Ævar wants to try to do so, I'm not
> going to stand in my way (hence my "no argument from me").
FWIW I don't really care about this, I don't rely on
uploadpack.allow{Tip,Reachable,Any}SHA1InWant=false I'm just on the
side-quest of:
1. Try protocol v2
2. Discover that v2 implictly has uploadpack.allowAnySHA1InWant=true
enabled
3. Some people (including Jonathan) can reasonable read our docs /
seem to have understood this to be a security mechanism
4. What are we going to do about that v1 & v2 discrepancy? [You are
here!]
The genreal ways I see forward from that are:
A) Say that v2 has a security issue and that this is a feature that
works in some circumstances, but given Jeff's explanation here we
should at least improve our "SECURITY" docs to be less handwaivy.
B) Improve security docs, turn uploadpack.allowAnySHA1InWant=true on by
default, allow people to turn it off.
C) Like B) but deprecate
uploadpack.allow{Tip,Reachable,Any}SHA1InWant=false. This is my
patch upthread
D-Z) ???
I'm not set on C), and yeah it's probably overzelous to just rip the
thing out, but then what should we do?
>> I would be all for changing the default, but making turning off
>> allowReachableSHA1InWant an unsupported deprecated thing is a step too
>> far, in my opinion.
>
> Yes, I agree if we were to go down this road, it probably makes sense to
> flip the knobs and let them be "unflipped" if the user wants.
>
>> Is there somewhere that we can document these kinds of invariants or
>> goals so that we don't have to keep repeating the same discussions?
>
> It's not clear to me that there's consensus on the invariants or goals.
> ;)
>
> -Peff
^ permalink raw reply
* Re: [PATCH v3 0/4] protocol v2 fixes
From: Jeff King @ 2018-12-18 12:48 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Brandon Williams, Jonathan Tan,
Josh Steadmon
In-Reply-To: <20181217224054.4376-1-avarab@gmail.com>
On Mon, Dec 17, 2018 at 11:40:50PM +0100, Ævar Arnfjörð Bjarmason wrote:
> No changes to Jeff's patches since v2, for Jonathan's no changes to
> the C code, but I added a test which I extracted from the
> GIT_TEST_PROTOCOL_VERSION=2 work.
>
> Jeff King (3):
> serve: pass "config context" through to individual commands
> parse_hide_refs_config: handle NULL section
> upload-pack: support hidden refs with protocol v2
These should be dropped in favor of:
https://public-inbox.org/git/20181218124750.GC30471@sigill.intra.peff.net/
that I just sent (I tweaked the commit message to address the questions
you had in push for patch 3).
-Peff
^ permalink raw reply
* Re: [PATCH 1/3] serve: pass "config context" through to individual commands
From: Jeff King @ 2018-12-18 12:47 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Brandon Williams, Jonathan Tan, Jonathan Nieder,
Ævar Arnfjörð Bjarmason
In-Reply-To: <xmqqo99lbu9o.fsf@gitster-ct.c.googlers.com>
On Sun, Dec 16, 2018 at 08:12:03PM +0900, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > Yeah, I agree that such a "pass this through" struct full of options and
> > context would make sense. I just wouldn't tie it to the "serve"
> > machinery.
> >
> > Did you read the side-thread between me and Jonathan? Another option
> > here is to just have ls-refs assume that the client will tell it the
> > context (and assume uploadpack for now, since that's all that v2
> > currently does).
>
> Yes, I'd be 100% happy with that, too. And it certainly is simpler.
OK, let's do that, then. The user-visible behavior is no different, so
we can always reverse course later when the v2 push scheme materializes.
Patch is below.
> P.S. I expect to be mostly offline for the coming 72 hours, as I and
> my wife are both down with a cold. I am guessing that we will enter
> slower weeks in many parts of the world, and hoping this won't be
> too disruptive.
Hope you are feeling better. I'll be active through the rest of this
week, and then probably offline quite a bit for the two weeks following.
-Peff
-- >8 --
Subject: [PATCH] upload-pack: support hidden refs with protocol v2
In the v2 protocol, upload-pack's advertisement has been moved to the
"ls-refs" command. That command does not respect hidden-ref config (like
transfer.hiderefs) at all, and advertises everything.
While there are some features that are not supported in v2 (e.g., v2
always allows fetching any sha1 without respect to advertisements), the
lack of this feature is not documented and is likely just a bug. Let's
make it work, as otherwise upgrading a server to a v2-capable git will
start exposing these refs that the repository admin has asked to remain
hidden.
Note that we assume we're operating on behalf of a fetch here, since
that's the only thing implemented in v2 at this point. See the in-code
comment. We'll have to figure out how this works when the v2 push
protocol is designed (both here in ls-refs, but also rejecting updates
to hidden refs).
Signed-off-by: Jeff King <peff@peff.net>
---
ls-refs.c | 16 ++++++++++++++++
t/t5512-ls-remote.sh | 6 ++++++
2 files changed, 22 insertions(+)
diff --git a/ls-refs.c b/ls-refs.c
index a06f12eca8..9c9a7c647f 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -5,6 +5,7 @@
#include "argv-array.h"
#include "ls-refs.h"
#include "pkt-line.h"
+#include "config.h"
/*
* Check if one of the prefixes is a prefix of the ref.
@@ -40,6 +41,9 @@ static int send_ref(const char *refname, const struct object_id *oid,
const char *refname_nons = strip_namespace(refname);
struct strbuf refline = STRBUF_INIT;
+ if (ref_is_hidden(refname_nons, refname))
+ return 0;
+
if (!ref_match(&data->prefixes, refname))
return 0;
@@ -69,6 +73,16 @@ static int send_ref(const char *refname, const struct object_id *oid,
return 0;
}
+static int ls_refs_config(const char *var, const char *value, void *data)
+{
+ /*
+ * We only serve fetches over v2 for now, so respect only "uploadpack"
+ * config. This may need to eventually be expanded to "receive", but we
+ * don't yet know how that information will be passed to ls-refs.
+ */
+ return parse_hide_refs_config(var, value, "uploadpack");
+}
+
int ls_refs(struct repository *r, struct argv_array *keys,
struct packet_reader *request)
{
@@ -76,6 +90,8 @@ int ls_refs(struct repository *r, struct argv_array *keys,
memset(&data, 0, sizeof(data));
+ git_config(ls_refs_config, NULL);
+
while (packet_reader_read(request) != PACKET_READ_FLUSH) {
const char *arg = request->line;
const char *out;
diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 32e722db2e..ca69636fd5 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -204,6 +204,12 @@ test_expect_success 'overrides work between mixed transfer/upload-pack hideRefs'
grep refs/tags/magic actual
'
+test_expect_success 'protocol v2 supports hiderefs' '
+ test_config uploadpack.hiderefs refs/tags &&
+ git -c protocol.version=2 ls-remote . >actual &&
+ ! grep refs/tags actual
+'
+
test_expect_success 'ls-remote --symref' '
git fetch origin &&
cat >expect <<-EOF &&
--
2.20.1.748.g4743e5683b
^ permalink raw reply related
* Re: [PATCH v2 8/8] tests: mark tests broken under GIT_TEST_PROTOCOL_VERSION=2
From: Jeff King @ 2018-12-18 12:41 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Jonathan Nieder, git, Junio C Hamano, Brandon Williams,
Jonathan Tan
In-Reply-To: <8736qv18w4.fsf@evledraar.gmail.com>
On Tue, Dec 18, 2018 at 10:28:27AM +0100, Ævar Arnfjörð Bjarmason wrote:
> I.e. have a repo with "master" and a "root-password" branch, where the
> "root-password" branch has content that's irresistible to "git repack"
> for delta purposes, do we end up sending the "root-password" content
> over on a fetch even when that branch isn't advertised & forbidden?
>
> Or, if that fails are there ways to make it work? E.g. using hidden/* in
> combination with delta islands?
Delta islands wouldn't generally help here. They only tell us not to
store on-disk deltas that fetching clients aren't likely to be able to
reuse (i.e, they want X but will generally not have Y, so don't make a
delta there).
In the attacks I mentioned in my previous email, the deltas would
usually be computed on the fly for each fetch. So the lack of on-disk
deltas across "security boundaries" wouldn't help.
You could disable on-the-fly delta compression, but the resulting packs
are much larger (and you'd think it would save some server CPU, but
experiments I've done show that it helps a lot less than you'd think,
since we often end up zlib-deflating more bytes).
-Peff
^ permalink raw reply
* Re: [PATCH v2 8/8] tests: mark tests broken under GIT_TEST_PROTOCOL_VERSION=2
From: Jeff King @ 2018-12-18 12:36 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
Brandon Williams, Jonathan Tan
In-Reply-To: <20181217231452.GA13835@google.com>
On Mon, Dec 17, 2018 at 03:14:52PM -0800, Jonathan Nieder wrote:
> > IMHO those security guarantees there are overrated (due to delta
> > guessing attacks, though things are not quite as bad if the attacker
> > can't actually push to the repo).
>
> Do you have a proof of concept for delta guessing? My understanding
> was that without using a broken hash (e.g. uncorrected SHA-1), it is
> not feasible to carry out.
I think we may be talking about two different things. I mean an attack
where you want to know what is in object X, so you ask the server for
object Y and tell it that you already have X. If the sender generates a
delta against X, that tells you something about what's in X.
For a pure read-only server, you're restricted to the Y's that are
already in the repo. So how effective this is depends on what's in X,
and what Y's are available.
For a case where X is in a victim repo you cannot make arbitrary writes
to, but you _can_ make the victim repo aware of other objects (e.g., by
opening a pull request that creates a ref), then you can iteratively
provide many Y's, improving your guess about X in each iteration.
For a case where the victim repo has fully shared storage (GitHub, and
probably other hosts; I'm not sure if it's available yet, but GitLab is
clearly working on shared-storage too), you can probably skip all that
and just push a ref pointing to X with an empty pack (Git just cares
that it has all of the objects afterwards, not that you pushed them).
None of those care about the quality of the hash (they do assume you
know the hash of X already, but then so does fetching by object id).
And no, I've never written a proof-of-concept for that. It would depend
largely on the data you're trying to extract. E.g., if you think X
contains "root:XXXXXX", then you might hope to ask for "root:AXXXXX",
then "root:BXXXXX", etc. You know you've got a hit when the delta gets
smaller. So the complexity for guessing N bytes becomes 256*N, rather
than 256^N.
> > But I agree that people do assume it's the case. I was certainly
> > surprised by the v2 behavior, and I don't remember that aspect being
> > discussed.
>
> IMHO it's a plain bug (either in implementation or documentation).
Or both. :) The behavior and the documentation seem to agree.
> [...]
> >> I'm inclined to say that in the face of that "SECURITY" section we
> >> should just:
> >>
> >> * Turn on uploadpack.allowReachableSHA1InWant for v0/v1 by
> >> default. Make saying uploadpack.allowReachableSHA1InWant=false warn
> >> with "this won't work, see SECURITY...".
> >>
> >> * The uploadpack.allowTipSHA1InWant setting will also be turned on by
> >> default, and will be much faster, since it'll just degrade to
> >> uploadpack.allowReachableSHA1InWant=true and we won't need any
> >> reachability check. We'll also warn saying that setting it is
> >> useless.
> >
> > No real argument from me. I have always thought those security
> > guarantees were BS.
>
> This would make per-branch ACLs (as implemented both by Gerrit and
> gitolite) an essentially useless feature, so please no.
I think Ævar's argument is that those are providing a false sense of
security now (at least for read permissions).
Let me clarify my position a little.
I won't claim the existing scheme provides _no_ value (especially the
pure read-only case above is less dicey than the others). It's mostly
that the protections are very hand-wavy. I don't trust them _now_, and I
have little faith that other innocent-looking changes to the object
negotiation and the packing code will not significantly weaken them.
There's no security boundary expressed within Git's code, so there's a
very high chance of information leaking accidentally. A trustable system
would have boundaries built in from the ground up.
Enough people seem to believe otherwise (i.e., that the hand-waving
arguments are worth _something_) that I've never pushed to actually
change the default behavior. But if Ævar wants to try to do so, I'm not
going to stand in my way (hence my "no argument from me").
> I would be all for changing the default, but making turning off
> allowReachableSHA1InWant an unsupported deprecated thing is a step too
> far, in my opinion.
Yes, I agree if we were to go down this road, it probably makes sense to
flip the knobs and let them be "unflipped" if the user wants.
> Is there somewhere that we can document these kinds of invariants or
> goals so that we don't have to keep repeating the same discussions?
It's not clear to me that there's consensus on the invariants or goals.
;)
-Peff
^ permalink raw reply
* Re: [PATCH v2] Simplify handling of setup_git_directory_gently() failure cases.
From: Johannes Schindelin @ 2018-12-18 12:35 UTC (permalink / raw)
To: Erin Dahlgren; +Cc: git, Junio C Hamano
In-Reply-To: <1544922308-740-1-git-send-email-eedahlgren@gmail.com>
Hi Erin,
On Sat, 15 Dec 2018, Erin Dahlgren wrote:
> diff --git a/setup.c b/setup.c
> index 1be5037..e1a9e17 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -831,16 +831,6 @@ static const char *setup_bare_git_dir(struct strbuf *cwd, int offset,
> return NULL;
> }
>
> -static const char *setup_nongit(const char *cwd, int *nongit_ok)
> -{
> - if (!nongit_ok)
> - die(_("not a git repository (or any of the parent directories): %s"), DEFAULT_GIT_DIR_ENVIRONMENT);
> - if (chdir(cwd))
> - die_errno(_("cannot come back to cwd"));
> - *nongit_ok = 1;
> - return NULL;
> -}
> -
> static dev_t get_device_or_die(const char *path, const char *prefix, int prefix_len)
> {
> struct stat buf;
> @@ -1097,18 +1087,20 @@ const char *setup_git_directory_gently(int *nongit_ok)
> prefix = setup_bare_git_dir(&cwd, dir.len, &repo_fmt, nongit_ok);
> break;
> case GIT_DIR_HIT_CEILING:
> - prefix = setup_nongit(cwd.buf, nongit_ok);
> - break;
> + if (!nongit_ok)
> + die(_("not a git repository (or any of the parent directories): %s"),
> + DEFAULT_GIT_DIR_ENVIRONMENT);
I am terribly sorry to bother you about formatting issues (in my mind, it
is quite an annoying thing that we still have no fully automatic way to
format Git's source code according to Git's preferred coding style, but
there you go...): this `DEFAULT_GIT_DIR_ENVIRONMENT` should be aligned
with the first parameter of `die()`, i.e.
+ if (!nongit_ok)
+ die(_("not a git repository (or any of the parent directories): %s"),
+ DEFAULT_GIT_DIR_ENVIRONMENT);
> + *nongit_ok = 1;
> + strbuf_release(&dir);
> + return NULL;
> case GIT_DIR_HIT_MOUNT_POINT:
> - if (nongit_ok) {
> - *nongit_ok = 1;
> - strbuf_release(&cwd);
> - strbuf_release(&dir);
> - return NULL;
> - }
> - die(_("not a git repository (or any parent up to mount point %s)\n"
> - "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set)."),
> - dir.buf);
> + if (!nongit_ok)
> + die(_("not a git repository (or any parent up to mount point %s)\n"
> + "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set)."),
> + dir.buf);
Likewise, `dir.buf` should be aligned with the `_` two lines above.
Otherwise I think this patch is good to go!
Thank you,
Johannes
> + *nongit_ok = 1;
> + strbuf_release(&dir);
> + return NULL;
> default:
> BUG("unhandled setup_git_directory_1() result");
> }
> --
> 2.7.4
>
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox