Git development
 help / color / mirror / Atom feed
* Re: Trying to use xfuncname without success.
From: Jack Adrian Zappa @ 2017-02-09  0:10 UTC (permalink / raw)
  Cc: git-mailing-list
In-Reply-To: <CAKepmagSysE_31Y3JJwhOKvD_kGfiyEXikep62g=cn9+=v_fZA@mail.gmail.com>

>  where it has grabbed a line at 126 and is using that for the hunk header.

When I say that, I mean that it is using that line for *every* hunk
header, for every change, regardless if it has passed a hunk head that
it should have matched.





On Wed, Feb 8, 2017 at 7:01 PM, Jack Adrian Zappa <adrianh.bsc@gmail.com> wrote:
> Tried to copy the .git/config file over to the non-working repository
> and it didn't seem to do anything.  Could the git database be
> partially corrupted?
>
> On Wed, Feb 8, 2017 at 7:00 PM, Jack Adrian Zappa <adrianh.bsc@gmail.com> wrote:
>> Well, it mostly works, but I'm getting some weirdness where it has
>> grabbed a line at 126 and is using that for the hunk header.  Strange
>> thing is, I move the file to another repository, commit it, make a
>> change to the fileand do a diff, and it gets the correct hunk header.
>>
>> I'm at a loss. :(
>>
>> On Wed, Feb 8, 2017 at 4:12 PM, Jack Adrian Zappa <adrianh.bsc@gmail.com> wrote:
>>> That was it.  I have a .gitattributes file in my home directory.
>>> Ahhh, but it's not in my %userprofile% directory, but in my ~
>>> directory.
>>>
>>> A bit confusing having 2 home directories.  I made a link to my
>>> .gitconfig, but forgot to make a link to my .gitattributes.
>>>
>>> Thanks.
>>>
>>>
>>> A
>>>
>>> On Wed, Feb 8, 2017 at 4:05 PM, Samuel Lijin <sxlijin@gmail.com> wrote:
>>>> Double check .gitattributes?
>>>>
>>>> On Feb 8, 2017 2:58 PM, "Jack Adrian Zappa" <adrianh.bsc@gmail.com> wrote:
>>>>>
>>>>> Thanks Samuel,
>>>>>
>>>>> That example showed that there must be something wrong in my .git
>>>>> directory, because with it, I'm getting the correct output.  Moving
>>>>> the same lines to my .git/config didn't work.
>>>>>
>>>>> On Wed, Feb 8, 2017 at 3:46 PM, Samuel Lijin <sxlijin@gmail.com> wrote:
>>>>> > I just put this togther: https://github.com/sxlijin/xfuncname-test
>>>>> >
>>>>> > Try cloning and then for any of config1 thru 3,
>>>>> >
>>>>> > $ cp configX .git/config
>>>>> > $ git diff HEAD^ -- test.natvis
>>>>> >
>>>>> > On Wed, Feb 8, 2017 at 2:42 PM, Jack Adrian Zappa
>>>>> > <adrianh.bsc@gmail.com> wrote:
>>>>> >> Thanks Samuel,
>>>>> >>
>>>>> >> So, the question is, what is causing this problem on my system?
>>>>> >>
>>>>> >> Anyone have an idea to help diagnose this problem?
>>>>> >>
>>>>> >> On Wed, Feb 8, 2017 at 3:24 PM, Samuel Lijin <sxlijin@gmail.com> wrote:
>>>>> >>> On Windows 7, it works for me in both CMD and Git Bash:
>>>>> >>>
>>>>> >>> $ git --version
>>>>> >>> git version 2.11.0.windows.3
>>>>> >>>
>>>>> >>> $ git diff HEAD^ --word-diff
>>>>> >>> diff --git a/test.natvis b/test.natvis
>>>>> >>> index 93396ad..1233b8c 100644
>>>>> >>> --- a/test.natvis
>>>>> >>> +++ b/test.natvis
>>>>> >>> @@ -18,6 +18,7 @@ test
>>>>> >>>
>>>>> >>>
>>>>> >>>       <!-- Non-blank line -->
>>>>> >>>       {+<Item Name="added var">added_var</Item>+}
>>>>> >>>
>>>>> >>>
>>>>> >>>       <Item Name="var2">var2</Item>
>>>>> >>>
>>>>> >>> On Wed, Feb 8, 2017 at 12:37 PM, René Scharfe <l.s.r@web.de> wrote:
>>>>> >>>> Am 08.02.2017 um 18:11 schrieb Jack Adrian Zappa:
>>>>> >>>>> Thanks Rene, but you seem to have missed the point.  NOTHING is
>>>>> >>>>> working.  No matter what I put there, it doesn't seem to get
>>>>> >>>>> matched.
>>>>> >>>>
>>>>> >>>> I'm not so sure about that.  With your example I get this diff
>>>>> >>>> without
>>>>> >>>> setting diff.natvis.xfuncname:
>>>>> >>>>
>>>>> >>>> diff --git a/a.natvis b/a.natvis
>>>>> >>>> index 7f9bdf5..bc3c090 100644
>>>>> >>>> --- a/a.natvis
>>>>> >>>> +++ b/a.natvis
>>>>> >>>> @@ -19,7 +19,7 @@
>>>>> >>>> xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>        <!-- Non-blank line -->
>>>>> >>>> -      <Item Name="added var">added_var</Item>
>>>>> >>>> +      <Item Name="added var">added_vars</Item>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>        <Item Name="var2">var2</Item>
>>>>> >>>>
>>>>> >>>> Note the XML namespace in the hunk header.  It's put there by the
>>>>> >>>> default rule because "xmlns" starts at the beginning of the line.
>>>>> >>>> Your
>>>>> >>>> diff has nothing there, which means the default rule is not used,
>>>>> >>>> i.e.
>>>>> >>>> your user-defined rule is in effect.
>>>>> >>>>
>>>>> >>>> Come to think of it, this line break in the middle of the
>>>>> >>>> AutoVisualizer
>>>>> >>>> tab might have been added by your email client unintentionally, so
>>>>> >>>> that
>>>>> >>>> we use different test files, which then of course results in
>>>>> >>>> different
>>>>> >>>> diffs.  Is that the case?
>>>>> >>>>
>>>>> >>>> Anyway, if I run the following two commands:
>>>>> >>>>
>>>>> >>>> $ git config diff.natvis.xfuncname "^[\t ]*<Type[\t
>>>>> >>>> ]+Name=\"([^\"]+)\".*$"
>>>>> >>>> $ echo '*.natvis diff=natvis' >.gitattributes
>>>>> >>>>
>>>>> >>>> ... then I get this, both on Linux (git version 2.11.1) and on
>>>>> >>>> Windows
>>>>> >>>> (git version 2.11.1.windows.1):
>>>>> >>>>
>>>>> >>>> diff --git a/a.natvis b/a.natvis
>>>>> >>>> index 7f9bdf5..bc3c090 100644
>>>>> >>>> --- a/a.natvis
>>>>> >>>> +++ b/a.natvis
>>>>> >>>> @@ -19,7 +19,7 @@ test
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>        <!-- Non-blank line -->
>>>>> >>>> -      <Item Name="added var">added_var</Item>
>>>>> >>>> +      <Item Name="added var">added_vars</Item>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>        <Item Name="var2">var2</Item>
>>>>> >>>>
>>>>> >>>>> Just to be sure, I tested your regex and again it didn't work.
>>>>> >>>>
>>>>> >>>> At this point I'm out of ideas, sorry. :(  The only way I was able to
>>>>> >>>> break it was due to mistyping the extension as "netvis" several times
>>>>> >>>> for some reason.
>>>>> >>>>
>>>>> >>>> René

^ permalink raw reply

* Re: [PATCH/RFC] WIP: log: allow "-" as a short-hand for "previous branch"
From: Siddharth Kannan @ 2017-02-08 17:23 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Junio C Hamano, Git List, Pranit Bauva, Jeff King, pclouds,
	brian m. carlson
In-Reply-To: <vpqh944eof7.fsf@anie.imag.fr>

Hello Matthieu,

On 8 February 2017 at 20:10, Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> wrote:
> In a previous discussion, I made an analogy with "cd -" (which is the
> source of inspiration of this shorthand AFAIK): "-" did not magically
> become "the last visited directory" for all Unix commands, just for
> "cd". And in this case, I'm happy with it. For example, I never need
> "mkdir -", and I'm happy I can't "rm -fr -" by mistake.
>
> So, it's debatable whether it's a good thing to have all commands
> support "-". For example, forcing users to explicitly type "git branch
> -d @{1}" and not providing them with a shortcut might be a good thing.

builtin/branch.c does not call setup_revisions and remains unaffected
by this patch :)

In my original patch post, I was not explicit about what files call
setup_revisions.
I would like to rectify that with this (grep-ed) list:

* builtin/blame.c
* builtin/diff.c
* builtin/diff-files.c
* builtin/diff-index.c
* builtin/diff-tree.c
* builtin/log.c
* builtin/rev-list.c
* builtin/shortlog.c
* builtin/fast-export.c
* builtin/fmt-merge-msg.c
builtin/add.c
builtin/checkout.c
builtin/commit.c
builtin/merge.c
builtin/pack-objects.c
builtin/revert.c

* marked commands only show information, and don't change anything.

As you might notice, in this list, most commands are not of the `rm` variety,
i.e. something that would delete stuff.

In the next version of this patch, I will definitely include the list
of commands
which are "rm-ish" and affected by this patch.

>
> I don't have strong opinion on this: I tend to favor consistency and
> supporting "-" everywhere goes in this direction, but I think the
> downsides should be considered too. A large part of the exercice here is
> to write a good commit message!

Yes, I prefer consistency very much as well! Having "-" mean the same thing
across a lot of commands is better than having that shorthand only in a few
commands, as it is now. Unless there is a specific confusion that might arise
because of this shorthand inclusion, I think that this shorthand
should be supported
across the board.
(I especially like typing `git checkout - <filename>` which is very handy!)

>
> Another issue with this is: - is also a common way to say "use stdin
> instead of a file", so before enabling - for "previous branch", we need
> to make sure it does not introduce any ambiguity. Git does not seem to
> use "- for stdin" much (most commands able to read from stdin have an
> explicit --stdin option for that), a quick grep in the docs shows only
> "git blame --contents -" which is OK because a revision wouldn't make
> sense here anyway.

Yes, just to jog your memory, this was discussed here [1]

Junio said:

    As long as the addition is carefully prepared so that we know it
    will not conflict (or be confused by users) with possible other uses
    of "-", I do not think we would mind "git branch -D -" and other
    commands to learn "-" as a synonym for @{-1}.

>
> --
> Matthieu Moy
> http://www-verimag.imag.fr/~moy/

Thanks a lot for the review on this patch, Matthieu!

-- 

Best Regards,

- Siddharth.

[1]: https://public-inbox.org/git/7vmwpitb6k.fsf@alter.siamese.dyndns.org/

^ permalink raw reply

* Re: [PATCH v4] tag: generate useful reflog message
From: Junio C Hamano @ 2017-02-08 23:44 UTC (permalink / raw)
  To: Cornelius Weig; +Cc: git, karthik.188, peff, bitte.keine.werbung.einwerfen
In-Reply-To: <d0170d3a-3022-3bca-7c80-7ef0b1cf62a0@tngtech.com>

Cornelius Weig <cornelius.weig@tngtech.com> writes:

> A version with revised tests will follow.

Thanks; I think this is clean enough.  Let's queue this one and
advance it to 'next' soonish.


^ permalink raw reply

* Re: "disabling bitmap writing, as some objects are not being packed"?
From: Junio C Hamano @ 2017-02-09  0:18 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, Duy Nguyen, Git Mailing List
In-Reply-To: <20170208230057.hking37uuynf4cgd@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> In my experience, auto-gc has never been a low-maintenance operation on
> the server side (and I do think it was primarily designed with clients
> in mind).

I do not think auto-gc was ever tweaked to help server usage, in its
history since it was invented strictly to help end-users (mostly new
ones).

> At GitHub we disable it entirely, and do our own gc based on a throttled
> job queue ...
> I wish regular Git were more turn-key in that respect. Maybe it is for
> smaller sites, but we certainly didn't find it so. And I don't know that
> it's feasible to really share the solution. It's entangled with our
> database (to store last-pushed and last-maintenance values for repos)
> and our job scheduler.

Thanks for sharing the insights from the trenches ;-)

> Yeah, I'm certainly open to improving Git's defaults. If it's not clear
> from the above, I mostly just gave up for a site the size of GitHub. :)
>
>> Idea 1: when gc --auto would issue this message, instead it could create
>> a file named gc.too-much-garbage (instead of gc.log), with this message.
>> If that file exists, and it is less than one day (?) old, then we don't
>> attempt to do a full gc; instead we just run git repack -A -d.  (If it's
>> more than one day old, we just delete it and continue anyway).
>
> I kind of wonder if this should apply to _any_ error. I.e., just check
> the mtime of gc.log and forcibly remove it when it's older than a day.
> You never want to get into a state that will fail to resolve itself
> eventually. That might still happen (e.g., corrupt repo), but at the
> very least it won't be because Git is too dumb to try again.

;-)

>> Idea 2 : Like idea 1, but instead of repacking, just smash the existing
>> packs together into one big pack.  In other words, don't consider
>> dangling objects, or recompute deltas.  Twitter has a tool called "git
>> combine-pack" that does this:
>> https://github.com/dturner-tw/git/blob/dturner/journal/builtin/combine-pack.c
>
> We wrote something similar at GitHub, too, but we never ended up using
> it in production. We found that with a sane scheduler, it's not too big
> a deal to just do maintenance once in a while.

Thanks again for this.  I've also been wondering about how effective
a "concatenate packs without paying reachability penalty" would be.

> I'm still not sure if it's worth making the fatal/non-fatal distinction.
> Doing so is perhaps safer, but it does mean that somebody has to decide
> which errors are important enough to block a retry totally, and which
> are not. In theory, it would be safe to always _try_ and then the gc
> process can decide when something is broken and abort. And all you've
> wasted is some processing power each day.

Yup, and somebody or something need to monitor so that repeated
failures can be dealt with.

^ permalink raw reply

* Re: [PATCH 1/2] pathspec magic: add '^' as alias for '!'
From: Stefan Beller @ 2017-02-08 23:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, Linus Torvalds, Git Mailing List
In-Reply-To: <xmqq37fouvu3.fsf@gitster.mtv.corp.google.com>

On Wed, Feb 8, 2017 at 3:05 PM, Junio C Hamano <gitster@pobox.com> wrote:
>  - it hints that '!' is the more official spelling, making the
>    output you showed above acceptable.

Long term , I'd rather have ^ as the "official" spelling as that is easier
to teach to people. ! being a historic mistake as it is hard to type?

^ permalink raw reply

* Re: What's cooking in git.git (Feb 2017, #02; Mon, 6)
From: Stefan Beller @ 2017-02-09  0:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org
In-Reply-To: <xmqqzihzymn3.fsf@gitster.mtv.corp.google.com>

> * sb/submodule-doc (2017-01-12) 3 commits
>  - submodules: add a background story
>  - submodule update documentation: don't repeat ourselves
>  - submodule documentation: add options to the subcommand
>
>  Needs review.
>

The first two commits are good to go IIRC as you seemed to be
positive about them at the time. Though I have a hard time finding
evidence of such.

I am currently reworking the 3/3 "add a background story" patch as it is
RFC-ish, so no need to review that any more.

Maybe we can get 1&2 merged and then 3 comes on its own?

Thanks,
Stefan

^ permalink raw reply

* Re: [RFC 00/14] Allow fetch-pack to send ref names (globs allowed)
From: Junio C Hamano @ 2017-02-09  0:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff
In-Reply-To: <20170207235350.10718-1-jonathantanmy@google.com>

Jonathan Tan <jonathantanmy@google.com> writes:

> Usages of the list of remote refs or the remote-local ref
> map are updated as follows:
>  - check_not_current_branch (which checks that the current branch is not
>    affected by the fetch) is performed both on the original ref map (to
>    die before the fetch if we can, as an optimization) and on the new
>    ref map (since the new ref map is the one actually applied).

OK.

>  - Pruning is done based on the new ref map.

OK.  As that is what eventually gets "installed" on the local side,
it makes sense to become consistent with that set, not the set the
original server gave you.

>  - backfill_tags (for tag following) is performed using the original
>    list of remote refs because the new list of fetched refs is not
>    guaranteed to contain tag information. (Since backfill_tags performs
>    another fetch, it does not need to be fully consistent with the
>    just-returned packfile.)

This smells correct but I need to think about this one a bit more.

Overall I think the strategy is agreeable.

^ permalink raw reply

* Re: What's cooking in git.git (Feb 2017, #02; Mon, 6)
From: Junio C Hamano @ 2017-02-09  1:01 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org
In-Reply-To: <CAGZ79kZ6m1NGz9Lhhd5BDxayLW-UC8MqexMyhHBhHKrmg-tB+w@mail.gmail.com>

Stefan Beller <sbeller@google.com> writes:

>> * sb/submodule-doc (2017-01-12) 3 commits
>>  - submodules: add a background story
>>  - submodule update documentation: don't repeat ourselves
>>  - submodule documentation: add options to the subcommand
>
> The first two commits are good to go IIRC as you seemed to be
> positive about them at the time. Though I have a hard time finding
> evidence of such.
>
> I am currently reworking the 3/3 "add a background story" patch as it is
> RFC-ish, so no need to review that any more.
>
> Maybe we can get 1&2 merged and then 3 comes on its own?

Yeah, sorry, I keep forgetting this topic, it seems.  Let's advance
the earlier two.  Tentatively I'd drop the third one as you plan to
redo it separately.

Thanks for prodding.


^ permalink raw reply

* Re: "disabling bitmap writing, as some objects are not being packed"?
From: Jeff King @ 2017-02-09  1:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, Duy Nguyen, Git Mailing List
In-Reply-To: <xmqqbmuctdwu.fsf@gitster.mtv.corp.google.com>

On Wed, Feb 08, 2017 at 04:18:25PM -0800, Junio C Hamano wrote:

> > We wrote something similar at GitHub, too, but we never ended up using
> > it in production. We found that with a sane scheduler, it's not too big
> > a deal to just do maintenance once in a while.
> 
> Thanks again for this.  I've also been wondering about how effective
> a "concatenate packs without paying reachability penalty" would be.

For the sake of posterity, I'll include our patch at the end (sorry, not
chunked into nice readable commits; that never existed in the first
place).

> > I'm still not sure if it's worth making the fatal/non-fatal distinction.
> > Doing so is perhaps safer, but it does mean that somebody has to decide
> > which errors are important enough to block a retry totally, and which
> > are not. In theory, it would be safe to always _try_ and then the gc
> > process can decide when something is broken and abort. And all you've
> > wasted is some processing power each day.
> 
> Yup, and somebody or something need to monitor so that repeated
> failures can be dealt with.

Yes. I think that part is probably outside the scope of Git. But if
auto-gc leaves gc.log lying around, it would be easy to visit each repo
and collect the various failures.

-- >8 --
This is the "pack-fast" patch, for reference. It applies on v2.6.5,
though I had to do some wiggling due to a few of our other custom
patches, so it's possible I introduced new bugs. It compiles, but I
didn't actually re-test the result.  I _think_ the original at least
generated valid packs in all cases.

So I would certainly not recommend anybody run this. It's just a
possible base to work off of if anybody's interested in the topic. I
haven't looked at David's combine-packs at all to see if it is any less
gross. :)

---
 Makefile            |   1 +
 builtin.h           |   1 +
 builtin/pack-fast.c | 618 +++++++++++++++++++++++++++++++++++
 cache.h             |   5 +
 git.c               |   1 +
 pack-bitmap-write.c | 167 +++++++++-
 pack-bitmap.c       |   2 +-
 pack-bitmap.h       |   8 +
 sha1_file.c         |   4 +-
 9 files changed, 792 insertions(+), 15 deletions(-)

diff --git a/Makefile b/Makefile
index 37e2d9e18..524b185ec 100644
--- a/Makefile
+++ b/Makefile
@@ -887,6 +887,7 @@ BUILTIN_OBJS += builtin/mv.o
 BUILTIN_OBJS += builtin/name-rev.o
 BUILTIN_OBJS += builtin/notes.o
 BUILTIN_OBJS += builtin/pack-objects.o
+BUILTIN_OBJS += builtin/pack-fast.o
 BUILTIN_OBJS += builtin/pack-redundant.o
 BUILTIN_OBJS += builtin/pack-refs.o
 BUILTIN_OBJS += builtin/patch-id.o
diff --git a/builtin.h b/builtin.h
index 79aaf0afe..df4e4d668 100644
--- a/builtin.h
+++ b/builtin.h
@@ -95,6 +95,7 @@ extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
 extern int cmd_notes(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_objects(int argc, const char **argv, const char *prefix);
+extern int cmd_pack_fast(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_redundant(int argc, const char **argv, const char *prefix);
 extern int cmd_patch_id(int argc, const char **argv, const char *prefix);
 extern int cmd_prune(int argc, const char **argv, const char *prefix);
diff --git a/builtin/pack-fast.c b/builtin/pack-fast.c
new file mode 100644
index 000000000..ad9f5e5f1
--- /dev/null
+++ b/builtin/pack-fast.c
@@ -0,0 +1,618 @@
+#include "builtin.h"
+#include "cache.h"
+#include "pack.h"
+#include "progress.h"
+#include "csum-file.h"
+#include "sha1-lookup.h"
+#include "parse-options.h"
+#include "tempfile.h"
+#include "pack-bitmap.h"
+#include "pack-revindex.h"
+
+static const char *pack_usage[] = {
+	N_("git pack-fast --quiet [options...] [base-name]"),
+	NULL
+};
+
+struct packwriter {
+	struct tempfile *tmp;
+	off_t total;
+	int fd;
+	uint32_t crc32;
+	unsigned do_crc;
+};
+
+static void packwriter_crc32_start(struct packwriter *w)
+{
+	w->crc32 = crc32(0, NULL, 0);
+	w->do_crc = 1;
+}
+
+static uint32_t packwriter_crc32_end(struct packwriter *w)
+{
+	w->do_crc = 0;
+	return w->crc32;
+}
+
+static void packwriter_write(struct packwriter *w, const void *buf, unsigned int count)
+{
+	if (w->do_crc)
+		w->crc32 = crc32(w->crc32, buf, count);
+	write_or_die(w->fd, buf, count);
+	w->total += count;
+}
+
+static off_t packwriter_total(struct packwriter *w)
+{
+	return w->total;
+}
+
+static void packwriter_init(struct packwriter *w)
+{
+	char tmpname[PATH_MAX];
+
+	w->fd = odb_mkstemp(tmpname, sizeof(tmpname), "pack/tmp_pack_XXXXXX");
+	w->total = 0;
+	w->do_crc = 0;
+	w->tmp = xcalloc(1, sizeof(*w->tmp));
+
+	register_tempfile(w->tmp, tmpname);
+}
+
+
+static int progress = 1;
+static struct progress *progress_state;
+static struct pack_idx_option pack_idx_opts;
+static const char *base_name = "pack-fast";
+static int skip_largest;
+static int write_bitmap_index = 1;
+
+static struct packed_git **all_packfiles;
+static unsigned int all_packfiles_nr;
+
+static struct pack_idx_entry **written_list;
+static unsigned int written_nr;
+
+struct write_slab {
+	struct write_slab *next;
+	unsigned int nr;
+
+	struct write_slab_entry {
+		struct pack_idx_entry idx;
+		enum object_type real_type;
+	} entries[];
+};
+
+static struct write_slab *written_slab_root;
+static struct write_slab *written_slab_current;
+
+static void add_to_write_list(
+	const unsigned char *sha1, off_t offset, uint32_t crc32,
+	enum object_type real_type)
+{
+	struct write_slab *slab = written_slab_current;
+	struct write_slab_entry *entry = &(slab->entries[slab->nr++]);
+
+	entry->real_type = real_type;
+	entry->idx.offset = offset;
+	entry->idx.crc32 = crc32;
+	hashcpy(entry->idx.sha1, sha1);
+}
+
+static void preallocate_write_slab(unsigned int num_entries)
+{
+	struct write_slab *slab = xmalloc(
+		sizeof(struct write_slab) +
+		num_entries * sizeof(struct write_slab_entry));
+
+	slab->next = NULL;
+	slab->nr = 0;
+
+	if (!written_slab_current) {
+		written_slab_current = slab;
+		written_slab_root = slab;
+	} else {
+		written_slab_current->next = slab;
+		written_slab_current = slab;
+	}
+}
+
+static struct skipped_object {
+	off_t skipped_offset;
+	off_t real_offset;
+} *skipped_list;
+static unsigned int skipped_nr;
+static unsigned int skipped_alloc;
+
+static void add_to_skipped_list(off_t skipped_offset, off_t real_offset)
+{
+	if (skipped_nr >= skipped_alloc) {
+		skipped_alloc = (skipped_alloc + 32) * 2;
+		REALLOC_ARRAY(skipped_list, skipped_alloc);
+	}
+
+	skipped_list[skipped_nr].skipped_offset = skipped_offset;
+	skipped_list[skipped_nr].real_offset = real_offset;
+	skipped_nr++;
+}
+
+static off_t find_real_offset_for_base(off_t skipped_offset)
+{
+	int lo = 0, hi = skipped_nr;
+	while (lo < hi) {
+		int mi = lo + ((hi - lo) / 2);
+		if (skipped_offset == skipped_list[mi].skipped_offset)
+			return skipped_list[mi].real_offset;
+		if (skipped_offset < skipped_list[mi].skipped_offset)
+			hi = mi;
+		else
+			lo = mi + 1;
+	}
+
+	return 0;
+}
+
+/*
+ * Record the offsets needed in our reused packfile chunks due to
+ * "gaps" where we omitted some objects.
+ */
+static struct reused_chunk {
+	off_t start;
+	off_t offset;
+} *reused_chunks;
+static int reused_chunks_nr;
+static int reused_chunks_alloc;
+
+static void record_reused_object(off_t where, off_t offset)
+{
+	if (reused_chunks_nr && reused_chunks[reused_chunks_nr-1].offset == offset)
+		return;
+
+	ALLOC_GROW(reused_chunks, reused_chunks_nr + 1,
+		   reused_chunks_alloc);
+	reused_chunks[reused_chunks_nr].start = where;
+	reused_chunks[reused_chunks_nr].offset = offset;
+	reused_chunks_nr++;
+}
+
+/*
+ * Binary search to find the chunk that "where" is in. Note
+ * that we're not looking for an exact match, just the first
+ * chunk that contains it (which implicitly ends at the start
+ * of the next chunk.
+ */
+static off_t find_reused_offset(off_t where)
+{
+	int lo = 0, hi = reused_chunks_nr;
+	while (lo < hi) {
+		int mi = lo + ((hi - lo) / 2);
+		if (where == reused_chunks[mi].start)
+			return reused_chunks[mi].offset;
+		if (where < reused_chunks[mi].start)
+			hi = mi;
+		else
+			lo = mi + 1;
+	}
+
+	/*
+	 * The first chunk starts at zero, so we can't have gone below
+	 * there.
+	 */
+	assert(lo);
+	return reused_chunks[lo-1].offset;
+}
+
+static uint32_t nth_packed_object_crc32(const struct packed_git *p, uint32_t nr)
+{
+	const uint32_t *index_crc = p->index_data;
+	index_crc += 2 + 256 + p->num_objects * (20/4) + nr;
+	return ntohl(*index_crc);
+}
+
+static void load_index_or_die(struct packed_git *p)
+{
+	if (open_pack_index(p) < 0)
+		die("failed to open index for '%s'", p->pack_name);
+
+	if (p->index_version != 2)
+		die("unsupported index version %d (fast-pack requires index v2)\n",
+			p->index_version);
+}
+
+static int sort_pack(const void *a_, const void *b_)
+{
+	struct packed_git *a = *((struct packed_git **)a_);
+	struct packed_git *b = *((struct packed_git **)b_);
+
+	if (a->mtime > b->mtime)
+		return 1;
+	else if (a->mtime == b->mtime)
+		return 0;
+	return -1;
+}
+
+static void find_packfiles(void)
+{
+	struct packed_git *p;
+	unsigned int n;
+
+	prepare_packed_git();
+
+	for (n = 0, p = packed_git; p; p = p->next) {
+		if (p->pack_local)
+			n++;
+	}
+
+	all_packfiles = xcalloc(n, sizeof(struct packed_git *));
+	all_packfiles_nr = n;
+
+	for (n = 0, p = packed_git; p; p = p->next) {
+		if (p->pack_local)
+			all_packfiles[n++] = p;
+	}
+
+	for (n = 1; n < all_packfiles_nr; ++n) {
+		if (all_packfiles[n]->pack_size > all_packfiles[0]->pack_size) {
+			struct packed_git *tmp = all_packfiles[0];
+			all_packfiles[0] = all_packfiles[n];
+			all_packfiles[n] = tmp;
+		}
+	}
+
+	qsort(all_packfiles + 1, all_packfiles_nr - 1, sizeof(struct packed_git *), sort_pack);
+}
+
+static int sha1_index__cmp(const void *a_, const void *b_)
+{
+	struct pack_idx_entry *a = *((struct pack_idx_entry **)a_);
+	struct pack_idx_entry *b = *((struct pack_idx_entry **)b_);
+	return hashcmp(a->sha1, b->sha1);
+}
+
+static const unsigned char *sha1_index__access(size_t pos, void *table)
+{
+	struct pack_idx_entry **index = table;
+	return index[pos]->sha1;
+}
+
+static void sha1_index_update(void)
+{
+	const unsigned int left_nr = written_nr;
+	const unsigned int right_nr = written_slab_current->nr;
+	const unsigned int total_nr = left_nr + right_nr;
+
+	struct pack_idx_entry **left = written_list;
+	struct pack_idx_entry **right = xmalloc(right_nr * sizeof(struct pack_idx_entry *));
+	struct pack_idx_entry **result = xmalloc(total_nr * sizeof(struct pack_idx_entry *));
+
+	unsigned int i, j, n;
+
+	for (j = 0; j < right_nr; ++j)
+		right[j] = (struct pack_idx_entry *)(&written_slab_current->entries[j]);
+
+	qsort(right, right_nr, sizeof(struct pack_idx_entry  *), sha1_index__cmp);
+
+	for (i = j = n = 0; i < left_nr && j < right_nr; ++n) {
+		struct pack_idx_entry *a = left[i];
+		struct pack_idx_entry *b = right[j];
+
+		if (hashcmp(a->sha1, b->sha1) <= 0) {
+			result[n] = a;
+			i++;
+		} else {
+			result[n] = b;
+			j++;
+		}
+	}
+
+	for (; i < left_nr; ++n, ++i)
+		result[n] = left[i];
+
+	for (; j < right_nr; ++n, ++j)
+		result[n] = right[j];
+
+	free(written_list);
+	free(right);
+
+	written_list = result;
+	written_nr = total_nr;
+}
+
+static off_t sha1_index_find_offset(const unsigned char *sha1)
+{
+	int pos = sha1_pos(sha1, written_list, written_nr, sha1_index__access);
+	return (pos < 0) ? 0 : written_list[pos]->offset;
+}
+
+static void copy_pack_data(
+		struct packwriter *w,
+		struct packed_git *p,
+		struct pack_window **w_curs,
+		off_t offset,
+		off_t len)
+{
+	unsigned char *in;
+	unsigned long avail;
+
+	while (len) {
+		in = use_pack(p, w_curs, offset, &avail);
+		if (avail > len)
+			avail = (unsigned long)len;
+		packwriter_write(w, in, avail);
+		offset += avail;
+		len -= avail;
+	}
+}
+
+extern enum object_type packed_to_object_type(
+	struct packed_git *p, off_t obj_offset, enum object_type type,
+	struct pack_window **w_curs, off_t curpos);
+
+static int append_object_1(
+	struct revindex_entry *reventry,
+	struct packwriter *w,
+	struct packed_git *pack,
+	struct pack_window **w_curs,
+	enum object_type *real_type)
+{
+	const off_t offset = reventry[0].offset;
+	const off_t next = reventry[1].offset;
+
+	off_t cur;
+	enum object_type type;
+	unsigned long size;
+
+	record_reused_object(offset, offset - packwriter_total(w));
+
+	cur = offset;
+	type = unpack_object_header(pack, w_curs, &cur, &size);
+	assert(type >= 0);
+
+	if (write_bitmap_index)
+		*real_type = packed_to_object_type(pack, offset, type, w_curs, cur);
+
+	if (type == OBJ_OFS_DELTA) {
+		const off_t base_offset = get_delta_base(pack, w_curs, &cur, type, offset);
+		const off_t real_base_offset = find_real_offset_for_base(base_offset);
+		off_t fixed_offset = 0;
+
+		assert(base_offset != 0);
+
+		if (real_base_offset) {
+			fixed_offset = packwriter_total(w) - real_base_offset;
+		} else {
+			off_t fixup = find_reused_offset(offset) - find_reused_offset(base_offset);
+			if (fixup)
+				fixed_offset = offset - base_offset - fixup;
+		}
+
+		if (fixed_offset) {
+			unsigned char header[10], ofs_header[10];
+			unsigned i, len, ofs_len;
+
+			assert(fixed_offset > 0);
+			len = encode_in_pack_object_header(OBJ_OFS_DELTA, size, header);
+
+			i = sizeof(ofs_header) - 1;
+			ofs_header[i] = fixed_offset & 127;
+			while (fixed_offset >>= 7)
+				ofs_header[--i] = 128 | (--fixed_offset & 127);
+
+			ofs_len = sizeof(ofs_header) - i;
+
+			packwriter_write(w, header, len);
+			packwriter_write(w, ofs_header + sizeof(ofs_header) - ofs_len, ofs_len);
+			copy_pack_data(w, pack, w_curs, cur, next - cur);
+			return 1;
+		}
+
+		/* ...otherwise we have no fixup, and can write it verbatim */
+	}
+
+	copy_pack_data(w, pack, w_curs, offset, next - offset);
+	return 0;
+}
+
+static int copy_packfile(int from, struct packwriter *w)
+{
+	unsigned char buffer[8192];
+	struct stat st;
+	ssize_t to_read;
+
+	if (from < 0 || fstat(from, &st))
+		return -1;
+
+	posix_fadvise(from, 0, st.st_size, POSIX_FADV_SEQUENTIAL);
+	to_read = st.st_size - 20;
+
+	if (progress)
+		fprintf(stderr, "Copying main packfile...");
+
+	while (to_read) {
+		ssize_t r, cap = sizeof(buffer);
+
+		if (cap > to_read)
+			cap = to_read;
+
+		r = xread(from, buffer, cap);
+		if (r < 0)
+			return -1;
+
+		packwriter_write(w, buffer, r);
+		to_read -= r;
+	}
+
+	if (progress)
+		fprintf(stderr, " done.\n");
+	assert(to_read == 0);
+	return 0;
+}
+
+static void write_initial_packfile(struct packed_git *p, struct packwriter *w)
+{
+	unsigned int n;
+	int source_fd = git_open_noatime(p->pack_name);
+
+	if (copy_packfile(source_fd, w) < 0)
+		die_errno("failed to copy '%s'", p->pack_name);
+	close(source_fd);
+
+	load_index_or_die(p);
+	preallocate_write_slab(p->num_objects);
+
+	if (progress)
+		progress_state = start_progress("Indexing main packfile", p->num_objects);
+
+	for (n = 0; n < p->num_objects; ++n) {
+		const unsigned char *sha1 = nth_packed_object_sha1(p, n);
+		const off_t offset = nth_packed_object_offset(p, n);
+		const uint32_t crc32 = nth_packed_object_crc32(p, n);
+		add_to_write_list(sha1, offset, crc32, OBJ_BAD);
+		display_progress(progress_state, n + 1);
+	}
+
+	stop_progress(&progress_state);
+	close_pack_index(p);
+
+	written_list = xmalloc(p->num_objects * sizeof(struct packed_git *));
+	written_nr = p->num_objects;
+	for (n = 0; n < written_nr; ++n)
+		written_list[n] = (struct pack_idx_entry *)(&written_slab_current->entries[n]);
+}
+
+static void append_packfile(struct packed_git *p, struct packwriter *w)
+{
+	struct pack_window *w_curs = NULL;
+	struct pack_revindex *revidx;
+
+	unsigned int n;
+
+	load_index_or_die(p);
+	preallocate_write_slab(p->num_objects);
+	revidx = revindex_for_pack(p);
+
+	if (progress)
+		progress_state = start_progress("Appending packfile", p->num_objects);
+
+	for (n = 0; n < p->num_objects; ++n) {
+		struct revindex_entry *reventry = &revidx->revindex[n];
+		const unsigned char *sha1 = nth_packed_object_sha1(p, reventry[0].nr);
+		const off_t offset_in_pack = sha1_index_find_offset(sha1);
+
+		if (!offset_in_pack) {
+			const off_t offset = packwriter_total(w);
+
+			enum object_type real_type = OBJ_BAD;
+			uint32_t crc32;
+			int rewrite_header;
+
+			packwriter_crc32_start(w);
+			rewrite_header = append_object_1(reventry, w, p, &w_curs, &real_type);
+			crc32 = packwriter_crc32_end(w);
+
+			if (!rewrite_header && crc32 != nth_packed_object_crc32(p, reventry[0].nr))
+				die("crc32 check failed for %s", sha1_to_hex(sha1));
+
+			add_to_write_list(sha1, offset, crc32, real_type);
+		} else {
+			add_to_skipped_list(reventry[0].offset, offset_in_pack);
+		}
+
+		display_progress(progress_state, n + 1);
+	}
+
+	stop_progress(&progress_state);
+	unuse_pack(&w_curs);
+	close_pack_windows(p);
+	close_pack_index(p);
+
+	sha1_index_update();
+	skipped_nr = 0;
+	reused_chunks_nr = 0;
+}
+
+static void write_packs(void)
+{
+	struct packwriter w;
+	unsigned int i;
+
+	packwriter_init(&w);
+	write_initial_packfile(all_packfiles[0], &w);
+
+	for (i = 1; i < all_packfiles_nr; ++i)
+		append_packfile(all_packfiles[i], &w);
+
+	/* finalize pack */
+	{
+		unsigned char sha1[20];
+		struct strbuf tmpname = STRBUF_INIT;
+
+		fixup_pack_header_footer(w.fd, sha1, w.tmp->filename.buf, written_nr, NULL, 0);
+		close(w.fd);
+
+		strbuf_addf(&tmpname, "%s-", base_name);
+
+		finish_tmp_packfile(&tmpname, w.tmp->filename.buf,
+				written_list, written_nr,
+				&pack_idx_opts, sha1);
+
+		if (write_bitmap_index) {
+			strbuf_addf(&tmpname, "%s.bitmap", sha1_to_hex(sha1));
+			bitmap_rewrite_existing(
+				all_packfiles[0],
+				written_list, written_nr,
+				packwriter_total(&w),
+				sha1, tmpname.buf);
+		}
+
+		strbuf_release(&tmpname);
+		puts(sha1_to_hex(sha1));
+	}
+}
+
+void pack_fast_grow_typemaps(struct packed_git *p, struct ewah_bitmap **typemaps)
+{
+	uint32_t n;
+	size_t pos = p->num_objects;
+	struct write_slab *slab = written_slab_root;
+
+	assert(slab->nr == p->num_objects);
+	assert(slab->next);
+	slab = slab->next;
+
+	while (slab) {
+		for (n = 0; n < slab->nr; ++n) {
+			const enum object_type real_type = slab->entries[n].real_type;
+			assert(real_type >= OBJ_COMMIT && real_type <= OBJ_TAG);
+			ewah_set(typemaps[real_type - 1], pos++);
+		}
+		slab = slab->next;
+	}
+}
+
+int cmd_pack_fast(int argc, const char **argv, const char *prefix)
+{
+	struct option pack_fast_options[] = {
+		OPT_SET_INT('q', "quiet", &progress,
+			    N_("do not show progress meter"), 0),
+		OPT_SET_INT(0, "progress", &progress,
+			    N_("show progress meter"), 1),
+		OPT_BOOL(0, "skip-largest", &skip_largest,
+			 N_("do not pack the largest packfile in the repository")),
+		OPT_END(),
+	};
+
+	reset_pack_idx_option(&pack_idx_opts);
+	progress = isatty(2);
+	argc = parse_options(argc, argv, prefix, pack_fast_options,
+			     pack_usage, 0);
+
+	if (argc) {
+		base_name = argv[0];
+		argc--;
+	}
+
+	find_packfiles();
+	write_packs();
+	return 0;
+}
diff --git a/cache.h b/cache.h
index 6f53962bf..1a13961bd 100644
--- a/cache.h
+++ b/cache.h
@@ -1336,6 +1336,11 @@ extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsign
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, unsigned long *);
+extern off_t get_delta_base(struct packed_git *p,
+			    struct pack_window **w_curs,
+			    off_t *curpos,
+			    enum object_type type,
+			    off_t delta_obj_offset);
 
 /*
  * Iterate over the files in the loose-object parts of the object
diff --git a/git.c b/git.c
index 40f9df089..d81bd4469 100644
--- a/git.c
+++ b/git.c
@@ -440,6 +440,7 @@ static struct cmd_struct commands[] = {
 	{ "name-rev", cmd_name_rev, RUN_SETUP },
 	{ "notes", cmd_notes, RUN_SETUP },
 	{ "pack-objects", cmd_pack_objects, RUN_SETUP },
+	{ "pack-fast", cmd_pack_fast, RUN_SETUP },
 	{ "pack-redundant", cmd_pack_redundant, RUN_SETUP },
 	{ "pack-refs", cmd_pack_refs, RUN_SETUP },
 	{ "patch-id", cmd_patch_id },
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c05d1386a..449715f02 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -505,23 +505,39 @@ void bitmap_writer_set_checksum(unsigned char *sha1)
 	hashcpy(writer.pack_checksum, sha1);
 }
 
+static struct sha1file *bitmap_file_new(char *tmp_file, size_t len)
+{
+	int fd = odb_mkstemp(tmp_file, len, "pack/tmp_bitmap_XXXXXX");
+
+	if (fd < 0)
+		die_errno("unable to create '%s'", tmp_file);
+
+	return sha1fd(fd, tmp_file);
+}
+
+static void bitmap_file_close(struct sha1file *f, const char *tmp_file, const char *dest)
+{
+	sha1close(f, NULL, CSUM_FSYNC);
+
+	if (adjust_shared_perm(tmp_file))
+		die_errno("unable to make temporary bitmap file readable");
+
+	if (rename(tmp_file, dest))
+		die_errno("unable to rename temporary bitmap file to '%s'", dest);
+}
+
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
 			  uint16_t options)
 {
-	static char tmp_file[PATH_MAX];
 	static uint16_t default_version = 1;
 	static uint16_t flags = BITMAP_OPT_FULL_DAG;
+	char tmp_file[PATH_MAX];
 	struct sha1file *f;
-
 	struct bitmap_disk_header header;
 
-	int fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_bitmap_XXXXXX");
-
-	if (fd < 0)
-		die_errno("unable to create '%s'", tmp_file);
-	f = sha1fd(fd, tmp_file);
+	f = bitmap_file_new(tmp_file, sizeof(tmp_file));
 
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
 	header.version = htons(default_version);
@@ -539,11 +555,138 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
 
-	sha1close(f, NULL, CSUM_FSYNC);
+	bitmap_file_close(f, tmp_file, filename);
+}
 
-	if (adjust_shared_perm(tmp_file))
-		die_errno("unable to make temporary bitmap file readable");
+static void *try_load_bitmap(struct packed_git *p, size_t *_size_out)
+{
+	void *reused_bitmap;
+	size_t reused_bitmap_size;
+
+	int fd;
+	struct stat st;
+	char *idx_name;
+
+	idx_name = pack_bitmap_filename(p);
+	fd = git_open_noatime(idx_name);
+	free(idx_name);
+
+	if (fd < 0)
+		return NULL;
+
+	if (fstat(fd, &st)) {
+		close(fd);
+		return NULL;
+	}
+
+	reused_bitmap_size = xsize_t(st.st_size);
+	reused_bitmap = xmmap(NULL, reused_bitmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+
+	*_size_out = reused_bitmap_size;
+	return reused_bitmap;
+}
+
+extern void pack_fast_grow_typemaps(struct packed_git *p, struct ewah_bitmap **typemaps);
+
+static size_t rewrite_type_maps(struct sha1file *f,
+	struct packed_git *p, unsigned char *original_map, size_t original_size, size_t pos)
+{
+	struct ewah_bitmap *typemaps[4];
+	int r, i;
+
+	for (i = 0; i < 4; ++i) {
+		typemaps[i] = ewah_pool_new();
+		r = ewah_read_mmap(typemaps[i], original_map + pos, original_size - pos);
+		if (r < 0)
+			die("failed to read bitmap index");
+		pos += r;
+	}
+
+	pack_fast_grow_typemaps(p, typemaps);
+
+	for (i = 0; i < 4; ++i) {
+		dump_bitmap(f, typemaps[i]);
+		ewah_pool_free(typemaps[i]);
+	}
+
+	return pos;
+}
+
+static size_t rewrite_bitmaps(struct sha1file *f,
+	struct packed_git *p, unsigned char *original_map, size_t original_size, size_t pos,
+	uint32_t entry_count, struct pack_idx_entry **index, uint32_t index_nr)
+{
+	uint32_t i;
+
+	for (i = 0; i < entry_count; ++i) {
+		const unsigned char *sha1;
+		uint32_t src_idx, src_buffer_len, total_len;
+		int new_idx;
+
+		src_idx = get_be32(original_map + pos);
+		pos += 4;
+
+		sha1 = nth_packed_object_sha1(p, src_idx);
+		new_idx = sha1_pos(sha1, index, index_nr, sha1_access);
+		sha1write_be32(f, (uint32_t)new_idx);
+
+		src_buffer_len = get_be32(original_map + pos + 2 + 4);
+		total_len = (3 * 4) + (src_buffer_len * 8);
+
+		sha1write(f, original_map + pos, 2 + total_len);
+		pos += 2 + total_len;
+
+		if (pos > original_size)
+			die("unexpected end of file");
+	}
+
+	return pos;
+}
+
+void bitmap_rewrite_existing(
+	struct packed_git *p,
+	struct pack_idx_entry **index,
+	uint32_t index_nr,
+	off_t pack_offset,
+	const unsigned char *pack_sha1,
+	const char *filename)
+{
+	char tmp_file[PATH_MAX];
+	struct sha1file *f;
+
+	unsigned char *original_map;
+	size_t original_size, pos = 0;
+	struct bitmap_disk_header header;
+
+	original_map = try_load_bitmap(p, &original_size);
+	if (!original_map || original_size < sizeof(header) + 20)
+		return;
+
+	memcpy(&header, original_map, sizeof(header));
+	hashcpy(header.checksum, pack_sha1);
+
+	if (memcmp(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)) != 0)
+		die("existing bitmap for '%s' is corrupted", p->pack_name);
+
+	if (ntohs(header.version) != 1)
+		die("existing bitmap for '%s' has an unsupported version", p->pack_name);
+
+	f = bitmap_file_new(tmp_file, sizeof(tmp_file));
+
+	sha1write(f, &header, sizeof(header));
+	pos = sizeof(header);
+	pos = rewrite_type_maps(f, p, original_map, original_size, pos);
+	pos = rewrite_bitmaps(f, p, original_map, original_size, pos,
+			ntohl(header.entry_count), index, index_nr);
+
+	if (ntohs(header.options) & BITMAP_OPT_HASH_CACHE) {
+		uint32_t i, zero = 0;
+		sha1write(f, original_map + pos, p->num_objects * 4);
+		for (i = p->num_objects; i < index_nr; ++i)
+			sha1write(f, &zero, 4);
+		pos += (p->num_objects * 4);
+	}
 
-	if (rename(tmp_file, filename))
-		die_errno("unable to rename temporary bitmap file to '%s'", filename);
+	bitmap_file_close(f, tmp_file, filename);
 }
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 637770af8..ee361fa6a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -250,7 +250,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 	return 0;
 }
 
-static char *pack_bitmap_filename(struct packed_git *p)
+char *pack_bitmap_filename(struct packed_git *p)
 {
 	char *idx_name;
 	int len;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 0adcef77b..398523dbb 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -34,6 +34,7 @@ typedef int (*show_reachable_fn)(
 	struct packed_git *found_pack,
 	off_t found_offset);
 
+char *pack_bitmap_filename(struct packed_git *p);
 int prepare_bitmap_git(void);
 void count_bitmap_commit_list(uint32_t *commits, uint32_t *trees, uint32_t *blobs, uint32_t *tags);
 void traverse_bitmap_commit_list(show_reachable_fn show_reachable);
@@ -53,5 +54,12 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
 			  uint16_t options);
+void bitmap_rewrite_existing(
+	struct packed_git *p,
+	struct pack_idx_entry **index,
+	uint32_t index_nr,
+	off_t pack_offset,
+	const unsigned char *pack_sha1,
+	const char *filename);
 
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 72289696d..bcd447f16 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1821,7 +1821,7 @@ unsigned long get_size_from_delta(struct packed_git *p,
 	return get_delta_hdr_size(&data, delta_head+sizeof(delta_head));
 }
 
-static off_t get_delta_base(struct packed_git *p,
+off_t get_delta_base(struct packed_git *p,
 				    struct pack_window **w_curs,
 				    off_t *curpos,
 				    enum object_type type,
@@ -1936,7 +1936,7 @@ static int retry_bad_packed_offset(struct packed_git *p, off_t obj_offset)
 
 #define POI_STACK_PREALLOC 64
 
-static enum object_type packed_to_object_type(struct packed_git *p,
+enum object_type packed_to_object_type(struct packed_git *p,
 					      off_t obj_offset,
 					      enum object_type type,
 					      struct pack_window **w_curs,

^ permalink raw reply related

* [PATCH] gc: ignore old gc.log files
From: David Turner @ 2017-02-09  2:02 UTC (permalink / raw)
  To: git; +Cc: peff, pclouds, David Turner

The intent of automatic gc is to have a git repository be relatively
low-maintenance from a server-operator perspective.  Of course, large
operators like GitHub will need a more complicated management strategy,
but for ordinary usage, git should just work.

In this commit, git learns to ignore gc.log files which are older than
(by default) one day old.  It also learns about a config, gc.maxLogAge
to manage this.

So git should never get itself into a state where it refuses to do any
maintenance, just because at some point some piece of the maintenance
didn't make progress.  That might still happen (e.g. because the repo
is corrupt), but at the very least it won't be because Git is too dumb
to try again.

Signed-off-by: David Turner <dturner@twosigma.com>
Helped-by: Jeff King <peff@peff.net>
---
 Documentation/config.txt |  5 +++++
 builtin/gc.c             | 15 ++++++++++++++-
 t/t6500-gc.sh            | 13 +++++++++++++
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fc5a28a32..6751371cf 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1402,6 +1402,11 @@ gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
 	if the system supports it. Default is true.
 
+gc.maxLogAge::
+	If the file gc.log exists, then `git gc --auto` won't run
+	unless that file is more than maxLogAge seconds old.  Default
+	is 86400, one day.
+
 gc.packRefs::
 	Running `git pack-refs` in a repository renders it
 	unclonable by Git versions prior to 1.5.1.2 over dumb
diff --git a/builtin/gc.c b/builtin/gc.c
index 331f21926..62fc84815 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -33,6 +33,7 @@ static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
 static int detach_auto = 1;
+static int gc_max_log_age_seconds = 86400;
 static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
 
@@ -111,6 +112,7 @@ static void gc_config(void)
 	git_config_get_int("gc.auto", &gc_auto_threshold);
 	git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
 	git_config_get_bool("gc.autodetach", &detach_auto);
+	git_config_get_int("gc.maxlogage", &gc_max_log_age_seconds);
 	git_config_date_string("gc.pruneexpire", &prune_expire);
 	git_config_date_string("gc.worktreepruneexpire", &prune_worktrees_expire);
 	git_config(git_default_config, NULL);
@@ -291,8 +293,19 @@ static int report_last_gc_error(void)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int ret;
+	struct stat st;
+	const char *gc_log_path = git_path("gc.log");
+
+	if (stat(gc_log_path, &st)) {
+		if (errno == ENOENT)
+			return 0;
+		return error(_("Can't read %s"), gc_log_path);
+	}
+
+	if (time(NULL) - st.st_mtime > gc_max_log_age_seconds)
+		return 0;
 
-	ret = strbuf_read_file(&sb, git_path("gc.log"), 0);
+	ret = strbuf_read_file(&sb, gc_log_path, 0);
 	if (ret > 0)
 		return error(_("The last gc run reported the following. "
 			       "Please correct the root cause\n"
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 1762dfa6a..b69c2c190 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -67,5 +67,18 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
 	test_line_count = 2 new # There is one new pack and its .idx
 '
 
+test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
+	keep=$(ls .git/objects/pack/*.pack|head -1|sed -e "s/pack$/keep/") &&
+	test_commit foo &&
+	test_commit bar &&
+	git repack &&
+	test_config gc.autopacklimit 1 &&
+	test_config gc.autodetach true &&
+	echo fleem> .git/gc.log &&
+	test_must_fail git gc --auto 2>err &&
+	test_i18ngrep "^error:" err &&
+	test-chmtime =-86401 .git/gc.log &&
+	git gc --auto
+'
 
 test_done
-- 
2.11.GIT


^ permalink raw reply related

* Re: git-scm.com status report
From: brian m. carlson @ 2017-02-09  2:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20170206182754.qxgzl7546i7u5wnw@sigill.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 2508 bytes --]

On Mon, Feb 06, 2017 at 07:27:54PM +0100, Jeff King wrote:
>   - It's mostly silly for this to be a Rails app at all. It's a static
>     site which occasionally sucks in and formats new content (like the
>     latest git version, new manpages, etc). The intent here was to make
>     something that would "just run" forever and pick up new versions
>     without human intervention. And that _does_ work, but it also makes
>     things more expensive and complicated than they need to be.
> 
>     So a viable alternative is to use some kind of static site
>     generator and have someone (or something) responsible for pulling in
>     the new git versions occasionally.
> 
>     A few people have expressed interesting this. There's some
>     preliminary work here:
> 
>       https://github.com/git/git-scm.com/pull/941
> 
>     and at least GitLab has expressed some interest. So I'll let people
>     coordinate in that PR or a new one what the result should look like.
>     Working patches trump discussion. :)
> 
>     I have also talked with the GitHub Pages people, and they think
>     hosting it as a Jekyll page wouldn't be a big deal performance-wise
>     (with the caveat that we'd need to pre-render the asciidoctor bits
>     ourselves, as Jekyll doesn't do asciidoc). So that's a viable option
>     for hosting it for effectively free (though I think we _would_ still
>     want to put a CDN in front of it). But if somebody has an
>     alternative option, that's fine, too.

My only concern with using GitHub Pages is that I don't believe it
currently supports TLS on custom domains.  Since we currently have TLS
enabled, along with HTTP Strict Transport Security (as we should), such
a configuration literally wouldn't work[0].  I think it's important that
we continue to serve HTTPS only, anyway.

I agree that a static site is the way to go from a maintenance
perspective, though.  Jekyll does support Asciidoctor with a plugin,
just not on GitHub Pages, so it would theoretically be possible to build
the site as one big unit if we did it that way.  I've played around with
that plugin, so I'm happy to provide guidance if we want to do that.

[0] HSTS would prevent anyone who had visited the page from downgrading
to an insecure connection for the next year.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply

* [PATCH] Documentation: unify bottom "part of git suite" lines
From: Stefan Beller @ 2017-02-09  1:29 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

We currently have 168 man pages that mention they are part of Git, you
can check yourself easily via:
  $ git grep "Part of the linkgit:git\[1\] suite" |wc -l
  168
However some have a trailing period, i.e.
  $ git grep "Part of the linkgit:git\[1\] suite." |wc -l
  8

Unify the bottom line in all man pages to not end with a period.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/gitcore-tutorial.txt     | 2 +-
 Documentation/gitcvs-migration.txt     | 2 +-
 Documentation/gitdiffcore.txt          | 2 +-
 Documentation/gitglossary.txt          | 2 +-
 Documentation/gitrepository-layout.txt | 2 +-
 Documentation/gittutorial-2.txt        | 2 +-
 Documentation/gittutorial.txt          | 2 +-
 Documentation/gitworkflows.txt         | 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/Documentation/gitcore-tutorial.txt b/Documentation/gitcore-tutorial.txt
index 22309cfb48..3a0ec8c53a 100644
--- a/Documentation/gitcore-tutorial.txt
+++ b/Documentation/gitcore-tutorial.txt
@@ -1658,4 +1658,4 @@ link:user-manual.html[The Git User's Manual]
 
 GIT
 ---
-Part of the linkgit:git[1] suite.
+Part of the linkgit:git[1] suite
diff --git a/Documentation/gitcvs-migration.txt b/Documentation/gitcvs-migration.txt
index 4c6143c511..1cd1283d0f 100644
--- a/Documentation/gitcvs-migration.txt
+++ b/Documentation/gitcvs-migration.txt
@@ -203,4 +203,4 @@ link:user-manual.html[The Git User's Manual]
 
 GIT
 ---
-Part of the linkgit:git[1] suite.
+Part of the linkgit:git[1] suite
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 08cf62278e..46bc6d077c 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -288,4 +288,4 @@ link:user-manual.html[The Git User's Manual]
 
 GIT
 ---
-Part of the linkgit:git[1] suite.
+Part of the linkgit:git[1] suite
diff --git a/Documentation/gitglossary.txt b/Documentation/gitglossary.txt
index 212e254adc..571f640f5c 100644
--- a/Documentation/gitglossary.txt
+++ b/Documentation/gitglossary.txt
@@ -24,4 +24,4 @@ link:user-manual.html[The Git User's Manual]
 
 GIT
 ---
-Part of the linkgit:git[1] suite.
+Part of the linkgit:git[1] suite
diff --git a/Documentation/gitrepository-layout.txt b/Documentation/gitrepository-layout.txt
index a5f99cbb11..f51ed4e37c 100644
--- a/Documentation/gitrepository-layout.txt
+++ b/Documentation/gitrepository-layout.txt
@@ -289,4 +289,4 @@ link:user-manual.html[The Git User's Manual]
 
 GIT
 ---
-Part of the linkgit:git[1] suite.
+Part of the linkgit:git[1] suite
diff --git a/Documentation/gittutorial-2.txt b/Documentation/gittutorial-2.txt
index 30d2119565..e0976f6017 100644
--- a/Documentation/gittutorial-2.txt
+++ b/Documentation/gittutorial-2.txt
@@ -433,4 +433,4 @@ link:user-manual.html[The Git User's Manual]
 
 GIT
 ---
-Part of the linkgit:git[1] suite.
+Part of the linkgit:git[1] suite
diff --git a/Documentation/gittutorial.txt b/Documentation/gittutorial.txt
index b3b58d324e..794b83393e 100644
--- a/Documentation/gittutorial.txt
+++ b/Documentation/gittutorial.txt
@@ -674,4 +674,4 @@ link:user-manual.html[The Git User's Manual]
 
 GIT
 ---
-Part of the linkgit:git[1] suite.
+Part of the linkgit:git[1] suite
diff --git a/Documentation/gitworkflows.txt b/Documentation/gitworkflows.txt
index f16c414ea7..177610e44e 100644
--- a/Documentation/gitworkflows.txt
+++ b/Documentation/gitworkflows.txt
@@ -477,4 +477,4 @@ linkgit:git-am[1]
 
 GIT
 ---
-Part of the linkgit:git[1] suite.
+Part of the linkgit:git[1] suite
-- 
2.12.0.rc0.1.g018cb5e6f4


^ permalink raw reply related

* Re: git-scm.com status report
From: Jeff King @ 2017-02-09  2:50 UTC (permalink / raw)
  To: brian m. carlson, git
In-Reply-To: <20170209021208.rvy7uww3qsktbf4a@genre.crustytoothpaste.net>

On Thu, Feb 09, 2017 at 02:12:09AM +0000, brian m. carlson wrote:

> My only concern with using GitHub Pages is that I don't believe it
> currently supports TLS on custom domains.  Since we currently have TLS
> enabled, along with HTTP Strict Transport Security (as we should), such
> a configuration literally wouldn't work[0].  I think it's important that
> we continue to serve HTTPS only, anyway.

I agree we should continue to serve HTTPS. The usual solution for our
use case is to stick a CDN like Cloudflare in front of GitHub Pages (and
I think we'd want to do that anyway for performance).

I haven't done it, but there are various guides. Here's the one from
Cloudflare:

  https://blog.cloudflare.com/secure-and-fast-github-pages-with-cloudflare/

> I agree that a static site is the way to go from a maintenance
> perspective, though.  Jekyll does support Asciidoctor with a plugin,
> just not on GitHub Pages, so it would theoretically be possible to build
> the site as one big unit if we did it that way.  I've played around with
> that plugin, so I'm happy to provide guidance if we want to do that.

We already massage the data coming from Git (and from the Pro Git books)
a bit before and after feeding it to asciidoctor. So I always assumed
that any static site would involve some import steps for those things,
and we'd commit the intermediate product into the repository.

-Peff

^ permalink raw reply

* Re: [PATCH] gc: ignore old gc.log files
From: Jeff King @ 2017-02-09  3:23 UTC (permalink / raw)
  To: David Turner; +Cc: git, pclouds
In-Reply-To: <20170209020222.23642-1-dturner@twosigma.com>

On Wed, Feb 08, 2017 at 09:02:22PM -0500, David Turner wrote:

> The intent of automatic gc is to have a git repository be relatively
> low-maintenance from a server-operator perspective.  Of course, large
> operators like GitHub will need a more complicated management strategy,
> but for ordinary usage, git should just work.
> 
> In this commit, git learns to ignore gc.log files which are older than
> (by default) one day old.  It also learns about a config, gc.maxLogAge
> to manage this.
> 
> So git should never get itself into a state where it refuses to do any
> maintenance, just because at some point some piece of the maintenance
> didn't make progress.  That might still happen (e.g. because the repo
> is corrupt), but at the very least it won't be because Git is too dumb
> to try again.

Sounds like a good goal and approach.

> +gc.maxLogAge::
> +	If the file gc.log exists, then `git gc --auto` won't run
> +	unless that file is more than maxLogAge seconds old.  Default
> +	is 86400, one day.

For other time-based config, we use approxidate with a relative time,
like "1 day ago". I think it would make sense for this to match, as it
makes the config a little more readable.

You can follow the prune_expire example which is right below your new
config variable in all of the hunks of your patch. Though I think
ultimately that isn't parsed inside gc, so you'd eventually look at how
"prune --expire" is handled (which I think is via parse_expiry_date()).

> @@ -291,8 +293,19 @@ static int report_last_gc_error(void)
>  {
>  	struct strbuf sb = STRBUF_INIT;
>  	int ret;
> +	struct stat st;
> +	const char *gc_log_path = git_path("gc.log");

I usually try to avoid assigning the result of git_path(), because it's
a static buffer. In this case it's fine, because you don't call any
complex calls during the lifetime of the variable. But I consider
assigning to be a bad practice.

Using git_pathdup() or git_path_buf() is safer (but you do need to
remember to free() as appropriate).

Speaking of leaks, I think report_last_gc_error() will always leak the
strbuf contents when it is non-empty.

> +	if (stat(gc_log_path, &st)) {
> +		if (errno == ENOENT)
> +			return 0;
> +		return error(_("Can't read %s"), gc_log_path);
> +	}

I'm not quite clear on the life-cycle of this log file.

I think the current code works like this:

  - if a non-empty gc.log exists, we bail

  - when we daemonize, we take a lock via gc.log.lock

  - if we wrote anything to the lockfile log, then we move it into place
    (essentially blocking further auto-gc)

  - otherwise, we rollback the lockfile and leave gc.log untouched

That leaves a few quirks with your new strategy:

  - if our new run was unsuccessful (as judged by having non-empty log
    output), we'd probably want to overwrite the old logfile with our
    new log. Following the current-code logic we do, which is good.

  - if our new run is successful (empty log), we'll leave the old,
    crufty log in place. Probably process_log_file() should unlink() the
    original gc.log while holding the lock.

And here are a few things I noticed that I think aren't caused by your
patch, but are in the same area and might be worth examining:

  - we're not very atomic. After a day, two simultaneous auto-gc's might
    both ignore the gc.log file and continue to run. I don't think it
    matters, though. One of them will win the race to pick up
    gc.log.lock, and the other will immediately fail.

  - It looks like we clear the gc.log file only under another detached
    auto-gc. It seems like manually running a successful "git gc" should
    clear it, too.

  - We block further gc only based on the presence of data on stderr
    from the sub-programs. But _not_ if the sub-programs fail. So a
    program silently exiting with code 128 will stop further gc
    processing, but not prevent another auto-gc from running. Which
    is...weird. Maybe this can't happen, though, because I think we
    write our own error() in such cases, which makes such a failure
    inherently non-silent.

> +	if (time(NULL) - st.st_mtime > gc_max_log_age_seconds)
> +		return 0;

Hmm. What happens if the file has a timestamp in the future due to clock
skew? As long as time_t is signed, I think it's OK, but if it wraps, it
does the wrong thing here. You could rearrange the subtraction to handle
this. But I think if you start using approxidate, it will give you an
a cutoff time, and you can just do:

  if (st.st_mtime < gc_log_expiration)
	return 0; /* too old to care about */

> -	ret = strbuf_read_file(&sb, git_path("gc.log"), 0);
> +	ret = strbuf_read_file(&sb, gc_log_path, 0);

I would have written this as an open(), followed by an fstat() on the
file we opened, and then finally reading the contents if it's fresh
enough. But I'm not sure if that level of atomicity matters. We're not
doing any of this under lock.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Feb 2017, #02; Mon, 6)
From: Jeff King @ 2017-02-09  3:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Siddharth Kannan, git
In-Reply-To: <xmqqzihzymn3.fsf@gitster.mtv.corp.google.com>

On Mon, Feb 06, 2017 at 02:34:08PM -0800, Junio C Hamano wrote:

> * sk/parse-remote-cleanup (2017-02-06) 1 commit
>   (merged to 'next' on 2017-02-06 at 6ec89f72d5)
>  + parse-remote: remove reference to unused op_prep
> 
>  Code clean-up.
> 
>  Will merge to 'master'.

Hrm. Are the functions in git-parse-remote.sh part of the public API?
That is, do we expect third-party scripts to do:

  . "$(git rev-parse --exec)/git-parse-remote.sh
  error_on_missing_default_upstream "$a" "$b" "$c" "$d"

? If so, then they may be surprised by the change in function signature.

I generally think of git-sh-setup as the one that external scripts would
use. There _is_ a manpage for git-parse-remote, but it doesn't list any
functions. So maybe they're all fair game for changing?

I just didn't see any discussion of this in the original patch thread,
so I wanted to make sure we were making that decision consciously, and
not accidentally. :)

-Peff

^ permalink raw reply

* Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
From: Jeff King @ 2017-02-09  3:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Hajnoczi, Brandon Williams, git
In-Reply-To: <xmqq8tphzr41.fsf@gitster.mtv.corp.google.com>

On Tue, Feb 07, 2017 at 12:24:30PM -0800, Junio C Hamano wrote:

> Having said that, I actually think "make it more convenient" without
> making anything incorrect would be to teach the revision parser to
> understand
> 
>     <any-expression-to-name-a-tree-ish:<path>
> 
> as an extended SHA-1 expression to name the blob or the tree at that
> path in the tree-ish, e.g. if we can make the revision parser to
> take this
> 
>     master:Documentation:git.txt

So here I was wondering what happens when you have a filename with a
colon in it, but then...

> to be able to show the same thing as well.  You'd need to backtrack
> the parsing (e.g. attempt to find "Documentation:git.txt" in
> "master", fail to find any, then fall back to find "git.txt" in
> "master:Documentation", find one, and be happy, or something like
> that), and define how to resolve potential ambiguity (e.g. there may
> indeed be "Documentation:git.txt" and "Documentation/git.txt" in the
> tree-ish "master"), though.

...you obviously did think of that. Backtracking sounds pretty nasty,
though. What's the time complexity of parsing:

  master:a:a:a:a:a:a:a:a:a:a:a

I think there are 2^(n-1) possible paths (each colon can be a real colon
or a slash). Though I guess if you walk the trees as you go, you only
have to examine at most "n" paths to find the first-level tree, and then
at most "n-1" paths at the second level, and so on.

Unless you really do have ambiguous trees, in which case you have to
walk down multiple paths.

It certainly would not be the first combinatoric explosion you can
convince Git to perform. But it does seem like a lot of complication for
something as simple as path lookups.

-Peff

^ permalink raw reply

* Re: git-scm.com status report
From: Eric Wong @ 2017-02-09  4:30 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, git
In-Reply-To: <20170209025030.52frbjekatebjoii@sigill.intra.peff.net>

Jeff King <peff@peff.net> wrote:
> I agree we should continue to serve HTTPS. The usual solution for our
> use case is to stick a CDN like Cloudflare in front of GitHub Pages (and
> I think we'd want to do that anyway for performance).
> 
> I haven't done it, but there are various guides. Here's the one from
> Cloudflare:
> 
>   https://blog.cloudflare.com/secure-and-fast-github-pages-with-cloudflare/

AFAIK, there's a way to keep CloudFlare stuff accessible to Tor
users.  If there is, please do so.  As a Tor user, it's been
disappointing to see so much of the web walled off by CAPTCHAs.

Thank you.

Heck, maybe a .onion mirror would be nice :)
I wouldn't mind hosting one myself if it's static.

^ permalink raw reply

* Re: Bug with fixup and autosquash
From: Ashutosh Bapat @ 2017-02-09  4:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, Michael Haggerty, Michael J Gruber,
	Matthieu Moy
In-Reply-To: <xmqqbmucuwb0.fsf@gitster.mtv.corp.google.com>

On Thu, Feb 9, 2017 at 4:25 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
>
>> I have been using git rebase heavily these days and seem to have found a bug.
>>
>> If there are two commit messages which have same prefix e.g.
>> yyyyyy This is prefix
>> xxxxxx This is prefix and message
>>
>> xxxxxx comitted before yyyyyy
>>
>> Now I commit a fixup to yyyyyy using git commit --fixup yyyyyy
>> zzzzzz fixup! This is prefix
>>
>> When I run git rebase -i --autosquash, the script it shows me looks like
>> pick xxxxxx This is prefix and message
>> fixup zzzzzz fixup! This is prefix
>> pick yyyyyy This is prefix
>>
>> I think the correct order is
>> pick xxxxxx This is prefix and message
>> pick yyyyyy This is prefix
>> fixup zzzzzz fixup! This is prefix
>>
>> Is that right?
>
> Because "commit" pretends as if it took the exact commit object name
> to be fixed up (after all, it accepts "yyyyyy" that is a name of the
> commit object), it would be nice if the fixup is applied to that
> exact commit, even if you had many commits that share exactly the
> same title (i.e. not just shared prefix).
>
> Unfortunately, "rebase -i --autosquash" reorders the entries by
> identifying the commit by its title, and it goes with prefix match
> so that fix-up commits created without using --fixup option but
> manually records the title's prefix substring can also work.
>
> We could argue that the logic should notice that there is one exact
> match and another non-exact prefix match and favor the former, and
> certainly such a change would make your made-up example (you didn't
> actually have a commit whose title is literally "This is prefix")
> above work better.

This is a "real life situation" I ended up with yesterday. I can't
share exact commit message here so redacted it, keeping its essence. I
executed git rebase -i --autosquash and got into a conflict merge
since the fixup was applied after wrong commit. I had to execute git
rebase --abort and git rebase -i --autosquash. That's when I noticed
what's gone wrong.

>
> But I am not sure if adding such heuristics would really help in
> general.  It would not help those whose commits are mostly titled
> ultra-vaguely, like "fix", "bugfix", "docfix", etc.
>
> Another possibility is to teach "commit --fixup" to cast exact
> commit object name in stone.  That certainly would solve your
> immediate problem, but it has a grave negative impact when the user
> rebases the same topic many times (which happens often).
>
> For example, I may have a series of commits A and B, notice that A
> could be done a bit better and have "fixup A" on top, build a new
> commit C on it, and then realize that the next step (i.e. D) would
> need support from a newer codebase than where I started (i.e. A^).
>
> At that point, I would have a 4-commit series (A, B, "fixup A", and
> C), and I would rebase them on top of something newer.  I am
> undecided if that "fixup A" is really an improvement or unnecessary,
> when I do this, but I do know that I want to build the series on top
> of a different commit.  So I do rebase without --autosquash (I would
> probably rebase without --interactive for this one).
>
> Then I keep working and add a new commit D on top.  After all that,
> I would have a more-or-less completed series and would be ready to
> re-assess the whole series.  I perhaps decide that "fixup A" is
> really an improvement.  And then I would "rebase -i" to squash the
> fix-up into A.
>
> But notice that at this point, what we are calling A has different
> object name than the original A the fixup was written for, because
> we rebased once on top of a newer codebase.  That commit can still
> be identified by its title, but not with its original commit object
> name.
>
> I think that is why "commit --fixup <commit>" turns the commit
> object name (your "yyyyyy") into a string (your "This is prefix")
> and that is a reasonable design decision [*1*].
>
> So from that point of view, if we were to address your issue, it
> should happen in "rebase -i --autosquash" side, not "commit --fixup"
> side.

I agree with this analysis.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

^ permalink raw reply

* Re: [PATCH] gc: ignore old gc.log files
From: Eric Wong @ 2017-02-09  4:51 UTC (permalink / raw)
  To: David Turner; +Cc: git, peff, pclouds
In-Reply-To: <20170209020222.23642-1-dturner@twosigma.com>

David Turner <dturner@twosigma.com> wrote:
> +	echo fleem> .git/gc.log &&

A minor nit:

	echo fleem >.git/gc.log &&

Otherwise, it's good to know there's attention paid to this
issue.  I've been ignoring cron mails from my mirrors for too
long :x  Thanks.

^ permalink raw reply

* [RFC-PATCHv2] submodules: add a background story
From: Stefan Beller @ 2017-02-09  2:08 UTC (permalink / raw)
  Cc: git, bmwill, Stefan Beller

Just like gitmodules(5), gitattributes(5), gitcredentials(7),
gitnamespaces(7), gittutorial(7), we'd like to provide some background
on submodules, which is not specific to the `submodule` command, but
elaborates on the background and its intended usage.

Add gitsubmodules(7), that explains the states, structure and usage of
submodules.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

This would replace the last patch of  sb/submodule-doc, though it's still
RFC. In this revision I took care of the technical details (i.e. proper
formatting, spelling), and only slight rewording of the text.

The main issue persists; see bottom of the patch:

  SAMPLE WORKFLOWS (RFC/TODO)
  ---------------------------
  
  Do we need
  
  * an opinionated way to check for a specific state of a submodule
  * (submodule helper to be plumbing?)
  * expose the design mistake of having the (name->path) mapping inside the
    working tree, i.e. never remove a name from the submodule config even when
    the submodule doesn't exist any more.
    
Any opinion on these would be welcome!
Thanks,
Stefan

 Documentation/Makefile          |   1 +
 Documentation/git-submodule.txt |  36 ++------
 Documentation/gitsubmodules.txt | 194 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 200 insertions(+), 31 deletions(-)
 create mode 100644 Documentation/gitsubmodules.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b43d66eae6..325c4735a7 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -31,6 +31,7 @@ MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitglossary.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitrevisions.txt
+MAN7_TXT += gitsubmodules.txt
 MAN7_TXT += gittutorial-2.txt
 MAN7_TXT += gittutorial.txt
 MAN7_TXT += gitworkflows.txt
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 4a4cede144..d38aa2d53a 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -24,37 +24,7 @@ DESCRIPTION
 -----------
 Inspects, updates and manages submodules.
 
-A submodule allows you to keep another Git repository in a subdirectory
-of your repository. The other repository has its own history, which does not
-interfere with the history of the current repository. This can be used to
-have external dependencies such as third party libraries for example.
-
-When cloning or pulling a repository containing submodules however,
-these will not be checked out by default; the 'init' and 'update'
-subcommands will maintain submodules checked out and at
-appropriate revision in your working tree.
-
-Submodules are composed from a so-called `gitlink` tree entry
-in the main repository that refers to a particular commit object
-within the inner repository that is completely separate.
-A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the
-root of the source tree assigns a logical name to the submodule and
-describes the default URL the submodule shall be cloned from.
-The logical name can be used for overriding this URL within your
-local repository configuration (see 'submodule init').
-
-Submodules are not to be confused with remotes, which are other
-repositories of the same project; submodules are meant for
-different projects you would like to make part of your source tree,
-while the history of the two projects still stays completely
-independent and you cannot modify the contents of the submodule
-from within the main project.
-If you want to merge the project histories and want to treat the
-aggregated whole as a single project from then on, you may want to
-add a remote for the other project and use the 'subtree' merge strategy,
-instead of treating the other project as a submodule. Directories
-that come from both projects can be cloned and checked out as a whole
-if you choose to go that route.
+For more information about submodules, see linkgit:gitsubmodules[5]
 
 COMMANDS
 --------
@@ -420,6 +390,10 @@ This file should be formatted in the same way as `$GIT_DIR/config`. The key
 to each submodule url is "submodule.$name.url".  See linkgit:gitmodules[5]
 for details.
 
+SEE ALSO
+--------
+linkgit:gitsubmodules[1], linkgit:gitmodules[1].
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
new file mode 100644
index 0000000000..3369d55ae9
--- /dev/null
+++ b/Documentation/gitsubmodules.txt
@@ -0,0 +1,194 @@
+gitsubmodules(7)
+================
+
+NAME
+----
+gitsubmodules - information about submodules
+
+SYNOPSIS
+--------
+$GIT_DIR/config, .gitmodules
+
+------------------
+git submodule
+------------------
+
+DESCRIPTION
+-----------
+
+A submodule allows you to keep another Git repository in a subdirectory
+of your repository. The other repository has its own history, which does not
+interfere with the history of the current repository. This can be used to
+have external dependencies such as third party libraries for example.
+
+Submodules are composed from a so-called `gitlink` tree entry
+in the main repository that refers to a particular commit object
+within the inner repository that is completely separate.
+A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the
+root of the source tree assigns a logical name to the submodule and
+describes the default URL the submodule shall be cloned from.
+The logical name can be used for overriding this URL within your
+local repository configuration (see 'submodule init').
+
+Submodules are not to be confused with remotes, which are other
+repositories of the same project; submodules are meant for
+different projects you would like to make part of your source tree,
+while the history of the two projects still stays completely
+independent and you cannot modify the contents of the submodule
+from within the main project.
+If you want to merge the project histories and want to treat the
+aggregated whole as a single project from then on, you may want to
+add a remote for the other project and use the 'subtree' merge strategy,
+instead of treating the other project as a submodule. Directories
+that come from both projects can be cloned and checked out as a whole
+if you choose to go that route.
+
+When cloning or pulling a repository containing submodules however,
+the submodules will not be checked out by default; You need to instruct
+'clone' to recurse into submodules. The 'init' and 'update' subcommands
+of 'git submodule' will maintain submodules checked out and at an
+appropriate revision in your working tree.
+
+WHEN TO USE
+-----------
+
+Submodules, repositories inside other repositories,
+can be used for different use cases:
+
+* To have finer grained access control.
+  The design principles of Git do not allow for partial repositories to be
+  checked out or transferred. A repository is the smallest unit that a user
+  can be given access to. Submodules are separate repositories, such that
+  you can restrict access to parts of your project via the use of submodules.
+
+* To decouple Git histories.
+  Decoupling histories has different benefits.
+
+** When you want to use a (third party) library tied to a specific version.
+   Using submodules for a library allows you to have a clean history for
+   your own project and only updating the library in the submodule when needed.
+
+** In its current form Git scales up poorly for very large repositories that
+   change a lot, as the history grows very large. For that you may want to look
+   at shallow clone, sparse checkout or git-lfs.
+   However you can also use submodules to e.g. hold large binary assets
+   and these repositories are then shallowly cloned such that you do not
+   have a large history locally.
+
+STATES
+------
+
+When working with submodules, you can think of them as in a state machine.
+So each submodule can be in a different state, the following indicators are used:
+
+* the existence of the setting of 'submodule.<name>.url' in the
+  superprojects configuration
+* the existence of the submodules working tree within the
+  working tree of the superproject
+* the existence of the submodules git directory within the superprojects
+  git directory at $GIT_DIR/modules/<name> or within the submodules working
+  tree
+
+      State      URL config        working tree     git dir
+      -----------------------------------------------------
+      uninitialized    no               no           no
+      initialized     yes               no           no
+      populated       yes              yes          yes
+      depopulated     yes               no          yes
+      deinitialized    no               no          yes
+      uninteresting    no              yes          yes
+
+      invalid          no              yes           no
+      invalid         yes              yes           no
+      -----------------------------------------------------
+
+The first six states can be reached by normal git usage, the latter two are
+only shown for completeness to show all possible eight states with 3 binary
+indicators. The states in detail:
+
+uninitialized::
+The uninitialized state is the default state if no
+'--recurse-submodules' / '--recursive'. An empty directory will be put in
+the working tree as a place holder, such that you are reminded of the
+existence of the submodule.
+---
+To transition into the initialized state
+you can use 'git submodule init', which copies the presets from the
+.gitmodules file into the config.
+
+initialized::
+Users transitioned from the uninitialized state to this state via
+'git submodule init', which preset the URL configuration. As these URLs
+may not be desired in certain scenarios, this state allows to change the
+URLs.  For example in a corporate environment you may want to run
+
+    sed -i s/example.org/$internal-mirror/ .git/config
++
+before proceeding to populate the submodules.
+
+populated::
+In the populated state you have the submodule fully available, i.e. the git
+directory exists as well the working tree exists. In this state you can work
+with the submodule, just like with any other repository.
+
+depopulated::
+In this state you still have the git directory around, but the working tree
+is gone.  For example when the superproject checks out a revision that doesn't
+have the submodule, the state may change to depopulated.
+
+deinitialized::
+The git directory is still there, but the user is no longer interested in the
+submodule as indicated by the missing URL configuration.
+
+invalid::
+When there is no git directory for a submodule, then there is something
+seriously wrong with the submodule.
+
+INNER WORKINGS
+--------------
+
+Generally a submodule can be considered its own autonomous repository,
+that has a worktree and a git directory at split places.
+
+The superproject only records the commit sha1 in its tree, such that
+any other information, e.g. where to obtain a copy from, is not recorded
+in the core data structures of Git. The porcelain layer of Git however
+makes use of the .gitmodules file that gives strong hints where and how
+to obtain a copy of the submodules git repository from.
+
+On the location of the git directory
+------------------------------------
+
+Since v1.7.7 of Git, the git directory of submodules is stored inside the
+superprojects git directory at $GIT_DIR/modules/<submodule-name>
+This location allows for the working tree to be non existent while keeping
+the history around. So we can use git-rm on a submodule without loosing
+information that may only be local.
+
+In the future we may see git-checkout that can checkout submodules and
+revisions that do not contain the submodule can still be checked out without
+having to drop the submodules git directory.
+
+It is also possible to imagine a future in which a bare repository still
+contains its submodules inside the modules sub directory, such that you can
+get a full clone including submodules from that bare repository, the URLs
+as configured or given in the .gitmodules would only be used as a backup.
+
+SAMPLE WORKFLOWS (RFC/TODO)
+---------------------------
+
+Do we need
+
+* an opinionated way to check for a specific state of a submodule
+* (submodule helper to be plumbing?)
+* expose the design mistake of having the (name->path) mapping inside the
+  working tree, i.e. never remove a name from the submodule config even when
+  the submodule doesn't exist any more.
+
+SEE ALSO
+--------
+linkgit:git-submodule[1], linkgit:gitmodules[1].
+
+GIT
+---
+Part of the linkgit:git[1] suite
-- 
2.12.0.rc0.1.g018cb5e6f4


^ permalink raw reply related

* Re: What's cooking in git.git (Feb 2017, #02; Mon, 6)
From: Junio C Hamano @ 2017-02-09  5:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Siddharth Kannan, git
In-Reply-To: <20170209034657.qbkzbbzuvjpxl422@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Mon, Feb 06, 2017 at 02:34:08PM -0800, Junio C Hamano wrote:
>
>> * sk/parse-remote-cleanup (2017-02-06) 1 commit
>>   (merged to 'next' on 2017-02-06 at 6ec89f72d5)
>>  + parse-remote: remove reference to unused op_prep
>> 
>>  Code clean-up.
>> 
>>  Will merge to 'master'.
>
> Hrm. Are the functions in git-parse-remote.sh part of the public API?
> That is, do we expect third-party scripts to do:
>
>   . "$(git rev-parse --exec)/git-parse-remote.sh
>   error_on_missing_default_upstream "$a" "$b" "$c" "$d"
>
> ? If so, then they may be surprised by the change in function signature.
>
> I generally think of git-sh-setup as the one that external scripts would
> use. There _is_ a manpage for git-parse-remote, but it doesn't list any
> functions. So maybe they're all fair game for changing?
>
> I just didn't see any discussion of this in the original patch thread,
> so I wanted to make sure we were making that decision consciously, and
> not accidentally. :)

Ummm, yes, I admit that this was accidental.  I didn't really think
of parse-remote as an externally visible and supported interface,
but users have tendency to break our expectations, so, I dunno.

^ permalink raw reply

* Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
From: Junio C Hamano @ 2017-02-09  5:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Hajnoczi, Brandon Williams, git
In-Reply-To: <20170209035839.wqsh6ibgnmxyjusi@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

>   master:a:a:a:a:a:a:a:a:a:a:a
>
> I think there are 2^(n-1) possible paths (each colon can be a real colon
> or a slash). Though I guess if you walk the trees as you go, you only
> have to examine at most "n" paths to find the first-level tree, and then
> at most "n-1" paths at the second level, and so on.
>
> Unless you really do have ambiguous trees, in which case you have to
> walk down multiple paths.
>
> It certainly would not be the first combinatoric explosion you can
> convince Git to perform. But it does seem like a lot of complication for
> something as simple as path lookups.

That is true, and we may want to avoid the implementation complexity
of the backtracking name resolution.  If you are on the other hand
worried about the runtime cost, it will be an issue to begin with
only for those who do "git grep -e pattern HEAD:t/perf", which is an
unnatural way to do "git grep -e pattern HEAD -- t/perf", and the
output from the latter won't have such an issue, so...



^ permalink raw reply

* Re: [PATCH 1/2] refs.c: add resolve_ref_submodule()
From: Michael Haggerty @ 2017-02-09  5:20 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git; +Cc: Junio C Hamano
In-Reply-To: <20170208113144.8201-2-pclouds@gmail.com>

On 02/08/2017 12:31 PM, Nguyễn Thái Ngọc Duy wrote:
> This is basically the extended version of resolve_gitlink_ref() where we
> have access to more info from the underlying resolve_ref_recursively() call.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  refs.c | 20 ++++++++++++++------
>  refs.h |  3 +++
>  2 files changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index cd36b64ed9..02e35d83f3 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1325,18 +1325,18 @@ const char *resolve_ref_unsafe(const char *refname, int resolve_flags,
>  				       resolve_flags, sha1, flags);
>  }
>  
> -int resolve_gitlink_ref(const char *submodule, const char *refname,
> -			unsigned char *sha1)
> +const char *resolve_ref_submodule(const char *submodule, const char *refname,
> +				  int resolve_flags, unsigned char *sha1,
> +				  int *flags)
>  {
>  	size_t len = strlen(submodule);
>  	struct ref_store *refs;
> -	int flags;
>  
>  	while (len && submodule[len - 1] == '/')
>  		len--;
>  
>  	if (!len)
> -		return -1;
> +		return NULL;
>  
>  	if (submodule[len]) {
>  		/* We need to strip off one or more trailing slashes */
> @@ -1349,9 +1349,17 @@ int resolve_gitlink_ref(const char *submodule, const char *refname,
>  	}
>  
>  	if (!refs)
> -		return -1;
> +		return NULL;
> +
> +	return resolve_ref_recursively(refs, refname, resolve_flags, sha1, flags);
> +}
> +
> +int resolve_gitlink_ref(const char *submodule, const char *refname,
> +			unsigned char *sha1)
> +{
> +	int flags;
>  
> -	if (!resolve_ref_recursively(refs, refname, 0, sha1, &flags) ||
> +	if (!resolve_ref_submodule(submodule, refname, 0, sha1, &flags) ||
>  	    is_null_sha1(sha1))
>  		return -1;
>  	return 0;
> diff --git a/refs.h b/refs.h
> index 9fbff90e79..74542468d8 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -88,6 +88,9 @@ int peel_ref(const char *refname, unsigned char *sha1);
>   */
>  int resolve_gitlink_ref(const char *submodule, const char *refname,
>  			unsigned char *sha1);
> +const char *resolve_ref_submodule(const char *submodule, const char *refname,
> +				  int resolve_flags, unsigned char *sha1,
> +				  int *flags);

This function is the analog of resolve_ref_unsafe(); i.e., it returns a
pointer to either a static buffer or a pointer into the refname
argument. Therefore, I think it should have "unsafe" in its name. And/or
maybe there should be a safe version of the function analogous to
resolve_refdup().

Moreover, this function has inherited the code for stripping trailing
slashes from the submodule name. I have the feeling that this is a wart,
not a feature, and that it would be sad to see it spread. How about
moving the slash-stripping code to resolve_gitlink_ref() and making
resolve_ref_submodule() assume that its submodule name is already clean?

It would be nice to have a docstring here.

I also have some higher-level concerns about the approach of this patch
series, which I'll write about in a comment to patch 2/2.

Michael


^ permalink raw reply

* Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
From: Jeff King @ 2017-02-09  5:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Hajnoczi, Brandon Williams, git
In-Reply-To: <xmqqtw84rlna.fsf@gitster.mtv.corp.google.com>

On Wed, Feb 08, 2017 at 09:14:17PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >   master:a:a:a:a:a:a:a:a:a:a:a
> >
> > I think there are 2^(n-1) possible paths (each colon can be a real colon
> > or a slash). Though I guess if you walk the trees as you go, you only
> > have to examine at most "n" paths to find the first-level tree, and then
> > at most "n-1" paths at the second level, and so on.
> >
> > Unless you really do have ambiguous trees, in which case you have to
> > walk down multiple paths.
> >
> > It certainly would not be the first combinatoric explosion you can
> > convince Git to perform. But it does seem like a lot of complication for
> > something as simple as path lookups.
> 
> That is true, and we may want to avoid the implementation complexity
> of the backtracking name resolution.  If you are on the other hand
> worried about the runtime cost, it will be an issue to begin with
> only for those who do "git grep -e pattern HEAD:t/perf", which is an
> unnatural way to do "git grep -e pattern HEAD -- t/perf", and the
> output from the latter won't have such an issue, so...

I thought your point was to move it into the get_sha1() parser (so that
while the form is only generated by "git grep", it can be accepted by
any git command). That exposes it in a lot of places, including ones
which are network accessible to things like gitweb (or GitHub, of
course, which is my concern).

Even without the runtime cost, though, I think the general complexity
makes it an ugly path to go down (e.g., handling ambiguous cases). I
wouldn't want to have to write the documentation for it. :)

(I _do_ think Stefan's proposed direction is worth it simply because the
result is easier to read, but I agree the whole thing can be avoided by
using pathspecs, as you've noted).

-Peff

^ permalink raw reply

* Re: [PATCH 2/2] worktree.c: use submodule interface to access refs from another worktree
From: Michael Haggerty @ 2017-02-09  6:07 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git; +Cc: Junio C Hamano
In-Reply-To: <20170208113144.8201-3-pclouds@gmail.com>

On 02/08/2017 12:31 PM, Nguyễn Thái Ngọc Duy wrote:
> The patch itself is relatively simple: manual parsing code is replaced
> with a call to resolve_ref_submodule(). The manual parsing code must die
> because only refs/files-backend.c should do that. Why submodule here is
> a more interesting question.
> 
> From an outside look, any .git/worktrees/foo is seen as a "normal"
> repository. You can set GIT_DIR to it and have access to everything,
> even shared things that are not literally inside that directory, like
> object db or shared refs.
> 
> On top of that, linked worktrees point to those directories with ".git"
> files. These two make a linked worktree's path "X" a "submodule" (*) (**)
> because X/.git is a file that points to a repository somewhere.
> 
> As such, we can just linked worktree's path as a submodule. We just need
> to make sure they are unique because they are used to lookup submodule
> refs store.
> 
> Main worktree is a a bit trickier. If we stand at a linked worktree, we
> may still need to peek into main worktree's HEAD, for example. We can
> treat main worktree's path as submodule as well since git_path_submodule()
> can tolerate ".git" dirs, in addition to ".git" files.
> 
> The constraint is, if main worktree is X, then the git repo must be at
> X/.git. If the user separates .git repo far away and tell git to point
> to it via GIT_DIR or something else, then the "main worktree as submodule"
> trick fails. Within multiple worktree context, I think we can limit
> support to "standard" layout, at least for now.
> 
> (*) The differences in sharing object database and refs between
> submodules and linked worktrees don't really matter in this context.
> 
> (**) At this point, we may want to rename refs *_submodule API to
> something more neutral, maybe s/_submodule/_remote/

It is unquestionably a good goal to avoid parsing references outside of
`refs/files-backend.c`. But I'm not a fan of this approach.

There are two meanings of the concept of a "ref store", and I think this
change muddles them:

1. The references that happen to be *physically* stored in a particular
   location, for example the `refs/bisect/*` references in a worktree.

2. The references that *logically* should be considered part of a
   particular repository. This might require stitching together
   references from multiple sources, for example `HEAD` and
   `refs/bisect` from a worktree's own directory with other
   references from the main repository.

Either of these concepts can be implemented via the `ref_store` abstraction.

The `ref_store` for a submodule should represent the references
logically visible from the submodule. The main program shouldn't care
whether the references are stored in a single physical location or
spread across multiple locations (for example, if the submodule were
itself a linked worktree).

The `ref_store` that you want here for a worktree is not the worktree's
*logical* `ref_store`. You want the worktree's *physical* `ref_store`.
Mixing logical and physical reference stores together is a bad idea
(even if we were willing to ignore the fact that worktrees are not
submodules in the accepted sense of the word).

The point of my `submodule-hash` branch [1] was to separate these
concepts better by breaking the current 1:1 connection between
`ref_store`s and submodules. This would allow `ref_store`s to be created
for other purposes, such as to represent worktree refs. If you want the
*logical* `ref_store` for a submodule, you access it through the
`submodule_ref_stores` table. If you want the *physical* `ref_store` for
a worktree, you should access it through a different table.

I think the best solution would be to expose the concept of `ref_store`
in the public refs API. Then users of submodules would essentially do

    struct ref_store *refs = get_submodule_refs(submodule_path);
    ... resolve_ref_recursively(refs, refname, 0, sha1, &flags) ...
    ... for_each_ref(refs, fn, cb_data) ...

whereas for a worktree you'd have to look up the `ref_store` instance
somewhere else (or maybe keep it as part of some worktree structure, if
there is one) but you would use it via the same API.

Michael

[1] https://github.com/mhagger/git

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  branch.c   |  3 +-
>  worktree.c | 99 +++++++++++++++-----------------------------------------------
>  worktree.h |  2 +-
>  3 files changed, 27 insertions(+), 77 deletions(-)
> 
> diff --git a/branch.c b/branch.c
> index b955d4f316..db5843718f 100644
> --- a/branch.c
> +++ b/branch.c
> @@ -354,7 +354,8 @@ int replace_each_worktree_head_symref(const char *oldref, const char *newref)
>  	for (i = 0; worktrees[i]; i++) {
>  		if (worktrees[i]->is_detached)
>  			continue;
> -		if (strcmp(oldref, worktrees[i]->head_ref))
> +		if (worktrees[i]->head_ref &&
> +		    strcmp(oldref, worktrees[i]->head_ref))
>  			continue;
>  
>  		if (set_worktree_head_symref(get_worktree_git_dir(worktrees[i]),
> diff --git a/worktree.c b/worktree.c
> index d633761575..25e5bc9a3e 100644
> --- a/worktree.c
> +++ b/worktree.c
> @@ -19,54 +19,24 @@ void free_worktrees(struct worktree **worktrees)
>  	free (worktrees);
>  }
>  
> -/*
> - * read 'path_to_ref' into 'ref'.  Also if is_detached is not NULL,
> - * set is_detached to 1 (0) if the ref is detached (is not detached).
> - *
> - * $GIT_COMMON_DIR/$symref (e.g. HEAD) is practically outside $GIT_DIR so
> - * for linked worktrees, `resolve_ref_unsafe()` won't work (it uses
> - * git_path). Parse the ref ourselves.
> - *
> - * return -1 if the ref is not a proper ref, 0 otherwise (success)
> - */
> -static int parse_ref(char *path_to_ref, struct strbuf *ref, int *is_detached)
> -{
> -	if (is_detached)
> -		*is_detached = 0;
> -	if (!strbuf_readlink(ref, path_to_ref, 0)) {
> -		/* HEAD is symbolic link */
> -		if (!starts_with(ref->buf, "refs/") ||
> -				check_refname_format(ref->buf, 0))
> -			return -1;
> -	} else if (strbuf_read_file(ref, path_to_ref, 0) >= 0) {
> -		/* textual symref or detached */
> -		if (!starts_with(ref->buf, "ref:")) {
> -			if (is_detached)
> -				*is_detached = 1;
> -		} else {
> -			strbuf_remove(ref, 0, strlen("ref:"));
> -			strbuf_trim(ref);
> -			if (check_refname_format(ref->buf, 0))
> -				return -1;
> -		}
> -	} else
> -		return -1;
> -	return 0;
> -}
> -
>  /**
> - * Add the head_sha1 and head_ref (if not detached) to the given worktree
> + * Update head_sha1, head_ref and is_detached of the given worktree
>   */
> -static void add_head_info(struct strbuf *head_ref, struct worktree *worktree)
> +static void add_head_info(struct worktree *wt)
>  {
> -	if (head_ref->len) {
> -		if (worktree->is_detached) {
> -			get_sha1_hex(head_ref->buf, worktree->head_sha1);
> -		} else {
> -			resolve_ref_unsafe(head_ref->buf, 0, worktree->head_sha1, NULL);
> -			worktree->head_ref = strbuf_detach(head_ref, NULL);
> -		}
> -	}
> +	int flags;
> +	const char *target;
> +
> +	target = resolve_ref_submodule(wt->path, "HEAD",
> +				       RESOLVE_REF_READING,
> +				       wt->head_sha1, &flags);
> +	if (!target)
> +		return;
> +
> +	if (flags & REF_ISSYMREF)
> +		wt->head_ref = xstrdup(target);
> +	else
> +		wt->is_detached = 1;
>  }
>  
>  /**
> @@ -77,9 +47,7 @@ static struct worktree *get_main_worktree(void)
>  	struct worktree *worktree = NULL;
>  	struct strbuf path = STRBUF_INIT;
>  	struct strbuf worktree_path = STRBUF_INIT;
> -	struct strbuf head_ref = STRBUF_INIT;
>  	int is_bare = 0;
> -	int is_detached = 0;
>  
>  	strbuf_add_absolute_path(&worktree_path, get_git_common_dir());
>  	is_bare = !strbuf_strip_suffix(&worktree_path, "/.git");
> @@ -91,13 +59,10 @@ static struct worktree *get_main_worktree(void)
>  	worktree = xcalloc(1, sizeof(*worktree));
>  	worktree->path = strbuf_detach(&worktree_path, NULL);
>  	worktree->is_bare = is_bare;
> -	worktree->is_detached = is_detached;
> -	if (!parse_ref(path.buf, &head_ref, &is_detached))
> -		add_head_info(&head_ref, worktree);
> +	add_head_info(worktree);
>  
>  	strbuf_release(&path);
>  	strbuf_release(&worktree_path);
> -	strbuf_release(&head_ref);
>  	return worktree;
>  }
>  
> @@ -106,8 +71,6 @@ static struct worktree *get_linked_worktree(const char *id)
>  	struct worktree *worktree = NULL;
>  	struct strbuf path = STRBUF_INIT;
>  	struct strbuf worktree_path = STRBUF_INIT;
> -	struct strbuf head_ref = STRBUF_INIT;
> -	int is_detached = 0;
>  
>  	if (!id)
>  		die("Missing linked worktree name");
> @@ -127,19 +90,14 @@ static struct worktree *get_linked_worktree(const char *id)
>  	strbuf_reset(&path);
>  	strbuf_addf(&path, "%s/worktrees/%s/HEAD", get_git_common_dir(), id);
>  
> -	if (parse_ref(path.buf, &head_ref, &is_detached) < 0)
> -		goto done;
> -
>  	worktree = xcalloc(1, sizeof(*worktree));
>  	worktree->path = strbuf_detach(&worktree_path, NULL);
>  	worktree->id = xstrdup(id);
> -	worktree->is_detached = is_detached;
> -	add_head_info(&head_ref, worktree);
> +	add_head_info(worktree);
>  
>  done:
>  	strbuf_release(&path);
>  	strbuf_release(&worktree_path);
> -	strbuf_release(&head_ref);
>  	return worktree;
>  }
>  
> @@ -334,8 +292,6 @@ const struct worktree *find_shared_symref(const char *symref,
>  					  const char *target)
>  {
>  	const struct worktree *existing = NULL;
> -	struct strbuf path = STRBUF_INIT;
> -	struct strbuf sb = STRBUF_INIT;
>  	static struct worktree **worktrees;
>  	int i = 0;
>  
> @@ -345,6 +301,10 @@ const struct worktree *find_shared_symref(const char *symref,
>  
>  	for (i = 0; worktrees[i]; i++) {
>  		struct worktree *wt = worktrees[i];
> +		const char *symref_target;
> +		unsigned char sha1[20];
> +		int flags;
> +
>  		if (wt->is_bare)
>  			continue;
>  
> @@ -359,25 +319,14 @@ const struct worktree *find_shared_symref(const char *symref,
>  			}
>  		}
>  
> -		strbuf_reset(&path);
> -		strbuf_reset(&sb);
> -		strbuf_addf(&path, "%s/%s",
> -			    get_worktree_git_dir(wt),
> -			    symref);
> -
> -		if (parse_ref(path.buf, &sb, NULL)) {
> -			continue;
> -		}
> -
> -		if (!strcmp(sb.buf, target)) {
> +		symref_target = resolve_ref_submodule(wt->path, symref, 0,
> +						      sha1, &flags);
> +		if ((flags & REF_ISSYMREF) && !strcmp(symref_target, target)) {
>  			existing = wt;
>  			break;
>  		}
>  	}
>  
> -	strbuf_release(&path);
> -	strbuf_release(&sb);
> -
>  	return existing;
>  }
>  
> diff --git a/worktree.h b/worktree.h
> index 6bfb985203..5ea5e503fb 100644
> --- a/worktree.h
> +++ b/worktree.h
> @@ -4,7 +4,7 @@
>  struct worktree {
>  	char *path;
>  	char *id;
> -	char *head_ref;
> +	char *head_ref;		/* NULL if HEAD is broken or detached */
>  	char *lock_reason;	/* internal use */
>  	unsigned char head_sha1[20];
>  	int is_detached;
> 


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox