Git development
 help / color / mirror / Atom feed
* Re: Retrieving a file in git that was deleted and committed
From: Jeff King @ 2018-12-07  7:20 UTC (permalink / raw)
  To: biswaranjan panda; +Cc: bturner, git
In-Reply-To: <CADHAf1Y8or_frf=Ecu-82z-jo06NKe7oqo1cxtsZsOxhKKxjAw@mail.gmail.com>

On Thu, Dec 06, 2018 at 11:07:00PM -0800, biswaranjan panda wrote:

> Thanks! Strangely git log --follow does work.

I suspect it would work even without --follow. When you limit a log
traversal with a pathspec, like:

  git log foo

that is not about following some continuous stream of content, but
rather just applying that pathspec to the diff of each commit, and
pruning ones where it did not change. So even if there are gaps where
the file did not exist, we continue to apply the pathspec to the older
commits.

Tools like git-blame will _not_ work, though, as they really are trying
to track the content as they walk back through history. And Once all of
the content seems to appear from nowhere in your new commit, that seems
like a dead end.

In theory there could be some machine-readable annotation in the commit
object (or in a note created after the fact) to say "even though 'foo'
is a new file here, it came from $commit:foo".  And then git-blame could
keep following the content there. But such a feature does not yet exist.

-Peff

^ permalink raw reply

* Re: Retrieving a file in git that was deleted and committed
From: biswaranjan panda @ 2018-12-07  7:07 UTC (permalink / raw)
  To: bturner; +Cc: git
In-Reply-To: <CAGyf7-FUHMEq_FfPNrH6uT2b-nCd_wi=Aww+OUuoDem11DROGA@mail.gmail.com>

Thanks! Strangely git log --follow does work.
On Thu, Dec 6, 2018 at 10:55 PM Bryan Turner <bturner@atlassian.com> wrote:
>
> On Thu, Dec 6, 2018 at 10:49 PM biswaranjan panda
> <biswaranjan.nitrkl@gmail.com> wrote:
> >
> > I have the following scenario:
> >
> > On a branch A, I deleted a file foo.txt and committed the change. Then
> > I did a bunch of other changes.
> > Now I want to undelete foo.txt.
> >
> > One way is to checkout a separate branch B where the file is present.
> > Then checkout A. Then do
> > git checkout B -- path_to_file
>
> It doesn't change anything, but note that you don't need to checkout B
> first, to restore the file. If you know a commit SHA where the file is
> present, "git checkout SHA -- path_to_file" will pull back the file as
> it existed at that commit.
>
> >
> > While this does gets the file back, the file shows up as a new file to
> > be committed. Once I commit it, git blame doesn't show the old history
> > for the file.
> >
> > I would appreciate if anyone knows how to preserve git blame history
>
> It's not possible, as far as I'm aware. While the new file has the
> same name as the old file, to Git they are two unrelated entries that
> happen to reside at the same path. Even things like "git log --follow"
> won't consider the file to be related to its previous history.
>
> Bryan



-- 
Thanks,
-Biswa

^ permalink raw reply

* Re: Retrieving a file in git that was deleted and committed
From: Bryan Turner @ 2018-12-07  6:55 UTC (permalink / raw)
  To: biswaranjan.nitrkl; +Cc: Git Users
In-Reply-To: <CADHAf1Y_d=-9By4jC2xd+BmWJgfGmBNUr=uSuQtfuHDrarN4kw@mail.gmail.com>

On Thu, Dec 6, 2018 at 10:49 PM biswaranjan panda
<biswaranjan.nitrkl@gmail.com> wrote:
>
> I have the following scenario:
>
> On a branch A, I deleted a file foo.txt and committed the change. Then
> I did a bunch of other changes.
> Now I want to undelete foo.txt.
>
> One way is to checkout a separate branch B where the file is present.
> Then checkout A. Then do
> git checkout B -- path_to_file

It doesn't change anything, but note that you don't need to checkout B
first, to restore the file. If you know a commit SHA where the file is
present, "git checkout SHA -- path_to_file" will pull back the file as
it existed at that commit.

>
> While this does gets the file back, the file shows up as a new file to
> be committed. Once I commit it, git blame doesn't show the old history
> for the file.
>
> I would appreciate if anyone knows how to preserve git blame history

It's not possible, as far as I'm aware. While the new file has the
same name as the old file, to Git they are two unrelated entries that
happen to reside at the same path. Even things like "git log --follow"
won't consider the file to be related to its previous history.

Bryan

^ permalink raw reply

* Retrieving a file in git that was deleted and committed
From: biswaranjan panda @ 2018-12-07  6:49 UTC (permalink / raw)
  To: git

I have the following scenario:

On a branch A, I deleted a file foo.txt and committed the change. Then
I did a bunch of other changes.
Now I want to undelete foo.txt.

One way is to checkout a separate branch B where the file is present.
Then checkout A. Then do
git checkout B -- path_to_file

While this does gets the file back, the file shows up as a new file to
be committed. Once I commit it, git blame doesn't show the old history
for the file.

I would appreciate if anyone knows how to preserve git blame history.

^ permalink raw reply

* Issue with git-gui and font used for widgets
From: Eric Work @ 2018-12-07  6:10 UTC (permalink / raw)
  To: git

Hi,

I have set my UI font in the git-gui preferences, but it only affects
the menus and certain widgets.  It does not apply the font to labels
and buttons.  This creates a problem for my HiDPI display because the
fonts are quite small.  I've never programmed in TCL/TK before so I
don't know exactly what is wrong, but looking at the code I see where
it's supposed to apply the font option to the widgets in a foreach
loop.

foreach class {Button Checkbutton Entry Label
    Labelframe Listbox Message
    Radiobutton Spinbox Text} {
  option add *$class.font font_ui
}

But that doesn't seem to work.  As an experiment I added the -font
parameter (according to the docs) to where the branch label is created
and that worked as expected.

${NS}::label .branch.l1 \
  -text [mc "Current Branch:"] \
  -anchor w \
  -justify left \
  -font font_ui

I don't know what is the expected behavior in terms of setting fonts,
but I would assume that is not expected?  It appears to be using
TkDefaultFont instead.  The default font being small is really a tk
problem, where as setting the widget font is a git-gui problem.  I
created the following bug report against tk to get some feedback on
the small default font issue as well as documented a workaround to
allow per user default fonts.

https://core.tcl-lang.org/tk/tktview/dccd82bdc70dc25bb6709a6c14880a92104dda43

Any suggestions?

-Eric

^ permalink raw reply

* Re: [wishlist] git submodule update --reset-hard
From: Yaroslav Halchenko @ 2018-12-07  1:22 UTC (permalink / raw)
  To: git
In-Reply-To: <CAGZ79kYoGqWW4tv4-caA18SHKe+y2mnDT84AEWVksDtDObLq0g@mail.gmail.com>

Hi Stefan,

Thanks for the dialogue!  Replies are embedded below.

On Thu, 06 Dec 2018, Stefan Beller wrote:
> ...
> > > What if the branch differs from the sha1 recorded in the superproject?

> > git reset --hard  itself is an operation which should be done with some
> > level of competence in doing "the right thing" by calling it.  You
> > can hop branches even in current (without any submodules in question)
> > repository with it and cause as much chaos as you desire.

> Right.

> git reset --hard would the branch (as well as the working tree) to the
> given sha1, which is confusing as submodules get involved.

> The Right Thing as of now is the sha1 as found in the
> superprojects gitlink. But as that can be different from any branch
> in the submodule, we'd rather detach the HEAD to make it
> deterministic.

yeap, makes total sense to be the thing do that by default ;-)

> There was a proposal to "re-attach HEAD" in the submodule, i.e.
> if the branch branch points at the same commit, we don't need
> a detached HEAD, but could go with the branch instead.

if I got the idea right, if we are talking about any branch, it
would also non-deterministic since who knows what left over branch(es)
point to that commit.  Not sure if I would have used that ;)

> > If desired though, a number of prevention mechanisms could be in place (but
> > would require option(s) to overcome) to allow submodule to be reset --hard'ed
> > only when some conditions met (e.g. only to the commit which is among parent
> > commits path of the current branch).  This way wild hops would be prevented,
> > although you might still end up in some feature branch.  But since "reset
> > --hard" itself doesn't have any safe-guards, I do not really think they should
> > be implemented here either.

> So are you looking for
> a) "stay on submodule branch (i.e. HEAD still points at $branch), and
> reset --hard" such that the submodule has a clean index and at that $branch 
> or
> b) "stay on submodule branch (i.e. HEAD still points at $branch), but $branch is
>    set to the gitlink from the superproject, and then a reset --hard
>    will have the worktree set to it as well.

yes!

NB "gitlink" -- just now discovered the thing for me.  Thought it would be
called a  subproject  echoing what git diff/log -p shows for submodule commits.

> (a) is what the referenced submodule.repoLike option implements.

sounds like it indeed, thanks for spelling out

> I'd understand the desire for (b) as well, as it is a "real" hard reset on
> the superproject level, without detaching branches.

yeap

> > >   git reset --hard --recurse-submodules HEAD

> > it does indeed some trick(s) but not all seems to be the ones I desire:

> > 1. Seems to migrate submodule's .git directories into the top level
> > .git/modules

> Ah yes, that happens too. This will help once you want to git-rm
> a submodule and checkout states before and after.

yeap ;-) 

> > > undesirable in the sense of still having local changes (that is what
> > > the above reset with `--recurse` would fix) or changed the branch
> > > state? (i.e. is detached but was on a branch before?)

> > right -- I meant the local changes and indeed reset --recurse-submodules
> > indeed seems to recurse nicely.  Then the undesired effect remaining only
> > the detached HEAD

> For that we may want to revive discussions in
> https://public-inbox.org/git/20170501180058.8063-5-sbeller@google.com/

well, isn't that one requires a branch to be specified in .gitmodules?

> > > >   git submodule update --recursive

> > > > I would end up in the detached HEADs within submodules.

> > > > What I want is to retain current branch they are at (or may be possible
> > > > "were in"? reflog records might have that information)

> > > So something like

> > >   git submodule foreach --recursive git reset --hard

> > > ?

> > not quite  -- this would just kill all local changes within each submodule, not
> > to reset it to the desired state, which wouldn't be specified in such
> > invocation, and is only known to the repo containing it

> With this answer it sounds like you'd want (b) from above.

yeap

> > > You may be interested in
> > > https://public-inbox.org/git/20180927221603.148025-1-sbeller@google.com/
> > > which introduces a switch `submodule.repoLike [ = true]`, which
> > > when set would not detach HEAD in submodules.

> > Thanks! looks interesting -- was there more discussion/activity beyond those 5
> > posts in the thread?

> Unfortunately there was not.

pity

> > This feature might indeed come handy but if I got it right, it is somewhat
> > complimentary to just having submodule update --reset-hard .  E.g.  submodules
> > might be in different branches (if I am not tracking based on branch names), so
> > I would not want a recursive checkout with -b|-B.  But we would indeed benefit
> > from such functionality, since this difficulty of managing branches of
> > submodules I think would be elevated with it! (e.g. in one use case we probably
> > will end up with a few thousands of submodules, and at least 3 branches in each
> > which would need to be in sync, and typically you wouldn't want different
> > branches to be checked out in different submodules)

> > > Can you say more about the first question above:
> > > Would you typically have situations where the
> > > submodule branch is out of sync with the superproject
> > > and how do you deal with that?

> > typically I do not have anything out of sync.  My primary use-case is to
> > "cancel" recent changes in the entire tree of repositories.  I guess for
> > my use case, instead of needing two commands

> >    git reset --hard PREVIOUSPOINT
> >    git submodule update --reset--hard --recursive

> > I wish there was just one

> >    git reset --hard --recursive PREVIOUSPOINT

> Maybe this could learn options like

>   git reset --hard --recursive=hard,keep-branch PREVIOUSPOINT

'keep-branch' (given aforementioned keeping the specified in .gitmodules
branch) might be confusing.  Also what if a submodule already in a
detached HEAD?  IMHO --recursive=hard, and just saying that it would do
"reset --hard", is imho sufficient.  (that is why I like pure
--reset hard   since it doesn't care and neither does anything to the
branch)

> which then could be put into options like

>   git config reset.recurseSubmodules  hard,keep-branch &&
>   # maybe not needed, depending on the exact meaning
>   # of reset.recurseSubmodules:
>   git config submodule.recurse

> and then

>   git reset --hard PREVIOUS

> would do what you'd desire.

you mean

   git reset --hard --recurse-submodules PREVIOUS

in principle overall I would love to have it, besides not sure what
other than 'hard' could be there, and what 'keep-branch' would exactly
do ;-)

> > but I felt that   submodule update   might be a better starting point
> > since it already  provides different modes for update.  If I was even greedier,
> > I would have asked for

> >    git revert --recursive <commit>...
> >    git rebase --recursive [-i] ...

> > which I also frequently desire (could elaborate on the use cases etc).

> These would be nice to have. It would be nice if you'd elaborate on the
> use cases for future reference in the mailing list archive. :-)

ok, will try to do so ;-) In summary: they are just a logical extension
of git support for submodules for anyone actively working with
submodules to keep entire tree in sync.  Then quite often the need for
reverting a specific commit (which also has changes reflected in
submodules) arises.  The same with rebase, especially to trim away some
no longer desired changes reflected in submodules.  

the initial "git submodule update --reset-hard" is pretty much a
crude workaround for some of those cases, so I would just go earlier in
the history, and redo some things, whenever I could just drop or revert
some selected set of commits.

> > NB or --recurse-submodules to avoid confusion with recursive merge
> > strategy?

> ... and sometimes recursing in the file system, c.f. `ls-tree -r`.

ah... so it is only   submodule  command which has --recursive, and the
rest have --recurse-submodules   when talking about recursing into
submodules?

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        

^ permalink raw reply

* Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
From: Jeff King @ 2018-12-07  1:14 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git
In-Reply-To: <20181206231005.GP30222@szeder.dev>

On Fri, Dec 07, 2018 at 12:10:05AM +0100, SZEDER Gábor wrote:

> On Wed, Dec 05, 2018 at 04:56:21PM -0500, Jeff King wrote:
> > Could we just kill them all?
> > 
> > I guess it's a little tricky, because $! is going to give us the pid of
> > each subshell. We actually want to kill the test sub-process. That takes
> > a few contortions, but the output is nice (every sub-job immediately
> > says "ABORTED ...", and the final process does not exit until the whole
> > tree is done):
> 
> It's trickier than that:
> 
>   - Nowadays our test lib translates SIGINT to exit, but not TERM (or
>     HUP, in case a user decides to close the terminal window), which
>     means that if the test runs any daemons in the background, then
>     those won't be killed when the stress test is aborted.
> 
>     This is easy enough to address, and I've been meaning to do this
>     in an other patch series anyway.

Yeah, trapping on more signals seems reasonable (or just propagating INT
instead of TERM). I'm more worried about this one:

>   - With the (implied) '--verbose-log' the process killed in the
>     background subshell's SIGTERM handler is not the actual test
>     process, because '--verbose-log' runs the test again in a piped
>     subshell to save its output:

Hmm, yeah. The patch I sent earlier already propagates through the
subshell, so this is really just another layer there. I.e., would it be
reasonable when we are using --verbose-log (or --tee) for the parent
process to propagate signals down to child it spawned?

That would fix this case, and it would make things more predictable in
general (e.g., a user who manually runs kill).

It does feel like a lot of boilerplate to be propagating these signals
manually. I think the "right" Unix-y solution here is process groups:
each sub-job should get its own group, and then the top-level runner
script can kill the whole group. We may run into portability issues,
though (setsid is in util-linux, but I don't know if there's an
equivalent elsewhere; a small perl or C helper could do it, but I have
no idea what that would mean on Windows).

>       (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1;
>        echo $? >"$TEST_RESULTS_BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE"
> 
>     That 'kill' kills the "outer" shell process.
>     And though I get "ABORTED..." immediately and I get back my
>     prompt right away, the commands involved in the above construct
>     are still running in the background:
> 
>       $ ./t3903-stash.sh --stress=1
>       ^CABORTED  0.0
>       $ ps a |grep t3903
>       1280 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
>       1281 pts/17   S      0:00 tee -a <...>/test-results/t3903-stash.stress-0.out
>       1282 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
>       4173 pts/17   S+     0:00 grep t3903
> 
>     I see this behavior with all shells I tried.
>     I haven't yet started to think it through why this happens :)

I think you get ABORTED because we are just watching for the
parent-process to exit (and reporting its status). The fact that that
the subshell (and the tee) are still running is not important. That all
makes sense to me.

>     Not implying '--verbose-log' but redirecting the test script's
>     output, like you did in your 'stress' script, seems to work in
>     dash, ksh, and ks93, but not in Bash v4.3 or later, where, for
>     whatever reason, the subshell get SIGINT before the SIGTERM would
>     arrive.
>     While we could easily handle SIGINT in the subshell with the same
>     signal handler as SIGTERM, it bothers me that something
>     fundamental is wrong here.

Yeah, I don't understand the SIGINT behavior you're seeing with bash. I
can't replicate it with bash 4.4.23 (from Debian unstable).

-Peff

^ permalink raw reply

* Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
From: Jeff King @ 2018-12-07  1:03 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git
In-Reply-To: <20181206225601.GO30222@szeder.dev>

On Thu, Dec 06, 2018 at 11:56:01PM +0100, SZEDER Gábor wrote:

> > +test_expect_success 'roll those dice' '
> > +	case "$(openssl rand -base64 1)" in
> > +	z*)
> > +		return 1
> > +	esac
> > +'
> 
> Wasteful :)
> 
>   test $(($$ % 42)) -ne 0

Oh, indeed, that is much more clever. :)

-Peff

^ permalink raw reply

* Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
From: Josh Steadmon @ 2018-12-07  0:25 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jonathantanmy
In-Reply-To: <20181129002756.167615-1-sbeller@google.com>

On 2018.11.28 16:27, Stefan Beller wrote:
> This is a resend of sb/submodule-recursive-fetch-gets-the-tip,
> with all feedback addressed. As it took some time, I'll send it
> without range-diff, but would ask for full review.
> 
> I plan on resending after the next release as this got delayed quite a bit,
> which is why I also rebased it to master.
> 
> Thanks,
> Stefan

I am not very familiar with most of the submodule code, but for what
it's worth, this entire series looks good to me. I'll note that most of
the commits caused some style complaints, but I'll leave it up to your
judgement as to whether they're valid or not.

Reviewed-by: Josh Steadmon <steadmon@google.com>

^ permalink raw reply

* Re: [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line
From: Junio C Hamano @ 2018-12-07  0:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git
In-Reply-To: <20181206232538.141378-1-jonathantanmy@google.com>

Jonathan Tan <jonathantanmy@google.com> writes:

>> Jonathan Tan <jonathantanmy@google.com> writes:
>> 
>> > @@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
>> >  	}
>> >  	os->used += readsz;
>> >  
>> > +	if (!os->packfile_started) {
>> > +		os->packfile_started = 1;
>> > +		if (use_protocol_v2)
>> > +			packet_write_fmt(1, "packfile\n");
>> 
>> If we fix this function so that the only byte in the buffer is held
>> back without emitted when os->used == 1 as I alluded to, this may
>> have to be done a bit later, as with such a change, it is no longer
>> guaranteed that send_client_data() will be called after this point.
>
> I'm not sure what you mean about there being no guarantee that
> send_client_data() is not called - in create_pack_file(), there is an
> "if (output_state.used > 0)" line (previously "if (0 <= buffered)") that
> outputs anything remaining.

I was referring to this part of the review on the previous step,
which you may not yet have read.

    OK, this corresponds to the "*cp++ = buffered"  in the original just
    before xread().

    > +		os->used = 1;
    > +	} else {
    > +		send_client_data(1, os->buffer, os->used);
    > +		os->used = 0;

    I am not sure if the code is correct when os->used happens to be 1
    (shouldn't we hold the byte, skip the call to send_client_data(),
    and go back to poll() to expect more data?), but this is a faithful
    code movement and rewrite of the original.

The point of this logic is to make sure we always hold back some
bytes and do not feed *all* the bytes to the other side by calling
"send-client-data" until we made sure the upstream of what we are
relaying (pack-objects?) successfully exited, but it looks to me
that the "else" clause above ends up flushing everything when
os->used is 1, which goes against the whole purpose of the code.

And the "fix" I was alluding to was to update that "else" clause to
make it a no-op that keeps os->used non-zero, which would not call
send-client-data.

When that fix happens, the part that early in the function this
patch added "now we know we will call send-client-data, so let's say
'here comes packdata' unless we have already said that" is making
the decision too early.  Depending on the value of os->used when we
enter the code and the number of bytes xread() reads from the
upstream, we might not call send-client-data yet (namely, when we
have no buffered data and we happened to get only one byte).

> ... it might be
> better if the server can send sideband throughout the whole response -
> perhaps that should be investigated first.

Yup.  It just looked quite crazy, and it is even more crazy to
buffer keepalives ;-)

^ permalink raw reply

* Re: Any way to make git-log to enumerate commits?
From: Junio C Hamano @ 2018-12-07  0:12 UTC (permalink / raw)
  To: Konstantin Khomoutov; +Cc: Konstantin Kharlamov, git
In-Reply-To: <20181206114835.oro32chgf4qp2yqd@tigra>

Konstantin Khomoutov <kostix@bswap.ru> writes:

>> I do not see why the "name each rev relative to HEAD" formatting
>> option cannot produce HEAD^2~2 etc.
>>  ...
> My reading was that the OP explicitly wanted to just glance at a single
> integer number and use it right away in a subsequent rebase command.
>
> I mean, use one's own short memory instead of copying and pasting.

As everybody pointed out, a single integer number will fundamentally
unworkable with distributed nature of Git that inherently makes the
history with forks and merges.  Besides, there is no way to feed
"git log" with "a single integer number", without which "making
git-log to enumerate" would not be all that useful ("git show
HEAD~4" works but "git show 4" does not).

All of these name-rev based suggestions were about using anchoring
point with memorable name plus a short path to reach from there,
which I think is the closest thing to "a single integer number" and
still is usable for the exact purpose.  "HEAD~48^2" on an
integration branch would be "the tip of the side branch that was
merged some 48 commits ago", for example.

^ permalink raw reply

* Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
From: Jonathan Tan @ 2018-12-06 23:54 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git, stolee
In-Reply-To: <20181205045416.GB12284@sigill.intra.peff.net>

Also CC-ing Stolee since I mention multi-pack indices at the end.

> This seems like a reasonable thing to do, but I have sort of a
> meta-comment. In several places we've started doing this kind of "if
> it's this type of object, do X, otherwise do Y" optimization (e.g.,
> handling large blobs for streaming).
> 
> And in the many cases we end up doubling the effort to do object
> lookups: here we do one lookup to get the type, and then if it's not a
> commit (or if we don't have a commit graph) we end up parsing it anyway.
> 
> I wonder if we could do better. In this instance, it might make sense
> to first see if we actually have a commit graph available (it might not
> have this object, of course, but at least we'd expect it to have most
> commits).

This makes sense - I thought I shouldn't mention the commit graph in the
code since it seems like a layering violation, but I felt the need to
mention commit graph in a comment, so maybe the need to mention commit
graph in the code is there too. Subsequently, maybe the lookup-for-type
could be replaced by a lookup-in-commit-graph (maybe by using
parse_commit_in_graph() directly), which should be at least slightly
faster.

> In general, it would be nice if we had a more incremental API
> for accessing objects: open, get metadata, then read the data. That
> would make these kinds of optimizations "free".

Would this be assuming that to read the data, you would (1) first need to
read the metadata, and (2) there would be no redundancy in reading the
two? It seems to me that for loose objects, you would want to perform
all your reads at once, since any read requires opening the file, and
for commit graphs, you just want to read what you want, since the
metadata and the data are in separate places.

> I don't have numbers for how much the extra lookups cost. The lookups
> are probably dwarfed by parse_object() in general, so even if we save
> only a few full object loads, it may be a win. It just seems a shame
> that we may be making the "slow" paths (when our type-specific check
> doesn't match) even slower.

I agree. I think it will always remain a tradeoff when we have multiple
data sources of objects (loose, packed, commit graph - and we can't
unify them all, since they each have their uses). Unless the multi-pack
index can reference commit graphs as well...then it could be our first
point of reference without introducing any inefficiencies...

^ permalink raw reply

* Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
From: Jonathan Tan @ 2018-12-06 23:36 UTC (permalink / raw)
  To: sbeller; +Cc: jonathantanmy, git
In-Reply-To: <CAGZ79kYOOk2ODYgRcSZgDUqBfx2HeywnEGpbJB9BrrVzEUi_JA@mail.gmail.com>

> > This is on sb/more-repo-in-api because I'm using the repo_parse_commit()
> > function.
> 
> This is a mere nicety, not strictly required.
> Before we had parse_commit(struct commit *) which would accomplish the
> same, (and we'd still have that afterwards as a #define falling back onto
> the_repository). As the function get_reference() is not the_repository safe
> as it contains a call to is_promisor_object() that is repository
> agnostic, I think
> it would be fair game to not depend on that series. I am not
> complaining, though.

Good point - I'll base the next version on master (and add a TODO
explaining which functions are not yet converted).

> AFAICT oid_object_info doesn't take advantage of the commit graph,
> but just looks up the object header, which is still less than completely
> parsing it. Then lookup_commit is overly strict, as it may return
> NULL as when there still is a type mismatch (I don't think a mismatch
> could happen here, as both rely on just the object store, and not the
> commit graph.), so this would be just defensive programming for
> the sake of it. I dunno.
> 
>     struct commit *c;
> 
>     if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT &&
>         (c = lookup_commit(revs->repo, oid)) &&
>         !repo_parse_commit(revs->repo, c))
>             object = (struct object *) c;
>     else
>         object = parse_object(revs->repo, oid);

I like this way better - I'll do it in the next version.

^ permalink raw reply

* Re: [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line
From: Jonathan Tan @ 2018-12-06 23:25 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git
In-Reply-To: <xmqqzhtj88nw.fsf@gitster-ct.c.googlers.com>

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > @@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
> >  	}
> >  	os->used += readsz;
> >  
> > +	if (!os->packfile_started) {
> > +		os->packfile_started = 1;
> > +		if (use_protocol_v2)
> > +			packet_write_fmt(1, "packfile\n");
> 
> If we fix this function so that the only byte in the buffer is held
> back without emitted when os->used == 1 as I alluded to, this may
> have to be done a bit later, as with such a change, it is no longer
> guaranteed that send_client_data() will be called after this point.

I'm not sure what you mean about there being no guarantee that
send_client_data() is not called - in create_pack_file(), there is an
"if (output_state.used > 0)" line (previously "if (0 <= buffered)") that
outputs anything remaining.

> Isn't progress output that goes to the channel #2 pretty much
> independent from the payload stream that carries the pkt-line 
> command like "packfile" plus the raw pack stream?  It somehow
> feels like an oxymoron to _buffer_ progress indicator, as it
> defeats the whole point of progress report to buffer it.

Yes, it is - I don't fully like this part of the design. I alluded to a
similar issue (keepalive) in the toplevel email [1] and that it might be
better if the server can send sideband throughout the whole response -
perhaps that should be investigated first. If we had sideband throughout
the whole response, we wouldn't need to buffer the progress indicator.

[1] https://public-inbox.org/git/cover.1543879256.git.jonathantanmy@google.com/

^ permalink raw reply

* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
From: Jonathan Tan @ 2018-12-06 23:16 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git
In-Reply-To: <xmqqbm60d0s1.fsf@gitster-ct.c.googlers.com>

> > +This feature allows servers to serve part of their packfile response as URIs.
> > +This allows server designs that improve scalability in bandwidth and CPU usage
> > +(for example, by serving some data through a CDN), and (in the future) provides
> > +some measure of resumability to clients.
> 
> Without reading the remainder, this makes readers anticipate a few
> good things ;-)
> 
>  - "part of", so pre-generated constant material can be given from
>    CDN and then followed-up by "filling the gaps" small packfile,
>    perhaps?

Yes :-)

>  - The "part of" transmission may not bring the repository up to
>    date wrt to the "want" objects; would this feature involve "you
>    asked history up to these commits, but with this pack-uri, you'll
>    be getting history up to these (somewhat stale) commits"?

It could be, but not necessarily. In my current WIP implementation, for
example, pack URIs don't give you any commits at all (and thus, no
history) - only blobs. Quite a few people first think of the "stale
clone then top-up" case, though - I wonder if it would be a good idea to
give the blob example in this paragraph in order to put people in the
right frame of mind.

> > +If the client replies with the following arguments:
> > +
> > + * packfile-uris
> > + * thin-pack
> > + * ofs-delta
> 
> "with the following" meaning "with all of the following", or "with
> any of the following"?  Is there a reason why the server side must
> require that the client understands and is willing to accept a
> thin-pack when wanting to use packfile-uris?  The same question for
> the ofs-delta.

"All of the following", but from your later comments, we probably don't
need this section anyway.

> My recommendation is to drop the mention of "thin" and "ofs" from
> the above list, and also from the following paragraph.  The "it MAY
> send" will serve as a practical escape clause to allow a server/CDN
> implementation that *ALWAYS* prepares pregenerated material that can
> only be digested by clients that supports thin and ofs.  Such a server
> can send packfile-URIs only when all of the three are given by the
> client and be compliant.  And such an update to the proposed document
> would allow a more diskful server to prepare both thin and non-thin
> pregenerated packs and choose which one to give to the client depending
> on the capability.

That is true - we can just let the server decide. I'll update the patch
accordingly, and state that the URIs should point to packs with features
like thin-pack and ofs-delta only if the client has declared that it
supports them.

> > +Clients then should understand that the returned packfile could be incomplete,
> > +and that it needs to download all the given URIs before the fetch or clone is
> > +complete. Each URI should point to a Git packfile (which may be a thin pack and
> > +which may contain offset deltas).
> 
> weaken or remove the (parenthetical comment) in the last sentence,
> and replace the beginning of the section with something like
> 
> 	If the client replies with 'packfile-uris', when the server
> 	sends the packfile, it MAY send a `packfile-uris` section...
> 
> You may steal what I wrote in the above response to help the
> server-side folks to decide how to actually implement the "it MAY
> send a packfile-uris" part in the document.

OK, will do.

> OK, this comes back to what I alluded to at the beginning.  We could
> respond to a full-clone request by feeding a series of packfile-uris
> and some ref information, perhaps like this:
> 
> 	* Grab this packfile and update your remote-tracking refs
>           and tags to these values; you'd be as if you cloned the
>           project when it was at v1.0.
> 
> 	* When you are done with the above, grab this packfile and
>           update your remote-tracking refs and tags to these values;
>           you'd be as if you cloned the project when it was at v2.0.
> 
> 	* When you are done with the above, grab this packfile and
>           update your remote-tracking refs and tags to these values;
>           you'd be as if you cloned the project when it was at v3.0.
> 
> 	...
> 
> 	* When you are done with the above, here is the remaining
>           packdata to bring you fully up to date with your original
>           "want"s.
> 
> and before fully reading the proposal, I anticipated that it was
> what you were going to describe.  The major difference is "up to the
> packdata given to you so far, you'd be as if you fetched these" ref
> information, which would allow you to be interrupted and then simply
> resume, without having to remember the set of packfile-uris yet to
> be processed across a fetch/clone failure.  If you sucessfully fetch
> packfile for ..v1.0, you can update the remote-tracking refs to
> match as if you fetched back when that was the most recent state of
> the project, and then if you failed while transferring packfile for
> v1.0..v2.0, the resuming would just reissue "git fetch" internally.

The "up to" would work if we had the stale clone + periodic "upgrades"
arrangement you describe, but not when, for example, we just want to
separate large blobs out. If we were to insist on attaching ref
information to each packfile URI (or turn the packfiles into bundles),
it is true that we can have resumable fetch, although I haven't fully
thought out the implications of letting the user modify the repository
while a fetch is in progress. (What happens if the user wipes out their
object store in between fetching one packfile and the next, for
example?) That is why I only talked about resumable clone, not resumable
fetch.

^ permalink raw reply

* Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
From: SZEDER Gábor @ 2018-12-06 23:10 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20181205215621.GE19936@sigill.intra.peff.net>

On Wed, Dec 05, 2018 at 04:56:21PM -0500, Jeff King wrote:
> Could we just kill them all?
> 
> I guess it's a little tricky, because $! is going to give us the pid of
> each subshell. We actually want to kill the test sub-process. That takes
> a few contortions, but the output is nice (every sub-job immediately
> says "ABORTED ...", and the final process does not exit until the whole
> tree is done):

It's trickier than that:

  - Nowadays our test lib translates SIGINT to exit, but not TERM (or
    HUP, in case a user decides to close the terminal window), which
    means that if the test runs any daemons in the background, then
    those won't be killed when the stress test is aborted.

    This is easy enough to address, and I've been meaning to do this
    in an other patch series anyway.

  - With the (implied) '--verbose-log' the process killed in the
    background subshell's SIGTERM handler is not the actual test
    process, because '--verbose-log' runs the test again in a piped
    subshell to save its output:
    
      (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1;
       echo $? >"$TEST_RESULTS_BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE"

    That 'kill' kills the "outer" shell process.
    And though I get "ABORTED..." immediately and I get back my
    prompt right away, the commands involved in the above construct
    are still running in the background:

      $ ./t3903-stash.sh --stress=1
      ^CABORTED  0.0
      $ ps a |grep t3903
      1280 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
      1281 pts/17   S      0:00 tee -a <...>/test-results/t3903-stash.stress-0.out
      1282 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
      4173 pts/17   S+     0:00 grep t3903

    I see this behavior with all shells I tried.
    I haven't yet started to think it through why this happens :)

    Not implying '--verbose-log' but redirecting the test script's
    output, like you did in your 'stress' script, seems to work in
    dash, ksh, and ks93, but not in Bash v4.3 or later, where, for
    whatever reason, the subshell get SIGINT before the SIGTERM would
    arrive.
    While we could easily handle SIGINT in the subshell with the same
    signal handler as SIGTERM, it bothers me that something
    fundamental is wrong here.
    Furthermore, then we should perhaps forbid '--stress' in
    combination with '--verbose-log' or '--tee'.
    

> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 9b7f687396..357dead3ff 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -98,8 +98,9 @@ done,*)
>  	mkdir -p "$(dirname "$TEST_RESULTS_BASE")"
>  	stressfail="$TEST_RESULTS_BASE.stress-failed"
>  	rm -f "$stressfail"
> -	trap 'echo aborted >"$stressfail"' TERM INT HUP
> +	trap 'echo aborted >"$stressfail"; kill $job_pids 2>/dev/null; wait' TERM INT HUP
>  
> +	job_pids=
>  	job_nr=0
>  	while test $job_nr -lt "$job_count"
>  	do
> @@ -108,10 +109,13 @@ done,*)
>  			GIT_TEST_STRESS_JOB_NR=$job_nr
>  			export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR
>  
> +			trap 'kill $test_pid 2>/dev/null' TERM
> +
>  			cnt=0
>  			while ! test -e "$stressfail"
>  			do
> -				if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1
> +				$TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 & test_pid=$!
> +				if wait
>  				then
>  					printf >&2 "OK   %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
>  				elif test -f "$stressfail" &&
> @@ -124,16 +128,11 @@ done,*)
>  				fi
>  				cnt=$(($cnt + 1))
>  			done
> -		) &
> +		) & job_pids="$job_pids $!"
>  		job_nr=$(($job_nr + 1))
>  	done
>  
> -	job_nr=0
> -	while test $job_nr -lt "$job_count"
> -	do
> -		wait
> -		job_nr=$(($job_nr + 1))
> -	done
> +	wait
>  
>  	if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted"
>  	then
> 


^ permalink raw reply

* Re: [BUG REPORT] Git does not correctly replay bisect log
From: Christian Couder @ 2018-12-06 23:02 UTC (permalink / raw)
  To: lskrejci; +Cc: git
In-Reply-To: <5ab4e3fb2e22097753bde7702e67d6ce924283a2.camel@gmail.com>

On Thu, Dec 6, 2018 at 6:30 PM Lukáš Krejčí <lskrejci@gmail.com> wrote:
>
> I am talking about `git bisect replay`. The shell script, as far as I
> can see, only updates the references (ref/bisect/*) and never checks if
> the revisions marked as 'good' are ancestors of the 'bad' one.
> Therefore, $GIT_DIR/BISECT_ANCESTORS_OK file is never created.

Indeed `git bisect replay` first only updates the references
(ref/bisect/*) according to all the "git bisect {good,bad}"
instructions it finds in the log it is passed. After doing that
though, before exiting, it calls `bisect_auto_next` which calls
`bisect_next` which calls `git bisect--helper --next-all` which checks
the merge bases.

I think it is a bug.

`git bisect replay` is right to only update the references
(ref/bisect/*) and not to compute and checkout the best commit to test
at each step except at the end, but it should probably just create the
$GIT_DIR/BISECT_ANCESTORS_OK file if more than one bisection step has
been performed (because the merge bases are checked as part of the
first bisection step only).

> The first time the ancestors are checked is in the helper (`git-bisect-
> -help --next-all`) that has only limited information from refs/bisect*.

`git-bisect--helper --next-all` knows how to get refs/bisect*
information, otherwise it couldn't decide which is the next best
commit to test.

Thanks for your help in debugging this,
Christian.

^ permalink raw reply

* Re: git, monorepos, and access control
From: Coiner, John @ 2018-12-06 22:59 UTC (permalink / raw)
  To: Johannes Schindelin, Jeff King; +Cc: git@vger.kernel.org
In-Reply-To: <nycvar.QRO.7.76.6.1812062100020.41@tvgsbejvaqbjf.bet>

Johannes,

Thanks for your feedback.

I'm not looking closely at submodules, as it's my understanding that 
VFSForGit does not support them. A VFS would be a killer feature for us. 
If VFSForGit were to support submodules, we'd look at them. They would 
provide access control in a way that's clearly nonabusive. I hear you on 
the drawbacks.

AMD today looks more like the 100 independent repos you describe, except 
we don't have automation at the delivery arcs. Integrations are manual 
and thus not particularly frequent.

I'm probably biased toward favoring a monorepo, which I've seen applied 
at a former employer, versus continuous delivery. That's due to lack of 
personal familiarity with CD -- not any real objections.

Thanks,

John

On 12/06/2018 03:08 PM, Johannes Schindelin wrote:
> Hi,
>
> On Wed, 5 Dec 2018, Jeff King wrote:
>
>> The model that fits more naturally with how Git is implemented would be
>> to use submodules. There you leak the hash of the commit from the
>> private submodule, but that's probably obscure enough (and if you're
>> really worried, you can add a random nonce to the commit messages in the
>> submodule to make their hashes unguessable).
> I hear myself frequently saying: "Friends don't let friends use
> submodules". It's almost like: "Some people think their problem is solved
> by using submodules. Only now they have two problems."
>
> There are big reasons, after all, why some companies go for monorepos: it
> is not for lack of trying to go with submodules, it is the problems that
> were incurred by trying to treat entire repositories the same as single
> files (or even trees): they are just too different.
>
> In a previous life, I also tried to go for submodules, was burned, and had
> to restart the whole thing. We ended up with something that might work in
> this instance, too, although our use case was not need-to-know type of
> encapsulation. What we went for was straight up modularization.
>
> What I mean is that we split the project up into over 100 individual
> projects that are now all maintained in individual repositories, and they
> are connected completely outside of Git, via a dependency management
> system (in this case, Maven, although that is probably too Java-centric
> for AMD's needs).
>
> I just wanted to throw that out here: if you can split up your project
> into individual projects, it might make sense not to maintain them as
> submodules but instead as individual repositories whose artifacts are
> uploaded into a central, versioned artifact store (Maven, NuGet, etc). And
> those artifacts would then be retrieved by the projects that need them.
>
> I figure that that scheme might work for you better than submodules: I
> could imagine that you need to make the build artifacts available even to
> people who are not permitted to look at the corresponding source code,
> anyway.
>
> Ciao,
> Johannes


^ permalink raw reply

* Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
From: SZEDER Gábor @ 2018-12-06 22:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20181205213625.GD19936@sigill.intra.peff.net>

On Wed, Dec 05, 2018 at 04:36:26PM -0500, Jeff King wrote:
> The signal interrupts the first wait.

Ah, of course.  I'm ashamed to say that this is not the first time I
forget about that...

> > Bash 4.3 or later are strange: I get back the shell prompt immediately
> > after ctrl-C as well, so it doesn't appear to be waiting for all
> > remaining jobs to finish either, but! I don't get any of the progress
> > output from those jobs to mess up my next command.
> 
> Interesting. My bash 4.4 seems to behave the same as dash. It almost
> sounds like the SIGINT is getting passed to the subshells for you.
> Probably not really worth digging into, though.

The subshell does indeed get SIGINT.  I don't know why that happens,
to my understanding that should not happen.

> In case anybody is playing with this and needed to simulate a stress
> failure, here's what I used:
> 
> diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
> index b6566003dd..a370cd9977 100755
> --- a/t/t0000-basic.sh
> +++ b/t/t0000-basic.sh
> @@ -1171,4 +1171,11 @@ test_expect_success 'very long name in the index handled sanely' '
>  	test $len = 4098
>  '
>  
> +test_expect_success 'roll those dice' '
> +	case "$(openssl rand -base64 1)" in
> +	z*)
> +		return 1
> +	esac
> +'

Wasteful :)

  test $(($$ % 42)) -ne 0


^ permalink raw reply

* Re: [WIP RFC 1/5] Documentation: order protocol v2 sections
From: Jonathan Tan @ 2018-12-06 22:54 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git
In-Reply-To: <xmqqin08d36k.fsf@gitster-ct.c.googlers.com>

> > The git command line expects Git servers to follow a specific order of
> 
> "Command line"?  It sounds like you are talking about the order of
> command line arguments and options, but apparently that is not what
> you are doing.  Is it "The git over-the-wire protocol"?

I meant to say the current Git implementation, as opposed to what is
written in the specification. I'll replace it with "The current C Git
implementation".

> Earlier, we said that shallow-info is not given when packfile is not
> there.  That is captured in the updated EBNF above.  We don't have a
> corresponding removal of a bullet point for wanted-refs section below
> but probably that is because the original did not have corresponding
> bullet point to begin with.

That's because the corresponding bullet point had other information.
Quoted in full below:

> 	* This section is only included if the client has requested a
> 	  ref using a 'want-ref' line and if a packfile section is also
> 	  included in the response.

I could reword it to "If a packfile section is included in the response,
this section is only included if the client has requested a ref using a
'want-ref' line", but I don't think that is significantly clearer.

^ permalink raw reply

* behaviour of git-blame -M -C (maybe a bug?)
From: dmg @ 2018-12-06 22:41 UTC (permalink / raw)
  To: git



hi everybody,

I am the maintainer of cregit. We are trying to improve blame 
traceability at the token level (see 
https://github.com/dmgerman/papers/blob/master/editorials/cregit/cregit.org)

We use git-blame heavilty in cregit. One of the features that I 
would like to add to cregit is the ability track movement of code.

I have been testing git-blame -M -C and I found some behaviour 
that  seems incorrect. I have created a very simple repository 
that I think showcases this problem:

https://github.com/dmgerman/testBlameMove

this repo have 4 commits (listed below in order of execution):

1. A file is created tpm-dev.c (authored by D German),
2. a refactoring (code is moved from tpm-dev.c to 
tpm-dev-common.c, a new file). Author is "refactor"
3. a commit that adds some few contiguous lines (the existence of 
this commit seems to matter). Author is "none"
4. a commit that changes few lines and alters the result of blame 
for lines not modified by this commit. Author is "problem"

See below. I am running blame at different commits, showing only 
the lines attributed to author "refactor" (author of commit #2).

dmg@iodine:/tmp/testRepo|master ⇒  git log --oneline
ded1aa1 (HEAD -> master, origin/master) problematic commit
3720e68 simple commit
391adba refactoring
33165cb file before refactoring

if we checkout 391adba and do blame -M -C we get this:

dmg@iodine:/tmp/testRepo|3720e68 ⇒  git checkout 3720e68 && git 
blame -M -C tpm-dev-common.c | grep refactor | head
HEAD is now at 3720e68 simple commit
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  24) 
begin_include
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  25) 
include|#
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  26) 
directive|include
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  27) 
file|"tpm-dev.h"
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  28) 
end_include
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 147) 
DECL|function|tpm_common_open (struct file * file,struct tpm_chip 
* chip,struct file_priv * priv)
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 148) 
name|void
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 149) 
name|tpm_common_open
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 150) 
parameter_list|(
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 151) 
name|struct

so far, so good. blame detects the movement. Note that the changes 
by refactor are adding 5 lines (24 to 28) and then adding some at 
147 and beyond.

now do it for the next commit: 3720e68


things continue to look good. The changes of this commit do not 
affect any of these lines.

now... the next commit, the problematic: ded1aa1 (author is not 
refactor)

dmg@iodine:/tmp/testRepo|3720e68 ⇒  git checkout ded1aa1 && git 
blame -M -C tpm-dev-common.c | grep refactor | head
Previous HEAD position was 3720e68 simple commit
HEAD is now at ded1aa1 problematic commit
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  24) 
begin_include
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  25) 
include|#
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  26) 
directive|include
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  27) 
file|"tpm-dev.h"
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  28) 
end_include
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  29)
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  30) 
begin_function
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  53) 
name|user_read_timer
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  54) 
argument_list|)
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 150) 
DECL|function|tpm_common_open (struct file * file,struct tpm_chip 
* chip,struct file_priv * priv)

now blame assigns the lines 29, 30, 53 and 54 to commit 391adba4 
refactor!!! This is what I think is a bug.
(by the way, the changes made in this last commit were between 28 
and 150)

thank you in advance for any clues on why git-blame is behaving 
like this.

--dmg

---
D M German
http://turingmachine.org

^ permalink raw reply

* Re: enhancement: support for author.email and author.name in "git config"
From: Johannes Schindelin @ 2018-12-06 22:20 UTC (permalink / raw)
  To: William Hubbs; +Cc: Martin Ågren, Git Mailing List, chutzpah
In-Reply-To: <20181206191737.GA31091@whubbs1.gaikai.biz>

[-- Attachment #1: Type: text/plain, Size: 681 bytes --]

Hi William,

On Thu, 6 Dec 2018, William Hubbs wrote:

> On Thu, Dec 06, 2018 at 07:38:08PM +0100, Martin Ågren wrote:
> > Hi William,
> > 
> > [...]
> > 
> > This idea was floated a couple of months ago [1]. Junio seemed to find
> > the request sensible and outlined a design. No patches materialized, as
> > far as I know, but that could be because Eric suggested a tool called
> > direnv. Maybe that would work for you.
>  
>  Yes, this design would solve the issue.

There *is* a way to get what you want that is super easy and will
definitely work: if you sit down and do it ;-)

Please let us know if you need any additional information before you can
start.

Ciao,
Johannes

^ permalink raw reply

* Re: git, monorepos, and access control
From: Stefan Beller @ 2018-12-06 22:15 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, John.Coiner, git
In-Reply-To: <nycvar.QRO.7.76.6.1812062100020.41@tvgsbejvaqbjf.bet>

On Thu, Dec 6, 2018 at 12:09 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi,
>
> On Wed, 5 Dec 2018, Jeff King wrote:
>
> > The model that fits more naturally with how Git is implemented would be
> > to use submodules. There you leak the hash of the commit from the
> > private submodule, but that's probably obscure enough (and if you're
> > really worried, you can add a random nonce to the commit messages in the
> > submodule to make their hashes unguessable).
>
> I hear myself frequently saying: "Friends don't let friends use
> submodules". It's almost like: "Some people think their problem is solved
> by using submodules. Only now they have two problems."

Blaming tools for their lack of evolution/development is not necessarily the
right approach. I recall having patches rejected on this very mailing list
that fixed obvious but minor good things like whitespaces and coding style,
because it *might* produce merge conflicts. Would that situation warrant me
to blame the lacks in the merge algorithm, or could you imagine a better
way out? (No need to answer, it's purely to demonstrate that blaming
tooling is not always the right approach; only sometimes it may be)

> There are big reasons, after all, why some companies go for monorepos: it
> is not for lack of trying to go with submodules, it is the problems that
> were incurred by trying to treat entire repositories the same as single
> files (or even trees): they are just too different.

We could change that in more places.

One example you might think of is the output of git-status that displays
changed files. And in case of submodules it would just show
"submodule changes", which we already differentiate into "dirty tree" and
"different sha1 at HEAD".
Instead we could have the output of all changed files recursively in the
superprojects git-status output.

Another example is the diff machinery, which already knows some
basics such as embedding submodule logs or actual diffs.

> In a previous life, I also tried to go for submodules, was burned, and had
> to restart the whole thing. We ended up with something that might work in
> this instance, too, although our use case was not need-to-know type of
> encapsulation. What we went for was straight up modularization.

So this is a "Fix the data instead of the tool", which seems to be a local
optimization (i.e. you only have to do it once, such that it is cheaper to
do than fixing the tool for that workgroup)
... and because everyone does that the tool never gets fixed.

> What I mean is that we split the project up into over 100 individual
> projects that are now all maintained in individual repositories, and they
> are connected completely outside of Git, via a dependency management
> system (in this case, Maven, although that is probably too Java-centric
> for AMD's needs).

Once you have the dependency management system in place, you
will encounter the rare case of still wanting to change things across
repository boundaries at the same time. Submodules offer that, which
is why Android wants to migrate off of the repo tool, and there it seems
natural to go for submodules.

> I just wanted to throw that out here: if you can split up your project
> into individual projects, it might make sense not to maintain them as
> submodules but instead as individual repositories whose artifacts are
> uploaded into a central, versioned artifact store (Maven, NuGet, etc). And
> those artifacts would then be retrieved by the projects that need them.

This is cool and industry standard. But once you happen to run in a bug
that involves 2 new artifacts (but each of the new artifacts work fine
on their own), then you'd wish for something like "git-bisect but across
repositories". Submodules (in theory) allow for fine grained bisection
across these repository boundaries, I would think.

> I figure that that scheme might work for you better than submodules: I
> could imagine that you need to make the build artifacts available even to
> people who are not permitted to look at the corresponding source code,
> anyway.

This is a sensible suggestion, as they probably don't want to ramp up
development on submodules. :-)

^ permalink raw reply

* Re: [RFC] cherry-pick notes to find out cherry-picks from the origin
From: Tejun Heo @ 2018-12-06 22:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, kernel-team
In-Reply-To: <20181205161103.GO2509588@devbig004.ftw2.facebook.com>

Hello, Jeff.

So, this is what I currently have.  It still does the same thing but a
lot more generic in terms of both interface and implementation.

* All core logics are implemented as core helpers / features.

  * Trailer parsing and reverse-mapping in trailer_rev_xrefs_*().

  * Note refs which start with xref- (cross-reference) are recognized
    by notes core.  When notes are added, a dedicated combine_notes
    function is used to remove duplicates and curl unreachable
    commits.  When xref- notes are formatted for printing, it
    automatically follows and prints nested xrefs.

* note-cherry-picks is replaced with reverse-trailer-xrefs which can
  use other trailers, note refs and tags.  --xref-cherry-picks option
  makes it use the cherry-pick presets.

Please note that the patch is still a bit rough.  I'm polishing and
documenting.  Please let me know what you think.

Thanks.

---
 Makefile                        |    1 
 builtin.h                       |    1 
 builtin/reverse-trailer-xrefs.c |  148 ++++++++++++++++++++++++
 git.c                           |    1 
 notes.c                         |  245 +++++++++++++++++++++++++++++++++++++++-
 notes.h                         |   10 +
 object.c                        |    4 
 object.h                        |    6 
 trailer.c                       |  102 ++++++++++++++++
 trailer.h                       |   26 ++++
 10 files changed, 540 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 1a44c811a..3c23ecf9d 100644
--- a/Makefile
+++ b/Makefile
@@ -1086,6 +1086,7 @@ BUILTIN_OBJS += builtin/multi-pack-index.o
 BUILTIN_OBJS += builtin/mv.o
 BUILTIN_OBJS += builtin/name-rev.o
 BUILTIN_OBJS += builtin/notes.o
+BUILTIN_OBJS += builtin/reverse-trailer-xrefs.o
 BUILTIN_OBJS += builtin/pack-objects.o
 BUILTIN_OBJS += builtin/pack-redundant.o
 BUILTIN_OBJS += builtin/pack-refs.o
diff --git a/builtin.h b/builtin.h
index 6538932e9..51089e258 100644
--- a/builtin.h
+++ b/builtin.h
@@ -195,6 +195,7 @@ extern int cmd_multi_pack_index(int argc, const char **argv, const char *prefix)
 extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
 extern int cmd_notes(int argc, const char **argv, const char *prefix);
+extern int cmd_reverse_trailer_xrefs(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_objects(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_redundant(int argc, const char **argv, const char *prefix);
 extern int cmd_patch_id(int argc, const char **argv, const char *prefix);
diff --git a/builtin/reverse-trailer-xrefs.c b/builtin/reverse-trailer-xrefs.c
new file mode 100644
index 000000000..b2879be6c
--- /dev/null
+++ b/builtin/reverse-trailer-xrefs.c
@@ -0,0 +1,148 @@
+#include "builtin.h"
+#include "cache.h"
+#include "strbuf.h"
+#include "repository.h"
+#include "config.h"
+#include "commit.h"
+#include "blob.h"
+#include "notes.h"
+#include "notes-utils.h"
+#include "trailer.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "object-store.h"
+#include "parse-options.h"
+
+static const char * const reverse_trailer_xrefs_usage[] = {
+	N_("git reverse_trailer_xrefs [<options>] [<commit-ish>...]"),
+	NULL
+};
+
+static const char cherry_picked_prefix[] = "(cherry picked from commit ";
+static int verbose;
+
+static void clear_trailer_xref_note(struct commit *commit, void *data)
+{
+	struct notes_tree *tree = data;
+	int status;
+
+	status = remove_note(tree, commit->object.oid.hash);
+
+	if (verbose) {
+		if (status)
+			fprintf(stderr, "Object %s has no note\n",
+				oid_to_hex(&commit->object.oid));
+		else
+			fprintf(stderr, "Removing note for object %s\n",
+				oid_to_hex(&commit->object.oid));
+	}
+}
+
+static void record_trailer_xrefs(struct commit *commit, void *data)
+{
+	trailer_rev_xrefs_record(data, commit);
+}
+
+static int note_trailer_xrefs(struct notes_tree *tree,
+			      struct commit *from_commit, struct object_array *to_objs,
+			      const char *tag)
+{
+	char from_hex[GIT_MAX_HEXSZ + 1];
+	struct strbuf note = STRBUF_INIT;
+	struct object_id note_oid;
+	int i, ret;
+
+	oid_to_hex_r(from_hex, &from_commit->object.oid);
+
+	for (i = 0; i < to_objs->nr; i++) {
+		const char *hex = to_objs->objects[i].name;
+
+		if (tag)
+			strbuf_addf(&note, "%s: %s\n", tag, hex);
+		else
+			strbuf_addf(&note, "%s\n", tag);
+		if (verbose)
+			fprintf(stderr, "Adding note %s -> %s\n", from_hex, hex);
+	}
+
+	ret = write_object_file(note.buf, note.len, blob_type, &note_oid);
+	strbuf_release(&note);
+	if (ret)
+		return ret;
+
+	ret = add_note(tree, &from_commit->object.oid, &note_oid, NULL);
+	return ret;
+}
+
+int cmd_reverse_trailer_xrefs(int argc, const char **argv, const char *prefix)
+{
+	static struct notes_tree tree;
+	struct rev_info revs;
+	int i, ret;
+	struct setup_revision_opt s_r_opt = {
+		.def = "HEAD",
+		.revarg_opt = REVARG_CANNOT_BE_FILENAME
+	};
+	int cherry = 0, clear = 0;
+	const char *trailer_prefix = NULL, *notes_ref = NULL, *tag = NULL;
+	struct option options[] = {
+		OPT_BOOL(0, "xref-cherry-picks", &cherry, N_("use options for xref-cherry-picks notes")),
+		OPT_STRING(0, "trailer-prefix", &trailer_prefix, N_("prefix"), N_("process trailers starting with <prefix>")),
+		OPT_STRING(0, "ref", &notes_ref, N_("notes-ref"), N_("update notes in <notes-ref>")),
+		OPT_STRING(0, "tag", &tag, N_("tag"), N_("tag xref notes with <tag>")),
+		OPT_BOOL(0, "clear", &clear, N_("clear trailer xref notes from the specified commits")),
+		OPT__VERBOSE(&verbose, N_("verbose")),
+		OPT_END()
+	};
+
+	git_config(git_default_config, NULL);
+
+	init_revisions(&revs, prefix);
+	argc = setup_revisions(argc, argv, &revs, &s_r_opt);
+	argc = parse_options(argc, argv, prefix, options,
+			     reverse_trailer_xrefs_usage, 0);
+
+	/* allow inidividual options to override parts of --cherry */
+	if (cherry) {
+		if (!trailer_prefix)
+			trailer_prefix = cherry_picked_prefix;
+		if (!notes_ref)
+			notes_ref = NOTES_CHERRY_PICKS_REF;
+		if (!tag)
+			tag = NOTES_CHERRY_PICKED_TO_TAG;
+	}
+
+	if (!notes_ref || (!clear && (!trailer_prefix || !tag)))
+		die(_("insufficient arguments"));
+
+	if (argc > 1)
+		die(_("unrecognized argument: %s"), argv[1]);
+
+	if (!tree.initialized)
+		init_notes(&tree, notes_ref, NULL, NOTES_INIT_WRITABLE);
+
+	if (prepare_revision_walk(&revs))
+		die("revision walk setup failed");
+
+	if (clear) {
+		traverse_commit_list(&revs, clear_trailer_xref_note, NULL, &tree);
+	} else {
+		struct trailer_rev_xrefs rxrefs;
+		struct commit *from_commit;
+		struct object_array *to_objs;
+
+		trailer_rev_xrefs_init(&rxrefs, trailer_prefix);
+		traverse_commit_list(&revs, record_trailer_xrefs, NULL, &rxrefs);
+
+		trailer_rev_xrefs_for_each(&rxrefs, i, from_commit, to_objs) {
+			ret = note_trailer_xrefs(&tree, from_commit, to_objs,
+						 tag);
+			if (ret)
+				return ret;
+		}
+	}
+
+	commit_notes(&tree, "Notes updated by 'git reverse-trailer-xrefs'");
+
+	return 0;
+}
diff --git a/git.c b/git.c
index 2f604a41e..4948c8e01 100644
--- a/git.c
+++ b/git.c
@@ -515,6 +515,7 @@ static struct cmd_struct commands[] = {
 	{ "mv", cmd_mv, RUN_SETUP | NEED_WORK_TREE },
 	{ "name-rev", cmd_name_rev, RUN_SETUP },
 	{ "notes", cmd_notes, RUN_SETUP },
+	{ "reverse-trailer-xrefs", cmd_reverse_trailer_xrefs, RUN_SETUP },
 	{ "pack-objects", cmd_pack_objects, RUN_SETUP },
 	{ "pack-redundant", cmd_pack_redundant, RUN_SETUP | NO_PARSEOPT },
 	{ "pack-refs", cmd_pack_refs, RUN_SETUP },
diff --git a/notes.c b/notes.c
index 25cdce28b..c32064bfe 100644
--- a/notes.c
+++ b/notes.c
@@ -9,6 +9,7 @@
 #include "tree-walk.h"
 #include "string-list.h"
 #include "refs.h"
+#include "hashmap.h"
 
 /*
  * Use a non-balancing simple 16-tree structure with struct int_node as
@@ -79,6 +80,10 @@ static struct notes_tree **display_notes_trees;
 
 static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 		struct int_node *node, unsigned int n);
+static void parse_xref_note(const char *note, unsigned long size,
+			    const struct object_id *commit_oid,
+			    struct object_array *result,
+			    struct string_list *result_lines);
 
 /*
  * Search the tree until the appropriate location for the given key is found:
@@ -914,6 +919,60 @@ int combine_notes_cat_sort_uniq(struct object_id *cur_oid,
 	return ret;
 }
 
+int combine_notes_cat_xrefs(struct object_id *cur_oid,
+			    const struct object_id *new_oid)
+{
+	char *cur_msg = NULL, *new_msg = NULL;
+	unsigned long cur_len, new_len;
+	enum object_type cur_type, new_type;
+	struct object_array xrefs = OBJECT_ARRAY_INIT;
+	struct string_list lines = STRING_LIST_INIT_DUP;
+	struct strbuf output = STRBUF_INIT;
+	int i, j, cur_nr, ret;
+
+	/* read in both note blob objects */
+	if (!is_null_oid(new_oid))
+		new_msg = read_object_file(new_oid, &new_type, &new_len);
+	if (!new_msg || !new_len || new_type != OBJ_BLOB) {
+		free(new_msg);
+		return 0;
+	}
+	if (!is_null_oid(cur_oid))
+		cur_msg = read_object_file(cur_oid, &cur_type, &cur_len);
+	if (!cur_msg || !cur_len || cur_type != OBJ_BLOB) {
+		free(cur_msg);
+		free(new_msg);
+		oidcpy(cur_oid, new_oid);
+		return 0;
+	}
+
+	/* parse xrefs and de-dup */
+	parse_xref_note(cur_msg, cur_len, NULL, &xrefs, &lines);
+	cur_nr = xrefs.nr;
+	parse_xref_note(new_msg, new_len, NULL, &xrefs, &lines);
+
+	for (i = 0; i < cur_nr; i++)
+		for (j = cur_nr; j < xrefs.nr; j++)
+			if (!strcmp(xrefs.objects[i].name,
+				    xrefs.objects[j].name))
+				lines.items[j].string[0] = '\0';
+
+	/* write out the combined object */
+	for (i = 0; i < lines.nr; i++)
+		if (lines.items[i].string[0] != '\0')
+			strbuf_addf(&output, "%s\n", lines.items[i].string);
+
+	ret = write_object_file(output.buf, output.len, blob_type, cur_oid);
+
+	strbuf_release(&output);
+	object_array_clear(&xrefs);
+	string_list_clear(&lines, 0);
+	free(cur_msg);
+	free(new_msg);
+
+	return ret;
+}
+
 static int string_list_add_one_ref(const char *refname, const struct object_id *oid,
 				   int flag, void *cb)
 {
@@ -996,8 +1055,12 @@ void init_notes(struct notes_tree *t, const char *notes_ref,
 	if (!notes_ref)
 		notes_ref = default_notes_ref();
 
-	if (!combine_notes)
-		combine_notes = combine_notes_concatenate;
+	if (!combine_notes) {
+		if (starts_with(notes_ref, "refs/notes/xref-"))
+			combine_notes = combine_notes_cat_xrefs;
+		else
+			combine_notes = combine_notes_concatenate;
+	}
 
 	t->root = (struct int_node *) xcalloc(1, sizeof(struct int_node));
 	t->first_non_note = NULL;
@@ -1189,6 +1252,67 @@ void free_notes(struct notes_tree *t)
 	memset(t, 0, sizeof(struct notes_tree));
 }
 
+/*
+ * Parse a "[TAG:]HEX" line.  @xref is trimmed.  If @tag_p is not NULL and
+ * TAG exists, the string is split.  Returns the pointer to the OID and
+ * *@tag_p is updated to the TAG if requested.
+ */
+static char *parse_xref(char *xref, char **tag_p)
+{
+	char *p, *hex;
+
+	while (isspace(*xref))
+		xref++;
+
+	p = strchr(xref, ':');
+	if (p) {
+		if (tag_p) {
+			*tag_p = xref;
+			*p = '\0';
+		}
+		p++;
+		while (isspace(*p))
+			p++;
+		hex = p;
+	} else {
+		if (tag_p)
+			*tag_p = NULL;
+		hex = xref;
+	}
+
+	p = hex;
+	while (*p != '\0' && !isspace(*p))
+		p++;
+	*p = '\0';
+	return hex;
+}
+
+static void walk_xrefs(const char *tree_ref, struct object_id *from_oid,
+		       struct strbuf *sb, int level)
+{
+	struct object_array xrefs = OBJECT_ARRAY_INIT;
+	struct string_list lines = STRING_LIST_INIT_DUP;
+	int i;
+
+	read_xref_note(tree_ref, from_oid, &xrefs, &lines);
+
+	for (i = 0; i < xrefs.nr; i++) {
+		char *line = lines.items[i].string;
+		char *tag;
+
+		parse_xref(line, &tag);
+		strbuf_addf(sb, "    %s%s%*s%s\n",
+			    tag ?: "", tag ? ": " : "", 2 * level, "",
+			    xrefs.objects[i].name);
+		if (xrefs.objects[i].item)
+			walk_xrefs(tree_ref, &xrefs.objects[i].item->oid, sb,
+				   level + 1);
+	}
+
+	object_array_clear(&xrefs);
+	string_list_clear(&lines, 0);
+}
+
 /*
  * Fill the given strbuf with the notes associated with the given object.
  *
@@ -1208,6 +1332,7 @@ static void format_note(struct notes_tree *t, const struct object_id *object_oid
 	char *msg, *msg_p;
 	unsigned long linelen, msglen;
 	enum object_type type;
+	int format_xrefs;
 
 	if (!t)
 		t = &default_notes_tree;
@@ -1250,6 +1375,8 @@ static void format_note(struct notes_tree *t, const struct object_id *object_oid
 		}
 	}
 
+	format_xrefs = !raw && starts_with(t->ref, "refs/notes/xref-");
+
 	for (msg_p = msg; msg_p < msg + msglen; msg_p += linelen + 1) {
 		linelen = strchrnul(msg_p, '\n') - msg_p;
 
@@ -1257,6 +1384,14 @@ static void format_note(struct notes_tree *t, const struct object_id *object_oid
 			strbuf_addstr(sb, "    ");
 		strbuf_add(sb, msg_p, linelen);
 		strbuf_addch(sb, '\n');
+
+		if (format_xrefs) {
+			struct object_id oid;
+
+			msg_p[linelen] = '\0';
+			if (!get_oid_hex(parse_xref(msg_p, NULL), &oid))
+				walk_xrefs(t->ref, &oid, sb, 1);
+		}
 	}
 
 	free(msg);
@@ -1309,3 +1444,109 @@ void expand_loose_notes_ref(struct strbuf *sb)
 		expand_notes_ref(sb);
 	}
 }
+
+struct notes_tree_entry {
+	struct hashmap_entry ent;
+	struct notes_tree tree;
+};
+
+static int notes_tree_cmp(const void *hashmap_cmp_fn_data,
+			  const void *entry, const void *entry_or_key,
+			  const void *keydata)
+{
+	const struct notes_tree_entry *e1 = entry;
+	const struct notes_tree_entry *e2 = entry_or_key;
+
+	return strcmp(e1->tree.ref, e2->tree.ref);
+}
+
+static void parse_xref_note(const char *note, unsigned long size,
+			    const struct object_id *commit_oid,
+			    struct object_array *result,
+			    struct string_list *result_lines)
+{
+	struct strbuf **lines, **pline;
+
+	lines = strbuf_split_buf(note, size, '\n', 0);
+
+	for (pline = lines; *pline; pline++) {
+		struct strbuf *line = *pline;
+		const char *target_hex;
+		struct object_id target_oid;
+		struct object *target_obj;
+
+		target_hex = parse_xref(line->buf, NULL);
+		if (get_oid_hex(target_hex, &target_oid)) {
+			if (commit_oid)
+				warning("read invalid sha1 on %s: %s",
+					oid_to_hex(commit_oid), line->buf);
+			continue;
+		}
+
+		target_obj = parse_object(the_repository, &target_oid);
+		if (!target_obj || target_obj->type != OBJ_COMMIT) {
+			if (commit_oid)
+				warning("read invalid commit on %s: %s",
+					oid_to_hex(commit_oid), line->buf);
+			continue;
+		}
+
+		add_object_array(target_obj, target_hex, result);
+		if (result_lines) {
+			assert(result_lines->strdup_strings);
+			string_list_append(result_lines, line->buf);
+		}
+	}
+
+	strbuf_list_free(lines);
+}
+
+/*
+ * Read a cross-referencing note.
+ *
+ * Notes in @notes_ref contains lines of "[TAG:]HEX" pointing to other
+ * commits.  Read the target commits and add the objects to @result.  If
+ * @result_lines is non-NULL, it should point to a strdup'ing string_list.
+ * The verbatim note lines matching the target commits are appened to the
+ * list.
+ */
+void read_xref_note(const char *notes_ref, const struct object_id *commit_oid,
+		    struct object_array *result,
+		    struct string_list *result_lines)
+{
+	static struct hashmap *notes_tree_map = NULL;
+	unsigned hash = memhash(notes_ref, strlen(notes_ref));
+	struct notes_tree_entry key, *ent;
+	const struct object_id *note_oid;
+	unsigned long size;
+	enum object_type type;
+	char *note;
+
+	if (!notes_tree_map) {
+		notes_tree_map = xcalloc(1, sizeof(struct hashmap));
+		hashmap_init(notes_tree_map, notes_tree_cmp, NULL, 0);
+	}
+
+	hashmap_entry_init(&key.ent, hash);
+	key.tree.ref = (char *)notes_ref;
+	ent = hashmap_get(notes_tree_map, &key, NULL);
+	if (!ent) {
+		ent = xcalloc(1, sizeof(struct notes_tree_entry));
+		init_notes(&ent->tree, notes_ref, NULL, 0);
+		hashmap_entry_init(&ent->ent, hash);
+		hashmap_put(notes_tree_map, ent);
+	}
+
+	note_oid = get_note(&ent->tree, commit_oid);
+	if (!note_oid)
+		return;
+
+	note = read_object_file(note_oid, &type, &size);
+	if (!size) {
+		free(note);
+		return;
+	}
+
+	parse_xref_note(note, size, commit_oid, result, result_lines);
+	free(note);
+}
diff --git a/notes.h b/notes.h
index 414bc6855..fb8153334 100644
--- a/notes.h
+++ b/notes.h
@@ -2,10 +2,14 @@
 #define NOTES_H
 
 #include "string-list.h"
+#include "object.h"
 
 struct object_id;
 struct strbuf;
 
+#define NOTES_CHERRY_PICKS_REF		"refs/notes/xref-cherry-picks"
+#define NOTES_CHERRY_PICKED_TO_TAG	"Cherry-picked-to"
+
 /*
  * Function type for combining two notes annotating the same object.
  *
@@ -38,6 +42,8 @@ int combine_notes_ignore(struct object_id *cur_oid,
 			 const struct object_id *new_oid);
 int combine_notes_cat_sort_uniq(struct object_id *cur_oid,
 				const struct object_id *new_oid);
+int combine_notes_cat_xrefs(struct object_id *cur_oid,
+			    const struct object_id *new_oid);
 
 /*
  * Notes tree object
@@ -317,4 +323,8 @@ void expand_notes_ref(struct strbuf *sb);
  */
 void expand_loose_notes_ref(struct strbuf *sb);
 
+void read_xref_note(const char *notes_ref, const struct object_id *commit_oid,
+		    struct object_array *result,
+		    struct string_list *result_lines);
+
 #endif
diff --git a/object.c b/object.c
index e54160550..f79652a34 100644
--- a/object.c
+++ b/object.c
@@ -404,7 +404,7 @@ void object_array_clear(struct object_array *array)
 /*
  * Return true iff array already contains an entry with name.
  */
-static int contains_name(struct object_array *array, const char *name)
+int object_array_contains_name(struct object_array *array, const char *name)
 {
 	unsigned nr = array->nr, i;
 	struct object_array_entry *object = array->objects;
@@ -422,7 +422,7 @@ void object_array_remove_duplicates(struct object_array *array)
 
 	array->nr = 0;
 	for (src = 0; src < nr; src++) {
-		if (!contains_name(array, objects[src].name)) {
+		if (!object_array_contains_name(array, objects[src].name)) {
 			if (src != array->nr)
 				objects[array->nr] = objects[src];
 			array->nr++;
diff --git a/object.h b/object.h
index 796792cb3..a0b3dd312 100644
--- a/object.h
+++ b/object.h
@@ -172,6 +172,12 @@ typedef int (*object_array_each_func_t)(struct object_array_entry *, void *);
 void object_array_filter(struct object_array *array,
 			 object_array_each_func_t want, void *cb_data);
 
+/*
+ * Returns 1 if array already contains an entry with the specified name.
+ * Otherwise, 0.
+ */
+int object_array_contains_name(struct object_array *array, const char *name);
+
 /*
  * Remove from array all but the first entry with a given name.
  * Warning: this function uses an O(N^2) algorithm.
diff --git a/trailer.c b/trailer.c
index 0796f326b..3afa38d25 100644
--- a/trailer.c
+++ b/trailer.c
@@ -2,6 +2,7 @@
 #include "config.h"
 #include "string-list.h"
 #include "run-command.h"
+#include "object-store.h"
 #include "commit.h"
 #include "tempfile.h"
 #include "trailer.h"
@@ -1170,3 +1171,104 @@ void format_trailers_from_commit(struct strbuf *out, const char *msg,
 	format_trailer_info(out, &info, opts);
 	trailer_info_release(&info);
 }
+
+implement_static_commit_slab(trailer_rxrefs_slab, struct object_array *);
+
+static struct object_array *get_trailer_rxrefs(
+			struct trailer_rev_xrefs *rxrefs, struct commit *commit)
+{
+	struct object_array **slot =
+		trailer_rxrefs_slab_peek(&rxrefs->slab, commit);
+
+	return slot ? *slot : NULL;
+}
+
+static struct object_array *get_create_trailer_rxrefs(
+			struct trailer_rev_xrefs *rxrefs, struct commit *commit)
+{
+	struct object_array **slot =
+		trailer_rxrefs_slab_at(&rxrefs->slab, commit);
+
+	if (*slot)
+		return *slot;
+
+	add_object_array(&commit->object, oid_to_hex(&commit->object.oid),
+			 &rxrefs->from_commits);
+	*slot = xmalloc(sizeof(struct object_array));
+	**slot = (struct object_array)OBJECT_ARRAY_INIT;
+	return *slot;
+}
+
+void trailer_rev_xrefs_init(struct trailer_rev_xrefs *rxrefs, const char *tag)
+{
+	rxrefs->tag = xstrdup(tag);
+	rxrefs->tag_len = strlen(tag);
+	init_trailer_rxrefs_slab(&rxrefs->slab);
+	rxrefs->from_commits = (struct object_array)OBJECT_ARRAY_INIT;
+}
+
+void trailer_rev_xrefs_record(struct trailer_rev_xrefs *rxrefs,
+			      struct commit *commit)
+{
+	struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
+	enum object_type type;
+	unsigned long size;
+	void *buffer;
+	struct trailer_info info;
+	int i;
+
+	buffer = read_object_file(&commit->object.oid, &type, &size);
+	trailer_info_get(&info, buffer, &opts);
+
+	/* when nested, the last trailer describes the latest cherry-pick */
+	for (i = info.trailer_nr - 1; i >= 0; i--) {
+		char *line = info.trailers[i];
+
+		if (starts_with(line, rxrefs->tag)) {
+			struct object_id from_oid;
+			struct object *from_object;
+			struct commit *from_commit;
+			struct object_array *to_objs;
+			char cherry_hex[GIT_MAX_HEXSZ + 1];
+
+			if (get_oid_hex(line + rxrefs->tag_len, &from_oid))
+				continue;
+
+			from_object = parse_object(the_repository, &from_oid);
+			if (!from_object || from_object->type != OBJ_COMMIT)
+				continue;
+
+			from_commit = (struct commit *)from_object;
+			to_objs = get_create_trailer_rxrefs(rxrefs, from_commit);
+
+			oid_to_hex_r(cherry_hex, &commit->object.oid);
+			add_object_array(&commit->object, cherry_hex, to_objs);
+			break;
+		}
+	}
+
+	free(buffer);
+}
+
+void trailer_rev_xrefs_release(struct trailer_rev_xrefs *rxrefs)
+{
+	clear_trailer_rxrefs_slab(&rxrefs->slab);
+	object_array_clear(&rxrefs->from_commits);
+	free(rxrefs->tag);
+}
+
+void trailer_rev_xrefs_next(struct trailer_rev_xrefs *rxrefs, int *idx_p,
+			    struct commit **from_commit_p,
+			    struct object_array **to_objs_p)
+{
+	if (*idx_p >= rxrefs->from_commits.nr) {
+		*from_commit_p = NULL;
+		*to_objs_p = NULL;
+		return;
+	}
+
+	*from_commit_p = (struct commit *)
+		rxrefs->from_commits.objects[*idx_p].item;
+	*to_objs_p = get_trailer_rxrefs(rxrefs, *from_commit_p);
+	(*idx_p)++;
+}
diff --git a/trailer.h b/trailer.h
index b99773964..5a9704e19 100644
--- a/trailer.h
+++ b/trailer.h
@@ -2,6 +2,8 @@
 #define TRAILER_H
 
 #include "list.h"
+#include "object.h"
+#include "commit-slab.h"
 
 struct strbuf;
 
@@ -99,4 +101,28 @@ void trailer_info_release(struct trailer_info *info);
 void format_trailers_from_commit(struct strbuf *out, const char *msg,
 				 const struct process_trailer_options *opts);
 
+declare_commit_slab(trailer_rxrefs_slab, struct object_array *);
+
+struct trailer_rev_xrefs {
+	char *tag;
+	int tag_len;
+	struct trailer_rxrefs_slab slab;
+	struct object_array from_commits;
+};
+
+void trailer_rev_xrefs_init(struct trailer_rev_xrefs *rxrefs, const char *tag);
+void trailer_rev_xrefs_record(struct trailer_rev_xrefs *rxrefs,
+			      struct commit *commit);
+void trailer_rev_xrefs_release(struct trailer_rev_xrefs *rxrefs);
+
+void trailer_rev_xrefs_next(struct trailer_rev_xrefs *rxrefs,
+			    int *idx_p, struct commit **from_commit_p,
+			    struct object_array **to_objs_p);
+
+#define trailer_rev_xrefs_for_each(rxrefs, idx, from_commit, to_objs)		\
+	for ((idx) = 0,								\
+	     trailer_rev_xrefs_next(rxrefs, &(idx), &(from_commit), &(to_objs));\
+	     (from_commit);							\
+	     trailer_rev_xrefs_next(rxrefs, &(idx), &(from_commit), &(to_objs)))
+
 #endif /* TRAILER_H */

^ permalink raw reply related

* Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
From: Stefan Beller @ 2018-12-06 21:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan
In-Reply-To: <xmqqa7lkeki3.fsf@gitster-ct.c.googlers.com>

On Tue, Dec 4, 2018 at 7:10 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> > This is a resend of sb/submodule-recursive-fetch-gets-the-tip,
> > with all feedback addressed. As it took some time, I'll send it
> > without range-diff, but would ask for full review.
>
> Is that a "resend" or reroll/update (or whatever word that does not
> imply "just sending the same thing again")?

As you noticed, it is an actual update. I started to use resend
as DScho seems very unhappy about the word reroll claiming we'd
be the only Software community that uses the term reroll for
an iteration of a change.

I see how resend could sound like retransmission without change.


>                         child_process_init(cp);
>      -                  cp->dir = strbuf_detach(&submodule_path, NULL);
>     -                   prepare_submodule_repo_env(&cp->env_array);
>      +                  cp->dir = xstrdup(repo->worktree);
>     +                   prepare_submodule_repo_env(&cp->env_array);
>
> Hmph, I offhand do not see there would be any difference if you
> assigned to cp->dir before or after preparing the repo env, but is
> there a reason these two must be done in this updated order that I
> am missing?  Very similar changes appear multiple times in this
> range-diff.

Jonathan Tan asked for it to be "diff friendly". This -of course- is
range-diff unfriendly.

> [...]

you seem to be OK with a lot of the changes, I did not find an
actionable suggestion.

Thanks for still queuing topics during -rc time,
Stefan

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox