avoid duplicate patches from git log ?

git.vger.kernel.org archive mirror

* avoid duplicate patches from git log ?
@ 2016-05-03 20:11 Philip Oakley
  2016-05-03 22:00 ` Jeff King
  2016-05-04 11:44 ` Johannes Schindelin
  0 siblings, 2 replies; 7+ messages in thread
From: Philip Oakley @ 2016-05-03 20:11 UTC
  To: Git List

I was trying to search the Git for Windows (G4W) history for commits that 
touched MSVC.

I've used `git log -SMSVC --pretty='tformat:%h (%s, %ad)' --date=short --reverse`
to get a nice list of those commits.

However, as the G4W project (https://github.com/git-for-windows/git/) 
follows the main git repo and its releases, it needs to rebase its fixup 
patches, while retaining their original series, so has repeated copies of 
those fix patches on the second parent path (a technique Dscho called 
rebasing merges).

for example:
> bf1a7ff (MinGW: disable CRT command line globbing, 2011-01-07)
> a05e9a8 (MinGW: disable CRT command line globbing, 2011-01-07)
> 45cfa35 (MinGW: disable CRT command line globbing, 2011-01-07)
> 1d35390 (MinGW: disable CRT command line globbing, 2011-01-07)
> 022e029 (MinGW: disable CRT command line globbing, 2011-01-07)

How can I filter out all the duplicate patches which are identical other 
than the commit date?

The --left --right and --cherry don't appear to do what I'd expect/hope. Any 
suggestions?
--
Philip

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: avoid duplicate patches from git log ?
  2016-05-03 20:11 avoid duplicate patches from git log ? Philip Oakley
@ 2016-05-03 22:00 ` Jeff King
  2016-05-03 22:12   ` Junio C Hamano
  2016-05-04 11:44 ` Johannes Schindelin
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2016-05-03 22:00 UTC
  To: Philip Oakley; +Cc: Git List

On Tue, May 03, 2016 at 09:11:55PM +0100, Philip Oakley wrote:

> However, as the G4W project (https://github.com/git-for-windows/git/)
> follows the main git repo and its releases, it needs to rebase its fixup
> patches, while retaining their original series, so has repeated copies of
> those fix patches on the second parent path (a technique Dscho called
> rebasing merges).
> 
> for example:
> > bf1a7ff (MinGW: disable CRT command line globbing, 2011-01-07)
> > a05e9a8 (MinGW: disable CRT command line globbing, 2011-01-07)
> > 45cfa35 (MinGW: disable CRT command line globbing, 2011-01-07)
> > 1d35390 (MinGW: disable CRT command line globbing, 2011-01-07)
> > 022e029 (MinGW: disable CRT command line globbing, 2011-01-07)
> 
> 
> How can I filter out all the duplicate patches which are identical other
> than the commit date?
> 
> The --left --right and --cherry don't appear to do what I'd expect/hope. Any
> suggestions?

I don't think there's a good way right now. The option that suppresses
commits is --cherry-pick, but it wants there to be a "left" and "right"
from a symmetric difference, and to cull duplicates from the various
sides.

I think you really just want to keep a running list of all of the
commits you've seen and cull any duplicates. I guess you'd want this as
part of the history simplification step, so that whole uninteresting
side-branches are culled.

The obvious choice for matching two commits is patch-id, though it can
be expensive to generate. There have been patches playing around with
caching in the past, but nothing merged. For your purposes, I suspect
matching an "(author, authordate, subject)" tuple would be sufficient
and fast.
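A minimal sketch of the "(author, authordate, subject)" matching, assuming the duplicates really do agree on all three fields (in the example above they differ only in the commit date). The helper name `dedupe_by_tuple` is hypothetical; it reads tab-separated records, keeps only the first commit seen for each tuple, and prints it in the same "%h (%s, %ad)" shape as the original query.

```shell
# Hypothetical helper. Input: one "author<TAB>authordate<TAB>subject<TAB>hash"
# record per line. Output: the first record seen for each
# (author, authordate, subject) tuple, formatted as "hash (subject, date)".
dedupe_by_tuple() {
	awk -F '\t' '!seen[$1 FS $2 FS $3]++ { printf "%s (%s, %s)\n", $4, $3, $2 }'
}
```

Fed from the original search, e.g. `git log -SMSVC --reverse --date=short --pretty='tformat:%an%x09%ad%x09%s%x09%h' | dedupe_by_tuple` (%x09 is a tab), the oldest copy is the one kept because of --reverse. Like any heuristic, it would also collapse two genuinely different commits that happen to share the same tuple.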

-Peff


* Re: avoid duplicate patches from git log ?
  2016-05-03 22:00 ` Jeff King
@ 2016-05-03 22:12   ` Junio C Hamano
  2016-05-03 22:36     ` Philip Oakley
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2016-05-03 22:12 UTC
  To: Jeff King; +Cc: Philip Oakley, Git List

Jeff King <peff@peff.net> writes:

> On Tue, May 03, 2016 at 09:11:55PM +0100, Philip Oakley wrote:
>
>> However, as the G4W project (https://github.com/git-for-windows/git/)
>> follows the main git repo and its releases, it needs to rebase its fixup
>> patches, while retaining their original series, so has repeated copies of
>> those fix patches on the second parent path (a technique Dscho called
>> rebasing merges).
>> 
>> for example:
>> > bf1a7ff (MinGW: disable CRT command line globbing, 2011-01-07)
>> > a05e9a8 (MinGW: disable CRT command line globbing, 2011-01-07)
>> > 45cfa35 (MinGW: disable CRT command line globbing, 2011-01-07)
>> > 1d35390 (MinGW: disable CRT command line globbing, 2011-01-07)
>> > 022e029 (MinGW: disable CRT command line globbing, 2011-01-07)
>> 
>> 
>> How can I filter out all the duplicate patches which are identical other
>> than the commit date?
>> 
>> The --left --right and --cherry don't appear to do what I'd expect/hope. Any
>> suggestions?
>
> I don't think there's a good way right now. The option that suppresses
> commits is --cherry-pick, but it wants there to be a "left" and "right"
> from a symmetric difference, and to cull duplicates from the various
> sides.
>
> I think you really just want to keep a running list of all of the
> commits you've seen and cull any duplicates. I guess you'd want this as
> part of the history simplification step, so that whole uninteresting
> side-branches are culled.
>
> The obvious choice for matching two commits is patch-id, though it can
> be expensive to generate. There have been patches playing around with
> caching in the past, but nothing merged. For your purposes, I suspect
> matching an "(author, authordate, subject)" tuple would be sufficient
> and fast.

What would be really interesting is what should happen when the side
"rebase merge" branch that is supposed to be irrelevant for the
purpose of explaining the overall history does not become empty
after such a filtering operation.  The merge commit itself may claim
that both branches are equivalent, but in reality it may turn out
that the merge failed to reflect the effect of some other changes in
the history of the side branch in the result--which would be a
ticking time-bomb for future mismerges waiting to happen.


* Re: avoid duplicate patches from git log ?
  2016-05-03 22:12   ` Junio C Hamano
@ 2016-05-03 22:36     ` Philip Oakley
  2016-05-04 11:58       ` Johannes Schindelin
  0 siblings, 1 reply; 7+ messages in thread
From: Philip Oakley @ 2016-05-03 22:36 UTC
  To: Junio C Hamano, Jeff King; +Cc: Git List

From: "Junio C Hamano" <gitster@pobox.com>
> Jeff King <peff@peff.net> writes:
>
>> On Tue, May 03, 2016 at 09:11:55PM +0100, Philip Oakley wrote:
>>
>>> However, as the G4W project (https://github.com/git-for-windows/git/)
>>> follows the main git repo and its releases, it needs to rebase its
>>> fixup patches, while retaining their original series, so has repeated
>>> copies of those fix patches on the second parent path (a technique
>>> Dscho called rebasing merges).
>>>
>>> for example:
>>> > bf1a7ff (MinGW: disable CRT command line globbing, 2011-01-07)
>>> > a05e9a8 (MinGW: disable CRT command line globbing, 2011-01-07)
>>> > 45cfa35 (MinGW: disable CRT command line globbing, 2011-01-07)
>>> > 1d35390 (MinGW: disable CRT command line globbing, 2011-01-07)
>>> > 022e029 (MinGW: disable CRT command line globbing, 2011-01-07)
>>>
>>>
>>> How can I filter out all the duplicate patches which are identical other
>>> than the commit date?
>>>
>>> The --left --right and --cherry don't appear to do what I'd expect/hope.
>>> Any suggestions?
>>
>> I don't think there's a good way right now. The option that suppresses
>> commits is --cherry-pick, but it wants there to be a "left" and "right"
>> from a symmetric difference, and to cull duplicates from the various
>> sides.
>>
>> I think you really just want to keep a running list of all of the
>> commits you've seen and cull any duplicates. I guess you'd want this as
>> part of the history simplification step, so that whole uninteresting
>> side-branches are culled.
>>
>> The obvious choice for matching two commits is patch-id, though it can
>> be expensive to generate. There have been patches playing around with
>> caching in the past, but nothing merged. For your purposes, I suspect
>> matching an "(author, authordate, subject)" tuple would be sufficient
>> and fast.
>
> What would be really interesting is what should happen when the side
> "rebase merge" branch that is supposed to be irrelevant for the
> purpose of explaining the overall history does not become empty
> after such a filtering operation.  The merge commit itself may claim
> that both branches are equivalent, but in reality it may turn out
> that the merge failed to reflect the effect of some other changes in
> the history of the side branch in the result--which would be a
> ticking time-bomb for future mismerges waiting to happen.

I think that's a misunderstanding of the development process for an "on top 
of" project, where the upstream would not be expected to take all the fixups 
for that project's customers.

The releases of the project do need to be retained in the history, but 
because of the "on top of" policy, the prior release becomes a second parent 
to a "theirs" merge commit of the upstream (and subsequent rebase on top of 
that).

Thus, when searching history, the first-parent route would have the fastest 
transition to the upstream, but the full history would still have all the 
releases on it.

It may be that Peff's suggestion is a workable heuristic for a rebase flow 
where one could eliminate those duplicates quite easily. I just had a 
feeling that there was already something that did the patch-id thing for 
duplicate removals, but obviously I had that wrong.
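For the patch-id route, a rough sketch, assuming a Git whose `git patch-id` supports --stable: fed with `git log -p` output, it prints one "<patch-id> <commit-id>" line per commit, so keeping the first commit per patch-id drops copies whose diffs are identical. The helper name `first_per_patch_id` is hypothetical, and this is a plain pipeline, not the --cherry machinery itself; since it computes a patch-id for every listed commit, it can be expensive, as Peff notes.

```shell
# Hypothetical helper. Input: "<patch-id> <commit-id>" lines as printed by
# "git patch-id". Output: the commit-id of the first line seen for each
# patch-id, i.e. one commit per distinct diff.
first_per_patch_id() {
	awk '!seen[$1]++ { print $2 }'
}
```

Possible usage: `git log -p --pickaxe-all --reverse -SMSVC | git patch-id --stable | first_per_patch_id | git log --no-walk=unsorted --stdin --date=short --format='%h (%s, %ad)'`. Here --pickaxe-all makes each patch include every file the commit touched, not only the files matching -S, and --no-walk=unsorted keeps the oldest-first order.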

--
Philip 


* Re: avoid duplicate patches from git log ?
  2016-05-03 22:36     ` Philip Oakley
@ 2016-05-04 11:58       ` Johannes Schindelin
  2016-05-04 19:15         ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2016-05-04 11:58 UTC
  To: Philip Oakley; +Cc: Junio C Hamano, Jeff King, Git List

Hi Philip,

On Tue, 3 May 2016, Philip Oakley wrote:

> From: "Junio C Hamano" <gitster@pobox.com>
> > Jeff King <peff@peff.net> writes:
> >
> > > On Tue, May 03, 2016 at 09:11:55PM +0100, Philip Oakley wrote:
> > >
> > > > However, as the G4W project (https://github.com/git-for-windows/git/)
> > > > follows the main git repo and its releases, it needs to rebase its
> > > > fixup patches, while retaining their original series, so has repeated
> > > > copies of those fix patches on the second parent path (a technique
> > > > Dscho called rebasing merges).
> > > >
> > > > for example:
> > > > > bf1a7ff (MinGW: disable CRT command line globbing, 2011-01-07)
> > > > > a05e9a8 (MinGW: disable CRT command line globbing, 2011-01-07)
> > > > > 45cfa35 (MinGW: disable CRT command line globbing, 2011-01-07)
> > > > > 1d35390 (MinGW: disable CRT command line globbing, 2011-01-07)
> > > > > 022e029 (MinGW: disable CRT command line globbing, 2011-01-07)
> > > >
> > > >
> > > > How can I filter out all the duplicate patches which are identical other
> > > > than the commit date?
> > > >
> > > > The --left --right and --cherry don't appear to do what I'd expect/hope.
> > > > Any suggestions?
> > >
> > > I don't think there's a good way right now. The option that suppresses
> > > commits is --cherry-pick, but it wants there to be a "left" and "right"
> > > from a symmetric difference, and to cull duplicates from the various
> > > sides.
> > >
> > > I think you really just want to keep a running list of all of the
> > > commits you've seen and cull any duplicates. I guess you'd want this as
> > > part of the history simplification step, so that whole uninteresting
> > > side-branches are culled.
> > >
> > > The obvious choice for matching two commits is patch-id, though it can
> > > be expensive to generate. There have been patches playing around with
> > > caching in the past, but nothing merged. For your purposes, I suspect
> > > matching an "(author, authordate, subject)" tuple would be sufficient
> > > and fast.
> >
> > What would be really interesting is what should happen when the side
> > "rebase merge" branch that is supposed to be irrelevant for the
> > purpose of explaining the overall history does not become empty
> > after such a filtering operation.  The merge commit itself may claim
> > that both branches are equivalent, but in reality it may turn out
> > that the merge failed to reflect the effect of some other changes in
> > the history of the side branch in the result--which would be a
> > ticking time-bomb for future mismerges waiting to happen.
> 
> I think that's a misunderstanding of the development process for an "on
> top of" project, where the upstream would not be expected to take all
> the fixups for that project's customers.

Exactly. The merging-rebase technique only makes sense when the entire set
of changes is rebased.

Please note that I do drop some patches from time to time, so what Junio
fears is actually not a time bomb, but rather the intended benefit.

The *real* advantage of the merging-rebase technique is that contributors
can easily call `git rebase origin/master` *even after* origin was
"rebased". Because it was both rebased, and not rebased.

> The releases of the project do need to be retained in the history, but
> because of the "on top of" policy, the prior release becomes a second
> parent to a "theirs" merge commit of the upstream (and subsequent rebase
> on top of that).

This is a secondary concern, true. But we could easily tag the releases
and then continue `master` with a rebased version, i.e. `master` would
usually not fast-forward from tagged commits.

But it would make contributing much harder than it already is.

> It may be that Peff's suggestion is a workable heuristic for a rebase
> flow where one could eliminate those duplicates quite easily. I just had
> a feeling that there was already something that did the patch-id thing
> for duplicate removals, but obviously I had that wrong.

Oh, we do have that logic: it is the --cherry option of the rev-list
machinery. It's just that due to the merging-rebase technique, you cannot
have your desired commits on one side and the undesired ones on the other
side of the "...".

Unless...

Unless you play games with the grafts. You *could* pretend that the "Start
the merging-rebase" commit had only its first parent, using a graft. Then
"git log --cherry --right-only SECOND_PARENT...HEAD" (where SECOND_PARENT
would be the culled second parent of that merge commit) would have the
intended result.

You should not forget to remove the graft afterwards, though. (You *might*
be able to finagle something more temporary by using `git replace`,
dunno, still finicky.)
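A sketch of the graft trick using `git replace --graft` (added, I believe, in Git 2.1) instead of an entry in .git/info/grafts, so that cleaning up is a single `git replace -d`. The function name `log_as_if_single_parent` is hypothetical, and its first argument is assumed to be the "Start the merging-rebase" merge commit. It records that merge's second parent, temporarily replaces the merge with a copy that has only its first parent, runs the "git log --cherry --right-only SECOND_PARENT...HEAD" command described above, and then deletes the replacement.

```shell
# Hypothetical helper. $1: the "Start the merging-rebase" merge commit;
# any further arguments are passed on to "git log".
log_as_if_single_parent() {
	merge=$(git rev-parse --verify "$1^{commit}") || return
	first=$(git rev-parse --verify "$merge^1") || return
	second=$(git rev-parse --verify "$merge^2") || return
	shift
	# Temporarily pretend the merge has only its first parent.
	git replace --graft "$merge" "$first" || return
	# --cherry means --right-only --cherry-mark --no-merges: show the HEAD
	# side only, marking commits whose patch also exists on the left.
	git log --cherry --right-only "$@" "$second...HEAD"
	status=$?
	# Drop the replacement ref so the real history is visible again.
	git replace -d "$merge" >/dev/null
	return "$status"
}
```

For example: `log_as_if_single_parent 'HEAD^{/Start.the.merging-rebase}' -SMSVC --oneline`. While the replacement ref exists, every Git command in that repository sees the altered history, which is why the sketch removes it even when `git log` fails.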

Ciao,
Dscho


* Re: avoid duplicate patches from git log ?
  2016-05-04 11:58       ` Johannes Schindelin
@ 2016-05-04 19:15         ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2016-05-04 19:15 UTC
  To: Johannes Schindelin; +Cc: Philip Oakley, Jeff King, Git List

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Please note that I do drop some patches from time to time, so what Junio
> fears is actually not a time bomb, but rather the intended benefit.

OK.  As long as all the dropped patches are intentional, by
definition nothing is lost ;-)


* Re: avoid duplicate patches from git log ?
  2016-05-03 20:11 avoid duplicate patches from git log ? Philip Oakley
  2016-05-03 22:00 ` Jeff King
@ 2016-05-04 11:44 ` Johannes Schindelin
  1 sibling, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2016-05-04 11:44 UTC
  To: Philip Oakley; +Cc: Git List

Hi Philip,

On Tue, 3 May 2016, Philip Oakley wrote:

> I was trying to search the Git for Windows (G4W) history for commits that
> touched MSVC.
> 
> I've used `git log -SMSVC --pretty='tformat:%h (%s, %ad)' --date=short --reverse`
> to get a nice list of those commits.
> 
> However, as the G4W project (https://github.com/git-for-windows/git/)
> follows the main git repo and its releases, it needs to rebase its
> fixup patches, while retaining their original series, so has repeated
> copies of those fix patches on the second parent path (a technique Dscho
> called rebasing merges).

Actually, I no longer use rebasing merges, but instead merging rebases.
The difference is a little subtle:

Rebasing merge:

- upstream ----- rebased-A - rebased-B - rebasing-merge (-s ours)
                                       /
- old-A - old-B -----------------------

Merging rebase:

- upstream ----- merging-rebase (-s ours) - rebased-A - rebased-B
               /
- old-A - old-B

Of course, both diagrams are drastically simplified: I rebase not only
mere patches but whole topic branches (currently 44, IIRC), including
their merge structure, to make it easier to break out the topic branches
for submission later.
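A flattened sketch of how the merging-rebase diagram could be produced, ignoring the topic-branch and merge structure mentioned above: a "-s ours" merge records the old branch as second parent while keeping upstream's tree, and the old patches are then replayed on top with cherry-pick. The function name `start_merging_rebase` and its three arguments are hypothetical, not Git for Windows' actual tooling, and the sketch assumes the old patches form a linear series without merges.

```shell
# Hypothetical helper: start a (linear) merging-rebase.
#   $1 = new upstream commit
#   $2 = tip of the old branch (old-B in the diagram)
#   $3 = commit the old patches were based on (the old upstream)
start_merging_rebase() {
	# Work on a detached HEAD at the new upstream.
	git checkout -q --detach "$1" || return
	# "-s ours" keeps upstream's tree but records the old tip as 2nd parent.
	git merge -q -s ours --no-edit -m "Start the merging-rebase" "$2" || return
	# Replay the old patches (rebased-A, rebased-B) on top of that merge.
	git cherry-pick "$3..$2"
}
```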

It turned out that the rebasing-merge strategy actually made things
harder, as it was not quite as easy to figure out *which* commits needed
rebasing (remember, new commits were added after rebasing-merges).

With merging-rebases, at least, one knows that the tree of the merge
commit starting the whole shebang is identical to upstream's tree.  All of
the patches that come on top of that merge commit need to be rebased the
next time round, including new patches.

Fittingly, the commit message of said merge commit begins with the
following text: "Start the merging-rebase ..."
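The tree identity described above can be checked directly. A minimal sketch, with a hypothetical helper name, comparing the tree of a merge commit with the tree of its first parent; for a "-s ours" merge onto upstream the two should be equal.

```shell
# Hypothetical helper: succeed if commit $1 has the same tree as its
# first parent, as a "-s ours" merge should.
same_tree_as_first_parent() {
	[ "$(git rev-parse --verify "$1^{tree}")" = \
	  "$(git rev-parse --verify "$1^1^{tree}")" ]
}
```

For example: `same_tree_as_first_parent 'HEAD^{/Start.the.merging-rebase}' && echo identical`.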

BTW The reason for this rather unwieldy setup is that we historically had
a hard time getting our patches into upstream Git (there was some degree
of resistance in the past, and also a quite limiting lack of time on my
part).

> for example:
> > bf1a7ff (MinGW: disable CRT command line globbing, 2011-01-07)
> > a05e9a8 (MinGW: disable CRT command line globbing, 2011-01-07)
> > 45cfa35 (MinGW: disable CRT command line globbing, 2011-01-07)
> > 1d35390 (MinGW: disable CRT command line globbing, 2011-01-07)
> > 022e029 (MinGW: disable CRT command line globbing, 2011-01-07)
> 
> 
> How can I filter out all the duplicate patches which are identical other
> than the commit date?

I would go about it in a completely different manner. Remember that the
merging rebase starts with a merge that integrates the previous history,
but also resets the tree to upstream's. Therefore all the commits merged
at the start of the merging-rebase are the ones in which you are *not*
interested.

In other words, this command-line will yield the output you desire:

git log -SMSVC --pretty='tformat:%h (%s, %ad)' --date=short \
	HEAD^{/Start.the.merging-rebase}^2..

Ciao,
Dscho


Thread overview: 7+ messages
2016-05-03 20:11 avoid duplicate patches from git log ? Philip Oakley
2016-05-03 22:00 ` Jeff King
2016-05-03 22:12   ` Junio C Hamano
2016-05-03 22:36     ` Philip Oakley
2016-05-04 11:58       ` Johannes Schindelin
2016-05-04 19:15         ` Junio C Hamano
2016-05-04 11:44 ` Johannes Schindelin
