From: Junio C Hamano
To: "David Jeske"
Subject: Re: is rebase the same as merging every commit?
Date: Thu, 26 Jun 2008 17:51:11 -0700
Message-ID: <7vzlp7n1j4.fsf@gitster.siamese.dyndns.org>
In-Reply-To: <1006.35704952783$1214525911@news.gmane.org> (David Jeske's message of "Thu, 26 Jun 2008 23:04:58 -0000")
X-Mailing-List: git@vger.kernel.org

"David Jeske" writes:

> Rebasing is described in the docs I've read
> as turning this: (sorry for the dots)
>
> ..........A---B---C topic
> ........./
> ....D---E---F---G master
>
> Into this:
>
> ...................A'--B'--C' topic
> ................../
> .....D---E---F---G master
>
> If I understand it right (and that's a BIG if), it's the same as doing a merge
> of C into G where every individual commit in the C-line is individually
> committed into the new C' line.
>
> ...........-------------A---B---C
> ........../            /   /   /
> ........./       /---A'--B'--C' topic
> ......../       /
> ....D---E---F---G - master
>
> (1) Is the above model a valid explanation?

I would presume that the resulting trees A' in the second picture and in
the first picture would be the same, and the same goes for B' and C'.  But
that is only true when the commits between A and C do not duplicate any of
the development that happened between E and G.

Thinking about it like that is an interesting mental exercise, but it is
not very useful otherwise.

> (2) From the documentation diagrams, it looks like the rebased A' has
> only (G) as a parent, not (A,G). If this is the case, why? (i.e. not
> connecting those nodes throws away useful information)

You would rebase ONLY WHEN the project as a whole (either "other people
in the project", or "yourself down the road one year from now") is
interested mostly in the progress of 'master' D-E-F-G, and nobody cares
whether you developed your A (or B or C) on top of E or G.

So the answer is a definite "no": the line you drew between A and A' is
useless information.  Nobody cares that you did it first on top of E and
then redid the patches based on G (because things changed between E and
G).

If there were no "rebase", your changes would be integrated into the
'master' branch like this:

          A---B---C
         /         \
    D---E---F---G---M

Rebasing is a way to _help you_ pretend that you did _not_ start working
on an ancient code base that was at E.
You redo your series on top of the latest and greatest G, the commit that
everybody else agrees is the current state of affairs when they see your
changes for the first time, to produce a history like this:

    D---E---F---G---A'--B'--C'

Doing so tends to make the history easier to understand, and more
importantly, it reduces mistakes during the integration _and_ distributes
the burden of integration away from a central point.

If E..G and A..C happen to have conflicting changes, rebasing puts the
burden of rewriting the changes A..C into A'..C', based on the modified
base code G, on _you_ (the person who is rebasing).  Some people do not
like this, as they feel it is an added, unwanted burden.

On the other hand, if your upstream maintainer is integrating as in the
picture above to create a merge 'M', it is more likely that he would make
mistakes during the conflict resolution than that you would make an
incorrect adjustment while rebasing to recreate the series A'..C'.  You
read what G gives you as the foundation to build your changes on,
determine what got changed since E, on which you originally based your
changes, and adjust your changes to better integrate on top of G.  After
all, A..C is _your code_ and you understand what it assumes better than
anybody else.

If the fact that parallel developments have happened is important, then
instead of something like the second picture you drew, you would just do
a real merge to create a merge "M" like the picture I drew above.

Your "A' is merge between E and A, B' is merge between A' and B" is not
something anybody is interested in if you are going to rebase.  It is not
interesting because it is not how things happened in real life at all,
and it is not interesting because it neither simplifies the history for
later analysis nor reduces mistakes during the conflict resolution.
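
To make the point about parents concrete, here is a toy Python model of
the two histories discussed above.  It is only a sketch under stated
assumptions: the Commit class and the commit(), rebase(), merge() and
linear_log() helpers are hypothetical names made up for this example, and
they say nothing about how git actually stores or replays commits (there
are no trees, patches or conflicts here).  It models only the shape of the
graph: each rebased commit records a single parent, while the merge "M"
records two.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Commit:
    """Toy commit: a name plus its parents (not git's object format)."""
    name: str
    parents: Tuple["Commit", ...] = ()


def commit(name: str, *parents: Commit) -> Commit:
    return Commit(name, tuple(parents))


def first_parent_chain(tip: Commit, stop: Commit) -> List[Commit]:
    """Commits after `stop` up to `tip`, oldest first, via first parents.

    Assumes `stop` is reachable from `tip` along first parents.
    """
    chain = []
    node = tip
    while node != stop:
        chain.append(node)
        node = node.parents[0]
    return list(reversed(chain))


def rebase(topic_tip: Commit, fork_point: Commit, new_base: Commit) -> Commit:
    """Recreate each topic commit on top of new_base with ONE parent each."""
    parent = new_base
    for old in first_parent_chain(topic_tip, fork_point):
        parent = commit(old.name + "'", parent)  # no link back to `old`
    return parent


def merge(ours: Commit, theirs: Commit, name: str = "M") -> Commit:
    """A merge commit records both lines of development as parents."""
    return commit(name, ours, theirs)


def linear_log(tip: Commit) -> str:
    """Names along the first-parent line, oldest first."""
    names = []
    node = tip
    while True:
        names.append(node.name)
        if not node.parents:
            break
        node = node.parents[0]
    return "---".join(reversed(names))


# master: D---E---F---G, topic: A---B---C forked from E
D = commit("D")
E = commit("E", D)
F = commit("F", E)
G = commit("G", F)
A = commit("A", E)
B = commit("B", A)
C = commit("C", B)

rebased_tip = rebase(C, fork_point=E, new_base=G)
a_prime = rebased_tip.parents[0].parents[0]
merged = merge(G, C)

print(linear_log(rebased_tip))             # D---E---F---G---A'---B'---C'
print([p.name for p in a_prime.parents])   # ['G']
print([p.name for p in merged.parents])    # ['G', 'C']
```

In this model A' has G as its only parent, so the rebased history is a
single line, whereas M keeps both G and C and so preserves the fact that
the two lines were developed in parallel.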