From mboxrd@z Thu Jan 1 00:00:00 1970 From: Martin von Zweigbergk Subject: Re: [PATCH 02/31] rebase: refactor reading of state Date: Wed, 29 Dec 2010 09:09:37 +0100 (CET) Message-ID: References: <1293528648-21873-1-git-send-email-martin.von.zweigbergk@gmail.com> <1293528648-21873-3-git-send-email-martin.von.zweigbergk@gmail.com> <7v1v51v6l2.fsf@alter.siamese.dyndns.org> Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: Martin von Zweigbergk , git@vger.kernel.org, Johannes Schindelin , Johannes Sixt , Christian Couder To: Junio C Hamano X-From: git-owner@vger.kernel.org Wed Dec 29 15:08:52 2010 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1PXwhv-00019V-CA for gcvg-git-2@lo.gmane.org; Wed, 29 Dec 2010 15:08:51 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753196Ab0L2OIq (ORCPT ); Wed, 29 Dec 2010 09:08:46 -0500 Received: from mail-qy0-f194.google.com ([209.85.216.194]:65074 "EHLO mail-qy0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752890Ab0L2OIp (ORCPT ); Wed, 29 Dec 2010 09:08:45 -0500 Received: by qyk4 with SMTP id 4so3035667qyk.1 for ; Wed, 29 Dec 2010 06:08:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:date:from:x-x-sender:to:cc :subject:in-reply-to:message-id:references:user-agent:mime-version :content-type; bh=L0kaHhoS9/fRAI+oRB+bfqMZjJrYaELeGkUhzBRjmKQ=; b=bPs+X9sx7l0Ixc4hOnbQtASD5WV6NIZCn6GYU4QnF96k5VGKv4jWfLST9LlkEcEadM 6R4JluZgwB8WeMgUOGsUnV8OuEiKvHVvCwA49BGjAtfqqVRhbM/gPnBTLbbVcCo+Pi5e 6xNVPrf4XNoJ7PRMZRl0LUUBtU1M99+nvKLaE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:x-x-sender:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version:content-type; b=S/0eh66uXpRxta+lmJ3DvostIGGm4JRQXTtuK9QOjASmaa6NaFZmUG4YvZ5TFZLjpA IcCh0iE7LqcLztEp2ylaa2LbIra9bg2AjdSC6EwEiXq6iVvqPm328LeFa/QuKK9HBzWq 56h27pqG+FIMMOIBMDZtL6zE8+tb1XNA+VyZ4= Received: by 10.229.220.134 with SMTP id hy6mr13065760qcb.287.1293631724781; Wed, 29 Dec 2010 06:08:44 -0800 (PST) Received: from [192.168.1.103] (modemcable151.183-178-173.mc.videotron.ca [173.178.183.151]) by mx.google.com with ESMTPS id g32sm8426698qck.22.2010.12.29.06.08.42 (version=TLSv1/SSLv3 cipher=RC4-MD5); Wed, 29 Dec 2010 06:08:43 -0800 (PST) X-X-Sender: martin@debian In-Reply-To: <7v1v51v6l2.fsf@alter.siamese.dyndns.org> User-Agent: Alpine 1.10 (DEB 962 2008-03-14) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Tue, 28 Dec 2010, Junio C Hamano wrote: > Martin von Zweigbergk writes: > > > +read_state () { > > + if test -d "$merge_dir" > > + then > > + state_dir="$merge_dir" > > + prev_head=$(cat "$merge_dir"/prev_head) && > > + end=$(cat "$merge_dir"/end) && > > + msgnum=$(cat "$merge_dir"/msgnum) > > + else > > + state_dir="$apply_dir" > > + fi && > > + head_name=$(cat "$state_dir"/head-name) && > > + onto=$(cat "$state_dir"/onto) && > > + orig_head=$(cat "$state_dir"/orig-head) && > > + GIT_QUIET=$(cat "$state_dir"/quiet) > > +} > > + > > continue_merge () { > > test -n "$prev_head" || die "prev_head must be defined" > > test -d "$merge_dir" || die "$merge_dir directory does not exist" > > @@ -138,10 +154,6 @@ call_merge () { > > } > > > > move_to_original_branch () { > > - test -z "$head_name" && > > - head_name="$(cat "$merge_dir"/head-name)" && > > - onto="$(cat "$merge_dir"/onto)" && > > - orig_head="$(cat "$merge_dir"/orig-head)" > > It used to be safe to call this without head_name and friends set, because > the function took care of reading these itself. Nobody calls this without > head_name set anymore? > > I am not complaining nor suggesting to add an unnecessary "read_state" > here only to slow things down---I am making sure that you removed this > because you know it is unnecessary. Correct. It used to be called without head_name set from finish_rb_merge, which would in turn be called from the --continue and --skip arms. onto would already have been set in these cases, but would then be re-read. > > @@ -220,13 +232,9 @@ do > > echo "mark them as resolved using git add" > > exit 1 > > } > > + read_state > > if test -d "$merge_dir" > > then > > - prev_head=$(cat "$merge_dir/prev_head") > > - end=$(cat "$merge_dir/end") > > - msgnum=$(cat "$merge_dir/msgnum") > > - onto=$(cat "$merge_dir/onto") > > - GIT_QUIET=$(cat "$merge_dir/quiet") > > Even though this patch may make the code shorter, it starts to read > head_name and orig_head that we previously did not open and change the > values of variables with what are read from them. Does this change affect > the behaviour in any way (either in performance or in correctness)? True. Previously, head_name and orig_head were lazily read only when the rebase had been finished (in the finish_rb_merge, as I mentioned above). Now these values are read unnecessarily when the rebase is resumed, but a later patch fails. The am-based rebase is not affacted by this patch as it already read head_name and orig_head eagerly. Good point about the correctness. I looked into the correctness issue arising if a file could not be found and I think I concluded that all of the files would always be written (do we need to care about the case where a user deletes e.g. .git/rebase-apply/onto?). However, I did not think about the possibility of overwriting variables. It seems fine, though, as neither continue_merge nor call_merge uses either of these variables. Regarding performance, I would say there is definitely a cost associated with this patch. How big this cost is, though, I don't dare to speculate. I will leave that up to the rest of you. It should be noted that read_state is only called when a rebase is resumed with --continue or --skip, or when it is aborted. There are no changes to the code in the call_merge-continue_merge loop. The performance hit is probably biggest in the --abort case, which previously only read head_name and orig_head. It now ends up reading _at least_ two more values. > > > @@ -236,10 +244,6 @@ do > > finish_rb_merge > > exit > > fi > > - head_name=$(cat "$apply_dir"/head-name) && > > - onto=$(cat "$apply_dir"/onto) && > > - orig_head=$(cat "$apply_dir"/orig-head) && > > - GIT_QUIET=$(cat "$apply_dir"/quiet) > > git am --resolved --3way --resolvemsg="$RESOLVEMSG" && > > move_to_original_branch > > Earlier move-to-original-branch was Ok to be called without head_name, and > the old code here read from the file anyway, so it didn't matter, but now > it seems that the first check and assignment you removed from the function > may matter because this caller does not even read from head_name. Are you > sure about this change? If I understand your question correctly, then yes, it is ok, because of the previous hunk that calls read_state. That call is made before/outside the if block for merge-based rebase, so the variables are already set when this code is reached. > > @@ -279,18 +275,15 @@ do > > die "No rebase in progress?" > > > > git rerere clear > > - > > - test -d "$merge_dir" || merge_dir="$apply_dir" > > My heartbeat skipped when I first saw this. Thanks to the previous > commit, it was exposed that somebody reused $dotest that was only to be > used when using merge machinery because the things left to be done in this > codepath are common between the merge and apply, which is kind of sloppy, > but that sloppiness is now gone ;-). > > Are there places that read from individual files for states left after > this patch, or read_state is the only interface to get to the states? If > the latter that would be a great news, and also would suggest that we may > want to have a corresponding write_state function (and we may even want to > make the state into a single file to reduce I/O---but that is a separate > issue that can be done at the very end of the series if it turns out to be > beneficial). There are still a few other places where state is read, mainly in call_merge. It reads things that vary from iteration to iteration, such as a counter. I forgot to say in the commit message, but I tried to extract only the code that reads the initial state. The write_state function actually is there, but it comes pretty late, in patch 24. I don't remember why I added it so much later. I could possibly move it closer to the beginning of this series. > Of course it is fine if introduction of read_state is an attempt to catch > most common cases, but it would reduce chances of mistake if the coverage > were 100% (as opposed to 99.9%) hence this question. Do you mean if all the state was read in the read_state function? I should say that the pieces of state that are read in read_state are not read anywhere else. But overall, the coverage is more like 60% or so. Thanks for a thorough review. Many of these things should have been in the commit message. I need to get better at writing those... Thanks, Martin