From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Haggerty Subject: Re: [PATCH v2 6/8] refs: add update_refs for multiple simultaneous updates Date: Sat, 31 Aug 2013 20:19:04 +0200 Message-ID: <52223398.2080109@alum.mit.edu> References: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: git@vger.kernel.org, gitster@pobox.com, Martin Fick To: Brad King X-From: git-owner@vger.kernel.org Sat Aug 31 20:26:17 2013 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1VFpsG-0003NJ-2p for gcvg-git-2@plane.gmane.org; Sat, 31 Aug 2013 20:26:16 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754533Ab3HaS0M (ORCPT ); Sat, 31 Aug 2013 14:26:12 -0400 Received: from alum-mailsec-scanner-5.mit.edu ([18.7.68.17]:51454 "EHLO alum-mailsec-scanner-5.mit.edu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754357Ab3HaS0L (ORCPT ); Sat, 31 Aug 2013 14:26:11 -0400 X-Greylist: delayed 422 seconds by postgrey-1.27 at vger.kernel.org; Sat, 31 Aug 2013 14:26:10 EDT X-AuditID: 12074411-b7f296d000001209-0c-5222339c4630 Received: from outgoing-alum.mit.edu (OUTGOING-ALUM.MIT.EDU [18.7.68.33]) by alum-mailsec-scanner-5.mit.edu (Symantec Messaging Gateway) with SMTP id F7.84.04617.C9332225; Sat, 31 Aug 2013 14:19:08 -0400 (EDT) Received: from [192.168.69.9] (p57A2512F.dip0.t-ipconnect.de [87.162.81.47]) (authenticated bits=0) (User authenticated as mhagger@ALUM.MIT.EDU) by outgoing-alum.mit.edu (8.13.8/8.12.4) with ESMTP id r7VIJ5ex001428 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Sat, 31 Aug 2013 14:19:06 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130804 Thunderbird/17.0.8 In-Reply-To: X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFlrDKsWRmVeSWpSXmKPExsUixO6iqDvHWCnI4Gcbq8XOdRIWXVe6mSwa eq8wW9x/v47JgcXjcl8vk8fHZ8vZPS5eUvb4vEkugCWK2yYpsaQsODM9T98ugTvj3YJPzAUn 3Cq29jxmamDcYN7FyMkhIWAisaVzHSuELSZx4d56ti5GLg4hgcuMEptWTGeEcM4wScw88oS5 i5GDg1dAW+LGizCQBhYBVYnerjOMIDabgK7Eop5mJhBbVCBMYsf0iWwgNq+AoMTJmU9YQGwR oPpV6xrA4swCfhJdT2aDLRYWCJWY9nUjE8SurYwSW+5+YAdJcArESKzYt5cdokFH4l3fA2YI W15i+9s5zBMYBWYh2TELSdksJGULGJlXMcol5pTm6uYmZuYUpybrFicn5uWlFuma6uVmluil ppRuYoSEs+AOxhkn5Q4xCnAwKvHwJixVCBJiTSwrrsw9xCjJwaQkyuuhrxQkxJeUn1KZkVic EV9UmpNafIhRgoNZSYT39ErFICHelMTKqtSifJiUNAeLkjgv3xJ1PyGB9MSS1OzU1ILUIpis DAeHkgRvqRHQUMGi1PTUirTMnBKENBMHJ8hwLimR4tS8lNSixNKSjHhQrMYXA6MVJMUDtDcf pJ23uCAxFygK0XqK0Zjjz8q5nxg53jXN+8QoxJKXn5cqJc7rDFIqAFKaUZoHtwiWyF4xigP9 LcxrDFLFA0yCcPNeAa1iAlp1bSLIi8UliQgpqQbG9nv+hYZyxg8FdkgWBHQUGd2pkp4gqXcy nvnHG61fjtOaPx8W9ZHUiy7/+DspcgnLwrc9wUpLuj+euPZ37Z8pdhtXG+ddaTb+ Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On 08/30/2013 08:12 PM, Brad King wrote: > Add 'struct ref_update' to encode the information needed to update or > delete a ref (name, new sha1, optional old sha1, no-deref flag). Add > function 'update_refs' accepting an array of updates to perform. First > sort the input array to order locks consistently everywhere and reject > multiple updates to the same ref. Then acquire locks on all refs with > verified old values. Then update or delete all refs accordingly. Fail > if any one lock cannot be obtained or any one old value does not match. > > Though the refs themeselves cannot be modified together in a single s/themeselves/themselves/ > atomic transaction, this function does enable some useful semantics. > For example, a caller may create a new branch starting from the head of > another branch and rewind the original branch at the same time. This > transfers ownership of commits between branches without risk of losing > commits added to the original branch by a concurrent process, or risk of > a concurrent process creating the new branch first. > > Signed-off-by: Brad King > --- > refs.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > refs.h | 14 ++++++++ > 2 files changed, 135 insertions(+) > > diff --git a/refs.c b/refs.c > index 3bcd26e..901a38a 100644 > --- a/refs.c > +++ b/refs.c > @@ -3238,6 +3238,127 @@ int update_ref(const char *action, const char *refname, > return update_ref_write(action, refname, sha1, lock, onerr); > } > > +static int ref_update_compare(const void *r1, const void *r2) > +{ > + struct ref_update *u1 = (struct ref_update *)(r1); > + struct ref_update *u2 = (struct ref_update *)(r2); If you declare u1 and u2 to be "const struct ref_update *" (i.e., add "const"), then you have const correctness and don't need the explicit casts. (And the parentheses around r1 and r2 are superfluous in any case.) > + int ret; Style: we usually put a blank line between variable declarations and the first line of code. > + ret = strcmp(u1->ref_name, u2->ref_name); > + if (ret) > + return ret; > + ret = hashcmp(u1->new_sha1, u2->new_sha1); > + if (ret) > + return ret; > + ret = hashcmp(u1->old_sha1, u2->old_sha1); > + if (ret) > + return ret; > + ret = u1->flags - u2->flags; > + if (ret) > + return ret; > + return u1->have_old - u2->have_old; > +} Is there a need to compare more than ref_name? If two entries are found with the same name, then ref_update_reject_duplicates() will error out anyway. So the relative order among entries with the same name is irrelevant. I think it would be OK to return 0 for any entries with the same ref_name, even if they differ in other fields. > + > +static int ref_update_reject_duplicates(struct ref_update *updates, int n, > + enum action_on_err onerr) > +{ > + int i; > + for (i = 1; i < n; ++i) > + if (!strcmp(updates[i - 1].ref_name, updates[i].ref_name)) > + break; The error handling code could be right here instead of the "break" statement, removing the need for the "if" conditional. > + if (i < n) { > + const char *str = "Multiple updates for ref '%s' not allowed."; > + switch (onerr) { > + case MSG_ON_ERR: error(str, updates[i].ref_name); break; > + case DIE_ON_ERR: die(str, updates[i].ref_name); break; > + case QUIET_ON_ERR: break; > + } > + return 1; > + } > + return 0; > +} > + > +int update_refs(const char *action, const struct ref_update *updates_orig, > + int n, enum action_on_err onerr) > +{ > + int ret = 0, delnum = 0, i; > + struct ref_update *updates; > + int *types; > + struct ref_lock **locks; > + const char **delnames; > + > + if (!updates_orig || !n) > + return 0; > + > + /* Allocate work space: */ > + updates = xmalloc(sizeof(struct ref_update) * n); It seems preferred here to write updates = xmalloc(sizeof(*updates) * n); as this will continue to work if the type of updates is ever changed. Similarly for the next lines. > + types = xmalloc(sizeof(int) * n); > + locks = xmalloc(sizeof(struct ref_lock *) * n); > + delnames = xmalloc(sizeof(const char *) * n); An alternative to managing separate arrays to hold types and locks would be to include the scratch space in struct ref_update and document it "for internal use only; need not be initialized by caller". On the one hand it's ugly to cruft up the "interface" with internal implementation details; on the other hand there is precedent for this sort of thing (e.g., ref_lock::force_write or lock_file::on_list) and it would simplify the code. > + > + /* Copy, sort, and reject duplicate refs: */ > + memcpy(updates, updates_orig, sizeof(struct ref_update) * n); > + qsort(updates, n, sizeof(struct ref_update), ref_update_compare); You could save some space and memory shuffling (during memcpy() and qsort()) if you would declare "updates" to be an array of pointers to "struct ref_update" rather than an array of structs. Sorting could then be done by moving pointers around instead of moving the structs. This would also make it easier for update_refs() to pass information about the references back to its caller, should that ever be needed. But I suppose that n will usually be small, so this suggestion can be considered optional. > + if (ref_update_reject_duplicates(updates, n, onerr)) { > + free(updates); > + free(types); > + free(locks); > + free(delnames); > + return 1; > + } > + > + /* Acquire all locks while verifying old values: */ > + for (i = 0; i < n; ++i) { > + locks[i] = update_ref_lock(updates[i].ref_name, > + (updates[i].have_old ? > + updates[i].old_sha1 : NULL), > + updates[i].flags, > + &types[i], onerr); > + if (!locks[i]) > + break; The error handling code could go here in place of the "break", especially if it's only "ret = 1; goto label;" as suggested below. > + } > + > + /* Abort if we did not get all locks: */ > + if (i < n) { > + while (--i >= 0) > + unlock_ref(locks[i]); > + free(updates); > + free(types); > + free(locks); > + free(delnames); > + return 1; > + } > + > + /* Perform updates first so live commits remain referenced: */ > + for (i = 0; i < n; ++i) > + if (!is_null_sha1(updates[i].new_sha1)) { > + ret |= update_ref_write(action, > + updates[i].ref_name, > + updates[i].new_sha1, > + locks[i], onerr); > + locks[i] = 0; /* freed by update_ref_write */ > + } > + Hmmm, if one of the calls to update_ref_write() fails, would it be safer to abort the rest of the work (especially the reference deletions)? > + /* Perform deletes now that updates are safely completed: */ > + for (i = 0; i < n; ++i) > + if (locks[i]) { > + delnames[delnum++] = locks[i]->ref_name; > + ret |= delete_ref_loose(locks[i], types[i]); > + } > + ret |= repack_without_refs(delnames, delnum); > + for (i = 0; i < delnum; ++i) > + unlink_or_warn(git_path("logs/%s", delnames[i])); > + clear_loose_ref_cache(&ref_cache); > + for (i = 0; i < n; ++i) > + if (locks[i]) > + unlock_ref(locks[i]); > + > + free(updates); > + free(types); > + free(locks); > + free(delnames); > + return ret; > +} There's a lot of duplicated cleanup code in the function. If you put a label before the final for loop, and if you initialize the locks array to zeros (e.g., by using xcalloc()), then the three exits could all share the same code "ret = 1; goto cleanup;". > + > struct ref *find_ref_by_name(const struct ref *list, const char *name) > { > for ( ; list; list = list->next) > diff --git a/refs.h b/refs.h > index 2cd307a..a8a7cc6 100644 > --- a/refs.h > +++ b/refs.h > @@ -214,6 +214,20 @@ int update_ref(const char *action, const char *refname, > const unsigned char *sha1, const unsigned char *oldval, > int flags, enum action_on_err onerr); > > +struct ref_update { > + const char *ref_name; > + unsigned char new_sha1[20]; > + unsigned char old_sha1[20]; > + int flags; > + int have_old; > +}; Please document this structure, especially the relationship between have_old and old_sha1. > + > +/** > + * Lock all refs and then perform all modifications. > + */ > +int update_refs(const char *action, const struct ref_update *updates, > + int n, enum action_on_err onerr); > + > extern int parse_hide_refs_config(const char *var, const char *value, const char *); > extern int ref_is_hidden(const char *); > > Overall, thanks; it looks good. I think this change is useful by itself and is also a good start towards implementing true reference transactions (which I think will be necessary pretty soon, at least for some applications). I will write another email discussing how your changes are related to some changes that I have been working on lately. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/