From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Haggerty Subject: Re: [PATCH v2 00/10] Use a structure for object IDs. Date: Wed, 11 Mar 2015 17:08:58 +0100 Message-ID: <5500689A.5090101@alum.mit.edu> References: <1425770645-628957-1-git-send-email-sandals@crustytoothpaste.net> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Cc: Andreas Schwab , "Kyle J. McKay" , Nguyen Thai Ngoc Duy , Junio C Hamano , Johannes Sixt , David Kastrup , James Denholm To: "brian m. carlson" , git@vger.kernel.org X-From: git-owner@vger.kernel.org Wed Mar 11 17:09:18 2015 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1YVjC8-0005Fe-KE for gcvg-git-2@plane.gmane.org; Wed, 11 Mar 2015 17:09:17 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751494AbbCKQJM (ORCPT ); Wed, 11 Mar 2015 12:09:12 -0400 Received: from alum-mailsec-scanner-8.mit.edu ([18.7.68.20]:52845 "EHLO alum-mailsec-scanner-8.mit.edu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751056AbbCKQJK (ORCPT ); Wed, 11 Mar 2015 12:09:10 -0400 X-AuditID: 12074414-f797f6d000004084-6e-5500689e2f47 Received: from outgoing-alum.mit.edu (OUTGOING-ALUM.MIT.EDU [18.7.68.33]) by alum-mailsec-scanner-8.mit.edu (Symantec Messaging Gateway) with SMTP id 1D.DC.16516.E9860055; Wed, 11 Mar 2015 12:09:02 -0400 (EDT) Received: from [192.168.69.130] (p4FC960B5.dip0.t-ipconnect.de [79.201.96.181]) (authenticated bits=0) (User authenticated as mhagger@ALUM.MIT.EDU) by outgoing-alum.mit.edu (8.13.8/8.12.4) with ESMTP id t2BG8wK2013298 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Wed, 11 Mar 2015 12:09:00 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.5.0 In-Reply-To: <1425770645-628957-1-git-send-email-sandals@crustytoothpaste.net> X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrEKsWRmVeSWpSXmKPExsUixO6iqDsvgyHUYP4JU4vZN7axWXRd6Way aOi9wmzxZO5dZovLe5ezWPy/N4/FonvKW0aLtpk/mCz6l3awOXB6LL/5l8lj56y77B5t08w8 Hr7qYvc4dLiD0ePiJWWPz5vkAtijuG2SEkvKgjPT8/TtErgzbv/iLnioWnH92xrGBsZuuS5G Tg4JAROJ2SeOMEHYYhIX7q1n62Lk4hASuMwo0f3oGiOEc55JYvuj6WwgVbwC2hJ/djSA2SwC qhJNf1+zgthsAroSi3qawSaJCgRJvLz1lxGiXlDi5MwnLCC2iICXxPxHM1hBhjILLGCSeHz8 JFhCWMBW4snsNrBmIQFfiZdnj7CD2JwCfhITWi+B1TAL6EnsuP6LFcKWl2jeOpt5AqPALCQ7 ZiEpm4WkbAEj8ypGucSc0lzd3MTMnOLUZN3i5MS8vNQiXQu93MwSvdSU0k2MkPgQ2cF45KTc IUYBDkYlHt4Zs/6HCLEmlhVX5h5ilORgUhLlDUhmCBXiS8pPqcxILM6ILyrNSS0+xCjBwawk wrsjECjHm5JYWZValA+TkuZgURLn/bZY3U9IID2xJDU7NbUgtQgmK8PBoSTBuzkdqFGwKDU9 tSItM6cEIc3EwQkynEtKpDg1LyW1KLG0JCMeFKvxxcBoBUnxAO1dANLOW1yQmAsUhWg9xago Jc67ASQhAJLIKM2DGwtLeq8YxYG+FOZdAlLFA0yYcN2vgAYzAQ1msQb6n7e4JBEhJdXAKGA/ Z1f/07onxjOT7Pc5/qiq/BrBurPlbZbclTxDh59B9S+Yb9xZc/9J+iaxmZ+khL48 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On 03/08/2015 12:23 AM, brian m. carlson wrote: > This is a patch series to convert some of the relevant uses of unsigned > char [20] to struct object_id. > > The goal of this series to improve type-checking in the codebase and to > make it easier to move to a different hash function if the project > decides to do that. This series does not convert all of the codebase, > but only parts. I've dropped some of the patches from earlier (which no > longer apply) and added others. > > Certain parts of the code have to be converted before others to keep the > patch sizes small, maintainable, and bisectable, so functions and > structures that are used across the codebase (e.g. struct object) will > be converted later. Conversion has been done in a somewhat haphazard > manner by converting modules with leaf functions and less commonly used > structs first. > > Since part of the goal is to ease a move to a different hash function if > the project decides to do that, I've adopted Michael Haggerty's > suggestion of using variables named "oid", naming the structure member > "sha1", and using GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ as the constants. > > I've been holding on to this series for a long time in hopes of > converting more of the code before submitting, but realized that this > deprived others of the ability to use the new structure and required me > to rebase extremely frequently. > [...] I've added CC to several people who commented on v1. I think this is a really interesting project and I hope that it works out. In my opinion, the biggest risk (aside from the sheer amount of work required) is the issue that was brought up on the mailing list when you submitted v1 [1]: Converting an arbitrary (unsigned char *) to a (struct object_id *) is not allowed, because the alignment requirements of the latter might be stricter than those of the former. This means that if, for example, we are reading some data from disk or from the network, and we expect the 20 bytes starting with byte number 17 to be a SHA-1 in binary format, we used to be able to do unsigned char *sha1 = buf + 17; and use sha1 like any other SHA-1, without the need for any copying. But we can't do the analogous struct object_id *oid = (struct object_id *)(buf + 17); because the alignment is not necessarily correct. So in a pure "struct object_id" world, I think we would be forced to change such code to struct object_id oid; hashcpy(oid.sha1, buf + 17); This uses additional memory and introduces copying overhead. Also, the lifetime of oid might exceed the scope of a function, so oid might have to be allocated on the heap and later freed. This adds more computational overhead, more memory overhead, and more programming effort to get it all right. So as much as I like the idea of wrapping SHA-1s in objects, I think it would be a good idea to first make sure that we can all agree on a plan for dealing with situations like this. I can think of the following possibilities: 1. Maybe code that needs to handle SHA-1s with screwy alignment does not exist, or maybe it does the copying anyway. I haven't actually checked. 2. Maybe somebody can prove that struct object_id *oid = (struct object_id *)(buf + 17); somehow *is* allowed by the C standard, or at least that it is sufficiently portable for our purposes. 3. We accept the additional runtime costs and development effort for the extra copies. To accept this approach, we would need some idea of which areas of the code will be affected, and some estimate of how much additional memory it would cost. 4. We continue to support working with SHA-1s declared to be (unsigned char *) in some performance-critical code, even as we migrate most other code to using SHA-1s embedded within a (struct object_id). This will cost some duplication of code. To accept this approach, we would need an idea of *how much* code duplication would be needed. E.g., how many functions will need both (unsigned char *) versions and (struct object_id *) versions? 5. We only make the change opportunistically. Whenever we find a function that needs to work with non-struct-aligned SHA-1s, we leave the function as-is rather than converting it or creating a second version that works with object_id objects. This approach would leave the codebase somewhat schizophrenic. I'm not trying to dissuade you from this project, but I think that for your project to have a chance of success, we need an answer to this question. So let's confront it now rather than after you have invested lots more time and/or gotten patches merged. Michael [1] http://thread.gmane.org/gmane.comp.version-control.git/248054 -- Michael Haggerty mhagger@alum.mit.edu