From: Michael Haggerty <mhagger@alum.mit.edu>
To: Thomas Gummerer <t.gummerer@gmail.com>
Cc: git@vger.kernel.org, trast@student.ethz.ch, gitster@pobox.com,
peff@peff.net, spearce@spearce.org, davidbarr@google.com
Subject: Re: Index format v5
Date: Sat, 19 May 2012 07:40:56 +0200 [thread overview]
Message-ID: <4FB73268.4020204@alum.mit.edu> (raw)
In-Reply-To: <20120516215407.GA1738@tgummerer.surfnet.iacbox>
On 05/16/2012 11:54 PM, Thomas Gummerer wrote:
> On 05/16, Michael Haggerty wrote:
>> I just reviewed version 1369bd855b86 of your script, and it is MUCH
>> better. It's easy to read and review. The functions that it
>> defines are now self-contained and could therefore be reused for
>> other purposes. There are fewer magic numbers (though there are
>> still a few; I wonder if there is a way to get rid of those?)
>> You've done a nice job polishing up the code.
>
> Thanks for the feedback! I could get rid of the magic numbers for
> the crc code, but I'm not sure if it makes sense to replace the
> others with constants, since they only occur once in the file. I
> added comments instead explaining where those numbers come from
> instead.
I think it is possible to remove the last magic number and also to make
the CRC handling easier. I have pushed some suggested changes to github
[1]:
1. With the current code, trying to read a file that is less than 24
bytes long would result in a struct.error (because it would try to
unpack a string that is shorter than the struct) whereas the underlying
error in this case should almost always be reported as a signature
error. So it is correct to read the signature separately from the rest
of the header, but it is even more correct to check the signature
(including its length) before reading on.
2. I introduce a class CRC to hold checksums, so that (a) the code for
handling checksums can be encapsulated, and (b) an instance of this
class can be passed into functions and mutated in-place, which is less
cumbersome than requiring functions to return (data, crc) tuples. This,
in turn, makes possible...
3. ...a new function read_struct(f, s, crc), which reads the data for a
struct.Struct from f, checksums it, and returns the unpacked data. This
function is more convenient to use than the old read_calc_crc().
4. The checksum instance can also be made responsible for verifying that
the next four bytes in the file agree with the expected checksum. This
removes some more code duplication. (See CRC.matches().)
5. You read the extension offsets using CRC_STRUCT.size, which is
technically correct but misleading. In fact, the extension offsets
should be documented using their own EXTENSION_INDEX_STRUCT. Also, it
makes more sense to store the unpacked integer offsets to extoffsets
rather than the raw 4-byte strings.
6. With a couple of more minor changes it is possible to replace the
magic number "24" in read_index_entries(). With this change the
computation is documented very explicitly and is also (somewhat) robust
against future changes in the format.
Look over my changes and take whatever you want.
> Thanks, I have changed those. I added the docstrings for all read
> functions, they however don't seem to make sense for the print
> functions, since you're probably faster just reading the code for
> them.
That's fine. If you document nontrivial functions you are already doing
much better than the git project average.
> One since I changed in addition is to those changes, is that I
> gave the exceptions names to make them more meaningful.
Good.
Michael
[1] https://github.com/mhagger/git/tree/pythonprototype
--
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
next prev parent reply other threads:[~2012-05-19 5:48 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-03 17:25 Index format v5 Thomas Gummerer
2012-05-03 18:16 ` Thomas Rast
2012-05-03 19:03 ` Junio C Hamano
2012-05-04 7:12 ` Michael Haggerty
2012-05-07 22:18 ` Robin Rosenberg
2012-05-03 18:21 ` Ronan Keryell
2012-05-03 20:36 ` Thomas Gummerer
2012-05-03 18:54 ` Junio C Hamano
2012-05-03 19:11 ` Thomas Rast
2012-05-03 19:31 ` Thomas Rast
2012-05-03 19:32 ` Thomas Rast
2012-05-03 20:32 ` Junio C Hamano
2012-05-03 21:38 ` Thomas Gummerer
2012-05-07 18:57 ` Robin Rosenberg
2012-05-03 19:38 ` solo-git
2012-05-04 13:20 ` Nguyen Thai Ngoc Duy
2012-05-04 15:44 ` Thomas Gummerer
2012-05-04 13:25 ` Philip Oakley
2012-05-04 15:46 ` Junio C Hamano
2012-05-06 10:23 ` Nguyen Thai Ngoc Duy
2012-05-07 13:44 ` Thomas Gummerer
2012-05-06 16:49 ` Phil Hord
2012-05-07 13:08 ` Thomas Gummerer
2012-05-07 15:15 ` Michael Haggerty
2012-05-08 14:11 ` Thomas Gummerer
2012-05-08 14:25 ` Nguyen Thai Ngoc Duy
2012-05-08 14:34 ` Nguyen Thai Ngoc Duy
2012-05-10 6:53 ` Thomas Gummerer
2012-05-10 11:06 ` Nguyen Thai Ngoc Duy
2012-05-09 8:37 ` Michael Haggerty
2012-05-10 12:19 ` Thomas Gummerer
2012-05-10 18:17 ` Michael Haggerty
2012-05-11 17:12 ` Thomas Gummerer
2012-05-13 19:50 ` Michael Haggerty
2012-05-14 15:01 ` Thomas Gummerer
2012-05-14 21:08 ` Michael Haggerty
2012-05-14 22:10 ` Thomas Rast
2012-05-15 6:43 ` Michael Haggerty
2012-05-15 13:49 ` Thomas Gummerer
2012-05-15 15:02 ` Michael Haggerty
2012-05-18 15:38 ` Thomas Gummerer
2012-05-19 13:00 ` Michael Haggerty
2012-05-21 7:45 ` Thomas Gummerer
2012-05-16 5:01 ` Michael Haggerty
2012-05-16 21:54 ` Thomas Gummerer
2012-05-19 5:40 ` Michael Haggerty [this message]
2012-05-21 20:30 ` Thomas Gummerer
2012-05-13 21:01 ` Philip Oakley
2012-05-14 14:54 ` Thomas Gummerer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FB73268.4020204@alum.mit.edu \
--to=mhagger@alum.mit.edu \
--cc=davidbarr@google.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=spearce@spearce.org \
--cc=t.gummerer@gmail.com \
--cc=trast@student.ethz.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).