Re: Index format v5 - Thomas Gummerer

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Thomas Gummerer <t.gummerer@gmail.com>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: git@vger.kernel.org, trast@student.ethz.ch, gitster@pobox.com,
	peff@peff.net, spearce@spearce.org, davidbarr@google.com
Subject: Re: Index format v5
Date: Mon, 14 May 2012 17:01:13 +0200	[thread overview]
Message-ID: <20120514150113.GD2107@tgummerer> (raw)
In-Reply-To: <4FB01080.6010605@alum.mit.edu>



On 05/13, Michael Haggerty wrote:
> On 05/11/2012 07:12 PM, Thomas Gummerer wrote:
> >Thanks for your feedback! To get clearer code I've now written a
> >working reader for the v5 index format in Python. The full reader
> >would probably be to long for the mailing list, but here is the
> >interesting part:
> >
> >[...]
> >The full reader can be found here:
> >https://github.com/tgummerer/git/blob/pythonprototype/git-read-index-v5.py
> 
> Good.
> 
> I tried to review your code 3fe08f9b:git-read-index-v5.py and
> compare it to your file spec f858cf6a9:Index-format-v5.textile.  I
> have the following comments (some of them already discussed in IRC):
> 
> 1. Your script seems to be reading a different version of the file
> than described in the spec.  [When I mentioned this on IRC you
> pushed a new version a4ee558ea of the spec.]
> 
> 2. Your script seems to assume that the index file has no
> extensions. It would be better (for documentation purposes and to
> ensure that there are no surprises) to make sure that the code knows
> how to handle extensions.
> 
> 3. Please document briefly how the scripts should be used.
> 
> 4. Please limit line length to 80 columns (like the main git project).
> 
> 5. Python has a nicer way to initialize dictionaries whose keys are
> all valid identifiers; for example:
> 
> -        return dict({"signature": signature, "vnr": header[0],
> "ndir": header[1], "nfile": header[2], "next": header[3]})
> +        return dict(signature=signature, vnr=header[0], ndir=header[1],
> +                    nfile=header[2], next=header[3])
> 
> 6. Some of your print statements are just begging to be written
> using string interpolation; e.g.,
> 
> -        print d["pathname"] + " " + str(d["flags"]) + " " +
> str(d["foffset"]) + " " + str(d["cr"]) + " " + str(d["ncr"]) + " " +
> str(d["nsubtrees"]) + " " + str(d["nfiles"]) + " " +
> str(d["nentries"]) + " " + str(binascii.hexlify(d["objname"]))
> +        print ("%(pathname)s %(flags)s %(foffset)s %(cr)s %(ncr)s "
> +               "%(nsubtrees)s %(nfiles)s %(nentries)s " % d
> +               + str(binascii.hexlify(d["objname"])))
> 
> printheader() can be rewritten similarly.
> 
> 7. You have a couple of while loops that would be easier to read if
> written as for loops.
> 
> 8. There is no need to use global variables.  Global variables have
> lots of disadvantages, one of which is that it is hard to tell what
> functions have side effects via the global variables.  It is better
> to pass the needed variables explicitly to functions that need them.
> 
> 9. ...after you eliminate the global variables, you will see that
> the checksums are mostly needed over limited areas of code then can
> be discarded.  Rewriting the checksum handling in this way would
> make it easier to see exactly what range of bytes is included in a
> particular checksum.
> 
> 10. There is no need to keep track of all of the data that will go
> into a checksum.  The CRC32 checksum can be computed incrementally
> via the second argument of binascii.crc32(data, crc).  Therefore,
> you only need to retain a 32-bit running checksum instead of the
> filedata array of data strings.
> 
> 11. It is bad style to generate output from within the
> readindexentries() function.  Given that it reads the whole array of
> file entries anyway, it would be cleaner to return the array to the
> caller and let the caller print out what it wants.
> 
> 12. Your handling of checksum errors is inconsistent.  In some
> places you generate exceptions; in another you simply print an error
> to stdout (not stderr!) and proceed to use the corrupt data.
> 
> 13. It is probably clearer to unpack the tuples returned by
> struct.unpack() directly into local variables with meaningful names
> instead of carrying them around as a tuple; e.g.,
> 
> -    header = struct.unpack('!IIII', checksum.add(f.read(16)))
> +    (vnr, ndir, nfile, next) = struct.unpack('!IIII', fread(16))
> 
> 14. It is more correct to check the file signature and version
> explicitly before plowing into the rest of the file (that's what
> they're there for!)
> 
> That's as far as I've got.
> 
> Michael

Thanks a lot for your feedback. I've now refactored the code and
thanks to your suggestions hopefully made it simpler and easier
to read. The reader should now read exactly the data from the
spec.

next prev parent reply	other threads:[~2012-05-14 15:01 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-03 17:25 Index format v5 Thomas Gummerer
2012-05-03 18:16 ` Thomas Rast
2012-05-03 19:03   ` Junio C Hamano
2012-05-04  7:12   ` Michael Haggerty
2012-05-07 22:18     ` Robin Rosenberg
2012-05-03 18:21 ` Ronan Keryell
2012-05-03 20:36   ` Thomas Gummerer
2012-05-03 18:54 ` Junio C Hamano
2012-05-03 19:11   ` Thomas Rast
2012-05-03 19:31   ` Thomas Rast
2012-05-03 19:32     ` Thomas Rast
2012-05-03 20:32       ` Junio C Hamano
2012-05-03 21:38   ` Thomas Gummerer
2012-05-07 18:57     ` Robin Rosenberg
2012-05-03 19:38 ` solo-git
2012-05-04 13:20 ` Nguyen Thai Ngoc Duy
2012-05-04 15:44   ` Thomas Gummerer
2012-05-04 13:25 ` Philip Oakley
2012-05-04 15:46   ` Junio C Hamano
2012-05-06 10:23 ` Nguyen Thai Ngoc Duy
2012-05-07 13:44   ` Thomas Gummerer
2012-05-06 16:49 ` Phil Hord
2012-05-07 13:08   ` Thomas Gummerer
2012-05-07 15:15 ` Michael Haggerty
2012-05-08 14:11   ` Thomas Gummerer
2012-05-08 14:25     ` Nguyen Thai Ngoc Duy
2012-05-08 14:34       ` Nguyen Thai Ngoc Duy
2012-05-10  6:53         ` Thomas Gummerer
2012-05-10 11:06           ` Nguyen Thai Ngoc Duy
2012-05-09  8:37     ` Michael Haggerty
2012-05-10 12:19       ` Thomas Gummerer
2012-05-10 18:17         ` Michael Haggerty
2012-05-11 17:12           ` Thomas Gummerer
2012-05-13 19:50             ` Michael Haggerty
2012-05-14 15:01               ` Thomas Gummerer [this message]
2012-05-14 21:08                 ` Michael Haggerty
2012-05-14 22:10                   ` Thomas Rast
2012-05-15  6:43                     ` Michael Haggerty
2012-05-15 13:49                   ` Thomas Gummerer
2012-05-15 15:02                     ` Michael Haggerty
2012-05-18 15:38                       ` Thomas Gummerer
2012-05-19 13:00                         ` Michael Haggerty
2012-05-21  7:45                           ` Thomas Gummerer
2012-05-16  5:01                     ` Michael Haggerty
2012-05-16 21:54                       ` Thomas Gummerer
2012-05-19  5:40                         ` Michael Haggerty
2012-05-21 20:30                           ` Thomas Gummerer
2012-05-13 21:01 ` Philip Oakley
2012-05-14 14:54   ` Thomas Gummerer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120514150113.GD2107@tgummerer \
    --to=t.gummerer@gmail.com \
    --cc=davidbarr@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).