git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Sayers <andrew-git@pileofstuff.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	David Barr <davidbarr@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Ramkumar Ramachandra <artagnon@gmail.com>
Subject: Re: [PATCHv2] Add details about svn-fe's dumpfile parsing
Date: Mon, 16 Apr 2012 22:35:23 +0100	[thread overview]
Message-ID: <4F8C909B.7010507@pileofstuff.org> (raw)
In-Reply-To: <7vipgztpaf.fsf@alter.siamese.dyndns.org>

On 16/04/12 21:06, Junio C Hamano wrote:
> Andrew Sayers <andrew-git@pileofstuff.org> writes:
>>  
>> +Unlike Subversion, 'svn-fe' interprets property key/value pairs as
>> +null-terminated binary strings.  This means it will accept content
>> +that Subversion normally wouldn't produce (such as filenames
>> +containing tab characters) or would refuse to parse (such as usernames
>> +containing Latin-1 characters).  However, like Subversion it will
>> +handle newlines incorrectly in filenames (see BUGS below).
>> +
> 
> Do the first two sentences in the above paragraph claim that it a bug that
> 'svn-fe' does not mimick what Subversion does?  I am not sure what lessons
> the authors of tools, whose output is meant to feed svn-fe, are expected
> to learn here.  For example, is the purpose of the above paragraph to make
> tool authors realize that "NUL terminates key and value, so I have to
> refrain from using a key or a value that contains a NUL byte?" [*1*]  Even
> in that case, it is unclear to me what I (as an author of such a tool that
> reads data from somewhere and format it to plesae svn-fe) could do with
> that knowledge.
> 
> [Footnote]
> 
> *1* By the way, NULL is a pointer that does not point anywhere.  The name
> of a byte whose value is 0x00 is NUL.

The dumpfile documentation says that "... property key/value pairs may
be interpreted as binary data in any encoding by client tools"[1], but
SVN itself interprets the data as UTF-8, so I was surprised to see
svn-fe hadn't aped that behaviour.  You could argue this is a bug if you
want to call `svnadmin` the reference implementation.  You could even
argue that treating NUL characters specially is a bug if you want to
call the documentation the official standard (albeit a bug shared by
`svnadmin`).  Personally I don't have a problem with either decision, so
I've just noted some unobvious behaviour.

Lessons to learn will depend on the author, but here are some I took:

1. UTF-8 is the most common encoding, but not the only one.  If your
tool only allows UTF-8 input and only produces only UTF-8 output then
you are the limiting factor in your toolchain.  In my case I think I'll
just live with that, but I would like to have known before I started.

2. `svn` itself isn't universally considered the reference
implementation, only a popular one.  When deciding the correct behaviour
(or the range of possible behaviours), it's not enough just to look at
what `svn` does.

3. `svn-fe` doesn't slavishly follow either the documentation or `svn`.
 Beyond a certain point you have to actually check assumptions against
svn-fe's behaviour (then document what you find so the next guy doesn't
have to ;).  Again, I think this is pragmatic but I would like to have
known earlier.

I've tried not to labour the above points, because it would be easy to
overstate them and because other authors will take different lessons.
It's certainly possible that some author would come to svn-fe wanting to
do something crazy like encode newlines as NULs and come away realising
C doesn't like that.  A more concrete example is that a GSoC student who
turns up next year to work on writing commits back to SVN will need to
know svn-fe chose not to care about UTF-8 and that there's a nest of
edge cases waiting for them.

	- Andrew

[1]http://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt

  reply	other threads:[~2012-04-16 21:35 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-15 16:10 [PATCHv2] Add details about svn-fe's dumpfile parsing Andrew Sayers
2012-04-16 20:06 ` Junio C Hamano
2012-04-16 21:35   ` Andrew Sayers [this message]
2012-04-16 21:39     ` Jonathan Nieder
2012-04-16 22:15       ` Andrew Sayers
2012-04-16 22:27         ` Jonathan Nieder
2012-07-23  1:37       ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F8C909B.7010507@pileofstuff.org \
    --to=andrew-git@pileofstuff.org \
    --cc=artagnon@gmail.com \
    --cc=davidbarr@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).