* [PATCHv2] Add details about svn-fe's dumpfile parsing
@ 2012-04-15 16:10 Andrew Sayers
2012-04-16 20:06 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Sayers @ 2012-04-15 16:10 UTC (permalink / raw)
To: Git Mailing List; +Cc: David Barr, Jonathan Nieder, Ramkumar Ramachandra
The documentation for the SVN dumpfile format says that "property key/value
pairs may be interpreted as binary data in any encoding by client tools".
Documenting svn-fe's interpretation helps authors of related tools, while
explaining limitations helps ordinary users import their SVN repositories.
The "INPUT FORMAT" section is aimed at authors of tools that interact with
svn-fe, so it particularly addresses assumptions that authors might make after
dealing with svn itself.
The "BUGS" section is aimed at ordinary users, so it only explains what readers
need to know when importing a repository. In particular, users don't need to
know that other characters in the range 0x01-0x1F are imported correctly, even
though they were all disabled in Subversion 1.2.0. The text in this section is
based largely on an example sent by Jonathan Nieder, with minor changes to suit
the surrounding style.
Signed-off-by: Andrew Sayers <andrew-git@pileofstuff.org>
---
contrib/svn-fe/svn-fe.txt | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index 1128ab2..3872b9d 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -32,6 +32,13 @@ Subversion's repository dump format is documented in full in
Files in this format can be generated using the 'svnadmin dump' or
'svk admin dump' command.
+Unlike Subversion, 'svn-fe' interprets property key/value pairs as
+null-terminated binary strings. This means it will accept content
+that Subversion normally wouldn't produce (such as filenames
+containing tab characters) or would refuse to parse (such as usernames
+containing Latin-1 characters). However, like Subversion it will
+handle newlines incorrectly in filenames (see BUGS below).
+
OUTPUT FORMAT
-------------
The fast-import format is documented by the git-fast-import(1)
@@ -65,6 +72,12 @@ Empty directories and unknown properties are silently discarded.
The exit status does not reflect whether an error was detected.
+Due to limitations in the Subversion dumpfile format, 'svn-fe' does
+not support filenames with newlines. 'svn add' has forbidden such
+filenames since version 1.2.0, but some historical repositories still
+contain them. An import can appear to succeed and produce incorrect
+results when such pathological filenames are present.
+
SEE ALSO
--------
git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1),
--
1.7.1
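The newline limitation described in the BUGS hunk can be illustrated with a small sketch. It assumes node paths are carried in line-oriented `Node-path:` headers, as in the dump format notes; `read_headers` is a hypothetical helper written for this illustration and is not svn-fe's parser.

```python
# Illustrative only: a naive line-oriented header reader, not svn-fe's code.
def read_headers(record: bytes) -> dict:
    """Parse 'Key: value' header lines up to the first blank line."""
    headers = {}
    for line in record.split(b"\n"):
        if not line:
            break  # a blank line ends the header block
        key, sep, value = line.partition(b": ")
        if sep:
            headers[key] = value
        # Lines without ': ' are dropped here; a real parser might
        # reject them or misread them as something else.
    return headers


# A path that really is 'dir/new\nline.txt' cannot be told apart from a
# header that ends at 'dir/new' followed by a stray line.
record = b"Node-path: dir/new\nline.txt\nNode-kind: file\n\n"
headers = read_headers(record)
print(headers[b"Node-path"])  # only the part before the newline survives
```

Nothing in the header line says how long the path is, so a reader of this kind has no way to recover the second half of the filename.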
* Re: [PATCHv2] Add details about svn-fe's dumpfile parsing
2012-04-15 16:10 [PATCHv2] Add details about svn-fe's dumpfile parsing Andrew Sayers
@ 2012-04-16 20:06 ` Junio C Hamano
2012-04-16 21:35 ` Andrew Sayers
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2012-04-16 20:06 UTC (permalink / raw)
To: Andrew Sayers
Cc: Git Mailing List, David Barr, Jonathan Nieder,
Ramkumar Ramachandra
Andrew Sayers <andrew-git@pileofstuff.org> writes:
> The documentation for the SVN dumpfile format says that "property key/value
> pairs may be interpreted as binary data in any encoding by client tools".
> Documenting svn-fe's interpretation helps authors of related tools, while
> explaining limitations helps ordinary users import their SVN repositories.
>
> The "INPUT FORMAT" section is aimed at authors of tools that interact with
> svn-fe, so it particularly addresses assumptions that authors might make after
> dealing with svn itself.
>
> The "BUGS" section is aimed at ordinary users, so it only explains what readers
> need to know when importing a repository. In particular, users don't need to
> know that other characters in the range 0x01-0x1F are imported correctly, even
> though they were all disabled in Subversion 1.2.0. The text in this section is
> based largely on an example sent by Jonathan Nieder, with minor changes to suit
> the surrounding style.
>
> Signed-off-by: Andrew Sayers <andrew-git@pileofstuff.org>
> ---
OK, so is this ready for 'master' already?
> contrib/svn-fe/svn-fe.txt | 13 +++++++++++++
> 1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
> index 1128ab2..3872b9d 100644
> --- a/contrib/svn-fe/svn-fe.txt
> +++ b/contrib/svn-fe/svn-fe.txt
> @@ -32,6 +32,13 @@ Subversion's repository dump format is documented in full in
> Files in this format can be generated using the 'svnadmin dump' or
> 'svk admin dump' command.
>
> +Unlike Subversion, 'svn-fe' interprets property key/value pairs as
> +null-terminated binary strings. This means it will accept content
> +that Subversion normally wouldn't produce (such as filenames
> +containing tab characters) or would refuse to parse (such as usernames
> +containing Latin-1 characters). However, like Subversion it will
> +handle newlines incorrectly in filenames (see BUGS below).
> +
Do the first two sentences in the above paragraph claim that it is a bug
that 'svn-fe' does not mimic what Subversion does? I am not sure what lessons
the authors of tools, whose output is meant to feed svn-fe, are expected
to learn here. For example, is the purpose of the above paragraph to make
tool authors realize that "NUL terminates key and value, so I have to
refrain from using a key or a value that contains a NUL byte?" [*1*] Even
in that case, it is unclear to me what I (as an author of such a tool that
reads data from somewhere and formats it to please svn-fe) could do with
that knowledge.
[Footnote]
*1* By the way, NULL is a pointer that does not point anywhere. The name
of a byte whose value is 0x00 is NUL.
* Re: [PATCHv2] Add details about svn-fe's dumpfile parsing
2012-04-16 20:06 ` Junio C Hamano
@ 2012-04-16 21:35 ` Andrew Sayers
2012-04-16 21:39 ` Jonathan Nieder
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Sayers @ 2012-04-16 21:35 UTC (permalink / raw)
To: Junio C Hamano
Cc: Git Mailing List, David Barr, Jonathan Nieder,
Ramkumar Ramachandra
On 16/04/12 21:06, Junio C Hamano wrote:
> Andrew Sayers <andrew-git@pileofstuff.org> writes:
>>
>> +Unlike Subversion, 'svn-fe' interprets property key/value pairs as
>> +null-terminated binary strings. This means it will accept content
>> +that Subversion normally wouldn't produce (such as filenames
>> +containing tab characters) or would refuse to parse (such as usernames
>> +containing Latin-1 characters). However, like Subversion it will
>> +handle newlines incorrectly in filenames (see BUGS below).
>> +
>
> Do the first two sentences in the above paragraph claim that it is a bug
> that 'svn-fe' does not mimic what Subversion does? I am not sure what lessons
> the authors of tools, whose output is meant to feed svn-fe, are expected
> to learn here. For example, is the purpose of the above paragraph to make
> tool authors realize that "NUL terminates key and value, so I have to
> refrain from using a key or a value that contains a NUL byte?" [*1*] Even
> in that case, it is unclear to me what I (as an author of such a tool that
> reads data from somewhere and formats it to please svn-fe) could do with
> that knowledge.
>
> [Footnote]
>
> *1* By the way, NULL is a pointer that does not point anywhere. The name
> of a byte whose value is 0x00 is NUL.
The dumpfile documentation says that "... property key/value pairs may
be interpreted as binary data in any encoding by client tools"[1], but
SVN itself interprets the data as UTF-8, so I was surprised to see
svn-fe hadn't aped that behaviour. You could argue this is a bug if you
want to call `svnadmin` the reference implementation. You could even
argue that treating NUL characters specially is a bug if you want to
call the documentation the official standard (albeit a bug shared by
`svnadmin`). Personally I don't have a problem with either decision, so
I've just noted some unobvious behaviour.
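A rough sketch of the NUL issue, assuming the dump format's length-prefixed `K <len>` / `V <len>` property layout and assuming values are then handled as NUL-terminated strings, as the patch text describes. `read_props` and `as_c_string` are hypothetical names for this illustration only.

```python
def read_props(block: bytes) -> list:
    """Read 'K <len>' / 'V <len>' pairs until PROPS-END, using the lengths."""
    props, pos, key = [], 0, None
    while True:
        eol = block.index(b"\n", pos)
        line = block[pos:eol]
        if line == b"PROPS-END":
            return props
        tag, length = line.split(b" ")
        start = eol + 1
        data = block[start:start + int(length)]
        pos = start + int(length) + 1  # skip the newline after the data
        if tag == b"K":
            key = data
        else:
            props.append((key, data))


def as_c_string(data: bytes) -> bytes:
    """What code treating the buffer as NUL-terminated would see."""
    return data.split(b"\0", 1)[0]


# The declared length (9) covers the whole value, including the NUL byte.
block = b"K 7\nsvn:log\nV 9\nfix\x00 typo\nPROPS-END\n"
key, value = read_props(block)[0]
print(value)               # the full length-prefixed value
print(as_c_string(value))  # the value as a NUL-terminated string
```

The length-prefixed read keeps all nine bytes, while the NUL-terminated view stops at the first 0x00, which is the kind of difference a tool author feeding svn-fe would want to know about.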
Lessons to learn will depend on the author, but here are some I took:
1. UTF-8 is the most common encoding, but not the only one. If your
tool only allows UTF-8 input and produces only UTF-8 output then
you are the limiting factor in your toolchain. In my case I think I'll
just live with that, but I would like to have known before I started.
2. `svn` itself isn't universally considered the reference
implementation, only a popular one. When deciding the correct behaviour
(or the range of possible behaviours), it's not enough just to look at
what `svn` does.
3. `svn-fe` doesn't slavishly follow either the documentation or `svn`.
Beyond a certain point you have to actually check assumptions against
svn-fe's behaviour (then document what you find so the next guy doesn't
have to ;). Again, I think this is pragmatic but I would like to have
known earlier.
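For lesson 1, one way a tool could avoid being the limiting factor is to carry non-UTF-8 bytes through unchanged. The following is a minimal sketch using Python's standard `surrogateescape` error handler; `decode_prop` and `encode_prop` are hypothetical helpers, and nothing here reflects how svn-fe or `svn` handle encodings.

```python
def decode_prop(raw: bytes) -> str:
    """Decode as UTF-8, carrying invalid bytes through as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_prop(text: str) -> bytes:
    """Inverse of decode_prop: restores the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


# A Latin-1 encoded username: the byte 0xE9 alone is not valid UTF-8.
latin1_author = "Jos\u00e9".encode("latin-1")

try:
    latin1_author.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

round_tripped = encode_prop(decode_prop(latin1_author))
print(strict_ok, round_tripped == latin1_author)
```

A strict UTF-8 decode rejects the value outright, whereas the tolerant round trip returns the same bytes it was given, so the tool neither fails nor silently rewrites the data.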
I've tried not to labour the above points, because it would be easy to
overstate them and because other authors will take different lessons.
It's certainly possible that some author would come to svn-fe wanting to
do something crazy like encode newlines as NULs and come away realising
C doesn't like that. A more concrete example is that a GSoC student who
turns up next year to work on writing commits back to SVN will need to
know svn-fe chose not to care about UTF-8 and that there's a nest of
edge cases waiting for them.
- Andrew
[1] http://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
* Re: [PATCHv2] Add details about svn-fe's dumpfile parsing
2012-04-16 21:35 ` Andrew Sayers
@ 2012-04-16 21:39 ` Jonathan Nieder
2012-04-16 22:15 ` Andrew Sayers
2012-07-23 1:37 ` Jonathan Nieder
0 siblings, 2 replies; 7+ messages in thread
From: Jonathan Nieder @ 2012-04-16 21:39 UTC (permalink / raw)
To: Andrew Sayers
Cc: Junio C Hamano, Git Mailing List, David Barr,
Ramkumar Ramachandra
Andrew Sayers wrote:
> The dumpfile documentation says that "... property key/value pairs may
> be interpreted as binary data in any encoding by client tools"[1], but
> SVN itself interprets the data as UTF-8
Yes, I suspect most of the changes you proposed for the INPUT FORMAT
section would actually be better as changes for the
dump-load-format.txt document. I imagine that folks on the dev@ list
might be able to clarify a few details (e.g., what one is expected to
do with historical repositories with non-UTF-8 property data), too.
What do you think?
The patch for svn-fe(1) already looks pretty good. I was planning on
applying it after finding a moment to clarify the patch description.
Thanks again,
Jonathan
* Re: [PATCHv2] Add details about svn-fe's dumpfile parsing
2012-04-16 21:39 ` Jonathan Nieder
@ 2012-04-16 22:15 ` Andrew Sayers
2012-04-16 22:27 ` Jonathan Nieder
2012-07-23 1:37 ` Jonathan Nieder
1 sibling, 1 reply; 7+ messages in thread
From: Andrew Sayers @ 2012-04-16 22:15 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Junio C Hamano, Git Mailing List, David Barr,
Ramkumar Ramachandra
On 16/04/12 22:39, Jonathan Nieder wrote:
> Andrew Sayers wrote:
>
>> The dumpfile documentation says that "... property key/value pairs may
>> be interpreted as binary data in any encoding by client tools"[1], but
>> SVN itself interprets the data as UTF-8
>
> Yes, I suspect most of the changes you proposed for the INPUT FORMAT
> section would actually be better as changes for the
> dump-load-format.txt document. I imagine that folks on the dev@ list
> might be able to clarify a few details (e.g., what one is expected to
> do with historical repositories with non-UTF-8 property data), too.
> What do you think?
Hmm, I'd personally be more interested in going to the SVN folks with a
more general question. The SVN Book[1] says "pathnames can contain only
legal XML (1.0) characters, and properties are further limited to ASCII
characters. Subversion also prohibits TAB, CR, and LF characters in path
names". Code documentation[2] gives a lot of complex rules that don't
bear much resemblance to the behaviour I've seen so far (albeit only
lightly tested in SVN 1.6). The dumpfile docs[3] pretty much declare a
free-for-all, and I've yet to see historical documentation properly
written up anywhere.
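The SVN Book's stated rule for path names can be written down as a small check. This is a sketch of the quoted rule only (legal XML 1.0 characters, minus TAB, CR and LF); `is_xml10_char` and `book_allows_path` are hypothetical names, and as noted above the observed behaviour of `svn` may not match it.

```python
def is_xml10_char(ch: str) -> bool:
    """True if ch is in the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def book_allows_path(path: str) -> bool:
    """Apply the book's rule: XML 1.0 characters, but no TAB, CR or LF."""
    return all(is_xml10_char(c) and c not in "\t\r\n" for c in path)


for candidate in ["trunk/README", "trunk/caf\u00e9.txt", "a\tb", "a\x01b"]:
    print(repr(candidate), book_allows_path(candidate))
```

A check like this is mainly useful as a baseline to compare against what a given `svn` version or svn-fe actually accepts.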
I guess my question would be something like "what should a client
reading or writing SVN dumps do to stay as compatible as possible?", but
I feel like I've got a collection of bits that haven't quite coalesced
well enough yet to really drive the conversation.
As a web developer, the SBL work I've been doing is starting to remind
me of the jump from HTML4 ("here's what clients should do. Of course
it's not what they actually do...") to HTML5 ("here's what clients
actually do. No we're not allowed to just shoot those people"). Like
HTML5, I figure I've got to take the argument to the official body some
day, but I'd rather have something vaguely mature first.
My instinct is to put this on the TODO list for after I've finished
writing tests, but I'm open to suggestions.
- Andrew
[1] http://svnbook.red-bean.com/en/1.7/svn.tour.importing.html#svn.tour.importing.naming
[2] http://subversion.apache.org/docs/api/latest/group__svn__fs__directories.html#details
[3] http://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
* Re: [PATCHv2] Add details about svn-fe's dumpfile parsing
2012-04-16 22:15 ` Andrew Sayers
@ 2012-04-16 22:27 ` Jonathan Nieder
0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2012-04-16 22:27 UTC (permalink / raw)
To: Andrew Sayers
Cc: Junio C Hamano, Git Mailing List, David Barr,
Ramkumar Ramachandra
Andrew Sayers wrote:
> Like
> HTML5, I figure I've got to take the argument to the official body some
> day, but I'd rather have something vaguely mature first.
Perhaps unlike the XHTML days at the W3C ;-), in this instance the
people responsible for the dump-load-format.txt document are really
nice and helpful and generally right-minded people.
If the mailing list is intimidating (or even if not), I'd recommend
visiting the IRC channel #svn-dev on freenode to say hello and get
advice.
Jonathan
* Re: [PATCHv2] Add details about svn-fe's dumpfile parsing
2012-04-16 21:39 ` Jonathan Nieder
2012-04-16 22:15 ` Andrew Sayers
@ 2012-07-23 1:37 ` Jonathan Nieder
1 sibling, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2012-07-23 1:37 UTC (permalink / raw)
To: Andrew Sayers
Cc: Junio C Hamano, Git Mailing List, David Barr,
Ramkumar Ramachandra
Hi Andrew,
In April, you wrote a nice patch[1] for the svn-fe(1) manpage
clarifying some of its limitations. I was hoping to offer some
patches to squash in to polish some of its more confusing edges and
then apply it, but time for polishing ended up being scarce.
Can you remind me of the current status of the patch --- e.g., is the
version at [1] the latest version? Do you think it's ready as-is or
would you have suggestions for a person wanting to get it ready for
inclusion?
Thanks,
Jonathan
[1] http://thread.gmane.org/gmane.comp.version-control.git/195570