* [PATCH] Explain how svn-fe parses filenames in SVN dumps
From: Andrew Sayers @ 2012-04-14 17:03 UTC (permalink / raw)
To: Git Mailing List; +Cc: David Barr, Jonathan Nieder
The documentation for the SVN dumpfile format says that filenames "may be
interpreted as binary data in any encoding by client tools", but users might be
surprised that svn-fe's handling differs from svn's.
Before version 1.2.0, `svn add` supported files containing characters in the
range 0x01-0x1F, and Subversion still supports existing files that contain
those characters. The newline character is explicitly discussed so that users
with ancient repositories understand why they can't be supported by tools that
read the SVN dump format.
The documentation for the SVN dumpfile format describes records as containing
"a group of RFC822-style header lines", and its full text can be read as
implying newline characters are reserved for use by the format. This reading
is slightly charitable, but it avoids the need to discuss the format's design
issues in a context where few readers will be interested.
Signed-off-by: Andrew Sayers <andrew-git@pileofstuff.org>
---
contrib/svn-fe/svn-fe.txt | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index 1128ab2..c079abe 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -59,6 +59,14 @@ to put each project in its own repository and to separate the history
of each branch. The 'git filter-branch --subdirectory-filter' command
may be useful for this purpose.
+Filenames are interpreted by svn-fe as binary data, and may contain
+any character except NUL (0x00) and newline (0x0A). The NUL
+character is not valid in git paths, and the newline character is
+reserved for use by the (line-based) Subversion dumpfile format.
+This differs from Subversion, which requires filenames to contain
+only legal XML characters and disallows tab characters, carriage
+returns and newlines.
+
BUGS
----
Empty directories and unknown properties are silently discarded.
--
1.7.1
* Re: [PATCH] Explain how svn-fe parses filenames in SVN dumps
From: Jonathan Nieder @ 2012-04-14 17:14 UTC (permalink / raw)
To: Andrew Sayers; +Cc: Git Mailing List, David Barr, Ramkumar Ramachandra
Hi,
Andrew Sayers wrote:
> Before version 1.2.0, `svn add` supported files containing characters in the
> range 0x01-0x1F, and Subversion still supports existing files that contain
> those characters.
Because of the above,
[...]
> +++ b/contrib/svn-fe/svn-fe.txt
> @@ -59,6 +59,14 @@ to put each project in its own repository and to separate the history
> of each branch. The 'git filter-branch --subdirectory-filter' command
> may be useful for this purpose.
>
> +Filenames are interpreted by svn-fe as binary data, and may contain
> +any character except NUL (0x00) and newline (0x0A). The NUL
> +character is not valid in git paths, and the newline character is
> +reserved for use by the (line-based) Subversion dumpfile format.
> +This differs from Subversion, which requires filenames to contain
> +only legal XML characters and disallows tab characters, carriage
> +returns and newlines.
> +
> BUGS
this description and the location of this description seem quite
misleading. Isn't what the reader needs to know something like the
following?
BUGS
----
Due to limitations in the Subversion dumpfile format, svn-fe
does not support filenames with newlines. Since version 1.2.0,
"svn add" forbids adding such filenames but some historical
repositories contain them. An import can appear to succeed and
produce incorrect results when such pathological filenames are
present.
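
As a rough illustration of how an import can "appear to succeed" here, below is a minimal Python sketch of a line-based header reader. It is not svn-fe's actual parser (svn-fe is written in C); `parse_headers` and `node_record` are hypothetical helpers, and the sketch assumes a reader that simply ignores lines it cannot interpret as `Key: value`. Only the header names (`Node-path`, `Node-kind`, `Node-action`) come from the dumpfile format.

```python
def parse_headers(record: bytes) -> dict:
    """Read 'Key: value' lines up to the first blank line.

    Hypothetical, simplified stand-in for a line-based dump reader.
    """
    headers = {}
    for line in record.split(b"\n"):
        if line == b"":
            break  # a blank line ends the header block
        key, sep, value = line.partition(b": ")
        if sep:
            headers[key] = value
        # Assumption: lines without ': ' are silently skipped.
    return headers


def node_record(path: bytes) -> bytes:
    """Build the header block for one added file (hypothetical helper)."""
    return b"Node-path: " + path + b"\nNode-kind: file\nNode-action: add\n\n"


ok = parse_headers(node_record(b"trunk/README"))
bad = parse_headers(node_record(b"trunk/bad\nname"))

print(ok[b"Node-path"])   # b'trunk/README'
print(bad[b"Node-path"])  # b'trunk/bad' (the part after the newline is lost)
```

No error is raised for the second record: the reader sees a well-formed `Node-path` header followed by a stray line, so the path is quietly truncated rather than rejected.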
Thanks,
Jonathan
* Re: [PATCH] Explain how svn-fe parses filenames in SVN dumps
From: Andrew Sayers @ 2012-04-14 17:37 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: Git Mailing List, David Barr, Ramkumar Ramachandra
On 14/04/12 18:14, Jonathan Nieder wrote:
> this description and the location of this description seem quite
> misleading. Isn't what the reader needs to know something like the
> following?
>
> BUGS
> ----
> Due to limitations in the Subversion dumpfile format, svn-fe
> does not support filenames with newlines. Since version 1.2.0,
> "svn add" forbids adding such filenames but some historical
> repositories contain them. An import can appear to succeed and
> produce incorrect results when such pathological filenames are
> present.
>
> Thanks,
> Jonathan
>
I went back and forth a bit while writing the text. Newlines are only
one special case, albeit an important one that I hadn't expressed
clearly enough. For example, the handling of NUL characters is worse
than newlines (a quick test suggests svn-fe terminates parsing
altogether if it sees one), but SVN has never allowed the creation of
files with NULs so arguably it's even less important.
I'm warming again to the idea of explicitly mentioning that newlines
cause breakage, but it feels like the wider story is worth telling too.
If this were a man page I'd consider adding a section or something, but
I'm not sure what level of verbosity you're looking for in this file.
- Andrew
* Re: [PATCH] Explain how svn-fe parses filenames in SVN dumps
From: Jonathan Nieder @ 2012-04-14 18:13 UTC (permalink / raw)
To: Andrew Sayers; +Cc: Git Mailing List, David Barr, Ramkumar Ramachandra
Andrew Sayers wrote:
> If this were a man page
This is a man page. "git log contrib/svn-fe/svn-fe.txt" has details.
* Re: [PATCH] Explain how svn-fe parses filenames in SVN dumps
From: Jonathan Nieder @ 2012-04-14 18:18 UTC (permalink / raw)
To: Andrew Sayers; +Cc: Git Mailing List, David Barr, Ramkumar Ramachandra
Andrew Sayers wrote:
> For example, the handling of NUL characters is worse
> than newlines
Filenames are C strings. Filenames with NUL bytes simply do not
exist, at least as long as one is using the ANSI C or POSIX interfaces
for file access.
So as far as I can tell the story really is only about newlines. (If
svn-fe tried to push history back to Subversion, life would presumably
be more complicated.)
Hoping that clarifies a little,
Jonathan
* Re: [PATCH] Explain how svn-fe parses filenames in SVN dumps
From: Andrew Sayers @ 2012-04-14 21:56 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: Git Mailing List, David Barr, Ramkumar Ramachandra
On 14/04/12 19:18, Jonathan Nieder wrote:
> So as far as I can tell the story really is only about newlines. (If
> svn-fe tried to push history back to Subversion, life would presumably
> be more complicated.)
I think I've been trying to balance two incompatible use cases -
ordinary users that just need a heads-up about a bug that might bite
them, and authors of related tools (i.e. me) that need technical
information about svn-fe.
I think your text serves ordinary users better, so I'll investigate
pushing history back to Subversion tomorrow and think how to write
something more technical. I'm afraid my patch etiquette fails me here
though - is it better for me to roll your text into a new patch, or let
you do that and submit my text separately?
- Andrew
* Re: [PATCH] Explain how svn-fe parses filenames in SVN dumps
From: Jonathan Nieder @ 2012-04-15 5:08 UTC (permalink / raw)
To: Andrew Sayers; +Cc: Git Mailing List, David Barr, Ramkumar Ramachandra
Andrew Sayers wrote:
> I'm afraid my patch etiquette fails me here
> though - is it better for me to roll your text into a new patch, or let
> you do that and submit my text separately?
If the example I sent looks good, please feel free to morph it into a
new patch. Thanks again for your work, and glad I could help in some
small way.
Ciao,
Jonathan