From: Ramkumar Ramachandra <artagnon@gmail.com>
To: David Barr <david.barr@cordelta.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Sverre Rabbelier <srabbelier@gmail.com>
Subject: Re: [PATCH 5/5] svn-fe: Use the cat-blob command to apply deltas
Date: Mon, 18 Oct 2010 12:27:01 +0530
Message-ID: <20101018065657.GE22376@kytes>
In-Reply-To: <1287147256-9457-6-git-send-email-david.barr@cordelta.com>

Hi David,

David Barr writes:
> Use the new cat-blob command for fast-import to extract
> blobs so that text-deltas may be applied.

I like this straightforward approach, and I like the name 'cat-blob'.

> The backchannel should only need to be configured when
> parsing v3 svn dump streams.

Maybe get the synopsis to say this as well?

> Based-on-patch-by: Ramkumar Ramachandra <artagnon@gmail.com>
> Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
> Tested-by: David Barr <david.barr@cordelta.com>
> Signed-off-by: David Barr <david.barr@cordelta.com>
> ---
>  contrib/svn-fe/svn-fe.txt |    6 +++-
>  t/t9010-svn-fe.sh         |    6 ++--
>  vcs-svn/fast_export.c     |   86 +++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 92 insertions(+), 6 deletions(-)
> 
> diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
> index 35f84bd..39ffa07 100644
> --- a/contrib/svn-fe/svn-fe.txt
> +++ b/contrib/svn-fe/svn-fe.txt
> @@ -7,7 +7,11 @@ svn-fe - convert an SVN "dumpfile" to a fast-import stream
>  
>  SYNOPSIS
>  --------
> -svnadmin dump --incremental REPO | svn-fe [url] | git fast-import
> +[verse]
> +mkfifo backchannel &&
> +svnadmin dump --incremental REPO |
> +	svn-fe [url] 3<backchannel |
> +	git fast-import --cat-blob-fd=3 3>backchannel

See above.

>  DESCRIPTION
>  -----------
> diff --git a/t/t9010-svn-fe.sh b/t/t9010-svn-fe.sh
> index de976ed..d750c7a 100755
> --- a/t/t9010-svn-fe.sh
> +++ b/t/t9010-svn-fe.sh
> @@ -34,10 +34,10 @@ test_dump () {
>  		svnadmin load "$label-svn" < "$TEST_DIRECTORY/$dump" &&
>  		svn_cmd export "file://$PWD/$label-svn" "$label-svnco" &&
>  		git init "$label-git" &&
> -		test-svn-fe "$TEST_DIRECTORY/$dump" >"$label.fe" &&
>  		(
> -			cd "$label-git" &&
> -			git fast-import < ../"$label.fe"
> +			cd "$label-git" && mkfifo backchannel && \
> +			test-svn-fe "$TEST_DIRECTORY/$dump" 3< backchannel | \
> +			git fast-import --cat-blob-fd=3 3> backchannel
>  		) &&
>  		(
>  			cd "$label-svnco" &&

Ok.

> diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
> index b017dfb..812563d 100644
> --- a/vcs-svn/fast_export.c
> +++ b/vcs-svn/fast_export.c
> @@ -8,10 +8,17 @@
>  #include "line_buffer.h"
>  #include "repo_tree.h"
>  #include "string_pool.h"
> +#include "svndiff.h"
>  
>  #define MAX_GITSVN_LINE_LEN 4096
> +#define REPORT_FILENO 3
> +
> +#define SHA1_HEX_LENGTH 40
>  
>  static uint32_t first_commit_done;
> +static struct line_buffer preimage = LINE_BUFFER_INIT;
> +static struct line_buffer postimage = LINE_BUFFER_INIT;
> +static struct line_buffer backchannel = LINE_BUFFER_INIT;

Elegant :)

>  void fast_export_delete(uint32_t depth, uint32_t *path)
>  {
> @@ -63,16 +70,91 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
>  	printf("progress Imported commit %"PRIu32".\n\n", revision);
>  }
>  
> +static int fast_export_save_blob(FILE *out)
> +{
> +	size_t len;
> +	char *header;
> +	char *end;
> +	char *tail;
> +
> +	if (!backchannel.infile)
> +		backchannel.infile = fdopen(REPORT_FILENO, "r");
> +	if (!backchannel.infile)
> +		return error("Could not open backchannel fd: %d", REPORT_FILENO);

REPORT_FILENO = 3 is hard-coded. Is this intended? If not, perhaps
add a command-line option to specify the fd?
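
Something along these lines, for instance. The `--backchannel-fd`
option and both helper names are made up for illustration, and the
fallback keeps today's fd 3:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical default; matches the hard-coded REPORT_FILENO in the patch. */
#define DEFAULT_BACKCHANNEL_FD 3

/*
 * Parse a hypothetical "--backchannel-fd=<n>" argument.
 * Returns the fd number, or -1 if the argument is not that option
 * or its value is malformed.
 */
static int parse_backchannel_fd(const char *arg)
{
	static const char prefix[] = "--backchannel-fd=";
	const char *num;
	char *end;
	long fd;

	if (strncmp(arg, prefix, sizeof(prefix) - 1))
		return -1;
	num = arg + sizeof(prefix) - 1;
	errno = 0;
	fd = strtol(num, &end, 10);
	/* Reject empty values, trailing garbage, overflow and negative fds. */
	if (end == num || *end || errno || fd < 0 || fd > INT_MAX)
		return -1;
	return (int)fd;
}

/* Scan argv for the option, falling back to the historical default. */
static int backchannel_fd_from_args(int argc, const char **argv)
{
	int i;

	for (i = 1; i < argc; i++) {
		int fd = parse_backchannel_fd(argv[i]);
		if (fd >= 0)
			return fd;
	}
	return DEFAULT_BACKCHANNEL_FD;
}
```

fast_export_save_blob() would then call fdopen() on the stored fd
instead of on REPORT_FILENO.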

> +	header = buffer_read_line(&backchannel);
> +	if (header == NULL)
> +		return 1;

Note to self: returning 1 here makes the caller die with the error
"Failed to retrieve blob for delta application".

> +	end = strchr(header, '\0');
> +	if (end - header > 7 && !strcmp(end - 7, "missing"))
> +		return error("cat-blob reports missing blob: %s", header);
> +	if (end - header < SHA1_HEX_LENGTH)
> +		return error("cat-blob header too short for SHA1: %s", header);
> +	if (strncmp(header + SHA1_HEX_LENGTH, " blob ", 6))
> +		return error("cat-blob header has wrong object type: %s", header);
> +	len = strtoumax(header + SHA1_HEX_LENGTH + 6, &end, 10);
> +	if (end == header + SHA1_HEX_LENGTH + 6)
> +		return error("cat-blob header did not contain length: %s", header);
> +	if (*end)
> +		return error("cat-blob header contained garbage after length: %s", header);
> +	buffer_copy_bytes(&backchannel, out, len);
> +	tail = buffer_read_line(&backchannel);
> +	if (!tail)
> +		return 1;

Could you clarify when exactly this will happen?
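
For what it's worth, assuming buffer_read_line() returns NULL only
when its underlying fgets() hits EOF or a read error, `tail` should be
NULL only when the backchannel ends (or the read fails) immediately
after the blob contents, before the terminating LF. The standalone
sketch below reads a `<sha1> SP blob SP <size> LF <contents> LF`
response with plain stdio rather than line_buffer; the enum and
function names are hypothetical. It gives each failure mode its own
code, so the caller could report something more specific than the
generic error:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes, one per way the response can go wrong. */
enum cat_blob_status {
	CAT_BLOB_OK = 0,
	CAT_BLOB_NO_HEADER,	/* stream ended before any header */
	CAT_BLOB_BAD_HEADER,	/* header did not parse (e.g. "missing") */
	CAT_BLOB_SHORT_DATA,	/* stream ended inside the contents */
	CAT_BLOB_NO_TRAILER,	/* stream ended right after the contents */
	CAT_BLOB_BAD_TRAILER	/* byte after the contents was not LF */
};

/* Read one cat-blob response from `in`, copying the contents to `out`. */
static enum cat_blob_status read_cat_blob(FILE *in, FILE *out)
{
	char header[128];
	char *num, *end;
	unsigned long len;
	int c;

	if (!fgets(header, sizeof(header), in))
		return CAT_BLOB_NO_HEADER;
	header[strcspn(header, "\n")] = '\0';
	if (strlen(header) < 40 + 6 || strncmp(header + 40, " blob ", 6))
		return CAT_BLOB_BAD_HEADER;
	num = header + 40 + 6;
	len = strtoul(num, &end, 10);
	if (end == num || *end)
		return CAT_BLOB_BAD_HEADER;

	while (len--) {
		if ((c = getc(in)) == EOF)
			return CAT_BLOB_SHORT_DATA;
		putc(c, out);
	}

	/* EOF here corresponds to the patch's "tail == NULL" case. */
	c = getc(in);
	if (c == EOF)
		return CAT_BLOB_NO_TRAILER;
	if (c != '\n')
		return CAT_BLOB_BAD_TRAILER;
	return CAT_BLOB_OK;
}
```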

> +	if (*tail)
> +		return error("cat-blob trailing line contained garbage: %s", tail);
> +	return 0;
> +}
> +
>  void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
>  			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
>  			struct line_buffer *input)
>  {

Note to reviewers: The function looks like this in `master`:
void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len)

New parameters introduced in the svn-fe3 series: srcMark, srcMode,
delta, input.

> +	long preimage_len = 0;
> +
> +	if (delta) {
> +		if (!preimage.infile)
> +			preimage.infile = tmpfile();

Didn't you later decide against this and use one tmpfile instead? In
any case, tmpfile() removes the file automatically when
`preimage.infile` is closed or when the program exits normally; since
the variable is static, it never goes out of scope.
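
If the idea is a single scratch file that is created once and reused
for every blob, resetting it between uses could look like this. This is
a minimal sketch using POSIX ftruncate(); scratch_file() and
scratch_reset() are hypothetical names. Truncating keeps stale bytes
from an earlier, longer blob out of the file:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Return a scratch file that is created lazily and then reused.
 * tmpfile() deletes it when it is closed or the program exits normally.
 */
static FILE *scratch_file(void)
{
	static FILE *scratch;

	if (!scratch)
		scratch = tmpfile();
	return scratch;
}

/* Empty the scratch file so the next blob starts at offset 0. */
static int scratch_reset(FILE *f)
{
	/* Flush pending output and move to the start before truncating. */
	if (fflush(f) || fseek(f, 0, SEEK_SET))
		return -1;
	return ftruncate(fileno(f), 0);
}
```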

> +		if (!preimage.infile)
> +			die("Unable to open temp file for blob retrieval");
> +		if (srcMark) {
> +			printf("cat-blob :%"PRIu32"\n", srcMark);
> +			fflush(stdout);
> +			if (srcMode == REPO_MODE_LNK)
> +				fwrite("link ", 1, 5, preimage.infile);

Special handling for symbolic links. Perhaps you should mention it in
a comment here?
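
Something like the following, perhaps. write_preimage_prefix() is a
hypothetical helper, and the REPO_MODE_LNK value below is assumed to
match git's symlink mode, 0120000:

```c
#include <stdint.h>
#include <stdio.h>

#define REPO_MODE_LNK 0120000	/* assumed: git's symlink mode */

/*
 * Subversion represents a symlink as a file whose text is
 * "link <target>", and its deltas are computed against that text.
 * fast-import hands back only "<target>", so restore the "link "
 * prefix before applying a delta to a symlink preimage.
 */
static int write_preimage_prefix(FILE *preimage, uint32_t src_mode)
{
	if (src_mode != REPO_MODE_LNK)
		return 0;
	return fwrite("link ", 1, 5, preimage) == 5 ? 0 : -1;
}
```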

> +			if (fast_export_save_blob(preimage.infile))
> +				die("Failed to retrieve blob for delta application");
> +		}
> +		preimage_len = ftell(preimage.infile);
> +		fseek(preimage.infile, 0, SEEK_SET);
> +		if (!postimage.infile)
> +			postimage.infile = tmpfile();

One tmpfile?

> +		if (!postimage.infile)
> +			die("Unable to open temp file for blob application");
> +		svndiff0_apply(input, len, &preimage, postimage.infile);
> +		len = ftell(postimage.infile);

Since you already have a preimage_len, perhaps name this postimage_len
to avoid confusion?
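
A separate signed variable would also sidestep a quiet problem: `len`
is a uint32_t, so a failing ftell() (which returns -1) would silently
wrap. A minimal sketch with a hypothetical helper that checks both
ftell() and fseek():

```c
#include <stdio.h>

/*
 * Report how many bytes have been written to `f` and rewind it for
 * reading. Returns -1 if either ftell() or fseek() fails.
 */
static long measure_and_rewind(FILE *f)
{
	long len = ftell(f);

	if (len < 0 || fseek(f, 0, SEEK_SET))
		return -1;
	return len;
}
```

The call site would then read `long postimage_len =
measure_and_rewind(postimage.infile);`, followed by a die() when it
is negative.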

> +		fseek(postimage.infile, 0, SEEK_SET);
> +	}
> +
>  	if (mode == REPO_MODE_LNK) {
>  		/* svn symlink blobs start with "link " */
> -		buffer_skip_bytes(input, 5);
> +		if (delta)
> +			buffer_skip_bytes(&postimage, 5);
> +		else
> +			buffer_skip_bytes(input, 5);
>  		len -= 5;
>  	}
>  	printf("blob\nmark :%"PRIu32"\ndata %"PRIu32"\n", mark, len);
> -	buffer_copy_bytes(input, stdout, len);
> +	if (!delta)
> +		buffer_copy_bytes(input, stdout, len);
> +	else
> +		buffer_copy_bytes(&postimage, stdout, len);
>  	fputc('\n', stdout);

I should have asked this a long time ago: why the extra newline?
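
For reference, git-fast-import(1) describes the exact-byte-count form
as `'data' SP <count> LF <raw> LF?` and notes that the LF after <raw>
is optional but recommended, because the next command then always
starts in column 0. If that is the reason, a comment would help. A
standalone sketch of the blob command (emit_blob() is a hypothetical
name):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Emit one blob: "blob" LF "mark :<n>" LF "data <len>" LF <raw> LF */
static void emit_blob(FILE *out, uint32_t mark, const char *raw, uint32_t len)
{
	fprintf(out, "blob\nmark :%" PRIu32 "\ndata %" PRIu32 "\n", mark, len);
	fwrite(raw, 1, len, out);
	/* Optional LF: keeps the next command at the start of a line. */
	fputc('\n', out);
}
```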

> +
> +	if (preimage.infile) {
> +		fseek(preimage.infile, 0, SEEK_SET);
> +	}
> +
> +	if (postimage.infile) {
> +		fseek(postimage.infile, 0, SEEK_SET);
> +	}

Style nit: the braces around these single-statement `if` bodies are
unnecessary.
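
That is, something like this (wrapped in a hypothetical
rewind_images() helper only so the sketch compiles on its own):

```c
#include <stdio.h>

/* Rewind both scratch files, if they were ever opened. */
static void rewind_images(FILE *preimage, FILE *postimage)
{
	if (preimage)
		fseek(preimage, 0, SEEK_SET);
	if (postimage)
		fseek(postimage, 0, SEEK_SET);
}
```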

Overall, a pleasant read. Thanks for taking this forward.

-- Ram


Thread overview: 34+ messages
2010-10-15 12:54 [PATCHv2] Add support for subversion dump format v3 David Barr
2010-10-15 12:54 ` [PATCH 1/5] fast-import: Let importers retrieve blobs David Barr
2010-10-18  7:36   ` Ramkumar Ramachandra
2010-10-18  8:50     ` Jonathan Nieder
2010-10-18  8:26   ` Jonathan Nieder
     [not found]   ` <20101119093530.GA19061@burratino>
2010-11-19  9:47     ` [PATCH 3/4] fast-import: let " Jonathan Nieder
2010-11-19  9:51     ` [PATCH 4/4] fast-import: Allow cat-blob requests at arbitrary points in stream Jonathan Nieder
     [not found]     ` <20101119094045.GC19061@burratino>
2010-11-19 11:58       ` [PATCH 2/4] fast-import: clarify documentation of "feature" command Sverre Rabbelier
2010-11-28 19:41   ` [PATCH/RFC v3 resend 0/4] fast-import: Let importers retrieve blobs Jonathan Nieder
2010-11-28 19:42     ` [PATCH 1/4] fast-import: stricter parsing of integer options Jonathan Nieder
2010-11-30  1:01       ` Junio C Hamano
2010-11-28 19:43     ` [PATCH 2/4] fast-import: clarify documentation of "feature" command Jonathan Nieder
2010-11-28 19:45     ` [PATCH 3/4] fast-import: let importers retrieve blobs Jonathan Nieder
2010-11-29 23:48       ` [PATCH] fixup! " David Barr
2010-11-30  0:16         ` David Barr
2010-11-30  1:22         ` Jonathan Nieder
2010-12-03 10:30       ` [PATCH 3/4] " Thomas Rast
2010-12-03 19:06         ` Jonathan Nieder
2010-12-03 20:17         ` Junio C Hamano
2010-12-03 20:26           ` Jonathan Nieder
2010-12-04 13:24         ` Thomas Rast
2010-12-04  2:35       ` Jonathan Nieder
2011-01-16  2:16       ` [PATCH] Documentation/fast-import: capitalize beginning of sentence Jonathan Nieder
2010-11-28 19:45     ` [PATCH 4/4] fast-import: Allow cat-blob requests at arbitrary points in stream Jonathan Nieder
2010-10-15 12:54 ` [PATCH 2/5] vcs-svn: Extend svndump to parse version 3 format David Barr
2010-10-15 12:54 ` [PATCH 3/5] vcs-svn: Implement prop-delta handling David Barr
2010-10-18 15:10   ` Ramkumar Ramachandra
2010-10-15 12:54 ` [PATCH 4/5] vcs-svn: Add outfile option to buffer_copy_bytes() David Barr
2010-10-18  8:59   ` Jonathan Nieder
2010-10-15 12:54 ` [PATCH 5/5] svn-fe: Use the cat-blob command to apply deltas David Barr
2010-10-18  6:57   ` Ramkumar Ramachandra [this message]
2010-10-18  9:24     ` Jonathan Nieder
2010-10-18 12:18       ` Ramkumar Ramachandra
2010-10-18  9:54 ` [PATCHv2] Add support for subversion dump format v3 Jonathan Nieder
