git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	David Michael Barr <david.barr@cordelta.com>
Subject: Re: [PATCH 6/7] Add SVN revision parser and exporter
Date: Sat, 29 May 2010 09:06:45 -0500	[thread overview]
Message-ID: <20100529140645.GA8338@progeny.tock> (raw)
In-Reply-To: <1274650832-7411-7-git-send-email-artagnon@gmail.com>

Hi again,

Ramkumar Ramachandra wrote:

> repo_tree parses SVN revisions to build a Git objects, and use
> fast_export to emit them so they can be imported into the Git object
> store via a fast-import.

Wait, where are SVN revisions being parsed?  It seems that the repo
module maintains the exporter's state and provides a facility to
to call the fast_export module to write it out.

  repo_add(path, mode, blob_mark) is used to add a new file to
  the current commit.

  repo_modify is used to add a replacement for an existing file;
  it is implemented exactly the same way, but a check could be
  added later to distinguish the two cases.

  repo_copy copies a blob from a previous revision to the current
  commit.

  repo_replace modifies the content of a file from the current
  commit, if and only if it already exists.

  repo_delete removes a file or directory from the current commit.

  repo_commit calls out to fast_export to write the current commit
  to the fast-import stream in stdout.

  repo_diff is used by the fast_export module to write the changes
  for a commit.

  repo_reset erases the exporter's state, so valgrind can be happy.

> +void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
> +                        uint32_t mark)
> +{
> +    printf("M %06o :%d ", mode, mark);
> +    pool_print_seq(depth, path, '/', stdout);
> +    putchar('\n');
> +}

Mode must be 100644, 100755, 120000, or 160000.

[...]
> +typedef struct repo_dirent_s repo_dirent_t;
> +
> +struct repo_dirent_s {
> +    uint32_t name_offset;
> +    uint32_t mode;
> +    uint32_t content_offset;
> +};
> +
> +typedef struct repo_dir_s repo_dir_t;
> +
> +struct repo_dir_s {
> +    uint32_t size;
> +    uint32_t first_offset;
> +};
> +
> +typedef struct repo_commit_s repo_commit_t;
> +
> +struct repo_commit_s {
> +    uint32_t mark;
> +    uint32_t root_dir_offset;
> +};

Style: we don’t tend to use typedef to hide underlying struct definitions
(see Documentation/CodingStyle from linux-2.6.git, chapter 5, for some
explanation about why).

> +static uint32_t num_dirs_saved = 0;
> +static uint32_t num_dirents_saved = 0;

Style: leave out the initializer for bss-allocated variables.

> +static int repo_dirent_name_cmp(const void *a, const void *b)
> +{
> +    return (((repo_dirent_t *) a)->name_offset
> +            > ((repo_dirent_t *) b)->name_offset) -
> +        (((repo_dirent_t *) a)->name_offset
> +         < ((repo_dirent_t *) b)->name_offset);
> +}

Maybe some local variables would make this more readable:

 {
	const repo_dirent_t *dirent1 = a, *dirent2 = b;
	uint32_t a_offset = dirent1->name_offset;
	uint32_t b_offset = dirent2->name_offset;

	return (a_offset > b_offset) - (a_offset < b_offset);
 }

> +static repo_dir_t *repo_clone_dir(repo_dir_t *orig_dir, uint32_t padding)
> +{
> +    uint32_t orig_o, new_o, dirent_o;
> +    orig_o = dir_offset(orig_dir);
> +    if (orig_o < num_dirs_saved) {
> +        new_o = dir_with_dirents_alloc(orig_dir->size + padding);
> +        orig_dir = dir_pointer(orig_o);
> +        dirent_o = dir_pointer(new_o)->first_offset;
> +    } else {
> +        if (padding == 0)
> +            return orig_dir;
> +        new_o = orig_o;
> +        dirent_o = dirent_alloc(orig_dir->size + padding);
> +    }
> +    memcpy(dirent_pointer(dirent_o), repo_first_dirent(orig_dir),
> +           orig_dir->size * sizeof(repo_dirent_t));
> +    dir_pointer(new_o)->size = orig_dir->size + padding;
> +    dir_pointer(new_o)->first_offset = dirent_o;
> +    return dir_pointer(new_o);
> +}

Is this for adding new entries to an existing directory?  It is
getting late, so I did not look it over carefully.

> +static char repo_path_buffer[REPO_MAX_PATH_LEN];
> +static repo_dirent_t *repo_read_dirent(uint32_t revision, char *path)
> +{
> +    char *ctx = NULL;
> +    uint32_t name = 0;
> +    repo_dir_t *dir = NULL;
> +    repo_dirent_t *dirent = NULL;
> +    dir = repo_commit_root_dir(commit_pointer(revision));
> +    strncpy(repo_path_buffer, path, REPO_MAX_PATH_LEN);
> +    repo_path_buffer[REPO_MAX_PATH_LEN - 1] = '\0';
> +    path = repo_path_buffer;
> +    for (name = pool_tok_r(path, "/", &ctx);
> +         ~name; name = pool_tok_r(NULL, "/", &ctx)) {
> +        dirent = repo_dirent_by_name(dir, name);
> +        if (dirent == NULL) {
> +            return NULL;
> +        } else if (repo_dirent_is_dir(dirent)) {
> +            dir = repo_dir_from_dirent(dirent);
> +        } else {
> +            break;
> +        }
> +    }
> +    return dirent;
> +}

If a file "foo" exists and I ask for "foo/bar", this will return
the entry for foo.  Is that appropriate?

> +static void
> +repo_write_dirent(char *path, uint32_t mode, uint32_t content_offset,
> +                  uint32_t del)
> +{
> +    char *ctx;
> +    uint32_t name, revision, dirent_o, dir_o, parent_dir_o;
> +    repo_dir_t *dir;
> +    repo_dirent_t *dirent = NULL;
> +    revision = active_commit;
> +    dir = repo_commit_root_dir(commit_pointer(revision));
> +    dir = repo_clone_dir(dir, 0);
> +    commit_pointer(revision)->root_dir_offset = dir_offset(dir);
> +    strncpy(repo_path_buffer, path, REPO_MAX_PATH_LEN);
> +    repo_path_buffer[REPO_MAX_PATH_LEN - 1] = '\0';
> +    path = repo_path_buffer;
> +    for (name = pool_tok_r(path, "/", &ctx); ~name;
> +         name = pool_tok_r(NULL, "/", &ctx)) {
> +        parent_dir_o = dir_offset(dir);
> +        dirent = repo_dirent_by_name(dir, name);
> +        if (dirent == NULL) {

static repo_dir_t *add_subdir(uint32_t parent_dir_o, repo_dir_t *dir,
                              uint32_t name)
{

> +            dir = repo_clone_dir(dir, 1);
> +            dirent = &repo_first_dirent(dir)[dir->size - 1];
> +            dirent->name_offset = name;
> +            dirent->mode = REPO_MODE_DIR;
> +            qsort(repo_first_dirent(dir), dir->size,
> +                  sizeof(repo_dirent_t), repo_dirent_name_cmp);
> +            dirent = repo_dirent_by_name(dir, name);
> +            dir_o = dir_with_dirents_alloc(0);
> +            dirent->content_offset = dir_o;
> +            dir = dir_pointer(dir_o);

}

> +        } else if ((dir = repo_dir_from_dirent(dirent))) {
> +            dirent_o = dirent_offset(dirent);
> +            dir = repo_clone_dir(dir, 0);
> +            if (dirent_o != ~0)
> +                dirent_pointer(dirent_o)->content_offset = dir_offset(dir);
> +        } else {

static repo_dir_t *replace_with_empty_dir(uint32_t dirent)
{

> +            dirent->mode = REPO_MODE_DIR;
> +            dirent_o = dirent_offset(dirent);
> +            dir_o = dir_with_dirents_alloc(0);
> +            dirent = dirent_pointer(dirent_o);
> +            dir = dir_pointer(dir_o);
> +            dirent->content_offset = dir_o;

}

> +        }
> +    }
> +    if (dirent) {

When would it be NULL?  Is an empty path allowed?

static void fill_dirent(repo_dirent_t *dirent,
                        uint32_t mode, uint32_t content_offset,
                        uint32_t parent_dir_o, int del)
{

> +        dirent->mode = mode;
> +        dirent->content_offset = content_offset;
> +        if (del && ~parent_dir_o) {
> +            dirent->name_offset = ~0;
> +            dir = dir_pointer(parent_dir_o);
> +            qsort(repo_first_dirent(dir), dir->size,
> +                  sizeof(repo_dirent_t), repo_dirent_name_cmp);
> +            dir->size--;
> +        }

}

> +    }
> +}
[...]

> +static void
> +repo_git_add_r(uint32_t depth, uint32_t *path, repo_dir_t *dir);
> +
> +static void
> +repo_git_add(uint32_t depth, uint32_t *path, repo_dirent_t *dirent)
> +{
> +    if (repo_dirent_is_dir(dirent)) {
> +        repo_git_add_r(depth, path, repo_dir_from_dirent(dirent));
> +    } else {
> +        fast_export_modify(depth, path, dirent->mode, dirent->content_offset);
> +    }
> +}

Just curious: does gcc notice the tail calls?

> +static void
> +repo_git_add_r(uint32_t depth, uint32_t *path, repo_dir_t *dir)
> +{
> +    uint32_t o;
> +    repo_dirent_t *de;
> +    de = repo_first_dirent(dir);
> +    for (o = 0; o < dir->size; o++) {
> +        path[depth] = de[o].name_offset;
> +        repo_git_add(depth + 1, path, &de[o]);
> +    }
> +}

Can this overflow the path stack?

> +
> +static void
> +repo_diff_r(uint32_t depth, uint32_t *path, repo_dir_t *dir1,
> +            repo_dir_t *dir2)
> +{
> +    repo_dirent_t *de1, *de2, *max_de1, *max_de2;
> +    de1 = repo_first_dirent(dir1);
> +    de2 = repo_first_dirent(dir2);
> +    max_de1 = &de1[dir1->size];
> +    max_de2 = &de2[dir2->size];
> +
> +    while (de1 < max_de1 && de2 < max_de2) {
> +        if (de1->name_offset < de2->name_offset) {
> +            path[depth] = (de1++)->name_offset;
> +            fast_export_delete(depth + 1, path);
> +        } else if (de1->name_offset == de2->name_offset) {
> +            path[depth] = de1->name_offset;
> +            if (de1->content_offset != de2->content_offset) {
> +                if (repo_dirent_is_dir(de1) && repo_dirent_is_dir(de2)) {
> +                    repo_diff_r(depth + 1, path,
> +                                repo_dir_from_dirent(de1),
> +                                repo_dir_from_dirent(de2));
> +                } else {
> +                    if (repo_dirent_is_dir(de1) != repo_dirent_is_dir(de2)) {
> +                        fast_export_delete(depth + 1, path);
> +                    }
> +                    repo_git_add(depth + 1, path, de2);
> +                }
> +            }
> +            de1++;
> +            de2++;
> +        } else {
> +            path[depth] = de2->name_offset;
> +            repo_git_add(depth + 1, path, de2++);
> +        }

Might be easier to read the other way around:

		if (de1->name_offset < de2->name_offset) {
			path[depth] = (de1++)->name_offset;
			fast_export_delete(depth + 1, path);
			continue;
		}
		if (de1->name_offset > de2->name_offset) {
			path[depth] = de2->name_offset;
			repo_git_add(depth + 1, path, de2++);
			continue;
		}
		... matching paths case ...

> +    }
> +    while (de1 < max_de1) {
> +        path[depth] = (de1++)->name_offset;
> +        fast_export_delete(depth + 1, path);
> +    }
> +    while (de2 < max_de2) {
> +        path[depth] = de2->name_offset;
> +        repo_git_add(depth + 1, path, de2++);
> +    }
> +}
[...]

> +void repo_commit(uint32_t revision, char *author, char *log, char *uuid,
> +                 char *url, time_t timestamp)
> +{
> +    if (revision == 0) {
> +        active_commit = commit_alloc(1);
> +        commit_pointer(active_commit)->root_dir_offset =
> +            dir_with_dirents_alloc(0);
> +    } else {
> +        fast_export_commit(revision, author, log, uuid, url, timestamp);
> +    }
> +    num_dirs_saved = dir_pool.size;
> +    num_dirents_saved = dirent_pool.size;

I did not carefully trace the cases where repo_clone_dir may reuse a
dir.  I would be happier if someone else does.

The rest looked good to me, for what it’s worth.

Thanks,
Jonathan

  reply	other threads:[~2010-05-29 14:06 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-23 21:40 [PATCH 0/7] Import David's SVN exporter Ramkumar Ramachandra
2010-05-23 21:40 ` [WIP PATCH 1/7] Add skeleton remote helper for SVN Ramkumar Ramachandra
2010-05-23 21:40 ` [PATCH 2/7] Add cpp macro implementation of treaps Ramkumar Ramachandra
2010-05-29  7:18   ` Jonathan Nieder
2010-05-30  9:09     ` Ramkumar Ramachandra
2010-05-30  9:31       ` Jonathan Nieder
2010-05-30  9:33         ` Ramkumar Ramachandra
2010-05-23 21:40 ` [PATCH 3/7] Add buffer pool library Ramkumar Ramachandra
2010-05-24  7:47   ` Peter Baumann
2010-05-24 10:11     ` Ramkumar Ramachandra
2010-05-24 10:37       ` David Michael Barr
2010-05-29  8:51   ` Jonathan Nieder
2010-05-29 10:55     ` David Michael Barr
2010-05-23 21:40 ` [PATCH 4/7] Add a memory " Ramkumar Ramachandra
2010-05-29  9:06   ` Jonathan Nieder
2010-05-30  9:12     ` Ramkumar Ramachandra
2010-05-30  9:55       ` Jonathan Nieder
2010-05-30 10:51         ` Ramkumar Ramachandra
2010-05-23 21:40 ` [PATCH 5/7] Add API for string-specific memory pool Ramkumar Ramachandra
2010-05-29 11:38   ` Jonathan Nieder
2010-05-30  9:38     ` Ramkumar Ramachandra
2010-05-30 10:09       ` Jonathan Nieder
2010-05-30 16:52     ` Ramkumar Ramachandra
2010-05-23 21:40 ` [PATCH 6/7] Add SVN revision parser and exporter Ramkumar Ramachandra
2010-05-29 14:06   ` Jonathan Nieder [this message]
2010-05-30 15:58     ` Ramkumar Ramachandra
2010-05-23 21:40 ` [PATCH 7/7] Add handler for SVN dump Ramkumar Ramachandra
2010-05-30  8:59   ` Jonathan Nieder
2010-05-30 10:45     ` Ramkumar Ramachandra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100529140645.GA8338@progeny.tock \
    --to=jrnieder@gmail.com \
    --cc=artagnon@gmail.com \
    --cc=david.barr@cordelta.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).