All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	David Michael Barr <david.barr@cordelta.com>
Subject: Re: [PATCH 6/7] Add SVN revision parser and exporter
Date: Sat, 29 May 2010 09:06:45 -0500	[thread overview]
Message-ID: <20100529140645.GA8338@progeny.tock> (raw)
In-Reply-To: <1274650832-7411-7-git-send-email-artagnon@gmail.com>

Hi again,

Ramkumar Ramachandra wrote:

> repo_tree parses SVN revisions to build a Git objects, and use
> fast_export to emit them so they can be imported into the Git object
> store via a fast-import.

Wait, where are SVN revisions being parsed?  It seems that the repo
module maintains the exporter's state and provides a facility to
to call the fast_export module to write it out.

  repo_add(path, mode, blob_mark) is used to add a new file to
  the current commit.

  repo_modify is used to add a replacement for an existing file;
  it is implemented exactly the same way, but a check could be
  added later to distinguish the two cases.

  repo_copy copies a blob from a previous revision to the current
  commit.

  repo_replace modifies the content of a file from the current
  commit, if and only if it already exists.

  repo_delete removes a file or directory from the current commit.

  repo_commit calls out to fast_export to write the current commit
  to the fast-import stream in stdout.

  repo_diff is used by the fast_export module to write the changes
  for a commit.

  repo_reset erases the exporter's state, so valgrind can be happy.

> +void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
> +                        uint32_t mark)
> +{
> +    printf("M %06o :%d ", mode, mark);
> +    pool_print_seq(depth, path, '/', stdout);
> +    putchar('\n');
> +}

Mode must be 100644, 100755, 120000, or 160000.

[...]
> +typedef struct repo_dirent_s repo_dirent_t;
> +
> +struct repo_dirent_s {
> +    uint32_t name_offset;
> +    uint32_t mode;
> +    uint32_t content_offset;
> +};
> +
> +typedef struct repo_dir_s repo_dir_t;
> +
> +struct repo_dir_s {
> +    uint32_t size;
> +    uint32_t first_offset;
> +};
> +
> +typedef struct repo_commit_s repo_commit_t;
> +
> +struct repo_commit_s {
> +    uint32_t mark;
> +    uint32_t root_dir_offset;
> +};

Style: we don’t tend to use typedef to hide underlying struct definitions
(see Documentation/CodingStyle from linux-2.6.git, chapter 5, for some
explanation about why).

> +static uint32_t num_dirs_saved = 0;
> +static uint32_t num_dirents_saved = 0;

Style: leave out the initializer for bss-allocated variables.

> +static int repo_dirent_name_cmp(const void *a, const void *b)
> +{
> +    return (((repo_dirent_t *) a)->name_offset
> +            > ((repo_dirent_t *) b)->name_offset) -
> +        (((repo_dirent_t *) a)->name_offset
> +         < ((repo_dirent_t *) b)->name_offset);
> +}

Maybe some local variables would make this more readable:

 {
	const repo_dirent_t *dirent1 = a, *dirent2 = b;
	uint32_t a_offset = dirent1->name_offset;
	uint32_t b_offset = dirent2->name_offset;

	return (a_offset > b_offset) - (a_offset < b_offset);
 }

> +static repo_dir_t *repo_clone_dir(repo_dir_t *orig_dir, uint32_t padding)
> +{
> +    uint32_t orig_o, new_o, dirent_o;
> +    orig_o = dir_offset(orig_dir);
> +    if (orig_o < num_dirs_saved) {
> +        new_o = dir_with_dirents_alloc(orig_dir->size + padding);
> +        orig_dir = dir_pointer(orig_o);
> +        dirent_o = dir_pointer(new_o)->first_offset;
> +    } else {
> +        if (padding == 0)
> +            return orig_dir;
> +        new_o = orig_o;
> +        dirent_o = dirent_alloc(orig_dir->size + padding);
> +    }
> +    memcpy(dirent_pointer(dirent_o), repo_first_dirent(orig_dir),
> +           orig_dir->size * sizeof(repo_dirent_t));
> +    dir_pointer(new_o)->size = orig_dir->size + padding;
> +    dir_pointer(new_o)->first_offset = dirent_o;
> +    return dir_pointer(new_o);
> +}

Is this for adding new entries to an existing directory?  It is
getting late, so I did not look it over carefully.

> +static char repo_path_buffer[REPO_MAX_PATH_LEN];
> +static repo_dirent_t *repo_read_dirent(uint32_t revision, char *path)
> +{
> +    char *ctx = NULL;
> +    uint32_t name = 0;
> +    repo_dir_t *dir = NULL;
> +    repo_dirent_t *dirent = NULL;
> +    dir = repo_commit_root_dir(commit_pointer(revision));
> +    strncpy(repo_path_buffer, path, REPO_MAX_PATH_LEN);
> +    repo_path_buffer[REPO_MAX_PATH_LEN - 1] = '\0';
> +    path = repo_path_buffer;
> +    for (name = pool_tok_r(path, "/", &ctx);
> +         ~name; name = pool_tok_r(NULL, "/", &ctx)) {
> +        dirent = repo_dirent_by_name(dir, name);
> +        if (dirent == NULL) {
> +            return NULL;
> +        } else if (repo_dirent_is_dir(dirent)) {
> +            dir = repo_dir_from_dirent(dirent);
> +        } else {
> +            break;
> +        }
> +    }
> +    return dirent;
> +}

If a file "foo" exists and I ask for "foo/bar", this will return
the entry for foo.  Is that appropriate?

> +static void
> +repo_write_dirent(char *path, uint32_t mode, uint32_t content_offset,
> +                  uint32_t del)
> +{
> +    char *ctx;
> +    uint32_t name, revision, dirent_o, dir_o, parent_dir_o;
> +    repo_dir_t *dir;
> +    repo_dirent_t *dirent = NULL;
> +    revision = active_commit;
> +    dir = repo_commit_root_dir(commit_pointer(revision));
> +    dir = repo_clone_dir(dir, 0);
> +    commit_pointer(revision)->root_dir_offset = dir_offset(dir);
> +    strncpy(repo_path_buffer, path, REPO_MAX_PATH_LEN);
> +    repo_path_buffer[REPO_MAX_PATH_LEN - 1] = '\0';
> +    path = repo_path_buffer;
> +    for (name = pool_tok_r(path, "/", &ctx); ~name;
> +         name = pool_tok_r(NULL, "/", &ctx)) {
> +        parent_dir_o = dir_offset(dir);
> +        dirent = repo_dirent_by_name(dir, name);
> +        if (dirent == NULL) {

static repo_dir_t *add_subdir(uint32_t parent_dir_o, repo_dir_t *dir,
                              uint32_t name)
{

> +            dir = repo_clone_dir(dir, 1);
> +            dirent = &repo_first_dirent(dir)[dir->size - 1];
> +            dirent->name_offset = name;
> +            dirent->mode = REPO_MODE_DIR;
> +            qsort(repo_first_dirent(dir), dir->size,
> +                  sizeof(repo_dirent_t), repo_dirent_name_cmp);
> +            dirent = repo_dirent_by_name(dir, name);
> +            dir_o = dir_with_dirents_alloc(0);
> +            dirent->content_offset = dir_o;
> +            dir = dir_pointer(dir_o);

}

> +        } else if ((dir = repo_dir_from_dirent(dirent))) {
> +            dirent_o = dirent_offset(dirent);
> +            dir = repo_clone_dir(dir, 0);
> +            if (dirent_o != ~0)
> +                dirent_pointer(dirent_o)->content_offset = dir_offset(dir);
> +        } else {

static repo_dir_t *replace_with_empty_dir(uint32_t dirent)
{

> +            dirent->mode = REPO_MODE_DIR;
> +            dirent_o = dirent_offset(dirent);
> +            dir_o = dir_with_dirents_alloc(0);
> +            dirent = dirent_pointer(dirent_o);
> +            dir = dir_pointer(dir_o);
> +            dirent->content_offset = dir_o;

}

> +        }
> +    }
> +    if (dirent) {

When would it be NULL?  Is an empty path allowed?

static void fill_dirent(repo_dirent_t *dirent,
                        uint32_t mode, uint32_t content_offset,
                        uint32_t parent_dir_o, int del)
{

> +        dirent->mode = mode;
> +        dirent->content_offset = content_offset;
> +        if (del && ~parent_dir_o) {
> +            dirent->name_offset = ~0;
> +            dir = dir_pointer(parent_dir_o);
> +            qsort(repo_first_dirent(dir), dir->size,
> +                  sizeof(repo_dirent_t), repo_dirent_name_cmp);
> +            dir->size--;
> +        }

}

> +    }
> +}
[...]

> +static void
> +repo_git_add_r(uint32_t depth, uint32_t *path, repo_dir_t *dir);
> +
> +static void
> +repo_git_add(uint32_t depth, uint32_t *path, repo_dirent_t *dirent)
> +{
> +    if (repo_dirent_is_dir(dirent)) {
> +        repo_git_add_r(depth, path, repo_dir_from_dirent(dirent));
> +    } else {
> +        fast_export_modify(depth, path, dirent->mode, dirent->content_offset);
> +    }
> +}

Just curious: does gcc notice the tail calls?

> +static void
> +repo_git_add_r(uint32_t depth, uint32_t *path, repo_dir_t *dir)
> +{
> +    uint32_t o;
> +    repo_dirent_t *de;
> +    de = repo_first_dirent(dir);
> +    for (o = 0; o < dir->size; o++) {
> +        path[depth] = de[o].name_offset;
> +        repo_git_add(depth + 1, path, &de[o]);
> +    }
> +}

Can this overflow the path stack?

> +
> +static void
> +repo_diff_r(uint32_t depth, uint32_t *path, repo_dir_t *dir1,
> +            repo_dir_t *dir2)
> +{
> +    repo_dirent_t *de1, *de2, *max_de1, *max_de2;
> +    de1 = repo_first_dirent(dir1);
> +    de2 = repo_first_dirent(dir2);
> +    max_de1 = &de1[dir1->size];
> +    max_de2 = &de2[dir2->size];
> +
> +    while (de1 < max_de1 && de2 < max_de2) {
> +        if (de1->name_offset < de2->name_offset) {
> +            path[depth] = (de1++)->name_offset;
> +            fast_export_delete(depth + 1, path);
> +        } else if (de1->name_offset == de2->name_offset) {
> +            path[depth] = de1->name_offset;
> +            if (de1->content_offset != de2->content_offset) {
> +                if (repo_dirent_is_dir(de1) && repo_dirent_is_dir(de2)) {
> +                    repo_diff_r(depth + 1, path,
> +                                repo_dir_from_dirent(de1),
> +                                repo_dir_from_dirent(de2));
> +                } else {
> +                    if (repo_dirent_is_dir(de1) != repo_dirent_is_dir(de2)) {
> +                        fast_export_delete(depth + 1, path);
> +                    }
> +                    repo_git_add(depth + 1, path, de2);
> +                }
> +            }
> +            de1++;
> +            de2++;
> +        } else {
> +            path[depth] = de2->name_offset;
> +            repo_git_add(depth + 1, path, de2++);
> +        }

Might be easier to read the other way around:

		if (de1->name_offset < de2->name_offset) {
			path[depth] = (de1++)->name_offset;
			fast_export_delete(depth + 1, path);
			continue;
		}
		if (de1->name_offset > de2->name_offset) {
			path[depth] = de2->name_offset;
			repo_git_add(depth + 1, path, de2++);
			continue;
		}
		... matching paths case ...

> +    }
> +    while (de1 < max_de1) {
> +        path[depth] = (de1++)->name_offset;
> +        fast_export_delete(depth + 1, path);
> +    }
> +    while (de2 < max_de2) {
> +        path[depth] = de2->name_offset;
> +        repo_git_add(depth + 1, path, de2++);
> +    }
> +}
[...]

> +void repo_commit(uint32_t revision, char *author, char *log, char *uuid,
> +                 char *url, time_t timestamp)
> +{
> +    if (revision == 0) {
> +        active_commit = commit_alloc(1);
> +        commit_pointer(active_commit)->root_dir_offset =
> +            dir_with_dirents_alloc(0);
> +    } else {
> +        fast_export_commit(revision, author, log, uuid, url, timestamp);
> +    }
> +    num_dirs_saved = dir_pool.size;
> +    num_dirents_saved = dirent_pool.size;

I did not carefully trace the cases where repo_clone_dir may reuse a
dir.  I would be happier if someone else does.

The rest looked good to me, for what it’s worth.

Thanks,
Jonathan

  reply	other threads:[~2010-05-29 14:06 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-23 21:40 [PATCH 0/7] Import David's SVN exporter Ramkumar Ramachandra
2010-05-23 21:40 ` [WIP PATCH 1/7] Add skeleton remote helper for SVN Ramkumar Ramachandra
2010-05-23 21:40 ` [PATCH 2/7] Add cpp macro implementation of treaps Ramkumar Ramachandra
2010-05-29  7:18   ` Jonathan Nieder
2010-05-30  9:09     ` Ramkumar Ramachandra
2010-05-30  9:31       ` Jonathan Nieder
2010-05-30  9:33         ` Ramkumar Ramachandra
2010-05-23 21:40 ` [PATCH 3/7] Add buffer pool library Ramkumar Ramachandra
2010-05-24  7:47   ` Peter Baumann
2010-05-24 10:11     ` Ramkumar Ramachandra
2010-05-24 10:37       ` David Michael Barr
2010-05-29  8:51   ` Jonathan Nieder
2010-05-29 10:55     ` David Michael Barr
2010-05-23 21:40 ` [PATCH 4/7] Add a memory " Ramkumar Ramachandra
2010-05-29  9:06   ` Jonathan Nieder
2010-05-30  9:12     ` Ramkumar Ramachandra
2010-05-30  9:55       ` Jonathan Nieder
2010-05-30 10:51         ` Ramkumar Ramachandra
2010-05-23 21:40 ` [PATCH 5/7] Add API for string-specific memory pool Ramkumar Ramachandra
2010-05-29 11:38   ` Jonathan Nieder
2010-05-30  9:38     ` Ramkumar Ramachandra
2010-05-30 10:09       ` Jonathan Nieder
2010-05-30 16:52     ` Ramkumar Ramachandra
2010-05-23 21:40 ` [PATCH 6/7] Add SVN revision parser and exporter Ramkumar Ramachandra
2010-05-29 14:06   ` Jonathan Nieder [this message]
2010-05-30 15:58     ` Ramkumar Ramachandra
2010-05-23 21:40 ` [PATCH 7/7] Add handler for SVN dump Ramkumar Ramachandra
2010-05-30  8:59   ` Jonathan Nieder
2010-05-30 10:45     ` Ramkumar Ramachandra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100529140645.GA8338@progeny.tock \
    --to=jrnieder@gmail.com \
    --cc=artagnon@gmail.com \
    --cc=david.barr@cordelta.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.