All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ramkumar Ramachandra <artagnon@gmail.com>
To: "dev@subversion.apache.org" <dev@subversion.apache.org>,
	Bert Huijben <rhuijben@collab.net>,
	Daniel Shahaf <d.s@daniel.shahaf.name>,
	Will Palmer <wmpalmer@gmail.com>,
	David Michael B
Subject: Re: [PATCH v2] Add svnrdump
Date: Thu, 15 Jul 2010 21:02:20 +0200	[thread overview]
Message-ID: <20100715190220.GI22574@debian> (raw)
In-Reply-To: <20100713201105.GN13310@ted.stsp.name>

Hi Stefan,

Stefan Sperling writes:
> Review below.

First, thanks for the detailed review! I'll be travelling over the
next few days starting tomorrow, and didn't want to delay the response
to your review: I've marked some items "TODO" so that I can grep for
them when I'm back in India and fix them.

> This diff is needed to build svnrdump as part of svn on Unix:
> 
> Index: build.conf
> ===================================================================
> --- build.conf	(revision 963733)
> +++ build.conf	(working copy)
> @@ -167,6 +167,13 @@ libs = libsvn_wc libsvn_subr apriconv apr
>  install = bin
>  manpages = subversion/svnversion/svnversion.1
>  
> +[svnrdump]
> +description = Subversion remote repository dumper
> +type = exe
> +path = subversion/svnrdump
> +libs = libsvn_client libsvn_ra libvsvn_delta libsvn_subr apr
> +install = bin
> +
>  # Support for GNOME Keyring
>  [libsvn_auth_gnome_keyring]
>  description = Subversion GNOME Keyring Library

Thanks! Now included.

> Can you include the above bit in your diff, please, and create follow-up
> diffs relative to the root of a Subversion trunk working copy? Thanks.

Ah, I wasn't paying attention. `git diff` always produces diffs
relative to the root.

> Please also add a man page similar to the one of svnsync.
> Even though I don't like the fact that our man pages simply refer to
> the Subversion book rather than providing a small and useful subset of it,
> it's good to at least be consistent about it.

Fixed.

> > +struct dump_edit_baton {
> 
> Please add a comment here explaining what the stream is used for,
> for instance /* The output stream we write the dumpfile to. */
> 
> > +  svn_stream_t *stream;

Fixed.

> > +  svn_revnum_t current_rev;
> 
> This is only incremented by never used?

Fixed.

> > +
> > +  /* pool is for per-edit-session allocations */
> > +  apr_pool_t *pool;
> > +
> > +  /* Store the properties that changed */
> > +  apr_hash_t *properties;
> > +  apr_hash_t *del_properties; /* Value is always 0x1 */
> 
> Just say "value is undefined". Or use an apr_array_header_t.

Fixed.

> A comment here saying what propstring is for would be nice.
> 
> > +  svn_stringbuf_t *propstring;

Fixed.

> > +
> > +  /* Was a copy command issued? */
> > +  svn_boolean_t is_copy;
> 
> Copy of what and when? This baton is global for the entire edit...
> 
> Going through the code, I see that you're using this to indicate to
> dump_node() whether an add_directory() or add_file() was in fact a copy.
> Why not remove this field from the struct and add it as a parameter to
> dump_node instead?

TODO.

> > +
> > +  /* Path of changed file */
> > +  const char *changed_path;
> 
> The changed_path field seems to be unused.
> 
> According to comments in open_file() and add_file(), change_file_prop()
> and apply_textdelta() should be using this but they aren't.

TODO.

> > +  /* Temporary file to write delta to along with its checksum */
> > +  char *temp_filepath;
> 
> That's a poor variable name. What about delta_abspath?

Fixed.

> > +  svn_checksum_t *checksum;
> 
> And rename this to delta_checksum?

Actually, I figured this wasn't used and removed it.

> > +
> > +  /* Flags to trigger dumping props and text */
> > +  svn_boolean_t must_dump_props;
> > +  svn_boolean_t must_dump_text;
> 
> I'd call these dump_props and dump_text, but that's a matter of taste.

Fixed.

> > +  svn_boolean_t dump_props_pending;
> > +};
> > +
> > +struct dir_baton {
> > +  struct dump_edit_baton *eb;
> > +  struct dir_baton *parent_dir_baton;
> > +
> > +  /* is this directory a new addition to this revision? */
> > +  svn_boolean_t added;
> > +
> > +  /* has this directory been written to the output stream? */
> > +  svn_boolean_t written_out;
> > +
> > +  /* the absolute path to this directory */
> > +  const char *path;
> 
> In code written post-svn-1.6, we usually call absolute paths
> something_abspath. E.g. local_abspath is an absolute path in a
> filesystem on the client (but not in the repository).
> 
> Elsewhere, this is called 'full_path' which would be fine name
> for this field, too. (Though any full_path variable you see in svn
> most likely pre-dates the *_abspath convention.)

Renamed path to abspath.

> > +
> > +  /* the comparison path and revision of this directory.  if both of
> > +     these are valid, use them as a source against which to compare
> > +     the directory instead of the default comparison source of PATH in
> > +     the previous revision. */
> > +  const char *cmp_path;
> > +  svn_revnum_t cmp_rev;
> 
> These just seem to be used as regular copyfrom info, so let's name them
> as such: copyfrom_path and copyfrom_rev
> Then you can also shrink the comment above cause everyone knows what
> copyfrom info is: /* Copyfrom info for the node, if any. */

Fixed.

> > +
> > +  /* hash of paths that need to be deleted, though some -might- be
> > +     replaced.  maps const char * paths to this dir_baton.  (they're
> > +     full paths, because that's what the editor driver gives us.  but
> > +     really, they're all within this directory.) */
> > +  apr_hash_t *deleted_entries;
> 
> This is very well commented and well named.
> 
> > +
> > +  /* pool to be used for deleting the hash items */
> > +  apr_pool_t *pool;
> 
> Hmmm.. a pool does not delete anything. It provides storage.
> Why do you need this?

TODO.

> > +};
> > +
> > +struct handler_baton
> > +{
> > +  svn_txdelta_window_handler_t apply_handler;
> > +  void *apply_baton;
> > +  apr_pool_t *pool;
> 
> Yet another pool. What's it for?

See window_handler below :)

> > +
> > +  /* Information about the path of the tempoarary file used */
> 
> s/tempoarary/temporary/

Fixed.

> > +  char *temp_filepath;
> > +  apr_file_t *temp_file;
> > +  svn_stream_t *temp_filestream;
> 
> What's the temporary file used for? You're writing a delta to it,
> so maybe name it accordingly?

Fixed.

> You need the file name to stat it in close_file().
> You need the stream in the baton to write to the file.
> But you don't need the apr_file.
> Open the file. Then wrap it in the stream with disown=FALSE, and pass
> just the stream to the window handler via the baton. When the stream
> is closed, the file will be closed as well.

TODO. I want to validate and make sure that I don't break anything
else or leak memory before performing this change.

> > +
> > +  /* To fill in the edit baton fields */
> > +  struct dump_edit_baton *eb;
> 
> Just say /* Global edit baton. */ or even drop the comment.

Fixed.

> > +};
> > +
> 
> Needs a docstring.

Fixed. I noticed one more mistake: to_rev is unused. Looks like
there's a LOT of historical crufts that I didn't clean up. I wonder
why the compiler doesn't tell me about all this.

> > +/* Make a directory baton to represent the directory was path
> > +   (relative to EDIT_BATON's path) is PATH.
> 
> The above sentence doesn't parse.
> 
> > +
> > +   CMP_PATH/CMP_REV are the path/revision against which this directory
> > +   should be compared for changes.  If either is omitted (NULL for the
> > +   path, SVN_INVALID_REVNUM for the rev), just compare this directory
> > +   PATH against itself in the previous revision.
> 
> s/CMP/COPYFROM/ and tweak the docstring to say something like:
>   If the copyfrom information is valid, the directory will be compared
>   against its copy source. Else, it will be compared against itself in
>   the previous revision.

Another historical cruft: since svnrdump doesn't support non-deltified
dumps and doesn't have fs backing, I can't and don't ever (need to)
compare it against itself in the previous revision.

> > +   PARENT_DIR_BATON is the directory baton of this directory's parent,
> > +   or NULL if this is the top-level directory of the edit.  ADDED
> > +   indicated if this directory is newly added in this revision.
> 
> s/indicated/indicates/

Fixed.

> > +  apr_array_header_t *compose_path = apr_array_make(pool, 2,
> > +						    sizeof(const char *));
> 
> The above line contains tabs, please replace with spaces.
> And make sure to align function arguments like this (not sure if
> they appeared aligned in your editor or not):

Fixed.

> > +  if (pb) {
> 
> I've told you this on IRC before, but just for sake of completeness:
> Virtually all if blocks and loops in this patch have a "wrong" style
> of indentation.
> See
> http://subversion.apache.org/docs/community-guide/conventions.html#coding-style
> 
> (I personally prefer the indentation style you're using,
> but the project convention has been set looooong ago -- such is life.)

Ah, I missed this earlier- you have an svn-dev.el: I'll use it to fix
everything before re-submitting.

> > +    APR_ARRAY_PUSH(compose_path, const char *) = "/";
> > +    APR_ARRAY_PUSH(compose_path, const char *) = path;
> > +    full_path = svn_path_compose(compose_path, pool);
> 
> See svn_dirent_join_many().

Fixed.

> > +  }
> > +  else
> > +    full_path = apr_pstrdup(pool, "/");
> 
> Why allocate "/" in a pool? This can be static string unless you
> intend to write to it.

Frankly, working with APR pools was quite a nightmare for me- after
observing many cases of leaks and crashes, I jotted down some notes
about using them and I made it a point to follow them strictly. This
alloc adheres to those notes. I'll submit those notes to dev@ once
I've polished them- new devs will probably find it useful.

> > +/*
> > + * Write out a node record for PATH of type KIND under EB->FS_ROOT.
> > + * ACTION describes what is happening to the node (see enum svn_node_action).
> > + * Write record to writable EB->STREAM, using EB->BUFFER to write in chunks.
> > + *
> > + * If the node was itself copied, IS_COPY is TRUE and the
> > + * path/revision of the copy source are in CMP_PATH/CMP_REV.  If
> > + * IS_COPY is FALSE, yet CMP_PATH/CMP_REV are valid, this node is part
> > + * of a copied subtree.
> 
> Again, s/CMP/COPYFROM/

Fixed.

> > + */
> > +static svn_error_t *
> > +dump_node(struct dump_edit_baton *eb,
> > +          const char *path,    /* an absolute path. */
> > +          svn_node_kind_t kind,
> > +          enum svn_node_action action,
> > +          const char *cmp_path,
> > +          svn_revnum_t cmp_rev,
> > +          apr_pool_t *pool)
> > +{
> > +  /* Write out metadata headers for this file node. */
> 
> The node might as well be a directory, so the above comment is misleading.

Fixed.

> > +  /* Remove leading slashes from copyfrom paths. */
> > +  if (cmp_path)
> > +    cmp_path = ((*cmp_path == '/') ? cmp_path + 1 : cmp_path);
> 
> What if the copyfrom path is "/"?
> (If memory serves me right we've had a bug like this before somewhere...)

Right, I copied this out from somewhere I think. Fixed with:
if (copyfrom_path && strcmp(copyfrom_path, "/"))

> > +
> > +  switch (action) {
> > +    /* Appropriately handle the four svn_node_action actions */
> 
> Nuke the above comment. Don't put numbers that may change some day
> into comments.

Okay. Fixed.

> > +    if (!eb->is_copy) {
> > +      /* eb->dump_props_pending for files is handled in
> > +         close_file which is called immediately.
> > +         However, directories are not closed until
> > +         all the work inside them have been done;
> 
> s/have been/has been/

Fixed.

> > +         eb->dump_props_pending for directories is
> > +         handled in all the functions that can
> > +         possibly be called after add_directory:
> > +         add_directory, open_directory,
> > +         delete_entry, close_directory, add_file,
> > +         open_file and change_dir_prop;
> > +         change_dir_prop is a special case
> > +         ofcourse */
> 
> Please re-format the above using longer lines (up to column 78).

Fixed.

> > +static svn_error_t *open_root(void *edit_baton,
> > +			      svn_revnum_t base_revision,
> > +			      apr_pool_t *pool,
> > +			      void **root_baton)
> 
> tabs in above 3 lines

Fixed.

> > +{
> > +  /* Allocate a special pool for the edit_baton to avoid pool
> > +     lifetime issues */
> 
> I think you don't need this comment because this is already
> sort of documented in the docstring for open_root() in svn_delta.h.

It took me a while to grasp this even after reading the documentation
of open_root, and I think others will find it useful. Didn't remove
comment.

> > +  if (val)
> > +    /* Delete the path, it's now been dumped */
> > +    apr_hash_set(pb->deleted_entries, path, APR_HASH_KEY_STRING, NULL);
> 
> You don't need to set the value to NULL in the hash table.
> Doing so won't save any memory. I've say just remove the above 3 lines.

Oh, I'm not doing it to save memory. Although I'm not sure if I still
need it in my logic, this definitely makes debugging nicer.

> tabs again in the above line

Fixed.

> > +    APR_ARRAY_PUSH(compose_path, const char *) =
> > +	    svn_relpath_basename(path, pool);
> 
> and here is another tab 

Fixed.

> > +    cmp_path = svn_path_compose(compose_path, pool);
> 
> Again, see svn_dirent_join_many().
> If you need a svn_relpath_join_many() for some reason please write one.

Fixed. Yes, svn_relpath_join_many sounds like a good idea: I'll mark
that as a TODO for now.

> > +static svn_error_t *
> > +close_directory(void *dir_baton,
> > +                apr_pool_t *pool)
> > +{
> > +  struct dir_baton *db = dir_baton;
> > +  struct dump_edit_baton *eb = db->eb;
> > +  apr_hash_index_t *hi;
> > +  apr_pool_t *subpool = svn_pool_create(pool);
> 
> Please call this iterpool, not subpool.
> You're using it in a loop (so we prefer "iteration pool").

Right. Fixed.

> > +
> > +  /* Some pending properties to dump? */
> > +  SVN_ERR(dump_props(eb, &(eb->dump_props_pending), TRUE, pool));
> > +
> > +  /* Dump the directory entries */
> > +  for (hi = apr_hash_first(pool, db->deleted_entries); hi;
> > +       hi = apr_hash_next(hi)) {
> > +    const void *key;
> > +    const char *path;
> > +    apr_hash_this(hi, &key, NULL, NULL);
> > +    path = key;
> 
> See svn__apr_hash_index_key().

TODO.

> > +  apr_array_header_t *compose_path = apr_array_make(pool, 2,
> > +						    sizeof(const char *));
> 
> tabs

Fixed.

> > +  /* If the parent directory has explicit comparison path and rev,
> 
> s/comparison/copyfrom/

Fixed.

> > +     record the same for this one. */
> > +  if (pb && ARE_VALID_COPY_ARGS(pb->cmp_path, pb->cmp_rev)) {
> > +    APR_ARRAY_PUSH(compose_path, const char *) = pb->cmp_path;
> > +    APR_ARRAY_PUSH(compose_path, const char *) =
> > +	    svn_relpath_basename(path, pool);
> 
> one more tab

Fixed.

> > +  /* Write information about the filepath to hb->eb */
> 
> s/to hb->eb/from the handler baton to the edit baton/

Er, I did mean `hb->eb` literally (the editor baton in the handler
baton).

> > +  /* Cleanup */
> > +  SVN_ERR(svn_io_file_close(hb->temp_file, hb->pool));
> 
> As described above, you don't need to close the file,
> closing the stream is enough.

TODO.

> > +  /* Custom handler_baton allocated in a separate pool */
> > +  apr_pool_t *handler_pool = svn_pool_create(pool);
> > +  struct handler_baton *hb = apr_pcalloc(handler_pool, sizeof(*hb));
> > +  hb->pool = handler_pool;
> 
> It sucks that the window handler does not get pool arguments, so
> you have to stick a pool in the baton. But that isn't your fault.

Exactly.

> > +  hb->eb = eb;
> > +
> > +  /* Use a temporary file to measure the text-content-length */
> > +  SVN_ERR(svn_io_temp_dir(&tempdir, hb->pool));
> > +
> > +  hb->temp_filepath = svn_dirent_join(tempdir, "XXXXXX", hb->pool);
> > +  apr_err = apr_file_mktemp(&(hb->temp_file), hb->temp_filepath,
> > +          APR_CREATE | APR_READ | APR_WRITE | APR_EXCL,
> > +          hb->pool);
> > +  if (apr_err != APR_SUCCESS)
> > +    SVN_ERR(svn_error_wrap_apr(apr_err, NULL));
> 
> You can replace the above chunk with a simple call to
> svn_io_open_unique_file3().

TODO. If I recall correctly, someone else also suggested this on IRC,
but there seems to be some issue with it: I'll check this later.

> > +    SVN_ERR(svn_io_file_open(&temp_file, eb->temp_filepath,APR_READ,
> > +			     0600,pool));
> 
> tabs again

Fixed.

> > +  struct dump_edit_baton *eb = apr_pcalloc(pool,
> > +					   sizeof(struct dump_edit_baton));
> 
> more tabs

Fixed.

> > +static int verbose = 0;
> > +static apr_pool_t *pool = NULL;
> > +static svn_client_ctx_t *ctx = NULL;
> 
> You're only using the client context in open_connection.
> Make it a local variable there?

I was actually worried about lifetime issues. If ctx won't be read/
written after open_connection, this is okay. Otherwise, not. TODO.

> > +  struct replay_baton *replay_baton = apr_palloc(pool,
> > +						 sizeof(struct replay_baton));
> 
> more tabs

Fixed.

> > +  fprintf(out_stream,
> 
> Use svn_cmdline_fprintf()

Fixed.

> > +    "usage: svnrdump URL [-r LOWER[:UPPER]]\n\n"
> 
> This string needs to be marked for localisation like this: _("my string")

TODO. I'm missing some header: _ is undefined.

> > +    "Dump the contents of repository at remote URL to stdout in a 'dumpfile'\n"
> > +    "v3 portable format.  Dump revisions LOWER rev through UPPER rev.\n"
> 
> You don't need to mention the dumpfile format version in the help
> string.

Okay. I need to mention somewhere that svnrdump doesn't support
undeltified dumps though, don't I?

> > +    "LOWER defaults to 1 and UPPER defaults to the highest possible revision\n"
> > +    "if omitted.\n");
> > +  for (i = 1; i < argc; i++) {
> 
> Please use svn_cmdline__getopt_init() and apr_getopt_long().
> See svnsync for an example.

Ouch. Don't you think it's an overkill for the current svnrdump? There
are no subcommands and just a few command-line arguments.

> Please add a docstring.
> 
> > +void
> > +write_hash_to_stringbuf(apr_hash_t *properties,
> > +                        svn_boolean_t deleted,
> > +                        svn_stringbuf_t **strbuf,
> > +                        apr_pool_t *pool);
> > +

Fixed.

> Please add a docstring.
> 
> > +svn_error_t *
> > +dump_props(struct dump_edit_baton *eb,
> > +           svn_boolean_t *trigger_var,
> > +           svn_boolean_t dump_data_too,
> > +           apr_pool_t *pool);
> > +
> > +#endif

Fixed. Doxygen-friendly docstrings are a TODO.

> > +#include "svn_pools.h"
> > +#include "svn_cmdline.h"
> > +#include "svn_client.h"
> > +#include "svn_ra.h"
> > +#include "svn_repos.h"
> 
> Are all these includes really needed?

It seems only svn_repos.h is needed. Fixed.

> > +void
> > +write_hash_to_stringbuf(apr_hash_t *properties,
> > +                        svn_boolean_t deleted,
> > +                        svn_stringbuf_t **strbuf,
> > +                        apr_pool_t *pool)
> > +{
> 
> This function needs a docstring, too.

Wait. I just need to write the docstrings once, right? In the header?

> And is there no function that already does this somewhere in the svn
> libraries or in APR?

No. I copied this out from subversion/svnrdump/svnrdump.c and
refactored it a little bit.

You can see the changes I made after your review in the most recent
couple of commits on my GitHub [1].

[1]: http://github.com/artagnon/svn-dump-fast-export/commits/svn-merge

-- Ram

  parent reply	other threads:[~2010-07-15 19:00 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-09 14:29 [PATCH v2] Add svnrdump Ramkumar Ramachandra
2010-07-13 20:11 ` Stefan Sperling
2010-07-14 15:32   ` Stefan Sperling
2010-07-14 16:01     ` Ramkumar Ramachandra
2010-07-14 16:48       ` C. Michael Pilato
2010-07-15 10:28         ` Ramkumar Ramachandra
2010-07-14 17:24       ` Stefan Sperling
2010-07-14 17:31         ` C. Michael Pilato
2010-07-14 17:34           ` Stefan Sperling
2010-07-14 17:47           ` Jonathan Nieder
2010-07-14 17:56             ` C. Michael Pilato
2010-07-15 12:01         ` Ramkumar Ramachandra
2010-07-14 19:25       ` Bert Huijben
2010-07-15 12:07         ` Ramkumar Ramachandra
2010-07-15 19:02   ` Ramkumar Ramachandra [this message]
2010-07-15 19:23     ` Stefan Sperling
2010-07-21 11:46       ` Ramkumar Ramachandra
2010-07-21 13:29         ` Daniel Shahaf
2010-07-21 19:03   ` Ramkumar Ramachandra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100715190220.GI22574@debian \
    --to=artagnon@gmail.com \
    --cc=d.s@daniel.shahaf.name \
    --cc=dev@subversion.apache.org \
    --cc=rhuijben@collab.net \
    --cc=wmpalmer@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.