All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Beller <stefanbeller@googlemail.com>
To: Johannes Sixt <j6t@kdbg.org>
Cc: git@vger.kernel.org, l.s.r@web.de, mfick@codeaurora.org,
	apelisse@gmail.com, Matthieu.Moy@grenoble-inp.fr,
	pclouds@gmail.com, iveqy@iveqy.com, gitster@pobox.com,
	mackyle@gmail.com
Subject: Re: [RFC PATCHv4] repack: rewrite the shell script in C.
Date: Tue, 20 Aug 2013 17:08:54 +0200	[thread overview]
Message-ID: <52138686.1070304@googlemail.com> (raw)
In-Reply-To: <52136F9C.6030308@kdbg.org>

[-- Attachment #1: Type: text/plain, Size: 15287 bytes --]

On 08/20/2013 03:31 PM, Johannes Sixt wrote:
> 
> Are the long forms of options your invention?

I tried to keep strong similarity with the shell script for
ease of review. In the shellscript the options where 
put in variables having these names, so for example there was:

	-f)	no_reuse=--no-reuse-delta ;;
	-F)	no_reuse=--no-reuse-object ;;

So I used these variable names as well in here. And as I assumed
the variables are meaningful in itself.

In the shell script they may be meaningful, but with the option
parser in the C version, I overlooked the possibility for 
--no-<option> being possible as you noted below.

Maybe we should inverse the logic and have the variables and options
called reuse-delta and being enabled by default.

> 
>> +        OPT_BOOL('a', "all", &pack_everything,
>> +                N_("pack everything in a single pack")),
>> +        OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
>> +                N_("same as -a, and turn unreachable objects loose")),
> 
> --all-but-loose does not express what the help text says. The long form
> of -A is --all --unpack-unreachable, so it is really just a short option
> for convenience. It does not need its own long form.

Ok, I'll keep that in mind, and will only use the varialbe tied to -A
to set the -a and --unpack-unreachable variable.

> 
>> +        OPT_BOOL('d', "delete-redundant", &delete_redundant,
>> +                N_("remove redundant packs, and run git-prune-packed")),
>> +        OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
>> +                N_("pass --no-reuse-delta to git-pack-objects")),
>> +        OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
>> +                N_("pass --no-reuse-object to git-pack-objects")),
> 
> Do we want to allow --no-no-reuse-delta and --no-no-reuse-object?

see above, I'd try not to.

> 
>> +        OPT_BOOL('n', NULL, &no_update_server_info,
>> +                N_("do not run git-update-server-info")),
> 
> No long option name?

This is also a negated option, so as above, maybe 
we could have --update_server_info and --no-update_server_info
respectively. Talking about the shortform then: Is it possible to
negate the shortform?

> 
>> +        OPT__QUIET(&quiet, N_("be quiet")),
>> +        OPT_BOOL('l', "local", &local,
>> +                N_("pass --local to git-pack-objects")),
> 
> Good.
> 
>> +        OPT_STRING(0, "unpack-unreachable", &unpack_unreachable,
>> N_("approxidate"),
>> +                N_("with -A, do not loosen objects older than this
>> Packing constraints")),
> 
> "Packing constraints" is a section heading, not a continuation of the
> previous help text.
> 
>> +        OPT_INTEGER(0, "window", &window,
>> +                N_("size of the window used for delta compression")),
> 
> This help text is suboptimal as the option is a count, not a "size" in
> the narrow sense. But that can be changed later (as it would affect
> other tools as well, I guess).
> 
>> +        OPT_INTEGER(0, "window-memory", &window_memory,
>> +                N_("same as the above, but limit memory size instead
>> of entries count")),
>> +        OPT_INTEGER(0, "depth", &depth,
>> +                N_("limits the maximum delta depth")),
>> +        OPT_INTEGER(0, "max-pack-size", &max_pack_size,
>> +                N_("maximum size of each packfile")),
>> +        OPT_END()
>> +    };
> 
> Good.
> 
>> +
>> +    git_config(repack_config, NULL);
>> +
>> +    argc = parse_options(argc, argv, prefix, builtin_repack_options,
>> +                git_repack_usage, 0);
>> +
>> +    sigchain_push_common(remove_pack_on_signal);
> 
> Good.
> 
>> +    packdir = mkpathdup("%s/pack", get_object_directory());
>> +    packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
> 
> Should this not be
> 
>     packdir = xstrdup(git_path("pack"));
>     packtmp = xstrdup(git_path("pack/.tmp-%d-pack", getpid()));
> 
> Perhaps make packdir and packtmp global so that the strings need not be
> duplicated in get_pack_filenames and remove_temporary_files?

ok

> 
>> +
>> +    remove_temporary_files();
> 
> Yes, the shell script had this. But is it really necessary?

Well I can drop it if it's not needed.
It actually should implement
	rm -f "$PACKTMP"-*
and then the trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15
as well.

> 
>> +
>> +    struct argv_array cmd_args = ARGV_ARRAY_INIT;
> 
> Declaration after statement.

will fix.

> 
>> +    argv_array_push(&cmd_args, "pack-objects");
>> +    argv_array_push(&cmd_args, "--keep-true-parents");
>> +    argv_array_push(&cmd_args, "--honor-pack-keep");
>> +    argv_array_push(&cmd_args, "--non-empty");
>> +    argv_array_push(&cmd_args, "--all");
>> +    argv_array_push(&cmd_args, "--reflog");
>> +
>> +    if (window)
>> +        argv_array_pushf(&cmd_args, "--window=%u", window);
>> +
>> +    if (window_memory)
>> +        argv_array_pushf(&cmd_args, "--window-memory=%u",
>> window_memory);
>> +
>> +    if (depth)
>> +        argv_array_pushf(&cmd_args, "--depth=%u", depth);
>> +
>> +    if (max_pack_size)
>> +        argv_array_pushf(&cmd_args, "--max_pack_size=%u",
>> max_pack_size);
>> +
>> +    if (no_reuse_delta)
>> +        argv_array_pushf(&cmd_args, "--no-reuse-delta");
>> +
>> +    if (no_reuse_object)
>> +        argv_array_pushf(&cmd_args, "--no-reuse-object");
> 
> no_reuse_delta and no_reuse_object are mutually exclusive, according to
> the shell script version.

I'll change it then to OPT_BIT and die() when both are set.

> 
>> +
>> +    if (pack_everything + pack_everything_but_loose == 0) {
>> +        argv_array_push(&cmd_args, "--unpacked");
>> +        argv_array_push(&cmd_args, "--incremental");
>> +    } else {
>> +        struct string_list fname_list = STRING_LIST_INIT_DUP;
>> +        get_pack_filenames(packdir, &fname_list);
>> +        for_each_string_list_item(item, &fname_list) {
>> +            char *fname;
>> +            fname = mkpathdup("%s/%s.keep", packdir, item->string);
>> +            if (stat(fname, &statbuffer) &&
>> S_ISREG(statbuffer.st_mode)) {
> 
>             if (!stat(fname, &statbuffer) && ...
> 
> But you are using file_exists() later. That should be good enough here
> as well, no?

will do.

> 
>> +                /* when the keep file is there, we're ignoring that
>> pack */
>> +            } else {
>> +                string_list_append(&existing_packs, item->string);
>> +            }
>> +            free(fname);
>> +        }
>> +
>> +        if (existing_packs.nr && delete_redundant) {
>> +            if (unpack_unreachable)
>> +                argv_array_pushf(&cmd_args,
>> "--unpack-unreachable=%s", unpack_unreachable);
>> +            else if (pack_everything_but_loose)
>> +                argv_array_push(&cmd_args, "--unpack-unreachable");
>> +        }
>> +    }
>> +
>> +    if (local)
>> +        argv_array_push(&cmd_args,  "--local");
>> +    if (quiet)
>> +        argv_array_push(&cmd_args,  "--quiet");
>> +    if (delta_base_offset)
>> +        argv_array_push(&cmd_args,  "--delta-base-offset");
>> +
>> +    argv_array_push(&cmd_args, packtmp);
> 
> Otherwise, argument setup looks fine.
> 
>> +
>> +    memset(&cmd, 0, sizeof(cmd));
>> +    cmd.argv = argv_array_detach(&cmd_args, NULL);
> 
> Is it necessary to detach the arguments?

Probably not. 

> 
>> +    cmd.git_cmd = 1;
>> +    cmd.out = -1;
>> +    cmd.no_stdin = 1;
>> +
>> +    if (run_command(&cmd))
>> +        return 1;
> 
> You cannot run_command() and then later read its output! You must split
> it into start_command(), read stdout, finish_command().

Thanks for this hint. Could that explain rare non-deterministic failures in
the test suite?

> 
>> +
>> +    struct string_list names = STRING_LIST_INIT_DUP;
>> +    struct string_list rollback = STRING_LIST_INIT_DUP;
> 
> Declaration after statement.

will fix
> 
>> +
>> +    char line[1024];
>> +    int counter = 0;
>> +    FILE *out = xfdopen(cmd.out, "r");
>> +    while (fgets(line, sizeof(line), out)) {
>> +        /* a line consists of 40 hex chars + '\n' */
>> +        assert(strlen(line) == 41);
> 
> You cannot make assertions about input that you read from an external
> command! You can die() if the expectation is not met. But I think that
> in this case the only necessary expectation is that a line is not empty.
> 
> BTW, don't we have strbuf functions to read from an fd linewise?

I'll check.

> 
>> +        line[40] = '\0';
>> +        string_list_append(&names, line);
>> +        counter++;
>> +    }
>> +    if (!counter)
>> +        printf("Nothing new to pack.\n");
> 
> This was 'say Nothing new to pack.'. say obeys --quiet, IIRC.

ok
> 
>> +    fclose(out);
>> +
>> +    int failed = 0;
>> +    for_each_string_list_item(item, &names) {
>> +        for (ext = 0; ext < 1; ext++) {
>> +            char *fname, *fname_old;
>> +            fname = mkpathdup("%s/%s%s", packdir, item->string,
>> exts[ext]);
>> +            if (!file_exists(fname)) {
>> +                free(fname);
>> +                continue;
>> +            }
>> +
>> +            fname_old = mkpathdup("%s/old-%s%s", packdir,
>> item->string, exts[ext]);
> 
> If you could use git_path() instead of mkpathdup() in these two cases,
> we would not need to free() the names.
> 
>> +            if (file_exists(fname_old))
>> +                unlink(fname_old);
>> +
>> +            if (rename(fname, fname_old)) {
>> +                failed = 1;
>> +                break;
>> +            }
>> +            free(fname_old);
>> +            string_list_append_nodup(&rollback, fname);
> 
> Ah, we would need to allocate here then.
> 
>> +        }
>> +        if (failed)
>> +            /* set to last element to break for_each loop */
>> +            item = names.items + names.nr;
> 
> A mere
>             break;
> doesn't do it here?

Sure! I'll replace by break.

> 
>> +    }
>> +    if (failed) {
>> +        struct string_list rollback_failure;
>> +        for_each_string_list_item(item, &rollback) {
>> +            char *fname, *fname_old;
>> +            fname = mkpathdup("%s/%s", packdir, item->string);
>> +            fname_old = mkpathdup("%s/old-%s", packdir, item->string);
> 
> I think it's possible to attach arbitrary data to each string_list item.
> We could attach the "%s/old-%s" name to the item name, then we wouldn't
> need to re-construct the names here.

handy! I'll try to do that.

> 
>> +            if (rename(fname_old, fname))
>> +                string_list_append(&rollback_failure, fname);
>> +            free(fname);
>> +            free(fname_old);
>> +        }
>> +
>> +        if (rollback.nr) {
>> +            int i;
>> +            fprintf(stderr,
>> +                "WARNING: Some packs in use have been renamed by\n"
>> +                "WARNING: prefixing old- to their name, in order to\n"
>> +                "WARNING: replace them with the new version of the\n"
>> +                "WARNING: file.  But the operation failed, and the\n"
>> +                "WARNING: attempt to rename them back to their\n"
>> +                "WARNING: original names also failed.\n"
>> +                "WARNING: Please rename them in $PACKDIR manually:\n");
>> +            for (i = 0; i < rollback.nr; i++)
>> +                fprintf(stderr, "WARNING:   old-%s -> %s\n",
>> +                    rollback.items[i].string,
>> +                    rollback.items[i].string);
>> +        }
>> +        exit(1);
>> +    }
>> +
>> +    /* Now the ones with the same name are out of the way... */
>> +    for_each_string_list_item(item, &names) {
>> +        for (ext = 0; ext < 2; ext++) {
>> +            char *fname, *fname_old;
>> +            fname = mkpathdup("%s/pack-%s%s", packdir, item->string,
>> exts[ext]);
>> +            fname_old = mkpathdup("%s-%s%s", packtmp, item->string,
>> exts[ext]);
> 
> Same here: git_path()?
> 
>> +            stat(fname_old, &statbuffer);
> 
> We ignore errors during chmod in the shell script. But this doesn't give
> you license to ignore stat() errors completely: If stat() fails, then
> don't chmod() below, either.

ok

> 
>> +            statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
> 
>             statbuffer.st_mode &= ~(S_IWUSR|S_IWGRP|S_IWOTH);
> 
>> +            chmod(fname_old, statbuffer.st_mode);
>> +            if (rename(fname_old, fname))
>> +                die("Could not rename packfile: %s -> %s", fname_old,
>> fname);
> 
> Use die_errno() here.
> 
>> +            free(fname);
>> +            free(fname_old);
>> +        }
>> +    }
>> +
>> +    /* Remove the "old-" files */
>> +    for_each_string_list_item(item, &names) {
>> +        char *fname;
>> +        fname = mkpathdup("%s/old-pack-%s.idx", packdir, item->string);
>> +        if (remove_path(fname))
>> +            die("Could not remove file: %s", fname);
> 
> die_errno() makes sense here, too.
> 
>> +        free(fname);
>> +
>> +        fname = mkpathdup("%s/old-pack-%s.pack", packdir, item->string);
>> +        if (remove_path(fname))
>> +            die("Could not remove file: %s", fname);
> 
> and here as well.
> 
>> +        free(fname);
> 
> Again git_path?
> 
>> +    }
>> +
>> +    /* End of pack replacement. */
> 
> Nit: A blank line should follow this comment.
> 
>> +    if (delete_redundant) {
>> +        sort_string_list(&names);
>> +        for_each_string_list_item(item, &existing_packs) {
>> +            char *sha1;
>> +            size_t len = strlen(item->string);
>> +            if (len < 40)
>> +                continue;
>> +            sha1 = item->string + len - 40;
>> +            if (!string_list_has_string(&names, sha1))
>> +                remove_pack(packdir, item->string);
>> +        }
> 
> OK.
> 
>> +        argv_array_clear(&cmd_args);
>> +        argv_array_push(&cmd_args, "prune-packed");
>> +        if (quiet)
>> +            argv_array_push(&cmd_args, "--quiet");
>> +
>> +        memset(&cmd, 0, sizeof(cmd));
>> +        cmd.argv = argv_array_detach(&cmd_args, NULL);
> 
> Again: is it necessary to detach?
> 
>> +        cmd.git_cmd = 1;
>> +        run_command(&cmd);
>> +    }
>> +
>> +    if (!no_update_server_info) {
>> +        argv_array_clear(&cmd_args);
>> +        argv_array_push(&cmd_args, "update-server-info");
>> +
>> +        memset(&cmd, 0, sizeof(cmd));
>> +        cmd.argv = argv_array_detach(&cmd_args, NULL);
> 
> Same here?
> 
>> +        cmd.git_cmd = 1;
>> +        run_command(&cmd);
>> +    }
>> +    return 0;
>> +}
> 
> In my opinion, it is good that you keep a large function that resembles
> the structure of the shell script because it is easier to review. But
> ultimately, it should be factored into smaller functions.
> 
> -- Hannes
> 

Hannes,

thank you very much for the review. I'll follow your suggestions and dive
deeper into the API to change your annotated lines.

Thanks,
Stefan




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

  reply	other threads:[~2013-08-20 15:09 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-13 19:23 [PATCH] Rewriting git-repack in C Stefan Beller
2013-08-13 19:23 ` [PATCH] repack: rewrite the shell script " Stefan Beller
2013-08-14  7:26   ` Matthieu Moy
2013-08-14 16:26     ` Stefan Beller
2013-08-14 16:27       ` [RFC PATCH] " Stefan Beller
2013-08-14 16:49         ` Antoine Pelisse
2013-08-14 17:04           ` Stefan Beller
2013-08-14 17:19             ` Jeff King
2013-08-14 17:25           ` Martin Fick
2013-08-14 22:16             ` Stefan Beller
2013-08-14 22:28               ` Martin Fick
2013-08-14 22:53                 ` Junio C Hamano
2013-08-14 23:28                   ` Martin Fick
2013-08-15 17:15                     ` Junio C Hamano
2013-08-16  0:12                       ` [RFC PATCHv2] " Stefan Beller
2013-08-17 13:34                         ` René Scharfe
2013-08-17 19:18                           ` Kyle J. McKay
2013-08-18 14:34                           ` Stefan Beller
2013-08-18 14:36                             ` [RFC PATCHv3] " Stefan Beller
2013-08-18 15:41                               ` Kyle J. McKay
2013-08-18 16:44                               ` René Scharfe
2013-08-18 22:26                                 ` [RFC PATCHv4] " Stefan Beller
2013-08-19 23:23                                   ` Stefan Beller
2013-08-20 13:31                                     ` Johannes Sixt
2013-08-20 15:08                                       ` Stefan Beller [this message]
2013-08-20 18:38                                         ` Johannes Sixt
2013-08-20 18:57                                         ` René Scharfe
2013-08-20 22:36                                           ` Stefan Beller
2013-08-20 22:38                                             ` [PATCH] " Stefan Beller
2013-08-21  8:25                                               ` Jonathan Nieder
2013-08-21 10:37                                                 ` Stefan Beller
2013-08-21 17:25                                                 ` Stefan Beller
2013-08-21 17:28                                                   ` [RFC PATCHv6 1/2] " Stefan Beller
2013-08-21 17:28                                                     ` [RFC PATCHv6 2/2] repack: retain the return value of pack-objects Stefan Beller
2013-08-21 20:56                                                     ` [RFC PATCHv6 1/2] repack: rewrite the shell script in C Junio C Hamano
2013-08-21 21:52                                                       ` Matthieu Moy
2013-08-21 22:15                                                       ` Stefan Beller
2013-08-21 22:50                                                         ` Junio C Hamano
2013-08-21 22:57                                                           ` Stefan Beller
2013-08-22 10:46                                                         ` Johannes Sixt
2013-08-22 21:03                                                       ` Jonathan Nieder
2013-08-21  8:49                                               ` [PATCH] " Matthieu Moy
2013-08-21 12:47                                                 ` Stefan Beller
2013-08-21 13:05                                                   ` Matthieu Moy
2013-08-21 12:53                                                 ` Stefan Beller
2013-08-21 13:07                                                   ` Matthieu Moy
2013-08-22 10:46                                                     ` Johannes Sixt
2013-08-22 10:46                                                 ` Johannes Sixt
2013-08-22 20:06                                                   ` [PATCH] repack: rewrite the shell script in C (squashing proposal) Stefan Beller
2013-08-22 20:31                                                     ` Junio C Hamano
2013-08-20 22:46                                             ` [RFC PATCHv4] repack: rewrite the shell script in C Jonathan Nieder
2013-08-21  9:20                                             ` Johannes Sixt
2013-08-20 21:24                                       ` Stefan Beller
2013-08-20 21:34                                         ` Jonathan Nieder
2013-08-20 21:40                                           ` Dokumenting api-paths.txt Stefan Beller
2013-08-20 21:59                                             ` Jonathan Nieder
2013-08-21 22:43                                               ` Stefan Beller
2013-08-22 17:29                                                 ` Junio C Hamano
2013-08-14 22:51               ` [RFC PATCH] repack: rewrite the shell script in C Junio C Hamano
2013-08-14 22:59                 ` Matthieu Moy
2013-08-15  7:47                   ` Stefan Beller
2013-08-15  4:15             ` Duy Nguyen
2013-08-14 17:26           ` Junio C Hamano
2013-08-14 22:51           ` Matthieu Moy
2013-08-14 23:25             ` Martin Fick
2013-08-15  0:26               ` Martin Fick
2013-08-15  7:46               ` Stefan Beller
2013-08-15 15:04                 ` Martin Fick
2013-08-15  4:20             ` Duy Nguyen
2013-08-14 17:04         ` Junio C Hamano
2013-08-15  7:53           ` Stefan Beller
2013-08-14  7:12 ` [PATCH] Rewriting git-repack " Matthieu Moy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52138686.1070304@googlemail.com \
    --to=stefanbeller@googlemail.com \
    --cc=Matthieu.Moy@grenoble-inp.fr \
    --cc=apelisse@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=iveqy@iveqy.com \
    --cc=j6t@kdbg.org \
    --cc=l.s.r@web.de \
    --cc=mackyle@gmail.com \
    --cc=mfick@codeaurora.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.