From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Calvin Wan <calvinwan@google.com>
Cc: git@vger.kernel.org, emilyshaffer@google.com, phillip.wood123@gmail.com
Subject: Re: [PATCH v3 1/6] run-command: add pipe_output_fn to run_processes_parallel_opts
Date: Fri, 21 Oct 2022 05:11:43 +0200 [thread overview]
Message-ID: <221021.86h6zxg8ds.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <20221020232532.1128326-2-calvinwan@google.com>
On Thu, Oct 20 2022, Calvin Wan wrote:
> Add pipe_output_fn as an optionally set function in
> run_process_parallel_opts. If set, output from each child process is
> first separately stored in 'out' and then piped to the callback
> function when the child process finishes to allow for separate parsing.
The "when[...]finish[ed]" here seems a bit odd to me. Why isn't the API
to just stream this to callbacks as it comes in.
Then if a caller only cares about the output at the very end they can
manage that state between their streaming callbacks and "finish"
callback, i.e. buffer it & flush it themselves.
> diff --git a/run-command.c b/run-command.c
> index c772acd743..03787bc7f5 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1503,6 +1503,7 @@ struct parallel_processes {
> enum child_state state;
> struct child_process process;
> struct strbuf err;
> + struct strbuf out;
> void *data;
> } *children;
> /*
> @@ -1560,6 +1561,9 @@ static void pp_init(struct parallel_processes *pp,
>
> if (!opts->get_next_task)
> BUG("you need to specify a get_next_task function");
> +
> + if (opts->pipe_output && opts->ungroup)
> + BUG("pipe_output and ungroup are incompatible with each other");
>
> CALLOC_ARRAY(pp->children, n);
> if (!opts->ungroup)
> @@ -1567,6 +1571,8 @@ static void pp_init(struct parallel_processes *pp,
>
> for (size_t i = 0; i < n; i++) {
> strbuf_init(&pp->children[i].err, 0);
> + if (opts->pipe_output)
> + strbuf_init(&pp->children[i].out, 0);
Even if we're not using this, let's init it for simplicity. We don't use
the "err" with ungroup and we're init-ing that, and...
> child_process_init(&pp->children[i].process);
> if (pp->pfd) {
> pp->pfd[i].events = POLLIN | POLLHUP;
> @@ -1586,6 +1592,7 @@ static void pp_cleanup(struct parallel_processes *pp,
> trace_printf("run_processes_parallel: done");
> for (size_t i = 0; i < opts->processes; i++) {
> strbuf_release(&pp->children[i].err);
> + strbuf_release(&pp->children[i].out);
...here you're strbuf_relese()-ing a string that was never init'd, it's
not segfaulting because we check sb->alloc, and since we calloc'd this
whole thing it'll be 0, but let's just init it so it's a proper strbuf
(with slopbuf). It's cheap.
> +/**
> + * This callback is called on every child process that finished processing.
> + *
> + * "struct strbuf *process_out" contains the output from the finished child
> + * process.
> + *
> + * pp_cb is the callback cookie as passed into run_processes_parallel,
> + * pp_task_cb is the callback cookie as passed into get_next_task_fn.
> + *
> + * This function is incompatible with "ungroup"
> + */
> +typedef void (*pipe_output_fn)(struct strbuf *process_out,
> + void *pp_cb,
> + void *pp_task_cb);
> +
> /**
> * This callback is called on every child process that finished processing.
> *
> @@ -493,6 +508,12 @@ struct run_process_parallel_opts
> */
> start_failure_fn start_failure;
>
> + /**
> + * pipe_output: See pipe_output_fn() above. This should be
> + * NULL unless process specific output is needed
> + */
> + pipe_output_fn pipe_output;
> +
> /**
> * task_finished: See task_finished_fn() above. This can be
> * NULL to omit any special handling.
> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
> index 3ecb830f4a..e9b41419a0 100644
> --- a/t/helper/test-run-command.c
> +++ b/t/helper/test-run-command.c
> @@ -52,6 +52,13 @@ static int no_job(struct child_process *cp,
> return 0;
> }
>
> +static void pipe_output(struct strbuf *process_out,
> + void *pp_cb,
> + void *pp_task_cb)
> +{
> + fprintf(stderr, "%s", process_out->buf);
maybe print this with split lines prefixed with something so wour tests
can see that something actually happened here, & test-cmp it so we can
see what went where, as opposed to...
> +test_expect_success 'run_command runs in parallel with more jobs available than tasks --pipe-output' '
> + test-tool run-command --pipe-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
> + test_must_be_empty out &&
> + test_line_count = 20 err
> +'
Just checking the number of lines, which seems to leave a lot of leeway
for the output being mixed up in all sorts of ways & the test to still
pass..
(ditto below)
next prev parent reply other threads:[~2022-10-21 3:31 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/>
2022-10-20 23:25 ` [PATCH v3 0/6] submodule: parallelize diff Calvin Wan
2022-10-20 23:25 ` [PATCH v3 1/6] run-command: add pipe_output_fn to run_processes_parallel_opts Calvin Wan
2022-10-21 3:11 ` Ævar Arnfjörð Bjarmason [this message]
2022-10-24 17:13 ` Calvin Wan
2022-10-21 5:46 ` Junio C Hamano
2022-10-24 17:00 ` Calvin Wan
2022-10-24 19:04 ` Junio C Hamano
2022-10-25 18:51 ` Calvin Wan
2022-10-20 23:25 ` [PATCH v3 2/6] run-command: add hide_output " Calvin Wan
2022-10-21 2:54 ` Ævar Arnfjörð Bjarmason
2022-10-24 19:24 ` Calvin Wan
2022-10-25 19:32 ` Ævar Arnfjörð Bjarmason
2022-10-25 21:22 ` Calvin Wan
2022-10-20 23:25 ` [PATCH v3 3/6] submodule: strbuf variable rename Calvin Wan
2022-10-20 23:25 ` [PATCH v3 4/6] submodule: move status parsing into function Calvin Wan
2022-10-20 23:25 ` [PATCH v3 5/6] diff-lib: refactor match_stat_with_submodule Calvin Wan
2022-10-20 23:25 ` [PATCH v3 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan
2022-10-21 1:13 ` Ævar Arnfjörð Bjarmason
2022-11-03 21:16 ` Calvin Wan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=221021.86h6zxg8ds.gmgdl@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=calvinwan@google.com \
--cc=emilyshaffer@google.com \
--cc=git@vger.kernel.org \
--cc=phillip.wood123@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).