Re: [PATCH v3 1/6] run-command: add pipe_output_fn to run_processes_parallel_opts

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Calvin Wan <calvinwan@google.com>
Cc: git@vger.kernel.org, emilyshaffer@google.com, phillip.wood123@gmail.com
Subject: Re: [PATCH v3 1/6] run-command: add pipe_output_fn to run_processes_parallel_opts
Date: Fri, 21 Oct 2022 05:11:43 +0200	[thread overview]
Message-ID: <221021.86h6zxg8ds.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <20221020232532.1128326-2-calvinwan@google.com>


On Thu, Oct 20 2022, Calvin Wan wrote:

> Add pipe_output_fn as an optionally set function in
> run_process_parallel_opts. If set, output from each child process is
> first separately stored in 'out' and then piped to the callback
> function when the child process finishes to allow for separate parsing.

The "when[...]finish[ed]" here seems a bit odd to me. Why isn't the API
to just stream this to callbacks as it comes in.

Then if a caller only cares about the output at the very end they can
manage that state between their streaming callbacks and "finish"
callback, i.e. buffer it & flush it themselves.

> diff --git a/run-command.c b/run-command.c
> index c772acd743..03787bc7f5 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1503,6 +1503,7 @@ struct parallel_processes {
>  		enum child_state state;
>  		struct child_process process;
>  		struct strbuf err;
> +		struct strbuf out;
>  		void *data;
>  	} *children;
>  	/*
> @@ -1560,6 +1561,9 @@ static void pp_init(struct parallel_processes *pp,
>  
>  	if (!opts->get_next_task)
>  		BUG("you need to specify a get_next_task function");
> +	
> +	if (opts->pipe_output && opts->ungroup)
> +		BUG("pipe_output and ungroup are incompatible with each other");
>  
>  	CALLOC_ARRAY(pp->children, n);
>  	if (!opts->ungroup)
> @@ -1567,6 +1571,8 @@ static void pp_init(struct parallel_processes *pp,
>  
>  	for (size_t i = 0; i < n; i++) {
>  		strbuf_init(&pp->children[i].err, 0);
> +		if (opts->pipe_output)
> +			strbuf_init(&pp->children[i].out, 0);

Even if we're not using this, let's init it for simplicity. We don't use
the "err" with ungroup and we're init-ing that, and...

>  		child_process_init(&pp->children[i].process);
>  		if (pp->pfd) {
>  			pp->pfd[i].events = POLLIN | POLLHUP;
> @@ -1586,6 +1592,7 @@ static void pp_cleanup(struct parallel_processes *pp,
>  	trace_printf("run_processes_parallel: done");
>  	for (size_t i = 0; i < opts->processes; i++) {
>  		strbuf_release(&pp->children[i].err);
> +		strbuf_release(&pp->children[i].out);

...here you're strbuf_relese()-ing a string that was never init'd, it's
not segfaulting because we check sb->alloc, and since we calloc'd this
whole thing it'll be 0, but let's just init it so it's a proper strbuf
(with slopbuf). It's cheap.

> +/**
> + * This callback is called on every child process that finished processing.
> + * 
> + * "struct strbuf *process_out" contains the output from the finished child
> + * process.
> + *
> + * pp_cb is the callback cookie as passed into run_processes_parallel,
> + * pp_task_cb is the callback cookie as passed into get_next_task_fn.
> + *
> + * This function is incompatible with "ungroup"
> + */
> +typedef void (*pipe_output_fn)(struct strbuf *process_out,
> +			       void *pp_cb,
> +			       void *pp_task_cb);
> +
>  /**
>   * This callback is called on every child process that finished processing.
>   *
> @@ -493,6 +508,12 @@ struct run_process_parallel_opts
>  	 */
>  	start_failure_fn start_failure;
>  
> +	/**
> +	 * pipe_output: See pipe_output_fn() above. This should be
> +	 * NULL unless process specific output is needed
> +	 */
> +	pipe_output_fn pipe_output;
> +
>  	/**
>  	 * task_finished: See task_finished_fn() above. This can be
>  	 * NULL to omit any special handling.
> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
> index 3ecb830f4a..e9b41419a0 100644
> --- a/t/helper/test-run-command.c
> +++ b/t/helper/test-run-command.c
> @@ -52,6 +52,13 @@ static int no_job(struct child_process *cp,
>  	return 0;
>  }
>  
> +static void pipe_output(struct strbuf *process_out,
> +			void *pp_cb,
> +			void *pp_task_cb)
> +{
> +	fprintf(stderr, "%s", process_out->buf);

maybe print this with split lines prefixed with something so wour tests
can see that something actually happened here, & test-cmp it so we can
see what went where, as opposed to...

> +test_expect_success 'run_command runs in parallel with more jobs available than tasks --pipe-output' '
> +	test-tool run-command --pipe-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
> +	test_must_be_empty out &&
> +	test_line_count = 20 err
> +'

Just checking the number of lines, which seems to leave a lot of leeway
for the output being mixed up in all sorts of ways & the test to still
pass..

(ditto below)

next prev parent reply	other threads:[~2022-10-21  3:31 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/>
2022-10-20 23:25 ` [PATCH v3 0/6] submodule: parallelize diff Calvin Wan
2022-10-20 23:25 ` [PATCH v3 1/6] run-command: add pipe_output_fn to run_processes_parallel_opts Calvin Wan
2022-10-21  3:11   ` Ævar Arnfjörð Bjarmason [this message]
2022-10-24 17:13     ` Calvin Wan
2022-10-21  5:46   ` Junio C Hamano
2022-10-24 17:00     ` Calvin Wan
2022-10-24 19:04       ` Junio C Hamano
2022-10-25 18:51         ` Calvin Wan
2022-10-20 23:25 ` [PATCH v3 2/6] run-command: add hide_output " Calvin Wan
2022-10-21  2:54   ` Ævar Arnfjörð Bjarmason
2022-10-24 19:24     ` Calvin Wan
2022-10-25 19:32       ` Ævar Arnfjörð Bjarmason
2022-10-25 21:22         ` Calvin Wan
2022-10-20 23:25 ` [PATCH v3 3/6] submodule: strbuf variable rename Calvin Wan
2022-10-20 23:25 ` [PATCH v3 4/6] submodule: move status parsing into function Calvin Wan
2022-10-20 23:25 ` [PATCH v3 5/6] diff-lib: refactor match_stat_with_submodule Calvin Wan
2022-10-20 23:25 ` [PATCH v3 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan
2022-10-21  1:13   ` Ævar Arnfjörð Bjarmason
2022-11-03 21:16     ` Calvin Wan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=221021.86h6zxg8ds.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=calvinwan@google.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=phillip.wood123@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).