All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Calvin Wan <calvinwan@google.com>
Cc: git@vger.kernel.org, emilyshaffer@google.com, phillip.wood123@gmail.com
Subject: Re: [PATCH v3 1/6] run-command: add pipe_output_fn to run_processes_parallel_opts
Date: Fri, 21 Oct 2022 05:11:43 +0200	[thread overview]
Message-ID: <221021.86h6zxg8ds.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <20221020232532.1128326-2-calvinwan@google.com>


On Thu, Oct 20 2022, Calvin Wan wrote:

> Add pipe_output_fn as an optionally set function in
> run_process_parallel_opts. If set, output from each child process is
> first separately stored in 'out' and then piped to the callback
> function when the child process finishes to allow for separate parsing.

The "when[...]finish[ed]" here seems a bit odd to me. Why isn't the API
to just stream this to callbacks as it comes in.

Then if a caller only cares about the output at the very end they can
manage that state between their streaming callbacks and "finish"
callback, i.e. buffer it & flush it themselves.

> diff --git a/run-command.c b/run-command.c
> index c772acd743..03787bc7f5 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1503,6 +1503,7 @@ struct parallel_processes {
>  		enum child_state state;
>  		struct child_process process;
>  		struct strbuf err;
> +		struct strbuf out;
>  		void *data;
>  	} *children;
>  	/*
> @@ -1560,6 +1561,9 @@ static void pp_init(struct parallel_processes *pp,
>  
>  	if (!opts->get_next_task)
>  		BUG("you need to specify a get_next_task function");
> +	
> +	if (opts->pipe_output && opts->ungroup)
> +		BUG("pipe_output and ungroup are incompatible with each other");
>  
>  	CALLOC_ARRAY(pp->children, n);
>  	if (!opts->ungroup)
> @@ -1567,6 +1571,8 @@ static void pp_init(struct parallel_processes *pp,
>  
>  	for (size_t i = 0; i < n; i++) {
>  		strbuf_init(&pp->children[i].err, 0);
> +		if (opts->pipe_output)
> +			strbuf_init(&pp->children[i].out, 0);

Even if we're not using this, let's init it for simplicity. We don't use
the "err" with ungroup and we're init-ing that, and...

>  		child_process_init(&pp->children[i].process);
>  		if (pp->pfd) {
>  			pp->pfd[i].events = POLLIN | POLLHUP;
> @@ -1586,6 +1592,7 @@ static void pp_cleanup(struct parallel_processes *pp,
>  	trace_printf("run_processes_parallel: done");
>  	for (size_t i = 0; i < opts->processes; i++) {
>  		strbuf_release(&pp->children[i].err);
> +		strbuf_release(&pp->children[i].out);

...here you're strbuf_relese()-ing a string that was never init'd, it's
not segfaulting because we check sb->alloc, and since we calloc'd this
whole thing it'll be 0, but let's just init it so it's a proper strbuf
(with slopbuf). It's cheap.

> +/**
> + * This callback is called on every child process that finished processing.
> + * 
> + * "struct strbuf *process_out" contains the output from the finished child
> + * process.
> + *
> + * pp_cb is the callback cookie as passed into run_processes_parallel,
> + * pp_task_cb is the callback cookie as passed into get_next_task_fn.
> + *
> + * This function is incompatible with "ungroup"
> + */
> +typedef void (*pipe_output_fn)(struct strbuf *process_out,
> +			       void *pp_cb,
> +			       void *pp_task_cb);
> +
>  /**
>   * This callback is called on every child process that finished processing.
>   *
> @@ -493,6 +508,12 @@ struct run_process_parallel_opts
>  	 */
>  	start_failure_fn start_failure;
>  
> +	/**
> +	 * pipe_output: See pipe_output_fn() above. This should be
> +	 * NULL unless process specific output is needed
> +	 */
> +	pipe_output_fn pipe_output;
> +
>  	/**
>  	 * task_finished: See task_finished_fn() above. This can be
>  	 * NULL to omit any special handling.
> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
> index 3ecb830f4a..e9b41419a0 100644
> --- a/t/helper/test-run-command.c
> +++ b/t/helper/test-run-command.c
> @@ -52,6 +52,13 @@ static int no_job(struct child_process *cp,
>  	return 0;
>  }
>  
> +static void pipe_output(struct strbuf *process_out,
> +			void *pp_cb,
> +			void *pp_task_cb)
> +{
> +	fprintf(stderr, "%s", process_out->buf);

maybe print this with split lines prefixed with something so wour tests
can see that something actually happened here, & test-cmp it so we can
see what went where, as opposed to...

> +test_expect_success 'run_command runs in parallel with more jobs available than tasks --pipe-output' '
> +	test-tool run-command --pipe-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
> +	test_must_be_empty out &&
> +	test_line_count = 20 err
> +'

Just checking the number of lines, which seems to leave a lot of leeway
for the output being mixed up in all sorts of ways & the test to still
pass..

(ditto below)

  reply	other threads:[~2022-10-21  3:31 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/>
2022-10-20 23:25 ` [PATCH v3 0/6] submodule: parallelize diff Calvin Wan
2022-10-20 23:25 ` [PATCH v3 1/6] run-command: add pipe_output_fn to run_processes_parallel_opts Calvin Wan
2022-10-21  3:11   ` Ævar Arnfjörð Bjarmason [this message]
2022-10-24 17:13     ` Calvin Wan
2022-10-21  5:46   ` Junio C Hamano
2022-10-24 17:00     ` Calvin Wan
2022-10-24 19:04       ` Junio C Hamano
2022-10-25 18:51         ` Calvin Wan
2022-10-20 23:25 ` [PATCH v3 2/6] run-command: add hide_output " Calvin Wan
2022-10-21  2:54   ` Ævar Arnfjörð Bjarmason
2022-10-24 19:24     ` Calvin Wan
2022-10-25 19:32       ` Ævar Arnfjörð Bjarmason
2022-10-25 21:22         ` Calvin Wan
2022-10-20 23:25 ` [PATCH v3 3/6] submodule: strbuf variable rename Calvin Wan
2022-10-20 23:25 ` [PATCH v3 4/6] submodule: move status parsing into function Calvin Wan
2022-10-20 23:25 ` [PATCH v3 5/6] diff-lib: refactor match_stat_with_submodule Calvin Wan
2022-10-20 23:25 ` [PATCH v3 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan
2022-10-21  1:13   ` Ævar Arnfjörð Bjarmason
2022-11-03 21:16     ` Calvin Wan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=221021.86h6zxg8ds.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=calvinwan@google.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=phillip.wood123@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.