All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Arthur Chan <arthur.chan@adalogics.com>
Subject: Re: [PATCH] fuzz: add basic fuzz testing for git command
Date: Tue, 13 Sep 2022 09:13:32 -0700	[thread overview]
Message-ID: <xmqqv8pr9rrn.fsf@gitster.g> (raw)
In-Reply-To: <pull.1351.git.1663078962231.gitgitgadget@gmail.com> (Arthur Chan via GitGitGadget's message of "Tue, 13 Sep 2022 14:22:42 +0000")

"Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  .gitignore        |   2 +
>  Makefile          |   2 +
>  fuzz-cmd-base.c   | 117 ++++++++++++++++++++++++++++++++++++++++++++++
>  fuzz-cmd-base.h   |  13 ++++++
>  fuzz-cmd-status.c |  68 +++++++++++++++++++++++++++
>  5 files changed, 202 insertions(+)
>  create mode 100644 fuzz-cmd-base.c
>  create mode 100644 fuzz-cmd-base.h
>  create mode 100644 fuzz-cmd-status.c

Just like we have t/ hierarchy for testing, if we plan to add more
fuzz-* related things on top of what we already have (like those
that can be seen in the context of this patch), I would prefer to
see a creation of fuzz/ hierarchy and move existing stuff there as
the first step before adding more.

And more fuzzing is good, if we can afford it ;-)

Thanks.

Even though I am not taking this patch as-is, let's give a cursory
look to make sure the future iteration can be more reviewable by
pointing out various CodingGuidelines issues.

> diff --git a/fuzz-cmd-base.c b/fuzz-cmd-base.c
> new file mode 100644
> index 00000000000..98f05c78372
> --- /dev/null
> +++ b/fuzz-cmd-base.c
> @@ -0,0 +1,117 @@
> +#include "cache.h"

Good to have this as the first thing.

> +#include "fuzz-cmd-base.h"
> +
> +
> +/*
> + * This function is used to randomize the content of a file with the
> + * random data. The random data normally come from the fuzzing engine
> + * LibFuzzer in order to create randomization of the git file worktree
> + * and possibly messing up of certain git config file to fuzz different
> + * git command execution logic.
> + */
> +void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size) {

Unlike other control structure with multiple statements in a block,
the surrounding braces {} around function block sit on their own
lines.  I.e.

    void randomize_git_file(char *dir, char *name, char *data_chunk, int data_size)
    {


> +   char fname[256];

In our codebase, tab-width is 8 and we indent with tabs.

Use <strbuf.h> and avoid snprintf(), e.g.

	struct strbuf fname = STRBUF_INIT;
	strbuf_addf(&fname, "%s/%s", dir, name);
	... use fname.buf ...
	strbuf_release(&fname);

> +   FILE *fp;
> +

Good that you leave a blank between the end of decl and the
beginning of the statements.

> +   snprintf(fname, 255, "%s/%s", dir, name);
> +
> +   fp = fopen(fname, "wb");
> +   if (fp) {
> +      fwrite(data_chunk, 1, data_size, fp);
> +      fclose(fp);
> +   }
> +}

Why doesn't this care about errors at all?  Not even fopen errors?

> +/*
> + * This function is the variants of the above functions which takes
> + * in a set of target files to be processed. These target file are

"... is a variant of the above function, which takes a set of ..."

> + * passing to the above function one by one for content rewrite.
> + */
> +void randomize_git_files(char *dir, char *name_set[], int files_count, char *data, int size) {
> +   int data_size = size / files_count;
> +
> +   for(int i=0; i<files_count; i++) {

We do not yet officially allow variable decl for for() statement
like this.  We'll start allowing it later this year but we are
waiting for oddball platform/compiler folks to scream right now.

IOW, we write the above more like so:

	int data_size = size / files_count;
	int i;

        for (i = 0; i < files_count; i++) {

Take also notice how we use whitespaces around non-unary operators.

> +      char *data_chunk = malloc(data_size);
> +      memcpy(data_chunk, data + (i * data_size), data_size);
> +      randomize_git_file(dir, name_set[i], data_chunk, data_size);
> +
> +      free(data_chunk);
> +   }

As data_size does not change in this loop and the contents of
data_chunk from each round is discardable, allocating once outside
may make more sense.  Actually, as the called function makes only
read-only accesses of data_chunk, I do not quite see why you need to
make a copy in the first place.

We do not use malloc() etc. directly out of the system; study wrapper.c
and find xmalloc() and friends.

What if size is not a multiple of files_count, by the way?

I'll stop here as we already have plenty above (read: it is not "I
didn't spot any problems in the patch after this point").

Thanks.

  parent reply	other threads:[~2022-09-13 17:26 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-13 14:22 [PATCH] fuzz: add basic fuzz testing for git command Arthur Chan via GitGitGadget
2022-09-13 15:57 ` Ævar Arnfjörð Bjarmason
2022-09-16 15:54   ` Arthur Chan
2022-09-13 16:13 ` Junio C Hamano [this message]
2022-09-16 16:06   ` Arthur Chan
2022-09-16 17:29 ` [PATCH v2] " Arthur Chan via GitGitGadget
2022-09-16 17:37   ` Junio C Hamano
2022-09-16 18:07     ` Arthur Chan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqv8pr9rrn.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=arthur.chan@adalogics.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.