* [PATCH] fuzz: add new oss-fuzz fuzzer for date.c / date.h @ 2023-11-11 17:39 Arthur Chan via GitGitGadget 2023-11-12 5:59 ` Junio C Hamano 2023-11-13 16:22 ` [PATCH v2] " Arthur Chan via GitGitGadget 0 siblings, 2 replies; 10+ messages in thread From: Arthur Chan via GitGitGadget @ 2023-11-11 17:39 UTC (permalink / raw) To: git; +Cc: Arthur Chan, Arthur Chan From: Arthur Chan <arthur.chan@adalogics.com> Signed-off-by: Arthur Chan <arthur.chan@adalogics.com> --- fuzz: add new oss-fuzz fuzzer for date.c / date.h This patch is aimed to add a new oss-fuzz fuzzer to the oss-fuzz directory for fuzzing date.c / date.h in the base directory. The .gitignore of the oss-fuzz directory and the Makefile have been modified to accommodate the new fuzzer fuzz-date.c. Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1612%2Farthurscchan%2Fnew-fuzzer-date-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1612/arthurscchan/new-fuzzer-date-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/1612 Makefile | 1 + oss-fuzz/.gitignore | 1 + oss-fuzz/fuzz-date.c | 75 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 77 insertions(+) create mode 100644 oss-fuzz/fuzz-date.c diff --git a/Makefile b/Makefile index 03adcb5a480..c9fe99a8c88 100644 --- a/Makefile +++ b/Makefile @@ -752,6 +752,7 @@ ETAGS_TARGET = TAGS FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o +FUZZ_OBJS += oss-fuzz/fuzz-date.o .PHONY: fuzz-objs fuzz-objs: $(FUZZ_OBJS) diff --git a/oss-fuzz/.gitignore b/oss-fuzz/.gitignore index 9acb74412ef..2375e77b108 100644 --- a/oss-fuzz/.gitignore +++ b/oss-fuzz/.gitignore @@ -1,3 +1,4 @@ fuzz-commit-graph fuzz-pack-headers fuzz-pack-idx +fuzz-date diff --git a/oss-fuzz/fuzz-date.c b/oss-fuzz/fuzz-date.c new file mode 100644 index 00000000000..29bcaf595e4 --- /dev/null +++ b/oss-fuzz/fuzz-date.c @@ -0,0 +1,75 @@ +#include "git-compat-util.h" +#include "date.h" + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) +{ + int type; + int time; + int num; + char *str; + timestamp_t ts; + enum date_mode_type dmtype; + struct date_mode *dm; + + if (size <= 8) + { + return 0; + } + + type = (*((int *)data)) % 8; + data += 4; + size -= 4; + + time = abs(*((int *)data)); + data += 4; + size -= 4; + + str = (char *)malloc(size+1); + if (!str) + { + return 0; + } + memcpy(str, data, size); + str[size] = '\0'; + + ts = approxidate_careful(str, &num); + free(str); + + switch(type) + { + case 0: default: + dmtype = DATE_NORMAL; + break; + case 1: + dmtype = DATE_HUMAN; + break; + case 2: + dmtype = DATE_SHORT; + break; + case 3: + dmtype = DATE_ISO8601; + break; + case 4: + dmtype = DATE_ISO8601_STRICT; + break; + case 5: + dmtype = DATE_RFC2822; + break; + case 6: + dmtype = DATE_RAW; + break; + case 7: + dmtype = DATE_UNIX; + break; + } + + dm = date_mode_from_type(dmtype); + dm->local = 1; + show_date(ts, time, dm); + + date_mode_release(dm); + + return 0; +} base-commit: dadef801b365989099a9929e995589e455c51fed -- gitgitgadget ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] fuzz: add new oss-fuzz fuzzer for date.c / date.h 2023-11-11 17:39 [PATCH] fuzz: add new oss-fuzz fuzzer for date.c / date.h Arthur Chan via GitGitGadget @ 2023-11-12 5:59 ` Junio C Hamano 2023-11-12 12:39 ` Junio C Hamano 2023-11-13 16:22 ` [PATCH v2] " Arthur Chan via GitGitGadget 1 sibling, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2023-11-12 5:59 UTC (permalink / raw) To: Arthur Chan via GitGitGadget; +Cc: git, Arthur Chan "Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes: > diff --git a/Makefile b/Makefile > index 03adcb5a480..c9fe99a8c88 100644 > --- a/Makefile > +++ b/Makefile > @@ -752,6 +752,7 @@ ETAGS_TARGET = TAGS > FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o > FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o > FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o > +FUZZ_OBJS += oss-fuzz/fuzz-date.o The same comment applies to .gitignore but I think the existing entries are sorted and fuzz-date should be added between fuzz-commit-graph and fuzz-pack-headers. > diff --git a/oss-fuzz/fuzz-date.c b/oss-fuzz/fuzz-date.c > new file mode 100644 > index 00000000000..29bcaf595e4 > --- /dev/null > +++ b/oss-fuzz/fuzz-date.c > @@ -0,0 +1,75 @@ > +#include "git-compat-util.h" > +#include "date.h" > + > +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); > + > +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) > +{ > + int type; > + int time; > + int num; > + char *str; > + timestamp_t ts; > + enum date_mode_type dmtype; > + struct date_mode *dm; > + > + if (size <= 8) > + { > + return 0; > + } How much do we care about sticking to our coding style for source files in this directory? If we do (and I do not see a strong reason not to), let's lose the {unneeded braces} around a single statement block. > + type = (*((int *)data)) % 8; > + data += 4; > + size -= 4; I'd prefer to avoid these hardcoded and unexplained constants. I think "8" relates to the number of case arms below? I am not sure if "4" is justifiable without "we assume everybody's int is four bytes long", but if that is what is going on, perhaps use uint32_t or something? Also, is data[] guaranteed to be aligned well to allow us to do the above? As we only need to spread to DATE_UNIX-1 types (because we exclude DATE_STRFTIME), it is sufficient to look at the lower nibble of a single byte. The upper nibble could be used to fuzz the .local bit if you wanted to, e.g. so I wonder local_bit = !!(*data & 0x10); dmtype = (enum date_mode_type)(*data % DATE_UNIX); if (dmtype == DATE_STRFTIME) return 0; data++; size--; > + time = abs(*((int *)data)); > + data += 4; > + size -= 4; Ditto. Rename "time" because the second parameter to show_date() is *not* "time" but "tz". The valid range of "tz" comfortably fits in 16-bit signed int, but note that there are valid negative values in the range. Are we assuming that the "tz" is attacker controlled? Why are you limiting its value to non-negative, yet you are not rejecting absurd timezone offsets? Good values lie in a range much narrower than between -2400 and 2400. Subjecting "tz" to fuzzer is perfectly fine, but then limiting its value to non-negative contradicts with it, so I am not sure what your intention is. As I used the first byte to fuzz dmtype and .local, let's use the next three bytes to allow feeding overly wild timezone values to the machinery and see what breaks, perhaps like so: tz = *data++; /* int tz; */ tz = (tz << 8) | *data++; tz = (tz << 8) | *data++; size -= 3; Now the upfront length check needs to reject any input shorter than 4 bytes, so do so with a comment accordingly, perhaps like if (size < 4) /* * we use the first byte to fuzz dmtype and local, * then the next three bytes to fuzz tz offset, * and the remainder is fed as end-user input to * approxidate(). */ return 0; before everything I wrote so far. > + str = (char *)malloc(size+1); (char *)malloc(size + 1); > + if (!str) > + { > + return 0; > + } Ditto on {unnecessary braces}. > + memcpy(str, data, size); > + str[size] = '\0'; > + > + ts = approxidate_careful(str, &num); > + free(str); > + > + switch(type) > + { > + case 0: default: > + dmtype = DATE_NORMAL; > + break; Style. In our codebase, "switch" and "case" align at the same column, and case arms are written one per line, i.e., switch (type) { case 0: default: ... The way dmtype is handled in a switch() below tells me that you do not consider it is a potential attack vector (e.g., an attacker cannot force us to use dmtype==DATE_STRFTIME without the format and cause us to die). Am I reading your intention correctly? If so, I'd just do the "use the lower nibble of the first byte" as I shown earlier, and this large switch statement will go away. > + case 1: > + dmtype = DATE_HUMAN; > + break; > + case 2: > + dmtype = DATE_SHORT; > + break; > + case 3: > + dmtype = DATE_ISO8601; > + break; > + case 4: > + dmtype = DATE_ISO8601_STRICT; > + break; > + case 5: > + dmtype = DATE_RFC2822; > + break; > + case 6: > + dmtype = DATE_RAW; > + break; > + case 7: > + dmtype = DATE_UNIX; > + break; > + } > + > + dm = date_mode_from_type(dmtype); > + dm->local = 1; Don't we want to allow the incoming data to fuzz the local bit, too? > + show_date(ts, time, dm); > + > + date_mode_release(dm); > + > + return 0; > +} > > base-commit: dadef801b365989099a9929e995589e455c51fed ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] fuzz: add new oss-fuzz fuzzer for date.c / date.h 2023-11-12 5:59 ` Junio C Hamano @ 2023-11-12 12:39 ` Junio C Hamano 0 siblings, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2023-11-12 12:39 UTC (permalink / raw) To: Arthur Chan via GitGitGadget; +Cc: git, Arthur Chan Junio C Hamano <gitster@pobox.com> writes: > As I used the first byte to fuzz dmtype and .local, let's use the > next three bytes to allow feeding overly wild timezone values to the > machinery and see what breaks, perhaps like so: > > tz = *data++; /* int tz; */ > tz = (tz << 8) | *data++; > tz = (tz << 8) | *data++; > size -= 3; Just this part. As data points at unsigned char, the above would not give us any negative number. We'd need to sign-extend the 24-bit resulting value if we are going to adopt the above approach. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2] fuzz: add new oss-fuzz fuzzer for date.c / date.h 2023-11-11 17:39 [PATCH] fuzz: add new oss-fuzz fuzzer for date.c / date.h Arthur Chan via GitGitGadget 2023-11-12 5:59 ` Junio C Hamano @ 2023-11-13 16:22 ` Arthur Chan via GitGitGadget 2023-11-13 18:35 ` Jeff King ` (2 more replies) 1 sibling, 3 replies; 10+ messages in thread From: Arthur Chan via GitGitGadget @ 2023-11-13 16:22 UTC (permalink / raw) To: git; +Cc: Arthur Chan, Arthur Chan From: Arthur Chan <arthur.chan@adalogics.com> Signed-off-by: Arthur Chan <arthur.chan@adalogics.com> --- fuzz: add new oss-fuzz fuzzer for date.c / date.h This patch is aimed to add a new oss-fuzz fuzzer to the oss-fuzz directory for fuzzing date.c / date.h in the base directory. The .gitignore of the oss-fuzz directory and the Makefile have been modified to accommodate the new fuzzer fuzz-date.c. Fixed the objects order in .gitignore and Makefiles and fixed some of the logic and formatting for the fuzz-date.c fuzzer in v2. Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1612%2Farthurscchan%2Fnew-fuzzer-date-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1612/arthurscchan/new-fuzzer-date-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/1612 Range-diff vs v1: 1: d43724c19d0 ! 1: 2928e2b858d fuzz: add new oss-fuzz fuzzer for date.c / date.h @@ Commit message Signed-off-by: Arthur Chan <arthur.chan@adalogics.com> ## Makefile ## -@@ Makefile: ETAGS_TARGET = TAGS +@@ Makefile: SCRIPTS = $(SCRIPT_SH_GEN) \ + ETAGS_TARGET = TAGS + FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o ++FUZZ_OBJS += oss-fuzz/fuzz-date.o FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o -+FUZZ_OBJS += oss-fuzz/fuzz-date.o .PHONY: fuzz-objs - fuzz-objs: $(FUZZ_OBJS) - ## oss-fuzz/.gitignore ## @@ fuzz-commit-graph ++fuzz-date fuzz-pack-headers fuzz-pack-idx -+fuzz-date ## oss-fuzz/fuzz-date.c (new) ## @@ @@ oss-fuzz/fuzz-date.c (new) + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) +{ -+ int type; -+ int time; ++ int local; + int num; ++ uint16_t tz; + char *str; + timestamp_t ts; + enum date_mode_type dmtype; + struct date_mode *dm; + -+ if (size <= 8) -+ { ++ if (size <= 4) ++ /* ++ * we use the first byte to fuzz dmtype and local, ++ * then the next three bytes to fuzz tz offset, ++ * and the remainder (at least one byte) is fed ++ * as end-user input to approxidate_careful(). ++ */ + return 0; -+ } + -+ type = (*((int *)data)) % 8; -+ data += 4; -+ size -= 4; ++ local = !!(*data & 0x10); ++ dmtype = (enum date_mode_type)(*data % DATE_UNIX); ++ if (dmtype == DATE_STRFTIME) ++ /* ++ * Currently DATE_STRFTIME is not supported. ++ */ ++ return 0; ++ data++; ++ size--; + -+ time = abs(*((int *)data)); -+ data += 4; -+ size -= 4; ++ tz = *data++; ++ tz = (tz << 8) | *data++; ++ tz = (tz << 8) | *data++; ++ size -= 3; + -+ str = (char *)malloc(size+1); ++ str = (char *)malloc(size + 1); + if (!str) -+ { + return 0; -+ } + memcpy(str, data, size); + str[size] = '\0'; + + ts = approxidate_careful(str, &num); + free(str); + -+ switch(type) -+ { -+ case 0: default: -+ dmtype = DATE_NORMAL; -+ break; -+ case 1: -+ dmtype = DATE_HUMAN; -+ break; -+ case 2: -+ dmtype = DATE_SHORT; -+ break; -+ case 3: -+ dmtype = DATE_ISO8601; -+ break; -+ case 4: -+ dmtype = DATE_ISO8601_STRICT; -+ break; -+ case 5: -+ dmtype = DATE_RFC2822; -+ break; -+ case 6: -+ dmtype = DATE_RAW; -+ break; -+ case 7: -+ dmtype = DATE_UNIX; -+ break; -+ } -+ + dm = date_mode_from_type(dmtype); -+ dm->local = 1; -+ show_date(ts, time, dm); ++ dm->local = local; ++ show_date(ts, (int16_t)tz, dm); + + date_mode_release(dm); + Makefile | 1 + oss-fuzz/.gitignore | 1 + oss-fuzz/fuzz-date.c | 56 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 58 insertions(+) create mode 100644 oss-fuzz/fuzz-date.c diff --git a/Makefile b/Makefile index 03adcb5a480..4b875ef6ce1 100644 --- a/Makefile +++ b/Makefile @@ -750,6 +750,7 @@ SCRIPTS = $(SCRIPT_SH_GEN) \ ETAGS_TARGET = TAGS FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o +FUZZ_OBJS += oss-fuzz/fuzz-date.o FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o .PHONY: fuzz-objs diff --git a/oss-fuzz/.gitignore b/oss-fuzz/.gitignore index 9acb74412ef..5b954088254 100644 --- a/oss-fuzz/.gitignore +++ b/oss-fuzz/.gitignore @@ -1,3 +1,4 @@ fuzz-commit-graph +fuzz-date fuzz-pack-headers fuzz-pack-idx diff --git a/oss-fuzz/fuzz-date.c b/oss-fuzz/fuzz-date.c new file mode 100644 index 00000000000..a5c887bf11c --- /dev/null +++ b/oss-fuzz/fuzz-date.c @@ -0,0 +1,56 @@ +#include "git-compat-util.h" +#include "date.h" + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) +{ + int local; + int num; + uint16_t tz; + char *str; + timestamp_t ts; + enum date_mode_type dmtype; + struct date_mode *dm; + + if (size <= 4) + /* + * we use the first byte to fuzz dmtype and local, + * then the next three bytes to fuzz tz offset, + * and the remainder (at least one byte) is fed + * as end-user input to approxidate_careful(). + */ + return 0; + + local = !!(*data & 0x10); + dmtype = (enum date_mode_type)(*data % DATE_UNIX); + if (dmtype == DATE_STRFTIME) + /* + * Currently DATE_STRFTIME is not supported. + */ + return 0; + data++; + size--; + + tz = *data++; + tz = (tz << 8) | *data++; + tz = (tz << 8) | *data++; + size -= 3; + + str = (char *)malloc(size + 1); + if (!str) + return 0; + memcpy(str, data, size); + str[size] = '\0'; + + ts = approxidate_careful(str, &num); + free(str); + + dm = date_mode_from_type(dmtype); + dm->local = local; + show_date(ts, (int16_t)tz, dm); + + date_mode_release(dm); + + return 0; +} base-commit: dadef801b365989099a9929e995589e455c51fed -- gitgitgadget ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] fuzz: add new oss-fuzz fuzzer for date.c / date.h 2023-11-13 16:22 ` [PATCH v2] " Arthur Chan via GitGitGadget @ 2023-11-13 18:35 ` Jeff King 2023-11-13 23:27 ` Junio C Hamano 2023-11-13 23:27 ` Junio C Hamano 2023-11-14 10:53 ` [PATCH v3] " Arthur Chan via GitGitGadget 2 siblings, 1 reply; 10+ messages in thread From: Jeff King @ 2023-11-13 18:35 UTC (permalink / raw) To: Arthur Chan via GitGitGadget; +Cc: git, Arthur Chan On Mon, Nov 13, 2023 at 04:22:48PM +0000, Arthur Chan via GitGitGadget wrote: > + str = (char *)malloc(size + 1); > + if (!str) > + return 0; > + memcpy(str, data, size); > + str[size] = '\0'; Is it important that we avoid calling die() if the malloc fails here? The usual way to write this in our code base is just: str = xmemdupz(data, size); It's not entirely a style thing; we sometimes audit the code base looking for computations on malloc sizes (for integer overflows) as well as sites that should be using xmalloc and are not. Obviously we can exclude oss-fuzz/ from such audits, but if there's no reason not to prefer our usual style, it's one less thing to worry about. -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] fuzz: add new oss-fuzz fuzzer for date.c / date.h 2023-11-13 18:35 ` Jeff King @ 2023-11-13 23:27 ` Junio C Hamano 0 siblings, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2023-11-13 23:27 UTC (permalink / raw) To: Jeff King; +Cc: Arthur Chan via GitGitGadget, git, Arthur Chan Jeff King <peff@peff.net> writes: > On Mon, Nov 13, 2023 at 04:22:48PM +0000, Arthur Chan via GitGitGadget wrote: > >> + str = (char *)malloc(size + 1); >> + if (!str) >> + return 0; >> + memcpy(str, data, size); >> + str[size] = '\0'; > > Is it important that we avoid calling die() if the malloc fails here? > > The usual way to write this in our code base is just: > > str = xmemdupz(data, size); > > It's not entirely a style thing; we sometimes audit the code base > looking for computations on malloc sizes (for integer overflows) as well > as sites that should be using xmalloc and are not. Obviously we can > exclude oss-fuzz/ from such audits, but if there's no reason not to > prefer our usual style, it's one less thing to worry about. Good point. Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] fuzz: add new oss-fuzz fuzzer for date.c / date.h 2023-11-13 16:22 ` [PATCH v2] " Arthur Chan via GitGitGadget 2023-11-13 18:35 ` Jeff King @ 2023-11-13 23:27 ` Junio C Hamano 2023-11-14 10:53 ` [PATCH v3] " Arthur Chan via GitGitGadget 2 siblings, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2023-11-13 23:27 UTC (permalink / raw) To: Arthur Chan via GitGitGadget; +Cc: git, Arthur Chan "Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes: > +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); It is somewhat annoying that everybody has to repeat this twice here, but it is not your fault X-<. > +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) > +{ > + int local; > + int num; > + uint16_t tz; tz offset can be negative, so uint16_t is not appropriate. See date.c:gm_time_t() that is eventually called from show_date(). > + char *str; > + timestamp_t ts; > + enum date_mode_type dmtype; > + struct date_mode *dm; > + > + if (size <= 4) > + /* > + * we use the first byte to fuzz dmtype and local, > + * then the next three bytes to fuzz tz offset, > + * and the remainder (at least one byte) is fed > + * as end-user input to approxidate_careful(). > + */ > + return 0; > + > + local = !!(*data & 0x10); > + dmtype = (enum date_mode_type)(*data % DATE_UNIX); > + if (dmtype == DATE_STRFTIME) > + /* > + * Currently DATE_STRFTIME is not supported. > + */ > + return 0; There is an off-by-one error above, as modulo DATE_UNIX will never yield DATE_UNIX. Presumably we could do something silly like tmp = *data % DATE_UNIX; if (DATE_STRFTIME <= tmp) tmp++; dmtime = (enum date_mode_type)tmp; to pick values from [0..DATE_UNIX) and then shift everything above DATE_STRFTIME by one to create a hole there and fill DATE_UNIX at the same time, without wasting a sample by returning. > + data++; > + size--; > + > + tz = *data++; > + tz = (tz << 8) | *data++; > + tz = (tz << 8) | *data++; > + size -= 3; If your tz is 16-bit wide, then we do not have to eat three bytes here, do we? You never answered my question on your intention. Is "tz" considered attacker controlled (and needs to be fuzzed including invalid values)? > + str = (char *)malloc(size + 1); > + if (!str) > + return 0; > + memcpy(str, data, size); > + str[size] = '\0'; > + > + ts = approxidate_careful(str, &num); > + free(str); > + > + dm = date_mode_from_type(dmtype); > + dm->local = local; > + show_date(ts, (int16_t)tz, dm); > + > + date_mode_release(dm); > + > + return 0; > +} > > base-commit: dadef801b365989099a9929e995589e455c51fed Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3] fuzz: add new oss-fuzz fuzzer for date.c / date.h 2023-11-13 16:22 ` [PATCH v2] " Arthur Chan via GitGitGadget 2023-11-13 18:35 ` Jeff King 2023-11-13 23:27 ` Junio C Hamano @ 2023-11-14 10:53 ` Arthur Chan via GitGitGadget 2023-11-14 17:03 ` Junio C Hamano 2023-11-17 17:47 ` [PATCH v4] " Arthur Chan via GitGitGadget 2 siblings, 2 replies; 10+ messages in thread From: Arthur Chan via GitGitGadget @ 2023-11-14 10:53 UTC (permalink / raw) To: git; +Cc: Jeff King, Arthur Chan, Arthur Chan From: Arthur Chan <arthur.chan@adalogics.com> Signed-off-by: Arthur Chan <arthur.chan@adalogics.com> --- fuzz: add new oss-fuzz fuzzer for date.c / date.h This patch is aimed to add a new oss-fuzz fuzzer to the oss-fuzz directory for fuzzing date.c / date.h in the base directory. The .gitignore of the oss-fuzz directory and the Makefile have been modified to accommodate the new fuzzer fuzz-date.c. Fixed the objects order in .gitignore and Makefiles and fixed some of the logic and formatting for the fuzz-date.c fuzzer in v2. Fixed the creation and memory allocation of the fuzzing str in v3. Also fixed the tz type and sign-extended the data before passing to the tz variable. Comment: Yes, indeed. It is quite annoying to have that twice. Yes, the tz should be considered as attacker controllable and thus negative values should be considered. But it is tricky to fuzz it because the date.c::gm_time_t() will call die() if the value is invalid and that exit the fuzzer directly. OSS-Fuzz may consider it as an issue (or bug) because the fuzzer exit "unexpectedly". I agree that if we consider the tz as "attacker controllable, we should include negative values, but since it will cause the fuzzer exit, I am not sure if it is the right approach from the fuzzing perspective. Also, it is something that date.c already take care of with the conditional checking, thus it may also be worth to do some checking and exclude some invalid values before calling date.c::show_date() but this may result in copying some conditional checking code from date.c. Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1612%2Farthurscchan%2Fnew-fuzzer-date-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1612/arthurscchan/new-fuzzer-date-v3 Pull-Request: https://github.com/gitgitgadget/git/pull/1612 Range-diff vs v2: 1: 2928e2b858d ! 1: 046bca32889 fuzz: add new oss-fuzz fuzzer for date.c / date.h @@ oss-fuzz/fuzz-date.c (new) +{ + int local; + int num; -+ uint16_t tz; ++ int tz; + char *str; ++ int8_t *tmp_data; + timestamp_t ts; + enum date_mode_type dmtype; + struct date_mode *dm; @@ oss-fuzz/fuzz-date.c (new) + return 0; + + local = !!(*data & 0x10); -+ dmtype = (enum date_mode_type)(*data % DATE_UNIX); -+ if (dmtype == DATE_STRFTIME) -+ /* -+ * Currently DATE_STRFTIME is not supported. -+ */ -+ return 0; ++ num = *data % DATE_UNIX; ++ if (num >= DATE_STRFTIME) ++ num++; ++ dmtype = (enum date_mode_type)num; + data++; + size--; + -+ tz = *data++; -+ tz = (tz << 8) | *data++; -+ tz = (tz << 8) | *data++; ++ tmp_data = (int8_t*)data; ++ tz = *tmp_data++; ++ tz = (tz << 8) | *tmp_data++; ++ tz = (tz << 8) | *tmp_data++; ++ data += 3; + size -= 3; + -+ str = (char *)malloc(size + 1); -+ if (!str) -+ return 0; -+ memcpy(str, data, size); -+ str[size] = '\0'; ++ str = xmemdupz(data, size); + + ts = approxidate_careful(str, &num); + free(str); + + dm = date_mode_from_type(dmtype); + dm->local = local; -+ show_date(ts, (int16_t)tz, dm); ++ show_date(ts, tz, dm); + + date_mode_release(dm); + Makefile | 1 + oss-fuzz/.gitignore | 1 + oss-fuzz/fuzz-date.c | 53 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 55 insertions(+) create mode 100644 oss-fuzz/fuzz-date.c diff --git a/Makefile b/Makefile index 03adcb5a480..4b875ef6ce1 100644 --- a/Makefile +++ b/Makefile @@ -750,6 +750,7 @@ SCRIPTS = $(SCRIPT_SH_GEN) \ ETAGS_TARGET = TAGS FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o +FUZZ_OBJS += oss-fuzz/fuzz-date.o FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o .PHONY: fuzz-objs diff --git a/oss-fuzz/.gitignore b/oss-fuzz/.gitignore index 9acb74412ef..5b954088254 100644 --- a/oss-fuzz/.gitignore +++ b/oss-fuzz/.gitignore @@ -1,3 +1,4 @@ fuzz-commit-graph +fuzz-date fuzz-pack-headers fuzz-pack-idx diff --git a/oss-fuzz/fuzz-date.c b/oss-fuzz/fuzz-date.c new file mode 100644 index 00000000000..52bea5553a1 --- /dev/null +++ b/oss-fuzz/fuzz-date.c @@ -0,0 +1,53 @@ +#include "git-compat-util.h" +#include "date.h" + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) +{ + int local; + int num; + int tz; + char *str; + int8_t *tmp_data; + timestamp_t ts; + enum date_mode_type dmtype; + struct date_mode *dm; + + if (size <= 4) + /* + * we use the first byte to fuzz dmtype and local, + * then the next three bytes to fuzz tz offset, + * and the remainder (at least one byte) is fed + * as end-user input to approxidate_careful(). + */ + return 0; + + local = !!(*data & 0x10); + num = *data % DATE_UNIX; + if (num >= DATE_STRFTIME) + num++; + dmtype = (enum date_mode_type)num; + data++; + size--; + + tmp_data = (int8_t*)data; + tz = *tmp_data++; + tz = (tz << 8) | *tmp_data++; + tz = (tz << 8) | *tmp_data++; + data += 3; + size -= 3; + + str = xmemdupz(data, size); + + ts = approxidate_careful(str, &num); + free(str); + + dm = date_mode_from_type(dmtype); + dm->local = local; + show_date(ts, tz, dm); + + date_mode_release(dm); + + return 0; +} base-commit: dadef801b365989099a9929e995589e455c51fed -- gitgitgadget ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3] fuzz: add new oss-fuzz fuzzer for date.c / date.h 2023-11-14 10:53 ` [PATCH v3] " Arthur Chan via GitGitGadget @ 2023-11-14 17:03 ` Junio C Hamano 2023-11-17 17:47 ` [PATCH v4] " Arthur Chan via GitGitGadget 1 sibling, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2023-11-14 17:03 UTC (permalink / raw) To: Arthur Chan via GitGitGadget; +Cc: git, Jeff King, Arthur Chan "Arthur Chan via GitGitGadget" <gitgitgadget@gmail.com> writes: > ++ tmp_data = (int8_t*)data; > ++ tz = *tmp_data++; > ++ tz = (tz << 8) | *tmp_data++; > ++ tz = (tz << 8) | *tmp_data++; This has a funny skew towards negative number. Any time MSB of the one of the three bytes is set, tz becomes negative. Worse, a byte taken from *tmp_data that has its MSB on will _wipe_ what was read in tz so far, because its higher order bits above 8th bit are sign extended. If the incoming data is evenly distributed, 7/8 of the time, you'd end up with a negative number in tz, no? I think you can and should pick bytes with uint8_t pointer to avoid sign extending individual bytes and sign extend the resulting number at the end. Or if it is too cumbersome to do so, using "int16_t tz" and filling it with two bytes from *data will sign extend itself when we pass it to show_date() as a parameter of type "int", which may be the easiest to code, I suspect. Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v4] fuzz: add new oss-fuzz fuzzer for date.c / date.h 2023-11-14 10:53 ` [PATCH v3] " Arthur Chan via GitGitGadget 2023-11-14 17:03 ` Junio C Hamano @ 2023-11-17 17:47 ` Arthur Chan via GitGitGadget 1 sibling, 0 replies; 10+ messages in thread From: Arthur Chan via GitGitGadget @ 2023-11-17 17:47 UTC (permalink / raw) To: git; +Cc: Jeff King, Arthur Chan, Arthur Chan From: Arthur Chan <arthur.chan@adalogics.com> Signed-off-by: Arthur Chan <arthur.chan@adalogics.com> --- fuzz: add new oss-fuzz fuzzer for date.c / date.h This patch is aimed to add a new oss-fuzz fuzzer to the oss-fuzz directory for fuzzing date.c / date.h in the base directory. The .gitignore of the oss-fuzz directory and the Makefile have been modified to accommodate the new fuzzer fuzz-date.c. Fixed the objects order in .gitignore and Makefiles and fixed some of the logic and formatting for the fuzz-date.c fuzzer in v2. Fixed the creation and memory allocation of the fuzzing str in v3. Also fixed the tz type and sign-extended the data before passing to the tz variable. Fixed the tz variable allocations and some of the bytes used for fuzzing variables in v4. Comment: Yes, indeed. It is quite annoying to have that twice. Yes, the tz should be considered as attacker controllable and thus negative values should be considered. But it is tricky to fuzz it because the date.c::gm_time_t() will call die() if the value is invalid and that exit the fuzzer directly. OSS-Fuzz may consider it as an issue (or bug) because the fuzzer exit "unexpectedly". I agree that if we consider the tz as "attacker controllable, we should include negative values, but since it will cause the fuzzer exit, I am not sure if it is the right approach from the fuzzing perspective. Also, it is something that date.c already take care of with the conditional checking, thus it may also be worth to do some checking and exclude some invalid values before calling date.c::show_date() but this may result in copying some conditional checking code from date.c. Additional comment for v4: Thanks for the suggestion. Yes, that maybe the easier approach. Since the new logic is only using 2 bytes for the int16_t tz, thus the local and dmtype variable could use separate bytes to increase "randomness". Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1612%2Farthurscchan%2Fnew-fuzzer-date-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1612/arthurscchan/new-fuzzer-date-v4 Pull-Request: https://github.com/gitgitgadget/git/pull/1612 Range-diff vs v3: 1: 046bca32889 ! 1: 33a72d4c197 fuzz: add new oss-fuzz fuzzer for date.c / date.h @@ oss-fuzz/fuzz-date.c (new) +{ + int local; + int num; -+ int tz; + char *str; -+ int8_t *tmp_data; ++ int16_t tz; + timestamp_t ts; + enum date_mode_type dmtype; + struct date_mode *dm; + + if (size <= 4) + /* -+ * we use the first byte to fuzz dmtype and local, -+ * then the next three bytes to fuzz tz offset, -+ * and the remainder (at least one byte) is fed -+ * as end-user input to approxidate_careful(). ++ * we use the first byte to fuzz dmtype and the ++ * second byte to fuzz local, then the next two ++ * bytes to fuzz tz offset. The remainder ++ * (at least one byte) is fed as input to ++ * approxidate_careful(). + */ + return 0; + -+ local = !!(*data & 0x10); -+ num = *data % DATE_UNIX; ++ local = !!(*data++ & 0x10); ++ num = *data++ % DATE_UNIX; + if (num >= DATE_STRFTIME) + num++; + dmtype = (enum date_mode_type)num; -+ data++; -+ size--; ++ size -= 2; + -+ tmp_data = (int8_t*)data; -+ tz = *tmp_data++; -+ tz = (tz << 8) | *tmp_data++; -+ tz = (tz << 8) | *tmp_data++; -+ data += 3; -+ size -= 3; ++ tz = *data++; ++ tz = (tz << 8) | *data++; ++ size -= 2; + + str = xmemdupz(data, size); + @@ oss-fuzz/fuzz-date.c (new) + + dm = date_mode_from_type(dmtype); + dm->local = local; -+ show_date(ts, tz, dm); ++ show_date(ts, (int)tz, dm); + + date_mode_release(dm); + Makefile | 1 + oss-fuzz/.gitignore | 1 + oss-fuzz/fuzz-date.c | 49 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 51 insertions(+) create mode 100644 oss-fuzz/fuzz-date.c diff --git a/Makefile b/Makefile index 03adcb5a480..4b875ef6ce1 100644 --- a/Makefile +++ b/Makefile @@ -750,6 +750,7 @@ SCRIPTS = $(SCRIPT_SH_GEN) \ ETAGS_TARGET = TAGS FUZZ_OBJS += oss-fuzz/fuzz-commit-graph.o +FUZZ_OBJS += oss-fuzz/fuzz-date.o FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o .PHONY: fuzz-objs diff --git a/oss-fuzz/.gitignore b/oss-fuzz/.gitignore index 9acb74412ef..5b954088254 100644 --- a/oss-fuzz/.gitignore +++ b/oss-fuzz/.gitignore @@ -1,3 +1,4 @@ fuzz-commit-graph +fuzz-date fuzz-pack-headers fuzz-pack-idx diff --git a/oss-fuzz/fuzz-date.c b/oss-fuzz/fuzz-date.c new file mode 100644 index 00000000000..036378b946f --- /dev/null +++ b/oss-fuzz/fuzz-date.c @@ -0,0 +1,49 @@ +#include "git-compat-util.h" +#include "date.h" + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); + +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) +{ + int local; + int num; + char *str; + int16_t tz; + timestamp_t ts; + enum date_mode_type dmtype; + struct date_mode *dm; + + if (size <= 4) + /* + * we use the first byte to fuzz dmtype and the + * second byte to fuzz local, then the next two + * bytes to fuzz tz offset. The remainder + * (at least one byte) is fed as input to + * approxidate_careful(). + */ + return 0; + + local = !!(*data++ & 0x10); + num = *data++ % DATE_UNIX; + if (num >= DATE_STRFTIME) + num++; + dmtype = (enum date_mode_type)num; + size -= 2; + + tz = *data++; + tz = (tz << 8) | *data++; + size -= 2; + + str = xmemdupz(data, size); + + ts = approxidate_careful(str, &num); + free(str); + + dm = date_mode_from_type(dmtype); + dm->local = local; + show_date(ts, (int)tz, dm); + + date_mode_release(dm); + + return 0; +} base-commit: dadef801b365989099a9929e995589e455c51fed -- gitgitgadget ^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-11-17 17:47 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-11-11 17:39 [PATCH] fuzz: add new oss-fuzz fuzzer for date.c / date.h Arthur Chan via GitGitGadget 2023-11-12 5:59 ` Junio C Hamano 2023-11-12 12:39 ` Junio C Hamano 2023-11-13 16:22 ` [PATCH v2] " Arthur Chan via GitGitGadget 2023-11-13 18:35 ` Jeff King 2023-11-13 23:27 ` Junio C Hamano 2023-11-13 23:27 ` Junio C Hamano 2023-11-14 10:53 ` [PATCH v3] " Arthur Chan via GitGitGadget 2023-11-14 17:03 ` Junio C Hamano 2023-11-17 17:47 ` [PATCH v4] " Arthur Chan via GitGitGadget
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).