From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-wm1-f48.google.com (mail-wm1-f48.google.com [209.85.128.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 21D971E521F for ; Sat, 24 May 2025 13:53:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.48 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748094819; cv=none; b=PKntvJSJlJt+udxAk9+Yuou+/8wx/1eo7/NKYc1tWFQXI5eLKyjNuM8o8P7vHEhDw7TKEw7ejA0geAXDaGS5zXBrKKA0QrWzA9RefDjFCB5yrHnHlcTAtTT2VOyXt7s95WDOCBUSfq7s1X467MHBN1YZhhVU+KBCeti7U9NcJ6Q= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748094819; c=relaxed/simple; bh=2Tw0BJ4W6OHp1xIrVMgRtKMP/8XBwfpFxo3ozhfz4FQ=; h=Message-ID:Date:MIME-Version:Subject:From:To:Cc:References: In-Reply-To:Content-Type; b=HFLvPT2ohJWEzpfo3tkoh7zpCJFC/CRjE50debpqm6O0lrOu5icJyJ4QahAfIffJJgC5379idSXe702t9NBpa2YUrPSoakJjcwFNuq0+IvdSSOqfCwCEr02wUzmLzAukCsrPakvl7LG7BNOrF2oRUHvCaDJQrV02uKYkVVX42yA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=mi0A5DWg; arc=none smtp.client-ip=209.85.128.48 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="mi0A5DWg" Received: by mail-wm1-f48.google.com with SMTP id 5b1f17b1804b1-441ab63a415so8049835e9.3 for ; Sat, 24 May 2025 06:53:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1748094815; x=1748699615; darn=vger.kernel.org; h=content-transfer-encoding:in-reply-to:content-language:references :cc:to:from:subject:reply-to:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=Rdm7104MBIVpXHd9YfNCzpomTbMkSaC+PUVaD7r57bw=; b=mi0A5DWg7GCKKP1O2a0/YEYZM0Gyro6CGF4XtlsCzb+TZE0/ube4deQLzhbdXvrckd Q7M1oFxfaqrI4r8WaD/UuDZstetL4YR0zicBWTasJkFjN1SM7hnjH9B1Ek1kD+PdYJSK Ol9maoURh5BXOujzqJ0VYN5z0+DEaWpHPzeMQUuHwl7Nh6A3gDcNKCwYivOMzQhP7nGP 8qtD8rizYe77fS4C/TQrVQPmJ+y3zdc5lk81IbjwVbBgDBj7Xy2EaTPcRmDFzyyHaRJF xSQTGNV9Sp5J9Mu9FiwB/LByKHy6AwvU3WdutD9skat82Rs74R5X1lefVJ2zGiKhbhD2 eeAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1748094815; x=1748699615; h=content-transfer-encoding:in-reply-to:content-language:references :cc:to:from:subject:reply-to:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Rdm7104MBIVpXHd9YfNCzpomTbMkSaC+PUVaD7r57bw=; b=jktFnQBlGvww5gGnd+AXQ2QHTs1qze4ImHe2hoYApViwHDQ90V7Ptnvn+ftFnhfDVG gMVUL3ccC5Wzap8/8jIGOEaIcwYZIF4yBGoH/b39UV0MNv9vQ6jKZPU+3h7PAo0CYHxr 033dzdGHWrgnMF9SV42lpefiztkX4kexVDTCXIlHvETn9E1peu9wvfMb79st73L/odC6 HdaVHwXz9E2jgvKVRhPtgvb2QNEdeo1K7fO4hm4ZOeeJH+flo/N4OalZ0459zVxZKKSw L/zFSEngihAiWBaWdkyq6arTg5VNTyLQFvDI8wd20oRuVTL27QZzdOkMbbTUt5eJFb0K T6/g== X-Forwarded-Encrypted: i=1; AJvYcCW5F3sMzLvT3pz4oajD7tV3UQD9kDqCi+1iWO4CBeOSvchiVmVxXYe6kR/eyDsPZ277m+g=@vger.kernel.org X-Gm-Message-State: AOJu0YySUgLUeMGdaD2+DRprTvTNqkhOy4+RisGR5G1IjtmmGJ/utAhE 5xbUgfJxR8s2JATUkjj66re02OPoA87mc3uCyjT+cHPFUZa9/3I9lu+r X-Gm-Gg: ASbGncsDkOgDvlydaJZV4eIbgC1J7WN1voU5T0/3IkSn3aRXNZQIkoNKVyA/flNEdR+ xGZkQYqBZR6uLYAPI3Df1cJA2So8I7vNkKWiQ6q5ybM1TJM2VB8MQZ3f6aq3AkZQQ4gQibzAUMS tMy4JN9J8Y3+Xjk7LD5wGm4vkv2T6j40KIV8A9L7Ipc9jbiUfR1qUpJTWK2h/iAAb/adbvyV8/z +KGz11SMeBy349yodfTk0401PjMMOW8PIjBgGnezEZvyv4wJSuf+/xPQSpKBR32v/YE2V1j2R57 nmTbq7dJ9ESPAMK4H0SpkHtu/6eLbLbK0iE9e4b9HQbfCBOZQcmxbT5IK4K2P0ifJOSrJ4fezop G9vSCnDjZa8Jj/djJZ+jDqeXOP70= X-Google-Smtp-Source: AGHT+IEPKo+7ixDCxOnS0rN3+JEJgugsYWtC6OM16wuoJrgKVgIbIl3DLkuNlxnnd0xuFphwIYZUMw== X-Received: by 2002:a05:600c:1e25:b0:442:f97f:8174 with SMTP id 5b1f17b1804b1-44c921a8f68mr27713465e9.18.1748094815272; Sat, 24 May 2025 06:53:35 -0700 (PDT) Received: from ?IPV6:2a0a:ef40:700:a501:20c3:eb2d:481:4a64? ([2a0a:ef40:700:a501:20c3:eb2d:481:4a64]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a35ca8874bsm30248507f8f.67.2025.05.24.06.53.33 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 24 May 2025 06:53:34 -0700 (PDT) Message-ID: <953dc362-f546-4f49-b580-b48a4a1370e8@gmail.com> Date: Sat, 24 May 2025 14:53:23 +0100 Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Reply-To: phillip.wood@dunelm.org.uk Subject: Re: [PATCH] Fix buffer underflow in xdl_build_script From: Phillip Wood To: =?UTF-8?Q?Ren=C3=A9_Scharfe?= , Alex via GitGitGadget , git@vger.kernel.org Cc: Alex , jinyaoguo References: <037b4612-ea92-45bf-bacb-ba66204f941e@web.de> <2b9c5e91-67ab-4e46-93c4-15c8b79841be@gmail.com> Content-Language: en-US In-Reply-To: <2b9c5e91-67ab-4e46-93c4-15c8b79841be@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 24/05/2025 14:38, Phillip Wood wrote: > On 24/05/2025 10:08, René Scharfe wrote: >> Am 24.05.25 um 07:57 schrieb René Scharfe: >>> Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget: >>>> From: jinyaoguo >>>> >>>> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing >>>> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or >>>> `rchg2[i2-1]`), which underflows the buffer. >>>> This commit adds explicit `i1 > 0` and `i2 > 0` checks around >>>> those array accesses to prevent invalid negative indexing. >>> >>> xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both >>> ends for rchg arrays, so an index of -1 should be within the bounds. > > and rchg[-1] == 0 so i1 and i2 can never drop below -1 > >>> i1 and i2 are decreased in lockstep, though, so one of them can become >>> smaller than -1 if nrec is different between the files.  And that's how >>> this code run can indeed run off into the weeds. >> >> Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice >> versa.  Or can it?  It makes sense that we reach the start of both >> buffers at the same time if we walk backwards from the end, don't >> misstep and have consistent rchg array contents, but I'm not sure. > The code looks like > >     for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; > i1--, i2--) >         if (rchg1[i1 - 1] || rchg2[i2 - 1]) { >             for (l1 = i1; rchg1[i1 - 1]; i1--); >             for (l2 = i2; rchg2[i2 - 1]; i2--); > > I think I've convinced myself that it is safe assuming there are an > equal number of unchanged lines in rchg1 and rchg2 and rchg1[-1] == > rchg2[-1] == 0. Each iteration consumes any changed lines in the > preimage and the postimage plus a single context line from each (apart > from the final iteration when the context lines may have been > exhausted). At the start of the last iteration there are three > possibilities > >  - i1 == -1 && i2 >= 0 => there are insertions as the start of the file. Sorry that should be i1 == 0 && i2 >= 0 >    As the context lines in the preimage have been exhausted all the >    remaining rchg2 elements represent added lines and are consumed by >    for (l2 = i2; rchg2[i2 - 1]; i2--) so at the end of the loop body >    i2 == 0 and the outer loop will exit. > >  - i1 >= 0 && i2 == -1 => there are deletions at the start of the file. This one should be i1 >= 0 && i2 == 0 >    As the context lines in the postimage have been exhausted all the >    remaining rchg1 elements represent deleted lines and are consumed by >    for (l1 = i1; rchg1[i1 - 1]; i1--) so at the end of the loop body >    i1 == 0 and the outer loop will exit. > >  - i1 >= 0 && i2 >= 0 => the first line is unchanged or there are >    insertions and deletions at the beginning of the file. At the end of >    the loop body i1 == 0 && i2 == 0 and the outer loop will exit. > > We could add > >     if (i1 < -1 || i2 < -1) >         BUG("mismatched context line count"); > > before "if (rchg1[i1 - 1] || rchg2[i2 - 1])" inside the loop if we're > worried about bugs that break the assumption that there are equal > numbers of context lines on each side. Any such bug would generate > invalid diffs. I don't know how likely that is to happen in practice. > > Best Wishes > > Phillip > >> >> Are you able to demonstrate any out-of-bounds access with e.g., >> Valgrind, AddressSanitizer or an assertion? >> >>> Curiously, AddressSanitizer doesn't report anything, but if I add the >>> following line after the outer for, I can trigger it to report a >>> heap-buffer-overflow with e.g., git show 8613c2bb6c: >>> >>>     if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2); >> >> That's because I forgot to add braces.  D'oh!  I can't trigger any >> out-of-bounds access or that Oops with them properly in place.  So I >> let myself get fooled by a daring coding style. :-| >> >>> >>>> >>>> Signed-off-by: Alex Guo >>>> --- >>>>      Fix buffer underflow in xdl_build_script >>>> >>>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr- >>>> git-1976%2Fmugitya03%2Fbuf-1-v1 >>>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr- >>>> git-1976/mugitya03/buf-1-v1 >>>> Pull-Request: https://github.com/git/git/pull/1976 >>>> >>>>   xdiff/xdiffi.c | 7 ++++--- >>>>   1 file changed, 4 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c >>>> index 5a96e36dfbe..2e983965328 100644 >>>> --- a/xdiff/xdiffi.c >>>> +++ b/xdiff/xdiffi.c >>>> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t >>>> **xscr) { >>>>        * Trivial. Collects "groups" of changes and creates an edit >>>> script. >> >> Trivial for Davide perhaps (libxdiff author), but not my mushy brain.. >> >>>>        */ >>>>       for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= >>>> 0; i1--, i2--) >>> >>> Should the || be a && instead?  From a birds-eye view I would assume we >>> can stop scanning for changes when we exhaust (reach the top) of either >>> side.  We just have to make sure everything from the other side is >>> accounted for in the last added change. >>> >>>> -        if (rchg1[i1 - 1] || rchg2[i2 - 1]) { >>>> -            for (l1 = i1; rchg1[i1 - 1]; i1--); >>>> -            for (l2 = i2; rchg2[i2 - 1]; i2--); >>>> +        if ((i1 > 0 && rchg1[i1 - 1]) || >>>> +            (i2 > 0 && rchg2[i2 - 1])) { >>>> +            for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--); >>>> +            for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--); >>> >>> Nit: The indentation of that line is off. >>> >>>>               if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - >>>> i2))) { >>>>                   xdl_free_script(cscr); >>>> >>>> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0 >