From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AACA3CDB465 for ; Wed, 11 Oct 2023 23:18:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235202AbjJKXSp (ORCPT ); Wed, 11 Oct 2023 19:18:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233835AbjJKXSo (ORCPT ); Wed, 11 Oct 2023 19:18:44 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 22C729D for ; Wed, 11 Oct 2023 16:18:42 -0700 (PDT) Received: (qmail 20598 invoked by uid 109); 11 Oct 2023 23:18:42 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Wed, 11 Oct 2023 23:18:42 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 11277 invoked by uid 111); 11 Oct 2023 23:18:43 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Wed, 11 Oct 2023 19:18:43 -0400 Authentication-Results: peff.net; auth=none Date: Wed, 11 Oct 2023 19:18:40 -0400 From: Jeff King To: Taylor Blau Cc: git@vger.kernel.org Subject: Re: [PATCH 10/20] midx: bounds-check large offset chunk Message-ID: <20231011231840.GK518221@coredump.intra.peff.net> References: <20231009205544.GA3281950@coredump.intra.peff.net> <20231009210530.GJ3282181@coredump.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Wed, Oct 11, 2023 at 02:38:07PM -0400, Taylor Blau wrote: > > offset32 ^= MIDX_LARGE_OFFSET_NEEDED; > > - return get_be64(m->chunk_large_offsets + > > - st_mult(sizeof(uint64_t), offset32)); > > + if (offset32 >= m->chunk_large_offsets_len / sizeof(uint64_t)) > > + die(_("multi-pack-index large offset out of bounds")); > > + return get_be64(m->chunk_large_offsets + sizeof(uint64_t) * offset32); > > Makes sense, and this seems very reasonable. I had a couple of thoughts > on this hunk, one nitpick, and one non-nitpick ;-). > > The nitpick is the easy one, which is that I typically think of scaling > some index to produce an offset into some chunk, instead of dividing and > going the other way. > > So I probably would have written something like: > > if (st_mult(offset32, sizeof(uint64_t)) >= m->chunk_large_offsets_len) > die(_("multi-pack-index large offset out of bounds")); Yeah, I admit that my inclination is to think of it that way, too, and the division is a bit of an inversion. But I guess I am hard-wired to do bounds checks with division, because I know it avoids overflow issues. And the behavior is actually different, since st_mult() dies (with a somewhat vague message) rather than returning with an error. I was actually tempted to write a small inline helper to do bounds-checks (since this pattern appears a few times in this series). But of course it suffers from the same issues (it dies with a vague message, or it returns a result code and then is awkward to call/use because you have to stick the output in an out-parameter). > But that is definitely a subjective/minor point, and I would not at all > be sad if you felt differently about it. That said, I do wish for a > little more information in the die() message, perhaps: > > if (st_mult(offset32, sizeof(uint64_t)) >= m->chunk_large_offsets_len) > die(_("multi-pack-index large offset for %s out of bounds " > "(%"PRIuMAX" is beyond %"PRIuMAX")"), > (uintmax_t)(offset32 * sizeof(uint64_t)), /* checked earlier */ > (uintmax_t)m->chunk_large_offsets_len); > > I can imagine that for debugging corrupt MIDXs in the future, having > some extra information like the above would give us a better starting > point than popping into a debugger to get the same information. As you'll see when you get to the BDAT/BIDX messages, I put in a load of detail. That's because I did those ones first, and then after seeing the terse "eh, it's the wrong size" messages in the rest of the commit-graph and midx code, I just followed suit. We can circle back and improve the detail in the messages. One slightly annoying thing is dealing with it in the tests. We'd have to do one of: - make the tests more brittle by hard-coding positions and offsets we don't necessarily care about - loosen the tests by just matching with "grep", which can sometimes miss other unexpected output - do some post-processing on the output to massage out the details we don't care about; this in theory is the best of both worlds, but reliably post-processing is non-trivial. So of the three I'd probably just loosen to use "grep" in most spots. -Peff