From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 763B91DA5E for ; Thu, 18 Jan 2024 23:59:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705622358; cv=none; b=MwrtGIMwpW68gxJ2rY3Bjr2Vu10FNZBAXzSJ/ioL/1CRA3qbgDOFSwBflyZXLfi5iTN4QVQQHcxR/zfNzGCXPGEjJnNDsSwKFVwkXOr2NqXMRmI45bSZC6H49nB3AyVYACEkaPFfM3B8XZzcTK14cHsWYc+REbzPmchM2CrE16A= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705622358; c=relaxed/simple; bh=Dw9Nf4DWXdwl+YNGkk7xVnZwTpjc1QhmJ6NFtTqHu9M=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=FyT/jykZB6FlvuIUSQZhUwwHmOuGG5c5/e5RIH6moe25ntfVkZGfXq8MdElS7f/EcMLlN3XGx1CKU8RYirhsN4MbzNyMlx2E0SPSJBkdLOcMKFqc1RIaZ8O06Sq9O8SiyEUGVqtQ3wRTv5Q8bzZekjlCZPlGciEtGqQHjYXSOqI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--linusa.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=JHQ/A0A8; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--linusa.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="JHQ/A0A8" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-5ce105ddef5so217534a12.2 for ; Thu, 18 Jan 2024 15:59:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1705622357; x=1706227157; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=OSh7hqZMqk6nIkzj1/CJMuHffROA4dKKX1WRfRWXRHc=; b=JHQ/A0A8N2DehOsGLzMdQ+jOG7KMBL0eQALBERmW/zTLUSk29XpwhgGGS4+ngan3Jn kkD4k2zDUNGywXSvcElULhRbmACka/5Hz0lZykGeJxrItPcN1kSPU9tUiZOXgCOJJACW Tu8g5ZmOPkWB1gn+WapFuDZurSCI0VwLo06HQ2SXs6mi8aiIAwBqdEmtRDUux7e0Knec s9NNksFmgZnN4A3YW8YOn/duu3ih/go/oWtOI4k193wB/fZ6EXb+Gu3f6ExVqrSwL3HI e0XK7jObtfK/xM+bCLHEZeJRWoB1EEE6hxYZU/LBgJ+ZMNJso/NmI68hjVrYZ4tquGyA EXug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705622357; x=1706227157; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=OSh7hqZMqk6nIkzj1/CJMuHffROA4dKKX1WRfRWXRHc=; b=AKFSn8DtHbxgfsv6r5NIuG/z2wdZMqGutx4qFOL784Btyin5orIrvMzrm2oIHsvn2k mbYNUqihGfxc6pYV/1kN82TuBGcAZoAPD8qZQy2HFDjXD3C1zdc2GTjO4VggJAcl1Kig ZOGmGZNHoi3IoCKI2K5bQtrWYxbGGvAcKtmo/30ARLuvV5JpelQBG42kw8E0wrInpsCC ehsTQ5EnqG8/icnZG85sJBzrEw/IIbm6cROWAdtYr4z01kqbUn2/8vObFudefN3aFaPc vbAA1EBSYr6S+4sV9yxViI0EUiS3eWqoJK/xz+1c/004GeyrPKGwalSW16FaaIdlvkNp PamQ== X-Gm-Message-State: AOJu0YwtOkQOJp4/ZIaa3ohnCZp4k8o/m9Odoll+Huh/qBzZVVf+OCWI H46dY3g2i72WapOa7dTOzVGh21idJ6/PZP715NLj6ILncAaOos+dPv/l1X8e0UwKQW9Fj0tMEcQ Ddg== X-Google-Smtp-Source: AGHT+IFvqs4nqLXqY3AxvOIZ/TW18AaLchSdEY91hfKlGthM6bEIsHqB9JHuwArq7bICA6jZbrHpiAYVytc= X-Received: from fine.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2221]) (user=linusa job=sendgmr) by 2002:a63:2fc1:0:b0:5ce:196:6e4d with SMTP id v184-20020a632fc1000000b005ce01966e4dmr8403pgv.1.1705622356702; Thu, 18 Jan 2024 15:59:16 -0800 (PST) Date: Thu, 18 Jan 2024 15:59:15 -0800 In-Reply-To: <20240116151029.GC2119690@coredump.intra.peff.net> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20231109070310.GA2697602@coredump.intra.peff.net> <20240116151029.GC2119690@coredump.intra.peff.net> Message-ID: Subject: Re: [PATCH 1/7] chunk-format: introduce `pair_chunk_expect()` helper From: Linus Arver To: Jeff King Cc: Taylor Blau , git@vger.kernel.org, Junio C Hamano Content-Type: text/plain; charset="UTF-8" Jeff King writes: > On Mon, Jan 15, 2024 at 02:31:12PM -0800, Linus Arver wrote: > >> > +static int pair_chunk_expect_fn(const unsigned char *chunk_start, >> > + size_t chunk_size, >> > + void *data) >> > +{ >> > + struct pair_chunk_data *pcd = data; >> > + if (chunk_size / pcd->record_size != pcd->record_nr) >> > + return -1; >> > + *pcd->p = chunk_start; >> > + return 0; >> > +} >> > + >> >> I don't think this function should assume anything about the inputs >> (chunk_size, record_size, nor record_nr). If we are asking the helper to >> be the common tool for multiple locations, it should assume even less >> about what these inputs could look like. >> >> For example, if record_size is 0 this will lead to a >> divide-by-zero. Likewise, if record_nr is zero it doesn't make >> sense to check if chunk_size fits into record_size * record_nr. > > I'm not sure that divide-by-zero is a big deal, because 0-length > fixed-size records do not make any sense to ask about. So why not make this function check for this? While it may be true that 0-length fixed-size records are impossible currently, nothing guarantees they will always be that way all the time in the future. > And even if > somebody accidentally passed 0, even though it won't be caught by the > compiler, it would barf on any input, so even rudimentary testing would > catch it. If somebody is accidentally passing an invalid value to a function, then it feels right for that function to be able to handle it instead of crashing (or doing any other undefined behavior) with division-by-zero. Taking a step back though, maybe I am being overly defensive about possible failure modes. I don't know the surrounding area well so I might be overreacting. > A zero record_nr is a perfectly reasonable thing to ask about. If you > have a chunk file with zero entries for some entity, then we are > checking that the chunk is the expected zero length as well. Right. >> And even if there are no (unexpected) zero-values involved, shouldn't we >> also check for nonsensical comparisons, such as if chunk_size is smaller >> than record_size? > > Aren't we checking that already? If chunk_size is less than record_size, > then the division will result in "0". If the expected number of records > is not also 0, then that's a bogus file. I was thinking of an early return like if (chunk_size < record_size) return CHUNK_TOO_SMALL before evaluating (chunk_size / pcd->record_size != pcd->record_nr). You're correct that the division will result in "0" if chunk_size is less than record_size. But I didn't like having the extra mental load for reading and understanding the correctness of "if (chunk_size / pcd->record_size != pcd->record_nr)" for that case. IOW, the more early returns we have for known-bad cases, by the time we get to "if (chunk_size / pcd->record_size != pcd->record_nr)" it would be that much easier to understand that code. > What we really care about here is that we won't walk off the end of the > buffer while looking at N fixed-size records ... Ah, I see. This sort of insight would be great to have as a comment in the code. > ... (in that sense, the "too > big" test is overly careful, but it's easy to include as a sanity > check). OK. >> So in summary there appear to be the following possibilities: >> >> CHUNK_MISSING >> CHUNK_TOO_SMALL >> CHUNK_OK >> CHUNK_TOO_BIG_ALIGNED >> CHUNK_TOO_BIG_MISALIGNED > > So yes, I agree all these cases exist. We detect them all except the > "misaligned" case (which I think was discussed earlier in the thread as > not worth caring too much about). OK. > But... > >> (on top of the cases where record_* inputs are zero). >> >> And it seems prudent to treat each of the not-OK cases differently >> (including how we report error messages) once we know which category we >> fall into. > > I'm not sure it is worth caring too much about the different cases. From > the caller's perspective the end result is always to avoid using the > chunk/file. Ah OK. Then yes, it does seem like caring about the different cases is too much from the callers' perspective. But I do think that checking the different cases with early returns will at least help readability (and as a bonus assure future Git developers that divide-by-zero errors are impossible).